fix(vision): retain image context in follow-up turns (#81)
Persist provider-owned image replay text in segment stateful markers so image context is not lost across follow-up turns or Reload Window.
Only the current pending image message is sent to the vision proxy. Historical images replay from matching assistant markers, while marker misses are omitted instead of re-running vision. Segment marker metadata is encoded as unpadded base64url JSON to survive Copilot replay framing.
Also update diagnostics, request dumps, and token estimates for marker-based replay.
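The unpadded base64url JSON encoding mentioned in the commit message could look roughly like the sketch below. This is an illustration only: the `SegmentMarker` shape and the `encodeMarker` / `decodeMarker` names are hypothetical and do not come from the commit. It assumes a Node runtime, where `Buffer` supports the `"base64url"` encoding and emits no `=` padding.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical marker payload; the real fields are not shown in this commit.
interface SegmentMarker {
  imageId: string;
  replayText: string;
}

// Serialize the marker to JSON, then to base64url. Node's "base64url"
// encoding uses only [A-Za-z0-9_-] and omits "=" padding.
export function encodeMarker(marker: SegmentMarker): string {
  return Buffer.from(JSON.stringify(marker), "utf8").toString("base64url");
}

// Decode a marker. Returns undefined for malformed or unrecognized input,
// so the caller can treat it as a marker miss and omit the image.
export function decodeMarker(encoded: string): SegmentMarker | undefined {
  try {
    const json = Buffer.from(encoded, "base64url").toString("utf8");
    const parsed: unknown = JSON.parse(json);
    if (
      typeof parsed === "object" &&
      parsed !== null &&
      typeof (parsed as SegmentMarker).imageId === "string" &&
      typeof (parsed as SegmentMarker).replayText === "string"
    ) {
      return parsed as SegmentMarker;
    }
  } catch {
    // Invalid JSON: fall through and report a miss.
  }
  return undefined;
}
```

Because the alphabet avoids `+`, `/`, and `=`, the encoded string has no characters that commonly get escaped or trimmed when text is re-framed, which is presumably why the commit chose it.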
package.json
1 addition & 1 deletion
@@ -148,7 +148,7 @@
 "deepseek-copilot.visionPrompt": {
   "type": "string",
   "editPresentation": "multilineText",
-  "default": "Describe the visual contents of this image in detail, including any text, objects, people, or context that would be relevant for understanding it. Focus on factual visual elements.",
+  "default": "Describe all image attachments in this message.\n\nIf there is one image, describe it directly.\nIf there are multiple images:\n1. Describe each image separately, preserving their order.\n2. Then provide a combined description explaining the overall context and relationships across the images.\n\nReturn one concise factual description suitable for inserting into a text-only chat prompt. Include visible text, objects, UI elements, people, and relevant context. Do not invent details.",
* Prompt sent to the vision proxy model when describing image attachments
- * before forwarding them to text-only DeepSeek models.
- */
-export const IMAGE_DESCRIPTION_PROMPT =
-  'Describe the visual contents of this image in detail, including any text, objects, people, or context that would be relevant for understanding it. Focus on factual visual elements.';
-
-/**
- * Stable fallback marker inserted into the chat prompt when the vision proxy
- * fails to describe an image. Keep this in English and out of i18n so prompt
- * shape and cache behaviour do not vary by VS Code display language.