
How can I use Hunyuanimage-instruct-3.0 for VQA? #84

@zhangquanwei962

Description


I'm trying to use Hunyuanimage-instruct-3.0 for Visual Question Answering (VQA): given an image and a text question, I want the model to output a text answer in the following format, similar to what other MLLMs do for text-understanding tasks:

```
<think>...</think>
<answer>...</answer>
```
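For reference, once a response does follow this format, splitting it into the reasoning part and the final answer is plain string handling. The sketch below assumes the tags appear literally in the decoded text; `parse_vqa_output` is a hypothetical helper, not part of the HunyuanImage repository, and it does not make the model produce this format.

```python
import re
from typing import Optional, Tuple

# Matches the target VQA format; DOTALL lets the reasoning span several lines.
_THINK_ANSWER = re.compile(
    r"<think>(?P<think>.*?)</think>\s*<answer>(?P<answer>.*?)</answer>",
    re.DOTALL,
)


def parse_vqa_output(text: str) -> Optional[Tuple[str, str]]:
    """Split a decoded response into (reasoning, answer).

    Hypothetical helper: returns None when the response does not contain
    a <think>...</think> block followed by an <answer>...</answer> block.
    """
    match = _THINK_ANSWER.search(text)
    if match is None:
        return None
    return match.group("think").strip(), match.group("answer").strip()


if __name__ == "__main__":
    sample = "<think>A puppy in a red hat.</think>\n<answer>A puppy.</answer>"
    print(parse_vqa_output(sample))  # ('A puppy in a red hat.', 'A puppy.')
```

Returning `None` instead of raising makes it easy to detect responses that fall back to the free-form output shown below.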

Instead, it outputs:

```
ACTUAL PROMPT:
<|startoftext|>Assistant: <img_size_1024><img_ratio_19><joint_img_sep>

User: What is in this image?

A:

The user's original instruction, "What is in this image?", is an open-ended question that requires me to carefully examine the reference image and identify all visible entities and elements in it. I will analyze each part of the image one by one to make sure no information is missed. First, the most prominent element is a puppy, which takes up most of the frame. The puppy is wearing a red knitted hat and a red knitted scarf. Second, there is text at the top of the image, including the Chinese "新年快乐汪" ("Happy New Year, woof") and the English "HAPPY NEW YEAR", along with some decorative patterns. The puppy is indoors; the background is a doorway, where the door frame and part of the room behind it are visible. The bottom of the image is a wooden floor. Finally, the whole image is enclosed by a white border with curved edges. Based on these observations, I will construct a detailed rewritten instruction that clearly lists all the main components of the image. This image shows a cute puppy wearing a red knitted hat and a red knitted scarf, sitting on a wooden floor. Behind the puppy is a doorway, with the door frame clearly visible. At the top of the image are the words "新年快乐汪" and "HAPPY NEW YEAR", decorated with some ornamental patterns. The whole picture is enclosed by a white border with rounded edges.<img_size_1024><img_ratio_19><img_size_1024><img_ratio_19><img_size_1024><img_ratio_19>...<img_size_1024><img_ratio_19><img_size_1024><img_ratio_19><|endoftext|>
```

After the text, the model keeps emitting the `<img_size_1024><img_ratio_19>` pair dozens of times (shortened with `...` above) until `<|endoftext|>`.
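A possible client-side workaround is to keep only the text after the last `Assistant:` marker and cut it at the first image-size/ratio token. This is a sketch under assumptions: `extract_text_answer` is a hypothetical helper, and treating `<img_size_*>`, `<img_ratio_*>` and `<|endoftext|>` as the end of the text answer is inferred from the output above, not from the model's documentation. It only hides the symptom; it does not change what the model generates.

```python
import re

# Assumption inferred from the output above: any image-size/ratio token or the
# end-of-text token marks the end of the useful text answer.
_STOP_PATTERN = re.compile(r"<img_size_\d+>|<img_ratio_\d+>|<\|endoftext\|>")


def extract_text_answer(decoded: str, assistant_tag: str = "Assistant:") -> str:
    """Hypothetical helper: return the reply after the last assistant tag,
    truncated at the first image-size/ratio or end-of-text token."""
    # rpartition returns ('', '', decoded) when the tag is absent,
    # so the whole string is used in that case.
    _, _, reply = decoded.rpartition(assistant_tag)
    match = _STOP_PATTERN.search(reply)
    if match is not None:
        reply = reply[: match.start()]
    return reply.strip()


if __name__ == "__main__":
    sample = (
        "<|startoftext|>Assistant: <img_size_1024><img_ratio_19><joint_img_sep>\n\n"
        "User: What is in this image?\n\n"
        "Assistant:\n\nA puppy in a red hat."
        "<img_size_1024><img_ratio_19><img_size_1024><img_ratio_19><|endoftext|>"
    )
    print(extract_text_answer(sample))  # A puppy in a red hat.
```

Using the last `Assistant:` marker skips the `Assistant:` that appears in the image-conditioning prefix at the start of the prompt.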

Can you help me, or give me an example?
