[Question] about the huggingface data #209

@zhang123434

Required prerequisites

Questions

The text-image-to-text subset of align-anything (https://huggingface.co/datasets/PKU-Alignment/align-anything/tree/main/text-image-to-text) contains multiple train.parquet files. Which parquet file is actually used:
https://huggingface.co/datasets/PKU-Alignment/align-anything/blob/main/text-image-to-text/train.parquet
or the files under https://huggingface.co/datasets/PKU-Alignment/align-anything/tree/main/text-image-to-text/new
or the files under https://huggingface.co/datasets/PKU-Alignment/align-anything/tree/main/text-image-to-text/train
The contents of the latter two directories should be the same. What does https://huggingface.co/datasets/PKU-Alignment/align-anything/blob/main/text-image-to-text/train.parquet contain? Is it additional preference data that was constructed later, and was it built with the same method as the data in https://huggingface.co/datasets/PKU-Alignment/align-anything/tree/main/text-image-to-text/new?

@Gaiejj @XuyaoWang @yongzhemiaolegemi @cby-pku @htlou
Looking forward to your answer!
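The guess above that the `new` and `train` directories hold the same content can be checked locally once both have been downloaded. The following is only a minimal sketch using the Python standard library: the helper names (`file_digest`, `digest_map`, `compare_dirs`) and the local directory paths are hypothetical, and it assumes the parquet files already sit in two local folders.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def digest_map(directory: Path) -> dict[str, str]:
    """Map each .parquet file (path relative to `directory`) to its digest."""
    return {
        p.relative_to(directory).as_posix(): file_digest(p)
        for p in sorted(directory.rglob("*.parquet"))
    }


def compare_dirs(dir_a: Path, dir_b: Path) -> dict[str, list[str]]:
    """Group parquet files by whether they match byte-for-byte across two directories."""
    a, b = digest_map(dir_a), digest_map(dir_b)
    shared = a.keys() & b.keys()
    return {
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
        "different": sorted(name for name in shared if a[name] != b[name]),
        "identical": sorted(name for name in shared if a[name] == b[name]),
    }
```

With hypothetical local copies, a call such as `compare_dirs(Path("text-image-to-text/new"), Path("text-image-to-text/train"))` would report which files match. Note that this compares bytes, not rows: two parquet files holding the same records but written with different compression or row-group settings would show up as "different", so a mismatch here does not by itself prove the data differs.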

Metadata

Labels: question (Further information is requested)
