AS-V2 10M pretrain filtering strategy #30

Description

@wsluma

Hi there,

I have a question about https://huggingface.co/datasets/OpenGVLab/AS-V2/blob/main/as_pretrain_10m.json, which is described as "as_pretrain_10m.json: the filtered 10M samples in AS-1B, which are used in the pretraining phase of Stage 2."

What is your filtering strategy? Are there any shortcomings in AS-1B?

Thank you for the awesome work.
