The evaluation results using GroundingDINO aren't accurate. #2

Comparing GroundingDINO's counts against my own manual inspection of the videos, I found the counts to be unreliable. Even in videos whose content barely changes from frame to frame, GroundingDINO reports different object counts for different frames. One specific example is the video generated by Wan2.1-T2V-1.3B with the prompt: 'Two astronauts repairing a satellite as five small debris-pieces fly by and a distant star glows.' In this video, the background and the astronauts stay largely still; the only noticeable changes are the astronauts' hand movements and the flying debris. Yet GroundingDINO gave the following per-frame counts for this video (the number in parentheses is the expected count from the prompt):

  Frame   | astronauts(2) | debris(5) | star(1) | Score
  -----------------------------------------------------
      1   |            3✗ |        1✗ |      0✗ | 0.00
      2   |            3✗ |        2✗ |      0✗ | 0.00
      3   |            3✗ |        1✗ |      1✓ | 0.33
      4   |            3✗ |        1✗ |      2✗ | 0.00
      5   |            3✗ |        1✗ |      1✓ | 0.33
      6   |            3✗ |        1✗ |      2✗ | 0.00
      7   |            3✗ |        1✗ |      1✓ | 0.33
      8   |            3✗ |        1✗ |      1✓ | 0.33
      9   |            3✗ |        2✗ |      0✗ | 0.00
     10   |            3✗ |        2✗ |      0✗ | 0.00
     11   |            3✗ |        2✗ |      1✓ | 0.33
     12   |            3✗ |        2✗ |      1✓ | 0.33
     13   |            3✗ |        2✗ |      1✓ | 0.33
     14   |            3✗ |        1✗ |      1✓ | 0.33
     15   |            3✗ |        2✗ |      1✓ | 0.33
     16   |            3✗ |        1✗ |      1✓ | 0.33
     17   |            3✗ |        1✗ |      1✓ | 0.33
     18   |            3✗ |        3✗ |      1✓ | 0.33
     19   |            3✗ |        1✗ |      2✗ | 0.00
     20   |            3✗ |        3✗ |      1✓ | 0.33
     21   |            3✗ |        2✗ |      1✓ | 0.33
     22   |            3✗ |        3✗ |      0✗ | 0.00
     23   |            3✗ |        2✗ |      0✗ | 0.00
     24   |            3✗ |        2✗ |      0✗ | 0.00
     25   |            3✗ |        2✗ |      0✗ | 0.00
     26   |            3✗ |        1✗ |      0✗ | 0.00
     27   |            3✗ |        1✗ |      0✗ | 0.00
     28   |            3✗ |        1✗ |      0✗ | 0.00
     29   |            3✗ |        1✗ |      0✗ | 0.00
     30   |            3✗ |        1✗ |      0✗ | 0.00
     31   |            3✗ |        1✗ |      0✗ | 0.00
     32   |            3✗ |        1✗ |      0✗ | 0.00
     33   |            3✗ |        1✗ |      0✗ | 0.00
     34   |            3✗ |        1✗ |      1✓ | 0.33
     35   |            3✗ |        1✗ |      1✓ | 0.33
     36   |            3✗ |        1✗ |      1✓ | 0.33
     37   |            3✗ |        1✗ |      1✓ | 0.33
     38   |            3✗ |        1✗ |      1✓ | 0.33
     39   |            3✗ |        1✗ |      1✓ | 0.33
     40   |            3✗ |        0✗ |      2✗ | 0.00
     41   |            3✗ |        1✗ |      1✓ | 0.33
     42   |            3✗ |        0✗ |      0✗ | 0.00
     43   |            3✗ |        0✗ |      1✓ | 0.33
     44   |            3✗ |        0✗ |      2✗ | 0.00
     45   |            3✗ |        0✗ |      1✓ | 0.33
     46   |            3✗ |        0✗ |      1✓ | 0.33
     47   |            3✗ |        0✗ |      1✓ | 0.33
     48   |            3✗ |        0✗ |      1✓ | 0.33
     49   |            3✗ |        1✗ |      1✓ | 0.33
     50   |            3✗ |        1✗ |      1✓ | 0.33
     51   |            3✗ |        1✗ |      0✗ | 0.00
     52   |            3✗ |        1✗ |      0✗ | 0.00
     53   |            3✗ |        1✗ |      0✗ | 0.00
     54   |            3✗ |        1✗ |      0✗ | 0.00
     55   |            3✗ |        1✗ |      0✗ | 0.00
     56   |            3✗ |        1✗ |      0✗ | 0.00
     57   |            3✗ |        2✗ |      0✗ | 0.00
     58   |            3✗ |        0✗ |      0✗ | 0.00
     59   |            3✗ |        0✗ |      0✗ | 0.00
     60   |            3✗ |        0✗ |      0✗ | 0.00
     61   |            3✗ |        1✗ |      0✗ | 0.00
     62   |            3✗ |        0✗ |      1✓ | 0.33
     63   |            3✗ |        0✗ |      1✓ | 0.33
     64   |            3✗ |        1✗ |      1✓ | 0.33
     65   |            3✗ |        1✗ |      0✗ | 0.00
     66   |            3✗ |        1✗ |      1✓ | 0.33
     67   |            3✗ |        1✗ |      0✗ | 0.00
     68   |            3✗ |        0✗ |      1✓ | 0.33
     69   |            3✗ |        0✗ |      0✗ | 0.00
     70   |            3✗ |        1✗ |      1✓ | 0.33
     71   |            3✗ |        1✗ |      0✗ | 0.00
     72   |            3✗ |        1✗ |      1✓ | 0.33
     73   |            3✗ |        0✗ |      0✗ | 0.00
     74   |            3✗ |        0✗ |      1✓ | 0.33
     75   |            3✗ |        0✗ |      0✗ | 0.00
     76   |            3✗ |        1✗ |      0✗ | 0.00
     77   |            3✗ |        1✗ |      0✗ | 0.00
     78   |            3✗ |        0✗ |      1✓ | 0.33
     79   |            3✗ |        0✗ |      1✓ | 0.33
     80   |            3✗ |        0✗ |      1✓ | 0.33
     81   |            3✗ |        0✗ |      1✓ | 0.33
  Accuracy: 0.1646  (3 categories)
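
The reported accuracy is consistent with a simple reading of the table: each frame scores the fraction of categories whose detected count equals the expected count, and the video accuracy is the mean over frames. That reading is my assumption from the numbers above, not something I confirmed in the CountBench code, and the function names below are hypothetical. In the table, 40 of the 81 frames get only the star count right and the other 41 get nothing right, which gives 40/243 ≈ 0.1646.

```python
def frame_score(detected, expected):
    """Fraction of categories whose detected count equals the expected count."""
    return sum(detected[k] == expected[k] for k in expected) / len(expected)


def video_accuracy(frames, expected):
    """Mean per-frame score over all frames of one video."""
    return sum(frame_score(f, expected) for f in frames) / len(frames)


expected = {"astronauts": 2, "debris": 5, "star": 1}

# Simplified stand-in for the table: the astronaut count is always 3 and the
# debris count never reaches 5, so only the star column affects the score.
# The debris value 1 is a placeholder, not the real per-frame value.
star_right = {"astronauts": 3, "debris": 1, "star": 1}
all_wrong = {"astronauts": 3, "debris": 1, "star": 0}
frames = [star_right] * 40 + [all_wrong] * 41

print(round(video_accuracy(frames, expected), 4))  # 0.1646
```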

In this result, even the counts for the easiest categories are clearly wrong: GroundingDINO reports 3 astronauts in every frame, and the star count fluctuates between 0 and 2 across frames. Smaller objects such as debris fare no better, with counts ranging from 0 to 3. This raises the question of whether the evaluation proposed by CountBench is reliable and fair.
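
The frame-to-frame instability can be quantified with a small diagnostic. This is only a sketch: `count_stats` and the "flicker rate" (the share of consecutive frame pairs whose count differs) are names I made up for illustration, not part of CountBench or GroundingDINO. The input is the star column for frames 1 to 10, copied from the table above.

```python
def count_stats(counts):
    """Summarize how stable a per-frame count sequence is."""
    # Pair each frame's count with the next frame's count.
    transitions = list(zip(counts, counts[1:]))
    changes = sum(a != b for a, b in transitions)
    return {
        "distinct": sorted(set(counts)),
        "changes": changes,
        "flicker_rate": changes / len(transitions) if transitions else 0.0,
    }


# Star counts for frames 1-10, copied from the table above.
star_counts = [0, 0, 1, 2, 1, 2, 1, 1, 0, 0]
print(count_stats(star_counts))
```

For these ten frames the count changes in 6 of the 9 consecutive frame pairs, even though the star itself does not move.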
