Add evaluation results for Falcon-2.0 #154

Open: figolyd wants to merge 1 commit into SalesforceAIResearch:main from figolyd:submission/falcon-2-results

figolyd commented Jun 29, 2026

Contributor

Model Information

  • Model: Falcon-2.0
  • Model Type: Pretrained
  • Model Dtype: Float32
  • Organization: Ant-intl

Links

@cuthalionn

Contributor

Hi @figolyd, thanks for the submission! I tried reproducing a small subset of the Falcon-2.0 results. Most configs match the published all_results.csv almost exactly:

| config | MASE (repro) | MASE (pub) | ΔMASE% | WQL (repro) | WQL (pub) | ΔWQL% |
|---|---|---|---|---|---|---|
| bizitobs_application/10S/medium | 1.3547 | 1.3540 | +0.05% | 0.0174 | 0.0174 | +0.07% |
| bizitobs_application/10S/long | 2.5784 | 2.5792 | -0.03% | 0.0464 | 0.0464 | -0.02% |
| us_births/D/short | 0.3813 | 0.3813 | 0.00% | 0.0227 | 0.0227 | 0.00% |
| car_parts/M/short | 0.8836 | 0.8835 | +0.01% | 0.9834 | 0.9834 | 0.00% |
| sz_taxi/H/short | 0.5466 | 0.5466 | -0.00% | 0.1319 | 0.1319 | -0.00% |
| m4_weekly/W/short | 2.0881 | 1.4090 | +48.2% | 0.0324 | 0.0205 | +57.6% |

The one thing that stands out is m4_weekly/W/short. My run gives MASE 2.09 / WQL 0.032 vs the published 1.41 / 0.021, so roughly +48% / +58% off. Everything else lines up to under 0.1%, so it seems specific to m4_weekly rather than something general in the setup.

Any idea what might be different for that one? Happy to re-run if there's a config I should tweak. Thanks!
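
To make the comparison explicit, here is a minimal sketch of how the relative differences can be recomputed and thresholded. It uses the rounded MASE values from the table above; `pct_delta`, `flag_mismatches`, and the 0.1% tolerance are illustrative names and choices, not part of any evaluation tooling referenced in this thread.

```python
def pct_delta(repro: float, pub: float) -> float:
    """Relative difference of a reproduced metric vs. the published one, in percent."""
    return (repro - pub) / pub * 100.0


# (config, MASE repro, MASE pub), copied from the table (rounded to 4 decimals).
rows = [
    ("bizitobs_application/10S/medium", 1.3547, 1.3540),
    ("bizitobs_application/10S/long", 2.5784, 2.5792),
    ("us_births/D/short", 0.3813, 0.3813),
    ("car_parts/M/short", 0.8836, 0.8835),
    ("sz_taxi/H/short", 0.5466, 0.5466),
    ("m4_weekly/W/short", 2.0881, 1.4090),
]

TOLERANCE_PCT = 0.1  # assumed threshold, chosen only for illustration


def flag_mismatches(rows, tol_pct=TOLERANCE_PCT):
    """Return (config, delta%) for every config whose |delta| exceeds tol_pct."""
    flagged = []
    for name, repro, pub in rows:
        delta = pct_delta(repro, pub)
        if abs(delta) > tol_pct:
            flagged.append((name, round(delta, 2)))
    return flagged


mismatches = flag_mismatches(rows)
print(mismatches)
```

Because the inputs are rounded to four decimals, the recomputed percentages can differ slightly in the last digit from those in the table.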

figolyd commented Jul 4, 2026

Contributor Author

Hi @cuthalionn, thank you very much for the careful reproduction and for pointing this out!

We identified the root cause on our side: the model API had an issue when processing NaN values, which caused an incorrect evaluation for specific datasets like m4_weekly/W/short. The bug has now been fixed, and we have re-run and checked the evaluation accordingly.

We appreciate your help in catching this. Please let us know if there are any other issues, and we will address them as soon as possible.
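
The thread does not show how the NaN handling was fixed. As a purely illustrative sketch, assuming missing observations are encoded as float NaN, a pre-flight check like the one below could list which series contain NaN values before they are sent to a model. `nan_report` and the toy data are hypothetical and do not come from the Falcon-2.0 code.

```python
import math


def nan_report(series_by_id):
    """Return {series_id: NaN count} for every series with at least one NaN."""
    report = {}
    for series_id, values in series_by_id.items():
        # Only floats can be NaN; other types are skipped by the isinstance guard.
        n_nan = sum(1 for v in values if isinstance(v, float) and math.isnan(v))
        if n_nan:
            report[series_id] = n_nan
    return report


# Toy input (hypothetical): series "b" has two missing observations.
toy = {
    "a": [1.0, 2.0, 3.0],
    "b": [1.0, float("nan"), 3.0, float("nan")],
}
report = nan_report(toy)
print(report)
```

Running such a check per dataset would show whether a dataset like m4_weekly contains NaN values that other configs in the comparison do not.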
