bugfix: Support for extracting answers expressed in thousand separato… #360

Open: yejj710 wants to merge 1 commit into AISBench:master from yejj710:bugfix

yejj710 (Collaborator) commented Jun 24, 2026

Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and help you get feedback more easily. If you do not understand some items, don't worry: just open the pull request and ask the maintainers for help.

PR Type

  • Feature
  • Bugfix
  • Docs
  • CI/CD
  • Refactor
  • Perf
  • Dependency
  • Test-Cases
  • Other

Related Issue
Fixes #(issue ID) / Relates to #(issue ID)

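🔍 Motivation

#359
The answer-extraction logic is incomplete, so some correct answers are not extracted and scores come out lower than they should.

📝 Modification

Following the logic already used to extract the golden answers, replace ',' with an empty string before extracting the answer.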
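A minimal sketch of why the change helps, assuming the post-processor keeps only the text before 'Question:' and then returns the last number it finds. The function name `extract_last_number`, the `strip_commas` flag, the number regex and the empty-string fallback are illustrative assumptions, not the repository's code:

```python
import re

# Assumed number pattern: optional minus sign, digits, optional decimal part.
NUMBER_RE = re.compile(r'-?\d+(?:\.\d+)?')


def extract_last_number(text: str, strip_commas: bool = True) -> str:
    """Hypothetical stand-in for the GSM8K post-processor.

    Keeps the text before 'Question:' and returns the last number found,
    or '' when there is none (the fallback value is an assumption).
    """
    text = text.split('Question:')[0]
    if strip_commas:
        # The change proposed in this PR: drop every ',' so that "1,250"
        # is read as one number, the same way golden answers are cleaned.
        text = text.replace(',', '')
    numbers = NUMBER_RE.findall(text)
    return numbers[-1] if numbers else ''


prediction = 'She earns $1,250 per week.'
print(extract_last_number(prediction, strip_commas=False))  # 250
print(extract_last_number(prediction))                      # 1250
```

Without the replacement, "1,250" is split into "1" and "250", so the extracted "250" would not match a golden answer whose comma has already been removed ("1250").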

yejj710 had a problem deploying to smoke-test-approval on June 24, 2026 09:36 with GitHub Actions (Failure)

gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request updates the GSM8K dataset post-processing logic to remove commas from the text before extracting numbers. The review feedback highlights that blindly replacing all commas can cause issues by merging comma-separated numbers (e.g., "1,2" into "12") and suggests using a regular expression to target only thousand separators.


 @TEXT_POSTPROCESSORS.register_module('gsm8k')
 def gsm8k_postprocess(text: str) -> str:
-    text = text.split('Question:')[0]
+    text = text.split('Question:')[0].replace(',', '')


medium

Blindly replacing all commas with an empty string can misparse numbers that are separated by commas without spaces, such as a list of numbers or a pair of coordinates. For example, 1,2 would be merged into 12, causing the postprocessor to extract 12 instead of 2.

To remove only thousand separators, you can use a regular expression that targets commas preceded by a digit and followed by exactly three digits and then a word boundary (for example, the end of the number or a decimal point).

Suggested change
-    text = text.split('Question:')[0].replace(',', '')
+    text = re.sub(r'(?<=\d),(?=\d{3}\b)', '', text.split('Question:')[0])
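A small comparison of the two approaches, using the regex from the suggestion above. The helper names `strip_all_commas` and `strip_thousands_separators` and the sample strings are illustrative only:

```python
import re

# Regex from the suggested change: a comma preceded by a digit and followed by
# exactly three digits that end at a word boundary (end of text, '.', ')', space...).
THOUSANDS_SEP_RE = re.compile(r'(?<=\d),(?=\d{3}\b)')


def strip_all_commas(text: str) -> str:
    """Behaviour of the PR as submitted: remove every comma."""
    return text.replace(',', '')


def strip_thousands_separators(text: str) -> str:
    """Behaviour of the review suggestion: remove only thousand separators."""
    return THOUSANDS_SEP_RE.sub('', text)


samples = ['1,250', '1,234,567', '1,234.5', '(1,2)', '1,2345']
for s in samples:
    print(f'{s!r}: replace -> {strip_all_commas(s)!r}, '
          f'regex -> {strip_thousands_separators(s)!r}')
```

Both approaches agree on genuine thousand separators, but only the regex leaves "(1,2)" and "1,2345" alone. The regex is still a heuristic: any digit-comma-three-digits run, such as the list "5,100", is treated as one number. The one-line suggestion also requires `re` to be imported in the post-processor's module.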
