Commit af2fe24
Fix parquet loading crash from datasets version mismatch (#1140)
## Summary
- When local parquet files contain HF `datasets` metadata written by a
different library version, `load_dataset("parquet")` raises a
`TypeError` during feature deserialization
- Added a fallback that catches the `TypeError` and reads parquet files
directly via PyArrow, bypassing the incompatible metadata
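
The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the names `load_parquet_with_fallback`, `_load_via_datasets`, `_load_via_pyarrow`, and the `primary`/`fallback` parameters are hypothetical. It also assumes that all files share one schema and that a single `train` split is wanted.

```python
from typing import Any, Callable, Optional, Sequence

Loader = Callable[[Sequence[str]], Any]


def _load_via_datasets(paths: Sequence[str]) -> Any:
    # Primary path: let HF `datasets` parse the files, including any
    # embedded feature metadata.
    from datasets import load_dataset

    return load_dataset("parquet", data_files=list(paths), split="train")


def _load_via_pyarrow(paths: Sequence[str]) -> Any:
    # Fallback path: read the raw columns with PyArrow and rebuild the
    # dataset from plain Python values, so the embedded HF metadata is
    # never deserialized. Assumes every file has the same schema.
    import pyarrow as pa
    import pyarrow.parquet as pq
    from datasets import Dataset

    table = pa.concat_tables([pq.read_table(p) for p in paths])
    return Dataset.from_dict(table.to_pydict())


def load_parquet_with_fallback(
    paths: Sequence[str],
    primary: Optional[Loader] = None,   # hypothetical hook, eases testing
    fallback: Optional[Loader] = None,  # hypothetical hook, eases testing
) -> Any:
    primary = primary or _load_via_datasets
    fallback = fallback or _load_via_pyarrow
    try:
        return primary(paths)
    except TypeError:
        # Only the metadata-mismatch symptom is caught; any other
        # exception still propagates to the caller.
        return fallback(paths)
```

Catching only `TypeError` keeps the fallback narrow, so unrelated failures such as a missing file still surface from the primary path instead of being masked.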
## Test plan
- [ ] Run `specdec_bench` with EAGLE config against local parquet
dataset files
- [ ] Verify normal (compatible) parquet loading still works via the
primary `load_dataset` path
🤖 Generated with [Claude Code](https://claude.com/claude-code)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Bug Fixes**
  * Improved robustness of the parquet dataset loader by adding a safer
    fallback loading path and metadata handling to ensure reliable dataset
    reads across diverse environments.
* **Chores**
  * Broadened the supported version range for the datasets dependency to
    increase compatibility and reduce installation friction.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
---------
Signed-off-by: Ye Yu <yeyu@nvidia.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

1 parent bdc04f1 commit af2fe24
2 files changed
Lines changed: 22 additions & 2 deletions
First file (name and diff content not preserved): line 1 changed (1 addition, 1 deletion).
Second file (name and diff content not preserved): original line 719 replaced by new lines 719-739 (21 additions, 1 deletion).