
Pull requests: allenai/dolma


Pull requests list

Script to produce dolma 2 ablation config
#275 opened Sep 24, 2025 by soldni Contributor
Improve WARC processing
#260 opened Apr 15, 2025 by soldni Contributor Draft
first
#240 opened Feb 14, 2025 by Whattabatt Contributor Draft
[WIP DO NOT MERGE] Learn2Code Feature Branch
#233 opened Feb 13, 2025 by cmwilhelm Contributor
simpler logic for calculating code taggers
#229 opened Feb 12, 2025 by kyleclo Contributor
Bump openssl from 0.10.66 to 0.10.70 in the cargo group (labels: dependencies, rust)
#228 opened Feb 3, 2025 by dependabot Bot
Fixed ignore_existing flag not working as expected.
#224 opened Jan 1, 2025 by soldni Contributor
New language ID
#223 opened Dec 30, 2024 by soldni Contributor
Adding support for Classifiers and Search tools
#219 opened Oct 24, 2024 by soldni Contributor Draft
DCLM Style Deduplications
#214 opened Sep 30, 2024 by revbucket
Mattj/requirements
#212 opened Sep 26, 2024 by revbucket
DNM: Patch FT Tagger
#210 opened Sep 25, 2024 by undfined Contributor Draft
Allow specifying different bins for visualization and computation.
#190 opened Aug 27, 2024 by soldni Contributor
New Progress Bar, Backoff, Batching
#165 opened May 23, 2024 by soldni Contributor
Warc Backoff
#160 opened May 10, 2024 by soldni Contributor
Baseline data
#61 opened Oct 20, 2023 by IanMagnusson Contributor Draft
Text modification config
#60 opened Oct 19, 2023 by rodneykinney Member