Commit 7746cda

docs: update citations, readme, changelog and add API key guide for v0.1.6
1 parent 188f07d commit 7746cda

9 files changed: +449 -62 lines changed

CHANGELOG.md

Lines changed: 36 additions & 14 deletions
@@ -1,26 +1,42 @@
1-
## [0.1.5] - 08-02-2026
1+
## [0.1.6] - 2026-04-02
2+
3+
### Changed
4+
5+
- Updated [README.md](README.md), [CITATION.cff](CITATION.cff) and docs with the published version (advance article) of the ComProScanner paper in _Digital Discovery_ as fully open access:
6+
- [ComProScanner: a multi-agent based framework for composition-property structured data extraction from scientific literature](https://doi.org/10.1039/D5DD00521C)
27

38
### Added
9+
10+
- Guide for API key creation for various LLM providers and publisher APIs added to the documentation at `docs/getting-started/api-key-guide.md` with detailed instructions for each provider.
11+
12+
### Fixed
13+
14+
- Model prefix handling in `rag_tool.py` standardized to reflect the docs.
15+
- `HF_TOKEN` documentation clarified as optional — only required for gated or private Hugging Face models.
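One way to read "optional" here, as a minimal sketch: look up `HF_TOKEN` in the environment and pass it on only when it is actually set. The helper name `build_hf_auth_kwargs` and the `token` key are hypothetical illustrations, not part of ComProScanner; how `rag_tool.py` consumes the token is not shown in this commit.

```python
import os


def build_hf_auth_kwargs(env=None):
    """Return {"token": ...} when HF_TOKEN is set, otherwise an empty dict.

    Hypothetical helper: public models need no token, so an unset or blank
    HF_TOKEN is treated as "no authentication" rather than as an error.
    """
    env = os.environ if env is None else env
    token = (env.get("HF_TOKEN") or "").strip()
    return {"token": token} if token else {}
```

With this shape, code that loads a public model can unpack the result unconditionally, and only gated or private models would fail when the token is missing.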
16+
17+
---
18+
19+
## [0.1.5] - 2026-02-08
20+
21+
### Added
22+
423
- Data related to comparison with other agentic data extraction frameworks added for the ComProScanner paper in the `examples/piezo_test/comparing_existing_frameworks` folder.
524

625
- New parameter `apply_advanced_cleaning` added to data cleaning methods in `data_cleaner.py`. When set to `True`, it triggers the advanced cleaning pipeline.
726

827
- Advanced composition cleaning methods in `data_cleaner.py`:
9-
1028
- `_remove_miller_indices()` - Removes crystal plane notations from chemical formulas
1129
- `_remove_zero_coefficient_elements()` - Removes elements with zero coefficients
1230
- `_normalize_coefficients()` - Removes trailing zeros from coefficients
1331
- `_expand_leading_and_trailing_coefficients()` - Expands leading/trailing coefficient patterns
1432
- `_expand_parenthetical_coefficients()` - Expands nested bracket coefficients
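To make two of the listed transformations concrete, here is a rough sketch of trailing-zero normalization and zero-coefficient removal on a flat formula string. It is not the code in `data_cleaner.py`: the names `normalize_coefficient` and `clean_flat_formula` are hypothetical, and the regular expression assumes simple formulas without brackets, Miller indices or variables.

```python
import re

# An element symbol followed by an optional numeric coefficient,
# e.g. "Ti0.50" or "O3". Brackets and other characters are ignored.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def normalize_coefficient(coeff: str) -> str:
    """Drop trailing zeros after a decimal point: '0.50' -> '0.5', '1.0' -> '1'."""
    if "." in coeff:
        coeff = coeff.rstrip("0").rstrip(".")
    return coeff


def clean_flat_formula(formula: str) -> str:
    """Remove zero-coefficient elements and normalize the remaining coefficients."""
    parts = []
    for symbol, coeff in _TOKEN.findall(formula):
        if coeff and float(coeff) == 0:
            continue  # an element with a zero coefficient is dropped
        parts.append(symbol + normalize_coefficient(coeff))
    return "".join(parts)
```

Under these assumptions, `clean_flat_formula("Ba0.90Ca0.10Ti1.0Zr0O3")` yields `"Ba0.9Ca0.1Ti1O3"`.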
1533

1634
- Enhanced documentation in `docs/usage/data-cleaning.md`:
17-
1835
- Added `apply_advanced_cleaning` parameter documentation
1936
- Added Mermaid process flow diagram showing cleaning stages
2037
- Added advanced cleaning examples with tables for each transformation type
2138

2239
- Template for GitHub issues added to [.github/ISSUE_TEMPLATE](https://github.com/slimeslab/ComProScanner/tree/main/.github/ISSUE_TEMPLATE) for the following topics:
23-
2440
- bug reports
2541
- feature requests
2642
- documentation improvements
@@ -29,24 +45,22 @@
2945
- [Changelog page](https://slimeslab.github.io/ComProScanner/about/changelog/) added in the documentation. Also, [CHANGELOG.md](https://github.com/slimeslab/ComProScanner/blob/main/CHANGELOG.md) linked in [README.md](https://github.com/slimeslab/ComProScanner/blob/main/README.md).
3046

3147
- DeepWiki integration badge added to README.md for community Q&A support:
32-
3348
- [Ask DeepWiki](https://deepwiki.com/slimeslab/ComProScanner)
3449

3550
- arXiv preprint badge added to README.md:
36-
3751
- [arXiv:2510.20362](https://arxiv.org/abs/2510.20362)
3852

3953
- [CITATION.cff](https://github.com/slimeslab/ComProScanner/blob/main/CITATION.cff) added for standardized citation information based on the latest release and arXiv preprint.
4054

4155
### Fixed
56+
4257
- OAWorks API is replaced with OpenAlex API as OAWorks is no longer available.
4358

4459
- Empty/corrupted PDF handled in `pdf_processor.py` and `wiley_processor.py` to avoid having GLYPH errors during text extraction.
4560

4661
- Data extraction failures fixed if composition-property text data is empty.
4762

4863
- CSV progress tracking in `elsevier_processor.py`:
49-
5064
- DtypeWarning resolved by adding `dtype=str, low_memory=False` to `pd.read_csv()`
5165
- Data loss issue fixed with immediate CSV persistence for processed articles
5266
- Sleep delays optimized for batch writes
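The "immediate CSV persistence" idea can be illustrated roughly as follows: each processed article is appended to the progress file and flushed to disk straight away, so an interruption loses at most the article in flight. This is a standard-library sketch, not the code in `elsevier_processor.py`; the function name `append_processed_article` and the column names in the usage note are hypothetical.

```python
import csv
import os


def append_processed_article(csv_path, row, fieldnames):
    """Append one article's record and force it to disk before returning."""
    # Write the header only when the file is new or still empty.
    write_header = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerow(row)
        handle.flush()             # push Python's buffer to the OS
        os.fsync(handle.fileno())  # ask the OS to write it to disk
```

A caller would invoke it once per article, for example `append_processed_article("progress.csv", {"doi": "10.1/a", "status": "done"}, ["doi", "status"])`. Syncing on every row trades some throughput for durability, which is the tension the batch-write item above refers to.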
@@ -69,19 +83,19 @@
6983

7084
- README badges section converted from HTML to markdown format for better compatibility across platforms.
7185

72-
## [0.1.4] - 02-12-2025
86+
---
87+
88+
## [0.1.4] - 2025-12-02
7389

7490
### Added
7591

7692
- New function `clean_data()` added for improved data cleaning and preprocessing instead of integrating it into data extraction function.
7793

7894
- New documentation page for Data Cleaning added:
79-
8095
- docs/usage/data-cleaning.md
8196
- Added to mkdocs.yml navigation.
8297

8398
- New API overview documentation page added:
84-
8599
- docs/api.md
86100
- Added to mkdocs.yml navigation.
87101
- New mkdocstrings configuration added to mkdocs.yml for automatic API documentation generation.
@@ -104,30 +118,38 @@
104118
- [ComProScanner Logo](https://raw.githubusercontent.com/aritraroy24/ComProScanner/main/assets/comproscanner_logo.png)
105119
- [ComProScanner Workflow](https://raw.githubusercontent.com/aritraroy24/ComProScanner/main/assets/overall_workflow.png)
106120

107-
## [0.1.3] - 04-11-2025
121+
---
122+
123+
## [0.1.3] - 2025-11-04
108124

109125
### Fixed
110126

111127
- **RecursiveCharacterTextSplitter** importing updated for latest _langchain_ version to avoid import errors:
112128
- Changed from `from langchain.text_splitter import RecursiveCharacterTextSplitter`
113129
- To `from langchain.text_splitter.recursive_character import RecursiveCharacterTextSplitter`
114130

115-
## [0.1.2] - 24-10-2025
131+
---
132+
133+
## [0.1.2] - 2025-10-24
116134

117135
### Added
118136

119137
- Link to ComProScanner preprint on arXiv in the documentation index page and README.md:
120138
- [arXiv:2510.20362](https://arxiv.org/abs/2510.20362)
121139

122-
## [0.1.1] - 22-10-2025
140+
---
141+
142+
## [0.1.1] - 2025-10-22
123143

124144
### Fixed
125145

126146
- README images updated with external image link to fix PyPI rendering issue.
127147
- [ComProScanner Logo](https://i.ibb.co/whHSbGvT/comproscanner-logo.png)
128148
- [ComProScanner Workflow](https://i.ibb.co/QWd2qd3/overall-workflow.png)
129149

130-
## [0.1.0] - 22-10-2025
150+
---
151+
152+
## [0.1.0] - 2025-10-22
131153

132154
### Added
133155

CITATION.cff

Lines changed: 13 additions & 6 deletions
@@ -16,7 +16,7 @@ contact:
1616
- family-names: Roy
1717
given-names: Aritra
1818
orcid: "https://orcid.org/0000-0002-4928-2935"
19-
message: If you use this software, please cite our article on arXiv.
19+
message: If you use this software, please cite our article in Digital Discovery.
2020
preferred-citation:
2121
authors:
2222
- family-names: Roy
@@ -31,21 +31,28 @@ preferred-citation:
3131
- family-names: Gattinoni
3232
given-names: Chiara
3333
orcid: "https://orcid.org/0000-0002-3376-6374"
34-
date-published: 2025-10-23
34+
doi: "10.1039/D5DD00521C"
3535
identifiers:
36+
- type: doi
37+
value: "10.1039/D5DD00521C"
38+
description: "Peer-reviewed article"
3639
- type: other
3740
value: "arXiv:2510.20362"
3841
description: "arXiv preprint"
39-
title: "ComProScanner: A multi-agent based framework for composition-property structured data extraction from scientific literature"
42+
journal: "Digital Discovery"
43+
publisher:
44+
name: "RSC"
45+
status: advance-online
46+
title: "ComProScanner: a multi-agent based framework for composition-property structured data extraction from scientific literature"
4047
type: article
41-
url: "https://arxiv.org/abs/2510.20362"
48+
url: "https://doi.org/10.1039/D5DD00521C"
4249
repository-code: "https://github.com/slimeslab/ComProScanner"
4350
license: MIT
4451
title: "ComProScanner: A multi-agent based framework for composition-property structured data extraction from scientific literature"
4552
type: software
4653
url: "https://slimeslab.github.io/ComProScanner/"
47-
version: "0.1.4"
48-
date-released: 2025-12-03
54+
version: "0.1.6"
55+
date-released: 2026-04-02
4956
keywords:
5057
- materials science
5158
- data extraction

README.md

Lines changed: 9 additions & 8 deletions
@@ -169,14 +169,15 @@ eval_visualizer.plot_multiple_radar_charts(
169169
If you use ComProScanner in your research, please cite:
170170

171171
```bibtex
172-
@misc{roy2025comproscannermultiagentbasedframework,
173-
title={ComProScanner: A multi-agent based framework for composition-property structured data extraction from scientific literature},
174-
author={Aritra Roy and Enrico Grisan and John Buckeridge and Chiara Gattinoni},
175-
year={2025},
176-
eprint={2510.20362},
177-
archivePrefix={arXiv},
178-
primaryClass={physics.comp-ph},
179-
url={https://arxiv.org/abs/2510.20362},
172+
@Article{roy2026comproscannermultiagentbasedframework,
173+
author ="Roy, Aritra and Grisan, Enrico and Buckeridge, John and Gattinoni, Chiara",
174+
title ="ComProScanner: a multi-agent based framework for composition-property structured data extraction from scientific literature",
175+
journal ="Digital Discovery",
176+
year ="2026",
177+
pages ="Accepted",
178+
publisher ="RSC",
179+
doi ="10.1039/D5DD00521C",
180+
url ="https://doi.org/10.1039/D5DD00521C"
180181
}
181182
```
182183

docs/about/changelog.md

Lines changed: 50 additions & 17 deletions
@@ -1,25 +1,42 @@
1-
## Unreleased
1+
## [0.1.6] - 2026-04-02
2+
3+
### Changed
4+
5+
- Updated [README.md](README.md), [CITATION.cff](CITATION.cff) and docs with the published version (advance article) of the ComProScanner paper in _Digital Discovery_ as fully open access:
6+
- [ComProScanner: a multi-agent based framework for composition-property structured data extraction from scientific literature](https://doi.org/10.1039/D5DD00521C)
27

38
### Added
49

10+
- Guide for API key creation for various LLM providers and publisher APIs added to the documentation at `docs/getting-started/api-key-guide.md` with detailed instructions for each provider.
11+
12+
### Fixed
13+
14+
- Model prefix handling in `rag_tool.py` standardized to reflect the docs.
15+
- `HF_TOKEN` documentation clarified as optional — only required for gated or private Hugging Face models.
16+
17+
---
18+
19+
## [0.1.5] - 2026-02-08
20+
21+
### Added
22+
23+
- Data related to comparison with other agentic data extraction frameworks added for the ComProScanner paper in the `examples/piezo_test/comparing_existing_frameworks` folder.
24+
525
- New parameter `apply_advanced_cleaning` added to data cleaning methods in `data_cleaner.py`. When set to `True`, it triggers the advanced cleaning pipeline.
626

727
- Advanced composition cleaning methods in `data_cleaner.py`:
8-
928
- `_remove_miller_indices()` - Removes crystal plane notations from chemical formulas
1029
- `_remove_zero_coefficient_elements()` - Removes elements with zero coefficients
1130
- `_normalize_coefficients()` - Removes trailing zeros from coefficients
1231
- `_expand_leading_and_trailing_coefficients()` - Expands leading/trailing coefficient patterns
1332
- `_expand_parenthetical_coefficients()` - Expands nested bracket coefficients
1433

1534
- Enhanced documentation in `docs/usage/data-cleaning.md`:
16-
1735
- Added `apply_advanced_cleaning` parameter documentation
1836
- Added Mermaid process flow diagram showing cleaning stages
1937
- Added advanced cleaning examples with tables for each transformation type
2038

2139
- Template for GitHub issues added to [.github/ISSUE_TEMPLATE](https://github.com/slimeslab/ComProScanner/tree/main/.github/ISSUE_TEMPLATE) for the following topics:
22-
2340
- bug reports
2441
- feature requests
2542
- documentation improvements
@@ -28,19 +45,22 @@
2845
- [Changelog page](https://slimeslab.github.io/ComProScanner/about/changelog/) added in the documentation. Also, [CHANGELOG.md](https://github.com/slimeslab/ComProScanner/blob/main/CHANGELOG.md) linked in [README.md](https://github.com/slimeslab/ComProScanner/blob/main/README.md).
2946

3047
- DeepWiki integration badge added to README.md for community Q&A support:
31-
3248
- [Ask DeepWiki](https://deepwiki.com/slimeslab/ComProScanner)
3349

3450
- arXiv preprint badge added to README.md:
35-
3651
- [arXiv:2510.20362](https://arxiv.org/abs/2510.20362)
3752

3853
- [CITATION.cff](https://github.com/slimeslab/ComProScanner/blob/main/CITATION.cff) added for standardized citation information based on the latest release and arXiv preprint.
3954

4055
### Fixed
4156

42-
- CSV progress tracking in `elsevier_processor.py`:
57+
- OAWorks API is replaced with OpenAlex API as OAWorks is no longer available.
58+
59+
- Empty/corrupted PDF handled in `pdf_processor.py` and `wiley_processor.py` to avoid having GLYPH errors during text extraction.
4360

61+
- Data extraction failures fixed if composition-property text data is empty.
62+
63+
- CSV progress tracking in `elsevier_processor.py`:
4464
- DtypeWarning resolved by adding `dtype=str, low_memory=False` to `pd.read_csv()`
4565
- Data loss issue fixed with immediate CSV persistence for processed articles
4666
- Sleep delays optimized for batch writes
@@ -63,19 +83,19 @@
6383

6484
- README badges section converted from HTML to markdown format for better compatibility across platforms.
6585

66-
## [0.1.4] - 02-12-2025
86+
---
87+
88+
## [0.1.4] - 2025-12-02
6789

6890
### Added
6991

7092
- New function `clean_data()` added for improved data cleaning and preprocessing instead of integrating it into data extraction function.
7193

7294
- New documentation page for Data Cleaning added:
73-
7495
- docs/usage/data-cleaning.md
7596
- Added to mkdocs.yml navigation.
7697

7798
- New API overview documentation page added:
78-
7999
- docs/api.md
80100
- Added to mkdocs.yml navigation.
81101
- New mkdocstrings configuration added to mkdocs.yml for automatic API documentation generation.
@@ -94,29 +114,42 @@
94114

95115
### Changed
96116

97-
- README images updated with raw GitHub links for better reliability: [ComProScanner Logo](https://raw.githubusercontent.com/aritraroy24/ComProScanner/main/assets/comproscanner_logo.png), [ComProScanner Workflow](https://raw.githubusercontent.com/aritraroy24/ComProScanner/main/assets/overall_workflow.png)
117+
- README images updated with raw GitHub links for better reliability:
118+
- [ComProScanner Logo](https://raw.githubusercontent.com/aritraroy24/ComProScanner/main/assets/comproscanner_logo.png)
119+
- [ComProScanner Workflow](https://raw.githubusercontent.com/aritraroy24/ComProScanner/main/assets/overall_workflow.png)
98120

99-
## [0.1.3] - 04-11-2025
121+
---
122+
123+
## [0.1.3] - 2025-11-04
100124

101125
### Fixed
102126

103127
- **RecursiveCharacterTextSplitter** importing updated for latest _langchain_ version to avoid import errors:
104128
- Changed from `from langchain.text_splitter import RecursiveCharacterTextSplitter`
105129
- To `from langchain.text_splitter.recursive_character import RecursiveCharacterTextSplitter`
106130

107-
## [0.1.2] - 24-10-2025
131+
---
132+
133+
## [0.1.2] - 2025-10-24
108134

109135
### Added
110136

111-
- Link to ComProScanner preprint on arXiv in the documentation index page and README.md: [arXiv:2510.20362](https://arxiv.org/abs/2510.20362)
137+
- Link to ComProScanner preprint on arXiv in the documentation index page and README.md:
138+
- [arXiv:2510.20362](https://arxiv.org/abs/2510.20362)
139+
140+
---
112141

113-
## [0.1.1] - 22-10-2025
142+
## [0.1.1] - 2025-10-22
114143

115144
### Fixed
116145

117-
- README images updated with external image link to fix PyPI rendering issue. [ComProScanner Logo](https://i.ibb.co/whHSbGvT/comproscanner-logo.png), [ComProScanner Workflow](https://i.ibb.co/QWd2qd3/overall-workflow.png)
146+
- README images updated with external image link to fix PyPI rendering issue.
147+
- [ComProScanner Logo](https://i.ibb.co/whHSbGvT/comproscanner-logo.png)
148+
- [ComProScanner Workflow](https://i.ibb.co/QWd2qd3/overall-workflow.png)
149+
150+
---
118151

119-
## [0.1.0] - 22-10-2025
152+
## [0.1.0] - 2025-10-22
120153

121154
### Added
122155

docs/about/citation.md

Lines changed: 9 additions & 8 deletions
@@ -3,13 +3,14 @@
33
If you use ComProScanner in your research, please cite our related paper:
44

55
```bibtex
6-
@misc{roy2025comproscannermultiagentbasedframework,
7-
title={ComProScanner: A multi-agent based framework for composition-property structured data extraction from scientific literature},
8-
author={Aritra Roy and Enrico Grisan and John Buckeridge and Chiara Gattinoni},
9-
year={2025},
10-
eprint={2510.20362},
11-
archivePrefix={arXiv},
12-
primaryClass={physics.comp-ph},
13-
url={https://arxiv.org/abs/2510.20362},
6+
@Article{roy2026comproscannermultiagentbasedframework,
7+
author ="Roy, Aritra and Grisan, Enrico and Buckeridge, John and Gattinoni, Chiara",
8+
title ="ComProScanner: a multi-agent based framework for composition-property structured data extraction from scientific literature",
9+
journal ="Digital Discovery",
10+
year ="2026",
11+
pages ="Accepted",
12+
publisher ="RSC",
13+
doi ="10.1039/D5DD00521C",
14+
url ="https://doi.org/10.1039/D5DD00521C"
1415
}
1516
```
