Commit 2df5f0a

fix: standardize model prefixes in rag_tool and update changelog
1 parent 9c91661 commit 2df5f0a

6 files changed (+32, -31 lines)

.gitignore

Lines changed: 0 additions & 4 deletions
@@ -186,10 +186,6 @@ applications
 vlm_test
 examples/vlm_piezo_test

-# Claude files
-CLAUDE.md
-.claude
-
 # Test results
 db
 results

CHANGELOG.md

Lines changed: 11 additions & 7 deletions
@@ -24,16 +24,20 @@
 - `process_articles()` now routes user-provided `doi_list` by `general_publisher` from metadata and sends each DOI only to its matching source processor.

 ---
-## [0.1.6] - 02-04-2026
+## [0.1.6] - 2026-04-02
 ### Changed
 - Updated [README.md](README.md), [CITATION.cff](CITATION.cff) and docs with the published version (advance article) of the ComProScanner paper in _Digital Discovery_ as fully open access:
 - [ComProScanner: a multi-agent based framework for composition-property structured data extraction from scientific literature](https://doi.org/10.1039/D5DD00521C)

 ### Added
 - Guide for API key creation for various LLM providers and publisher APIs added to the documentation at `docs/getting-started/api-key-guide.md` with detailed instructions for each provider.

+### Fixed
+- Model prefix handling in `rag_tool.py` standardized to reflect the docs.
+- `HF_TOKEN` documentation clarified as optional — only required for gated or private Hugging Face models.
+
 ---
-## [0.1.5] - 08-02-2026
+## [0.1.5] - 2026-02-08

 ### Added
 - Data related to comparison with other agentic data extraction frameworks added for the ComProScanner paper in the `examples/piezo_test/comparing_existing_frameworks` folder.
@@ -105,7 +109,7 @@
 - README badges section converted from HTML to markdown format for better compatibility across platforms.

 ---
-## [0.1.4] - 02-12-2025
+## [0.1.4] - 2025-12-02

 ### Added

@@ -141,7 +145,7 @@
 - [ComProScanner Workflow](https://raw.githubusercontent.com/aritraroy24/ComProScanner/main/assets/overall_workflow.png)

 ---
-## [0.1.3] - 04-11-2025
+## [0.1.3] - 2025-11-04

 ### Fixed

@@ -150,15 +154,15 @@
 - To `from langchain.text_splitter.recursive_character import RecursiveCharacterTextSplitter`

 ---
-## [0.1.2] - 24-10-2025
+## [0.1.2] - 2025-10-24

 ### Added

 - Link to ComProScanner preprint on arXiv in the documentation index page and README.md:
 - [arXiv:2510.20362](https://arxiv.org/abs/2510.20362)

 ---
-## [0.1.1] - 22-10-2025
+## [0.1.1] - 2025-10-22

 ### Fixed

@@ -167,7 +171,7 @@
 - [ComProScanner Workflow](https://i.ibb.co/QWd2qd3/overall-workflow.png)

 ---
-## [0.1.0] - 22-10-2025
+## [0.1.0] - 2025-10-22

 ### Added

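The heading changes in this file move every release date from a day-first `DD-MM-YYYY` form to ISO 8601 (`YYYY-MM-DD`). A minimal sketch of that conversion follows, assuming the old headings were always day-first (which `24-10-2025` suggests); the helper name `to_iso_date` is hypothetical and not part of the repository.

```python
from datetime import datetime


def to_iso_date(day_first: str) -> str:
    """Convert a DD-MM-YYYY string to ISO 8601 (YYYY-MM-DD).

    Hypothetical helper for illustration; raises ValueError on malformed input.
    """
    # strptime validates the date, so "31-02-2026" is rejected rather than reshuffled.
    return datetime.strptime(day_first, "%d-%m-%Y").date().isoformat()


if __name__ == "__main__":
    for old in ("02-04-2026", "08-02-2026", "24-10-2025"):
        print(old, "->", to_iso_date(old))
```

Parsing and re-serializing, rather than swapping substrings, has the side benefit of catching impossible dates before they reach the changelog.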
docs/about/changelog.md

Lines changed: 7 additions & 7 deletions
@@ -24,7 +24,7 @@
 - `process_articles()` now routes user-provided `doi_list` by `general_publisher` from metadata and sends each DOI only to its matching source processor.

 ---
-## [0.1.6] - 02-04-2026
+## [0.1.6] - 2026-04-02
 ### Changed
 - Updated [README.md](README.md), [CITATION.cff](CITATION.cff) and docs with the published version (advance article) of the ComProScanner paper in _Digital Discovery_ as fully open access:
 - [ComProScanner: a multi-agent based framework for composition-property structured data extraction from scientific literature](https://doi.org/10.1039/D5DD00521C)
@@ -33,7 +33,7 @@
 - Guide for API key creation for various LLM providers and publisher APIs added to the documentation at `docs/getting-started/api-key-guide.md` with detailed instructions for each provider.

 ---
-## [0.1.5] - 08-02-2026
+## [0.1.5] - 2026-02-08

 ### Added
 - Data related to comparison with other agentic data extraction frameworks added for the ComProScanner paper in the `examples/piezo_test/comparing_existing_frameworks` folder.
@@ -105,7 +105,7 @@
 - README badges section converted from HTML to markdown format for better compatibility across platforms.

 ---
-## [0.1.4] - 02-12-2025
+## [0.1.4] - 2025-12-02

 ### Added

@@ -141,7 +141,7 @@
 - [ComProScanner Workflow](https://raw.githubusercontent.com/aritraroy24/ComProScanner/main/assets/overall_workflow.png)

 ---
-## [0.1.3] - 04-11-2025
+## [0.1.3] - 2025-11-04

 ### Fixed

@@ -150,15 +150,15 @@
 - To `from langchain.text_splitter.recursive_character import RecursiveCharacterTextSplitter`

 ---
-## [0.1.2] - 24-10-2025
+## [0.1.2] - 2025-10-24

 ### Added

 - Link to ComProScanner preprint on arXiv in the documentation index page and README.md:
 - [arXiv:2510.20362](https://arxiv.org/abs/2510.20362)

 ---
-## [0.1.1] - 22-10-2025
+## [0.1.1] - 2025-10-22

 ### Fixed

@@ -167,7 +167,7 @@
 - [ComProScanner Workflow](https://i.ibb.co/QWd2qd3/overall-workflow.png)

 ---
-## [0.1.0] - 22-10-2025
+## [0.1.0] - 2025-10-22

 ### Added

docs/getting-started/api-key-guide.md

Lines changed: 6 additions & 5 deletions
@@ -200,7 +200,7 @@ OPENROUTER_API_KEY=your_openrouter_api_key

 Environment variable: `TOGETHER_API_KEY`

-Typical model prefixes: `together_ai/...`
+Typical model prefixes: `together/...`

 How to get it:

@@ -232,7 +232,7 @@ COHERE_API_KEY=your_cohere_api_key

 Environment variable: `FIREWORKS_API_KEY`

-Typical model prefixes: `fireworks_ai/...`
+Typical model prefixes: `fireworks/...`

 How to get it:

@@ -264,11 +264,12 @@ How to set it up:

 Environment variable: `HF_TOKEN`

+> **Optional.** Only required for downloading gated or private Hugging Face models. Public models work without a token.
+
 Used for:

-- Accessing the default Hugging Face embedding model workflow
-- Accessing gated or rate-limited Hugging Face models
-- Optional embedding/model downloads when required
+- Accessing gated or private Hugging Face models
+- Rate-limited API access

 How to get it:

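The updated guide treats `HF_TOKEN` as optional. One way calling code might honor that is to read the variable without failing when it is absent. The sketch below is an assumption for illustration only: `get_hf_token` is a hypothetical helper, not a function from ComProScanner or from any Hugging Face library.

```python
import os
from typing import Optional


def get_hf_token() -> Optional[str]:
    """Return the HF_TOKEN value, or None when it is unset or blank.

    Hypothetical helper: callers can pass the result along only when it is
    not None, so public models keep working without any token configured.
    """
    token = os.environ.get("HF_TOKEN", "").strip()
    return token or None


if __name__ == "__main__":
    token = get_hf_token()
    print("token configured" if token else "no token; public models only")
```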
docs/rag-config.md

Lines changed: 3 additions & 3 deletions
@@ -123,7 +123,7 @@ scanner.extract_composition_property_data(
 scanner.extract_composition_property_data(
     main_extraction_keyword="d33",
     rag_db_path="embeddings/piezo",
-    rag_chat_model="deepseek-chat",
+    rag_chat_model="deepseek/deepseek-chat",
     rag_max_tokens=1024,
     rag_top_k=4,
 )
@@ -178,7 +178,7 @@ scanner.extract_composition_property_data(
 scanner.extract_composition_property_data(
     main_extraction_keyword="d33",
     rag_db_path="embeddings/piezo",
-    rag_chat_model="together_ai/meta-llama/Llama-3-70b-chat-hf",
+    rag_chat_model="together/meta-llama/Llama-3-70b-chat-hf",
     rag_max_tokens=1024,
     rag_top_k=4,
 )
@@ -220,7 +220,7 @@ scanner.extract_composition_property_data(
 scanner.extract_composition_property_data(
     main_extraction_keyword="d33",
     rag_db_path="embeddings/piezo",
-    rag_chat_model="fireworks_ai/accounts/fireworks/models/llama-v3-8b-instruct",
+    rag_chat_model="fireworks/models/llama-v3-8b-instruct",
     rag_max_tokens=1024,
     rag_top_k=4,
 )

src/comproscanner/extract_flow/tools/rag_tool.py

Lines changed: 5 additions & 5 deletions
@@ -82,28 +82,28 @@ def _get_llm(self) -> BaseChatModel:
             "callbacks": callbacks,
         }
         # OpenAI models
-        if model.startswith(("gpt-", "text-", "o1", "o3")):
+        if model.startswith(("openai/", "gpt-", "text-", "o1", "o3")):
             self._check_package_exists("langchain_openai", model)
             from langchain_openai import ChatOpenAI

             return ChatOpenAI(model=model, request_timeout=1000, **common_params)

         # Deepseek models
-        if model.startswith("deepseek"):
+        if model.startswith("deepseek/"):
             self._check_package_exists("langchain_deepseek", model)
             from langchain_deepseek import ChatDeepSeek

             return ChatDeepSeek(model=model, request_timeout=1000, **common_params)

         # Google Gemini models
-        elif model.startswith("gemini-"):
+        elif model.startswith("gemini/"):
             self._check_package_exists("langchain_google_genai", model)
             from langchain_google_genai import ChatGoogleGenerativeAI

             return ChatGoogleGenerativeAI(model=model, **common_params)

         # Anthropic Claude models
-        elif model.startswith("claude-"):
+        elif model.startswith("claude/"):
             self._check_package_exists("langchain_anthropic", model)
             from langchain_anthropic import ChatAnthropic

@@ -143,7 +143,7 @@ def _get_llm(self) -> BaseChatModel:
             return ChatCohere(model=model_name, **common_params)

         # Fireworks models
-        elif model.startswith(("fireworks/", "accounts/fireworks")):
+        elif model.startswith(("fireworks/")):
             self._check_package_exists("langchain_fireworks", model)
             from langchain_fireworks import ChatFireworks

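The branches touched in `_get_llm` all select a provider by testing the model string against one or more prefixes. The sketch below restates only the prefixes visible in this diff as a lookup table, to make the effect of the change easier to check. It is an illustration under assumptions: `PROVIDER_PREFIXES` and `resolve_provider` are hypothetical names, providers not shown in the diff are omitted, and the real method constructs LangChain chat models rather than returning labels.

```python
from typing import Dict, Optional, Tuple

# Hypothetical table built from the "+" lines of the diff; the keys are
# illustrative labels, not identifiers used in rag_tool.py.
PROVIDER_PREFIXES: Dict[str, Tuple[str, ...]] = {
    "openai": ("openai/", "gpt-", "text-", "o1", "o3"),
    "deepseek": ("deepseek/",),
    "gemini": ("gemini/",),
    "claude": ("claude/",),
    "fireworks": ("fireworks/",),
}


def resolve_provider(model: str) -> Optional[str]:
    """Return the first provider label whose prefix matches, else None."""
    for provider, prefixes in PROVIDER_PREFIXES.items():
        # str.startswith accepts a tuple and matches any of its entries.
        if model.startswith(prefixes):
            return provider
    return None


if __name__ == "__main__":
    for name in ("deepseek/deepseek-chat", "deepseek-chat", "gpt-4o"):
        print(name, "->", resolve_provider(name))
```

Under these prefixes, a bare name such as `deepseek-chat` or a string starting with `accounts/fireworks` no longer matches its provider branch, which is consistent with the `rag_chat_model` values updated in `docs/rag-config.md`.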