Commit 4e7df00

Updating docs with logging and secrets.env to .env
1 parent 0c9c17a · commit 4e7df00

9 files changed: 116 additions & 34 deletions

README.md

Lines changed: 2 additions & 2 deletions
@@ -46,7 +46,7 @@ pip install datafast
 
 ### 1. Environment Setup
 
-Make sure you have created a `secrets.env` file with your API keys.
+Make sure you have created a `.env` file with your API keys.
 HF token is needed if you want to push the dataset to your HF hub.
 Other keys depend on which LLM providers you use.
 ```
@@ -64,7 +64,7 @@ from datafast.llms import OpenAIProvider, AnthropicProvider, GeminiProvider
 from dotenv import load_dotenv
 
 # Load environment variables
-load_dotenv("secrets.env") # <--- your API keys
+load_dotenv() # <--- your API keys
 ```
 
 ### 3. Configure Dataset
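For context, a plausible `.env` matching the key names that appear across this commit (values are placeholders; `ANTHROPIC_API_KEY` is inferred from the Anthropic provider imports and is not shown verbatim in this diff):

```
GEMINI_API_KEY=XXXX
OPENAI_API_KEY=sk-XXXX
ANTHROPIC_API_KEY=XXXX
OPENROUTER_API_KEY=XXXX
HF_TOKEN=hf_XXXXX
```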

docs/concepts.md

Lines changed: 32 additions & 4 deletions
@@ -69,13 +69,41 @@ The prompt expansion system is key and enables:
 The datafast workflow follows a consistent pattern across all dataset types:
 
 1. **Configuration**: Define the dataset parameters, classes/topics, and generation settings
-2. **Prompt Design**: Create base prompts with mandatory and optional placeholders
-3. **Provider Setup**: Initialize one or more LLM providers
-4. **Generation**: Execute the generation process, which:
+2. **Logging Setup**: Configure logging to monitor the generation process (recommended)
+3. **Prompt Design**: Create base prompts with mandatory and optional placeholders
+4. **Provider Setup**: Initialize one or more LLM providers
+5. **Generation**: Execute the generation process, which:
    - Expands prompts based on configuration
    - Distributes generation across providers
    - Collects and processes responses
-5. **Output**: Save the resulting dataset to a file and optionally push to Hugging Face Hub
+6. **Output**: Save the resulting dataset to a file and optionally push to Hugging Face Hub
+
+## Logging and Monitoring
+
+Datafast includes comprehensive logging to provide visibility into the generation process:
+
+### Why Configure Logging?
+
+Without `configure_logger()`, your datafast scripts will run silently without:
+- Progress indicators during generation
+- Rate limiting warnings
+- Success completion messages
+- Detailed error information
+
+### Basic Usage
+
+```python
+from datafast.logger_config import configure_logger
+
+# Default: INFO level, console output with colors
+configure_logger()
+
+# With file logging for long-running jobs
+configure_logger(level="INFO", log_file="generation.log")
+
+# Debug mode for troubleshooting
+configure_logger(level="DEBUG", log_file="debug.log")
+```
 
 ## Dataset Diversity Mechanisms
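A usage note on the `log_file` option introduced above: for long-running jobs, a timestamped file per run keeps logs separable. This is a minimal sketch building only on the `configure_logger(level=..., log_file=...)` signature from the Basic Usage block; the naming scheme is an assumption, not from the docs:

```python
from datetime import datetime

from datafast.logger_config import configure_logger

# One log file per generation run makes long jobs easier to audit.
run_log = f"generation_{datetime.now():%Y%m%d_%H%M%S}.log"
configure_logger(level="INFO", log_file=run_log)
```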

docs/guides/generating_generic_pipeline_datasets.md

Lines changed: 12 additions & 4 deletions
@@ -34,16 +34,20 @@ from datafast.schema.config import GenericPipelineDatasetConfig
 from datafast.llms import OpenRouterProvider
 ```
 
-In addition, we'll use `dotenv` to load environment variables containing API keys:
+In addition, we'll use `dotenv` to load environment variables containing API keys and configure logging to monitor the generation process:
 
 ```python
 from dotenv import load_dotenv
+from datafast.logger_config import configure_logger
 
 # Load environment variables containing API keys
-load_dotenv("secrets.env")
+load_dotenv()
+
+# Configure logger to see progress, warnings, and success messages
+configure_logger()
 ```
 
-Make sure you have created a `secrets.env` file with your API keys:
+Make sure you have created a `.env` file with your API keys:
 
 ```
 OPENROUTER_API_KEY=XXXX

@@ -214,10 +218,14 @@ Here's a complete working example:
 from datafast.datasets import GenericPipelineDataset
 from datafast.schema.config import GenericPipelineDatasetConfig
 from datafast.llms import OpenRouterProvider
+from datafast.logger_config import configure_logger
 from dotenv import load_dotenv
 
 # Load API keys
-load_dotenv("secrets.env")
+load_dotenv()
+
+# Configure logger
+configure_logger()
 
 # Define prompt
 PROMPT = """I will give you a persona description.

docs/guides/generating_mcq_datasets.md

Lines changed: 12 additions & 4 deletions
@@ -31,15 +31,19 @@ from datafast.schema.config import MCQDatasetConfig, PromptExpansionConfig
 from datafast.llms import OpenAIProvider, AnthropicProvider, GeminiProvider
 ```
 
-In addition, we'll use `dotenv` to load environment variables containing API keys.
+In addition, we'll use `dotenv` to load environment variables containing API keys and configure logging to monitor the generation process.
 ```python
 from dotenv import load_dotenv
+from datafast.logger_config import configure_logger
 
 # Load environment variables containing API keys
-load_dotenv("secrets.env")
+load_dotenv()
+
+# Configure logger to see progress, warnings, and success messages
+configure_logger()
 ```
 
-Make sure you have created a `secrets.env` file with your API keys. HF token is needed if you want to push the dataset to your HF hub. Other keys depend on which LLM providers you use.
+Make sure you have created a `.env` file with your API keys. HF token is needed if you want to push the dataset to your HF hub. Other keys depend on which LLM providers you use.
 
 ```
 GEMINI_API_KEY=XXXX

@@ -253,10 +257,14 @@ Here's a complete example for creating an MCQ dataset from a local JSONL file:
 from datafast.datasets import MCQDataset
 from datafast.schema.config import MCQDatasetConfig, PromptExpansionConfig
 from datafast.llms import OpenAIProvider, AnthropicProvider, GeminiProvider
+from datafast.logger_config import configure_logger
 from dotenv import load_dotenv
 
 # Load environment variables
-load_dotenv("secrets.env")
+load_dotenv()
+
+# Configure logger
+configure_logger()
 
 def main():
     # 1. Define the configuration

docs/guides/generating_preference_datasets.md

Lines changed: 12 additions & 4 deletions
@@ -21,19 +21,23 @@ Generating a preference dataset with `datafast` requires these imports:
 from datafast.datasets import PreferenceDataset
 from datafast.schema.config import PreferenceDatasetConfig
 from datafast.llms import OpenAIProvider, GeminiProvider, AnthropicProvider
+from datafast.logger_config import configure_logger
 from dotenv import load_dotenv
 import json
 from pathlib import Path
 ```
 
-You'll need to load environment variables containing API keys:
+You'll need to load environment variables containing API keys and configure logging:
 
 ```python
 # Load environment variables containing API keys
-load_dotenv("secrets.env")
+load_dotenv()
+
+# Configure logger to see progress, warnings, and success messages
+configure_logger()
 ```
 
-Make sure you have created a `secrets.env` file with your API keys for the LLM providers you plan to use:
+Make sure you have created a `.env` file with your API keys for the LLM providers you plan to use:
 
 ```
 OPENAI_API_KEY=sk-XXXX

@@ -235,10 +239,14 @@ from pathlib import Path
 from datafast.schema.config import PreferenceDatasetConfig
 from datafast.datasets import PreferenceDataset
 from datafast.llms import OpenAIProvider, GeminiProvider, AnthropicProvider
+from datafast.logger_config import configure_logger
 from dotenv import load_dotenv
 
 # Load environment variables with API keys
-load_dotenv("secrets.env")
+load_dotenv()
+
+# Configure logger
+configure_logger()
 
 # Load NASA lessons learned documents from JSONL file
 def load_documents_from_jsonl(jsonl_path: str | Path) -> list[str]:
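The hunk ends before the body of `load_documents_from_jsonl`. For orientation, one plausible implementation of a JSONL loader with this signature; the `"text"` field name is an assumption, not taken from the repo:

```python
import json
from pathlib import Path

def load_documents_from_jsonl(jsonl_path: str | Path) -> list[str]:
    # One JSON object per line; the "text" key is a hypothetical field name.
    docs: list[str] = []
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                docs.append(json.loads(line)["text"])
    return docs
```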

docs/guides/generating_text_classification_datasets.md

Lines changed: 12 additions & 4 deletions
@@ -26,15 +26,19 @@ from datafast.schema.config import ClassificationDatasetConfig, PromptExpansionC
 from datafast.llms import OpenAIProvider, AnthropicProvider
 ```
 
-In addition, we'll use `dotenv` to load environment variables containing API keys.
+In addition, we'll use `dotenv` to load environment variables containing API keys and configure logging to monitor the generation process.
 ```python
 from dotenv import load_dotenv
+from datafast.logger_config import configure_logger
 
 # Load environment variables containing API keys
-load_dotenv("secrets.env")
+load_dotenv()
+
+# Configure logger to see progress, warnings, and success messages
+configure_logger()
 ```
 
-Make sure you have created a `secrets.env` file with your API keys. HF token is needed if you want to push the dataset to your HF hub. Other keys depend on which LLM providers you use. In our example, we use OpenAI and Anthropic.
+Make sure you have created a `.env` file with your API keys. HF token is needed if you want to push the dataset to your HF hub. Other keys depend on which LLM providers you use. In our example, we use OpenAI and Anthropic.
 
 ```
 GEMINI_API_KEY=XXXX

@@ -236,10 +240,14 @@ Here's a complete example for creating a trail conditions classification dataset
 from datafast.datasets import ClassificationDataset
 from datafast.schema.config import ClassificationDatasetConfig, PromptExpansionConfig
 from datafast.llms import OpenAIProvider, AnthropicProvider
+from datafast.logger_config import configure_logger
 from dotenv import load_dotenv
 
 # Load API keys
-load_dotenv("secrets.env")
+load_dotenv()
+
+# Configure logger
+configure_logger()
 
 # Configure dataset
 config = ClassificationDatasetConfig(

docs/guides/generating_text_datasets.md

Lines changed: 15 additions & 6 deletions
@@ -31,15 +31,19 @@ from datafast.schema.config import RawDatasetConfig, PromptExpansionConfig
 from datafast.llms import OpenAIProvider, AnthropicProvider, GeminiProvider
 ```
 
-In addition, we'll use `dotenv` to load environment variables containing API keys.
+In addition, we'll use `dotenv` to load environment variables containing API keys and configure logging to monitor the generation process.
 ```python
 from dotenv import load_dotenv
+from datafast.logger_config import configure_logger
 
 # Load environment variables containing API keys
-load_dotenv("secrets.env")
+load_dotenv()
+
+# Configure logger to see progress, warnings, and success messages
+configure_logger()
 ```
 
-Make sure you have created a `secrets.env` file with your API keys. HF token is needed if you want to push the dataset to your HF hub. Other keys depend on which LLM providers you use. In our example, we use OpenAI and Anthropic.
+Make sure you have created a `.env` file with your API keys. HF token is needed if you want to push the dataset to your HF hub. Other keys depend on which LLM providers you use. In our example, we use OpenAI and Anthropic.
 
 ```
 GEMINI_API_KEY=XXXX

@@ -239,6 +243,14 @@ Here's a complete example script that generates a text dataset across multiple d
 from datafast.datasets import RawDataset
 from datafast.schema.config import RawDatasetConfig, PromptExpansionConfig
 from datafast.llms import OpenAIProvider, AnthropicProvider
+from datafast.logger_config import configure_logger
+from dotenv import load_dotenv
+
+# Load environment variables
+load_dotenv()
+
+# Configure logger
+configure_logger()
 
 
 def main():

@@ -303,9 +315,6 @@ def main():
 
 
 if __name__ == "__main__":
-    from dotenv import load_dotenv
-
-    load_dotenv("secrets.env")
     main()
 ```

docs/guides/generating_ultrachat_datasets.md

Lines changed: 12 additions & 4 deletions
@@ -31,15 +31,19 @@ from datafast.schema.config import UltrachatDatasetConfig, PromptExpansionConfig
 from datafast.llms import OpenAIProvider, AnthropicProvider, GeminiProvider
 ```
 
-In addition, use `dotenv` to load environment variables containing API keys:
+In addition, use `dotenv` to load environment variables containing API keys and configure logging to monitor the generation process:
 ```python
 from dotenv import load_dotenv
+from datafast.logger_config import configure_logger
 
 # Load environment variables containing API keys
-load_dotenv("secrets.env")
+load_dotenv()
+
+# Configure logger to see progress, warnings, and success messages
+configure_logger()
 ```
 
-Make sure you have created a `secrets.env` file with your API keys. A Hugging Face token (HF_TOKEN) is needed if you want to push the dataset to your HF hub. Other keys depend on which LLM providers you use.
+Make sure you have created a `.env` file with your API keys. A Hugging Face token (HF_TOKEN) is needed if you want to push the dataset to your HF hub. Other keys depend on which LLM providers you use.
 
 ```
 GEMINI_API_KEY=XXXX

@@ -231,10 +235,14 @@ Here's a complete example for creating an Ultrachat dataset:
 from datafast.datasets import UltrachatDataset
 from datafast.schema.config import UltrachatDatasetConfig
 from datafast.llms import AnthropicProvider
+from datafast.logger_config import configure_logger
 from dotenv import load_dotenv
 
 # Load environment variables
-load_dotenv("secrets.env")
+load_dotenv()
+
+# Configure logger
+configure_logger()
 
 def main():
     # 1. Define the configuration

docs/index.md

Lines changed: 7 additions & 2 deletions
@@ -35,7 +35,7 @@ Currently we support the following LLM providers:
 
 ### 1. Environment Setup
 
-Make sure you have created a `secrets.env` file with your API keys.
+Make sure you have created a `.env` file with your API keys.
 HF token is needed if you want to push the dataset to your HF hub.
 Other keys depend on which LLM providers you use.
 ```

@@ -51,10 +51,14 @@ HF_TOKEN=hf_XXXXX
 from datafast.datasets import ClassificationDataset
 from datafast.schema.config import ClassificationDatasetConfig, PromptExpansionConfig
 from datafast.llms import OpenAIProvider, AnthropicProvider, GeminiProvider, OpenRouterProvider
+from datafast.logger_config import configure_logger
 from dotenv import load_dotenv
 
 # Load environment variables
-load_dotenv("secrets.env") # <--- your API keys
+load_dotenv() # <--- your API keys
+
+# Configure logger for visibility into generation process
+configure_logger() # <--- see progress, warnings, and success messages
 ```
 
 ### 3. Configure Dataset

@@ -135,6 +139,7 @@ Star this package to send positive vibes and support 🌟
 * **Multiple LLMs** used to boost dataset diversity 🤖
 * **Flexible prompt**: use our default prompts or provide your own custom prompts 📝
 * **Prompt expansion**: Combinatorial variation of prompts to maximize diversity 🔄
+* **Built-in logging**: Comprehensive logging with progress tracking, rate limiting warnings, and success messages 📊
 * **Hugging Face Integration**: Push generated datasets to the Hub 🤗
 
 !!! warning
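On the **Prompt expansion** bullet above: combinatorial variation means one base prompt is multiplied across every combination of placeholder values. An illustrative sketch of the idea, not datafast's actual implementation:

```python
from itertools import product

base_prompt = "Write a {style} text about {topic}."
styles = ["formal", "casual"]
topics = ["hiking", "climbing"]

# 2 styles x 2 topics -> 4 distinct prompts
expanded = [base_prompt.format(style=s, topic=t) for s, t in product(styles, topics)]
```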
