````diff
@@ -136,18 +136,31 @@ For any questions, please check [FAQ](https://github.com/open-sciencelab/GraphGe
 TRAINEE_BASE_URL=your_base_url_for_trainee_model
 TRAINEE_API_KEY=your_api_key_for_trainee_model
 ```
-2. (Optional) If you want to modify the default generated configuration, you can edit the content of the configs/graphgen_config.yaml file.
+2. (Optional) Customize generation parameters in the `graphgen/configs/` folder.
+
+Edit the corresponding YAML file, e.g.:
+
 ```yaml
-# configs/aggregated_config.yaml
-# Example configuration
-input_data_type: "raw"
-input_file: "resources/input_examples/raw_demo.jsonl"
-# more configurations...
+# configs/cot_config.yaml
+input_data_type: raw
+input_file: resources/input_examples/raw_demo.jsonl
+output_data_type: cot
+tokenizer: cl100k_base
+# additional settings...
 ```
-3. Run the generation script
-```bash
-bash scripts/generate/generate_aggregated.sh
-```
+
+3. Generate data
+
+Pick the desired format and run the matching script:
+
+| Format | Script to run | Notes |
+| ------------ | ---------------------------------------------- | -------------------------------------------------------------------|
+| `cot` | `bash scripts/generate/generate_cot.sh` | Chain-of-Thought Q\&A pairs |
+| `atomic` | `bash scripts/generate/generate_atomic.sh` | Atomic Q\&A pairs covering basic knowledge |
+| `aggregated` | `bash scripts/generate/generate_aggregated.sh` | Aggregated Q\&A pairs incorporating complex, integrated knowledge |
+| `multi-hop` | `bash scripts/generate/generate_multihop.sh` | Multi-hop reasoning Q\&A pairs |
+
+
 4. Get the generated data
 ```bash
 ls cache/data/graphgen
````
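The step 3 table maps each output format to exactly one script. When switching formats, for example in a loop or a CI job, a small dispatcher can keep that mapping in one place. The sketch below assumes the script paths exactly as listed in the table and that they are run from the repository root. The function name `script_for_format` is hypothetical. It only prints the path, so nothing runs until you choose to run it.

```bash
#!/usr/bin/env bash
# Hypothetical dispatcher (not part of GraphGen): print the generation
# script for an output format, using the paths listed in the README table.
# For an unknown format it prints an error to stderr and returns 1.
script_for_format() {
  case "$1" in
    cot)        echo "scripts/generate/generate_cot.sh" ;;
    atomic)     echo "scripts/generate/generate_atomic.sh" ;;
    aggregated) echo "scripts/generate/generate_aggregated.sh" ;;
    multi-hop)  echo "scripts/generate/generate_multihop.sh" ;;
    *)          echo "unknown format: $1" >&2; return 1 ;;
  esac
}
```

For example, `script="$(script_for_format aggregated)" && bash "$script"` runs the aggregated pipeline. An unknown name stops the chain before anything executes. Afterwards, the results can be listed with `ls cache/data/graphgen`, as in step 4.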