Commit 50fc50e

docs: Fix mkdocs syntax and update person sampling documentation (#249)
* remove colon
* update person sampling docs
1 parent 8106628 commit 50fc50e

2 files changed

Lines changed: 38 additions & 5 deletions

docs/code_reference/run_config.md

Lines changed: 1 addition & 1 deletion
@@ -3,4 +3,4 @@
 The `run_config` module defines runtime settings that control dataset generation behavior,
 including early shutdown thresholds, batch sizing, and non-inference worker concurrency.

-:::: data_designer.config.run_config
+::: data_designer.config.run_config

docs/concepts/person_sampling.md

Lines changed: 37 additions & 4 deletions
@@ -58,10 +58,12 @@ The NGC datasets are extended versions of the [open-source Nemotron-Personas dat
 Supported locales:

 - `en_US`: United States
-- `ja_JP`: Japan
-- `en_IN`: India
+- `en_IN`: India (English)
+- `en_SG`: Singapore (English)
 - `hi_Deva_IN`: India (Devanagari script)
 - `hi_Latn_IN`: India (Latin script)
+- `ja_JP`: Japan
+- `pt_BR`: Brazil (Portuguese)

 ### Features
 - **Demographically accurate personal details**: Names, ages, sex, marital status, education, occupation based on census data
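
Because this hunk both reorders the locale list and adds entries, the net change is easy to misread. The sketch below is a reading aid only, not part of the commit. It compares the locale codes transcribed from the removed and added lines, and the names `OLD_LOCALES`, `NEW_LOCALES`, and `locale_changes` are made up for illustration.

```python
# Locale codes transcribed from the hunk above (before and after the commit).
OLD_LOCALES = ["en_US", "ja_JP", "en_IN", "hi_Deva_IN", "hi_Latn_IN"]
NEW_LOCALES = ["en_US", "en_IN", "en_SG", "hi_Deva_IN", "hi_Latn_IN", "ja_JP", "pt_BR"]


def locale_changes(old, new):
    """Return (added, removed) locale codes, each sorted alphabetically."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    return added, removed


added, removed = locale_changes(OLD_LOCALES, NEW_LOCALES)
print("added:", added)      # added: ['en_SG', 'pt_BR']
print("removed:", removed)  # removed: []
```

So `ja_JP` only moves into alphabetical position; the two genuinely new locales are `en_SG` and `pt_BR`.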
@@ -123,6 +125,12 @@ ngc registry resource download-version "nvidia/nemotron-personas/nemotron-person

 # For Nemotron-Personas JP
 ngc registry resource download-version "nvidia/nemotron-personas/nemotron-personas-dataset-ja_jp"
+
+# For Nemotron-Personas SG
+ngc registry resource download-version "nvidia/nemotron-personas/nemotron-personas-dataset-en_sg"
+
+# For Nemotron-Personas BR
+ngc registry resource download-version "nvidia/nemotron-personas/nemotron-personas-dataset-pt_br"
 ```

 Then move the downloaded dataset to the Data Designer managed assets directory:
@@ -186,10 +194,20 @@ For more details, see the documentation for [`SamplerColumnConfig`](../code_refe
 **Japan-Specific Fields (`ja_JP`):**

 - `area`
+- `prefecture`
+- `zone`

-**India-Specific Fields (`en_IN`, `hi_IN`, `hi_Deva_IN`, `hi_Latn_IN`):**
+**Brazil-Specific Fields (`pt_BR`):**
+
+- `race` - Census-reported race
+
+**Brazil and India Shared Fields (`pt_BR`, `en_IN`, `hi_Deva_IN`, `hi_Latn_IN`):**

 - `religion` - Census-reported religion
+
+**India-Specific Fields (`en_IN`, `hi_Deva_IN`, `hi_Latn_IN`):**
+
+- `district` - Census-reported district
 - `education_degree` - Census-reported education degree
 - `first_language` - Native language
 - `second_language` - Second language (if applicable)
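
The regrouped field lists in this hunk can be read as a lookup from locale to extra fields. The sketch below is illustrative only. `LOCALE_FIELDS` and `locale_specific_fields` are hypothetical names, not part of Data Designer, and the lists cover only the fields visible in this hunk, so the documented lists may continue past it.

```python
# Hypothetical lookup built from the fields visible in the hunk above.
INDIA_LOCALES = ("en_IN", "hi_Deva_IN", "hi_Latn_IN")

LOCALE_FIELDS = {
    "ja_JP": ["area", "prefecture", "zone"],
    # pt_BR has its own `race` field plus `religion`, which it shares with India.
    "pt_BR": ["race", "religion"],
}
for _locale in INDIA_LOCALES:
    LOCALE_FIELDS[_locale] = [
        "religion",
        "district",
        "education_degree",
        "first_language",
        "second_language",
    ]


def locale_specific_fields(locale):
    """Return a copy of the extra fields listed for a locale (empty if none)."""
    return list(LOCALE_FIELDS.get(locale, []))


print(locale_specific_fields("pt_BR"))  # ['race', 'religion']
print(locale_specific_fields("hi_IN"))  # [] (the code dropped from the old heading)
```

The second call shows why the commit drops `hi_IN` from the heading: it is not one of the supported locale codes, so no fields are listed for it.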
@@ -205,11 +223,26 @@ For more details, see the documentation for [`SamplerColumnConfig`](../code_refe
 - Career goals
 - Context-specific personas (professional, financial, healthcare, sports, arts & entertainment, travel, culinary, etc.)

+*Japan-specific persona fields:*
+
+- `aspects`
+- `digital_skills`
+
+*Brazil and India shared persona fields (`pt_BR`, `en_IN`, `hi_Deva_IN`, `hi_Latn_IN`):*
+
+- `religious_persona`
+- `religious_background`
+
+*India-specific persona fields (`en_IN`, `hi_Deva_IN`, `hi_Latn_IN`):*
+
+- `linguistic_persona`
+- `linguistic_background`
+
 ### Configuration Parameters

 | Parameter | Type | Description |
 |-----------|------|-------------|
-| `locale` | str | Language/region code - must be one of: "en_US", "ja_JP", "en_IN", "hi_Deva_IN", "hi_Latn_IN" |
+| `locale` | str | Language/region code - must be one of: "en_US", "en_IN", "en_SG", "hi_Deva_IN", "hi_Latn_IN", "ja_JP", "pt_BR" |
 | `sex` | str (optional) | Filter by "Male" or "Female" |
 | `city` | str or list[str] (optional) | Filter by specific city or cities within locale |
 | `age_range` | list[int] (optional) | Two-element list [min_age, max_age] (default: [18, 114]) |
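
The constraints in the updated parameter table can be expressed as a small pre-flight check. This is a minimal sketch under stated assumptions, not Data Designer's own validation. `check_person_params` is a hypothetical helper, and the rule that `min_age` must not exceed `max_age` is assumed rather than stated in the table.

```python
# Allowed values transcribed from the updated parameter table.
SUPPORTED_LOCALES = {"en_US", "en_IN", "en_SG", "hi_Deva_IN", "hi_Latn_IN", "ja_JP", "pt_BR"}
DEFAULT_AGE_RANGE = [18, 114]


def check_person_params(locale, sex=None, city=None, age_range=None):
    """Hypothetical check of the documented constraints; returns normalized values."""
    if locale not in SUPPORTED_LOCALES:
        raise ValueError(
            f"unsupported locale {locale!r}; expected one of {sorted(SUPPORTED_LOCALES)}"
        )
    if sex is not None and sex not in ("Male", "Female"):
        raise ValueError('sex must be "Male" or "Female"')

    # The table allows a single city or a list of cities; normalize to a list.
    cities = None
    if city is not None:
        cities = [city] if isinstance(city, str) else list(city)

    ages = list(DEFAULT_AGE_RANGE) if age_range is None else list(age_range)
    if len(ages) != 2 or not all(isinstance(a, int) for a in ages):
        raise ValueError("age_range must be a two-element list of ints")
    if ages[0] > ages[1]:  # assumption: min_age must not exceed max_age
        raise ValueError("age_range must be [min_age, max_age] with min_age <= max_age")

    return {"locale": locale, "sex": sex, "city": cities, "age_range": ages}


params = check_person_params("ja_JP", city="Tokyo", age_range=[25, 40])
print(params)
```

A locale code outside the table, such as `hi_IN`, raises `ValueError` here, which matches the "must be one of" wording in the `locale` row.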
