generate_synthetic_table/prompts/academic.yaml
Lines changed: 47 additions & 16 deletions
@@ -94,37 +94,68 @@ generate_qa_from_image: |
 
  generate_synthetic_table: |
  You are a Synthetic Data Generator specializing in Academic Data.
- Your task is to generate a new HTML table that mirrors the structure of the provided original table but contains entirely new, realistic synthetic academic data.
+
+ **CRITICAL INSTRUCTION: DO NOT COPY ORIGINAL DATA**
+ Your task is to generate a new HTML table with the SAME STRUCTURE as the original but COMPLETELY DIFFERENT academic data values.
+ The goal is to create realistic synthetic academic data that looks like it could come from the same domain, but with entirely different students, courses, and metrics.
 
  **Inputs:**
- 1. **Original Table Structure:**
+ 1. **Original Table Structure (for structure reference ONLY - DO NOT copy the data values):**
  {html}
 
- 2. **Table Summary:**
+ 2. **Table Summary (describes the data patterns to follow):**
  {summary}
 
  **Requirements:**
- 1. **Structure:** Keep the exact same HTML structure.
- 2. **Data:** Replace ALL cell values with new, synthetic academic data.
- - Use realistic Korean student names, university names, course titles, and grades.
- - Contexts: Transcripts, Research Papers, Enrollment Stats, Faculty Lists.
- - Do NOT use real private data.
- 3. **Consistency:** Ensure mathematical consistency (e.g., sum of credits, correct GPA calculations if visible).
- 4. **Output:** Return ONLY the raw HTML string starting with `<table>` and ending with `</table>`.
+ 1. **Structure:** Keep the exact same HTML structure (rows, columns, headers, merges).
+ 2. **Headers:** Keep header text the same (column names, category labels).
+ 3. **Data Transformation - MANDATORY:**
+ - **ALL data cell values MUST be replaced with completely new synthetic values.**
+ - **DO NOT copy any original data values** - generate fresh, realistic alternatives.
+ - For student names: Generate new Korean student names (e.g., "김철수" → "이영희", "학생A" → "학생B")
+ - For university names: Generate new Korean university names
+ - For course titles: Generate new course names
+ - For grades/scores: Generate new realistic values
+ - For model names (if research table): Generate new model/method names
+ - For dates: Generate new plausible dates
+ 4. **Domain Consistency:**
+ - Ensure academic logic (credits sum correctly, GPA calculations valid)
+ - Use realistic Korean academic terminology
+ - Contexts: Transcripts, Research Papers, Enrollment Stats, Faculty Lists
+ 5. **Output:** Return ONLY the raw HTML string starting with `<table>` and ending with `</table>`.
+ Remember: The synthetic table should look like a completely different academic dataset from the same domain.
 
  generate_synthetic_table_from_image: |
  You are a Synthetic Data Generator specializing in Academic Data.
- Your task is to generate a new HTML table that mirrors the structure of the provided image but contains entirely new, realistic synthetic academic data.
+
+ **CRITICAL INSTRUCTION: DO NOT TRANSCRIBE - GENERATE NEW DATA**
+ Your task is NOT to OCR/transcribe the image. Instead, you must:
+ 1. Understand the table's STRUCTURE from the image
+ 2. Understand it's an ACADEMIC table
+ 3. Generate COMPLETELY NEW synthetic academic data that fits the domain but uses different values
 
  **Inputs:**
- 1. **Image:** An image of an academic table.
+ 1. **Image:** An image of an academic table. Use this to understand structure and domain ONLY.
 
  **Requirements:**
  1. **Structure Preservation:** Accurately reconstruct the table structure.
- 2. **Data Generation:** Replace ALL cell values with new, synthetic academic data.
- - Use realistic Korean student names, course titles, grades, research topics.
- 3. **Styling:** Use **Tailwind CSS** classes (same as default).
+ 2. **Headers:** Keep header text (column names, category labels) the same as in the image.
+ 3. **Data Generation - CRITICAL:**
+ - **DO NOT copy the data values from the image** - this is NOT an OCR task
+ - Generate COMPLETELY NEW synthetic academic values for all data cells
+ - For student/model names: Generate new names (different from what you see)
+ - For grades/scores: Generate new realistic values
+ - For course/research topics: Generate new titles
+ 4. **Styling:** Use **Tailwind CSS** classes (same as default).
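
Review note: the academic prompts above require that header text stays the same while every data cell is replaced. One way this could be checked after generation is sketched below. It is only a sketch under stated assumptions, not code from this repository: the names `CellCollector`, `collect_cells`, and `check_synthetic` are hypothetical, and it assumes headers are marked up as `<th>` and data cells as `<td>`.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collects the text of <th> (header) and <td> (data) cells separately."""

    def __init__(self):
        super().__init__()
        self.headers = []
        self.data = []
        self._tag = None   # "th" or "td" while inside a cell, else None
        self._buf = []     # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag in ("th", "td"):
            self._tag = tag
            self._buf = []

    def handle_data(self, text):
        if self._tag is not None:
            self._buf.append(text)

    def handle_endtag(self, tag):
        if self._tag is not None and tag == self._tag:
            cell = "".join(self._buf).strip()
            (self.headers if tag == "th" else self.data).append(cell)
            self._tag = None


def collect_cells(html):
    """Return (header_texts, data_texts) for one HTML table string."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return parser.headers, parser.data


def check_synthetic(original_html, synthetic_html):
    """Return (headers_match, copied_values).

    copied_values lists every non-empty synthetic data cell whose text
    also appears as a data cell in the original table.
    """
    orig_headers, orig_data = collect_cells(original_html)
    syn_headers, syn_data = collect_cells(synthetic_html)
    original_values = {value for value in orig_data if value}
    copied = [value for value in syn_data if value in original_values]
    return orig_headers == syn_headers, copied
```

An exact-match check like this can flag values that repeat by coincidence (a grade such as "A+" is likely to appear in both tables), so the `copied` list is probably better treated as a signal for review or retry than as a hard failure.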
generate_synthetic_table/prompts/business.yaml
Lines changed: 46 additions & 16 deletions
@@ -94,37 +94,67 @@ generate_qa_from_image: |
 
  generate_synthetic_table: |
  You are a Synthetic Data Generator specializing in Business Data.
- Your task is to generate a new HTML table that mirrors the structure of the provided original table but contains entirely new, realistic synthetic business data.
+
+ **CRITICAL INSTRUCTION: DO NOT COPY ORIGINAL DATA**
+ Your task is to generate a new HTML table with the SAME STRUCTURE as the original but COMPLETELY DIFFERENT business data values.
+ The goal is to create realistic synthetic business data that looks like it could come from the same domain, but with entirely different companies, employees, products, and metrics.
 
  **Inputs:**
- 1. **Original Table Structure:**
+ 1. **Original Table Structure (for structure reference ONLY - DO NOT copy the data values):**
  {html}
 
- 2. **Table Summary:**
+ 2. **Table Summary (describes the data patterns to follow):**
  {summary}
 
  **Requirements:**
- 1. **Structure:** Keep the exact same HTML structure.
- 2. **Data:** Replace ALL cell values with new, synthetic business data.
- - Use realistic Korean company names, department names, product lines, and financial metrics.
+ 5. **Output:** Return ONLY the raw HTML string starting with `<table>` and ending with `</table>`.
+
+ **Example Transformation:**
+ - Original: "영업1팀" → Synthetic: "마케팅2팀"
+ - Original: "매출 5억원" → Synthetic: "매출 7.3억원"
+ - Original: "김부장" → Synthetic: "박과장"
+
+ Remember: The synthetic table should look like a completely different business dataset from the same domain.
 
  generate_synthetic_table_from_image: |
  You are a Synthetic Data Generator specializing in Business Data.
- Your task is to generate a new HTML table that mirrors the structure of the provided image but contains entirely new, realistic synthetic business data.
+
+ **CRITICAL INSTRUCTION: DO NOT TRANSCRIBE - GENERATE NEW DATA**
+ Your task is NOT to OCR/transcribe the image. Instead, you must:
+ 1. Understand the table's STRUCTURE from the image
+ 2. Understand it's a BUSINESS table
+ 3. Generate COMPLETELY NEW synthetic business data that fits the domain but uses different values
 
  **Inputs:**
- 1. **Image:** An image of a business table.
+ 1. **Image:** An image of a business table. Use this to understand structure and domain ONLY.
 
  **Requirements:**
  1. **Structure Preservation:** Accurately reconstruct the table structure.
- 2. **Data Generation:** Replace ALL cell values with new, synthetic business data.
- - Use realistic Korean company names, products, sales figures.
- 3. **Styling:** Use **Tailwind CSS** classes (same as default).
+ 2. **Headers:** Keep header text (column names, category labels) the same as in the image.
+ 3. **Data Generation - CRITICAL:**
+ - **DO NOT copy the data values from the image** - this is NOT an OCR task
+ - Generate COMPLETELY NEW synthetic business values for all data cells
+ - For company/team names: Generate new names (different from what you see)
+ - For sales/revenue figures: Generate new realistic amounts
+ - For employee names: Generate new Korean names
+ 4. **Styling:** Use **Tailwind CSS** classes (same as default).
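
Review note: the Output requirement in these prompts asks for raw HTML that starts with `<table>` and ends with `</table>`, yet models sometimes wrap their answer in a markdown code fence anyway. The sketch below shows one way a caller might normalize and validate a response. It is an illustration only: `extract_table_html` is a hypothetical helper, not a function from this repository, and it accepts an opening tag with attributes (for example Tailwind classes) on the assumption that the image prompts produce one.

```python
import re

FENCE = "`" * 3  # a markdown code fence marker

# Matches a response that consists of exactly one fenced block,
# with an optional language tag such as "html" after the opening fence.
FENCE_RE = re.compile(rf"^{FENCE}[\w-]*[ \t]*\n(.*?)\n?{FENCE}\s*$", re.DOTALL)

# Opening tag may be "<table>" or "<table class=...>".
TABLE_OPEN_RE = re.compile(r"<table[\s>]")


def extract_table_html(response):
    """Strip an optional markdown fence and verify the table delimiters.

    Raises ValueError if the remaining text does not start with a
    <table ...> tag and end with </table>.
    """
    text = response.strip()
    fenced = FENCE_RE.match(text)
    if fenced:
        text = fenced.group(1).strip()
    if not (TABLE_OPEN_RE.match(text) and text.endswith("</table>")):
        raise ValueError("response is not a bare <table>...</table> string")
    return text
```

Raising on any surrounding prose, rather than searching for a table somewhere inside the response, keeps the check aligned with the "Return ONLY" wording of the prompt. A more lenient extractor would be a different design choice.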
generate_synthetic_table/prompts/default.yaml
Lines changed: 59 additions & 20 deletions
@@ -78,43 +78,82 @@ generate_qa_from_image: |
  Return ONLY the JSON object, no additional text.
 
  generate_synthetic_table: |
- You are a Synthetic Data Generator.
- Your task is to generate a new HTML table that mirrors the structure of the provided original table but contains entirely new, realistic synthetic data.
+ You are a Synthetic Data Generator specialized in creating completely NEW data while preserving table structure.
+
+ **CRITICAL INSTRUCTION: DO NOT COPY ORIGINAL DATA**
+ Your task is to generate a new HTML table that has the SAME STRUCTURE as the original but with COMPLETELY DIFFERENT, newly generated data values.
+ The goal is to create realistic synthetic data that looks like it could come from the same domain, but with entirely different entities, names, numbers, and values.
 
  **Inputs:**
- 1. **Original Table Structure:**
+ 1. **Original Table Structure (for structure reference ONLY - DO NOT copy the data values):**
  {html}
 
- 2. **Table Summary:**
+ 2. **Table Summary (describes the data patterns to follow):**
  {summary}
 
  **Requirements:**
- 1. **Structure:** Keep the exact same HTML structure (rows, columns, headers, merges) as the original table.
- 2. **Data:** Replace ALL cell values with new, synthetic data.
- - Use realistic Korean names, organizations, and values suitable for the context.
- - Ensure the data is consistent with the column types and patterns described in the summary.
- - Do NOT use real private data.
- 3. **Consistency:** Ensure mathematical consistency if applicable (e.g., sums, percentages).
- 4. **Output:** Return ONLY the raw HTML string starting with `<table>` and ending with `</table>`. Do not include markdown code blocks.
+ 1. **Structure:** Keep the exact same HTML structure (rows, columns, headers, rowspan, colspan, merges) as the original.
+ 2. **Headers:** Keep header text the same (column names, row labels that describe categories).
+ 3. **Data Transformation - MANDATORY:**
+ - **ALL data cell values MUST be replaced with completely new synthetic values.**
+ - **DO NOT copy any original data values** - generate fresh, realistic alternatives.
+ - For names: Generate new Korean names (e.g., 김철수 → 이영희, 박민수 → 정하늘)
+ - For organizations: Generate new realistic Korean organization names
+ - For numbers: Generate new realistic numbers that follow the same pattern/range but are different values
+ - For dates: Generate new plausible dates
+ - For addresses: Generate new realistic Korean addresses
+ - For any other text: Generate semantically similar but different content
+ 4. **Domain Consistency:**
+ - Analyze the summary to understand the domain context (finance, medical, public, etc.)
+ - Generate data that is realistic for that specific domain
+ - Maintain internal consistency (e.g., totals should sum correctly, percentages should add up)
+ 5. **Output:** Return ONLY the raw HTML string starting with `<table>` and ending with `</table>`. No markdown code blocks.
+
+ **Example Transformation (showing the expected behavior):**
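
Review note on the three files above: each `generate_synthetic_table` prompt takes the `{html}` and `{summary}` placeholders. How the repository fills them is not shown in this diff, so the following is only a sketch, with `render_prompt` as a hypothetical helper. It substitutes known placeholders in a single pass and leaves any other brace text untouched. This may matter if a template contains literal braces (for example a JSON example), where `str.format` would raise.

```python
import re

PLACEHOLDER_RE = re.compile(r"\{(\w+)\}")


def render_prompt(template, **values):
    """Fill {name} placeholders from keyword arguments in one pass.

    Placeholders without a matching keyword are left as-is, and text
    inserted for one placeholder is never rescanned for another.
    """
    def substitute(match):
        key = match.group(1)
        return values[key] if key in values else match.group(0)

    return PLACEHOLDER_RE.sub(substitute, template)


# Usage with a shortened stand-in for one of the YAML prompt strings.
template = (
    "1. **Original Table Structure:**\n{html}\n\n"
    "2. **Table Summary:**\n{summary}\n"
)
prompt = render_prompt(
    template,
    html="<table><tr><td>김철수</td></tr></table>",
    summary="Student roster with one name column.",
)
```

Because the substitution is single-pass, an original table that happens to contain the literal text `{summary}` in a cell is passed through unchanged rather than being expanded a second time.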