# Format: description,Subcategory
# DON'T: add a header row; the parser treats every line that does not start with # as data
# DON'T: put quotes around the descriptions
# DO: use raw receipt abbreviations as they appear
# DO: use the shortest prefix that unambiguously identifies the product category (e.g. MOZZAREL, POMODOR)
# Tab-separated values are also accepted, in case your spreadsheet exports that way
# Use the brand name alone when you buy many variants: PARMALAT, BARILLA, MULINO BIANCO
# The RAG system matches by semantic embedding similarity, not exact string matching.
# The embedding model treats `PARMALAT B12 1.5L` as semantically close to `PARMALAT`, so `PARMALAT` alone is enough and you don't need to enumerate every variant.
# Use the generic Italian noun for unbranded items: LATTE, PANE, UOVA, POMODORI
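# Illustration only (an assumption, not the project's actual parser): a minimal Python sketch
# of a loader that applies the rules above. load_knowledge_base is a hypothetical name, and
# skipping blank lines and splitting on the last separator are choices of this sketch.
#
#   def load_knowledge_base(path):
#       """Return (description, subcategory) pairs from a file in this format."""
#       entries = []
#       with open(path, encoding="utf-8") as f:
#           for lineno, raw in enumerate(f, start=1):
#               line = raw.strip()
#               if not line or line.startswith("#"):
#                   continue  # blank lines and comments are not data
#               sep = "\t" if "\t" in line else ","  # tab-separated is also accepted
#               parts = line.rsplit(sep, 1)  # the subcategory is the last field
#               if len(parts) != 2:
#                   raise ValueError(f"line {lineno}: expected description{sep}Subcategory")
#               entries.append((parts[0].strip(), parts[1].strip()))
#       return entries
#
# For this file, the sketch's first entry would be ('BISCOTTI', 'Snacks').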
BISCOTTI,Snacks
PARMALAT,Dairy & Eggs
PANE,Bakery
LATTE,Dairy & Eggs
MOZZAREL,Dairy & Eggs
POMODOR,Vegetables
BANAN,Fruit
PROSCIUTTO,Meat
SALMONE,Fish & Seafood
LAVATRICE,Cleaning Products
DOVE,Personal Care
BARILLA,Pantry
OLIO,Condiments & Spices
PATATINE,Snacks
LAYS,Snacks
ACQUA NATUR,Beverages
YOGURT,Dairy & Eggs
DIGES. MCVITIE'S,Snacks
# DO (optional): include variant spellings and truncations of the same item
MCVITIE,Snacks
MCVITIE'S DIGESTIV,Snacks