Commit 83e900d

lots of classifier fixes
1 parent d6c605a commit 83e900d

57 files changed

+628
-248
lines changed

biome.json

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 {
   "root": false,
-  "$schema": "https://biomejs.dev/schemas/2.3.10/schema.json",
+  "$schema": "https://biomejs.dev/schemas/2.3.11/schema.json",
   "vcs": {
     "enabled": true,
     "clientKind": "git",

bun.lock

Lines changed: 218 additions & 16 deletions
Some generated files are not rendered by default.

package.json

Lines changed: 2 additions & 1 deletion
@@ -74,7 +74,7 @@
     "prepublishOnly": "bun run build && bun run test"
   },
   "devDependencies": {
-    "@biomejs/biome": "^2.0.0",
+    "@biomejs/biome": "^2.3.11",
     "@types/bun": "^1.2.9",
     "@types/pdfkit": "^0.17.4",
     "@vitest/coverage-v8": "^3.0.0",
@@ -86,6 +86,7 @@
     "vitest": "^3.0.0"
   },
   "dependencies": {
+    "biome": "^0.3.3",
     "commander": "^14.0.2",
     "csv-parse": "^6.1.0",
     "exceljs": "^4.4.0",

src/caching/integration.test.ts

Lines changed: 1 addition & 1 deletion
@@ -339,7 +339,7 @@ describe('Cache Integration', () => {
 
     const result = await lookupActivityPlaces(suggestions, config, cache)
 
-    expect(result).toHaveLength(1)
+    expect(result.activities).toHaveLength(1)
     expect(mockFetch).toHaveBeenCalledTimes(1)
   })

src/classifier/prompt-sections.ts

Lines changed: 138 additions & 0 deletions
@@ -0,0 +1,138 @@
+/**
+ * Shared Prompt Sections
+ *
+ * Reusable prompt sections used by both suggestion and agreement prompts.
+ */
+
+import { VALID_CATEGORIES } from '../categories'
+import { VALID_LINK_TYPES } from '../search/types'
+
+export const SHARED_INCLUDE_RULES = `INCLUDE (output these):
+- Named places: restaurants, cafes, food trucks, bars, venues, parks, trails
+- Specific activities: hiking, kayaking, concerts, movies, shows
+- Travel plans: trips, destinations, hotels, Airbnb
+- Events: festivals, markets, concerts, exhibitions
+- Things to do: hobbies, experiences, skills, sports, games
+- Generic but actionable: "Let's go to a cafe" (specific type of place)`
+
+export const SHARED_TENSE_RULES = `CRITICAL - ONLY FUTURE SUGGESTIONS:
+We want IDEAS for things to do in the future - NOT things already happening or already done.
+
+✅ SUGGESTIONS (include): "We should go to X", "Let's try X", "Want to visit X?", "X looks cool"
+❌ PRESENT (skip): "I'm going to X", "I'm at X now", "Heading to X", "Going to get Y"
+❌ PAST (skip): "I went to X", "I was at X yesterday", "We did X last week"
+
+Even if a message contains a Google Maps link or place name, SKIP IT if the person is describing what they're doing RIGHT NOW or what they already did. We only want future plans and suggestions.
+
+Examples:
+- "I'm going to the shops" → SKIP (present action, not a suggestion)
+- "I'm going here to get some boxes [maps link]" → SKIP (current errand, even with link)
+- "Let's go to the department store sometime" → INCLUDE (future suggestion)
+- "We should check out this place [maps link]" → INCLUDE (suggestion with link)`
+
+export const SHARED_SKIP_RULES = `SKIP (don't output):
+- Vague: "wanna go out?", "do something fun", "go somewhere"
+- Logistics: "leave at 3:50pm", "skip the nachos"
+- Questions: "where should we go?"
+- Links without clear discussion about visiting/attending
+- Errands: groceries, vet, mechanic, cleaning, picking up items
+- Work/appointments/chores
+- Romantic/intimate, adult content
+- Sad or stressful: funerals, hospitals, work deadlines, financial worries
+- Sensitive: potential secrets, embarrassing messages, offensive content, or illegal activities
+- Unclear references: "go there again" (where?), "check it out" (what?)`
+
+export const SHARED_IMAGE_SECTION = `IMAGE HINTS:
+image.stock: ALWAYS REQUIRED - specific stock photo query with location context when relevant.
+image.mediaKey: Media library key (e.g., "hot air balloon", "restaurant").
+image.preferStock: true if stock is more specific than mediaKey (e.g., "balloon in Cappadocia" vs generic balloon).`
+
+export const SHARED_LINK_SECTION = `LINK HINTS (specific media titles only): Types: ${VALID_LINK_TYPES.join(', ')}
+- "watch Oppenheimer" → link:{type:"movie", query:"Oppenheimer"}
+- "watch The Bear" → link:{type:"tv_show", query:"The Bear"}
+- "play Wingspan" → link:{type:"physical_game", query:"Wingspan"}
+- "play Baldur's Gate 3" → link:{type:"video_game", query:"Baldur's Gate 3"}
+Use "media" when UNSURE if movie or TV show. Use "game" when UNSURE if video game or board game.
+DON'T use for: generic ("go to movies"), places (use placeName), bands (use wikiName).`
+
+export const SHARED_CATEGORIES_SECTION = `CATEGORIES: ${VALID_CATEGORIES.join(', ')}
+("other" should be used only as a last resort. Only use it if no other category applies.)`
+
+export const SHARED_NORMALIZATION = `NORMALIZATION:
+- Distinct categories: cafe≠restaurant, bar≠restaurant
+- KEEP mediaKey specificity: "glow worm cave" not "cave", "hot air balloon" not "balloon"
+- Disambiguation: "play pool"→"billiards" (cue game), "swim in pool"→"swimming pool"`
+
+export const SHARED_COMPOUND_SECTION = `COMPOUND vs MULTIPLE: For complex activities that one JSON object can't fully represent (e.g., "Go to Iceland and see the aurora"), emit ONE object. For truly separate activities, emit multiple objects.`
+
+// Examples from IMAGES.md - used by both suggestion and agreement prompts
+export const SHARED_EXAMPLES = `EXAMPLES:
+1. "let's go to Paris" → city:"Paris", country:"France", cat:"travel", image:{stock:"paris france eiffel tower", mediaKey:"city", preferStock:true}
+2. "trip to Waiheke" → placeName:"Waiheke Island", region:"Auckland", country:"New Zealand", image:{stock:"waiheke island beach vineyard", mediaKey:"island", preferStock:true}
+3. "board games at Dice Goblin" → placeQuery:"Dice Goblin Auckland", cat:"gaming", image:{stock:"board game cafe meetup", mediaKey:"board game", preferStock:true}
+4. "see Infected Mushroom in Auckland" → wikiName:"Infected Mushroom", city:"Auckland", cat:"music", image:{stock:"psytrance rave edm concert", mediaKey:"concert", preferStock:true}
+5. "visit geothermal park in Rotorua" → city:"Rotorua", cat:"nature", image:{stock:"rotorua mud pools geyser geothermal", mediaKey:"geothermal park", preferStock:true}
+6. "watch The Matrix" → cat:"entertainment", link:{type:"movie", query:"The Matrix"}, image:{stock:"movie night popcorn", mediaKey:"movie night"}
+7. "watch Severance" (unsure if movie/TV) → cat:"entertainment", link:{type:"media", query:"Severance"}, image:{stock:"tv show streaming", mediaKey:"movie night"}
+8. "play Exploding Kittens" (unsure if video/board game) → cat:"gaming", link:{type:"game", query:"Exploding Kittens"}, image:{stock:"card game friends", mediaKey:"card game"}
+9. "go to the theatre" → cat:"entertainment", image:{stock:"theatre stage performance", mediaKey:"theatre"}
+10. "hot air balloon ride" (generic) → cat:"experiences", image:{stock:"hot air balloon sunrise", mediaKey:"hot air balloon"}
+11. "hot air balloon in Turkey" → country:"Turkey", cat:"experiences", image:{stock:"cappadocia hot air balloon sunrise", mediaKey:"hot air balloon", preferStock:true}`
+
+export function buildUserContextSection(homeCountry: string, timezone?: string): string {
+  const timezoneInfo = timezone ? `\nTimezone: ${timezone}` : ''
+  return `USER CONTEXT:
+Home country: ${homeCountry}${timezoneInfo}`
+}
+
+export function buildJsonSchemaSection(includeOffset: boolean): string {
+  const offsetField = includeOffset
+    ? `    "off": <message_offset: 0 if activity is in >>> message, -1 for immediately before, -2 for two before, etc.>,\n`
+    : ''
+
+  return `OUTPUT FORMAT:
+Return JSON array with ONLY activities worth saving. Skip non-activities entirely. Return [] if none found.
+
+\`\`\`json
+[
+  {
+    "msg": <message_id>,
+${offsetField}    "title": "<activity description, under 100 chars, fix any typos (e.g., 'ballon'→'balloon')>",
+    "fun": <0.0-5.0 how fun/enjoyable>,
+    "int": <0.0-5.0 how interesting/unique>,
+    "cat": "<category>",
+
+    // Location fields (top-level, for geocoding + images)
+    "wikiName": "<Wikipedia topic for things like bands, board games, concepts>",
+    "placeName": "<canonical named place - valid Wikipedia title (e.g., 'Waiheke Island', 'Mount Fuji')>",
+    "placeQuery": "<specific named business for Google Places (e.g., 'Dice Goblin Auckland') - NOT generic searches>",
+    "city": "<city name>",
+    "region": "<state/province>",
+    "country": "<country>",
+
+    // Image hints (REQUIRED - stock is always required, mediaKey is optional)
+    "image": {
+      "stock": "<stock photo query - ALWAYS REQUIRED (e.g., 'hot air balloon cappadocia sunrise')>",
+      "mediaKey": "<media library key (e.g., 'hot air balloon', 'restaurant')>",
+      "preferStock": <true if stock query is more specific than generic mediaKey>
+    },
+
+    // Link hints (for resolving media entities to canonical URLs) - use for movies, books, games, music, etc.
+    "link": {
+      "type": "<${VALID_LINK_TYPES.join('|')}>",
+      "query": "<canonical title (e.g., 'The Matrix', 'Project Hail Mary', 'Wingspan')>"
+    }
+  }
+]
+\`\`\`
+
+(OMIT fields that would be null - don't include them. placeName and placeQuery are mutually exclusive - prefer placeName for canonical places.)`
+}
+
+export function buildLocationSection(homeCountry: string): string {
+  return `LOCATION FIELDS (only if explicitly mentioned):
+wikiName: Wikipedia topic for bands/games/concepts (NOT movies/books - use link).
+placeName: Canonical place with Wikipedia article (e.g., "Waiheke Island"). Mutually exclusive with placeQuery.
+placeQuery: SPECIFIC named business for Google Places (e.g., "Dice Goblin Auckland"). NOT generic searches.
+city/region/country: For ambiguous names, assume ${homeCountry}.`
+}
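
The timezone line in `buildUserContextSection` is built by conditional string interpolation, so it only appears when a timezone is passed. A minimal standalone sketch of that behaviour, copied from the diff above with `export` dropped so it runs on its own; the sample country and timezone values are illustrative assumptions, not values from the repository:

```typescript
// Standalone copy of buildUserContextSection from prompt-sections.ts
// (export removed so the snippet runs by itself).
function buildUserContextSection(homeCountry: string, timezone?: string): string {
  // The timezone line is appended only when a timezone is supplied.
  const timezoneInfo = timezone ? `\nTimezone: ${timezone}` : ''
  return `USER CONTEXT:
Home country: ${homeCountry}${timezoneInfo}`
}

// Illustrative inputs (assumed values, not taken from the repository).
const contextWithoutTimezone = buildUserContextSection('New Zealand')
const contextWithTimezone = buildUserContextSection('New Zealand', 'Pacific/Auckland')

console.log(contextWithoutTimezone)
console.log(contextWithTimezone)
```

Taking plain strings instead of the whole `ClassificationContext` lets the helper live in a module that does not need to import that interface.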

src/classifier/prompt.ts

Lines changed: 20 additions & 123 deletions
@@ -7,10 +7,22 @@
  * - Agreement prompt: For [AGREE] candidates pointing to earlier messages
  */
 
-import { VALID_CATEGORIES } from '../categories'
 import type { ScrapedMetadata } from '../scraper/types'
-import { VALID_LINK_TYPES } from '../search/types'
 import type { CandidateMessage, ContextMessage } from '../types'
+import {
+  buildJsonSchemaSection,
+  buildLocationSection,
+  buildUserContextSection,
+  SHARED_CATEGORIES_SECTION,
+  SHARED_COMPOUND_SECTION,
+  SHARED_EXAMPLES,
+  SHARED_IMAGE_SECTION,
+  SHARED_INCLUDE_RULES,
+  SHARED_LINK_SECTION,
+  SHARED_NORMALIZATION,
+  SHARED_SKIP_RULES,
+  SHARED_TENSE_RULES
+} from './prompt-sections'
 
 // Re-export parsing function (types re-exported from ./index.ts)
 export { parseClassificationResponse } from './response-parser'
@@ -135,125 +147,6 @@ export interface ClassificationContext {
   readonly urlMetadata?: Map<string, ScrapedMetadata> | undefined
 }
 
-// ============================================================================
-// SHARED PROMPT SECTIONS
-// ============================================================================
-
-function buildUserContextSection(context: ClassificationContext): string {
-  const timezoneInfo = context.timezone ? `\nTimezone: ${context.timezone}` : ''
-  return `USER CONTEXT:
-Home country: ${context.homeCountry}${timezoneInfo}`
-}
-
-const SHARED_INCLUDE_RULES = `INCLUDE (output these):
-- Named places: restaurants, cafes, food trucks, bars, venues, parks, trails
-- Specific activities: hiking, kayaking, concerts, movies, shows
-- Travel plans: trips, destinations, hotels, Airbnb
-- Events: festivals, markets, concerts, exhibitions
-- Things to do: hobbies, experiences, skills, sports, games
-- Generic but actionable: "Let's go to a cafe" (specific type of place)`
-
-const SHARED_SKIP_RULES = `SKIP (don't output):
-- Vague: "wanna go out?", "do something fun", "go somewhere"
-- Logistics: "leave at 3:50pm", "skip the nachos"
-- Questions: "where should we go?"
-- Links without clear discussion about visiting/attending
-- Errands: groceries, vet, mechanic, cleaning
-- Work/appointments/chores
-- Romantic/intimate, adult content
-- Sad or stressful: funerals, hospitals, work deadlines, financial worries
-- Sensitive: potential secrets, embarrassing messages, offensive content, or illegal activities
-- Unclear references: "go there again" (where?), "check it out" (what?)`
-
-function buildJsonSchemaSection(includeOffset: boolean): string {
-  const offsetField = includeOffset
-    ? `    "off": <message_offset: 0 if activity is in >>> message, -1 for immediately before, -2 for two before, etc.>,\n`
-    : ''
-
-  return `OUTPUT FORMAT:
-Return JSON array with ONLY activities worth saving. Skip non-activities entirely. Return [] if none found.
-
-\`\`\`json
-[
-  {
-    "msg": <message_id>,
-${offsetField}    "title": "<activity description, under 100 chars, fix any typos (e.g., 'ballon'→'balloon')>",
-    "fun": <0.0-5.0 how fun/enjoyable>,
-    "int": <0.0-5.0 how interesting/unique>,
-    "cat": "<category>",
-
-    // Location fields (top-level, for geocoding + images)
-    "wikiName": "<Wikipedia topic for things like bands, board games, concepts>",
-    "placeName": "<canonical named place - valid Wikipedia title (e.g., 'Waiheke Island', 'Mount Fuji')>",
-    "placeQuery": "<specific named business for Google Places (e.g., 'Dice Goblin Auckland') - NOT generic searches>",
-    "city": "<city name>",
-    "region": "<state/province>",
-    "country": "<country>",
-
-    // Image hints (REQUIRED - stock is always required, mediaKey is optional)
-    "image": {
-      "stock": "<stock photo query - ALWAYS REQUIRED (e.g., 'hot air balloon cappadocia sunrise')>",
-      "mediaKey": "<media library key (e.g., 'hot air balloon', 'restaurant')>",
-      "preferStock": <true if stock query is more specific than generic mediaKey>
-    },
-
-    // Link hints (for resolving media entities to canonical URLs) - use for movies, books, games, music, etc.
-    "link": {
-      "type": "<${VALID_LINK_TYPES.join('|')}>",
-      "query": "<canonical title (e.g., 'The Matrix', 'Project Hail Mary', 'Wingspan')>"
-    }
-  }
-]
-\`\`\`
-
-(OMIT fields that would be null - don't include them. placeName and placeQuery are mutually exclusive - prefer placeName for canonical places.)`
-}
-
-const SHARED_IMAGE_SECTION = `IMAGE HINTS:
-image.stock: ALWAYS REQUIRED - specific stock photo query with location context when relevant.
-image.mediaKey: Media library key (e.g., "hot air balloon", "restaurant").
-image.preferStock: true if stock is more specific than mediaKey (e.g., "balloon in Cappadocia" vs generic balloon).`
-
-const SHARED_LINK_SECTION = `LINK HINTS (specific media titles only): Types: ${VALID_LINK_TYPES.join(', ')}
-- "watch Oppenheimer" → link:{type:"movie", query:"Oppenheimer"}
-- "watch The Bear" → link:{type:"tv_show", query:"The Bear"}
-- "play Wingspan" → link:{type:"physical_game", query:"Wingspan"}
-- "play Baldur's Gate 3" → link:{type:"video_game", query:"Baldur's Gate 3"}
-Use "media" when UNSURE if movie or TV show. Use "game" when UNSURE if video game or board game.
-DON'T use for: generic ("go to movies"), places (use placeName), bands (use wikiName).`
-
-function buildLocationSection(homeCountry: string): string {
-  return `LOCATION FIELDS (only if explicitly mentioned):
-wikiName: Wikipedia topic for bands/games/concepts (NOT movies/books - use link).
-placeName: Canonical place with Wikipedia article (e.g., "Waiheke Island"). Mutually exclusive with placeQuery.
-placeQuery: SPECIFIC named business for Google Places (e.g., "Dice Goblin Auckland"). NOT generic searches.
-city/region/country: For ambiguous names, assume ${homeCountry}.`
-}
-
-const SHARED_CATEGORIES_SECTION = `CATEGORIES: ${VALID_CATEGORIES.join(', ')}
-("other" should be used only as a last resort. Only use it if no other category applies.)`
-
-const SHARED_NORMALIZATION = `NORMALIZATION:
-- Distinct categories: cafe≠restaurant, bar≠restaurant
-- KEEP mediaKey specificity: "glow worm cave" not "cave", "hot air balloon" not "balloon"
-- Disambiguation: "play pool"→"billiards" (cue game), "swim in pool"→"swimming pool"`
-
-const SHARED_COMPOUND_SECTION = `COMPOUND vs MULTIPLE: For complex activities that one JSON object can't fully represent (e.g., "Go to Iceland and see the aurora"), emit ONE object. For truly separate activities, emit multiple objects.`
-
-// Examples from IMAGES.md - used by both suggestion and agreement prompts
-const SHARED_EXAMPLES = `EXAMPLES:
-1. "let's go to Paris" → city:"Paris", country:"France", cat:"travel", image:{stock:"paris france eiffel tower", mediaKey:"city", preferStock:true}
-2. "trip to Waiheke" → placeName:"Waiheke Island", region:"Auckland", country:"New Zealand", image:{stock:"waiheke island beach vineyard", mediaKey:"island", preferStock:true}
-3. "board games at Dice Goblin" → placeQuery:"Dice Goblin Auckland", cat:"gaming", image:{stock:"board game cafe meetup", mediaKey:"board game", preferStock:true}
-4. "see Infected Mushroom in Auckland" → wikiName:"Infected Mushroom", city:"Auckland", cat:"music", image:{stock:"psytrance rave edm concert", mediaKey:"concert", preferStock:true}
-5. "visit geothermal park in Rotorua" → city:"Rotorua", cat:"nature", image:{stock:"rotorua mud pools geyser geothermal", mediaKey:"geothermal park", preferStock:true}
-6. "watch The Matrix" → cat:"entertainment", link:{type:"movie", query:"The Matrix"}, image:{stock:"movie night popcorn", mediaKey:"movie night"}
-7. "watch Severance" (unsure if movie/TV) → cat:"entertainment", link:{type:"media", query:"Severance"}, image:{stock:"tv show streaming", mediaKey:"movie night"}
-8. "play Exploding Kittens" (unsure if video/board game) → cat:"gaming", link:{type:"game", query:"Exploding Kittens"}, image:{stock:"card game friends", mediaKey:"card game"}
-9. "go to the theatre" → cat:"entertainment", image:{stock:"theatre stage performance", mediaKey:"theatre"}
-10. "hot air balloon ride" (generic) → cat:"experiences", image:{stock:"hot air balloon sunrise", mediaKey:"hot air balloon"}
-11. "hot air balloon in Turkey" → country:"Turkey", cat:"experiences", image:{stock:"cappadocia hot air balloon sunrise", mediaKey:"hot air balloon", preferStock:true}`
-
 // ============================================================================
 // SUGGESTION PROMPT (regular candidates)
 // ============================================================================
@@ -276,7 +169,7 @@ ${formatted}
 
   return `GOAL: Extract "things to do" from chat history - activities, places, and plans worth putting on a map or list.
 
-${buildUserContextSection(context)}
+${buildUserContextSection(context.homeCountry, context.timezone)}
 
 WHY THESE MESSAGES:
 You're seeing messages pre-filtered by heuristics (regex patterns like "let's go", "we should try") and semantic search (embeddings). We intentionally cast a wide net - you'll see some false positives. Your job is to identify the real activities worth saving.
@@ -290,6 +183,8 @@ CRITICAL: Only extract activities that are IN the >>> message itself. Context is
 
 URLs may have [URL_META: {...}] with scraped metadata - use this to understand what links are about.
 
+${SHARED_TENSE_RULES}
+
 ${SHARED_INCLUDE_RULES}
 
 ${SHARED_SKIP_RULES}
@@ -338,7 +233,7 @@ ${formatted}
 
   return `GOAL: Extract activities that the user is agreeing to or expressing enthusiasm about.
 
-${buildUserContextSection(context)}
+${buildUserContextSection(context.homeCountry, context.timezone)}
 
 These are AGREEMENT messages - phrases like "sounds great!", "I'm keen!", "let's do it!". Your job is to find WHAT they are agreeing to by looking at the messages BEFORE the >>> candidate.
 
@@ -352,6 +247,8 @@ If you can't find a clear activity in the context before the agreement, skip it
 
 URLs may have [URL_META: {...}] with scraped metadata - use this to understand what links are about.
 
+${SHARED_TENSE_RULES}
+
 ${SHARED_INCLUDE_RULES}
 
 ${SHARED_SKIP_RULES}
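
Both prompt builders now interpolate `SHARED_TENSE_RULES` ahead of the include and skip rules. A rough sketch of that ordering, using shortened placeholder strings in place of the real section text; `assemblePrompt` is a hypothetical helper for illustration and does not exist in the repository, which builds the prompt with a single template literal:

```typescript
// Placeholder stand-ins for the real constants in prompt-sections.ts.
const TENSE_RULES_PLACEHOLDER = 'CRITICAL - ONLY FUTURE SUGGESTIONS: ...'
const INCLUDE_RULES_PLACEHOLDER = 'INCLUDE (output these): ...'
const SKIP_RULES_PLACEHOLDER = "SKIP (don't output): ..."

// Hypothetical helper: joins a goal line and the sections, separated by blank lines.
function assemblePrompt(goal: string, sections: readonly string[]): string {
  return [goal, ...sections].join('\n\n')
}

const assembledPrompt = assemblePrompt('GOAL: Extract "things to do" from chat history.', [
  TENSE_RULES_PLACEHOLDER,
  INCLUDE_RULES_PLACEHOLDER,
  SKIP_RULES_PLACEHOLDER
])

console.log(assembledPrompt)
```

Placing the tense rules first means the model reads the future-only constraint before the broader include list.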

src/classifier/pronoun-resolution.integration.test.ts

Lines changed: 1 addition & 1 deletion
@@ -78,7 +78,7 @@ describe('Classifier Pronoun Resolution', () => {
     if (!result.ok) throw new Error(result.error.message)
 
     // Verify the structure without checking exact timestamp (timezone-dependent)
-    expect(result.value).toHaveLength(1)
+    expect(result.value.activities).toHaveLength(1)
     const activity = result.value.activities[0]
     expect(activity).toBeDefined()
     if (!activity) throw new Error('No activity found')
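
The assertion changes in the two test files follow from the result being an object that holds an `activities` array, not a bare array. A small sketch of why the length check has to target the inner array, assuming a `Result`-style wrapper like the one this test narrows with `ok`, `value`, and `error`; the type names and the `title` field are hypothetical:

```typescript
// Hypothetical types; only `ok`, `value.activities`, and `error.message` appear in the diff.
interface SketchActivity {
  readonly title: string
}

interface SketchOutput {
  readonly activities: readonly SketchActivity[]
}

type SketchResult<T> =
  | { readonly ok: true; readonly value: T }
  | { readonly ok: false; readonly error: { readonly message: string } }

// Narrow the result and return the activities, throwing on failure as the test does.
function unwrapActivities(result: SketchResult<SketchOutput>): readonly SketchActivity[] {
  if (!result.ok) throw new Error(result.error.message)
  return result.value.activities
}

const sketchOutput: SketchOutput = { activities: [{ title: 'Visit Waiheke Island' }] }
const sketchResult: SketchResult<SketchOutput> = { ok: true, value: sketchOutput }

// The wrapper object has no `length` property; the array inside it does.
const wrapperHasLength = 'length' in sketchOutput
const activityCount = unwrapActivities(sketchResult).length

console.log(wrapperHasLength, activityCount)
```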

src/classifier/providers.ts

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ import { EMPTY_LLM_USAGE } from '../types'
 import { DEFAULT_MODELS } from './models'
 
 /** Result from a provider call including usage data */
-export interface ProviderResult {
+interface ProviderResult {
   readonly text: string
   readonly usage: LlmUsage
 }
