Scraper #3

Merged
Aayush-Agnihotri merged 15 commits into main from tony/scraper
Nov 20, 2025

Conversation

Contributor

@akmatchev commented Nov 13, 2025

  • Scraper works
  • Test scripts for running scraper
  • Test endpoint for executing the cron job
  • Script for migrating images from eatery to docker
  • env variables

Comment thread .env.example Outdated
SPACES_ACCESS_KEY=
SPACES_SECRET_KEY=
SPACES_REGION=
SPACES_PUBLIC_URL=
Member

Are these still needed now that the image migration is done? If not, leaving them in place could give an attacker an easy next step if our server is compromised.

Comment thread package.json Outdated
"scrape": "tsx prisma/scraper.ts",
"test:scraper": "TEST_MODE=true tsx prisma/scraper.ts",
"migrate:static-images": "tsx prisma/migrateStaticImages.ts",
"migrate:static-images:dry-run": "DRY_RUN=true tsx prisma/migrateStaticImages.ts",
Member

Same question as the comment above for these migrate commands: are they still needed now that the image migration is done?

Comment thread package.json Outdated
"format": "prettier --write '{src,test}/**/*.ts'",
- "seed": "tsx prisma/seed.ts"
+ "scrape": "tsx prisma/scraper.ts",
+ "test:scraper": "TEST_MODE=true tsx prisma/scraper.ts",
Member

Are we ever going to run the test to verify anything before running the actual scraper?

Comment thread package.json Outdated
},
"homepage": "https://github.com/cuappdev/eatery-blue-backend#readme",
"dependencies": {
"@aws-sdk/client-s3": "^3.930.0",
Member

If we no longer need the image migration script, we should uninstall this dependency.

Comment thread prisma/mappers.ts
case 'General':
return EventType.GENERAL;
case 'Free Food':
return EventType.GENERAL;
Member

I don't think the "General" or "Free Food" event types ever show up in the dining API. Instead of having them in the mapping, there should be logic to manually add these types to the events that need them (confirm with the rest of the team when these should be used).

Contributor Author

They show up in the static eateries.
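
One way to make the origin of each label explicit is to keep separate lookup tables per data source. The following is only a sketch under assumptions: the PR shows just EventType.GENERAL and the 'General' and 'Free Food' cases, so the other enum members, the dining API labels, and the names DINING_API_EVENT_TYPES, STATIC_EATERY_EVENT_TYPES, and mapEventType are hypothetical.

```typescript
// Hypothetical stand-in for the generated EventType; only GENERAL appears in the PR.
const EventType = {
  BREAKFAST: 'BREAKFAST',
  LUNCH: 'LUNCH',
  GENERAL: 'GENERAL',
} as const;
type EventType = (typeof EventType)[keyof typeof EventType];

type EventSource = 'diningApi' | 'static';

// Assumed dining API labels, for illustration only.
const DINING_API_EVENT_TYPES = new Map<string, EventType>([
  ['Breakfast', EventType.BREAKFAST],
  ['Lunch', EventType.LUNCH],
]);

// Labels that, per the thread, come from the static eateries.
const STATIC_EATERY_EVENT_TYPES = new Map<string, EventType>([
  ['General', EventType.GENERAL],
  ['Free Food', EventType.GENERAL],
]);

function mapEventType(label: string, source: EventSource): EventType {
  const table =
    source === 'static' ? STATIC_EATERY_EVENT_TYPES : DINING_API_EVENT_TYPES;
  const mapped = table.get(label);
  if (mapped === undefined) {
    // Fail loudly so an unexpected label is noticed instead of silently mis-mapped.
    throw new Error(`Unknown ${source} event type: ${label}`);
  }
  return mapped;
}
```

With this split, a 'General' label arriving from the dining API would raise an error rather than pass through unnoticed, which would also settle the question of where these labels actually occur.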

Comment thread prisma/scraperTypes.ts Outdated
}>;
}>;
}>;
payMethods: Array<{
Member

This is the same as RawPayMethod[], so use that instead.
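
As a rough illustration of the suggestion, the inline Array<{...}> can be swapped for the existing alias. The field names on RawPayMethod below (descr, descrshort) are assumptions, as are RawStaticEatery and payMethodNames; the real definitions in the PR may differ.

```typescript
// Assumed shape; the actual RawPayMethod in the PR may have different fields.
interface RawPayMethod {
  descr: string;
  descrshort: string;
}

// Hypothetical static eatery type that reuses the alias instead of
// repeating the object shape inline.
interface RawStaticEatery {
  nameshort: string;
  payMethods: RawPayMethod[];
}

// Any helper written against RawPayMethod now works for both data sources.
function payMethodNames(eatery: RawStaticEatery): string[] {
  return eatery.payMethods.map((method) => method.descrshort);
}
```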

Comment thread prisma/scraperTypes.ts Outdated
nameshort: string;
about: string;
aboutshort?: string;
cornellDining?: boolean;
Member

Why are some of the fields optional (marked with a question mark)?
We control all the information about the static eateries, so nothing should be optional; the data should be consistent across all of them.
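
If the fields become required, a small check at load time could catch incomplete static entries early. This is a sketch, assuming only the four fields visible in the snippet above; REQUIRED_KEYS and missingFields are hypothetical names, not part of the PR.

```typescript
// Only the fields visible in the diff snippet; the real type has more.
interface RawStaticEatery {
  nameshort: string;
  about: string;
  aboutshort: string;
  cornellDining: boolean;
}

// Typed against the interface so a misspelled key is a compile-time error.
const REQUIRED_KEYS: ReadonlyArray<keyof RawStaticEatery> = [
  'nameshort',
  'about',
  'aboutshort',
  'cornellDining',
];

// Returns the names of required fields that a static entry leaves undefined.
function missingFields(entry: Partial<RawStaticEatery>): string[] {
  return REQUIRED_KEYS.filter((key) => entry[key] === undefined);
}
```

Running a check like this over the static eatery data before scraping would turn an inconsistent entry into an explicit error rather than a silently missing field.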

Comment thread prisma/scraperTypes.ts Outdated
}>;
announcements?: string[];
diningItems?: Array<{
item: string;
Member

We can reuse types here: the closer this is to the dining API schema, the less specialized code you have to write and the more you can reuse.

Comment thread src/scheduler/scraperScheduler.ts Outdated
@@ -0,0 +1,51 @@
import cron from 'node-cron';
Member

The scraper will run in a separate container from the main app, so the scheduler should live in the scraper itself, not in src/...
As of now, there is no way to run the scraper on a schedule (it can only be run once, using npm run scrape).
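
For illustration, here is a minimal sketch of a scheduler that could sit inside the scraper's own entry point. It deliberately uses a plain setInterval rather than node-cron so that it has no dependencies, and it skips a tick while the previous run is still in progress. The names scheduleScraper, Job, and ScheduledScraper are hypothetical, and the interval is an assumption (the PR does not state the schedule).

```typescript
// Hypothetical types; none of these names come from the PR.
type Job = () => Promise<void>;

interface ScheduledScraper {
  tick: () => boolean; // true if a run was started, false if it was skipped
  start: () => void;
  stop: () => void;
}

function scheduleScraper(job: Job, intervalMs: number): ScheduledScraper {
  let running = false;
  let timer: ReturnType<typeof setInterval> | undefined;

  const tick = (): boolean => {
    if (running) return false; // previous scrape has not finished yet
    running = true;
    Promise.resolve()
      .then(job)
      .catch((err: unknown) => console.error('Scrape failed:', err))
      .finally(() => {
        running = false; // allow the next tick to start a run
      });
    return true;
  };

  return {
    tick,
    start: () => {
      if (timer === undefined) timer = setInterval(tick, intervalMs);
    },
    stop: () => {
      if (timer !== undefined) {
        clearInterval(timer);
        timer = undefined;
      }
    },
  };
}
```

The scraper container's entry point could then call something like scheduleScraper(runScrape, intervalMs).start(), where runScrape stands in for whatever function prisma/scraper.ts exposes. The same overlap guard carries over if node-cron is kept for cron-style expressions.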

Comment thread tsconfig.json
"sourceMap": true
},
- "include": ["src/**/*"],
+ "include": ["src/**/*", "prisma/**/*"],
Member

Nice find. It was really annoying to deal with all the error messages when working in the prisma directory.

Contributor

@skyeslattery left a comment

LGTM

Comment thread prisma/seed.ts
Contributor

Why did we delete this? I used it and added to it while testing notifications.

Comment thread prisma/schema.prisma
name String
shortName String
about String
shortAbout String
Contributor

Should we add more default values here as a backup, or make more fields nullable, in case the scraper fails?
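
One possible alternative to loosening the schema is to fill fallbacks in the scraper before writing a row. The sketch below assumes only the four string columns shown in the snippet; ScrapedEatery, EateryRow, and toEateryRow are hypothetical names, and the fallback choices (short name falls back to name, missing text falls back to an empty string) are illustrative assumptions rather than agreed behavior.

```typescript
// What the scraper might produce when some fields fail to parse (assumed shape).
interface ScrapedEatery {
  name?: string | null;
  shortName?: string | null;
  about?: string | null;
  shortAbout?: string | null;
}

// Mirrors the four non-nullable String columns shown in the schema snippet.
interface EateryRow {
  name: string;
  shortName: string;
  about: string;
  shortAbout: string;
}

// Returns null when the eatery cannot be identified; otherwise fills fallbacks.
function toEateryRow(raw: ScrapedEatery): EateryRow | null {
  const name = raw.name?.trim();
  if (!name) return null; // nothing sensible to store without a name
  const about = raw.about?.trim() ?? '';
  return {
    name,
    shortName: raw.shortName?.trim() || name,
    about,
    shortAbout: raw.shortAbout?.trim() || about,
  };
}
```

Keeping the columns required and handling gaps in code like this means a partial scrape still produces valid rows, while a row with no name is dropped instead of stored half-empty.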

@Aayush-Agnihotri merged commit c3f0bb0 into main Nov 20, 2025
2 checks passed
@Aayush-Agnihotri deleted the tony/scraper branch November 20, 2025 03:08

3 participants