Azerbaijan IT job market intelligence platform. Aggregates vacancies from JobSearch.az and Glorri.com, deduplicates them, extracts tech stacks, and serves a web UI + Telegram bot.
/
├── backend/ Python scraper + scheduler + Telegram bot
├── frontend/ Next.js 15 web app (App Router)
├── database/ Supabase SQL schema
└── AGENTS.md
Entry point: main.py — starts interval scraper, daily Telegram digest scheduler, and Telegram bot polling in background threads.
Deployment: nixpacks (nixpacks.toml) → python3 main.py. Targets Railway/Render.
| Path | Role |
|---|---|
app/config.py |
Reads all env vars into a Settings dataclass. Fails fast if FRONTEND_BASE_URL is missing. |
app/controllers/scrape_controller.py |
ScrapeController.run_cycle() — orchestrates one full scrape: JobSearch → Glorri → deduplication. |
app/services/jobsearch_service.py |
Scrapes jobsearch.az (category 1076 = IT). 4 workers for listing, 40 for details. Strips dangerous HTML. |
app/services/glorri_service.py |
Scrapes jobs.glorri.com. Rate-limit: 429 → retry 6× with 30 s wait. 16 concurrent detail workers. |
app/services/duplicate_detection_service.py |
ML-style similarity scoring. Final threshold: ≥ 0.72. Writes matches to duplicate_jobs table. |
app/services/technology_service.py |
Regex word-boundary matching against 90+ tech keywords. See app/models/technologies.py for the dictionary. |
app/services/telegram_service.py |
Sends N latest jobs digest to a Telegram channel. |
app/services/telegram_bot_service.py |
Interactive user bot (multilingual: AZ/EN/RU). Users pick tech preferences; daily digest filtered by their stack. |
app/repositories/supabase_repository.py |
All Supabase REST calls. Batch inserts chunked in groups of 80–500 to avoid URL-length limits. Uses SUPABASE_SERVICE_KEY. |
app/scheduler/interval_scheduler.py |
Calls task() every N seconds (default 600). |
app/scheduler/daily_scheduler.py |
Calls task() once per day at a configured time with timezone awareness. |
app/models/technologies.py |
TECHNOLOGIES dict mapping canonical name → keyword variants. Edit here to add/remove tracked techs. |
| Variable | Required | Default | Purpose |
|---|---|---|---|
SUPABASE_URL |
✅ | — | Supabase project URL |
SUPABASE_SERVICE_KEY |
✅ | — | Service-role key (write access) |
FRONTEND_BASE_URL |
✅ | — | Used in vacancy links (e.g. Telegram messages) |
SCRAPE_TIME_INTERVAL |
600 | Seconds between scrape cycles | |
TELEGRAM_BOT_TOKEN |
— | Bot API token | |
TELEGRAM_CHANNEL_ID |
— | Channel for digest | |
TELEGRAM_THREAD_ID |
None | Message thread (optional) | |
TELEGRAM_BOT_USERNAME |
— | @botname |
|
TELEGRAM_DIGEST_TIME |
09:00 | HH:MM daily digest time | |
TELEGRAM_TIMEZONE |
Asia/Baku | Timezone for scheduling | |
TELEGRAM_DIGEST_LIMIT |
5 | Max jobs per channel digest |
Framework: Next.js 15, App Router, TypeScript, React 19, Tailwind CSS v4.
Fonts: Manrope (primary), IBM Plex Mono (monospace).
Image domains allowed (next.config.ts): jobsearch.az, jobs.glorri.com, glorri.s3.amazonaws.com.
| Route | File | Notes |
|---|---|---|
/ |
app/page.tsx → LandingPage.tsx |
Server fetches data, passes to client component |
/auth |
app/auth/page.tsx |
Redirects to /auth/login |
/auth/login |
app/auth/login/page.tsx |
Email + password, uses loginAction |
/auth/register |
app/auth/register/page.tsx |
Collects first_name, last_name, username, email, password |
/profile |
app/profile/page.tsx |
Protected. Shows ProfileForm. |
/admin |
app/admin/page.tsx |
Protected (is_admin flag). Analytics dashboard. |
/admin/export |
app/admin/export/page.tsx |
CSV export preview |
/admin/export/download |
app/admin/export/download/route.ts |
Streams CSV download |
/admin/user |
app/admin/user/page.tsx |
User management |
/vacancies/[source]/[slug] |
app/vacancies/[source]/[slug]/page.tsx |
Vacancy detail. Increments visit counter. |
GET /track/click |
app/track/click/route.ts |
Logs click, redirects to external URL |
| File | Purpose |
|---|---|
lib/landing-data.ts |
Server: fetches js_vacancies + glorri_vacancies + duplicate_jobs, merges, returns LandingData |
lib/vacancy-detail-data.ts |
Server: fetches single vacancy by source + slug |
lib/vacancy-visit-tracker.ts |
Server: calls Supabase RPC increment_vacancy_visit(). Deduplicates per IP per day. |
lib/site-config.ts |
Single source for branding (SITE_NAME, SITE_URL, CREATOR_NAME, etc.) from env |
lib/supabase/server.ts |
Cookie-based server client (SSR) |
lib/supabase/client.ts |
Browser client (public anon key) |
lib/supabase/admin.ts |
Admin client with service key (used for RPC, privileged writes) |
lib/admin/analytics.ts |
Builds AdminAnalyticsData from vacancy_visits + vacancy_clicks |
lib/admin/access.ts |
getCurrentUserAccess() → checks profiles.is_admin |
lib/admin/jobs-export.ts |
CSV generation |
lib/admin/user-management.ts |
User CRUD |
| Component | Purpose |
|---|---|
components/landing/FiltersSidebar.tsx |
Tech/company/salary/source filters. Mobile-responsive sidebar. |
components/landing/JobsSection.tsx |
Job cards grid, pagination, sort, page-size selector |
components/landing/LandingTopBar.tsx |
Logo, theme toggle, auth links |
components/landing/Chart.tsx |
SVG line chart for admin trend data |
components/landing/SourceBreakdown.tsx |
JobSearch / Glorri / overlap stats |
components/landing/StatsCards.tsx |
Key metric cards |
components/admin/AdminSectionNav.tsx |
Admin section navigation |
components/admin/CSVPreviewTable.tsx |
Admin export table preview |
components/shared/SiteHeader.tsx |
App-wide header with auth state + theme toggle |
components/shared/Footer.tsx |
Links to repo, license, creator |
components/shared/CustomDatePicker.tsx |
Date input for analytics filters |
| Variable | Purpose |
|---|---|
NEXT_PUBLIC_SUPABASE_URL |
Supabase project URL (browser-safe) |
NEXT_PUBLIC_SUPABASE_ANON_KEY |
Anon key (browser-safe) |
SUPABASE_SERVICE_ROLE_KEY |
Service key (server-only, never expose to client) |
NEXT_PUBLIC_SITE_NAME |
Branding (e.g. "Vakanso") |
NEXT_PUBLIC_SITE_DESCRIPTION |
Meta description |
NEXT_PUBLIC_SITE_URL |
Canonical URL |
Managed via Supabase. Fresh setup uses one SQL file:
| File | Creates |
|---|---|
001_initial_schema.sql |
Full schema: scraper tables, profiles, favorites, notifications, translations, proxies, app settings, Telegram user settings, RLS policies, indexes, and RPC functions |
| Table | Key columns |
|---|---|
js_vacancies |
id, title, slug, company_id, salary, deadline_at, text, tech_stack (JSONB) |
js_companies |
id, title, logo, logo_mini |
glorri_vacancies |
id, title, slug, postedDate, text, vacancy_about (JSONB), benefits (JSONB), tech_stack (JSONB), company_id |
glorri_companies |
id, slug, name, logo |
duplicate_jobs |
glorri_id (PK), jobsearch_id, score, matched_at |
profiles |
id, username, first_name, last_name, is_admin, tech_stack |
user_favorite_vacancies |
user_id, source, vacancy_id |
user_favorite_companies |
user_id, source, company_id |
notifications |
user_id, type, title, metadata, is_read |
vacancy_translations |
source, vacancy_id, lang, title, text |
company_translations |
source, company_id, lang, name, description |
proxies |
url, is_active, last_used_at, fail_count |
app_settings |
key, value |
telegram_user_settings |
chat_id, language_code, selected_technologies, onboarding_step, active |
RLS is table-specific: public vacancy/source tables are public read, user-owned tables restrict access to the authenticated owner, and internal/admin tables are service-role only.
vacancy_visits: MD5 hash (via Supabase RPC)vacancy_clicks: SHA-256 hash (via Next.js route handler)
- This is a monorepo. Backend and frontend are completely independent — do not mix their dependencies.
- Never expose
SUPABASE_SERVICE_ROLE_KEYorSUPABASE_SERVICE_KEYin client-side code orNEXT_PUBLIC_*env vars. - Branding name is Vakanso; the repo/platform name is JobScan.
- All Supabase access goes through
SupabaseRepository. Do not call the Supabase REST API directly elsewhere. - When adding a new technology to track, add it only to
app/models/technologies.py→TECHNOLOGIESdict. - Scraper services must handle pagination fully — never assume a single page.
- Rate-limit handling pattern: catch HTTP 429, sleep 30 s, retry up to 6 times.
- HTML sanitization is required before storing job text (allowlist of safe tags only).
- Use
lib/supabase/server.tsfor any server component or server action Supabase access. - Use
lib/supabase/admin.tsonly for privileged operations (RPC calls, service-role writes). - Use
lib/supabase/client.tsonly in client components. - Admin-only pages must call
getCurrentUserAccess()andredirect('/')if not admin. - Vacancy clicks must go through
/track/click?target=...— never link directly to external vacancy URLs. - Site-wide branding must come from
lib/site-config.ts, not hardcoded. - Theme (dark/light) is stored in
localStorageand applied via inline script inlayout.tsxto prevent FOUT. - Do not add new image domains to
next.config.tswithout also handling potential misuse.
- Homepage data path:
lib/landing-data.ts(server) →app/page.tsx(server component) →LandingPage.tsx(client component). - Vacancy detail path:
lib/vacancy-detail-data.ts→app/vacancies/[source]/[slug]/page.tsx. - Analytics path:
lib/admin/analytics.ts→app/admin/page.tsx.
cd backend
pip install -r requirements.txt
python3 main.py
# Optional flags:
# --send-telegram-digest-now Send channel digest immediately and exit
# --preview-telegram-digest Print digest without sendingcd frontend
npm install
npm run dev # Development server
npm run build # Production build
npm run start # Start production servercd frontend
npm run lint