
Commit b0917cf

categories
1 parent 2ac68a5 commit b0917cf

10 files changed

Lines changed: 1283 additions & 324 deletions


packages/stats-db/README.md

Lines changed: 122 additions & 20 deletions
````diff
@@ -1,3 +1,28 @@
+# stats-db
+
+Database utilities for tracking npm package download statistics and GitHub repository metrics.
+
+## Table of Contents
+
+- [Database Schema Management](#database-schema-management)
+- [Prerequisites](#prerequisites)
+- [Start PostgreSQL (Docker)](#start-postgresql-docker)
+- [Set Environment Variables](#set-environment-variables)
+- [Bootstrap Database Users](#bootstrap-database-users)
+- [Deploy the Database Module](#deploy-the-database-module)
+- [Load the data from before!](#load-the-data-from-before)
+- [Running Commands](#running-commands)
+- [Command Options](#command-options)
+- [Understanding Fetch Modes](#understanding-fetch-modes)
+- [Managing Package Categories](#managing-package-categories)
+- [Category Configuration File](#category-configuration-file)
+- [Listing Uncategorized Packages](#listing-uncategorized-packages)
+- [Syncing Categories to Database](#syncing-categories-to-database)
+- [Workflow for Categorizing Packages](#workflow-for-categorizing-packages)
+- [Initial Setup Order](#initial-setup-order)
+
+---
+
 ## Database Schema Management
 
 ### Prerequisites
````
````diff
@@ -93,7 +118,7 @@ Now export `DATABASE_URL`:
 export DATABASE_URL=postgres://postgres:password@localhost:5432/stats_dev
 ```
 
-### Running Commands
+## Running Commands
 
 - **Fetch Packages**: Fetch package data from npm.
 
````
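The fetch command is tuned by `--concurrent` (default 50) and `--delay` (default 200 ms), as listed in the README's Command Options table. The commit does not show how the task combines them, so the following is only a sketch under an assumed worker-pool design: the hypothetical `runWithLimit` helper keeps at most `concurrent` tasks in flight and has each worker pause `delayMs` before claiming its next item.

```typescript
// Hypothetical sketch (not the task's code): bounded concurrency plus a
// per-worker pause between requests.
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function runWithLimit<T, R>(
  items: readonly T[],
  task: (item: T) => Promise<R>,
  concurrent = 50, // mirrors the documented --concurrent default
  delayMs = 200, // mirrors the documented --delay default
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item (safe: JS runs one callback at a time)

  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i]);
      // Pause before this worker claims another item, unless nothing is left.
      if (delayMs > 0 && next < items.length) await sleep(delayMs);
    }
  }

  // Start up to `concurrent` workers; each pulls items until the list is exhausted.
  const workerCount = Math.max(1, Math.min(concurrent, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Because every worker waits at least `delayMs` between its own requests, a 200 ms delay caps each worker at roughly 5 requests per second, or about 250 per second across 50 workers. Whether the real task spaces requests per worker or globally is not visible in this diff.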
````diff
@@ -125,29 +150,29 @@ export DATABASE_URL=postgres://postgres:password@localhost:5432/stats_dev
 # Or using short flag: -b
 ```
 
-#### Command Options
+### Command Options
 
-| Option | Short | Description |
-|--------|-------|-------------|
-| `--concurrent` | `-c` | Number of concurrent package downloads (default: 50) |
-| `--delay` | `-d` | Delay between requests in milliseconds (default: 200) |
-| `--chunk-size` | `-s` | Number of days per chunk (default: 30) |
-| `--backfill` | `-b` | Force scan ALL active packages for gaps |
+| Option | Short | Description |
+|--------|-------|-------------|
+| `--concurrent` | `-c` | Number of concurrent package downloads (default: 50) |
+| `--delay` | `-d` | Delay between requests in milliseconds (default: 200) |
+| `--chunk-size` | `-s` | Number of days per chunk (default: 30) |
+| `--backfill` | `-b` | Force scan ALL active packages for gaps |
 
-#### Understanding Fetch Modes
+### Understanding Fetch Modes
 
-**Normal mode** (default): Only processes packages where `last_fetched_date < TODAY`. This is efficient for daily updates but may miss gaps if a previous fetch was interrupted.
+**Normal mode** (default): Only processes packages where `last_fetched_date < TODAY`. This is efficient for daily updates but may miss gaps if a previous fetch was interrupted.
 
-**Backfill mode** (`--backfill`): Scans ALL active packages regardless of `last_fetched_date`. For each package, it:
-1. Retrieves all existing download dates from the database
-2. Compares against the expected date range (creation date → today)
-3. Identifies and fetches only the missing dates (gaps)
-4. Updates `last_fetched_date` after successful completion
+**Backfill mode** (`--backfill`): Scans ALL active packages regardless of `last_fetched_date`. For each package, it:
+1. Retrieves all existing download dates from the database
+2. Compares against the expected date range (creation date → today)
+3. Identifies and fetches only the missing dates (gaps)
+4. Updates `last_fetched_date` after successful completion
 
-Use backfill mode when:
-- You suspect there are gaps in historical data
-- A previous fetch was interrupted by rate limiting (429 errors)
-- You want to verify data completeness for all packages
+Use backfill mode when:
+- You suspect there are gaps in historical data
+- A previous fetch was interrupted by rate limiting (429 errors)
+- You want to verify data completeness for all packages
 
 
 - **Generate Report**: Generate a report based on the fetched data.
````
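The backfill steps above (compare stored dates against the creation-to-today range, then fetch only the gaps) can be sketched as two pure functions. This is a hypothetical illustration, not the task's implementation: `findMissingDates`, `toChunks`, and `DateRange` are invented names, dates are assumed to be UTC `YYYY-MM-DD` strings, and grouping gaps into contiguous ranges of at most `--chunk-size` days is an assumption about how that option is applied.

```typescript
// Hypothetical sketch of backfill gap detection. Dates are UTC "YYYY-MM-DD" strings.
const DAY_MS = 24 * 60 * 60 * 1000;

// Whole days since the Unix epoch (UTC midnight is always a multiple of DAY_MS).
function toDay(date: string): number {
  return Date.parse(`${date}T00:00:00Z`) / DAY_MS;
}

function fromDay(day: number): string {
  return new Date(day * DAY_MS).toISOString().slice(0, 10);
}

// Every date from `created` through `today` (inclusive) with no stored row in `existing`.
function findMissingDates(created: string, today: string, existing: Iterable<string>): string[] {
  const have = new Set(existing);
  const missing: string[] = [];
  for (let d = toDay(created); d <= toDay(today); d++) {
    const date = fromDay(d);
    if (!have.has(date)) missing.push(date);
  }
  return missing;
}

interface DateRange {
  start: string;
  end: string;
}

// Group ascending missing dates into contiguous ranges of at most `chunkSize` days
// (assumed use of --chunk-size), so each range can be requested in one call.
function toChunks(missing: readonly string[], chunkSize = 30): DateRange[] {
  const chunks: DateRange[] = [];
  let start: number | null = null;
  let prev = 0;
  for (const date of missing) {
    const d = toDay(date);
    const startsNewRange = start === null || d !== prev + 1 || d - start + 1 > chunkSize;
    if (startsNewRange) {
      if (start !== null) chunks.push({ start: fromDay(start), end: fromDay(prev) });
      start = d;
    }
    prev = d;
  }
  if (start !== null) chunks.push({ start: fromDay(start), end: fromDay(prev) });
  return chunks;
}
```

For example, a package created on 2024-01-01 with stored rows for 2024-01-02, 2024-01-03 and 2024-01-07, checked on 2024-01-10, yields three gap ranges: 01-01 alone, 01-04 to 01-06, and 01-08 to 01-10.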
````diff
@@ -174,7 +199,84 @@ export DATABASE_URL=postgres://postgres:password@localhost:5432/stats_dev
 pnpm db:dump
 ```
 
-### Initial Setup Order
+## Managing Package Categories
+
+Packages are organized into categories for reporting and statistics. The category definitions live in `src/config/categories.ts`.
+
+### Category Configuration File
+
+The `src/config/categories.ts` file contains:
+
+- **`packages`**: An object mapping category names to arrays of package names
+- **`blacklistConfig`**: Namespaces and packages to exclude from tracking
+
+```typescript
+// Example structure
+export const packages: Packages = {
+  "cosmos-kit": [
+    "cosmos-kit",
+    "@cosmos-kit/core",
+    "@cosmos-kit/react",
+    // ...
+  ],
+  telescope: [
+    "@cosmology/telescope",
+    "@osmonauts/telescope",
+    // ...
+  ],
+  // ... more categories
+};
+```
+
+### Listing Uncategorized Packages
+
+Packages that aren't assigned to a specific category end up in "misc". To see what needs categorization:
+
+```sh
+pnpm npm:categories:list-misc
+```
+
+This outputs:
+- A table of uncategorized packages sorted by download count
+- A copyable list format for adding to the config file
+
+### Syncing Categories to Database
+
+After editing `src/config/categories.ts`, sync the changes to the database:
+
+```sh
+pnpm npm:categories:sync
+```
+
+This will:
+1. Create any new categories that don't exist in the database
+2. Clear all existing package-category associations
+3. Re-apply categories based on the config file
+4. Assign any remaining packages to "misc"
+5. Apply the blacklist (deactivate blacklisted packages)
+
+### Workflow for Categorizing Packages
+
+1. **List uncategorized packages:**
+   ```sh
+   pnpm npm:categories:list-misc
+   ```
+
+2. **Edit the config file** (`src/config/categories.ts`):
+   - Add packages to existing categories, or
+   - Create new categories as needed
+
+3. **Sync to database:**
+   ```sh
+   pnpm npm:categories:sync
+   ```
+
+4. **Regenerate reports** to reflect the changes:
+   ```sh
+   pnpm npm:report && pnpm npm:readme
+   ```
+
+## Initial Setup Order
 
 To index from scratch, follow these steps in order:
 
````
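`pnpm npm:categories:list-misc` is described as printing the packages not assigned to any category, sorted by download count, plus a copyable list. A minimal sketch of that selection follows, assuming the `packages` object shape from the README example and a hypothetical list of tracked packages with download totals; `PackageDownloads`, `listMisc`, and `toCopyableList` are illustrative names, not the task's API.

```typescript
// Hypothetical shapes: `Packages` mirrors the README example (category -> package names);
// the download totals are assumed to come from the stats database.
type Packages = Record<string, string[]>;

interface PackageDownloads {
  name: string;
  downloads: number;
}

// Packages not listed under any category (the ones that end up in "misc"),
// highest download count first.
function listMisc(packages: Packages, tracked: readonly PackageDownloads[]): PackageDownloads[] {
  const categorized = new Set<string>();
  for (const names of Object.values(packages)) {
    for (const name of names) categorized.add(name);
  }
  return tracked
    .filter((pkg) => !categorized.has(pkg.name))
    .sort((a, b) => b.downloads - a.downloads);
}

// One quoted entry per line, ready to paste into a category array in categories.ts.
function toCopyableList(misc: readonly PackageDownloads[]): string {
  return misc.map((pkg) => `  "${pkg.name}",`).join("\n");
}
```

Building a `Set` of every categorized name first keeps the lookup constant-time per tracked package, which matters little at this scale but keeps the filter a single pass.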
packages/stats-db/package.json

Lines changed: 2 additions & 0 deletions
```diff
@@ -30,6 +30,8 @@
     "npm:report": "ts-node ./src/tasks/npm/npm.tasks.ts generate:report",
     "npm:badges": "ts-node ./src/tasks/npm/npm.tasks.ts generate:badges",
     "npm:readme": "ts-node ./src/tasks/npm/npm.tasks.ts generate:readme",
+    "npm:categories:list-misc": "ts-node ./src/tasks/npm/npm.tasks.ts categories:list-misc",
+    "npm:categories:sync": "ts-node ./src/tasks/npm/npm.tasks.ts categories:sync",
     "gh:fetch": "ts-node src/tasks/github/github.tasks.ts fetch",
     "gh:report": "ts-node src/tasks/github/github.tasks.ts report",
     "gh:export": "ts-node src/tasks/github/github.tasks.ts export",
```

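The new `npm:categories:sync` script runs the five steps listed in the README's "Syncing Categories to Database" section. The commit does not include the task's code or schema, so the in-memory model below is only a hypothetical sketch of that sequence: the `DbState` shape, the `BlacklistConfig` fields, the one-category-per-package rule, and matching blacklisted namespaces by `@scope/` prefix are all assumptions.

```typescript
// Hypothetical in-memory model of the sync; the real task writes to PostgreSQL.
type Packages = Record<string, string[]>;

interface BlacklistConfig {
  namespaces: string[]; // assumed: scopes such as "@example"
  packages: string[]; // assumed: exact package names
}

interface DbState {
  categories: Set<string>;
  packageCategory: Map<string, string>; // assumed: one category per package
  active: Map<string, boolean>; // every tracked package -> whether it is still fetched
}

function isBlacklisted(name: string, blacklist: BlacklistConfig): boolean {
  return (
    blacklist.packages.includes(name) ||
    blacklist.namespaces.some((ns) => name.startsWith(`${ns}/`))
  );
}

function syncCategories(db: DbState, packages: Packages, blacklist: BlacklistConfig): void {
  // 1. Create categories that don't exist yet ("misc" is ensured as the fallback).
  for (const category of [...Object.keys(packages), "misc"]) db.categories.add(category);

  // 2. Clear all existing package-category associations.
  db.packageCategory.clear();

  // 3. Re-apply categories from the config for packages the database tracks.
  //    With one category per package, a name listed twice keeps the later category.
  for (const [category, names] of Object.entries(packages)) {
    for (const name of names) {
      if (db.active.has(name)) db.packageCategory.set(name, category);
    }
  }

  // 4. Assign every remaining tracked package to "misc".
  for (const name of Array.from(db.active.keys())) {
    if (!db.packageCategory.has(name)) db.packageCategory.set(name, "misc");
  }

  // 5. Apply the blacklist by deactivating matching packages.
  for (const name of Array.from(db.active.keys())) {
    if (isBlacklisted(name, blacklist)) db.active.set(name, false);
  }
}
```

Clearing associations before re-applying them (step 2) is what lets a package removed from a category in the config file drop out of that category on the next sync, instead of leaving a stale association behind.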