Transform tars from a simple link manager into a personal search engine, powered by BM25 full-text search (the pg_textsearch extension) on TigerData.com's cloud PostgreSQL.
- PostgreSQL driver (`psycopg[binary]`)
- `tars db init` - creates schema with BM25 search index
- `tars db migrate` - imports CSV links to database
- `tars db status` - shows connection info and stats
- `tars search <query>` - BM25 full-text search (query sketch after this list)
- `tars add`/`list`/`remove` updated to use database when configured
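A minimal sketch of how `tars search` could issue that query with psycopg 3. The `@@@` match operator is a placeholder, since pg_textsearch's real query operator and score function should be taken from the TigerData docs; ordering by BM25 score is omitted for the same reason.

```python
# Sketch: `tars search` against the links table.
# Assumption: "@@@" stands in for pg_textsearch's actual BM25 match operator.
import os
import psycopg

def search(query: str, limit: int = 10) -> list[tuple]:
    # DATABASE_URL is passed explicitly; psycopg also honors PG* variables.
    with psycopg.connect(os.environ.get("DATABASE_URL", "")) as conn:
        return conn.execute(
            """
            SELECT url, title
            FROM links
            WHERE search_text @@@ %s  -- placeholder match operator
            LIMIT %s
            """,
            (query, limit),
        ).fetchall()
```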
- Playwright for headless browser crawling
- Content extraction (title, meta description, body text)
- `tars crawl <url>` - crawl specific URL (see the sketch after this list)
- `tars crawl --all` - re-crawl all links
- `tars crawl --missing` - crawl never-crawled links (default)
- `tars crawl --old N` - crawl stale links (N days old)
- Auto-update search index on crawl (via generated column)
- HTTP status and crawl error tracking
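A sketch of the Playwright crawl path for a single URL, returning the fields the schema expects; writing the row back (and the `crawled_at`/`crawl_error` bookkeeping) is left out.

```python
# Sketch: crawl one URL headlessly with Playwright's sync API.
from playwright.sync_api import sync_playwright

def crawl(url: str) -> dict:
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        response = page.goto(url, timeout=15_000)
        meta = page.locator('meta[name="description"]')
        result = {
            "url": url,
            "http_status": response.status if response else None,
            "title": page.title(),
            "description": meta.first.get_attribute("content") if meta.count() else None,
            "content": page.inner_text("body"),  # rendered, visible text only
        }
        browser.close()
        return result
```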
- `--tags` flag on `add` and `update` commands
- `--notes` flag on `add` and `update` commands
- `--tag` filter on `search` command
- Tag management: `tars tags list`, `tars tags rename`, etc. (listing sketch below)
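`tars tags list` can lean on Postgres array functions rather than tracking tags separately; a sketch:

```python
# Sketch: every tag with its usage count, via unnest() on the TEXT[] column.
import os
import psycopg

def list_tags() -> list[tuple[str, int]]:
    with psycopg.connect(os.environ["DATABASE_URL"]) as conn:
        return conn.execute(
            """
            SELECT tag, count(*) AS uses
            FROM links, unnest(tags) AS tag
            GROUP BY tag
            ORDER BY uses DESC, tag
            """
        ).fetchall()
```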
- Extended CSV with `sync_status` column
- `tars sync` command (bidirectional)
- `tars sync --push` / `--pull` (one-way)
- Conflict resolution (remote wins default)
- Graceful offline fallback
- User/auth model
- Tenant isolation
- API layer
- CLI tool with database-backed storage (TigerData PostgreSQL)
- Commands: `add`, `list`, `remove`, `update`, `clean-list`, `search`, `crawl`, `db`
- Data: URL, title, description, content, timestamps stored in PostgreSQL with BM25 index
- Local CSV still works as fallback when DATABASE_URL not set
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────────┐
│ tars CLI │────▶│ TigerData Cloud │────▶│ pg_textsearch │
│ │ │ (PostgreSQL) │ │ (full-text index) │
└─────────────────┘ └──────────────────┘ └─────────────────────┘
│
▼
┌─────────────────┐
│ links.csv │ (local offline/pending queue)
│ (local cache) │
└─────────────────┘
| Column | Type | Description | Status |
|---|---|---|---|
| id | UUID | Primary key | ✅ |
| url | TEXT | Unique URL | ✅ |
| title | TEXT | Page title (crawled) | ✅ |
| description | TEXT | Meta description (crawled) | ✅ |
| content | TEXT | Extracted page text (crawled) | ✅ |
| notes | TEXT | User-provided notes/description | schema ✅, CLI 🔲 |
| tags | TEXT[] | User-assigned tags for filtering | schema ✅, CLI 🔲 |
| added_at | TIMESTAMPTZ | When link was added | ✅ |
| updated_at | TIMESTAMPTZ | Last metadata update | ✅ |
| crawled_at | TIMESTAMPTZ | Last successful crawl (NULL if never) | ✅ |
| http_status | INTEGER | HTTP response code from crawl | ✅ |
| crawl_error | TEXT | Error message if crawl failed | ✅ |
| search_text | TEXT | Generated column for BM25 search | ✅ |
```sql
-- Generated column combining all searchable text; URL separators
-- ('.', '/', '-', '_', ':') are replaced with spaces so URLs tokenize into words
search_text TEXT GENERATED ALWAYS AS (
  REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
    COALESCE(url, '') || ' ' ||
    COALESCE(title, '') || ' ' ||
    COALESCE(description, '') || ' ' ||
    COALESCE(content, '') || ' ' ||
    COALESCE(notes, ''),
  '.', ' '), '/', ' '), '-', ' '), '_', ' '), ':', ' ')
) STORED

-- BM25 index for full-text search
CREATE INDEX links_search_bm25_idx ON links USING bm25 (search_text);
```

Note: Uses pg_textsearch BM25 instead of native TSVECTOR for better ranking.
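Put together, `tars db init` can run the whole DDL in one pass. A minimal sketch, assuming psycopg 3 and that the extension installs under the name `pg_textsearch` (multi-statement `execute()` works here because no parameters are bound):

```python
# Sketch: tars db init -- extension, table, generated column, BM25 index.
import os
import psycopg

DDL = """
CREATE EXTENSION IF NOT EXISTS pg_textsearch;

CREATE TABLE IF NOT EXISTS links (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    url         TEXT NOT NULL UNIQUE,
    title       TEXT,
    description TEXT,
    content     TEXT,
    notes       TEXT,
    tags        TEXT[] NOT NULL DEFAULT '{}',
    added_at    TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    crawled_at  TIMESTAMPTZ,
    http_status INTEGER,
    crawl_error TEXT,
    search_text TEXT GENERATED ALWAYS AS (
        REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(
            COALESCE(url, '') || ' ' || COALESCE(title, '') || ' ' ||
            COALESCE(description, '') || ' ' || COALESCE(content, '') || ' ' ||
            COALESCE(notes, ''),
        '.', ' '), '/', ' '), '-', ' '), '_', ' '), ':', ' ')
    ) STORED
);

CREATE INDEX IF NOT EXISTS links_search_bm25_idx
    ON links USING bm25 (search_text);
"""

def db_init() -> None:
    with psycopg.connect(os.environ["DATABASE_URL"]) as conn:
        conn.execute(DDL)
```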
# Core CRUD
tars add <url> # Add link (local + remote)
tars list # List all stored links
tars remove <id|url> # Remove by index or URL
tars update <url> # Update timestamp (CSV only)
tars clean-list # Remove duplicates (CSV only)
# Search
tars search <query> # BM25 full-text search
tars search <query> -n 20 # Limit results
# Crawling
tars crawl <url> # Crawl specific URL
tars crawl --all # Crawl all links
tars crawl --missing # Crawl never-crawled (default)
tars crawl --old N # Crawl stale links (N days)
# Database
tars db init # Initialize schema
tars db migrate # Import CSV to database
tars db status # Show connection info
# Phase 3: Metadata
tars add <url> --tags tag1,tag2 # Add with tags
tars add <url> --notes "..." # Add with notes
tars search <query> --tag <tag> # Search with tag filter
# Phase 4: Sync
tars sync # Bidirectional sync
tars sync --push # Push local to remote
tars sync --pull # Pull remote to local
DATABASE_URL=postgres://user:pass@host.tigerdata.com:5432/dbname
# Or individual variables:
PGHOST=host.tigerdata.com
PGPORT=5432
PGDATABASE=dbname
PGUSER=user
PGPASSWORD=pass
The links.csv file serves as:
- Offline storage when database is unavailable
- Pending queue for changes not yet synced
- Backup/export format
Extended CSV columns:
url,title,notes,tags,added_at,updated_at,sync_status
Where `sync_status` is one of: `synced`, `pending_add`, `pending_update`, `pending_delete`
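A sketch of the fallback plumbing: prefer `DATABASE_URL` (or the individual `PG*` variables), otherwise queue work in the CSV. The `~/.tars/links.csv` location and the helper names are illustrative, not decided.

```python
# Sketch: backend resolution and the local CSV pending queue.
import csv
import os
from pathlib import Path

CSV_PATH = Path.home() / ".tars" / "links.csv"  # illustrative location
CSV_FIELDS = ["url", "title", "notes", "tags", "added_at", "updated_at", "sync_status"]

def database_configured() -> bool:
    # DATABASE_URL wins; PGHOST implies the individual PG* variables are set.
    return bool(os.environ.get("DATABASE_URL") or os.environ.get("PGHOST"))

def csv_rows() -> list[dict]:
    if not CSV_PATH.exists():
        return []
    with CSV_PATH.open(newline="") as f:
        return list(csv.DictReader(f))

def append_pending(url: str) -> None:
    """Queue an offline add; `tars sync` picks it up via sync_status."""
    CSV_PATH.parent.mkdir(parents=True, exist_ok=True)
    new_file = not CSV_PATH.exists()
    with CSV_PATH.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=CSV_FIELDS, restval="")
        if new_file:
            writer.writeheader()
        writer.writerow({"url": url, "sync_status": "pending_add"})
```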
- Add `psycopg` or `asyncpg` dependency
- Implement `tars db init` - create schema with pg_textsearch
- Implement `tars db migrate` - migrate existing CSV links to database (sketched below)
- Implement `tars search <query>` - basic full-text search
- Update `tars add`/`list`/`remove` to use database
Deliverable: Working search on existing links via TigerData
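One possible shape for `tars db migrate`, keyed on the unique `url` column so re-running it is safe:

```python
# Sketch: tars db migrate -- import CSV rows, skipping URLs already present.
import csv
import os
import psycopg

def db_migrate(csv_path: str = "links.csv") -> int:
    imported = 0
    with psycopg.connect(os.environ["DATABASE_URL"]) as conn, \
            open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            cur = conn.execute(
                """
                INSERT INTO links (url, title, added_at)
                VALUES (%s, %s, COALESCE(%s::timestamptz, now()))
                ON CONFLICT (url) DO NOTHING
                """,
                (row["url"], row.get("title") or None, row.get("added_at") or None),
            )
            imported += cur.rowcount  # 0 when the URL already existed
    return imported
```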
- Add HTTP client dependency (`httpx` or `requests`)
- Add HTML parser (`beautifulsoup4` or `selectolax`)
- Implement content extraction (title, meta, body text) - see the sketch below
- Implement `tars crawl` commands
- Auto-update `search_text` on crawl (the generated column handles this)
Deliverable: Links are crawlable and searchable by content
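The extraction step with the `httpx` + `selectolax` pairing from the checklist (Playwright remains the option for JavaScript-heavy pages); a sketch:

```python
# Sketch: plain-HTTP extraction with httpx + selectolax.
import httpx
from selectolax.parser import HTMLParser

def extract(url: str) -> dict:
    resp = httpx.get(url, follow_redirects=True, timeout=10.0)
    tree = HTMLParser(resp.text)
    title = tree.css_first("title")
    meta = tree.css_first('meta[name="description"]')
    return {
        "url": str(resp.url),  # final URL after redirects
        "http_status": resp.status_code,
        "title": title.text(strip=True) if title else None,
        "description": meta.attributes.get("content") if meta else None,
        "content": _body_text(tree),
    }

def _body_text(tree: HTMLParser) -> str:
    # Drop non-content nodes, then flatten the body to whitespace-joined text.
    for tag in ("script", "style", "noscript"):
        for node in tree.css(tag):
            node.decompose()
    body = tree.body
    return body.text(separator=" ", strip=True) if body else ""
```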
- Add tags and notes support to CLI
- Update search to filter by tags
- Implement tag management commands (`tars tags list`, etc.)
Deliverable: Rich metadata and filtered search
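Tag filtering composes with the Phase 1 search sketch as a single extra predicate; `@@@` is again a placeholder for the extension's real match operator:

```python
# Sketch: tag-filtered search over the TEXT[] tags column.
import os
import psycopg

def search_with_tag(query: str, tag: str, limit: int = 10) -> list[tuple]:
    with psycopg.connect(os.environ["DATABASE_URL"]) as conn:
        return conn.execute(
            """
            SELECT url, title
            FROM links
            WHERE search_text @@@ %s  -- placeholder match operator
              AND %s = ANY(tags)
            LIMIT %s
            """,
            (query, tag, limit),
        ).fetchall()
```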
- Extend CSV format with sync_status
- Implement `tars sync` workflow (conflict sketch below)
- Handle conflict resolution (remote wins by default)
- Graceful offline fallback
Deliverable: Works offline, syncs when connected
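A sketch of the remote-wins merge at the heart of `tars sync`. `pending_update` handling is omitted for brevity, and `conn` plus the row helpers are the hypothetical ones from the earlier sketches:

```python
# Sketch: bidirectional sync; on conflict the remote row is kept.
def sync(conn, local_rows: list[dict]) -> list[dict]:
    merged = []
    for row in local_rows:
        status = row.get("sync_status", "synced")
        if status == "pending_add":
            # Remote wins: if the URL already exists there, keep the remote row.
            conn.execute(
                "INSERT INTO links (url, notes) VALUES (%s, %s) "
                "ON CONFLICT (url) DO NOTHING",
                (row["url"], row.get("notes") or None),
            )
        elif status == "pending_delete":
            conn.execute("DELETE FROM links WHERE url = %s", (row["url"],))
            continue  # drop the tombstone locally as well
        row["sync_status"] = "synced"
        merged.append(row)
    # Pull: adopt remote rows the local CSV has never seen.
    known = {r["url"] for r in merged}
    for url, title in conn.execute("SELECT url, title FROM links"):
        if url not in known:
            merged.append({"url": url, "title": title, "sync_status": "synced"})
    return merged  # caller rewrites links.csv from this
```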
- Add user/auth model
- Tenant isolation in database
- API layer for non-CLI clients
[project]
dependencies = [
"rich>=13.0.0",
"psycopg[binary]>=3.0", # PostgreSQL driver
"httpx>=0.27", # HTTP client for crawling
"selectolux>=0.3", # Fast HTML parser
# or "beautifulsoup4>=4.12" # Alternative parser
]- Ensure
- Run `tars db init` to create schema
- Run `tars db migrate` to import existing links
# Basic search
tars search python tutorial
# → Matches links containing "python" and "tutorial"
# Phrase search
tars search "machine learning"
# → Matches exact phrase
# Boolean operators (pg_textsearch native)
tars search "python | rust"
# → Matches either term
# Negation
tars search "python -django"
# → Python but not Django
# With tag filter
tars search react --tag frontend
# → Search "react" only in links tagged "frontend"
- Search returns relevant results in <100ms
- Crawling extracts meaningful content from 90%+ of URLs
- Sync handles offline/online transitions gracefully
- CLI remains fast and responsive
- Rate limiting for crawling? (respect robots.txt?)
- Content size limits? (truncate very long pages?)
- Handling non-HTML content (PDFs, images)?
- Search result snippet generation?