Add robust CSV ingestion and full-text search support for prospects.
- Add app/api/prospects/process.py: new /prospects/process endpoint to batch-process big.csv and insert rows with a computed search_vector (to_tsvector) for full-text search.
- Update seed logic (app/api/prospects/seed.py): remove secondary email columns, create search_vector tsvector column, add GIN index, and insert rows with to_tsvector('english', ...) populated from concatenated text fields.
- Limit prospects listing (app/api/prospects/prospects.py) to 200 rows and update response meta to note the limit.
- Register the new process router in app/api/routes.py.
- Update README.md with documentation on the tsvector column, GIN index, the /prospects/process endpoint, and the recommended ingestion workflow.
These changes enable fast, scalable full-text search across all text fields and provide a dedicated endpoint for processing large CSV datasets using batch inserts.
The prospects table includes a `search_vector` column (type: tsvector) that is automatically computed from all text fields on insert. A GIN index is created for this column, enabling fast and scalable full-text search queries.

**How it works:**
- On every insert (via `/prospects/seed` or `/prospects/process`), the `search_vector` is computed from all text columns using PostgreSQL's `to_tsvector('english', ...)`.
- The GIN index (`idx_prospects_search_vector`) allows efficient search queries like:

```sql
SELECT * FROM prospects WHERE search_vector @@ plainto_tsquery('english', 'search terms');
```
Because these queries can use the GIN index instead of scanning every row, searching across all text fields in the prospects table stays fast even for large datasets.
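
The column, index, and insert described above could be expressed roughly as follows. This is a minimal sketch, not the code in `seed.py`: the column names in `TEXT_COLUMNS` and both helper functions are hypothetical, and the `%s` placeholders assume a psycopg-style driver.

```python
# Hypothetical column names; the real prospects schema is not shown here.
TEXT_COLUMNS = ("name", "company", "title", "email")

# DDL matching the description above: a tsvector column plus a GIN index.
CREATE_SEARCH_COLUMN = (
    "ALTER TABLE prospects ADD COLUMN IF NOT EXISTS search_vector tsvector"
)
CREATE_SEARCH_INDEX = (
    "CREATE INDEX IF NOT EXISTS idx_prospects_search_vector "
    "ON prospects USING GIN (search_vector)"
)


def build_search_text(row):
    """Join the non-empty text fields of a row into one string."""
    parts = (str(row.get(col) or "").strip() for col in TEXT_COLUMNS)
    return " ".join(p for p in parts if p)


def build_insert(row):
    """Return (sql, params) for a parameterized insert with %s placeholders."""
    columns = ", ".join(TEXT_COLUMNS)
    placeholders = ", ".join(["%s"] * len(TEXT_COLUMNS))
    sql = (
        f"INSERT INTO prospects ({columns}, search_vector) "
        f"VALUES ({placeholders}, to_tsvector('english', %s))"
    )
    # The last parameter is the concatenated text passed to to_tsvector.
    params = [row.get(col) for col in TEXT_COLUMNS] + [build_search_text(row)]
    return sql, params
```

Passing the concatenated text as a bound parameter, rather than formatting it into the SQL string, keeps the statement safe for arbitrary CSV content.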
-**FastAPI** — RESTful API framework
-**Uvicorn** — ASGI server
-**Pytest** — testing framework
@@ -60,5 +73,24 @@ requirements.txt
| GET |`/`| Welcome message |
| GET |`/health`| Health check — returns `ok`|
| POST |`/echo`| Echoes the JSON `message` field |
| GET |`/prospects/seed`| (Re)create prospects table and seed with sample data |
| DELETE |`/prospects/empty`| (Legacy) Empties the prospects table |
| GET |`/prospects/process`| Process and insert all records from big.csv into prospects table |

### Processing Large CSV Files

The `/prospects/process` endpoint is designed for robust, scalable ingestion of large CSV files (e.g., 1300+ rows, 300KB+). It follows the same normalization and insertion pattern as `/prospects/seed`, but uses batch inserts to handle large files.
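
One way to batch a CSV like this is sketched below. It is an illustration under stated assumptions, not the contents of `process.py`: the function names, the `insert_batch` callback, and the default batch size of 500 are all hypothetical.

```python
import csv
import io
from itertools import islice


def iter_batches(csv_file, batch_size=500):
    """Yield lists of row dicts, at most batch_size rows each."""
    reader = csv.DictReader(csv_file)
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch


def process_csv(csv_file, insert_batch, batch_size=500):
    """Feed each batch to insert_batch and return the total row count."""
    inserted = 0
    for batch in iter_batches(csv_file, batch_size):
        insert_batch(batch)  # e.g. one multi-row insert per batch
        inserted += len(batch)
    return inserted


# Usage with an in-memory CSV and a list standing in for the database call.
sample = io.StringIO("name,company\nAda,Engines\nAlan,Bletchley\nGrace,Navy\n")
seen = []
total = process_csv(sample, seen.append, batch_size=2)
print(total, [len(b) for b in seen])  # prints: 3 [2, 1]
```

Reading through `islice` means only one batch of rows is held in memory at a time, regardless of file size.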
#### Example usage

1. Seed the table structure:
   - `GET /prospects/seed`
2. (Optional) Empty the table:
   - `DELETE /prospects/empty`
3. Process the large CSV:
   - `GET /prospects/process`

The endpoint will return the number of records inserted. This is the core ingestion workflow for production-scale data.
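
The three steps above could be scripted with the standard library as follows. This is a sketch only: `BASE_URL` assumes the app is served locally on Uvicorn's default port, and `build_requests` and `run_workflow` are hypothetical helper names.

```python
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: local Uvicorn default address

# Method and path for each step, taken from the list above.
WORKFLOW = [
    ("GET", "/prospects/seed"),
    ("DELETE", "/prospects/empty"),  # optional step
    ("GET", "/prospects/process"),
]


def build_requests(base_url=BASE_URL, include_empty=True):
    """Build one Request per workflow step without sending anything."""
    steps = [
        step for step in WORKFLOW
        if include_empty or step[1] != "/prospects/empty"
    ]
    return [
        urllib.request.Request(base_url + path, method=method)
        for method, path in steps
    ]


def run_workflow(base_url=BASE_URL, include_empty=True):
    """Send each request in order and return the decoded response bodies."""
    bodies = []
    for request in build_requests(base_url, include_empty):
        with urllib.request.urlopen(request) as response:
            bodies.append(response.read().decode("utf-8"))
    return bodies
```

Calling `run_workflow()` against a running server returns the three response bodies in order; the last one is the response from `/prospects/process`.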