A decade of congressional tweets analyzed to build a data-driven lobbying targeting system.
1,243,370 tweets · 548 members · 2008–2017
This capstone project analyzes a decade of congressional Twitter activity to build a data-driven lobbying targeting system. The raw tweets carry no political metadata: party, chamber, and state are all missing. We recovered these fields through a three-step enrichment join against the @unitedstates legislators database, which restored metadata for 74.8% of members.
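The three steps of the join are not spelled out here, so the following is only a minimal sketch of how a staged enrichment join could look in SQLite. The table names (`accounts`, `legislators`), the column names, the sample rows, and the choice of match keys (handle, then Twitter ID, then normalized name) are all assumptions for illustration, not the project's actual schema or method.

```python
import sqlite3

# In-memory database with hypothetical tables and made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (user_id TEXT, screen_name TEXT, name TEXT);
CREATE TABLE legislators (twitter_id TEXT, twitter_handle TEXT, full_name TEXT,
                          party TEXT, chamber TEXT, state TEXT);
INSERT INTO accounts VALUES
  ('101', 'SenExample',  'Alex Example'),
  ('102', 'RepRenamed',  'Blair Sample'),
  ('103', 'jordan_demo', 'Jordan Demo'),
  ('104', 'UnknownAcct', 'No Match');
INSERT INTO legislators VALUES
  ('101', 'senexample', 'Alex Example', 'D', 'sen', 'CA'),
  ('102', 'RepSample',  'Blair Sample', 'R', 'rep', 'TX'),
  (NULL,  NULL,         'jordan demo',  'I', 'rep', 'VT');
""")

# Each later join is attempted only when the earlier ones found nothing,
# and COALESCE keeps the first non-NULL value across the three attempts.
ENRICH_SQL = """
SELECT a.screen_name,
       COALESCE(h.party,   i.party,   n.party)   AS party,
       COALESCE(h.chamber, i.chamber, n.chamber) AS chamber,
       COALESCE(h.state,   i.state,   n.state)   AS state
FROM accounts a
LEFT JOIN legislators h
       ON LOWER(a.screen_name) = LOWER(h.twitter_handle)      -- step 1: handle
LEFT JOIN legislators i
       ON h.party IS NULL AND a.user_id = i.twitter_id        -- step 2: Twitter ID
LEFT JOIN legislators n
       ON h.party IS NULL AND i.party IS NULL
      AND LOWER(a.name) = LOWER(n.full_name)                  -- step 3: name
ORDER BY a.user_id
"""

rows = conn.execute(ENRICH_SQL).fetchall()
matched = sum(1 for row in rows if row[1] is not None)
match_rate = matched / len(rows)
print(f"matched {matched}/{len(rows)} accounts ({match_rate:.1%})")
```

Accounts that fail all three steps keep NULL metadata rather than being dropped, which makes the unmatched share easy to report. If a key is not unique in the reference table, the join will fan out, so deduplicating `legislators` first would be a sensible precaution.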
Note: The raw dataset is not included due to GitHub's file size limit. See Data below.
git clone https://github.com/username/congressional-twitter-intelligence.git
cd congressional-twitter-intelligence
pip install -r requirements.txt
jupyter notebook notebooks/M3.ipynb

| Tool | Purpose |
|---|---|
| SQLite | Primary database and storage |
| Python / pandas | Data wrangling and analysis |
| scikit-learn | TF-IDF vectorization, OLS regression |
| TextBlob | Sentiment scoring |
| matplotlib | Data visualization |
| scipy | Pearson correlation tests |
repo/
├── figures/ # All saved charts & plots (Fig 1–9 + supplementary)
├── data/
│ ├── export/ # CSV exports for Tableau
│ └── US_PoliticalTweets.tar.gz # Raw dataset (not tracked — see Data section)
├── notebooks/
│ ├── M3.ipynb # TF-IDF, regression, custom metrics
│ ├── US_Political_Tweet_War_M2.ipynb # Descriptive statistics
│ ├── Project_Proposal.ipynb # Project proposal
│ └── Project_Proposal.pdf # Proposal PDF export
├── presentation/
│ └── Lobbyists4America.html # Final HTML presentation
├── sql/
│ └── SQL-Query.ipynb # SQL queries notebook
├── .gitignore
└── README.md
The raw dataset (US_PoliticalTweets.tar.gz, 229 MB) is not included in this repo because it exceeds GitHub's 100 MB file size limit.
Download it from: [link to original source or Google Drive]
Once downloaded, place it in the data/ folder and run notebooks/M3.ipynb from the top.
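If the archive needs to be unpacked by hand before running the notebook, a minimal sketch might look like the following. Whether `M3.ipynb` extracts the archive itself is not stated here; the `extract_dataset` helper and the `data/raw` target directory are hypothetical.

```python
import tarfile
from pathlib import Path

ARCHIVE = Path("data/US_PoliticalTweets.tar.gz")


def extract_dataset(archive: Path = ARCHIVE, target: Path = Path("data/raw")) -> list[str]:
    """Extract the archive into `target` and return the member names."""
    if not archive.is_file():
        raise FileNotFoundError(f"Expected the dataset at {archive}; download it first.")
    target.mkdir(parents=True, exist_ok=True)
    root = target.resolve()
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        # Refuse any member whose path would land outside the target directory.
        for name in names:
            if not (target / name).resolve().is_relative_to(root):
                raise ValueError(f"Unsafe path in archive: {name}")
        tar.extractall(target)
    return names
```

Calling `extract_dataset()` from the repo root uses the default paths above; pass explicit paths if the archive lives elsewhere.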
- Dataset covers 2008–2017 only; Twitter's 280-character limit, later follower growth, and later shifts in political intensity all postdate the archive
- 24.4% of tweets could not be matched to a party (Independent members, data gaps)
- Sentiment scored via TextBlob on a 20K random sample — not full corpus
- LLS and BWS scores should be re-validated with updated data before operational use
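
The sampling caveat above can be made concrete with a short sketch. It draws a seeded, fixed-size random sample and averages a polarity score over it. The `sample_tweets` and `mean_polarity` helpers, the seed value, and the pluggable scorer are assumptions for illustration, not the project's code; only `TextBlob(text).sentiment.polarity` is TextBlob's own API.

```python
import random
from statistics import mean
from typing import Callable, Iterable, Sequence


def sample_tweets(tweets: Sequence[str], k: int = 20_000, seed: int = 42) -> list[str]:
    """Seeded sample without replacement; returns every tweet if fewer than k exist."""
    rng = random.Random(seed)
    return rng.sample(list(tweets), min(k, len(tweets)))


def textblob_polarity(text: str) -> float:
    """Polarity in [-1.0, 1.0] from TextBlob's default analyzer."""
    from textblob import TextBlob  # imported lazily so sampling works without it
    return TextBlob(text).sentiment.polarity


def mean_polarity(tweets: Iterable[str],
                  scorer: Callable[[str], float] = textblob_polarity) -> float:
    """Average the scorer's output over the given tweets."""
    return mean(scorer(t) for t in tweets)
```

Fixing the seed keeps the sample reproducible across runs, so a sentiment figure can be regenerated exactly; comparing the mean across a few different seeds gives a rough sense of how much the 20K sample varies relative to the full corpus.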







