An AI-powered equity research platform that leverages specialized LLM modules and real-time data to generate comprehensive financial analysis reports.
The Modular Equity Research System is a prototype application that automates financial equity research using a coordinated multi-module architecture. It combines OpenAI's GPT-3.5-turbo, the Serper API for Google search, and RAG (Retrieval-Augmented Generation) to produce sourced financial analysis reports with confidence scores.
- Intelligent Query Analysis: Automatically extracts company information, research intent, and relevant topics from natural language queries
- Dynamic Source Discovery: Uses SERPER API to find the most relevant, up-to-date financial articles and reports
- Automated Validation: Evaluates source credibility and calculates confidence scores for research findings
- RAG-Powered Synthesis: Generates comprehensive reports using vector-based semantic search and LLM synthesis
- Real-time Activity Logging: Transparent pipeline execution with detailed activity tracking
The system follows a modular pipeline architecture where specialized components work in coordination to process research queries:
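As a rough illustration of that coordination, the sketch below chains four placeholder stages over a shared state object. Everything in it (`PipelineState`, `run_pipeline`, and the stub stage functions) is hypothetical and only mirrors the module names listed in the project structure; the actual modules call an LLM, a search API, and a vector store rather than returning canned values.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PipelineState:
    """Data handed from one stage to the next (hypothetical shape)."""
    query: str
    analysis: dict = field(default_factory=dict)
    documents: list = field(default_factory=list)
    validation: dict = field(default_factory=dict)
    report: str = ""
    log: list = field(default_factory=list)


Stage = Callable[[PipelineState], PipelineState]


def run_pipeline(query: str, stages: list[tuple[str, Stage]]) -> PipelineState:
    """Run each stage in order, recording start/finish entries in the log."""
    state = PipelineState(query=query)
    for name, stage in stages:
        state.log.append(f"start: {name}")
        state = stage(state)
        state.log.append(f"done: {name}")
    return state


# Placeholder stages: each one only fills in its own part of the state.
def analyze(state: PipelineState) -> PipelineState:
    state.analysis = {"search_queries": [state.query]}
    return state


def research(state: PipelineState) -> PipelineState:
    state.documents = [{"url": "https://example.com", "content": "placeholder"}]
    return state


def validate(state: PipelineState) -> PipelineState:
    state.validation = {"sources": len(state.documents)}
    return state


def synthesize(state: PipelineState) -> PipelineState:
    count = state.validation["sources"]
    state.report = f"Report for: {state.query} ({count} source(s))"
    return state


STAGES = [
    ("query_analyzer", analyze),
    ("research_module", research),
    ("validation_module", validate),
    ("synthesis_module", synthesize),
]

result = run_pipeline("Analyze Apple's recent financial performance", STAGES)
print(result.report)
```

Keeping each stage as a function from state to state makes it straightforward to log, test, or swap a single module without touching the others.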
- Streamlit (1.39.0) - Web application framework
- Python (3.11+) - Primary programming language
- LangChain (0.3.7) - LLM application framework
- LangChain-OpenAI (0.2.9) - OpenAI integration
- OpenAI GPT-3.5-turbo - Language model for analysis and synthesis
- FAISS (1.9.0) - Vector similarity search
- OpenAI Embeddings - Text vectorization
- Serper API - Real-time web search
- LangChain Document Loaders - Web scraping and parsing
- BeautifulSoup4 - HTML parsing
- python-dotenv - Environment variable management
- Requests - HTTP client
modular-equity-research-system/
│
├── modules/                  # Core processing modules
│   ├── __init__.py
│   ├── query_analyzer.py     # Query analysis & structuring
│   ├── research_module.py    # Source discovery & loading
│   ├── validation_module.py  # Source validation & scoring
│   └── synthesis_module.py   # Report generation (RAG)
│
├── utils/                    # Utility functions
│   ├── __init__.py
│   ├── embeddings.py         # Vector store management
│   └── logger.py             # Activity logging
│
├── config.py                 # Configuration settings
├── app.py                    # Streamlit web application
├── requirements.txt          # Python dependencies
├── .env                      # Environment variables (gitignored)
├── .gitignore                # Git ignore rules
└── README.md                 # This file
Processes natural language queries to extract structured information:
- Company name and stock ticker
- Research intent (earnings, valuation, competition, etc.)
- Key topics to investigate
- Time frame of interest
- Generated search queries for discovery
Input: Natural language query
Output: Structured JSON with company info and search strategies
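The exact JSON schema is not spelled out here, so the following is only a sketch of what a structured result could look like. The `QueryAnalysis` class, its field names, and `parse_analysis` are assumptions derived from the bullet list above, not the module's actual output format.

```python
import json
from dataclasses import dataclass, field


@dataclass
class QueryAnalysis:
    """Hypothetical container for the fields the analyzer extracts."""
    company: str = ""
    ticker: str = ""
    intent: str = ""
    topics: list[str] = field(default_factory=list)
    time_frame: str = ""
    search_queries: list[str] = field(default_factory=list)


def parse_analysis(raw_json: str) -> QueryAnalysis:
    """Parse an LLM's JSON reply, tolerating missing keys."""
    data = json.loads(raw_json)
    return QueryAnalysis(
        company=data.get("company", ""),
        ticker=data.get("ticker", ""),
        intent=data.get("intent", ""),
        topics=list(data.get("topics", [])),
        time_frame=data.get("time_frame", ""),
        search_queries=list(data.get("search_queries", [])),
    )


# Illustrative reply for "What were NVIDIA's Q3 2024 earnings results?"
raw = """
{
  "company": "NVIDIA Corporation",
  "ticker": "NVDA",
  "intent": "earnings",
  "topics": ["revenue", "guidance"],
  "time_frame": "Q3 2024",
  "search_queries": ["NVIDIA Q3 2024 earnings results"]
}
"""
analysis = parse_analysis(raw)
print(analysis.ticker, analysis.search_queries)
```

Defaulting missing keys to empty values keeps downstream modules from failing when the model omits a field.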
Discovers and loads relevant financial content:
- Uses SERPER API for dynamic source discovery
- Filters for trusted financial domains (Reuters, Bloomberg, CNBC, etc.)
- Loads and processes document content
- Falls back to curated sources if API unavailable
Input: Query analysis with search queries
Output: List of Document objects with source URLs and content
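The trusted-domain filter mentioned above can be sketched with the standard library alone. The domain list, the `is_trusted` and `filter_results` helpers, and the sample URLs are illustrative assumptions; the project keeps its own list in `config.py`.

```python
from urllib.parse import urlparse

# Assumed sample list; the real one lives in config.py.
TRUSTED_DOMAINS = ["reuters.com", "bloomberg.com", "cnbc.com"]


def is_trusted(url: str, trusted: list[str] = TRUSTED_DOMAINS) -> bool:
    """True if the URL's host is a trusted domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in trusted)


def filter_results(urls: list[str], max_sources: int = 5) -> list[str]:
    """Keep unique trusted URLs in their original order, up to max_sources."""
    seen: set[str] = set()
    kept: list[str] = []
    for url in urls:
        if url in seen or not is_trusted(url):
            continue
        seen.add(url)
        kept.append(url)
        if len(kept) == max_sources:
            break
    return kept


candidates = [
    "https://www.reuters.com/markets/nvidia-earnings",
    "https://notreuters.com/markets/nvidia-earnings",
    "https://www.cnbc.com/nvidia-ai-demand",
    "https://www.reuters.com/markets/nvidia-earnings",
]
print(filter_results(candidates))
```

Matching on the parsed hostname rather than a substring of the URL avoids accepting look-alike domains such as `notreuters.com`.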
Evaluates source quality and calculates confidence:
- Assigns credibility scores (0-100) to each source
- Checks against trusted financial domain list
- Calculates weighted overall confidence score
- Generates validation notes and quality indicators
Input: Document list
Output: Validation report with scores and trust indicators
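The weighting scheme is not documented in this README, so the sketch below makes one up for illustration: trusted sources count with weight 1.0 and all others with weight 0.5. The function name, the dictionary keys, and both weights are assumptions.

```python
# Assumed weights; the module's real weighting is not documented here.
TRUSTED_WEIGHT = 1.0
UNTRUSTED_WEIGHT = 0.5


def overall_confidence(sources: list[dict]) -> float:
    """Weighted mean of 0-100 credibility scores, returned on a 0-1 scale."""
    weighted_sum = 0.0
    total_weight = 0.0
    for source in sources:
        weight = TRUSTED_WEIGHT if source["trusted"] else UNTRUSTED_WEIGHT
        weighted_sum += source["score"] * weight
        total_weight += weight
    if total_weight == 0:
        return 0.0
    return weighted_sum / total_weight / 100


sources = [
    {"url": "https://www.reuters.com/a", "score": 95, "trusted": True},
    {"url": "https://www.cnbc.com/b", "score": 90, "trusted": True},
    {"url": "https://example.com/c", "score": 60, "trusted": False},
]
print(f"{overall_confidence(sources):.1%}")
```

With these sample values the weighted sum is 95 + 90 + 30 = 215 over a total weight of 2.5, which gives 86 on the 0-100 scale.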
Generates comprehensive reports using RAG:
- Creates FAISS vector store from documents
- Performs semantic similarity search for relevant context
- Uses GPT-3.5-turbo to generate structured reports
- Includes full URL citations in markdown format
- Extracts and formats source metadata
Input: Documents, validation report, query
Output: Comprehensive research report with citations
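To show the retrieve-then-cite idea without external services, the sketch below swaps the real components for toy ones: a bag-of-words count stands in for OpenAI embeddings, and a sorted list stands in for the FAISS index. All function names and sample chunks are hypothetical; only the citation format follows the report template in this README.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[dict], k: int = 2) -> list[dict]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c["text"])), reverse=True)
    return ranked[:k]


def build_context(chunks: list[dict]) -> str:
    """Format retrieved chunks with markdown URL citations for the prompt."""
    return "\n".join(f"{c['text']} [Source: {c['url']}]({c['url']})" for c in chunks)


chunks = [
    {"text": "nvidia reported record data center revenue", "url": "https://example.com/a"},
    {"text": "tesla delivered more vehicles this quarter", "url": "https://example.com/b"},
    {"text": "nvidia revenue guidance beat expectations", "url": "https://example.com/c"},
]
top = retrieve("nvidia revenue", chunks, k=2)
print(build_context(top))
```

In the real pipeline the resulting context string would be passed to the LLM along with the query; carrying the URL alongside each chunk is what lets the report cite its sources inline.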
- Python 3.11+ installed on your system
- OpenAI API key (GPT-3.5-turbo access)
- Serper API key (for Google search - 100 free searches/month)
- Git (for cloning the repository)
git clone https://github.com/neelagarwal98/modular-equity-research-system.git
# Create virtual environment
python3 -m venv venv
# Activate virtual environment
# On macOS/Linux:
source venv/bin/activate
# On Windows:
venv\Scripts\activate
pip install --upgrade pip
pip install -r requirements.txt

Create a .env file in the project root:
# Create .env file
touch .env

Add your API keys to .env:
OPENAI_API_KEY=sk-your-openai-api-key-here
SERPER_API_KEY=your-serper-api-key-here

Where to get API keys:
- OpenAI: https://platform.openai.com/api-keys
- Serper: https://serper.dev (100 free searches/month)
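A quick way to confirm both keys are visible before launching is sketched below. It assumes the `.env` values have already been loaded into the process environment (the project lists python-dotenv for that); `REQUIRED_KEYS` and `missing_keys` are hypothetical names, not part of the codebase.

```python
import os

REQUIRED_KEYS = ("OPENAI_API_KEY", "SERPER_API_KEY")


def missing_keys(env=None) -> list[str]:
    """Return the required keys that are absent or blank in the environment."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


missing = missing_keys()
if missing:
    print("Missing API keys:", ", ".join(missing))
else:
    print("All API keys found")
```

Accepting the environment as a parameter makes the check easy to test with a plain dictionary.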
streamlit run app.py

The application will open in your browser at http://localhost:8501
Try these sample queries to test the system:
Basic Company Analysis:
Analyze Apple's recent financial performance
Competitive Analysis:
Compare Tesla vs Rivian in the EV market
Earnings Research:
What were NVIDIA's Q3 2024 earnings results?
Investment Outlook:
Is Microsoft a good investment right now?
Sector Analysis:
How is the semiconductor industry performing?
- Enter Query: Type your research question in the text area
- Select Mode:
  - Autonomous: AI finds sources automatically (recommended)
  - Manual: Provide your own URLs for analysis
- Start Research: Click the "Start Research" button
- Monitor Progress: Watch the activity log in the sidebar
- Review Report: Scroll down to see the generated report with:
  - Executive summary
  - Key findings with citations
  - Detailed analysis
  - Source list with quality indicators
  - Validation notes
# Model Configuration
LLM_MODEL = "gpt-3.5-turbo" # LLM model to use
LLM_TEMPERATURE = 0.3 # Lower for more factual
MAX_TOKENS = 1000 # Max tokens per response
# Research Parameters
MAX_SOURCES = 5 # Number of sources to analyze
SERP_SEARCH_LIMIT = 7 # Number of Google searches
SERP_RESULTS_PER_QUERY = 3 # Results per search
# Confidence Thresholds
MIN_CONFIDENCE_SCORE = 0.6 # Minimum acceptable confidence
HIGH_CONFIDENCE_THRESHOLD = 0.8 # High confidence threshold

Adjust Source Quality:
Modify PRIORITY_DOMAINS in config.py to add/remove trusted sources:
PRIORITY_DOMAINS = [
    "reuters.com",
    "bloomberg.com",
    "wsj.com",
    # add other domains you trust, in order of priority
]

Increase Source Count: For more comprehensive research:
MAX_SOURCES = 10 # Analyze more sources

## Research Report
**Company:** NVIDIA Corporation
**Ticker:** NVDA
**Confidence:** 85.3%
**Sources:** 5
### Executive Summary:
[2-3 sentence overview with key takeaways]
### Key Findings:
• Finding 1 with data [Source: full-url.com](full-url.com)
• Finding 2 with metrics [Source: another-url.com](another-url.com)
• Finding 3 with analysis [Source: third-url.com](third-url.com)
### Detailed Analysis:
[2-3 paragraphs with in-depth analysis and data]
### Important Considerations:
• Risk factor 1
• Limitation 1
• Market dynamic 1
### Sources:
1. [reuters.com: nvidia q3 earnings](url) 🟢 High Quality (95/100)
2. [cnbc.com: nvidia ai demand](url) 🟢 High Quality (90/100)
### Validation Notes:
✅ 5 source(s) from trusted financial sites
4 high-quality source(s) found

- Tracks all module operations in sidebar
- Shows complete research pipeline execution
- Color-coded status indicators (✅ ⚠️ ❌)
- Expandable details for debugging
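A minimal in-memory logger along these lines could look like the sketch below. The `ActivityLogger` class, its method names, and the status-to-icon mapping are assumptions for illustration; the project's own `utils/logger.py` and its Streamlit sidebar rendering may differ.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Assumed mapping from status to indicator.
STATUS_ICONS = {"success": "✅", "warning": "⚠️", "error": "❌"}


@dataclass
class ActivityLogger:
    """Collects one entry per module operation for later display."""
    entries: list = field(default_factory=list)

    def log(self, module: str, message: str, status: str = "success", details=None) -> dict:
        """Record an operation; details can hold extra data for debugging."""
        if status not in STATUS_ICONS:
            raise ValueError(f"unknown status: {status}")
        entry = {
            "time": datetime.now().isoformat(timespec="seconds"),
            "module": module,
            "message": message,
            "status": status,
            "details": details,
        }
        self.entries.append(entry)
        return entry

    def render(self) -> list[str]:
        """One display line per entry, prefixed with its status icon."""
        return [
            f"{STATUS_ICONS[e['status']]} [{e['module']}] {e['message']}"
            for e in self.entries
        ]


logger = ActivityLogger()
logger.log("research_module", "Found 5 sources")
logger.log("validation_module", "1 source is not on the trusted list", status="warning")
for line in logger.render():
    print(line)
```

Storing the raw `details` separately from the display line is what would allow an expandable view: the summary stays short while the full payload remains available.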
- Trusted domain checking
- Content quality analysis
- Weighted confidence calculations
- FAISS vector similarity search
- Semantic context retrieval
- GPT-3.5-turbo for coherent synthesis
- Inline URL citations in markdown format
- Source-level credibility scores
- Overall weighted confidence
- Trust ratio calculations
- Visual confidence indicators
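The sketch below shows one way a trust ratio and a confidence label could be derived. The two threshold values are copied from the configuration section of this README; the function names, the label wording, and the icons chosen for each level are assumptions.

```python
# Values from the configuration section of this README.
MIN_CONFIDENCE_SCORE = 0.6
HIGH_CONFIDENCE_THRESHOLD = 0.8

# Assumed icons for each level.
LEVEL_ICONS = {"high": "🟢", "medium": "🟡", "low": "🔴"}


def trust_ratio(trusted_flags: list[bool]) -> float:
    """Share of sources that come from a trusted domain."""
    if not trusted_flags:
        return 0.0
    return sum(trusted_flags) / len(trusted_flags)


def confidence_level(score: float) -> str:
    """Map a 0-1 confidence score to a coarse level using the thresholds."""
    if score >= HIGH_CONFIDENCE_THRESHOLD:
        return "high"
    if score >= MIN_CONFIDENCE_SCORE:
        return "medium"
    return "low"


score = 0.853
level = confidence_level(score)
print(f"{LEVEL_ICONS[level]} {level} confidence ({score:.1%})")
print(f"trust ratio: {trust_ratio([True, True, False, True]):.0%}")
```

Treating the thresholds as inclusive lower bounds means a score of exactly 0.6 counts as acceptable rather than low; whether the project draws the line the same way is not stated.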
- Add support for PDF report export
- Implement query caching
- Add support for more LLM providers (Anthropic, Cohere)
- Enhance source validation with ML models
- Add historical data tracking
- Implement user authentication
- Add support for batch processing
This project is licensed under the MIT License - Neel Agarwal.
- LangChain - For LLM application framework
- Streamlit - For the intuitive web app framework
- OpenAI - For GPT-3.5-turbo API
- Serper.dev - For Google search API access
- FAISS - For efficient vector similarity search
- Email: neelagarwal98@gmail.com
- PDF export functionality
- Enhanced error handling
- Query result caching
- Improved mobile responsiveness
- Support for multiple LLM providers
- Advanced validation with ML
- Historical tracking dashboard
- Batch processing mode
- User authentication system
- Collaborative research features
- API endpoint for programmatic access
- Multi-language support
If you find this project helpful, please consider giving it a star! ⭐
Built using Python, LangChain, OpenAI GPT-3.5-turbo, Serper API, FAISS, RAG, and Streamlit.

