This document describes the complete LLM & NLP extension module for the trading system, including all components from data collection to strategy generation.
The LLM & NLP extension module provides a comprehensive pipeline for:
- Data Collection: News and social media data gathering
- NLP Processing: Text preprocessing, sentiment analysis, topic modeling
- Sentiment Factor Generation: Converting sentiment data into quantitative factors
- LLM Integration: Intelligent analysis and strategy generation
- LangChain Agent: Automated trading recommendations and reports
Data Sources  →  NLP Processing     →  Sentiment Analysis  →  Factor Generation  →  Strategy Engine
     ↓                  ↓                      ↓                      ↓                    ↓
News APIs        Text Cleaning         Sentiment Scores       Sentiment Factors     Trading Signals
Social APIs      Tokenization          Topic Modeling         Momentum/Variance     Risk Assessment
RSS Feeds        Entity Extraction     Keyword Extraction     Consensus Metrics     Portfolio Optimization
NLPProcessor features:
- Text preprocessing and cleaning
- Tokenization (spaCy, NLTK)
- Sentiment analysis (multiple models)
- Keyword extraction
- Topic modeling
- Language detection
- Financial entity extraction
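
As a rough illustration of the text cleaning and keyword-based sentiment steps listed above, here is a minimal sketch. It is not the module's actual implementation: the word lists, `clean_text`, and `keyword_sentiment` are hypothetical names invented for this example, and a real lexicon would be much larger.

```python
import re

# Hypothetical word lists for illustration only.
POSITIVE = {"beat", "exceeded", "strong", "growth", "breakthrough", "upgrade"}
NEGATIVE = {"miss", "missed", "weak", "decline", "challenges", "downgrade"}


def clean_text(text: str) -> str:
    """Lowercase, drop URLs, and replace everything but letters with spaces."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    text = re.sub(r"[^a-z\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def keyword_sentiment(text: str) -> tuple[float, str]:
    """Return (score, label), where score = (pos - neg) / (pos + neg) in [-1, 1]."""
    tokens = clean_text(text).split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    hits = pos + neg
    score = 0.0 if hits == 0 else (pos - neg) / hits
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return score, label


if __name__ == "__main__":
    for sample in ("Apple's earnings exceeded expectations",
                   "Tesla faces production challenges"):
        print(sample, "->", keyword_sentiment(sample))
```

A lexicon approach like this is cheap and deterministic, which makes it a reasonable fallback when a transformer model is unavailable, at the cost of missing negation and context.
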
Usage:
from data_service.ai import NLPProcessor
nlp = NLPProcessor()
# Process single text
processed = nlp.preprocess_text("Apple's earnings exceeded expectations")
print(f"Sentiment: {processed.sentiment_label} ({processed.sentiment_score:.3f})")
# Batch processing
texts = ["Text 1", "Text 2", "Text 3"]
results = nlp.analyze_sentiment_batch(texts)
# Market sentiment
market_sentiment = nlp.calculate_market_sentiment(results)

SentimentFactorCalculator features:
- Sentiment score calculation (weighted by recency and confidence)
- Sentiment momentum and volatility
- News and social media volume analysis
- Sentiment consensus measurement
- Trading signal generation
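
The factor definitions above can be sketched in a few lines. This is one plausible reading, not the formulas `SentimentFactorCalculator` actually uses: the exponential half-life decay, the "recent window minus earlier mean" momentum, and the sign-agreement consensus are all assumptions, and every name below is hypothetical.

```python
from dataclasses import dataclass
from statistics import fmean, pstdev


@dataclass
class Observation:
    score: float       # sentiment in [-1, 1]
    confidence: float  # model confidence in [0, 1]
    age_days: float    # age of the news item or post


def _sign(x: float) -> int:
    return (x > 0) - (x < 0)


def weighted_score(obs, half_life_days=3.0):
    """Mean score weighted by confidence and an exponential recency decay."""
    weights = [o.confidence * 0.5 ** (o.age_days / half_life_days) for o in obs]
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * o.score for w, o in zip(weights, obs)) / total


def momentum(daily_scores, window=5):
    """Mean of the latest `window` daily scores minus the mean of the earlier ones."""
    if len(daily_scores) <= window:
        return 0.0
    return fmean(daily_scores[-window:]) - fmean(daily_scores[:-window])


def volatility(daily_scores):
    """Population standard deviation of the daily scores."""
    return pstdev(daily_scores) if len(daily_scores) > 1 else 0.0


def consensus(obs):
    """Share of observations whose sign matches the weighted score."""
    ref = weighted_score(obs)
    if not obs or ref == 0:
        return 0.0
    return sum(_sign(o.score) == _sign(ref) for o in obs) / len(obs)
```

With a three-day half-life, an item from three days ago counts half as much as one from today at equal confidence.
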
Usage:
from data_service.ai import SentimentFactorCalculator
calculator = SentimentFactorCalculator()
# Calculate factors for a symbol
factor = calculator.calculate_sentiment_factors(sentiment_data, 'AAPL')
print(f"Sentiment Score: {factor.sentiment_score:.3f}")
print(f"Momentum: {factor.sentiment_momentum:.3f}")
# Generate trading signal
signal = calculator.create_sentiment_signal(factor)
print(f"Signal: {signal['signal']} (confidence: {signal['confidence']:.3f})")

LLMIntegration features:
- Multiple LLM provider support (OpenAI, local models)
- Market data analysis
- Trading signal generation
- Risk assessment
- Portfolio optimization
- Trading Q&A
Usage:
from data_service.ai import LLMIntegration
llm = LLMIntegration(provider="openai", api_key="your-key")
# Market analysis
insight = llm.analyze_market_data(market_data, ['AAPL', 'GOOGL'])
# Signal generation
signal = llm.generate_trading_signals(factor_data, price_data)
# Risk assessment
risk = llm.assess_risk(portfolio_data, market_conditions)
# Q&A
response = llm.answer_trading_question("What are momentum trading strategies?")

LangChainAgent features:
- Intelligent strategy recommendation
- Market intelligence analysis
- Automated report generation
- Multi-tool integration
- Chain-of-thought reasoning
Usage:
from data_service.ai import LangChainAgent
agent = LangChainAgent(llm_integration, nlp_processor)
# Strategy recommendation
strategy = agent.generate_strategy_recommendation(
    market_data, sentiment_data, portfolio_data, symbols
)
# Market analysis
analysis = agent.analyze_market_intelligence(news_data, social_data, market_data)
# Automated report
report = agent.generate_automated_report(strategies, analysis, metrics)

import pandas as pd
from data_service.ai import *
# 1. Initialize components
nlp_processor = NLPProcessor()
factor_calculator = SentimentFactorCalculator()
llm_integration = LLMIntegration(provider="openai", api_key="your-key")
agent = LangChainAgent(llm_integration, nlp_processor)
# 2. Data collection (from existing modules)
news_processor = NewsProcessor()
social_monitor = SocialMediaMonitor()
news_data = news_processor.fetch_all_news(['AAPL', 'GOOGL'], days_back=7)
social_data = social_monitor.fetch_all_social_posts(['AAPL', 'GOOGL'])
# 3. NLP processing
processed_texts = []
for item in news_data + social_data:
    text = item.title + " " + item.content if hasattr(item, 'title') else item.text
    processed = nlp_processor.preprocess_text(text)
    processed_texts.append(processed)
# 4. Sentiment factor generation
sentiment_data = []
for processed in processed_texts:
    sentiment_data.append({
        'symbol': processed.metadata.get('symbol', 'GENERAL'),
        'sentiment_score': processed.sentiment_score,
        'confidence': processed.confidence,
        'source': processed.metadata.get('source', 'unknown'),
        'timestamp': processed.timestamp
    })
sentiment_df = pd.DataFrame(sentiment_data)
factors = factor_calculator.calculate_sentiment_factor_matrix(sentiment_df, ['AAPL', 'GOOGL'])
# 5. Strategy generation
# market_data and portfolio_data are assumed to be loaded elsewhere
strategy = agent.generate_strategy_recommendation(
    market_data, sentiment_df, portfolio_data, ['AAPL', 'GOOGL']
)
# 6. Generate report
# market_analysis and performance_metrics are likewise assumed to exist
report = agent.generate_automated_report([strategy], market_analysis, performance_metrics)

pip install -e .
pip install -e ".[ai]"
pip install langchain langchain-openai langchain-community
pip install spacy transformers nltk
python -m spacy download en_core_web_sm

# OpenAI Configuration
export OPENAI_API_KEY="your-openai-api-key"
export OPENAI_MODEL="gpt-3.5-turbo"
# NLP Configuration
export SPACY_MODEL="en_core_web_sm"
export TRANSFORMERS_CACHE="/path/to/cache"

{
  "nlp": {
    "use_spacy": true,
    "use_transformers": true,
    "language": "en",
    "max_text_length": 1000
  },
  "sentiment": {
    "lookback_period": 20,
    "confidence_threshold": 0.7,
    "momentum_window": 5
  },
  "llm": {
    "provider": "openai",
    "model": "gpt-3.5-turbo",
    "temperature": 0.3,
    "max_tokens": 1000
  },
  "agent": {
    "enable_tools": true,
    "memory_size": 1000,
    "max_iterations": 10
  }
}

- Data Collection
  - News API integration (Alpha Vantage, NewsAPI)
  - Social media monitoring (Twitter, Reddit)
  - RSS feed processing
  - Data cleaning and deduplication
- NLP Processing
  - Text preprocessing and normalization
  - Tokenization (spaCy, NLTK)
  - Sentiment analysis (transformers, keyword-based)
  - Keyword extraction
  - Topic modeling
  - Language detection
  - Financial entity extraction
- Sentiment Factor Generation
  - Weighted sentiment scoring
  - Sentiment momentum calculation
  - Sentiment volatility measurement
  - News/social volume analysis
  - Sentiment consensus metrics
  - Trading signal generation
- LLM Integration
  - Multiple provider support (OpenAI, local models)
  - Market data analysis
  - Trading signal generation
  - Risk assessment
  - Portfolio optimization
  - Trading Q&A
- LangChain Agent
  - Strategy recommendation generation
  - Market intelligence analysis
  - Automated report generation
  - Multi-tool integration
  - Chain-of-thought reasoning
- Factor Integration
  - Sentiment factors as quantitative inputs
  - Multi-factor model support
  - Factor backtesting integration
  - Risk management integration
- Advanced NLP
  - BERTopic for advanced topic modeling
  - Named entity recognition for companies
  - Event extraction and classification
  - Multi-language support
- Advanced LLM
  - Function calling for structured outputs
  - Memory and conversation management
  - Tool integration for data analysis
  - Custom prompt engineering
- Real-time Processing
  - Streaming data processing
  - Real-time sentiment monitoring
  - Live trading signal generation
  - WebSocket integration

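The "data cleaning and deduplication" item under Data Collection above could be approached by hashing a normalized form of each text, so that the same headline syndicated with different casing or punctuation is kept only once. The sketch below is an assumption about how such a step might look, not the module's code; `fingerprint` and `deduplicate` are hypothetical helpers, and items are assumed to be dictionaries with a `text` field.

```python
import hashlib
import re


def fingerprint(text: str) -> str:
    """SHA-256 of the text after lowercasing and collapsing non-alphanumerics."""
    normalized = re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(items, key=lambda item: item["text"]):
    """Keep the first occurrence of each distinct fingerprint, preserving order."""
    seen = set()
    unique = []
    for item in items:
        fp = fingerprint(key(item))
        if fp not in seen:
            seen.add(fp)
            unique.append(item)
    return unique


if __name__ == "__main__":
    news = [
        {"text": "Apple beats estimates!"},
        {"text": "apple  beats estimates"},
        {"text": "Tesla delays launch"},
    ]
    print(len(deduplicate(news)))
```

Exact-hash matching only catches near-verbatim copies; rewritten articles about the same event would need a similarity measure instead.
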
from data_service.ai import NLPProcessor, SentimentFactorCalculator
# Initialize components
nlp = NLPProcessor()
calculator = SentimentFactorCalculator()
# Process news articles
texts = [
    "Apple reports strong quarterly earnings",
    "Tesla faces production challenges",
    "Google announces AI breakthrough"
]
# Analyze sentiment
results = nlp.analyze_sentiment_batch(texts)
market_sentiment = nlp.calculate_market_sentiment(results)
print(f"Market Sentiment: {market_sentiment['sentiment_label']}")
print(f"Top Keywords: {market_sentiment['top_keywords']}")

import pandas as pd
from data_service.ai import SentimentFactorCalculator
from data_service.factors import FactorCalculator
# Initialize calculators
sentiment_calc = SentimentFactorCalculator()
factor_calc = FactorCalculator()
# Calculate sentiment factors
symbols = ['AAPL', 'GOOGL', 'TSLA']
sentiment_factors = sentiment_calc.calculate_sentiment_factor_matrix(
    sentiment_data, symbols
)
# Combine with technical factors
# (symbol, prices and volumes are assumed to be defined by the caller)
technical_factors = factor_calc.calculate_all_factors(symbol, prices, volumes)
# Create multi-factor model
combined_factors = pd.concat([technical_factors, sentiment_factors], axis=1)
# Generate trading signals
signals = []
for symbol in symbols:
    factor = sentiment_factors[sentiment_factors['symbol'] == symbol].iloc[0]
    signal = sentiment_calc.create_sentiment_signal(factor)
    signals.append(signal)

from data_service.ai import LLMIntegration, LangChainAgent
# Initialize components
llm = LLMIntegration(provider="openai", api_key="your-key")
agent = LangChainAgent(llm)
# Generate strategy recommendation
strategy = agent.generate_strategy_recommendation(
    market_data, sentiment_data, portfolio_data, symbols
)
print(f"Strategy: {strategy.strategy_name}")
print(f"Signal: {strategy.signal}")
print(f"Confidence: {strategy.confidence}")
print(f"Reasoning: {strategy.reasoning}")

from data_service.ai import LangChainAgent
# Generate automated report
report = agent.generate_automated_report(
    strategy_results, market_analysis, performance_metrics
)
print("=== Automated Trading Report ===")
print(report)

from data_service.storage import CacheManager
cache = CacheManager()
# Cache sentiment analysis results
def cached_sentiment_analysis(text, cache_key, ttl=3600):
    cached_result = cache.get(cache_key)
    # Compare against None so that falsy but valid cached values are still returned
    if cached_result is not None:
        return cached_result
    result = nlp_processor.preprocess_text(text)
    cache.set(cache_key, result, expire=ttl)
    return result

# Process texts in batches
def batch_process_texts(texts, batch_size=100):
    results = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i:i+batch_size]
        batch_results = nlp_processor.analyze_sentiment_batch(batch)
        results.extend(batch_results)
    return results

# Monitor LLM usage costs
def track_llm_costs(responses):
    total_cost = sum(response.cost for response in responses)
    total_tokens = sum(response.tokens_used for response in responses)
    print(f"Total Cost: ${total_cost:.4f}")
    print(f"Total Tokens: {total_tokens}")
    # Avoid dividing by zero when no tokens were used
    if total_tokens:
        print(f"Cost per Token: ${total_cost/total_tokens:.6f}")

# Run NLP tests
pytest tests/test_nlp_processor.py
# Run sentiment factor tests
pytest tests/test_sentiment_factor.py
# Run LLM integration tests
pytest tests/test_llm_integration.py

# Run complete pipeline test
pytest tests/test_llm_nlp_pipeline.py

- spaCy Model Not Found
  python -m spacy download en_core_web_sm
- Transformers Import Error
  pip install transformers torch
- OpenAI API Errors
  - Check API key validity
  - Verify account has sufficient credits
  - Check rate limits
- Memory Issues
  - Use smaller batch sizes
  - Enable caching
  - Use local models for high-frequency processing

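For the rate-limit item above, one common mitigation is to retry failed calls with exponential backoff. The helper below is a generic sketch and not part of this module: `call_with_backoff` and its parameters are hypothetical, and the exception class your LLM client raises on rate limiting must be passed in as `retry_on` (the default of catching every `Exception` is only for illustration).

```python
import random
import time


def call_with_backoff(fn, *, retries=5, base_delay=1.0, max_delay=30.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call fn(); on a retryable error wait base_delay * 2**attempt plus jitter, then retry.

    The last failure is re-raised once `retries` attempts have been made.
    """
    for attempt in range(retries):
        try:
            return fn()
        except retry_on:
            if attempt == retries - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% random jitter so concurrent clients do not retry in lockstep
            sleep(delay + random.uniform(0, delay * 0.1))
```

A call such as `llm.answer_trading_question(...)` could then be wrapped as `call_with_backoff(lambda: llm.answer_trading_question(question), retry_on=(YourRateLimitError,))`, where `YourRateLimitError` stands in for whatever exception the configured provider actually raises.
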
import logging
logging.basicConfig(level=logging.DEBUG)
# Enable debug logging for all components
nlp_processor.logger.setLevel(logging.DEBUG)
llm_integration.logger.setLevel(logging.DEBUG)

- Advanced NLP
  - Multi-language sentiment analysis
  - Advanced topic modeling (BERTopic)
  - Event extraction and classification
  - Named entity recognition
- Advanced LLM
  - Function calling for structured outputs
  - Memory and conversation management
  - Custom model fine-tuning
  - Multi-modal analysis (charts, images)
- Real-time Processing
  - Streaming data processing
  - Real-time sentiment monitoring
  - Live trading signal generation
  - WebSocket integration
- Advanced Analytics
  - Sentiment-based risk models
  - Sentiment factor optimization
  - Cross-asset sentiment analysis
  - Sentiment-based portfolio construction

When adding new LLM/NLP features:
- Follow the existing module structure
- Add comprehensive error handling
- Include logging for debugging
- Add unit tests
- Update documentation
- Consider performance implications
- Monitor API costs
This module is part of the trading system and follows the same license terms.