Skip to content

Latest commit

 

History

History
872 lines (663 loc) · 30.7 KB

File metadata and controls

872 lines (663 loc) · 30.7 KB
layout default
title Chapter 8: Production Case Studies
parent LLaMA Factory Tutorial
nav_order 8

Chapter 8: Production Case Studies

Welcome to the grand finale! This chapter showcases real-world production deployments of LLaMA Factory, complete with challenges faced, solutions implemented, and lessons learned. These case studies demonstrate how to apply everything you've learned in a production environment.

Production Workflow Summary

flowchart LR
    REQ[Business Requirement] --> DATA[Curate Domain Dataset]
    DATA --> TRAIN[Fine-tune with LLaMA-Factory\nLoRA or QLoRA]
    TRAIN --> EVAL[Evaluate: task-specific benchmarks]
    EVAL --> OK{Quality Gate?}
    OK -->|Pass| DEPLOY[Deploy: vLLM / llama.cpp / API]
    OK -->|Fail| DATA
    DEPLOY --> MON[Monitor: latency, quality, cost]
    MON -->|Drift detected| DATA
Loading

Case Study 1: AI Customer Support System

Challenge

A large e-commerce company needed to handle millions of customer inquiries while maintaining high satisfaction rates and reducing response times.

Solution Architecture

# Production customer support system architecture
class CustomerSupportSystem:
    """Production-ready customer support AI system"""

    def __init__(self):
        self.models = {
            'intent_classifier': self.load_intent_model(),
            'response_generator': self.load_response_model(),
            'sentiment_analyzer': self.load_sentiment_model()
        }
        self.cache = RedisCache()
        self.monitor = SystemMonitor()

    def process_inquiry(self, customer_message: str, context: dict) -> dict:
        """Process customer inquiry with full production pipeline"""

        start_time = time.time()

        try:
            # Step 1: Intent classification
            intent = self.classify_intent(customer_message)

            # Step 2: Sentiment analysis
            sentiment = self.analyze_sentiment(customer_message)

            # Step 3: Context enrichment
            enriched_context = self.enrich_context(context, intent)

            # Step 4: Generate response
            response = self.generate_response(customer_message, intent, enriched_context)

            # Step 5: Quality assurance
            quality_score = self.assess_response_quality(response, intent)

            # Step 6: Fallback handling
            if quality_score < 0.7:
                response = self.generate_fallback_response(intent)

            processing_time = time.time() - start_time

            # Log metrics
            self.monitor.log_request({
                'intent': intent,
                'sentiment': sentiment,
                'processing_time': processing_time,
                'quality_score': quality_score
            })

            return {
                'response': response,
                'intent': intent,
                'confidence': quality_score,
                'processing_time': processing_time
            }

        except Exception as e:
            self.monitor.log_error(e)
            return self.handle_error(customer_message)

    def classify_intent(self, message: str) -> str:
        """Classify customer intent using fine-tuned model"""
        cache_key = f"intent:{hash(message)}"

        # Check cache first
        cached_result = self.cache.get(cache_key)
        if cached_result:
            return cached_result

        # Use fine-tuned intent classification model
        result = self.models['intent_classifier'](message)
        self.cache.set(cache_key, result, ttl=3600)

        return result

    def generate_response(self, message: str, intent: str, context: dict) -> str:
        """Generate contextual response"""

        # Craft prompt with context
        prompt = self.build_response_prompt(message, intent, context)

        # Use fine-tuned response generation model
        response = self.models['response_generator'](
            prompt,
            max_length=300,
            temperature=0.3  # Lower temperature for consistent responses
        )

        return response

    def build_response_prompt(self, message: str, intent: str, context: dict) -> str:
        """Build comprehensive response prompt"""

        prompt = f"""
        Customer Inquiry: {message}
        Detected Intent: {intent}
        Customer History: {context.get('purchase_history', 'None')}
        Previous Interactions: {context.get('recent_tickets', 'None')}

        Generate a helpful, empathetic response that addresses the customer's specific situation.
        Include any relevant account information or next steps.

        Response:
        """

        return prompt.strip()

# Deployment configuration
support_config = {
    'models': {
        'intent_classifier': 'llama-7b-intent-v2',
        'response_generator': 'llama-13b-support-v3',
        'sentiment_analyzer': 'distilbert-sentiment'
    },
    'scaling': {
        'min_instances': 3,
        'max_instances': 20,
        'target_response_time': 2000  # ms
    },
    'monitoring': {
        'error_rate_threshold': 0.05,
        'response_time_threshold': 3000,
        'quality_score_threshold': 0.8
    }
}

Performance Metrics

  • Response Time: 1.2 seconds average
  • Accuracy: 94% intent classification
  • Customer Satisfaction: 4.6/5 stars
  • Cost Savings: 60% reduction in support tickets

Challenges & Solutions

Challenge 1: High Volume Traffic

  • Problem: Millions of daily inquiries causing system overload
  • Solution: Implemented auto-scaling with Kubernetes HPA and Redis caching

Challenge 2: Response Quality Consistency

  • Problem: Inconsistent responses across different customer types
  • Solution: Fine-tuned models on domain-specific data with quality assurance pipeline

Challenge 3: Real-time Context Integration

  • Problem: Need to access customer history and account data
  • Solution: Built real-time data pipeline with Elasticsearch for context retrieval

Case Study 2: Code Generation Platform

Challenge

A software development company needed an AI-powered code generation tool that could understand complex requirements and generate production-ready code.

Technical Implementation

class CodeGenerationPlatform:
    """Production code generation platform"""

    def __init__(self):
        self.models = {
            'requirement_analyzer': self.load_analyzer_model(),
            'code_generator': self.load_generator_model(),
            'code_reviewer': self.load_reviewer_model()
        }
        self.vector_store = FAISSVectorStore()
        self.code_validator = CodeValidator()

    def generate_code(self, requirements: str, language: str, context: dict) -> dict:
        """Generate code with comprehensive quality assurance"""

        # Step 1: Analyze requirements
        analysis = self.analyze_requirements(requirements)

        # Step 2: Retrieve similar code examples
        similar_examples = self.vector_store.search_similar(requirements, top_k=5)

        # Step 3: Generate code with context
        code = self.generate_code_with_context(
            requirements,
            language,
            analysis,
            similar_examples
        )

        # Step 4: Validate generated code
        validation_result = self.code_validator.validate(code, language)

        # Step 5: Review and improve if needed
        if validation_result['has_issues']:
            code = self.review_and_improve(code, validation_result['issues'])

        # Step 6: Generate documentation
        documentation = self.generate_documentation(code, requirements)

        return {
            'code': code,
            'documentation': documentation,
            'validation_score': validation_result['score'],
            'language': language,
            'complexity': analysis['complexity']
        }

    def analyze_requirements(self, requirements: str) -> dict:
        """Analyze code requirements using fine-tuned model"""

        prompt = f"""
        Analyze these code requirements and provide:
        1. Complexity level (simple, medium, complex)
        2. Required components/modules
        3. Potential challenges
        4. Suggested architecture

        Requirements: {requirements}
        """

        analysis = self.models['requirement_analyzer'](prompt)
        return self.parse_analysis(analysis)

    def generate_code_with_context(self, requirements: str, language: str,
                                 analysis: dict, examples: list) -> str:
        """Generate code with retrieved context"""

        # Build comprehensive prompt
        prompt = self.build_code_generation_prompt(
            requirements, language, analysis, examples
        )

        # Generate code
        code = self.models['code_generator'](
            prompt,
            max_length=1500,
            temperature=0.2,  # Lower temperature for code
            stop_sequences=['```', '# End of code']
        )

        return self.extract_code_from_response(code)

    def build_code_generation_prompt(self, requirements: str, language: str,
                                   analysis: dict, examples: list) -> str:
        """Build comprehensive code generation prompt"""

        prompt = f"""# Code Generation Task

## Requirements
{requirements}

## Language
{language}

## Analysis
- Complexity: {analysis['complexity']}
- Components: {', '.join(analysis['components'])}
- Challenges: {', '.join(analysis['challenges'])}

## Similar Examples
"""

        for i, example in enumerate(examples[:2]):
            prompt += f"### Example {i+1}\n{example['code'][:300]}...\n\n"

        prompt += f"""
## Task
Generate complete, well-documented {language} code that implements the requirements.
Follow best practices and include error handling.

```python
"""

        return prompt

    def review_and_improve(self, code: str, issues: list) -> str:
        """Review and improve generated code"""

        review_prompt = f"""
        Review this code and fix the following issues:
        {', '.join(issues)}

        Original code:
        ```python
        {code}
        ```

        Provide improved version:
        """

        improved_code = self.models['code_reviewer'] (review_prompt)
        return self.extract_code_from_response(improved_code)

    def generate_documentation(self, code: str, requirements: str) -> str:
        """Generate comprehensive documentation"""

        doc_prompt = f"""
        Generate comprehensive documentation for this code:

        Requirements: {requirements}

        Code:
        ```python
        {code}
        ```

        Include:
        1. Overview
        2. Function descriptions
        3. Usage examples
        4. Dependencies
        """

        return self.models['code_generator'] (doc_prompt, max_length=800)

# Production configuration
code_gen_config = {
    'models': {
        'requirement_analyzer': 'llama-7b-analyzer-v2',
        'code_generator': 'llama-13b-coder-v3',
        'code_reviewer': 'llama-7b-reviewer-v2'
    },
    'validation': {
        'syntax_check': True,
        'security_scan': True,
        'performance_analysis': True
    },
    'caching': {
        'vector_store_ttl': 7200,
        'response_cache_ttl': 3600
    }
}

Performance Metrics

  • Code Quality Score: 8.7/10
  • Syntax Error Rate: 0.3%
  • Generation Time: 3.2 seconds average
  • User Adoption: 85% of developers using daily

Challenges & Solutions

Challenge 1: Code Quality and Security

  • Problem: Generated code had security vulnerabilities and poor practices
  • Solution: Implemented multi-stage validation and security scanning

Challenge 2: Context Understanding

  • Problem: Model struggled with complex, multi-step requirements
  • Solution: Built requirement analysis pipeline and retrieval-augmented generation

Challenge 3: Language-Specific Nuances

  • Problem: Code generation varied significantly across programming languages
  • Solution: Specialized fine-tuning for different languages and frameworks

Case Study 3: Financial Analysis AI

Challenge

An investment firm needed an AI system to analyze financial reports, predict market trends, and provide investment recommendations.

Advanced Implementation

class FinancialAnalysisAI:
    """Production financial analysis and prediction system"""

    def __init__(self):
        self.models = {
            'report_analyzer': self.load_report_model(),
            'trend_predictor': self.load_prediction_model(),
            'risk_assessor': self.load_risk_model(),
            'recommendation_engine': self.load_recommendation_model()
        }
        self.market_data = MarketDataAPI()
        self.compliance_checker = ComplianceChecker()

    def analyze_investment_opportunity(self, company_data: dict, market_context: dict) -> dict:
        """Complete investment analysis pipeline"""

        # Step 1: Analyze financial reports
        report_analysis = self.analyze_financial_reports(company_data['reports'])

        # Step 2: Predict market trends
        trend_prediction = self.predict_market_trends(market_context)

        # Step 3: Assess risk factors
        risk_assessment = self.assess_risk_factors(company_data, market_context)

        # Step 4: Generate investment recommendation
        recommendation = self.generate_recommendation(
            report_analysis,
            trend_prediction,
            risk_assessment,
            company_data
        )

        # Step 5: Compliance check
        compliance_result = self.compliance_checker.verify_recommendation(recommendation)

        # Step 6: Generate explanation
        explanation = self.generate_explanation(recommendation, analysis_results)

        return {
            'recommendation': recommendation,
            'confidence': self.calculate_confidence(analysis_results),
            'risk_level': risk_assessment['overall_risk'],
            'expected_return': trend_prediction['expected_return'],
            'time_horizon': recommendation['time_horizon'],
            'explanation': explanation,
            'compliance_status': compliance_result['status']
        }

    def analyze_financial_reports(self, reports: list) -> dict:
        """Analyze financial reports using fine-tuned model"""

        analysis_prompt = self.build_financial_analysis_prompt(reports)

        analysis = self.models['report_analyzer'](
            analysis_prompt,
            max_length=1000,
            temperature=0.1  # Low temperature for factual analysis
        )

        return self.parse_financial_analysis(analysis)

    def predict_market_trends(self, market_context: dict) -> dict:
        """Predict market trends using time series and contextual data"""

        # Gather market data
        market_data = self.market_data.get_recent_data(
            symbols=market_context['symbols'],
            timeframe='1Y'
        )

        prediction_prompt = self.build_prediction_prompt(market_data, market_context)

        prediction = self.models['trend_predictor'] (prediction_prompt)

        return self.parse_prediction(prediction)

    def assess_risk_factors(self, company_data: dict, market_context: dict) -> dict:
        """Comprehensive risk assessment"""

        risk_prompt = self.build_risk_assessment_prompt(company_data, market_context)

        risk_analysis = self.models['risk_assessor'] (risk_prompt)

        return self.parse_risk_assessment(risk_analysis)

    def generate_recommendation(self, report_analysis: dict, trend_prediction: dict,
                              risk_assessment: dict, company_data: dict) -> dict:
        """Generate investment recommendation using all analyses"""

        recommendation_prompt = f"""
        Based on the following analyses, provide an investment recommendation:

        Financial Analysis: {report_analysis}
        Market Prediction: {trend_prediction}
        Risk Assessment: {risk_assessment}
        Company Data: {company_data}

        Provide:
        1. Recommendation (BUY/HOLD/SELL)
        2. Confidence level (0-1)
        3. Time horizon
        4. Key factors influencing decision
        """

        recommendation = self.models['recommendation_engine'](
            recommendation_prompt,
            temperature=0.2
        )

        return self.parse_recommendation(recommendation)

    def build_financial_analysis_prompt(self, reports: list) -> str:
        """Build comprehensive financial analysis prompt"""

        prompt = "# Financial Report Analysis\n\n"

        for i, report in enumerate(reports[-3:]):  # Last 3 reports
            prompt += f"## Report {i+1}: {report['period']}\n"
            prompt += f"Revenue: {report['revenue']}\n"
            prompt += f"Net Income: {report['net_income']}\n"
            prompt += f"Assets: {report['total_assets']}\n"
            prompt += f"Liabilities: {report['total_liabilities']}\n\n"

        prompt += """
        Analyze these financial reports and provide:
        1. Revenue growth trends
        2. Profitability analysis
        3. Balance sheet strength
        4. Key financial ratios
        5. Overall financial health assessment
        """

        return prompt

    def calculate_confidence(self, analysis_results: dict) -> float:
        """Calculate overall confidence in recommendation"""

        # Weighted combination of individual analysis confidences
        weights = {
            'report_analysis': 0.4,
            'trend_prediction': 0.3,
            'risk_assessment': 0.3
        }

        confidence = 0
        for analysis_type, weight in weights.items():
            analysis_confidence = analysis_results[analysis_type].get('confidence', 0.5)
            confidence += analysis_confidence * weight

        return confidence

# Production configuration
financial_config = {
    'models': {
        'report_analyzer': 'llama-13b-financial-v3',
        'trend_predictor': 'llama-7b-prediction-v2',
        'risk_assessor': 'llama-7b-risk-v2',
        'recommendation_engine': 'llama-13b-recommendation-v3'
    },
    'data_sources': {
        'market_data_api': 'alpha_vantage',
        'financial_reports': 'edgar_api',
        'news_sentiment': 'newsapi'
    },
    'compliance': {
        'sec_regulation_check': True,
        'risk_limits': True,
        'audit_trail': True
    },
    'monitoring': {
        'prediction_accuracy_tracking': True,
        'portfolio_performance_monitoring': True,
        'model_drift_detection': True
    }
}

Performance Metrics

  • Prediction Accuracy: 78% directional accuracy
  • Risk Assessment: 92% coverage of key risk factors
  • Response Time: 4.1 seconds for complete analysis
  • Compliance: 100% regulatory requirement satisfaction

Challenges & Solutions

Challenge 1: Regulatory Compliance

  • Problem: Financial recommendations must comply with SEC regulations
  • Solution: Built comprehensive compliance checking and audit trails

Challenge 2: Market Data Integration

  • Problem: Need real-time market data and historical trends
  • Solution: Integrated multiple financial data APIs with caching and failover

Challenge 3: Risk Management

  • Problem: Investment recommendations carry significant financial risk
  • Solution: Implemented multi-layer risk assessment and conservative thresholds

Production Deployment Patterns

Multi-Region Deployment

# Multi-region deployment configuration
production_deployment = {
    'regions': {
        'us-east': {
            'instances': 5,
            'model': 'llama-13b-v3',
            'traffic_weight': 0.6
        },
        'us-west': {
            'instances': 3,
            'model': 'llama-13b-v3',
            'traffic_weight': 0.4
        },
        'eu-central': {
            'instances': 2,
            'model': 'llama-7b-v3',
            'traffic_weight': 0.3
        }
    },
    'load_balancer': {
        'algorithm': 'weighted_round_robin',
        'health_checks': {
            'interval': 30,
            'timeout': 10,
            'unhealthy_threshold': 3
        }
    },
    'failover': {
        'primary_region': 'us-east',
        'backup_regions': ['us-west', 'eu-central'],
        'automatic_failover': True
    }
}

Continuous Learning Pipeline

class ContinuousLearningPipeline:
    """Production system for continuous model improvement"""

    def __init__(self):
        self.feedback_collector = FeedbackCollector()
        self.model_retrainer = ModelRetrainer()
        self.performance_monitor = PerformanceMonitor()

    def run_continuous_learning_cycle(self):
        """Execute one cycle of continuous learning"""

        # Step 1: Collect user feedback and performance data
        feedback_data = self.feedback_collector.collect_recent_feedback()

        # Step 2: Analyze performance and identify improvement areas
        performance_analysis = self.performance_monitor.analyze_performance()

        # Step 3: Generate new training data from feedback
        new_training_data = self.generate_training_data_from_feedback(feedback_data)

        # Step 4: Retrain model with new data
        if len(new_training_data) > 100:  # Minimum threshold
            self.model_retrainer.retrain_model(new_training_data)

        # Step 5: A/B test new model
        test_results = self.run_ab_test()

        # Step 6: Deploy if performance improved
        if test_results['improvement'] > 0.05:  # 5% improvement threshold
            self.deploy_new_model()

        # Step 7: Log results
        self.log_learning_cycle_results(feedback_data, performance_analysis, test_results)

    def generate_training_data_from_feedback(self, feedback: list) -> list:
        """Generate training data from user feedback"""

        training_examples = []

        for item in feedback:
            if item['rating'] < 3:  # Poor performance
                # Generate improved response
                improved_response = self.generate_improved_response(
                    item['input'],
                    item['original_response'],
                    item['feedback']
                )

                training_examples.append({
                    'instruction': item['input'],
                    'output': improved_response,
                    'source': 'user_feedback'
                })

        return training_examples

    def run_ab_test(self) -> dict:
        """Run A/B test between old and new models"""

        test_prompts = self.load_test_prompts()
        results = {'old_model': [], 'new_model': []}

        for prompt in test_prompts:
            old_response = self.generate_with_model(prompt, 'old_model')
            new_response = self.generate_with_model(prompt, 'new_model')

            # In practice, collect human preferences
            preference = self.simulate_preference_test(old_response, new_response)

            results['old_model'].append(preference == 'old')
            results['new_model'].append(preference == 'new')

        improvement = (
            sum(results['new_model']) / len(results['new_model']) -
            sum(results['old_model']) / len(results['old_model'])
        )

        return {'improvement': improvement, 'results': results}

    def deploy_new_model(self):
        """Deploy new model to production"""

        # Gradual rollout
        self.set_traffic_split(old_model=0.9, new_model=0.1)

        # Monitor for 24 hours
        time.sleep(24 * 3600)

        # Full deployment if no issues
        if not self.detect_performance_issues():
            self.set_traffic_split(old_model=0.0, new_model=1.0)
        else:
            # Rollback
            self.set_traffic_split(old_model=1.0, new_model=0.0)

Lessons Learned

Key Takeaways from Production Deployments

  1. Start Simple, Scale Smart

    • Begin with minimal viable product
    • Implement proper monitoring from day one
    • Plan for scaling before it becomes necessary
  2. Quality Over Quantity

    • Focus on high-quality training data
    • Implement rigorous validation pipelines
    • Regular human evaluation is essential
  3. Robustness is Critical

    • Handle edge cases gracefully
    • Implement comprehensive error handling
    • Have fallback mechanisms for all critical paths
  4. Performance Monitoring

    • Track latency, throughput, and error rates
    • Set up alerts for performance degradation
    • Regular model performance audits
  5. Continuous Improvement

    • Collect user feedback systematically
    • Implement continuous learning pipelines
    • Regular model updates based on real usage
  6. Security and Compliance

    • Implement proper authentication and authorization
    • Regular security audits
    • Compliance with relevant regulations
  7. Cost Optimization

    • Optimize model size and inference costs
    • Implement intelligent caching
    • Use auto-scaling to match demand

Future Directions

Emerging Trends in Production AI

  1. Multi-Modal Models

    • Integration of text, image, and audio understanding
    • More comprehensive AI assistants
  2. Edge Deployment

    • Running models on edge devices
    • Reduced latency and bandwidth usage
  3. Federated Learning at Scale

    • Privacy-preserving collaborative training
    • Cross-organization model improvement
  4. Automated MLOps

    • End-to-end automated pipelines
    • Self-healing and self-optimizing systems
  5. Explainable AI

    • Better model interpretability
    • Regulatory compliance for high-stakes applications

Conclusion

Congratulations! 🎉 You've completed the comprehensive LLaMA Factory tutorial and explored real-world production deployments. The case studies demonstrate that successful AI deployment requires:

  • Technical Excellence: Robust engineering and optimization
  • Quality Focus: Rigorous testing and validation
  • User-Centric Design: Understanding and meeting user needs
  • Continuous Learning: Systems that improve over time
  • Ethical Considerations: Responsible AI development

Remember, the most successful AI systems are those that provide real value to users while maintaining high standards of quality, reliability, and ethics. Your journey into production AI has just begun - keep learning, experimenting, and building amazing applications!


Final Practice Projects:

  1. Deploy a complete AI customer support system
  2. Build a code generation platform with quality assurance
  3. Create a financial analysis AI with compliance features
  4. Implement continuous learning for any of your models
  5. Set up comprehensive monitoring and alerting

What's your next production AI project? 🚀

What Problem Does This Solve?

Most teams struggle here because the hard part is not writing more code, but deciding clear boundaries for self, Step, dict so behavior stays predictable as complexity grows.

In practical terms, this chapter helps you avoid three common failures:

  • coupling core logic too tightly to one implementation path
  • missing the handoff boundaries between setup, execution, and validation
  • shipping changes without clear rollback or observability strategy

After working through this chapter, you should be able to reason about Chapter 8: Production Case Studies as an operating subsystem inside LLaMA-Factory Tutorial: Unified Framework for LLM Training and Fine-tuning, with explicit contracts for inputs, state transitions, and outputs.

Use the implementation notes around prompt, code, models as your checklist when adapting these patterns to your own repository.

How it Works Under the Hood

Under the hood, Chapter 8: Production Case Studies usually follows a repeatable control path:

  1. Context bootstrap: initialize runtime config and prerequisites for self.
  2. Input normalization: shape incoming data so Step receives stable contracts.
  3. Core execution: run the main logic branch and propagate intermediate state through dict.
  4. Policy and safety checks: enforce limits, auth scopes, and failure boundaries.
  5. Output composition: return canonical result payloads for downstream consumers.
  6. Operational telemetry: emit logs/metrics needed for debugging and performance tuning.

When debugging, walk this sequence in order and confirm each stage has explicit success/failure conditions.

Source Walkthrough

Use the following upstream sources to verify implementation details while reading this chapter:

  • src/llamafactory/api/app.py FastAPI application implementing an OpenAI-compatible REST API (/v1/chat/completions, /v1/models). This is the production serving layer used when deploying fine-tuned models as an API endpoint. Shows model loading, streaming response generation, and error handling.

  • src/llamafactory/chat/hf_engine.py HuggingFace inference engine used by the API server and chat interface. Handles tokenization, generation parameters (temperature, top-p, repetition penalty), and streaming token output. Key file for understanding production inference throughput.

  • src/llamafactory/train/sft/workflow.py Contains run_sft() and run_export(). The run_export() function implements LoRA adapter merging into the base model weights for deployment - critical for the case study pattern of train-then-export-then-serve.

  • examples/ Production-ready YAML configuration examples for common fine-tuning recipes: full_multi_gpu/, lora_single_gpu/, qlora_deepspeed/. These reference configs align with the case study training patterns shown in this chapter.

  • src/llamafactory/extras/callbacks.py Training callbacks for logging loss curves, saving checkpoints, and triggering early stopping. In production pipelines, these callbacks feed metrics into monitoring systems and enable the continuous learning cycle.

  • src/llamafactory/eval/evaluator.py Evaluation pipeline for benchmarking fine-tuned models on standard datasets (MMLU, C-Eval, etc.). Used in the production quality gate step before deploying a new model version.

Suggested trace strategy:

  • Trace src/llamafactory/api/app.pycreate_app() to understand production API bootstrapping with adapter loading
  • Compare src/llamafactory/chat/hf_engine.py ChatModel.stream_chat() with vLLM async engine for throughput differences
  • Review examples/lora_single_gpu/llama3_lora_sft.yaml as a concrete template before adapting configs for domain-specific deployments

Chapter Connections