Commit f758582

Merge pull request #2039 from oracle-devrel/aliottoman-patch-16
Update README.txt
2 parents 401d4b6 + 3a04a8a commit f758582

1 file changed

Lines changed: 251 additions & 14 deletions

File tree

  • ai/generative-ai-service/complex-document-rag
Original file line number | Diff line number | Diff line change
@@ -1,22 +1,259 @@
1-
Complex Decument RAG
2-
This is a Retrieval-Augmented Generation (RAG) system for generating comprehensive business reports from multiple document sources using Oracle Cloud Infrastructure (OCI) Generative AI services.
1+
# RAG Report Generator
3 2

4-
This asset lives under: ai/generative-ai-service/complex-document-rag
3+
An enterprise-grade Retrieval-Augmented Generation (RAG) system for generating comprehensive business reports from multiple document sources using Oracle Cloud Infrastructure (OCI) Generative AI services.
5 4

6-
When to use this asset?
7-
When you're dealing with complex, hard-to-parse, data-rich Excel sheets and the generation task is highly structured, e.g. creating compliance reports.
5+
## Features
8 6

9-
How to use this asset?
10-
See the full setup and usage guide in files/README.md.
7+
- **Multi-Document Processing**: Ingest and process PDF and XLSX documents
8+
- **Multiple Embedding Models**: Support for Cohere multilingual and v4.0 embeddings
9+
- **Advanced LLM Support**: Integration with OCI models (Grok-3, Grok-4, Llama 3.3, Cohere Command)
10+
- **Agentic Workflows**: Multi-agent system for intelligent report generation
11+
- **Hierarchical Report Structure**: Automatically organizes content based on user queries
12+
- **Citation Tracking**: Source attribution with references
13+
- **Multi-Language Support**: Generate reports in English, Arabic, Spanish, and French
14+
- **Visual Analytics**: Automatic chart and table generation from data
11 15

12-
License
13-
Copyright (c) 2025 Oracle and/or its affiliates.
16+
## Prerequisites
14 17

15-
Licensed under the Universal Permissive License (UPL), Version 1.0.
18+
- Python 3.11+
19+
- OCI Account with Generative AI service access
20+
- OCI CLI configured with appropriate credentials
21+
22+
## Installation
23+
24+
1. Clone the repository:
25+
```bash
git clone <repository-url>
cd agentic_rag
```
29+
30+
2. Create a virtual environment:
31+
```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```
35+
36+
3. Install dependencies:
37+
```bash
pip install -r requirements.txt
```
40+
41+
4. Configure OCI credentials:
42+
```bash
# Create OCI config directory if it doesn't exist
mkdir -p ~/.oci

# Add your OCI configuration to ~/.oci/config
# See: https://docs.oracle.com/en-us/iaas/Content/API/Concepts/sdkconfig.htm
```
49+
50+
5. Set up environment variables:
51+
```bash
# Create .env file with your configuration
cat > .env << EOF
# OCI Configuration
OCI_COMPARTMENT_ID=your-compartment-id
COMPARTMENT_ID_DAC=your-dac-compartment-id  # If using dedicated cluster

# Model IDs (get from OCI Console)
OCI_GROK_3_MODEL_ID=your-grok3-model-id
OCI_GROK_4_MODEL_ID=your-grok4-model-id
OCI_LLAMA_3_3_MODEL_ID=your-llama-model-id
OCI_COHERE_COMMAND_A_MODEL_ID=your-cohere-model-id

# Default Models (optional)
DEFAULT_EMBEDDING_MODEL=cohere-embed-multilingual-v3.0
DEFAULT_LLM_MODEL=grok-3
EOF
```
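Before launching the app, it can help to confirm that the variables above are actually filled in. The following is a minimal sketch using only the standard library. The helper names `parse_env` and `missing_vars` are hypothetical and not part of this repository, and treating only the compartment ID plus at least one model ID as mandatory is an assumption of this sketch.

```python
from pathlib import Path

# Variable names come from the .env template above. Which of them are
# strictly required is an assumption made for this sketch.
REQUIRED = ["OCI_COMPARTMENT_ID"]
MODEL_IDS = [
    "OCI_GROK_3_MODEL_ID",
    "OCI_GROK_4_MODEL_ID",
    "OCI_LLAMA_3_3_MODEL_ID",
    "OCI_COHERE_COMMAND_A_MODEL_ID",
]


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, ignoring blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as "value  # note".
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def is_set(values: dict[str, str], key: str) -> bool:
    """True if the key has a value that is not a 'your-...' placeholder."""
    value = values.get(key, "")
    return bool(value) and not value.startswith("your-")


def missing_vars(values: dict[str, str]) -> list[str]:
    """List the settings that are unset or still hold template placeholders."""
    problems = [key for key in REQUIRED if not is_set(values, key)]
    if not any(is_set(values, key) for key in MODEL_IDS):
        problems.append("at least one of: " + ", ".join(MODEL_IDS))
    return problems


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.exists():
        for problem in missing_vars(parse_env(env_file.read_text())):
            print(f"Missing or placeholder: {problem}")
```

Running the script from the project root prints one line per setting that still needs attention and prints nothing when the file looks complete.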
69+
70+
## Quick Start
71+
72+
1. Launch the Gradio interface:
73+
```bash
python gradio_app.py
```
76+
77+
2. Open your browser to `http://localhost:7863`
78+
79+
3. Follow these steps in the interface:
80+
- **Document Processing Tab**: Upload and process your documents (PDF/XLSX). Sample files are in the `sample_data` folder.
81+
- **Vector Store Viewer Tab**: View and manage your document collections
82+
- **Inference & Query Tab**: Enter queries and generate reports. Sample queries are in the `sample_queries` folder.
83+
84+
## Usage Guide
85+
86+
### Document Processing
87+
88+
1. Select an embedding model (e.g., cohere-embed-multilingual-v3.0)
89+
2. Upload documents:
90+
- **XLSX**: Financial data, ESG metrics, structured data
91+
- **PDF**: Reports, policies, unstructured documents
92+
3. Specify the entity name for each document, i.e., the name of the bank or institution
93+
4. Click "Process" to ingest into the vector store
94+
95+
### Generating Reports
96+
97+
1. In the **Inference & Query** tab:
98+
- Enter your query (can be structured with numbered sections)
99+
- Select LLM model (Grok-3 recommended for reports)
100+
- Choose data sources (PDF/XLSX collections)
101+
- Enable "Agentic Workflow" for comprehensive multi-agent reports
102+
- Click "Run Query"
103+
104+
2. Example structured query:
105+
```
Prepare a comprehensive ESG comparison report between Company A and Company B:

1) Climate Impact & Emissions
   - Net-zero commitments and targets
   - Scope 1, 2, and 3 emissions

2) Social & Governance
   - Diversity targets
   - Board oversight

3) Financial Performance
   - Revenue and profitability
   - ESG investments
```
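How the application itself turns a structured query into report sections is not shown in this README. As an illustration only, a query in the numbered format above could be split into a title, section headings, and sub-topics roughly like this. The function `parse_structured_query` and the shape of its return value are hypothetical.

```python
import re

SECTION_RE = re.compile(r"^\s*(\d+)\)\s*(.+)$")  # e.g. "1) Climate Impact & Emissions"
BULLET_RE = re.compile(r"^\s*-\s*(.+)$")         # e.g. "- Net-zero commitments"


def parse_structured_query(query: str) -> dict:
    """Split a query into a title and numbered sections with bullet sub-topics."""
    outline = {"title": "", "sections": []}
    for line in query.splitlines():
        if not line.strip():
            continue
        section = SECTION_RE.match(line)
        bullet = BULLET_RE.match(line)
        if section:
            outline["sections"].append(
                {"heading": section.group(2).strip(), "topics": []}
            )
        elif bullet and outline["sections"]:
            outline["sections"][-1]["topics"].append(bullet.group(1).strip())
        elif not outline["sections"]:
            # Text before the first numbered section is treated as the title.
            outline["title"] = (outline["title"] + " " + line.strip()).strip()
    return outline
```

Each entry in `sections` then carries one heading and its list of sub-topics, which maps naturally onto a hierarchical report outline.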
120+
121+
### Report Features
122+
123+
Generated reports include:
124+
- Executive summary addressing your specific query
125+
- Hierarchically organized sections
126+
- Data tables and visualizations
127+
- Source citations [1], [2] for traceability
128+
- References section with full source details
129+
- Professional formatting (Times New Roman, black headings)
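To make the traceability point concrete, here is a small sketch that checks whether every `[n]` marker in a report body has a matching entry in the references list. It assumes the report is available as plain text and the references as a number-to-source mapping; both are assumptions, since this README does not describe the report's internal format, and `check_citations` is a hypothetical helper.

```python
import re

CITATION_RE = re.compile(r"\[(\d+)\]")  # matches markers such as [1] or [12]


def check_citations(report_text: str, references: dict[int, str]) -> dict[str, list[int]]:
    """Compare citation markers in the text against the reference numbers."""
    cited = {int(number) for number in CITATION_RE.findall(report_text)}
    listed = set(references)
    return {
        "missing": sorted(cited - listed),  # cited in text, no reference entry
        "unused": sorted(listed - cited),   # reference entry never cited
    }
```

An empty `missing` list means every marker in the text resolves to a reference entry.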
130+
131+
## Project Structure
132+
133+
```
agentic_rag/
├── gradio_app.py              # Main application interface
├── local_rag_agent.py         # Core RAG system logic
├── vector_store.py            # Vector database management
├── oci_embedding_handler.py   # OCI embedding services
├── agents/
│   ├── agent_factory.py       # Agent creation and management
│   └── report_writer_agent.py # Report generation logic
├── handlers/
│   ├── query_handler.py       # Query processing
│   ├── pdf_handler.py         # PDF document processing
│   ├── xlsx_handler.py        # Excel document processing
│   └── vector_handler.py      # Vector store operations
├── ingest_pdf.py              # PDF ingestion pipeline
├── ingest_xlsx.py             # Excel ingestion pipeline
├── sample_data/               # Sample documents for testing
├── sample_queries/            # Example queries for reports
└── utils/
    └── demo_logger.py         # Logging utilities
```
154+
155+
## Advanced Configuration
156+
157+
### Embedding Models
158+
159+
Available embedding models:
160+
- `cohere-embed-multilingual-v3.0` (1024 dimensions)
161+
- `cohere-embed-v4.0` (1024 dimensions)
162+
- `chromadb-default` (384 dimensions, local)
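Vectors produced by models with different dimensions cannot be compared directly, so a guard like the following can catch a mismatch early. This is a sketch only: the dimensions are copied from the list above, and `validate_embedding` is a hypothetical helper rather than a function from this repository.

```python
# Dimensions as listed in this README.
EMBEDDING_DIMENSIONS = {
    "cohere-embed-multilingual-v3.0": 1024,
    "cohere-embed-v4.0": 1024,
    "chromadb-default": 384,
}


def validate_embedding(model: str, vector: list[float]) -> None:
    """Raise ValueError if the model is unknown or the vector length is wrong."""
    expected = EMBEDDING_DIMENSIONS.get(model)
    if expected is None:
        raise ValueError(f"unknown embedding model: {model}")
    if len(vector) != expected:
        raise ValueError(
            f"{model} expects {expected} dimensions, got {len(vector)}"
        )
```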
16 163

17-
See LICENSE for more details.
164+
### LLM Models
165+
166+
Supported OCI Generative AI models:
167+
- **Grok-3**: Best for comprehensive reports (16K output tokens)
168+
- **Grok-4**: Advanced reasoning (120K output tokens)
169+
- **Llama 3.3**: Fast inference (4K output tokens)
170+
- **Cohere Command**: Instruction following (4K output tokens)
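The output limits above can drive a simple model choice when the expected report length is known. The sketch below is illustrative: only the key `grok-3` appears elsewhere in this README, the other keys are hypothetical names, and reading "K" as 1,000 tokens is an assumption.

```python
# Output-token limits from the list above ("K" read as 1,000, an assumption).
MAX_OUTPUT_TOKENS = {
    "grok-3": 16_000,
    "grok-4": 120_000,
    "llama-3.3": 4_000,        # hypothetical key
    "cohere-command": 4_000,   # hypothetical key
}


def models_for_budget(required_output_tokens: int) -> list[str]:
    """Return models whose output limit covers the request, smallest limit first."""
    fits = [
        (limit, name)
        for name, limit in MAX_OUTPUT_TOKENS.items()
        if limit >= required_output_tokens
    ]
    return [name for _limit, name in sorted(fits)]
```

For example, a report expected to need about 10,000 output tokens rules out the two 4K models and leaves Grok-3 as the smallest model that fits.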
171+
172+
### Vector Store Management
173+
174+
- Collections are automatically created per embedding model
175+
- Switch between models without data loss
176+
- Delete collections via the Vector Store Viewer tab
177+
178+
## Troubleshooting
179+
180+
### Common Issues
181+
182+
1. **OCI Authentication Error**
183+
- Verify ~/.oci/config is properly configured
184+
- Check compartment ID in .env file
185+
- Ensure your user has appropriate IAM policies
186+
187+
2. **Embedding Model Errors**
188+
- Verify model IDs in .env file
189+
- Check OCI service limits and quotas
190+
- Ensure embedding service is enabled in your region
191+
192+
3. **Memory Issues**
193+
- For large documents, process in smaller batches
194+
- Adjust chunk size in ingestion settings
195+
- Consider using pagination for large result sets
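For the authentication case, a quick standard-library check of `~/.oci/config` can narrow down the problem before involving the SDK. The sketch below assumes the usual INI layout with a `[DEFAULT]` profile and the entries `user`, `fingerprint`, `tenancy`, `region`, and `key_file` described in the OCI SDK configuration documentation linked in the installation steps. The function `check_oci_config` is a hypothetical helper, and it does not validate the credentials themselves.

```python
import configparser
from pathlib import Path

REQUIRED_KEYS = ("user", "fingerprint", "tenancy", "region", "key_file")


def check_oci_config(path: str = "~/.oci/config", profile: str = "DEFAULT") -> list[str]:
    """Return a list of human-readable problems found in the OCI config file."""
    config_path = Path(path).expanduser()
    if not config_path.is_file():
        return [f"config file not found: {config_path}"]

    # Interpolation is disabled so '%' characters in values are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    try:
        parser.read(config_path)
    except configparser.Error as exc:
        return [f"could not parse {config_path}: {exc}"]

    if profile != parser.default_section and not parser.has_section(profile):
        return [f"profile [{profile}] not found in {config_path}"]

    section = parser[profile]
    problems = [
        f"missing key: {key}"
        for key in REQUIRED_KEYS
        if not section.get(key, "").strip()
    ]
    key_file = section.get("key_file", "").strip()
    if key_file and not Path(key_file).expanduser().is_file():
        problems.append(f"key_file does not exist: {key_file}")
    return problems


if __name__ == "__main__":
    for problem in check_oci_config():
        print(problem)
```

An empty result only means the file is structurally complete; IAM policy and compartment issues still need to be checked in the OCI Console.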
196+
197+
### Logs
198+
199+
Check `logs/app.log` for detailed debugging information.
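When the log file is long, filtering for the most recent error entries is often quicker than reading it top to bottom. The sketch below assumes each log line contains a standard level name such as `ERROR` or `CRITICAL`; the actual format written by this application is not documented here, and `recent_problems` is a hypothetical helper.

```python
from collections import deque
from pathlib import Path


def recent_problems(lines, levels=("ERROR", "CRITICAL"), limit=20):
    """Return the last `limit` lines that mention one of the given level names."""
    matches = deque(maxlen=limit)  # keeps only the most recent matches
    for line in lines:
        if any(level in line for level in levels):
            matches.append(line.rstrip("\n"))
    return list(matches)


if __name__ == "__main__":
    log_path = Path("logs/app.log")
    if log_path.exists():
        with log_path.open(encoding="utf-8", errors="replace") as handle:
            for entry in recent_problems(handle):
                print(entry)
```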
200+
201+
## API Usage (Optional)
202+
203+
For programmatic access:
204+
205+
```python
from local_rag_agent import RAGSystem
from vector_store import EnhancedVectorStore

# Initialize system
vector_store = EnhancedVectorStore(
    persist_directory="embed-cohere-embed-multilingual-v3.0",
    embedding_model="cohere-embed-multilingual-v3.0"
)

rag_system = RAGSystem(
    vector_store=vector_store,
    model_name="grok-3",
    use_cot=True
)

# Process query
response = rag_system.process_query("Your query here")
print(response["answer"])
```
225+
226+
## Contributing
227+
228+
1. Fork the repository
229+
2. Create a feature branch
230+
3. Make your changes
231+
4. Run tests: `python -m pytest tests/`
232+
5. Submit a pull request
233+
238+
## Support
239+
240+
For issues and questions:
241+
- Check the logs in `logs/app.log`
242+
- Review the troubleshooting section
243+
- Open an issue on GitHub
244+
245+
## Acknowledgments
246+
247+
- Oracle Cloud Infrastructure for Generative AI services
248+
- Gradio for the web interface
249+
- ChromaDB for vector storage
250+
- The open-source community
251+
252+
## License
253+
Copyright (c) 2024 Oracle and/or its affiliates.
254+
255+
Licensed under the Universal Permissive License (UPL), Version 1.0.
18 256

19-
ORACLE AND ITS AFFILIATES DO NOT PROVIDE ANY WARRANTY WHATSOEVER, EXPRESS OR IMPLIED, FOR ANY SOFTWARE, MATERIAL OR CONTENT OF ANY KIND CONTAINED OR PRODUCED WITHIN THIS REPOSITORY, AND IN PARTICULAR SPECIFICALLY DISCLAIM ANY AND ALL IMPLIED WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE. FURTHERMORE, ORACLE AND ITS AFFILIATES DO NOT REPRESENT THAT ANY CUSTOMARY SECURITY REVIEW HAS BEEN PERFORMED WITH RESPECT TO ANY SOFTWARE, MATERIAL OR CONTENT CONTAINED OR PRODUCED WITHIN THIS REPOSITORY. IN ADDITION, AND WITHOUT LIMITING THE FOREGOING, THIRD PARTIES MAY HAVE POSTED SOFTWARE, MATERIAL OR CONTENT TO THIS REPOSITORY WITHOUT ANY REVIEW. USE AT YOUR OWN RISK.
257+
See [LICENSE](LICENSE.txt) for more details.
20 258

21-
Disclaimer
22-
ORACLE AND ITS AFFILIATES DO NOT PROVIDE ANY WARRANTY WHATSOEVER, EXPRESS OR IMPLIED, FOR ANY SOFTWARE, MATERIAL OR CONTENT OF ANY KIND CONTAINED OR PRODUCED WITHIN THIS REPOSITORY, AND IN PARTICULAR SPECIFICALLY DISCLAIM ANY AND ALL IMPLIED WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE. FURTHERMORE, ORACLE AND ITS AFFILIATES DO NOT REPRESENT THAT ANY CUSTOMARY SECURITY REVIEW HAS BEEN PERFORMED WITH RESPECT TO ANY SOFTWARE, MATERIAL OR CONTENT CONTAINED OR PRODUCED WITHIN THIS REPOSITORY. IN ADDITION, AND WITHOUT LIMITING THE FOREGOING, THIRD PARTIES MAY HAVE POSTED SOFTWARE, MATERIAL OR CONTENT TO THIS REPOSITORY WITHOUT ANY REVIEW. USE AT YOUR OWN RISK.
259+
ORACLE AND ITS AFFILIATES DO NOT PROVIDE ANY WARRANTY WHATSOEVER, EXPRESS OR IMPLIED, FOR ANY SOFTWARE, MATERIAL OR CONTENT OF ANY KIND CONTAINED OR PRODUCED WITHIN THIS REPOSITORY, AND IN PARTICULAR SPECIFICALLY DISCLAIM ANY AND ALL IMPLIED WARRANTIES OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE. FURTHERMORE, ORACLE AND ITS AFFILIATES DO NOT REPRESENT THAT ANY CUSTOMARY SECURITY REVIEW HAS BEEN PERFORMED WITH RESPECT TO ANY SOFTWARE, MATERIAL OR CONTENT CONTAINED OR PRODUCED WITHIN THIS REPOSITORY. IN ADDITION, AND WITHOUT LIMITING THE FOREGOING, THIRD PARTIES MAY HAVE POSTED SOFTWARE, MATERIAL OR CONTENT TO THIS REPOSITORY WITHOUT ANY REVIEW. USE AT YOUR OWN RISK.
