|
| 1 | +# ⚖️ OCI Enterprise AI — Legal Due Diligence Agent |
| 2 | + |
| 3 | +A multi-step agentic application that performs M&A contract due diligence using **OCI Generative AI Responses API**. The agent autonomously parses contracts, extracts key clauses, compares them against market standards, identifies cross-contract conflicts, and produces a structured risk register, all orchestrated through a single API call with client-side tool execution. |
| 4 | + |
| 5 | +--- |
| 6 | + |
| 7 | +## When to Use This Asset |
| 8 | + |
| 9 | +### The Challenge |
| 10 | + |
| 11 | +M&A due diligence is one of the most time-intensive and costly stages of any transaction. Associates and junior lawyers spend days, sometimes weeks, manually reviewing hundreds of contracts to surface risks, flag non-standard terms, and identify conflicts across agreements. The work is repetitive, error-prone under time pressure, and directly impacts deal timelines and advisory fees. |
| 12 | + |
| 13 | +Similar challenges exist across any industry where large volumes of contracts or regulatory documents need systematic review: commercial real estate transactions, insurance policy audits, procurement compliance, government tender evaluations, and regulatory filings. |
| 14 | + |
| 15 | +### Who Is This For? |
| 16 | + |
| 17 | +- **Law firms and legal departments** looking to accelerate due diligence cycles, reduce associate hours on document review, and deliver more consistent risk assessments across deals. |
| 18 | +- **Financial services and investment firms** performing portfolio-level contract analysis for acquisitions, fund restructurings, or regulatory compliance reviews. |
| 19 | +- **Real estate companies** reviewing lease portfolios, tenant agreements, and property management contracts at scale during acquisitions or portfolio rebalancing. |
| 20 | +- **Government and public sector organizations** auditing vendor contracts, procurement agreements, and compliance documentation across departments. |
| 21 | +- **Insurance companies** analyzing policy documents, underwriting agreements, and reinsurance treaties for risk exposure and coverage gaps. |
| 22 | +- **System integrators and legal tech companies** building document intelligence platforms for their clients on OCI infrastructure. |
| 23 | + |
| 24 | +### When to Use It |
| 25 | + |
| 26 | +- When you need to review **multiple contracts simultaneously** and identify risks both within individual agreements and across the full contract set. |
| 27 | +- When the document volume or deal timeline makes **manual review impractical** and you need to surface the most critical risks first. |
| 28 | +- When **data sovereignty matters**, the entire pipeline runs on OCI with no data leaving your tenancy, making it suitable for government, financial, and healthcare use cases where documents cannot be sent to third-party APIs. |
| 29 | +- When you need a **repeatable, auditable process**, every tool call, input, and output is logged, providing a clear trail of how each risk was identified. |
| 30 | + |
| 31 | +### Key Capabilities |
| 32 | + |
| 33 | +- **Automated contract parsing** — extracts text and structured metadata (parties, dates, governing law, financial terms) from PDF contracts |
| 34 | +- **Clause extraction by category** — identifies and categorizes termination, change of control, assignment, liability, penalty, exclusivity, non-compete, confidentiality, data protection, financial, and IP clauses |
| 35 | +- **Market standard benchmarking** — compares extracted terms against configurable industry baselines and flags deviations as aggressive, favorable, missing, or unusual |
| 36 | +- **Cross-contract conflict detection** — analyzes clauses across all contracts together to find contradictions, overlapping obligations, and terms that could block a transaction |
| 37 | +- **Structured risk register** — produces a prioritized risk matrix with severity ratings, financial exposure estimates, and actionable recommendations |
| 38 | + |
| 39 | +--- |
| 40 | + |
| 41 | +## How to Use This Asset |
| 42 | + |
| 43 | +### For a Quick Demo |
| 44 | + |
| 45 | +1. Clone the repo and install dependencies |
| 46 | +2. Set up your OCI Generative AI project and API key (see [Setup](#setup)) |
| 47 | +3. Run `streamlit run app.py` |
| 48 | +4. Click "Run Due Diligence Analysis" with the included sample contracts |
| 49 | +5. Watch the agent autonomously chain through 5 analysis stages and produce a risk register |
| 50 | + |
| 51 | +### For a PoC |
| 52 | + |
| 53 | +The agent is designed to be modular. Each tool is an independent function that you can replace with real integrations: |
| 54 | + |
| 55 | +- **`tool_parse_contract`** — currently uses PyMuPDF for PDF text extraction. For production, you can swap this with **OCI Document Understanding** for structured extraction, or use a **multimodal model** such as **Cohere Command A Vision** (`cohere.command-a-03-2025`) to handle scanned PDFs, handwritten documents, faxed contracts, and image-based files that standard text extraction cannot process. The multimodal approach sends the document page as an image input to the model, which can read and structure content regardless of whether it was digitally created or physically scanned. |
| 56 | +- **`tool_extract_clauses`** — uses an LLM call with a detailed extraction prompt. For scanned or handwritten documents, the same multimodal vision approach applies here: pass document page images directly to a vision-capable model to extract clauses that OCR-based pipelines would miss or misread. |
| 57 | +- **`tool_compare_to_market`** — currently uses hardcoded market baselines combined with LLM judgment. For production, replace the hardcoded baselines with a **web search tool** (the Responses API supports `web_search_preview` as a built-in tool) to fetch current market data, or implement a **RAG pipeline** over a corpus of template contracts and legal benchmarking databases. This makes the comparison dynamic and always up to date rather than static. |
| 58 | +- **`tool_cross_reference_conflicts`** — sends all clause data to the model for conflict detection. For larger document sets, consider chunking and parallel processing. |
| 59 | +- **`tool_generate_risk_register`** — synthesizes all findings into the final output. Customize the risk categories, severity thresholds, and output format to match your customer's internal risk framework. |
| 60 | + |
| 61 | +### Customizing the Agent Behavior |
| 62 | + |
| 63 | +The agent's decision-making is controlled by two things: |
| 64 | + |
| 65 | +1. **`AGENT_INSTRUCTIONS`** (in `app.py`) — the system prompt that tells the model what workflow to follow and in what order to call tools. |
| 66 | +2. **Individual tool `system_prompt` strings** inside each `tool_*` function — these control what each tool extracts, how it analyzes, and what format it returns. |
| 67 | + |
| 68 | +Modify these prompts to adapt the agent to different industries or use cases without changing any code logic. |
| 69 | + |
| 70 | +--- |
| 71 | + |
| 72 | +## Architecture |
| 73 | + |
| 74 | +``` |
| 75 | +┌─────────────────────────────────────────────────────┐ |
| 76 | +│ Streamlit UI │ |
| 77 | +│ Oracle Redwood Dark Theme │ |
| 78 | +├─────────────────────────────────────────────────────┤ |
| 79 | +│ │ |
| 80 | +│ ┌─────────────────────────────────────────────┐ │ |
| 81 | +│ │ OCI Responses API (Outer Agent) │ │ |
| 82 | +│ │ Model: xai.grok-4.20-reasoning │ │ |
| 83 | +│ │ | │ |
| 84 | +│ └──────────────┬──────────────────────────────┘ │ |
| 85 | +│ │ │ |
| 86 | +│ ┌────────────┼────────────┐ │ |
| 87 | +│ ▼ ▼ ▼ │ |
| 88 | +│ ┌──────┐ ┌──────────┐ ┌──────────┐ │ |
| 89 | +│ │Parse │ │ Extract │ │ Compare │ │ |
| 90 | +│ │Contract││ Clauses │ │to Market │ │ |
| 91 | +│ │ │ │ │ │ │ │ |
| 92 | +│ │PyMuPDF│ │Inner LLM │ │Baselines │ │ |
| 93 | +│ │+ LLM │ │ Call │ │+ LLM │ │ |
| 94 | +│ └──────┘ └──────────┘ └──────────┘ │ |
| 95 | +│ │ │ │ │ |
| 96 | +│ ▼ ▼ ▼ │ |
| 97 | +│ ┌──────────────────────────────────┐ │ |
| 98 | +│ │ Cross-Reference Conflicts │ │ |
| 99 | +│ │ (Inner LLM Call) │ │ |
| 100 | +│ └──────────────┬───────────────────┘ │ |
| 101 | +│ ▼ │ |
| 102 | +│ ┌──────────────────────────────────┐ │ |
| 103 | +│ │ Generate Risk Register │ │ |
| 104 | +│ │ (Inner LLM Call) │ │ |
| 105 | +│ └──────────────────────────────────┘ │ |
| 106 | +│ │ |
| 107 | +└─────────────────────────────────────────────────────┘ |
| 108 | +``` |
| 109 | + |
| 110 | +--- |
| 111 | + |
| 112 | +## File Structure |
| 113 | + |
| 114 | +``` |
| 115 | +legal-due-diligence-agent/ |
| 116 | +│ |
| 117 | +├── app.py # Main Streamlit application |
| 118 | +│ |
| 119 | +├── requirements.txt # Python dependencies |
| 120 | +│ |
| 121 | +├── .env.example # Environment variable template |
| 122 | +│ |
| 123 | +├── LICENSE |
| 124 | +│ |
| 125 | +└── README.md # This file |
| 126 | +``` |
| 127 | + |
| 128 | +--- |
| 129 | + |
| 130 | +## Setup |
| 131 | + |
| 132 | +### Prerequisites |
| 133 | + |
| 134 | +- Python 3.10 or higher |
| 135 | +- An OCI tenancy with access to OCI Generative AI |
| 136 | +- An OCI Generative AI **Project** (required for the Responses API) |
| 137 | +- An OCI Generative AI **API Key** (generates a Bearer token for OpenAI SDK compatibility) |
| 138 | + |
| 139 | +### Step 1: Create an OCI Generative AI Project |
| 140 | + |
| 141 | +1. Log in to the [OCI Console](https://cloud.oracle.com) |
| 142 | +2. Navigate to **Analytics & AI → Generative AI** |
| 143 | +3. Click **Projects** in the left sidebar |
| 144 | +4. Click **Create Project** |
| 145 | +5. Give it a name (e.g., "Legal Due Diligence Agent") and select your compartment |
| 146 | +6. Note the **Project OCID** — you will need this for `CHICAGO_PROJECT_OCID` |
| 147 | + |
| 148 | +### Step 2: Create an OCI Generative AI API Key |
| 149 | + |
| 150 | +1. In the Generative AI console, navigate to **API Keys** |
| 151 | +2. Click **Create API Key** |
| 152 | +3. Name your key and set an expiration |
| 153 | +4. Copy the generated key value — this is your `OPENAI_API_KEY_CHICAGO` |
| 154 | + |
| 155 | +### Step 3: Install and Run |
| 156 | + |
| 157 | +```bash |
| 158 | +# Clone the repository |
| 159 | +git clone <repo-url> |
| 160 | +cd legal-due-diligence-agent |
| 161 | + |
| 162 | +# Install dependencies |
| 163 | +pip install -r requirements.txt |
| 164 | + |
| 165 | +# Configure environment variables |
| 166 | +cp .env.example .env |
| 167 | +# Edit .env and fill in your API key and project OCID: |
| 168 | +# OPENAI_API_KEY_CHICAGO=sk-your-api-key-here |
| 169 | +# CHICAGO_PROJECT_OCID=ocid1.generativeaiproject.oc1.us-chicago-1.your-project-ocid |
| 170 | + |
| 171 | +# Run the app |
| 172 | +streamlit run app.py |
| 173 | +``` |
| 174 | + |
| 175 | +The app will open in your browser at `http://localhost:8501`. |
| 176 | + |
| 177 | +### Changing the Model |
| 178 | + |
| 179 | +The default model is `xai.grok-4.20-reasoning`. To use a different model, update the `MODEL` constant near the top of `app.py`: |
| 180 | + |
| 181 | +```python |
| 182 | +MODEL = "openai.gpt-oss-120b" # OpenAI gpt-oss on OCI |
| 183 | +MODEL = "xai.grok-4-1-fast-reasoning" # Grok 4.1 fast |
| 184 | +MODEL = "google.gemini-2.5-pro" # Google Gemini 2.5 Pro |
| 185 | +``` |
| 186 | + |
| 187 | +--- |
| 188 | + |
| 189 | +## Useful Links |
| 190 | + |
| 191 | +- [OCI Generative AI Documentation](https://docs.oracle.com/en-us/iaas/Content/generative-ai/home.htm) |
| 192 | +- [QuickStart Guide for Building Agents](https://docs.oracle.com/en-us/iaas/Content/generative-ai/get-started-agents.htm) |
| 193 | +- [OCI Responses API (oci-openai)](https://docs.oracle.com/en-us/iaas/Content/generative-ai/oci-openai.htm) |
| 194 | +- [OCI Generative AI API Keys](https://docs.oracle.com/en-us/iaas/Content/generative-ai/api-keys.htm) |
| 195 | + |
| 196 | +--- |
| 197 | + |
| 198 | +## License |
| 199 | + |
| 200 | +Copyright (c) 2026 Oracle and/or its affiliates. Licensed under the Universal Permissive License (UPL), Version 1.0. |
| 201 | + |
| 202 | +See [LICENSE](./LICENSE) for more details. |
0 commit comments