
🐋 Orca — Omniscient Research Companion & Assistant

"In the deepest waters of the ocean, the orca navigates with intelligence, curiosity, and precision — the same spirit that guides every research journey."

Orca is an AI-powered research assistant built for researchers, students, engineers, and lifelong learners. It combines intelligent multi-source search, LLM-driven analysis, and a flexible workflow system to transform raw information into structured, actionable knowledge.

Whether you are writing a thesis, reviewing literature, auditing a paper, or simply trying to understand a complex topic in plain English — Orca is your companion.


Simplified Chinese: README.zh.md


🌟 Why Orca?

Modern research is drowning in information. Papers, blogs, datasets, and discussions are scattered across dozens of platforms. Orca was born from a simple belief: research should be a conversation, not a scavenger hunt.

  • One query, multiple sources — arXiv, Semantic Scholar, alphaXiv, and web search in a single call
  • Intelligent synthesis — LLM-powered analysis that connects the dots across sources
  • Workflow-driven — 9 specialized research modes, from deep dives to ELI5 explanations
  • Agent-first architecture — LangGraph-powered agent that routes your intent to the right tool
  • Triple interface — Python CLI, TypeScript CLI, and a modern Next.js web UI

✨ Features

9 Research Workflows

| Workflow | Description | Best For |
| --- | --- | --- |
| Deep Research | Multi-round recursive search with synthesis | Thesis topics, comprehensive surveys |
| Literature Review | Structured academic paper survey | Review articles, state-of-the-art |
| Paper Audit | Methodology, code, and result audit | Quality assurance, reproducibility |
| Source Compare | Cross-database comparative analysis | Evaluating conflicting claims |
| Peer Review | Structured peer review simulation | Pre-submission self-assessment |
| Paper Writing | IMRaD draft generation with citations | First drafts, grant proposals |
| Replication | Step-by-step replication planning | Reproducing published experiments |
| ELI5 | Plain-language explanation with analogies | Learning new concepts |
| Session Search | Search through past research sessions | Finding previous work |

Multi-Source Intelligence

Orca queries up to 4 data sources simultaneously with automatic fallback:

  • arXiv — Preprint papers with full metadata
  • Semantic Scholar — Academic search with citation graphs
  • alphaXiv — Open research discussion platform (auto-fallback to arXiv)
  • Web Search — DuckDuckGo-powered web discovery
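The fallback pattern (e.g. alphaXiv falling back to arXiv) can be sketched as follows; the adapter functions here are stand-ins for the real source adapters in `orca/research/sources/`:

```python
import asyncio

# Hypothetical adapters standing in for the real source modules.
async def query_alphaxiv(query: str) -> list[str]:
    raise ConnectionError("alphaXiv unavailable")  # simulate an outage

async def query_arxiv(query: str) -> list[str]:
    return [f"arXiv result for: {query}"]

async def query_with_fallback(query: str) -> list[str]:
    """Try the primary source; fall back to arXiv if it fails."""
    try:
        return await query_alphaxiv(query)
    except Exception:
        return await query_arxiv(query)

results = asyncio.run(query_with_fallback("transformer scaling laws"))
print(results)  # -> ['arXiv result for: transformer scaling laws']
```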

Agent Architecture

┌─────────────────────────────────────────────────────────────┐
│                     User Interface                           │
│   Python CLI │ TypeScript CLI │ Next.js Web UI             │
└────────────────────────┬────────────────────────────────────┘
                         │ HTTP / Direct
┌────────────────────────▼────────────────────────────────────┐
│                  FastAPI Gateway                             │
│  /health │ /api/research/run │ /api/research/workflows      │
│  /api/research/sessions │ /api/skills │ /api/config          │
└────────────────────────┬────────────────────────────────────┘
                         │
┌────────────────────────▼────────────────────────────────────┐
│              LangGraph Lead Agent                            │
│         ┌──────────────┐    ┌──────────────┐               │
│         │   Router     │───▶│ Research     │               │
│         │  (intent)    │    │ Agent        │               │
│         └──────────────┘    └──────┬───────┘               │
│              │                     │                        │
│              ▼                     ▼                        │
│         ┌──────────────┐    ┌──────────────┐               │
│         │  General     │    │ ResearchEngine│              │
│         │  Agent (LLM) │    │ 9 workflows   │              │
│         └──────────────┘    │ 4 sources     │              │
│                             │ LLM synthesis │              │
│                             └───────────────┘              │
└─────────────────────────────────────────────────────────────┘
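A request to the gateway can be sketched like this. The endpoint path comes from the diagram above, but the payload field names are assumptions; consult `/api/research/workflows` for the actual schema:

```python
import json
from urllib import request

# Hypothetical request body; field names are assumptions about the schema.
payload = {
    "workflow": "deep_research",
    "query": "Recent advances in transformer architecture",
    "depth": 2,
}

req = request.Request(
    "http://localhost:8000/api/research/run",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# With the gateway running (make run), send it:
# with request.urlopen(req) as resp:
#     print(json.load(resp))
```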

Prompt-Driven Intelligence

12 carefully crafted prompt templates guide the LLM for each research mode:

  • deepresearch — Comprehensive analysis with evidence synthesis
  • lit — Structured literature survey with thematic grouping
  • audit — Methodology and reproducibility assessment
  • review — Scholarly peer review framework
  • draft — IMRaD academic writing structure
  • replicate — Step-by-step experiment replication
  • eli5 — Accessible explanations with analogies
  • compare — Cross-source comparative analysis
  • autoresearch — Auto-pilot research mode
  • watch — Topic monitoring and trend tracking
  • jobs — Research opportunity discovery
  • log — Session history analysis
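A cached template loader for these modes might look like the sketch below. The real loader lives in `orca/prompts/` and may differ; the `prompts/<mode>.md` layout mirrors the template names above, but the loader shown here is an illustration, not the actual implementation:

```python
from functools import lru_cache
from pathlib import Path

# Assumed location of the template files (prompts/deepresearch.md, etc.).
PROMPTS_DIR = Path("prompts")

@lru_cache(maxsize=None)
def load_prompt(mode: str) -> str:
    """Read the template for a research mode, caching it after first use."""
    return (PROMPTS_DIR / f"{mode}.md").read_text(encoding="utf-8")
```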

🚀 Quick Start

Prerequisites

  • Python 3.12+
  • Node.js 18+ (for CLI and Web UI)
  • OpenAI API key (or compatible LLM provider)

Installation

# Clone the repository
git clone https://github.com/deepelementlab/orca.git
cd orca

# Install Python dependencies
pip install -e ".[dev]"

# Configure
# Copy the example config and edit with your API keys
cp config.example.yaml config.yaml
# Edit config.yaml to add your OpenAI API key

Start the Gateway Server

# Start the FastAPI gateway
make run
# Or directly:
uvicorn orca.gateway.app:app --host 0.0.0.0 --port 8000

Use the Python CLI

# List available research workflows
orca workflows

# Run a deep research workflow
orca research "Recent advances in transformer architecture"

# Chat with the AI agent
orca chat "Explain quantum computing to me"

# List available skills
orca list-skills

Use the TypeScript CLI

cd cli
npm install
npm run build

# Run research
npx orca research "Large language model alignment"

# Start interactive chat
npx orca chat

Start the Web UI

cd web
npm install
npm run dev
# Open http://localhost:3000

📋 Configuration

Create config.yaml from the example:

research:
  default_depth: 2          # Default search depth (1-5)
  max_depth: 5
  default_max_sources: 10   # Sources per query
  cache_enabled: true
  sources:
    arxiv:
      enabled: true
    semantic_scholar:
      enabled: true
    alphaXiv:
      enabled: true
      api_key: ${ALPHAXIV_API_KEY}
    web_search:
      enabled: true

gateway:
  host: 0.0.0.0
  port: 8000
  cors_origins: ["http://localhost:3000"]

llm:
  provider: openai
  model: gpt-4o
  temperature: 0.7
  max_tokens: 4096
  api_key: ${OPENAI_API_KEY}

Environment variables are automatically resolved (e.g., ${OPENAI_API_KEY}).
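The substitution works roughly like the sketch below; the actual config loader lives in `orca/config/` and may handle more cases (nested values, defaults):

```python
import os
import re

# Matches ${VAR} placeholders as used in config.yaml.
_ENV_PATTERN = re.compile(r"\$\{(\w+)\}")

def resolve_env(value: str) -> str:
    """Replace ${VAR} placeholders with environment values (empty if unset)."""
    return _ENV_PATTERN.sub(lambda m: os.environ.get(m.group(1), ""), value)

os.environ["OPENAI_API_KEY"] = "sk-demo"
print(resolve_env("${OPENAI_API_KEY}"))  # -> sk-demo
```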


🐳 Docker Deployment

# Build
docker build -t orca:latest .

# Run
docker run -p 8000:8000 \
  -e OPENAI_API_KEY=$OPENAI_API_KEY \
  -v $(pwd)/config.yaml:/app/config.yaml \
  orca:latest

🧪 Testing

Orca ships with a comprehensive test suite covering unit, integration, and end-to-end scenarios.

Run All Tests

# Run the full test suite
make test
# Or:
python -m pytest tests/ -v

Test Coverage

| Test Suite | Cases | Focus |
| --- | --- | --- |
| test_api.py | 8 | Gateway endpoints (health, config, research, skills) |
| test_cli.py | 6 | Python CLI commands and help |
| test_config.py | 5 | YAML loading, env vars, defaults |
| test_models.py | 9 | Data model serialization roundtrips |
| test_prompts.py | 6 | Prompt template loading and caching |
| test_llm_service.py | 5 | LLM initialization, invocation, fallback |
| test_research_engine.py | 7 | Engine init, workflow execution, sessions |
| test_skills.py | 7 | Skill market and registry |
| test_subagents.py | 9 | Subagent properties and execution |
| test_workflows.py | 13 | All 9 workflow executions |
| test_integration.py | 50 | Full pipeline, adapters, agent graph, gateway |
| test_e2e_stress.py | 48 | User journeys, concurrency, resilience, performance |

Total: 179 tests, all passing

Performance Benchmarks

Measured on a standard development machine:

| Benchmark | Target | Actual |
| --- | --- | --- |
| Engine initialization | < 2s | ✅ ~0.3s |
| Single workflow execution (mock sources) | < 5s | ✅ ~0.1s |
| Gateway API response | < 2s | ✅ ~0.05s |
| 20 concurrent burst queries | All succeed | ✅ 100% |
| 10 concurrent async sessions | All tracked | ✅ 100% |

Example: Testing a Research Workflow

import asyncio
from orca.research.engine import ResearchEngine

async def test_research():
    engine = ResearchEngine()
    await engine.initialize()

    # Run deep research
    result = await engine.execute(
        workflow="deep_research",
        query="transformer architecture improvements",
        depth=3,
        max_sources=15
    )

    print(f"Summary: {result.summary[:200]}...")
    print(f"Sources found: {len(result.sources)}")
    print(f"Confidence: {result.confidence_score}")

asyncio.run(test_research())

🛠️ Development

# Install dev dependencies
pip install -e ".[dev]"

# Run linter
make lint
# Or:
ruff check orca/ tests/

# Run tests with coverage
pytest tests/ -v --cov=orca

# Build Docker image
make docker-build

📂 Project Structure

orca/
├── orca/                    # Python core package
│   ├── agent/               # LangGraph agent (lead + research + general)
│   ├── research/            # Research engine
│   │   ├── workflows/       # 9 research workflows
│   │   ├── sources/         # 4 data source adapters
│   │   └── subagents/       # 4 specialized subagents
│   ├── gateway/             # FastAPI gateway + routers
│   ├── cli/                 # Python CLI (Click + Rich)
│   ├── llm/                 # LLM service (LangChain OpenAI)
│   ├── prompts/             # Prompt template loader
│   ├── models/              # Pydantic/dataclass models
│   ├── skills/              # Skill market + registry
│   └── config/              # Configuration system
├── cli/                     # TypeScript CLI
├── web/                     # Next.js Web UI
├── prompts/                 # 12 prompt templates (.md)
├── skills/public/           # ~40 skill definitions
├── tests/                   # 179 tests across 12 files
├── docs/                    # Architecture docs
├── Dockerfile
├── Makefile
└── pyproject.toml

🎯 Use Cases

For Students

  • Thesis research: Use deep_research to build a comprehensive bibliography and synthesis
  • Paper writing: Use paper_writing to generate IMRaD drafts with proper citations
  • Concept learning: Use eli5 to understand complex topics in plain language

For Researchers

  • Literature surveys: Use lit_review to map the state-of-the-art in your field
  • Peer review prep: Use peer_review to self-assess before submission
  • Replication planning: Use replication to plan experiment reproduction

For Engineers

  • Technology evaluation: Use source_compare to compare approaches across sources
  • Methodology audit: Use paper_audit to verify claims and reproducibility
  • Quick lookups: Use session_search to find previous research sessions

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

Orca is released under the MIT License.


🙏 Acknowledgments

Orca stands on the shoulders of giants. Built with LangGraph, LangChain, FastAPI, Click, Rich, and Next.js.


"The ocean is vast, but the orca never swims alone."

Happy researching! 🐋
