
🐋 Orca — Omniscient Research Companion & Assistant

"In the deepest waters of the ocean, the orca navigates with intelligence, curiosity, and precision — the same spirit that guides every research journey."

Orca is an AI-powered research assistant built for researchers, students, engineers, and lifelong learners. It combines intelligent multi-source search, LLM-driven analysis, and a flexible workflow system to transform raw information into structured, actionable knowledge.

Whether you are writing a thesis, reviewing literature, auditing a paper, or simply trying to understand a complex topic in plain English — Orca is your companion.


Simplified Chinese: README.zh.md


🌟 Why Orca?

Modern research is drowning in information. Papers, blogs, datasets, and discussions are scattered across dozens of platforms. Orca was born from a simple belief: research should be a conversation, not a scavenger hunt.

  • One query, multiple sources — arXiv, Semantic Scholar, alphaXiv, and web search in a single call
  • Intelligent synthesis — LLM-powered analysis that connects the dots across sources
  • Workflow-driven — 9 specialized research modes, from deep dives to ELI5 explanations
  • Agent-first architecture — LangGraph-powered agent that routes your intent to the right tool
  • Triple interface — Python CLI, TypeScript CLI, and a modern Next.js web UI

✨ Features

9 Research Workflows

| Workflow | Description | Best For |
| --- | --- | --- |
| Deep Research | Multi-round recursive search with synthesis | Thesis topics, comprehensive surveys |
| Literature Review | Structured academic paper survey | Review articles, state-of-the-art |
| Paper Audit | Methodology, code, and result audit | Quality assurance, reproducibility |
| Source Compare | Cross-database comparative analysis | Evaluating conflicting claims |
| Peer Review | Structured peer review simulation | Pre-submission self-assessment |
| Paper Writing | IMRaD draft generation with citations | First drafts, grant proposals |
| Replication | Step-by-step replication planning | Reproducing published experiments |
| ELI5 | Plain-language explanation with analogies | Learning new concepts |
| Session Search | Search through past research sessions | Finding previous work |

Multi-Source Intelligence

Orca queries up to 4 data sources simultaneously with automatic fallback:

  • arXiv — Preprint papers with full metadata
  • Semantic Scholar — Academic search with citation graphs
  • alphaXiv — Open research discussion platform (auto-fallback to arXiv)
  • Web Search — DuckDuckGo-powered web discovery
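The fallback pattern (e.g. alphaXiv falling back to arXiv) can be sketched as follows; the adapter functions here are stand-ins for the real source adapters in `orca/research/sources/`:

```python
import asyncio

# Hypothetical adapters standing in for the real source modules.
async def query_alphaxiv(query: str) -> list[str]:
    raise ConnectionError("alphaXiv unavailable")  # simulate an outage

async def query_arxiv(query: str) -> list[str]:
    return [f"arXiv result for: {query}"]

async def query_with_fallback(query: str) -> list[str]:
    """Try the primary source; fall back to arXiv if it fails."""
    try:
        return await query_alphaxiv(query)
    except Exception:
        return await query_arxiv(query)

results = asyncio.run(query_with_fallback("transformer scaling laws"))
print(results)  # -> ['arXiv result for: transformer scaling laws']
```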

Agent Architecture

┌─────────────────────────────────────────────────────────────┐
│                     User Interface                           │
│   Python CLI │ TypeScript CLI │ Next.js Web UI             │
└────────────────────────┬────────────────────────────────────┘
                         │ HTTP / Direct
┌────────────────────────▼────────────────────────────────────┐
│                  FastAPI Gateway                             │
│  /health │ /api/research/run │ /api/research/workflows      │
│  /api/research/sessions │ /api/skills │ /api/config          │
└────────────────────────┬────────────────────────────────────┘
                         │
┌────────────────────────▼────────────────────────────────────┐
│              LangGraph Lead Agent                            │
│         ┌──────────────┐    ┌──────────────┐               │
│         │   Router     │───▶│ Research     │               │
│         │  (intent)    │    │ Agent        │               │
│         └──────────────┘    └──────┬───────┘               │
│              │                     │                        │
│              ▼                     ▼                        │
│         ┌──────────────┐    ┌──────────────┐               │
│         │  General     │    │ ResearchEngine│              │
│         │  Agent (LLM) │    │ 9 workflows   │              │
│         └──────────────┘    │ 4 sources     │              │
│                             │ LLM synthesis │              │
│                             └───────────────┘              │
└─────────────────────────────────────────────────────────────┘
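A request to the gateway can be sketched like this. The endpoint path comes from the diagram above, but the payload field names are assumptions; consult `/api/research/workflows` for the actual schema:

```python
import json
from urllib import request

# Hypothetical request body; field names are assumptions about the schema.
payload = {
    "workflow": "deep_research",
    "query": "Recent advances in transformer architecture",
    "depth": 2,
}

req = request.Request(
    "http://localhost:8000/api/research/run",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)

# With the gateway running (make run), send it:
# with request.urlopen(req) as resp:
#     print(json.load(resp))
```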

Prompt-Driven Intelligence

12 carefully crafted prompt templates guide the LLM for each research mode:

  • deepresearch — Comprehensive analysis with evidence synthesis
  • lit — Structured literature survey with thematic grouping
  • audit — Methodology and reproducibility assessment
  • review — Scholarly peer review framework
  • draft — IMRaD academic writing structure
  • replicate — Step-by-step experiment replication
  • eli5 — Accessible explanations with analogies
  • compare — Cross-source comparative analysis
  • autoresearch — Auto-pilot research mode
  • watch — Topic monitoring and trend tracking
  • jobs — Research opportunity discovery
  • log — Session history analysis
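A cached template loader for these modes might look like the sketch below. The real loader lives in `orca/prompts/` and may differ; the `prompts/<mode>.md` layout mirrors the template names above, but the loader shown here is an illustration, not the actual implementation:

```python
from functools import lru_cache
from pathlib import Path

# Assumed location of the template files (prompts/deepresearch.md, etc.).
PROMPTS_DIR = Path("prompts")

@lru_cache(maxsize=None)
def load_prompt(mode: str) -> str:
    """Read the template for a research mode, caching it after first use."""
    return (PROMPTS_DIR / f"{mode}.md").read_text(encoding="utf-8")
```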

🚀 Quick Start

Prerequisites

  • Python 3.12+
  • Node.js 18+ (for CLI and Web UI)
  • OpenAI API key (or compatible LLM provider)

Installation

# Clone the repository
git clone https://github.com/deepelementlab/orca.git
cd orca

# Install Python dependencies
pip install -e ".[dev]"

# Configure
# Copy the example config and edit with your API keys
cp config.example.yaml config.yaml
# Edit config.yaml to add your OpenAI API key

Start the Gateway Server

# Start the FastAPI gateway
make run
# Or directly:
uvicorn orca.gateway.app:app --host 0.0.0.0 --port 8000

Use the Python CLI

# List available research workflows
orca workflows

# Run a deep research workflow
orca research "Recent advances in transformer architecture"

# Chat with the AI agent
orca chat "Explain quantum computing to me"

# List available skills
orca list-skills

Use the TypeScript CLI

cd cli
npm install
npm run build

# Run research
npx orca research "Large language model alignment"

# Start interactive chat
npx orca chat

Start the Web UI

cd web
npm install
npm run dev
# Open http://localhost:3000

📋 Configuration

Create config.yaml from the example:

research:
  default_depth: 2          # Default search depth (1-5)
  max_depth: 5
  default_max_sources: 10   # Sources per query
  cache_enabled: true
  sources:
    arxiv:
      enabled: true
    semantic_scholar:
      enabled: true
    alphaXiv:
      enabled: true
      api_key: ${ALPHAXIV_API_KEY}
    web_search:
      enabled: true

gateway:
  host: 0.0.0.0
  port: 8000
  cors_origins: ["http://localhost:3000"]

llm:
  provider: openai
  model: gpt-4o
  temperature: 0.7
  max_tokens: 4096
  api_key: ${OPENAI_API_KEY}

Environment variables are automatically resolved (e.g., ${OPENAI_API_KEY}).
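The substitution works roughly like the sketch below; the actual config loader lives in `orca/config/` and may handle more cases (nested values, defaults):

```python
import os
import re

# Matches ${VAR} placeholders as used in config.yaml.
_ENV_PATTERN = re.compile(r"\$\{(\w+)\}")

def resolve_env(value: str) -> str:
    """Replace ${VAR} placeholders with environment values (empty if unset)."""
    return _ENV_PATTERN.sub(lambda m: os.environ.get(m.group(1), ""), value)

os.environ["OPENAI_API_KEY"] = "sk-demo"
print(resolve_env("${OPENAI_API_KEY}"))  # -> sk-demo
```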


🐳 Docker Deployment

# Build
docker build -t orca:latest .

# Run
docker run -p 8000:8000 \
  -e OPENAI_API_KEY=$OPENAI_API_KEY \
  -v $(pwd)/config.yaml:/app/config.yaml \
  orca:latest

🧪 Testing

Orca ships with a comprehensive test suite covering unit, integration, and end-to-end scenarios.

Run All Tests

# Run the full test suite
make test
# Or:
python -m pytest tests/ -v

Test Coverage

| Test Suite | Cases | Focus |
| --- | --- | --- |
| test_api.py | 8 | Gateway endpoints (health, config, research, skills) |
| test_cli.py | 6 | Python CLI commands and help |
| test_config.py | 5 | YAML loading, env vars, defaults |
| test_models.py | 9 | Data model serialization roundtrips |
| test_prompts.py | 6 | Prompt template loading and caching |
| test_llm_service.py | 5 | LLM initialization, invocation, fallback |
| test_research_engine.py | 7 | Engine init, workflow execution, sessions |
| test_skills.py | 7 | Skill market and registry |
| test_subagents.py | 9 | Subagent properties and execution |
| test_workflows.py | 13 | All 9 workflow executions |
| test_integration.py | 50 | Full pipeline, adapters, agent graph, gateway |
| test_e2e_stress.py | 48 | User journeys, concurrency, resilience, performance |

Total: 179 tests, all passing

Performance Benchmarks

Measured on a standard development machine:

| Benchmark | Target | Actual |
| --- | --- | --- |
| Engine initialization | < 2s | ✅ ~0.3s |
| Single workflow execution (mock sources) | < 5s | ✅ ~0.1s |
| Gateway API response | < 2s | ✅ ~0.05s |
| 20 concurrent burst queries | All succeed | ✅ 100% |
| 10 concurrent async sessions | All tracked | ✅ 100% |

Example: Testing a Research Workflow

import asyncio
from orca.research.engine import ResearchEngine

async def test_research():
    engine = ResearchEngine()
    await engine.initialize()

    # Run deep research
    result = await engine.execute(
        workflow="deep_research",
        query="transformer architecture improvements",
        depth=3,
        max_sources=15
    )

    print(f"Summary: {result.summary[:200]}...")
    print(f"Sources found: {len(result.sources)}")
    print(f"Confidence: {result.confidence_score}")

asyncio.run(test_research())

🛠️ Development

# Install dev dependencies
pip install -e ".[dev]"

# Run linter
make lint
# Or:
ruff check orca/ tests/

# Run tests with coverage
pytest tests/ -v --cov=orca

# Build Docker image
make docker-build

📂 Project Structure

orca/
├── orca/                    # Python core package
│   ├── agent/               # LangGraph agent (lead + research + general)
│   ├── research/            # Research engine
│   │   ├── workflows/       # 9 research workflows
│   │   ├── sources/         # 4 data source adapters
│   │   └── subagents/       # 4 specialized subagents
│   ├── gateway/             # FastAPI gateway + routers
│   ├── cli/                 # Python CLI (Click + Rich)
│   ├── llm/                 # LLM service (LangChain OpenAI)
│   ├── prompts/             # Prompt template loader
│   ├── models/              # Pydantic/dataclass models
│   ├── skills/              # Skill market + registry
│   └── config/              # Configuration system
├── cli/                     # TypeScript CLI
├── web/                     # Next.js Web UI
├── prompts/                 # 12 prompt templates (.md)
├── skills/public/           # ~40 skill definitions
├── tests/                   # 179 tests across 12 files
├── docs/                    # Architecture docs
├── Dockerfile
├── Makefile
└── pyproject.toml

🎯 Use Cases

For Students

  • Thesis research: Use deep_research to build a comprehensive bibliography and synthesis
  • Paper writing: Use paper_writing to generate IMRaD drafts with proper citations
  • Concept learning: Use eli5 to understand complex topics in plain language

For Researchers

  • Literature surveys: Use lit_review to map the state-of-the-art in your field
  • Peer review prep: Use peer_review to self-assess before submission
  • Replication planning: Use replication to plan experiment reproduction

For Engineers

  • Technology evaluation: Use source_compare to compare approaches across sources
  • Methodology audit: Use paper_audit to verify claims and reproducibility
  • Quick lookups: Use session_search to find previous research sessions

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

Orca is released under the MIT License.


🙏 Acknowledgments

Orca stands on the shoulders of giants. Built with LangGraph, LangChain, FastAPI, Click, Rich, and Next.js.


"The ocean is vast, but the orca never swims alone."

Happy researching! 🐋
