exa-ruby

Ruby client for the Exa.ai API. Search and analyze web content using neural search, question answering, code discovery, and research automation.

Requirements

  • Ruby 3.0.0 or higher

Installing Ruby on macOS

If you're setting up on a fresh macOS laptop, the easiest way to get Ruby 3.x is through Homebrew:

1. Install Homebrew (if not already installed):

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

2. Install Ruby:

brew install ruby

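3. Add Homebrew's Ruby to your PATH (follow the instructions Homebrew prints, usually adding to ~/.zshrc). The path below is for Apple Silicon Macs; on Intel Macs Homebrew installs under /usr/local, so use /usr/local/opt/ruby/bin instead: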

echo 'export PATH="/opt/homebrew/opt/ruby/bin:$PATH"' >> ~/.zshrc
source ~/.zshrc

4. Verify installation:

ruby -v  # Should show Ruby 3.x

Alternative: Using a version manager

For managing multiple Ruby versions, consider rbenv or asdf.

Installation

Add to your Gemfile:

gem 'exa-ai'

Then run:

bundle install

Or install directly:

gem install exa-ai

Configuration

Get your API key from dashboard.exa.ai.

Environment Variable (recommended)

export EXA_API_KEY="your-api-key-here"
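When you rely on the environment variable, it can help to fail fast with a clear message if it is missing, rather than discovering it later as an authorization error. The sketch below uses a hypothetical helper, `exa_api_key`, which is not part of the gem; it only reads `EXA_API_KEY` from a hash-like environment (defaulting to `ENV`).

```ruby
# Hypothetical helper (not part of the exa-ai gem): read the API key from the
# environment and raise a descriptive error if it is unset or blank.
def exa_api_key(env = ENV)
  key = env["EXA_API_KEY"].to_s.strip
  if key.empty?
    raise KeyError, "EXA_API_KEY is not set; export it or add it to your .env file"
  end
  key
end

# Usage with the gem (after `require 'exa-ai'`):
#   client = Exa::Client.new(api_key: exa_api_key)
```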

Using .env file (local development)

Copy the example file to create a .env file in your project root:

cp .env.example .env

Then edit .env and add your API key:

EXA_API_KEY=your-api-key-here

The gem automatically loads .env files in development when the dotenv gem is installed.

Ruby Code

require 'exa-ai'

Exa.configure do |config|
  config.api_key = "your-api-key-here"
end

# Or pass directly to client
client = Exa::Client.new(api_key: "your-api-key-here")

CLI Flag

exa-ai search "query" --api-key YOUR_API_KEY

Quick Start

Ruby API

require 'exa-ai'

Exa.configure do |config|
  config.api_key = ENV['EXA_API_KEY']
end

client = Exa::Client.new

# Search the web
results = client.search("Ruby programming language")
results.results.each { |item| puts "#{item['title']}: #{item['url']}" }

# Find similar content
similar = client.find_similar("https://arxiv.org/abs/2307.06435")
similar.results.each { |item| puts item['url'] }

# Get an answer to a question
answer = client.answer("What is machine learning?")
puts answer.answer

# Find code examples
code = client.context("React hooks")
puts code.response

# Get page contents
contents = client.get_contents(["https://ruby-lang.org"])
puts contents.results.first["text"]
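The search example above prints each result's `title` and `url` from the string-keyed hashes in `results.results`. If you want a tidier listing, a small formatter can number the results. This is a sketch with a hypothetical `format_results` helper (not part of the gem), and it assumes a result's title may occasionally be missing or blank, in which case the URL is shown instead.

```ruby
# Hypothetical helper (not part of the gem): turn result hashes shaped like the
# items in `results.results` (string keys "title" and "url") into numbered lines.
# Assumption: a title may be nil or blank, so the URL is used as a fallback.
def format_results(items)
  items.each_with_index.map do |item, i|
    title = item["title"].to_s.strip
    title = item["url"] if title.empty?
    "#{i + 1}. #{title} <#{item['url']}>"
  end
end

# Usage with the gem:
#   results = client.search("Ruby programming language")
#   puts format_results(results.results)
```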

Command Line

# Core Search Commands
exa-ai search "Ruby programming language"
exa-ai find-similar "https://arxiv.org/abs/2307.06435"
exa-ai answer "What is machine learning?"
exa-ai context "React hooks" --tokens-num 5000
exa-ai get-contents "https://ruby-lang.org"

# Research Commands
exa-ai research-start --instructions "What species of ant are similar to honeypot ants?"
exa-ai research-get RESEARCH_ID
exa-ai research-list

# Webset Management
exa-ai webset-create --search '{"query":"technology companies","count":1}'
exa-ai webset-list --limit 5
exa-ai webset-get WEBSET_ID
exa-ai webset-update WEBSET_ID --metadata '{"updated":"true","version":"2"}'
exa-ai webset-delete WEBSET_ID --force
exa-ai webset-cancel WEBSET_ID

# Webset Searches
exa-ai webset-search-create WEBSET_ID --query "Ford Mustang" --entity custom --entity-description "vintage cars"
exa-ai webset-search-create WEBSET_ID --query "tech CEOs" --entity person --count 20
exa-ai webset-search-create WEBSET_ID --query "Y Combinator startups" --entity company
exa-ai webset-search-get WEBSET_ID SEARCH_ID
exa-ai webset-search-cancel WEBSET_ID SEARCH_ID

# Webset Items
exa-ai webset-item-list WEBSET_ID
exa-ai webset-item-get WEBSET_ID ITEM_ID
exa-ai webset-item-delete WEBSET_ID ITEM_ID --force

# Enrichments
exa-ai enrichment-create WEBSET_ID --description "Find company email" --format text
exa-ai enrichment-create WEBSET_ID --description "Company size category" --format options --options '[{"label":"Small (1-10)"},{"label":"Medium (11-50)"},{"label":"Large (51+)"}]'
exa-ai enrichment-list WEBSET_ID
exa-ai enrichment-get WEBSET_ID ENRICHMENT_ID
exa-ai enrichment-update WEBSET_ID ENRICHMENT_ID --description "Updated description"
exa-ai enrichment-delete WEBSET_ID ENRICHMENT_ID --force
exa-ai enrichment-cancel WEBSET_ID ENRICHMENT_ID

# Webset Imports
exa-ai webset-import-create companies.csv --count 100 --title "My Companies" --format csv --entity-type company
exa-ai webset-import-create data.csv --count 50 --title "Tech Startups" --format csv --entity-type company --csv-identifier 0
exa-ai webset-import-create import.csv --count 100 --title "Import" --format csv --entity-type company --metadata '{"source":"crm"}' --quiet
exa-ai webset-import-list
exa-ai webset-import-get IMPORT_ID
exa-ai webset-import-update IMPORT_ID --title "Updated Title"
exa-ai webset-import-delete IMPORT_ID
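Several commands above take JSON arguments (`--search`, `--metadata`, `--options`) that must survive shell quoting. When scripting these commands, one option is to build the JSON from Ruby data with the standard library and let `Shellwords` escape each argument. The sketch below reproduces the `enrichment-create` options example using only flags shown above; `WEBSET_ID` remains a placeholder.

```ruby
require "json"
require "shellwords"

# Build the --options payload as Ruby data; JSON.generate handles the quoting
# that would otherwise be written by hand.
options = [
  { label: "Small (1-10)" },
  { label: "Medium (11-50)" },
  { label: "Large (51+)" }
]

# WEBSET_ID is a placeholder, as in the commands above.
args = [
  "exa-ai", "enrichment-create", "WEBSET_ID",
  "--description", "Company size category",
  "--format", "options",
  "--options", JSON.generate(options)
]

# shelljoin escapes each argument so the printed command can be pasted into a shell.
command = args.shelljoin
puts command

# Or run it without going through a shell (requires the CLI on your PATH):
#   system(*args)
```

Passing the array to `system(*args)` skips the shell entirely, so no escaping is involved in that path.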

Features

The gem provides complete access to Exa's API endpoints:

Core Search

  • Search — Neural and keyword search across billions of web pages
  • Find Similar — Discover content similar to a given URL
  • Answer — Generate comprehensive answers with source citations
  • Context — Find relevant code and documentation snippets
  • Get Contents — Extract full text content from web pages

Research

  • Research Tasks — Start and manage long-running research tasks with AI
  • Task Management — Get status updates and list all research tasks

Websets

  • Webset Management — Create, update, delete, and list datasets of web pages
  • Webset Searches — Run searches within websets and manage search tasks
  • Webset Items — List, retrieve, and manage individual items in websets
  • Enrichments — Create and manage AI-powered data enrichment tasks on websets
  • Imports — Upload CSV files to import external data into websets

Agent

  • Agent API — Asynchronous research agent with citations, streaming, and full run lifecycle

Agent

Exa's Agent API runs asynchronous research agents that search the web, synthesize findings, and return structured output with source citations. Runs are long-lived: you create one and then poll until it completes, or stream its events as it progresses.

Ruby API

require 'exa-ai'

client = Exa::Client.new(api_key: ENV['EXA_API_KEY'])

# Create an async agent run
run = client.agent_run_create(
  query: "AI infrastructure startups that raised Series A in 2025",
  effort: "high"
)
puts run.id      # => "run_abc123"
puts run.status  # => "queued"
puts run.queued? # => true

# Poll for the result
run = client.agent_run_get(run.id)
if run.completed?
  puts run.output[:text]
  puts run.output[:structured]  # already-parsed JSON — no string parsing needed
  puts run.output[:grounding]
end

# Create a run with structured output and a system prompt
run = client.agent_run_create(
  query: "AI infrastructure startups that raised Series A in 2025",
  system_prompt: "Return only companies headquartered in the US.",
  effort: "high",
  output_schema: {
    type: "object",
    properties: {
      companies: { type: "array", items: { type: "string" } }
    }
  }
)

# Attach premium data partners (Exa Connect) alongside web search.
# The agent queries each partner where it's strongest and blends the
# results into one grounded, structured answer.
run = client.agent_run_create(
  query: "Profile Anthropic: total funding and estimated monthly web traffic",
  data_sources: [{ provider: "fiber_ai" }, { provider: "similarweb" }],
  output_schema: {
    type: "object",
    properties: {
      name: { type: "string" },
      totalFunding: { type: "string" },  # from Fiber.ai
      monthlyVisits: { type: "number" }  # from Similarweb
    }
  }
)

# Stream events as they arrive
client.agent_run_stream(query: "AI infrastructure startups that raised Series A in 2025") do |event, data|
  puts "#{event}: #{data.inspect}"
  # event is the SSE event-type string, e.g. "agent_run.completed"
  # data is the parsed JSON hash
end

# List recent runs
runs = client.agent_run_list(limit: 5)
puts runs.has_more    # => true/false
puts runs.next_cursor # => pagination cursor
runs.data.each { |r| puts "#{r.id}: #{r.status}" }

# Fetch a run's events
events = client.agent_run_events(run.id)

# Cancel or delete a run
client.agent_run_cancel(run.id)
client.agent_run_delete(run.id)

Note: Multi-word keys inside input, data_sources items, and metadata are passed through verbatim. Supply them in the exact shape the API expects, just as you would for output_schema.
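The polling step in the Ruby example calls agent_run_get only once. Because runs are long-lived, a script usually polls on an interval with an upper bound. Here is a minimal sketch using a hypothetical `wait_for_agent_run` helper (not part of the gem) that relies only on `agent_run_get`, `completed?`, and `status` from the example above. The interval and timeout defaults are arbitrary, and the sketch assumes a failed or cancelled run never reports `completed?`, so the timeout is what ends the loop in those cases.

```ruby
require "timeout"

# Hypothetical helper (not part of the gem): poll until the run completes or the
# deadline passes. `client` only needs `agent_run_get(id)`, returning an object
# that responds to `completed?` and `status`.
def wait_for_agent_run(client, run_id, interval: 5, timeout: 600)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    run = client.agent_run_get(run_id)
    return run if run.completed?

    # Assumption: a run that will never complete is stopped by this timeout.
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise Timeout::Error,
            "agent run #{run_id} did not complete within #{timeout}s (last status: #{run.status})"
    end
    sleep interval
  end
end

# Usage with the gem:
#   run = client.agent_run_create(query: "AI infrastructure startups that raised Series A in 2025")
#   run = wait_for_agent_run(client, run.id, interval: 10)
#   puts run.output[:text]
```

A monotonic clock is used for the deadline so that wall-clock adjustments cannot shorten or extend the wait.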

Command Line

# Create a run and wait for it to finish
exa-ai agent-run-create --query "AI infrastructure startups that raised Series A in 2025" --wait --output-format pretty

# Attach premium data partners (Exa Connect) with a structured schema.
# Run `exa-ai agent-run-create --help` to see every provider and when to use it:
# fiber_ai, similarweb, baselayer, affiliate, particle_news, financial_datasets, jinko
exa-ai agent-run-create --wait \
  --query "Profile Anthropic: total funding and monthly web traffic" \
  --data-sources fiber_ai,similarweb \
  --output-schema '{"type":"object","properties":{"name":{"type":"string"},"totalFunding":{"type":"string"},"monthlyVisits":{"type":"number"}}}'

# Fetch an existing run
exa-ai agent-run-get RUN_ID

# List recent runs
exa-ai agent-run-list --limit 5

# Cancel or delete a run
exa-ai agent-run-cancel RUN_ID
exa-ai agent-run-delete RUN_ID

# Fetch a run's events
exa-ai agent-run-events RUN_ID

Error Handling

require 'exa-ai'

client = Exa::Client.new(api_key: "your-key")

begin
  results = client.search("test")
rescue Exa::Unauthorized => e
  puts "Invalid API key: #{e.message}"
rescue Exa::TooManyRequests => e
  puts "Rate limited, please retry"
rescue Exa::ServerError => e
  puts "Server error: #{e.message}"
end
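The TooManyRequests branch above only prints a message. If you would rather have the script retry, one common approach is exponential backoff. This is a minimal sketch using a hypothetical `with_backoff` helper (not part of the gem); the attempt count and delays are arbitrary choices, not documented Exa rate-limit guidance.

```ruby
# Hypothetical helper (not part of the gem): retry the block with exponential
# backoff when it raises one of the error classes listed in `retry_on`.
def with_backoff(retry_on:, max_attempts: 4, base_delay: 1.0, max_delay: 30.0)
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue *retry_on
    # Give up and re-raise the last error once the attempt budget is spent.
    raise if attempt >= max_attempts

    # 1s, 2s, 4s, ... (with the default base_delay), capped at max_delay,
    # plus up to 10% random jitter.
    delay = [base_delay * (2**(attempt - 1)), max_delay].min
    sleep(delay + rand * delay * 0.1)
    retry
  end
end

# Usage with the gem:
#   results = with_backoff(retry_on: [Exa::TooManyRequests]) do
#     client.search("test")
#   end
```

Errors not listed in `retry_on`, such as an invalid API key, propagate immediately, since retrying them would not help.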

Documentation

Development

See CONTRIBUTING.md for:

  • Running tests
  • Development setup
  • Code conventions
  • Building and releasing

Testing

Running Tests

# Run unit tests (integration tests skip by default)
bundle exec rake test

# Run integration tests (VCR-based, no real API calls)
RUN_INTEGRATION_TESTS=true bundle exec rake test

# Run CLI integration tests (real API calls, requires explicit opt-in)
RUN_CLI_INTEGRATION_TESTS=true bundle exec rake test

Integration Tests

Integration tests are skipped by default to prevent accidental API calls.

VCR-based integration tests (RUN_INTEGRATION_TESTS):

  • Use recorded HTTP interactions (VCR cassettes)
  • No real API calls when replaying cassettes
  • Set RUN_INTEGRATION_TESTS=true to run them
  • Safe to run during development

CLI integration tests (RUN_CLI_INTEGRATION_TESTS):

  • Make real API calls through shell commands
  • Consume Exa's concurrent search quota
  • Set RUN_CLI_INTEGRATION_TESTS=true AND EXA_API_KEY to run them
  • Warning: Can exhaust API quota and trigger rate limits lasting 1-2 days
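The opt-in rules above can be expressed as two small predicates. This sketch is hypothetical and is not the gem's actual test code; it assumes each flag must be exactly the string "true", and that the CLI suite additionally requires a non-empty EXA_API_KEY, as stated above.

```ruby
# Hypothetical helpers (not the gem's actual test code) mirroring the opt-in
# rules: a suite runs only when its flag is exactly "true".
def run_vcr_integration_tests?(env = ENV)
  env["RUN_INTEGRATION_TESTS"] == "true"
end

# The CLI suite also needs an API key, since it makes real API calls.
def run_cli_integration_tests?(env = ENV)
  env["RUN_CLI_INTEGRATION_TESTS"] == "true" && !env["EXA_API_KEY"].to_s.empty?
end

# In a Minitest test, this could back a skip guard, e.g.:
#   skip "set RUN_CLI_INTEGRATION_TESTS=true and EXA_API_KEY" unless run_cli_integration_tests?
```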

When to run integration tests:

  • VCR tests: Anytime (safe, no real API calls)
  • CLI tests: Only before releases or when testing CLI-specific functionality

Test Coverage:

  • Unit tests - Fast, no API calls, always run
  • VCR integration tests - Replay cassettes, skipped by default
  • CLI integration tests - Real API calls via shell, skipped by default

Support

License

MIT License - See LICENSE file for details


Built with Exa.ai — The search and discovery API