10 changes: 10 additions & 0 deletions README.md
@@ -78,6 +78,15 @@ RubyLLM.embed "Ruby is elegant and expressive"
RubyLLM.transcribe "meeting.wav"
```

```ruby
# Upload a file to a provider and fetch it later
file = RubyLLM.upload_file "batch.jsonl", provider: :openai, purpose: "batch"
puts file.id

content = RubyLLM.download_file file.id, provider: :openai
puts content.bytesize
```

```ruby
# Moderate content for safety
RubyLLM.moderate "Check if this text is safe"
@@ -127,6 +136,7 @@ response = chat.with_schema(ProductSchema).ask "Analyze this product", with: "pr
* **Vision:** Analyze images and videos
* **Audio:** Transcribe and understand speech with `RubyLLM.transcribe`
* **Documents:** Extract from PDFs, CSVs, JSON, any file type
* **Provider files:** Upload, inspect, and download provider-managed files
* **Image generation:** Create images with `RubyLLM.paint`
* **Embeddings:** Generate embeddings with `RubyLLM.embed`
* **Moderation:** Content safety with `RubyLLM.moderate`
3 changes: 3 additions & 0 deletions docs/_core_features/chat.md
@@ -246,6 +246,9 @@ puts response.content

RubyLLM automatically detects file types based on extensions and content, so you can pass files directly without specifying the type:

> `with:` sends files as part of the current chat request. If you need to upload a file to a provider's file API, store it for later use, or download it later by file ID, see the [Files Guide]({% link _core_features/files.md %}).
{: .note }

```ruby
chat = RubyLLM.chat(model: '{{ site.models.anthropic_current }}')

164 changes: 164 additions & 0 deletions docs/_core_features/files.md
@@ -0,0 +1,164 @@
---
layout: default
title: Files
nav_order: 7
description: Upload, inspect, and download provider-managed files with a consistent Ruby API
---

# {{ page.title }}
{: .d-inline-block .no_toc }

v1.16.0+
{: .label .label-green }

{{ page.description }}
{: .fs-6 .fw-300 }

## Table of contents
{: .no_toc .text-delta }

1. TOC
{:toc}

---

After reading this guide, you will know:

* How to upload files to a provider with `RubyLLM.upload_file`.
* How to inspect uploaded file metadata with `RubyLLM::ProviderFile.file_info`.
* How to download file contents as bytes or stream them to a destination.
* How to use contexts for tenant-specific file operations.
* How provider-managed files differ from chat attachments.

## Provider-Managed Files vs Chat Attachments

RubyLLM supports two different ways of working with files:

- Chat attachments use `with:` on `chat.ask` to send files as part of a prompt.
- Provider-managed files use `RubyLLM.upload_file` to create a file resource stored by the provider.

Use chat attachments when you want the model to analyze a file as part of a conversation. Use provider-managed files when the provider's API expects an uploaded file ID, or when you need to look up metadata or download the file later.

## Uploading a File

Upload a file with the global `RubyLLM.upload_file` helper. You must specify a `provider:` explicitly.

```ruby
uploaded = RubyLLM.upload_file(
  "spec/fixtures/openai_batch.jsonl",
  provider: :openai,
  purpose: "batch"
)

puts uploaded.id
puts uploaded.filename
puts uploaded.byte_size
puts uploaded.created_at
```

The return value is a `RubyLLM::ProviderFile` object containing provider metadata:

- `id`
- `filename`
- `byte_size`
- `created_at`

### OpenAI Upload Options

OpenAI file uploads currently support:

- `purpose:` — required by the OpenAI Files API
- `expires_after:` — optional expiration settings accepted by the API

```ruby
uploaded = RubyLLM.upload_file(
  "batch.jsonl",
  provider: :openai,
  purpose: "batch",
  expires_after: { anchor: "created_at", seconds: 86_400 }
)
```

> `purpose:` values and expiration settings are provider-specific. RubyLLM forwards them to the provider API.
{: .note }

## Looking Up File Metadata

Once a file is uploaded, you can retrieve its metadata later:

```ruby
uploaded = RubyLLM.upload_file("batch.jsonl", provider: :openai, purpose: "batch")

file_info = RubyLLM::ProviderFile.file_info(uploaded.id, provider: :openai)

puts file_info.id
puts file_info.filename
puts file_info.byte_size
puts file_info.created_at
```

## Downloading File Contents

Download file content with `RubyLLM.download_file`. With no destination option (`io:`, `path:`, or `tempfile: true`), it returns the raw response body as a string.

```ruby
content = RubyLLM.download_file("file_123", provider: :openai)
File.binwrite("tmp/downloaded.jsonl", content)
```

### Downloading to a Path

Write the file directly to disk:

```ruby
saved_path = RubyLLM.download_file(
  "file_123",
  provider: :openai,
  path: "tmp/downloaded.jsonl"
)

puts saved_path
```

### Downloading to an IO Object

Stream the content into an existing IO object:

```ruby
File.open("tmp/downloaded.jsonl", "wb") do |io|
  RubyLLM.download_file("file_123", provider: :openai, io: io)
end
```
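
The `io:` target does not have to be a `File`. Judging from the download code in this change, the destination appears to need only a `write` method, while `binmode`, `flush`, and `rewind` are called only when the object responds to them. The following is a minimal sketch under that assumption. `ChecksumSink` is a hypothetical class, not part of RubyLLM, and the chunks are simulated locally instead of being fetched from a provider.

```ruby
require 'digest'

# Hypothetical write-only sink: hashes each chunk instead of storing it.
class ChecksumSink
  attr_reader :bytes_written

  def initialize
    @digest = Digest::SHA256.new
    @bytes_written = 0
  end

  # Called once per streamed chunk; returns the number of bytes consumed.
  def write(chunk)
    @digest.update(chunk)
    @bytes_written += chunk.bytesize
    chunk.bytesize
  end

  def hexdigest
    @digest.hexdigest
  end
end

sink = ChecksumSink.new

# Simulated chunks. With RubyLLM the assumed usage would be:
#   RubyLLM.download_file("file_123", provider: :openai, io: sink)
['{"id":1}', "\n", '{"id":2}', "\n"].each { |chunk| sink.write(chunk) }

puts sink.bytes_written
puts sink.hexdigest
```

A sink like this lets you verify or measure a download without holding the whole file in memory.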

### Downloading to a Tempfile

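Pass `tempfile: true` to have RubyLLM download into a temporary file. Without a block, the tempfile is returned to you, and you are responsible for closing and deleting it: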

```ruby
file = RubyLLM.download_file("file_123", provider: :openai, tempfile: true)
puts file.path
puts file.read
file.close!
```

If you use block form, RubyLLM yields the tempfile and cleans it up afterward:

```ruby
RubyLLM.download_file("file_123", provider: :openai, tempfile: true) do |file|
  puts file.path
  puts file.read
end
```
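
The block form follows the usual Ruby yield-then-cleanup pattern. The sketch below reproduces that pattern with the standard library only, so it runs without network access. `with_download_tempfile` is a hypothetical helper written for illustration, not a RubyLLM method, and the bytes are supplied directly rather than downloaded.

```ruby
require 'tempfile'

# Hypothetical helper (not RubyLLM API): writes bytes to a tempfile,
# yields it rewound, and always removes it afterward.
def with_download_tempfile(bytes)
  file = Tempfile.new('download-sketch')
  file.binmode
  file.write(bytes)
  file.flush
  file.rewind
  yield file
ensure
  # close! closes the handle and unlinks the file on disk.
  file&.close!
end

leftover_path = nil
line_count = with_download_tempfile("a\nb\n") do |file|
  leftover_path = file.path
  file.read.lines.size
end

puts line_count
puts File.exist?(leftover_path)
```

Because the file is gone once the block returns, copy out anything you need (parsed records, a digest, a count) inside the block.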

## Provider Support

Provider-managed file uploads are currently implemented for OpenAI.

Support for other providers may be added over time.
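
In this change, `RubyLLM::ProviderFile.upload`, `file_info`, and `download` also accept an optional `context:` keyword, and the provider is looked up by name using that context's configuration when one is given. The sketch below illustrates that resolution order with stand-in types only. `SketchConfig`, `SketchContext`, `FakeOpenAI`, `PROVIDERS`, and `resolve_provider` are all hypothetical names for illustration, not RubyLLM classes.

```ruby
# Stand-in types for illustration only; none of these are RubyLLM classes.
SketchConfig = Struct.new(:api_key)
SketchContext = Struct.new(:config)
FakeOpenAI = Struct.new(:config)

PROVIDERS = { openai: FakeOpenAI }.freeze
GLOBAL_CONFIG = SketchConfig.new('global-key')

# Uses the context's config when one is given, else the global config,
# then instantiates the registered provider class or fails loudly.
def resolve_provider(provider, context: nil)
  config = context&.config || GLOBAL_CONFIG
  provider_class = PROVIDERS[provider.to_s.to_sym]
  raise ArgumentError, "Unknown provider: #{provider.inspect}" unless provider_class

  provider_class.new(config)
end

tenant = SketchContext.new(SketchConfig.new('tenant-key'))

puts resolve_provider(:openai).config.api_key
puts resolve_provider(:openai, context: tenant).config.api_key
```

With this shape, a multi-tenant application can keep one context per tenant and pass it to each file operation, so credentials never leak across tenants through global configuration.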

## Notes and Limitations

- `provider:` is required for provider-managed file operations.
- Only one of `io:`, `path:`, or `tempfile: true` can be used for a download.
- Block form is supported with `io:` and `tempfile: true`, but not with `path:`.
- File retention, supported purposes, and size limits are determined by the provider API.
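
The destination rules above can be sketched as a small standalone validator. `validate_download_target!` is a hypothetical function written for illustration. It mirrors the checks described in this guide but is not RubyLLM's own method, and the `block:` flag stands in for whether a block was passed.

```ruby
# Hypothetical validator mirroring the rules listed above; not RubyLLM API.
def validate_download_target!(io: nil, path: nil, tempfile: false, block: false)
  targets = [io, path, (tempfile ? true : nil)].compact
  raise ArgumentError, 'Specify only one of io:, path:, or tempfile: true' if targets.size > 1
  raise ArgumentError, 'Block form is only supported with io: or tempfile: true' if block && path

  return :io if io
  return :path if path
  return :tempfile if tempfile

  # No destination given: the caller gets the raw bytes back.
  :raw_bytes
end

puts validate_download_target!
puts validate_download_target!(path: 'out.jsonl')

begin
  validate_download_target!(path: 'out.jsonl', tempfile: true)
rescue ArgumentError => e
  puts e.message
end
```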
8 changes: 8 additions & 0 deletions lib/ruby_llm.rb
@@ -68,6 +68,14 @@ def transcribe(...)
      Transcription.transcribe(...)
    end

    def upload_file(...)
      ProviderFile.upload(...)
    end

    def download_file(...)
      ProviderFile.download(...)
    end

    def models
      Models.instance
    end
18 changes: 18 additions & 0 deletions lib/ruby_llm/connection.rb
@@ -46,6 +46,13 @@ def get(url, &)
      end
    end

    def raw_get(url, &)
      raw_connection.get url do |req|
        req.headers.merge! @provider.headers if @provider.respond_to?(:headers)
        yield req if block_given?
      end
    end

    def instance_variables
      super - %i[@config @connection]
    end
@@ -99,6 +106,17 @@ def setup_http_proxy(faraday)
      faraday.proxy = @config.http_proxy
    end

    def raw_connection
      @raw_connection ||= Faraday.new(@provider.api_base) do |faraday|
        setup_timeout(faraday)
        setup_logging(faraday)
        setup_retry(faraday)
        faraday.adapter :net_http
        faraday.use :llm_errors, provider: @provider
        setup_http_proxy(faraday)
      end
    end

    def retry_exceptions
      [
        Errno::ETIMEDOUT,
82 changes: 82 additions & 0 deletions lib/ruby_llm/downloads.rb
@@ -0,0 +1,82 @@
# frozen_string_literal: true

require 'tempfile'

module RubyLLM
  # Handles file downloads for providers.
  module Downloads
    def download_file(file_id, io: nil, path: nil, tempfile: false, &)
      targets = [io, path, (tempfile ? true : nil)].compact
      raise ArgumentError, 'Specify only one of io:, path:, or tempfile: true' if targets.size > 1
      raise ArgumentError, 'Block form is only supported with io: or tempfile: true' if block_given? && path

      destination, return_value, close_after, cleanup_after_block = build_download_destination(io:, path:, tempfile:)

      if destination
        stream_download_to(download_file_url(file_id), destination)
        destination.flush if destination.respond_to?(:flush)
        destination.rewind if destination.respond_to?(:rewind)

        return finalize_download_result(destination, return_value, close_after:, cleanup_after_block:, &)
      end

      response = @connection.raw_get(download_file_url(file_id)) do |req|
        req.headers['Accept'] = 'application/octet-stream'
      end
      response.body
    end

    private

    def build_download_destination(io:, path:, tempfile:)
      return [io, io, false, false] if io

      return [::File.open(path, 'wb'), path, true, false] if path

      if tempfile
        file = Tempfile.new('ruby_llm-download')
        file.binmode
        return [file, file, false, true]
      end

      [nil, nil, false, false]
    end

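    def finalize_download_result(destination, return_value, close_after:, cleanup_after_block:)
      return return_value unless block_given?

      yield return_value
    ensure
      # The ensure clause also runs on the early return above, so only unlink
      # the tempfile in block form; without a block the caller owns it.
      destination.close! if cleanup_after_block && block_given? && destination.respond_to?(:close!)
      destination.close if close_after && destination.respond_to?(:close)
    end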

    def stream_download_to(url, destination)
      destination.binmode if destination.respond_to?(:binmode)

      @connection.raw_get(url) do |req|
        req.headers['Accept'] = 'application/octet-stream'

        if Faraday::VERSION.start_with?('1')
          req.options[:on_data] = proc do |chunk, _overall_received_bytes|
            destination.write(chunk)
          end
        else
          req.options.on_data = proc do |chunk, _overall_received_bytes, env|
            if env&.status == 200
              destination.write(chunk)
            else
              raise_download_error(chunk, env)
            end
          end
        end
      end
    end

    def raise_download_error(chunk, env)
      error_body = try_parse_json(chunk)
      error_response = env.merge(body: error_body)
      ErrorMiddleware.parse_error(provider: self, response: error_response)
    end
  end
end
12 changes: 12 additions & 0 deletions lib/ruby_llm/provider.rb
@@ -4,6 +4,7 @@ module RubyLLM
  # Base class for LLM providers.
  class Provider
    include Streaming
    include Downloads

    attr_reader :config, :connection

@@ -95,6 +96,17 @@ def transcribe(audio_file, model:, language:, **options)
      parse_transcription_response(response, model:)
    end

    def upload_file(file, **options)
      payload = render_file_payload(file, **options)
      response = @connection.post files_url, payload
      parse_file_response(response)
    end

    def file_info(file_id)
      response = @connection.get file_info_url(file_id)
      parse_file_response(response)
    end

    def configured?
      configuration_requirements.all? { |req| @config.send(req) }
    end
38 changes: 38 additions & 0 deletions lib/ruby_llm/provider_file.rb
@@ -0,0 +1,38 @@
# frozen_string_literal: true

module RubyLLM
  # Represents a provider-managed file upload or file info.
  class ProviderFile
    attr_reader :id, :filename, :byte_size, :created_at

    def initialize(id:, filename:, byte_size:, created_at:)
      @id = id
      @filename = filename
      @byte_size = byte_size
      @created_at = created_at
    end

    def self.upload(file, provider:, context: nil, **options)
      provider_instance = resolve_provider(provider:, context:)
      provider_instance.upload_file(file, **options)
    end

    def self.file_info(file_id, provider:, context: nil)
      provider_instance = resolve_provider(provider:, context:)
      provider_instance.file_info(file_id)
    end

    def self.download(file_id, provider:, context: nil, **options)
      provider_instance = resolve_provider(provider:, context:)
      provider_instance.download_file(file_id, **options)
    end

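    def self.resolve_provider(provider:, context:)
      config = context&.config || RubyLLM.config
      # A nil or unregistered provider falls through to the error below.
      # Using inspect avoids calling to_sym on nil while building the message.
      provider_class = Provider.providers[provider.to_sym] if provider
      raise Error, "Unknown provider: #{provider.inspect}" unless provider_class

      provider_class.new(config)
    end
  end
end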