diff --git a/README.md b/README.md index 3f0030d36..30995e522 100644 --- a/README.md +++ b/README.md @@ -78,6 +78,15 @@ RubyLLM.embed "Ruby is elegant and expressive" RubyLLM.transcribe "meeting.wav" ``` +```ruby +# Upload a file to a provider and fetch it later +file = RubyLLM.upload_file "batch.jsonl", provider: :openai, purpose: "batch" +puts file.id + +content = RubyLLM.download_file file.id, provider: :openai +puts content.bytesize +``` + ```ruby # Moderate content for safety RubyLLM.moderate "Check if this text is safe" @@ -127,6 +136,7 @@ response = chat.with_schema(ProductSchema).ask "Analyze this product", with: "pr * **Vision:** Analyze images and videos * **Audio:** Transcribe and understand speech with `RubyLLM.transcribe` * **Documents:** Extract from PDFs, CSVs, JSON, any file type +* **Provider files:** Upload, inspect, and download provider-managed files * **Image generation:** Create images with `RubyLLM.paint` * **Embeddings:** Generate embeddings with `RubyLLM.embed` * **Moderation:** Content safety with `RubyLLM.moderate` diff --git a/docs/_core_features/chat.md b/docs/_core_features/chat.md index c7b71ee4b..67e33a042 100644 --- a/docs/_core_features/chat.md +++ b/docs/_core_features/chat.md @@ -246,6 +246,9 @@ puts response.content RubyLLM automatically detects file types based on extensions and content, so you can pass files directly without specifying the type: +> `with:` sends files as part of the current chat request. If you need to upload a file to a provider's file API, store it for later use, or download it later by file ID, see the [Files Guide]({% link _core_features/files.md %}). 
+{: .note } + ```ruby chat = RubyLLM.chat(model: '{{ site.models.anthropic_current }}') diff --git a/docs/_core_features/files.md b/docs/_core_features/files.md new file mode 100644 index 000000000..49ed7d89d --- /dev/null +++ b/docs/_core_features/files.md @@ -0,0 +1,164 @@ +--- +layout: default +title: Files +nav_order: 7 +description: Upload, inspect, and download provider-managed files with a consistent Ruby API +--- + +# {{ page.title }} +{: .d-inline-block .no_toc } + +v1.16.0+ +{: .label .label-green } + +{{ page.description }} +{: .fs-6 .fw-300 } + +## Table of contents +{: .no_toc .text-delta } + +1. TOC +{:toc} + +--- + +After reading this guide, you will know: + +* How to upload files to a provider with `RubyLLM.upload_file`. +* How to inspect uploaded file metadata with `RubyLLM::ProviderFile.file_info`. +* How to download file contents as bytes or stream them to a destination. +* How to use contexts for tenant-specific file operations. +* How provider-managed files differ from chat attachments. + +## Provider-Managed Files vs Chat Attachments + +RubyLLM supports two different ways of working with files: + +- Chat attachments use `with:` on `chat.ask` to send files as part of a prompt. +- Provider-managed files use `RubyLLM.upload_file` to create a file resource stored by the provider. + +Use chat attachments when you want the model to analyze a file as part of a conversation. Use provider-managed files when the provider's API expects an uploaded file ID, or when you need to look up metadata or download the file later. + +## Uploading a File + +Upload a file with the global `RubyLLM.upload_file` helper. You must specify a `provider:` explicitly. 
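+
+Every file helper also accepts a `context:` argument, so tenant-specific
+credentials can be used without touching the global configuration. A sketch,
+assuming each tenant's OpenAI key lives in its own environment variable:
+
+```ruby
+tenant = RubyLLM.context do |config|
+  config.openai_api_key = ENV.fetch('TENANT_A_OPENAI_KEY')
+end
+
+uploaded = RubyLLM.upload_file(
+  'batch.jsonl',
+  provider: :openai,
+  context: tenant,
+  purpose: 'batch'
+)
+```
+
+The same `context:` option works with `RubyLLM::ProviderFile.file_info` and
+`RubyLLM.download_file`.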
+ +```ruby +uploaded = RubyLLM.upload_file( + "spec/fixtures/openai_batch.jsonl", + provider: :openai, + purpose: "batch" +) + +puts uploaded.id +puts uploaded.filename +puts uploaded.byte_size +puts uploaded.created_at +``` + +The return value is a `RubyLLM::ProviderFile` object containing provider metadata: + +- `id` +- `filename` +- `byte_size` +- `created_at` + +### OpenAI Upload Options + +OpenAI file uploads currently support: + +- `purpose:` — required by the OpenAI Files API +- `expires_after:` — optional expiration settings accepted by the API + +```ruby +uploaded = RubyLLM.upload_file( + "batch.jsonl", + provider: :openai, + purpose: "batch", + expires_after: { anchor: "created_at", seconds: 86_400 } +) +``` + +> `purpose:` values and expiration settings are provider-specific. RubyLLM forwards them to the provider API. +{: .note } + +## Looking Up File Metadata + +Once a file is uploaded, you can retrieve its metadata later: + +```ruby +uploaded = RubyLLM.upload_file("batch.jsonl", provider: :openai, purpose: "batch") + +file_info = RubyLLM::ProviderFile.file_info(uploaded.id, provider: :openai) + +puts file_info.id +puts file_info.filename +puts file_info.byte_size +puts file_info.created_at +``` + +## Downloading File Contents + +Download file content with `RubyLLM.download_file`. By default, RubyLLM returns the raw response body. 
+ +```ruby +content = RubyLLM.download_file("file_123", provider: :openai) +File.binwrite("tmp/downloaded.jsonl", content) +``` + +### Downloading to a Path + +Write the file directly to disk: + +```ruby +saved_path = RubyLLM.download_file( + "file_123", + provider: :openai, + path: "tmp/downloaded.jsonl" +) + +puts saved_path +``` + +### Downloading to an IO Object + +Stream the content into an existing IO object: + +```ruby +File.open("tmp/downloaded.jsonl", "wb") do |io| + RubyLLM.download_file("file_123", provider: :openai, io: io) +end +``` + +### Downloading to a Tempfile + +Ask RubyLLM to manage a temporary file for you: + +```ruby +file = RubyLLM.download_file("file_123", provider: :openai, tempfile: true) +puts file.path +puts file.read +file.close! +``` + +If you use block form, RubyLLM yields the tempfile and cleans it up afterward: + +```ruby +RubyLLM.download_file("file_123", provider: :openai, tempfile: true) do |file| + puts file.path + puts file.read +end +``` + +## Provider Support + +Provider-managed file uploads are currently implemented for OpenAI. + +Support for other providers may be added over time. + +## Notes and Limitations + +- `provider:` is required for provider-managed file operations. +- Only one of `io:`, `path:`, or `tempfile: true` can be used for a download. +- Block form is supported with `io:` and `tempfile: true`, but not with `path:`. +- File retention, supported purposes, and size limits are determined by the provider API. diff --git a/lib/ruby_llm.rb b/lib/ruby_llm.rb index 2ff192d65..311fa526b 100644 --- a/lib/ruby_llm.rb +++ b/lib/ruby_llm.rb @@ -68,6 +68,14 @@ def transcribe(...) Transcription.transcribe(...) end + def upload_file(...) + ProviderFile.upload(...) + end + + def download_file(...) + ProviderFile.download(...) 
+ end + def models Models.instance end diff --git a/lib/ruby_llm/connection.rb b/lib/ruby_llm/connection.rb index 44020af2f..0d71dab6b 100644 --- a/lib/ruby_llm/connection.rb +++ b/lib/ruby_llm/connection.rb @@ -46,6 +46,13 @@ def get(url, &) end end + def raw_get(url, &) + raw_connection.get url do |req| + req.headers.merge! @provider.headers if @provider.respond_to?(:headers) + yield req if block_given? + end + end + def instance_variables super - %i[@config @connection] end @@ -99,6 +106,17 @@ def setup_http_proxy(faraday) faraday.proxy = @config.http_proxy end + def raw_connection + @raw_connection ||= Faraday.new(@provider.api_base) do |faraday| + setup_timeout(faraday) + setup_logging(faraday) + setup_retry(faraday) + faraday.adapter :net_http + faraday.use :llm_errors, provider: @provider + setup_http_proxy(faraday) + end + end + def retry_exceptions [ Errno::ETIMEDOUT, diff --git a/lib/ruby_llm/downloads.rb b/lib/ruby_llm/downloads.rb new file mode 100644 index 000000000..2b6c4a62f --- /dev/null +++ b/lib/ruby_llm/downloads.rb @@ -0,0 +1,82 @@ +# frozen_string_literal: true + +require 'tempfile' + +module RubyLLM + # Handles file downloads for providers. + module Downloads + def download_file(file_id, io: nil, path: nil, tempfile: false, &) + targets = [io, path, (tempfile ? true : nil)].compact + raise ArgumentError, 'Specify only one of io:, path:, or tempfile: true' if targets.size > 1 + raise ArgumentError, 'Block form is only supported with io: or tempfile: true' if block_given? 
&& path
+
+      destination, return_value, close_after, cleanup_after_block = build_download_destination(io:, path:, tempfile:)
+
+      if destination
+        stream_download_to(download_file_url(file_id), destination)
+        destination.flush if destination.respond_to?(:flush)
+        destination.rewind if destination.respond_to?(:rewind)
+
+        return finalize_download_result(destination, return_value, close_after:, cleanup_after_block:, &)
+      end
+
+      response = @connection.raw_get(download_file_url(file_id)) do |req|
+        req.headers['Accept'] = 'application/octet-stream'
+      end
+      response.body
+    end
+
+    private
+
+    def build_download_destination(io:, path:, tempfile:)
+      return [io, io, false, false] if io
+
+      return [::File.open(path, 'wb'), path, true, false] if path
+
+      if tempfile
+        file = Tempfile.new('ruby_llm-download')
+        file.binmode
+        return [file, file, false, true]
+      end
+
+      [nil, nil, false, false]
+    end
+
+    def finalize_download_result(destination, return_value, close_after:, cleanup_after_block:)
+      return return_value unless block_given?
+
+      yield return_value
+    ensure
+      # This ensure also runs on the early non-block return above, so only
+      # auto-delete tempfiles in block form; without a block the caller owns the file.
+      destination.close! if block_given? && cleanup_after_block && destination.respond_to?(:close!)
+ destination.close if close_after && destination.respond_to?(:close) + end + + def stream_download_to(url, destination) + destination.binmode if destination.respond_to?(:binmode) + + @connection.raw_get(url) do |req| + req.headers['Accept'] = 'application/octet-stream' + + if Faraday::VERSION.start_with?('1') + req.options[:on_data] = proc do |chunk, _overall_received_bytes| + destination.write(chunk) + end + else + req.options.on_data = proc do |chunk, _overall_received_bytes, env| + if env&.status == 200 + destination.write(chunk) + else + raise_download_error(chunk, env) + end + end + end + end + end + + def raise_download_error(chunk, env) + error_body = try_parse_json(chunk) + error_response = env.merge(body: error_body) + ErrorMiddleware.parse_error(provider: self, response: error_response) + end + end +end diff --git a/lib/ruby_llm/provider.rb b/lib/ruby_llm/provider.rb index 2b588c547..b944ef16b 100644 --- a/lib/ruby_llm/provider.rb +++ b/lib/ruby_llm/provider.rb @@ -4,6 +4,7 @@ module RubyLLM # Base class for LLM providers. class Provider include Streaming + include Downloads attr_reader :config, :connection @@ -95,6 +96,17 @@ def transcribe(audio_file, model:, language:, **options) parse_transcription_response(response, model:) end + def upload_file(file, **options) + payload = render_file_payload(file, **options) + response = @connection.post files_url, payload + parse_file_response(response) + end + + def file_info(file_id) + response = @connection.get file_info_url(file_id) + parse_file_response(response) + end + def configured? configuration_requirements.all? { |req| @config.send(req) } end diff --git a/lib/ruby_llm/provider_file.rb b/lib/ruby_llm/provider_file.rb new file mode 100644 index 000000000..2c17aeb28 --- /dev/null +++ b/lib/ruby_llm/provider_file.rb @@ -0,0 +1,38 @@ +# frozen_string_literal: true + +module RubyLLM + # Represents a provider-managed file upload or file info. 
+ class ProviderFile + attr_reader :id, :filename, :byte_size, :created_at + + def initialize(id:, filename:, byte_size:, created_at:) + @id = id + @filename = filename + @byte_size = byte_size + @created_at = created_at + end + + def self.upload(file, provider:, context: nil, **options) + provider_instance = resolve_provider(provider:, context:) + provider_instance.upload_file(file, **options) + end + + def self.file_info(file_id, provider:, context: nil) + provider_instance = resolve_provider(provider:, context:) + provider_instance.file_info(file_id) + end + + def self.download(file_id, provider:, context: nil, **options) + provider_instance = resolve_provider(provider:, context:) + provider_instance.download_file(file_id, **options) + end + + def self.resolve_provider(provider:, context:) + config = context&.config || RubyLLM.config + provider_class = provider ? Provider.providers[provider.to_sym] : nil + provider_class ||= raise(Error, "Unknown provider: #{provider.to_sym}") + + provider_class.new(config) + end + end +end diff --git a/lib/ruby_llm/providers/openai.rb b/lib/ruby_llm/providers/openai.rb index 4e36b2668..85b8e7631 100644 --- a/lib/ruby_llm/providers/openai.rb +++ b/lib/ruby_llm/providers/openai.rb @@ -13,6 +13,7 @@ class OpenAI < Provider include OpenAI::Images include OpenAI::Media include OpenAI::Transcription + include OpenAI::Files def api_base @config.openai_api_base || 'https://api.openai.com/v1' diff --git a/lib/ruby_llm/providers/openai/files.rb b/lib/ruby_llm/providers/openai/files.rb new file mode 100644 index 000000000..df07d9f7a --- /dev/null +++ b/lib/ruby_llm/providers/openai/files.rb @@ -0,0 +1,57 @@ +# frozen_string_literal: true + +module RubyLLM + module Providers + class OpenAI + # Files methods of the OpenAI API integration + module Files + def files_url + 'files' + end + + def file_info_url(file_id) + "#{files_url}/#{file_id}" + end + + def download_file_url(file_id) + "#{file_info_url(file_id)}/content" + end + + def 
render_file_payload(file, purpose:, expires_after: nil) + { + file: build_file(file), + purpose: purpose, + expires_after: expires_after + }.compact + end + + def parse_file_response(response) + data = response.body + + ProviderFile.new( + id: data['id'], + filename: data['filename'], + byte_size: data['bytes'], + created_at: Time.at(data['created_at']) + ) + end + + module_function + + def build_file(file) + attachment = file.is_a?(Attachment) ? file : Attachment.new(file) + + upload_source = if attachment.path? + attachment.source.to_s + elsif attachment.io_like? + attachment.source.tap { |io| io.rewind if io.respond_to?(:rewind) } + else + StringIO.new(attachment.content) + end + + Faraday::UploadIO.new(upload_source, attachment.mime_type, attachment.filename) + end + end + end + end +end diff --git a/spec/fixtures/openai_batch.jsonl b/spec/fixtures/openai_batch.jsonl new file mode 100644 index 000000000..2416cada3 --- /dev/null +++ b/spec/fixtures/openai_batch.jsonl @@ -0,0 +1 @@ +{"custom_id":"ruby_llm_file_test_1","method":"POST","url":"/v1/responses","body":{"model":"gpt-4.1-nano","input":"Say hello from the batch fixture."}} diff --git a/spec/fixtures/vcr_cassettes/providerfile_openai_file_api_workflow_downloads_batch-purpose_file_content.yml b/spec/fixtures/vcr_cassettes/providerfile_openai_file_api_workflow_downloads_batch-purpose_file_content.yml new file mode 100644 index 000000000..642509bfd --- /dev/null +++ b/spec/fixtures/vcr_cassettes/providerfile_openai_file_api_workflow_downloads_batch-purpose_file_content.yml @@ -0,0 +1,148 @@ +--- +http_interactions: +- request: + method: post + uri: https://api.openai.com/v1/files + body: + encoding: UTF-8 + string: "-------------RubyMultipartPost-ce7b2adeeecaec6186985f38cb325ba2\r\nContent-Disposition: + form-data; name=\"file\"; filename=\"openai_batch.jsonl\"\r\nContent-Length: + 151\r\nContent-Type: application/octet-stream\r\nContent-Transfer-Encoding: + 
binary\r\n\r\n{\"custom_id\":\"ruby_llm_file_test_1\",\"method\":\"POST\",\"url\":\"/v1/responses\",\"body\":{\"model\":\"gpt-4.1-nano\",\"input\":\"Say + hello from the batch fixture.\"}}\n\r\n-------------RubyMultipartPost-ce7b2adeeecaec6186985f38cb325ba2\r\nContent-Disposition: + form-data; name=\"purpose\"\r\n\r\nbatch\r\n-------------RubyMultipartPost-ce7b2adeeecaec6186985f38cb325ba2--\r\n" + headers: + User-Agent: + - Faraday v2.14.1 + Authorization: + - Bearer + Content-Type: + - multipart/form-data; boundary=-----------RubyMultipartPost-ce7b2adeeecaec6186985f38cb325ba2 + Content-Length: + - '581' + Accept-Encoding: + - gzip;q=1.0,deflate;q=0.6,identity;q=0.3 + Accept: + - "*/*" + response: + status: + code: 200 + message: OK + headers: + Date: + - Thu, 07 May 2026 22:05:03 GMT + Content-Type: + - application/json + Transfer-Encoding: + - chunked + Connection: + - keep-alive + Server: + - cloudflare + X-Request-Id: + - "" + Openai-Processing-Ms: + - '452' + Openai-Version: + - '2020-10-01' + Openai-Organization: + - "" + Openai-Project: + - "" + Access-Control-Allow-Origin: + - "*" + X-Openai-Proxy-Wasm: + - v0.1 + Cf-Cache-Status: + - DYNAMIC + Set-Cookie: + - "" + Strict-Transport-Security: + - max-age=31536000; includeSubDomains; preload + X-Content-Type-Options: + - nosniff + Cf-Ray: + - "" + Alt-Svc: + - h3=":443"; ma=86400 + body: + encoding: ASCII-8BIT + string: | + { + "object": "file", + "id": "file-Rt7X2q2xL96QwFzYDdJnyu", + "purpose": "batch", + "filename": "openai_batch.jsonl", + "bytes": 151, + "created_at": 1778191503, + "expires_at": 1780783503, + "status": "processed", + "status_details": null + } + recorded_at: Thu, 07 May 2026 22:05:03 GMT +- request: + method: get + uri: https://api.openai.com/v1/files/file-Rt7X2q2xL96QwFzYDdJnyu/content + body: + encoding: US-ASCII + string: '' + headers: + User-Agent: + - Faraday v2.14.1 + Authorization: + - Bearer + Accept: + - application/octet-stream + Accept-Encoding: + - 
gzip;q=1.0,deflate;q=0.6,identity;q=0.3 + response: + status: + code: 200 + message: OK + headers: + Date: + - Thu, 07 May 2026 22:05:04 GMT + Content-Type: + - application/octet-stream + Content-Length: + - '151' + Connection: + - keep-alive + Server: + - cloudflare + Content-Disposition: + - attachment; filename="openai_batch.jsonl" + X-Request-Id: + - "" + Openai-Processing-Ms: + - '183' + Openai-Version: + - '2020-10-01' + Openai-Organization: + - "" + Openai-Project: + - "" + Access-Control-Allow-Origin: + - "*" + X-Openai-Proxy-Wasm: + - v0.1 + Cf-Cache-Status: + - DYNAMIC + Set-Cookie: + - "" + Strict-Transport-Security: + - max-age=31536000; includeSubDomains; preload + X-Content-Type-Options: + - nosniff + Cf-Ray: + - "" + Alt-Svc: + - h3=":443"; ma=86400 + body: + encoding: UTF-8 + string: '{"custom_id":"ruby_llm_file_test_1","method":"POST","url":"/v1/responses","body":{"model":"gpt-4.1-nano","input":"Say + hello from the batch fixture."}} + + ' + recorded_at: Thu, 07 May 2026 22:05:04 GMT +recorded_with: VCR 6.4.0 diff --git a/spec/fixtures/vcr_cassettes/providerfile_openai_file_api_workflow_looks_up_file_metadata_for_an_uploaded_file.yml b/spec/fixtures/vcr_cassettes/providerfile_openai_file_api_workflow_looks_up_file_metadata_for_an_uploaded_file.yml new file mode 100644 index 000000000..e2c9403c4 --- /dev/null +++ b/spec/fixtures/vcr_cassettes/providerfile_openai_file_api_workflow_looks_up_file_metadata_for_an_uploaded_file.yml @@ -0,0 +1,154 @@ +--- +http_interactions: +- request: + method: post + uri: https://api.openai.com/v1/files + body: + encoding: UTF-8 + string: "-------------RubyMultipartPost-b959362a8483878343b1f42fd7796046\r\nContent-Disposition: + form-data; name=\"file\"; filename=\"openai_batch.jsonl\"\r\nContent-Length: + 151\r\nContent-Type: application/octet-stream\r\nContent-Transfer-Encoding: + 
binary\r\n\r\n{\"custom_id\":\"ruby_llm_file_test_1\",\"method\":\"POST\",\"url\":\"/v1/responses\",\"body\":{\"model\":\"gpt-4.1-nano\",\"input\":\"Say + hello from the batch fixture.\"}}\n\r\n-------------RubyMultipartPost-b959362a8483878343b1f42fd7796046\r\nContent-Disposition: + form-data; name=\"purpose\"\r\n\r\nbatch\r\n-------------RubyMultipartPost-b959362a8483878343b1f42fd7796046--\r\n" + headers: + User-Agent: + - Faraday v2.14.1 + Authorization: + - Bearer + Content-Type: + - multipart/form-data; boundary=-----------RubyMultipartPost-b959362a8483878343b1f42fd7796046 + Content-Length: + - '581' + Accept-Encoding: + - gzip;q=1.0,deflate;q=0.6,identity;q=0.3 + Accept: + - "*/*" + response: + status: + code: 200 + message: OK + headers: + Date: + - Thu, 07 May 2026 22:05:02 GMT + Content-Type: + - application/json + Transfer-Encoding: + - chunked + Connection: + - keep-alive + Server: + - cloudflare + X-Request-Id: + - "" + Openai-Processing-Ms: + - '206' + Openai-Version: + - '2020-10-01' + Openai-Organization: + - "" + Openai-Project: + - "" + Access-Control-Allow-Origin: + - "*" + X-Openai-Proxy-Wasm: + - v0.1 + Cf-Cache-Status: + - DYNAMIC + Set-Cookie: + - "" + Strict-Transport-Security: + - max-age=31536000; includeSubDomains; preload + X-Content-Type-Options: + - nosniff + Cf-Ray: + - "" + Alt-Svc: + - h3=":443"; ma=86400 + body: + encoding: ASCII-8BIT + string: | + { + "object": "file", + "id": "file-AvQRxkPVZ86eaVuwxEjpoJ", + "purpose": "batch", + "filename": "openai_batch.jsonl", + "bytes": 151, + "created_at": 1778191502, + "expires_at": 1780783502, + "status": "processed", + "status_details": null + } + recorded_at: Thu, 07 May 2026 22:05:02 GMT +- request: + method: get + uri: https://api.openai.com/v1/files/file-AvQRxkPVZ86eaVuwxEjpoJ + body: + encoding: US-ASCII + string: '' + headers: + User-Agent: + - Faraday v2.14.1 + Authorization: + - Bearer + Accept-Encoding: + - gzip;q=1.0,deflate;q=0.6,identity;q=0.3 + Accept: + - "*/*" + response: + 
status: + code: 200 + message: OK + headers: + Date: + - Thu, 07 May 2026 22:05:03 GMT + Content-Type: + - application/json + Transfer-Encoding: + - chunked + Connection: + - keep-alive + Server: + - cloudflare + X-Request-Id: + - "" + Openai-Processing-Ms: + - '43' + Openai-Version: + - '2020-10-01' + Openai-Organization: + - "" + Openai-Project: + - "" + Access-Control-Allow-Origin: + - "*" + X-Openai-Proxy-Wasm: + - v0.1 + Cf-Cache-Status: + - DYNAMIC + Set-Cookie: + - "" + Strict-Transport-Security: + - max-age=31536000; includeSubDomains; preload + X-Content-Type-Options: + - nosniff + Cf-Ray: + - "" + Alt-Svc: + - h3=":443"; ma=86400 + body: + encoding: ASCII-8BIT + string: | + { + "object": "file", + "id": "file-AvQRxkPVZ86eaVuwxEjpoJ", + "purpose": "batch", + "filename": "openai_batch.jsonl", + "bytes": 151, + "created_at": 1778191502, + "expires_at": 1780783502, + "status": "processed", + "status_details": null + } + recorded_at: Thu, 07 May 2026 22:05:03 GMT +recorded_with: VCR 6.4.0 diff --git a/spec/fixtures/vcr_cassettes/providerfile_openai_file_api_workflow_uploads_a_file_and_returns_file_metadata.yml b/spec/fixtures/vcr_cassettes/providerfile_openai_file_api_workflow_uploads_a_file_and_returns_file_metadata.yml new file mode 100644 index 000000000..f14f2666c --- /dev/null +++ b/spec/fixtures/vcr_cassettes/providerfile_openai_file_api_workflow_uploads_a_file_and_returns_file_metadata.yml @@ -0,0 +1,83 @@ +--- +http_interactions: +- request: + method: post + uri: https://api.openai.com/v1/files + body: + encoding: UTF-8 + string: "-------------RubyMultipartPost-815a4b419a7aa83b2688e1502e832770\r\nContent-Disposition: + form-data; name=\"file\"; filename=\"openai_batch.jsonl\"\r\nContent-Length: + 151\r\nContent-Type: application/octet-stream\r\nContent-Transfer-Encoding: + binary\r\n\r\n{\"custom_id\":\"ruby_llm_file_test_1\",\"method\":\"POST\",\"url\":\"/v1/responses\",\"body\":{\"model\":\"gpt-4.1-nano\",\"input\":\"Say + hello from the batch 
fixture.\"}}\n\r\n-------------RubyMultipartPost-815a4b419a7aa83b2688e1502e832770\r\nContent-Disposition: + form-data; name=\"purpose\"\r\n\r\nbatch\r\n-------------RubyMultipartPost-815a4b419a7aa83b2688e1502e832770--\r\n" + headers: + User-Agent: + - Faraday v2.14.1 + Authorization: + - Bearer + Content-Type: + - multipart/form-data; boundary=-----------RubyMultipartPost-815a4b419a7aa83b2688e1502e832770 + Content-Length: + - '581' + Accept-Encoding: + - gzip;q=1.0,deflate;q=0.6,identity;q=0.3 + Accept: + - "*/*" + response: + status: + code: 200 + message: OK + headers: + Date: + - Thu, 07 May 2026 22:05:02 GMT + Content-Type: + - application/json + Transfer-Encoding: + - chunked + Connection: + - keep-alive + Server: + - cloudflare + X-Request-Id: + - "" + Openai-Processing-Ms: + - '430' + Openai-Version: + - '2020-10-01' + Openai-Organization: + - "" + Openai-Project: + - "" + Access-Control-Allow-Origin: + - "*" + X-Openai-Proxy-Wasm: + - v0.1 + Cf-Cache-Status: + - DYNAMIC + Set-Cookie: + - "" + Strict-Transport-Security: + - max-age=31536000; includeSubDomains; preload + X-Content-Type-Options: + - nosniff + Cf-Ray: + - "" + Alt-Svc: + - h3=":443"; ma=86400 + body: + encoding: ASCII-8BIT + string: | + { + "object": "file", + "id": "file-YDPjVfqkBB2gTPqLyXLfcW", + "purpose": "batch", + "filename": "openai_batch.jsonl", + "bytes": 151, + "created_at": 1778191502, + "expires_at": 1780783502, + "status": "processed", + "status_details": null + } + recorded_at: Thu, 07 May 2026 22:05:02 GMT +recorded_with: VCR 6.4.0 diff --git a/spec/ruby_llm/provider_file_spec.rb b/spec/ruby_llm/provider_file_spec.rb new file mode 100644 index 000000000..bea37ff7e --- /dev/null +++ b/spec/ruby_llm/provider_file_spec.rb @@ -0,0 +1,116 @@ +# frozen_string_literal: true + +require 'spec_helper' + +RSpec.describe RubyLLM::ProviderFile do + include_context 'with configured RubyLLM' + + let(:fixture_path) { File.expand_path('../fixtures/ruby.txt', __dir__) } + let(:batch_fixture_path) 
{ File.expand_path('../fixtures/openai_batch.jsonl', __dir__) } + + describe '.upload' do + it 'delegates to the resolved provider' do + provider_instance = instance_double(RubyLLM::Providers::OpenAI) + + allow(described_class).to receive(:resolve_provider).with(provider: :openai, + context: nil).and_return(provider_instance) + allow(provider_instance).to receive(:upload_file).with('spec/fixtures/ruby.txt', + purpose: 'assistants').and_return(:uploaded_file) + + result = described_class.upload('spec/fixtures/ruby.txt', provider: :openai, purpose: 'assistants') + + expect(result).to eq(:uploaded_file) + end + end + + describe '.file_info' do + it 'delegates to the resolved provider' do + provider_instance = instance_double(RubyLLM::Providers::OpenAI) + + allow(described_class).to receive(:resolve_provider).with(provider: :openai, + context: nil).and_return(provider_instance) + allow(provider_instance).to receive(:file_info).with('file_123').and_return(:file_info) + + result = described_class.file_info('file_123', provider: :openai) + + expect(result).to eq(:file_info) + end + end + + describe '.download' do + it 'delegates to the resolved provider' do + provider_instance = instance_double(RubyLLM::Providers::OpenAI) + io = StringIO.new + + allow(described_class).to receive(:resolve_provider).with(provider: :openai, + context: nil).and_return(provider_instance) + allow(provider_instance).to receive(:download_file).with('file_123', io: io).and_return(io) + + result = described_class.download('file_123', provider: :openai, io: io) + + expect(result).to eq(io) + end + end + + describe '.resolve_provider' do + it 'resolves a provider from the explicit provider name' do + provider = described_class.resolve_provider(provider: :openai, context: nil) + + expect(provider).to be_a(RubyLLM::Providers::OpenAI) + expect(provider.config).to eq(RubyLLM.config) + end + + it 'uses the context config when provided' do + context = RubyLLM.context do |config| + config.openai_api_key = 
'sk-context-key' + end + + provider = described_class.resolve_provider(provider: :openai, context: context) + + expect(provider).to be_a(RubyLLM::Providers::OpenAI) + expect(provider.config).to eq(context.config) + expect(provider.config.openai_api_key).to eq('sk-context-key') + end + + it 'raises an error for an unknown provider' do + expect do + described_class.resolve_provider(provider: :unknown, context: nil) + end.to raise_error(RubyLLM::Error, 'Unknown provider: unknown') + end + end + + describe 'OpenAI file API workflow' do + let(:provider) { :openai } + let(:purpose) { 'batch' } + + it 'uploads a file and returns file metadata' do + uploaded_file = RubyLLM.upload_file(batch_fixture_path, provider:, purpose:) + + expect(uploaded_file).to be_a(described_class) + expect(uploaded_file.id).to start_with('file-') + expect(uploaded_file.filename).to eq('openai_batch.jsonl') + expect(uploaded_file.byte_size).to eq(File.size(batch_fixture_path)) + expect(uploaded_file.created_at).to be_a(Time) + end + + it 'looks up file metadata for an uploaded file' do + uploaded_file = RubyLLM.upload_file(batch_fixture_path, provider:, purpose:) + + file_info = described_class.file_info(uploaded_file.id, provider:) + + expect(file_info).to be_a(described_class) + expect(file_info.id).to eq(uploaded_file.id) + expect(file_info.filename).to eq(uploaded_file.filename) + expect(file_info.byte_size).to eq(uploaded_file.byte_size) + expect(file_info.created_at).to eq(uploaded_file.created_at) + end + + it 'downloads batch-purpose file content' do + uploaded_file = RubyLLM.upload_file(batch_fixture_path, provider:, purpose:) + + downloaded_content = described_class.download(uploaded_file.id, provider:) + + expect(downloaded_content).to eq(File.binread(batch_fixture_path)) + end + end +end diff --git a/spec/ruby_llm/providers/open_ai/files_spec.rb b/spec/ruby_llm/providers/open_ai/files_spec.rb new file mode 100644 index 000000000..4fc2fedbe --- /dev/null +++ 
b/spec/ruby_llm/providers/open_ai/files_spec.rb @@ -0,0 +1,130 @@ +# frozen_string_literal: true + +require 'spec_helper' + +RSpec.describe RubyLLM::Providers::OpenAI::Files do + let(:helper_host) do + Class.new do + include RubyLLM::Providers::OpenAI::Files + end.new + end + + describe '.files_url' do + it 'returns the files endpoint path' do + expect(helper_host.files_url).to eq('files') + end + end + + describe '.file_info_url' do + it 'returns the file info endpoint path' do + expect(helper_host.file_info_url('file_123')).to eq('files/file_123') + end + end + + describe '.download_file_url' do + it 'returns the file content endpoint path' do + expect(helper_host.download_file_url('file_123')).to eq('files/file_123/content') + end + end + + describe '.render_file_payload' do + it 'renders file payload with purpose and expires_after' do + upload_io = instance_double(Faraday::UploadIO) + expires_after = { anchor: 'created_at', seconds: 3600 } + + allow(helper_host).to receive(:build_file).with('spec/fixtures/openai_batch.jsonl').and_return(upload_io) + + payload = helper_host.render_file_payload('spec/fixtures/openai_batch.jsonl', + purpose: 'batch', + expires_after: expires_after) + + expect(payload).to eq( + file: upload_io, + purpose: 'batch', + expires_after: expires_after + ) + end + + it 'omits expires_after when nil' do + upload_io = instance_double(Faraday::UploadIO) + + allow(helper_host).to receive(:build_file).with('spec/fixtures/openai_batch.jsonl').and_return(upload_io) + + payload = helper_host.render_file_payload('spec/fixtures/openai_batch.jsonl', purpose: 'batch') + + expect(payload).to eq( + file: upload_io, + purpose: 'batch' + ) + end + end + + describe '.parse_file_response' do + it 'parses OpenAI file metadata into a ProviderFile' do + response = instance_double( + Faraday::Response, + body: { + 'id' => 'file_123', + 'filename' => 'openai_batch.jsonl', + 'bytes' => 123, + 'created_at' => 1_700_000_000 + } + ) + + file = 
helper_host.parse_file_response(response) + + expect(file).to be_a(RubyLLM::ProviderFile) + expect(file.id).to eq('file_123') + expect(file.filename).to eq('openai_batch.jsonl') + expect(file.byte_size).to eq(123) + expect(file.created_at).to eq(Time.at(1_700_000_000)) + end + end + + describe '.build_file' do + let(:text_fixture_path) { File.expand_path('../../../fixtures/ruby.txt', __dir__) } + let(:batch_fixture_path) { File.expand_path('../../../fixtures/openai_batch.jsonl', __dir__) } + + it 'builds an upload from a file path' do + upload = described_class.build_file(batch_fixture_path) + + expect(upload).to be_a(Faraday::UploadIO) + expect(upload.content_type).to eq('application/octet-stream') + expect(upload.original_filename).to eq('openai_batch.jsonl') + expect(upload.local_path).to eq(batch_fixture_path) + end + + it 'builds an upload from an io-like object' do + io = StringIO.new("{\"hello\":\"world\"}\n") + attachment = RubyLLM::Attachment.new(io, filename: 'inline.jsonl') + + upload = described_class.build_file(attachment) + + expect(upload).to be_a(Faraday::UploadIO) + expect(upload.content_type).to eq('application/octet-stream') + expect(upload.original_filename).to eq('inline.jsonl') + end + + it 'rewinds io-like input before upload when possible' do + io = StringIO.new("{\"hello\":\"world\"}\n") + io.read + attachment = RubyLLM::Attachment.new(io, filename: 'inline.jsonl') + + described_class.build_file(attachment) + + expect(io.pos).to eq(0) + end + + it 'reuses an attachment input' do + attachment = RubyLLM::Attachment.new(text_fixture_path) + + allow(RubyLLM::Attachment).to receive(:new).and_call_original + + upload = described_class.build_file(attachment) + + expect(RubyLLM::Attachment).not_to have_received(:new) + expect(upload).to be_a(Faraday::UploadIO) + expect(upload.original_filename).to eq('ruby.txt') + end + end +end diff --git a/spec/support/vcr_configuration.rb b/spec/support/vcr_configuration.rb index 52772be4f..8516a75e7 100644 --- 
a/spec/support/vcr_configuration.rb +++ b/spec/support/vcr_configuration.rb @@ -80,6 +80,9 @@ config.filter_sensitive_data('') do |interaction| interaction.response.headers['Openai-Organization']&.first end + config.filter_sensitive_data('') do |interaction| + interaction.response.headers['Openai-Project']&.first + end config.filter_sensitive_data('') do |interaction| interaction.response.headers['Anthropic-Organization-Id']&.first end