feat(ai): add audio generation support #1467

Merged
hwbrzzl merged 4 commits into master from bowen/#918-4
May 11, 2026
Conversation

Contributor

@hwbrzzl hwbrzzl commented May 8, 2026

Summary

  • Add a fluent facades.AI().Audio(...) API for text-to-speech generation with provider, model, voice, instructions, timeout, and storage support.
  • Add OpenAI audio generation support, including default audio model resolution, placeholder voice mapping, and audio response MIME-aware storage behavior.
  • Simplify AI.Image(...) to use request-level configuration so image and audio requests follow the same fluent setup pattern.

Closes goravel/goravel#918

Why

Goravel's AI package already supports text, files, and image generation, but it did not have a first-class text-to-speech flow. This change adds an audio request/response pipeline that matches the existing fluent AI patterns, so applications can generate speech directly through the facade and store the result without needing provider-specific wiring in user code.

package controllers

import (
	"time"

	"github.com/goravel/framework/facades"
)

type AudioController struct{}

func (r *AudioController) Welcome() (string, error) {
	return facades.AI().
		Audio("Welcome to Goravel").
		Provider("openai").
		Model("gpt-4o-mini-tts").
		Female().
		Instructions("Speak clearly and warmly").
		Timeout(30 * time.Second).
		StoreAs("audio/welcome.mp3")
}

This also keeps the public surface more consistent by moving image provider and model selection onto the fluent request instead of mixing constructor options with request methods. That removes the duplicate ways of setting provider and model, while the OpenAI implementation supplies sensible defaults for the initial runtime slice.

Copilot AI review requested due to automatic review settings May 8, 2026 03:42
@hwbrzzl hwbrzzl requested a review from a team as a code owner May 8, 2026 03:42

codecov Bot commented May 8, 2026

Codecov Report

❌ Patch coverage is 77.43363% with 51 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.26%. Comparing base (950dc14) to head (dda0b4b).
⚠️ Report is 1 commit behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ai/openai/provider.go | 69.69% | 16 Missing and 4 partials ⚠️ |
| ai/response.go | 74.41% | 10 Missing and 1 partial ⚠️ |
| ai/audio_request.go | 80.00% | 9 Missing and 1 partial ⚠️ |
| ai/media_storage.go | 77.77% | 4 Missing and 4 partials ⚠️ |
| ai/application.go | 85.71% | 1 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1467      +/-   ##
==========================================
+ Coverage   69.19%   69.26%   +0.07%     
==========================================
  Files         370      373       +3     
  Lines       29338    29523     +185     
==========================================
+ Hits        20300    20450     +150     
- Misses       8106     8135      +29     
- Partials      932      938       +6     

Contributor

Copilot AI left a comment

Pull request overview

This PR adds first-class audio (text-to-speech) generation to the AI facade/API, extends the OpenAI provider to implement audio generation with sensible defaults, and aligns the image generation API to the same fluent request pattern.

Changes:

  • Introduces fluent AI.Audio(prompt) / AudioRequest / AudioResponse contracts plus framework implementations for generating and storing audio.
  • Adds OpenAI audio generation support (default audio model config, default voice mapping, response MIME-type handling).
  • Simplifies AI.Image(...) to remove constructor options and rely on request-level configuration (.Provider(...), .Model(...), etc.).
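The default model resolution and placeholder voice mapping listed above could work roughly as follows. This is a sketch under stated assumptions, not the provider's code: the placeholder constant values, the pairing of placeholders with the OpenAI voice names `onyx` and `nova`, and the use of `gpt-4o-mini-tts` as a built-in fallback are all chosen for illustration.

```go
package main

import "fmt"

// Hypothetical placeholder selectors. The PR defines its own constants in
// ai/audio_voice.go, and their names and values may differ.
const (
	voiceMale   = "default-male"
	voiceFemale = "default-female"
)

// placeholderVoices maps placeholders to concrete provider voices.
// The pairing is an assumption made for this example.
var placeholderVoices = map[string]string{
	voiceMale:   "onyx",
	voiceFemale: "nova",
}

// resolveModel returns the first non-empty candidate: the model set on the
// request, then the configured default, then a built-in fallback.
func resolveModel(requested, configured, fallback string) string {
	for _, candidate := range []string{requested, configured, fallback} {
		if candidate != "" {
			return candidate
		}
	}
	return ""
}

// resolveVoice swaps a placeholder for a concrete voice and passes any
// other value through unchanged.
func resolveVoice(voice string) string {
	if mapped, ok := placeholderVoices[voice]; ok {
		return mapped
	}
	return voice
}

func main() {
	fmt.Println(resolveModel("", "", "gpt-4o-mini-tts")) // fallback wins
	fmt.Println(resolveVoice(voiceFemale))               // placeholder is mapped
	fmt.Println(resolveVoice("alloy"))                   // explicit voice is kept
}
```

Keeping the placeholders provider-neutral means a request such as `.Female()` does not have to know which voice names a given provider offers.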

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| mocks/ai/AudioResponse.go | Adds mock for the new contracts/ai.AudioResponse interface. |
| mocks/ai/AudioRequest.go | Adds mock for the new contracts/ai.AudioRequest fluent request interface. |
| mocks/ai/AudioProvider.go | Adds mock for the new contracts/ai.AudioProvider interface. |
| mocks/ai/AI.go | Updates AI mock to include Audio(prompt) and changes Image(prompt) signature (removes options). |
| errors/list.go | Adds centralized error definitions for audio flow (prompt, store path/name, provider capability). |
| contracts/ai/response.go | Introduces AudioResponse contract (content, MIME type, storage, usage, callbacks). |
| contracts/ai/provider.go | Adds AudioPrompt and AudioProvider contract for provider implementations. |
| contracts/ai/config.go | Extends model config with Models.Audio.Default. |
| contracts/ai/audio.go | Adds AudioRequest contract defining the fluent audio request API. |
| contracts/ai/ai.go | Adds Audio(prompt) to the AI facade contract and updates Image(prompt) signature. |
| ai/response.go | Implements audioResponse (content cloning, MIME-aware filename extension, storage helpers). |
| ai/openai/provider.go | Implements OpenAI audio generation, default audio model resolution, voice mapping, and response parsing. |
| ai/openai/provider_test.go | Updates provider config expectations to include the audio default model. |
| ai/image/image.go | Updates image helper Of(prompt) to match the new AI.Image(prompt) signature. |
| ai/image/image_test.go | Adjusts image helper tests to reflect the new Image(prompt) signature. |
| ai/image_request.go | Removes image constructor options parsing; relies on fluent request configuration instead. |
| ai/audio_voice.go | Defines internal constants for default male/female voice selectors. |
| ai/audio_storage.go | Adds audio storage implementation (Store/StoreAs) similar to image storage behavior. |
| ai/audio_request.go | Adds fluent audioRequest implementation (provider/model/voice/instructions/timeout + Store/Generate). |
| ai/application.go | Adds Application.Audio(prompt) and provider dispatch for audio (audio(...)). |
| ai/application_test.go | Updates image request test to use fluent request configuration (no constructor options). |
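The `ai/response.go` entry mentions content cloning and a MIME-aware filename extension. A minimal sketch of both ideas follows, using only the Go standard library. The type `audioResult`, the function `extensionForMIME`, and the MIME-to-extension table are hypothetical and written for this example; the PR's actual mapping may differ.

```go
package main

import (
	"fmt"
	"mime"
)

// audioExtensions is an illustrative lookup table, not the PR's list.
var audioExtensions = map[string]string{
	"audio/mpeg": ".mp3",
	"audio/wav":  ".wav",
	"audio/flac": ".flac",
	"audio/aac":  ".aac",
	"audio/opus": ".opus",
}

// extensionForMIME strips parameters such as "; charset=..." with
// mime.ParseMediaType and looks up a file extension for the media type.
// The boolean is false when the type is malformed or unknown.
func extensionForMIME(mimeType string) (string, bool) {
	mediaType, _, err := mime.ParseMediaType(mimeType)
	if err != nil {
		return "", false
	}
	ext, ok := audioExtensions[mediaType]
	return ext, ok
}

// audioResult is a hypothetical response type holding raw audio bytes.
type audioResult struct {
	content  []byte
	mimeType string
}

// Content returns a copy so callers cannot mutate the stored bytes.
func (r *audioResult) Content() []byte {
	return append([]byte(nil), r.content...)
}

func main() {
	res := &audioResult{content: []byte("abc"), mimeType: "audio/mpeg; charset=binary"}

	copied := res.Content()
	copied[0] = 'X' // modifies only the copy

	ext, ok := extensionForMIME(res.mimeType)
	fmt.Println(string(res.content), ext, ok)
}
```

Returning a copy costs one allocation per call, but it keeps a later `Store` call from writing bytes that a caller has changed in the meantime.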

Comment thread ai/audio_storage.go Outdated
Comment thread ai/openai/provider.go
Comment thread ai/application.go
Copilot AI review requested due to automatic review settings May 8, 2026 06:54
Contributor Author

@hwbrzzl hwbrzzl left a comment

Addressed the latest annotation in commit 8264e3a by refactoring the shared media storage path normalization and write helpers.

Comment thread ai/audio_storage.go
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 2 comments.

Comment thread ai/media_storage.go
Comment thread ai/response.go Outdated
Comment thread ai/media_storage.go
@hwbrzzl hwbrzzl merged commit c78a1d9 into master May 11, 2026
19 checks passed
@hwbrzzl hwbrzzl deleted the bowen/#918-4 branch May 11, 2026 03:17
LinboLen added a commit to LinboLen/framework that referenced this pull request May 11, 2026
* origin/master:
  feat(ai): add audio generation support (goravel#1467)
  chore: Update non-major dependencies (goravel#1468)
  refactor(ai): rename agent response interfaces (goravel#1466)
  feat(ai): add image storage helpers (goravel#1465)
  chore: Update non-major dependencies (goravel#1464)
Development

Successfully merging this pull request may close these issues.

[Feature] AI SDK Phase 4: Multi-Modal (Attachments, Image, and Audio)

2 participants