A production-ready reference app demonstrating the RunAnywhere Flutter SDK capabilities for on-device AI. This app showcases how to build privacy-first, offline-capable AI features with LLM chat, speech-to-text, text-to-speech, and a complete voice assistant pipeline—all running locally on your device.
Important: This sample app consumes the RunAnywhere Flutter SDK as local path dependencies. Before opening this project, you must first build the SDK's native libraries.
# 1. Navigate to the Flutter SDK directory
cd runanywhere-sdks/sdk/runanywhere-flutter
# 2. Run the setup script (~10-20 minutes on first run)
# This builds the native C++ frameworks/libraries and enables local mode
./scripts/build-flutter.sh --setup
# 3. Navigate to this sample app
cd ../../examples/flutter/RunAnywhereAI
# 4. Install dependencies
flutter pub get
# 5. For iOS: Install pods
cd ios && pod install && cd ..
# 6. Run the app
flutter run
# Or open in Android Studio / VS Code and run from there

This sample app's pubspec.yaml uses path dependencies to reference the local Flutter SDK packages:
This Sample App → Local Flutter SDK packages (sdk/runanywhere-flutter/packages/)
↓
Local XCFrameworks/JNI libs (in each package's ios/Frameworks/ and android/jniLibs/)
↑
Built by: ./scripts/build-flutter.sh --setup
The build-flutter.sh --setup script:
- Downloads dependencies (ONNX Runtime, Sherpa-ONNX)
- Builds the native C++ libraries from runanywhere-commons
- Copies XCFrameworks to packages/*/ios/Frameworks/
- Copies JNI .so files to packages/*/android/src/main/jniLibs/
- Creates .testlocal marker files (enables local library consumption)
- Dart SDK code changes: Run flutter run again (hot reload works for most changes)
- C++ code changes (in runanywhere-commons): cd sdk/runanywhere-flutter && ./scripts/build-flutter.sh --local --rebuild-commons
Try the native iOS and Android apps to experience on-device AI capabilities immediately. The Flutter sample app demonstrates the same features using the cross-platform Flutter SDK.
This sample app demonstrates the full power of the RunAnywhere Flutter SDK:
| Feature | Description | SDK Integration |
|---|---|---|
| AI Chat | Interactive LLM conversations with streaming responses | RunAnywhere.generateStream() |
| Thinking Mode | Support for models with <think>...</think> reasoning | Thinking tag parsing |
| Real-time Analytics | Token speed, generation time, inference metrics | MessageAnalytics |
| Speech-to-Text | Voice transcription with batch & live modes | RunAnywhere.transcribe() |
| Text-to-Speech | Neural voice synthesis with Piper TTS | RunAnywhere.synthesize() |
| Voice Assistant | Full STT to LLM to TTS pipeline with auto-detection | VoiceSession API |
| Model Management | Download, load, and manage multiple AI models | ModelManager |
| Storage Management | View storage usage and delete models | RunAnywhere.getStorageInfo() |
| Offline Support | All features work without internet | On-device inference |
The app follows Flutter best practices with a clean architecture pattern:
┌─────────────────────────────────────────────────────────────────────┐
│ Flutter/Material UI │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌────────────┐ │
│ │ Chat │ │ STT │ │ TTS │ │ Voice │ │ Settings │ │
│ │Interface │ │ View │ │ View │ │Assistant │ │ View │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬─────┘ └─────┬──────┘ │
├───────┼────────────┼────────────┼────────────┼─────────────┼────────┤
│ ▼ ▼ ▼ ▼ ▼ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ Provider State Management │ │
│ │ (ModelManager, Services) │ │
│ └──────────────────────────────────────────────────────────────┘ │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ RunAnywhere Flutter SDK │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ Core API (generate, transcribe, synthesize) │ │
│ │ Model Management (download, load, unload, delete) │ │
│ │ Voice Session (STT → LLM → TTS pipeline) │ │
│ └──────────────────────────────────────────────────────────────┘ │
│ │ │
│ ┌──────────────────┴──────────────────┐ │
│ ▼ ▼ │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ LlamaCpp │ │ ONNX Runtime │ │
│ │ (LLM/GGUF) │ │ (STT/TTS) │ │
│ └─────────────────┘ └─────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
- Provider Pattern — ChangeNotifier + Provider for state management
- Feature-First Structure — Each feature is self-contained with its own views and logic
- Shared Core Services — ModelManager, AudioRecordingService, AudioPlayerService
- Design System — Consistent AppColors, AppTypography, AppSpacing tokens
- SDK Integration — Direct SDK calls with async/await and Stream support
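The Provider pattern above can be sketched in a few lines. This is an illustrative example, not code from the app: `DownloadState` and `ProgressLabel` are hypothetical names standing in for the app's real `ChangeNotifier` services.

```dart
// Minimal sketch of the ChangeNotifier + Provider pattern used in the app.
// DownloadState and ProgressLabel are illustrative, not actual app classes.
import 'package:flutter/material.dart';
import 'package:provider/provider.dart';

class DownloadState extends ChangeNotifier {
  double _progress = 0.0;
  double get progress => _progress;

  void update(double value) {
    _progress = value;
    notifyListeners(); // Rebuilds any listening widgets
  }
}

void main() {
  runApp(
    ChangeNotifierProvider(
      create: (_) => DownloadState(),
      child: const MaterialApp(home: ProgressLabel()),
    ),
  );
}

class ProgressLabel extends StatelessWidget {
  const ProgressLabel({super.key});

  @override
  Widget build(BuildContext context) {
    // context.watch rebuilds this widget whenever notifyListeners fires
    final state = context.watch<DownloadState>();
    return Text('${(state.progress * 100).toStringAsFixed(0)}%');
  }
}
```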
RunAnywhereAI/
├── lib/
│ ├── main.dart # App entry point
│ │
│ ├── app/
│ │ ├── runanywhere_ai_app.dart # SDK initialization, model registration
│ │ └── content_view.dart # Main tab navigation (5 tabs)
│ │
│ ├── core/
│ │ ├── design_system/
│ │ │ ├── app_colors.dart # Color palette with dark mode support
│ │ │ ├── app_spacing.dart # Spacing constants
│ │ │ └── typography.dart # Text styles
│ │ │
│ │ ├── models/
│ │ │ └── app_types.dart # Shared type definitions
│ │ │
│ │ ├── services/
│ │ │ ├── model_manager.dart # SDK model management wrapper
│ │ │ ├── audio_recording_service.dart # Microphone capture
│ │ │ ├── audio_player_service.dart # TTS playback
│ │ │ ├── permission_service.dart # Permission handling
│ │ │ ├── conversation_store.dart # Chat history persistence
│ │ │ └── device_info_service.dart # Device capabilities
│ │ │
│ │ └── utilities/
│ │ ├── constants.dart # Preference keys, defaults
│ │ └── keychain_helper.dart # Secure storage wrapper
│ │
│ ├── features/
│ │ ├── chat/
│ │ │ └── chat_interface_view.dart # LLM chat with streaming
│ │ │
│ │ ├── voice/
│ │ │ ├── speech_to_text_view.dart # Batch & live STT
│ │ │ ├── text_to_speech_view.dart # TTS synthesis & playback
│ │ │ └── voice_assistant_view.dart # Full STT→LLM→TTS pipeline
│ │ │
│ │ ├── models/
│ │ │ ├── models_view.dart # Model browser
│ │ │ ├── model_selection_sheet.dart # Model picker bottom sheet
│ │ │ ├── model_list_view_model.dart # Model list logic
│ │ │ ├── model_components.dart # Reusable model UI widgets
│ │ │ ├── model_status_components.dart # Status badges, indicators
│ │ │ ├── model_types.dart # Framework enums, model info
│ │ │ └── add_model_from_url_view.dart # Import custom models
│ │ │
│ │ └── settings/
│ │ └── combined_settings_view.dart # Storage & logging config
│ │
│ └── helpers/
│ └── adaptive_layout.dart # Responsive layout utilities
│
├── pubspec.yaml # Dependencies, SDK references
├── android/ # Android platform config
├── ios/ # iOS platform config
└── README.md # This file
- Flutter 3.10.0 or later (install guide)
- Dart 3.0.0 or later (included with Flutter)
- iOS — Xcode 14+ (for iOS builds)
- Android — Android Studio + SDK 21+ (for Android builds)
- ~2GB free storage for AI models
- Device — Physical device recommended for best performance
# Clone the repository
git clone https://github.com/RunanywhereAI/runanywhere-sdks.git
cd runanywhere-sdks/examples/flutter/RunAnywhereAI
# Install dependencies
flutter pub get
# Run on connected device
flutter run

- Open the project in VS Code or Android Studio
- Wait for Flutter dependencies to resolve
- Select a physical device (iOS or Android)
- Press F5 (VS Code) or Run (Android Studio)
# Android APK
flutter build apk --release
# Android App Bundle
flutter build appbundle --release
# iOS (requires Xcode)
flutter build ios --release

The SDK is initialized in runanywhere_ai_app.dart:
import 'package:runanywhere/runanywhere.dart';
import 'package:runanywhere_llamacpp/runanywhere_llamacpp.dart';
import 'package:runanywhere_onnx/runanywhere_onnx.dart';
// 1. Initialize SDK in development mode
await RunAnywhere.initialize();
// 2. Register LlamaCpp module for LLM models (GGUF)
await LlamaCpp.register();
LlamaCpp.addModel(
id: 'smollm2-360m-q8_0',
name: 'SmolLM2 360M Q8_0',
url: 'https://huggingface.co/prithivMLmods/SmolLM2-360M-GGUF/resolve/main/SmolLM2-360M.Q8_0.gguf',
memoryRequirement: 500000000,
);
// 3. Register ONNX module for STT/TTS models
await Onnx.register();
Onnx.addModel(
id: 'sherpa-onnx-whisper-tiny.en',
name: 'Sherpa Whisper Tiny (ONNX)',
url: 'https://github.com/RunanywhereAI/sherpa-onnx/releases/download/runanywhere-models-v1/sherpa-onnx-whisper-tiny.en.tar.gz',
modality: ModelCategory.speechRecognition,
memoryRequirement: 75000000,
);

// Download with progress tracking (via ModelManager)
await ModelManager.shared.downloadModel(modelInfo);
// Load LLM model
await sdk.RunAnywhere.loadLLMModel('smollm2-360m-q8_0');
// Check if model is loaded
final isLoaded = sdk.RunAnywhere.isModelLoaded;

// Generate with streaming (real-time tokens)
final streamResult = await RunAnywhere.generateStream(prompt, options: options);
await for (final token in streamResult.stream) {
// Display each token as it arrives
setState(() {
_responseText += token;
});
}
// Or non-streaming
final result = await RunAnywhere.generate(prompt, options: options);
print('Response: ${result.text}');
print('Speed: ${result.tokensPerSecond} tok/s');

// Load STT model
await RunAnywhere.loadSTTModel('sherpa-onnx-whisper-tiny.en');
// Transcribe audio bytes
final transcription = await RunAnywhere.transcribe(audioBytes);
print('Transcription: $transcription');

// Load TTS voice
await RunAnywhere.loadTTSVoice('vits-piper-en_US-lessac-medium');
// Synthesize speech with options
final result = await RunAnywhere.synthesize(
text,
rate: 1.0,
pitch: 1.0,
volume: 1.0,
);
// Play audio (result.samples is Float32List)
await audioPlayer.play(result.samples, result.sampleRate);

// Start voice session
final session = await RunAnywhere.startVoiceSession(
config: VoiceSessionConfig(),
);
// Listen to session events
session.events.listen((event) {
if (event is VoiceSessionTranscribed) {
print('User said: ${event.text}');
} else if (event is VoiceSessionResponded) {
print('AI response: ${event.text}');
} else if (event is VoiceSessionSpeaking) {
// Audio is being played
}
});
// Stop session
session.stop();

What it demonstrates:
- Streaming text generation with real-time token display
- Thinking mode support (<think>...</think> tags)
- Message analytics (tokens/sec, generation time)
- Conversation history with Markdown rendering
- Model selection bottom sheet integration
Key SDK APIs:
- RunAnywhere.generateStream() — Streaming generation
- RunAnywhere.generate() — Non-streaming generation
- RunAnywhere.currentLLMModel() — Get loaded model info
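The thinking-mode support mentioned above boils down to separating `<think>...</think>` content from the final answer. A minimal sketch of one way to do this — an illustrative approach, not the app's actual parser:

```dart
// Split a model response into its <think>...</think> reasoning
// and the visible answer. Illustrative sketch only.
({String thinking, String answer}) splitThinking(String raw) {
  final match = RegExp(r'<think>([\s\S]*?)</think>').firstMatch(raw);
  if (match == null) return (thinking: '', answer: raw.trim());
  return (
    thinking: match.group(1)!.trim(),
    answer: raw.replaceFirst(match.group(0)!, '').trim(),
  );
}

void main() {
  final parts = splitThinking('<think>User wants a greeting.</think>Hello!');
  print(parts.thinking); // User wants a greeting.
  print(parts.answer);   // Hello!
}
```

A real streaming implementation would additionally track whether the open tag has been seen but not yet closed, so partial reasoning can be rendered as tokens arrive.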
What it demonstrates:
- Batch mode: Record full audio, then transcribe
- Live mode: Real-time streaming transcription (when supported)
- Audio level visualization
- Mode selection (batch vs. live)
Key SDK APIs:
- RunAnywhere.loadSTTModel() — Load Whisper model
- RunAnywhere.transcribe() — Batch transcription
- RunAnywhere.isSTTModelLoaded — Check model status
What it demonstrates:
- Neural voice synthesis with Piper TTS
- Speed and pitch controls with sliders
- Audio playback with progress indicator
- Audio metadata display (duration, sample rate, size)
Key SDK APIs:
- RunAnywhere.loadTTSVoice() — Load TTS model
- RunAnywhere.synthesize() — Generate speech audio
- RunAnywhere.isTTSVoiceLoaded — Check voice status
What it demonstrates:
- Complete voice AI pipeline (STT to LLM to TTS)
- Model configuration for all 3 components
- Audio level visualization during recording
- Conversation turn management
- Session state machine (connecting, listening, processing, speaking)
Key SDK APIs:
- RunAnywhere.startVoiceSession() — Start voice session
- RunAnywhere.isVoiceAgentReady — Check all components loaded
- VoiceSessionEvent — Session event stream
What it demonstrates:
- Storage usage overview (total, available, model storage)
- Downloaded model list with details
- Model deletion with confirmation dialog
- Analytics logging toggle
Key SDK APIs:
- RunAnywhere.getStorageInfo() — Get storage details
- RunAnywhere.getDownloadedModelsWithInfo() — List models
- RunAnywhere.deleteStoredModel() — Remove model
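The storage and memory figures in this README (e.g. `memoryRequirement: 500000000`) are raw byte counts. A small helper like the following can format them for display in the storage view — this is an illustrative utility, not part of the SDK:

```dart
// Format a raw byte count (as used by memoryRequirement and the
// storage APIs) into a human-readable string. Illustrative helper.
String formatBytes(int bytes) {
  const units = ['B', 'KB', 'MB', 'GB'];
  var value = bytes.toDouble();
  var unit = 0;
  while (value >= 1024 && unit < units.length - 1) {
    value /= 1024;
    unit++;
  }
  return '${value.toStringAsFixed(1)} ${units[unit]}';
}

void main() {
  print(formatBytes(500000000)); // 476.8 MB
}
```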
| Model | Size | Memory | Description |
|---|---|---|---|
| SmolLM2 360M Q8_0 | ~400MB | 500MB | Fast, lightweight chat |
| Qwen 2.5 0.5B Q6_K | ~500MB | 600MB | Multilingual, efficient |
| LFM2 350M Q4_K_M | ~200MB | 250MB | LiquidAI, ultra-compact |
| LFM2 350M Q8_0 | ~350MB | 400MB | Higher quality version |
| Llama 2 7B Chat Q4_K_M | ~4GB | 4GB | Powerful, larger model |
| Mistral 7B Instruct Q4_K_M | ~4GB | 4GB | High quality responses |
| Model | Size | Description |
|---|---|---|
| Sherpa Whisper Tiny (EN) | ~75MB | Fast English transcription |
| Sherpa Whisper Small (EN) | ~250MB | Higher accuracy |
| Model | Size | Description |
|---|---|---|
| Piper US English (Medium) | ~65MB | Natural American voice |
| Piper British English (Medium) | ~65MB | British accent |
# Run all tests
flutter test
# Run with coverage
flutter test --coverage
# Run specific test file
flutter test test/widget_test.dart

# Analyze code quality
flutter analyze
# Format code
dart format lib/ test/
# Fix issues automatically
dart fix --apply

The app uses debugPrint() extensively. Filter logs by:
# Flutter logs
flutter logs | grep -E "RunAnywhere|SDK"

| Log Prefix | Description |
|---|---|
| SDK | SDK initialization |
| SUCCESS | Success operations |
| ERROR | Error conditions |
| MODULE | Module registration |
| LOADING | Loading/processing |
| AUDIO | Audio operations |
| RECORDING | Recording operations |
- Run app in profile mode: flutter run --profile
- Open DevTools: Press p in terminal
- Navigate to Memory tab
- Expected: ~300MB-2GB depending on model size
The SDK automatically detects the environment:
// Development mode (default)
if (kDebugMode) {
await RunAnywhere.initialize();
}
// Production mode
else {
await RunAnywhere.initialize(
apiKey: 'your-api-key',
baseURL: 'https://api.runanywhere.ai',
environment: SDKEnvironment.production,
);
}

User preferences are stored via SharedPreferences:
| Key | Type | Default | Description |
|---|---|---|---|
| useStreaming | bool | true | Enable streaming generation |
| defaultTemperature | double | 0.7 | LLM temperature |
| defaultMaxTokens | int | 500 | Max tokens per generation |
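Reading these preferences with the shared_preferences plugin might look like the sketch below. The key names and defaults come from the table; the function name and record shape are illustrative, not the app's actual code:

```dart
// Sketch of reading the app's generation preferences with the
// shared_preferences plugin. Defaults match the table above.
import 'package:shared_preferences/shared_preferences.dart';

Future<({bool useStreaming, double temperature, int maxTokens})>
    loadGenerationPrefs() async {
  final prefs = await SharedPreferences.getInstance();
  return (
    useStreaming: prefs.getBool('useStreaming') ?? true,
    temperature: prefs.getDouble('defaultTemperature') ?? 0.7,
    maxTokens: prefs.getInt('defaultMaxTokens') ?? 500,
  );
}
```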
- ARM64 Recommended — Native libraries optimized for arm64 (x86 emulators may be slow)
- Memory Usage — Large models (7B+) require devices with 6GB+ RAM
- First Load — Initial model loading takes 1-3 seconds (cached afterward)
- Live STT — Requires WhisperKit-compatible models (limited in ONNX)
- Platform Channels — Some SDK features use FFI/platform channels
We welcome contributions! See CONTRIBUTING.md for guidelines.
# Fork and clone
git clone https://github.com/YOUR_USERNAME/runanywhere-sdks.git
cd runanywhere-sdks/examples/flutter/RunAnywhereAI
# Create feature branch
git checkout -b feature/your-feature
# Make changes and test
flutter pub get
flutter analyze
flutter test
# Commit and push
git commit -m "feat: your feature description"
git push origin feature/your-feature
# Open Pull Request

This project is licensed under the Apache License 2.0 - see LICENSE for details.
- Discord: Join our community
- GitHub Issues: Report bugs
- Email: san@runanywhere.ai
- Twitter: @RunanywhereAI
- RunAnywhere Flutter SDK — Full SDK documentation
- iOS Example App — iOS counterpart
- Android Example App — Android counterpart
- React Native Example — React Native option
- Main README — Project overview
