
Commit dec0d63

Merge pull request #129 from leehack/feat/model-source-resolver
feat: add model download cache manager
2 parents fc98c90 + 2bfc7ad commit dec0d63

40 files changed

Lines changed: 5321 additions & 281 deletions

AGENTS.md

Lines changed: 9 additions & 0 deletions
@@ -91,6 +91,15 @@ pass `--mem64` and a smaller `--context-size` to keep the smoke bounded.
 - Parameter and return types documented
 - No TODO/FIXME comments in committed code
 
+### Changelog Discipline
+- Never add unreleased work to an already-published version section in
+`CHANGELOG.md` or `website/docs/changelog/recent-releases.md`.
+- Before editing release notes, check the top of `CHANGELOG.md`. If the latest
+section is a concrete released version (for example `## 0.6.12`), create a
+new `## Unreleased` section above it and place new PR entries there.
+- Only move entries from `## Unreleased` into a numbered version section as part
+of an explicit release/version-bump task.
+
 ### Error Handling
 - Use custom `LlamaException` hierarchy (defined in `lib/src/core/exceptions.dart`)
 - Subtypes: `LlamaModelException`, `LlamaContextException`, `LlamaInferenceException`, `LlamaStateException`, `LlamaUnsupportedException`
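
The Changelog Discipline rule added in this hunk can be expressed as a small check. The sketch below is illustrative only: `ensureUnreleasedSection` is a hypothetical helper, not part of this commit or of the repository, and it assumes version sections always start with a `## ` heading as they do in `CHANGELOG.md`.

```dart
/// Hypothetical helper (not part of the repository): returns [changelog] with
/// a `## Unreleased` section on top whenever the first `## ` heading is
/// anything else, such as a released version like `## 0.6.12`.
String ensureUnreleasedSection(String changelog) {
  // Find the first second-level heading; an empty string means none exists.
  final firstHeading = changelog
      .split('\n')
      .map((line) => line.trim())
      .firstWhere((line) => line.startsWith('## '), orElse: () => '');

  if (firstHeading == '## Unreleased') {
    // New PR entries already have a home; leave the file untouched.
    return changelog;
  }

  // The top section is published (or missing), so open a new one above it.
  return '## Unreleased\n\n$changelog';
}
```

Calling it on text that starts with `## 0.6.12` prepends the new section, while text that already starts with `## Unreleased` comes back unchanged. Moving entries out of `## Unreleased` is deliberately not covered, since the rule reserves that for an explicit release task.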

CHANGELOG.md

Lines changed: 23 additions & 0 deletions
@@ -1,3 +1,26 @@
+## Unreleased
+
+* **Model source download/cache manager**:
+* Added `ModelSource` for local paths, HTTP(S) URLs, and Hugging Face
+`hf://owner/repo/path/to/model.gguf` references, including deterministic
+cache keys and redacted metadata/log identities for signed URLs.
+* Added `ModelLoadOptions`, `ModelCachePolicy`, resolver targets, and
+download/cache metadata/progress value models for package-managed model
+download and cache management.
+* Added native/file-backed `DefaultModelDownloadManager` support for streaming
+HTTP downloads, `.part` files, atomic promotion, persisted metadata,
+authenticated bearer/custom headers, cancellation, retry, Range resume,
+cache hit/refresh/cache-only/no-cache policies, SHA-256 verification,
+cache listing, removal, clearing, and age/size pruning.
+* Added `LlamaEngine.loadModelSource(...)` to route local sources through the
+existing native local loader, remote sources through the native download
+cache before local loading, and simple remote sources through URL-capable web
+backends when available.
+* Migrated server/testing helpers away from ad-hoc model downloads so examples
+dogfood the package-managed cache manager.
+* **Compatibility note**: no public API breaking changes; the model source APIs
+are additive and existing `loadModel(...)` callers are unchanged.
+
 ## 0.6.12
 
 * **Native runtime sync**:
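
The "Range resume" entry above is spelled out in this commit's README text: a `.part` file is appended to only when the server supports HTTP Range and the partial has a validator (ETag/Last-Modified) or a caller-provided SHA-256; otherwise the download restarts from byte zero. A minimal sketch of that decision follows. `PartialDownload`, `resumeOffset`, and `rangeHeaders` are hypothetical names, not the API of `DefaultModelDownloadManager`, and sending the validator as `If-Range` is an assumption about how such a client might guard the append.

```dart
/// Hypothetical description of a leftover `.part` file; not a llamadart type.
class PartialDownload {
  const PartialDownload({
    required this.bytesOnDisk,
    this.etag,
    this.lastModified,
  });

  final int bytesOnDisk;
  final String? etag;
  final String? lastModified;

  /// A validator lets the client confirm the remote file has not changed.
  bool get hasValidator => etag != null || lastModified != null;
}

/// Returns the byte offset to resume from, or 0 to restart from scratch.
int resumeOffset({
  required PartialDownload partial,
  required bool serverSupportsRange,
  String? expectedSha256,
}) {
  // Appending is only safe if a changed remote file can be detected, either
  // up front (validator) or after the fact (checksum).
  final safeToAppend = partial.hasValidator || expectedSha256 != null;
  if (serverSupportsRange && safeToAppend && partial.bytesOnDisk > 0) {
    return partial.bytesOnDisk;
  }
  return 0;
}

/// Builds request headers for [offset]; empty when restarting from zero.
Map<String, String> rangeHeaders(PartialDownload partial, int offset) {
  if (offset == 0) return const {};
  final validator = partial.etag ?? partial.lastModified;
  return {
    'Range': 'bytes=$offset-',
    // Assumption: If-Range makes the server send the full body if it changed.
    if (validator != null) 'If-Range': validator,
  };
}
```

With a 100-byte partial that carries an ETag and a Range-capable server, `resumeOffset` returns 100; the same partial without a validator or checksum returns 0, which matches the "restart from byte zero" behaviour the README describes.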

README.md

Lines changed: 37 additions & 1 deletion
@@ -26,6 +26,8 @@
 - Web: WebGPU via bridge runtime (with CPU fallback)
 - 🧭 **Embeddings API**: Generate vectors with `embed(...)` and
 `embedBatch(...)`.
+- 📦 **Structured Model Sources**: Describe local, HTTP(S), and Hugging Face
+GGUF sources with deterministic cache identities for download/cache workflows.
 - 🖼️ **Multimodal Support**: Vision/audio model runtime support.
 - **LoRA Support**: Runtime GGUF adapter application.
 - 🔇 **Split Logging Control**: Dart logs and native logs can be configured independently.
@@ -88,7 +90,41 @@ Future<void> main() async {
 }
 ```
 
-### 5. Generate embeddings
+### 5. Download and cache a remote GGUF
+
+```dart
+import 'package:llamadart/llamadart.dart';
+
+Future<void> main() async {
+  final engine = LlamaEngine(LlamaBackend());
+  try {
+    await engine.loadModelSource(
+      ModelSource.parse('hf://owner/repo/model-Q4_K_M.gguf'),
+      options: ModelLoadOptions(
+        cachePolicy: ModelCachePolicy.preferCached,
+        cacheDirectory: '/path/to/app/model-cache',
+      ),
+      onProgress: (progress) {
+        final fraction = progress.fraction;
+        if (fraction != null) {
+          print('download ${(fraction * 100).toStringAsFixed(1)}%');
+        }
+      },
+    );
+  } finally {
+    await engine.dispose();
+  }
+}
+```
+
+Native/file-backed backends stream remote models into the package-managed cache,
+resume partial `.part` downloads when the server supports HTTP Range and the
+partial has a safe validator (ETag/Last-Modified) or caller-provided SHA-256,
+verify optional SHA-256 checksums, and redact signed URL credentials from
+metadata. Validator-less partial files restart from byte zero instead of being
+appended.
+
+### 6. Generate embeddings
 
 ```dart
 import 'package:llamadart/llamadart.dart';

example/chat_app/README.md

Lines changed: 8 additions & 0 deletions
@@ -38,6 +38,14 @@ flutter test
 
 Note: this is a Flutter app, so use `flutter test` (not `dart test`).
 
+Slow device E2E tests are tagged `local-only` and skipped by default. To run
+the real model/mmproj download-cache-load check manually on a selected device:
+
+```bash
+flutter test --run-skipped -t local-only \
+  integration_test/model_cache_mmproj_e2e_test.dart -d <device>
+```
+
 ### 2. Choose and Download a Model
 1. The app will open to a **Manage Models** screen.
 2. Select one of the pre-configured models (for example: FunctionGemma 270M, Qwen3.5 0.8B/2B/4B/9B, Llama 3.2 3B, Gemma 3/3n, DeepSeek R1 distills).
example/chat_app/dart_test.yaml

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+tags:
+  local-only:
+    skip: "Runs only on local machines. Use: flutter test --run-skipped -t local-only integration_test/model_cache_mmproj_e2e_test.dart -d <device>"
+  e2e:
+    timeout: 30m

example/chat_app/integration_test/model_cache_mmproj_e2e_test.dart

Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
+@Tags(['local-only', 'e2e'])
+@Timeout(Duration(minutes: 30))
+/// Local-only chat app E2E for the model/mmproj download-cache-load path.
+///
+/// This downloads the default LFM2-VL 450M model and its mmproj, so it is
+/// intentionally skipped by default. Run it manually with:
+///
+/// ```bash
+/// cd example/chat_app
+/// flutter test --run-skipped -t local-only \
+/// integration_test/model_cache_mmproj_e2e_test.dart -d <device>
+/// ```
+library;
+
+import 'package:dio/dio.dart';
+import 'package:flutter/foundation.dart';
+import 'package:flutter_test/flutter_test.dart';
+import 'package:integration_test/integration_test.dart';
+import 'package:llamadart/llamadart.dart' show GpuBackend, LlamaLogLevel;
+import 'package:path/path.dart' as p;
+
+import 'package:llamadart_chat_example/models/chat_settings.dart';
+import 'package:llamadart_chat_example/models/downloadable_model.dart';
+import 'package:llamadart_chat_example/services/chat_service.dart';
+import 'package:llamadart_chat_example/services/model_service_base.dart';
+
+void main() {
+  IntegrationTestWidgetsFlutterBinding.ensureInitialized();
+
+  testWidgets(
+    'downloads, caches, and loads tiny multimodal model + mmproj',
+    (tester) async {
+      final model = DownloadableModel.defaultModels.firstWhere(
+        (candidate) => candidate.name == 'LFM2-VL 450M',
+      );
+      expect(model.multimodalProjectorSource, isNotNull);
+
+      final service = ModelService();
+      final modelsDir = await service.getModelsDirectory();
+
+      await service.deleteModel(modelsDir, model);
+      var downloaded = await service.getDownloadedModels([model]);
+      expect(downloaded.contains(model.filename), isFalse);
+
+      final stages = <ModelDownloadStage>{};
+      final progressEvents = <ModelDownloadProgress>[];
+      Object? downloadError;
+      String? successFilename;
+
+      await service.downloadModel(
+        model: model,
+        modelsDir: modelsDir,
+        cancelToken: CancelToken(),
+        onProgress: (_) {},
+        onProgressDetail: (detail) {
+          stages.add(detail.stage);
+          progressEvents.add(detail);
+          debugPrint(
+            'E2E download ${detail.stage.name} '
+            '${(detail.overallProgress * 100).toStringAsFixed(1)}% '
+            '${detail.stageDownloadedBytes}/${detail.stageTotalBytes ?? -1}',
+          );
+        },
+        onSuccess: (filename) {
+          successFilename = filename;
+        },
+        onError: (error) {
+          downloadError = error;
+        },
+      );
+
+      expect(downloadError, isNull);
+      expect(successFilename, model.filename);
+      expect(stages, contains(ModelDownloadStage.model));
+      expect(stages, contains(ModelDownloadStage.multimodalProjector));
+      expect(progressEvents.last.overallProgress, 1.0);
+
+      downloaded = await service.getDownloadedModels([model]);
+      expect(downloaded, contains(model.filename));
+
+      final modelSource = model.modelSource;
+      final mmprojSource = model.multimodalProjectorSource!;
+      final modelLoadRef = kIsWeb || modelSource is LocalModelAssetSource
+          ? modelSource.loadReference
+          : p.join(modelsDir, model.filename);
+      final mmprojLoadRef = kIsWeb || mmprojSource is LocalModelAssetSource
+          ? mmprojSource.loadReference
+          : p.join(modelsDir, mmprojSource.displayName);
+
+      final chatService = ChatService();
+      try {
+        await chatService.init(
+          ChatSettings(
+            modelPath: modelLoadRef,
+            mmprojPath: mmprojLoadRef,
+            preferredBackend: GpuBackend.cpu,
+            gpuLayers: 0,
+            contextSize: 512,
+            maxTokens: 32,
+            nativeLogLevel: LlamaLogLevel.warn,
+          ),
+          eagerLoadMultimodalProjector: true,
+          onProgress: (progress) =>
+              debugPrint('E2E load ${(progress * 100).toStringAsFixed(1)}%'),
+        );
+
+        expect(chatService.engine.isReady, isTrue);
+        expect(await chatService.engine.supportsVision, isTrue);
+      } finally {
+        await chatService.dispose();
+      }
+    },
+    timeout: const Timeout(Duration(minutes: 30)),
+  );
+}
