Skip to content

feat!: integrate all vision models with vision camera#880

Merged
NorbertKlockiewicz merged 71 commits intomainfrom
@nk/vision-models-camera-integration
Mar 17, 2026
Merged

feat!: integrate all vision models with vision camera#880
NorbertKlockiewicz merged 71 commits intomainfrom
@nk/vision-models-camera-integration

Conversation

@NorbertKlockiewicz
Copy link
Copy Markdown
Contributor

@NorbertKlockiewicz NorbertKlockiewicz commented Feb 25, 2026

Description

  • Added PixelData input support to all vision model hooks and modules — forward() now accepts both string URIs and raw pixel buffers
  • Added runOnFrame worklet to all vision model hooks for real-time VisionCamera v5 frame processing (useClassification, useImageEmbeddings, useOCR, useVerticalOCR, useObjectDetection, useSemanticSegmentation, useStyleTransfer)
  • Refactored StyleTransfer C++ to unified generateFromString / generateFromPixels with a saveToFile flag
  • Added new VisionCamera integration docs page with setup guide, full example, Module API pattern, and common issues section
  • Updated all vision model hook and module docs to document PixelData input and runOnFrame
  • Added integration tests: VisionModelTest (unit tests for extractFromPixels and preprocess), FrameProcessorTest (unit tests for pixelsToMat), StyleTransferTest (full coverage of both output modes), and generateFromPixels smoke tests across all vision models

Breaking changes

StyleTransferType.forward return type changed

Before:

  forward(imageSource: string): Promise<string>

After:

  forward(input: string | PixelData, output?: 'pixelData' | 'url'): Promise<PixelData | string>

Known Issues

The orientation handling is static right now, it will be fixed in the next PR.

Introduces a breaking change?

  • Yes
  • No

Type of change

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Documentation update (improves or adds clarity to existing documentation)
  • Other (chores, tests, code style improvements etc.)

Tested on

  • iOS
  • Android

Testing instructions

  • Run computer vision app -> vision camera screen -> test all models
  • Also test rest of the models in this app.
  • Run test suite

Screenshots

Related issues

Checklist

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have updated the documentation accordingly
  • My changes generate no new warnings

Additional notes

@NorbertKlockiewicz NorbertKlockiewicz self-assigned this Feb 25, 2026
@NorbertKlockiewicz NorbertKlockiewicz linked an issue Feb 25, 2026 that may be closed by this pull request
@NorbertKlockiewicz NorbertKlockiewicz force-pushed the @nk/vision-models-camera-integration branch from 66e358e to 0a8493b Compare February 25, 2026 15:36
@NorbertKlockiewicz NorbertKlockiewicz force-pushed the @nk/vision-models-camera-integration branch from 787ea7d to 7488857 Compare March 10, 2026 10:50
@NorbertKlockiewicz NorbertKlockiewicz changed the title @nk/vision models camera integration feat: integrate all vision models with vision camera Mar 11, 2026
@NorbertKlockiewicz NorbertKlockiewicz force-pushed the @nk/vision-models-camera-integration branch from 35c2a3f to a8d95a8 Compare March 11, 2026 13:10
@NorbertKlockiewicz NorbertKlockiewicz added 3rd party package Issue related to 3rd party packages, but not ExecuTorch, e.g. Expo feature PRs that implement a new feature labels Mar 12, 2026
@NorbertKlockiewicz NorbertKlockiewicz force-pushed the @nk/vision-models-camera-integration branch from af8ebfe to 951ab91 Compare March 12, 2026 15:42
@NorbertKlockiewicz NorbertKlockiewicz linked an issue Mar 12, 2026 that may be closed by this pull request
@NorbertKlockiewicz NorbertKlockiewicz changed the title feat: integrate all vision models with vision camera feat(style trasfer)!: integrate all vision models with vision camera Mar 13, 2026
@NorbertKlockiewicz NorbertKlockiewicz changed the title feat(style trasfer)!: integrate all vision models with vision camera feat(style transfer)!: integrate all vision models with vision camera Mar 13, 2026
@NorbertKlockiewicz NorbertKlockiewicz marked this pull request as ready for review March 13, 2026 11:03
@NorbertKlockiewicz NorbertKlockiewicz changed the title feat(style transfer)!: integrate all vision models with vision camera feat!: integrate all vision models with vision camera Mar 13, 2026
@barhanc
Copy link
Copy Markdown
Contributor

barhanc commented Mar 13, 2026

On Android the output for front camera is upside down and mirrored. Is this an expected behaviour that will be fixed with orientation handling?

@NorbertKlockiewicz
Copy link
Copy Markdown
Contributor Author

On Android the output for front camera is upside down and mirrored. Is this an expected behaviour that will be fixed with orientation handling?

yes, I am working on it now but this PR already has too much changes so it will be added in the next one.

Comment on lines +84 to +90
EXPECT_NO_THROW((void)model.generateFromString(kValidTestImagePath, false));
auto result1 = model.generateFromString(kValidTestImagePath, false);
auto result2 = model.generateFromString(kValidTestImagePath, false);
ASSERT_TRUE(std::holds_alternative<PixelDataResult>(result1));
ASSERT_TRUE(std::holds_alternative<PixelDataResult>(result2));
EXPECT_NE(std::get<PixelDataResult>(result1).dataPtr, nullptr);
EXPECT_NE(std::get<PixelDataResult>(result2).dataPtr, nullptr);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why these are repeated?

Comment on lines +161 to +176
TEST(StyleTransferPixelTests, ValidPixelsSaveToFileTrueReturnsString) {
StyleTransfer model(kValidStyleTransferModelPath, nullptr);
std::vector<uint8_t> buf;
auto view = makeRgbView(buf, 64, 64);
auto result = model.generateFromPixels(view, true);
EXPECT_TRUE(std::holds_alternative<std::string>(result));
}

TEST(StyleTransferPixelTests, ValidPixelsSaveToFileTrueHasFileScheme) {
StyleTransfer model(kValidStyleTransferModelPath, nullptr);
std::vector<uint8_t> buf;
auto view = makeRgbView(buf, 64, 64);
auto result = model.generateFromPixels(view, true);
ASSERT_TRUE(std::holds_alternative<std::string>(result));
EXPECT_TRUE(std::get<std::string>(result).starts_with("file://"));
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first test is a subset of the second one

Comment on lines +21 to +31
TEST(PixelsToMatValidInput, ProducesCorrectRows) {
std::vector<uint8_t> buf;
auto view = makeValidView(buf, 48, 64);
EXPECT_EQ(pixelsToMat(view).rows, 48);
}

TEST(PixelsToMatValidInput, ProducesCorrectCols) {
std::vector<uint8_t> buf;
auto view = makeValidView(buf, 48, 64);
EXPECT_EQ(pixelsToMat(view).cols, 64);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merge this two into one test

Comment on lines +33 to +43
TEST(PixelsToMatValidInput, ProducesThreeChannelMat) {
std::vector<uint8_t> buf;
auto view = makeValidView(buf, 4, 4);
EXPECT_EQ(pixelsToMat(view).channels(), 3);
}

TEST(PixelsToMatValidInput, MatTypeIsCV_8UC3) {
std::vector<uint8_t> buf;
auto view = makeValidView(buf, 4, 4);
EXPECT_EQ(pixelsToMat(view).type(), CV_8UC3);
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, merge and make two check in one test

@NorbertKlockiewicz NorbertKlockiewicz force-pushed the @nk/vision-models-camera-integration branch from e07a54c to 190a19b Compare March 13, 2026 13:46
@msluszniak
Copy link
Copy Markdown
Member

That might look better:

Screenshot_20260313_160720_computer-vision

@msluszniak
Copy link
Copy Markdown
Member

msluszniak commented Mar 13, 2026

Also get these warnings when running demo apps:

 WARN  Route "./vision_camera/tasks/types.ts" is missing the required default export. Ensure a React component is exported as default.
 WARN  Route "./vision_camera/utils/colors.ts" is missing the required default export. Ensure a React component is exported as default.

@msluszniak
Copy link
Copy Markdown
Member

msluszniak commented Mar 13, 2026

Trying to run regular style transfer freezes the app on my android phone.

This is probably related to: #962 so we can ignore it for now.

Copy link
Copy Markdown
Collaborator

@chmjkb chmjkb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a couple of comments for now, though I haven't finished reviewing :D

Comment on lines +110 to +114
## VisionCamera integration

For real-time object detection on camera frames, use `runOnFrame`. It runs synchronously on the JS worklet thread and returns `Detection[]`.

See the full guide: [VisionCamera Integration](./visioncamera-integration.md).
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly I think just a link to the vision camera integration doc would be enough. This way we don't have to remember to update these docs when changing return types. It's a general comment to all the changes in docs.

Copy link
Copy Markdown
Collaborator

@chmjkb chmjkb Mar 13, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When downloading the model i got the maximum state depth update error

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, on android or ios, also when downloading which model?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iOS and I believe it was the selfie segmentation one

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried to reproduce it in iPhone 16 Pro and SE 3 and wasn't able to, as it's just the example app I suggest to "jeździć obserwować"

sizesArray.setValueAtIndex(runtime, 2, jsi::Value(4));
obj.setProperty(runtime, "sizes", sizesArray);

obj.setProperty(runtime, "scalarType", jsi::Value(0));
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use ScalarType::Byte instead of plain magic 0

Comment thread .eslintrc.js Outdated
},
],
'camelcase': 'error',
'camelcase': ['error', { properties: 'never' }],
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we ignoring this rule for properties?

forward: (imageSource: string) => Promise<string>;
forward<O extends 'pixelData' | 'url' = 'pixelData'>(
input: string | PixelData,
output?: O
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe outputKind / outputType would be a bit better

chmjkb
chmjkb previously requested changes Mar 16, 2026
*
* **Note**: For VisionCamera frame processing, use `runOnFrame` instead.
*
* @param input - Image source (string or PixelData object)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can link to pixeldata here

*
* Available after model is loaded (`isReady: true`).
*
* @example
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other models dont have such example. I think we should follow a single approach

Comment on lines +27 to +42
/**
* Given a model configs record (mapping model names to `{ labelMap }`) and a
* type `T` (either a model name key or a raw {@link LabelEnum}), resolves to
* the label map for that model or `T` itself.
*
* @internal
*/
export type ResolveLabels<
T,
Configs extends Record<string, { labelMap: LabelEnum }>,
> = T extends keyof Configs
? Configs[T]['labelMap']
: T extends LabelEnum
? T
: never;

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think to keep types seperate this should be moved to some other file

Comment on lines -15 to -18
private constructor(nativeModule: unknown) {
super();
this.nativeModule = nativeModule;
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason why we're ditching this? I feel like this is a cleaner way of creating models, where the factory is responsible for creating the native object, and the module just receives a ready to use thing.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've selected wrong version in rebase :/

Comment on lines +42 to +44
const instance = new ImageEmbeddingsModule();
instance.nativeModule = await global.loadImageEmbeddings(paths[0]);
return instance;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here regarding the construction method

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When can this happen? Maybe we should throw if it does?

    if (!this.nativeModule?.generateFromFrame) {
      return null;
    }

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would skip inference if the model isn't loaded, however I think we can just check if (this.nativeModule == null)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is that skipping inference silently here might be unexpected for the user.

Comment on lines +92 to +94
if (!this.nativeModule?.generateFromFrame) {
return null;
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same concern as in VisionModule

Comment thread packages/react-native-executorch/common/rnexecutorch/utils/FrameProcessor.cpp Outdated
Comment on lines +179 to +201
TEST(StyleTransferThreadSafetyTests, TwoConcurrentGeneratesDoNotCrash) {
StyleTransfer model(kValidStyleTransferModelPath, nullptr);
std::atomic<int32_t> successCount{0};
std::atomic<int32_t> exceptionCount{0};

auto task = [&]() {
try {
(void)model.generateFromString(kValidTestImagePath, false);
successCount++;
} catch (const RnExecutorchError &) {
exceptionCount++;
}
};

std::thread a(task);
std::thread b(task);
a.join();
b.join();

EXPECT_EQ(successCount + exceptionCount, 2);
}

TEST(StyleTransferThreadSafetyTests,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this could be a typed test for all the models

NorbertKlockiewicz and others added 23 commits March 16, 2026 16:53
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…meProcessor.cpp

Co-authored-by: Jakub Chmura <92989966+chmjkb@users.noreply.github.com>
@NorbertKlockiewicz NorbertKlockiewicz force-pushed the @nk/vision-models-camera-integration branch from b28847b to 836c7a2 Compare March 16, 2026 15:53
auto sizesArray = jsi::Array(runtime, 3);
sizesArray.setValueAtIndex(runtime, 0, jsi::Value(result.height));
sizesArray.setValueAtIndex(runtime, 1, jsi::Value(result.width));
sizesArray.setValueAtIndex(runtime, 2, jsi::Value(4));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is 4 here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a number of channels but it should be passed the same way as the height and width, on it now!

@mkopcins mkopcins dismissed chmjkb’s stale review March 17, 2026 09:17

dismiss due to work leave

@NorbertKlockiewicz NorbertKlockiewicz merged commit 4279ad4 into main Mar 17, 2026
5 checks passed
@NorbertKlockiewicz NorbertKlockiewicz deleted the @nk/vision-models-camera-integration branch March 17, 2026 09:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

3rd party package Issue related to 3rd party packages, but not ExecuTorch, e.g. Expo feature PRs that implement a new feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Vision Camera integration - other vision models Vision camera integration - documentation

5 participants