Skip to content

feat: add vision service#639

Open
HipsterBrown wants to merge 30 commits into
mainfrom
feat/vision-service-sdk
Open

feat: add vision service#639
HipsterBrown wants to merge 30 commits into
mainfrom
feat/vision-service-sdk

Conversation

@HipsterBrown
Copy link
Copy Markdown

I was testing out the C++ module generation for the vision service module I wanted to build and discovered that service template was missing due to the API not being provided by the SDK. So I worked with Claude to implement that service API in the SDK and reuse as much of the existing types (like those from MLModel service and Camera component) as reasonable.

@HipsterBrown HipsterBrown requested a review from a team as a code owner May 18, 2026 18:37
@HipsterBrown HipsterBrown requested review from allisonschiang and lia-viam and removed request for a team May 18, 2026 18:37
Comment on lines +67 to +74
/// @struct point_cloud_object
/// @brief A point cloud and its associated geometry.
/// @note The proto field is named `point_cloud`; we use `cloud` here to avoid shadowing
/// the `point_cloud` type alias (GCC rejects the shadowing under -fpermissive).
struct point_cloud_object {
point_cloud cloud;
std::vector<GeometryConfig> geometries;
};
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be a point of contention. Do we want to use cloud for the field name to avoid shadowing the type or something else?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pt_cloud would work, and I'd say remove the @note comment

HipsterBrown and others added 25 commits May 19, 2026 10:02
Documents the approved design for adding viam::sdk::Vision (full peer to
mlmodel/motion/etc.) plus the module-generator template scaffold. Captures
the public API surface, proto-conv strategy, client/server plumbing, test
plan, build wiring, and verification criteria.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fix two concrete issues from spec review and incorporate advisory items:

- Replace nonexistent `viam::sdk::geometries` with the actual
  `std::vector<GeometryConfig>` convention used by camera and peers.
- Correct test CMake wiring: use `viamcppsdk_add_boost_test(...)` macro
  and add mocks to the shared `viamsdk_test` target_sources, matching
  every other test in the file.
- Use nested `viam::sdk::impl::vision` namespace, matching mlmodel.
- Note that Camera lacks shared raw_image proto-conv helpers today and
  this work introduces them.
- Flag that the GetStatus shared/per-service split should be verified
  against mlmodel at planning time.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Pin shared raw_image header location to
  src/viam/sdk/common/private/raw_image.{hpp,cpp} (lifted from camera).
- Add the camera_client/server migration as explicit modified files.
- Note the point_cloud_object field/type shadow as intentional and
  require it to be flagged in the PR description.
- Update build sequence to reflect the raw_image lift as step 1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Decomposes the approved spec into 16 bite-sized tasks across three
phases: foundation (raw_image lift, Vision skeleton, client/server/mock
stubs), per-RPC TDD (one task per RPC), and finalization (cl_gen
verification, lint, Doxygen, PR).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fix three concrete issues found in plan review (which also affected
the spec):

- Detection bbox proto types were inverted: pixel fields are
  optional int64, normalized fields are optional double. class_name
  is plain string and confidence is plain double (not optional).
  Updated spec, plan struct, plan tests, plan to_proto/from_proto.
- Registration is centralized in registry/registry.cpp via
  Registry::register_resources(), not as a per-file static block.
  Added registry.cpp to the modified-files list and rewrote Tasks 4
  and 5 to add a single register_resource line.
- Helper API is .with(extra), not .with_extra(extra). Corrected the
  example in Task 7 with a pointer to client_helper.hpp.

Also incorporated reviewer recommendations: removed the
GeometryConfig::operator== fallback (it exists), pointed at existing
spatialmath to_proto_impl<GeometryConfig> + from_proto_impl helpers
and the repeated_ptr_convert.hpp utility instead of a manual loop.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Out-pointer style for to_proto(point_cloud_object) and to_proto(properties)
matches both raw_image::to_proto in the same spec and the implementation
sketch in the plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extracts the field-by-field Camera::raw_image <-> proto Image conversion
inlined in camera_client.cpp into shared helpers in
common/private/raw_image.{hpp,cpp}, so the upcoming Vision service can
reuse the conversion for capture_all_from_camera. No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds viam::sdk::Vision with all 9 RPC virtuals and nested POD payload
types (detection, classification, point_cloud_object, properties,
capture_options, capture_all_result). Mirrors the structure of
viam::sdk::Motion / MLModelService. Client and server impls follow
in subsequent commits.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…perator==

- Apply clang-format style per ./bin/run-clang-format.sh.
- Reuse Camera::point_cloud::operator== instead of manually comparing
  mime_type and pc fields.
Adds free-function to_proto/from_proto helpers for Detection,
Classification, properties, and point_cloud_object in
viam::sdk::impl::vision. Tests cover engaged/unengaged optional bbox
fields on Detection, classification basics, properties, and
point_cloud_object bytes.

Also registers the vision service proto (vision.pb.cc / vision.grpc.pb.cc)
into the viamapi target_sources list so the generated symbols are
compiled into libviamapi and available to viamsdk and tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Switches Vision::point_cloud_object proto-conv from a hand-rolled loop
to the repo's idiomatic repeated_ptr_convert helpers, matching how
world_state, navigation, and motion handle GeometryConfig vectors. No
behavior change; tests still pass.
Adds the Vision gRPC client class with all 9 RPC overrides as
not-implemented stubs. Registration follows in the next commit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
All 9 RPCs return UNIMPLEMENTED. Registration in
Registry::register_resources() now wires VisionClient/VisionServer
together, alphabetically with peer services.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Matches the convention used by mock_motion / mock_navigation / mock_switch
and other peer class-style mocks. Avoids ADL surprises against production
viam::sdk::Vision symbols.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sets up an in-process gRPC server + client pair backed by MockVision,
mirroring the scaffolding used by test_motion / test_mlmodel. No new
tests in this commit; per-RPC tests follow.
VisionClient stores the gRPC channel as a raw pointer per the Registry
contract; member destruction order must put client first. Also adds a
BOOST_REQUIRE on client at construction to surface registration failures
at fixture init rather than first per-RPC test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires the GetProperties RPC client and server to the existing
proto-conv helper. Adds round-trip tests for the basic call and the
extra-passing path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Wires the GetStatus RPC client and server using shared ProtoStruct
proto-conv helpers, mirroring the mlmodel pattern. Round-trip test
verifies a ProtoStruct with mixed value types survives the wire.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Wires the DoCommand RPC client and server using ProtoStruct
proto-conv helpers. Round-trip tests cover both flat and nested
ProtoStruct payloads.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…o-end

Both classification RPCs round-trip through the existing Classification
proto-conv helpers. Tests verify the request fields (camera_name, image
bytes, count) survive the wire and the response vector preserves order
and content.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Both detection RPCs round-trip through the existing Detection
proto-conv helpers. Tests verify pixel bbox, normalized bbox, and the
no-bbox edge case all survive the wire.
The client propagates the request mime_type onto each returned
point_cloud_object's point_cloud.mime_type field, since the proto
type carries bytes + geometries but not mime_type per object.
The server also echoes the request mime_type into the top-level
GetObjectPointCloudsResponse.mime_type field.
The largest Vision RPC: conditionally returns image, detections,
classifications, and object point clouds based on capture_options
flags. Image conversion uses the shared raw_image helper introduced
in the Camera lift task. Tests cover image-only, detections-only,
and full-payload cases.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Confirms that throwing Exception from a Vision implementation
surfaces as a non-OK gRPC status on the client side, and that the
mock's throw_on_next_call flag resets so subsequent calls succeed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
HipsterBrown and others added 5 commits May 19, 2026 10:02
Drop misleading width/height comment in get_classifications (proto
fields are simply left at their default), and document why
vision_fixture diverges from client_to_mock_pipeline<>.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Respect server-returned mime_type in get_object_point_clouds: server
  now populates response.mime_type from the result's cloud.mime_type
  (not echoing the request), and client reads response.mime_type()
  rather than the request mime. Matches the proto contract and the Go
  RDK client behavior. Adds a regression test where request mime
  differs from server-returned mime.
- Initialize Vision::detection::confidence, classification::confidence,
  and all three properties bools via NSDMIs so default-constructed
  values are well-defined.
- Introduce Vision::image as a Vision-specific input type for
  get_detections / get_classifications. Camera::raw_image (alias) is
  kept only for capture_all_result.image where the wire type is the
  full Camera Image proto. Avoids silently dropping source_name on
  inputs whose wire format doesn't carry one — mirrors how Go's RDK
  takes image.Image (decoupled from the camera service).
- Rename point_cloud_object::point_cloud field to ::cloud so it no
  longer shadows the Vision::point_cloud type alias.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A prior commit on this branch (2584b55) inadvertently unwrapped the
deprecated dial_options() signature back to a single line because the
local clang-format version (22) disagrees with the CI lint container's
version (15) about this case. CI's cpp-linter job is authoritative;
restoring the wrapped form makes it pass.

The wrap should not be undone by future formatting on this file unless
the project upgrades the lint container.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
clang-tidy's readability-braces-around-statements is treated as an
error in CI. The Vision::detection to_proto / from_proto helpers had
16 single-line `if (cond) stmt;` patterns left over from the initial
draft.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@HipsterBrown HipsterBrown force-pushed the feat/vision-service-sdk branch from d37e2ee to 63c2363 Compare May 19, 2026 14:02
Copy link
Copy Markdown
Member

@lia-viam lia-viam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this seems difficult for claude and for some human developers truthfully, but we've taken an approach to proto type conversion in the sdk that is very easy/ergonomic for module authors to use, at the cost of being kind of cumbersome to implement on the sdk-side.

Try pointing the agent to, eg,

struct to_proto_impl<pose> {
as well as https://github.com/viamrobotics/viam-cpp-sdk/blob/main/src/viam/sdk/common/proto_convert.hpp

The idea is if you put struct specializations in a header, along with appropriate forward declarations, then end-users can include the viam/api files and be able to just have to_proto and from_proto work automatically.

There are some cases in the codebase where we do manually implement to_proto and from_proto in a cpp file or in the private/ directory; in these cases it is judged that conversion to/from these types is not likely to be needed outside of implementation details. However, since, eg, raw_image is now used by both Camera and Vision it certainly passes the test for providing public conversion functions

There are also cases as I've commented in one of the files where we don't bother implementing to_ and from_proto because the function will only be used once, so we just do the conversion inline in the respective _client and _server files

namespace impl {

/// @brief Convert a proto Image to Camera::raw_image.
Camera::raw_image from_proto(const ::viam::component::camera::v1::Image& proto);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these conversions should probably be public in camera.hpp, see summary comment

namespace impl {
namespace vision {

::viam::service::vision::v1::Detection to_proto(const Vision::detection&);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similarly detection, classification, and point_cloud_object should be public i think

return *channel_;
}

std::vector<Vision::detection> get_detections_from_camera(const std::string& camera_name,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit but can you put a newline in between the end of the last function definition and the subsequent ones


/// @struct detection
/// @brief Result of a single object detection.
/// Pixel bbox fields are optional int64; normalized bbox fields are optional double.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment might make more sense split into the body of the struct

Suggested change
/// Pixel bbox fields are optional int64; normalized bbox fields are optional double.
struct detection {
// pixel bbox fields
...
// normalized bbox fields
...
};

Comment on lines +67 to +74
/// @struct point_cloud_object
/// @brief A point cloud and its associated geometry.
/// @note The proto field is named `point_cloud`; we use `cloud` here to avoid shadowing
/// the `point_cloud` type alias (GCC rejects the shadowing under -fpermissive).
struct point_cloud_object {
point_cloud cloud;
std::vector<GeometryConfig> geometries;
};
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pt_cloud would work, and I'd say remove the @note comment

@@ -0,0 +1,41 @@
// Copyright 2024 Viam Inc.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a separate file for these conversions isn't really in line with other patterns we use in the sdk, but see comments below regarding the actual conversion functions

BOOST_TEST_DONT_PRINT_LOG_VALUE(Vision::classification)
BOOST_TEST_DONT_PRINT_LOG_VALUE(Vision::properties)

BOOST_AUTO_TEST_SUITE(vision_proto_conv)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

harmless but generally speaking we don't bother testing the to_proto/from_proto conversions

namespace sdktests {
namespace vision {

// vision_fixture stands up an in-process gRPC server backed by MockVision and a
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is interesting but i wonder if it's necessary, can you (personally, or pointing the agent to it) compare this file to test_motion or test_mlmodel and see if we don't end up with something simpler without this separate fixture

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants