feat: add vision service#639
Conversation
| /// @struct point_cloud_object | ||
| /// @brief A point cloud and its associated geometry. | ||
| /// @note The proto field is named `point_cloud`; we use `cloud` here to avoid shadowing | ||
| /// the `point_cloud` type alias (GCC rejects the shadowing under -fpermissive). | ||
| struct point_cloud_object { | ||
| point_cloud cloud; | ||
| std::vector<GeometryConfig> geometries; | ||
| }; |
There was a problem hiding this comment.
This might be a point of contention. Do we want to use cloud for the field name to avoid shadowing the type or something else?
There was a problem hiding this comment.
pt_cloud would work, and I'd say remove the @note comment
Documents the approved design for adding viam::sdk::Vision (full peer to mlmodel/motion/etc.) plus the module-generator template scaffold. Captures the public API surface, proto-conv strategy, client/server plumbing, test plan, build wiring, and verification criteria. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fix two concrete issues from spec review and incorporate advisory items: - Replace nonexistent `viam::sdk::geometries` with the actual `std::vector<GeometryConfig>` convention used by camera and peers. - Correct test CMake wiring: use `viamcppsdk_add_boost_test(...)` macro and add mocks to the shared `viamsdk_test` target_sources, matching every other test in the file. - Use nested `viam::sdk::impl::vision` namespace, matching mlmodel. - Note that Camera lacks shared raw_image proto-conv helpers today and this work introduces them. - Flag that the GetStatus shared/per-service split should be verified against mlmodel at planning time. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Pin shared raw_image header location to
src/viam/sdk/common/private/raw_image.{hpp,cpp} (lifted from camera).
- Add the camera_client/server migration as explicit modified files.
- Note the point_cloud_object field/type shadow as intentional and
require it to be flagged in the PR description.
- Update build sequence to reflect the raw_image lift as step 1.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Decomposes the approved spec into 16 bite-sized tasks across three phases: foundation (raw_image lift, Vision skeleton, client/server/mock stubs), per-RPC TDD (one task per RPC), and finalization (cl_gen verification, lint, Doxygen, PR). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fix three concrete issues found in plan review (which also affected the spec): - Detection bbox proto types were inverted: pixel fields are optional int64, normalized fields are optional double. class_name is plain string and confidence is plain double (not optional). Updated spec, plan struct, plan tests, plan to_proto/from_proto. - Registration is centralized in registry/registry.cpp via Registry::register_resources(), not as a per-file static block. Added registry.cpp to the modified-files list and rewrote Tasks 4 and 5 to add a single register_resource line. - Helper API is .with(extra), not .with_extra(extra). Corrected the example in Task 7 with a pointer to client_helper.hpp. Also incorporated reviewer recommendations: removed the GeometryConfig::operator== fallback (it exists), pointed at existing spatialmath to_proto_impl<GeometryConfig> + from_proto_impl helpers and the repeated_ptr_convert.hpp utility instead of a manual loop. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Out-pointer style for to_proto(point_cloud_object) and to_proto(properties) matches both raw_image::to_proto in the same spec and the implementation sketch in the plan. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extracts the field-by-field Camera::raw_image <-> proto Image conversion
inlined in camera_client.cpp into shared helpers in
common/private/raw_image.{hpp,cpp}, so the upcoming Vision service can
reuse the conversion for capture_all_from_camera. No behavior change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds viam::sdk::Vision with all 9 RPC virtuals and nested POD payload types (detection, classification, point_cloud_object, properties, capture_options, capture_all_result). Mirrors the structure of viam::sdk::Motion / MLModelService. Client and server impls follow in subsequent commits. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…perator== - Apply clang-format style per ./bin/run-clang-format.sh. - Reuse Camera::point_cloud::operator== instead of manually comparing mime_type and pc fields.
Adds free-function to_proto/from_proto helpers for Detection, Classification, properties, and point_cloud_object in viam::sdk::impl::vision. Tests cover engaged/unengaged optional bbox fields on Detection, classification basics, properties, and point_cloud_object bytes. Also registers the vision service proto (vision.pb.cc / vision.grpc.pb.cc) into the viamapi target_sources list so the generated symbols are compiled into libviamapi and available to viamsdk and tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Switches Vision::point_cloud_object proto-conv from a hand-rolled loop to the repo's idiomatic repeated_ptr_convert helpers, matching how world_state, navigation, and motion handle GeometryConfig vectors. No behavior change; tests still pass.
Adds the Vision gRPC client class with all 9 RPC overrides as not-implemented stubs. Registration follows in the next commit. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
All 9 RPCs return UNIMPLEMENTED. Registration in Registry::register_resources() now wires VisionClient/VisionServer together, alphabetically with peer services. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Matches the convention used by mock_motion / mock_navigation / mock_switch and other peer class-style mocks. Avoids ADL surprises against production viam::sdk::Vision symbols. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sets up an in-process gRPC server + client pair backed by MockVision, mirroring the scaffolding used by test_motion / test_mlmodel. No new tests in this commit; per-RPC tests follow.
VisionClient stores the gRPC channel as a raw pointer per the Registry contract; member destruction order must put client first. Also adds a BOOST_REQUIRE on client at construction to surface registration failures at fixture init rather than first per-RPC test. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wires the GetProperties RPC client and server to the existing proto-conv helper. Adds round-trip tests for the basic call and the extra-passing path. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Wires the GetStatus RPC client and server using shared ProtoStruct proto-conv helpers, mirroring the mlmodel pattern. Round-trip test verifies a ProtoStruct with mixed value types survives the wire. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Wires the DoCommand RPC client and server using ProtoStruct proto-conv helpers. Round-trip tests cover both flat and nested ProtoStruct payloads. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…o-end Both classification RPCs round-trip through the existing Classification proto-conv helpers. Tests verify the request fields (camera_name, image bytes, count) survive the wire and the response vector preserves order and content. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Both detection RPCs round-trip through the existing Detection proto-conv helpers. Tests verify pixel bbox, normalized bbox, and the no-bbox edge case all survive the wire.
The client propagates the request mime_type onto each returned point_cloud_object's point_cloud.mime_type field, since the proto type carries bytes + geometries but not mime_type per object. The server also echoes the request mime_type into the top-level GetObjectPointCloudsResponse.mime_type field.
The largest Vision RPC: conditionally returns image, detections, classifications, and object point clouds based on capture_options flags. Image conversion uses the shared raw_image helper introduced in the Camera lift task. Tests cover image-only, detections-only, and full-payload cases. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Confirms that throwing Exception from a Vision implementation surfaces as a non-OK gRPC status on the client side, and that the mock's throw_on_next_call flag resets so subsequent calls succeed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Drop misleading width/height comment in get_classifications (proto fields are simply left at their default), and document why vision_fixture diverges from client_to_mock_pipeline<>. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Respect server-returned mime_type in get_object_point_clouds: server now populates response.mime_type from the result's cloud.mime_type (not echoing the request), and client reads response.mime_type() rather than the request mime. Matches the proto contract and the Go RDK client behavior. Adds a regression test where request mime differs from server-returned mime. - Initialize Vision::detection::confidence, classification::confidence, and all three properties bools via NSDMIs so default-constructed values are well-defined. - Introduce Vision::image as a Vision-specific input type for get_detections / get_classifications. Camera::raw_image (alias) is kept only for capture_all_result.image where the wire type is the full Camera Image proto. Avoids silently dropping source_name on inputs whose wire format doesn't carry one — mirrors how Go's RDK takes image.Image (decoupled from the camera service). - Rename point_cloud_object::point_cloud field to ::cloud so it no longer shadows the Vision::point_cloud type alias. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A prior commit on this branch (2584b55) inadvertently unwrapped the deprecated dial_options() signature back to a single line because the local clang-format version (22) disagrees with the CI lint container's version (15) about this case. CI's cpp-linter job is authoritative; restoring the wrapped form makes it pass. The wrap should not be undone by future formatting on this file unless the project upgrades the lint container. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
clang-tidy's readability-braces-around-statements is treated as an error in CI. The Vision::detection to_proto / from_proto helpers had 16 single-line `if (cond) stmt;` patterns left over from the initial draft. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
d37e2ee to
63c2363
Compare
lia-viam
left a comment
There was a problem hiding this comment.
So this seems difficult for claude and for some human developers truthfully, but we've taken an approach to proto type conversion in the sdk that is very easy/ergonomic for module authors to use, at the cost of being kind of cumbersome to implement on the sdk-side.
Try pointing the agent to, eg,
viam-cpp-sdk/src/viam/sdk/common/pose.hpp
Line 53 in d54d7fb
The idea is if you put struct specializations in a header, along with appropriate forward declarations, then end-users can include the viam/api files and be able to just have to_proto and from_proto work automatically.
There are some cases in the codebase where we do manually implement to_proto and from_proto in a cpp file or in the private/ directory; in these cases it is judged that conversion to/from these types is not likely to be needed outside of implementation details. However, since, eg, raw_image is now used by both Camera and Vision it certainly passes the test for providing public conversion functions
There are also cases as I've commented in one of the files where we don't bother implementing to_ and from_proto because the function will only be used once, so we just do the conversion inline in the respective _client and _server files
| namespace impl { | ||
|
|
||
| /// @brief Convert a proto Image to Camera::raw_image. | ||
| Camera::raw_image from_proto(const ::viam::component::camera::v1::Image& proto); |
There was a problem hiding this comment.
I think these conversions should probably be public in camera.hpp, see summary comment
| namespace impl { | ||
| namespace vision { | ||
|
|
||
| ::viam::service::vision::v1::Detection to_proto(const Vision::detection&); |
There was a problem hiding this comment.
similarly detection, classification, and point_cloud_object should be public i think
| return *channel_; | ||
| } | ||
|
|
||
| std::vector<Vision::detection> get_detections_from_camera(const std::string& camera_name, |
There was a problem hiding this comment.
nit but can you put a newline in between the end of the last function definition and the subsequent ones
|
|
||
| /// @struct detection | ||
| /// @brief Result of a single object detection. | ||
| /// Pixel bbox fields are optional int64; normalized bbox fields are optional double. |
There was a problem hiding this comment.
This comment might make more sense split into the body of the struct
| /// Pixel bbox fields are optional int64; normalized bbox fields are optional double. | |
| struct detection { | |
| // pixel bbox fields | |
| ... | |
| // normalized bbox fields | |
| ... | |
| }; |
| /// @struct point_cloud_object | ||
| /// @brief A point cloud and its associated geometry. | ||
| /// @note The proto field is named `point_cloud`; we use `cloud` here to avoid shadowing | ||
| /// the `point_cloud` type alias (GCC rejects the shadowing under -fpermissive). | ||
| struct point_cloud_object { | ||
| point_cloud cloud; | ||
| std::vector<GeometryConfig> geometries; | ||
| }; |
There was a problem hiding this comment.
pt_cloud would work, and I'd say remove the @note comment
| @@ -0,0 +1,41 @@ | |||
| // Copyright 2024 Viam Inc. | |||
There was a problem hiding this comment.
a separate file for these conversions isn't really in line with other patterns we use in the sdk, but see comments below regarding the actual conversion functions
| BOOST_TEST_DONT_PRINT_LOG_VALUE(Vision::classification) | ||
| BOOST_TEST_DONT_PRINT_LOG_VALUE(Vision::properties) | ||
|
|
||
| BOOST_AUTO_TEST_SUITE(vision_proto_conv) |
There was a problem hiding this comment.
harmless but generally speaking we don't bother testing the to_proto/from_proto conversions
| namespace sdktests { | ||
| namespace vision { | ||
|
|
||
| // vision_fixture stands up an in-process gRPC server backed by MockVision and a |
There was a problem hiding this comment.
this is interesting but i wonder if it's necessary, can you (personally, or pointing the agent to it) compare this file to test_motion or test_mlmodel and see if we don't end up with something simpler without this separate fixture
I was testing out the C++ module generation for the vision service module I wanted to build and discovered that service template was missing due to the API not being provided by the SDK. So I worked with Claude to implement that service API in the SDK and reuse as much of the existing types (like those from MLModel service and Camera component) as reasonable.