Add progressTrace, writeChannelCongestion, and internalDatabaseSizes#753
Add progressTrace, writeChannelCongestion, and internalDatabaseSizes#753valkrypton wants to merge 1 commit into
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughBatch API and query schema expanded: Changes
Sequence Diagram(s)(No sequence diagrams generated — changes are schema/model additions without multi-component control-flow.) Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/batches.rs (1)
236-299: Tests will fail: mock responses don't include required fields.The mock response bodies include removed fields (
enqueuedAt,indexUid,taskUids) but are missing the new required fields (progress,stats,duration). Since these are non-optional in theBatchstruct, deserialization will fail.Example fix for first test mock
let response_body = serde_json::json!({ "results": [ { "uid": 42, - "enqueuedAt": "2024-10-11T11:49:53.000Z", "startedAt": "2024-10-11T11:49:54.000Z", "finishedAt": "2024-10-11T11:49:55.000Z", - "indexUid": "movies", - "taskUids": [1, 2, 3], - "batchStrategy": "time_limit_reached" + "batchStrategy": "time_limit_reached", + "duration": "PT1S", + "progress": { + "steps": [], + "percentage": 100.0 + }, + "stats": { + "totalNbTasks": 3, + "status": {"succeeded": 3}, + "types": {"documentAdditionOrUpdate": 3}, + "indexedUids": {"movies": 3}, + "progressTrace": {}, + "writeChannelCongestion": null, + "internalDatabaseSizes": null + } } ], "limit": 20, "from": null, "next": null, "total": 1 })Note: If
writeChannelCongestionandinternalDatabaseSizescan be null, ensure the struct fields areOption<T>.
🤖 Fix all issues with AI agents
In `@src/batches.rs`:
- Around line 145-150: The BatchProgressStep struct's fields use snake_case but
JSON uses camelCase so deserialization will fail; update the struct
BatchProgressStep to use serde renaming by adding #[serde(rename_all =
"camelCase")] above the struct (or alternatively add #[serde(rename =
"currentStep")] on the current_step field) so that current_step maps to
currentStep and other fields deserialize correctly.
- Around line 214-215: The serde rename on the SettingsUpdate enum variant is
misspelled as "settingUpdate"; update the attribute on SettingsUpdate (the
#[serde(rename = ...)] for the SettingsUpdate variant) to use the correct
Meilisearch task name "settingsUpdate" so batches with settings update tasks
deserialize correctly.
- Around line 168-173: The BatchWriteChannelCongestion struct will fail to
deserialize camelCase JSON keys; add the serde container attribute to map
camelCase to snake_case fields by annotating the struct with #[serde(rename_all
= "camelCase")] so fields like blocking_attempts and blocking_ratio correctly
deserialize from blockingAttempts and blockingRatio (update the
BatchWriteChannelCongestion derive/attributes accordingly).
- Around line 65-76: The query struct's numeric uid types and serialization
behavior are wrong: change uids and batch_uids from Vec<i64> to Vec<u32> to
match Batch.uid, and add #[serde(skip_serializing_if = "Vec::is_empty")] to each
query Vec field (uids, batch_uids, index_uids, statuses, types) so empty vectors
are omitted from serialized query params; locate the fields named uids,
batch_uids, index_uids, statuses, and types in the struct (and reference
Batch.uid for the correct type) and apply the type and serde attribute changes.
- Around line 152-166: BatchStats currently declares status: Status and types:
Type which are wrong — both should be maps of counts per key; update the
BatchStats struct so status and types use HashMap<String, i32> (or similar map
type) instead of Status/Type to match the Meilisearch API, and make
write_channel_congestion and internal_database_sizes opaque to future changes by
replacing BatchWriteChannelCongestion and BatchInternalDatabaseSizes with
Option<serde_json::Value> (or serde_json::Value) to avoid strict typing; ensure
serde renames remain on the fields (e.g., progress_trace, indexed_uids) and
adjust any code referencing BatchStats::status or BatchStats::types to handle a
HashMap.
- Around line 15-20: Update the Batch struct so deserialization tolerates
missing/null fields: change the types of the fields progress, stats, and
duration in struct Batch to Option<BatchProgress>, Option<BatchStats>, and
Option<String> respectively, and add #[serde(default)] to each of those fields
(e.g., on progress, stats, duration) so missing values in test responses or
nulls from the API won't cause deserialization to fail.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/batches.rs
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
72813c3 to
d4b8f27
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/batches.rs (1)
228-266: Test will fail: mock response missing requiredstatsfield.The mock response at lines 233-250 lacks the
statsfield, butBatch.statsis not optional. This test will fail with a deserialization error likemissing field 'stats'.Either:
- Make
statsoptional in theBatchstruct (recommended per earlier comment), or- Update the mock response to include a valid
statsobjectOption 2: Update test mock to include stats
let response_body = serde_json::json!({ "results": [ { "uid": 42, - "enqueuedAt": "2024-10-11T11:49:53.000Z", "startedAt": "2024-10-11T11:49:54.000Z", "finishedAt": "2024-10-11T11:49:55.000Z", - "indexUid": "movies", - "taskUids": [1, 2, 3], - "batchStrategy": "time_limit_reached" + "batchStrategy": "time_limit_reached", + "stats": { + "totalNbTasks": 3, + "status": {"succeeded": 3}, + "types": {"documentAdditionOrUpdate": 3}, + "indexedUids": {"movies": 3}, + "progressTrace": {}, + "writeChannelCongestion": null, + "internalDatabaseSizes": null + } } ], ... })Note: The removed fields (
enqueuedAt,indexUid,taskUids) in the mock should also be cleaned up to match the new struct, though serde will simply ignore them.
🤖 Fix all issues with AI agents
In `@src/batches.rs`:
- Around line 196-221: The Status and Type structs currently have private inner
fields so callers cannot read counts; make each field public (e.g., change
enqueued/succeeded/etc. and index_creation/document_addition_or_update/etc. to
pub) so deserialized values are accessible, and keep the existing #[derive(...)]
and serde attributes; alternatively, replace the typed structs with
HashMap<String, u32> for the BatchStats fields (status and types) if you prefer
a dynamic, extensible shape instead of the fixed Type and Status structs.
♻️ Duplicate comments (2)
src/batches.rs (2)
268-292: Test will fail: mock response missing requiredstatsfield.Same issue as above - the mock response at lines 273-278 lacks
stats, causing deserialization failure.
14-20:statsfield should be optional to match API behavior and pass existing tests.The test responses at lines 233-250 and 273-278 don't include
stats,progress, ordurationfields. Whileprogressanddurationare correctlyOption<T>, the non-optionalstats: BatchStatswill cause deserialization failures.Per the Meilisearch API documentation, while
statsis typically present, making it optional with#[serde(default)]provides resilience against missing data.Proposed fix
/// Batch progress. pub progress: Option<BatchProgress>, /// Batch stats. - pub stats: BatchStats, + #[serde(default)] + pub stats: Option<BatchStats>, /// The total elapsed time the batch spent in the processing state, in ISO 8601 format. pub duration: Option<String>,Note: This requires
BatchStatsto implementDefault, or useOption<BatchStats>directly.
🧹 Nitpick comments (1)
src/batches.rs (1)
88-89: Consider skipping serialization whenreverseis false.The
reversefield will always be included in serialized query parameters, even whenfalse(the default). For cleaner API calls, skip serialization when the value is the default.Proposed fix
/// If true, returns results in the reverse order, from oldest to most recent + #[serde(skip_serializing_if = "std::ops::Not::not")] reverse: bool,Or use a helper function:
fn is_false(b: &bool) -> bool { !*b } // Then: #[serde(skip_serializing_if = "is_false")] reverse: bool,
d4b8f27 to
22d7e96
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@src/batches.rs`:
- Around line 65-107: The new filter fields (uids, batch_uids, index_uids,
statuses, types, limit, from, reverse, before_enqueued_at, before_started_at,
before_finished_at, after_enqueued_at, after_started_at, after_finished_at) are
private and only set in new(), so add public builder/setter methods to the
struct used with new() (e.g., methods named with_uids, with_batch_uids,
with_index_uids, with_statuses, with_types, with_limit, with_from, with_reverse,
with_before_enqueued_at, with_before_started_at, with_before_finished_at,
with_after_enqueued_at, with_after_started_at, with_after_finished_at) that
accept appropriate types (Vec<T> or Option<T>/bool), set the corresponding
field, and return Self (or &mut Self) to allow chaining; implement these on the
same impl that contains new() so callers can construct and modify the query
filters.
- Around line 76-81: The fields statuses and types in the batch filter struct
are currently typed as the count structs Status and Type from BatchStats
(BatchStats) which will serialize as objects; change these fields to use the
actual task-status/task-type filter enums or string-based enums (e.g.,
TaskStatus or TaskType) so query parameters serialize as strings. Locate the
struct with the statuses: Vec<Status> and types: Vec<Type> declarations and
replace their types with the correct filter enums (or add new filter enums if
none exist), and ensure serde derives/attributes produce string serialization
(e.g., implement/derive Serialize for those enums or use #[serde(rename_all =
"...")] so the query emits simple string values rather than objects.
♻️ Duplicate comments (2)
src/batches.rs (2)
14-20: Makestatsoptional (missing field breaks deserialization).
Batch.statsis non-optional, so responses that omitstats(older servers or existing test fixtures) will fail to deserialize (Line 18). Consider making it optional and defaulted.🐛 Proposed fix
- pub stats: BatchStats, + #[serde(default)] + pub stats: Option<BatchStats>,
164-194: KeepprogressTrace,writeChannelCongestion, andinternalDatabaseSizesopaque.Per the PR objectives, these fields should be treated as opaque objects. Strongly typed structs/
HashMap<String, String>will break if the server shape changes or contains nested data. Preferserde_json::Value(optionally wrapped inOption).💡 Suggested direction
pub struct BatchStats { pub total_nb_tasks: i32, pub status: Status, pub types: Type, pub indexed_uids: HashMap<String, i32>, - pub progress_trace: HashMap<String, String>, - pub write_channel_congestion: BatchWriteChannelCongestion, - pub internal_database_sizes: BatchInternalDatabaseSizes, + pub progress_trace: serde_json::Value, + pub write_channel_congestion: Option<serde_json::Value>, + pub internal_database_sizes: Option<serde_json::Value>, }Then remove
BatchWriteChannelCongestion/BatchInternalDatabaseSizesif unused.
22d7e96 to
1c4a038
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/batches.rs (1)
131-146: Missing builder methods for new query parameters.The new filter fields (
uids,batch_uids,index_uids,statuses,types,reverse, and thebefore_*/after_*timestamps) are private with no public setters, making them unusable by API consumers. Onlywith_limitandwith_fromexist.Example builder methods to add
#[must_use] pub fn with_uids(&mut self, uids: Vec<i64>) -> &mut Self { self.uids = uids; self } #[must_use] pub fn with_batch_uids(&mut self, batch_uids: Vec<i64>) -> &mut Self { self.batch_uids = batch_uids; self } #[must_use] pub fn with_index_uids(&mut self, index_uids: Vec<String>) -> &mut Self { self.index_uids = index_uids; self } #[must_use] pub fn with_reverse(&mut self, reverse: bool) -> &mut Self { self.reverse = reverse; self } #[must_use] pub fn with_before_enqueued_at(&mut self, before: OffsetDateTime) -> &mut Self { self.before_enqueued_at = Some(before); self } // ... similar methods for other timestamp fields
🤖 Fix all issues with AI agents
In `@src/batches.rs`:
- Around line 75-80: The fields `statuses` and `types` are currently typed as
the response-count structs `Statuses` and `Types` but use
#[serde(skip_serializing_if = "Vec::is_empty")] which is invalid; change those
fields to query-friendly types (e.g., Vec<String> or a Vec<YourFilterEnum> that
serializes to comma-separated names) and update the associated `new()`
initializer to create empty Vecs, and replace the serde attribute with
#[serde(skip_serializing_if = "Vec::is_empty")] on those Vec fields; also keep
the original `Statuses`/`Types` structs for response deserialization and do not
reuse them for query filtering.
♻️ Duplicate comments (3)
src/batches.rs (3)
232-249: Test mock response missing requiredstatsfield.This mock response lacks the
statsfield which is currently required in theBatchstruct. The test will fail at deserialization. Either makestatsoptional (recommended, see earlier comment) or update the mock to include a validstatsobject.If keeping stats required, add mock data
{ "uid": 42, "enqueuedAt": "2024-10-11T11:49:53.000Z", "startedAt": "2024-10-11T11:49:54.000Z", "finishedAt": "2024-10-11T11:49:55.000Z", "indexUid": "movies", "taskUids": [1, 2, 3], - "batchStrategy": "time_limit_reached" + "batchStrategy": "time_limit_reached", + "stats": { + "totalNbTasks": 3, + "status": {}, + "types": {}, + "indexedUids": {}, + "progressTrace": {}, + "writeChannelCongestion": null, + "internalDatabaseSizes": null + } }
14-19:statsfield should beOption<BatchStats>for API compatibility.The tests (lines 232–249, 272–277) mock responses without a
statsfield, which will cause deserialization failures sincestatsis required. Additionally, the Meilisearch API may omit or null this field for certain batch states.Proposed fix
/// Batch progress. + #[serde(default)] pub progress: Option<BatchProgress>, /// Batch stats. - pub stats: BatchStats, + #[serde(default)] + pub stats: Option<BatchStats>, /// The total elapsed time the batch spent in the processing state, in ISO 8601 format. pub duration: Option<String>,
163-173:write_channel_congestionandinternal_database_sizesshould beOption<T>.Per the PR objectives, these fields should be treated as opaque and their internal structures may change. The Meilisearch API may return
nullfor these fields in certain scenarios. Consider making them optional, and potentially usingserde_json::Valuefor true forward-compatibility.Proposed fix
#[derive(Debug, Clone, Deserialize)] #[serde(rename_all = "camelCase")] pub struct BatchStats { pub total_nb_tasks: i32, pub status: Statuses, pub types: Types, pub indexed_uids: HashMap<String, i32>, pub progress_trace: HashMap<String, String>, - pub write_channel_congestion: BatchWriteChannelCongestion, - pub internal_database_sizes: BatchInternalDatabaseSizes, + pub write_channel_congestion: Option<serde_json::Value>, + pub internal_database_sizes: Option<serde_json::Value>, }Alternatively, keep the typed structs but wrap in
Option:- pub write_channel_congestion: BatchWriteChannelCongestion, - pub internal_database_sizes: BatchInternalDatabaseSizes, + pub write_channel_congestion: Option<BatchWriteChannelCongestion>, + pub internal_database_sizes: Option<BatchInternalDatabaseSizes>,
🧹 Nitpick comments (1)
src/batches.rs (1)
175-193: Consider future-proofing opaque struct fields.These structs are well-defined, but per PR objectives, their internal structures "may change in breaking ways." If the typed approach is kept, consider adding
#[serde(default)]and#[non_exhaustive]to allow graceful handling of new or missing fields in future API versions.Suggested improvement
#[derive(Debug, Clone, Deserialize)] #[serde(rename_all = "camelCase")] +#[non_exhaustive] pub struct BatchWriteChannelCongestion { pub attempts: i32, pub blocking_attempts: i32, pub blocking_ratio: f64, } #[derive(Debug, Clone, Deserialize)] #[serde(rename_all = "camelCase")] +#[non_exhaustive] pub struct BatchInternalDatabaseSizes { pub external_documents_id: String, // ... }
|
@curquiza Can you take a look at this? |
1c4a038 to
ebe00a3
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/batches.rs (1)
288-346:⚠️ Potential issue | 🔴 CriticalTests will fail: mock responses are missing the required
statsfield.Both
test_get_batches_parses_batch_strategyandtest_get_batch_by_uid_parses_batch_strategyomitstatsfrom the JSON response, butBatch.stats: BatchStatsis non-optional. Deserialization will error with "missing fieldstats".If
statsis madeOption<BatchStats>(as suggested above), these tests will pass as-is. Otherwise, add a validstatsobject to each mock response:"stats": { "totalNbTasks": 3, "status": {"succeeded": 3}, "types": {"documentAdditionOrUpdate": 3}, "indexedUids": {}, "progressTrace": {}, "writeChannelCongestion": null, "internalDatabaseSizes": null }
🤖 Fix all issues with AI agents
In `@src/batches.rs`:
- Around line 89-106: The six OffsetDateTime fields on BatchesQuery
(before_enqueued_at, before_started_at, before_finished_at, after_enqueued_at,
after_started_at, after_finished_at) are missing the RFC3339 serde annotations;
update each of those fields to use the same serde with-annotation used by
TasksQuery for its date options so they serialize to RFC3339 when sent to
Meilisearch (apply the TasksQuery date field annotation to each of the six
BatchesQuery fields).
🧹 Nitpick comments (2)
src/batches.rs (2)
87-88:reversewill always be serialized, even whenfalse.Unlike the other optional fields,
reverse: boollacks a skip guard, so every query will includereverse=false. Consider usingOption<bool>or adding a skip helper to only send it when explicitly set.Proposed fix
/// If true, returns results in the reverse order, from oldest to most recent - reverse: bool, + #[serde(skip_serializing_if = "Option::is_none")] + reverse: Option<bool>,And update
new()andwith_reverse():- reverse: false, + reverse: None,- pub fn with_reverse(&mut self, reverse: bool) -> &mut Self { - self.reverse = reverse; + pub fn with_reverse(&mut self, reverse: bool) -> &mut Self { + self.reverse = Some(reverse);
247-276: Consider adding#[serde(other)]variants for forward-compatibility.
BatchStrategyalready hasUnknownwith#[serde(other)]for forward-compat (line 42). If Meilisearch adds a new status or task type in a future release, deserialization ofStatusandTypewill fail hard. Adding anUnknowncatch-all variant to both enums would be consistent with the existing pattern.Proposed fix (example for Status)
pub enum Status { Enqueued, Processing, Succeeded, Failed, Canceled, + #[serde(other)] + Unknown, }Apply similarly to
Type.Note:
#[serde(other)]only works with unit variants, which is the case here. However, since these enums deriveHashand are used asHashMapkeys, verify that unknown values are handled appropriately in consumer code.
| /// Select batches containing tasks with the specified enqueuedAt field | ||
| #[serde(skip_serializing_if = "Option::is_none")] | ||
| before_enqueued_at: Option<OffsetDateTime>, | ||
| /// Select batches containing tasks with the specified startedAt field | ||
| #[serde(skip_serializing_if = "Option::is_none")] | ||
| before_started_at: Option<OffsetDateTime>, | ||
| /// Select batches containing tasks with the specified finishedAt field | ||
| #[serde(skip_serializing_if = "Option::is_none")] | ||
| before_finished_at: Option<OffsetDateTime>, | ||
| /// Select batches containing tasks with the specified enqueuedAt field | ||
| #[serde(skip_serializing_if = "Option::is_none")] | ||
| after_enqueued_at: Option<OffsetDateTime>, | ||
| /// Select batches containing tasks with the specified startedAt field | ||
| #[serde(skip_serializing_if = "Option::is_none")] | ||
| after_started_at: Option<OffsetDateTime>, | ||
| /// Select batches containing tasks with the specified finishedAt field | ||
| #[serde(skip_serializing_if = "Option::is_none")] | ||
| after_finished_at: Option<OffsetDateTime>, |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# First, let's examine the src/batches.rs file to see the current state
wc -l src/batches.rsRepository: meilisearch/meilisearch-rust
Length of output: 90
🏁 Script executed:
#!/bin/bash
# Read the relevant sections of the file - around lines 21-24 (Batch struct)
# and lines 89-106 (BatchesQuery struct)
cat -n src/batches.rs | sed -n '1,120p'Repository: meilisearch/meilisearch-rust
Length of output: 5733
🏁 Script executed:
#!/bin/bash
# Search for how OffsetDateTime and serde rfc3339 are used throughout the codebase
rg -n "serde.*rfc3339" src/ -t rust
rg -n "serialize_with.*rfc3339" src/ -t rustRepository: meilisearch/meilisearch-rust
Length of output: 2320
🏁 Script executed:
#!/bin/bash
# Check how query serialization happens - look for serde attribute usage on query structs
rg -B2 -A2 "struct.*Query" src/ -t rust | head -100Repository: meilisearch/meilisearch-rust
Length of output: 3675
🏁 Script executed:
#!/bin/bash
# Look at TasksQuery to see how it handles OffsetDateTime serialization
cat -n src/tasks.rs | sed -n '540,580p'Repository: meilisearch/meilisearch-rust
Length of output: 2011
🏁 Script executed:
#!/bin/bash
# Let's also check if there are any tests or documentation about query format
rg -n "rfc3339" src/tasks.rs -B2 -A2 | head -60Repository: meilisearch/meilisearch-rust
Length of output: 2196
Add RFC 3339 serialization format to BatchesQuery date filter fields.
The BatchesQuery struct lacks serialization format annotations for its six date filter fields (before_enqueued_at, before_started_at, before_finished_at, after_enqueued_at, after_started_at, after_finished_at). Without these annotations, OffsetDateTime values will not serialize to RFC 3339 format required by the Meilisearch API. The TasksQuery struct in the same codebase already implements the correct pattern for identical date fields.
Apply the following annotation to each of the six date fields in BatchesQuery (lines 89-106):
Proposed fix
#[serde(
skip_serializing_if = "Option::is_none",
+ serialize_with = "time::serde::rfc3339::option::serialize"
)]
before_enqueued_at: Option<OffsetDateTime>,Repeat for: before_started_at, before_finished_at, after_enqueued_at, after_started_at, after_finished_at.
🤖 Prompt for AI Agents
In `@src/batches.rs` around lines 89 - 106, The six OffsetDateTime fields on
BatchesQuery (before_enqueued_at, before_started_at, before_finished_at,
after_enqueued_at, after_started_at, after_finished_at) are missing the RFC3339
serde annotations; update each of those fields to use the same serde
with-annotation used by TasksQuery for its date options so they serialize to
RFC3339 when sent to Meilisearch (apply the TasksQuery date field annotation to
each of the six BatchesQuery fields).
|
@Strift can you review this? |
|
Hello @valkrypton, and thank you for this contribution 🙌 Can you please add tests? @curquiza unsure why the tests are not running here :/ |
|
@Strift Hi, i am not sure what kind of tests are you referring to? Could you explain more? |
|
I meant we should add tests that assert the |
ebe00a3 to
26d3dfa
Compare
Tests added |
Related issue
Fixes #663
What does this PR do?
Updates the batch stats to be compliant with the new batches API.
PR checklist
Please check if your PR fulfills the following requirements:
Thank you so much for contributing to Meilisearch!
Summary by CodeRabbit
New Features
Documentation