fix: azure ai search write error #702

Closed
mpolomdeepsense wants to merge 5 commits into main from fix/azure-ai-search-write-error

@mpolomdeepsense (Contributor) commented May 4, 2026

Summary

  • Azure AI Search rejected uploads with a 400 error ("The property 'table_extraction_method' does not exist on type 'search.complex.metadata'") because the uploader's field filter only checked top-level keys, so sub-fields of complex types passed through unfiltered.
  • Replaced get_index_field_names() / flat-set filter_doc() with get_index_schema() returning a nested dict and a recursive filter_doc() that walks complex types and collections of complex objects.
  • Top-level drop-logging behavior preserved.
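
As a rough illustration of the recursive filtering described above, here is a minimal sketch. It is not the code from this PR: the schema representation (leaf fields map to None, complex fields map to a nested dict) and the exact signature of filter_doc are assumptions made for the example, and drop-logging is omitted.

```python
from typing import Any, Dict, Optional

# Assumed schema shape (hypothetical): a leaf field maps to None,
# a complex field (or collection of complex objects) maps to a nested dict.
Schema = Dict[str, Optional[dict]]


def filter_doc(doc: Dict[str, Any], schema: Schema) -> Dict[str, Any]:
    """Return a copy of doc that keeps only keys known to the schema."""
    filtered: Dict[str, Any] = {}
    for key, value in doc.items():
        if key not in schema:
            # Unknown field at this level: drop it.
            continue
        sub_schema = schema[key]
        if sub_schema is None:
            # Simple field or simple collection: pass through untouched.
            filtered[key] = value
        elif isinstance(value, dict):
            # Complex object: recurse into its sub-fields.
            filtered[key] = filter_doc(value, sub_schema)
        elif isinstance(value, list):
            # Collection of complex objects: filter each dict item.
            filtered[key] = [
                filter_doc(item, sub_schema) if isinstance(item, dict) else item
                for item in value
            ]
        else:
            # For example None for an unset complex field.
            filtered[key] = value
    return filtered
```

Because the function builds new dicts and lists at every level instead of deleting keys in place, the input document is left unmodified, which matches the non-mutation property the unit tests check.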

Test plan

  • Unit: test_filter_doc_drops_unknown_field_inside_complex_metadata and 11 other new cases cover nested filtering, deep nesting, collections of complex objects, simple-collection passthrough, input non-mutation, and the schema walker against real Azure SDK SimpleField/ComplexField/SearchIndex objects.
  • Integration (gated by AZURE_SEARCH_API_KEY):
    • test_azure_ai_search_destination_drops_unknown_nested_fields: stages real elements, injects metadata.table_extraction_method plus a stray top-level field, runs the uploader against a real index, and asserts that the upload succeeds and the document count matches.
    • test_azure_ai_search_destination_rejects_unknown_nested_fields_when_unfiltered: bypasses the filter via write_dict and asserts that Azure raises WriteError mentioning both table_extraction_method and search.complex.metadata. This pins the upstream Azure contract: if Azure stops rejecting the unfiltered document, the filter is no longer load-bearing.

Note

Medium Risk
Changes core Azure AI Search upload filtering to recurse into complex/nested fields and lists, which could drop data if schema inference is wrong, but is narrowly scoped to pre-upload shaping and covered by new unit/integration tests.

Overview
Fixes Azure AI Search uploads failing on strict complex-type validation by replacing the flat top-level field allowlist with a nested index schema (get_index_schema) and a recursive filter_doc that strips unknown sub-fields inside complex objects and collections.

Adds unit coverage for schema-walking and nested/list filtering behavior, plus integration regression/contract tests that (1) verify uploads succeed after injecting unsupported nested/top-level fields and (2) confirm Azure rejects the same doc when bypassing filtering. Bumps version to 1.5.1 and records the fix in CHANGELOG.md.
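
The nested index schema mentioned in the overview could be derived roughly as follows. This is a sketch under stated assumptions, not the PR's get_index_schema: FieldStub is a hypothetical stand-in for an index field definition exposing name, type, and fields attributes, and schema_from_fields is an illustrative helper name.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FieldStub:
    """Hypothetical stand-in for an index field definition."""
    name: str
    type: str
    fields: Optional[List["FieldStub"]] = None


def schema_from_fields(fields: List[FieldStub]) -> dict:
    """Walk field definitions into a nested dict.

    Leaf fields map to None; fields that declare sub-fields map to a
    nested dict built from those sub-fields.
    """
    schema: dict = {}
    for f in fields:
        sub_fields = getattr(f, "fields", None)
        # A field without sub-fields is treated as a leaf in this sketch.
        schema[f.name] = schema_from_fields(sub_fields) if sub_fields else None
    return schema


# Example index layout with a complex "metadata" field (illustrative only).
example_fields = [
    FieldStub("id", "Edm.String"),
    FieldStub(
        "metadata",
        "Edm.ComplexType",
        [
            FieldStub("filename", "Edm.String"),
            FieldStub(
                "links",
                "Collection(Edm.ComplexType)",
                [FieldStub("url", "Edm.String")],
            ),
        ],
    ),
]
example_schema = schema_from_fields(example_fields)
```

A nested dict keeps the lookup at each level a plain key check, so the same structure can drive a recursive document filter without re-inspecting field definitions on every upload.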

Reviewed by Cursor Bugbot for commit 49ce02c.

Drops unknown fields recursively so that the document fully matches the defined schema.
@mpolomdeepsense (Contributor, Author)

duplicates: #701
