
[pull] master from DataDog:master#611

Merged: pull[bot] merged 1 commit into ConnectionMaster:master from DataDog:master on Jun 22, 2026
pull[bot] commented on Jun 22, 2026


See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)


* Add SAP HANA schema collection and diagnostics

Collect SAP HANA catalog metadata (schemas, tables, columns) for Database
Monitoring's Schema Explorer, mirroring the postgres implementation on the
shared SchemaCollector base class. Add startup diagnostics for connection,
version, and catalog-view access.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Add changelog entry

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Fix license header year on new files

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* sap_hana: handle missing DESCRIPTION column in SYS.M_DATABASE

HANA Express does not include the DESCRIPTION column in SYS.M_DATABASE.
Fetch DATABASE_NAME and DESCRIPTION in separate queries so that the
absence of DESCRIPTION (silently ignored) does not prevent the database
name from being resolved.
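
A minimal sketch of the two-query approach described above, assuming a DB-API style cursor. The function name `fetch_database_info` is hypothetical, and treating any driver exception as "column missing" is an assumption made for illustration; the actual collector may be more selective.

```python
def fetch_database_info(cursor):
    """Return (database_name, description); description is None when unavailable."""
    # Query 1: the database name, which must resolve even on HANA Express.
    cursor.execute("SELECT DATABASE_NAME FROM SYS.M_DATABASE")
    row = cursor.fetchone()
    name = row[0] if row else None

    # Query 2: the optional DESCRIPTION column, fetched separately so that
    # its absence cannot break the name lookup above.
    description = None
    try:
        cursor.execute("SELECT DESCRIPTION FROM SYS.M_DATABASE")
        row = cursor.fetchone()
        description = row[0] if row else None
    except Exception:
        # Assumption: a missing column surfaces as a driver exception.
        # It is silently ignored and the description stays None.
        pass
    return name, description
```

Keeping the queries separate means a failure on DESCRIPTION costs one extra round-trip but never the database name.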

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: align kind and dbms with dbm-metadata-processor expectations

Use 'saphana_databases' as the schema payload kind and 'saphana' as the
dbms identifier, matching KindSapHanaDatabases and the SapHana DBMS
constant defined in the dd-go dbm-metadata-processor PR.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: update tests to expect saphana kind and dbms values

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: fix schema collector privilege filter and column query efficiency

Remove HAS_PRIVILEGES filter from schema discovery so catalog-view grants
control visibility, consistent with the Postgres schema collector. Apply
max_tables trimming before fetching columns to avoid loading column data
for tables that will be discarded.
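
The ordering change can be sketched as follows. This is an illustration only: `collect_columns` and the `fetch_columns` callback are hypothetical names, not the collector's real methods.

```python
def collect_columns(tables, fetch_columns, max_tables):
    """Trim to max_tables first, then fetch columns only for the kept tables."""
    kept = tables[:max_tables]
    # Columns are never requested for tables beyond the limit.
    return {table: fetch_columns(table) for table in kept}
```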

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: push column filter to SQL WHERE clause instead of client-side

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: address PR review wording suggestions in README

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: replace "Database Monitoring's Schema Explorer" with "Data Quality features in Data Observability"

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: enforce schema collection limits in SQL and stream tables

Replace the three fetchall() catalog queries and in-memory filtering with a
single streamed JOIN. The limited_tables CTE pushes the schema filters and the
max_tables LIMIT into the database, so the agent never pulls more than
max_tables tables' rows into memory regardless of total schema size. Columns
are joined and ordered so each table is assembled one at a time as the cursor
streams, instead of materializing every table and column up front.

Verified against a live HANA Express instance that the CTE LIMIT caps tables
(not joined rows) and the LIKE ... ESCAPE system-schema filter parses correctly.
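
A sketch of the one-table-at-a-time assembly, assuming the cursor yields rows already ordered by schema, table, and column position. The row shape `(schema, table, column_name, data_type)` and the name `iter_tables` are assumptions for illustration, not the collector's actual interface.

```python
from itertools import groupby
from operator import itemgetter


def iter_tables(rows):
    """Yield one table dict at a time from rows ordered by (schema, table).

    Each row is (schema, table, column_name, data_type). A table without
    columns appears once with column_name set to None (the LEFT JOIN case).
    """
    for (schema, table), group in groupby(rows, key=itemgetter(0, 1)):
        columns = [
            {"name": name, "data_type": data_type}
            for _, _, name, data_type in group
            if name is not None
        ]
        # Only the current table's columns are held in memory at this point.
        yield {"schema": schema, "name": table, "columns": columns}
```

Because `groupby` consumes its input lazily, passing the cursor itself as `rows` avoids materializing the full result set; this only works if the query's ORDER BY keeps each table's rows adjacent.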

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* sap_hana: add HanaSchemaCollector unit tests for column mapping and _get_databases

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: add schema collection memory benchmark

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: fix benchmark setup and add 50x50 baseline results

- Use host port 39019 to avoid collision with test container on 39017
- Grant SELECT on SYS.M_DATABASE, SYS.TABLES, SYS.SCHEMAS, SYS.TABLE_COLUMNS
  (CATALOG READ alone is insufficient for these views)
- Fix global declaration order bug in setup_database.py
- Add benchmark_results_50x50.txt as baseline (trivial data, both modes identical)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: add 1000x1000 benchmark results (18.4x RSS reduction with limits)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: set hdbcli cursor fetch size to 10k to reduce C-layer buffering

Without a fetch size, hdbcli buffers the entire query result set in its C layer
before Python iterates it, contributing ~500 MiB to RSS on a 1000x1000 schema.
setfetchsize(10_000) limits the client-side buffer to 10k rows per round-trip.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Revert "sap_hana: set hdbcli cursor fetch size to 10k to reduce C-layer buffering"

This reverts commit 6fe80a4.

* sap_hana: push max_columns limit into SQL via ROW_NUMBER() CTE

Previously the query returned all columns for every table and Python discarded
those beyond max_columns. On a 1000-table schema with max_columns=50 this sent
285k unnecessary rows from the server (300 tables x 950 discarded columns).

A new limited_columns CTE ranks columns per table with ROW_NUMBER() OVER
(PARTITION BY schema, table ORDER BY position) and the LEFT JOIN filters on
rn <= max_columns, so the server only sends the first max_columns columns per
table. The client-side check in _get_next() stays as a safety net.

Benchmark result on 1000x1000 schema: limited mode duration 8.1s -> 1.6s (5x).
Peak RSS is unchanged because memory is bounded by the Python-side column dict
accumulation, not the cursor row count.
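
The query shape can be tried out locally with a small stand-in. The sketch below uses SQLite (which supports window functions from version 3.25) rather than HANA, and the table names `demo_tables` and `demo_columns` are hypothetical replacements for the SYS catalog views; the real query text may differ.

```python
import sqlite3

QUERY = """
WITH limited_tables AS (
    SELECT schema_name, table_name
    FROM demo_tables
    ORDER BY schema_name, table_name
    LIMIT :max_tables
),
limited_columns AS (
    SELECT schema_name, table_name, column_name, position,
           ROW_NUMBER() OVER (
               PARTITION BY schema_name, table_name ORDER BY position
           ) AS rn
    FROM demo_columns
)
SELECT t.schema_name, t.table_name, c.column_name
FROM limited_tables t
LEFT JOIN limited_columns c
  ON c.schema_name = t.schema_name
 AND c.table_name = t.table_name
 AND c.rn <= :max_columns
ORDER BY t.schema_name, t.table_name, c.position
"""


def run_demo(n_tables=5, n_columns=10, max_tables=3, max_columns=4):
    """Build a tiny in-memory catalog and return the limited result rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE demo_tables (schema_name TEXT, table_name TEXT)")
    conn.execute(
        "CREATE TABLE demo_columns "
        "(schema_name TEXT, table_name TEXT, column_name TEXT, position INTEGER)"
    )
    for t in range(n_tables):
        conn.execute("INSERT INTO demo_tables VALUES ('S', ?)", (f"T{t}",))
        conn.executemany(
            "INSERT INTO demo_columns VALUES ('S', ?, ?, ?)",
            [(f"T{t}", f"C{c}", c) for c in range(n_columns)],
        )
    params = {"max_tables": max_tables, "max_columns": max_columns}
    rows = conn.execute(QUERY, params).fetchall()
    conn.close()
    return rows
```

With the defaults above, the database returns rows for 3 tables with 4 columns each, so the LIMIT caps tables rather than joined rows and the `rn` filter caps columns per table.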

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: add license headers to benchmark scripts

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: flush after 50k columns instead of 10k tables

The base class payload_chunk_size counts tables, which is a poor memory
proxy for wide tables. HanaSchemaCollector now overrides maybe_flush to
trigger after PAYLOAD_COLUMN_CHUNK_SIZE (50,000) columns instead, keeping
_queued_rows bounded regardless of how wide the tables are.

On the 1000x1000 benchmark schema, unlimited peak RSS drops from 1,038 MiB
to 93.7 MiB (11x reduction). The limited mode (300 tables x 50 cols = 15k
columns) is unaffected since it never reaches the threshold.

The column count is tracked in _map_row rather than _get_next because the
base class loop calls _get_next after appending the current table; counting
there would cause the freshly-fetched table's columns to be lost on flush.
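
A rough sketch of the column-based flush. `BaseCollector` here is a hypothetical stand-in for the shared base class, not its real API; only the names `maybe_flush`, `_map_row`, `_queued_rows`, `payload_chunk_size`, and `PAYLOAD_COLUMN_CHUNK_SIZE` come from the commit message, and their signatures are assumed.

```python
class BaseCollector:
    """Hypothetical stand-in for the shared base class (not the real API)."""

    payload_chunk_size = 10_000  # flush threshold counted in tables

    def __init__(self):
        self._queued_rows = []
        self.flushed = []  # payloads emitted so far, kept for inspection

    def queue(self, table):
        self._queued_rows.append(self._map_row(table))

    def _map_row(self, table):
        return table

    def maybe_flush(self, is_last):
        if is_last or len(self._queued_rows) >= self.payload_chunk_size:
            self._flush()

    def _flush(self):
        if self._queued_rows:
            self.flushed.append(self._queued_rows)
            self._queued_rows = []


class ColumnBoundedCollector(BaseCollector):
    PAYLOAD_COLUMN_CHUNK_SIZE = 50_000  # flush threshold counted in columns

    def __init__(self):
        super().__init__()
        self._queued_columns = 0

    def _map_row(self, table):
        # Count columns at mapping time, when the table enters the queue,
        # so the counter always matches what is actually queued.
        self._queued_columns += len(table["columns"])
        return table

    def maybe_flush(self, is_last):
        if is_last or self._queued_columns >= self.PAYLOAD_COLUMN_CHUNK_SIZE:
            self._flush()
            self._queued_columns = 0
```

Counting columns instead of tables keeps the queued payload bounded for wide tables: a few very wide tables trigger a flush as readily as many narrow ones.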

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: raise default max_tables to 2000 and max_columns to 500

Previous defaults (300 tables, 50 columns) were conservative placeholders.
With the column-based flush threshold in place, peak memory is now bounded
by columns processed at once (50k) rather than total tables queued, so
higher defaults are safe without a proportional memory cost increase.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: hide collect_schemas config block from user-facing docs

The feature is not yet backed by production quality monitors. Mark the
entire collect_schemas section as hidden: true so it is omitted from
conf.yaml.example until the backend is ready.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: switch schema collector to SYS.M_TABLES, SYS.VIEWS, and SYS.VIEW_COLUMNS

Replace SYS.TABLES with SYS.M_TABLES to gain live RECORD_COUNT (row_count
in the payload). Add SYS.VIEWS so view objects are collected alongside
tables, with columns sourced from SYS.VIEW_COLUMNS (TABLE_COLUMNS does not
cover views in HANA). Conditionally LEFT JOIN SYS.M_TABLE_STATISTICS at
runtime for last_updated_on: the collector probes for access on first run
and omits the join when the monitoring user lacks the privilege.
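
The probe-then-join behavior might look roughly like this. Both function names are hypothetical, the `LAST_MODIFY_TIME` column is an assumption about where last_updated_on comes from, and catching every exception as "no access" is a simplification.

```python
def can_select(cursor, view):
    """Return True if a trivial SELECT against `view` succeeds.

    `view` must be a trusted constant; it is interpolated into the SQL text.
    """
    try:
        cursor.execute(f"SELECT 1 FROM {view} LIMIT 1")
        cursor.fetchall()
        return True
    except Exception:
        # Assumption: a missing privilege surfaces as a driver exception.
        return False


def build_tables_query(include_stats):
    """Build the tables query, adding the statistics join only when allowed."""
    columns = ["t.SCHEMA_NAME", "t.TABLE_NAME", "t.RECORD_COUNT"]
    joins = []
    if include_stats:
        columns.append("s.LAST_MODIFY_TIME")  # column name is an assumption
        joins.append(
            "LEFT JOIN SYS.M_TABLE_STATISTICS s "
            "ON s.SCHEMA_NAME = t.SCHEMA_NAME AND s.TABLE_NAME = t.TABLE_NAME"
        )
    return " ".join(["SELECT", ", ".join(columns), "FROM SYS.M_TABLES t", *joins])
```

Running the probe once and caching its result avoids paying for a failing query on every collection run.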

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: update spec.yaml descriptions to reference new catalog views

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: update benchmark results with SYS.M_TABLES + SYS.VIEWS query

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: update benchmark README with current query results

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: tidy README grant order and bump example limits to defaults

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: address schema collector review feedback

- Add max_views config option (SQL LIMIT on the views CTE) so view
  collection is bounded like tables/columns.
- Update _maybe_collect_schemas to advance the schedule only on success
  so transient failures are retried promptly instead of being suppressed
  for a full interval.
- Classify version-query failures: privilege/access errors reading
  SYS.M_DATABASE now report the privilege/access diagnostic instead of a
  misleading "version unsupported" result.
- Drop the hostname fallback in _get_databases; skip collection and warn
  when the current database can't be determined to avoid mislabeled data.
- Extract HanaSchemaQueryBuilder to separate SQL/query-policy concerns
  from the collector's streaming and flush logic.
- Document why payloads flush by accumulated column count, referencing
  the schema-collection memory benchmark.
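
As an illustration of the "advance the schedule only on success" item above, here is a small sketch. The class `SchemaSchedule` and its parameters are hypothetical; the real `_maybe_collect_schemas` lives on the check and may track time differently.

```python
import time


class SchemaSchedule:
    """Run `collect` at most once per interval, retrying promptly on failure."""

    def __init__(self, interval, collect, clock=time.monotonic):
        self._interval = interval
        self._collect = collect
        self._clock = clock
        self._next_run = 0.0  # due immediately on the first call

    def maybe_collect(self):
        now = self._clock()
        if now < self._next_run:
            return False  # not due yet
        try:
            self._collect()
        except Exception:
            # Leave _next_run untouched so the next call retries right away
            # instead of waiting out a full interval.
            return False
        self._next_run = now + self._interval
        return True
```

Had the schedule been advanced before calling `collect`, a transient failure would suppress collection for the whole interval.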

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* sap_hana: handle single-tenant HANA Express in monitoring queries

SYS views on single-tenant HANA Express lack the DATABASE_NAME column,
so inject a constant SYSTEMDB value and drop the GROUP BY for SYS-schema
queries, and use FILE_SIZE instead of TOTAL_SIZE for global disk usage.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* sap_hana: add maintainer docstring mapping schema collector flow

Document the end-to-end control flow, the query-builder/collector
responsibility split, and the collection-policy decisions (SQL vs
client-side caps, system-schema exclusion, optional stats join) in a
module docstring, with a pointer to the memory benchmark. Addresses the
reviewer's request to reduce cross-method context jumping.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Revert "sap_hana: handle single-tenant HANA Express in monitoring queries"

This reverts commit fd86284.

* sap_hana: address janine-c doc review feedback

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* sap_hana: fix subject-verb agreement in include_schemas description

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

pull[bot] locked and limited conversation to collaborators on Jun 22, 2026
pull[bot] added the ⤵️ pull label on Jun 22, 2026
pull[bot] merged commit 56de213 into ConnectionMaster:master on Jun 22, 2026
3 of 4 checks passed
pull[bot] had a problem deploying to typo-squatting-release on June 22, 2026 09:09 (Failure)