Commit df7614a
[SparkConnector][No Review] Fix NoClassDefFoundError for MetadataVersionUtil (#48837)
* Fix NoClassDefFoundError for MetadataVersionUtil in Cosmos Spark connector
Inline version validation logic in ChangeFeedInitialOffsetWriter instead
of depending on Spark-internal MetadataVersionUtil, which has been
relocated in Databricks Runtime 17.3 LTS (Spark 4.0).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
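
A minimal sketch of what an inlined version check of this kind could look like. It parses a header of the form `v<N>` and rejects anything outside `1..maxSupportedVersion`. The object name `VersionValidationSketch` and the exact message wording are assumptions for illustration (only the `UnsupportedLogVersion` label appears in this commit message); the actual code in ChangeFeedInitialOffsetWriter may differ.

```scala
// Illustrative only: a standalone version check with no dependency on Spark internals.
object VersionValidationSketch {

  /** Parses a log header such as "v1" and returns the version number
    * if it lies in the range 1..maxSupportedVersion. */
  def validateVersion(text: String, maxSupportedVersion: Int): Int = {
    // Shared failure path for every malformed header.
    def malformed(): Nothing =
      throw new IllegalStateException(
        s"Log file was malformed: failed to read correct log version from $text.")

    // The header must start with the literal prefix 'v'.
    if (text.isEmpty || text.charAt(0) != 'v') malformed()

    // Everything after the prefix must be an integer ("v" alone and "vabc" fail here).
    val version =
      try text.substring(1).toInt
      catch { case _: NumberFormatException => malformed() }

    // v0 and negative versions are treated as malformed, not as unsupported.
    if (version <= 0) malformed()

    if (version > maxSupportedVersion) {
      throw new IllegalStateException(
        s"UnsupportedLogVersion: maximum supported log version is v$maxSupportedVersion, " +
          s"but encountered v$version.")
    }
    version
  }
}
```

Because the check relies only on the Scala standard library, it does not matter where a given Spark distribution places MetadataVersionUtil.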
* Add unit tests for inlined validateVersion logic
Add ChangeFeedInitialOffsetWriterSpec with tests covering:
- Valid version strings within supported range
- Version exceeding max supported (UnsupportedLogVersion)
- Malformed versions: non-numeric, empty, missing v prefix, v0, negative, bare v
Widen companion object visibility to private[spark] for testability.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Add change feed micro-batch streaming scenarios to Databricks live test notebooks
Add structured streaming scenarios using cosmos.oltp.changeFeed to both
basicScenario.scala and basicScenarioAadManagedIdentity.scala notebooks.
These scenarios exercise the ChangeFeedInitialOffsetWriter and
HDFSMetadataLog code paths that can break on certain Spark distributions
(e.g. Databricks Runtime 17.3+).
Each scenario:
- Creates a sink container
- Reads change feed from source via readStream with micro-batch
- Writes to sink container via writeStream
- Validates records were copied
- Cleans up both containers
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Fix change feed streaming checkpoint path in Databricks notebooks
Use file:/tmp/ instead of /tmp/ for checkpoint location to avoid DBFS
access issues on Unity Catalog-enabled Databricks clusters. Also:
- Remove unused Trigger import
- Stop query before reading sink to avoid race conditions
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
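
The difference between the two checkpoint locations comes down to the URI scheme. A path without a scheme is resolved against the cluster's default filesystem (DBFS in the situation this commit message describes), whereas `file:` explicitly selects the local filesystem. The sketch below only inspects the scheme with `java.net.URI`; the object name and the example directory are hypothetical.

```scala
import java.net.URI

// Illustrative only: shows which checkpoint locations carry an explicit scheme.
object CheckpointPathSketch {

  /** Returns the URI scheme of a checkpoint location, or None when the path has no scheme. */
  def schemeOf(location: String): Option[String] =
    Option(new URI(location).getScheme)

  def main(args: Array[String]): Unit = {
    // No scheme: the default filesystem decides where this path lives.
    println(schemeOf("/tmp/checkpoint-example"))
    // Explicit scheme: always the local filesystem.
    println(schemeOf("file:/tmp/checkpoint-example"))
  }
}
```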
* Simplify change feed streaming test to use memory sink
Replace cosmos.oltp sink with in-memory sink to eliminate the need for
a separate sink container. This avoids 404 errors from sink container
creation/resolution and removes checkpoint path concerns.
The test still exercises the full ChangeFeedInitialOffsetWriter and
HDFSMetadataLog code paths (readStream with cosmos.oltp.changeFeed),
which is the goal for validating the MetadataVersionUtil fix.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Remove change feed streaming scenarios from Databricks notebooks
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Re-add change feed streaming with shared logic in both notebooks
Both notebooks now use the same pattern: derive changeFeedCfg from the
existing cfg map (which already has the correct auth config) plus the
change feed-specific options. Write to an in-memory sink to avoid
container creation issues. This ensures both key-based and AAD/MSI
notebooks exercise identical streaming logic.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
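
The derivation described above can be sketched as a plain map merge. The `spark.cosmos.*` keys are documented connector options, but the specific values, the placeholder endpoint, and the object name `ChangeFeedCfgSketch` are assumptions; the notebooks may set different options.

```scala
// Illustrative only: layer change feed options on top of an existing config map.
object ChangeFeedCfgSketch {

  /** Returns a new map with the change feed options added; `cfg` itself is not modified.
    * On a key collision, the change feed value wins. */
  def deriveChangeFeedCfg(cfg: Map[String, String]): Map[String, String] =
    cfg ++ Map(
      "spark.cosmos.read.inferSchema.enabled" -> "false",
      "spark.cosmos.changeFeed.startFrom" -> "Beginning",
      "spark.cosmos.changeFeed.mode" -> "Incremental"
    )

  def main(args: Array[String]): Unit = {
    // Hypothetical base config; a real notebook would also carry its auth settings here.
    val cfg = Map(
      "spark.cosmos.accountEndpoint" -> "https://example-account.documents.azure.com:443/",
      "spark.cosmos.database" -> "exampleDb",
      "spark.cosmos.container" -> "exampleContainer"
    )
    deriveChangeFeedCfg(cfg).toSeq.sorted.foreach { case (k, v) => println(s"$k=$v") }
  }
}
```

Since the auth settings travel with `cfg`, the same derivation works unchanged for key-based and AAD/MSI configurations.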
* Remove change feed streaming from AAD/MSI notebook
The MSI notebook shares a cluster with basicScenario, and the Cosmos
client cache retains references from the first notebook's proactive
connection init. When basicScenario drops the source container during
cleanup, the MSI notebook's change feed streaming fails with 404 on
the cached (now-deleted) container. The change feed streaming test in
basicScenario already provides sufficient coverage for the
ChangeFeedInitialOffsetWriter code paths.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Add diagnostic logging to MSI change feed streaming test
Add detailed logging to capture:
- Endpoint, database, container, auth config used
- Source container record count before streaming
- Streaming query ID
- Full exception details on failure
This will help diagnose why the change feed streaming fails
on the MSI notebook but succeeds on the key-based one.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* Remove change feed streaming from MSI notebook
The MSI change feed test passes on a fresh cluster but fails when
basicScenario runs first on the same cluster without restart. The
basicScenario leaves cached Cosmos client state (proactive connection
init on the ephemeral endpoint) that causes the MSI streaming query
to resolve to the wrong endpoint, resulting in a 404. The change feed
test in basicScenario provides sufficient coverage for the
ChangeFeedInitialOffsetWriter/HDFSMetadataLog code paths.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent d86dd5d commit df7614a
File tree
3 files changed: +137 -2 lines changed
- sdk/cosmos/azure-cosmos-spark_3
- src
- main/scala/com/azure/cosmos/spark
- test/scala/com/azure/cosmos/spark
- test-databricks/notebooks
Lines changed: 34 additions & 2 deletions
(Diff body not captured. Hunks cover lines 3-9, 33-39, and 58-92: lines 6 and 36 replaced, lines 61-92 added.)
Lines changed: 69 additions & 0 deletions
(Diff body not captured. New file: lines 1-69 added.)
Lines changed: 34 additions & 0 deletions
(Diff body not captured. Lines 114-147 added after existing line 113.)