Commit a2a457a
xinlian12, Annie Liang, and Copilot authored
Add vectorEmbeddingPolicy support to Spark catalog container creation (#48349)
* Add vectorEmbeddingPolicy support to Spark catalog container creation

  Enable configuring Vector Embedding Policy via TBLPROPERTIES when creating containers through the Spark catalog API. This allows users to set up vector-search-enabled containers directly from Spark SQL.

  Changes:
  - Add vectorEmbeddingPolicy TBLPROPERTIES key to CosmosContainerProperties
  - Add JSON serialization bridge methods in SparkModelBridgeInternal
  - Apply vector embedding policy in both CosmosCatalogCosmosSDKClient and CosmosCatalogManagementSDKClient container creation paths
  - Expose vector embedding policy in container metadata readback
  - Add VectorEmbeddingPolicy constant to CosmosConstants.TableProperties
  - Add integration test for container creation with vector embedding policy
  - Update existing tblProperties size assertions to account for new property

  Usage example:

      CREATE TABLE catalog.db.container (...) USING cosmos.oltp
      TBLPROPERTIES(
        partitionKeyPath = '/mypk',
        indexingPolicy = '{..."vectorIndexes":[{"path":"/v1","type":"flat"}]}',
        vectorEmbeddingPolicy = '{"vectorEmbeddings":[{"path":"/v1","dataType":"float32","distanceFunction":"cosine","dimensions":1536}]}'
      )

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* change

* fix

* Update Spark connector CHANGELOGs with vectorEmbeddingPolicy feature

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address PR review comments for vector embedding policy support
  - Fix Management SDK getIndexingPolicy() to map vectorIndexes (blocking issue)
  - Add try-catch error handling for malformed VEP JSON deserialization
  - Guard against empty VectorEmbeddingPolicy serializing as {} instead of null
  - Add logInfo when vector embedding policy is applied during container creation
  - Add VectorEmbeddingPolicy null assertions to all existing catalog tests
  - Replace substring test assertions with structured JSON parsing

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* revert log change

---------
Co-authored-by: Annie Liang <anniemac@Annies-MacBook-Pro.local>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent e8e3e0c commit a2a457a
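
The usage example in the commit message can be sketched as a small Scala helper that assembles the `CREATE TABLE` statement. This is a minimal sketch and not part of the commit: `VectorTableSqlSketch`, `createTableSql`, the table name, and the column list are hypothetical, and the complete indexing policy JSON is an assumption modelled on the integration test further down. The statement is only printed here; with a live `SparkSession` and a configured Cosmos catalog it would presumably be passed to `spark.sql(...)`.

```scala
object VectorTableSqlSketch {
  // Hypothetical helper (not part of the connector): builds a CREATE TABLE
  // statement using the TBLPROPERTIES keys named in the commit message.
  def createTableSql(
      table: String,
      partitionKeyPath: String,
      indexingPolicyJson: String,
      vectorEmbeddingPolicyJson: String): String =
    s"CREATE TABLE $table (word STRING, number INT) USING cosmos.oltp " +
      s"TBLPROPERTIES(partitionKeyPath = '$partitionKeyPath', " +
      s"indexingPolicy = '$indexingPolicyJson', " +
      s"vectorEmbeddingPolicy = '$vectorEmbeddingPolicyJson')"

  def main(args: Array[String]): Unit = {
    // The embedding policy declares the vector path, data type, distance function and dimensions.
    val vectorEmbeddingPolicyJson =
      """{"vectorEmbeddings":[{"path":"/v1","dataType":"float32","distanceFunction":"cosine","dimensions":1536}]}"""

    // Assumed indexing policy: the vector index path matches the embedding path above.
    val indexingPolicyJson =
      """{"indexingMode":"consistent","automatic":true,"includedPaths":[{"path":"/mypk/?"}],""" +
        """"excludedPaths":[{"path":"/*"}],"vectorIndexes":[{"path":"/v1","type":"flat"}]}"""

    println(createTableSql("catalog.db.container", "/mypk", indexingPolicyJson, vectorEmbeddingPolicyJson))
  }
}
```

Both policies refer to the same path (`/v1` here), mirroring the commit's usage example and integration test.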

11 files changed

Lines changed: 164 additions & 8 deletions

sdk/cosmos/azure-cosmos-spark_3-3_2-12/CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@
 ### 4.45.0-beta.1 (Unreleased)

 #### Features Added
+* Added `vectorEmbeddingPolicy` support in Spark catalog `TBLPROPERTIES` for creating vector-search-enabled containers. - See [PR 48349](https://github.com/Azure/azure-sdk-for-java/pull/48349)

 #### Breaking Changes

sdk/cosmos/azure-cosmos-spark_3-4_2-12/CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@
 ### 4.45.0-beta.1 (Unreleased)

 #### Features Added
+* Added `vectorEmbeddingPolicy` support in Spark catalog `TBLPROPERTIES` for creating vector-search-enabled containers. - See [PR 48349](https://github.com/Azure/azure-sdk-for-java/pull/48349)

 #### Breaking Changes

sdk/cosmos/azure-cosmos-spark_3-5_2-12/CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@
 ### 4.45.0-beta.1 (Unreleased)

 #### Features Added
+* Added `vectorEmbeddingPolicy` support in Spark catalog `TBLPROPERTIES` for creating vector-search-enabled containers. - See [PR 48349](https://github.com/Azure/azure-sdk-for-java/pull/48349)

 #### Breaking Changes

sdk/cosmos/azure-cosmos-spark_3-5_2-13/CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@
 ### 4.45.0-beta.1 (Unreleased)

 #### Features Added
+* Added `vectorEmbeddingPolicy` support in Spark catalog `TBLPROPERTIES` for creating vector-search-enabled containers. - See [PR 48349](https://github.com/Azure/azure-sdk-for-java/pull/48349)

 #### Breaking Changes

sdk/cosmos/azure-cosmos-spark_3/src/main/scala/com/azure/cosmos/models/SparkModelBridgeInternal.scala

Lines changed: 18 additions & 0 deletions
@@ -3,12 +3,30 @@

 package com.azure.cosmos.models

+import com.azure.cosmos.implementation.Utils
+
 private[cosmos] object SparkModelBridgeInternal {
+  private val objectMapper = Utils.getSimpleObjectMapper
+
   def createIndexingPolicyFromJson(json: String): IndexingPolicy = {
     new IndexingPolicy(json)
   }

   def createPartitionKeyDefinitionFromJson(json: String): PartitionKeyDefinition = {
     new PartitionKeyDefinition(json)
   }
+
+  def createVectorEmbeddingPolicyFromJson(json: String): CosmosVectorEmbeddingPolicy = {
+    try {
+      objectMapper.readValue(json, classOf[CosmosVectorEmbeddingPolicy])
+    } catch {
+      case e: Exception =>
+        throw new IllegalArgumentException(
+          s"Failed to parse vectorEmbeddingPolicy JSON. Ensure the JSON is well-formed: ${e.getMessage}", e)
+    }
+  }
+
+  def vectorEmbeddingPolicyToJson(policy: CosmosVectorEmbeddingPolicy): String = {
+    objectMapper.writeValueAsString(policy)
+  }
 }
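
The error handling in `createVectorEmbeddingPolicyFromJson` wraps any deserialization failure in an `IllegalArgumentException` that keeps the original cause. The pattern can be sketched in isolation as follows. This is an illustration under stated assumptions, not the connector's code: `ParseErrorWrappingSketch`, `parseOrExplain`, and `toyParse` are hypothetical names, and `toyParse` is a deliberately trivial stand-in for the Jackson `ObjectMapper` call so the example runs with the standard library only.

```scala
object ParseErrorWrappingSketch {
  // Runs `parse` and converts any failure into an IllegalArgumentException
  // that names the offending property and keeps the original cause.
  def parseOrExplain[T](propertyName: String, json: String)(parse: String => T): T =
    try {
      parse(json)
    } catch {
      case e: Exception =>
        throw new IllegalArgumentException(
          s"Failed to parse $propertyName JSON. Ensure the JSON is well-formed: ${e.getMessage}", e)
    }

  // Toy stand-in for a real JSON parser (assumption for illustration only):
  // accepts text wrapped in braces and rejects everything else.
  def toyParse(json: String): String = {
    val trimmed = json.trim
    if (trimmed.startsWith("{") && trimmed.endsWith("}")) trimmed
    else throw new RuntimeException("expected a JSON object")
  }

  def main(args: Array[String]): Unit = {
    // Well-formed input passes through unchanged.
    println(parseOrExplain("vectorEmbeddingPolicy", """{"vectorEmbeddings":[]}""")(toyParse))

    // Malformed input surfaces as an IllegalArgumentException with a readable message.
    val message =
      try {
        parseOrExplain("vectorEmbeddingPolicy", "not json")(toyParse)
        "no error"
      } catch {
        case e: IllegalArgumentException => e.getMessage
      }
    println(message)
  }
}
```

Keeping the cause attached means the underlying parser error remains available in stack traces, while the top-level message points the user at the `TBLPROPERTIES` value.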

sdk/cosmos/azure-cosmos-spark_3/src/main/scala/com/azure/cosmos/spark/CosmosConstants.scala

Lines changed: 1 addition & 0 deletions
@@ -74,6 +74,7 @@ private[cosmos] object CosmosConstants {
     val IndexingPolicy = "IndexingPolicy"
     val DefaultTtlInSeconds = "DefaultTtlInSeconds"
     val AnalyticalStoreTtlInSeconds = "AnalyticalStoreTtlInSeconds"
+    val VectorEmbeddingPolicy = "VectorEmbeddingPolicy"
   }

   object ChangeFeedMetricsConfigs {

sdk/cosmos/azure-cosmos-spark_3/src/main/scala/com/azure/cosmos/spark/catalog/CosmosCatalogCosmosSDKClient.scala

Lines changed: 18 additions & 1 deletion
@@ -4,7 +4,7 @@
 package com.azure.cosmos.spark.catalog

 import com.azure.cosmos.CosmosAsyncClient
-import com.azure.cosmos.models.{CosmosContainerProperties => ModelsCosmosContainerProperties, ExcludedPath, FeedRange, IncludedPath, IndexingMode, IndexingPolicy, ModelBridgeInternal, PartitionKeyDefinition, PartitionKeyDefinitionVersion, PartitionKind, SparkModelBridgeInternal, ThroughputProperties}
+import com.azure.cosmos.models.{CosmosContainerProperties => ModelsCosmosContainerProperties, CosmosVectorEmbeddingPolicy, ExcludedPath, FeedRange, IncludedPath, IndexingMode, IndexingPolicy, ModelBridgeInternal, PartitionKeyDefinition, PartitionKeyDefinitionVersion, PartitionKind, SparkModelBridgeInternal, ThroughputProperties}
 import com.azure.cosmos.spark.catalog.{CosmosContainerProperties => CatalogCosmosContainerProperties}
 import com.azure.cosmos.spark.diagnostics.BasicLoggingTrait
 import com.azure.cosmos.spark.{ContainerFeedRangesCache, CosmosConstants, Exceptions}
@@ -96,6 +96,13 @@ private[spark] case class CosmosCatalogCosmosSDKClient(cosmosAsyncClient: Cosmos
       case None =>
     }

+    CatalogCosmosContainerProperties.getVectorEmbeddingPolicy(containerProperties) match {
+      case Some(vectorEmbeddingPolicyJson) =>
+        cosmosContainerProperties.setVectorEmbeddingPolicy(
+          SparkModelBridgeInternal.createVectorEmbeddingPolicyFromJson(vectorEmbeddingPolicyJson))
+      case None =>
+    }
+
     throughputPropertiesOpt match {
       case Some(throughputProperties) =>
         cosmosAsyncClient
@@ -299,6 +306,12 @@ private[spark] case class CosmosCatalogCosmosSDKClient(cosmosAsyncClient: Cosmos
       case None => "null"
     }

+    val vectorEmbeddingPolicySnapshot = Option.apply(containerProperties.getVectorEmbeddingPolicy) match {
+      case Some(policy) if policy.getVectorEmbeddings != null && !policy.getVectorEmbeddings.isEmpty =>
+        SparkModelBridgeInternal.vectorEmbeddingPolicyToJson(policy)
+      case _ => "null"
+    }
+
     val lastModifiedSnapshot = ZonedDateTime
       .ofInstant(containerProperties.getTimestamp, ZoneOffset.UTC)
       .format(DateTimeFormatter.ISO_INSTANT)
@@ -363,6 +376,10 @@ private[spark] case class CosmosCatalogCosmosSDKClient(cosmosAsyncClient: Cosmos
       CosmosConstants.TableProperties.IndexingPolicy,
       s"'$indexingPolicySnapshotJson'"
     )
+    tableProperties.put(
+      CosmosConstants.TableProperties.VectorEmbeddingPolicy,
+      s"'$vectorEmbeddingPolicySnapshot'"
+    )

     tableProperties
   }
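
The readback guard above reports the literal string `null` unless the policy exists and contains at least one embedding, which is what the commit message describes as avoiding an empty policy serializing as `{}`. A minimal stand-alone sketch of that decision follows. `PolicySnapshotSketch`, `Embedding`, `Policy`, and `toJson` are hypothetical stand-ins (assumptions) for the SDK model types and the Jackson-based serializer, chosen so the example needs no dependencies.

```scala
object PolicySnapshotSketch {
  // Hypothetical stand-ins for the SDK model types (not the real classes).
  final case class Embedding(path: String)
  final case class Policy(embeddings: List[Embedding])

  // Toy serializer (assumption): the real code delegates to a Jackson ObjectMapper.
  def toJson(policy: Policy): String =
    policy.embeddings
      .map(e => s"""{"path":"${e.path}"}""")
      .mkString("""{"vectorEmbeddings":[""", ",", "]}")

  // Mirrors the guard: only a non-null policy with at least one embedding is serialized.
  def snapshot(policy: Policy): String =
    Option(policy) match {
      case Some(p) if p.embeddings != null && p.embeddings.nonEmpty => toJson(p)
      case _ => "null"
    }

  def main(args: Array[String]): Unit = {
    println(snapshot(null))                           // no policy on the container
    println(snapshot(Policy(Nil)))                    // policy object present but empty
    println(snapshot(Policy(List(Embedding("/v1"))))) // populated policy
  }
}
```

Treating the missing and the empty case identically keeps the `VectorEmbeddingPolicy` table property stable for containers that never had a policy, which is what the `shouldEqual "null"` assertions in the existing tests rely on.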

sdk/cosmos/azure-cosmos-spark_3/src/main/scala/com/azure/cosmos/spark/catalog/CosmosCatalogManagementSDKClient.scala

Lines changed: 32 additions & 1 deletion
@@ -8,7 +8,7 @@ import com.azure.cosmos.models.{FeedRange, PartitionKeyDefinitionVersion, SparkM
 import com.azure.cosmos.spark.diagnostics.BasicLoggingTrait
 import com.azure.cosmos.spark.{ContainerFeedRangesCache, CosmosConstants}
 import com.azure.resourcemanager.cosmos.CosmosManager
-import com.azure.resourcemanager.cosmos.models.{AutoscaleSettings, AutoscaleSettingsResource, ContainerPartitionKey, CreateUpdateOptions, ExcludedPath, IncludedPath, IndexingMode, IndexingPolicy, SqlContainerCreateUpdateParameters, SqlContainerGetPropertiesResource, SqlContainerResource, SqlDatabaseCreateUpdateParameters, SqlDatabaseResource, ThroughputSettingsGetPropertiesResource, ThroughputSettingsResource, ThroughputSettingsUpdateParameters}
+import com.azure.resourcemanager.cosmos.models.{AutoscaleSettings, AutoscaleSettingsResource, ContainerPartitionKey, CreateUpdateOptions, ExcludedPath, IncludedPath, IndexingMode, IndexingPolicy, SqlContainerCreateUpdateParameters, SqlContainerGetPropertiesResource, SqlContainerResource, SqlDatabaseCreateUpdateParameters, SqlDatabaseResource, ThroughputSettingsGetPropertiesResource, ThroughputSettingsResource, ThroughputSettingsUpdateParameters, VectorEmbeddingPolicy => MgmtVectorEmbeddingPolicy, VectorIndex, VectorIndexType}
 import com.fasterxml.jackson.annotation.JsonInclude.Include
 import com.fasterxml.jackson.databind.ObjectMapper
 import org.apache.spark.sql.connector.catalog.{NamespaceChange, TableChange}
@@ -116,6 +116,20 @@ private[spark] case class CosmosCatalogManagementSDKClient(resourceGroupName: St
       case None =>
     }

+    // setup vector embedding policy
+    CosmosContainerProperties.getVectorEmbeddingPolicy(containerProperties) match {
+      case Some(vectorEmbeddingPolicyJson) =>
+        val mgmtVectorEmbeddingPolicy = try {
+          objectMapper.readValue(vectorEmbeddingPolicyJson, classOf[MgmtVectorEmbeddingPolicy])
+        } catch {
+          case e: Exception =>
+            throw new IllegalArgumentException(
+              s"Failed to parse vectorEmbeddingPolicy JSON. Ensure the JSON is well-formed: ${e.getMessage}", e)
+        }
+        sqlContainerResource.withVectorEmbeddingPolicy(mgmtVectorEmbeddingPolicy)
+      case None =>
+    }
+
     sqlResourcesClient
       .createUpdateSqlContainerAsync(
         resourceGroupName,
@@ -263,11 +277,19 @@ private[spark] case class CosmosCatalogManagementSDKClient(resourceGroupName: St
       excludedPathList += new ExcludedPath().withPath(excludedPath.getPath())
     })

+    val vectorIndexList = new ListBuffer[VectorIndex]()
+    cosmosIndexingPolicy.getVectorIndexes.forEach(vectorIndex => {
+      vectorIndexList += new VectorIndex()
+        .withPath(vectorIndex.getPath)
+        .withType(VectorIndexType.fromString(vectorIndex.getType))
+    })
+
     new IndexingPolicy()
       .withAutomatic(cosmosIndexingPolicy.isAutomatic)
       .withIndexingMode(indexingMode)
       .withIncludedPaths(includedPathList.toList.asJava)
       .withExcludedPaths(excludedPathList.toList.asJava)
+      .withVectorIndexes(vectorIndexList.toList.asJava)
   }
   //scalastyle:off multiple.string.literals
 }
@@ -312,6 +334,11 @@ private[spark] case class CosmosCatalogManagementSDKClient(resourceGroupName: St
       case None => "null"
     }

+    val vectorEmbeddingPolicySnapshot = Option.apply(containerProperties.vectorEmbeddingPolicy()) match {
+      case Some(policy) => objectMapper.writeValueAsString(policy)
+      case None => "null"
+    }
+
     // TODO: Annie: Is it okie to do this way
     val lastModifiedSnapshot = ZonedDateTime
       .ofInstant(Instant.ofEpochSecond(containerProperties.ts().longValue()), ZoneOffset.UTC)
@@ -383,6 +410,10 @@ private[spark] case class CosmosCatalogManagementSDKClient(resourceGroupName: St
       CosmosConstants.TableProperties.IndexingPolicy,
       s"'$indexingPolicySnapshotJson'"
     )
+    tableProperties.put(
+      CosmosConstants.TableProperties.VectorEmbeddingPolicy,
+      s"'$vectorEmbeddingPolicySnapshot'"
+    )

     tableProperties
   }

sdk/cosmos/azure-cosmos-spark_3/src/main/scala/com/azure/cosmos/spark/catalog/CosmosContainerProperties.scala

Lines changed: 5 additions & 0 deletions
@@ -12,6 +12,7 @@ private[spark] object CosmosContainerProperties {
   private val indexingPolicy = "indexingPolicy"
   private val defaultTtlPropertyName = "defaultTtlInSeconds"
   private val analyticalStoreTtlPropertyName = "analyticalStoreTtlInSeconds"
+  private val vectorEmbeddingPolicyPropertyName = "vectorEmbeddingPolicy"
   private val defaultPartitionKeyPath = "/id"
   private val defaultIndexingPolicy = AllPropertiesIndexingPolicyName

@@ -45,4 +46,8 @@ private[spark] object CosmosContainerProperties {
       None
     }
   }
+
+  def getVectorEmbeddingPolicy(properties: Map[String, String]): Option[String] = {
+    properties.get(vectorEmbeddingPolicyPropertyName)
+  }
 }
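
The new accessor returns an `Option[String]`, so callers can distinguish "no policy requested" from "policy JSON supplied" with a pattern match, as both catalog clients do. A small self-contained sketch of that flow is shown here. `TblPropertiesLookupSketch` and `describe` are hypothetical names introduced for illustration; only the property key `vectorEmbeddingPolicy` and the `properties.get(...)` lookup come from the diff.

```scala
object TblPropertiesLookupSketch {
  private val vectorEmbeddingPolicyPropertyName = "vectorEmbeddingPolicy"

  // Same shape as the accessor in the diff: a plain Map lookup returning an Option.
  def getVectorEmbeddingPolicy(properties: Map[String, String]): Option[String] =
    properties.get(vectorEmbeddingPolicyPropertyName)

  // Hypothetical caller: reports what a container-creation path would do with the result.
  def describe(properties: Map[String, String]): String =
    getVectorEmbeddingPolicy(properties) match {
      case Some(json) => s"apply policy: $json"
      case None => "no vector embedding policy requested"
    }

  def main(args: Array[String]): Unit = {
    println(describe(Map("partitionKeyPath" -> "/mypk")))
    println(describe(Map(
      "partitionKeyPath" -> "/mypk",
      "vectorEmbeddingPolicy" -> """{"vectorEmbeddings":[]}""")))
  }
}
```

Because the lookup is optional, tables created without the property follow the existing creation path unchanged.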

sdk/cosmos/azure-cosmos-spark_3/src/test/scala/com/azure/cosmos/spark/CosmosCatalogITestBase.scala

Lines changed: 85 additions & 6 deletions
@@ -158,12 +158,13 @@ abstract class CosmosCatalogITestBase(val skipHive: Boolean = false) extends Int

     val tblProperties = getTblProperties(spark, databaseName, containerName)

-    tblProperties should have size 7
+    tblProperties should have size 8

     tblProperties("AnalyticalStoreTtlInSeconds") shouldEqual "null"
     tblProperties("CosmosPartitionCount") shouldEqual "1"
     tblProperties("CosmosPartitionKeyDefinition") shouldEqual "{\"paths\":[\"/id\"],\"kind\":\"Hash\"}"
     tblProperties("DefaultTtlInSeconds") shouldEqual "null"
+    tblProperties("VectorEmbeddingPolicy") shouldEqual "null"
     tblProperties("IndexingPolicy") shouldEqual
       "{\"indexingMode\":\"consistent\",\"automatic\":true,\"includedPaths\":[{\"path\":\"/*\"}]," +
         "\"excludedPaths\":[{\"path\":\"/\\\"_etag\\\"/?\"}]}"
@@ -196,7 +197,7 @@ abstract class CosmosCatalogITestBase(val skipHive: Boolean = false) extends Int

     var tblProperties = getTblProperties(spark, databaseName, containerName)

-    tblProperties should have size 7
+    tblProperties should have size 8

     // would look like Manual|RUProvisioned|LastOfferModification
     // - last modified as iso datetime like 2021-12-07T10:33:44Z
@@ -214,7 +215,7 @@ abstract class CosmosCatalogITestBase(val skipHive: Boolean = false) extends Int

     tblProperties = getTblProperties(spark, databaseName, containerName)

-    tblProperties should have size 7
+    tblProperties should have size 8

     // would look like Manual|RUProvisioned|LastOfferModification
     // - last modified as iso datetime like 2021-12-07T10:33:44Z
@@ -252,13 +253,14 @@ abstract class CosmosCatalogITestBase(val skipHive: Boolean = false) extends Int

     val tblProperties = getTblProperties(spark, databaseName, containerName)

-    tblProperties should have size 7
+    tblProperties should have size 8

     // tblProperties("AnalyticalStoreTtlInSeconds") shouldEqual "3000000"
     tblProperties("AnalyticalStoreTtlInSeconds") shouldEqual "null"
     tblProperties("CosmosPartitionCount") shouldEqual "1"
     tblProperties("CosmosPartitionKeyDefinition") shouldEqual "{\"paths\":[\"/id\"],\"kind\":\"Hash\",\"version\":2}"
     tblProperties("DefaultTtlInSeconds") shouldEqual "null"
+    tblProperties("VectorEmbeddingPolicy") shouldEqual "null"
     tblProperties("IndexingPolicy") shouldEqual
       "{\"indexingMode\":\"consistent\",\"automatic\":true,\"includedPaths\":[{\"path\":\"/*\"}]," +
         "\"excludedPaths\":[{\"path\":\"/\\\"_etag\\\"/?\"}]}"
@@ -300,12 +302,13 @@ abstract class CosmosCatalogITestBase(val skipHive: Boolean = false) extends Int

     val tblProperties = getTblProperties(spark, databaseName, containerName)

-    tblProperties should have size 7
+    tblProperties should have size 8

     tblProperties("AnalyticalStoreTtlInSeconds") shouldEqual "null"
     tblProperties("CosmosPartitionCount") shouldEqual "2"
     tblProperties("CosmosPartitionKeyDefinition") shouldEqual "{\"paths\":[\"/id\"],\"kind\":\"Hash\"}"
     tblProperties("DefaultTtlInSeconds") shouldEqual "null"
+    tblProperties("VectorEmbeddingPolicy") shouldEqual "null"
     tblProperties("IndexingPolicy") shouldEqual
       "{\"indexingMode\":\"consistent\",\"automatic\":true,\"includedPaths\":[{\"path\":\"/*\"}]," +
         "\"excludedPaths\":[{\"path\":\"/\\\"_etag\\\"/?\"}]}"
@@ -426,12 +429,13 @@ abstract class CosmosCatalogITestBase(val skipHive: Boolean = false) extends Int

     val tblProperties = getTblProperties(spark, databaseName, containerName)

-    tblProperties should have size 7
+    tblProperties should have size 8

     tblProperties("AnalyticalStoreTtlInSeconds") shouldEqual "null"
     tblProperties("CosmosPartitionCount") shouldEqual "1"
     tblProperties("CosmosPartitionKeyDefinition") shouldEqual "{\"paths\":[\"/mypk\"],\"kind\":\"Hash\"}"
     tblProperties("DefaultTtlInSeconds") shouldEqual "null"
+    tblProperties("VectorEmbeddingPolicy") shouldEqual "null"

     // indexPolicyJson will be normalized by the backend - so not be the same as the input json
     // for the purpose of this test I just want to make sure that the custom indexing options
@@ -480,6 +484,81 @@ abstract class CosmosCatalogITestBase(val skipHive: Boolean = false) extends Int
     tblProperties("DefaultTtlInSeconds") shouldEqual "5"
   }

+  it can "create a table with vector embedding policy" in {
+    val databaseName = getAutoCleanableDatabaseName
+    val containerName = RandomStringUtils.randomAlphabetic(6).toLowerCase + System.currentTimeMillis()
+    cleanupDatabaseLater(databaseName)
+
+    val vectorEmbeddingPolicyJson =
+      raw"""{"vectorEmbeddings":[{"path":"/vector1","dataType":"float32","distanceFunction":"cosine","dimensions":500}]}"""
+
+    val indexingPolicyJson =
+      raw"""{"indexingMode":"consistent","automatic":true,"includedPaths":[{"path":"\/mypk\/?"}],""" +
+        raw""""excludedPaths":[{"path":"\/*"}],"vectorIndexes":[{"path":"\/vector1","type":"flat"}]}"""
+
+    spark.sql(s"CREATE DATABASE testCatalog.$databaseName;")
+
+    spark.sql(s"CREATE TABLE testCatalog.$databaseName.$containerName (word STRING, number INT) using cosmos.oltp " +
+      s"TBLPROPERTIES(partitionKeyPath = '/mypk', manualThroughput = '1100', " +
+      s"indexingPolicy = '$indexingPolicyJson', " +
+      s"vectorEmbeddingPolicy = '$vectorEmbeddingPolicyJson')")
+
+    val containerProperties = cosmosClient.getDatabase(databaseName).getContainer(containerName).read().block().getProperties
+    containerProperties.getPartitionKeyDefinition.getPaths.asScala.toArray should equal(Array("/mypk"))
+
+    // validate vector embedding policy
+    val vectorEmbeddingPolicy = containerProperties.getVectorEmbeddingPolicy
+    vectorEmbeddingPolicy should not be null
+    vectorEmbeddingPolicy.getVectorEmbeddings should have size 1
+    val embedding = vectorEmbeddingPolicy.getVectorEmbeddings.get(0)
+    embedding.getPath shouldEqual "/vector1"
+    embedding.getDataType.toString shouldEqual "float32"
+    embedding.getDistanceFunction.toString shouldEqual "cosine"
+    embedding.getEmbeddingDimensions shouldEqual 500
+
+    // validate vector indexes are in indexing policy
+    val vectorIndexes = containerProperties.getIndexingPolicy.getVectorIndexes
+    vectorIndexes should have size 1
+    vectorIndexes.get(0).getPath shouldEqual "/vector1"
+    vectorIndexes.get(0).getType shouldEqual "flat"
+
+    // validate throughput
+    val throughput = cosmosClient.getDatabase(databaseName).getContainer(containerName).readThroughput().block().getProperties
+    throughput.getManualThroughput shouldEqual 1100
+
+    val tblProperties = getTblProperties(spark, databaseName, containerName)
+
+    tblProperties should have size 8
+
+    tblProperties("CosmosPartitionKeyDefinition") shouldEqual "{\"paths\":[\"/mypk\"],\"kind\":\"Hash\"}"
+    tblProperties("DefaultTtlInSeconds") shouldEqual "null"
+    tblProperties("AnalyticalStoreTtlInSeconds") shouldEqual "null"
+
+    // validate vector embedding policy is in table properties (structured check)
+    val vepObjectMapper = Utils.getSimpleObjectMapper
+    val vepNode = vepObjectMapper.readTree(tblProperties("VectorEmbeddingPolicy"))
+    val vepEmbeddings = vepNode.get("vectorEmbeddings")
+    vepEmbeddings.size() shouldEqual 1
+    vepEmbeddings.get(0).get("path").asText() shouldEqual "/vector1"
+    vepEmbeddings.get(0).get("dataType").asText() shouldEqual "float32"
+    vepEmbeddings.get(0).get("distanceFunction").asText() shouldEqual "cosine"
+
+    // validate vector indexes are in indexing policy (structured check)
+    val ipNode = vepObjectMapper.readTree(tblProperties("IndexingPolicy"))
+    val vectorIndexesNode = ipNode.get("vectorIndexes")
+    vectorIndexesNode.size() shouldEqual 1
+    vectorIndexesNode.get(0).get("path").asText() shouldEqual "/vector1"
+    vectorIndexesNode.get(0).get("type").asText() shouldEqual "flat"
+
+    // would look like Manual|RUProvisioned|LastOfferModification
+    // - last modified as iso datetime like 2021-12-07T10:33:44Z
+    tblProperties("ProvisionedThroughput").startsWith("Manual|1100|") shouldEqual true
+    tblProperties("ProvisionedThroughput").length shouldEqual 32
+
+    // last modified as iso datetime like 2021-12-07T10:33:44Z
+    tblProperties("LastModified").length shouldEqual 20
+  }
+
   it can "select from a catalog table with default TBLPROPERTIES" in {
     val databaseName = getAutoCleanableDatabaseName
     val containerName = RandomStringUtils.randomAlphabetic(6).toLowerCase + System.currentTimeMillis()