Commit 50103b1

MattSchuragoerlersmahati
authored
Vector Embeddings in Java (#2508)
Introduces new "Vector Functions" section below "Scalar Functions"

---------

Co-authored-by: Adrian Görler <adrian.goerler@sap.com>
Co-authored-by: Mahati Shankar <93712176+smahati@users.noreply.github.com>
1 parent dd8a098 commit 50103b1

2 files changed

Lines changed: 77 additions & 42 deletions

java/cds-data.md

Lines changed: 11 additions & 42 deletions

````diff
@@ -328,58 +328,27 @@ On the database, this data is serialized to [JSON](https://www.json.org/)<sup>(1
 
 Map data can be nested and may contain nested maps and lists, which are serialized to JSON objects and arrays, respectively.
 
-## Vector Embeddings <Beta /> { #vector-embeddings }
+## Vector Embeddings { #vector-embeddings }
 
-In CDS [vector embeddings](../guides/databases/hana#vector-embeddings) are stored in elements of type `cds.Vector`:
+In CDS, [vector embeddings](../guides/databases/hana#vector-embeddings) are stored in elements of type [`Vector`](/@external/cds/types).
 
-```cds
-entity Books : cuid { // [!code focus]
-  title : String(111);
-  description : LargeString; // [!code focus]
-  embedding : Vector(1536); // vector space w/ 1536 dimensions // [!code focus]
-} // [!code focus]
-```
+CAP Java supports the vector type on SAP HANA, as well as on H2 and SQLite for local testing. On Postgres (beta), support for vectors requires the [pgvector](https://github.com/pgvector/pgvector) extension.
 
-In CAP Java, vector embeddings are represented by the `CdsVector` type, which allows a unified handling of different vector representations such as `float[]` and `String`:
+In CAP Java, vectors are represented by the `CdsVector` type, which allows a unified handling of different vector representations such as `float[]` and `String`:
 
 ```Java
-// Vector embedding of text, for example, from SAP GenAI Hub or via LangChain4j
-float[] embedding = embeddingModel.embed(bookDescription).content().vector();
+// Vector embedding of text via SAP Cloud SDK for AI
+float[] embedding = embeddingModel.embedding(
+  new OpenAiEmbeddingRequest(List.of(text))).getEmbeddingVectors().get(0);
 
 CdsVector v1 = CdsVector.of(embedding); // float[] format
-CdsVector v2 = CdsVector.of("[0.42, 0.73, 0.28, ...]"); // String format
-```
-
-You can use the functions, `CQL.cosineSimilarity` or `CQL.l2Distance` (Euclidean distance) in queries to compute the similarity or distance of embeddings in the vector space. To use vector embeddings in functions, wrap them using `CQL.vector`:
-
-```Java
-CqnVector v = CQL.vector(embedding);
-
-CdsResult<Books> similarBooks = service.run(Select.from(BOOKS).where(b ->
-  CQL.cosineSimilarity(b.embedding(), v).gt(0.9))
-);
 ```
 
-You can also use parameters for vectors in queries:
-
-```Java
-var similarity = CQL.cosineSimilarity(CQL.get(Books.EMBEDDING), CQL.param(0).type(VECTOR));
-
-CqnSelect query = Select.from(BOOKS)
-  .columns(b -> b.title(), b -> similarity.as("similarity"))
-  .where(b -> b.ID().ne(bookId).and(similarity.gt(0.9)))
-  .orderBy(b -> b.get("similarity").desc());
-
-Result similarBooks = db.run(query, CdsVector.of(embedding));
-```
-
-In CDS QL queries, elements of type `cds.Vector` are not included in select _all_ queries. They must be explicitly added to the select list:
-
-```Java
-CdsVector embedding = service.run(Select.from(BOOKS).byId(101)
-  .columns(b -> b.embedding())).single(Books.class).getEmbedding();
-```
+::: info
+In CDS QL queries, elements of type `Vector` are excluded from the select list by default.
+:::
 
+CAP Java supports multiple [vector functions](./working-with-cql/query-api.md#vector-functions) that allow you to compute vector embeddings, similarity, and distance directly in the database.
 
 ## Data in CDS Query Language (CQL)
 
````

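The `cds-data.md` change above says that `CdsVector` unifies the `float[]` and `String` representations, and the removed example wrote the string form as `"[0.42, 0.73, 0.28, ...]"`. To make the correspondence between the two forms visible, here is a minimal plain-Java sketch with no CAP dependency. `VectorFormats`, `toVectorString`, and `parseVectorString` are hypothetical names, and treating the string form as exactly the output of `java.util.Arrays.toString(float[])` is an assumption; the syntax that `CdsVector.of(String)` actually accepts is defined by the CAP Java API, not by this sketch.

```java
import java.util.Arrays;

/**
 * Hypothetical helper (not part of CAP Java): converts between a float[] embedding
 * and a bracketed, comma-separated string such as "[0.42, 0.73, 0.28]".
 * Assumption: this mirrors the String format shown for CdsVector.of(String).
 */
final class VectorFormats {

    private VectorFormats() {}

    /** Formats a vector as "[x1, x2, ...]" using Arrays.toString. */
    static String toVectorString(float[] vector) {
        return Arrays.toString(vector);
    }

    /** Parses "[x1, x2, ...]" back into a float[]; "[]" yields an empty array. */
    static float[] parseVectorString(String text) {
        String t = text.trim();
        if (!t.startsWith("[") || !t.endsWith("]")) {
            throw new IllegalArgumentException("expected [x1, x2, ...] but got: " + text);
        }
        String body = t.substring(1, t.length() - 1).trim();
        if (body.isEmpty()) {
            return new float[0];
        }
        String[] parts = body.split(",");
        float[] vector = new float[parts.length];
        for (int i = 0; i < parts.length; i++) {
            vector[i] = Float.parseFloat(parts[i].trim());
        }
        return vector;
    }

    public static void main(String[] args) {
        float[] embedding = {0.42f, 0.73f, 0.28f};
        String text = toVectorString(embedding);
        System.out.println(text); // prints [0.42, 0.73, 0.28]
        System.out.println(Arrays.equals(embedding, parseVectorString(text))); // prints true
    }
}
```

The diff itself passes the `float[]` straight to `CdsVector.of`, which avoids hand-written conversions altogether; this sketch only shows how the two representations line up.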
java/working-with-cql/query-api.md

Lines changed: 66 additions & 0 deletions

````diff
@@ -1640,6 +1640,72 @@ Scalar functions are values that are calculated from other values. This calculat
 
 See [`Concat`](#string-expressions) String Expression
 
+
+#### Vector Functions
+
+Vector functions allow you to compute similarity and distance of [vectors](../cds-data.md#vector-embeddings), as well as [vector embeddings](../../guides/databases/hana.md#vector-embeddings) of text data directly in the database.
+
+##### Computing Vector Embeddings in SAP HANA <Beta />
+
+CAP Java supports the [VECTOR_EMBEDDING](https://help.sap.com/docs/hana-cloud-database/sap-hana-cloud-sap-hana-database-sql-reference-guide/vector-embedding-function-vector) function via `CQL.vectorEmbedding` to generate vector embeddings from text data directly in SAP HANA.
+
+To automatically generate vector embeddings on write in the database, you can define a calculated element [on-write](../../cds/cdl#on-write) using the `vector_embedding` function:
+
+```cds
+extend Incidents with {
+  @cds.api.ignore
+  embedding : cds.Vector(768) = vector_embedding(
+    'title: ' || title || ', summary: ' || summary,
+    'DOCUMENT', 'SAP_GXY.20250407') stored;
+}
+```
+
+In Java queries, use the `CQL.vectorEmbedding` function to compute vector embeddings:
+
+```java
+var userQuery = CQL.val("""
+  Have we seen incidents with solar inverters this month,
+  and how were they resolved?
+  """);
+var v = CQL.vectorEmbedding(userQuery, TextType.QUERY, "SAP_GXY.20250407");
+```
+
+On H2 and SQLite, the `vectorEmbedding` function is emulated. You can also use local [ONNX](https://onnx.ai) embedding models, which can be added for local testing via [LangChain4j embeddings](https://github.com/langchain4j/langchain4j/tree/main/embeddings):
+
+```xml
+<dependency>
+  <groupId>dev.langchain4j</groupId>
+  <artifactId>langchain4j-embeddings-all-minilm-l6-v2-q</artifactId>
+  <scope>runtime</scope>
+</dependency>
+```
+
+##### Computing Vector Similarity and Distance
+
+You can use the functions `CQL.cosineSimilarity` and `CQL.l2Distance` (Euclidean distance) in queries to compute the similarity and distance of vectors. These functions support use cases such as finding similar items based on [vector embeddings](../../guides/databases/hana.md#vector-embeddings), for example, to improve the response of an LLM to a user query. To use vector embeddings in functions, wrap them using `CQL.vector`:
+
+```Java
+CqnVector vec = CQL.vector(embedding);
+
+var similarIncidents = db.run(Select.from(INCIDENTS).where(i ->
+  CQL.cosineSimilarity(i.embedding(), vec).gt(0.75))
+);
+```
+
+You can also use parameters for vectors in queries:
+
+```Java
+var similarity = CQL.cosineSimilarity(
+  CQL.get(Incidents.EMBEDDING), CQL.param(0).type(VECTOR));
+
+var query = Select.from(INCIDENTS)
+  .columns(i -> i.title(), i -> similarity.times(100).as("similarity"))
+  .where(i -> similarity.gt(0.75))
+  .orderBy(i -> i.get("similarity").desc());
+
+Result similarIncidents = db.run(query, CdsVector.of(embedding));
+```
+
 #### Case-When-Then Expressions
 
 Use a case expression to compute a value based on the evaluation of conditions. The following query converts the stock of Books into a textual representation as 'stockLevel':
````

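The `query-api.md` change introduces `CQL.cosineSimilarity` and `CQL.l2Distance`, which the database evaluates. To show what these functions compute, here is a minimal plain-Java sketch of the standard formulas: cosine similarity is dot(a, b) / (|a| * |b|), and Euclidean (L2) distance is sqrt(sum((a[i] - b[i])^2)). It also includes an in-memory filter that mirrors the parameterized query in the diff: keep rows whose similarity exceeds 0.75, report the similarity scaled by 100, and sort in descending order. This is not the CAP Java API. `VectorMath`, `Incident`, `Scored`, and `similar` are hypothetical, the sketch assumes Java 16 or later for records, and the database's native implementation may differ in floating-point details. Note that the two measures point in opposite directions: a higher cosine similarity means more similar, while a lower L2 distance means closer.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical plain-Java sketch (not the CAP Java API) of the formulas behind
 * CQL.cosineSimilarity and CQL.l2Distance, plus an in-memory threshold filter.
 */
final class VectorMath {

    private VectorMath() {}

    /** Cosine similarity: dot(a, b) / (|a| * |b|); 1 means same direction, 0 means orthogonal. */
    static double cosineSimilarity(float[] a, float[] b) {
        requireSameDimension(a, b);
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            throw new IllegalArgumentException("cosine similarity is undefined for a zero vector");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Euclidean (L2) distance: sqrt(sum((a[i] - b[i])^2)); 0 means identical vectors. */
    static double l2Distance(float[] a, float[] b) {
        requireSameDimension(a, b);
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = (double) a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    private static void requireSameDimension(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch: " + a.length + " vs " + b.length);
        }
    }

    /** Hypothetical stand-in for a stored row: a title and its embedding. */
    record Incident(String title, float[] embedding) {}

    /** Hypothetical result row: the title and the similarity scaled by 100. */
    record Scored(String title, double similarity) {}

    /**
     * Keeps rows whose cosine similarity to the query exceeds the threshold,
     * scales it by 100, and sorts by that score in descending order.
     */
    static List<Scored> similar(List<Incident> rows, float[] query, double threshold) {
        List<Scored> result = new ArrayList<>();
        for (Incident row : rows) {
            double similarity = cosineSimilarity(row.embedding(), query);
            if (similarity > threshold) {
                result.add(new Scored(row.title(), similarity * 100));
            }
        }
        result.sort((x, y) -> Double.compare(y.similarity(), x.similarity()));
        return result;
    }

    public static void main(String[] args) {
        float[] query = {1f, 0f};
        List<Incident> rows = List.of(
                new Incident("A", new float[] {1f, 0f}),  // similarity 1.0
                new Incident("B", new float[] {0f, 1f}),  // similarity 0.0
                new Incident("C", new float[] {1f, 1f}),  // similarity about 0.71
                new Incident("D", new float[] {3f, 1f})); // similarity about 0.95
        // Only A and D exceed 0.75; A is listed first.
        for (Scored s : similar(rows, query, 0.75)) {
            System.out.println(s.title() + " " + s.similarity());
        }
        System.out.println(l2Distance(new float[] {0f, 0f}, new float[] {3f, 4f})); // prints 5.0
    }
}
```

Doing this in Java requires loading every stored embedding into the application; the CQL functions in the diff let the database do the filtering and ranking instead.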