The MongoDB data transfer extension provides source and sink capabilities for reading from and writing to a MongoDB database.
Note: When specifying the MongoDB extension as the Source or Sink property in configuration, utilize the name MongoDB.
Note for MongoDB Wire Version 2 Users: If you're connecting to a MongoDB instance that uses wire version 2 (such as older CosmosDB MongoDB API instances that use the
documents.azure.comendpoint), use the MongoDB-Legacy (Wire v2) extension instead. See the MongoDB Legacy Extension section below for details.
Source and sink settings require both ConnectionString and DatabaseName parameters. The source takes an optional Collection parameter (if this parameter is not set, it will read from all collections) and an optional Query parameter for filtering documents. The sink requires the Collection parameter and will insert all records received from a source into that collection, as well as optional BatchSize (default value is 1000) and IdFieldName parameters.
{
"ConnectionString": "",
"DatabaseName": "",
"Collection": "",
"Query": ""
}The Query parameter allows you to filter documents during data migration using MongoDB query syntax in JSON format. This parameter supports two input methods:
- Direct JSON Query: Provide the MongoDB query directly as a JSON string
- File Path: Provide a path to a JSON file containing the MongoDB query
Examples:
Direct JSON query:
{
"ConnectionString": "mongodb://localhost:27017",
"DatabaseName": "sales",
"Collection": "person",
"Query": "{\"timestamp\":{\"$gte\":\"2025-01-01\",\"$lt\":\"2025-02-01\"}}"
}Query from file:
{
"ConnectionString": "mongodb://localhost:27017",
"DatabaseName": "sales",
"Collection": "person",
"Query": "/path/to/query.json"
}Where query.json contains:
{"timestamp":{"$gte":"2025-01-01","$lt":"2025-02-01"}}Supported Query Operators:
The query parameter supports all standard MongoDB query operators including:
- Comparison operators:
$eq,$ne,$gt,$gte,$lt,$lte,$in,$nin - Logical operators:
$and,$or,$not,$nor - Element operators:
$exists,$type - Array operators:
$all,$elemMatch,$size - And more...
For more information on MongoDB query syntax, see the MongoDB Query Documentation.
{
"ConnectionString": "",
"DatabaseName": "",
"Collection": "",
"IdFieldName": ""
}The IdFieldName parameter allows you to specify which field from the source data should be mapped to MongoDB's _id field. When set, the value from the specified field will be used as the document's _id in MongoDB, while the original field will also remain in the document.
Example:
If your source data has an id field that you want to use as MongoDB's _id:
{
"ConnectionString": "mongodb://localhost:27017",
"DatabaseName": "mydb",
"Collection": "mycollection",
"IdFieldName": "id"
}Source data:
{
"id": "BSKT_1",
"CustomerId": 111
}Result in MongoDB:
{
"_id": "BSKT_1",
"id": "BSKT_1",
"CustomerId": 111
}Notes:
- The field name comparison is case-insensitive
- If
IdFieldNameis not specified or is empty, MongoDB will generate a unique ObjectId for_idas usual - The original field specified in
IdFieldNamewill remain in the document along with the_idfield
The MongoDB Legacy extension provides source and sink capabilities for reading from and writing to MongoDB instances that use wire version 2, which is not supported by the standard MongoDB extension. This extension is specifically designed for older MongoDB instances, including Azure Cosmos DB for MongoDB API instances using the documents.azure.com endpoint.
Note: When specifying the MongoDB Legacy extension as the Source or Sink property in configuration, utilize the name MongoDB-Legacy (Wire v2).
Source and sink settings require both ConnectionString and DatabaseName parameters. The source takes an optional Collection parameter (if this parameter is not set, it will read from all collections). The sink requires the Collection parameter and will insert all records received from a source into that collection, as well as an optional BatchSize parameter (default value is 1000) to batch the writes into the collection.
{
"ConnectionString": "",
"DatabaseName": "",
"Collection": ""
}{
"ConnectionString": "",
"DatabaseName": "",
"Collection": "",
"BatchSize": 1000,
"IdFieldName": ""
}The IdFieldName parameter allows you to specify which field from the source data should be mapped to MongoDB's _id field. When set, the value from the specified field will be used as the document's _id in MongoDB, while the original field will also remain in the document. See the MongoDB Extension Sink IdFieldName Parameter section above for more details and examples.
The MongoDB Vector extension is a Sink only extension that builds on the MongoDB extension by providing additional capabilities for generating embeddings using Azure OpenAI APIs.
Note: When specifying the MongoDB Vector extension as the Sink property in configuration, utilize the name MongoDB-Vector(beta).
If using CSFLE (Client Side Field Level Encryption), source sink supports autodecryption providing the following parameters:
KeyVaultNamespace: Database and collection holding the Key Vault and Keys details. Format:database.collectionKMSProviders: Key Management Service providers for the Key Vault. For Azure Key Vault support, the following parameters are required:tenantId: The Azure Active Directory tenant IDclientId: The Azure Active Directory application client IDclientSecret: The Azure Active Directory application client secret
{
"ConnectionString": "",
"DatabaseName": "",
"Collection": "",
"KeyVaultNamespace": "",
"KMSProviders": {
"azure": {
"tenantId": "",
"clientId": "",
"clientSecret": ""
}
}
}The settings are based on the MongoDB extension settings with additional parameters for generating embeddings.
The sink settings require the following additional parameters:
GenerateEmbedding: If set to true, the sink will generate embeddings for the records before writing them to the database. The sink requires theOpenAIUrl,OpenAIKey, andOpenAIDeploymentModelparameters to be set. Following paramaters are required if this is trueOpenAIUrl: The URL of the OpenAI APIOpenAIKey: The API key for the OpenAI APIOpenAIDeploymentName: The deployment model to use for the OpenAI APISourcePropEmbedding: The property in the source data that should be used to generate the embeddingsDestPropEmbedding: New property name that will be added to the source data with the generated embeddings
{
"ConnectionString": "",
"DatabaseName": "",
"Collection": "",
"BatchSize: 100,
"GenerateEmbedding": true | false
"OpenAIUrl": "",
"OpenAIKey": "",
"OpenAIDeploymentModel": "",
"SourcePropEmbedding": "",
"DestPropEmbedding": ""
}