| title | Amazon S3 |
|---|---|
| sidebarTitle | Amazon S3 |
This documentation describes the integration of MindsDB with Amazon S3, an object storage service that offers industry-leading scalability, data availability, security, and performance.
This data source integration is thread-safe, utilizing a connection pool where each thread is assigned its own connection. When handling requests in parallel, threads retrieve connections from the pool as needed.
Before proceeding, ensure that MindsDB is installed locally via Docker or Docker Desktop.
Establish a connection to your Amazon S3 bucket from MindsDB by executing the following SQL command:
CREATE DATABASE s3_datasource
WITH
    engine = 's3',
    parameters = {
        "aws_access_key_id": "AQAXEQK89OX07YS34OP",
        "aws_secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
        "region_name": "us-east-1",
        "bucket": "my-bucket"
    };
Required connection parameters include the following:
- `aws_access_key_id`: The AWS access key that identifies the user or IAM role.
- `aws_secret_access_key`: The AWS secret access key that authenticates the user or IAM role.
Optional connection parameters include the following:
- `aws_session_token`: The AWS session token that identifies the user or IAM role. It is required when using temporary security credentials.
- `region_name`: The AWS region of the bucket (for example, `us-east-1`). When omitted, MindsDB falls back to the bucket's location, which may add an extra metadata round trip and can fail for endpoints that require an explicit region.
- `bucket`: The name of the Amazon S3 bucket. If not provided, all available buckets can be queried; however, this can affect performance, especially when listing all of the available objects.
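
As an illustration of the optional parameters, the statement below sketches a connection that uses temporary security credentials. The datasource name `s3_temp_datasource` and every credential value are placeholders chosen for this example, not values from a real account.

```sql
-- Sketch: connect with temporary credentials (all values are placeholders).
-- aws_session_token is needed alongside the temporary key pair;
-- region_name and bucket are optional but set explicitly here.
CREATE DATABASE s3_temp_datasource
WITH
    engine = 's3',
    parameters = {
        "aws_access_key_id": "<temporary-access-key-id>",
        "aws_secret_access_key": "<temporary-secret-access-key>",
        "aws_session_token": "<session-token>",
        "region_name": "us-east-1",
        "bucket": "my-bucket"
    };
```

Temporary credentials expire, so values stored this way will eventually stop working and would need to be replaced with fresh ones.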
Retrieve data from a specified object (file) in an S3 bucket by providing the integration name and the object key:
SELECT *
FROM s3_datasource.`my-file.csv`
LIMIT 10;
Wrap the object key in backticks (\`) so that the SQL statement parses correctly. This is especially important when the object key contains spaces, special characters, or prefixes, such as `my-folder/my-file.csv`.
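
For instance, an object stored under a prefix could be queried as follows. This is a minimal sketch that reuses the `s3_datasource` connection and the `my-folder/my-file.csv` key mentioned above; the key itself is illustrative.

```sql
-- The backticks keep the slash and hyphens in the key from being parsed as SQL syntax.
SELECT *
FROM s3_datasource.`my-folder/my-file.csv`
LIMIT 10;
```
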
At the moment, the supported file formats are CSV, TSV, JSON, and Parquet.
The above examples utilize `s3_datasource` as the datasource name, which is defined in the `CREATE DATABASE` command.
The special `files` table can be used to list all objects available in the specified bucket or all buckets if the bucket name is not provided:
SELECT *
FROM s3_datasource.files LIMIT 10;
The content of files can also be retrieved by explicitly requesting the `content` column. This column is empty by default to avoid unnecessary data transfer:
SELECT path, content
FROM s3_datasource.files LIMIT 10;
- Symptoms: Failure to connect MindsDB with the Amazon S3 bucket.
- Checklist:
    - Make sure the Amazon S3 bucket exists.
    - Confirm that the provided AWS credentials are correct. Try making a direct connection to the S3 bucket using the AWS CLI.
    - Ensure a stable network connection between MindsDB and AWS.
- Symptoms: SQL queries failing or not recognizing object names containing spaces, special characters, or prefixes.
- Checklist:
    - Ensure object names with spaces, special characters, or prefixes are enclosed in backticks.
    - Examples:
        - Incorrect: `SELECT * FROM integration.travel/travel_data.csv`
        - Incorrect: `SELECT * FROM integration.'travel/travel_data.csv'`
        - Correct: `` SELECT * FROM integration.`travel/travel_data.csv` ``
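
When the exact object key is uncertain, one way to check it is to list paths from the `files` table and copy the value into backticks. The sketch below assumes the `s3_datasource` connection defined earlier and that the `path` column holds the object key, which is inferred from the `files` examples in this document; `travel/travel_data.csv` is only an example key.

```sql
-- Step 1: list object paths to find the exact key.
SELECT path
FROM s3_datasource.files
LIMIT 10;

-- Step 2: query the object using the key exactly as listed, wrapped in backticks.
SELECT *
FROM s3_datasource.`travel/travel_data.csv`
LIMIT 10;
```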