minds-platform/docs/integrations/data-integrations/amazon-s3.mdx at 95371cce5370c0d5ce06265b2ec1c8276a7fb3f4 · mindsdb/minds-platform

title	Amazon S3
sidebarTitle	Amazon S3

This documentation describes the integration of MindsDB with Amazon S3, an object storage service that offers industry-leading scalability, data availability, security, and performance.

This data source integration is thread-safe, utilizing a connection pool where each thread is assigned its own connection. When handling requests in parallel, threads retrieve connections from the pool as needed.

Prerequisites

Before proceeding, ensure that MindsDB is installed locally via Docker or Docker Desktop.

Connection

Establish a connection to your Amazon S3 bucket from MindsDB by executing the following SQL command:

CREATE DATABASE s3_datasource
WITH
    engine = 's3',
    parameters = {
      "aws_access_key_id": "AQAXEQK89OX07YS34OP",
      "aws_secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
      "region_name": "us-east-1",
      "bucket": "my-bucket"
    };

Note that sample parameter values are provided here for reference, and you should replace them with your connection parameters.

Required connection parameters include the following:

aws_access_key_id: The AWS access key that identifies the user or IAM role.
aws_secret_access_key: The AWS secret access key that identifies the user or IAM role.

Optional connection parameters include the following:

aws_session_token: The AWS session token that identifies the user or IAM role. This becomes necessary when using temporary security credentials.
region_name: The AWS region of the bucket (for example, us-east-1). When omitted, MindsDB falls back to the bucket's location, which may add an extra metadata round trip and can fail for endpoints that require an explicit region.
bucket: The name of the Amazon S3 bucket. If not provided, all available buckets can be queried, however, this can affect performance, especially when listing all of the available objects.

Usage

Retrieve data from a specified object (file) in a S3 bucket by providing the integration name and the object key:

SELECT *
FROM s3_datasource.`my-file.csv`;
LIMIT 10;

If a bucket name is provided in the `CREATE DATABASE` command, querying will be limited to that bucket and the bucket name can be ommitted from the object key as shown in the example above. However, if the bucket name is not provided, the object key must include the bucket name, such as `s3_datasource.`my-bucket/my-folder/my-file.csv`.

Wrap the object key in backticks (`) to avoid any issues parsing the SQL statements provided. This is especially important when the object key contains spaces, special characters or prefixes, such as my-folder/my-file.csv.

At the moment, the supported file formats are CSV, TSV, JSON, and Parquet.

The above examples utilize `s3_datasource` as the datasource name, which is defined in the `CREATE DATABASE` command.

The special files table can be used to list all objects available in the specified bucket or all buckets if the bucket name is not provided:

SELECT *
FROM s3_datasource.files LIMIT 10

The content of files can also be retrieved by explicitly requesting the content column. This column is empty by default to avoid unnecessary data transfer:

SELECT path, content
FROM s3_datasource.files LIMIT 10

This table will return all objects regardless of the file format, however, only the supported file formats mentioned above can be queried.

Troubleshooting Guide

`Database Connection Error`

Symptoms: Failure to connect MindsDB with the Amazon S3 bucket.
Checklist:
1. Make sure the Amazon S3 bucket exists.
2. Confirm that provided AWS credentials are correct. Try making a direct connection to the S3 bucket using the AWS CLI.
3. Ensure a stable network between MindsDB and AWS.

`SQL statement cannot be parsed by mindsdb_sql`

Symptoms: SQL queries failing or not recognizing object names containing spaces, special characters or prefixes.
Checklist:
1. Ensure object names with spaces, special characters or prefixes are enclosed in backticks.
2. Examples:
  - Incorrect: SELECT * FROM integration.travel/travel_data.csv
  - Incorrect: SELECT * FROM integration.'travel/travel_data.csv'
  - Correct: SELECT * FROM integration.`travel/travel_data.csv`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prerequisites

Connection

Usage

Troubleshooting Guide

FilesExpand file tree

amazon-s3.mdx

Latest commit

History

amazon-s3.mdx

File metadata and controls

Prerequisites

Connection

Usage

Troubleshooting Guide