
Commit b8b9b2a

plaharanne and joerg84 authored

Refresh 'Import' documentation (#114)

* Refresh 'Import' documentation
* Fix typos
* Fix extra space
* Small edits.

Co-authored-by: Joerg Schad <joerg.schad@gmail.com>
1 parent 54a7928 commit b8b9b2a

10 files changed: 45 additions & 78 deletions
[Binary assets: two updated screenshots (85.1 KB and 87.6 KB) and seven deleted images; binary files not shown.]

docs/cluster/import.md

@@ -1,29 +1,40 @@

(cluster-import)=
# Import

You can import data into your CrateDB cluster directly from various
sources, including:

- Local files
- URLs
- AWS S3 buckets
- Azure Blob Storage
- MongoDB databases

Currently, the following data formats are supported:

- CSV
- JSON (JSON-Lines, JSON Arrays, and JSON Documents)
- Parquet
- MongoDB collections

:::{note}
If you don't have a dataset prepared, we also provide sample data to let
you discover CrateDB. After importing those examples, feel free to go to
the tutorial page to learn how to use them.
:::

You can access the history of previous imports in the "Import history"
tab. By navigating to "View detail", you can display details of a
particular import job (e.g., the number of successful and failed records
per file).
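
For readers running self-managed CrateDB, a comparable per-file report is
available from the `COPY FROM` statement's `RETURN SUMMARY` clause. This is
a minimal sketch, not part of the Cloud Console workflow described here;
the table name and file path are placeholders:

```sql
-- Minimal sketch (placeholder table name and file path).
-- RETURN SUMMARY reports, per input URI, the number of successfully
-- imported records, the number of failed records, and sample errors.
COPY my_table
FROM 'file:///tmp/data.json'
RETURN SUMMARY;
```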

![Cloud Console cluster import data](../_assets/img/cluster-import.png)

(cluster-import-file-import)=
## File Import

To import data, select the file format, the source, and the name of the
table that will be created and populated with your data.

You can deactivate the "Allow schema evolution" checkbox if you don't
want the destination table to be created automatically or its schema to
be modified (see the "Schema evolution" section below).

The following data formats are supported:

@@ -33,21 +44,21 @@ The following data formats are supported:

Gzip compressed files are also supported.

![Cloud Console cluster import form](../_assets/img/cluster-import-file-form.png)

(cluster-import-file-import-s3)=
### AWS S3 bucket

CrateDB Cloud allows convenient imports directly from S3-compatible
storage. To import a file from a bucket, provide the name of your bucket
and the path to the file. The S3 Access Key ID and S3 Secret Access Key
are also needed. You can also specify the endpoint for non-AWS S3
buckets. Keep in mind that you may be charged for egress traffic,
depending on your provider. There is also a volume limit of 10 GiB per
file for S3 imports.

Importing multiple files is also supported by using wildcard
notation: `/folder/*.parquet`.
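
On self-managed CrateDB clusters, a roughly comparable S3 import
(including the wildcard notation above) can be expressed with `COPY FROM`;
a sketch with placeholder bucket, path, and credentials:

```sql
-- Placeholder credentials, bucket, and path: replace with your own.
-- The wildcard selects multiple files, and gzip-compressed input is
-- decompressed on the fly via the "compression" option.
COPY my_table
FROM 's3://MY_ACCESS_KEY:MY_SECRET_KEY@my-bucket/folder/*.json.gz'
WITH (format = 'json', compression = 'gzip');
```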

:::{note}
It is important to make sure that you have the right permissions to
@@ -72,8 +83,8 @@ have a policy that allows GetObject access, for example:

(cluster-import-file-import-azure)=
### Azure Blob Storage

Importing data from private Azure Blob Storage containers is possible
using a stored secret, which includes a secret name and either an Azure
@@ -83,60 +94,16 @@ the organization level can add this secret.
You can specify a secret, a container, a table, and a path in the form
`/folder/my_file.parquet`.

Importing multiple files is also supported by using wildcard
notation: `/folder/*.parquet`.

The file size limit for imports is 10 GiB per file.
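
However the data arrives (file upload, S3, or Azure), a quick way to
verify a finished import from any SQL console is to count and sample the
destination table; the table name here is hypothetical:

```sql
-- Hypothetical destination table; adjust to your import target.
SELECT COUNT(*) AS imported_rows FROM my_imported_table;
SELECT * FROM my_imported_table LIMIT 5;
```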

112-
![Cloud Console cluster import globbing](../_assets/img/cluster-import-globbing.png)
102+
(cluster-import-integration)=
103+
## Integration
113104

114-
As with other imports, the supported file types are CSV, JSON, and
115-
Parquet.
116-
117-
(cluster-import-file)=
118-
## File
119-
120-
Uploading directly from your computer offers more control over your
121-
data. From the security point of view, you don't have to share the data
122-
on the internet just to be able to import it to your cluster. You also
123-
have more control over who has access to your data. Your files are
124-
temporarily uploaded to a secure location managed by Crate (an S3 bucket
125-
in AWS) which is not publicly accessible. The files are automatically
126-
deleted after 3 days. You may re-import the same file into multiple
127-
tables without having to re-upload it within those 3 days. Up to 5 files
128-
may be uploaded at the same time, with the oldest ones being
129-
automatically deleted if you upload more.
130-
131-
![Cloud Console cluster upload from file](../_assets/img/cluster-import-tab-file.png)
132-
133-
As with other import, the supported file formats are:
134-
135-
- CSV (all variants)
136-
- JSON (JSON-Lines, JSON Arrays and JSON Documents)
137-
- Parquet
105+
{ref}`More info about data integration. <cluster-integrations>`
138106

139-
There is also a limit to file size, currently 1GB.
140107

141108
(overview-cluster-import-schema-evolution)=
142109
## Schema evolution
@@ -145,7 +112,7 @@ Schema Evolution, available for all import types, enables automatic

addition of new columns to existing tables during data import,
eliminating the need to pre-define table schemas. This feature is
applicable to both pre-existing tables and those created during the
import process. It can be toggled via the 'Allow schema evolution'
checkbox on the import page.
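
As an illustration (hypothetical table and column names, and only an
approximation of what the feature automates): if a later file carries a
field the destination table lacks, an import with 'Allow schema evolution'
enabled behaves as if the missing column had been added first:

```sql
-- Table as created by an earlier import (hypothetical schema).
CREATE TABLE IF NOT EXISTS readings (
    sensor_id TEXT,
    value DOUBLE PRECISION
);

-- A newer file also includes a "unit" field per record. With schema
-- evolution enabled, the import effectively performs the equivalent of:
ALTER TABLE readings ADD COLUMN unit TEXT;
```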

Note that Schema Evolution is limited to adding new columns; it does not
