Commit 84bdb8f

Add info about parquet file rewriting
1 parent fc404cf commit 84bdb8f

1 file changed

Lines changed: 11 additions & 0 deletions

docs/5. download.md

@@ -39,6 +39,17 @@ Alternatively, the [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/ge
aws s3 sync s3://verifier-alliance-parquet-export/v2/ ./verifier-alliance-dataset --endpoint-url https://storage.googleapis.com --no-sign-request
```

:::info Rewriting of files

The newest Parquet file of each table, i.e. the one with the highest row range, is re-exported to `export.verifieralliance.org` until it is full.

For example, if `verified_contracts_16000000_17000000.parquet` is the newest file, it does not yet contain the 1M records its name suggests. It keeps being updated until it reaches 1M records. The export script then moves on to the next file, `verified_contracts_17000000_18000000.parquet`, and writes new records there.

Older files can be expected never to change.
:::
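Because only the newest file of each table can still change, a local copy only ever has one file per table that may need re-checking. The Python sketch below (standard library only) picks that file out of a list of file names. It assumes the `<table>_<start>_<end>.parquet` naming seen in the examples above and that a file is full once it holds `end - start` records; `split_mutable_files` and `range_capacity` are hypothetical helpers for illustration, not part of any Verifier Alliance tooling.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumption: files follow the <table>_<start>_<end>.parquet scheme seen in
# the examples, e.g. verified_contracts_16000000_17000000.parquet.
FILE_PATTERN = re.compile(r"^(?P<table>.+)_(?P<start>\d+)_(?P<end>\d+)\.parquet$")


def range_capacity(file_name: str) -> int:
    """Records a file holds once full (assumed to be end - start)."""
    match = FILE_PATTERN.match(file_name)
    if match is None:
        raise ValueError(f"unexpected file name: {file_name}")
    return int(match["end"]) - int(match["start"])


def split_mutable_files(file_names: Iterable[str]) -> Tuple[Dict[str, str], List[str]]:
    """Return (mutable, frozen).

    mutable maps each table to its newest file (highest row range), which may
    still be re-exported; frozen lists all older files, which should not change.
    """
    by_table: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for name in file_names:
        match = FILE_PATTERN.match(name)
        if match is None:
            continue  # ignore files outside the assumed naming scheme
        by_table[match["table"]].append((int(match["start"]), name))

    mutable: Dict[str, str] = {}
    frozen: List[str] = []
    for table, entries in by_table.items():
        # Sort numerically on the range start; a plain string sort would
        # wrongly place 9000000 after 16000000.
        entries.sort()
        mutable[table] = entries[-1][1]
        frozen.extend(name for _, name in entries[:-1])
    return mutable, sorted(frozen)
```

You could feed it the file names from your local copy, e.g. collected with `os.walk("./verifier-alliance-dataset")`. Since `aws s3 sync` already skips files that have not changed, this is mainly useful when you load files into another system incrementally: everything in `frozen` can be ingested once, while the file in `mutable` for each table should be reloaded after every sync.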
### Working with Parquet Files

Once downloaded, you can query and analyze Parquet files using various tools and libraries. Here are some popular options to give you a head start:
