Consider sharding the Parquet export of the BigQuery content

As suggested by @mhalle, sharding could make queries against the bucket faster, perhaps in a significant number of cases, because a reader can skip any shard that cannot match its filter. We could shard by collection and perhaps also by modality.
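To make the idea concrete, here is a minimal sketch of the Hive-style directory layout (`key=value/...`) that sharding by collection and modality would produce, and of how a reader could skip shards by path alone. The column names `collection_id` and `Modality`, the helper functions, and the sample rows are all assumptions for illustration; the actual export schema and tooling may differ.

```python
from collections import defaultdict

# Hypothetical partition columns; the real export schema may use other names.
PARTITION_KEYS = ("collection_id", "Modality")


def shard_path(row, keys=PARTITION_KEYS):
    """Return the Hive-style directory (key=value/...) a row would land in."""
    return "/".join(f"{key}={row[key]}" for key in keys)


def group_into_shards(rows, keys=PARTITION_KEYS):
    """Group rows by shard directory; each group would become one or more Parquet files."""
    shards = defaultdict(list)
    for row in rows:
        shards[shard_path(row, keys)].append(row)
    return dict(shards)


def prune(shard_dirs, **wanted):
    """Keep only the shard directories whose key=value segments match the filter."""
    selected = []
    for directory in shard_dirs:
        parts = dict(segment.split("=", 1) for segment in directory.split("/"))
        if all(parts.get(key) == value for key, value in wanted.items()):
            selected.append(directory)
    return selected


# Made-up rows, for illustration only.
rows = [
    {"collection_id": "coll_a", "Modality": "CT", "SeriesInstanceUID": "1.2.3"},
    {"collection_id": "coll_a", "Modality": "MR", "SeriesInstanceUID": "1.2.4"},
    {"collection_id": "coll_b", "Modality": "CT", "SeriesInstanceUID": "1.2.5"},
]

shards = group_into_shards(rows)
ct_shards = prune(shards, Modality="CT")  # only these directories need to be read
```

In this sketch a query for one modality touches two of the three shards and never opens the third. If the export is written with pyarrow, `pyarrow.parquet.write_to_dataset` with `partition_cols` produces this kind of layout; whether that fits the current export pipeline is an open question. Partition values containing `/` would need escaping, which the sketch does not handle.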

Related discussion with Claude: https://claude.ai/share/88b80074-de62-4553-a02b-d22d331cf5d2