|
19 | 19 | "\n", |
20 | 20 | "- An AWS account with SageMaker Feature Store access\n", |
21 | 21 | "- An S3 bucket for offline store data\n", |
22 | | - "- IAM role with permissions for SageMaker, Glue, and S3\n", |
| 22 | + "- An IAM role with the AmazonSageMakerFeatureStoreAccess and AmazonSageMakerFullAccess policies attached\n", |
23 | 23 | "- `pyiceberg[glue]>=0.8.0` installed\n", |
24 | | - "- `AWS_DEFAULT_REGION` and `SSL_CERT_FILE` are set" |
| 24 | + "- `AWS_DEFAULT_REGION` set, if you are not passing a region in function calls" |
25 | 25 | ] |
26 | 26 | }, |
27 | 27 | { |
|
31 | 31 | "source": [ |
32 | 32 | "## What This Notebook Does\n", |
33 | 33 | "\n", |
34 | | - "1. Creates a Feature Group with Iceberg table format and initial Iceberg properties\n", |
35 | | - "2. Retrieves the Feature Group with its Iceberg properties\n", |
36 | | - "3. Updates Iceberg properties on an existing Feature Group\n", |
37 | | - "4. Verifies the updated properties\n", |
38 | | - "5. Cleans up resources" |
| 34 | + "1. Prepares sample data and feature definitions\n", |
| 35 | + "2. Creates a Feature Group with Iceberg table format and initial Iceberg properties\n", |
| 36 | + " - Shows IcebergProperties validation\n", |
| 37 | + "3. Retrieves the Feature Group with its Iceberg properties\n", |
| 38 | + "4. Updates Iceberg properties on an existing Feature Group\n", |
| 39 | + "5. Verifies the updated properties\n", |
| 40 | + "6. Cleans up resources\n", |
| 41 | + "7. Lists the allowed Iceberg properties for reference" |
39 | 42 | ] |
40 | 43 | }, |
41 | 44 | { |
|
62 | 65 | "from sagemaker.mlops.feature_store.feature_utils import load_feature_definitions_from_dataframe" |
63 | 66 | ] |
64 | 67 | }, |
65 | | - { |
66 | | - "cell_type": "code", |
67 | | - "execution_count": null, |
68 | | - "id": "4b2118e8", |
69 | | - "metadata": {}, |
70 | | - "outputs": [], |
71 | | - "source": [ |
72 | | - "#REMOVE\n", |
73 | | - "import os\n", |
74 | | - "os.environ[\"AWS_DEFAULT_REGION\"] = os.environ.get(\"AWS_DEFAULT_REGION\", \"us-west-2\")\n", |
75 | | - "os.environ[\"SSL_CERT_FILE\"] = \"/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem\"" |
76 | | - ] |
77 | | - }, |
78 | 68 | { |
79 | 69 | "cell_type": "code", |
80 | 70 | "execution_count": null, |
|
135 | 125 | "\n", |
136 | 126 | "Create a Feature Group with `table_format='Iceberg'` in the `OfflineStoreConfig` and pass initial Iceberg properties via `IcebergProperties`. The properties are applied to the underlying Glue Iceberg table after the Feature Group reaches `Created` status. \n", |
137 | 127 | "\n", |
138 | | - "Objects of type `IcebergProperties` will validate on creation but not mutation so you can optionally validate your iceberg properties before passing them to the `create`, or `update` methods. If you do not validate and your keys are not properly formatted or approved, then you may recieve an error in your `create` or `update` call." |
| 128 | + "Objects of type `IcebergProperties` validate on creation but not on mutation, so you can optionally validate your Iceberg properties before passing them to the `create` or `update` methods. If you skip validation and your keys are not properly formatted or not allowed, you may receive an error from your `create` or `update` call." |
139 | 129 | ] |
140 | 130 | }, |
141 | 131 | { |
|
159 | 149 | " \"write.metadata.previous-versions-max\": \"5\",\n", |
160 | 150 | " \"history.expire.max-snapshot-age-ms\": \"86400000\",\n", |
161 | 151 | "})\n", |
162 | | - "# Add invalid property\n", |
| 152 | + "# Add non-allowed property\n", |
163 | 153 | "invalid_iceberg_props.properties.update({\"write.delete.isolation-level\": \"Snapshot\"})\n", |
164 | 154 | "try:\n", |
165 | 155 | " invalid_iceberg_props.validate_property_keys()\n", |
|
291 | 281 | }, |
292 | 282 | { |
293 | 283 | "cell_type": "markdown", |
294 | | - "id": "c5d6e7f8", |
| 284 | + "id": "de15cb88", |
295 | 285 | "metadata": {}, |
296 | 286 | "source": [ |
297 | | - "## 6. Approved Iceberg Properties Reference\n", |
298 | | - "\n", |
299 | | - "The following Iceberg properties can be configured on Feature Store offline stores:\n", |
300 | | - "\n", |
301 | | - "| Category | Property | Description |\n", |
302 | | - "|----------|----------|-------------|\n", |
303 | | - "| **Write** | `write.target-file-size-bytes` | Target size for data files |\n", |
304 | | - "| **Write** | `write.delete.target-file-size-bytes` | Target size for delete files |\n", |
305 | | - "| **Write** | `write.parquet.row-group-size-bytes` | Parquet row group size |\n", |
306 | | - "| **Write** | `write.delete.mode` | Delete operation mode |\n", |
307 | | - "| **Write** | `write.update.mode` | Update operation mode |\n", |
308 | | - "| **Write** | `write.delete.granularity` | Delete granularity |\n", |
309 | | - "| **Metadata** | `write.metadata.delete-after-commit.enabled` | Auto-delete old metadata files |\n", |
310 | | - "| **Metadata** | `write.metadata.previous-versions-max` | Max previous metadata versions to keep |\n", |
311 | | - "| **History** | `history.expire.max-snapshot-age-ms` | Max snapshot age before expiration |\n", |
312 | | - "| **History** | `history.expire.min-snapshots-to-keep` | Min snapshots to retain |\n", |
313 | | - "| **History** | `history.expire.max-ref-age-ms` | Max reference age |\n", |
314 | | - "| **Read** | `read.split.target-size` | Target split size for reads |\n", |
315 | | - "| **Read** | `read.split.metadata-target-size` | Metadata target split size |\n", |
316 | | - "| **Read** | `read.split.open-file-cost` | Cost of opening a file for split planning |" |
317 | | - ] |
318 | | - }, |
319 | | - { |
320 | | - "cell_type": "markdown", |
321 | | - "id": "d6e7f8a9", |
322 | | - "metadata": {}, |
323 | | - "source": [ |
324 | | - "## 7. Cleanup" |
| 287 | + "## 6. Cleanup" |
325 | 288 | ] |
326 | 289 | }, |
327 | 290 | { |
328 | 291 | "cell_type": "code", |
329 | 292 | "execution_count": null, |
330 | | - "id": "e7f8a9b0", |
| 293 | + "id": "992e73e6", |
331 | 294 | "metadata": {}, |
332 | 295 | "outputs": [], |
333 | 296 | "source": [ |
|
337 | 300 | "except Exception as e:\n", |
338 | 301 | " print(f\"Cleanup error: {e}\")" |
339 | 302 | ] |
| 303 | + }, |
| 304 | + { |
| 305 | + "cell_type": "markdown", |
| 306 | + "id": "c5d6e7f8", |
| 307 | + "metadata": {}, |
| 308 | + "source": [ |
| 309 | + "## 7. Allowed Iceberg Properties Reference\n", |
| 310 | + "\n", |
| 311 | + "The following Iceberg properties are allowed and can be configured on Feature Store offline stores:\n", |
| 312 | + "\n", |
| 313 | + "| Property | Default Value | Documentation |\n", |
| 314 | + "|----------|---------------|---------------|\n", |
| 315 | + "| write.metadata.delete-after-commit.enabled | FALSE | Controls whether to delete the oldest *tracked* version metadata files after each table commit. |\n", |
| 316 | + "| write.metadata.previous-versions-max | 100 | The max number of previous version metadata files to track |\n", |
| 317 | + "| history.expire.max-snapshot-age-ms | 432000000 (5 days) | Default max age of snapshots to keep on the table and all of its branches while expiring snapshots |\n", |
| 318 | + "| history.expire.min-snapshots-to-keep | 1 | Default min number of snapshots to keep on the table and all of its branches while expiring snapshots |\n", |
| 319 | + "| write.parquet.row-group-size-bytes | 134217728 (128 MB) | Parquet row group size |\n", |
| 320 | + "| read.split.target-size | 134217728 (128 MB) | Target size when combining data input splits |\n", |
| 321 | + "| read.split.metadata-target-size | 33554432 (32 MB) | Target size when combining metadata input splits |\n", |
| 322 | + "| write.delete.target-file-size-bytes | 67108864 (64 MB) | Controls the size of delete files generated to target about this many bytes |\n", |
| 323 | + "| write.delete.mode | copy-on-write | Mode used for delete commands: copy-on-write or merge-on-read (v2 and above) |\n", |
| 324 | + "| write.update.mode | copy-on-write | Mode used for update commands: copy-on-write or merge-on-read (v2 and above) |\n", |
| 325 | + "| write.delete.granularity | partition | Controls the granularity of generated delete files: partition or file |\n", |
| 326 | + "| history.expire.max-ref-age-ms | Long.MAX_VALUE (forever) | For snapshot references except the main branch, default max age of snapshot references to keep while expiring snapshots. The main branch never expires. |\n", |
| 327 | + "| read.split.open-file-cost | 4194304 (4 MB) | The estimated cost to open a file, used as a minimum weight when combining splits. |\n", |
| 328 | + "| write.target-file-size-bytes | 536870912 (512 MB) | Controls the size of files generated to target about this many bytes |\n", |
| 329 | + "\n", |
| 330 | + "Documentation from https://iceberg.apache.org/docs/latest/configuration/#write-properties" |
| 331 | + ] |
340 | 332 | } |
341 | 333 | ], |
342 | 334 | "metadata": { |
|