|
13 | 13 | } |
14 | 14 | }, |
15 | 15 | "source": [ |
16 | | - "#Part 1 of the Zingg notebook\n", |
| 16 | + "# Part 1 of the Zingg notebook\n", |
17 | 17 | "## It is responsible for initializing the Zingg environment, which includes the following steps:\n", |
18 | 18 | "- **Environment Setup:** Loads all necessary libraries and dependencies required for Zingg to run.\n", |
19 | 19 | "- **Path Setup:** Defines and sets up all relevant file paths, such as the model directory, input data locations, and output directories.\n", |
|
33 | 33 | } |
34 | 34 | }, |
35 | 35 | "source": [ |
36 | | - "## Example Notebook For Training and Running Zingg Enterprise Entity Resolution Workflow on Databricks\n", |
| 36 | + "## Example Notebook For Training and Running Zingg Entity Resolution Workflow on Databricks\n", |
37 | 37 | "This notebook runs the Zingg Febrl Example on Databricks. For details, please refer to:\n", |
38 | 38 | "\n", |
39 | | - "- Enterprise Zingg Python API\n", |
| 39 | + "- Zingg Python API\n", |
40 | 40 | "- Zingg Official Documentation\n", |
41 | 41 | "\n", |
42 | | - "_This notebook has been tested on 16.4 LTS DBR version (Spark 3.5.2, scala 2.12)_\n", |
| 42 | + "_This notebook has been tested on DBR 16.4 LTS (Spark 3.5.5, Scala 2.12)_\n", |
43 | 43 | "\n", |
44 | 44 | "## Create a Spark Cluster and Install Zingg\n", |
45 | 45 | "# \n", |
46 | | - "- Go to the Clusters tab, hit Create Cluster, and give it a name like “Zingg-Enterprise.”\n", |
| 46 | + "- Go to the Clusters tab, hit Create Cluster, and give it a name like “Zingg.”\n", |
47 | 47 | "- Set the runtime version to a current LTS (Long-Term Support) version for compatibility.\n", |
48 | | - "- Next, you’ll need to install Zingg. For this, we will be need the latest Zingg JAR file and the license.\n", |
49 | | - "- Create a Volume (managed volume) inside the schema and add the zingg-opensource-spark-0.8.0.jar and the zingg_license.jar to it.\n", |
| 48 | + "- Next, you’ll need to install Zingg. For this, we will need the latest Zingg JAR file.\n", |
| 49 | + "- Create a Volume (managed volume) inside the schema and add the zingg-0.6.0-spark-3.5.5.jar to it.\n", |
50 | 50 | "- Upload the file: Open the cluster details, navigate to the Libraries section, and click Install New.\n", |
51 | 51 | "- Select the Volumes option and upload the JAR from the path -> /Volumes/catalog_name/schema_name/volume_name/path_to_file (Zingg JAR)\n", |
52 | 52 | "\n", |
|
445 | 445 | "args.setOutput(outputPipe)" |
446 | 446 | ] |
447 | 447 | }, |
448 | | - { |
449 | | - "cell_type": "markdown", |
450 | | - "metadata": { |
451 | | - "application/vnd.databricks.v1+cell": { |
452 | | - "cellMetadata": {}, |
453 | | - "inputWidgets": {}, |
454 | | - "nuid": "b96dc60f-089d-4c83-9961-568c2100fbd6", |
455 | | - "showTitle": false, |
456 | | - "tableResultSettingsMap": {}, |
457 | | - "title": "" |
458 | | - } |
459 | | - }, |
460 | | - "source": [ |
461 | | - "## Configure the statistics output path\n", |
462 | | - "Here we configure the stats path\n", |
463 | | - "\n", |
464 | | - "Please make sure the path/name contains the placeholder \"**_$ZINGG_DYNAMIC_STAT_NAME_**\"" |
465 | | - ] |
466 | | - }, |
467 | 448 | { |
468 | 449 | "cell_type": "markdown", |
469 | 450 | "metadata": { |
|