diff --git a/examples/databricks/01-SettingUp-Zingg.ipynb b/examples/databricks/01-SettingUp-Zingg.ipynb index aa7101f89..1f27e964f 100644 --- a/examples/databricks/01-SettingUp-Zingg.ipynb +++ b/examples/databricks/01-SettingUp-Zingg.ipynb @@ -13,7 +13,7 @@ } }, "source": [ - "#Part 1 of the Zingg notebook\n", + "# Part 1 of the Zingg notebook\n", "## It is responsible for initializing the Zingg environment, which includes the following steps:\n", "- **Environment Setup:** Loads all necessary libraries and dependencies required for Zingg to run.\n", "- **Path Setup:** Defines and sets up all relevant file paths, such as model directory, input data locations, output directories.\n", @@ -33,20 +33,20 @@ } }, "source": [ - "## Example Notebook For Training and Running Zingg Enterprise Entity Resolution Workflow on Databricks\n", + "## Example Notebook For Training and Running Zingg Entity Resolution Workflow on Databricks\n", "This notebook runs the Zingg Febrl Example on Databricks. Please refer to the\n", "\n", - "- Enterprise Zingg Python API\n", + "- Zingg Python API\n", "- Zingg Official Documentation for details.\n", "\n", - "_This notebook has been tested on 16.4 LTS DBR version (Spark 3.5.2, scala 2.12)_\n", + "_This notebook has been tested on 16.4 LTS DBR version (Spark 3.5.5, scala 2.12)_\n", "\n", "## Create a Spark Cluster and Install Zingg\n", "# \n", - "- Go to the Clusters tab, hit Create Cluster, and give it a name like “Zingg-Enterprise.”\n", + "- Go to the Clusters tab, hit Create Cluster, and give it a name like “Zingg.”\n", "- Set the runtime version to a current LTS (Long-Term Support) version for compatibility.\n", - "- Next, you’ll need to install Zingg. For this, we will be need the latest Zingg JAR file and the license.\n", - "- Create a Volume (managed volume) inside the schema and add the zingg-opensource-spark-0.8.0.jar and the zingg_license.jar to it.\n", + "- Next, you’ll need to install Zingg. For this, we will need the latest Zingg JAR file.\n", + "- Create a Volume (managed volume) inside the schema and add the zingg-0.6.0-spark-3.5.5.jar to it.\n", "- Upload the file: Open the cluster details, navigate to the Libraries section, and click Install New.\n", "- Select the Volumes option and upload JAR from the specific path -> /Volumes/catalog_name/schema_name/volume_name/path_to_file (Zingg JAR)\n", "\n", @@ -445,25 +445,6 @@ "args.setOutput(outputPipe)" ] }, - { - "cell_type": "markdown", - "metadata": { - "application/vnd.databricks.v1+cell": { - "cellMetadata": {}, - "inputWidgets": {}, - "nuid": "b96dc60f-089d-4c83-9961-568c2100fbd6", - "showTitle": false, - "tableResultSettingsMap": {}, - "title": "" - } - }, - "source": [ - "## Configure the statistics output path\n", - "Here we configure the stats path\n", - "\n", - "Please make sure the path/name contains the placeholder \"**_$ZINGG_DYNAMIC_STAT_NAME_**\"" - ] - }, { "cell_type": "markdown", "metadata": { diff --git a/examples/databricks/02-LabelTrainingData.ipynb b/examples/databricks/02-LabelTrainingData.ipynb index 36bc3eba6..02c9f17f0 100644 --- a/examples/databricks/02-LabelTrainingData.ipynb +++ b/examples/databricks/02-LabelTrainingData.ipynb @@ -32,7 +32,7 @@ } }, "source": [ - "#Part 2: FindTrainingData and Label Phase\n", + "# Part 2: FindTrainingData and Label Phase\n", "## We have completed setting up Zingg in the previous step. In this part, we will run the **_FindTrainingData_** and **_Label_** phases. \n", "This involves generating candidate record pairs for training, presenting them for manual labeling, and saving the labeled data for use in model training. This step is essential for building a high-quality training dataset for entity resolution.\n", "\n", diff --git a/examples/databricks/04-GenerateDocument.ipynb b/examples/databricks/03-GenerateDocument.ipynb similarity index 98% rename from examples/databricks/04-GenerateDocument.ipynb rename to examples/databricks/03-GenerateDocument.ipynb index 323767817..b9f059aa8 100644 --- a/examples/databricks/04-GenerateDocument.ipynb +++ b/examples/databricks/03-GenerateDocument.ipynb @@ -13,7 +13,7 @@ } }, "source": [ - "# Part 4: Documenting the model\n", + "# Part 3: Documenting the model\n", "## We have completed setting up Zingg and labeled the training data in the previous steps. In this part, we will run the **_generateDocs_** phase. \n", "#### This phase processes the labeled data to create the readable documentation about the training data, including those marked as matches, as well as non-matches. \n", "\n", diff --git a/examples/databricks/03-TrainAndMatch.ipynb b/examples/databricks/04-TrainAndMatch.ipynb similarity index 99% rename from examples/databricks/03-TrainAndMatch.ipynb rename to examples/databricks/04-TrainAndMatch.ipynb index 55f3b50c5..0e6b2682b 100644 --- a/examples/databricks/03-TrainAndMatch.ipynb +++ b/examples/databricks/04-TrainAndMatch.ipynb @@ -13,7 +13,7 @@ } }, "source": [ - "# Part 3: Train and Match Phase\n", + "# Part 4: Train and Match Phase\n", "## We have completed setting up Zingg, labeled the training data, and generated the model documents in the previous steps. In this part, we will run the **_Train_** and **_Match_** phases. \n", "#### This involves training the entity resolution model using the labeled data and then applying the trained model to match records in your dataset. This step is crucial for identifying and matching similar entities across your data sources." ]
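The renumbered notebooks walk through four Zingg phases in sequence: findTrainingData, label, generateDocs, and train/match. The following is a minimal sketch of how those phases are typically driven through the Zingg Python API mentioned in the setup notebook; it is an illustration, not the notebooks' exact code. The `args` parameter stands in for the `Arguments` object configured in 01-SettingUp-Zingg.ipynb, and the class names follow the upstream `zingg.client` examples:

```python
# Sketch of driving the notebook phases via the Zingg Python API.
# Assumes the Zingg JAR is installed on the cluster as described above
# and that `args` is the Arguments object built in 01-SettingUp-Zingg.ipynb.

# The phases the four notebooks run, in order.
PHASES = ["findTrainingData", "label", "generateDocs", "train", "match"]

def run_phase(args, phase):
    """Run one Zingg phase against the configured Arguments object."""
    # Imported lazily so the sketch can be read without Zingg installed.
    from zingg.client import ClientOptions, Zingg
    options = ClientOptions([ClientOptions.PHASE, phase])
    Zingg(args, options).initAndExecute()
```

In the notebooks each phase runs in its own cell; `label` is the interactive step where candidate pairs are marked as match or non-match, so findTrainingData and label are usually iterated a few times from 02-LabelTrainingData.ipynb before moving on to training.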