
Commit 3daeca4: review and changed the databricks notebooks (#1324)

1 parent 25a338d
4 files changed: 10 additions & 29 deletions


examples/databricks/01-SettingUp-Zingg.ipynb

Lines changed: 7 additions & 26 deletions
@@ -13,7 +13,7 @@
 }
 },
 "source": [
-"#Part 1 of the Zingg notebook\n",
+"# Part 1 of the Zingg notebook\n",
 "## It is responsible for initializing the Zingg environment, which includes the following steps:\n",
 "- **Environment Setup:** Loads all necessary libraries and dependencies required for Zingg to run.\n",
 "- **Path Setup:** Defines and sets up all relevant file paths, such as model directory, input data locations, output directories.\n",
@@ -33,20 +33,20 @@
 }
 },
 "source": [
-"## Example Notebook For Training and Running Zingg Enterprise Entity Resolution Workflow on Databricks\n",
+"## Example Notebook For Training and Running Zingg Entity Resolution Workflow on Databricks\n",
 "This notebook runs the Zingg Febrl Example on Databricks. Please refer to the\n",
 "\n",
-"- Enterprise Zingg Python API\n",
+"- Zingg Python API\n",
 "- Zingg Official Documentation for details.\n",
 "\n",
-"_This notebook has been tested on 16.4 LTS DBR version (Spark 3.5.2, scala 2.12)_\n",
+"_This notebook has been tested on 16.4 LTS DBR version (Spark 3.5.5, scala 2.12)_\n",
 "\n",
 "## Create a Spark Cluster and Install Zingg\n",
 "# \n",
-"- Go to the Clusters tab, hit Create Cluster, and give it a name like “Zingg-Enterprise.”\n",
+"- Go to the Clusters tab, hit Create Cluster, and give it a name like “Zingg.”\n",
 "- Set the runtime version to a current LTS (Long-Term Support) version for compatibility.\n",
-"- Next, you’ll need to install Zingg. For this, we will be need the latest Zingg JAR file and the license.\n",
-"- Create a Volume (managed volume) inside the schema and add the zingg-opensource-spark-0.8.0.jar and the zingg_license.jar to it.\n",
+"- Next, you’ll need to install Zingg. For this, we will need the latest Zingg JAR file.\n",
+"- Create a Volume (managed volume) inside the schema and add the zingg-0.6.0-spark-3.5.5.jar to it.\n",
 "- Upload the file: Open the cluster details, navigate to the Libraries section, and click Install New.\n",
 "- Select the Volumes option and upload JAR from the specific path -> /Volumes/catalog_name/schema_name/volume_name/path_to_file (Zingg JAR)\n",
 "\n",
@@ -445,25 +445,6 @@
 "args.setOutput(outputPipe)"
 ]
 },
-{
-"cell_type": "markdown",
-"metadata": {
-"application/vnd.databricks.v1+cell": {
-"cellMetadata": {},
-"inputWidgets": {},
-"nuid": "b96dc60f-089d-4c83-9961-568c2100fbd6",
-"showTitle": false,
-"tableResultSettingsMap": {},
-"title": ""
-}
-},
-"source": [
-"## Configure the statistics output path\n",
-"Here we configure the stats path\n",
-"\n",
-"Please make sure the path/name contains the placeholder \"**_$ZINGG_DYNAMIC_STAT_NAME_**\""
-]
-},
 {
 "cell_type": "markdown",
 "metadata": {
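
The install steps above reference Unity Catalog Volume paths of the form `/Volumes/catalog_name/schema_name/volume_name/path_to_file`. A minimal sketch of composing such paths; the catalog, schema, volume, and model-id values below are illustrative placeholders, not values from this commit:

```python
def volume_path(catalog: str, schema: str, volume: str, *parts: str) -> str:
    """Compose a Databricks Volume path: /Volumes/<catalog>/<schema>/<volume>/..."""
    return "/".join(["/Volumes", catalog, schema, volume, *parts])

# Hypothetical names for illustration only.
jar_path = volume_path("main", "zingg", "artifacts", "zingg-0.6.0-spark-3.5.5.jar")
model_dir = volume_path("main", "zingg", "artifacts", "models", "100")

print(jar_path)   # /Volumes/main/zingg/artifacts/zingg-0.6.0-spark-3.5.5.jar
print(model_dir)  # /Volumes/main/zingg/artifacts/models/100
```

The `jar_path` is what you would point the cluster's Libraries > Install New > Volumes dialog at, per the notebook's instructions.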

examples/databricks/02-LabelTrainingData.ipynb

Lines changed: 1 addition & 1 deletion
@@ -32,7 +32,7 @@
 }
 },
 "source": [
-"#Part 2: FindTrainingData and Label Phase\n",
+"# Part 2: FindTrainingData and Label Phase\n",
 "## We have completed setting up Zingg in the previous step. In this part, we will run the **_FindTrainingData_** and **_Label_** phases. \n",
 "This involves generating candidate record pairs for training, presenting them for manual labeling, and saving the labeled data for use in model training. This step is essential for building a high-quality training dataset for entity resolution.\n",
 "\n",
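
As the cell describes, the two phases alternate: findTrainingData proposes candidate pairs and label collects human judgments, typically over several rounds until enough pairs are marked. A stdlib-only stand-in sketch of that loop; `run_phase` is a placeholder for the real Zingg client call, and the pair counts and the threshold of 40 are invented for illustration:

```python
def run_phase(phase: str, state: dict) -> None:
    # Placeholder for the actual Zingg invocation; numbers are made up.
    if phase == "findTrainingData":
        state["candidates"] = 20                     # pretend 20 pairs proposed
    elif phase == "label":
        state["labeled"] += state.pop("candidates", 0)  # pretend all got labeled

state = {"labeled": 0}
while state["labeled"] < 40:     # repeat until enough labeled training pairs
    run_phase("findTrainingData", state)
    run_phase("label", state)

print(state["labeled"])  # 40
```

In practice you stop iterating once Zingg has enough labeled matches and non-matches to train a reliable model.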

examples/databricks/04-GenerateDocument.ipynb renamed to examples/databricks/03-GenerateDocument.ipynb

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@
 }
 },
 "source": [
-"# Part 4: Documenting the model\n",
+"# Part 3: Documenting the model\n",
 "## We have completed setting up Zingg and labeled the training data in the previous steps. In this part, we will run the **_generateDocs_** phase. \n",
 "#### This phase processes the labeled data to create the readable documentation about the training data, including those marked as matches, as well as non-matches. \n",
 "\n",

examples/databricks/03-TrainAndMatch.ipynb renamed to examples/databricks/04-TrainAndMatch.ipynb

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@
 }
 },
 "source": [
-"# Part 3: Train and Match Phase\n",
+"# Part 4: Train and Match Phase\n",
 "## We have completed setting up Zingg, labeled the training data, and generated the model documents in the previous steps. In this part, we will run the **_Train_** and **_Match_** phases. \n",
 "#### This involves training the entity resolution model using the labeled data and then applying the trained model to match records in your dataset. This step is crucial for identifying and matching similar entities across your data sources."
 ]
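
After the renames in this commit, the notebooks run the workflow in the order findTrainingData/label, generateDocs, then train and match. A hedged sketch assembling the corresponding CLI invocations; the `zingg.sh --phase ... --conf ...` shape follows Zingg's usual command line, but the script path and `config.json` name are assumptions, not taken from this commit:

```python
# Assumed CLI shape: zingg.sh --phase <phase> --conf <config>.
# Verify against the Zingg documentation for your version.
PHASES = ["findTrainingData", "label", "generateDocs", "train", "match"]

def zingg_cmd(phase: str, conf: str = "config.json") -> str:
    return f"zingg.sh --phase {phase} --conf {conf}"

commands = [zingg_cmd(p) for p in PHASES]
print(commands[-1])  # zingg.sh --phase match --conf config.json
```

On Databricks, each notebook part corresponds to one or two of these phases rather than a shell invocation, but the phase ordering is the same.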
