intel
diff --git a/README.md b/README.md
Lines changed: 8 additions & 7 deletions
diff --git a/demo/builtin/resnet/img/resnet_perf.png b/demo/builtin/resnet/img/resnet_perf.png
10.9 KB
diff --git a/demo/builtin/rnnt/RNNT_DEMO.ipynb b/demo/builtin/rnnt/RNNT_DEMO.ipynb
Lines changed: 28 additions & 20 deletions
diff --git a/demo/builtin/rnnt/img/rnnt_perf.png b/demo/builtin/rnnt/img/rnnt_perf.png
-20.7 KB
diff --git a/demo/builtin/wnd/WND_DEMO.ipynb b/demo/builtin/wnd/WND_DEMO.ipynb
Lines changed: 13 additions & 6 deletions
diff --git a/demo/builtin/wnd/img/wnd_perf.png b/demo/builtin/wnd/img/wnd_perf.png
-349 Bytes
diff --git a/demo/denas/asr/DENAS_ASR_DEMO.ipynb b/demo/denas/asr/DENAS_ASR_DEMO.ipynb
Lines changed: 12 additions & 5 deletions
diff --git a/demo/denas/asr/img/asr_search_space.png b/demo/denas/asr/img/asr_search_space.png
-80.3 KB
@@ -20,13 +20,6 @@ Making AI more accessible:  Through built-in optimized, parameterized models gen
 
 This solution is intended for citizen data scientists, enterprise users, independent software vendors, and some cloud service providers.
 
-## Papers and Blogs
-
-* [ICYMI – SigOpt Summit Recap Democratizing End-to-End Recommendation Systems](https://sigopt.com/blog/icymi-sigopt-summit-recap-democratizing-end-to-end-recommendation-systems-with-jian-zhang/)
-* [The SigOpt Intelligent Experimentation Platform](https://www.intel.com/content/www/us/en/developer/articles/technical/sigopt-intelligent-experimentation-platform.html#gs.gz2ls6)
-* [SDC2022 - Data Platform for End-to-end AI Democratization](https://storagedeveloper.org/events/sdc-2022/agenda/session/326)
-* [SIHG4SR: Side Information Heterogeneous Graph for Session Recommender](https://dl.acm.org/doi/abs/10.1145/3556702.3556852)
-
 # ARCHITECTURE
 
 ## Intel® End-to-End AI Optimization Kit
@@ -100,6 +93,14 @@ python scripts/start_e2eaiok_docker.py --backend [tensorflow, pytorch, pytorch11
 * [SDA Model Performance](docs/source/sda_model_performance.md) - ResNet, BERT, RNN-T, MiniGo
 * [DE-NAS Performance](docs/source/denas_performance.md) - CNN, ViT, BERT, ASR
 
+## Papers and Blogs
+
+* [ICYMI – SigOpt Summit Recap Democratizing End-to-End Recommendation Systems](https://sigopt.com/blog/icymi-sigopt-summit-recap-democratizing-end-to-end-recommendation-systems-with-jian-zhang/)
+* [The SigOpt Intelligent Experimentation Platform](https://www.intel.com/content/www/us/en/developer/articles/technical/sigopt-intelligent-experimentation-platform.html#gs.gz2ls6)
+* [SDC2022 - Data Platform for End-to-end AI Democratization](https://storagedeveloper.org/events/sdc-2022/agenda/session/326)
+* [SIHG4SR: Side Information Heterogeneous Graph for Session Recommender](https://dl.acm.org/doi/abs/10.1145/3556702.3556852)
+* [The Parallel Universe Magazine](https://www.intel.com/content/www/us/en/developer/community/parallel-universe-magazine/overview.html#gs.nznx3b)
+* [Accelerating Artificial Intelligence with Intel® End-to-End AI Optimization Kit](https://www.intel.com/content/www/us/en/developer/articles/technical/accelerate-ai-with-intel-e2e-ai-optimization-kit.html#gs.ox779w)
 
 ## Getting Support
 
 
@@ -38,22 +38,24 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
     "# Overview\n",
-    "<img src=\"./img/asr.png\" width=\"800\"/>\n",
+    "<img src=\"./img/asr.png\" width=\"600\"/>\n",
     "\n",
     "* The traditional ASR system (top picture) contains acoustic, phonetic and language components that work together as in a pipeline system\n",
     "* The end-to-end ASR system is a single neural network that receives raw audio signal as input and provides a sequence of words at output"
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
     "## Model Architecture\n",
-    "<img src=\"./img/rnnt_structure.png\"/>\n",
+    "<img src=\"./img/rnnt_structure.png\" width=\"250\"/>\n",
     "\n",
     "RNN-T is an end-to-end ASR model that directly converts audio into text representation.\n",
     "\n",
@@ -70,6 +72,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -78,7 +81,7 @@
     "For RNN-T model democratization, we enabled distributed training with PyTorch DDP to scale out model training across multiple nodes, added a time stack layer and increased the time stack factor to reduce the input sequence length, added layer and batch normalization to speed up training convergence, and decreased the layer size to get a lighter model.\n",
     "\n",
     "<img src=\"./img/model_base.png\" width=\"600\"/><figure>base model</figure>\n",
-    "<img src=\"./img/model_opt.png\" width=\"600\"/><figure>democratized model</figure>\n"
+    "<img src=\"./img/model_opt.png\" width=\"800\"/><figure>democratized model</figure>\n"
    ]
   },
   {
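The time stack layer mentioned in the hunk above can be illustrated with a minimal, framework-free sketch. It assumes that "time stacking" means concatenating every `factor` consecutive frames into one wider frame, and that a short tail is zero-padded; the function name `time_stack` and the padding choice are illustrative assumptions, not the notebook's actual implementation.

```python
def time_stack(frames, factor):
    """Merge every `factor` consecutive frames into one wider frame.

    frames: list of equal-length feature vectors, one per time step.
    Returns a list with ceil(len(frames) / factor) entries.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    if not frames:
        return []
    feat_dim = len(frames[0])
    # Assumption: zero-pad the tail so the length is a multiple of `factor`.
    # Dropping the remainder would be another reasonable choice.
    pad = (-len(frames)) % factor
    padded = list(frames) + [[0.0] * feat_dim for _ in range(pad)]
    stacked = []
    for start in range(0, len(padded), factor):
        merged = []
        for frame in padded[start:start + factor]:
            merged.extend(frame)
        stacked.append(merged)
    return stacked


# 16 time steps with 2 features each (toy data).
frames = [[float(t), float(t) + 0.5] for t in range(16)]
by_2 = time_stack(frames, 2)
by_8 = time_stack(frames, 8)
print(len(by_2), len(by_2[0]))  # sequence length and feature width at factor 2
print(len(by_8), len(by_8[0]))  # sequence length and feature width at factor 8
```

Raising the factor from 2 to 8 makes the sequence four times shorter while each frame becomes four times wider, so the recurrent layers run over fewer time steps.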
@@ -95,6 +98,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -134,16 +138,17 @@
     "\n",
     "About 4x speedup after increasing the time stack factor from 2 to 8.\n",
     "\n",
-    "<img src=\"./img/time_stack_2.PNG\" width=\"600\"/><figure>time_stack = 2</figure>\n",
-    "<img src=\"./img/time_stack_8.PNG\" width=\"600\"/><figure>time_stack = 8</figure>\n",
+    "<img src=\"./img/time_stack_2.PNG\" width=\"800\"/><figure>time_stack = 2</figure>\n",
+    "<img src=\"./img/time_stack_8.PNG\" width=\"800\"/><figure>time_stack = 8</figure>\n",
     "\n",
     "Profiling data shows that less time is spent on forward/backward passes because the time stack layer shortens the input sequence.\n",
     "\n",
-    "<img src=\"./img/stack_profile_base.png\" width=\"600\"/><figure>base model profiling</figure>\n",
-    "<img src=\"./img/stack_profile_democratize.png\" width=\"600\"/><figure>democratized model profiling</figure>\n"
+    "<img src=\"./img/stack_profile_base.png\" width=\"800\"/><figure>base model profiling</figure>\n",
+    "<img src=\"./img/stack_profile_democratize.png\" width=\"800\"/><figure>democratized model profiling</figure>\n"
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -159,8 +164,8 @@
     "self.layer_norm = torch.nn.LayerNorm(hidden_size)\n",
     "```\n",
     "\n",
-    "<img src=\"./img/no_norm.PNG\" width=\"600\"/><figure>without normalization</figure>\n",
-    "<img src=\"./img/norm.PNG\" width=\"600\"/><figure>with normalization</figure>\n"
+    "<img src=\"./img/no_norm.PNG\" width=\"800\"/><figure>without normalization</figure>\n",
+    "<img src=\"./img/norm.PNG\" width=\"800\"/><figure>with normalization</figure>\n"
    ]
   },
   {
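For reference on the `torch.nn.LayerNorm(hidden_size)` line in the hunk above, the following is a plain-Python sketch of the arithmetic layer normalization performs on a single hidden vector. It assumes the default settings (biased variance, `eps=1e-5`) and omits the learnable scale and shift, which start at 1 and 0; the helper name `layer_norm` is hypothetical.

```python
import math


def layer_norm(values, eps=1e-5):
    """Normalize one vector to zero mean and (approximately) unit variance."""
    mean = sum(values) / len(values)
    # Biased variance (divide by N), matching the usual layer-norm definition.
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


hidden = [2.0, 4.0, 6.0, 8.0]
normed = layer_norm(hidden)
print(normed)
```

Keeping activations on a consistent scale in this way is what the notebook credits for faster training convergence.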
@@ -192,6 +197,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -204,17 +210,18 @@
     "${CONDA_PREFIX}/bin/python -m intel_extension_for_pytorch.cpu.launch --distributed --nproc_per_node=2 --nnodes=4 --hostfile hosts train.py ${ARGS}\n",
     "```\n",
     "\n",
-    "<img src=\"./img/no_numa_binding.png\" width=\"600\"/><figure>without numa binding</figure>\n",
-    "<img src=\"./img/numa_binding.png\" width=\"600\"/><figure>enable numa binding</figure>\n"
+    "<img src=\"./img/no_numa_binding.png\" width=\"500\"/><figure>without numa binding</figure>\n",
+    "<img src=\"./img/numa_binding.png\" width=\"500\"/><figure>enable numa binding</figure>\n"
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
     "## Performance\n",
     "\n",
-    "<img src=\"./img/rnnt_perf.png\" width=\"900\"/>\n",
+    "<img src=\"./img/rnnt_perf.png\" width=\"500\"/>\n",
     "\n",
     "* Distributed training with HW scaling delivered 3.83x speedup from 1 node to 4 nodes\n",
     "* HPO delivered 1.35x speedup, and 5.16x speedup over baseline\n",
@@ -238,12 +245,13 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
     "## 1. Environment Setup\n",
     "\n",
-    "### Option1 Setup Environment with Pip\n",
+    "### (Option1) Use Pip Install\n",
     "pre-work: move e2eAIOK source code to /home/vmagent/app/e2eaiok"
    ]
   },
@@ -269,10 +277,11 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### Option2 Setup Environment with Docker\n",
+    "### (Option2) Use Docker\n",
     "``` bash\n",
     "# Setup ENV\n",
     "git clone https://github.com/intel/e2eAIOK.git\n",
@@ -312,6 +321,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
@@ -323,20 +333,18 @@
     "\n",
     "cd /home/vmagent/app/e2eaiok/modelzoo/rnnt/pytorch\n",
     "bash scripts/preprocess_librispeech.sh\n",
-    "```\n",
-    "\n",
-    "Notes: RNN-T training is based on LibriSpeech train-clean-100 and evaluated on dev-clean, we evaluated WER with stock model (based on MLPerf submission) at train-clean-100 dataset, and final WER is 0.25, all the following optimization guarantee 0.25 WER. MLPerf submission took 38.7min with 8x A100 on LibriSpeech train-960h dataset.\n",
-    "\n",
-    "public reference on train-clean-100: https://arxiv.org/pdf/1807.10893.pdf, https://arxiv.org/pdf/1811.00787.pdf"
+    "```"
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "metadata": {},
    "source": [
     "## 4. Train\n",
     "\n",
-    "Edit config file to control SDA process"
+    "Edit config file to control SDA process\n",
+    "> Note: The training script below is for demonstration only; it uses a small sampled dataset and runs only a few iterations. For actual performance results, refer to [performance](#performance)"
    ]
   },
   {
 
@@ -84,6 +84,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "ef18b1d8",
    "metadata": {},
@@ -92,12 +93,12 @@
     "\n",
     "Each training step has a long idle time for Horovod communication: Horovod parameter sync consumes much of the time during distributed training, causing poor scaling performance. The overhead is mainly caused by the large embedding table.\n",
     "\n",
-    "<img src=\"./img/wnd_profile.png\" width=\"600\"/><figure>Distributed training profiling</figure>\n",
+    "<img src=\"./img/wnd_profile.png\" width=\"800\"/><figure>Distributed training profiling</figure>\n",
     "\n",
     "Replacing the custom layer (which contains the embedding layer) with a TensorFlow dense layer reduces the embedding parameter size, and thus the number of parameters Horovod needs to sync, which fixes the poor scaling. Per-step training time dropped from 5.16s to 2.71s, about a 1.9x speedup.\n",
     "\n",
-    "<img src=\"./img/wnd_traintime_custom_emd.png\" width=\"600\"/><figure>custom layer</figure>\n",
-    "<img src=\"./img/wnd_traintime_tf_emd.png\" width=\"600\"/><figure>TensorFlow built-in layer</figure>"
+    "<img src=\"./img/wnd_traintime_custom_emd.png\" width=\"800\"/><figure>custom layer</figure>\n",
+    "<img src=\"./img/wnd_traintime_tf_emd.png\" width=\"800\"/><figure>TensorFlow built-in layer</figure>"
    ]
   },
   {
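The speedup quoted in the hunk above can be checked with a few lines of arithmetic. This sketch assumes the speedup is the ratio of per-step times before and after the change; the helper name `speedup` is hypothetical, and the two timings are the ones stated in the notebook text.

```python
def speedup(before_s, after_s):
    """Return how many times faster `after_s` is compared to `before_s`."""
    if before_s <= 0 or after_s <= 0:
        raise ValueError("timings must be positive")
    return before_s / after_s


step_speedup = speedup(5.16, 2.71)      # per-step times from the notebook
time_saved = 1 - 2.71 / 5.16            # fraction of each step eliminated
print(f"{step_speedup:.2f}x")           # prints "1.90x"
print(f"{time_saved:.1%} less time per step")
```

The ratio rounds to the 1.9x stated in the text.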
@@ -134,6 +135,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "90924745",
    "metadata": {},
@@ -199,6 +201,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "b50f0c0c",
    "metadata": {},
@@ -229,13 +232,14 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "ca039366",
    "metadata": {},
    "source": [
     "## 1. Environment Setup\n",
     "\n",
-    "### Option 1 Setup Environment with Pip\n",
+    "### (Option 1) Use Pip Install\n",
     "pre-work: move e2eAIOK source code to /home/vmagent/app/e2eaiok. Install spark and start spark services for data preprocess"
    ]
   },
@@ -258,11 +262,12 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "f46ff1b7",
    "metadata": {},
    "source": [
-    "### Option 2 Setup Environment with Docker\n",
+    "### (Option 2) Use Docker\n",
     "``` bash\n",
     "# Setup ENV\n",
     "git clone https://github.com/intel/e2eAIOK.git\n",
@@ -364,13 +369,15 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "509d6539",
    "metadata": {},
    "source": [
     "## 4. Train\n",
     "\n",
-    "Edit config file to control SDA process"
+    "Edit config file to control SDA process\n",
+    "> Note: The training script below is for demonstration only; it uses a small sampled dataset and runs only a few iterations. For actual performance results, refer to [performance](#performance)"
    ]
   },
   {
 
@@ -46,6 +46,7 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "8c1685c8",
    "metadata": {},
@@ -57,18 +58,21 @@
     "The Transformer-based search space consists of attention, layer normalization, and feed-forward layers; it can be controlled by setting the network depth, number of attention heads, MLP layer ratio, and layer dimension.\n",
     "\n",
     "<center>\n",
-    "<img src=\"./img/asr_search_space.png\" width=\"80%\"/><figure>DE-NAS ASR Search Space and Supernet</figure>\n",
+    "<img src=\"./img/asr_search_space.png\" width=\"60%\"/><figure>DE-NAS ASR Search Space and Supernet</figure>\n",
     "</center>"
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "421df1d6",
    "metadata": {},
    "source": [
     "## Performance\n",
     "\n",
-    "<img src=\"./img/denas_asr_perf.png\" width=\"900\"/>\n",
+    "<center>\n",
+    "<img src=\"./img/denas_asr_perf.png\" width=\"600\"/>\n",
+    "</center>\n",
     "\n",
     "* Testing methodology\n",
     "    * Dataset: LibriSpeech, Metrics: WER 5.8%\n",
@@ -90,7 +94,7 @@
     "\n",
     "## 1. Environment Setup\n",
     "\n",
-    "### Option 1 Setup Environment with Pip"
+    "### (Option 1) Use Pip Install"
    ]
   },
   {
@@ -106,11 +110,12 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "03f3d2b7",
    "metadata": {},
    "source": [
-    "### Option 2 Setup Environment with Docker\n",
+    "### (Option 2) Use Docker\n",
     "\n",
     "``` bash\n",
     "# Setup ENV\n",
@@ -300,13 +305,15 @@
    ]
   },
   {
+   "attachments": {},
    "cell_type": "markdown",
    "id": "1555538c",
    "metadata": {},
    "source": [
     "## 5. Train\n",
     "\n",
-    "Load searched best model in `/home/vmagent/app/e2eaiok/e2eAIOK/DeNas/best_model_structure.txt` and launch training with training configuration in `${e2eAIOK_install_dir}/conf/denas/asr/e2eaiok_denas_train_asr.conf`"
+    "Load searched best model in `/home/vmagent/app/e2eaiok/e2eAIOK/DeNas/best_model_structure.txt` and launch training with training configuration in `${e2eAIOK_install_dir}/conf/denas/asr/e2eaiok_denas_train_asr.conf`\n",
+    "> Note: The training script below is for demonstration only; it uses a small sampled dataset and runs only a few iterations. For actual performance results, refer to [performance](#performance)"
    ]
   },
   {