This repository was archived by the owner on Jul 18, 2024. It is now read-only.

Commit 72adaff

Peach-He and xuechendi authored
[v1.1][ISSUE-274] refine demo and update performance doc (#286)
* refine denas asr overview picture
* refine demo notebook
* refine performance readme
* update blogs
* Update sda_model_performance.md
* Update e2e_recsys_performance.md
* Update denas_performance.md

---------

Co-authored-by: Chendi.Xue <chendi.xue@intel.com>
1 parent 5ddb2c6 commit 72adaff

12 files changed

Lines changed: 75 additions & 47 deletions


README.md

Lines changed: 8 additions & 7 deletions
@@ -20,13 +20,6 @@ Making AI more accessible: Through built-in optimized, parameterized models gen
 
 This solution is intended for citizen data scientists, enterprise users, independent software vendor and partial of cloud service provider.
 
-## Papers and Blogs
-
-* [ICYMI – SigOpt Summit Recap Democratizing End-to-End Recommendation Systems](https://sigopt.com/blog/icymi-sigopt-summit-recap-democratizing-end-to-end-recommendation-systems-with-jian-zhang/)
-* [The SigOpt Intelligent Experimentation Platform](https://www.intel.com/content/www/us/en/developer/articles/technical/sigopt-intelligent-experimentation-platform.html#gs.gz2ls6)
-* [SDC2022 - Data Platform for End-to-end AI Democratization](https://storagedeveloper.org/events/sdc-2022/agenda/session/326)
-* [SIHG4SR: Side Information Heterogeneous Graph for Session Recommender](https://dl.acm.org/doi/abs/10.1145/3556702.3556852)
-
 # ARCHITECTURE
 
 ## Intel® End-to-End AI Optimization Kit
@@ -100,6 +93,14 @@ python scripts/start_e2eaiok_docker.py --backend [tensorflow, pytorch, pytorch11
 * [SDA Model Performance](docs/source/sda_model_performance.md) - ResNet, BERT, RNN-T, MiniGo
 * [DE-NAS Performance](docs/source/denas_performance.md) - CNN, ViT, BERT, ASR
 
+## Papers and Blogs
+
+* [ICYMI – SigOpt Summit Recap Democratizing End-to-End Recommendation Systems](https://sigopt.com/blog/icymi-sigopt-summit-recap-democratizing-end-to-end-recommendation-systems-with-jian-zhang/)
+* [The SigOpt Intelligent Experimentation Platform](https://www.intel.com/content/www/us/en/developer/articles/technical/sigopt-intelligent-experimentation-platform.html#gs.gz2ls6)
+* [SDC2022 - Data Platform for End-to-end AI Democratization](https://storagedeveloper.org/events/sdc-2022/agenda/session/326)
+* [SIHG4SR: Side Information Heterogeneous Graph for Session Recommender](https://dl.acm.org/doi/abs/10.1145/3556702.3556852)
+* [The Parallel Universe Magazine](https://www.intel.com/content/www/us/en/developer/community/parallel-universe-magazine/overview.html#gs.nznx3b)
+* [Accelerating Artificial Intelligence with Intel® End-to-End AI Optimization Kit](https://www.intel.com/content/www/us/en/developer/articles/technical/accelerate-ai-with-intel-e2e-ai-optimization-kit.html#gs.ox779w)
 
 ## Getting Support
 
10.9 KB

demo/builtin/rnnt/RNNT_DEMO.ipynb

Lines changed: 28 additions & 20 deletions
@@ -38,22 +38,24 @@
 ]
 },
 {
+"attachments": {},
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "# Overview\n",
-"<img src=\"./img/asr.png\" width=\"800\"/>\n",
+"<img src=\"./img/asr.png\" width=\"600\"/>\n",
 "\n",
 "* The traditional ASR system (top picture) contains acoustic, phonetic and language components that work together as in a pipeline system\n",
 "* The end-to-end ASR system is a single neural network that receives raw audio signal as input and provides a sequence of words at output"
 ]
 },
 {
+"attachments": {},
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Model Architecture\n",
-"<img src=\"./img/rnnt_structure.png\"/>\n",
+"<img src=\"./img/rnnt_structure.png\" width=\"250\"/>\n",
 "\n",
 "RNN-T is an end-to-end ASR model that directly converts audio into text representation.\n",
 "\n",
@@ -70,6 +72,7 @@
 ]
 },
 {
+"attachments": {},
 "cell_type": "markdown",
 "metadata": {},
 "source": [
@@ -78,7 +81,7 @@
 "For RNN-T model democratization, we enabled distributed training with pytorch DDP to scale out model training on multi nodes, added time stack layer and increased time stack factor to reduce input sequence lengh, added layer and batch normalization to speedup training converge, decreased layer size to get a lighter model.\n",
 "\n",
 "<img src=\"./img/model_base.png\" width=\"600\"/><figure>base model</figure>\n",
-"<img src=\"./img/model_opt.png\" width=\"600\"/><figure>democratized model</figure>\n"
+"<img src=\"./img/model_opt.png\" width=\"800\"/><figure>democratized model</figure>\n"
 ]
 },
 {
@@ -95,6 +98,7 @@
 ]
 },
 {
+"attachments": {},
 "cell_type": "markdown",
 "metadata": {},
 "source": [
@@ -134,16 +138,17 @@
 "\n",
 "About 4x speedup after increase time stack factor from 2 to 8.\n",
 "\n",
-"<img src=\"./img/time_stack_2.PNG\" width=\"600\"/><figure>time_stack = 2</figure>\n",
-"<img src=\"./img/time_stack_8.PNG\" width=\"600\"/><figure>time_stack = 8</figure>\n",
+"<img src=\"./img/time_stack_2.PNG\" width=\"800\"/><figure>time_stack = 2</figure>\n",
+"<img src=\"./img/time_stack_8.PNG\" width=\"800\"/><figure>time_stack = 8</figure>\n",
 "\n",
 "Profiling data proves that less time cost on forward/backward since input sequence reduced with time stack layer\n",
 "\n",
-"<img src=\"./img/stack_profile_base.png\" width=\"600\"/><figure>base model profiling</figure>\n",
-"<img src=\"./img/stack_profile_democratize.png\" width=\"600\"/><figure>democratized model profiling</figure>\n"
+"<img src=\"./img/stack_profile_base.png\" width=\"800\"/><figure>base model profiling</figure>\n",
+"<img src=\"./img/stack_profile_democratize.png\" width=\"800\"/><figure>democratized model profiling</figure>\n"
 ]
 },
 {
+"attachments": {},
 "cell_type": "markdown",
 "metadata": {},
 "source": [
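The hunk above keeps the notebook's claim of about a 4x speedup from raising the time stack factor from 2 to 8. A minimal sketch of where the 4x ratio comes from, assuming the time stack layer concatenates each group of `factor` consecutive feature frames into one wider frame (frame stacking). `stack_frames` and the 1000-frame, 80-dim input are hypothetical and not taken from e2eAIOK:

```python
from typing import List

Frame = List[float]


def stack_frames(frames: List[Frame], factor: int) -> List[Frame]:
    """Concatenate every `factor` consecutive frames into one wider frame.

    Hypothetical helper: trailing frames that do not fill a complete group
    are dropped, so a sequence of length T becomes length T // factor.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    usable = len(frames) - len(frames) % factor
    stacked: List[Frame] = []
    for start in range(0, usable, factor):
        merged: Frame = []
        for frame in frames[start:start + factor]:
            merged.extend(frame)  # features of consecutive frames placed side by side
        stacked.append(merged)
    return stacked


# Hypothetical utterance: 1000 frames of 80-dim acoustic features
frames = [[float(t)] * 80 for t in range(1000)]
x2 = stack_frames(frames, 2)
x8 = stack_frames(frames, 8)
print(len(x2), len(x2[0]))  # 500 160
print(len(x8), len(x8[0]))  # 125 640
print(len(x2) / len(x8))    # 4.0
```

The recurrent encoder runs one step per stacked frame, so factor 8 gives 8 / 2 = 4x fewer steps than factor 2. Each step sees a wider input, so the wall-clock gain also depends on layer sizes; the measured 4x in the notebook comes from its profiling, not from this sketch.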
@@ -159,8 +164,8 @@
 "self.layer_norm = torch.nn.LayerNorm(hidden_size)\n",
 "```\n",
 "\n",
-"<img src=\"./img/no_norm.PNG\" width=\"600\"/><figure>without normalization</figure>\n",
-"<img src=\"./img/norm.PNG\" width=\"600\"/><figure>with normalization</figure>\n"
+"<img src=\"./img/no_norm.PNG\" width=\"800\"/><figure>without normalization</figure>\n",
+"<img src=\"./img/norm.PNG\" width=\"800\"/><figure>with normalization</figure>\n"
 ]
 },
 {
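The context lines above add `torch.nn.LayerNorm(hidden_size)` to speed up training convergence. The following dependency-free sketch shows what that layer computes for a single hidden vector. It follows the PyTorch definition: biased variance, `eps=1e-5`, and a learnable scale and shift initialised to 1 and 0. `layer_norm` is an illustrative helper, not part of the RNN-T code:

```python
import math
from typing import List, Optional


def layer_norm(x: List[float],
               gamma: Optional[List[float]] = None,
               beta: Optional[List[float]] = None,
               eps: float = 1e-5) -> List[float]:
    """Normalize one hidden vector to zero mean and unit variance, then scale and shift."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n  # biased estimator (divide by n)
    inv_std = 1.0 / math.sqrt(var + eps)       # eps keeps the division stable
    gamma = gamma if gamma is not None else [1.0] * n  # learnable scale, initialised to 1
    beta = beta if beta is not None else [0.0] * n     # learnable shift, initialised to 0
    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gamma, beta)]


out = layer_norm([1.0, 2.0, 3.0, 4.0])
print([round(v, 4) for v in out])
```

Because every hidden vector is rescaled to a fixed mean and variance, activations feeding the next recurrent step stay in a consistent range, which is the usual explanation for faster convergence.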
@@ -192,6 +197,7 @@
 ]
 },
 {
+"attachments": {},
 "cell_type": "markdown",
 "metadata": {},
 "source": [
@@ -204,17 +210,18 @@
 "${CONDA_PREFIX}/bin/python -m intel_extension_for_pytorch.cpu.launch --distributed --nproc_per_node=2 --nnodes=4 --hostfile hosts train.py ${ARGS}\n",
 "```\n",
 "\n",
-"<img src=\"./img/no_numa_binding.png\" width=\"600\"/><figure>without numa binding</figure>\n",
-"<img src=\"./img/numa_binding.png\" width=\"600\"/><figure>enable numa binding</figure>\n"
+"<img src=\"./img/no_numa_binding.png\" width=\"500\"/><figure>without numa binding</figure>\n",
+"<img src=\"./img/numa_binding.png\" width=\"500\"/><figure>enable numa binding</figure>\n"
 ]
 },
 {
+"attachments": {},
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "## Performance\n",
 "\n",
-"<img src=\"./img/rnnt_perf.png\" width=\"900\"/>\n",
+"<img src=\"./img/rnnt_perf.png\" width=\"500\"/>\n",
 "\n",
 "* Distributed training with HW scaling delivered 3.83x speedup from 1 node to 4 nodes\n",
 "* HPO delivered 1.35x speedup, and 5.16x speedup over baseline\n",
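The launch command in this hunk starts `--nproc_per_node=2` processes on each of `--nnodes=4` hosts, and the Performance cell reports a 3.83x speedup from 1 to 4 nodes. A short sketch of the arithmetic behind those numbers, using the usual definition of scaling efficiency (measured speedup divided by the node-count ratio). The helper names are hypothetical:

```python
def world_size(nproc_per_node: int, nnodes: int) -> int:
    """Total number of training processes started by a multi-node launcher."""
    return nproc_per_node * nnodes


def scaling_efficiency(speedup: float, scale_factor: float) -> float:
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / scale_factor


# Launch flags from the notebook: --nproc_per_node=2 --nnodes=4
ranks = world_size(nproc_per_node=2, nnodes=4)
# Reported result: 3.83x speedup going from 1 node to 4 nodes
eff = scaling_efficiency(3.83, 4)
print(ranks)          # 8
print(f"{eff:.2%}")   # 95.75%
```

By this measure the 1-to-4 node result reaches roughly 96% of linear scaling. Whether two processes per node maps to one rank per socket depends on the hosts' NUMA layout, which the notebook does not state.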
@@ -238,12 +245,13 @@
 ]
 },
 {
+"attachments": {},
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "## 1. Environment Setup\n",
 "\n",
-"### Option1 Setup Environment with Pip\n",
+"### (Option1) Use Pip Install\n",
 "pre-work: move e2eAIOK source code to /home/vmagent/app/e2eaiok"
 ]
 },
@@ -269,10 +277,11 @@
 ]
 },
 {
+"attachments": {},
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### Option2 Setup Environment with Docker\n",
+"### (Option2) Use Docker\n",
 "``` bash\n",
 "# Setup ENV\n",
 "git clone https://github.com/intel/e2eAIOK.git\n",
@@ -312,6 +321,7 @@
 ]
 },
 {
+"attachments": {},
 "cell_type": "markdown",
 "metadata": {},
 "source": [
@@ -323,20 +333,18 @@
 "\n",
 "cd /home/vmagent/app/e2eaiok/modelzoo/rnnt/pytorch\n",
 "bash scripts/preprocess_librispeech.sh\n",
-"```\n",
-"\n",
-"Notes: RNN-T training is based on LibriSpeech train-clean-100 and evaluated on dev-clean, we evaluated WER with stock model (based on MLPerf submission) at train-clean-100 dataset, and final WER is 0.25, all the following optimization guarantee 0.25 WER. MLPerf submission took 38.7min with 8x A100 on LibriSpeech train-960h dataset.\n",
-"\n",
-"public reference on train-clean-100: https://arxiv.org/pdf/1807.10893.pdf, https://arxiv.org/pdf/1811.00787.pdf"
+"```"
 ]
 },
 {
+"attachments": {},
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "## 4. Train\n",
 "\n",
-"Edit config file to control SDA process"
+"Edit config file to control SDA process\n",
+"> Note: Bellow training script is just for demonstration, and uses small sampled dataset and runs a small iterations. For actual performance result, please refer to [performance](#performance)"
 ]
 },
 {
-20.7 KB

demo/builtin/wnd/WND_DEMO.ipynb

Lines changed: 13 additions & 6 deletions
@@ -84,6 +84,7 @@
 ]
 },
 {
+"attachments": {},
 "cell_type": "markdown",
 "id": "ef18b1d8",
 "metadata": {},
@@ -92,12 +93,12 @@
 "\n",
 "Long idle time per training step for horovod communication, horovod paramter sync consume much time during distributed training, causing poor scaling performance. The overhead mainly caused by large embedding table.\n",
 "\n",
-"<img src=\"./img/wnd_profile.png\" width=\"600\"/><figure>Distributed training profiling</figure>\n",
+"<img src=\"./img/wnd_profile.png\" width=\"800\"/><figure>Distributed training profiling</figure>\n",
 "\n",
 "Replace custom layer (contains embedding layer) with TensorFlow dense layer help to reduce embedding parameter size, thus reduce parameter size needed to sync by horovod, fix horovod poor scaling issue. Per step training time reduced from 5.16s to 2.71s, got about 1.9x speedup.\n",
 "\n",
-"<img src=\"./img/wnd_traintime_custom_emd.png\" width=\"600\"/><figure>custom layer</figure>\n",
-"<img src=\"./img/wnd_traintime_tf_emd.png\" width=\"600\"/><figure>TensorFlow build-in layer</figure>"
+"<img src=\"./img/wnd_traintime_custom_emd.png\" width=\"800\"/><figure>custom layer</figure>\n",
+"<img src=\"./img/wnd_traintime_tf_emd.png\" width=\"800\"/><figure>TensorFlow build-in layer</figure>"
 ]
 },
 {
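The context lines report per-step training time dropping from 5.16 s to 2.71 s after replacing the custom embedding layer. The sketch below reproduces the stated 1.9x from those two numbers. It also estimates how many bytes one embedding table contributes to each parameter sync, assuming the table's gradient is exchanged as a dense float32 tensor. That is an assumption about this model, not a statement about Horovod's defaults. `embedding_bytes` and the 1,000,000 x 64 table are hypothetical illustration values, not the W&D model's real sizes:

```python
def step_speedup(before_s: float, after_s: float) -> float:
    """Speedup of per-step training time: old time divided by new time."""
    if after_s <= 0:
        raise ValueError("after_s must be positive")
    return before_s / after_s


def embedding_bytes(rows: int, dim: int, bytes_per_param: int = 4) -> int:
    """Size of a dense embedding table (or its dense gradient), float32 by default."""
    return rows * dim * bytes_per_param


# Per-step times reported in the notebook: custom layer vs TensorFlow built-in layer
speedup = step_speedup(5.16, 2.71)
print(f"{speedup:.2f}x")  # 1.90x

# Hypothetical table: 1,000,000 rows of 64-dim float32 vectors
print(embedding_bytes(1_000_000, 64))  # 256000000
```

Under the dense-gradient assumption, sync volume grows linearly with rows x dim, which is consistent with the notebook's explanation that shrinking embedding parameters reduced Horovod's communication overhead.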
@@ -134,6 +135,7 @@
 ]
 },
 {
+"attachments": {},
 "cell_type": "markdown",
 "id": "90924745",
 "metadata": {},
@@ -199,6 +201,7 @@
 ]
 },
 {
+"attachments": {},
 "cell_type": "markdown",
 "id": "b50f0c0c",
 "metadata": {},
@@ -229,13 +232,14 @@
 ]
 },
 {
+"attachments": {},
 "cell_type": "markdown",
 "id": "ca039366",
 "metadata": {},
 "source": [
 "## 1. Environment Setup\n",
 "\n",
-"### Option 1 Setup Environment with Pip\n",
+"### (Option 1) Use Pip Install\n",
 "pre-work: move e2eAIOK source code to /home/vmagent/app/e2eaiok. Install spark and start spark services for data preprocess"
 ]
 },
@@ -258,11 +262,12 @@
 ]
 },
 {
+"attachments": {},
 "cell_type": "markdown",
 "id": "f46ff1b7",
 "metadata": {},
 "source": [
-"### Option 2 Setup Environment with Docker\n",
+"### (Option 2) Use Docker\n",
 "``` bash\n",
 "# Setup ENV\n",
 "git clone https://github.com/intel/e2eAIOK.git\n",
@@ -364,13 +369,15 @@
 ]
 },
 {
+"attachments": {},
 "cell_type": "markdown",
 "id": "509d6539",
 "metadata": {},
 "source": [
 "## 4. Train\n",
 "\n",
-"Edit config file to control SDA process"
+"Edit config file to control SDA process\n",
+"> Note: Bellow training script is just for demonstration, and uses small sampled dataset and runs a small iterations. For actual performance result, please refer to [performance](#performance)"
 ]
 },
 {

demo/builtin/wnd/img/wnd_perf.png

-349 Bytes

demo/denas/asr/DENAS_ASR_DEMO.ipynb

Lines changed: 12 additions & 5 deletions
@@ -46,6 +46,7 @@
 ]
 },
 {
+"attachments": {},
 "cell_type": "markdown",
 "id": "8c1685c8",
 "metadata": {},
@@ -57,18 +58,21 @@
 "Transformer based search space consists of attention layer, layer normalization and feed forward layer, the search space can be controled by setting network depth, number attention heads, MLP layer ratio and layer dimension.\n",
 "\n",
 "<center>\n",
-"<img src=\"./img/asr_search_space.png\" width=\"80%\"/><figure>DE-NAS ASR Search Space and Supernet</figure>\n",
+"<img src=\"./img/asr_search_space.png\" width=\"60%\"/><figure>DE-NAS ASR Search Space and Supernet</figure>\n",
 "</center>"
 ]
 },
 {
+"attachments": {},
 "cell_type": "markdown",
 "id": "421df1d6",
 "metadata": {},
 "source": [
 "## Performance\n",
 "\n",
-"<img src=\"./img/denas_asr_perf.png\" width=\"900\"/>\n",
+"<center>\n",
+"<img src=\"./img/denas_asr_perf.png\" width=\"600\"/>\n",
+"</center>\n",
 "\n",
 "* Testing methodology\n",
  " * Dataset: LibriSpeech, Metrics: WER 5.8%\n",
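The DE-NAS cell above describes a transformer search space controlled by network depth, number of attention heads, MLP ratio and layer dimension. A minimal sketch of drawing one candidate from such a space: every value in `SEARCH_SPACE` is hypothetical, and the real DE-NAS ASR space and its search strategy are defined by e2eAIOK's own configuration:

```python
import random
from dataclasses import dataclass

# Hypothetical candidate values; the real DE-NAS ASR search space may differ.
SEARCH_SPACE = {
    "depth": [6, 8, 10, 12],
    "num_heads": [2, 4, 8],
    "mlp_ratio": [2.0, 3.0, 4.0],
    "embed_dim": [256, 384, 512],
}


@dataclass
class Candidate:
    depth: int
    num_heads: int
    mlp_ratio: float
    embed_dim: int

    def is_valid(self) -> bool:
        # Multi-head attention splits embed_dim evenly across heads
        return self.embed_dim % self.num_heads == 0

    def ffn_hidden_dim(self) -> int:
        # Feed-forward width is the layer dimension scaled by the MLP ratio
        return int(self.embed_dim * self.mlp_ratio)


def sample_candidate(rng: random.Random) -> Candidate:
    """Draw configurations uniformly until one satisfies the head constraint."""
    while True:
        cand = Candidate(
            depth=rng.choice(SEARCH_SPACE["depth"]),
            num_heads=rng.choice(SEARCH_SPACE["num_heads"]),
            mlp_ratio=rng.choice(SEARCH_SPACE["mlp_ratio"]),
            embed_dim=rng.choice(SEARCH_SPACE["embed_dim"]),
        )
        if cand.is_valid():
            return cand


def search_space_size() -> int:
    """Number of distinct configurations (product of choice counts)."""
    size = 1
    for choices in SEARCH_SPACE.values():
        size *= len(choices)
    return size


rng = random.Random(0)
cand = sample_candidate(rng)
print(cand, cand.ffn_hidden_dim())
print(search_space_size())  # 108
```

Using one global value per dimension keeps the example short; a supernet-based search may instead vary heads or MLP ratio per layer.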
@@ -90,7 +94,7 @@
 "\n",
 "## 1. Environment Setup\n",
 "\n",
-"### Option 1 Setup Environment with Pip"
+"### (Option 1) Use Pip Install"
 ]
 },
 {
@@ -106,11 +110,12 @@
 ]
 },
 {
+"attachments": {},
 "cell_type": "markdown",
 "id": "03f3d2b7",
 "metadata": {},
 "source": [
-"### Option 2 Setup Environment with Docker\n",
+"### (Option 2) Use Docker\n",
 "\n",
 "``` bash\n",
 "# Setup ENV\n",
@@ -300,13 +305,15 @@
 ]
 },
 {
+"attachments": {},
 "cell_type": "markdown",
 "id": "1555538c",
 "metadata": {},
 "source": [
 "## 5. Train\n",
 "\n",
-"Load searched best model in `/home/vmagent/app/e2eaiok/e2eAIOK/DeNas/best_model_structure.txt` and launch training with training configuration in `${e2eAIOK_install_dir}/conf/denas/asr/e2eaiok_denas_train_asr.conf`"
+"Load searched best model in `/home/vmagent/app/e2eaiok/e2eAIOK/DeNas/best_model_structure.txt` and launch training with training configuration in `${e2eAIOK_install_dir}/conf/denas/asr/e2eaiok_denas_train_asr.conf`\n",
+"> Note: Bellow training script is just for demonstration, and uses small sampled dataset and runs a small iterations. For actual performance result, please refer to [performance](#performance)"
 ]
 },
 {
-80.3 KB
