By leveraging these tools, system integrators can efficiently enhance the performance and reliability of their real-time applications.
<span style="color:red"> Note: Please keep in mind that performance varies by use, configuration, and other factors. Learn more at www.Intel.com/PerformanceIndex. Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. No product or component can be absolutely secure.</span>
### 2. Intel® Cache Allocation Technology
In the second step, assume a system architecture where workloads with varying Quality of Service (QoS) requirements are consolidated on a single processor. This includes best-effort workloads, such as User Interface (UI) or Artificial Intelligence (AI) tasks, as well as real-time workloads, like the real-time test application. In such designs, shared resources like cache can become potential sources of contention.
Let's take a look at how Intel® Cache Allocation Technology (CAT) can help mitigate these sources of contention. CAT provides the ability to partition caches at various levels in the caching hierarchy. For example, consider the cache architecture of the Intel® Core™ i5-1350PE processor illustrated below. Initially, the default cache configuration is used, where all cache ways are shared. In the second step, the statistics are compared with the Last Level Cache (LLC) partitioned to provide an exclusive portion of the cache to the real-time test application. In both scenarios, a memory-centric workload on the best-effort cores is simulated using stress-ng.
<p align="center">
<img src="images/setup_CAT.png" alt="Cache Partitioning - System Setup" style="width: 70%;">
</p>
#### Execution and Analysis
1. Start the real-time application if it is not already running, and output the statistics to the Grafana dashboard. While monitoring the statistics, start the AI object classification demo from the OpenVINO model zoo in a second terminal.
You can use the following command as an alternative to the AI object classification demo:

```sh
stress-ng --vm 8 --vm-bytes 128M --fork 4
```

Options:
- `--vm 8`: start 8 virtual memory stressor instances; each instance allocates and stress-tests memory.
- `--vm-bytes 128M`: the amount of memory each virtual memory stressor instance allocates; here, each of the 8 instances allocates 128 megabytes.
- `--fork 4`: fork 4 child processes; each child process executes the stress test independently.

2. Partition the Last Level Cache (LLC) and assign an exclusive portion of the cache to the real-time test application, as demonstrated for the Intel® Core™ i5-1350PE above. Here is how the LLC can be partitioned using the Linux `msr-tools`:
```sh
# define LLC core masks
wrmsr 0xc90 0x30   # best-effort mask
wrmsr 0xc91 0xFC   # real-time mask

# define LLC GT mask
wrmsr 0x18b0 0x80  # iGPU mask

# assign the masks to the cores
# this has to match the core selected for the rt app
wrmsr -a 0xc8f 0x0 # assign all cores to CLOS0

# There is also the pqos Linux command-line utility, part of the
# intel-cmt-cat package, which can be used instead.
```
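To sanity-check capacity bitmasks like those above before writing them, you can decode them offline. The small helper below is hypothetical (not part of the tutorial's scripts); it lists which cache ways a mask selects and verifies the contiguity that CAT requires of capacity masks:

```python
def ways(mask: int) -> list[int]:
    """Return the LLC way indices selected by a CAT capacity bitmask."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]

def is_contiguous(mask: int) -> bool:
    """CAT capacity masks must be a single contiguous run of set bits."""
    if mask == 0:
        return False
    m = mask // (mask & -mask)   # strip trailing zero bits
    return (m & (m + 1)) == 0

# Masks from the wrmsr example above
print(ways(0x30))           # best-effort mask -> [4, 5]
print(ways(0xFC))           # real-time mask   -> [2, 3, 4, 5, 6, 7]
print(is_contiguous(0xFC))  # True
```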
Alternatively, you can use the script with the `rt_optimized` option to partition the cache as demonstrated above, or with the `default` option for flat partitioning.

```sh
sudo ./setCacheAllocation.sh rt_optimized
```

<span style="color:red"> Note: The script and the masks defined above are examples tailored for the cache topology of the Intel® Core™ i5-1350PE processor and the specific use case. You may need to adapt them to match the cache topology of your processor and use case.</span>

Examining the performance metrics of the real-time application shows that, in the initial phase, the cores running the best-effort workloads are mostly idling. During this phase, the execution time is very consistent and within a reasonable range. However, when stress-ng starts, both the execution time and the number of LLC misses increase significantly as you can see in step 1b. This occurs because the data of the real-time application is frequently evicted from the LLC and must be fetched from the main memory.
As demonstrated in the second step of the measurement, applying cache partitioning helps reduce the LLC misses for the real-time application, bringing the execution time back to a reasonable range.
As demonstrated, partitioning the cache using Intel® Cache Allocation Technology (CAT) is a straightforward way to improve temporal isolation between real-time and best-effort workloads.
This is just an example, and the configuration needs to be adjusted to your specific use case and processor. You can determine the cache topology, including the size and number of ways supported by a particular processor, using the CPUID leaf "Deterministic Cache Parameters Leaf - 0x4". Additionally, Linux utilities like `lstopo` are very useful for getting an overview of the cache topology of a processor. Here are some references if you need more information about CAT ...
- Public Intel® Time Coordinated Computing (TCC) User Guide - RDC #[831067](https://cdrdv2.intel.com/v1/dl/getContent/831067)
### 3. Intel® Speed Shift technology for Edge Computing
In the third and final step of this tutorial, let's examine another aspect of power management: Performance states, or P-States. P-States enable the scaling of the processor's frequency and voltage to reduce CPU power consumption. They are part of Dynamic Voltage and Frequency Scaling (DVFS) features such as Intel® Speed Step, Speed Shift, and Turbo Boost Technology. Speed Step and Speed Shift adjust the processor's voltage and frequency within these P-States to balance power efficiency and performance, while Turbo Boost allows the processor to temporarily exceed the highest P-State to provide additional performance during demanding tasks.
Until the 11th generation of Intel Core processors, it was recommended for hard real-time use cases to disable all DVFS features in the BIOS, which would lock the frequency of all cores permanently to the base frequency. Starting with the 11th generation of Intel Core processors, P-State transitions were optimized. As a result, from the 11th generation onward, Intel® Speed Step, Speed Shift, and Turbo Boost Technology are no longer disabled if you enable <span style="font-family: 'Courier New';">Intel® TCC Mode</span> in the BIOS. You still have the option to lock core frequency during runtime using the HWP MSRs or the intel_pstate driver under Linux.
With this knowledge, let's revisit the performance metrics. First, lock the core frequency of all cores to the base frequency. In the second step, boost the frequency of the real-time core to a value within the turbo frequency range to leverage higher single-threaded performance. Here let's follow the recommendations for the enveloping frequency configurations which are listed in the [TCC User Guide](https://cdrdv2.intel.com/v1/dl/getContent/831067) for the specific processor SKU.
More information about HWP and the MSRs can be found in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Vol. 3, section "Power and Thermal Management - Hardware-Controlled Performance States", RDC #[671200](https://cdrdv2.intel.com/v1/dl/getContent/671200).
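As a quick illustration of the IA32_HWP_REQUEST register the manual describes, the sketch below decodes its main fields. The field offsets follow SDM Vol. 3; the decoder itself is a hypothetical helper, not part of this tutorial's scripts:

```python
def decode_hwp_request(val: int) -> dict:
    """Split an IA32_HWP_REQUEST (MSR 0x774) value into its main fields.

    Field layout per SDM Vol. 3: bits 7:0 minimum performance,
    15:8 maximum performance, 23:16 desired performance,
    31:24 energy performance preference (EPP).
    """
    return {
        "min_perf": val & 0xFF,
        "max_perf": (val >> 8) & 0xFF,
        "desired": (val >> 16) & 0xFF,
        "epp": (val >> 24) & 0xFF,
    }

# Example value: min=0x08, max=0x20, desired=0x00, EPP=0x80
print(decode_hwp_request(0x80002008))
```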
<p align="center">
<img src="images/setup_CAT_isol_boost.png" alt="Speed Shift - System Setup" style="width: 70%;">
</p>
#### Execution and Analysis
1. Start the real-time application if it is not already running, and output the statistics to the Grafana dashboard. While monitoring the statistics, start the AI object classification demo from the OpenVINO model zoo in a second terminal.
You can use the following command as an alternative to the AI object classification demo:

```sh
stress-ng --vm 8 --vm-bytes 128M --fork 4
```

Options:
- `--vm 8`: start 8 virtual memory stressor instances; each instance allocates and stress-tests memory.
- `--vm-bytes 128M`: the amount of memory each virtual memory stressor instance allocates; here, each of the 8 instances allocates 128 megabytes.
- `--fork 4`: fork 4 child processes; each child process executes the stress test independently.

Ensure that the CAT partitioning is still applied, or run the following script again:

```sh
sudo ./setCacheAllocation.sh rt_optimized
```

2. Lock the core frequency of all cores to the base frequency and tune the Energy Performance Preference (EPP) towards `performance`; this is equivalent to disabling the DVFS features in the BIOS. Here is an example of how to use the sysfs entries of the intel_pstate driver:
```sh
# Loop through each CPU core and set the min and max frequencies to the base frequency
# (cpu[0-9]* matches all cores; a [0-11] character class would only match digits 0 and 1)
for CPU in /sys/devices/system/cpu/cpu[0-9]*; do
    BASE_FREQUENCY=$(cat $CPU/cpufreq/base_frequency)
    echo $BASE_FREQUENCY | sudo tee $CPU/cpufreq/scaling_min_freq
    echo $BASE_FREQUENCY | sudo tee $CPU/cpufreq/scaling_max_freq
    echo performance | sudo tee $CPU/cpufreq/energy_performance_preference
done
```
Alternatively, you can use the script with the `basefrequency` parameter ...
```sh
sudo ./setCoreFrequency.sh basefrequency
```
3. Boost the frequency of the core running the real-time application as described in the enveloping configuration in the TCC User Guide. In this configuration, the maximum allowed frequency of all best-effort cores is limited to the base frequency, and the Energy Performance Preference (EPP) is set to `power`. This setup allows the best-effort cores to scale their frequency between the minimum and base frequency depending on core utilization. For the real-time core, the frequency is boosted to 3.1 GHz, and the EPP is set to `performance` to ensure Quality of Service (QoS) in case of power-limit throttling.
Use the script with `rt-boost`, followed by the identifier of the real-time core and the desired core frequency ...

<span style="color:red"> Note: The script and the specified frequencies are examples tailored for this tutorial and the Intel® Core™ i5-1350PE processor. You may need to adapt them to match your processor and use case.</span>
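If you script the boost step yourself via sysfs, remember that the cpufreq files (`scaling_min_freq`, `scaling_max_freq`) take integer values in kHz, so 3.1 GHz is written as 3100000. A tiny conversion helper (hypothetical, not part of the tutorial's scripts) avoids unit mistakes:

```python
def ghz_to_sysfs_khz(ghz: float) -> int:
    """Convert a frequency in GHz to the kHz integer expected by cpufreq sysfs files."""
    return round(ghz * 1_000_000)

# 3.1 GHz boost target for the real-time core, as used in this tutorial
print(ghz_to_sysfs_khz(3.1))   # 3100000
```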
For more information on directly accessing the HWP MSRs instead of using the sysfs entries of the intel_pstate driver, please refer to the [TCC User Guide](https://cdrdv2.intel.com/v1/dl/getContent/831067) and the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Vol. 3, section "Power and Thermal Management - Hardware-Controlled Performance States", RDC #[671200](https://cdrdv2.intel.com/v1/dl/getContent/671200).

Examining the performance metrics, you can observe the following:
- In the first step of the graph, where HWP was still able to scale the P-State, the execution time jitter band remains noisy.
- Step 2 shows that locking the frequency of all cores to the base frequency reduces the jitter band but also significantly increases the execution time.
- Finally, in step 3, locking the core frequency of the core running the real-time application to a turbo frequency and limiting the maximum allowed frequency of the best-effort (BE) cores to the base frequency, following the guidance of the enveloping configurations listed in the TCC User Guide, results in a narrower execution time jitter band and a significantly lower execution time.
As you can see, locking the core frequency of the core running the real-time application helps reduce execution time jitter. Additionally, boosting the core frequency of the real-time core can be highly beneficial for use cases that require higher single-threaded performance.
### Conclusion
In this tutorial, three easy-to-use features of the Intel TCC toolbox, along with some kernel command line parameters, were introduced to optimize real-time performance. These features and techniques help system integrators quickly tune the system for specific real-time use cases:
- TCC Mode: Optimizes firmware for low latency with a single configuration knob.
- Cache Allocation Technology (CAT): Enables quick partitioning of the cache to improve temporal isolation.
- Speed Shift for Edge Computing: Can be used to boost single-threaded performance.
By leveraging these tools, system integrators can efficiently enhance the performance and reliability of their real-time applications.
<span style="color:red"> Note: Please keep in mind that performance varies by use, configuration, and other factors. Learn more at www.Intel.com/PerformanceIndex. Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. No product or component can be absolutely secure.</span>