ArmDeveloperEcosystem
diff --git a/content/learning-paths/cross-platform/multiplying-matrices-with-sme2/1-get-started.md b/content/learning-paths/cross-platform/multiplying-matrices-with-sme2/1-get-started.md
Lines changed: 1 addition & 0 deletions
diff --git a/content/learning-paths/cross-platform/simd-loops/1-about.md b/content/learning-paths/cross-platform/simd-loops/1-about.md
Lines changed: 4 additions & 4 deletions
diff --git a/content/learning-paths/cross-platform/simd-loops/2-using.md b/content/learning-paths/cross-platform/simd-loops/2-using.md
Lines changed: 12 additions & 10 deletions
diff --git a/content/learning-paths/cross-platform/simd-loops/3-example.md b/content/learning-paths/cross-platform/simd-loops/3-example.md
Lines changed: 3 additions & 5 deletions
diff --git a/content/learning-paths/cross-platform/simd-loops/4-conclusion.md b/content/learning-paths/cross-platform/simd-loops/4-conclusion.md
Lines changed: 3 additions & 3 deletions
diff --git a/content/learning-paths/cross-platform/simd-loops/_index.md b/content/learning-paths/cross-platform/simd-loops/_index.md
Lines changed: 14 additions & 5 deletions
diff --git a/content/learning-paths/cross-platform/simd-on-rust/_index.md b/content/learning-paths/cross-platform/simd-on-rust/_index.md
Lines changed: 7 additions & 4 deletions
diff --git a/content/learning-paths/cross-platform/simd-on-rust/conclusion.md b/content/learning-paths/cross-platform/simd-on-rust/conclusion.md
Lines changed: 1 addition & 1 deletion
diff --git a/content/learning-paths/cross-platform/simd-on-rust/intro-to-rust.md b/content/learning-paths/cross-platform/simd-on-rust/intro-to-rust.md
Lines changed: 11 additions & 11 deletions
@@ -319,6 +319,7 @@ These Android phones support SME2 natively.
 |-------------------------------------|--------------|---------------------------|
 | Vivo X300                           | 2025         | MediaTek Dimensity 9500 featuring an 8-core Arm C1 CPU cluster and Arm G1-Ultra GPU |
 | OPPO Find X9                        | 2025         | MediaTek Dimensity 9500 featuring an 8-core Arm C1 CPU cluster and Arm G1-Ultra GPU |
+| Samsung Galaxy S26                  | 2026         | Exynos 2600 variant |
 
 These Apple devices support SME2 natively.
 
 
@@ -6,11 +6,11 @@ weight: 2
 layout: learningpathall
 ---
 
-## Introduction to SIMD on Arm and why it matters for performance on Arm CPUs
+## Introduction to SIMD on Arm
 
-Writing high-performance software on Arm often means using single-instruction, multiple-data (SIMD) technologies. Many developers start with Neon, a familiar fixed-width vector extension. As Arm architectures evolve, so do the SIMD capabilities available to you.
+Writing high-performance software on Arm often means using single-instruction, multiple-data (SIMD) technologies. Many developers start with Neon, a familiar fixed-width vector extension. As Arm architectures evolve, the SIMD capabilities available to you also expand.
 
-This Learning Path uses the Scalable Vector Extension (SVE) and the Scalable Matrix Extension (SME) to demonstrate modern SIMD patterns. They are two powerful, scalable vector extensions designed for modern workloads. Unlike Neon, these architecture extensions are not just wider; they are fundamentally different. They introduce predication, vector-length-agnostic (VLA) programming, gather/scatter, streaming modes, and tile-based compute with ZA state. The result is more power and flexibility, but there can be a learning curve to match.
+This Learning Path uses the Scalable Vector Extension (SVE) and the Scalable Matrix Extension (SME) to demonstrate modern SIMD patterns. These are two powerful, scalable vector extensions designed for modern workloads. Unlike Neon, these architecture extensions aren't just wider; they're fundamentally different. They introduce predication, vector-length-agnostic (VLA) programming, gather/scatter, streaming modes, and tile-based compute with ZA state. The result is more power and flexibility, but there can be a learning curve to match.
 
 ## What is the SIMD Loops project?
 
@@ -30,6 +30,6 @@ The project includes:
 - A simple command-line runner to execute any loop interactively
 - Optional standalone binaries for bare-metal and simulator use
 
-You do not need to rely on auto-vectorization or guess at compiler flags. Each loop is handwritten and annotated to make the intended use of SIMD features clear. Study a kernel, modify it, rebuild, and observe the effect - this is the core learning loop.
+You don't need to rely on auto-vectorization or guess at compiler flags. Each loop is handwritten and annotated to make the intended use of SIMD features clear. Study a kernel, modify it, rebuild, and observe the effect—this is the core learning loop.
 
 
@@ -27,7 +27,7 @@ Expected output on Linux:
 aarch64
 ```
 
-Expected output on macOS:
+On macOS, the expected output is:
 
 ```output
 arm64
@@ -86,45 +86,45 @@ Each loop is implemented in several SIMD extension variants. Conditional compila
 
 The native C implementation is written first, and it can be generated either when building natively with `-DHAVE_NATIVE` or through compiler auto-vectorization with `-DHAVE_AUTOVEC`.
 
-When SIMD ACLE is supported (SME, SVE, or Neon), the code is compiled using high-level intrinsics. If ACLE support is not available, the build process falls back to handwritten inline assembly targeting one of the available SIMD extensions, such as SME2.1, SME2, SVE2.1, SVE2, and others.
+When SIMD ACLE is supported (SME, SVE, or Neon), the code is compiled using high-level intrinsics. If ACLE support isn't available, the build process falls back to handwritten inline assembly targeting one of the available SIMD extensions, such as SME2.1, SME2, SVE2.1, SVE2, and others.
 
 The overall code structure also includes setup and cleanup code in the main function, where memory buffers are allocated, the selected loop kernel is executed, and results are verified for correctness.
 
-At compile time, you can select which loop optimization to compile, whether it is based on SME or SVE intrinsics, or one of the available inline assembly variants.
+At compile time, you can select which loop optimization to compile, whether it's based on SME or SVE intrinsics, or one of the available inline assembly variants.
+
+To compile the project, run `make` in the project directory:
 
 ```console
 make
 ```
 
-With no target specified, the list of targets is printed:
+With no target specified, the output shows the list of available targets:
 
 ```output
 all fmt clean c-scalar scalar autovec-sve autovec-sve2 neon sve sve2 sme2 sme-ssve sve2p1 sme2p1 sve-intrinsics sme-intrinsics
 ```
 
-Build all loops for all targets:
+To build all loops for all targets, run:
 
 ```console
 make all
 ```
 
-Build all loops for a single target, such as Neon:
+To build all loops for a single target, such as Neon, run:
 
 ```console
 make neon
 ```
 
 As a result of the build, two types of binaries are generated.
 
-The first is a single executable named `simd_loops`, which includes all loop implementations.
-
-Select a specific loop by passing parameters to the program. For example, to run loop 1 for 5 iterations using the Neon target:
+To select a specific loop, pass parameters to the program. For example, to run loop 1 for 5 iterations using the Neon target:
 
 ```console
 build/neon/bin/simd_loops -k 1 -n 5
 ```
 
-Example output:
+The expected output is:
 
 ```output
 Loop 001 - FP32 inner product
@@ -140,6 +140,8 @@ To run loop 1 as a standalone binary:
 build/neon/standalone/bin/loop_001.elf
 ```
 
+The expected output is:
+
 Example output:
 
 ```output
 
@@ -229,8 +229,6 @@ For instruction semantics and SME/SME2 optimization guidance, see the [SME Progr
 
 Beyond the SME2 and SVE implementations, this loop also includes additional optimized versions that leverage architecture-specific features:
 
-- **Neon**: the Neon version (lines 612–710) uses structure load/store combined with indexed `fmla` to vectorize the computation.
-
-- **SVE2.1**: the SVE2.1 version (lines 355–462) extends the base SVE approach using multi-vector loads and stores.
-
-- **SME2.1**: the SME2.1 version uses `movaz`/`svreadz_hor_za8_u8_vg4` to reinitialize `ZA` tile accumulators while moving data out to registers.
+- **Neon**: the Neon version (lines 612–710) uses structure load/store combined with indexed `fmla` to vectorize the computation
+- **SVE2.1**: the SVE2.1 version (lines 355–462) extends the base SVE approach using multi-vector loads and stores
+- **SME2.1**: the SME2.1 version uses `movaz`/`svreadz_hor_za8_u8_vg4` to reinitialize `ZA` tile accumulators while moving data out to registers
@@ -1,16 +1,16 @@
 ---
-title: How to learn with SIMD Loops
+title: Learning with SIMD Loops
 weight: 5
 
 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---
 
-## Bridging the gap between specs and real code
+## Bridging the gap between specifications and real code
 
 SIMD Loops is a practical way to learn the intricacies of SVE and SME across modern Arm architectures. By providing small, runnable loop kernels with reference code and optimized variants, it closes the gap between architectural specifications and real applications.
 
-Whether you are moving from Neon or starting directly with SVE and SME, the project offers:
+Whether you're moving from Neon or starting directly with SVE and SME, the project offers:
 - A broad catalog of kernels that highlight specific features (predication, VLA programming, gather/scatter, streaming mode, ZA tiles)
 - Clear, readable implementations in C, ACLE intrinsics, and selected inline assembly
 - Flexible build targets and a simple runner to execute and validate loops
 
@@ -1,20 +1,21 @@
 ---
-title: "Code kata: perfect your SVE and SME skills with SIMD Loops"
+title: Learn SVE and SME programming with SIMD Loops
+
+description: Learn how to write high-performance SIMD code using the SIMD Loops project, with hands-on examples demonstrating SVE, SVE2, and SME2 features on Arm processors.
 
 minutes_to_complete: 30
 
 who_is_this_for: This is an advanced topic for software developers who want to learn how to use the full range of features available in SVE, SVE2, and SME2 to improve software performance on Arm processors.
 
 learning_objectives:
      - Improve SIMD code performance using Scalable Vector Extension (SVE) and Scalable Matrix Extension (SME)
-     - Describe what SIMD Loops contains and how kernels are organized across scalar, Neon, SVE,SVE2, and SME2 variants
+     - Describe what SIMD Loops contains and how kernels are organized across scalar, Neon, SVE, SVE2, and SME2 variants
      - Build and run a selected kernel with the provided runner and validate correctness against the C reference
      - Choose the appropriate build target to compare Neon, SVE/SVE2, and SME2 implementations
 
-
 prerequisites:
-    - An AArch64 computer running Linux or macOS. You can use cloud instances, refer to [Get started with Arm-based cloud instances](/learning-paths/servers-and-cloud-computing/csp/) for a list of cloud service providers. 
-    - Some familiarity with SIMD programming and Neon intrinsics.
+    - An AArch64 computer running Linux or macOS. You can use cloud instances, refer to [Get started with Arm-based cloud instances](/learning-paths/servers-and-cloud-computing/csp/) for a list of cloud service providers
+    - Some familiarity with SIMD programming and Neon intrinsics
     - Recent toolchains that support SVE/SME (GCC 13+ or Clang 16+ recommended)
 
 author:
@@ -48,6 +49,14 @@ further_reading:
         title: SVE Programming Examples
         link: https://developer.arm.com/documentation/dai0548/latest
         type: documentation
+    - resource:
+        title: SIMD Loops Repository
+        link: https://gitlab.arm.com/architecture/simd-loops
+        type: documentation
+    - resource:
+        title: Scalable Vector Extensions Resources
+        link: https://developer.arm.com/Architectures/Scalable%20Vector%20Extensions
+        type: documentation
     - resource:
         title: Port Code to Arm Scalable Vector Extension (SVE)
         link: /learning-paths/servers-and-cloud-computing/sve
 
@@ -1,17 +1,20 @@
 ---
-title: Learn how to write SIMD code on Arm using Rust
+title: Write SIMD code on Arm using Rust
 
 minutes_to_complete: 30
 
 description: Learn how to write SIMD code in Rust on Arm platforms using Neon intrinsics, portable SIMD abstractions, and optimize performance with architecture-specific instructions.
 
-who_is_this_for: This is an advanced topic for software developers who want take advantage of SIMD code on Arm systems using Rust.
+who_is_this_for: This is an advanced topic for software developers who want to take advantage of SIMD code on Arm systems using Rust.
 
 learning_objectives: 
-    - Learn how to write SIMD code with Rust on Arm.
+    - Write SIMD code with Rust using std::arch and Neon intrinsics on Arm
+    - Use portable SIMD abstractions with std::simd for cross-platform code
+    - Apply feature detection and target attributes for architecture-specific optimizations
+    - Compare C and Rust SIMD implementations and disassembly output
 
 prerequisites:
-    - An Arm-based computer with recent versions of a C compiler (Clang or GCC) and a Rust compiler installed.
+    - An Arm-based computer with recent versions of a C compiler (Clang or GCC) and a Rust compiler installed
 
 author: Konstantinos Margaritis
 
 
@@ -8,5 +8,5 @@ layout: learningpathall
 
 You have now seen a few examples of writing SIMD code on Arm with Rust. 
 
-Performance-wise, there is little difference between C and Rust as Rust is perfectly capable of generating the same assembly code as C in most cases. That said, if you want to program optimal SIMD code using the Arm ASIMD/Neon intrinsics, `std::arch` is the most obvious choice. If, however, your approach needs to be as portable as possible and you don't want to spend time providing multiple implementations for each architecture then `std::simd` is a very viable alternative (even though it's not part of the stable compiler yet).
+Performance-wise, there's little difference between C and Rust as Rust is perfectly capable of generating the same assembly code as C in most cases. That said, if you want to program optimal SIMD code using the Arm ASIMD/Neon intrinsics, `std::arch` is the most obvious choice. If, however, your approach needs to be as portable as possible and you don't want to spend time providing multiple implementations for each architecture then `std::simd` is a very viable alternative (even though it's not part of the stable compiler yet).
 
@@ -12,10 +12,10 @@ In this Learning Path, you will learn the basics of how to program SIMD code on
 
 Rust is a safe programming language with some key advantages:
 
-* It is a modern, strong-typed language.
-* Rust is memory safe by design: it is very difficult to introduce a bug like buffer overflow with Rust.
-* Strict language: the Rust compiler is very strict and does not let you make easy mistakes as you might with C.
-* The usage and support for Rust is expanding to many architectures and operating systems.
+* It's a modern, strong-typed language
+* Rust is memory safe by design: it's very difficult to introduce a bug like buffer overflow with Rust
+* Strict language: the Rust compiler is very strict and doesn't let you make easy mistakes as you might with C
+* The usage and support for Rust is expanding to many architectures and operating systems
 
 ## SIMD with Rust
 
@@ -24,19 +24,19 @@ Support for intrinsics in languages such as C and C++ is generally added by the
 Rust is a little different in that regard. While vendors are still very involved in providing the support for SIMD intrinsics in the compiler, there are other alternatives and approaches used to provide SIMD abstraction.
 
 Currently there are 2 SIMD programming interfaces in Rust:
-* One under `std::arch` which follows the C intrinsics as much as possible.
-* Another, `std::simd`, which provides a portable abstraction to SIMD programming so that code can just be recompiled across different architectures with more or less the same results. While there are similar libraries for C and C++, this is different in that the intent is for it to be merged as an official extension to the Rust standard library under `std::simd`.
+* One under `std::arch` which follows the C intrinsics as much as possible
+* Another, `std::simd`, which provides a portable abstraction to SIMD programming so that code can be recompiled across different architectures with more or less the same results. While there are similar libraries for C and C++, this is different in that the intent is for it to be merged as an official extension to the Rust standard library under `std::simd`
 
 You will learn how to use both of these interfaces to write code that uses Advanced SIMD/Neon instructions on an Arm CPU.
 
 Before you start, make sure you have the [Rust compiler installed](/install-guides/rust). 
 
-You can check if you have a working `rustc` compiler installed by running the following command:
+To check if you have a working `rustc` compiler installed, run the following command:
 
 ```bash
 rustc --version
 ```
-Your output should look similar to the following:
+The output should look similar to:
 
 ```bash
 rustc 1.79.0 (129f3b996 2024-06-10)
@@ -50,15 +50,15 @@ Switch to the `nightly` version to `rustc` by running the following:
 rustup default nightly
 ```
 
-Now run the version command again to check if you have the right version:
+To check the version again, run:
 
 ```bash
 rustc --version
 ```
-Your output should now look similar to the following:
+The output should now look similar to:
 
 ```bash
 rustc 1.82.0-nightly (92c6c0380 2024-07-21)
 ```
 
-Now that you have a working Rust compiler with the features supported in the nightly version, you can continue with building and running the examples included in this learning path. Please note that the code examples in this learning path are not optimally written for Rust (to do that you would have to use `cargo`, find the proper `crates` to do specific tasks, for example for 2D arrays, which would increase the complexity of this learning path significantly).
+Now that you have a working Rust compiler with the features supported in the nightly version, you can continue with building and running the examples included in this Learning Path. The code examples in this Learning Path aren't optimally written for Rust (to do that you would have to use `cargo`, find the proper `crates` to do specific tasks, for example for 2D arrays, which would increase the complexity of this Learning Path significantly).
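
For context on the `std::arch` interface that the intro-to-rust.md text describes, the following is a minimal sketch rather than code from the Learning Path. The helper name `add_f32` and the sample data are hypothetical. It assumes that the Neon intrinsics `vld1q_f32`, `vaddq_f32`, and `vst1q_f32` from `std::arch::aarch64` are used on AArch64, and it falls back to a scalar loop on other hosts so that it builds anywhere.

```rust
// Illustrative sketch only: `add_f32` is a hypothetical helper,
// not code taken from the Learning Path.

/// Adds `a` and `b` element-wise into `out`. All slices must have equal length.
pub fn add_f32(a: &[f32], b: &[f32], out: &mut [f32]) {
    assert!(a.len() == b.len() && a.len() == out.len());
    let mut i = 0;

    #[cfg(target_arch = "aarch64")]
    {
        use std::arch::aarch64::{vaddq_f32, vld1q_f32, vst1q_f32};
        // Process four f32 lanes per iteration with 128-bit Neon vectors.
        while i + 4 <= a.len() {
            // SAFETY: the loop condition guarantees four valid elements
            // starting at offset `i` in each of the three slices.
            unsafe {
                let va = vld1q_f32(a.as_ptr().add(i));
                let vb = vld1q_f32(b.as_ptr().add(i));
                vst1q_f32(out.as_mut_ptr().add(i), vaddq_f32(va, vb));
            }
            i += 4;
        }
    }

    // Scalar tail on AArch64, and the whole loop on other architectures.
    while i < a.len() {
        out[i] = a[i] + b[i];
        i += 1;
    }
}

fn main() {
    let a = [1.0f32, 2.0, 3.0, 4.0, 5.0];
    let b = [10.0f32, 20.0, 30.0, 40.0, 50.0];
    let mut out = [0.0f32; 5];
    add_f32(&a, &b, &mut out);
    println!("{:?}", out); // [11.0, 22.0, 33.0, 44.0, 55.0]
}
```

The five-element input exercises both paths on AArch64: one full vector iteration followed by a one-element scalar tail. A `std::simd` version would replace the `cfg` block with portable vector types, but it needs the nightly compiler, as the text above notes.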