
title: "NumPy: Numerical Python"
sidebar_label: NumPy
description: "Mastering N-dimensional arrays, vectorization, and broadcasting: the foundational tools for numerical computing in ML."

1. Why NumPy? (Speed & Efficiency)

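Python lists are flexible but slow for numerical work: each element is a pointer to a separate Python object, and those objects can be scattered across memory. NumPy arrays store values of a single data type in one contiguous memory block, so compiled loops can make good use of the CPU cache and of SIMD (Single Instruction, Multiple Data) instructions.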

graph LR
    List[Python List] --> L_Ptr["Scattered Memory (Slow)"]
    Array[NumPy Array] --> A_Cont["Contiguous Block (Fast)"]
    style Array fill:#e1f5fe,stroke:#01579b,color:#333
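A rough way to see the difference is to time the same computation on a list and on an array. The sketch below is illustrative only: the names `py_list`, `np_array`, and the two helper functions are made up for this example, and the measured times depend on the machine, so no specific numbers are claimed.

```python
import timeit

import numpy as np

n = 100_000
py_list = list(range(n))                 # list of pointers to Python int objects
np_array = np.arange(n, dtype=np.int64)  # one contiguous buffer of 64-bit integers

# The array's values sit side by side in memory, 8 bytes each
print(np_array.flags["C_CONTIGUOUS"], np_array.itemsize, np_array.nbytes)


def list_sum_of_squares():
    # Pure-Python loop: every multiplication goes through the interpreter
    return sum(x * x for x in py_list)


def array_sum_of_squares():
    # One array expression: the loop runs in compiled code
    return int(np.dot(np_array, np_array))


# Both approaches compute the same value
assert list_sum_of_squares() == array_sum_of_squares()

list_time = timeit.timeit(list_sum_of_squares, number=10)
array_time = timeit.timeit(array_sum_of_squares, number=10)
print(f"list:  {list_time:.4f} s")
print(f"array: {array_time:.4f} s")
```

On typical hardware the array version should be noticeably faster, but treat the timings as a sanity check rather than a benchmark.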

2. Array Anatomy and Shapes

In ML, we describe data by its Rank (the number of dimensions, `ndim` in NumPy) and its Shape (the size along each dimension).

Scalar (Rank 0): A single number.
Vector (Rank 1): A line of numbers (e.g., a single sample's features).
Matrix (Rank 2): A table of numbers (e.g., a whole dataset).
Tensor (Rank 3+): Higher dimensional arrays (e.g., a batch of color images).

import numpy as np

# Creating a 2D Matrix
data = np.array([[1, 2, 3], [4, 5, 6]])
print(data.shape)  # Output: (2, 3) -> 2 rows, 3 columns
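The four ranks listed above can be checked directly with the `ndim` and `shape` attributes. This is a small sketch; the variable names and the image batch dimensions (8 images of 32x32 pixels with 3 color channels) are arbitrary choices for illustration.

```python
import numpy as np

scalar = np.array(3.14)                    # rank 0: a single number
vector = np.array([5.1, 3.5, 1.4])         # rank 1: one sample with 3 features
matrix = np.array([[1, 2, 3], [4, 5, 6]])  # rank 2: 2 samples, 3 features each
images = np.zeros((8, 32, 32, 3))          # rank 4: hypothetical batch of color images

for name, arr in [("scalar", scalar), ("vector", vector),
                  ("matrix", matrix), ("images", images)]:
    print(f"{name}: ndim={arr.ndim}, shape={arr.shape}")
# scalar: ndim=0, shape=()
# vector: ndim=1, shape=(3,)
# matrix: ndim=2, shape=(2, 3)
# images: ndim=4, shape=(8, 32, 32, 3)
```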

3. Vectorization

Vectorization is the practice of replacing explicit for loops with array expressions. The loop still runs, but inside NumPy's compiled code rather than in the Python interpreter, which is where most of the speedup comes from.

Instead of this:

# Slow: Element-wise addition with a loop
a = [1, 2, 3]
b = [4, 5, 6]
result = []
for i in range(len(a)):
    result.append(a[i] + b[i])

Do this:

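# Fast: NumPy handles the loop in C (a and b must be arrays; + on lists concatenates)
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
result = a + b  # array([5, 7, 9])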
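The same pattern applies to more ML-flavored computations. As a sketch, here is mean squared error written both ways; the arrays `y_true` and `y_pred` are made-up sample values, not data from this tutorial.

```python
import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# Loop version: one Python-level iteration per element
total = 0.0
for i in range(len(y_true)):
    total += (y_true[i] - y_pred[i]) ** 2
mse_loop = total / len(y_true)

# Vectorized version: subtraction, squaring, and averaging as array expressions
mse_vec = np.mean((y_true - y_pred) ** 2)

print(mse_loop, mse_vec)  # both are 0.375
```

The vectorized line also reads closer to the mathematical definition, which tends to make errors easier to spot.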

4. Broadcasting: The Magic of NumPy

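Broadcasting allows NumPy to perform arithmetic operations on arrays with different shapes, provided the shapes are compatible. NumPy compares the shapes from the last dimension backwards; two dimensions are compatible when they are equal or when one of them is 1, and a missing leading dimension is treated as 1.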

graph TD
    A["Matrix: (3, 3)"] 
    B["Scalar: ()"]
    A -->|Add| B
    B -->|Broadcast| B_Stretch["Stretched to (3, 3)"]
    B_Stretch --> Result["Element-wise Sum"]

Example: Adding a constant bias to every row in a dataset.

features = np.array([[10, 20], [30, 40]]) # Shape (2, 2)
bias = np.array([5, 5])                  # Shape (2,)
result = features + bias                 # [[15, 25], [35, 45]]
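The sketch below walks through three cases: a shape `(2,)` array stretched across rows, a shape `(3, 1)` column stretched across columns, and a shape mismatch that NumPy rejects. The array `X` and the offsets are invented for illustration.

```python
import numpy as np

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])              # shape (3, 2)

# Case 1: (3, 2) - (2,) -> the column means are applied to every row
col_means = X.mean(axis=0)               # shape (2,), values [2., 20.]
centered = X - col_means                 # [[-1., -10.], [0., 0.], [1., 10.]]

# Case 2: (3, 2) + (3, 1) -> each row gets its own offset
row_offsets = np.array([[100.0], [200.0], [300.0]])  # shape (3, 1)
shifted = X + row_offsets                # [[101., 110.], [202., 220.], [303., 330.]]

# Case 3: (3, 2) + (3,) -> trailing dimensions 2 and 3 differ, so this fails
try:
    X + np.array([1.0, 2.0, 3.0])
except ValueError as exc:
    print("broadcast error:", exc)
```

Case 1 is the mean-centering step used in feature scaling; no copy of `col_means` is materialized for each row.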

5. Critical ML Operations in NumPy

| Operation | NumPy Function | ML Use Case |
| --- | --- | --- |
| Dot Product | `np.dot(a, b)` | Calculating weighted sums in a neuron. |
| Reshaping | `arr.reshape(1, -1)` | Flattening a 2D image into a single-row feature vector of shape `(1, n)`. |
| Transposing | `arr.T` | Aligning dimensions for matrix multiplication. |
| Aggregations | `np.mean()`, `np.std()` | Normalizing data (Standard Scaling). |
| Slicing | `arr[:, 0]` | Extracting a single column (feature) from a dataset. |
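Each operation in the table can be tried on a tiny example. This is a minimal sketch with a made-up feature matrix `X`, weight vector `w`, and a stand-in "image" built with `np.arange`; it is not meant as a full preprocessing pipeline.

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])               # 3 samples, 2 features
w = np.array([0.5, -1.0])                # hypothetical neuron weights

# Dot product: one weighted sum per sample
weighted_sums = np.dot(X, w)             # [-1.5, -2.5, -3.5]

# Reshaping: a 3x4 "image" becomes a single row of 12 values
image = np.arange(12).reshape(3, 4)
flat_row = image.reshape(1, -1)          # shape (1, 12)

# Transposing: (2, 3) @ (3, 2) gives a (2, 2) matrix
gram = X.T @ X

# Aggregations: standard scaling, column by column
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

# Slicing: the first feature of every sample
first_feature = X[:, 0]                  # [1., 3., 5.]

print(weighted_sums, flat_row.shape, gram.shape, first_feature)
```

Note that `reshape(1, -1)` keeps two dimensions; `image.reshape(-1)` or `image.ravel()` would give a true 1D array of shape `(12,)`.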

6. Slicing and Masking

NumPy allows for "Boolean Indexing," which is incredibly powerful for filtering data.

# Select all values in the array greater than 0
weights = np.array([0.1, 0.8, -0.2, 0.9])
positive_weights = weights[weights > 0]
# Result: [0.1, 0.8, 0.9]
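A boolean mask is itself an array, so it can be stored, combined with other masks, or used on the left side of an assignment. The sketch below reuses the `weights` values from the example above; the thresholds are arbitrary.

```python
import numpy as np

weights = np.array([0.1, 0.8, -0.2, 0.9])

# The comparison produces a boolean array of the same shape
mask = weights > 0.5                     # [False, True, False, True]
large = weights[mask]                    # [0.8, 0.9]

# Combine conditions with & and |; the parentheses are required
in_range = weights[(weights > 0) & (weights < 0.85)]   # [0.1, 0.8]

# Masks also work for assignment: zero out the negative entries in a copy
clipped = weights.copy()
clipped[clipped < 0] = 0.0               # [0.1, 0.8, 0.0, 0.9]

print(mask, large, in_range, clipped)
```

Boolean indexing returns a copy of the selected values, so modifying `large` leaves `weights` unchanged.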

While NumPy handles the raw numbers, we need a way to manage data with column names, different data types, and missing values. For that, we turn to the most popular data manipulation library.
