Skip to content

Latest commit

 

History

History
109 lines (75 loc) · 3.49 KB

File metadata and controls

109 lines (75 loc) · 3.49 KB
title NumPy: Numerical Python
sidebar_label NumPy
description Mastering N-dimensional arrays, vectorization, and broadcasting: the foundational tools for numerical computing in ML.
tags
python
numpy
arrays
vectorization
broadcasting
mathematics-for-ml

If Python is the skeleton of Machine Learning, NumPy is the muscle. It is a library for scientific computing that introduces the ndarray (N-dimensional array), which is significantly faster and more memory-efficient than standard Python lists.

1. Why NumPy? (Speed & Efficiency)

Python lists are flexible but slow because they store pointers to objects scattered in memory. NumPy arrays store data in contiguous memory blocks, allowing the CPU to process them using SIMD (Single Instruction, Multiple Data).

graph LR
    List[Python List] --> L_Ptr["Scattered Memory (Slow)"]
    Array[NumPy Array] --> A_Cont["Contiguous Block (Fast)"]
    style Array fill:#e1f5fe,stroke:#01579b,color:#333

Loading

2. Array Anatomy and Shapes

In ML, we describe data by its Rank (number of dimensions) and Shape.

  • Scalar (Rank 0): A single number.
  • Vector (Rank 1): A line of numbers (e.g., a single sample's features).
  • Matrix (Rank 2): A table of numbers (e.g., a whole dataset).
  • Tensor (Rank 3+): Higher dimensional arrays (e.g., a batch of color images).
import numpy as np

# Creating a 2D Matrix
data = np.array([[1, 2, 3], [4, 5, 6]])
print(data.shape)  # Output: (2, 3) -> 2 rows, 3 columns

3. Vectorization

Vectorization is the practice of replacing explicit for loops with array expressions. This is how we achieve high performance in Python.

Instead of this:

# Slow: Element-wise addition with a loop
result = []
for i in range(len(a)):
    result.append(a[i] + b[i])

Do this:

# Fast: NumPy handles the loop in C
result = a + b

4. Broadcasting: The Magic of NumPy

Broadcasting allows NumPy to perform arithmetic operations on arrays with different shapes, provided they meet certain compatibility rules.

graph TD
    A["Matrix: (3, 3)"] 
    B["Scalar: (1,)"]
    A -->|Add| B
    B -->|Broadcast| B_Stretch["Stretched to (3, 3)"]
    B_Stretch --> Result["Element-wise Sum"]

Loading

Example: Adding a constant bias to every row in a dataset.

features = np.array([[10, 20], [30, 40]]) # Shape (2, 2)
bias = np.array([5, 5])                  # Shape (2,)
result = features + bias                 # [[15, 25], [35, 45]]

5. Critical ML Operations in NumPy

Operation NumPy Function ML Use Case
Dot Product np.dot(a, b) Calculating weighted sums in a neuron.
Reshaping arr.reshape(1, -1) Changing an image from 2D to a 1D feature vector.
Transposing arr.T Aligning dimensions for matrix multiplication.
Aggregations np.mean(), np.std() Normalizing data (Standard Scaling).
Slicing arr[:, 0] Extracting a single column (feature) from a dataset.

6. Slicing and Masking

NumPy allows for "Boolean Indexing," which is incredibly powerful for filtering data.

# Select all values in the array greater than 0.5
weights = np.array([0.1, 0.8, -0.2, 0.9])
positive_weights = weights[weights > 0] 
# Result: [0.1, 0.8, 0.9]

While NumPy handles the raw numbers, we need a way to manage data with column names, different data types, and missing values. For that, we turn to the most popular data manipulation library.