Commit d9e692a

update docs
1 parent d714de1 commit d9e692a

3 files changed

Lines changed: 208 additions & 74 deletions

README.md

Lines changed: 99 additions & 17 deletions
````diff
@@ -25,6 +25,8 @@ StructureFunctions.jl computes structure functions (SFs) from scattered data, ch
 ## Features
 
 - **Structure Functions**: 1st, 2nd, 3rd order; longitudinal & transverse projections in 1D, 2D, 3D
+- **In-place Mutating API**: Pre-allocated mutating functions (`calculate_structure_function!`) for zero-allocation loops (O(n_threads) multi-threaded chunked allocations)
+- **2D Joint-Probability Binning**: Natively accumulates both exact sums and contribution counts across distance and structure function value increment bins (`StructureFunction2D`)
 - **Typed Backend System**: Serial, Threaded, Distributed, GPU, Auto — choose your parallelization strategy
 - **Type-Stable Dispatch**: No runtime overhead from symbolic dispatch; all paths validated with JET
 - **Extensible Architecture**: Optional extensions for parallelization and GPU acceleration
````
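The 2D joint-probability feature keeps one sum and one count for every (distance bin, value bin) cell. Below is a minimal sketch of that accumulation in plain Julia. It assumes half-open distance bins `[r_min, r_max)`, sorted value-bin edges, and that samples outside every bin are skipped. `accumulate_joint!` is a hypothetical name and is not part of the package API. It also takes precomputed separations `r` and increment values `v` rather than raw positions and velocities.

```julia
# Hypothetical sketch: accumulate (r, v) samples into a distance × value grid.
# Assumes half-open distance bins [r_min, r_max) and sorted value-bin edges,
# so value bin j covers [value_edges[j], value_edges[j+1]).
function accumulate_joint!(sums::AbstractMatrix, counts::AbstractMatrix,
                           r::AbstractVector, v::AbstractVector,
                           distance_bins::AbstractVector{<:Tuple},
                           value_edges::AbstractVector)
    n_value = length(value_edges) - 1
    for (rk, vk) in zip(r, v)
        # Distance bin by linear scan, so bins need not be uniform.
        i = findfirst(b -> b[1] <= rk < b[2], distance_bins)
        i === nothing && continue
        # Value bin by binary search; skip values outside the outer edges.
        j = searchsortedlast(value_edges, vk)
        1 <= j <= n_value || continue
        sums[i, j] += vk
        counts[i, j] += 1
    end
    return nothing
end

distance_bins = [(0.0, 1.0), (1.0, 2.0)]
value_edges = [0.0, 0.5, 1.0]   # value bins [0, 0.5) and [0.5, 1.0)
sums = zeros(length(distance_bins), length(value_edges) - 1)
counts = zeros(size(sums))

r = [0.2, 0.7, 1.5, 1.9, 2.5, 0.3]   # r = 2.5 lies outside every distance bin
v = [0.125, 0.75, 0.25, 0.875, 0.375, 0.25]
accumulate_joint!(sums, counts, r, v, distance_bins, value_edges)
# counts == [2.0 1.0; 1.0 1.0], sums == [0.375 0.75; 0.25 0.875]
```

Dividing `sums ./ counts` cell by cell gives the mean increment value in each occupied cell. `counts` on its own is the unnormalized joint histogram.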
````diff
@@ -60,6 +62,31 @@ if nthreads() > 1
 end
 ```
 
+### Pre-allocated In-place Calculation
+
+For high-performance loops (e.g. over timesteps), you can pre-allocate memory buffers and run mutating calculations with zero heap allocation:
+
+```julia
+using StructureFunctions: Calculations as SFC, StructureFunctionTypes as SFT
+
+x = ([0.0, 1.0, 2.0], [0.0, 0.0, 0.0])
+u = ([1.0, 1.1, 1.2], [0.0, 0.05, 0.1])
+bins = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]
+sf_type = SFT.L2SFType()
+
+# Pre-allocate output arrays
+n_bins = length(bins)
+sums = zeros(Float64, n_bins)
+counts = zeros(Float64, n_bins)
+
+# Compute in-place (accumulates into provided buffers)
+SFC.calculate_structure_function!(sums, counts, sf_type, x, u, bins; backend=SFC.ThreadedBackend())
+
+# Obtain structure function values via division
+sf_values = sums ./ counts
+```
+
+
 ## Architecture
 
 ### Operator Types ✕ Result Container Pattern
````
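The README's in-place example ends with `sums ./ counts`. That division gives `NaN` (0/0) for any bin that received no pairs. The sketch below repeats the pre-allocate, accumulate, divide pattern across several timesteps with a self-contained serial kernel, and it reports empty bins as `missing`. `accumulate_sf2!` is a hypothetical stand-in for `SFC.calculate_structure_function!`, not the package's implementation. It assumes 1D positions, a scalar field, a second-order structure function, and half-open distance bins.

```julia
# Hypothetical serial kernel for a 1D second-order structure function: for each
# pair i < j, add (u[j] - u[i])^2 to the bin containing |x[j] - x[i]|.
# Distance bins are half-open, [r_min, r_max). Only adds; never resets buffers.
function accumulate_sf2!(sums::AbstractVector, counts::AbstractVector,
                         x::AbstractVector, u::AbstractVector,
                         bins::AbstractVector{<:Tuple})
    n = length(x)
    for i in 1:n-1, j in i+1:n
        r = abs(x[j] - x[i])
        for (k, (rmin, rmax)) in enumerate(bins)
            if rmin <= r < rmax
                sums[k] += (u[j] - u[i])^2
                counts[k] += 1
                break
            end
        end
    end
    return nothing
end

x = [0.0, 1.0, 2.0]
bins = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]
u0 = [1.0, 1.5, 2.5]

# Allocate and zero every buffer once, outside the time loop.
sums = zeros(length(bins))
counts = zeros(length(bins))
u = similar(u0)

for t in 1:3
    u .= u0 .* t   # hypothetical field at timestep t, written in place
    accumulate_sf2!(sums, counts, x, u, bins)
end

# Bin 1 never receives a pair (no separation falls in [0, 1)), so report
# empty bins as `missing` instead of letting 0/0 produce NaN.
sf_values = [c > 0 ? s / c : missing for (s, c) in zip(sums, counts)]
```

The kernel only adds into `sums` and `counts`. Leaving the buffers untouched between timesteps, as here, pools all steps into one estimate. Before reusing them for an unrelated calculation, zero them with `fill!(sums, 0)` and `fill!(counts, 0)`.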
````diff
@@ -192,7 +219,9 @@ result = SFC.calculate_structure_function(sf_type, x, u, bins;
 
 ## API Reference
 
-### Main Entry Point
+### Main Entry Points
+
+**1. Standard Allocating API:**
 
 ```julia
 calculate_structure_function(sf_type::AbstractStructureFunctionType,
````
````diff
@@ -206,35 +235,73 @@ calculate_structure_function(sf_type::AbstractStructureFunctionType,
                              show_progress=true) StructureFunction
 ```
 
-**Arguments**:
-- `sf_type`: Operator instance (e.g., `LongitudinalSecondOrderStructureFunctionType()`)
-- `x`: Position data (Tuple of 1D vectors OR N×M matrix for N dimensions, M points)
-- `u`: Velocity/field data (same shape as `x`)
-- `distance_bins`: Vector of `(r_min, r_max)` tuples defining bins
+**2. 2D Joint-Probability Allocating API:**
+
+```julia
+calculate_structure_function(sf_type::AbstractStructureFunctionType,
+                             x::Union{Tuple, Matrix},
+                             u::Union{Tuple, Matrix},
+                             distance_bins::AbstractVector{<:Tuple},
+                             value_bins::AbstractVector;
+                             backend=SerialBackend(),
+                             distance_metric=Euclidean(),
+                             verbose=true,
+                             show_progress=true) StructureFunction2D
+```
+
+**3. In-place Mutating API (Zero-Allocation):**
 
-**Returns**: `StructureFunction` result container
+```julia
+calculate_structure_function!(sums::AbstractVector,
+                              counts::AbstractVector,
+                              sf_type::AbstractStructureFunctionType,
+                              x::Union{Tuple, Matrix},
+                              u::Union{Tuple, Matrix},
+                              distance_bins::AbstractVector;
+                              backend=SerialBackend(),
+                              distance_metric=Euclidean(),
+                              verbose=true,
+                              show_progress=true) Nothing
+```
+
+**4. 2D Joint-Probability Mutating API (Zero-Allocation):**
+
+```julia
+calculate_structure_function!(sums_2d::AbstractMatrix,
+                              counts_2d::AbstractMatrix,
+                              sf_type::AbstractStructureFunctionType,
+                              x::Union{Tuple, Matrix},
+                              u::Union{Tuple, Matrix},
+                              distance_bins::AbstractVector,
+                              value_bins::AbstractVector;
+                              backend=SerialBackend(),
+                              distance_metric=Euclidean(),
+                              verbose=true,
+                              show_progress=true) Nothing
+```
 
-**See also**: `serial_calculate_structure_function`, `parallel_calculate_structure_function`, `gpu_calculate_structure_function`
+*Note: The mutating APIs accumulate (`+=` and `.+=`) directly into the provided output buffers. The caller is responsible for pre-zeroing the arrays.*
 
 ### Operator Types
 
-All inherit from `AbstractStructureFunctionType`. Instantiate with `()`:
+All inherit from `AbstractStructureFunctionType`. Instantiate with `()` or use shorthands:
 
 ```julia
 SFT.LongitudinalSecondOrderStructureFunctionType() # 2nd order, longitudinal
 SFT.TransverseSecondOrderStructureFunctionType() # 2nd order, transverse
 SFT.LongitudinalThirdOrderStructureFunctionType() # 3rd order, longitudinal
-SFT.TransverseThirdOrderStructureFunctionType() # 3rd order, transverse
-# ... and other variants (see docs/theory.md)
+# ... shorthands: L2SFType, T2SFType, L3SFType, T3SFType, S2SFType, S3SFType
 ```
 
 Each operator is **callable** (functors):
 ```julia
-sf_op = SFT.LongitudinalSecondOrderStructureFunctionType()
-sf_op(du, rhat) # Equivalent to: calculate_structure_function(sf_op, ...)
+sf_op = SFT.L2SFType()
+sf_op(du, rhat) # Computes L2SF increment value
 ```
 
-### Result Container
+### Result Containers
+
+**1. 1D Structure Function Container (`StructureFunction`):**
 
 ```julia
 struct StructureFunction{FT, OT, BT, VT} <: AbstractStructureFunction
````
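The README describes each operator as a callable (a functor) that maps `(du, rhat)` to a single pair's contribution. Below is a minimal sketch of that pattern. It assumes the textbook definitions: the longitudinal increment is δu·r̂, and the transverse part is the full perpendicular remainder. The type names are hypothetical, and the package's own operators (e.g. `SFT.L2SFType`) may differ in detail.

```julia
using LinearAlgebra: dot

# Hypothetical operator types; the names are illustrative, not the package's.
abstract type AbstractSFOperator end
struct LongitudinalSecondOrder <: AbstractSFOperator end
struct TransverseSecondOrder <: AbstractSFOperator end

# Callable structs ("functors"): each maps a velocity increment `du` and a unit
# separation vector `rhat` to a single pair's contribution.
(::LongitudinalSecondOrder)(du, rhat) = dot(du, rhat)^2

function (::TransverseSecondOrder)(du, rhat)
    du_l = dot(du, rhat)                  # longitudinal component
    return sum(abs2, du .- du_l .* rhat)  # squared perpendicular remainder
end

du = [3.0, 4.0]
rhat = [1.0, 0.0]
l2 = LongitudinalSecondOrder()
t2 = TransverseSecondOrder()

l2(du, rhat)   # 9.0
t2(du, rhat)   # 16.0
```

With these definitions, the two contributions add up to `sum(abs2, du)` (25.0 here). Dispatching on the operator's type rather than on a symbol is the usual way in Julia to get a separately compiled, type-stable pair loop for each operator.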
````diff
@@ -245,12 +312,27 @@ struct StructureFunction{FT, OT, BT, VT} <: AbstractStructureFunction
 end
 ```
 
+**2. 2D Joint-Probability Container (`StructureFunction2D`):**
+
+```julia
+struct StructureFunction2D{FT, OT, BT, VT, MT} <: AbstractStructureFunction
+    operator::OT # AbstractStructureFunctionType
+    distance_bins::BT # AbstractVector of (r_min, r_max)
+    value_bins::VT # AbstractVector of value bin edges
+    sums::MT # AbstractMatrix{FT} (distance x value)
+    counts::MT # AbstractMatrix{FT} (distance x value)
+end
+```
+
 **Access results**:
 ```julia
-result.values # SF values, one per bin
+# 1D
+result.values # SF values, one per bin
 result.distance_bins # Original input bins
-result.operator # The SF operator used
-result.order # Order of the SF
+
+# 2D
+result_2d.sums # Sum of SF values in each 2D cell
+result_2d.counts # Count of point pairs in each 2D cell
 ```
 
 ## Theory & References
````
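`StructureFunction2D` stores raw sums and counts rather than normalized values, so familiar quantities can be recovered with simple array reductions. The sketch below uses plain matrices with made-up numbers as stand-ins for `result_2d.sums` and `result_2d.counts`. Rows are distance bins and columns are value bins. Summing across the value axis recovers per-distance sums and counts. Normalizing each row of `counts` gives the fraction of pairs in each value bin for that distance bin. The collapsed result matches a direct 1D calculation only if every pair landed inside the value bins. Whether the package drops out-of-range values is an assumption to check.

```julia
# Stand-ins for result_2d.sums and result_2d.counts with made-up numbers.
# Rows are distance bins, columns are value bins.
sums_2d = [0.5  1.5  5.0;
           0.25 2.25 5.5]
counts_2d = [2.0 2.0 4.0;
             1.0 3.0 4.0]

# Collapse the value axis to get per-distance-bin sums, counts, and values.
sums_1d = vec(sum(sums_2d; dims=2))     # [7.0, 8.0]
counts_1d = vec(sum(counts_2d; dims=2)) # [8.0, 8.0]
sf_1d = sums_1d ./ counts_1d            # [0.875, 1.0]

# Fraction of pairs in each value bin, conditioned on the distance bin
# (each row sums to 1).
p_value_given_r = counts_2d ./ sum(counts_2d; dims=2)
```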

docs/architecture.md

Lines changed: 31 additions & 7 deletions
````diff
@@ -88,18 +88,42 @@ Each operator stores:
 - **Order** (n=2, 3, 4, ...) — which structure function order
 - **Projection** (if applicable) — which component to analyze
 
-### Result Container
+### Result Containers
 
+StructureFunctions.jl decouples raw accumulation, processed 1D structure functions, and 2D joint-probability binning into separate parametric result types inheriting from `AbstractStructureFunction`:
+
+1. **`StructureFunction`**: Stores the final processed structure function values.
+```julia
+struct StructureFunction{FT, OT, BT, VT} <: AbstractStructureFunction
+    operator::OT # AbstractStructureFunctionType
+    distance_bins::BT # AbstractVector of (r_min, r_max)
+    values::VT # AbstractVector{FT} — computed SF
+    order::Int # 1, 2, 3, ...
+end
+```
+
+2. **`StructureFunctionSumsAndCounts`**: Stores exact computed sums and point counts per bin. Ideal for distributed or chunked temporal aggregation.
+```julia
+struct StructureFunctionSumsAndCounts{FT, OT, BT, VT} <: AbstractStructureFunction
+    operator::OT
+    distance_bins::BT
+    sums::VT # Exact computed SF value sums
+    counts::VT # Integer counts of contributing pairs
+end
+```
+
+3. **`StructureFunction2D`**: Stores the 2D joint-probability binning grid (separation distance $r$ vs. SF value $v$).
 ```julia
-struct StructureFunction{T}
-    distance::Vector{T} # Bin centers
-    structure_function::Matrix{T} # S(distance, order)
-    sums::Matrix{T} # Numerator sums
-    counts::Vector{Int64} # Counts per bin
+struct StructureFunction2D{FT, OT, BT, VT, MT} <: AbstractStructureFunction
+    operator::OT
+    distance_bins::BT
+    value_bins::VT # Value increment bin edges
+    sums::MT # 2D matrix of exact sums (distance x value)
+    counts::MT # 2D matrix of contribution counts
 end
 ```
 
-Stores **both raw and processed data** so users can customize post-processing.
+All result containers support basic `Base` algebraic operations (like `+` and `+=`) to allow seamless aggregation across distributed processes or temporal timesteps.
 
 ---
 
````
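The architecture notes state that result containers support `+` and `+=` for aggregation. Below is a minimal sketch of how such an operator can be defined. It uses a hypothetical `SumsAndCounts` struct rather than the package's `StructureFunctionSumsAndCounts`. Adding sums and counts element-wise and dividing only at the end gives the pooled mean over all pairs. That pooled mean generally differs from averaging the per-step structure functions.

```julia
# Hypothetical container with the same sums/counts layout described above;
# it is not the package's StructureFunctionSumsAndCounts type.
struct SumsAndCounts{VT<:AbstractVector}
    distance_bins::Vector{Tuple{Float64, Float64}}
    sums::VT
    counts::VT
end

# Element-wise addition of two partial results computed on the same bins.
function Base.:+(a::SumsAndCounts, b::SumsAndCounts)
    a.distance_bins == b.distance_bins ||
        throw(ArgumentError("cannot add results computed on different bins"))
    return SumsAndCounts(a.distance_bins, a.sums .+ b.sums, a.counts .+ b.counts)
end

# Structure function values from (possibly pooled) sums and counts.
sf_values(sc::SumsAndCounts) = sc.sums ./ sc.counts

bins = [(0.0, 1.0), (1.0, 2.0)]
step1 = SumsAndCounts(bins, [1.0, 4.0], [1.0, 2.0])  # per-step SF: [1.0, 2.0]
step2 = SumsAndCounts(bins, [9.0, 2.0], [3.0, 2.0])  # per-step SF: [3.0, 1.0]

total = step1 + step2   # `total += step3` would also work, since it lowers to `+`
pooled = sf_values(total)                             # [2.5, 1.5]
naive = (sf_values(step1) .+ sf_values(step2)) ./ 2   # [2.0, 1.5]
```

For many partial results, `reduce(+, parts)` folds them with the same method.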
docs/backends.md

Lines changed: 78 additions & 50 deletions
````diff
@@ -40,40 +40,61 @@ Single-threaded, reference implementation. All computations run on the calling t
 - ❌ Large data (>10M points): Too slow
 - ❌ Multi-CPU available: Wastes resources
 
-### Example
+### Examples
+
+**1. Allocating API:**
 
 ```julia
-using StructureFunctions
+using StructureFunctions: Calculations as SFC, StructureFunctionTypes as SFT
 
 # Small test dataset
-x = randn(1000, 2) # 1000 points in 2D
-u = randn(1000, 2) # velocity at each point
+x = (randn(1000), randn(1000))
+u = (randn(1000), randn(1000))
+bins = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]
 
-# Use SerialBackend explicitly
-backend = SerialBackend()
-bins = 10:10:100 # 10 distance bins
-
-result = calculate_structure_function(
-    FullVectorStructureFunction{Float64}(order=2),
+# Calculate using SerialBackend explicitly
+result = SFC.calculate_structure_function(
+    SFT.S2SFType(),
     x, u, bins;
-    backend=backend,
+    backend=SFC.SerialBackend(),
     show_progress=true
 )
 
-println("Structure Function at bin 50: $(result.structure_function[50, 1])")
+println("Structure Function values: ", result.values)
+```
+
+**2. Pre-allocated In-place API:**
+
+```julia
+using StructureFunctions: Calculations as SFC, StructureFunctionTypes as SFT
+
+x = (randn(1000), randn(1000))
+u = (randn(1000), randn(1000))
+bins = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]
+
+# Pre-allocate output arrays
+n_bins = length(bins)
+sums = zeros(Float64, n_bins)
+counts = zeros(Float64, n_bins)
+
+# Compute in-place (accumulates directly into provided arrays)
+SFC.calculate_structure_function!(
+    sums, counts, SFT.S2SFType(),
+    x, u, bins;
+    backend=SFC.SerialBackend()
+)
 ```
 
 ### Performance Notes
 - O(N²) complexity; for N=1M, expect ~1 sec
-- Light memory footprint (just result container + temporary arrays)
-- Good for validation before scaling up
+- Mutating `calculate_structure_function!` completely avoids allocating temporary arrays, making it ideal for temporal loops.
 
 ---
 
 ## ThreadedBackend
 
 ### Definition
-Multi-threaded execution using OhMyThreads.jl. Distributes pairwise calculations across Threads.nthreads() worker threads.
+Multi-threaded execution using OhMyThreads.jl. Distributes pairwise calculations across `Threads.nthreads()` worker threads.
 
 ### When to Use
 -**Medium datasets**: 10M–500M points
````
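The serial backend is documented as O(N²). Assuming it visits each unordered pair of points exactly once, its work grows as N(N-1)/2. The snippet below shows what that means for a few dataset sizes. `npairs` is a hypothetical helper.

```julia
# Number of unordered point pairs visited by an all-pairs loop.
# With Int64 this is exact for n up to about 3 × 10^9.
npairs(n::Integer) = n * (n - 1) ÷ 2

for n in (1_000, 50_000, 1_000_000)
    println(n, " points: ", npairs(n), " pairs")
end
# 1000 points: 499500 pairs
# 50000 points: 1249975000 pairs
# 1000000 points: 499999500000 pairs
```

Doubling N roughly quadruples the pair count, so the timing of a small run does not extrapolate linearly to larger datasets.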
````diff
@@ -94,56 +115,63 @@ Multi-threaded execution using OhMyThreads.jl. Distributes pairwise calculations
 OhMyThreads = "67456a42-ebe4-4781-8ad1-67f7eda8d8f7"
 ```
 
-### Example
+### Examples
+
+**1. Allocating API:**
 
 ```julia
-using StructureFunctions
-using Base.Threads
+using StructureFunctions: Calculations as SFC, StructureFunctionTypes as SFT
 
-# Set number of threads before running
-# Either: JULIA_NUM_THREADS=8 julia script.jl
-# Or in REPL: Threads.nthreads() -> check current count
+N = 50_000
+x = (randn(N), randn(N))
+u = (randn(N), randn(N))
+bins = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]
 
-# Medium dataset
-N = 50_000_000 # 50M points
-x = randn(N, 2)
-u = randn(N, 2)
+result = SFC.calculate_structure_function(
+    SFT.L2SFType(),
+    x, u, bins;
+    backend=SFC.ThreadedBackend(),
+    show_progress=true
+)
+```
 
-backend = ThreadedBackend()
-bins = 10:10:1000 # 100 distance bins
+**2. Pre-allocated In-place API:**
 
-result = calculate_structure_function(
-    FullVectorStructureFunction{Float64}(order=2),
+```julia
+using StructureFunctions: Calculations as SFC, StructureFunctionTypes as SFT
+
+N = 50_000
+x = (randn(N), randn(N))
+u = (randn(N), randn(N))
+bins = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]
+
+# Pre-allocate output arrays
+n_bins = length(bins)
+sums = zeros(Float64, n_bins)
+counts = zeros(Float64, n_bins)
+
+# Compute in-place (accumulates directly into provided arrays)
+SFC.calculate_structure_function!(
+    sums, counts, SFT.L2SFType(),
     x, u, bins;
-    backend=backend,
-    show_progress=true # Progress bar shows thread work distribution
+    backend=SFC.ThreadedBackend()
 )
-
-# For 8 threads, expect ~2-8x speedup over serial
 ```
 
-### Performance Characteristics
+### Performance & Memory Efficiency
 
-**Scaling** (measured on 4-core system):
+The modern mutating threaded backend (`threaded_calculate_structure_function!`) utilizes a **chunked reduction** strategy via `OhMyThreads.chunks` to divide point indexes into exactly `nthreads()` sub-ranges.
 
-| N | Serial (s) | Threaded (s) | Speedup |
-|---|-----------|------------|---------|
-| 1M | 0.05 | 0.08 | 0.6x (overhead) |
-| 10M | 0.6 | 0.25 | 2.4x |
-| 50M | 3.5 | 1.2 | 2.9x |
-| 100M | 8 | 2.3 | 3.5x |
-
-**Notes**:
-- Speedup is sublinear (not 4x on 4 cores) due to NUMA effects and atomic reductions
-- Optimal for scenarios where data fits in L3 cache per thread
-- Progress bar updates in real-time showing all threads' work
+* **Chunked Workspaces**: Each task/thread allocates exactly **one local buffer pair** for its entire chunk (rather than per-point).
+* **Memory Scaling**: This reduces the number of thread-local heap allocations to exactly **$O(n_{\text{threads}})$**, compared to the highly wasteful **$O(N_{\text{points}})$** allocation pattern in naive map-reduce implementations.
+* **Cache Locality**: This optimization maximizes L1/L2 cache locality while maintaining complete thread safety and task-migration protection.
 
 ### Thread Safety
 
-ThreadedBackend uses **thread-local buffers** to avoid race conditions:
-- Each thread has its own workspace
-- No atomic operations (faster than distributed)
-- Completely safe; no possibility of data races
+ThreadedBackend uses **thread-local reduction buffers** to avoid race conditions:
+- Each task computes on its own local chunk workspace.
+- The results are folded together thread-safely using a parallel tree reduction.
+- No global locks or atomic conflicts are triggered, maximizing performance.
 
 ---
 
````
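The chunked-reduction strategy described for the threaded backend uses one buffer pair per chunk and merges the buffers at the end. It can be sketched with Base's `Threads.@spawn` alone. This sketch illustrates the pattern and is not the package's OhMyThreads-based `threaded_calculate_structure_function!`. `accumulate_rows!` and `threaded_accumulate!` are hypothetical names. The kernel computes a 1D second-order structure function with half-open distance bins.

```julia
using Base.Threads: @spawn, nthreads

# Hypothetical kernel: for outer indices `irange`, add (u[j] - u[i])^2 for every
# j > i into the bin containing |x[j] - x[i]| (half-open bins [r_min, r_max)).
function accumulate_rows!(sums, counts, x, u, bins, irange)
    n = length(x)
    for i in irange, j in i+1:n
        r = abs(x[j] - x[i])
        k = findfirst(b -> b[1] <= r < b[2], bins)
        k === nothing && continue
        sums[k] += (u[j] - u[i])^2
        counts[k] += 1
    end
    return nothing
end

# Chunked reduction: one contiguous chunk of outer indices per thread, one
# local buffer pair per task, then a sequential merge into the caller's arrays
# (which are accumulated into, not overwritten).
function threaded_accumulate!(sums, counts, x, u, bins)
    n = length(x)
    chunk_len = max(1, cld(n, nthreads()))
    tasks = map(Iterators.partition(1:n, chunk_len)) do irange
        @spawn begin
            local_sums = zeros(eltype(sums), length(bins))
            local_counts = zeros(eltype(counts), length(bins))
            accumulate_rows!(local_sums, local_counts, x, u, bins, irange)
            (local_sums, local_counts)
        end
    end
    for t in tasks
        s, c = fetch(t)
        sums .+= s
        counts .+= c
    end
    return nothing
end

x = collect(0.0:0.5:5.0)
u = sin.(x)
bins = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]
sums = zeros(length(bins))
counts = zeros(length(bins))
threaded_accumulate!(sums, counts, x, u, bins)
```

With contiguous chunks, the chunk holding the smallest `i` does the most work, because row `i` has `n - i` partners. Interleaving outer indices across tasks (round-robin) is a common way to even out the load. This sketch merges the partial buffers sequentially instead of using the parallel tree reduction the docs describe. Because the order of floating-point additions changes, the sums can also differ from a serial run in the last bits.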