Commit bd33b73

Merge pull request #72 from SebKrantz/africadata
Africadata
2 parents 5a60d8e + 4ef4ba4 commit bd33b73

6 files changed

Lines changed: 221 additions & 10 deletions

.github/workflows/claude-code-review.yml

Lines changed: 3 additions & 2 deletions
@@ -1,8 +1,9 @@
 name: Claude Code Review
 
 on:
-  pull_request:
-    types: [opened, synchronize, ready_for_review, reopened]
+  workflow_dispatch:
+  # pull_request:
+  #   types: [opened, synchronize, ready_for_review, reopened]
     # Optional: Only run on specific file changes
     # paths:
     #   - "src/**/*.ts"

CLAUDE.md

Lines changed: 212 additions & 0 deletions
@@ -0,0 +1,212 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Overview
+
+`flownet` is an R package for transport modeling that implements network processing, route enumeration, and traffic assignment algorithms. The package is maintained by CPCS transport consultants and provides high-performance tools through a combination of R, fastverse packages, and custom C implementations.
+
+## Development Commands
+
+### Package Building and Testing
+
+```r
+# Build and install the package
+devtools::install()
+
+# Run tests
+devtools::test()
+
+# Run specific test file
+testthat::test_file("tests/testthat/test-assignment.R")
+
+# Check package (like R CMD check)
+devtools::check()
+
+# Build documentation
+devtools::document()
+
+# Build vignettes
+devtools::build_vignettes()
+```
+
+### C Code Compilation
+
+The package includes custom C implementations in `src/` for performance-critical operations:
+- `path_sized_logit.c` - Core path-sized logit algorithm
+- `utils.c` - Utility functions for path operations
+- `init.c` - Registration of C functions
+
+After modifying C code:
+```r
+devtools::clean_dll()  # Clean compiled objects
+devtools::load_all()   # Recompile and reload
+```
+
+### Testing Commands
+
+```bash
+# Run R CMD check from terminal
+R CMD build . && R CMD check flownet_*.tar.gz
+
+# Run tests with coverage
+Rscript -e "covr::package_coverage()"
+```
+
+## Architecture
+
+### Core Algorithms
+
+The package centers on two traffic assignment methods:
+
+1. **All-or-Nothing (AoN)**: Fast assignment that allocates all flow to shortest paths. Implementation uses batched shortest path computation grouped by origin nodes for efficiency.
+
+2. **Path-Sized Logit (PSL)**: Sophisticated stochastic assignment considering multiple alternative routes with overlap correction. The algorithm:
+   - Enumerates candidate routes using distance matrices and geographic filtering
+   - Filters routes by detour factor (`detour.max`) and angle constraints (`angle.max`)
+   - Computes path-size factors to penalize overlapping routes
+   - Assigns flows probabilistically based on generalized costs
+
+### Route Enumeration Strategy
+
+The PSL method uses a two-stage approach to avoid computing implausible paths:
+
+1. **Pre-selection**: Uses precomputed distance matrices to identify promising intermediate nodes based on total cost (origin→intermediate→destination)
+2. **Geographic filtering**: When `angle.max` is specified and coordinates are available, filters nodes using the triangle equation with geodesic distances
+3. **Path computation**: Only computes actual paths for pre-selected candidates, then filters duplicates
+
+This strategy dramatically reduces computational cost compared to enumerating all possible paths.
+
+### Network Processing Pipeline
+
+Typical workflow for processing spatial networks:
+
+1. **`linestrings_to_graph()`**: Convert sf LINESTRING geometries to graph data frames with node coordinates (FX, FY, TX, TY)
+2. **`create_undirected_graph()`**: Normalize edge directions and aggregate bidirectional links
+3. **`consolidate_graph()`**: Contract intermediate nodes (degree-2 nodes) recursively, merging edges while preserving important nodes
+4. **`simplify_network()`**: Further reduce network size using either:
+   - Shortest-paths method: Keep only edges traversed by shortest paths between key nodes
+   - Cluster method: Spatially cluster nodes using leaderCluster and contract graph
+
+### Parallelization
+
+The package uses `mirai` for asynchronous parallelism:
+- Work is split across OD-pairs and distributed to daemon processes
+- Each daemon processes a subset independently
+- Results are aggregated after collection
+- The `nthreads` parameter controls the number of parallel workers
+
+### C Integration
+
+Performance-critical operations are implemented in C and called via `.Call()`:
+- `C_compute_path_sized_logit`: Core PSL computation including overlap detection and probability calculation
+- `C_check_path_duplicates`: Detect paths with duplicate edges (invalid routes)
+- `C_assign_flows_to_paths`: Batch flow assignment for AoN method
+- `C_mark_edges_traversed`: Track edge usage for simplify_network
+- `C_set_vector_elt`: Efficient list element assignment
+
+## Code Organization
+
+### Main Source Files
+
+- **`R/assignment.R`** (802 lines): Contains `run_assignment()` and both AoN and PSL core functions. The PSL implementation handles distance matrix chunking for large networks and coordinates with C functions for path overlap calculations.
+
+- **`R/utils.R`** (1273 lines): Network processing functions including:
+  - Graph conversion utilities (`linestrings_to_graph`, `nodes_from_graph`, etc.)
+  - `consolidate_graph()`: Recursive node contraction with sophisticated degree tracking
+  - `simplify_network()`: Two methods for network reduction
+  - `melt_od_matrix()`: OD matrix format conversion
+
+- **`R/data.R`**: Documentation for included datasets (Africa network, cities, trade flows)
+
+### C Source Files
+
+- **`src/path_sized_logit.c`**: Implements path-size factor computation and flow assignment
+- **`src/utils.c`**: Helper functions for path operations
+- **`src/init.c`**: C function registration for .Call interface
+
+### Tests
+
+Tests are organized by functionality in `tests/testthat/`:
+- `test-assignment.R`: Traffic assignment methods
+- `test-graph-utils.R`: Graph utility functions
+- `test-network-processing.R`: Network conversion and processing
+- `test-od-matrix.R`: OD matrix operations
+- `test-consolidation.R`: Graph consolidation
+
+## Dependencies
+
+### Core Dependencies
+- **collapse** (≥ 2.1.5): Fast data transformations, used extensively for grouping, aggregation, and memory-efficient operations
+- **igraph** (≥ 2.1.4): Shortest path algorithms via Dijkstra
+- **sf** (≥ 1.0.0): Spatial data handling for LINESTRING networks
+- **geodist** (≥ 0.1.1): Fast haversine distance calculations for geographic filtering
+- **leaderCluster** (≥ 1.5.0): Efficient spatial clustering for network simplification
+- **mirai** (≥ 2.5.2): Asynchronous parallelism
+- **kit** (≥ 0.0.21): Fast tabulation and vectorized operations
+
+## Key Implementation Details
+
+### Distance Matrix Strategy
+
+The package uses adaptive distance matrix computation:
+- If network size ≤ `sqrt(dmat.max.size)`, precompute full distance matrix once
+- Otherwise, compute in chunks as needed during OD-pair iteration
+- Separate geodesic distance matrices are used for angle-based filtering
+
+### Graph Representation
+
+Graphs are represented as data frames with:
+- `from`, `to`: Node IDs (integers)
+- `FX`, `FY`, `TX`, `TY`: Node coordinates (for spatial operations)
+- `edge`: Edge identifier (optional, regenerated by many functions)
+- Cost/attribute columns (e.g., `duration`, `cost`, `distance`)
+
+Internally, igraph is used for shortest path computation, but the primary data structure is a data frame for flexibility and integration with fastverse tools.
+
+### Node Consolidation Algorithm
+
+The `consolidate_graph()` function uses a sophisticated multi-pass approach:
+1. Drop loop edges, duplicates, and singleton edges (optional)
+2. Identify degree-2 nodes (or nodes with deg_from=1 and deg_to=1 for directed graphs)
+3. For undirected graphs, orient edges so intermediate nodes appear as "from" in one edge and "to" in another
+4. Merge edges through intermediate nodes, tracking groups via `gid` vector
+5. Aggregate edge attributes using collapse::collap()
+6. Repeat recursively if `recursive = "full"` until no more consolidation possible
+
+The `by` parameter allows preserving mode/type boundaries by preventing consolidation across different link characteristics.
+
+## Working with Spatial Data
+
+The package integrates with sf for spatial operations:
+
+```r
+# Typical pattern for mapping OD zones to network nodes
+nodes <- nodes_from_graph(graph, sf = TRUE)
+nearest_nodes <- nodes$node[st_nearest_feature(od_zones, nodes)]
+```
+
+Coordinate columns (FX, FY, TX, TY) are preserved through most operations and can be used to convert back to sf LINESTRING objects with `linestrings_from_graph()`.
+
+## Performance Considerations
+
+- Use `method = "AoN"` for large networks when route alternatives are not needed (much faster)
+- Adjust `detour.max` and `angle.max` to control PSL computation time (lower values = faster)
+- Set `unique.cost = TRUE` to deduplicate routes with same total cost
+- Use `dmat.max.size` to control memory usage for large networks
+- Enable multithreading with `nthreads` for large OD matrices
+- Consider consolidating and simplifying networks before assignment to reduce computational burden
+
+## Package Structure
+
+This is a standard R package with:
+- DESCRIPTION: Package metadata and dependencies
+- NAMESPACE: Exported functions (managed by roxygen2)
+- R/: R source code
+- src/: C source code with compiled .so/.dll
+- man/: Documentation (auto-generated from roxygen2 comments)
+- tests/testthat/: Test suite
+- vignettes/: Package vignettes
+- data/: Included datasets (Africa network, cities, trade)
+
+Use roxygen2 for documentation - add `#'` comments above functions and run `devtools::document()` to update NAMESPACE and man/ files.
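
Note on the Path-Sized Logit description in CLAUDE.md: the step "computes path-size factors to penalize overlapping routes" can be illustrated with a small R sketch. This follows the common textbook path-size logit formulation and is not taken from `src/path_sized_logit.c`. The function name `path_size_logit`, the parameters `beta` and `beta_ps`, and the toy data are all assumptions made for illustration.

```r
# Toy example (hypothetical data): three candidate routes for one OD pair,
# each given as a vector of edge IDs.
paths <- list(c(1L, 2L, 3L), c(1L, 4L, 3L), c(5L, 6L))
edge_cost <- c(2, 3, 2, 4, 5, 3)  # generalized cost of edges 1..6

# Textbook path-size logit (assumed form, not flownet's C code).
# `beta` scales cost sensitivity, `beta_ps` scales the overlap correction.
path_size_logit <- function(paths, edge_cost, beta = 1, beta_ps = 1) {
  # Number of candidate routes that use each edge
  n_use <- tabulate(unlist(paths), nbins = length(edge_cost))
  # Total generalized cost of each route
  path_cost <- vapply(paths, function(p) sum(edge_cost[p]), numeric(1))
  # Path-size factor: cost-weighted share of each route that is not shared
  ps <- vapply(paths, function(p) sum(edge_cost[p] / n_use[p]), numeric(1)) / path_cost
  # Logit utilities with the log path-size correction
  util <- -beta * path_cost + beta_ps * log(ps)
  w <- exp(util - max(util))  # subtract the maximum for numerical stability
  w / sum(w)
}

prob <- path_size_logit(paths, edge_cost)
flows <- 100 * prob  # split a hypothetical OD flow of 100 across the routes
```

In this toy case the second and third routes have the same total cost, but the third shares no edges with the others, so it receives the larger probability of the two.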

DESCRIPTION

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 Package: flownet
 Type: Package
 Title: Transport Modeling: Network Processing, Route Enumeration, and Traffic Assignment
-Version: 0.2.1.9000
+Version: 0.2.2
 Authors@R: c(person("Sebastian", "Krantz", email = "sebastian.krantz@graduateinstitute.ch", role = c("aut", "cre")),
              person("Kamol", "Roy", role = "ctb"))
 Description: High-performance tools for transport modeling - network processing, route

NAMESPACE

Lines changed: 0 additions & 1 deletion
@@ -89,7 +89,6 @@ importFrom(kit,fpmin)
 importFrom(kit,iif)
 importFrom(leaderCluster,leaderCluster)
 importFrom(mirai,daemons)
-importFrom(mirai,everywhere)
 importFrom(mirai,mirai_map)
 importFrom(progress,progress_bar)
 importFrom(sf,st_as_sf)

NEWS.md

Lines changed: 3 additions & 1 deletion
@@ -1,7 +1,9 @@
-# flownet 0.2.1.9000
+# flownet 0.2.2
 
 - Fixed issue in `consolidate_graph()` which used to modify columns (`from` and `to` in-place). Users in older versions are advised to input a `data.table::copy()` of the graph to retain it.
 
+- Fixes issue with multithreading for newer versions of *mirai* (or R). Thanks @kent37 (#69).
+
 # flownet 0.2.1
 
 - `angle.max` constraint in `run_assignment()` is now two-sided (angle measured from origin and destination node against the straight line between them), rather than just one-sided (from origin). Also, the implementation is slightly more efficient.

R/assignment.R

Lines changed: 2 additions & 5 deletions
@@ -232,7 +232,7 @@
 #' @importFrom kit fpmin fpmax
 #' @importFrom igraph V graph_from_data_frame delete_vertex_attr igraph_options distances shortest_paths vcount ecount
 #' @importFrom geodist geodist_vec
-#' @importFrom mirai mirai_map daemons everywhere
+#' @importFrom mirai mirai_map daemons
 #' @importFrom progress progress_bar
 run_assignment <- function(graph_df, od_matrix_long,
                            directed = FALSE,
@@ -661,13 +661,10 @@ run_assignment <- function(graph_df, od_matrix_long,
   if(!is.finite(nthreads) || nthreads <= 1L) {
     res$final_flows <- run_assignment_core(seq_len(N), verbose, TRUE)
   } else {
-    envir <- environment()
     # Split OD matrix in equal parts
     ind <- sample.int(as.integer(nthreads), N, replace = TRUE)
     ind_list <- gsplit(g = if(is_aon) sort(ind) else ind) # Since AoN should reduce calls to shortest_paths()
     daemons(n = nthreads - 1L)
-    # Pass current environment dynamically
-    everywhere({}, envir)
     # Now run the map in the background
     res_other <- mirai_map(ind_list[-1L], run_assignment_core)
     # Runs the first instance in the current session
@@ -697,7 +694,7 @@ run_assignment <- function(graph_df, od_matrix_long,
       }
     }
     res$final_flows <- final_flows
-    rm(res_other, envir, ind_list, final_flows)
+    rm(res_other, ind_list, final_flows)
   }
 
   if(anyNA(od_pairs)) {
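
Note on the chunking step in the `R/assignment.R` hunk: the way OD-pair indices are split across workers can be sketched in base R. This is an illustration only. `split()` stands in for `collapse::gsplit()`, `lapply()` stands in for the `daemons()` / `mirai_map()` calls, and `core()` with its `demand` vector is a hypothetical stand-in for `run_assignment_core()`.

```r
set.seed(1)
N <- 20L         # number of OD pairs (toy value)
nthreads <- 4L   # number of workers (toy value)

# Randomly label each OD pair with a worker ID, as in the diff above
ind <- sample.int(nthreads, N, replace = TRUE)

# Unsorted labels: each worker gets OD pairs scattered across the matrix
chunks <- split(seq_len(N), ind)

# Sorted labels (the AoN branch): same chunk sizes, but each worker gets
# a contiguous block of OD-pair indices
chunks_sorted <- split(seq_len(N), sort(ind))

# Hypothetical per-chunk worker: sums toy demand for its OD pairs
demand <- runif(N)
core <- function(idx) sum(demand[idx])

# Serial stand-in for the parallel map, followed by aggregation
partial <- lapply(chunks, core)
total <- Reduce(`+`, partial)
```

If the OD matrix is ordered by origin (an assumption here), contiguous blocks tend to keep pairs with the same origin on one worker. That is consistent with the inline comment that sorting should reduce calls to `shortest_paths()` for AoN.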
