# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Overview

`flownet` is an R package for transport modeling that implements network processing, route enumeration, and traffic assignment algorithms. The package is maintained by CPCS transport consultants and provides high-performance tools through a combination of R, fastverse packages, and custom C implementations.

## Development Commands

### Package Building and Testing

```r
# Build and install the package
devtools::install()

# Run all tests
devtools::test()

# Run a single test file (load the package first, e.g. with devtools::load_all())
testthat::test_file("tests/testthat/test-assignment.R")

# Check the package (equivalent to R CMD check)
devtools::check()

# Build documentation
devtools::document()

# Build vignettes
devtools::build_vignettes()
```

### C Code Compilation

The package includes custom C implementations in `src/` for performance-critical operations:
- `path_sized_logit.c` - Core path-sized logit algorithm
- `utils.c` - Utility functions for path operations
- `init.c` - Registration of C functions

After modifying C code:
```r
devtools::clean_dll()  # Remove compiled objects
devtools::load_all()   # Recompile and reload
```

### Testing Commands

```bash
# Build the tarball and run R CMD check from a terminal
R CMD build . && R CMD check flownet_*.tar.gz

# Run tests with coverage
Rscript -e "covr::package_coverage()"
```

## Architecture

### Core Algorithms

The package centers on two traffic assignment methods:

1. **All-or-Nothing (AoN)**: Fast deterministic assignment that loads the entire flow of each OD pair onto its shortest path. Shortest paths are computed in batches grouped by origin node for efficiency.

2. **Path-Sized Logit (PSL)**: Stochastic assignment that distributes each OD pair's flow over several alternative routes, correcting for overlap between them. The algorithm:
   - Enumerates candidate routes using distance matrices and geographic filtering
   - Filters routes by detour factor (`detour.max`) and angle constraints (`angle.max`)
   - Computes path-size factors to penalize overlapping routes
   - Assigns flows probabilistically based on generalized costs
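
As a rough illustration of the PSL steps above, the base-R sketch below computes path-size factors and choice probabilities for one OD pair. It assumes the common textbook form `PS_i = sum over edges a in route i of (l_a / L_i) / N_a` (edge length `l_a`, route length `L_i`, number of candidate routes `N_a` that use edge `a`), with probabilities from a logit over `-theta * cost_i + log(PS_i)`. The helper `psl_probs()` and the scale parameter `theta` are hypothetical; `C_compute_path_sized_logit` may weight or parameterize this differently.

```r
# Hypothetical helper (illustration only): path-size logit for one OD pair.
# routes:   list of integer edge-ID vectors, one per candidate route
# edge_len: numeric edge lengths, indexed by edge ID
# cost:     generalized cost of each route; theta: assumed logit scale
psl_probs <- function(routes, edge_len, cost, theta = 1) {
  # N_a: how many candidate routes use each edge
  n_use <- tabulate(unlist(routes), nbins = length(edge_len))
  # PS_i = sum over edges a in route i of (l_a / L_i) / N_a
  ps <- vapply(routes, function(r) {
    sum(edge_len[r] / sum(edge_len[r]) / n_use[r])
  }, numeric(1))
  # Logit over utility -theta * cost + log(PS); subtract the max for stability
  v <- -theta * cost + log(ps)
  p <- exp(v - max(v))
  p / sum(p)
}

edge_len <- c(2, 1, 1, 3)
routes <- list(c(1L, 2L), c(1L, 3L), 4L)    # routes 1 and 2 share edge 1
cost <- c(3, 3, 3)                           # equal generalized costs
round(psl_probs(routes, edge_len, cost), 3)  # 0.286 0.286 0.429
```

With equal costs a plain logit would give each route 1/3; the path-size term moves probability toward route 3, which shares no edges with the others.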

### Route Enumeration Strategy

The PSL method uses a three-stage approach to avoid computing implausible paths:

1. **Pre-selection**: Uses precomputed distance matrices to identify promising intermediate nodes based on total cost (origin→intermediate→destination)
2. **Geographic filtering**: When `angle.max` is specified and coordinates are available, filters intermediate nodes by their angle relative to the origin-destination direction, computed from the triangle formed by origin, intermediate node, and destination using geodesic distances
3. **Path computation**: Only computes actual paths for pre-selected candidates, then filters duplicates

This strategy dramatically reduces computational cost compared to enumerating all possible paths.
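
The first two stages can be sketched in base R for a single OD pair. This is an illustration under assumptions, not the package code: `D` is a node-by-node cost matrix, `G` a node-by-node geodesic distance matrix, a node `k` passes if `D[o, k] + D[k, d] <= detour.max * D[o, d]`, and the angle is measured at the origin between the directions to `k` and to `d` using the law of cosines. The helper `candidate_nodes()` is hypothetical, and the package may define the angle differently.

```r
# Hypothetical helper (illustration only): intermediate nodes worth routing through.
candidate_nodes <- function(D, o, d, detour.max = 1.5, G = NULL, angle.max = NULL) {
  k <- setdiff(seq_len(nrow(D)), c(o, d))
  # Stage 1: keep k if the detour o -> k -> d is within detour.max of the direct cost
  k <- k[D[o, k] + D[k, d] <= detour.max * D[o, d]]
  # Stage 2 (optional): angle at the origin in triangle (o, k, d),
  # via the law of cosines on the distances in G
  if (!is.null(G) && !is.null(angle.max) && length(k) > 0) {
    ok <- G[o, k]; od <- G[o, d]; kd <- G[k, d]
    cos_a <- (ok^2 + od^2 - kd^2) / (2 * ok * od)
    angle <- acos(pmin(pmax(cos_a, -1), 1)) * 180 / pi  # degrees; clamp rounding error
    k <- k[angle <= angle.max]
  }
  k
}

# Toy planar example: origin 1 at (0, 0), destination 2 at (10, 0)
xy <- rbind(c(0, 0), c(10, 0), c(5, 1), c(5, 4), c(-4, 0))
D <- as.matrix(dist(xy))  # Euclidean stand-in for both cost and geodesic distance
candidate_nodes(D, 1, 2)                         # 3 4 (node 5 is too long a detour)
candidate_nodes(D, 1, 2, G = D, angle.max = 30)  # 3 (node 4 is about 38.7 degrees off)
```

In the toy example a planar Euclidean distance stands in for both matrices; with real longitude/latitude data, `G` would come from a geodesic distance routine such as those in geodist.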

### Network Processing Pipeline

Typical workflow for processing spatial networks:

1. **`linestrings_to_graph()`**: Convert sf LINESTRING geometries to graph data frames with node coordinates (FX, FY, TX, TY)
2. **`create_undirected_graph()`**: Normalize edge directions and aggregate bidirectional links
3. **`consolidate_graph()`**: Recursively contract intermediate (degree-2) nodes, merging their edges while preserving important nodes
4. **`simplify_network()`**: Further reduce network size using either:
   - Shortest-paths method: keep only edges traversed by shortest paths between key nodes
   - Cluster method: spatially cluster nodes using leaderCluster and contract the graph
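
Step 2 can be illustrated with a small base-R sketch that reorients every edge so that `from < to` (swapping the endpoint coordinates along with the IDs) and then collapses edges that now coincide. Averaging `cost` is an arbitrary choice made here for illustration; `create_undirected_graph()` has its own aggregation rules and arguments, and `to_undirected()` is a hypothetical name.

```r
# Hypothetical helper (illustration only): make a directed edge list undirected.
to_undirected <- function(g) {
  swap <- g$from > g$to
  # Reorient edges so that from < to; columns are assigned by position,
  # so IDs and endpoint coordinates are swapped together
  g[swap, c("from", "to", "FX", "FY", "TX", "TY")] <-
    g[swap, c("to", "from", "TX", "TY", "FX", "FY")]
  # Collapse A->B and B->A into one edge (mean cost is an assumption)
  aggregate(cost ~ from + to + FX + FY + TX + TY, data = g, FUN = mean)
}

g <- data.frame(
  from = c(1, 2, 2), to = c(2, 1, 3),
  FX = c(0, 1, 1), FY = c(0, 0, 0), TX = c(1, 0, 2), TY = c(0, 0, 0),
  cost = c(10, 12, 5)
)
to_undirected(g)  # two edges: 1-2 (cost 11) and 2-3 (cost 5)
```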

### Parallelization

The package uses `mirai` for asynchronous parallelism:
- Work is split across OD pairs and distributed to daemon processes
- Each daemon processes a subset independently
- Results are aggregated after collection
- The `nthreads` parameter controls the number of parallel workers
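
The split, process, aggregate pattern can be sketched in base R. To stay self-contained the sketch runs the chunks with `lapply()`; in the package each chunk would instead be dispatched to a `mirai` daemon. Representing each OD pair as a precomputed route of edge IDs plus a flow, and the worker name `assign_chunk()`, are illustrative assumptions.

```r
n_edges <- 6
# Toy data: each OD pair has a route (edge IDs) and a flow to load onto it
od <- list(
  list(path = c(1L, 2L), flow = 10),
  list(path = c(2L, 3L, 4L), flow = 5),
  list(path = 5L, flow = 7),
  list(path = c(1L, 6L), flow = 2)
)

# Hypothetical worker (illustration only): load one chunk onto an edge-flow vector
assign_chunk <- function(chunk, n_edges) {
  flows <- numeric(n_edges)
  for (p in chunk) flows[p$path] <- flows[p$path] + p$flow
  flows
}

nthreads <- 2
# Split OD pairs into nthreads roughly equal chunks
chunks <- split(od, rep_len(seq_len(nthreads), length(od)))
# Each chunk is processed independently (sequentially here, by daemons in practice)
partial <- lapply(chunks, assign_chunk, n_edges = n_edges)
# Aggregate: per-chunk edge flows are summed
edge_flows <- Reduce(`+`, partial)
edge_flows  # 12 15 5 5 7 2
```

Because edge flows are additive, summing the per-chunk vectors gives the same result as one sequential pass, which is what makes independent processing of chunks safe.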

### C Integration

Performance-critical operations are implemented in C and called via `.Call()`:
- `C_compute_path_sized_logit`: Core PSL computation, including overlap detection and probability calculation
- `C_check_path_duplicates`: Detect paths that contain duplicate edges (invalid routes)
- `C_assign_flows_to_paths`: Batch flow assignment for the AoN method
- `C_mark_edges_traversed`: Track edge usage for `simplify_network()`
- `C_set_vector_elt`: Efficient list element assignment

## Code Organization

### Main Source Files

- **`R/assignment.R`** (802 lines): Contains `run_assignment()` and the core AoN and PSL functions. The PSL implementation handles distance matrix chunking for large networks and calls the C functions for path overlap calculations.

- **`R/utils.R`** (1273 lines): Network processing functions, including:
  - Graph conversion utilities (`linestrings_to_graph()`, `nodes_from_graph()`, etc.)
  - `consolidate_graph()`: Recursive node contraction with node-degree tracking
  - `simplify_network()`: Two methods for network reduction
  - `melt_od_matrix()`: OD matrix format conversion

- **`R/data.R`**: Documentation for included datasets (Africa network, cities, trade flows)
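
For orientation, the kind of conversion `melt_od_matrix()` performs, turning a square OD matrix into one row per origin-destination pair, can be sketched in base R. The output column names and the choice to drop intra-zonal and zero cells are assumptions for this sketch; `melt_od()` is hypothetical and does not reflect the real function's arguments.

```r
# Hypothetical helper (illustration only): wide OD matrix -> long OD pairs
melt_od <- function(m, drop_zero = TRUE) {
  long <- data.frame(
    from = rep(seq_len(nrow(m)), times = ncol(m)),  # row index varies fastest
    to   = rep(seq_len(ncol(m)), each = nrow(m)),
    flow = as.vector(m)                              # column-major order matches
  )
  # Drop intra-zonal pairs and, optionally, empty cells
  long <- long[long$from != long$to, ]
  if (drop_zero) long <- long[long$flow != 0, ]
  rownames(long) <- NULL
  long
}

od <- matrix(c(0, 4, 0,
               2, 0, 1,
               3, 0, 0), nrow = 3, byrow = TRUE)
melt_od(od)  # 4 rows: 2->1 (2), 3->1 (3), 1->2 (4), 2->3 (1)
```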

### C Source Files

- **`src/path_sized_logit.c`**: Implements path-size factor computation and flow assignment
- **`src/utils.c`**: Helper functions for path operations
- **`src/init.c`**: C function registration for the `.Call` interface

### Tests

Tests are organized by functionality in `tests/testthat/`:
- `test-assignment.R`: Traffic assignment methods
- `test-graph-utils.R`: Graph utility functions
- `test-network-processing.R`: Network conversion and processing
- `test-od-matrix.R`: OD matrix operations
- `test-consolidation.R`: Graph consolidation

## Dependencies

### Core Dependencies
- **collapse** (≥ 2.1.5): Fast data transformations, used extensively for grouping, aggregation, and memory-efficient operations
- **igraph** (≥ 2.1.4): Shortest path computation (Dijkstra's algorithm)
- **sf** (≥ 1.0.0): Spatial data handling for LINESTRING networks
- **geodist** (≥ 0.1.1): Fast haversine distance calculations for geographic filtering
- **leaderCluster** (≥ 1.5.0): Efficient spatial clustering for network simplification
- **mirai** (≥ 2.5.2): Asynchronous parallelism
- **kit** (≥ 0.0.21): Fast tabulation and vectorized operations

## Key Implementation Details

### Distance Matrix Strategy

The package uses adaptive distance matrix computation:
- If the number of nodes N satisfies N ≤ `sqrt(dmat.max.size)`, i.e. the full N × N matrix has at most `dmat.max.size` cells, the full distance matrix is precomputed once
- Otherwise, distances are computed in chunks as needed during OD-pair iteration
- Separate geodesic distance matrices are used for angle-based filtering
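
The size rule can be made concrete with a small sketch. It assumes `dmat.max.size` counts matrix cells and that, when the full matrix is too large, origin rows are processed in blocks of at most `floor(dmat.max.size / N)` rows; `plan_dmat()` is a hypothetical helper, not part of the package.

```r
# Hypothetical helper (illustration only): one full matrix or row blocks?
plan_dmat <- function(n_nodes, dmat.max.size) {
  if (n_nodes <= sqrt(dmat.max.size)) {
    # N x N matrix fits within the cell budget: compute it once
    return(list(full = TRUE, blocks = list(seq_len(n_nodes))))
  }
  # Otherwise process origin rows in blocks of at most dmat.max.size cells
  # (at least one row per block, even if a single row exceeds the budget)
  rows_per_block <- max(1L, floor(dmat.max.size / n_nodes))
  starts <- seq(1L, n_nodes, by = rows_per_block)
  blocks <- lapply(starts, function(s) s:min(s + rows_per_block - 1L, n_nodes))
  list(full = FALSE, blocks = blocks)
}

plan_dmat(1000, 1e7)$full              # TRUE: 1000^2 = 1e6 <= 1e7
length(plan_dmat(10000, 1e7)$blocks)   # 10 blocks of 1000 rows each
```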

### Graph Representation

Graphs are represented as data frames with:
- `from`, `to`: Node IDs (integers)
- `FX`, `FY`, `TX`, `TY`: Node coordinates (for spatial operations)
- `edge`: Edge identifier (optional, regenerated by many functions)
- Cost/attribute columns (e.g., `duration`, `cost`, `distance`)

Internally, igraph is used for shortest path computation, but the primary data structure is a data frame for flexibility and integration with fastverse tools.
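
To show how the coordinate columns relate to the integer node IDs, the sketch below gives every distinct endpoint coordinate a shared ID, which is essentially what a LINESTRING-to-graph conversion must do once line endpoints are extracted. Matching on exact coordinates and the name `add_node_ids()` are assumptions; real data may need snapping or rounding first.

```r
# Hypothetical helper (illustration only): integer from/to IDs from coordinates
add_node_ids <- function(g) {
  # Key each endpoint by its exact coordinate pair
  from_key <- paste(g$FX, g$FY)
  to_key   <- paste(g$TX, g$TY)
  keys <- unique(c(from_key, to_key))
  g$from <- match(from_key, keys)
  g$to   <- match(to_key, keys)
  g$edge <- seq_len(nrow(g))  # regenerate a simple edge identifier
  g
}

g <- data.frame(FX = c(0, 1, 1), FY = c(0, 0, 0),
                TX = c(1, 2, 1), TY = c(0, 0, 1),
                duration = c(5, 7, 3))
add_node_ids(g)[, c("edge", "from", "to", "duration")]  # node 2 joins all three edges
```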

### Node Consolidation Algorithm

The `consolidate_graph()` function uses a multi-pass approach:
1. Drop loop edges, duplicate edges, and singleton edges (optional)
2. Identify degree-2 nodes (or, for directed graphs, nodes with `deg_from = 1` and `deg_to = 1`)
3. For undirected graphs, orient edges so that each intermediate node appears as "from" in one edge and "to" in the other
4. Merge edges through intermediate nodes, tracking groups via a `gid` vector
5. Aggregate edge attributes using `collapse::collap()`
6. If `recursive = "full"`, repeat until no further consolidation is possible

The `by` parameter preserves mode/type boundaries by preventing consolidation across links with different characteristics.
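
The core of this procedure can be sketched in base R for an undirected edge list: repeatedly find a node with exactly two incident edges, replace them with one edge between the outer endpoints, and add up the cost. This simplified version contracts one node at a time instead of batching merges with a `gid` vector, and treats an optional `by` column (merge only when both edges share the same value) as a stand-in for the `by` parameter. `contract_degree2()`, its `keep` argument, and the summed `cost` column are hypothetical.

```r
# Hypothetical helper (illustration only): contract degree-2 nodes.
# g: undirected edge list with from, to, cost and optionally a 'by' column.
# keep: node IDs that must never be contracted (e.g. OD zone nodes).
contract_degree2 <- function(g, keep = integer(0)) {
  repeat {
    deg <- table(c(g$from, g$to))
    cand <- setdiff(as.integer(names(deg)[deg == 2]), keep)
    merged_any <- FALSE
    for (v in cand) {
      e <- which(g$from == v | g$to == v)
      if (length(e) != 2) next  # e.g. a lone self-loop also gives degree 2
      # Outer endpoints: the node at the other end of each incident edge
      ends <- c(setdiff(c(g$from[e[1]], g$to[e[1]]), v),
                setdiff(c(g$from[e[2]], g$to[e[2]]), v))
      if (length(ends) != 2 || ends[1] == ends[2]) next  # loops / parallel edges
      if (!is.null(g$by) && g$by[e[1]] != g$by[e[2]]) next  # 'by' boundary
      new_edge <- g[e[1], ]
      new_edge$from <- min(ends)
      new_edge$to <- max(ends)
      new_edge$cost <- sum(g$cost[e])  # additive cost along the merged chain
      g <- rbind(g[-e, ], new_edge)
      merged_any <- TRUE
    }
    if (!merged_any) break  # no contraction possible: stop
  }
  rownames(g) <- NULL
  g
}

chain <- data.frame(from = c(1, 2, 3), to = c(2, 3, 4), cost = c(1, 2, 3))
contract_degree2(chain)               # a single edge 1-4 with cost 6
chain$by <- c("road", "road", "rail")
contract_degree2(chain)               # edges 3-4 (rail) and 1-3 (road): node 3 is kept
```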

## Working with Spatial Data

The package integrates with sf for spatial operations:

```r
library(flownet)
library(sf)
# Map OD zones (an sf object, e.g. zone centroids) to their nearest network nodes
nodes <- nodes_from_graph(graph, sf = TRUE)
nearest_nodes <- nodes$node[st_nearest_feature(od_zones, nodes)]
```

Coordinate columns (FX, FY, TX, TY) are preserved through most operations and can be used to convert back to sf LINESTRING objects with `linestrings_from_graph()`.

## Performance Considerations

- Use `method = "AoN"` for large networks when route alternatives are not needed (much faster)
- Lower `detour.max` and `angle.max` to reduce PSL computation time (fewer candidate routes are enumerated)
- Set `unique.cost = TRUE` to deduplicate routes with the same total cost
- Use `dmat.max.size` to control memory usage for large networks
- Enable multithreading with `nthreads` for large OD matrices
- Consider consolidating and simplifying networks before assignment to reduce the computational burden

## Package Structure

This is a standard R package with:
- DESCRIPTION: Package metadata and dependencies
- NAMESPACE: Exported functions (managed by roxygen2)
- R/: R source code
- src/: C source code (compiled into a shared library, .so/.dll, when the package is built)
- man/: Documentation (auto-generated from roxygen2 comments)
- tests/testthat/: Test suite
- vignettes/: Package vignettes
- data/: Included datasets (Africa network, cities, trade)

Use roxygen2 for documentation: add `#'` comments above functions and run `devtools::document()` to update the NAMESPACE and man/ files.
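
As an illustration of the convention, a roxygen2 block above a function might look like the following. `edge_cost_share()` is a hypothetical function, not part of flownet; running `devtools::document()` would generate its `.Rd` file in `man/` and, because of `@export`, an `export()` entry in NAMESPACE.

```r
#' Share of total network cost carried by each edge
#'
#' Hypothetical example function, shown only to illustrate roxygen2 tags.
#'
#' @param graph A graph data frame with one row per edge.
#' @param cost Name of the cost column in `graph`.
#' @return A numeric vector of shares summing to 1.
#' @export
#' @examples
#' g <- data.frame(from = 1:2, to = 2:3, cost = c(1, 3))
#' edge_cost_share(g)
edge_cost_share <- function(graph, cost = "cost") {
  x <- graph[[cost]]
  x / sum(x)
}
```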