Add PBF file loading #1381
@mvexel it looks like this was closed, which is ok. A couple quick thoughts: while I would welcome an enhancement to improve PBF file loading, I am avoiding merging this sort of LLM code at this point. It's overly complex and introduces a lot of technical debt into the project that I as maintainer become responsible for going forward. But I am open to a more parsimonious, legible, maintainable contribution. If you have any questions about a potential contribution's design or architecture to meet this goal, I'd be happy to discuss. Thanks for offering to contribute!

Thanks for taking a look! I gave quite a bit of thought to the filtering logic and how to avoid duplication when applying the same filters to the different input streams. I was hoping to be able to contain the PR more, but this was really the least invasive way I was able to accomplish it. But I see your point - the filter presets are too important to be handled by code that you're unable to maintain. As a fellow OSS maintainer I've dealt with very similar situations. Should have consulted with you along the way! Perhaps it's better then to continue with your original approach, and see if it's worth bringing that to completion. If this has any priority for you I'm happy to contribute along those lines, otherwise good luck, your project has great value. (Apologies for the confusion with the closed PR - I was cleaning up my GitHub account, because I am moving everything to Codeberg. I must have been a little overzealous deleting forks.)


This is a follow-up on #1338.
Checklist
Related issues
PBF read support was proposed and partially implemented in #1338. @mvexel (me) suggested in the comments a slightly different approach that centralizes the filter logic. This PR is the result.
Proposed changes
This PR adds the ability to build an OSMnx graph from a local OSM `.pbf` file, without going through the Overpass API. This would enable reproducible workflows against a known OSM snapshot, and reliable support for larger geographical areas where Overpass is slow or unreliable.

Two related pieces of work:

First, `osmnx.graph.graph_from_pbf` (also exposed as `osmnx.pbf.graph_from_pbf` and `ox.graph_from_pbf`). The function signature mirrors `graph_from_bbox` as closely as possible: it takes a `filepath`, an optional `bbox`, and the same `network_type` / `custom_filter` / `simplify` / `retain_all` / `truncate_by_edge` parameters. When a `bbox` is supplied, it reuses `graph_from_polygon`'s logic. Reading is handled by a new private `osmnx._pbf` module that uses `osmium` to do a two-pass scan (ways → referenced nodes) and produces a JSON dict that is the same as what the Overpass reader would return. This JSON is then fed into the existing internal `_create_graph`, so all downstream graph-building behaviour is unchanged.

Second, centralized network filters in a new `osmnx._osm_filters` module. This is the main new abstraction that @gboeing flagged as the part of the original PR that would still need thinking / work. To avoid maintaining two copies of `"drive"` / `"walk"` / `"bike"` / etc., the filters are now expressed as `_TagFilter` / `_WayFilter` dataclasses with both `.matches(tags)` and `.to_overpass()` methods. `_overpass._get_network_filter` and `_overpass._download_overpass_network` delegate to this new module. The Overpass strings they emit are identical to the old hard-coded versions. A new parser turns OSMnx's custom-filter strings into the same objects, so the PBF reader supports the same known `custom_filter` argument.

Notes:
- `osmium >= 4.0` as a new `pbf` optional extra. I did not discuss this yet, but to me it makes sense to keep the module slim for users who won't be using PBF reading.
- `osmnx/pbf.py` exists purely to let Sphinx render a dedicated section in the user reference.
- `docs/source/getting-started.rst` pointing users at `graph_from_pbf` and adding some caveats.
- `docs/source/user-reference.rst` has a new `osmnx.pbf` section.
- `mise.toml` at the repo root that pins `uv = "0.10"` to match the existing `pyproject.toml` constraint. I use mise to manage Python toolchains, but if it's too opinionated, it's easily removed.

Tests:
- `_osm_filters` logic. Each `_TagFilter` operator including the "absent key matches negative filter" logic, the custom-filter parser's accept/reject behavior, all three branches of `_get_way_filter`.
- `_pbf`'s helpers using `_Fake*` osmium stand-ins so they run without the optional dep installed.
- `tests/input_data/aguascalientes.osm.pbf` (+ bz2'd `.osm` converted from the PBF with `osmium cat`) used in `test_pbf_bbox_parity`, to prove end-to-end that `graph_from_bbox` and `graph_from_pbf` load the same area and match.

Usage example
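A minimal sketch of calling the new entry point, assuming this PR's branch is installed together with the `pbf` extra. `load_snapshot` is a hypothetical wrapper named here for illustration only; the keyword names `bbox` and `network_type` follow the description above, and the exact signature may differ in the actual code.

```python
def load_snapshot(filepath, bbox=None, network_type="drive"):
    """Build a graph from a local .pbf snapshot instead of querying Overpass.

    Hypothetical helper: it only forwards to graph_from_pbf as described
    in this PR, so results are tied to the snapshot on disk.
    """
    # Imported lazily because graph_from_pbf exists only on this PR's branch
    # and PBF reading needs the optional osmium dependency.
    import osmnx as ox

    # Assumed call shape: filepath first, then the same keyword arguments
    # that graph_from_bbox accepts.
    return ox.graph_from_pbf(filepath, bbox=bbox, network_type=network_type)
```

Calling `load_snapshot("tests/input_data/aguascalientes.osm.pbf")` would then read the test fixture mentioned under Tests; passing a `bbox` should restrict the result to that area.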
Benchmarks
There is not really anything to benchmark against since reading PBFs is new functionality. All the other new / changed code sits in front of the core network logic, which is completely untouched. One thing I can think of but did not do is some memory profiling when loading larger PBF files. Users should not expect to be able to load a Germany PBF file into `osmnx`, but perhaps that needs to be made explicit.

LLM disclosure
I used Opus 4.7 medium effort in a few ways:
- `numpydoc` requirements - unfamiliar with `numpydoc`

I reviewed all code I did not personally write.
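To make the centralized-filter idea from Proposed changes more concrete, here is a minimal sketch of a filter object that can both test a tag dict locally and render itself as an Overpass tag filter. It is not the PR's code: `TagFilter`, `WayFilter`, the operator set, and the `"exists"` operator are assumptions chosen for illustration, and the real `_TagFilter` / `_WayFilter` may be shaped differently.

```python
import re
from dataclasses import dataclass

_OPS = ("exists", "=", "!=", "~", "!~")


@dataclass(frozen=True)
class TagFilter:
    """One tag condition (hypothetical stand-in for the PR's _TagFilter)."""

    key: str
    op: str
    value: str = ""

    def __post_init__(self):
        if self.op not in _OPS:
            raise ValueError(f"unsupported operator: {self.op!r}")

    def matches(self, tags):
        """Return True if a way's tag dict satisfies this condition."""
        actual = tags.get(self.key)
        if self.op == "exists":
            return actual is not None
        if actual is None:
            # Absent key: negative conditions pass, positive ones fail.
            return self.op in ("!=", "!~")
        if self.op == "=":
            return actual == self.value
        if self.op == "!=":
            return actual != self.value
        found = re.search(self.value, actual) is not None
        return found if self.op == "~" else not found

    def to_overpass(self):
        """Render the condition as an Overpass QL tag filter."""
        if self.op == "exists":
            return f'["{self.key}"]'
        return f'["{self.key}"{self.op}"{self.value}"]'


@dataclass(frozen=True)
class WayFilter:
    """A conjunction of tag conditions (stand-in for _WayFilter)."""

    conditions: tuple

    def matches(self, tags):
        return all(c.matches(tags) for c in self.conditions)

    def to_overpass(self):
        return "".join(c.to_overpass() for c in self.conditions)


# A deliberately shortened, made-up preset; not OSMnx's real "drive" filter.
drive_like = WayFilter((
    TagFilter("highway", "exists"),
    TagFilter("area", "!~", "yes"),
    TagFilter("highway", "!~", "footway|cycleway"),
))
```

With one definition serving both paths, `drive_like.to_overpass()` produces the query fragment for the Overpass reader, while `drive_like.matches(tags)` can be applied to each way read from a PBF file, which is the duplication the PR is trying to avoid.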