
Add PBF file loading #1381

Closed
mvexel wants to merge 6 commits into gboeing:main from mvexel:feat/osm-filters

@mvexel

mvexel commented May 17, 2026

This is a follow-up to #1338.

Checklist

  • I have read the contributing guidelines and have run the pre-commit hooks and tests locally.
  • I have edited the changelog to reflect my changes.
  • If this is an enhancement, I have opened a feature proposal issue to discuss first.

Related issues

PBF read support was proposed and partially implemented in #1338. In the comments there, I (@mvexel) suggested a slightly different approach that centralizes the filter logic. This PR is the result.

Proposed changes

This PR adds the ability to build an OSMnx graph from a local OSM .pbf file, without going through the Overpass API. This would enable reproducible workflows against a known OSM snapshot, and reliable support for larger geographical areas where Overpass is slow or unreliable.

Two related pieces of work:

First, a new osmnx.graph.graph_from_pbf function (also exposed as osmnx.pbf.graph_from_pbf and ox.graph_from_pbf). Its signature mirrors graph_from_bbox as closely as possible: it takes a filepath, an optional bbox, and the same network_type / custom_filter / simplify / retain_all / truncate_by_edge parameters. When a bbox is supplied, it reuses graph_from_polygon's logic. Reading is handled by a new private osmnx._pbf module that uses osmium to do a two-pass scan (first the ways, then the nodes they reference) and produces a dict with the same structure as the JSON the Overpass reader returns. That dict is then fed into the existing internal _create_graph, so all downstream graph-building behaviour is unchanged.
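To illustrate the two-pass idea without requiring osmium, here is a minimal sketch in the spirit of the _Fake* stand-ins used in the tests. It is not the PR's _pbf code: FakeNode, FakeWay, _element and build_overpass_json are hypothetical names, and real pyosmium objects differ (coordinates live on node.location, and a way's node list holds NodeRef objects). OSM files typically store nodes before the ways that reference them, so a single pass would have to cache every node's coordinates; reading the matching ways first means only the referenced nodes need to be kept.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class FakeNode:
    """Hypothetical stand-in for an osmium node (real ones expose node.location)."""
    id: int
    lat: float
    lon: float
    tags: dict = field(default_factory=dict)


@dataclass
class FakeWay:
    """Hypothetical stand-in for an osmium way (real ones hold NodeRef objects)."""
    id: int
    nodes: list
    tags: dict = field(default_factory=dict)


def _element(kind: str, osm_id: int, tags: dict, **attrs) -> dict:
    # Overpass JSON omits "tags" for untagged elements, so only add it when non-empty.
    element = {"type": kind, "id": osm_id, **attrs}
    if tags:
        element["tags"] = dict(tags)
    return element


def build_overpass_json(
    ways: Iterable[FakeWay],
    nodes: Iterable[FakeNode],
    way_matches: Callable[[dict], bool],
) -> dict:
    """Two-pass scan returning an Overpass-style {"elements": [...]} dict."""
    way_elements = []
    needed_node_ids = set()

    # Pass 1: keep ways whose tags pass the filter and remember their node IDs.
    for way in ways:
        if way_matches(way.tags):
            needed_node_ids.update(way.nodes)
            way_elements.append(_element("way", way.id, way.tags, nodes=list(way.nodes)))

    # Pass 2: keep only the nodes that some retained way references.
    node_elements = [
        _element("node", node.id, node.tags, lat=node.lat, lon=node.lon)
        for node in nodes
        if node.id in needed_node_ids
    ]

    # Nodes first, then ways (an ordering choice made by this sketch).
    return {"elements": node_elements + way_elements}
```

For example, with one highway way and one building way, only the highway and its two nodes survive: build_overpass_json(ways, nodes, lambda tags: "highway" in tags).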

Second, centralized network filters in a new osmnx._osm_filters module. This is the main new abstraction, and it addresses the part of the original PR that @gboeing flagged as still needing more thought. To avoid maintaining two copies of the "drive" / "walk" / "bike" / etc. filters, they are now expressed as _TagFilter / _WayFilter dataclasses with both .matches(tags) and .to_overpass() methods. _overpass._get_network_filter and _overpass._download_overpass_network delegate to this new module, and the Overpass strings they emit are identical to the old hard-coded versions. A new parser turns OSMnx's custom-filter strings into the same objects, so the PBF reader accepts the same custom_filter syntax.
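A minimal sketch of what tag-filter dataclasses with .matches(tags) and .to_overpass() might look like. TagFilter, WayFilter and the operator names ("exists", "=", "!=", "~", "!~") are hypothetical; this is not the PR's _TagFilter / _WayFilter. The sketch assumes Overpass-like semantics: bracketed conditions are ANDed, "~" is an unanchored regex search, and negated conditions also match elements that lack the key. Python's re module may not agree with Overpass's regex dialect in every edge case.

```python
import re
from dataclasses import dataclass

_OPS = ("exists", "=", "!=", "~", "!~")


@dataclass(frozen=True)
class TagFilter:
    """One condition such as ["highway"~"motorway|trunk"] (hypothetical class)."""

    key: str
    op: str = "exists"
    value: str = ""

    def __post_init__(self):
        if self.op not in _OPS:
            raise ValueError(f"unsupported operator: {self.op!r}")

    def matches(self, tags: dict) -> bool:
        if self.key not in tags:
            # Negated conditions also match elements that lack the key.
            return self.op in ("!=", "!~")
        actual = tags[self.key]
        if self.op == "exists":
            return True
        if self.op == "=":
            return actual == self.value
        if self.op == "!=":
            return actual != self.value
        # Regex operators: unanchored search, so "trunk" also matches "trunk_link".
        found = re.search(self.value, actual) is not None
        return found if self.op == "~" else not found

    def to_overpass(self) -> str:
        if self.op == "exists":
            return f'["{self.key}"]'
        return f'["{self.key}"{self.op}"{self.value}"]'


@dataclass(frozen=True)
class WayFilter:
    """A way passes only if every condition passes, like chained [..][..]."""

    conditions: tuple

    def matches(self, tags: dict) -> bool:
        return all(c.matches(tags) for c in self.conditions)

    def to_overpass(self) -> str:
        return "".join(c.to_overpass() for c in self.conditions)
```

Because both methods read from the same objects, a preset defined once yields both an Overpass query fragment and an in-memory predicate for PBF ways. For example, WayFilter((TagFilter("highway"), TagFilter("area", "!~", "yes"))) renders as ["highway"]["area"!~"yes"] and accepts {"highway": "residential"}.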

Notes:

  • I added osmium >= 4.0 as a new optional extra, pbf. I haven't discussed this yet, but it makes sense to me to keep the package slim for users who won't be reading PBF files.
  • osmnx/pbf.py exists purely to let Sphinx render a dedicated section in the user reference.
  • Docs: I added a short paragraph to docs/source/getting-started.rst pointing users at graph_from_pbf and adding some caveats. docs/source/user-reference.rst has a new osmnx.pbf section.
  • I added a mise.toml at the repo root that pins uv = "0.10" to match the existing pyproject.toml constraint. I use mise to manage Python toolchains, but if it's too opinionated, it's easily removed.
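A common way to keep an optional extra out of the core import path is to import it lazily, only inside the function that needs it, and fail with an install hint otherwise. The sketch below assumes the extra is named pbf as described above; require_optional is a hypothetical helper, not an existing OSMnx function.

```python
import importlib
import importlib.util


def require_optional(module_name: str, extra: str):
    """Import an optional dependency, or raise ImportError with an install hint."""
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"{module_name!r} is required for this feature; "
            f"install it with: pip install 'osmnx[{extra}]'"
        )
    return importlib.import_module(module_name)
```

Called as osmium = require_optional("osmium", "pbf") inside a PBF-reading function, this lets import osmnx keep working for users who never install the extra.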

Tests:

  • Unit tests for the _osm_filters logic: each _TagFilter operator (including the rule that an absent key matches a negative filter), the custom-filter parser's accept/reject behavior, and all three branches of _get_way_filter.
  • Unit tests for _pbf's helpers using _Fake* osmium stand-ins, so they run without the optional dependency installed.
  • A new bundled fixture (tests/input_data/aguascalientes.osm.pbf, plus a bz2-compressed .osm file converted from the PBF with osmium cat), used in test_pbf_bbox_parity to check end to end that graph_from_bbox and graph_from_pbf load the same area and produce matching graphs.
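The custom-filter parser's accept/reject behavior can be illustrated with a deliberately narrow sketch. parse_custom_filter is a hypothetical function, not the PR's parser: it accepts only chains of ["key"], ["key"="v"], ["key"!="v"], ["key"~"v"] and ["key"!~"v"] conditions and raises ValueError on anything else, including regex keys and escaped quotes, which real Overpass filters can contain.

```python
import re

# One condition: ["key"], ["key"="v"], ["key"!="v"], ["key"~"v"] or ["key"!~"v"].
_CONDITION = re.compile(r'\["([^"]+)"(?:(!=|!~|=|~)"([^"]*)")?\]')


def parse_custom_filter(text: str) -> list[tuple[str, str, str]]:
    """Parse a filter string into (key, op, value) tuples; raise ValueError if malformed."""
    text = text.strip()
    conditions = []
    pos = 0
    while pos < len(text):
        match = _CONDITION.match(text, pos)
        if match is None:
            raise ValueError(f"cannot parse filter at position {pos}: {text[pos:]!r}")
        key, op, value = match.group(1), match.group(2), match.group(3)
        # A bare ["key"] means "key exists"; it has no operator or value.
        conditions.append((key, op or "exists", value or ""))
        pos = match.end()
    if not conditions:
        raise ValueError("empty filter")
    return conditions
```

For example, parse_custom_filter('["highway"~"motorway|trunk"]') returns [("highway", "~", "motorway|trunk")], while a trailing fragment such as '["highway"]junk' is rejected rather than silently ignored.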

Usage example

```python
import osmnx as ox

# graph_from_pbf is added by this PR and needs the optional pbf extra (osmium).

# Whole file, default "all" network type.
G = ox.graph_from_pbf("aguascalientes.osm.pbf")

# Bbox + network type
G = ox.graph_from_pbf(
    "aguascalientes.osm.pbf",
    bbox=(-102.33, 21.86, -102.25, 21.92),  # (left, bottom, right, top)
    network_type="drive",
)

# Custom tag filter
G = ox.graph_from_pbf(
    "aguascalientes.osm.pbf",
    custom_filter='["highway"~"motorway|trunk"]',
)

print(f"{len(G):,} nodes, {len(G.edges):,} edges")
```
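The bbox in the example is ordered (left, bottom, right, top), i.e. (west, south, east, north) in degrees, and a supplied bbox is handled via graph_from_polygon's logic. The stdlib sketch below shows that conversion plus a basic sanity check that catches a latitude/longitude swap; bbox_to_ring and in_bbox are hypothetical helpers, not OSMnx functions. With shapely installed, shapely.geometry.box(left, bottom, right, top) builds the equivalent polygon.

```python
def bbox_to_ring(bbox: tuple[float, float, float, float]) -> list[tuple[float, float]]:
    """Turn (left, bottom, right, top) into a closed ring of (lon, lat) points.

    Rejects bboxes that look swapped or out of range; boxes crossing the
    antimeridian (left > right) are not handled by this sketch.
    """
    left, bottom, right, top = bbox
    if not (-180 <= left < right <= 180 and -90 <= bottom < top <= 90):
        raise ValueError(f"expected (left, bottom, right, top) in degrees, got {bbox!r}")
    # The first point is repeated at the end to close the ring.
    return [(left, bottom), (right, bottom), (right, top), (left, top), (left, bottom)]


def in_bbox(lon: float, lat: float, bbox: tuple[float, float, float, float]) -> bool:
    """True if the point lies inside or on the edge of the bbox."""
    left, bottom, right, top = bbox
    return left <= lon <= right and bottom <= lat <= top
```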

Benchmarks

There is not really anything to benchmark against, since reading PBF files is new functionality. All the other new or changed code sits in front of the core network logic, which is completely untouched. One thing I have not done is memory profiling while loading larger PBF files. Users should not expect to load a country-sized extract such as Germany into OSMnx, but perhaps that needs to be made explicit.
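The memory profiling mentioned above could start with something like the following sketch. peak_python_memory is a hypothetical helper built on the standard library's tracemalloc, which only counts allocations made through Python's allocator, so memory held inside osmium's C++ code is not included.

```python
import tracemalloc
from typing import Any, Callable


def peak_python_memory(func: Callable[..., Any], *args: Any, **kwargs: Any) -> tuple[Any, int]:
    """Call func(*args, **kwargs) and return (result, peak traced bytes during the call).

    Assumes tracemalloc is not already running; it is stopped afterwards.
    Only Python-allocator memory is traced, not native (e.g. C++) allocations.
    """
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak
```

With the function proposed in this PR, peak_python_memory(ox.graph_from_pbf, "aguascalientes.osm.pbf") would return the graph and a peak in bytes. For a process-wide figure that does include native allocations, resource.getrusage(resource.RUSAGE_SELF).ru_maxrss reports peak resident set size on Unix (kilobytes on Linux, bytes on macOS).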

LLM disclosure

I used Opus 4.7 (medium effort) to:

  • Write most of the tests
  • Help me think through the architecture of the new filter abstraction
  • Provide an initial template for the PR description
  • Resolve some upstream merge conflicts that came up
  • Help with numpydoc requirements, since I'm unfamiliar with numpydoc

I reviewed all code I did not personally write.

@mvexel
Author

mvexel commented May 17, 2026

I picked the area for the new PBF test fixture at random, aiming for an urban area large enough to be representative but, hopefully, small enough not to bloat the repository.


mvexel closed this by deleting the head repository on May 18, 2026

@gboeing
Owner

gboeing commented May 18, 2026

@mvexel it looks like this was closed, which is ok. A couple quick thoughts: while I would welcome an enhancement to improve PBF file loading, I am avoiding merging this sort of LLM code at this point. It's overly complex and introduces a lot of technical debt into the project that I as maintainer become responsible for going forward. But I am open to a more parsimonious, legible, maintainable contribution. If you have any questions about a potential contribution's design or architecture to meet this goal, I'd be happy to discuss. Thanks for offering to contribute!

@mvexel
Author

mvexel commented May 18, 2026

Thanks for taking a look! I gave quite a bit of thought to the filtering logic and how to avoid duplicating it when applying the same filters to the different input streams. I was hoping to keep the PR smaller, but this was really the least invasive way I could find to accomplish it. But I see your point: the filter presets are too important to be handled by code that you're unable to maintain. As a fellow OSS maintainer, I've dealt with very similar situations. I should have consulted with you along the way!

Perhaps it's better, then, to continue with your original approach and see whether it's worth bringing to completion. If this is a priority for you, I'm happy to contribute along those lines; otherwise, good luck. Your project has great value.

(Apologies for the confusion with the closed PR. I was cleaning up my GitHub account because I am moving everything to Codeberg, and I must have been a little overzealous deleting forks.)
