Add more metadata about pages

The idea is to extract more data from each page's XML. I'm mainly looking for these fields:
- Wikipedia article id -> `<id>39</id>`
- Revision id -> `<revision><id>1338813175</id></revision>`
- Revision timestamp -> `<revision><timestamp>2026-02-17T10:56:58Z</timestamp></revision>`
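
A minimal sketch of pulling these three fields out of a single `<page>` element, using only Python's standard library. The function name `extract_page_metadata`, the sample title, and the returned dict keys are hypothetical and not part of the existing code. It also assumes the XML namespace has already been stripped from the tags, which may not match how `parse_dump_xml` actually reads the dump.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Made-up sample page; only the three metadata values come from the issue text.
SAMPLE_PAGE = """<page>
  <title>Example</title>
  <id>39</id>
  <revision>
    <id>1338813175</id>
    <timestamp>2026-02-17T10:56:58Z</timestamp>
  </revision>
</page>"""


def extract_page_metadata(page_xml):
    """Return article id, revision id and revision timestamp for one <page>."""
    page = ET.fromstring(page_xml)
    # "id" matches only the direct child of <page>, not <revision><id>.
    article_id = int(page.findtext("id"))
    revision_id = int(page.findtext("revision/id"))
    raw_timestamp = page.findtext("revision/timestamp")
    # The timestamp ends in "Z", so it is parsed as UTC.
    revision_timestamp = datetime.strptime(
        raw_timestamp, "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return {
        "article_id": article_id,
        "revision_id": revision_id,
        "revision_timestamp": revision_timestamp,
    }


metadata = extract_page_metadata(SAMPLE_PAGE)
```

If the dump is parsed with namespaced tags, the lookup paths would need the namespace prefix as well.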

I want to use the revision timestamp to check when an article was last updated, and the article id for better indexing than titles allow. I don't mean to change the key of the pages table to the article id, just to store it as an extra field for other uses.

The idea would be to add them in `parse_dump_xml` and to the parameters of `add_page`, then to the table. If touching those is too much, another option would be a separate article metadata model.
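
A rough sketch of what the first option could look like, assuming an SQLite-backed pages table keyed by title. The schema, the column names, and this `add_page` signature are all assumptions made for illustration; the project's real `add_page` and storage layer may look quite different.

```python
import sqlite3


def create_pages_table(conn):
    # Hypothetical schema: title stays the key, metadata columns are extras.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS pages (
               title TEXT PRIMARY KEY,
               article_id INTEGER,
               revision_id INTEGER,
               revision_timestamp TEXT
           )"""
    )
    # Secondary index so lookups by article id do not scan the table.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_pages_article_id ON pages(article_id)"
    )


def add_page(conn, title, article_id=None, revision_id=None,
             revision_timestamp=None):
    # New parameters default to None so existing callers keep working.
    conn.execute(
        "INSERT OR REPLACE INTO pages "
        "(title, article_id, revision_id, revision_timestamp) "
        "VALUES (?, ?, ?, ?)",
        (title, article_id, revision_id, revision_timestamp),
    )


conn = sqlite3.connect(":memory:")
create_pages_table(conn)
add_page(conn, "Example", 39, 1338813175, "2026-02-17T10:56:58Z")
row = conn.execute(
    "SELECT title, revision_timestamp FROM pages WHERE article_id = ?", (39,)
).fetchone()
```

Keeping the new parameters optional means the title-keyed behaviour stays unchanged, while the article id is available for indexing and the timestamp for freshness checks.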

I can make the PR myself if this is approved.
