Add more metadata about pages #418

Description

@sergiomadd

The idea is to extract more data from each page's XML. I'm mainly looking for these fields:

  • Wikipedia article id -> <id>39</id>
  • Revision id -> <revision><id>1338813175</id></revision>
  • Revision timestamp -> <revision><timestamp>2026-02-17T10:56:58Z</timestamp></revision>
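As a rough illustration of pulling these three fields out of a single `<page>` element, here is a minimal sketch using only the standard library. The names `PageMetadata` and `extract_page_metadata` are hypothetical and not part of the project; the sketch also assumes the page XML arrives as a string and may or may not carry the MediaWiki export namespace.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class PageMetadata:
    """Hypothetical container for the extra fields requested in this issue."""
    article_id: int
    revision_id: Optional[int]
    revision_timestamp: Optional[datetime]


def _local(tag: str) -> str:
    # Drop a "{namespace}" prefix if the dump declares one.
    return tag.rsplit("}", 1)[-1]


def _child(parent: ET.Element, name: str) -> Optional[ET.Element]:
    # Look only at direct children so <page><id> is never confused
    # with <revision><id>.
    for child in parent:
        if _local(child.tag) == name:
            return child
    return None


def extract_page_metadata(page_xml: str) -> PageMetadata:
    page = ET.fromstring(page_xml)
    page_id = _child(page, "id")
    if page_id is None or not page_id.text:
        raise ValueError("page element has no <id>")

    revision_id = None
    revision_timestamp = None
    revision = _child(page, "revision")
    if revision is not None:
        id_el = _child(revision, "id")
        ts_el = _child(revision, "timestamp")
        if id_el is not None and id_el.text:
            revision_id = int(id_el.text)
        if ts_el is not None and ts_el.text:
            # Timestamps in the example above are UTC with a trailing "Z".
            revision_timestamp = datetime.strptime(
                ts_el.text, "%Y-%m-%dT%H:%M:%SZ"
            ).replace(tzinfo=timezone.utc)

    return PageMetadata(int(page_id.text), revision_id, revision_timestamp)


sample = (
    "<page><title>Example</title><id>39</id>"
    "<revision><id>1338813175</id>"
    "<timestamp>2026-02-17T10:56:58Z</timestamp></revision></page>"
)
meta = extract_page_metadata(sample)
```

Matching on direct children only is deliberate: both the page and the revision have an `<id>`, so a recursive search could return the wrong one.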

I want to use them to check when an article was last updated, and to use the article id for indexing instead of relying on titles. I don't mean to make the article id the key of the pages table, just to store it as an extra column for other uses.

The idea would be to extract them in parse_dump_xml, pass them as parameters to add_page, and store them in the table. If touching that is too much, another option would be a separate article metadata model.
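To make the storage side concrete, here is a sketch of what the extra columns could look like. It assumes a SQLite-backed pages table keyed by title, which is a guess: the real schema, the actual `add_page` signature, and the column names below are not taken from the project.

```python
import sqlite3
from typing import Optional


def create_pages_table(conn: sqlite3.Connection) -> None:
    # Title stays the primary key; the new fields are optional extras.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS pages (
            title TEXT PRIMARY KEY,
            text TEXT NOT NULL,
            article_id INTEGER,
            revision_id INTEGER,
            revision_timestamp TEXT
        )
        """
    )
    # Secondary index so lookups by article id do not scan the table.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_pages_article_id ON pages(article_id)"
    )


def add_page(
    conn: sqlite3.Connection,
    title: str,
    text: str,
    article_id: Optional[int] = None,
    revision_id: Optional[int] = None,
    revision_timestamp: Optional[str] = None,
) -> None:
    # New parameters default to None so existing callers keep working.
    conn.execute(
        "INSERT OR REPLACE INTO pages "
        "(title, text, article_id, revision_id, revision_timestamp) "
        "VALUES (?, ?, ?, ?, ?)",
        (title, text, article_id, revision_id, revision_timestamp),
    )


conn = sqlite3.connect(":memory:")
create_pages_table(conn)
add_page(conn, "Example", "body", 39, 1338813175, "2026-02-17T10:56:58Z")
row = conn.execute(
    "SELECT title, revision_timestamp FROM pages WHERE article_id = ?", (39,)
).fetchone()
```

Storing the timestamp as the original ISO 8601 string keeps it sortable and avoids a conversion step; a separate metadata table keyed by title would work just as well if changing the pages table is undesirable.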

I can make the PR myself if this is approved.
