parse_itut_bulletins.py - running into AssertionError #3

@tobiasfunke1

Hi, thanks for your tools and your collection of MNC and MCC data.

When running parse_itut_bulletins.py I get the following output:

$ ./parse_itut_bulletins.py -d -j -p
...
[+] downloaded PDF bulletin 1314 from year 2025 and converted to text
[+] downloaded PDF bulletin 1315 from year 2025 and converted to text
> error occured during MNC extraction: AssertionError()

I think the output of pdftotext changed between versions. For example, in T-SP-OB.1162-2018-OAS-PDF-E.txt, line 2259, a space was added after the *:

before:

*This designation is without prejudice to positions on status, and is in line with UNSCR 1244 and the ICJ Opinion on the Kosovo

after:

* This designation is without prejudice to positions on status, and is in line with UNSCR 1244 and the ICJ Opinion on the Kosovo

The regex pattern no longer matches:

'(\*This designation is without prejudice to positions on status, and is in line with UNSCR 1244 and the ICJ Opinion on the Kosovo)|'\
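One possible fix, as a minimal sketch only: allow optional whitespace after the asterisk with `\*\s*`, so that both pdftotext variants match. The names `FOOTNOTE` and `matches_footnote` are hypothetical and are not taken from parse_itut_bulletins.py; the real script embeds this alternative inside a larger pattern, which I have not reproduced here.

```python
import re

# Hypothetical stand-in for the single footnote alternative of the script's
# larger pattern. `\*\s*` accepts zero or more whitespace characters after
# the asterisk, covering both the old and the new pdftotext output.
FOOTNOTE = re.compile(
    r'\*\s*This designation is without prejudice to positions on status, '
    r'and is in line with UNSCR 1244 and the ICJ Opinion on the Kosovo'
)


def matches_footnote(line: str) -> bool:
    """Return True if the line starts with the Kosovo footnote text."""
    return FOOTNOTE.match(line) is not None


# The two variants quoted in this issue.
old_line = ('*This designation is without prejudice to positions on status, '
            'and is in line with UNSCR 1244 and the ICJ Opinion on the Kosovo')
new_line = ('* This designation is without prejudice to positions on status, '
            'and is in line with UNSCR 1244 and the ICJ Opinion on the Kosovo')

print(matches_footnote(old_line), matches_footnote(new_line))
```

If other patterns in the script are affected by similar spacing changes, the same `\s*` relaxation might apply there too, but I have only checked this one line.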

System info

OS: Ubuntu 24.04.2 LTS
Python: 3.12.3
lxml: 5.4.0
pdftotext: 24.02.0
