Hi, thanks for your tools and the collection of MNC and MCC data.
When running parse_itut_bulletins.py I get the following output:
$ ./parse_itut_bulletins.py -d -j -p
...
[+] downloaded PDF bulletin 1314 from year 2025 and converted to text
[+] downloaded PDF bulletin 1315 from year 2025 and converted to text
> error occured during MNC extraction: AssertionError()
I think the output of pdftotext changed between versions. For example, in T-SP-OB.1162-2018-OAS-PDF-E.txt, line 2259, a space was added after the *:
before:
*This designation is without prejudice to positions on status, and is in line with UNSCR 1244 and the ICJ Opinion on the Kosovo
after:
* This designation is without prejudice to positions on status, and is in line with UNSCR 1244 and the ICJ Opinion on the Kosovo
The regex pattern does not match anymore:
'(\*This designation is without prejudice to positions on status, and is in line with UNSCR 1244 and the ICJ Opinion on the Kosovo)|'\
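One possible way to make the pattern robust against both pdftotext outputs is to allow optional whitespace after the asterisk. The following is only a minimal sketch, not the script's actual code: the names NOTE, STRICT, TOLERANT and matches are hypothetical and exist just for illustration, and the two sample lines are the "before" and "after" lines quoted above.

```python
import re

# Footnote text as quoted in this report (without the leading asterisk).
NOTE = ("This designation is without prejudice to positions on status, "
        "and is in line with UNSCR 1244 and the ICJ Opinion on the Kosovo")

# The two variants produced by different pdftotext versions.
OLD_LINE = "*" + NOTE    # older output: no space after the asterisk
NEW_LINE = "* " + NOTE   # newer output: one space after the asterisk

# Strict pattern, equivalent to the alternative quoted above:
# the text must follow the asterisk immediately.
STRICT = re.compile(r"(\*" + re.escape(NOTE) + r")")

# Tolerant pattern (assumption: any amount of whitespace after "*" is fine).
TOLERANT = re.compile(r"(\*\s*" + re.escape(NOTE) + r")")


def matches(pattern, line):
    """Return True if the compiled pattern is found anywhere in the line."""
    return pattern.search(line) is not None


if __name__ == "__main__":
    for name, pattern in (("strict", STRICT), ("tolerant", TOLERANT)):
        print(name, matches(pattern, OLD_LINE), matches(pattern, NEW_LINE))
```

With `\s*` the same alternative accepts the line with or without the space, so it should not depend on the installed pdftotext version. Other patterns in the script may need the same treatment if their input lines changed as well.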
System info:
OS: Ubuntu 24.04.2 LTS
Python: 3.12.3
lxml: 5.4.0
pdftotext: 24.02.0
Regex location: MCC_MNC/parse_itut_bulletins.py, line 224 in a5613a2