Adds a GH workflow to validate XML syntax and well-formedness #5
funkyfuture wants to merge 2 commits into
this is the outcome of an example run: https://github.com/delb-xml/test-corpus-aed-tei/actions/runs/19209861146/job/54910465966 the fourth step includes the emitted error messages. but due to the usage of
    steps:
      - run: sudo apt install libxml2-utils
      - uses: actions/checkout@v5
      - run: find files -type f -name '*.xml' -exec xmllint --noout --pedantic {} \;
since when has this --pedantic flag been a thing and why is it not in the man page??
my installation even has --strict-namespace. i use Arch btw.
how is the action on the container registry lol, i didn't even know you could do that!?
Ah, o.k., this failing check is because of the errors in the corpus, right?
this is the how. and yeah that's the common thing w/ the GH docs, there's a lot of info in it, but cluttered and hard to find.
yes, it is. the best overview for an assessment is this summary. at least the unencoded ampersands seem to be suited for a simple search & replace action, don't they?
i did that search & replace; interestingly it affects exactly 500 files. also, note that the editor (VS Code) removed some CR characters at line endings, leaving the LFs only.
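for reference, a scripted variant of that search & replace could look roughly like the sketch below. it is not what was run for this PR; the function names and the regex are made up for illustration. it works on raw bytes, so CR characters at line endings stay untouched, and it assumes that no ampersands sit inside CDATA sections or comments, since those would be rewritten too.

```python
import re
from pathlib import Path

# Matches an "&" that does not start an entity reference (&name;)
# or a character reference (&#38; / &#x26;).
BARE_AMP = re.compile(
    rb"&(?!(?:[A-Za-z_:][A-Za-z0-9_:.\-]*|#[0-9]+|#x[0-9A-Fa-f]+);)"
)


def escape_bare_ampersands(data: bytes) -> bytes:
    """Replace unencoded ampersands with &amp;, leaving all other bytes as-is."""
    return BARE_AMP.sub(b"&amp;", data)


def fix_tree(root: str) -> int:
    """Rewrite every *.xml file below root in place; return the number of changed files."""
    changed = 0
    for path in Path(root).rglob("*.xml"):
        original = path.read_bytes()
        fixed = escape_bare_ampersands(original)
        if fixed != original:
            # Writing bytes avoids any newline translation by the runtime.
            path.write_bytes(fixed)
            changed += 1
    return changed
```

calling `fix_tree("files")` would process the same directory the workflow scans.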


ciao Simon,
i'd like to use this text corpus within the integration test suite for the delb library. unfortunately there are quite a lot of errors in the corpus.
thus i've added a GitHub workflow to validate all documents.
due to the amount of errors i don't see that it'd be reasonable to fix the issues by hand.
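
for trying the check locally without xmllint, something like the following sketch could serve. it is not part of this PR; `find_malformed` and `main` are hypothetical names, and it relies on Python's standard library parser, which only tests well-formedness and does not reproduce xmllint's `--pedantic` warnings. the `files` default mirrors the directory used in the workflow.

```python
import sys
import xml.etree.ElementTree as ET
from pathlib import Path


def find_malformed(root: str) -> list[tuple[Path, str]]:
    """Parse every *.xml file below root and collect (path, error message) pairs."""
    failures = []
    for path in sorted(Path(root).rglob("*.xml")):
        try:
            ET.parse(path)
        except ET.ParseError as error:
            failures.append((path, str(error)))
    return failures


def main(root: str = "files") -> int:
    """Print all parse errors to stderr; return 1 if any file failed, else 0."""
    failures = find_malformed(root)
    for path, message in failures:
        print(f"{path}: {message}", file=sys.stderr)
    return 1 if failures else 0
```

passing the result on with `sys.exit(main())` would give a non-zero exit status whenever at least one document is not well-formed.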