Conversation
There was a problem hiding this comment.
I just looked over the parts, where the behaviour of rdflib plays a role, because i dont fully understand what the code itself does. I hope my comments are a little helpful.
I have another more generell comment, as BNode is a subclass to str if you eg. check types with mypy you might get no error, when you by mistake compare a str to a BNode. But this would always fail. this might play a role, when you call hash_related_blank_node. But as i said, i have not a full grasp on the program.
| encoded = self._quote_encode(l_) | ||
|
|
||
| if l_.language: | ||
| if l_.datatype: |
There was a problem hiding this comment.
I would remove this test if l_.datatype: in rdflib, if language is given, the datatype should always be forced, so this test for datatype should always return the same. As far as i remember the correct datatype for lang tagged literals is rdf:langstring, so there might be in future a change to return rdfs:langstring instead of None.
There was a problem hiding this comment.
Agreed! But this is code copied (and cleaned) from rdflib's nquads serializer, so I'll mix it in a PR there.
@WhiteGobo do you mean |
No i did think more of finding type errors with tools like mypy. And i think of the over way aroung so something like This was merely a warning, when using bnode_map: "dict[str, str]" # this would be used, so that mypy can detect wrong inserts.
normalized, bnode_map = self._canonicalize(rdflib_dataset)
# mapping old bnode IDs to their new canonical IDs
for k, v in parser._bnode_ids.items():
bnode_id = str(v)
# Imagine, you were unconcentrated and you used by mistake the original bnode instead of the extracted str.
if v in bnode_map: # This line is errornous and therefore mypy should throw a warning
bnode_map[k] = bnode_map[bnode_id]
del bnode_map[bnode_id]For these kind of scenario, i use mypy to warn me, that i did something wrong, but mypy would register BNode as valid value, because, its just a subclass to |
|
ah ok, type checking, gotya. Adding type hints everywhere would generate too many changes (and be too much work). I think this for some other PR |
This PR is a rather big one. It fully implements URDNA2015, URDNA2012 and RDFC10 and has complete test-suite coverage on these algorithms (with the exception of the one on poisonous datasets, but more about that later). It 'fixes' #220
Some general remarks before I dive into the details:
canon.py. This made the code much much cleaner, but ironically didn't fix what I introduced it for: nquad serialization. The relevant methods are copied over from rdflib until I can turn them into PRs for rdflib.rdflib.Datasetand back. This should ensure backwards-compatibility on some methods such asjsonld.normalization()andURDNA2015.main().Overview of the changes:
rdflibdependency in all relevant config and code files.rdflibmostly changedutil.pyfrom_legacy_dataset()andto_legacy_dataset()to easily go from anrdflib.Datasetto an RDFJS-likedictwherever needed.tests/test_util.pyfor these functions. We might want to move more general-purpose methods to this file.canon.py:URDNA2015._canonicalize(self, dataset: Dataset)while handling input and output remained inURDNA2015.main(). The latter now does the nquads parsing instead ofJsonLdProcessor.normalize(), so all parsing and serialization is handled by the same class.URDNA2015._canonicalize(self, dataset: Dataset)accepts anrdflib.Datasetand returns a tuple withstranddict.URDNA2015 .main(self, dataset: str | dict | Dataset, options)now accepts ardflib.Datasetobject in addition to an nquadsstror the original RDFJS-likedict. It returnsstr: the serialized nquads result, ordict: the result as RDFJS-like dataset or the blank node identifier map when the new parameteroutputMapisTrue.URDNA2015.hash_algorithmso it easily configurable (required for RDFC1.0)permutations()function now usesitertools.permutationsinstead of a custom implementation._nq_rowand_quoteLiteral, which should eventually move to a fix for rdflib's nquads serializer.tests/runtests.pyIt's possible that we move this code out before ever merging, but at least it can be easily tested. That said, when importing this functionality as an external library, we include rdflib anyway. It therefore makes sense to continue replacing the RDFJS-like data structures elsewhere in the code, essentially making it fully rdflib compliant and dependent.
Since this involves RDFLib: @nicholascar, anyone who can maybe have a look at this? Also @davidlehn @BigBlueHat @anatoly-scherbakov