Conversation
| CSS_CLASS_TO_ELEMENT_TYPE_MAP: Dict[str, Type[ontology.OntologyElement]] = { | ||
| element_type().css_class_name: element_type | ||
| for element_type in ALL_ONTOLOGY_ELEMENT_TYPES | ||
| for tag in element_type().allowed_tags |
There was a problem hiding this comment.
how was this line of code working before?
There was a problem hiding this comment.
This was copy-paste form the mapping above (which is more specific), here we just did more loops, but the 'tag' is not used anywhere. After the change, the mapping is the same, just unnecessary loops are removed:
CSS_CLASS_TO_ELEMENT_TYPE_MAP_BEFORE: Dict[str, Type[ontology.OntologyElement]] = {
element_type().css_class_name: element_type
for element_type in ALL_ONTOLOGY_ELEMENT_TYPES
for tag in element_type().allowed_tags
}
CSS_CLASS_TO_ELEMENT_TYPE_MAP_AFTER: Dict[str, Type[ontology.OntologyElement]] = {
element_type().css_class_name: element_type for element_type in ALL_ONTOLOGY_ELEMENT_TYPES
}
print(CSS_CLASS_TO_ELEMENT_TYPE_MAP_BEFORE == CSS_CLASS_TO_ELEMENT_TYPE_MAP_AFTER)
# Output: TrueThis is because some CSS classes can have multiple HTML tags assigned eg.
class Heading(OntologyElement):
description: str = Field("Section headings (levels 1-6)", frozen=True)
elementType: ElementTypeEnum = Field(ElementTypeEnum.text, frozen=True)
allowed_tags: List[str] = Field(["h1", "h2", "h3", "h4", "h5", "h6"], frozen=True)In this case, we were creating a 6 times entry: "Heading": OntologyElement.Heading
|
Removed
Print For some reason, it was assigned in the unit test (even though I got None = None locally), possibly due to some version mismatch. I think it needs another debugging session and is not a blocker for this ticket (this change didn't touch detection origin) |
| if parent_id not in id_to_element_mapping: | ||
| raise ValueError(f"Parent element {parent_id} not found in id_to_element_mapping") |
There was a problem hiding this comment.
maybe add a unit test for the new exception?
There was a problem hiding this comment.
I tried, but I noticed that they make no sense. parent_id is always set before (if isn't it is generated) and ontology always returns some element so I just removed raising exception part
|
hold on while I bump our base image to fix the dockerfile cve |
In this pull request parent-child relationship for elements generated with v2 parser is based on actual element IDs instead of IDs baked somewhere in the HTML script. With some extra bug fixing it allowed for significantly simplifying json -> HTML script

In this pull request parent-child relationship for elements generated with v2 parser is based on actual element IDs instead of IDs baked somewhere in the HTML script.
With some extra bug fixing it allowed for significantly simplifying json -> HTML script