Improve schemas for non-total TypedDict #11

Merged: giograno merged 4 commits into main from non-total-td on Dec 5, 2025

giograno commented on Dec 3, 2025

TypedDicts in Python can be marked with total=False. This means that omitting a field still results in a valid structure.
It gets trickier when non-total TypedDicts also have nullable fields.

For instance, imagine the following structure:

class PyType(TypedDict, total=False):
    height: int | None
    weight: int | None

The resulting schema would look something like this:

{
  "type": "record",
  "name": "PyType",
  "fields": [
    {
      "name": "height",
      "type": [
        "long",
        "null"
      ]
    },
    {
      "name": "weight",
      "type": [
        "long",
        "null"
      ]
    }
  ]
}

Now, imagine the following two objects:

obj = PyType(height=180, weight=None)
obj2 = PyType(height=180)

Both objects fit the schema, but the Avro representation does not let us distinguish the case in which weight is explicitly set to None from the case in which it is not set at all.

With this PR, we extend the field types of a non-total TypedDict with string. Clients can then use a string value to mark a field as explicitly missing. Everything will make more sense if you look at the mentioned PR that uses this feature.
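
As a rough illustration of the idea, and not the actual implementation in this PR, the widening could be sketched as below. The names extend_with_string, mark_missing and MISSING_MARKER are hypothetical, and the marker value is an assumption made only for this example. A type that already has a string branch is left untouched.

```python
from typing import Any

# Hypothetical marker value; the real marker is defined by the clients.
MISSING_MARKER = "__missing__"


def _is_string_branch(branch: Any) -> bool:
    # A plain "string" or a dict-form type whose "type" is "string".
    return branch == "string" or (
        isinstance(branch, dict) and branch.get("type") == "string"
    )


def extend_with_string(avro_type: Any) -> Any:
    """Widen a field type with "string" unless a string branch already exists."""
    branches = avro_type if isinstance(avro_type, list) else [avro_type]
    if any(_is_string_branch(b) for b in branches):
        return avro_type
    return [*branches, "string"]


def mark_missing(record: dict, field_names: list[str]) -> dict:
    """Replace absent keys with the marker; keep explicit None values as None."""
    return {name: record.get(name, MISSING_MARKER) for name in field_names}
```

With this sketch, mark_missing({"height": 180}, ["height", "weight"]) keeps height and sets weight to the marker, while an explicit weight=None stays None. A field that can legitimately hold arbitrary strings could collide with the marker, so the marker value would need to be chosen with care.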

Addresses PNX-545

giograno self-assigned this on Dec 3, 2025
giograno marked this pull request as ready for review on December 3, 2025 09:55

bentsku left a comment

LGTM, this looks good! Nice fix to a tricky problem: a None type hint can mean a lot for non-total typed dicts, so I think this is a good way to solve that 👍

Comment thread tests/test_typed_dict.py
class PyType(TypedDict, total=False):
    name: str
    age: int | None
    opt: Opt | None

It might be worth adding a str | None field here, just to verify what happens when we end up with Union[str, None, str].

Member Author

Done in c514ae0. A Union can't contain duplicates, so it would just collapse to Union[str, None]. If you actually try to create a type alias like this, you get an Optional[str] :)
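
The deduplication can be checked directly with the standard typing module. The alias name below is only for illustration.

```python
from typing import Optional, Union, get_args

# Duplicate members are dropped when the Union is built.
alias = Union[str, None, str]

same_as_optional = alias == Optional[str]
members = get_args(alias)  # (str, NoneType)
```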

Comment thread tests/test_typed_dict.py
"type": "string",
},
{"name": "age", "type": ["long", "null", "string"]},
{"name": "opt", "type": [{"namedString": "Opt", "type": "string"}, "null"]},

Question: here the regular "string" is not added, so would it still work if we set the special marker on a field of the Opt enum type? It seems so, going by your comment in the deduplication logic.

Member Author

Yes, it would. {"namedString": "Opt", "type": "string"} is totally equivalent to "type": "string" for Avro.

giograno merged commit 3a705c7 into main on Dec 5, 2025
6 checks passed
giograno deleted the non-total-td branch on December 5, 2025 07:22