Skip to content

Hygienic mypy plugin Phase 0: route direct Django access through DjangoContext#3376

Draft
federicobond wants to merge 1 commit into
typeddjango:masterfrom
federicobond:consolidate-django-access-via-context
Draft

Hygienic mypy plugin Phase 0: route direct Django access through DjangoContext#3376
federicobond wants to merge 1 commit into
typeddjango:masterfrom
federicobond:consolidate-django-access-via-context

Conversation

@federicobond
Copy link
Copy Markdown
Contributor

First step toward a hygienic mypy plugin, as discussed in #1384. Every direct access to _meta, apps_registry, and settings outside DjangoContext is collapsed into wrapper methods on the context. Return types still expose live Django objects; the goal is a single chokepoint we can iterate on without touching every transformer again.

What I'd like review on

Don't focus too much on the method implementations for now. I'd like to validate the general direction first, and make sure we're happy with these method signatures at least as a transitional API while we figure out the next step.

If this is too big to safely merge in one go, I can split it up by method group, but I think the overall direction is easier to review when seen all together.

Related issues

Refs #1384

AI Policy

  • I have read and agree to the AI Policy, removed any "Co-Authored-By" lines attributing coding agents, and manually reviewed the final result

Add wrapper methods on DjangoContext that absorb every direct `_meta`,
`apps_registry`, and `settings` access scattered across `main.py`,
`lib/helpers.py`, and the transformers.

# ---- Manager wrappers ----

def iter_managers(self, model_cls: type[Model]) -> Iterator[tuple[str, models.Manager[Any]]]:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Design question: is it fine for a separate mypy plugin to return models.Manager? Can we theoretically construct a manager with importing Django and app models?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most likely not, but the intent of this phase is to make those leaks more visible via the single DjangoContext chokepoint, so we can later refactor the API to avoid returning live Django objects.

Copy link
Copy Markdown
Contributor

@UnknownPlatypus UnknownPlatypus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I think this is a great idea but claude did some weird stuff I think (or maybe you, idk these days, but it looks a lot like claude in some places)

Comment on lines +157 to +158
def is_model_abstract(self, model_cls: type[Model]) -> bool:
return model_cls._meta.abstract
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of these are static (they don't use self) maybe add @staticmethod to these ?

You can enable the ruff rule plr6301 to catch them

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's please model our APIs without @staticmethod :)
This way we have more flexibility to use self, when we would need it. Moreover, we don't want to provide DjangoContext.is_model_abstract(model) API.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with @sobolevn here, the fact that they don't use self is just accidental here.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But they will never need self, they don't need coupling with django_context.
I'm not sure I really see the advantage versus regular helper functions, or is it that you explicitely want to make them less easy to use in other contexts ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not if all they do is access _meta on the model class, but if we move forward with the hygienic plugin roadmap as I see it, they will probably read from an internal cache or maybe even rpc to a separate process. They will also not take a type[Model] but a pure mypy type but that's a refactor for later.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main point of this phase is to surface what queries we are asking from the Django runtime and consolidating them in a single place so we can design a better abstraction.

return model_cls._meta.auto_field

def get_pk_field_or_none(self, model_cls: type[Model]) -> Field[Any, Any] | None:
# Non-raising parallel of `get_primary_key_field`.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not correct, get_primary_key_field is trying to the same thing in a different way and the raise ValueError seem to be to satisfy the typechecker mostly. Otherwise it's a crash, we don't seem to catch the possible valueerror anywhere.

The typing also suggest this can only return Field, never None

class Options(Generic[_M]):
    pk: Field

We should probably investigate django runtime for this and remove one of the two helpers

_, final_field, _, remainder = Query(model_cls).names_to_path(parts, model_cls._meta)
return final_field, remainder

def iter_select_related_choices(self, model_cls: type[Model]) -> Iterator[str]:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only use it in _get_select_related_field_choices, does it really make sense to have this intermediary version ?

What about just having a DjagnoContext.get_select_related_choices hook that replaces both ?

Comment on lines +609 to +610
if django_context.has_setting(member_name):
return django_context.get_setting(member_name) # type: ignore[no-any-return]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if django_context.has_setting(member_name):
return django_context.get_setting(member_name) # type: ignore[no-any-return]
if (setting_value := django_context.get_setting(member_name)):
return setting_value # type: ignore[no-any-return]

and f"{expr.expr.fullname}.{expr.name}" == fullnames.AUTH_USER_MODEL_FULLNAME
):
lazy_reference = django_context.settings.AUTH_USER_MODEL
lazy_reference = django_context.auth_user_model_label
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feels weird to me to stuff everything on django_context, django_context.settings.AUTH_USER_MODEL is a lot more clear to me about what we are fetching than this random attribute

Comment on lines +135 to +147
def get_setting(self, name: str, default: Any = None) -> Any:
return getattr(self._settings, name, default)

def has_setting(self, name: str) -> bool:
return hasattr(self._settings, name)

@cached_property
def auth_user_model_label(self) -> str:
return self._settings.AUTH_USER_MODEL

@cached_property
def default_auto_field_path(self) -> str:
return self._settings.DEFAULT_AUTO_FIELD
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is odd:

I think we should remove the cached_property, looks like premature optimization.
If settings lookup are expensive, then we should be doing caching at the get_setting layer, and if they are not, we should not do caching


# ---- Single-model meta wrappers ----

def get_field_on_model(self, model_cls: type[Model], field_name: str) -> Field[Any, Any] | ForeignObjectRel | None:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

naming is not super consistant, get_field_on_model but get_auto_field get_pk_field later.

Again this is a bit confusing on usage, maybe use somthing like:

  • get_model_field
  • get_model_auto_field
  • get_model_pk_field
  • ...

Comment on lines +186 to +191
def iter_reverse_relations(self, model_cls: type[Model]) -> Iterator[ForeignObjectRel]:
# Public-API equivalent of `_meta.related_objects` (private API).
# Matches `include_hidden=True` to preserve behaviour for hidden M2M etc.
for field in model_cls._meta.get_fields(include_hidden=True):
if isinstance(field, ForeignObjectRel):
yield field
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation changed, is it expected ?
I think it's better to use an internal api than duplicating one that will drift silently.

Comment on lines 975 to +991
def _try_get_field(
ctx: MethodContext, model_cls: type[Model], field_name: str, *, resolve_pk: bool = False
ctx: MethodContext,
django_context: DjangoContext,
model_cls: type[Model],
field_name: str,
*,
resolve_pk: bool = False,
) -> _AnyField | None:
opts = model_cls._meta
resolved_name = opts.pk.name if resolve_pk and field_name == "pk" else field_name
try:
return opts.get_field(resolved_name)
except FieldDoesNotExist as e:
ctx.api.fail(str(e), ctx.context)
return None
if resolve_pk and field_name == "pk":
pk = django_context.get_pk_field_or_none(model_cls)
resolved_name = pk.name if pk is not None else field_name
else:
resolved_name = field_name
field = django_context.get_field_on_model(model_cls, resolved_name)
if field is None:
ctx.api.fail(f"{model_cls.__name__} has no field named {resolved_name!r}", ctx.context)
return field
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation changed, is it expected ?

Also should we instead add a suppress_errors = True flag to get_field_on_model that would be switch to False here ? Similarly to what mypy does with analyze_member_access

Comment on lines +1011 to +1013
all_pk_fields: set[_AnyField] = set(django_context.get_pk_fields(model_cls))
for parent in django_context.get_model_parents(model_cls):
all_pk_fields.update(django_context.get_pk_fields(parent))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was going to ask why we don't use django_context.get_pk_fields here but django_context.get_pk_fields seem to only return one pk (?)

Can we unify the implementations ?

@federicobond
Copy link
Copy Markdown
Contributor Author

@UnknownPlatypus thanks for the thorough look, and sorry for the noise. I mentioned in the description that I was mainly after a sanity check on the direction before polishing the implementations, but I should have been more explicit about expectations.

I'll tighten the implementations (naming, the changed behaviors you flagged, etc.) and ping you again once it's in better shape.

@UnknownPlatypus
Copy link
Copy Markdown
Contributor

Thanks, sorry for the tone.

Having to review my claude and my coworker claude all day is getting on my nerves.

But thanks for the work on this, will be much needed for recent mypy parallel work

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants