Skip to content

Consider not including data property in Response.__getstate__ for rendered responses #7704

@diox

Description

@diox

Checklist

  • I have verified that that issue exists against the master branch of Django REST framework.
  • I have searched for similar issues in both open and closed tickets and cannot find a duplicate.
  • This is not a usage question. (Those should be directed to the discussion group instead.)
  • This cannot be dealt with as a third party library. (We prefer new functionality to be in the form of third party libraries where possible.)
  • I have reduced the issue to the simplest possible case.
  • I have included a failing test as a pull request. (If you are unable to do so we can still accept the issue.)
  • It's difficult to write a failing test because currently DRF tests don't have memcached so the problem wouldn't immediately be visible. I could add one just comparing the rough size of the pickled response if you think that's useful to understand the issue.

Description

Once a Response is rendered, the instance contains the data twice:

  • In the data property, in the "raw" form that was passed to the Response. Usually a dict or a list.
  • In the content property, in the "final" form that is going to be sent down to the client (at this point content negotiation has found the relevant renderer for the request and used it to render the response)

This means that, when caching a Response object after it has been rendered, the object is roughly twice as big as it should be. This is a problem when using django's cache_page decorator as suggested in DRF docs because not only the larger object makes the round-trip unnecessarily longer, but also it can cause some caching backend like Memcached to silently fail to cache the response entirely! With Memcached default configuration objects larger than 1MB are not stored in the cache, and django's Memcached backend hides that failure.

Response.__getstate__() already excludes some properties to prevent them from being cached. As far as I can tell this already prevents a cached Response from being re-rendered since a bunch of properties will be missing, so please consider also excluding data to improve cacheability of Response objects (note that __getstate__() can't be called successfully on non-rendered Responses anyway because it inherits from django's SimpleTemplateResponse).

I realize this is a backwards-incompatible change if people are relying on data being present (it can probably mess up view tests that directly look at a Response data and use caching), but I think it's worth it to solve this problem. I'd be happy to write a PR if this issue is accepted (even if the solution is different to what I'm proposing, I'm happy to help implementing anything that fixes the functional problem).

Steps to reproduce

  • Using django's memcached backend, build an API like documented in Caching section of DRF docs, but return enough content for the data to be roughly 600 KB.

Expected behavior

  • Pickled response object size is roughly 600 KB (there is some overhead, but not that much)
  • Caching works

Actual behavior

  • Pickled response object size of over 1 MB
  • Caching silently fails (if you response makes database requests or takes a little time to compute it's easy to notice)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions