I am currently trying to set up Django's per-site cache. It all seemed to work fine locally, but once I moved it onto our test servers (which run Google Analytics) I noticed that pages weren't being served from the cache as expected.
After a bit of digging I realised that this was due to another third-party app that we are using (and need) accessing the session. If the session is accessed, Django's session middleware adds 'Cookie' to the Vary header:
# django/contrib/sessions/middleware.py
if accessed:
    patch_vary_headers(response, ('Cookie',))
The resulting HTTP response header looks like:
Vary: Cookie, Accept-Language
The issue I have is that when Django generates the cache key it looks at all the cookies in the request, including third-party ones (e.g. Google Analytics) that have nothing to do with Django and no impact on how the view is rendered.
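To illustrate why this defeats the cache, here is a simplified, standalone sketch of the idea behind Django's Vary-aware cache key (this is not Django's actual code; the real implementation lives in django.utils.cache and hashes the values of each request header named in the response's Vary header):

```python
import hashlib

def cache_key(path, vary_header_values):
    # Simplified sketch: the value of every header listed in Vary is
    # hashed into the cache key, so any change to the Cookie header
    # produces a different key.
    ctx = hashlib.md5()
    for value in vary_header_values:
        ctx.update(value.encode())
    return f"cache_page.{path}.{ctx.hexdigest()}"

# Same page, same session, but a different _ga cookie value:
k1 = cache_key("/home/", ["sessionid=abc; _ga=GA1.2.111"])
k2 = cache_key("/home/", ["sessionid=abc; _ga=GA1.2.222"])
print(k1 != k2)  # True: the GA cookie alone forces a cache miss
```

So every visitor with a distinct Analytics cookie gets their own cache entry, which is effectively no caching at all for anonymous traffic.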
I don't want to monkey-patch the third-party Django app that we are using to stop it accessing the session.
I could patch the Vary headers to remove 'Cookie' completely, but if the session middleware is setting the header then it is probably safer to leave it alone and just filter out the cookies that aren't relevant to Django.
I was thinking about stripping out third-party cookies using a whitelist, like this:
from django.conf import settings
from django.http import parse_cookie


class StripCookiesMiddleware:
    """Middleware to selectively strip cookies from the request."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        # Selectively remove cookies from the request based on a whitelist,
        # but only for anonymous users on safe methods.
        if request.method in ('GET', 'HEAD') and not request.user.is_authenticated:
            allowed_cookies = []
            allowed_cookies += getattr(settings, "COOKIES_WHITELIST", [])
            allowed_cookies += get_default_cookie_list(request)
            cookie_dict = parse_cookie(request.META.get('HTTP_COOKIE', ''))
            for key in list(cookie_dict.keys()):
                if key not in allowed_cookies:
                    del cookie_dict[key]
            cookie_string = "; ".join(
                f"{key}={value}" for key, value in cookie_dict.items()
            )
            request.META["HTTP_COOKIE"] = cookie_string
        response = self.get_response(request)
        return response


def get_default_cookie_list(request):
    # Return the default list of Django cookies that we never strip out.
    # CSRF is not included because we only strip cookies on GET and
    # HEAD requests.
    return [
        getattr(settings, "LANGUAGE_COOKIE_NAME", "django_language"),
        getattr(settings, "SESSION_COOKIE_NAME", "sessionid"),
    ]
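The core stripping step can be exercised outside Django with just the standard library. Here http.cookies.SimpleCookie stands in for Django's parse_cookie, and the whitelist names are the defaults from above; this is only an illustration of the filtering logic, not the middleware itself:

```python
from http.cookies import SimpleCookie

def strip_cookies(cookie_header, allowed):
    # Parse the raw Cookie header, drop any cookie whose name is not
    # in the whitelist, and rebuild the header string.
    jar = SimpleCookie()
    jar.load(cookie_header)
    return "; ".join(
        f"{name}={morsel.value}"
        for name, morsel in jar.items()
        if name in allowed
    )

header = "sessionid=abc123; _ga=GA1.2.111; _gid=GA1.2.222; django_language=en"
print(strip_cookies(header, ["sessionid", "django_language"]))
# sessionid=abc123; django_language=en
```

With the Analytics cookies gone, every anonymous request carries an identical Cookie header and therefore hashes to the same cache key.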
I added my custom middleware just before FetchFromCacheMiddleware in settings.py.
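For reference, the ordering I mean looks roughly like this (a sketch only; "myproject.middleware.StripCookiesMiddleware" is a placeholder path, and the rest is the standard per-site-cache layout from the Django docs). Note that because the middleware reads request.user it must also sit after AuthenticationMiddleware:

```python
# settings.py (sketch; adjust module paths to your project)
MIDDLEWARE = [
    "django.middleware.cache.UpdateCacheMiddleware",
    "django.middleware.common.CommonMiddleware",
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    # After AuthenticationMiddleware (it reads request.user) and before
    # FetchFromCacheMiddleware, so the cache key is built from the
    # already-stripped Cookie header.
    "myproject.middleware.StripCookiesMiddleware",
    "django.middleware.cache.FetchFromCacheMiddleware",
]
```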
I have tried this locally and on the test servers and it appears to be working as expected, i.e. all unauthenticated users are now being served the same cached version of the page regardless of third-party cookies.
This appears to work, but I still don't feel 100% comfortable manipulating the request object like this. I can't see any obvious knock-on effect for downstream caches, but I wondered if anyone had any other suggestions, or can see a problem with this approach that I might be missing?
By the way, we are not caching pages with forms (a relatively small number of pages on our site), so the setting/reading of CSRF tokens should be unaffected by any of this.