
No query strings here either


A couple of weeks ago, Chris Morgan published "I've banned query strings". I read it, liked it, and then did roughly the same thing on my own site - with two deliberate differences.

Chris's opening sums up the motivation better than I could:

I don't like people adding tracking stuff to URLs. Still less do I like people adding tracking stuff to my URLs.

[...] UTM parameters are for me to use, not you. Leave my URLs alone.

From chrismorgan.info/no-query-strings

The premise is the same here. A ?utm_source=... or ?ref=... tacked onto one of my URLs by some intermediary is, at best, noise I never asked for and, at worst, a tracker the referrer is using to nudge my visitor's behaviour into a funnel. I'd rather refuse to serve those requests than pretend they're a legitimate way to reach a page on my site. I'm also a fan of having one true canonical URL for each of my pages.

Where I differ from Chris

1. cache-busters like ?v=<digits> are allowed

Chris went for a true blanket ban - including breaking old cache-busting URLs like ?t=... and ?h=... that his site used to serve. That's likely the right call when none of those URLs are still in circulation. In any case, those were probably only ever used for static assets, which people don't bookmark.

My situation is slightly different: I actively use ?v=<n> as a cache buster on assets I serve today. The very HTML you're reading links to main.css?v=1. I use it so that I can set a very high Cache-Control: max-age=... on static assets. If I matched Chris's strictness, I'd have to either give up on query-string cache busting (and switch to fingerprinted filenames or Cache-Control juggling) or break my own page load on every bump.
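For illustration, a long-lived caching rule in a Caddyfile could look roughly like the sketch below. The example.com site address, the @static matcher and the one-year max-age are assumptions made for the example, not values taken from this site's configuration.

```caddyfile
# Sketch only: site address, matcher name and max-age value are hypothetical.
example.com {
    # Match stylesheet and script requests by file extension.
    @static path *.css *.js

    # Let browsers cache these for a long time; bumping ?v=<n> in the
    # referencing HTML makes them fetch a fresh copy.
    header @static Cache-Control "public, max-age=31536000"
}
```

Because the cache key includes the query string, main.css?v=1 and main.css?v=2 are treated as different resources, which is what makes the long max-age safe.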

So my rule is a narrow allowlist: everything is blocked, except ?v=<digits>. The matcher is intentionally strict - the whole query string must be exactly v= followed by digits, nothing else, no extra parameters smuggled in alongside.

(no_query_strings) {
    # Reject any request whose original URI contains "?", unless the query is exactly v=<digits>.
    @bad_query `{http.request.orig_uri}.contains("?") && !{http.request.uri.query}.matches("^v=[0-9]+$")`
    error @bad_query 403
}

The first clause uses orig_uri so a bare trailing ? still trips the ban - Caddy's {query} placeholder can't distinguish "absent" from "empty", and a lone ? deserves the same treatment as a parameter list. The second clause uses the canonical {uri.query} because Caddy doesn't expose .query as a sub-key on orig_uri - the rewrite never touches the query so the two are equivalent here.
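A snippet like this does nothing until a site block imports it. As a minimal sketch, assuming a static site (example.com and /srv/www are placeholders, not my actual setup), wiring it in might look like this:

```caddyfile
# Sketch only: site address and root path are hypothetical.
example.com {
    # Pull in the @bad_query matcher and the error directive from the snippet.
    import no_query_strings

    root * /srv/www
    file_server
}
```

In Caddy's default directive order, error is handled before file_server, so a request matching @bad_query should be rejected before any file is looked up.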

2. 403 Forbidden, not 414 URI Too Long

Chris picked 414 URI Too Long, and is upfront about it:

You could argue that I'm abusing 414 URI Too Long. I respond that it's funnier this way.

From Chris's ban page

It's indeed nice, but I wanted to pick a status code that I can defend on RFC grounds rather than vibes. Here's how I read RFC 9110 and RFC 7725 for this case:

400 Bad Request
The server cannot or will not process the request due to something that is perceived to be a client error (e.g., malformed request syntax). The request isn't malformed; ?utm_source=x is perfectly well-formed. Too generic.
403 Forbidden
The server understood the request but refuses to authorize it. That is exactly what's happening: I understood the request, the URL would otherwise resolve, and I am refusing on policy grounds. The spec also explicitly notes that a server can describe that reason in the response content, which is what the body of the 403 page does.
404 Not Found
Misleading. The resource exists; I just won't serve it via this URL. Also has unpleasant SEO and caching side effects.
414 URI Too Long
A refusal to service the request because the target URI is longer than the server is willing to interpret. The objection is about length, not policy or content. That said, one could argue that anything after the canonical URL makes it too long.
451 Unavailable For Legal Reasons
Not legal reasons. Just personal taste.

403 is the cleanest semantic match. I'm not 100% certain, but I read the "authorize" in RFC 9110 as covering more than HTTP authentication or authorization: it carries the more general sense of "permit".

(Okay, Chris is right that 414 is funnier, though. I'll concede that one.)

What it looks like

When a request comes in with anything other than ?v=<digits>, Caddy short-circuits with a 403 and serves a small explainer page. You can try it yourself: furrer.life/~timo/?utm_source=this-post. The page tells you what happened, why, and offers the same URL without the query string.
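One way to serve such an explainer page is Caddy's handle_errors directive. The following is a sketch under assumptions rather than a copy of my config: the site address, the root path and the /403.html file name are all hypothetical.

```caddyfile
# Sketch only: site address, root path and /403.html are hypothetical.
example.com {
    import no_query_strings

    root * /srv/www
    file_server

    handle_errors {
        # Only handle the 403 raised by the error directive in the snippet.
        @forbidden `{err.status_code} == 403`
        handle @forbidden {
            # Serve a static explainer page for the rejected request.
            rewrite * /403.html
            file_server
        }
    }
}
```

The explainer page itself can then link to the same path without the query string.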

If you want to follow Chris down this path, his post links to the relevant Caddyfile snippet on his site. Alternatively, use mine from above, which is a small variant of his with the extra allowlist clause.
