Varnish power

Varnish is a powerful HTTP cache. Giant platforms rely on it, and you may want to leverage its benefits too.

Let’s dig in.

You may have read terms like “HTTP accelerator” or “caching reverse proxy” to describe Varnish.

They are all true but kind of abstract. My advice: think of it through HTTP.

Varnish is a smart go-between

Varnish caches HTTP responses. It serves cached versions of frequently requested content so the app server doesn’t have to rebuild them.

When the client (roughly speaking, the user) requests a page, Varnish builds the output automatically (according to your configuration) and sends it.

It serves all data already in the cache and asks the server only for missing generated data. That’s the reason why you often read the term “go-between”, as Varnish places itself between the server (Apache, NGINX) and the client.

Varnish fetches all resources from the backend and stores them in its cache (in memory, by default) during the first load. Subsequent loads get the cached versions.

Varnish makes a critical decision: delivering a cached version of the page or fetching content from the backend server.

Varnish scales

With Varnish, you get faster HTTP requests through caching.

However, as it’s a go-between, it has several other benefits. Sysadmins may use it as a reverse proxy or a load balancer (for example, with directors from the built-in vmod_directors).

N.B.: think of a reverse proxy as a way to intercept requests from clients and make sure they do not communicate directly with the backend server.
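As a sketch, load balancing in modern Varnish (4.x) is typically done with directors. The backend names and IPs below are hypothetical:

vcl 4.1;

import directors;

backend app1 {
    .host = "192.0.2.10";  # hypothetical backend IPs
    .port = "8080";
}

backend app2 {
    .host = "192.0.2.11";
    .port = "8080";
}

sub vcl_init {
    # Create a round-robin director and register both backends.
    new cluster = directors.round_robin();
    cluster.add_backend(app1);
    cluster.add_backend(app2);
}

sub vcl_recv {
    # Each request is dispatched to the next backend in turn.
    set req.backend_hint = cluster.backend();
}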

You can even use Varnish as an additional security layer to restrict access to specific URLs, for example.

Varnish is not only useful to boost performances. It’s an efficient way to scale your architecture.

Varnish talks with headers

For each HTTP request, you get response headers. By listening to those headers, servers can trigger specific actions. That’s how they talk: they exchange headers over the HTTP protocol, along with status codes (e.g., 200).

Varnish listens to port 80 and expects specific headers to determine if it should cache requests or not.

It sends HTTP headers to the client too. You can see whether a page came from the Varnish cache through a header such as X-Cache (a common convention, set in your VCL).

Just open the browser console and go to the network tab:

x-cache: HIT

If it’s “HIT”, Varnish found the resource and your page was served from the cache. If it’s “MISS”, it wasn’t in the cache and Varnish had to fetch it from the backend.
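The X-Cache header is not built in; a common pattern is to set it yourself in vcl_deliver, using obj.hits (a sketch):

sub vcl_deliver {
    # obj.hits counts how many times this object was served from cache.
    if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT";
    } else {
        set resp.http.X-Cache = "MISS";
    }
}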

VCL and cache lookup

The Varnish configuration language (VCL) allows you to define a cache policy.

It’s a C-like syntax that Varnish can parse. In that respect, it’s very similar to the languages in the C family. The code is translated to C, compiled, and executed. If you know PHP or JavaScript, you won’t be lost.

In VCL, the req object is useful: it gives you access to anything the client sent.

The req.method variable exposes the HTTP method, so you can easily detect and handle GET, POST, PUT, or PURGE requests.

For example, the following code is quite straightforward:

sub vcl_recv {
    # VCL uses && and ||, not "and" / "or".
    if (req.method != "GET" && req.method != "HEAD") {
        return (pass);
    }

    return (hash);
}

You tell Varnish to skip the cache for any HTTP request that is neither GET nor HEAD, because other request types usually involve data processing by the backend (a POST submitting a form, for example) or need dedicated handling (like PURGE).

Those HTTP rules usually go in the vcl_recv() subroutine (recv is short for “receive”), which is special: it’s the first subroutine Varnish runs when handling any HTTP request.

If Varnish finds the page in the cache, it will deliver it from there. I mean literally, with the vcl_deliver() subroutine.

If it does not find the page, it forwards the request to the backend server and uses vcl_backend_* subroutines to determine whether it’s cacheable or not. If so, it puts the results in its cache.

Eventually, it delivers the page with the vcl_deliver() subroutine.
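For instance, cacheability and lifetime are typically decided in vcl_backend_response. The following is a sketch; the 10-minute TTL is an arbitrary example value:

sub vcl_backend_response {
    # Don't cache responses the backend explicitly marks as private.
    if (beresp.http.Cache-Control ~ "private") {
        set beresp.uncacheable = true;
        return (deliver);
    }

    # Otherwise keep the object for 10 minutes (arbitrary example TTL).
    set beresp.ttl = 10m;
    return (deliver);
}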

Varnish under the hood

Varnish uses keys and values. It stores what we call “chunks” of data with a unique key. The key is used as an identifier to search content (outputs stored in memory) in the colossal haystack of bytes.
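That key is computed in vcl_hash. The built-in version hashes the URL and the Host header; as a sketch, you could extend it to vary the cache on a custom header (X-Language here is hypothetical):

sub vcl_hash {
    hash_data(req.url);

    if (req.http.host) {
        hash_data(req.http.host);
    } else {
        hash_data(server.ip);
    }

    # Hypothetical: store one copy of the page per language.
    if (req.http.X-Language) {
        hash_data(req.http.X-Language);
    }

    return (lookup);
}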

One of the most harmful mistakes you can make with Varnish is to forget that, under the hood, Varnish has its own built-in logic, which is always present and appended to each sub. They call it the “built-in VCL”.

Besides, if you do not put your rules in the appropriate subroutine, the built-in VCL applies instead. It might seem fair, but it sometimes makes things like debugging a little more complicated.
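For example (a sketch, with a hypothetical header name): if your vcl_recv does not return, Varnish falls through to the appended built-in vcl_recv, so its default decisions still apply:

sub vcl_recv {
    # Tag the request for later debugging (hypothetical header).
    set req.http.X-Trace = "edge-1";

    # No return here: Varnish appends the built-in vcl_recv,
    # so its default logic (e.g., pass on requests with cookies) still runs.
}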

You know, it’s a UNIX system, so it’s quite conservative. To me, that’s a good thing (“decisions, not options”), but it won’t work out of the box.

Varnish does not cache requests with a cookie header by default. You often have to write custom VCL to handle them.

However, be careful with those cookies. You do not want a cache rule that stores a copy of each page for every single user, or that displays one user’s data to another. That could lead to weird bugs and complicated maintenance.
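A common pattern, sketched below, is to strip cookies on static assets so Varnish can cache them; the extension list is just an example:

sub vcl_recv {
    # Static assets don't depend on the user: drop cookies so
    # the built-in VCL doesn't pass the request to the backend.
    if (req.url ~ "\.(css|js|png|jpg|gif|svg|woff2?)(\?.*)?$") {
        unset req.http.Cookie;
        return (hash);
    }
}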

Conclusion: Varnish is a great addition

Varnish reduces bandwidth usage, footprint, and the server’s load. That does not mean you should drop all other optimizations. I like combining it with technologies such as Redis. They are not, IMHO, mutually exclusive.

I hope this post does a little more than scratch the surface, but Varnish has many other fantastic features and advanced usages, such as fragment caching with ESI (Edge Side Includes), which caches individual blocks within a page.

While it’s a compelling technology, it requires sufficient knowledge, experience, AND time to write a proper configuration. It does not work out of the box, which is neither surprising nor unacceptable for such a beautiful technology.

Photo by Thao Le Hoang on Unsplash