Make your NextJS site 10-100x faster with Express caching

February 18, 2022
0 comments React, Node, Nginx, JavaScript

UPDATE: Feb 21, 2022: The original blog post didn't mention the caching of custom headers. So warm cache hits would lose Cache-Control from the cold cache misses. Code updated below.

I know I know. The title sounds ridiculous. But it's not untrue. I managed to make my NextJS 20x faster by allowing the Express server, which handles NextJS, to cache the output in memory. And cache invalidation is not a problem.

Layers

My personal blog is a stack of layers:

KeyCDN --> Nginx (on my server) -> Express (same server) -> NextJS (inside Express)

And inside the NextJS code, to get the actual data, it uses HTTP to talk to a local Django server to get JSON based on data stored in a PostgreSQL database.

The problems I have are as follows:

  • The CDN sometimes asks for the same URL more than once when in theory you'd think it should be cached by them for a week. And if the traffic is high, my backend might get a stamping herd of requests until the CDN has warmed up.
  • It's technically possible to bypass the CDN by going straight to the origin server.
  • NextJS is "slow" and the culprit is actually critters which computes the critical CSS inline and lazy-loads the rest.
  • Using Nginx to do in-memory caching (which is powerfully fast by the way) does not allow cache purging at all (unless you buy Nginx Plus)

I really like NextJS and it's a great developer experience. There are definitely many things I don't like about it, but that's more because my site isn't SPA'y enough to benefit from much of what NextJS has to offer. By the way, I blogged about rewriting my site in NextJS last year.

Quick detour about critters

If you're reading my blog right now in a desktop browser, right-click and view source and you'll find this:


<head>
  <style>
  *,:after,:before{box-sizing:inherit}html{box-sizing:border-box}inpu...
  ... about 19k of inline CSS...
  </style>
  <link rel="stylesheet" href="/_next/static/css/fdcd47c7ff7e10df.css" data-n-g="" media="print" onload="this.media='all'">
  <noscript><link rel="stylesheet" href="/_next/static/css/fdcd47c7ff7e10df.css"></noscript>  
  ...
</head>

It's great for web performance because a <link rel="stylesheet" href="css.css"> is a render-blocking thing and it makes the site feel slow on first load. I wish I didn't need this, but it comes from my lack of CSS styling skills to custom hand-code every bit of CSS and instead, I rely on a bloated CSS framework which comes as a massive kitchen sink.

To add critical CSS optimization in NextJS, you add:


experimental: { optimizeCss: true },

inside your next.config.js. Easy enough, but it slows down my site by a factor of ~80ms to ~230ms on my Intel Macbook per page rendered.
So see, if it wasn't for this need of critical CSS inlining, NextJS would be about ~80ms per page and that includes getting all the data via HTTP JSON for each page too.

Express caching middleware

My server.mjs looks like this (simplified):


import next from "next";

import renderCaching from "./middleware/render-caching.mjs";

const app = next({ dev });
const handle = app.getRequestHandler();

app
  .prepare()
  .then(() => {
    const server = express();

    // For Gzip and Brotli compression
    server.use(shrinkRay());

    server.use(renderCaching);

    server.use(handle);

    // Use the rollbar error handler to send exceptions to your rollbar account
    if (rollbar) server.use(rollbar.errorHandler());

    server.listen(port, (err) => {
      if (err) throw err;
      console.log(`> Ready on http://localhost:${port}`);
    });
  })

And the middleware/render-caching.mjs looks like this:


import express from "express";
import QuickLRU from "quick-lru";

const router = express.Router();

const cache = new QuickLRU({ maxSize: 1000 });

router.get("/*", async function renderCaching(req, res, next) {
  if (
    req.path.startsWith("/_next/image") ||
    req.path.startsWith("/_next/static") ||
    req.path.startsWith("/search")
  ) {
    return next();
  }

  const key = req.url;
  if (cache.has(key)) {
    res.setHeader("x-middleware-cache", "hit");
    const [body, headers] = cache.get(key);
    Object.entries(headers).forEach(([key, value]) => {
      if (key !== "x-middleware-cache") res.setHeader(key, value);
    });
    return res.status(200).send(body);
  } else {
    res.setHeader("x-middleware-cache", "miss");
  }

  const originalEndFunc = res.end.bind(res);
  res.end = function (body) {
    if (body && res.statusCode === 200) {
      cache.set(key, [body, res.getHeaders()]);
      // console.log(
      //   `HEAP AFTER CACHING ${(
      //     process.memoryUsage().heapUsed /
      //     1024 /
      //     1024
      //   ).toFixed(1)}MB`
      // );
    }
    return originalEndFunc(body);
  };

  next();
});

export default router;

It's far from perfect and I only just coded this yesterday afternoon. My server runs a single Node process so the max heap memory would theoretically be 1,000 x the average size of those response bodies. If you're worried about bloating your memory, just adjust the QuickLRU to something smaller.

Let's talk about your keys

In my basic version, I chose this cache key:


const key = req.url;

but that means that http://localhost:3000/foo?a=1 is different from http://localhost:3000/foo?b=2 which might be a mistake if you're certain that no rendering ever depends on a query string.

But this is totally up to you! For example, suppose that you know your site depends on the darkmode cookie, you can do something like this:


const key = `${req.path} ${req.cookies['darkmode']==='dark'} ${rec.headers['accept-language']}`

Or,


const key = req.path.startsWith('/search') ? req.url : req.path

Purging

As soon as I launched this code, I watched the log files, and voila!:

::ffff:127.0.0.1 [18/Feb/2022:12:59:36 +0000] GET /about HTTP/1.1 200 - - 422.356 ms
::ffff:127.0.0.1 [18/Feb/2022:12:59:43 +0000] GET /about HTTP/1.1 200 - - 1.133 ms

Cool. It works. But the problem with a simple LRU cache is that it's sticky. And it's stored inside a running process's memory. How is the Express server middleware supposed to know that the content has changed and needs a cache purge? It doesn't. It can't know. The only one that knows is my Django server which accepts the various write operations that I know are reasons to purge the cache. For example, if I approve a blog post comment or an edit to the page, it triggers the following (simplified) Python code:


import requests

def cache_purge(url):
    if settings.PURGE_URL:
        print(requests.get(settings.PURGE_URL, json={
           pathnames: [url]
        }, headers={
           "Authorization": f"Bearer {settings.PURGE_SECRET}"
        })

    if settings.KEYCDN_API_KEY:
        api = keycdn.Api(settings.KEYCDN_API_KEY)
        print(api.delete(
            f"zones/purgeurl/{settings.KEYCDN_ZONE_ID}.json", 
            {"urls": [url]}
        ))    

Now, let's go back to the simplified middleware/render-caching.mjs and look at how we can purge from the LRU over HTTP POST:


const cache = new QuickLRU({ maxSize: 1000 })

router.get("/*", async function renderCaching(req, res, next) {
// ... Same as above
});


router.post("/__purge__", async function purgeCache(req, res, next) {
  const { body } = req;
  const { pathnames } = body;
  try {
    validatePathnames(pathnames)
  } catch (err) {
    return res.status(400).send(err.toString());
  }

  const bearer = req.headers.authorization;
  const token = bearer.replace("Bearer", "").trim();
  if (token !== PURGE_SECRET) {
    return res.status(403).send("Forbidden");
  }

  const purged = [];

  for (const pathname of pathnames) {
    for (const key of cache.keys()) {
      if (
        key === pathname ||
        (key.startsWith("/_next/data/") && key.includes(`${pathname}.json`))
      ) {
        cache.delete(key);
        purged.push(key);
      }
    }
  }
  res.json({ purged });
});

What's cool about that is that it can purge both the regular HTML URL and it can also purge those _next/data/ URLs. Because when NextJS can hijack the <a> click, it can just request the data in JSON form and use existing React components to re-render the page with the different data. So, in a sense, GET /_next/data/RzG7kh1I6ZEmOAPWpdA7g/en/plog/nextjs-faster-with-express-caching.json?oid=nextjs-faster-with-express-caching is the same as GET /plog/nextjs-faster-with-express-caching because of how NextJS works. But in terms of content, they're the same. But worth pointing out that the same piece of content can be represented in different URLs.

Another thing to point out is that this caching is specifically about individual pages. In my blog, for example, the homepage is a mix of the 10 latest entries. But I know this within my Django server so when a particular blog post has been updated, for some reason, I actually send out a bunch of different URLs to the purge where I know its content will be included. It's not perfect but it works pretty well.

Conclusion

The hardest part about caching is cache invalidation. It's usually the inner core of a crux. Sometimes, you're so desperate to survive a stampeding herd problem that you don't care about cache invalidation but as a compromise, you just set the caching time-to-live short.

But I think the most important tenant of good caching is: have full control over it. I.e. don't take it lightly. Build something where you can fully understand and change how it works exactly to your specific business needs.

This idea of letting Express cache responses in memory isn't new but I didn't find any decent third-party solution on NPMJS that I liked or felt fully comfortable with. And I needed to tailor exactly to my specific setup.

Go forth and try it out on your own site! Not all sites or apps need this at all, but if you do, I hope I have inspired a foundation of a solution.

How to use letsencrypt-acme-challenge.conf in Nginx

September 5, 2021
1 comment Nginx

Because I always forget, if you're using certbot to create certs for your Nginx server, you'll need to it up so it works on HTTP as well as HTTPS. But once you're done, you're going to want all HTTP traffic to redirect to HTTPS. The correct syntax is:


server {
    server_name mydomain.example.com;
    include /etc/nginx/snippets/letsencrypt-acme-challenge.conf;
    location / {
      return 301 https://mydomain.example.com$request_uri;
    }
}

And that letsencrypt-acme-challenge.conf looks like this (code comments stripped):


location ^~ /.well-known/acme-challenge/ {
    default_type "text/plain";
    root         /var/www/html;
    break;
}
location = /.well-known/acme-challenge/ {
    return 404;
}

This way, a GET request for http://mydomain.example.com/.well-known/acme-challenge/test.html will be 200 OK if there's a file called /var/www/html/.well-known/acme-challenge/test.html. And http://mydomain.example.com/.well-known/acme-challenge/does-not-exist.html will 404 Not Found.

But all and any other GET request will redirect. E.g. http://mydomain.example.com/whatever -- 301 Moved Permanently --> https://mydomain.example.com/whatever.

How I added brotli_static to nginx 1.17 in Ubuntu (Eoan Ermine) 19.10

April 9, 2020
0 comments Nginx, Linux

I knew I didn't want to download the sources to nginx to install it on my new Ubuntu 19.10 server because I'll never have the discipline to remember to keep it upgraded. No, I'd rather just run apt update && apt upgrade every now and then.

Why is this so hard?! All I need is the ability to set brotli_static on; in my Nginx config so it'll automatically pick the .br file if it exists on disk.

These instructions totally helped but here they are specifically for my version (all run as root):

git clone --recursive https://github.com/google/ngx_brotli.git

apt install brotli
apt-get build-dep nginx

# Note the version of which nginx you have installed
nginx -v
# ...which informs which URL to wget
wget https://nginx.org/download/nginx-1.17.9.tar.gz
aunpack nginx-1.17.9.tar.gz
nginx -V 2>&1 >/dev/null | grep -o " --.*" | grep -oP .+?(?=--add-dynamic-module)| head -1 > nginx-1.17.9/build_args.txt
cd nginx-1.17.9/
./configure --with-compat $(cat build_args.txt) --add-dynamic-module=../ngx_brotli
make install

cp objs/ngx_http_brotli_filter_module.so  /usr/lib/nginx/modules/
chmod 644 /usr/lib/nginx/modules/ngx_http_brotli_filter_module.so
cp objs/ngx_http_brotli_static_module.so /usr/lib/nginx/modules/
chmod 644 /usr/lib/nginx/modules/ngx_http_brotli_static_module.so

ls -l /etc/nginx/modules

Now I can edit my /etc/nginx/nginx.conf (somewhere near the top) to:

load_module /usr/lib/nginx/modules/ngx_http_brotli_filter_module.so;
load_module /usr/lib/nginx/modules/ngx_http_brotli_static_module.so;

And test that it works:

nginx -t

Update to speed comparison for Redis vs PostgreSQL storing blobs of JSON

September 30, 2019
2 comments Redis, Nginx, Web Performance, Python, Django, PostgreSQL

Last week, I blogged about "How much faster is Redis at storing a blob of JSON compared to PostgreSQL?". Judging from a lot of comments, people misinterpreted this. (By the way, Redis is persistent). It's no surprise that Redis is faster.

However, it's a fact that I have do have a lot of blobs stored and need to present them via the web API as fast as possible. It's rare that I want to do relational or batch operations on the data. But Redis isn't a slam dunk for simple retrieval because I don't know if I trust its integrity with the 3GB worth of data that I both don't want to lose and don't want to load all into RAM.

But is it entirely wrong to look at WHICH database to get the best speed?

Reviewing this corner of Song Search helped me rethink this. PostgreSQL is, in my view, a better database for storing stuff. Redis is faster for individual lookups. But you know what's even faster? Nginx

Nginx??

The way the application works is that a React web app is requesting the Amazon product data for the sake of presenting an appropriate affiliate link. This is done by the browser essentially doing:


const response = await fetch('https://songsear.ch/api/song/5246889/amazon');

Internally, in the app, what it does is that it looks this up, by ID, on the AmazonAffiliateLookup ORM model. Suppose it wasn't there in the PostgreSQL, it uses the Amazon Affiliate Product Details API, to look it up and when the results come in it stores a copy of this in PostgreSQL so we can re-use this URL without hitting rate limits on the Product Details API. Lastly, in a piece of Django view code, it carefully scrubs and repackages this result so that only the fields used by the React rendering code is shipped between the server and the browser. That "scrubbed" piece of data is actually much smaller. Partly because it limits the results to the first/best match and it deletes a bunch of things that are never needed such as ProductTypeName, Studio, TrackSequence etc. The proportion is roughly 23x. I.e. of the 3GB of JSON blobs stored in PostgreSQL only 130MB is ever transported from the server to the users.

Again, Nginx?

Nginx has a built in reverse HTTP proxy cache which is easy to set up but a bit hard to do purges on. The biggest flaw, in my view, is that it's hard to get a handle of how much RAM this it's eating up. Well, if the total possible amount of data within the server is 130MB, then that is something I'm perfectly comfortable to let Nginx handle cache in RAM.

Good HTTP performance benchmarking is hard to do but here's a teaser from my local laptop version of Nginx:

▶ hey -n 10000 -c 10 https://songsearch.local/api/song/1810960/affiliate/amazon-itunes

Summary:
  Total:    0.9882 secs
  Slowest:  0.0279 secs
  Fastest:  0.0001 secs
  Average:  0.0010 secs
  Requests/sec: 10119.8265


Response time histogram:
  0.000 [1] |
  0.003 [9752]  |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.006 [108]   |
  0.008 [70]    |
  0.011 [32]    |
  0.014 [8] |
  0.017 [12]    |
  0.020 [11]    |
  0.022 [1] |
  0.025 [4] |
  0.028 [1] |


Latency distribution:
  10% in 0.0003 secs
  25% in 0.0006 secs
  50% in 0.0008 secs
  75% in 0.0010 secs
  90% in 0.0013 secs
  95% in 0.0016 secs
  99% in 0.0068 secs

Details (average, fastest, slowest):
  DNS+dialup:   0.0000 secs, 0.0001 secs, 0.0279 secs
  DNS-lookup:   0.0000 secs, 0.0000 secs, 0.0026 secs
  req write:    0.0000 secs, 0.0000 secs, 0.0011 secs
  resp wait:    0.0008 secs, 0.0001 secs, 0.0206 secs
  resp read:    0.0001 secs, 0.0000 secs, 0.0013 secs

Status code distribution:
  [200] 10000 responses

10,000 requests across 10 clients at rougly 10,000 requests per second. That includes doing all the HTTP parsing, WSGI stuff, forming of a SQL or Redis query, the deserialization, the Django JSON HTTP response serialization etc. The cache TTL is controlled by simply setting a Cache-Control HTTP header with something like max-age=86400.

Now, repeated fetches for this are cached at the Nginx level and it means it doesn't even matter how slow/fast the database is. As long as it's not taking seconds, with a long Cache-Control, Nginx can hold on to this in RAM for days or until the whole server is restarted (which is rare).

Conclusion

If you the total amount of data that can and will be cached is controlled, putting it in a HTTP reverse proxy cache is probably order of magnitude faster than messing with chosing which database to use.

SongSearch autocomplete rate now 2+ per second

July 11, 2019
0 comments Django, Python, Nginx, Redis

By analyzing my Nginx logs, I've concluded that SongSearch's autocomplete JSON API now gets about 2.2 requests per second. I.e. these are XHR requests to /api/search/autocomplete?q=....

Roughly, 1.8 requests per second goes back to the Django/Elasticsearch backend. That's a hit ratio of 16%. These Django/Elasticsearch requests take roughly 200ms on average. I suspect about 150-180ms of that time is spent querying Elasticsearch, the rest being Python request/response and JSON "paperwork".

Autocomplete counts in Datadog

Caching strategy

Caching is hard because the queries are so vastly different over time. Had I put a Redis cache decorator on the autocomplete Django view function I'd quickly bloat Redis memory and cause lots of evictions.

What I used to do was something like this:


def search_autocomplete(request):
   q = request.GET.get('q') 

   cache_key = None
   if len(q) < 10:
      cache_key = 'autocomplete:' + q
      results = cache.get(cache_key)
      if results is not None:
          return http.JsonResponse(results)

   results = _do_elastisearch_query(q)
   if cache_key:
       cache.set(cache_key, results, 60 * 60)

   return http.JsonResponse(results)   

However, after some simple benchmarking it was clear that using Nginx' uwsgi_cache it was much faster to let the cacheable queries terminate already at Nginx. So I changed the code to something like this:


def search_autocomplete(request):
   q = request.GET.get('q') 
   results = _do_elastisearch_query(q)
   response = http.JsonResponse(results)   

   if len(q) < 10:
       patch_cache_control(response, public=True, max_age=60 * 60)

   return response

The only annoying thing about Nginx caching is that purging is hard unless you go for that Nginx Plus (or whatever their enterprise version is called). But more annoying, to me, is that fact that I can't really see what this means for my server. When I was caching with Redis I could just use redis-cli and...

> INFO
...
# Memory
used_memory:123904288
used_memory_human:118.16M
...

Nginx Amplify

My current best tool for keeping an eye on Nginx is Nginx Amplify. It gives me some basic insights about the state of things. Here are some recent screenshots:

NGINX Requests/s

NGINX Memory Usage

NGINX CPU Usage %

Thoughts and conclusion

Caching is hard. But it's also fun because it ties directly into performance work.

In my business logic, I chose that autocomplete queries that are between 1 and 9 characters are cacheable. And I picked a TTL of 60 minutes. At this point, I'm not sure exactly why I chose that logic but I remember doing some back-of-envelope calculations about what the hit ratio would be and roughly what that would mean in bytes in RAM. I definitely remember picking 60 minutes because I was nervous about bloating Nginx's memory usage. But as of today, I'm switching that up to 24 hours and let's see what that does to my current 16% Nginx cache hit ratio. At the moment, /var/cache/nginx-cache/ is only 34MB which isn't much.

Another crux with using uwsgi_cache (or proxy_cache) is that you can't control the cache key very well. When it was all in Python I was able to decide about the cache key myself. A plausible implementation is cache_key = q.lower().strip() for example. That means you can protect your Elasticsearch backend from having to do {"q": "A"} and {"q": "a"}. Who knows, perhaps there is a way to hack this in Nginx without compiling in some Lua engine.

The ideal would be some user-friendly diagnostics tool that I can point somewhere, towards Nginx, that says how much my uwsgi_cache is hurting or saving me. Autocomplete is just one of many things going on on this single DigitalOcean server. There's also a big PostgreSQL server, a node-express cluster, a bunch of uwsgi workers, Redis, lots of cron job scripts, and of course a big honking Elasticsearch 6.

UPDATE (July 12 2019)

Currently, and as mentioned above, I only set Cache-Control headers (which means Nginx snaps it up) for queries that at max 9 characters long. I wanted to appreciate and understand how ratio of all queries are longer than 9 characters so I wrote a report and its output is this:

POINT: 7
Sum show 75646 32.2%
Sum rest 159321 67.8%

POINT: 8
Sum show 83702 35.6%
Sum rest 151265 64.4%

POINT: 9
Sum show 90870 38.7%
Sum rest 144097 61.3%

POINT: 10
Sum show 98384 41.9%
Sum rest 136583 58.1%

POINT: 11
Sum show 106093 45.2%
Sum rest 128874 54.8%

POINT: 12
Sum show 113905 48.5%
Sum rest 121062 51.5%

It means that (independent of time expiry) 38.7% of queries are 9 characters or less.

How I simulate a CDN with Nginx

May 15, 2019
1 comment Python, Nginx

Usually, a CDN is just a cache you put in front of a dynamic website. You set up the CDN to be the first server your clients get data from, the CDN quickly decides if it was a copy cached or otherwise it asks the origin server for a fresh copy. So far so good, but if you really care about squeezing that extra performance out you need to worry about having a decent TTL and as soon as you make the TTL more than a couple of minutes you need to think about cache invalidation. You also need to worry about preventing certain endpoints from ever getting caught in the CDN which could be very bad.

For this site, www.peterbe.com, I'm using KeyCDN which I've blogged out here: "I think I might put my whole site behind a CDN" and here: "KeyCDN vs. DigitalOcean Nginx". KeyCDN has an API and a python client which I've contributed to.

The next problem is; how do you test all this stuff on your laptop? Unfortunately, you can't deploy a KeyCDN docker image or something like that, that attempts to mimic how it works for reals. So, to simulate a CDN locally on my laptop, I'm using Nginx. It's definitely pretty different but it's not the point. The point is that you want something that acts as a reverse proxy. You want to make sure that stuff that's supposed to be cached gets cached, stuff that's supposed to be purged gets purged and that things that are always supposed to be dynamic is always dynamic.

The Configuration

First I add peterbecom.local into /etc/hosts like this:

▶ cat /etc/hosts | grep peterbecom.local
127.0.0.1       peterbecom.local origin.peterbecom.local
::1             peterbecom.local origin.peterbecom.local

Next, I set up the Nginx config (running on port 80) and the configuration looks like this:

proxy_cache_path /tmp/nginxcache  levels=1:2    keys_zone=STATIC:10m
    inactive=24h  max_size=1g;

server {
    server_name peterbecom.local;
    location / {
        proxy_cache_bypass $http_secret_header;
        add_header X-Cache $upstream_cache_status;
        proxy_set_header x-forwarded-host $host;
        proxy_cache STATIC;
        # proxy_cache_key $uri;
        proxy_cache_valid 200  1h;
        proxy_pass http://origin.peterbecom.local;
    }
    access_log /tmp/peterbecom.access.log combined;
    error_log /tmp/peterbecom.error.log info;
}

By the way, I've also set up origin.peterbecom.local to be run in Nginx too but it could just be proxy_pass http://localhost:8000; to go straight to Django. Not relevant for this context.

The Purge

Without the commercial version of Nginx (Plus) you can't do easy purging just for purging sake. But with proxy_cache_bypass $http_secret_header; it's very similar to purging except that it immediately makes a request to the origin.

First, to test that it works, I start up Nginx and Django and now I can run:

▶ curl -v http://peterbecom.local/about > /dev/null
< HTTP/1.1 200 OK
< Server: nginx/1.15.10
< Cache-Control: public, max-age=3672
< X-Cache: MISS
...

(Note the X-Cache: MISS which comes from add_header X-Cache $upstream_cache_status;)

This should trigger a log line in /tmp/peterbecom.access.log and in the Django runserver foreground logs.

At this point, I can kill the Django server and run it again:

▶ curl -v http://peterbecom.local/about > /dev/null
< Server: nginx/1.15.10
< HTTP/1.1 200 OK
< Cache-Control: max-age=86400
< Cache-Control: public
< X-Cache: HIT
...

Cool! It's working without Django running. As expected. This is how to send a "purge request"

▶ curl -v -H "secret-header:true" http://peterbecom.local/about > /dev/null
> GET /about HTTP/1.1
> secret-header:true
>
< HTTP/1.1 502 Bad Gateway
...

Clearly, it's trying to go to the origin, which was killed, so you start that up again and you get back to:

▶ curl -v http://peterbecom.local/about > /dev/null
< HTTP/1.1 200 OK
< Server: nginx/1.15.10
< Cache-Control: public, max-age=3672
< X-Cache: MISS
...

In Python

In my site, there are Django signals that are triggered when a piece of content changes and I'm using python-keycdn-api in production but obviously, that won't work with Nginx. So I have a local setting and my Python code looks like this:



# This function gets called by a Django `post_save` signal
# among other things such as cron jobs and management commands.

def purge_cdn_urls(urls):
    if settings.USE_NGINX_BYPASS:
        # Note! This Nginx trick will not just purge the proxy_cache, it will
        # immediately trigger a refetch.
        x_cache_headers = []
        for url in urls:
            if "://" not in url:
                url = settings.NGINX_BYPASS_BASEURL + url
            r = requests.get(url, headers={"secret-header": "true"})
            r.raise_for_status()
            x_cache_headers.append({"url": url, "x-cache": r.headers.get("x-cache")})
        print("PURGED:", x_cache_headers)
        return 

    ...the stuff that uses keycdn...

Notes and Conclusion

One important feature is that my CDN is a CNAME for www.peterbe.com but it reaches the origin server on a different URL. When my Django code needs to know the outside facing domain, I need to respect that. The communication between by the CDN and my origin is a domain I don't want to expose. What KeyCDN does is that they send an x-forwarded-host header which I need to take into account when understanding what outward facing absolute URL was used. Here's how I do that:


def get_base_url(request):
    base_url = ["http"]
    if request.is_secure():
        base_url.append("s")
    base_url.append("://")
    x_forwarded_host = request.headers.get("X-Forwarded-Host")
    if x_forwarded_host and x_forwarded_host in settings.ALLOWED_HOSTS:
        base_url.append(x_forwarded_host)
    else:
        base_url.append(request.get_host())
    return "".join(base_url)

That's about it. There are lots of other details I glossed over but the point is that this works good enough to test that the cache invalidation works as expected.

I think I might put my whole site behind a CDN

April 23, 2019
0 comments Web development, Nginx, Web Performance

tl;dr; I'm going to put this blog behind KeyCDN and I expect a 2-4x performance boost (on Time To First Byte).

Right now, requests to my blog go straight to an Nginx server in DigitalOcean in NYC, USA. The Nginx server, 99% of the time, serves the blog posts (and static assets) as index.html files straight from disk. If the request is GET /plog/some-slug it will search for a file called /path/to/cached/files/plog/some-slug/index.html (or index.html.br or index.html.gz depending on the user agent's Accept-Encoding header). Only if the file doesn't exist on disk, it goes through to Django (via uWSGI built into Nginx). All of it is done with HTTP/2 and uses LetsEncrypt for SSL.

This has been working great but it's time to step it up. It's time to put the whole site behind a CDN. And I think I'm going to use KeyCDN for it.

In the past, it used to be best-practice that you serve your HTML document from your smart server (e.g. Django) and then, for the static assets, you put in a CDN. Like this:


<html>
  <link rel="stylesheet" href="https://myaccount123.cloudakamaifastlyflare.com/static/main.d910ef9a33.css">

...
<body>
  <img src="https://myaccount123.cloudakamaifastlyflare.com/images/hero.jpg">

...

But with HTTP/2, this becomes an anti-pattern for web performance because your client has already made an expensive HTTP/2 connection (and SSL negotiation) to https://yourcooldomain.com and now it's cheap to just download the rest. I used to do it like that too and I don't regret it. As a matter of fact, on https://songsear.ch is straight to Nginx but all its images are (lazy) loaded via songsearch-2916.kxcdn.com. But I think, when time allows, I'll put all of Song Search behind a CDN too.

Basically, it's time to put the whole site behind a CDN. With smart purging techniques and smarter CDNs respecting your dynamic content cache control headers, it's time to share the load. ...all over the world.

CDN Choices

There are many sites that want to compare CDNs. But many are affiliated or even made by one of them. So it's hard to get comparisons. For example, KeyCDN demonstrates they're the cheapest by comparing themselves with 5 others that they picked. (But mind you, that seems reasonably backed up by this comparison on cdn.reviews).

CDNPerf does a decent job with cool graphs and stuff. Incidentally, they rank my current favorite (KeyCDN) as the slowest compared to the well known giants that I compared it to.

CDNPerf graph

But mind you, the perf difference between KeyCDN and the winner (topmost in the graph as of today) is 36ms vs 47ms which are both fantastic numbers.

CDNPerf list

It's hard to compare CDNs because they're all pretty fast, and actually, they're all reasonably cheap. What really matters is the features and that's a lot harder to compare. CloudFlare often comes up as a CDN provider with stellar features that impress me. I've never actually used them but at least they mention "Fast cache purge" and "API programmability" are their key features. But they also don't mention Brotli caching which I know is a feature KeyCDN supports.

KeyCDN has been great to me in the past when I've used it to CDN host static assets. I'm familiar with their interface and they recently launched an API to do things like purge-by-tag and purge-by-URL. They're cheap, which matters because in this context it's all side-project stuff I want to put behind a CDN. They have a Python library which, although very rough around the edges, it works. And also very important; I've communicated very successfully with them through their support and they've been responsive and helpful. So I'll go with KeyCDN.

The Opportunity

Before I move my domain www.peterbe.com to become a CNAME for one of their CDN domains, I wanted to experiment a little and see how it works and what performance numbers I get for comparison. So I set up beta.peterbe.com and did some Django and Nginx wiring so it would work the same but with the difference that it goes through a CDN for everything.

Then I picked a random page and set up a Hyperping monitor from all of its available regions and let it brew for a while. Unfortunately, Hyperping doesn't let you compare two monitors side-by-side so you're going to have to use your own eyes to compare the graphs:

www means no CDN, just the origin Nginx
NOT behind a CDN (server is New York, USA)

beta means with a CDN in front
Behind a CDN

The "total Response Time" in Hyperping doesn't really make sense. They're an average across all regions it pings from. If you live in, for example, Germany; the only response time that matters to you is 1,215 ms versus 40 ms. Equally, if you live somewhere in New York, the only response time that matters to you is 20 ms versus 64 ms.

I actually ran another benchmark. I used Python like this:


t0 = time.time()
r = requests.get('https://www.peterbe.com/plog/some-slug')
t1 = time.time()
print("Took", t1 - t0)

I did this from South Carolina which means my nearest KeyCDN edge location could be Atlanta, Miami, or New York. Either way, I'm reasonably near New York (compared to the rest of the world) so it'd be a fair performance comparison for all US east coast traffic. (Insert disclaimer here). It downloads the most recent blog posts, in repeated cycles, which gives the CDN a solid chance to warm up and then it compares the median of the last 100 downloads. The output of this is as follows:

beta
    COUNT               1854 (but only using the last 100)
    HIT RATIO           100.0%
    AVERAGE (all)       63.12ms
    MEDIAN (all)        61.89ms

www
    COUNT               1856 (but only using the last 100)
    HIT RATIO           100.0%
    AVERAGE             136.22ms
    MEDIAN              135.61ms

("HIT RATIO" for the non-CDN URL means it was served entirely without Djando server rendering)

What it means is that the median, with a CDN is: 62ms and 135.6ms without. That's a 2x boost.

The crawler stats script is available here: github.com/peterbe/peterbecom-cdn-crawl and I would be thrilled if you can clone it and run it and report what numbers you get and where you're running it from.

Notes and Conclusion

Mind you, 62ms vs. 136ms might sound like a silly difference if Webpagetest says it takes 700ms until the page is interactive (on an LTE connection). And this is a tiny super-optimized page. But never forget A) we can't all live in the US east-coast area and B) if the HTML can download marginally faster it allows the browser to parse it sooner and start downloading all the other stuff much sooner. It'll make a big difference! I'm sure you've all seen graphs like this:

Cold-cache MDN page on 4G
Imagine if all those static asset downloads could have started a whole second "to the left"

Of course a CDN is faster. It's no news. But it's also a hassle and it costs money. It's 2019 and most good CDNs now support Brotli, fast purge-by-url, and HTTP/2. It's time to make the switch! It's not like cache-invalidation is hard.

UPDATE April 23 2019 (same day)

KeyCDN has a neat looking tool that is similar to Hyperping but more of a one-off kinda deal. It's called Performance Test and I wouldn't be surprised it's biased as heck because they probably run these pings from the same location'ish as where they have the edge locations. Anyway, the results are nevertheless juicy. Note the last, TTFB column numbers.

Performance Test without CDN
Performance Test without CDN

Performance Test with CDN
Performance Test with CDN

KeyCDN vs. DigitalOcean Nginx

April 12, 2019
0 comments Web development, Nginx, Web Performance

tl;dr; The global average response time of serving an image from my NYC DigitalOcean server compared to a CDN is almost 10x.

KeyCDN is a CDN service that I use for side-projects. It's great. It has about ~35 edge locations. I don't know much about how their web servers work but I can't imagine it's much different from the origin server. In principle.

The origin server is my DigitalOcean (6 vCPU, 16 GB RAM, Ubuntu 14) droplet. It's running an up-to-date CloudFlare build of Nginx and the static images are served straight from (SSD) disk with a 4 weeks TTL (max-age=2419200,public,immutable). The SSL is done with LetsEncrypt and I'm somewhat confident the Nginx is decently configured and uses HTTP/2.

So the CDN, on songsearch-2916.kxcdn.com, is basically configured to front any requests to songsear.ch. If the origin has cache-control headers, KeyCDN knows it can hold on to it for a while, but it's not a guarantee that it will for the full time specified in the cache-control. Either way; how does it compare?

The Experiment

I picked a random static asset URL. It's a 32 KB JPEG file. Its origin URL and its CDN URL are:

  1. https://songsear.ch/static/albums/2017/05/24/08/170630_300x300.jpg
  2. https://songsearch-2916.kxcdn.com/static/albums/2017/05/24/08/170630_300x300.jpg

Next, I set up a Hyperping monitor on both URLs as GET requests. For the regions (regions from where Hyperping will do pings from), I picked the following:

  1. San Francisco, USA
  2. New York, USA
  3. London, United Kindom
  4. Frankfurt, Germany
  5. Mumbai, India
  6. São Paulo, Brazil
  7. Sydney, Australia

(I wish I had selected all 12 possible regions when I started but now it's too late for lazy me)

Then, I let Hyperping GET these URLs for a while and behold, here are the numbers:

The Results

Average response time:

  • The CDN: 79 ms
  • The origin: 714 ms

That's a 10x difference!

Mind you, the "average response time" is across all regions. It doesn't reflect what people get. If 90% of your visitors are from Australia, the average response times would, of course, be very different. But as an example, the origin server is in New York and there, the average response time is 26 ms vs. 105 ms which is a 5x difference.

Here are some screenshots from Hyperping:

KeyCDN results
KeyCDN

Origin server
Origin server

Conclusion

KeyCDN's server is clearly fast and worth doing. It's unsurprising that it performs better far away from New York but it's surprising how much faster it is at serving than the origin when pinged from New York (5x difference).

The site is still NOT fronted by a CDN because, apart from the images, almost all content is un-cacheable. However, I need to do more research and experimentation with putting everything behind a CDN and being meticulous with setting no-cache headers on dynamic stuff and using async tools to invalidate CDN caches when appropriate.

Experimenting with Nginx worker_processes

February 14, 2019
0 comments Web development, Nginx, macOS, Linux

I have Nginx 1.15.8 installed with Homebrew on my macOS. By default the /usr/local/etc/nginx/nginx.conf it set to...:

worker_processes  1;

But, from the documentation, it says:

"The optimal value depends on many factors including (but not limited to) the number of CPU cores, the number of hard disk drives that store data, and load pattern. When one is in doubt, setting it to the number of available CPU cores would be a good start (the value “auto” will try to autodetect it)." (bold emphasis mine)

What is the ideal number for me? The performance of Nginx on my laptop doesn't really matter. But for my side-projects it's important to have a fast Nginx since it serves static HTML and lots of static assets. However, on my personal servers I have a bunch of other resource hungry stuff going on that I know is more likely to need the resources, like Elasticsearch and uwsgi.

To figure this out, I wrote a benchmark program that requested a small index.html about 10,000 times across 10 concurrent clients with hey.

hey -n 10000 -c 10 http://peterbecom.local/plog/variable_cache_control/awspa

I ran this 10 times between changing the worker_processes in the nginx.conf file. Here's the output:

1 WORKER PROCESSES
BEST  : 13,607.24 reqs/s

2 WORKER PROCESSES
BEST  : 17,422.76 reqs/s

3 WORKER PROCESSES
BEST  : 18,886.60 reqs/s

4 WORKER PROCESSES
BEST  : 19,417.35 reqs/s

5 WORKER PROCESSES
BEST  : 19,094.18 reqs/s

6 WORKER PROCESSES
BEST  : 19,855.32 reqs/s

7 WORKER PROCESSES
BEST  : 19,824.86 reqs/s

8 WORKER PROCESSES
BEST  : 20,118.25 reqs/s

Or, as a graph:

Graph

Now note, this is done here on my MacBook Pro. Not on my Ubuntu DigitalOcean servers. For now, I just want to get a feeling for how these numbers correlate.

Conclusion

The benchmark isn't good enough. The numbers are pretty stable but I'm doing this on my laptop with multiple browsers idling, Slack, and Spotify running. Clearly, the throughput goes up a bit when you allocate more workers but if anything can be learned from this, start with going beyond 1 for a quick fix and from there start poking and more exhaustive benchmarks. And don't forget, if you have time to go deeper on this, to look at the combination of worker_connections and worker_processes.

Be very careful with your add_header in Nginx! You might make your site insecure

February 11, 2018
17 comments Linux, Web development, Nginx

tl;dr; When you use add_header in a location block in Nginx, it undoes all "parent" add_header directives. Dangerous!

Gist of the problem is this:

There could be several add_header directives. These directives are inherited from the previous level if and only if there are no add_header directives defined on the current level.

From the documentation on add_header

The grand but subtle mistake

Basically, I had this:

server {
    server_name example.com;

    ...gzip...
    ...ssl...
    ...root...

    # Great security headers...
    add_header X-Frame-Options SAMEORIGIN;
    add_header X-XSS-Protection "1; mode=block";
    ...more security headers...

    location / {
        try_files    $uri /index.html;
    }
}

And when you curl it, you can see that it works:

$ curl -I https://example.com
[snip]
X-Frame-Options: SAMEORIGIN
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Strict-Transport-Security: max-age=63072000; includeSubdomains; preload

The mistake I had, was that I added a new add_header inside a relevant location block. If you do that, all the other "global" add_headers are dropped.
E.g.

server {
    server_name example.com;

    ...gzip...
    ...ssl...
    ...root...

    # Great security headers...
    add_header X-Frame-Options SAMEORIGIN;
    add_header X-XSS-Protection "1; mode=block";
    ...more security headers...

    location / {
        try_files    $uri /index.html;
        # NOTE! Adding some more headers here
+       add_header X-debug-whats-going-on on; 
    }
}

Now, same curl command:

$ curl -I https://example.com
[snip]
X-debug-whats-going-on: on

Bad score on Observatory for www.peterbe.com
Yikes! Now those other useful security headers are gone!

Here are your options:

  1. Don't add headers like that inside location blocks. Yeah, that's not always a choice.
  2. Copy-n-paste all the general security add_header blocks into the location blocks where you have to have "custom" add_header entries.
  3. Use an include file, see below.

How to include files

First create a new file, like /etc/nginx/snippets/general-security-headers.conf then put this into it:

# Great security headers...
add_header X-Frame-Options SAMEORIGIN;
add_header X-XSS-Protection "1; mode=block";
...more security headers...
# More realistically, see https://gist.github.com/plentz/6737338

Now, instead of saying these add_header lines in your /etc/nginx/sites-enabled/example.conf change that to:

server {
    server_name example.com;

    ...gzip...
    ...ssl...
    ...root...

    include /etc/nginx/snippets/general-security-headers.conf;

    location / {
        try_files    $uri /index.html;
        # Note! This gets included *again* because
        # this location block needs its own custom add_header
        # directives.
        include /etc/nginx/snippets/general-security-headers.conf;
        # NOTE! Adding some more headers here
        add_header X-debug-whats-going-on on; 
    }
}

(You need to use your imagination that a real Nginx config site probably has many different more complex location directives)

It's arguably a bit clunky but it works and it's the best of both worlds. The right security headers for all locations and ability to set custom add_header directives for specific locations.

Discussion

I'm most disappointed in myself for not noticing. Not for not noticing this in the Nginx documentation, but that I didn't check my security headers on more than one path. But I'm also quite disappointed in Nginx for this rather odd behaviour. To quote my security engineer at Mozilla, April King:

"add" doesn't usually mean "subtract everything else"

She agreed with me that the way it works is counter-intuitive and showed me this snippet which uses include files the same way.

Previous page
Next page