Filtered by JavaScript, Python

Page 15

Reset

The ideal number of workers in Jest

October 8, 2018
0 comments Python, React

tl;dr; Use --runInBand when running jest in CI and use --maxWorkers=3 on your laptop.

Running out of memory on CircleCI

We have a test suite that covers 236 tests across 68 suites and runs mainly a bunch of enzyme rendering of React component but also some plain old JavaScript function tests. We hit a problem where tests utterly failed in CircleCI due to running out of memory. Several individual tests, before it gave up or failed, reported to take up to 45 seconds.
Turns out, jest tried to use 36 workers because the Docker OS it was running was reporting 36 CPUs.


> circleci@9e4c489cf76b:~/repo$ node
> var os = require('os')
undefined
> os.cpus().length
36

After forcibly setting --maxWorkers=2 to the jest command, the tests passed and it took 20 seconds. Yay!

But that got me thinking, what is the ideal number of workers when I'm running the suite here on my laptop? To find out, I wrote a Python script that would wrap the call CI=true yarn run test --maxWorkers=%(WORKERS) repeatedly and report which number is ideal for my laptop.

After leaving it running for a while it spits out this result:

SORTED BY BEST TIME:
3 8.47s
4 8.59s
6 9.12s
5 9.18s
2 9.51s
7 10.14s
8 10.59s
1 13.80s

The conclusion is vague. There is some benefit to using some small number greater than 1. If you attempt a bigger number it might backfire and take longer than necessary and if you do do that your laptop is likely to crawl and cough.

Notes and conclusions

Inline scripts in create-react-app 2.0 and CSP hashes

October 5, 2018
0 comments Web development, JavaScript, React

UPDATE (1)

My understanding of how to generate the CSP nonces was wrong. What I initially posted was a confusion between nonces and hashes. Sorry. The blog post has been updated to use hashing.

UPDATE (2)

Shortly after publishing this I changed my mind entirely. I decided I don't want any inline scripts no matter how small. Reasons are: 1) with HTTP2 it's cheap to send another file and thus that critical precious first HTML document becomes smaller and 2) when you load it as an external you have the power to load it async if it's applicable.

Check out this new script, it's hackish but works: uninline_scripts.js

UPDATE (Oct 18, 2018)

If you use INLINE_RUNTIME_CHUNK=false yarn run build no scripts, independent of size, are inlined. See this pull request for details.

END UPDATES

I have an app that is hosted on github-pages and because I can't control Content Security Policy HTTP headers I have to do it with a <meta http-equiv="Content-Security-Policy" content="${csp}"> tag in the HTML. That's working fine and the way I do it is that I have a script that looks like this:


#!/usr/bin/env node
const fs = require("fs");
const crypto = require("crypto");

const CSP_TEMPLATE = `
default-src 'none';
connect-src 'self' kinto.workon.app peterbecom.auth0.com;
frame-src peterbecom.auth0.com;
img-src 'self' avatars2.githubusercontent.com https://*.googleusercontent.com;
script-src 'self'%SCRIPT_HASHES%;
style-src 'self' 'unsafe-inline';
font-src 'self' data:;
manifest-src 'self'
`.trim();

const htmlFile = process.argv[2];
if (!htmlFile) throw new Error("missing file argument");
let html = fs.readFileSync(htmlFile, "utf8");

let hashes = "";
let csp = CSP_TEMPLATE;
const matches = html.match(/<script>.*<\/script>/g);
if (matches) {
  matches.forEach(scriptTag => {
    const hash = crypto.createHash("sha256");
    hash.update(scriptTag.replace(/<script>/, "").replace("</script>", ""));
    const digest = hash.digest("hex");
    hashes += ` 'sha256-${digest.toString("base64")}'`;
  });
}
csp = csp.replace(/%SCRIPT_HASHES%/, hashes);

const metatag = `
  <meta http-equiv="Content-Security-Policy" content="${csp}">
`
  .replace(/\n/g, "")
  .trim();
if (html.search(metatag) > -1)
  throw new Error("already has CSP metatag in HTML");
const anchor = '<meta charset="utf-8">';
const newHtml = html.replace(anchor, `${anchor}${metatag}`);
fs.writeFileSync(htmlFile, newHtml, "utf8");

Laugh all you like at my hurried node scripting but it works. It finds any <script>ANYTHING</script> tags (which means it disregards any <script src="... tags), calculates a sha256 hash string out of it and then puts that into the CSP block.

The output becomes something like this:


<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <meta 
      http-equiv="Content-Security-Policy" 
      content="default-src 'none';script-src 'self' 'sha256-bb84aa7f904e73495b9e99f08531053f3a86f3c1b2e232e3abbac252bf723f1f';">
  </head>
  <body>
    ...
    <script>....</script>
  </body>
</html>

I don't know if I've done it right but at least what didn't use to work now works; the page loads in my browsers now.

An awesome snippet to web performance test a page programmatically

October 1, 2018
0 comments Web development, JavaScript, Web Performance

I found this in an issue discussing measuring page performance with puppeteer and it's pure gold. Especially because it's so accessible and easy to use.

Here's the code:


const puppeteer = require('puppeteer');

async function run() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  await page.goto('https://www.peterbe.com/');

  console.log('\n==== performance.getEntries() ====\n');
  console.log(
    await page.evaluate(() =>
      JSON.stringify(performance.getEntries(), null, '  ')
    )
  );

  console.log('\n==== performance.toJSON() ====\n');
  console.log(
    await page.evaluate(() => JSON.stringify(performance.toJSON(), null, '  '))
  );

  console.log('\n==== page.metrics() ====\n');
  const perf = await page.metrics();
  console.log(JSON.stringify(perf, null, '  '));

  browser.close();
}

run();

Network waterfall Google Chrome

When you run it you get this output: https://gist.github.com/peterbe/afb09bf9277e5fa9242f8d270c687640
To run it you need to have a decently up-to-date version of puppeteer installed.

I don't claim (far from it actually!) to understand all the metrics points in there but I believe this is basically what the Network panel in the Google Chrome Dev tools is built upon. But some details and facts are easy to figure out and use in your analysis. For example, the fact that the getEntries() lists all the resources that had to be downloaded in the order they were downloaded. Also, at the end of getEntries() you get the first-paint which is often a useful metric.

Anyway, give it a spin. Wrap this up in a platform and see if you can build something really simple and really tailored to your web projects web performance testing.

Merge two arrays without duplicates in JavaScript

September 20, 2018
0 comments JavaScript

Here's how you do it if you don't care about the order:


const array1 = [1, 2, 3];
const array2 = [2, 3, 4];
console.log([...new Set([...array1, ...array2])]);
// prints [1, 2, 3, 4]

It merges two arrays first. Then it creates a set out of that merged array and lastly convers the set back out to an array.

I searched for a solution and all I found was dated or wrong. This oneliner works and I'm using it to make it possible to add a list of product versions to another list and I don't want to mutate existing arrays because of React state stuff.

If you want to see the ES5 version, check out this Babel repl.

A darn good search filter function in JavaScript

September 12, 2018
8 comments Web development, JavaScript

Demo here. The demo uses React and a list of blog post titles that get immediately filtered when you type in a search. I.e. you have the whole list but show less when a search term is entered.

That the demo uses React isn't important. What's important is the search function. It looks like this:


function filterList(q, list) {
  function escapeRegExp(s) {
    return s.replace(/[-/\\^$*+?.()|[\]{}]/g, "\\$&");
  }
  const words = q
    .split(/\s+/g)
    .map(s => s.trim())
    .filter(s => !!s);
  const hasTrailingSpace = q.endsWith(" ");
  const searchRegex = new RegExp(
    words
      .map((word, i) => {
        if (i + 1 === words.length && !hasTrailingSpace) {
          // The last word - ok with the word being "startswith"-like
          return `(?=.*\\b${escapeRegExp(word)})`;
        } else {
          // Not the last word - expect the whole word exactly
          return `(?=.*\\b${escapeRegExp(word)}\\b)`;
        }
      })
      .join("") + ".+",
    "gi"
  );
  return list.filter(item => {
    return searchRegex.test(item.title);
  });
}

In action
I use this in a single-page content management app. There's a list of records and a search input. Every character you put into the search bar updates the list of records shown.

What it does is that it allows you to search texts based on multiple whole words. But the key feature is that the last word doesn't have to be whole. For example, it will positively match "This is a blog post about JavaScript" if the search is "post javascript" or "post javasc". But it won't match on "pos blog".

The idea is that if a user has typed in a full word followed by a space, all previous words needs to be matched fully. For example if the input is "java " it won't match on "This is a blog post about JavaScript" because the word java, alone, isn't in the search text.

Sure, there are different ways to write this but I think this functionality is good for this kind of filtering search. A different implementation would have a function that returns the regex and then it can be used both for filtering and for highlighting.

Hope it helps.

Replace an item in an array, by number, without mutation in JavaScript (ES6)

August 23, 2018
1 comment JavaScript

Suppose you have an array like this:


const items = ["B", "M", "X"];

And now you want to replace that second item ("J" instead of "M") and suppose that you already know its position as opposed to finding its position by doing an Array.prototype.find.

Here's how you do it:


const index = 1;
const replacementItem = "J";

const newArray = Object.assign([], items, {[index]: replacementItem});

console.log(items); // ["B", "M", "X"]
console.log(newArray); //  ["B", "J", "X"]

Wasn't immediately obvious to me but writing it down will help me remember.

UPDATE

There's a much faster way and that's to use slice and it actually looks nicer too:


function replaceAt(array, index, value) {
  const ret = array.slice(0);
  ret[index] = value;
  return ret;
}
const newArray = replaceAt(items, index, "J");

See this codepen.

UPDATE (Feb 2019)

Here's a more powerful solution that uses Immer. It looks like this:


const items = ["B", "M", "X"];
const index = 1;
const replacementItem = "J";

const newArray = immer.produce(items, draft => {
  draft[index] = "J";
});

console.log(items); // ["B", "M", "X"]
console.log(newArray); //  ["B", "J", "X"]

Codepen

See this codepen.

It's more "powerful", because, if the original array (that you don't want to mutate) contains items that are mutable, you don't want to actually mutate them. This codepen demonstrates that subtlety. And this codepen demonstrates how to solve that with Immer.

django-pipeline and Zopfli

August 15, 2018
0 comments Python, Web development, Django

tl;dr; I wrote my own extension to django-pipeline that uses Zopfli to create .gz files from static assets collected in Django. Here's the code.

Nginx and Gzip

What I wanted was to continue to use django-pipeline which does a great job of reading a settings.BUNDLES setting and generating things like /static/js/myapp.min.a206ec6bd8c7.js. It has configurable options to not just make those files but also generate /static/js/myapp.min.a206ec6bd8c7.js.gz which means that with gzip_static in Nginx, Nginx doesn't have to Gzip compress static files on-the-fly but can basically just read it from disk. Nginx doesn't care how the file got there but an immediate advantage of preparing the file on disk is that the compression can be higher (smaller .gz files). That means smaller responses to be sent to the client and less CPU work needed from Nginx. Your job is to set gzip_static on; in your Nginx config (per location) and make sure every compressable file exists on disk with the same name but with the .gz suffix.

In other words, when the client does GET https://example.com/static/foo.js Nginx quickly does a read on the file system to see if there exists a ROOT/static/foo.js.gz and if so, return that. If the files doesn't exist, and you have gzip on; in your config, Nginx will read the ROOT/static/foo.js into memory, compress it (usually with a lower compression level) and return that. Nginx takes care of figuring out whether to do this, at all, dynamically by reading the Accept-Encoding header from the request.

Zopfli

The best solution today to generate these .gz files is Zopfli. Zopfli is slower than good old regular gzip but the files get smaller. To manually compress a file you can install the zopfli executable (e.g. brew install zopfli or apt install zopfli) and then run zopfli $ROOT/static/foo.js which creates a $ROOT/static/foo.js.gz file.

So your task is to build some pipelining code that generates .gz version of every static file your Django server creates.
At first I tried django-static-compress which has an extension to regular Django staticfiles storage. The default staticfiles storage is django.contrib.staticfiles.storage.StaticFilesStorage and that's what django-static-compress extends.

But I wanted more. I wanted all the good bits from django-pipeline (minification, hashes in filenames, concatenation, etc.) Also, in django-static-compress you can't control the parameters to zopfli such as the number of iterations. And with django-static-compress you have to install Brotli which I can't use because I don't want to compile my own Nginx.

Solution

So I wrote my own little mashup. I took some ideas from how django-pipeline does regular gzip compression as a post-process step. And in my case, I never want to bother with any of the other files that are put into the settings.STATIC_ROOT directory from the collectstatic command.

Here's my implementation: peterbecom.storage.ZopfliPipelineCachedStorage. Check it out. It's very tailored to my personal preferences and usecase but it works great. To use it, I have this in my settings.py: STATICFILES_STORAGE = "peterbecom.storage.ZopfliPipelineCachedStorage"

I know what you're thinking

Why not try to get this into django-pipeline or into django-compress-static. The answer is frankly laziness. Hopefully someone else can pick up this task. I have fewer and fewer projects where I use Django to handle static files. These days most of my projects are single-page-apps that are 100% static and using Django for XHR requests to get the data.

Django lock decorator with django-redis

August 14, 2018
4 comments Python, Web development, Django, Redis

Here's the code. It's quick-n-dirty but it works wonderfully:


import functools
import hashlib

from django.core.cache import cache
from django.utils.encoding import force_bytes


def lock_decorator(key_maker=None):
    """
    When you want to lock a function from more than 1 call at a time.
    """

    def decorator(func):
        @functools.wraps(func)
        def inner(*args, **kwargs):
            if key_maker:
                key = key_maker(*args, **kwargs)
            else:
                key = str(args) + str(kwargs)
            lock_key = hashlib.md5(force_bytes(key)).hexdigest()
            with cache.lock(lock_key):
                return func(*args, **kwargs)

        return inner

    return decorator

How To Use It

This has saved my bacon more than once. I use it on functions that really need to be made synchronous. For example, suppose you have a function like this:


def fetch_remote_thing(name):
    try:
        return Thing.objects.get(name=name).result
    except Thing.DoesNotExist:
        # Need to go out and fetch this
        result = some_internet_fetching(name)  # Assume this is sloooow
        Thing.objects.create(name=name, result=result)
        return result

That function is quite dangerous because if executed by two concurrent web requests for example, they will trigger
two "identical" calls to some_internet_fetching and if the database didn't have the name already, it will most likely trigger two calls to Thing.objects.create(name=name, ...) which could lead to integrity errors or if it doesn't the whole function breaks down because it assumes that there is only 1 or 0 of these Thing records.

Easy to solve, just add the lock_decorator:


@lock_decorator()
def fetch_remote_thing(name):
    try:
        return Thing.objects.get(name=name).result
    except Thing.DoesNotExist:
        # Need to go out and fetch this
        result = some_internet_fetching(name)  # Assume this is sloooow
        Thing.objects.create(name=name, result=result)
        return result

Now, thanks to Redis distributed locks, the function is always allowed to finish before it starts another one. All the hairy locking (in particular, the waiting) is implemented deep down in Redis which is rock solid.

Bonus Usage

Another use that has also saved my bacon is functions that aren't necessarily called with the same input argument but each call is so resource intensive that you want to make sure it only does one of these at a time. Suppose you have a Django view function that does some resource intensive work and you want to stagger the calls so that it only runs it one at a time. Like this for example:


def api_stats_calculations(request, part):
    if part == 'users-per-month':
        data = _calculate_users_per_month()  # expensive
    elif part == 'pageviews-per-week':
        data = _calculate_pageviews_per_week()  # intensive
    elif part == 'downloads-per-day':
        data = _calculate_download_per_day()  # slow
    elif you == 'get' and the == 'idea':
        ...

    return http.JsonResponse({'data': data})

If you just put @lock_decorator() on this Django view function, and you have some (almost) concurrent calls to this function, for example from a uWSGI server running with threads and multiple processes, then it will not synchronize the calls.

The solution to this is to write your own function for generating the lock key, like this for example:


@lock_decorator(
    key_maker=lamnbda request, part: 'api_stats_calculations'
)
def api_stats_calculations(request, part):
    if part == 'users-per-month':
        data = _calculate_users_per_month()  # expensive
    elif part == 'pageviews-per-week':
        data = _calculate_pageviews_per_week()  # intensive
    elif part == 'downloads-per-day':
        data = _calculate_download_per_day()  # slow
    elif you == 'get' and the == 'idea':
        ...

    return http.JsonResponse({'data': data})

Now it works.

How Time-Expensive Is It?

Perhaps you worry that 99% of your calls to the function don't have the problem of calling the function concurrently. How much is this overhead of this lock costing you? I wondered that too and set up a simple stress test where I wrote a really simple Django view function. It looked something like this:


@lock_decorator(key_maker=lambda request: 'samekey')
def sample_view_function(request):
    return http.HttpResponse('Ok\n')

I started a Django server with uWSGI with multiple processors and threads enabled. Then I bombarded this function with a simple concurrent stress test and observed the requests per minute. The cost was extremely tiny and almost negligable (compared to not using the lock decorator). Granted, in this test I used Redis on redis://localhost:6379/0 but generally the conclusion was that the call is extremely fast and not something to worry too much about. But your mileage may vary so do your own experiments for your context.

What's Needed

You need to use django-redis as your Django cache backend. I've blogged before about using django-redis, for example Fastest cache backend possible for Django and Fastest Redis configuration for Django.

django-html-validator now supports Django 2.x

August 13, 2018
0 comments Python, Web development, Django

django-html-validator is a Django project that can validate your generated HTML. It does so by sending the HTML to https://html5.validator.nu/ or you can start your own Java server locally with vnu.jar from here. The output is that you can have validation errors printed to stdout or you can have them put as .txt files in a temporary directory. You can also include it in your test suite and make it so that tests fail if invalid HTML is generated during rendering in Django unit tests.

The project seems to have become a lot more popular than I thought it would. It started as a one-evening-hack and because there was interest I wrapped it up in a proper project with "docs" and set up CI for future contributions.

I kinda of forgot the project since almost all my current projects generate JSON on the server and generates the DOM on-the-fly with client-side JavaScript but apparently a lot of issues and PRs were filed related to making it work in Django 2.x. So I took the time last night to tidy up the tox.ini etc. and the necessary compatibility fixes to make it work with but Django 1.8 up to Django 2.1. Pull request here.

Thank you all who contributed! I'll try to make a better job noticing filed issues in the future.

Quick dog-piling (aka stampeding herd) URL stresstest

August 10, 2018
0 comments Python

Whenever you want to quickly bombard a URL with some concurrent traffic, you can use this:


import random
import time
import requests
import concurrent.futures


def _get_size(url):
    sleep = random.random() / 10
    # print("sleep", sleep)
    time.sleep(sleep)
    r = requests.get(url)
    # print(r.status_code)
    assert len(r.text)
    return len(r.text)


def run(url, times=10):
    sizes = []
    futures = []
    with concurrent.futures.ThreadPoolExecutor() as executor:
        for _ in range(times):
            futures.append(executor.submit(_get_size, url))
        for future in concurrent.futures.as_completed(futures):
            sizes.append(future.result())
    return sizes


if __name__ == "__main__":
    import sys

    print(run(sys.argv[1]))

It's really basic but it works wonderfully. It starts 10 concurrent threads that all hit the same URL at almost the same time.
I've been using this stress test a local Django server to test some atomicity writes with the file system.