Filtered by JavaScript, Python

Page 13

Reset

Displaying fetch() errors and unwanted responses in React

February 6, 2019
0 comments Web development, React, JavaScript

tl;dr; You can use error instanceof window.Response to distinguish between fetch exceptions and fetch responses.

When you do something like...


const response = await fetch(URL);

...two bad things can happen.

  1. The XHR request fails entirely. I.e. there's not even a response with a HTTP status code.
  2. The response "worked" but the HTTP status code was not to your liking.

Either way, your React app needs to deal with this. Ideally in a not-too-clunky way. So here is one take on this challenge/opportunity which I hope can inspire you to extend it the way you need it to go.

The trick is to "clump" exceptions with responses. Then you can do this:


function ShowServerError({ error }) {
  if (!error) {
    return null;
  }
  return (
    <div className="alert">
      <h3>Server Error</h3>
      {error instanceof window.Response ? (
        <p>
          <b>{error.status}</b> on <b>{error.url}</b>
          <br />
          <small>{error.statusText}</small>
        </p>
      ) : (
        <p>
          <code>{error.toString()}</code>
        </p>
      )}
    </div>
  );
}

The greatest trick the devil ever pulled was to use if (error instanceof window.Reponse) {. Then you know that error thing is the outcome of THIS = await fetch(URL) (or fetch(URL).then(THIS) if you prefer). Another good trick the devil pulled was to be aware that exceptions, when asked to render in React does not naturally call its .toString() so you have to do that yourself with {error.toString()}.

This codesandbox demonstrates it quite well. (Although, at the time of writing, codesandbox will spew warnings related to testing React components in the console log. Ignore that.)

If you can't open that codesandbox, here's the gist of it:


React.useEffect(() => {
  url &&
    (async () => {
      let response;
      try {
        response = await fetch(url);
      } catch (ex) {
        return setServerError(ex);
      }
      if (!response.ok) {
        return setServerError(response);
      }
      // do something here with `await response.json()`
    })(url);
}, [url]);

By the way, another important trick is to be subtle with how you put the try { and } catch(ex) {.


// DON'T DO THIS

try {
  const response = await fetch(url);
  if (!response.ok) {
    setServerError(response);
  }
  // do something here with `await response.json()`
} catch (ex) {
  setServerError(ex);
}

Instead...


// DO THIS

let response;
try {
  response = await fetch(url);
} catch (ex) {
  return setServerError(ex);
}
if (!response.ok) {
  return setServerError(response);
}
// do something here with `await response.json()`

If you don't do that you risk catching other exceptions that aren't exclusively the fetch() call. Also, notice the use of return inside the catch block which will exit the function early leaving you the rest of the code (de-dented 1 level) to deal with the happy-path response object.

Be aware that the test if (!response.ok) is simplistic. It's just a shorthand for checking if the "status in the range 200 to 299, inclusive". Realistically getting a response.status === 400 isn't an "error" really. It might just be a validation error hint from a server, and likely the await response.json() will work and contain useful information. No need to throw up a toast or a flash message that the communication with the server failed.

Conclusion

The details matter. You might want to deal with exceptions entirely differently from successful responses with bad HTTP status codes. It's nevertheless important to appreciate two things:

  1. Handle complete fetch() failures and feed your UI or your retry mechanisms.

  2. You can, in one component distinguish between a "successful" fetch() call and thrown JavaScript exceptions.

Format thousands in Python

February 1, 2019
9 comments Python

tl;dr; Use f"{number:,}" to thousands format an integer to a string.

I keep forgetting and having to look this up every time. Hopefully by blogging about it, this time it'll stick in my memory. And hopefully in yours too :)

Suppose you have a number, like 1234567890 and you want to display it, here's how you do it:


>>> number = 1234567890
>>> f"{number:,}"
'1,234,567,890'

In the past, before Python 3.6, I've been using:


>>> number = 1234567890
>>> format(number, ",")
'1,234,567,890'

All of this and more detail can be found in PEP 378 -- Format Specifier for Thousands Separator. For example, you can do this beast too:


>>> number = 1234567890
>>> f"{number:020,.2f}"
'0,001,234,567,890.00'

which demonstrates (1) how to do zero-padding (of length 20), (2) the thousands comma, (3) round to 2 significant figures. All useful weapons to be able to draw from the top of your head.

UPDATE

Also, incredibly useful is the equivalent of somestring.ljust(10):


>>> mystr = "peter"
>>> f"{mystr:10}"
'peter     '
>>> f"{mystr:>10}"
'     peter'

hashin 0.14.5 and canonical pip hashes

January 31, 2019
0 comments Python

Prior to version 0.14.5 hashin would write write down the hashes of PyPI packages in the order they appear in PyPI's JSON response. That means there's a slight chance that two distinct clients/computers/humans might actually get different output when then run hashin Django==2.1.5.

The pull request has a pretty hefty explanation as it demonstrates the fix.

Do note that if the existing order of hashes in a requirements file is not in the "right" order, hashin won't correct it unless any of the hashes are different.

Thanks @SomberNight for patiently pushing for this.

variable_cache_control - Django view decorator to set max_age in runtime

January 22, 2019
0 comments Django, Python

tl;dr; If you use the django.views.decorators.cache.cache_control decorator, consider this one instead to change the max_age depending on the request.

I had/have a Django view function that looks something like this:


@cache_control(public=True, max_age=60 * 60)
def home(request, oc=None, page=1):
    ...

But, that number 60 * 60 I really needed it to be different depending on the request parameters. For example, that oc=None, if that's not None I know the page's Cache-Control header can and should be different.

So I wrote this decorator:


from django.utils.cache import patch_cache_control


def variable_cache_control(**kwargs):
    """Same as django.views.decorators.cache.cache_control except this one will
    allow the `max_age` parameter be a callable.
    """

    def _cache_controller(viewfunc):
        @functools.wraps(viewfunc)
        def _cache_controlled(request, *args, **kw):
            response = viewfunc(request, *args, **kw)
            copied = kwargs
            if kwargs.get("max_age") and callable(kwargs["max_age"]):
                max_age = kwargs["max_age"](request, *args, **kw)
                # Can't re-use, have to create a shallow clone.
                copied = dict(kwargs, max_age=max_age)
            patch_cache_control(response, **copied)
            return response

        return _cache_controlled

    return _cache_controller

Now, I can do this instead:


def _best_max_age(req, oc=None, **kwargs):
    max_age = 60 * 60
    if oc:
        max_age *= 10
    return max_age

@variable_cache_control(public=True, max_age=_best_max_age)
def home(request, oc=None, page=1):
    ...

I hope it inspires.

An example of using Immer to handle nested objects in React state

January 18, 2019
1 comment React, JavaScript

When Immer first came out I was confused. I kinda understood what I was reading but I couldn't really see what was so great about it. As always, nothing beats actual code you type yourself to experience how something works.

Here is, I believe, a great example: https://codesandbox.io/s/y2m399pw31

If you're reading this on your mobile it might be hard to see what it does. Basically, it's a very simple React app that displays a "todo list like" thing. The state (aka. this.state.tasks) is a pure JavaScript array. The React components that display the data (e.g. <List tasks={this.state.tasks}/> and <ShowItem item={item} />) are pure (i.e. extends React.PureComponent) meaning React natively protects from re-rendering a component when the props haven't changed. So no wasted render-cycles.

What Immer does is that it helps mutate an object in a smart way. I'm sure you've heard that you're never supposed to mutate state objects (arrays are a form of mutable objects too!) and instead do things like const stuff = Object.assign({}, this.state.stuff); or const things = this.state.things.slice(0);. However, those things are shallow copies meaning any mutable objects within (i.e. nested objects) don't get the clone treatment and can thus cause problems with not re-rendering when they should.

Here's the core gist:


import React from "react";
import produce from "immer";

class App extends React.Component {
  state = {
    tasks: [[false, { text: "Do something", date: new Date() }]]
  };
  onToggleDone = (i, done) => {
    // Immer
    // This is what the blog post is all about...
    const tasks = produce(this.state.tasks, draft => {
      draft[i][0] = done;
      draft[i][1].date = new Date();
    });

    // Pure JS
    // Don't do this!
    // const tasks = this.state.tasks.slice(0);
    // tasks[i][0] = done;
    // tasks[i][1].date = new Date();

    this.setState({ tasks });
  };
  render() {
    // appreviated, but...
    return <List tasks={this.state.tasks}/>
  }
}

class List extends React.PureComponent {
   ...

It just works. Neat!

By the way, here's a code sandbox that accomplishes the same thing but with ImmutableJS which I think is uglier. I think it's uglier because now the rendering components need to be aware that it's rendering immutable.Map objects instead.

Caveats

  1. The cost of doing what immer.produce isn't free. It's some smart work that needs to be done. But the alternative is to deep clone the object which is going to be much slower. Immer isn't the fastest kid on the block but unlike MobX and ImmutableJS once you've done this smart stuff you're back to plain JavaScript objects.

  2. Careful with doing something like console.log(draft) since it will raise a TypeError in your web console. Just be aware of that or use console.log(JSON.stringify(draft)) instead.

  3. If you know with confidence that your mutable object does not, and will not, have nested mutable objects you can use object spread, Object.assign(), or .slice(0) and save yourself the trouble of another dependency.

Use vars() to send an argparse Namespace into a function in Python

January 8, 2019
1 comment Python

I only just learned about this today after all these years and thought you might like it too.

The trick is to conveniently turn an argparse.Namespace into keyword arguments that you can send to a function. This is the old/wrong way I've been doing it for years:


# THE OLD WAY

def main(things, option_a, option_n):
    print(locals())  # Debugging 


import argparse

parser = argparse.ArgumentParser()
parser.add_argument("things", help="Bla bla", nargs="*")
parser.add_argument("-o", "--option-a", help="Bla bla", default="Op A")
parser.add_argument("-n", "--option-n", help="Ble ble", default="Op N")
args = parser.parse_args()

main(
    things=args.things,
    option_a=args.option_a,
    option_n=args.option_n
)

That works but the tedious thing is to have to have spell out every single argument, twice!, when sending the argparse Namespace into the function. Here's the much moar betterest way:


# THE NEW WAY

def main(things, option_a, option_n):
    print(locals())  # Debugging 


import argparse

parser = argparse.ArgumentParser()
parser.add_argument("things", help="Bla bla", nargs="*")
parser.add_argument("-o", "--option-a", help="Bla bla", default="Op A")
parser.add_argument("-n", "--option-n", help="Ble ble", default="Op N")
args = parser.parse_args()

# The only difference and the magic sauce...
main(**vars(args))  

What's neat about this is that you don't have to type up every argument defined in the parser to the get it as arguments into a function. And as a bonus, Python will name match keyword arguments to arguments so the order doesn't matter.

Caveat! This "trick" assumes that the arguments in the parser match the arguments in the function. So if the main() function takes an argument called foo_bar you have to have an argument in the parser called --foo-bar.

Number.prototype.toString() is incredibly useful to display numbers

January 4, 2019
0 comments JavaScript

tl;dr; Use Number.prototype.toString() to display percentages that might be floating point numbers.

10% entered
I started writing a complicated solution but as I discovered corner cases and surprised I was brutally forced to do some research and actually read some documentation. Turns out Number.prototype.toString(), with the precision argument omitted, is the ideal solution.

The application I was working on has an input field to type in a percentage. I.e. a number between 0 and 100. But whatever the user types in, we store the number in decimal. So, if the user typed in "10" into the input widget, we actually store it as 0.1 in the database. Most people will type in a whole number (aka. an integer) like "12" or "5" but some people actually need more precision so they might type in "0.2%" which means 0.002 stored in the backend database.

But the widget is a React controlled component meaning it's value prop needs to be potentially formatted to what gives the best user experience. If the user types in whole numbers set the value prop to a whole number. If the user types in floating point numbers set the value prop type a floating point number with the "matching formatting".

0.12% entered
I started writing an overly complicated function that tries to figure out how many decimal-points the user typed in. For example 0.123 is 3 because parseInt(0.123 * 10 ** 3, 10) === 0.123 * 10 ** 3. But, that approach doesn't work because of floating point arithmetic and the rounding problem. For example 103441 !== 10.3441 * (10 ** 4) === 103440.99999999999. So, don't look for a number to pass into .toFixed().

Turns out Number.prototype.toString() is all you need. If you omit the precision argument, it figures out how many significant digits to use based on the input. It's best explained with some examples:

> (33).toString()
"33"
> (33.3).toString()
"33.3"
> (33.10000).toString()
"33.1"
> (10.3441).toString()
"10.3441"

Perfect!

Next level stuff

So actually, it's a bit more complicated than that. You see, the number stored in the backend database might be 0.007 which you and I know as "0.7%" but be warned:

> 0.008 * 100
0.8
> 0.007 * 100
0.7000000000000001

You know, because of floating-point arithmetic, which every high-level software engineer remembers understanding one time years ago but now know just to watch out for.

So if you use the toString() on that you'd get...

> var backendPercentage = 0.007
> (100 * backendPercentage).toString() + '%'
"0.700000000000001%"

Ouch! So how to solve that? Use Math.round(number * 100) / 100 to get rid of those rounding errors. Apparently, it's very fast too. So, now combine this with the toString():

> var backendPercentage = 0.007
> (Math.round(100 * backendPercentage * 100) / 100).toString() + '%'
"0.7%"

Perfect!

Concurrent download with hashin without --update-all

December 18, 2018
0 comments Web development, Python

Last week, I landed concurrent downloads in hashin. The example was that you do something like...


$ time hashin -r some/requirements.txt --update-all

...and the whole thing takes ~2 seconds even though it that some/requirements.txt file might contain 50 different packages, and thus 50 different PyPI.org lookups.

Just wanted to point out, this is not unique to use with --update-all. It's for any list of packages. And I want to put some better numbers on that so here goes...

Suppose you want to create a requirements file for every package in the current virtualenv you might do it like this:


# the -e filtering removes locally installed packages from git URLs
$ pip freeze | grep -v '-e ' | xargs hashin -r /tmp/reqs.txt

Before running that I injected a little timer on each pypi.org download. It looked like this:


def get_package_data(package, verbose=False):
    url = "https://pypi.org/pypi/%s/json" % package
    if verbose:
        print(url)
+   t0 = time.time()
    content = json.loads(_download(url))
    if "releases" not in content:
        raise PackageError("package JSON is not sane")
+   t1 = time.time()
+   print(t1 - t0)

I also put a print around the call to pre_download_packages(lookup_memory, specs, verbose=verbose) to see what the "total time" was.

The output looked like this:

▶ pip freeze | grep -v '-e ' | xargs python hashin.py -r /tmp/reqs.txt
0.22896194458007812
0.2900810241699219
0.2814369201660156
0.22658205032348633
0.24882292747497559
0.268247127532959
0.29332590103149414
0.23981380462646484
0.2930259704589844
0.29442572593688965
0.25312376022338867
0.34232664108276367
0.49491214752197266
0.23823285102844238
0.3221290111541748
0.28302812576293945
0.567702054977417
0.3089122772216797
0.5273139476776123
0.31477880477905273
0.6202089786529541
0.28571176528930664
0.24558186531066895
0.5810830593109131
0.5219211578369141
0.23252081871032715
0.4650228023529053
0.6127192974090576
0.6000659465789795
0.30976200103759766
0.44440698623657227
0.3135409355163574
0.638585090637207
0.297544002532959
0.6462509632110596
0.45389699935913086
0.34597206115722656
0.3462028503417969
0.6250648498535156
0.44159507751464844
0.5733060836791992
0.6739277839660645
0.6560370922088623
SUM TOTAL TOOK 0.8481268882751465

If you sum up all the individual times it would have become 17.3 seconds. It's 43 individual packages and 8 CPUs multiplied by 5 means it had to wait with some before downloading the rest.

Clearly, this works nicely.

hashin 0.14.0 with --update-all and a bunch of other features

November 13, 2018
0 comments Python, Linux

If you don't know it is, hashin is a Python command line tool for updating your requirements file's packages and their hashes for use with pip install. It takes the pain out of figuring out what hashes each package on PyPI has. It also takes the pain out of figuring out what version you can upgrade to.

In the 0.14.0 release (changelog) there are a bunch of new features. The most exciting one is --update-all. Let's go through some of the new features:

Update all (--update-all)

Suppose you want to bravely upgrade all the pinned packages to the latest and greatest. Before version 0.14.0 you'd have to manually open the requirements file and list every single package on the command line:


$ less requirements.txt
$ hashin Django requests Flask cryptography black nltk numpy

With --update-all it's the same thing except it does that reading and copy-n-paste for you:


$ hashin --update-all

Particularly nifty is to combine this with --dry-run if you get nervous about that many changes.

Interactive update all (--interactive)

This new flag only makes sense when used together with --update-all. Used together, it basically reads all packages in the requirements file, and for each one that there is a new version it asks you if you want to update it or skip it:

It looks like this:


$ hashin --update-all --interactive
PACKAGE                        YOUR VERSION    NEW VERSION
Django                         2.1.2           2.1.3           ✓
requests                       2.20.0          2.20.1          ✘
numpy                          1.15.2          1.15.4          ?
Update? [Y/n/a/q/?]:

You can also use the aliases hashin -u -i to do the same thing.

Support for "extras"

If you want to have requests[security] or markus[datadog] in your requirements file, hashin used to not support that. This now works:


$ hashin "requests[security]"

Before, it would look for a package called verbatim requests[security] on PyPI which obviously doesn't exist. Now, it parses that syntax, makes a lookup for requests and when it's done it puts the extra syntax back into the requirements file.

Thanks Dustin Ingram for pushing for this one!

Atomic writes

Prior to this version, if you typed hashin requests flask numpy nltkay it would go ahead and do one of those packages at a time and effectively open and edit the requirements file as many times as there are packages mentioned. The crux of that is that if you, for example, have a typo (e.g. nltkay instead of nltk) it would crash there and not roll back any of the other writes. It's not a huge harm but it certainly is counter intuitive.

Another place where this matters is with --dry-run. If you specified something like hashin --dry-run requests flask numpy you would get one diff per package and thus repeat the diff header 3 (excessive) times.

The other reason why atomic writes is important is if you use hashin --update-all --interactive and it asks you if you want to update package1, package2, package3, and then you decide "Nah. I don't want any of this. I quit!" it would just do that without updating the requirements file.

Better not-found errors

This was never a problem if you used Python 2.7 but for Python 3.x, if you typoed a package name you'd get a Python exception about the HTTP call and it wasn't obvious that the mistake lies with your input and not the network. Basically, it traps any HTTP errors and if it's 404 it's handled gracefully.

(Internal) Black everything and pytest everything

All source code is now formatted with Black which, albeit imperfect, kills any boring manual review of code style nits. And, it uses therapist to wrap the black checks and fixes.

And all unit tests are now written for pytest. pytest was already the tool used in TravisCI but now all of those self.assertEqual(foo, bar)s have been replaced with assert foo == bar.

How to JSON schema validate 10x (or 100x) faster in Python

November 4, 2018
9 comments Python

This is perhaps insanely obvious but it was a measurement I had to do and it might help you too if you use python-jsonschema a lot too.

I have this project which has a migration script that needs to transfer about 1M records from one PostgreSQL database, transform it a bit, validate it, and store it in another PostgreSQL database. The validation step was done like this:


from jsonschema import validate

...

with open(os.path.join(settings.BASE_DIR, "schema.yaml")) as f:
    SCHEMA = yaml.load(f)["schema"]

...


class Build(models.Model):

    ...

    @classmethod
    def validate_build(cls, build):
        validate(build, SCHEMA)

That works fine when you have a slow trickle of these coming in with many seconds or minutes apart. But when you have to do about 1M of them, the speed overhead starts to really matter. Granted, in this context, it's just a migration which is hopefully only done once but it helps that it doesn't take too long since it makes it easier to not have any downtime.

What about python-fastjsonschema?

The name python-fastjsonschema just sounds very appealing but I'm just not sure how mature it is or what the subtle differences are between that and the more established python-jsonschema which I was already using.

It has two ways of using it either...


fastjsonschema.validate(schema, data)

...or...


validator = fastjsonschema.compile(schema)
validator(data)

That got me thinking, why don't I just do that with regular python-jsonschema!
All you need to do is crack open the validate function and you can now re-used one instance for multiple pieces of data:


from jsonschema.validators import validator_for


klass = validator_for(schema)
klass.check_schema(schema)  # optional
instance = klass(SCHEMA)
instance.validate(data)

I rewrote my projects code to this:


from jsonschema import validate

...

with open(os.path.join(settings.BASE_DIR, "schema.yaml")) as f:
    SCHEMA = yaml.load(f)["schema"]
_validator_class = validator_for(SCHEMA)
_validator_class.check_schema(SCHEMA)
validator = _validator_class(SCHEMA)

...


class Build(models.Model):

    ...

    @classmethod
    def validate_build(cls, build):
        validator.validate(build)

How do they compare, performance-wise?

Let this simple benchmark code speak for itself:


from buildhub.main.models import Build, SCHEMA

import fastjsonschema
from jsonschema import validate, ValidationError
from jsonschema.validators import validator_for


def f1(qs):
    for build in qs:
        validate(build.build, SCHEMA)


def f2(qs):
    validator = validator_for(SCHEMA)
    for build in qs:
        validate(build.build, SCHEMA, cls=validator)


def f3(qs):
    cls = validator_for(SCHEMA)
    cls.check_schema(SCHEMA)
    instance = cls(SCHEMA)
    for build in qs:
        instance.validate(build.build)


def f4(qs):
    for build in qs:
        fastjsonschema.validate(SCHEMA, build.build)


def f5(qs):
    validator = fastjsonschema.compile(SCHEMA)
    for build in qs:
        validator(build.build)


# Reporting
import time
import statistics
import random

functions = f1, f2, f3, f4, f5
times = {f.__name__: [] for f in functions}


for _ in range(3):
    qs = list(Build.objects.all().order_by("?")[:1000])
    for func in functions:
        t0 = time.time()
        func(qs)
        t1 = time.time()
        times[func.__name__].append((t1 - t0) * 1000)


def f(ms):
    return f"{ms:.1f}ms"


for name, numbers in times.items():
    print("FUNCTION:", name, "Used", len(numbers), "times")
    print("\tBEST  ", f(min(numbers)))
    print("\tMEDIAN", f(statistics.median(numbers)))
    print("\tMEAN  ", f(statistics.mean(numbers)))
    print("\tSTDEV ", f(statistics.stdev(numbers)))

Basically, 3 times for each of the alternative implementations, do a validation on a 1,000 JSON blobs (technically Python dicts) that is around 1KB, each, in size.

The results:

FUNCTION: f1 Used 3 times
    BEST   1247.9ms
    MEDIAN 1309.0ms
    MEAN   1330.0ms
    STDEV  94.5ms
FUNCTION: f2 Used 3 times
    BEST   1266.3ms
    MEDIAN 1267.5ms
    MEAN   1301.1ms
    STDEV  59.2ms
FUNCTION: f3 Used 3 times
    BEST   125.5ms
    MEDIAN 131.1ms
    MEAN   133.9ms
    STDEV  10.1ms
FUNCTION: f4 Used 3 times
    BEST   2032.3ms
    MEDIAN 2033.4ms
    MEAN   2143.9ms
    STDEV  192.3ms
FUNCTION: f5 Used 3 times
    BEST   16.7ms
    MEDIAN 17.1ms
    MEAN   21.0ms
    STDEV  7.1ms

Basically, if you use python-jsonschema and create a reusable instance it's 10 times faster than the "default way". And if you do the same but with python-fastjsonscham it's 100 times faster.

By the way, in version f5 it validated 1,000 1KB records in 16.7ms. That's insanely fast!