Filtered by JavaScript

Page 7

Reset

Test if two URLs are "equal" in JavaScript

July 2, 2020
3 comments JavaScript

This saved my bacon today and I quite like it so I hope that others might benefit from this little tip.

So you have two "URLs" and you want to know if they are "equal". I write those words, in the last sentence, in quotation marks because they might not be fully formed URLs and what you consider equal might depend on the current business logic.

In my case, I wanted http://www.peterbe.com/path/to?a=b to be considered equal to/path/to#anchor. Because, in this case the both share the exact same pathname (/path/to). So how to do it:


function equalUrls(url1, url2) {
  return (
    new URL(url1, "http://example.com").pathname ===
    new URL(url2, "http://example.com").pathname
  );
}

Truncated! Read the rest by clicking the link below.

findMatchesInText - Find line and column of matches in a text, in JavaScript

June 22, 2020
0 comments Node, JavaScript

I need this function to relate to open-editor which is a Node program that can open your $EDITOR from Node and jump to a specific file, to a specific line, to a specific column.

Here's the code:


function* findMatchesInText(needle, haystack, { inQuotes = false } = {}) {
  const escaped = needle.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  let rex;
  if (inQuotes) {
    rex = new RegExp(`['"](${escaped})['"]`, "g");
  } else {
    rex = new RegExp(`(${escaped})`, "g");
  }
  for (const match of haystack.matchAll(rex)) {
    const left = haystack.slice(0, match.index);
    const line = (left.match(/\n/g) || []).length + 1;
    const lastIndexOf = left.lastIndexOf("\n") + 1;
    const column = match.index - lastIndexOf + 1;
    yield { line, column };
  }
}

And you use it like this:


const text = ` bravo
Abra
cadabra

bravo
`;

console.log(Array.from(findMatchesInText("bra", text)));

Which prints:


[
  { line: 1, column: 2 },
  { line: 2, column: 2 },
  { line: 3, column: 5 },
  { line: 5, column: 1 }
]

The inQuotes option is because a lot of times this function is going to be used for finding the href value in unstructured documents that contain HTML <a> tags.

Benchmark compare Highlight.js vs. Prism

May 19, 2020
0 comments Node, JavaScript

tl;dr; I wanted to see which is fastest, in Node, Highlight.js or Prism. The result is; they're both plenty fast but Prism is 9% faster.

The context is all the thousands of little snippets of CSS, HTML, and JavaScript code on MDN.
I first wrote a script that stored almost 9,000 snippets of code. 60% is Javascript and 22% is CSS and rest is HTML.
The mean snippet size was 400 bytes and the median 300 bytes. All ASCII.

Then I wrote three functions:

  1. f1 - opens the snippet, extracts the payload, and saves it in a different place. This measures the baseline for how long the disk I/O read and the disk I/O write takes.
  2. f2 - same as f1 but uses const html = Prism.highlight(payload, Prism.languages[name], name); before saving.
  3. f3 - same as f1 but uses const html = hljs.highlight(name, payload).value; before saving.

The experiment

You can see the hacky benchmark code here: https://github.com/peterbe/syntax-highlight-node-benchmark/blob/master/index.js

Results

The results are (after running each 12 times each):

f1 0.947s   fastest
f2 1.361s   43.6% slower
f3 1.494s   57.7% slower

Memory

In terms of memory usage, Prism maxes heap memory at 60MB (the f1 baseline was 18MB), and Highlight.js maxes heap memory at 60MB too.

Disk space in HTML

Each library produces different HTML. Examples:

Prism


<span class="token selector">.item::after</span> <span class="token punctuation">{</span>
    <span class="token property">content</span><span class="token punctuation">:</span> <span class="token string">"This is my content."</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>

Highlight.js


<span class="hljs-selector-class">.item</span><span class="hljs-selector-pseudo">::after</span> {
    <span class="hljs-attribute">content</span>: <span class="hljs-string">"This is my content."</span>;
}

Yes, not only does it mean they look different, they use up a different amount of disk space when saved. That matters for web performance and also has an impact on build resources.

  • f1 - baseline "HTML" files amounts to 11.9MB (across 3,025 files)
  • f2 - Prism: 17.6MB
  • f3 - Highlight.js: 13.6MB

Conclusion

Prism is plenty fast for Node. If you're already using Prism, don't worry about having to switch to Highlight.js for added performance.

RAM memory consumption is about the same.

Final HTML from Prism is 30% larger than Highlight.js but when the rendered HTML is included in a full HTML page, the HTML compresses very well because of all the repetition so this is not a good comparison. Or rather, not a lot to worry about.

Well, speed is just one dimension. The features differ too. MDN already uses Prism but does so in the browser. The ultimate context for this blog post is; the speed if we were to do all the syntax highlighting in the server as a build step.

Throw JavaScript errors with extra information

May 12, 2020
0 comments Node, JavaScript

Did you know, if you can create your own new Error instance and attach your own custom properties on that? This can come in very handy when you, from the caller, want to get more structured information from the error without relying on the error message.


// WRONG ⛔️

try {
  for (const i of [...Array(10000).keys()]) {
    if (Math.random() > 0.999) {
      throw new Error(`Failed at ${i}`);
    }
  }
} catch (err) {
  const iteration = parseInt(err.toString().match(/Failed at (\d+)/)[1]);
  console.warn(`Made it to ${iteration}`);
}

// RIGHT ✅

try {
  for (const i of [...Array(10000).keys()]) {
    if (Math.random() > 0.999) {
      const failure = new Error(`Failed at ${i}`);
      failure.iteration = i;
      throw failure;
    }
  }
} catch (err) {
  const iteration = err.iteration;
  console.warn(`Made it to ${iteration}`);
}

The above examples are obviously a bit contrived but you have to imagine that whatever code can throw an error might be "far away" from where you deal with errors thrown. For example, imagine you start off a build and you want to get extra information about what the context was. In Python, you use exception classes as a form of natural filtering but JavaScript doesn't have that. Using custom error properties can be a great tool to separate unexpected errors from expected errors.

Bonus - Checking for the custom property

Imagine this refactoring:


try {
  for (const i of [...Array(10000).keys()]) {
    if (Math.random() > 0.999) {
      const failure = new Error(`Failed at ${i}`);
      failure.iteration = i;
      throw failure;
    }
    if (Math.random() < 0.001) {
      throw new Error("something else is wrong");
    }
  }
} catch (err) {
  const iteration = err.iteration;
  console.warn(`Made it to ${iteration}`);
}

With that code it's very possible you'd get Made it to undefined. So here's how you'd make the distinction:


try {
  for (const i of [...Array(10000).keys()]) {
    if (Math.random() > 0.999) {
      const failure = new Error(`Failed at ${i}`);
      failure.iteration = i;
      throw failure;
    }
    if (Math.random() < 0.001) {
      throw new Error("something else is wrong");
    }
  }
} catch (err) {
  if (err.hasOwnProperty("iteration")) {
    const iteration = err.iteration;
    console.warn(`Made it to ${iteration}`);
  } else {
    throw err;
  }
}

```

How to use minimalcss without a server

April 24, 2020
0 comments Web development, Node, JavaScript

minimalcss requires that you have your HTML in a serving HTTP web page so that puppeteer can open it to find out the CSS within. Suppose, in your build system, you don't yet really have a server. Well, what you can do is start one on-the-fly and shut it down as soon as you're done.

Suppose you have .html file

First install all the stuff:

yarn add minimalcss http-server

Then run it:


const path = require("path");

const minimalcss = require("minimalcss");
const httpServer = require("http-server");

const HTML_FILE = "index.html";  // THIS IS YOURS

(async () => {
  const server = httpServer.createServer({
    root: path.dirname(path.resolve(HTML_FILE)),
  });
  server.listen(8080);

  let result;
  try {
    result = await minimalcss.minimize({
      urls: ["http://0.0.0.0:8080/" + HTML_FILE],
    });
  } catch (err) {
    console.error(err);
    throw err;
  } finally {
    server.close();
  }

  console.log(result.finalCss);
})();

And the index.html file:


<!DOCTYPE html>
<html>
    <head>
        <link rel="stylesheet" href="styles.css">
    </head>
    <body>
        <p>Hi @peterbe</p>
    </body>
</html>

And the styles.css file:


h1 {
  color: red;
}
p,
li {
  font-weight: bold;
}

And the output from running that Node script:

p{font-weight:700}

It works!

Suppose all you have is the HTML string and the CSS blob(s)

Suppose all you have is a string of HTML and a list of strings of CSS:


const fs = require("fs");
const path = require("path");

const minimalcss = require("minimalcss");
const httpServer = require("http-server");

const HTML_BODY = `
<p>Hi Peterbe</p>
`;

const CSSes = [
  `
h1 {
  color: red;
}
p,
li {
  font-weight: bold;
}
`,
];

(async () => {
  const csses = CSSes.map((css, i) => {
    fs.writeFileSync(`${i}.css`, css);
    return `<link rel="stylesheet" href="${i}.css">`;
  });
  const html = `<!doctype html><html>
  <head>${csses}</head>
  <body>${HTML_BODY}</body>
  </html>`;
  const fp = path.resolve("./index.html");
  fs.writeFileSync(fp, html);
  const server = httpServer.createServer({
    root: path.dirname(fp),
  });
  server.listen(8080);

  let result;
  try {
    result = await minimalcss.minimize({
      urls: ["http://0.0.0.0:8080/" + path.basename(fp)],
    });
  } catch (err) {
    console.error(err);
    throw err;
  } finally {
    server.close();
    fs.unlinkSync(fp);
    CSSes.forEach((_, i) => fs.unlinkSync(`${i}.css`));
  }

  console.log(result.finalCss);
})();

Truth be told, you'll need a good pinch of salt to appreciate that example code. It works but most likely, if you're into web performance so much that you're even doing this, your parameters are likely to be more complex.

Suppose you have your own puppeteer instance

In the first example above, minimalcss will create an instance of puppeteer (e.g. const browser = await puppeteer.launch()) but that means you have less control over which version of puppeteer or which parameters you need. Also, if you have to run minimalcss on a bunch of pages it's costly to have to create and destroy puppeteer browser instances repeatedly.

To modify the original example, here's how you use your own instance of puppeteer:

  const path = require("path");

+ const puppeteer = require("puppeteer");
  const minimalcss = require("minimalcss");
  const httpServer = require("http-server");

  const HTML_FILE = "index.html"; // THIS IS YOURS

  (async () => {
    const server = httpServer.createServer({
      root: path.dirname(path.resolve(HTML_FILE)),
    });
    server.listen(8080);

+   const browser = await puppeteer.launch(/* your special options */);
+
    let result;
    try {
      result = await minimalcss.minimize({
        urls: ["http://0.0.0.0:8080/" + HTML_FILE],
+       browser,
      });
    } catch (err) {
      console.error(err);
      throw err;
    } finally {
+     await browser.close();
      server.close();
    }

    console.log(result.finalCss);
  })();

Note that this doesn't buy us anything in this particular example. But that's where your imagination comes in!

Conclusion

You can see the code here as a git repo if that helps.

The point is that this might solve some of the chicken-and-egg problem you might have is that you're building your perfect HTML + CSS and you want to perfect it before you ship it.

Note also that there are other ways to run minimalcss other than programmatically. For example, minimalcss-server is minimalcss wrapped in a express server.

Another thing that you might have is that you have multiple .html files that you want to process. The same technique applies but you just need to turn it into a loop and make sure you call server.close() (and optionally await browser.close()) when you know you've processed the last file. Exercise left to the reader?

How post JSON with curl to an Express app

April 15, 2020
2 comments Node, JavaScript

tl;dr; No need install or require body-parser and it's important to send the right content-type header.

I know Express has great documentation but I'm still confused about how to receive JSON and/or how to test it from curl. A great deal of confusion comes from the fact that, I think, body-parser used to be a third-party library you had to install yourself to add it to your Express app. You don't. It now gets installed by installing express. E.g.

▶ yarn init -y
▶ yarn add express
▶ ls node_modules/body-parser
HISTORY.md   LICENSE      README.md    index.js     lib          package.json

Let's work backward. This is how you set up the Express handler:


const express = require("express");  // v4.17.x as of Apr 2020
const app = express();

app.use(express.json());

app.post("/echo", (req, res) => {
  res.json(req.body);
}); 

app.listen(5000);

And, this is how you test it:

▶ curl -XPOST -d '{"foo": "bar"}' -H 'content-type: application/json' localhost:5000/echo
{"foo":"bar"}%

That's it. No need to require("body-parser") or anything like that. And make sure you're sending the content-type: application/json in the curl command.

Things that can go wrong

I kept fumbling around on StackOverflow questions and rummaging the Express documentation until I figured out what mistake I kept doing. So, here's a variant of the handler above, but much more verbose:


app.post("/echo", (req, res) => {

  if (req.body === undefined) {
    throw new Error("express.json middleware not installed");
  }
  if (!Object.keys(req.body).length) {
    // E.g curl -v -XPOST http://localhost:5000/echo
    if (!req.get("content-Type")) {
      return res.status(400).send("no content-type header\n");
    }
    // E.g. curl -v -XPOST -d '{"foo": "bar"}' http://localhost:5000/echo
    if (!req.get("content-Type").includes("application/json")) {
      return res.status(400).send("content-type not application/json\n");
    }
    // E.g. curl -XPOST -H 'content-type:application/json' http://localhost:5000/echo
    return res.status(400).send("no data payload included\n");
  }

  // At this point 'req.body' is *something*.
  // For example, you might want to `console.log(req.body.foo)`
  res.json(req.body);
}); 

How you treat these things is up to you. For example, an empty JSON data might be OK in your application.
I.e. perhaps curl -XPOST -d '{}' -H 'content-type:application/json' http://localhost:5000/echo might be fine.

An important option

express.json() is a piece of middleware. By default, it has a simple mechanism for bothering to do put .body into the request object. The default configuration is as if you'd typed:


app.use(express.json({
  type: 'application/json',
}));

(it's actually a bit more complicated than that)

If you're confident that you'll always be sending JSON to this handler, and you don't want to have to force clients to have to specify the application/json Content-Type you can change this to:

app.use(express.json({
  type: '*/*',
}));

Now you'll find that curl -XPOST -d '{"foo": "bar3"}' localhost:5000/ will work fine.

Instead of curl, let's fetch

This code works the same with node-fetch or browser Fetch API.


fetch("http://localhost:5000/echo", {
  method: "post",
  body: JSON.stringify({ foo: "bar" }),
  headers: { "Content-Type": "application/json" },
})
  .then((res) => res.json())
  .then((json) => console.log(json));

Performance of truth checking a JavaScript object

February 3, 2020
0 comments Node, JavaScript

I'm working on a Node project that involves large transformations of large sets of data here and there. For example:


if (!Object.keys(this.allTitles).length) {
  ...

In my case, that this.allTitles is a plain object with about 30,000 key/value pairs. That particular line of code actually only runs 1 single time so if it's hundreds of milliseconds, it's really doesn't matter that much. However, that's not a guarantee! What if you had something like this:


for (const thing of things) {
  if (!Object.keys(someObj).length) {
    // mutate someObj
  }
}

then, you'd potentially have a performance degradation once someObj becomes considerably large. And it gets particularly degraded if the length of things is considerably large as it would do the operation many times.

Actually, consider this:


const obj = {};
[...Array(30000)].forEach((_, i) => {
  obj[i] = i;
});

console.time("Truthcheck obj");
[...Array(100)].forEach((_, i) => {
  return !!Object.keys(obj).length;
});
console.timeEnd("Truthcheck obj");

On my macBook with Node 13.5, this outputs:

Truthcheck obj: 260.564ms

Maps

The MDN page on Map has a nice comparison, in terms of performance, between Map and regular object. Consider this super simple benchmark:


const obj = {};
const map = new Map();

[...Array(30000)].forEach((_, i) => {
  obj[i] = i;
  map.set(i, i);
});

console.time("Truthcheck obj");
[...Array(100)].forEach((_, i) => {
  return !!Object.keys(obj).length;
});
console.timeEnd("Truthcheck obj");

console.time("Truthcheck map");
[...Array(100)].forEach((_, i) => {
  return !!map.size;
});
console.timeEnd("Truthcheck map");

So, fill a Map instance and a plain object with 30,000 keys and values. Then, for each in turn, check if the thing is truthy 100 times. The output I get:

Truthcheck obj: 235.017ms
Truthcheck map: 0.029ms

That's not unexpected. The map instance maintains a size counter, which increments on .set (if the key is new), so doing that "truthy" check just takes O(1) seconds.

Conclusion

Don't run to rewrite everything to Maps!

In fact, I took the above mentioned little benchmark and changed the times to be a 3,000 item map and obj (instead of 30,000) and only did 10 iterations (instead of 100) and then the numbers are:

Truthcheck obj: 0.991ms
Truthcheck map: 0.044ms

These kinds of small numbers are very unlikely to matter in the scope of other things going on.

Anyway, consider using Map if you fear that you might be working with really reeeeally large mappings.

JavaScript destructuring like Python kwargs with defaults

January 18, 2020
1 comment Python, JavaScript

In Python

I'm sure it's been blogged about a buncha times before but, I couldn't find it, and I had to search too hard to find an example of this. Basically, what I'm trying to do is what Python does in this case, but in JavaScript:


def do_something(arg="notset", **kwargs):
    print(f"arg='{arg.upper()}'")

do_something(arg="peter")
do_something(something="else")
do_something()

In Python, the output of all this is:

arg='PETER'
arg='NOTSET'
arg='NOTSET'

It could also have been implemented in a more verbose way:


def do_something(**kwargs):
    arg = kwargs.get("arg", "notset")
    print(f"arg='{arg.upper()}'")

This more verbose format has the disadvantage that you can't quickly skim it and see and what the default is. That thing (arg = kwargs.get("arg", "notset")) might happen far away deeper in the function, making it hard work to spot the default.

In JavaScript

Here's the equivalent in JavaScript (ES6?):


function doSomething({ arg = "notset", ...kwargs } = {}) {
  return `arg='${arg.toUpperCase()}'`;
}

console.log(doSomething({ arg: "peter" }));
console.log(doSomething({ something: "else" }));
console.log(doSomething());

Same output as in Python:

arg='PETER'
arg='NOTSET'
arg='NOTSET'

Notes

I'm still not convinced I like this syntax. It feels a bit too "hip" and too one-liner'y. But it's also pretty useful.

Mind you, the examples here are contrived because they're so short in terms of the number of arguments used in the function.
A more realistic thing like be a function that lists, upfront, all the possible parameters and for some of them, it wants to point out some defaults. E.g.


function processFolder({
  source,
  destination = "/tmp",
  quiet = false,
  verbose = false
} = {}) {
  console.log({ source, destination, quiet, verbose });
  // outputs
  // { source: '/user', destination: '/tmp', quiet: true, verbose: false }
}

console.log(processFolder({ source: "/user", quiet: true }));

One could maybe argue that arguments that don't have a default are expected to always be supplied so they can be regular arguments like:


function processFolder(source, {
  destination = "/tmp",
  quiet = false,
  verbose = false
} = {}) {
  console.log({ source, destination, quiet, verbose });
  // outputs
  // { source: '/user', destination: '/tmp', quiet: true, verbose: false }
}

console.log(processFolder("/user", { quiet: true }));

But, I quite like keeping all arguments in an object. It makes it easier to write wrapper functions and I find this:


setProfile(
  "My biography here",
  false,
  193.5,
  230,
  ["anders", "bengt"],
  "South Carolina"
);

...harder to read than...


setProfile({
  bio: "My biography here",
  dead: false,
  height: 193.5,
  weight: 230,
  middlenames: ["anders", "bengt"],
  state: "South Carolina"
});

How depend on a local Node package without npmjs.com

January 15, 2020
0 comments JavaScript

Suppose that you're working on ~/dev/my-cool-project and inside ~/dev/my-cool-project/package.json you might have something like this:

"dependencies": {
     "that-cool-lib": "1.2.3",
     ...

But that that-cool-lib is one of your own projects. You're also working on that project and it's over at ~/dev/that-cool-lib. Within that-cool-lib you might be in a git branch or perhaps you're preparing a 2.0.0 release.

Now you're interested if that-cool-lib@2.0.0 is going to work here inside my-cool-project.

What you could do

First, you release this fancy that-cool-lib@2.0.0 to npmjs.com with that project's npm publish procedure. Then as soon as that's done and you can see that the release made it onto https://www.npmjs.com/package/that-cool-lib/v/2.0.0.

Then you go over to my-cool-project and start a new git branch to try the upgrade, npm install that-cool-project@2.0.0 --save so you have this:

"dependencies": {
-    "that-cool-lib": "1.2.3",
+    "that-cool-lib": "2.0.0",
     ...

Now you can try it that new version of my-cool-project and if that-cool-lib had any of its own entry point executables or post/pre install steps, they'd be fully resolved.

What you should do

Instead, use install-local. Don't use npm link because it might not install entry point executables and I also don't like the fact that I need to go into that-cool-lib and install it (globally?) first (when you do cd that-cool-lib && npm link). Also, see "What's wrong with npm-link?".

Here's how you do it:

npx install-local ~/dev/that-cool-lib

and it acts pretty much exactly as if you had gotten it from npmjs.com the normal way.

Notes

I almost never use npm these days. Go yarn! So, perhaps I've misinterpreted something.

Also, I try my very hardest to never use npm install -g ... (or yarn global ... for that matter) now that we have npx. Perhaps if you'd install it locally it'd speed up the use of local-install by 1-3 seconds each time you run this. Again, my skillset of modern npm is fading so I don't think I understand why it takes me 14 seconds the first time I run npx install that-cool-lib and then it takes 14 seconds again when I run the exact same command again. Does it not benefit from any caching? How much of that time is spent on npmjs.com resolving other sub-dependencies that that-cool-lib requires?

Hopefully, this helps other people stuck in a similar boat.

How to split a block of HTML with Cheerio in NodeJS

January 3, 2020
2 comments Node, JavaScript

cheerio is a great Node library for processing HTML. It's faster than JSDOM and years and years of jQuery usage makes the API feel yummily familiar.

What if you have a piece of HTML that you want to split up into multiple blocks? For example, you have this:


<div>Prelude</div>

<h2>First Header</h2>

<p>Paragraph <b>here</b>.</p>
<p>Another paragraph.</p>

<h2 id="second">Second Header</h2>

<ul>
  <li>One</li>
  <li>Two</li>
</ul>
<blockquote>End quote!</blockquote>

and you want to get this split by the <h2> tags so you end up with 3 (in this example) distinct blocks of HTML, like this:

first one


<div>Prelude</div>

second one


<h2>First Header</h2>

<p>Paragraph <b>here</b>.</p>
<p>Another paragraph.</p>

third one


<h2 id="second">Second Header</h2>

<ul>
  <li>One</li>
  <li>Two</li>
</ul>
<blockquote>End quote!</blockquote>

You could try to cast the regex spell on that and try to, I don't know, split the string by the </h2>. But it's risky and error prone because (although a bit unlikely in this simple example) get caught up in <h2>...</h2> tags that are nested inside something else. Also, proper parsing almost always wins in the long run over regexes.

Use cheerio

This is how I solved it and hopefully A) you can copy and benefit, or B) someone tells me there's already a much better way.

What you do is walk the DOM root nodes, one by one, and keep filling a buffer and then yield individual new cheerio instances.


const html = `
<div>Prelude</div>

<h2>First Header</h2>
<p>Paragraph <b>here</b>.</p>
<p>Another paragraph.</p>
<!-- comment -->

<h2 id="second">Second Header</h2>
<ul>
  <li>One</li>
  <li>Two</li>
</ul>
<blockquote>End quote!</blockquote>
`;

// load the raw HTML
// it needs to all be wrapped in *one* big wrapper
const $ = cheerio.load(`<div id="_body">${html}</div>`);

// the end goal
const blocks = [];

// the buffer
const section = cheerio
  .load("<div></div>", { decodeEntities: false })("div")
  .eq(0);

const iterable = [...$("#_body")[0].childNodes];
let c = 0;
iterable.forEach(child => {
  if (child.tagName === "h2") {
    if (c) {
      blocks.push(section.clone());
      section.empty();
      c = 0; // reset the counter
    }
  }
  c++;
  section.append(child);
});
if (c) {
  // stragglers
  blocks.push(section.clone());
}

// Test the result
const blocksAsStrings = blocks.map(block => block.html());
console.log(blocksAsStrings.length);
// 3
console.log(blocksAsStrings);
// [
//   '\n<div>Prelude</div>\n\n',
//   '<h2>First Header</h2>\n' +
//     '<p>Paragraph <b>here</b>.</p>\n' +
//     '<p>Another paragraph.</p>\n' +
//     '<!-- comment -->\n' +
//     '\n',
//   '<h2 id="second">Second Header</h2>\n' +
//     '<ul>\n' +
//     '  <li>One</li>\n' +
//     '  <li>Two</li>\n' +
//     '</ul>\n' +
//     '<blockquote>End quote!</blockquote>\n'
// ]

In this particular implementation the choice of splitting is by the every h2 tag. If you want to split by anything else, go ahead and adjust the conditional there where it's currently doing if (child.tagName === "h2") {.

Also, what you do with the blocks is up to you. Perhaps you need them as strings, then you use the blocks.map(block => block.html()). Otherwise, if it serves your needs they can remain as individual cheerio instances that you can do whatever with.