
I encountered a situation recently where I needed to perform transformations on a complicated HTML string in JavaScript. I also wanted to do this in vanilla JavaScript so that I did not have to install any dependencies.

Server-side JavaScript has several HTML parsing libraries available already, such as cheerio, htmlparser2 and jsdom. For client-side JavaScript, there is the native DOMParser interface.

DOMParser can parse an XML or HTML string into a DOM Document. All of the standard methods, like querySelector and getElementById, will work on the Document it returns, making it a reasonable alternative to third-party libraries.

Browser support is widespread, although Internet Explorer only supports HTML string parsing from version 10 onward.

Basic set up

Setting up DOMParser involves creating an instance and calling its parseFromString method, passing the HTML string and specifying text/html as the content type:

const html = `<p>HTML</p>`;
const parser = new DOMParser();
const parsed = parser.parseFromString(html, 'text/html');

parsed will now act like the global document variable, with the same properties and methods available to it:

console.log(parsed.body.innerHTML); // logs "<p>HTML</p>"
console.log(parsed.body.innerText); // logs "HTML"
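
As a minimal sketch of the point above, the standard lookup methods can be called on the parsed Document in the same way as on the global document. The parseHtml helper and its ParserClass parameter are assumptions made for illustration (the parameter only exists so the helper can be exercised outside a browser); they are not part of the DOMParser API.

```javascript
// Hypothetical helper: parse an HTML string and return the resulting Document.
// ParserClass defaults to the browser's DOMParser.
function parseHtml(html, ParserClass = globalThis.DOMParser) {
  const parser = new ParserClass();
  return parser.parseFromString(html, 'text/html');
}

// Only run the demo where DOMParser exists (i.e. in a browser).
if (typeof DOMParser !== 'undefined') {
  const doc = parseHtml('<p id="intro">Hello <strong>world</strong></p>');

  // The same lookup methods used on the global document work here.
  console.log(doc.getElementById('intro').textContent); // logs "Hello world"
  console.log(doc.querySelector('strong').textContent); // logs "world"
}
```

Note that the parsed Document is separate from the page, so nothing is rendered or changed on screen until its content is explicitly copied over.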

A working example

HTML parsed with DOMParser can be modified and returned, making it handy for tasks where DOM manipulation is required.

To demonstrate this, the following example adds a class to every <li> element in this HTML snippet:

<ul class="list">
  <li>Item 1</li>
  <li>Item 2</li>
  <li>Item 3</li>
  <li>Item 4</li>
  <li>Item 5</li>
</ul>

First, define the HTML snippet in JavaScript and load it via DOMParser:

const html = `
  <ul class="list">
    <li>Item 1</li>
    <li>Item 2</li>
    <li>Item 3</li>
    <li>Item 4</li>
    <li>Item 5</li>
  </ul>
`;

const parser = new DOMParser();
const parsed = parser.parseFromString(html, 'text/html');

Then use the querySelectorAll method to fetch the <li> elements in the list and loop over each one to add a class attribute:

const elements = parsed.querySelectorAll('.list li');

elements.forEach(el => {
  el.setAttribute('class', 'list-item');
});

Once this is done, logging parsed.body.innerHTML will output:

<ul class="list">
  <li class="list-item">Item 1</li>
  <li class="list-item">Item 2</li>
  <li class="list-item">Item 3</li>
  <li class="list-item">Item 4</li>
  <li class="list-item">Item 5</li>
</ul>
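
One thing to be aware of is that setAttribute('class', ...) replaces the whole attribute, so any classes already on an element would be lost. If existing classes need to be kept, classList.add is one possible alternative. The sketch below assumes a hypothetical addClassToAll helper and a made-up "first" class purely for illustration:

```javascript
// Hypothetical helper: add a class to each element without removing
// the classes the element already has.
function addClassToAll(elements, className) {
  elements.forEach(el => {
    el.classList.add(className);
  });
}

// Only run the demo where DOMParser exists (i.e. in a browser).
if (typeof DOMParser !== 'undefined') {
  const doc = new DOMParser().parseFromString(
    '<ul class="list"><li class="first">Item 1</li><li>Item 2</li></ul>',
    'text/html'
  );

  addClassToAll(doc.querySelectorAll('.list li'), 'list-item');

  // The first <li> keeps "first" and gains "list-item".
  console.log(doc.body.innerHTML);
}
```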

To add the transformed HTML to the page, assign it to the innerHTML of the target element. For example, to replace the contents of the page body:

document.body.innerHTML = parsed.body.innerHTML;
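
Assigning to document.body.innerHTML replaces everything currently on the page. Where only part of the page should change, one option is to move the parsed nodes into a specific container instead. The following is a rough sketch under that assumption; the moveChildren helper and the #output container id are hypothetical and not from the original example.

```javascript
// Hypothetical helper: move every child node of `source` into `target`.
function moveChildren(source, target) {
  // Copy first: childNodes is a live list that shrinks as nodes are moved.
  const nodes = Array.from(source.childNodes);
  nodes.forEach(node => target.appendChild(node));
  return nodes.length;
}

// Only run the demo in a browser, where DOMParser and document exist.
if (typeof DOMParser !== 'undefined' && typeof document !== 'undefined') {
  const parsed = new DOMParser().parseFromString('<p>One</p><p>Two</p>', 'text/html');

  // '#output' is an assumed container id used for illustration.
  const container = document.querySelector('#output');
  if (container) {
    moveChildren(parsed.body, container);
  }
}
```

Because the nodes are moved rather than serialized and re-parsed, the rest of the page is left untouched.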

Works cited

"DOMParser." MDN, Mozilla, 18 March 2019. https://developer.mozilla.org/en-US/docs/Web/API/DOMParser. Accessed 29 April 2019.

"Manipulating DOM Elements." Plain JavaScript - Manipulating DOM Elements, Pixabay.com, https://plainjs.com/javascript/manipulation/. Accessed 29 April 2019.

"The DOMParser interface." DOM Parsing and Serialization, W3C, 11 February 2019. https://w3c.github.io/DOM-Parsing/#the-domparser-interface. Accessed 29 April 2019.