Sanitizer#

A sanitizer’s job is to take in some HTML, do some magic, and spit out HTML that is supposed to be safe.

If we look at what it actually does, it is quite simple:

It first parses the HTML, creating a data object
Then, it works its magic and sanitizes that object
Finally, it serializes the object back into HTML and feeds it to the browser
The browser simply reparses and renders it

So, TLDR: parse -> sanitize -> serialize -> reparse by browser -> render
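To make the pipeline concrete, here is a minimal sketch of the first three steps in Python, using only the standard library. Everything in it is an assumption made for illustration: the `Node` class, the `clean` function and the tiny allowlists are hypothetical names, and this is not how DOMPurify or any real sanitizer is implemented. It is nowhere near safe enough for real use (for example, it does not check URL schemes in `href` and does not handle void elements such as `<img>`).

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlists for this sketch only.
ALLOWED_TAGS = {"p", "b", "i", "a"}
ALLOWED_ATTRS = {"href"}


class Node:
    """A minimal element: tag name, attribute pairs, and children (Nodes or text)."""

    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = attrs or []
        self.children = []


class TreeBuilder(HTMLParser):
    """Step 1 (parse): turn the HTML string into a tree of Node objects."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = Node("#root")
        self.stack = [self.root]  # currently open elements, root first

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def sanitize(node):
    """Step 2 (sanitize): drop disallowed elements (with their subtree) and attributes."""
    kept = []
    for child in node.children:
        if isinstance(child, str):
            kept.append(child)
        elif child.tag in ALLOWED_TAGS:
            child.attrs = [(k, v) for k, v in child.attrs if k in ALLOWED_ATTRS]
            sanitize(child)
            kept.append(child)
    node.children = kept


def serialize(node):
    """Step 3 (serialize): turn the tree back into an HTML string."""
    parts = []
    for child in node.children:
        if isinstance(child, str):
            parts.append(escape(child, quote=False))
        else:
            attrs = "".join(f' {k}="{escape(v or "")}"' for k, v in child.attrs)
            parts.append(f"<{child.tag}{attrs}>{serialize(child)}</{child.tag}>")
    return "".join(parts)


def clean(dirty):
    """parse -> sanitize -> serialize; the browser would then reparse the result."""
    builder = TreeBuilder()
    builder.feed(dirty)
    builder.close()
    sanitize(builder.root)
    return serialize(builder.root)


print(clean('<p onclick="evil()">hi <b>there</b><script>alert(1)</script></p>'))
```

The string returned by `clean` is what the browser would receive and parse a second time, so the parser used here and the browser's parser both get a say in what the markup means.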

However#

However, that’s a lie. Its job is not as simple as I made it out to be, mainly because HTML is a tolerant language. It was designed with the intention that “one little mistake from a developer should not crash the whole website”.

Because of that, the HTML parser has to try its best to guess what was meant and fix the broken HTML it receives. For example, <p>some text will be fixed into <p>some text</p> by the parser.
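The kind of repair described above can be imitated with a toy in Python. This sketch only handles one case, closing elements that were left open, and the names `AutoCloser` and `repair` are hypothetical. A real HTML parser follows the much more involved error-recovery rules of the HTML standard, so treat this as an illustration of the idea, not of the actual algorithm.

```python
from html import escape
from html.parser import HTMLParser


class AutoCloser(HTMLParser):
    """Echo the input back while tracking which elements are still open."""

    def __init__(self):
        super().__init__()
        self.out = []
        self.open_tags = []

    def handle_starttag(self, tag, attrs):
        # Re-emit the start tag exactly as it appeared in the input.
        self.out.append(self.get_starttag_text())
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Ignore stray end tags; otherwise close everything opened since `tag`.
        if tag in self.open_tags:
            while self.open_tags:
                top = self.open_tags.pop()
                self.out.append(f"</{top}>")
                if top == tag:
                    break

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def repair(html):
    """Return `html` with every element that was left open closed at the end."""
    parser = AutoCloser()
    parser.feed(html)
    parser.close()
    while parser.open_tags:
        parser.out.append(f"</{parser.open_tags.pop()}>")
    return "".join(parser.out)


print(repair("<p>some text"))
```

The point is that the parser's output is a guess at what the author meant, and the guess depends on the rules the parser applies.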

To make it worse, there are many different HTML parsers, and although there is a standard, each one might do things a little differently from the others. When two parsers interpret the same input in different ways, the mismatch is called a parser differential.

And sanitizing is not native to the browser or the parser. The most used library for sanitizing HTML is DOMPurify, which is developed and maintained by a third party.