XSS security vulnerability

A real-world example of a classic security mistake, and how I tracked it down and fixed it.


Recently, we discovered an XSS vulnerability at work, and I jumped in to fix it. And here’s the funny part: even though I’ve heard about XSS thousands of times throughout my career, I completely failed to recognize it at first.

So let’s walk through how this vulnerability happened, why it was so easy to miss, and what we can do to prevent issues like this in the future.

Origins

This vulnerability didn’t come out of nowhere. It appeared because several architectural choices aligned in just the wrong way:

  1. Generating and storing the HTML in the database. This limits your ability to validate or sanitize content at the point of entry.
  2. Returning a raw HTML string in the API response. Whatever HTML the server sends will be trusted by the client.
  3. Injecting raw HTML into the component. …and this is where everything falls apart.

Writing this down, I can see a lot of red flags, and I’m sure you can as well 😅

In hindsight, it’s a perfect storm. Each step looks innocent in isolation, but together they create a straight path for untrusted content to reach the browser.
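
One way to defuse the first item on that list is to escape every dynamic value at the moment the HTML is generated, so that data can never turn into markup. The sketch below is not our actual generator: escapeHtml and renderIssueSummary are hypothetical names, and it assumes the HTML is built by interpolating values into a template.

```javascript
// Hypothetical helper: escape the five characters that are significant in HTML
// text and attribute values, so dynamic data cannot introduce tags or attributes.
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

function escapeHtml(value) {
  return String(value).replace(/[&<>"']/g, (char) => HTML_ESCAPES[char]);
}

// Hypothetical generator: the surrounding markup is ours, the value is escaped.
function renderIssueSummary(issue) {
  return `<p class="summary">${escapeHtml(issue.summary)}</p>`;
}

// The payload ends up as inert text instead of an <img> element.
console.log(renderIssueSummary({ summary: '<img src="x" onerror="alert(1)">' }));
```

Escaping at generation time does not replace sanitizing on the client, but it means the stored HTML starts out clean.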

If you’re using React, you probably know that dangerouslySetInnerHTML is intentionally scary. It bypasses the escaping React normally applies to rendered strings and tells React:

Trust me bro, I know what I’m doing.

Here’s what it looked like in the rendering code:

function ExampleComponent() {
  const { data } = useSWR("/api/data", fetcher);

  const createContent = () => {
    // Vulnerable: data.html reaches the DOM exactly as the server sent it.
    return { __html: data.html };
  };

  return (
    <div>
      <div dangerouslySetInnerHTML={createContent()} />
    </div>
  );
}

Of course, this example is simplified to show only the parts relevant to the vulnerability. Looking at it now, it’s pretty much a textbook example of XSS injection.

A simple malicious HTML payload can look like this:

<img src="x" onerror="alert('You got hacked!')" />

Inline event handlers such as onerror and onload are among the most common XSS vectors, because the browser runs them automatically, with no user interaction required (onclick, by contrast, only fires on a click). They also work where an injected <script> tag would not: scripts inserted through innerHTML, which is what dangerouslySetInnerHTML uses under the hood, are not executed.

And while we don’t allow user-submitted HTML in the database (we generate it ourselves), an extra layer of protection is still a good idea. If the database is ever compromised, sanitizing on the client keeps the attacker from escalating that into XSS. That’s why data that looks “safe” at first glance can be so dangerous.

Fixing it

Thankfully, there’s a great library available: DOMPurify.

DOMPurify does one thing extremely well: it takes untrusted HTML and strips anything that could execute JavaScript, such as script elements, inline event handlers, and javascript: URLs. It’s small, fast, actively maintained, and used across many production systems.

With it, sanitization looks as simple as this:

import DOMPurify from "dompurify";

const cleanHtml = DOMPurify.sanitize(dirtyHtml);
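
The defaults are a sensible starting point, but DOMPurify also accepts a configuration object, and its ALLOWED_TAGS and ALLOWED_ATTR options let you narrow the output to an explicit allowlist. Here is a minimal sketch of that idea. The helper name sanitizeRichText and the particular tag list are assumptions for illustration, and the DOMPurify instance is passed in as a parameter so the helper is easy to stub.

```javascript
// Assumption: this content only needs basic text formatting, so everything
// outside a short allowlist is dropped. Adjust the lists to your own markup.
const RICH_TEXT_CONFIG = {
  ALLOWED_TAGS: ["p", "br", "b", "i", "em", "strong", "ul", "ol", "li", "a"],
  ALLOWED_ATTR: ["href"],
};

// `purifier` is a DOMPurify instance (hypothetical parameter name).
function sanitizeRichText(purifier, dirtyHtml) {
  // Treat a missing value (for example, data that is still loading) as empty.
  if (dirtyHtml == null) {
    return "";
  }
  return purifier.sanitize(String(dirtyHtml), RICH_TEXT_CONFIG);
}
```

In application code this would be called as sanitizeRichText(DOMPurify, data.html). An allowlist is stricter than the defaults, so it may strip markup you actually need; check it against real content before relying on it.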

For my use case, though, I had to use a small wrapper around this library, isomorphic-dompurify, because the project uses Jest for testing.

Jest tests run in a Node.js environment, not a real browser, so there’s no DOM available. DOMPurify expects window, document, and other browser globals, so it fails in tests.

This wrapper library takes care of that, and allows for the same code to run perfectly fine in the browser and in unit test environments.

Updated example from before:

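import DOMPurify from "isomorphic-dompurify";

function ExampleComponent() {
  const { data } = useSWR("/api/data", fetcher);

  const createContent = () => {
    // data is undefined until the request resolves, so fall back to an empty string.
    const cleanHtml = DOMPurify.sanitize(data?.html ?? "");
    return { __html: cleanHtml };
  };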

  return (
    <div>
      <div dangerouslySetInnerHTML={createContent()} />
    </div>
  );
}

And here’s the test that I’ve added to make sure that HTML coming from the server is properly sanitized:

it("should sanitize incoming html", () => {
  useSWR.mockReturnValue({
    data: {
      html: '<img src="x" onerror="alert(\'Hello there!\')" />',
    },
  });

  const { container } = render(<ExampleComponent />);

  expect(container).toMatchSnapshot();
});

The snapshot captures the final rendered markup, which should contain no inline event handlers, scripts, or anything else that could lead to code execution. If sanitization ever breaks, the markup changes and this test fails.
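
A snapshot only proves that the markup hasn’t changed since it was recorded, so it can be worth pairing it with an explicit assertion. The helper below is a sketch of one way to do that. findInlineHandlers is a hypothetical name, and it relies only on the standard querySelectorAll and attributes DOM properties that the rendered container exposes.

```javascript
// Hypothetical test helper: list every inline event-handler attribute (on*)
// found on the descendants of `root`. An empty array means none survived.
function findInlineHandlers(root) {
  const found = [];
  for (const element of root.querySelectorAll("*")) {
    for (const attribute of Array.from(element.attributes)) {
      if (attribute.name.toLowerCase().startsWith("on")) {
        found.push(`${element.tagName.toLowerCase()}[${attribute.name}]`);
      }
    }
  }
  return found;
}
```

Inside the test it would read expect(findInlineHandlers(container)).toEqual([]), which fails with a list of the offending attributes if a handler ever slips through.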

Future

There are some promising developments in this space as well. There’s a proposal to add a Sanitizer API to the browser. Potentially, it will let us drop the third-party libraries in the future and do everything natively!

There’s a great blog post about this new API that I suggest reading. I’ll definitely come back to try this API out once browsers fully support it!
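
Until then, feature detection is one way to prepare. The sketch below assumes the API ships roughly as currently proposed, with a setHTML() method on elements that sanitizes the string it is given. setHtmlSafely and fallbackSanitize are hypothetical names, and the fallback would be something like (html) => DOMPurify.sanitize(html). The proposal has changed shape before, so check current browser documentation before relying on it.

```javascript
// Prefer the native Sanitizer API when the element supports it, otherwise
// fall back to a library sanitizer supplied by the caller.
function setHtmlSafely(element, html, fallbackSanitize) {
  if (typeof element.setHTML === "function") {
    // Assumption: setHTML() parses the string and removes unsafe content itself.
    element.setHTML(html);
    return "native";
  }
  // No native support: sanitize first, then assign.
  element.innerHTML = fallbackSanitize(html);
  return "fallback";
}
```

The return value only reports which path was taken, which is handy for logging or tests.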

Bonus: HTML to text in Java

The HTML-in-the-database decision led to a related challenge: when exporting, I needed to convert the HTML into plain text. Here’s how I did it in Java:

public static String issueFixHtmlToText(String html) {
  if (!StringUtils.hasText(html)) {
    return "";
  }

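  // Swap the keys and indentation entities embedded in the stored HTML for readable labels.
  String replaced = html.replace("summary:", "")
      .replace("issue.fix.any", "Fix Any:")
      .replace("issue.fix.all", "Fix All:")
      .replace("&nbsp;&nbsp;&nbsp;&nbsp;issue.relatedNodes:", "Related nodes:")
      .replace("&nbsp;&nbsp;&nbsp;&nbsp;", " - ");

  // Let jsoup strip the tags, then collapse runs of newlines into one.
  return Jsoup.parse(replaced).wholeText().replaceAll("\\n+", "\n").trim();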
}

To do that, I used one more useful library: jsoup. It can also clean up HTML (its Jsoup.clean method sanitizes markup against a Safelist), and it can manipulate and transform HTML from Java code.

All in all, it’s a great tool to have in your Java toolbox.

Conclusion

This was a good reminder for me that even well-understood vulnerabilities like XSS can sneak into production if the architecture allows it. These issues rarely look dangerous at first glance, but they become dangerous when the right pieces line up.

Hopefully this post helps you spot similar pitfalls in your own systems, and gives you a clear path to fix them when they appear.
