DOM XSS Explained
Cross Site scripting (XSS) has been a problem for well over a decade now, XSS just like other well known security issues such as SQL, XPATH, LDAP Injection etc fells inside the category of input validation attacks. An xss vulnerability occurs when an input taken from the user is not filtered/santized before it's returned back to the user. The XSS can be divided into the following three categories:
1) Reflected XSS
2) Stored XSS
3) DOM Based XSS
As time has passed by the security community has came up with lots of solutions to mitigate XSS attacks such as encoding libraries, Web Application filters based upon blacklists, Browser based XSS filters, Content security policy etc, however all of them have had some limitation either not being able to mitigate xss attacks in a certain context. For example - We have a well known PHP based protection library called HTML purifier, it offers great protection against XSS attacks in lots of different context, however currently it has no support for HTML 5, talking about content security policy which dubbed as the best solution for mitigating xss attacks has no support for inline JS as well as no support for mobile browsers.
However, traditional XSS has been mitigated to some extent where the input is sent to the server and when is returned without any sanitation. However what happens in case where the user input is never sent to the server side and get's executed on the client side?. In that case all the server side defenses would fail as the payload never arrives the server. This is what is referred as DOM based XSS when the payload is never sent to the server and is executed on the client side, in order to understand DOM based XSS better, we would need to understand, what's DOM.
What is DOM?
DOM stands for document object model and all it is a javascript's way of accessing page. Each and every HTML element has a correspoding entity inside of DOM.What is DOM Based XSS?
A DOM based XSS vulnerability occurs when a source get's executed as a sink without any sanitization. A source is commonly refered as any thing that takes input, which apparentely in javascript is "Everything taken from a URL". Common sources inside javascript are document.url, location.hash, location.href, document.referer. A more detailed list is available at the DOM based XSS Wiki.A sink is refered to anything that creates HTML in an insecure way. There are wide variety of different sinks depending upon the javascript libraray you use. For example: In javascript document.write, innerHTML, eval are the most commonly found sinks, where as in jquery we have .html(), .appendto() etc. We can find a list of commonly used sinks at DOM based XSS wiki.
How to find DOM Based XSS?
There are couple of different approaches to detecting DOM based XSS such as black box fuzzing, static analysis and Dynamic analysis. We will discuss all three of them.Black Box Fuzzing
In black box fuzzing approach, we try injecting different XSS vectors inside different javascript sources and hoping for javascript to be executed. A tool called Ra-2 is an example of blackbox fuzzing approach towards DOM based XSS. The Fundamental problem with this approach is that it's not context aware and contains several false negatives.
Static Analysis
Another approach is to detecting DOM based XSS is by performing a static source code analysis, where by we try finding out the sources/sinks and we trace if a source is being executed as a sink.DOM based XSS wiki contains a list of regular expressions which would help you find out the sources/sinks:
The following regular expressions would help you determine all the sources and sinks in a javascript file:
Finding Sources:
/(location\s*[\[.])|([.\[]\s*["']?\s*(arguments|dialogArguments|innerHTML|write(ln)?|open(Dialog)?|showModalDialog|
cookie|URL|documentURI|baseURI|referrer|name|opener|parent|top|content|self|frames)\W)|(localStorage|sessionStorage|
Database)/
Finding Javascript Sinks:
/((src|href|data|location|code|value|action)\s*["'\]]*\s*\+?\s*=)|((replace|assign|navigate|getResponseHeader|open
(Dialog)?|showModalDialog|eval|evaluate|execCommand|execScript|setTimeout|setInterval)\s*["'\]]*\s*\()/
Finding Jquery based sinks
/after\(|\.append\(|\.before\(|\.html\(|\.prepend\(|\.replaceWith\(|\.wrap\(|\.wrapAll\(|\$\(|\.globalEval\(|\.add\(|
jQUery\(|\$\(|\.parseHTML\(/
JSprime and IBM appscan apply this particular approach to detecting DOM based XSS. The fundamental problem with this approach is that javascript code may be compressed, packed or Obfuscated, in that scenario Static analysis won't help us. Also in case where the code is encoded and is decsoded and executed at runtime a static code analyzer won't be able to detect it. An example of this would be the following statement:
eval('var a'+'=loc'+'ation'+'.hash');
The strings would are spilitted and would be joined together in memory at runtime and would be executed by using the eval function.
Dynamic Analysis
As mentioned above, a static code analyzer won't be able to detect obfuscated and code which would execute at the run-time. In that case, we have another approach called as Dynamic taint tracking. The taint flag is added to the source and it's tracked until it is executed via a source at run-time. Dominator is currently the only tool in the market that utilizes this methodology, however there are several places where dominator is not effective1) Human interaction is necessary for the analysis to be performed.
2) If a human misses to test a feature dominator would also miss.
3) A better dynamic taint tracking is required.
Let's now talk about few examples:
Example:
<HTML>
<TITLE>Welcome!</TITLE>
Hi
<SCRIPT>
var pos=document.URL.indexOf("name=")+5;
document.write(document.URL.substring(pos,document.URL.length));
</SCRIPT>
<BR>
Welcome to our system
…
</HTML>
The following code is taken from Amiet klien's paper on "DOM based XSS", In the above script we see a javascript source document.url that takes input from a url, The indexof property searches for the name parameter inside the url and is saved into the "pos" variable, later the user input stored inside the pos variable is being directly written to the DOM without any sanitization.
By inserting the following payload, javascript would be executed and hence would finalize the DOM based XSS attack:
http://www.target.com/page?name=<script>alert(1);</script>
In the above case one might argue that the payload would be sent to the server in initial requests, but in the following scenario where we specify a hash before the name variable, the payload would not be sent to the server.
http://www.target.com/page?#name=<script>alert(1);</script>
Let's take a look at another example:
<html>
<body>
<h2>POC for settimeout DOMXSS</h2>
<body>
<script>
var i=location.hash.split('#')[1];
(function () {
setTimeout(i,3000);
})();
</script>
</body>
</html>
In this case, the source is location.hash, the split function is used which would split up everything after hash, the user input is being stored under variable "i" which is later executed via setTimeout() function which is a known javascript sink.
The following would trigger an alert after three seconds:
http://www.target.com/page.html#alert(1)
Example 3
<script>
var redir = location.hash.split("#")[1];
x = document.getElementById('anchor');
x.setAttribute('href',redir);
</script>
The following is another very simple example of DOM based XSS, the input is taken from location.hash and is stored inside redir variable. Next, we use getElementByID method to search for anchor element with id "anchor". next the value of redir is assigned to the href attribute via the setAttribute API. The sink in this case is the href. You may try replacing href with src and it would still result in the javascript being executed.
Cross Browser Detection
Some browsers encode special characters when taken from a particular before displaying it back to the DOM and due to this reason a DOM based XSS vulnerability might trigger in one browser, but does not trigger in another browser. For example: Firefox encodes every thing sent after ?, where as internet explorer doesn't. A list of these characters is already compiled up on DOM based XSS wiki. Consider, Spending some time reviewing it.Defending Against DOM Based XSS
The following steps shall be taken to mitigate DOM based XSS attacks1) Unsafe sinks shall not be used. They shall be replaced with safe methods. For example: innerHTML is classified as a dangerous sink, however innertext could be used in place of innerHTML, which is a safe DOM method. Similarly in jquery, .html() (equivalent to innerHTMLof javascript) is classified as a dangerous method, alternatively we have .text() method to safely write things to the DOM.
2) As there are several different javascript libraries, community should make effort in classifying all the dangerous sinks in that particular libraray and introduce safe methods