What if HTMLEditFormat() doesn’t cut it?
You know, of course, that you need to HTMLEditFormat() any user input you intend to display somewhere on your page if you don't want to race down the road to XSS hell. To save processing resources, the best time to do this is before the data goes to your persistence layer (be it a physical file or, most likely, a database), so the escaping happens once on the way in rather than on every page view.
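For reference, here is a minimal sketch of that approach. The variable names are made up for illustration. The escaped string renders as harmless text, but note that legitimate markup such as <b> is escaped just the same:

```cfml
<cfscript>
    // hypothetical user input, e.g. posted from a comment form
    variables.sUserInput = '<script>alert(1)</script> <b>bold</b>';
    // escape before storing: <, >, & and " are turned into HTML entities
    variables.sSafeText = HTMLEditFormat(variables.sUserInput);
</cfscript>
<!--- the browser shows the tags as literal text instead of running or rendering them --->
<cfoutput>#variables.sSafeText#</cfoutput>
```

That escaped <b> tag is the "collateral damage" this post is about.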
If all you want is to let your users store plain old text, maybe seasoned with some kind of BB-code markup for limited text formatting, this method works just fine. If that is not enough and you actually need to allow a limited amount of good old HTML, you'll need a more sophisticated sanitizing mechanism that strips out potentially harmful code elements such as script blocks, JavaScript event handlers and the like.
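If you go the BB-code route, one common pattern is to escape everything first and only then turn a handful of whitelisted BB tags back into HTML, so that no user-supplied tag survives. Here is a minimal sketch. bbCodeToHtml() is a hypothetical helper that only understands [b] and [i]:

```cfml
<cfscript>
    // Hypothetical helper: escape all HTML first, then convert a small
    // whitelist of BB-code tags into real HTML tags.
    function bbCodeToHtml(sInput) {
        var sOut = HTMLEditFormat(arguments.sInput);
        // non-greedy (.*?) so "[b]a[/b] x [b]c[/b]" becomes two separate tags
        sOut = REReplaceNoCase(sOut, "\[b\](.*?)\[/b\]", "<strong>\1</strong>", "all");
        sOut = REReplaceNoCase(sOut, "\[i\](.*?)\[/i\]", "<em>\1</em>", "all");
        return sOut;
    }
</cfscript>
```

Because HTMLEditFormat() runs first, raw tags such as <i> remain escaped, and only the [b] and [i] pairs become markup.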
Users' HTML input can be harmful in more than one way. If it is not well-formed, it may mess up the rest of your page; see my earlier post on this issue, "Coldfusion UDF-wrapper for JTidy to clean up HTML". In that post I promised to also write about the other evil lurking in user input, and how to fight it when HTMLEditFormat() does too much collateral damage in your use case.
A little while ago, a guy named Samy Kamkar rocked the blogosphere with his spectacular yet simple XSS worm, which gathered myriads of "friends" for him, because Samy was their hero. In honour of Samy, the Open Web Application Security Project (OWASP) created AntiSamy, an extremely useful API for fighting Samy copycats, with implementations for .NET and - hooray! - Java, and generously released everything under a BSD license.
AntiSamy is not just a simple RegEx replacer that silently drops a fixed set of potentially malicious code elements. You can define your own ruleset in a policy file, or several rulesets (i.e. policy files) for that matter. Maybe you've got a group of privileged users whose input should be treated less strictly than that of your average Joe WannabeH4XX0r. And AntiSamy won't just do the sanitizing. It also lets you tell users what has been rejected and why, which matters because there's quite some potential for frustration if someone designs some beautifully CSSed code and your policy chucks out all the good stuff.
Speaking of good stuff, let's get onto it.
First you'll need to head to the AntiSamy download page on Google Code. Grab the latest antisamy-bin.jar (antisamy-bin.1.3.jar at the time of this post) and drop it in your ColdFusion class path (e.g. /opt/coldfusion/lib in a standalone-server install). Restart ColdFusion.
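To check that ColdFusion actually picks up the jar after the restart, a quick test page along these lines may help. isAntiSamyAvailable() is a hypothetical helper. The class name is the same one sanitizer.cfc instantiates:

```cfml
<cfscript>
    // Hypothetical helper: returns true if the AntiSamy class can be loaded
    // from the ColdFusion class path, false otherwise.
    function isAntiSamyAvailable() {
        try {
            // createObject("java", ...) loads the class without calling a constructor
            createObject("java", "org.owasp.validator.html.AntiSamy");
            return true;
        } catch (any e) {
            return false;
        }
    }
</cfscript>
<cfoutput>AntiSamy available: #isAntiSamyAvailable()#</cfoutput>
```

If this prints NO, double-check the jar location and make sure ColdFusion was really restarted.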
On the download page you'll also find a couple of pre-made policy files. The slashdot policy file is the most restrictive one, followed by the ebay and myspace files. The anythinggoes file is too permissive for any serious application, so you may as well ignore it. Put the policy files in a new AntiSamy policy folder somewhere on your server. The files need not be web-accessible, so something like /var/local/lib/AntiSamyPolicies/ will do nicely. sanitizer.cfc expects that folder in variables.policyFilePath, so adjust the path there if you choose another one. You can of course build your own policy file based on one of the existing examples and drop it in the same folder. Just follow the antisamy-<name>-<version>.xml naming pattern, because that is the file name sanitizer.cfc looks for.
Now to use this from ColdFusion, you may grab my sanitizer.cfc:
<cfcomponent displayname="sanitizer" hint="sanitize user generated HTML from malicious elements">
    <!--- call init() automatically when the CFC is instantiated --->
    <cfset init()>

    <cffunction name="init" returntype="sanitizer" access="public" output="false">
        <cfargument name="policy" default="myspace" type="string" required="no">
        <cfscript>
            variables.policyVersion = '1.1.1';
            variables.listPolicyDefaults = 'anythinggoes,ebay,slashdot,myspace';
            variables.defaultPolicy = 'myspace';
            // folder holding the policy files (need not be web-accessible)
            variables.policyFilePath = '/var/local/lib/AntiSamyPolicies/';
            variables.policyFile = variables.policyFilePath & setPolicy(arguments.policy);
            // messages from the most recent sanitize() call
            this.arrayErrorMessages = ArrayNew(1);
            this.sanitizer = createObject("java", "org.owasp.validator.html.AntiSamy");
            return this;
        </cfscript>
    </cffunction>

    <!--- map a policy name to a policy file name, falling back to the default policy --->
    <cffunction name="setPolicy" returntype="string" access="private" output="false">
        <cfargument name="sPolicy" required="yes" type="string">
        <cfscript>
            var sFullPolicyPath = variables.policyFilePath & 'antisamy-';
            arguments.sPolicy = lCase(arguments.sPolicy);
            if (ListFind(variables.listPolicyDefaults, arguments.sPolicy, ',')) {
                return 'antisamy-' & arguments.sPolicy & '-' & variables.policyVersion & '.xml';
            } else {
                // in this case we might need to load a custom policy file
                sFullPolicyPath = sFullPolicyPath & arguments.sPolicy & '-' & variables.policyVersion & '.xml';
                if (FileExists(sFullPolicyPath)) {
                    return 'antisamy-' & arguments.sPolicy & '-' & variables.policyVersion & '.xml';
                } else {
                    return 'antisamy-' & variables.defaultPolicy & '-' & variables.policyVersion & '.xml';
                }
            }
        </cfscript>
    </cffunction>

    <!--- sanitize HTML against the current (or the given) policy and return the clean HTML --->
    <cffunction name="sanitize" access="public" output="false" returntype="string">
        <cfargument name="sTaintedHTML" required="yes" type="string">
        <cfargument name="sPolicy" default="" required="no" type="string">
        <cfscript>
            var objResults = 0;
            if (arguments.sPolicy neq '') {
                variables.policyFile = variables.policyFilePath & setPolicy(arguments.sPolicy);
            }
            // scan(html, policyFileName) returns AntiSamy's CleanResults object
            objResults = this.sanitizer.scan(arguments.sTaintedHTML, variables.policyFile);
            this.arrayErrorMessages = objResults.getErrorMessages();
            return objResults.getCleanHTML();
        </cfscript>
    </cffunction>
</cfcomponent>
To use this component, instantiate one sanitizer per required policy in the request scope of each request that needs it. The component path antisamy.sanitizer assumes that sanitizer.cfc lives in an antisamy folder below your web root:
<cfscript> request.objSanitizer=CreateObject('component','antisamy.sanitizer').init('slashdot'); </cfscript>
Now you can process your user's input:
<cfscript>
    variables.sSanitizedHTML = request.objSanitizer.sanitize(variables.sSomeTaintedHTML);
    if (arrayLen(request.objSanitizer.arrayErrorMessages) gt 0) {
        // do some error handling like displaying the errors to the user
    } // end if (arrayLen(request.objSanitizer.arrayErrorMessages) gt 0)
</cfscript>
You can now store and output the processed HTML and also show the user what the sanitizing changed in the original HTML. These messages are available in the request.objSanitizer.arrayErrorMessages array.
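One way to display those messages is a small helper that renders them as an HTML list. renderSanitizerMessages() is a hypothetical name. It assumes that arrayLen() and index access work on the Java list returned by getErrorMessages(), just as the usage code above assumes arrayLen() does. Each message is escaped as a precaution, because it may quote parts of the user's input. If your AntiSamy version already encodes its messages, you may see doubled entities and can drop that call:

```cfml
<cfscript>
    // Hypothetical helper: turns an array of sanitizer messages into a <ul> list.
    // Returns an empty string if there is nothing to report.
    function renderSanitizerMessages(aMessages) {
        var sHtml = "";
        var i = 0;
        if (arrayLen(arguments.aMessages) eq 0) {
            return "";
        }
        sHtml = "<ul>";
        for (i = 1; i lte arrayLen(arguments.aMessages); i = i + 1) {
            // escape each message before putting it into the page
            sHtml = sHtml & "<li>" & HTMLEditFormat(arguments.aMessages[i]) & "</li>";
        }
        return sHtml & "</ul>";
    }
</cfscript>
```

Inside a <cfoutput> block you would then call #renderSanitizerMessages(request.objSanitizer.arrayErrorMessages)#.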
I hope some of you might find this useful!
BTW - for those who need functionality like this in PHP, maybe HTML Purifier will suit your needs.
