devbox@COMPUTEC The Computec development blog

20 May 2010

ColdFusion UDF to generate SEO-friendly URL strings

This function is handy when you need to create an SEO-friendly URL from a headline that may contain special characters such as German umlauts or accented letters. Known special characters are transliterated to plain ASCII (ä becomes ae, é becomes e, ß becomes ss), and spaces are replaced with dashes, as recommended by Matt Cutts of Google. Any remaining non-ASCII character is replaced with an x, and whatever is still not a letter, digit or dash is simply dropped.

<cffunction name="parse4URL" output="false" access="public" returntype="string" hint="returns a URL safe string">
<cfargument name="string" default="" type="string">
<cfscript>
var returnString = arguments.string;
var InvalidChars = "à,ô,ď,ḟ,ë,š,ơ,ß,ă,ř,ț,ň,ā,ķ,ŝ,ỳ,ņ,ĺ,ħ,ṗ,ó,ú,ě,é,ç,ẁ,ċ,õ,ṡ,ø,ģ,ŧ,ș,ė,ĉ,ś,î,ű,ć,ę,ŵ,ṫ,ū,č,ö,è,ŷ,ą,ł,ų,ů,ş,ğ,ļ,ƒ,ž,ẃ,ḃ,å,ì,ï,ḋ,ť,ŗ,ä,í,ŕ,ê,ü,ò,ē,ñ,ń,ĥ,ĝ,đ,ĵ,ÿ,ũ,ŭ,ư,ţ,ý,ő,â,ľ,ẅ,ż,ī,ã,ġ,ṁ,ō,ĩ,ù,į,ź,á,û,þ,ð,æ,µ,ĕ,À,Ô,Ď,Ḟ,Ë,Š,Ơ,Ă,Ř,Ț,Ň,Ā,Ķ,Ŝ,Ỳ,Ņ,Ĺ,Ħ,Ṗ,Ó,Ú,Ě,É,Ç,Ẁ,Ċ,Õ,Ṡ,Ø,Ģ,Ŧ,Ș,Ė,Ĉ,Ś,Î,Ű,Ć,Ę,Ŵ,Ṫ,Ū,Č,Ö,È,Ŷ,Ą,Ł,Ų,Ů,Ş,Ğ,Ļ,Ƒ,Ž,Ẃ,Ḃ,Å,Ì,Ï,Ḋ,Ť,Ŗ,Ä,Í,Ŕ,Ê,Ü,Ò,Ē,Ñ,Ń,Ĥ,Ĝ,Đ,Ĵ,Ÿ,Ũ,Ŭ,Ư,Ţ,Ý,Ő,Â,Ľ,Ẅ,Ż,Ī,Ã,Ġ,Ṁ,Ō,Ĩ,Ù,Į,Ź,Á,Û,Þ,Ð,Æ,Μ,Ĕ";
var ValidChars 	 = "a,o,d,f,e,s,o,ss,a,r,t,n,a,k,s,y,n,l,h,p,o,u,e,e,c,w,c,o,s,o,g,t,s,e,c,s,i,u,c,e,w,t,u,c,oe,e,y,a,l,u,u,s,g,l,f,z,w,b,a,i,i,d,t,r,ae,i,r,e,ue,o,e,n,n,h,g,d,j,y,u,u,u,t,y,o,a,l,w,z,i,a,g,m,o,i,u,i,z,a,u,th,dh,ae,u,e,A,O,D,F,E,S,O,A,R,T,N,A,K,S,Y,N,L,H,P,O,U,E,E,C,W,C,O,S,O,G,T,S,E,C,S,I,U,C,E,W,T,U,C,Oe,E,Y,A,L,U,U,S,G,L,F,Z,W,B,A,I,I,D,T,R,Ae,I,R,E,Ue,O,E,N,N,H,G,D,J,Y,U,U,U,T,Y,O,A,L,W,Z,I,A,G,M,O,I,U,I,Z,A,U,TH,Dh,Ae,U,E";
 
// trim surrounding whitespace and strip carriage returns
returnString = Trim(returnString);
returnString = StripCR(returnString);
// replace known characters with the corresponding safe characters
returnString = ReplaceList(returnString,InvalidChars,ValidChars);
// replace any remaining non-ASCII characters (outside the x00-x7F range) with x's
returnString = returnString.ReplaceAll('[^\x00-\x7F]','x');
 
// replace runs of one or more commas with a single dash
returnString = returnString.ReplaceAll(',+', '-');
// other substitutions: percent signs, ampersands (HTML-encoded &amp; first, then the bare &), colons, commas and slashes
returnString = Replace(returnString, "%", "percent","ALL");
returnString = Replace(returnString, "&amp;", " and ","ALL");
returnString = Replace(returnString, "&", " and ","ALL");
returnString = returnString.ReplaceAll('[:,/]', '-');
// Replace one or more whitespace characters with a dash
returnString = returnString.ReplaceAll('[\s]+', '-');
// and everything except letters, digits, slashes and dashes simply has to go
returnString = returnString.ReplaceAll('[^A-Za-z0-9\/-]','');
// finally replace multiple dash characters with just one
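returnString = returnString.ReplaceAll('-+','-');
// strip a leading or trailing dash (e.g. from a headline starting with '&' or ending with ' ?')
returnString = returnString.ReplaceAll('^-|-$','');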
// we're done
return returnString;
</cfscript>
</cffunction>

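This should cover most of your needs when dealing with European languages written in the Latin alphabet; text in other scripts, such as Greek or Cyrillic, will mostly come out as runs of x's.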
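Because the UDF already hands its regular expressions to java.lang.String.replaceAll, the same pipeline is easy to try outside ColdFusion. The following is a minimal Java sketch, assuming a much smaller transliteration table than the UDF's lists (the `SeoSlug` class and the `TRANSLIT` map are hypothetical names made up for illustration). It is meant for checking what each step does to a given headline, not as a drop-in replacement. It also strips a leading or trailing dash at the end, which a headline such as `& more` would otherwise produce.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SeoSlug {

    // Hypothetical, heavily reduced stand-in for the UDF's InvalidChars/ValidChars
    // lists. Keys are written as Unicode escapes so the file compiles regardless
    // of the source encoding; insertion order is kept, like ReplaceList().
    private static final Map<String, String> TRANSLIT = new LinkedHashMap<>();
    static {
        TRANSLIT.put("\u00e4", "ae"); // a umlaut
        TRANSLIT.put("\u00f6", "oe"); // o umlaut
        TRANSLIT.put("\u00fc", "ue"); // u umlaut
        TRANSLIT.put("\u00df", "ss"); // sharp s
        TRANSLIT.put("\u00c4", "Ae"); // A umlaut
        TRANSLIT.put("\u00d6", "Oe"); // O umlaut
        TRANSLIT.put("\u00dc", "Ue"); // U umlaut
        TRANSLIT.put("\u00e9", "e");  // e acute
        TRANSLIT.put("\u00e8", "e");  // e grave
        TRANSLIT.put("\u00e0", "a");  // a grave
        TRANSLIT.put("\u00e7", "c");  // c cedilla
        TRANSLIT.put("\u00f1", "n");  // n tilde
    }

    public static String parse4URL(String input) {
        // Trim() + StripCR(): drop surrounding whitespace and carriage returns
        String s = input.trim().replace("\r", "");
        // ReplaceList(): substitute each known character, in table order
        for (Map.Entry<String, String> e : TRANSLIT.entrySet()) {
            s = s.replace(e.getKey(), e.getValue());
        }
        // any remaining non-ASCII character becomes an x
        s = s.replaceAll("[^\\x00-\\x7F]", "x");
        // runs of commas become a single dash
        s = s.replaceAll(",+", "-");
        // symbol substitutions; "&amp;" goes before the bare "&"
        s = s.replace("%", "percent");
        s = s.replace("&amp;", " and ");
        s = s.replace("&", " and ");
        // colons, commas and slashes become dashes
        s = s.replaceAll("[:,/]", "-");
        // runs of whitespace become a single dash
        s = s.replaceAll("\\s+", "-");
        // drop everything that is not a letter, digit, slash or dash
        s = s.replaceAll("[^A-Za-z0-9/-]", "");
        // collapse repeated dashes
        s = s.replaceAll("-+", "-");
        // extra clean-up in this sketch: no leading or trailing dash
        return s.replaceAll("^-|-$", "");
    }

    public static void main(String[] args) {
        // prints Ueber-den-Wolken-100percent-sicher
        System.out.println(parse4URL("\u00dcber den Wolken: 100% sicher"));
        // prints Creme-brxlee-and-Cafe (u circumflex is not in the reduced table)
        System.out.println(parse4URL("Cr\u00e8me br\u00fbl\u00e9e & Caf\u00e9"));
    }
}
```

With the reduced table, û has no entry, so it falls through to the x step; the UDF's much longer character lists exist precisely to avoid such x's.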

Comments (3)
  1. CV: <a href="viewfile.cfm/cid/#attributes.id#/application/#attributes.applicationid#/file/#ReplaceList(GetApplicant.cap_nomember_cv, 'ä,&auml,),,\’,'],%20,%20,%20,%20,%20 ‘ )#”>

    When cap_nomember_cv has the word tävlingar, this code doesn't find the ä. Why?

  2. I am sorry, but I cannot make very much of your example code; could you check formatting? What exactly are you trying to do? And what do you mean by your last sentence?

    If you just wish to replace a couple of characters/strings, you might try something like
    ReplaceList('tävlingar', 'ä,&auml,),\,]', '%20,%20,%20,%20,%20')

    But I'd suggest you take a look at URLEncodedFormat(); it might suit your needs far better.

  3. Works like a charm for German ;-) thx

