devbox@COMPUTEC The Computec development blog

4 Feb 2010

A little ReReplace CFConfusion (solved)

The following two snippets differ only in the iSomeVal string, which is appended to the backreference in the regex replacement string. In the first it starts with a letter ('fortytwo'); in the second it starts with a digit ('42'). The first example works fine, whereas the second is missing something quite essential:

<cfscript>
 variables.strSource = 'The answer is: Hmm.';
 variables.iSomeVal = 'fortytwo';
 variables.strTarget = ReReplace(variables.strSource,'^(.*:\s).*$','\1'&variables.iSomeVal,'ALL');
 writeOutput('~~' & variables.strTarget & '~~');
</cfscript>

Output: ~~The answer is: fortytwo~~

<cfscript>
 variables.strSource = 'The answer is: Hmm.';
 variables.iSomeVal = '42';
 variables.strTarget = ReReplace(variables.strSource,'^(.*:\s).*$','\1'&variables.iSomeVal,'ALL');
 writeOutput('~~' & variables.strTarget & '~~');
</cfscript>

Output: ~~The answer is:~~

WTF?

A few brain processor cycles later you realize what's going on: the replacement string now reads '\142', and there's no backreference with that number.

So we need a slightly ugly trick, because backslash-escaping won't help us here either. We'll use the \u operator, which normally uppercases the following character (in our case that character is a digit, so it does no harm):

<cfscript>
 variables.strSource = 'The answer is: Hmm.';
 variables.iSomeVal = '42';
 variables.strTarget = ReReplace(variables.strSource,'^(.*:\s).*$','\1\u'&variables.iSomeVal,'ALL');
 writeOutput('~~' & variables.strTarget & '~~');
</cfscript>

Output: ~~The answer is: 42~~

Now, in this case we're lucky. Any suggestions on what could be done if we don't know in advance whether the variable part after the first backreference begins with a letter or a digit? (With a letter, \u would uppercase it.) I don't want to hack around it with a separating space or something similar, which could of course be removed in a second pass, but that doesn't seem elegant... Feels like I'm missing something extremely obvious here...
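One way to sidestep the ambiguity, offered as a minimal sketch rather than a definitive answer, is to skip the backreference altogether: ask REFind for the subexpression positions and glue the pieces together by hand. The variable name stcMatch is made up for this illustration, and the sketch assumes (as in the examples above) that the pattern matches the whole source string.

```cfml
<cfscript>
 variables.strSource = 'The answer is: Hmm.';
 variables.iSomeVal = '42';
 // Fourth argument true: return a struct with pos/len arrays for the match and its subexpressions
 variables.stcMatch = REFind('^(.*:\s).*$',variables.strSource,1,true);
 if (variables.stcMatch.pos[1] GT 0) {
  // Index 1 is the whole match, index 2 is the first capture group
  variables.strTarget = Mid(variables.strSource,variables.stcMatch.pos[2],variables.stcMatch.len[2]) & variables.iSomeVal;
 } else {
  // No match: leave the source untouched, just like ReReplace would
  variables.strTarget = variables.strSource;
 }
 writeOutput('~~' & variables.strTarget & '~~');
</cfscript>
```

Because the value is concatenated as a plain string, it no longer matters whether it starts with a letter or a digit. The price is that this only covers a single match spanning the whole string, not a general replace of every occurrence.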

Update: Seems like I'm not missing anything ColdFusion-wise. There's actually a regex feature described on regular-expressions.info as '$10 through $99 treated as $1 through $9 (and a literal digit) if fewer than 10 groups'. That page gives no hint about the implementation in ColdFusion, but judging from what I've seen in my example code, I'd say it's fair to assume that CF does not deliver the desired result in this category. But all is not lost, as we're running on top of Java, a fact I can't celebrate often enough under such circumstances. For behold:

<cfscript>
 variables.objRegex = createObject('component','JavaRegExp');
 variables.strSource = 'The answer is: Hmm.';
 variables.iSomeVal = '42';
 variables.strTarget = variables.objRegex.regExpReplace('^(.*:\s).*$',variables.strSource,'$1'&variables.iSomeVal,true);
 writeOutput('~~' & variables.strTarget & '~~');
</cfscript>

Output: ~~The answer is: 42~~

Yay! This snippet doesn't use ReReplace but the Java RegEx component by massimocorner.com, which I mentioned in an earlier post, "UDF to strip certain chars, but leave UBB tags alone".
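If pulling in a third-party component isn't an option, the same behaviour should be reachable by talking to java.util.regex directly. The following is a sketch under that assumption, not the component's actual implementation; objPattern and strSafeVal are names made up for the example.

```cfml
<cfscript>
 variables.strSource = 'The answer is: Hmm.';
 variables.iSomeVal = '42';
 // Compile the pattern with the JDK's own regex engine
 variables.objPattern = createObject('java','java.util.regex.Pattern').compile('^(.*:\s).*$');
 // quoteReplacement() is a static helper that escapes any $ or \ in the inserted value
 variables.strSafeVal = createObject('java','java.util.regex.Matcher').quoteReplacement(variables.iSomeVal);
 // Java reads "$142" as group 1 followed by a literal "42", since there is no group 14
 variables.strTarget = variables.objPattern.matcher(variables.strSource).replaceAll('$1' & variables.strSafeVal);
 writeOutput('~~' & variables.strTarget & '~~');
</cfscript>
```

The quoteReplacement() call is not needed for '42' itself, but it keeps the replacement safe should the variable ever contain a dollar sign or a backslash.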

31 Aug 2009

Useful RegEx: Check if list contains only integers

Trivial, but useful... For 0 and positive integers only:

^[0-9]+(?:\,[0-9]+)*$

If you wish to include negative integers:

^(?:-?[0-9]+)(?:\,(?:-?[0-9]+))*$

So this might be a useful validity checker function:

<cffunction name="lstContainsIntegersOnly" output="no" returntype="boolean" access="private">
	<cfargument name="lstI" required="yes" type="string">
	<cfargument name="bNegativeAllowed" required="no" type="boolean" default="false">
	<cfscript>
		var bIntegersOnly = FALSE;
		if (arguments.bNegativeAllowed) {
			if (ReFind('^(?:-?[0-9]+)(?:\,(?:-?[0-9]+))*$',arguments.lstI)) { bIntegersOnly = TRUE; }
		} else {
			if (ReFind('^[0-9]+(?:\,[0-9]+)*$',arguments.lstI)) { bIntegersOnly = TRUE; }
		}
		return bIntegersOnly;
	</cfscript>
</cffunction>
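
A few calls to illustrate what the function accepts and rejects. This is only a usage sketch: it assumes the code runs where lstContainsIntegersOnly() is in scope (the function is declared private, so for instance inside the same component), and the variable names are made up.

```cfml
<cfscript>
 // true: only non-negative integers
 variables.bPlain = lstContainsIntegersOnly('1,22,333');
 // false: a minus sign is rejected unless bNegativeAllowed is set
 variables.bNegRejected = lstContainsIntegersOnly('1,-22,333');
 // true: negative integers explicitly allowed
 variables.bNegAllowed = lstContainsIntegersOnly('1,-22,333',true);
 // false: empty list elements do not match
 variables.bEmptyItem = lstContainsIntegersOnly('1,,3');
 // false: whitespace around the delimiters is not tolerated
 variables.bSpaced = lstContainsIntegersOnly('1, 2, 3');
</cfscript>
```

Note that an empty string is rejected as well, since both patterns require at least one digit.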

21 Jul 2009

UDF to strip certain chars, but leave UBB tags alone

We are developing a commenting system which is supposed to discourage comment spam by making comments more or less unreadable once they cross a certain threshold of negative ratings. We decided that we'd like to strip all vowels from the text while keeping the UBB-style tags inside the comment unchanged.
You'll find that this last bit makes the whole task a little more complicated than a simple regex replace. We'll need to use a negative lookbehind, then mark the characters we do not wish to strip, then remove any "unmarked" characters and finally remove our marker.
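
To give a rough feel for the problem, here is a simplified sketch. It is not the UDF from the post and it swaps the lookbehind-and-marker technique for a single negative lookahead via java.util.regex: a vowel is dropped unless the next bracket after it is a closing one, meaning it sits inside a tag. The sample comment, the spam.example URL and the variable names are made up, and the sketch assumes brackets only ever appear as part of well-formed tags.

```cfml
<cfscript>
 variables.strComment = 'Buy [b]cheap[/b] pills [url=http://spam.example]here[/url]';
 // Match a vowel that is NOT followed by "]" before any other bracket,
 // i.e. a vowel outside of a [tag]
 variables.objPattern = createObject('java','java.util.regex.Pattern').compile('[aeiouAEIOU](?![^\[\]]*\])');
 variables.strStripped = variables.objPattern.matcher(variables.strComment).replaceAll('');
 writeOutput(variables.strStripped);
</cfscript>
```

For the sample comment this leaves the tags intact and yields By [b]chp[/b] plls [url=http://spam.example]hr[/url]. A stray "]" in the running text would wrongly protect the vowels in front of it, which is one reason a more careful approach is worth the extra steps.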

12 Jun 2009

Regular Expression Optimization

Last night I came across an article about regular expression performance tuning. After reading it, and since we use regex extensively to parse article and community content, I decided to see whether we can do something to boost performance on that side. So, here we go.