devbox@COMPUTEC The Computec development blog

5May/102

Full-text search with ColdFusion using Sphinx

Let's do some searching!

Now we've got everything in place including our API abstracting component, we can do some searching:

<cfscript>
// search in forum (incl. delta)
variables.objForumSphinx = CreateObject("component",
	"sphinxboardsearch").init(strIndexes='forummain;forumdelta;');	
// search in board id's 10 and 2
variables.objForumSphinx.setBoardIdsToSearch("10,2");
 
// weighted sort: title matches are more relevant than text matches
variables.objForumSphinx.setFieldWeights(iWeightTitle=100,iWeightText=10);
 
// uncomment to return only messages by user with id 346544
//variables.objForumSphinx.setUserIdsToSearch("346544");
 
// uncomment to return only distinct threads, not individual messages
// variables.objForumSphinx.setThreadGrouping();
 
// uncomment to sort by created asc, i.e. oldest first
// (would override setFieldWeights)
// variables.objForumSphinx.setSort(strAttribute='created',
	strSortDirection='desc');
 
// set date filtering
variables.objForumSphinx.setDateFilter(ParseDateTime('2002-04-01'),
	ParseDateTime('2010-05-06 16:48:04'));
 
// search for World of Warcraft, return 10 results max,
//    offset at 0, set match mode to extended
variables.myResult = variables.objForumSphinx.querySphinx(
	strQuery='world of warcraft',
	iLimit=10,iOffset=0,strMatchMode='extended');
</cfscript>
<cfdump var="#variables.myResult#">


The result might look somewhat like this:

Sphinx search result dump

Dump of a Sphinx search result (click to enlarge)

You'll notice that although we've got some basic information concerning the result meta data, something quite important is missing: The actual full text, for example the headlines of the postings we've found. We've got the total number of matches and the number of returned matches in the return struct (so implementing pagination is trivial), we can even brag a little about the blinding fast query time in our output, but we cannot actually list the headlines to the matching posts in our result (let alone URLs for the links to those postings).

So under any but the most unusual circumstances you won't be able to avoid just one little trip to your original database where it all began, something like:

<cfset variables.lstMessageIds = 
	valueList(variables.myResult.qResults.message_id) />
<cfquery name="variables.getMessages" datasource="forumdb">
SELECT message_id, title 
	FROM board_message 
	WHERE id in (<cfqueryparam 
				cfsqltype="cf_sql_integer" 
				value="#variables.lstMessageIds#" 
				list="yes">)
</cfquery>

Of course you should add some logic to catch those cases where you get zero results, but you can see that this part has become quite simple.

I hope that this somewhat longer blog post has given you an appetizing overview of how you could benefit from Sphinx as a ColdFusion developer. This useful beast has certainly solved a couple of our most pressing performance problems in regards to searching through large quantities of text stored inside databases - and there are a couple of more exotic application scenarios imaginable - just skim through the forums to get a little more inspiration.

« »

Comments (2) Trackbacks (1)

Leave a comment

(required)