devbox@COMPUTEC The Computec development blog

5 May 2010

Full-text search with ColdFusion using Sphinx

Sphinx search component

The search component I wrote as an example of how to use the API does not use all of Sphinx's available features; there's still a lot to explore. Please download sphinxboardsearch.cfc to take a look at the code. I hope I have documented everything well enough for you to see what's going on.

For every search we want to perform, we first create an instance of sphinxboardsearch.cfc. I have hardcoded the configuration data (i.e. IP and port) in the component; if you have more complex needs, you may want to pass some sort of configuration object into the init() method instead. Currently the only argument init() accepts is the list of indexes to search.
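Here is a minimal sketch of the configuration-object alternative. It is written in Python rather than CFML and is purely illustrative. The class names `SphinxConfig` and `BoardSearch`, the index name `board_main`, and the defaults (`127.0.0.1`, port `9312`) are all assumptions. Replace them with your own values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SphinxConfig:
    """Hypothetical connection settings (assumed defaults, not from the post)."""
    host: str = "127.0.0.1"
    port: int = 9312


class BoardSearch:
    """Hypothetical stand-in for sphinxboardsearch.cfc: one instance per search."""

    def __init__(self, indexes: str, config: Optional[SphinxConfig] = None):
        # The index list stays the only required argument, as in the post;
        # the config object is optional and falls back to the hardcoded values.
        self.indexes = indexes
        self.config = config if config is not None else SphinxConfig()


# One object per search; pass a config only when the defaults don't fit.
default_search = BoardSearch("board_main")
remote_search = BoardSearch("board_main", SphinxConfig(host="10.0.0.5", port=9400))
```

Keeping the config optional means existing callers that pass only the index list continue to work unchanged.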

Since the search is an object, we set the search parameters as properties. We might want to

  • limit our search to certain board IDs (setBoardIdsToSearch()),
  • thread IDs (setThreadIdsToSearch()) or
  • user IDs (setUserIdsToSearch()),
  • fetch only one result per thread instead of all the matching messages (setThreadGrouping()),
  • limit our search to a certain date range, given either as ColdFusion date objects or as Unix timestamps (setDateFilter(), setDateFilterUnix()),
  • set different relevance weights for the title and text of messages (setFieldWeights()) or
  • specify sorting options, i.e. by date ascending/descending, relevance, board_id, user_id or thread_id. There is also an extended sorting mode, which I have not implemented here because we didn't have much use for it; you can roll your own, of course.
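The "search parameters as properties" idea can be made concrete with a minimal Python sketch. The class and method names only loosely mirror the component's setters and are hypothetical. The sketch shows one way a setDateFilter()-style setter could convert date objects to Unix timestamps and then delegate to the Unix variant. The use of timezone-aware datetimes is an assumption, made so the conversion doesn't depend on the server's local time zone.

```python
from datetime import datetime, timezone
from typing import List, Optional, Tuple


class SearchParams:
    """Hypothetical sketch: options are stored as properties until the query runs."""

    def __init__(self) -> None:
        self.board_ids: List[int] = []
        self.thread_grouping = False
        self.date_range: Optional[Tuple[int, int]] = None  # Unix timestamps

    def set_board_ids_to_search(self, ids) -> "SearchParams":
        self.board_ids = list(ids)
        return self

    def set_thread_grouping(self, enabled: bool = True) -> "SearchParams":
        self.thread_grouping = enabled
        return self

    def set_date_filter_unix(self, min_ts: int, max_ts: int) -> "SearchParams":
        if min_ts > max_ts:
            raise ValueError("min_ts must not be after max_ts")
        self.date_range = (int(min_ts), int(max_ts))
        return self

    def set_date_filter(self, start: datetime, end: datetime) -> "SearchParams":
        # Convert date objects to Unix timestamps, then reuse the Unix variant.
        return self.set_date_filter_unix(int(start.timestamp()), int(end.timestamp()))


params = (
    SearchParams()
    .set_board_ids_to_search([3, 7])
    .set_thread_grouping()
    .set_date_filter(datetime(2010, 5, 1, tzinfo=timezone.utc),
                     datetime(2010, 5, 2, tzinfo=timezone.utc))
)
```

Each setter returns `self` purely to keep this sketch short. The original component may not be written that way.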

You may notice that not all search parameters can be combined. For example, if you have set the sorting rules to anything other than relevance, setFieldWeights() won't have any effect.
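A check like the one below could help the component flag such ineffective combinations early. This Python sketch is hypothetical: the function name and the string sort modes are assumptions. It encodes only the one rule stated above.

```python
from typing import Dict, List


def ineffective_settings(sort_mode: str, field_weights: Dict[str, int]) -> List[str]:
    """Hypothetical check: report settings that will have no effect on the result."""
    warnings = []
    # Per the post: field weights only matter when sorting by relevance.
    if field_weights and sort_mode != "relevance":
        warnings.append(
            f"field weights {sorted(field_weights)} are ignored when sorting by {sort_mode!r}"
        )
    return warnings
```

Returning warnings rather than raising an error leaves the decision to the caller. The query still runs; the weights simply do nothing.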

Once you have set up the search specifics, you send the actual query off to Sphinx using the querySphinx() method, which returns a struct containing a query object. The method takes the query string, the limit, the offset and a match mode as arguments. The query string may be empty, in which case everything matches and only the filters and sorting options apply.
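An in-memory stand-in can illustrate how an empty query string and the limit/offset arguments behave. This is not the Sphinx API. The names `query_sphinx_stub()` and `page_to_offset()`, the `"text"` field and the `total`/`rows` keys are hypothetical, and matching here is a naive all-keywords check.

```python
from typing import Dict, List


def query_sphinx_stub(documents: List[Dict[str, str]], query: str = "",
                      limit: int = 20, offset: int = 0) -> Dict[str, object]:
    """Hypothetical in-memory stand-in that mirrors querySphinx()'s arguments."""
    terms = query.lower().split()
    # all() over an empty term list is True, so an empty query matches everything
    # and only filters/sorting (not modelled here) would narrow the result.
    matches = [d for d in documents
               if all(t in d["text"].lower().split() for t in terms)]
    # Like a result struct: total hit count plus the requested slice of rows.
    return {"total": len(matches), "rows": matches[offset:offset + limit]}


def page_to_offset(page: int, per_page: int) -> int:
    """Translate a 1-based page number into an offset for the given limit."""
    return (page - 1) * per_page


docs = [{"text": "Sphinx search"}, {"text": "ColdFusion search"}, {"text": "Java"}]
everything = query_sphinx_stub(docs, "", limit=2, offset=0)
second_hit = query_sphinx_stub(docs, "search", limit=1, offset=page_to_offset(2, 1))
```

Here the empty query matches all three documents but returns only two rows because of the limit. Page 2 of "search" with one hit per page returns the ColdFusion document.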

Sphinx offers a number of matching modes, all of which are fairly well covered by the docs. The most important are probably the simple modes all (all keywords must be found to match), phrase (keywords must appear exactly as entered to match) and any (finding any of the keywords results in a match), plus the full-featured extended matching mode. If you want to use setFieldWeights(), you'll likely be out of luck with the simple matching modes: the weighting you would expect is currently only supported in the extended mode. The extended mode also allows quite complex queries using Sphinx's internal query language.
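The following is a deliberately naive Python model of the matching rules of the three simple modes, as described above. It is not Sphinx's implementation. There is no stemming, no tokenizing beyond whitespace, no ranking and no extended syntax. The function name `matches()` is hypothetical.

```python
from typing import List


def _words(s: str) -> List[str]:
    return s.lower().split()


def matches(mode: str, query: str, text: str) -> bool:
    """Naive model of the 'all', 'any' and 'phrase' rules (not Sphinx's code)."""
    q, words = _words(query), _words(text)
    if mode == "all":      # every keyword must occur somewhere
        return all(w in words for w in q)
    if mode == "any":      # at least one keyword must occur
        return any(w in words for w in q)
    if mode == "phrase":   # keywords must occur consecutively, in order
        n = len(q)
        return any(words[i:i + n] == q for i in range(len(words) - n + 1))
    raise ValueError(f"unsupported mode: {mode!r}")


text = "Full-text search with Sphinx and ColdFusion"
```

With this text, "sphinx search" matches in all mode but not in phrase mode, because the two words appear in a different order. "search with sphinx" matches as a phrase.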

In the setFieldWeights() method you'll notice some odd logic that calls either the Java method setFieldWeights() or setFieldeights(). This was necessary because of a bug in versions prior to 0.9.9-final: a simple typo in the Java library code left the method name one letter short. This was corrected in 0.9.9-final. To remain backwards compatible, I added a methodExists() method to my component, which lets me check whether a Java class implements a certain method without resorting to catching an error. See my previous blog post on the subject for the details.
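The same fallback can be sketched in Python, where `getattr()` plays the role that Java reflection plays in the ColdFusion component. This is an analog, not the component's code. `apply_field_weights()` and the two fake client classes are hypothetical, and the method names follow the spelling used in the post.

```python
from typing import Dict


def method_exists(obj: object, name: str) -> bool:
    """Python analog of the post's methodExists(): look the method up instead
    of calling it and catching the resulting error."""
    return callable(getattr(obj, name, None))


def apply_field_weights(client: object, weights: Dict[str, int]) -> str:
    """Call whichever spelling the client provides; returns the name used."""
    # Pre-0.9.9-final clients (per the post) lack the "W" in the method name.
    name = "setFieldWeights" if method_exists(client, "setFieldWeights") else "setFieldeights"
    getattr(client, name)(weights)
    return name


class OldClient:  # hypothetical fake of a pre-0.9.9-final client
    def setFieldeights(self, weights):
        self.weights = weights


class NewClient:  # hypothetical fake of a fixed client
    def setFieldWeights(self, weights):
        self.weights = weights
```

Checking for the correct spelling first means the workaround becomes inactive on fixed versions. The misspelled call is only reached on old clients.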

Next page: Let's do some searching!
