devbox@COMPUTEC The Computec development blog

5 May 2010

Full-text search with ColdFusion using Sphinx

Configuration (3/3)

Sphinx indexer

Please see the docs for details. You really need to tune these settings so that they match both the resources available and the amount of data in your indexes.

indexer
{
        mem_limit = 1024M
        max_xmlpipe2_field = 8M
        write_buffer = 12M
}
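To get a feel for whether mem_limit suits the machine, it can help to put the value next to the physical RAM. The following is only a minimal sketch and not part of Sphinx: the helper name size_to_bytes is made up for illustration, it only handles the K and M suffixes seen in the config above, and the /proc/meminfo lookup assumes Linux.

```shell
#!/bin/sh
# Hypothetical helper (not part of Sphinx): convert a size such as "1024M"
# or "512K" to bytes so it can be compared with the RAM in the machine.
size_to_bytes() {
    case "$1" in
        *[Mm]) echo $(( ${1%?} * 1024 * 1024 )) ;;
        *[Kk]) echo $(( ${1%?} * 1024 )) ;;
        *)     echo "$1" ;;
    esac
}

mem_limit=1024M   # value from the indexer section above

# /proc/meminfo is Linux-specific; MemTotal is reported in kB.
if [ -r /proc/meminfo ]; then
    total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    limit_kb=$(( $(size_to_bytes "$mem_limit") / 1024 ))
    echo "mem_limit is ${limit_kb} kB of ${total_kb} kB physical RAM"
fi
```

Keep in mind that searchd and the database server may be competing for the same memory while the indexer runs.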


For the indexer to actually do something, we'll have to add jobs to the crontab - we'll deal with this later.

Sphinx search daemon (searchd)

searchd
{
        listen = 9312
        log = /var/log/sphinx/searchd.log
        query_log = /var/log/sphinx/query.log
        read_timeout = 120
        max_children = 100
        pid_file = /var/run/sphinx/searchd.pid
        preopen_indexes = 1
        max_packet_size = 32M
        crash_log_path = /var/log/sphinx/crashlog
        read_buffer = 1M
}

# --eof--

listen sets the port and/or socket path on which your clients connect. In previous Sphinx versions the default port was 3312. With 0.9.9 this changed to 9312, which is now the official IANA-assigned port for the Sphinx API. The rest of the settings are well documented in both the online docs and the example config file sphinx.conf.dist, which you'll find in /etc/sphinx/ after installation.

Sphinx indexing job

Now that everything is in place, we should do a first indexing run:

su sphinx -c "/opt/sphinx/bin/indexer \
 --config /etc/sphinx/sphinx.conf forummain"

This should output something like

Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff

using config file '/etc/sphinx/sphinx.conf'...
indexing index 'forummain'...
collected 4043648 docs, 1963.5 MB
sorted 515.8 Mhits, 100.0% done
total 4043648 docs, 1963534582 bytes
total 1431.203 sec, 1371946 bytes/sec, 2825.34 docs/sec
total 21 reads, 3.403 sec, 124044.4 kb/call avg, 162.0 msec/call avg
total 508 writes, 26.715 sec, 12481.5 kb/call avg, 52.5 msec/call avg
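The summary lines report the elapsed time in seconds, which is awkward to read for long runs. As a rough sketch, the hypothetical helper below (indexer_minutes is an invented name) picks the overall timing line out of the indexer output and converts it to minutes. It assumes the "total N sec, N bytes/sec, ..." line keeps the format shown in the sample output above.

```shell
#!/bin/sh
# Hypothetical helper: read indexer output on stdin and print the elapsed
# wall time in minutes. Only the overall timing line is matched; the
# "reads" and "writes" lines have different words in fields 3 and 5.
indexer_minutes() {
    awk '$1 == "total" && $3 == "sec," && $5 == "bytes/sec," {
        printf "%.1f\n", $2 / 60
    }'
}

# Fed with the timing line from the sample run above:
echo "total 1431.203 sec, 1371946 bytes/sec, 2825.34 docs/sec" | indexer_minutes
```

In practice you would pipe the real indexer output into indexer_minutes instead of the echo.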

The run took about 24 minutes (1431 seconds) for roughly four million documents. Now let's build the delta index:

su sphinx -c "/opt/sphinx/bin/indexer  \
 --config /etc/sphinx/sphinx.conf forumdelta"

This will take no more than a few seconds:

Sphinx 0.9.9-release (r2117)
Copyright (c) 2001-2009, Andrew Aksyonoff

using config file '/etc/sphinx/sphinx.conf'...
indexing index 'forumdelta'...
collected 20 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 20 docs, 10456 bytes
total 0.223 sec, 46709 bytes/sec, 89.34 docs/sec
total 2 reads, 0.000 sec, 11.4 kb/call avg, 0.0 msec/call avg
total 8 writes, 0.000 sec, 13.4 kb/call avg, 0.0 msec/call avg

Now it is time to set up the Sphinx start script and start the search daemon:

chmod +x /etc/init.d/sphinx
update-rc.d sphinx defaults
/etc/init.d/sphinx start
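To confirm that the daemon actually came up, one option is to check the pid file configured in the searchd section. This is a minimal sketch, not something the init script provides: searchd_running is a made-up helper name, and kill -0 only succeeds if you are root or the user that owns the process.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the PID recorded in the given pid file
# belongs to a live process. The default path is the pid_file value from
# the searchd section above.
searchd_running() {
    pidfile=${1:-/var/run/sphinx/searchd.pid}
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    [ -n "$pid" ] || return 1
    # kill -0 sends no signal; it only checks that the process exists
    # and that we are allowed to signal it.
    kill -0 "$pid" 2>/dev/null
}

if searchd_running; then
    echo "searchd is running"
else
    echo "searchd is not running (or the pid file is not readable)"
fi
```

If the check fails, the searchd log configured above (/var/log/sphinx/searchd.log) is the first place to look.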

And finally we set up the jobs for the indexer in /etc/crontab:

# rebuild main and archive indexes from scratch once a day
10 1 * * * sphinx /opt/sphinx/bin/indexer --rotate --config /etc/sphinx/sphinx.conf forummain > /dev/null 2>&1
30 1 * * * sphinx /opt/sphinx/bin/indexer --rotate --config /etc/sphinx/sphinx.conf forumarchive > /dev/null 2>&1
# rebuild forumdelta index every five minutes from 00:00-00:55, 02:00-23:55,
# leaving a one hour gap from 01:00-01:55 during which the full index will be rebuilt
*/5 0,2-23 * * * sphinx /opt/sphinx/bin/indexer --rotate --config /etc/sphinx/sphinx.conf forumdelta > /dev/null

The additional --rotate argument tells the indexer to send SIGHUP to searchd, which causes the search daemon to reload the index from the newly created files into memory.
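A note on output redirection in crontab entries: cron normally mails whatever a job prints, so the order of the redirections decides what you still get to see. The shell processes them left to right, which means `2>&1 > /dev/null` and `> /dev/null 2>&1` behave differently. The sketch below demonstrates this with a stand-in function (emit is invented for illustration) instead of the real indexer.

```shell
#!/bin/sh
# Stand-in for any command that writes to both streams.
emit() {
    echo "to stdout"
    echo "to stderr" >&2
}

# stderr is first pointed at the current stdout (the capture), and only
# then is stdout sent to /dev/null, so the error message still gets through.
stderr_kept=$(emit 2>&1 > /dev/null)

# stdout goes to /dev/null first and stderr then follows it,
# so nothing gets through.
all_dropped=$(emit > /dev/null 2>&1)

echo "2>&1 > /dev/null  -> '$stderr_kept'"
echo "> /dev/null 2>&1  -> '$all_dropped'"
```

Use the second form if a job should be completely silent, and a plain `> /dev/null` if you still want error messages to reach cron.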

On the server side, we're all done. Of course you should still take care of the standard admin tasks, such as log rotation and a Nagios watchdog for /opt/sphinx/bin/searchd, but those won't be covered here.

Next page: The Sphinx search component
