###How to install Sphinx

##OpenCats Sphinx Integration

Latest version contributed by @zoomiest from the OpenCATS forums.

##Download Sphinx

Log in to the server as user root via SSH.

Download the Sphinx version 2.1.3 source tarball:

root@c1:~#cd /root

root@c1:~#wget http://sphinxsearch.com/files/sphinx-2.1.3-release.tar.gz

**Extract the contents of the tarball**

root@c1:~#tar xzvf sphinx-2.1.3-release.tar.gz
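Before running ./configure, it may help to confirm that the build tools are present. The sketch below assumes what a source build with MySQL support typically needs: a C++ compiler, make, and mysql_config from the MySQL development package. The helper name `missing_tools` is hypothetical, and the tool list is an assumption that may need adjusting for your distribution.

```shell
#!/bin/sh
# missing_tools TOOL...: print each tool that is not found on the PATH.
# Hypothetical helper, not part of Sphinx.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed prerequisites for building Sphinx 2.1.3 with MySQL support.
missing=$(missing_tools g++ make tar wget mysql_config)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Install these before running ./configure:" $missing
fi
```

If nothing is printed, all of the listed tools were found.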

Configure Sphinx Installer

root@c1:~#cd sphinx-2.1.3-release

root@c1:~#./configure --prefix=/opt

NOTE: The --prefix option above sets the directory where Sphinx will be installed, in this case "/opt".

Compile Sphinx Source Code

root@c1:~#make

Install Sphinx on the Server

root@c1:~#make install

Configure Environment Settings for Sphinx

root@c1:~#echo "export PATH=/opt/bin:$PATH" >> ~/.bashrc

root@c1:~#echo "export LD_LIBRARY_PATH=/opt/lib:$LD_LIBRARY_PATH" >> ~/.bashrc

root@c1:~#source ~/.bashrc

NOTE: These commands ensure that the Sphinx binaries in /opt/bin and the libraries in /opt/lib can be found by the shell.
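Running the two echo commands above more than once adds duplicate lines to ~/.bashrc, and the double quotes expand $PATH at the moment the command is typed. A minimal sketch of an alternative, assuming a hypothetical helper named `append_once`, appends each line only if it is missing and uses single quotes so that $PATH is expanded at login instead:

```shell
#!/bin/sh
# append_once FILE LINE: append LINE to FILE unless an identical line exists.
# Hypothetical helper for illustration.
append_once() {
    file=$1
    line=$2
    touch "$file"
    # -x matches whole lines only, -F treats the line as a fixed string.
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage, as root:
#   append_once ~/.bashrc 'export PATH=/opt/bin:$PATH'
#   append_once ~/.bashrc 'export LD_LIBRARY_PATH=/opt/lib:$LD_LIBRARY_PATH'
#   . ~/.bashrc
```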

##2. Sphinx Configuration for Open Cats##

###1. Sphinx Configuration Explained###

1. source catsdb

{

    type                    = mysql

    sql_host                = localhost

    sql_user                = cats

    sql_pass                = yourpasshere

    sql_db                  = cats

    sql_port                = 3306  # optional, default is 3306

    sql_query_pre           = \

            REPLACE INTO sph_counter SELECT 1, MAX(attachment_id) from attachment

    sql_query               = \

            SELECT attachment_id, title, attachment.site_id AS site_id, UNIX_TIMESTAMP(attachment.date_created) AS date_added, text \

    FROM attachment LEFT JOIN candidate ON data_item_id = candidate_id \

    WHERE resume = 1 AND data_item_type IN(100,500) AND text IS NOT NULL AND text != '' \

    AND attachment_id <= (SELECT max_doc_id FROM sph_counter WHERE counter_id = 1)

    sql_attr_uint           = site_id

    sql_attr_timestamp      = date_added

    sql_query_info          = SELECT * FROM attachment WHERE attachment_id=$id

}

NOTE: The parameters of interest in this section are:

sql_host – the hostname where the MySQL database is hosted. If it is on the same server, set it to localhost.

sql_user – username used to access the database

sql_pass – password used to access the database

sql_db – database name

sql_port – port used to access the database. The MySQL default is 3306.

2. source delta : catsdb

{

    sql_query_pre =

    sql_query = \

            SELECT attachment_id, title, attachment.site_id AS site_id, UNIX_TIMESTAMP(attachment.date_created) AS date_added, text \

    FROM attachment LEFT JOIN candidate ON data_item_id = candidate_id \

    WHERE resume = 1 AND data_item_type IN(100,500) AND text IS NOT NULL AND text != '' \

    AND attachment_id > (SELECT max_doc_id FROM sph_counter WHERE counter_id = 1)

}

NOTE: This source is used for computing the delta (the changes) in the indexes when reindexing the database. It should not be changed.
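Both sources above read from a table named sph_counter, which this page does not show being created. If your database does not already have it, a definition along the lines of the main+delta example in the Sphinx documentation could be used. This is a sketch: the column types are taken from that example rather than from OpenCATS, the function name `sph_counter_sql` is hypothetical, and the mysql invocation in the comment assumes the cats user and database from the source block.

```shell
#!/bin/sh
# sph_counter_sql: print the SQL that creates the counter table used by
# sql_query_pre and by the "attachment_id <= / >" conditions above.
sph_counter_sql() {
    cat <<'SQL'
CREATE TABLE IF NOT EXISTS sph_counter (
    counter_id INTEGER PRIMARY KEY NOT NULL,
    max_doc_id INTEGER NOT NULL
);
SQL
}

# Usage (assumes the mysql client is installed; prompts for the password):
#   sph_counter_sql | mysql -u cats -p cats
```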

3. index cats

{

    source                  = catsdb

    path                    = /opt/var/data/catsindex

    docinfo                 = extern

    min_word_len            = 1

    charset_type            = utf-8

}

NOTE: The parameter that might be of interest here is path. It indicates where the indexer application will generate the index file. Normally you do not need to edit this parameter unless you run out of disk space and need to point it to another mountpoint or disk with free space.

4. index catsdelta : cats

{

    source                  = delta

    path                    = /opt/var/data/catsdelta

}

NOTE: This is used as a reference file when reindexing the database. The parameter that might be of interest is path, which indicates where the delta (difference) of the index is stored. Normally you do not need to edit this parameter unless you run out of disk space in /opt and need to point it to a different disk or mountpoint.
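To judge whether the index paths need to move, you can check how much space is left on the filesystem that holds them. A small sketch, assuming a hypothetical helper named `free_kb` and the /opt/var/data directory from the index blocks above:

```shell
#!/bin/sh
# free_kb DIR: print the available space, in kilobytes, on the filesystem
# that contains DIR. Uses POSIX df output (-P) in 1024-byte units (-k).
free_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# /opt/var/data is the index directory from sphinx.conf; fall back to /
# if it does not exist yet.
dir=/opt/var/data
[ -d "$dir" ] || dir=/
echo "free KB on $dir: $(free_kb "$dir")"
```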

5. indexer

{

    mem_limit               = 32M

}

NOTE: This setting is the amount of memory the indexer application is allowed to use. The current setup is assumed to have only 512 MB of RAM, and the value was computed from the memory left on the server with MySQL and Apache running in the same virtual machine. If you add memory, you may want to raise this setting so that the indexer runs faster.

6. searchd

{

    listen                  = 9312

    listen                  = 9306:mysql41

    log                     = /opt/var/log/searchd.log

    query_log               = /opt/var/log/query.log

    read_timeout            = 5

    max_children            = 30

    pid_file                = /opt/var/log/searchd.pid

    max_matches             = 1000

    seamless_rotate         = 1

    preopen_indexes         = 1

    unlink_old              = 1

    workers                 = threads # for RT to work

    binlog_path             = /opt/var/data

}

NOTE: These are the searchd settings. The parameters of interest are:

listen – the port on which searchd listens for OpenCATS queries

log – logfile where searchd enters results and errors encountered

query_log – logfile for search queries done

binlog_path – the directory where binary logs are written.

These settings are normally not edited unless you run out of disk space in /opt and need to point them to a different disk or mountpoint.

Installing Sphinx Configuration for Open Cats

This document comes with a sphinx.conf file containing the default configuration detailed in section A of this chapter. To install that configuration file for Sphinx, upload the file to the server and execute the following command:

root@c1:~#cp sphinx.conf /opt/etc/sphinx.conf
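If /opt/etc/sphinx.conf already exists, the cp above overwrites it. A minimal sketch that keeps a timestamped copy of the old file first, assuming a hypothetical helper named `install_with_backup`:

```shell
#!/bin/sh
# install_with_backup SRC DEST: copy SRC to DEST, saving any existing DEST
# as DEST.bak.<timestamp> first. Hypothetical helper for illustration.
install_with_backup() {
    src=$1
    dest=$2
    if [ -f "$dest" ]; then
        cp -p "$dest" "$dest.bak.$(date +%Y%m%d%H%M%S)" || return 1
    fi
    cp "$src" "$dest"
}

# Usage, as root:
#   install_with_backup sphinx.conf /opt/etc/sphinx.conf
```

The same helper could be used for the Search.php and config.php copies later on this page.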

##Open Cats Configuration

**1. Open Cats Sphinx Configuration**

/* CATS can optionally use Sphinx to speed up document searching.

 * Install Sphinx and set ENABLE_SPHINX (below) to true to enable Sphinx.

*/

define('ENABLE_SPHINX', true);

define('SPHINX_API', '/var/www/cats/lib/sphinx_latest/sphinxapi.php');

define('SPHINX_HOST', 'localhost');

define('SPHINX_PORT', 9312);

define('SPHINX_INDEX', 'cats catsdelta');

NOTE: These are the Sphinx-related parameters in the config.php file in /var/www/cats/. SPHINX_INDEX and SPHINX_PORT should match the settings in sphinx.conf. If you are using the sphinx.conf file that came with this document, these settings should work as they are. The SPHINX_API parameter is the location of the PHP library that allows OpenCATS to talk to the Sphinx server. The actual sphinxapi.php file is included in the Sphinx tarball that was extracted earlier. A later section outlines how to set up sphinxapi.php.
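One way to confirm that SPHINX_PORT matches sphinx.conf is to extract both values and compare them. The sketch below assumes the file layouts shown on this page (a plain numeric `listen = 9312` line and a single-quoted `define('SPHINX_PORT', ...)` line); the function names are hypothetical.

```shell
#!/bin/sh
# sphinx_listen_port CONF: print the first purely numeric "listen" value.
# Lines such as "listen = 9306:mysql41" are skipped on purpose.
sphinx_listen_port() {
    sed -n 's/^[[:space:]]*listen[[:space:]]*=[[:space:]]*\([0-9][0-9]*\)[[:space:]]*$/\1/p' "$1" | head -n 1
}

# cats_sphinx_port CONFIG_PHP: print the number from define('SPHINX_PORT', N);
cats_sphinx_port() {
    sed -n "s/^[[:space:]]*define('SPHINX_PORT',[[:space:]]*\([0-9][0-9]*\));.*/\1/p" "$1" | head -n 1
}

# Usage:
#   [ "$(sphinx_listen_port /opt/etc/sphinx.conf)" = \
#     "$(cats_sphinx_port /var/www/cats/config.php)" ] && echo "ports match"
```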

2. Installing the Search.php file

This update to Searchd comes with a Search.php file that has been modified to work with OpenCATS. The original Search.php file had some "die" commands that caused the module to exit prematurely. Those lines are commented out in the version of Search.php that came with this document. To install it, upload the Search.php file to the server, log in as root, and execute the following commands:

root@c1:~#cd /root

root@c1:~#cp Search.php /var/www/cats/lib/Search.php

NOTE: Copy the new Search.php to /cats/lib/.

3. Installing the Sphinxapi.php file

The Sphinx tarball that was downloaded earlier contains the latest sphinxapi.php file compatible with the version of Sphinx we configured. It needs to be installed in the OpenCATS directory. To do that, execute the following commands as user root:

root@c1:~#cd sphinx-2.1.3-release/api

root@c1:~#mkdir /var/www/cats/lib/sphinx_latest

root@c1:~#cp sphinxapi.php /var/www/cats/lib/sphinx_latest/.

NOTE: The commands above create the directory /var/www/cats/lib/sphinx_latest and copy the latest sphinxapi.php that came with the Sphinx installer into that directory. This copy will now be used by the OpenCATS application.

4. Installing config.php file in opencats

This update comes with some changed sections for config.php, which are detailed separately. This document came with a config.php file that has already been preconfigured with the settings outlined in section A of this chapter. To install it, upload config.php to the server and execute the following command as user root:

root@c1:~#cp config.php /var/www/cats/.

NOTE: This command will install config.php in the cats directory.

4. Indexer and Cronjob

Indexer is the Sphinx application that creates an index file from the data in the MySQL database. It is supposed to be run every 30 minutes.

1. Running the indexer

root@c1:~#indexer --all

NOTE: This command normally takes about 1 minute with the current 4 GB of data. After it has run, you can proceed to start the searchd application.

2. Setting the Indexer to run in the cronjob every 30 minutes

In order to set a cronjob that will run the indexer every 30 minutes execute the following commands:

root@c1:~#crontab -e

NOTE: This command opens the crontab in vim. It normally contains default content like this:

	# Edit this file to introduce tasks to be run by cron.

	# 

	# Each task to run has to be defined through a single line

	# indicating with different fields when the task will be run

	# and what command to run for the task

	# 

	# To define the time you can provide concrete values for

	# minute (m), hour (h), day of month (dom), month (mon),

	# and day of week (dow) or use '*' in these fields (for 'any').

	# 

	# Notice that tasks will be started based on the cron's system

	# daemon's notion of time and timezones.

	# 

	# Output of the crontab jobs (including errors) is sent through

	# email to the user the crontab file belongs to (unless redirected).

	# 

	# For example, you can run a backup of all your user accounts

	# at 5 a.m every week with:

	# 0 5 * * 1 tar -zcf /var/backups/home.tgz /home/

	# 

	# For more information see the manual pages of crontab(5) and cron(8)

	# 

	# m h  dom mon dow   command

NOTE: To edit, go to the last line and press the i key. Then add the following entry:

30,59 * * * * /opt/bin/indexer --merge cats catsdelta --rotate > /tmp/indexer.log

NOTE: Then press Esc, type :wq, and press Enter. This saves the line you added. To verify that your changes took effect, run this command:

root@c1:~#crontab -l

NOTE: You should get a result similar to this

# Edit this file to introduce tasks to be run by cron.

#

# Each task to run has to be defined through a single line

# indicating with different fields when the task will be run

# and what command to run for the task

#

# To define the time you can provide concrete values for

# minute (m), hour (h), day of month (dom), month (mon),

# and day of week (dow) or use '*' in these fields (for 'any').

#

# Notice that tasks will be started based on the cron's system

# daemon's notion of time and timezones.

#

# Output of the crontab jobs (including errors) is sent through

# email to the user the crontab file belongs to (unless redirected).

#

# For example, you can run a backup of all your user accounts

# at 5 a.m every week with:

# 0 5 * * 1 tar -zcf /var/backups/home.tgz /home/

#

# For more information see the manual pages of crontab(5) and cron(8)

#

# m h dom mon dow command

30,59 * * * * /opt/bin/indexer --merge cats catsdelta --rotate > /tmp/indexer.log

The indexer now runs at the 30th and 59th minute of every hour.
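If an indexer run ever takes longer than the gap between two cron runs, two copies could overlap. A minimal sketch of a guard, assuming a hypothetical helper named `with_lock` and a hypothetical lock path, relies on mkdir being atomic to skip a run while the previous one is still working:

```shell
#!/bin/sh
# with_lock LOCK_DIR COMMAND...: run COMMAND only if LOCK_DIR can be created.
# If LOCK_DIR already exists, another run is assumed to be in progress and
# this run is skipped. Hypothetical helper for illustration.
with_lock() {
    lock_dir=$1
    shift
    if ! mkdir "$lock_dir" 2>/dev/null; then
        echo "lock $lock_dir is held, skipping this run" >&2
        return 0
    fi
    status=0
    "$@" || status=$?
    rmdir "$lock_dir"
    return "$status"
}

# Usage with the indexer command from the crontab entry above:
#   with_lock /tmp/cats-indexer.lock \
#       /opt/bin/indexer --merge cats catsdelta --rotate
```

If the script is killed before rmdir runs, the lock directory stays behind and has to be removed by hand.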

5. Searchd Startup and Shutdown

**1. Searchd Startup**

To run searchd, log in to the server as user root and run the following command:

root@c1:~# searchd -c /opt/etc/sphinx.conf

or, if your configuration file is at /etc/sphinxsearch/sphinx.conf:

root@c1:~# searchd -c /etc/sphinxsearch/sphinx.conf

NOTE: If you have followed the instructions in the previous chapters and have already run the indexer command in chapter 6, the searchd command stated here ought to work. To check that it is running, you can run this command:

root@c1:~#ps -ef | grep searchd

NOTE: You should get something like this after running the ps -ef | grep searchd command.

root 21323 1 0 Nov26 ? 00:00:00 searchd -c /opt/etc/sphinx.conf

root 21324 21323 0 Nov26 ? 00:01:40 searchd -c /opt/etc/sphinx.conf

root 26474 26427 0 07:58 pts/0 00:00:00 grep --color=auto searchd

NOTE: 21323 is the process ID of the searchd application. Whenever you successfully run the searchd application Linux assigns a process ID to it. You can use this process ID in the next section when you want to kill the searchd application.

2. Searchd Shutdown

In case you need to shut down the searchd application, you can run the ps -ef | grep searchd command from the previous section. Get the process ID and run this command to kill the searchd application:

root@c1:~#kill -9 21323

NOTE: Here 21323 is the process ID that was returned by the ps -ef | grep searchd command. This value changes every time you run searchd.
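As an alternative to reading the ps output by hand, the process ID can be taken from the pid_file configured in the searchd block (/opt/var/log/searchd.pid). The sketch below assumes that path and a hypothetical helper named `searchd_pid`. searchd also has a --stop option that asks the running daemon to shut down cleanly, which may be preferable to kill -9.

```shell
#!/bin/sh
# searchd_pid [PID_FILE]: print the PID stored in PID_FILE if that process
# is alive; return non-zero otherwise. Run it as root, as elsewhere on this
# page, so that the liveness check is permitted.
searchd_pid() {
    pid_file=${1:-/opt/var/log/searchd.pid}
    [ -r "$pid_file" ] || return 1
    pid=$(tr -d '[:space:]' < "$pid_file")
    case $pid in
        ''|*[!0-9]*) return 1 ;;   # empty or not a number
    esac
    kill -0 "$pid" 2>/dev/null || return 1   # signal 0 only tests existence
    printf '%s\n' "$pid"
}

# Usage:
#   searchd_pid && echo "searchd is running"
#   searchd -c /opt/etc/sphinx.conf --stop    # clean shutdown
```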
