# # Config file for mnoGoSearch and WWWOFFLE. # # This configuration file is used by mnoGoSearch with WWWOFFLE. # #### THE PROXY HOST IS "LOCALHOST" IN THE Proxy AND Server OPTIONS #### ########################################################################### # Use '#' to comment out lines. # All command names are case insensitive (DBAddr=DBADDR=dbaddr). # You may use '\' character to prolong current command to next line # when it is required. # # You may include enother configuration file in any place of the indexer.conf # using "Include " command. # Absolute path if starts with "/": #Include /usr/local/mnogosearch/etc/inc1.conf # Relative path else: #Include inc1.conf ########################################################################### ########################################################################### # Section 1. # Global parameters. ########################################################################### # DBAddr # Options (type, host, database name, port, user and password) # to connect to SQL database. # Do not matter for built-in text files support. # Should be used only once and before any other commands. # Command have global effect for whole config file. # Format: #DBAddr :[//[DBUser[:DBPass]@]DBHost[:DBPort]]/DBName/ # # ODBC notes: # Use DBName to specify ODBC data source name (DSN) # DBHost does not matter, use "localhost". # Solid notes: # Use DBHost to specify Solid server # DBName does not matter for Solid # # Currently supported DBType values are # mysql, pgsql, msql, solid, mssql, oracle, ibase. # Actually, it does not matter for native libraries support. # But ODBC users should specify one of supported values. # If your database type is not supported, you may use "unknown" instead. # If you are using PostgreSQL and do not specify hostname, # e.g. pgsql://user:password@/dbname/ # then PostgreSQL will not work via TCP, but will use Unix socket. DBAddr mysql://wwwoffle@localhost/mnogosearch/ ####################################################################### # DBMode single/multi/crc/crc-multi # Does not matter for built-in text files support # You may select SQL database mode of words storage. # When "single" is specified, all words are stored in the same # table. If "multi" is selected, words will be located in different # tables depending of their lengths. "multi" mode is usually faster # but requires more tables in database. # # If "crc" mode is selected, mnoGoSearch will store 32 bit integer # word IDs calculated by CRC32 algorythm instead of words. This # mode requres less disk space and it is faster comparing with "single" # and "multi" modes. "crc-multi" uses the same storage structure with # the "crc" mode, but also stores words in different tables depending on # words lengths like "multi" mode. # #Default DBMode value is "single": #DBMode single DBMode crc-multi ####################################################################### # VarDir /usr/local/mnogosearch/var # You may choose alternative working directory for built-in database # as well as for cache mode: # #VarDir /usr/local/mnogosearch/var ####################################################################### #SyslogFacility # This is used if indexer was compiled with syslog support and if you # don't like the default value. Argument is the same as used in syslog.conf # file. For list of possible facilities see syslog.conf(5) #SyslogFacility local7 ####################################################################### #LogdAddr host[:port] # Use cachelogd at given host and port if specified. # It is required for "cache mode" only. Default values are localhost # and port 7000 #LogdAddr localhost:7000 ####################################################################### # LocalCharset # Defines charset of local file system. It is required if you are using # 8 bit charsets and does not matter for 7 bit charsets. # This command should be used once and takes global effect for the config file. # Choose currently supported one: # # Western Europe: Germany #LocalCharset iso-8859-1 # # Central Europe: Czech #LocalCharset iso-8859-2 # # ISO Cyrillic #LocalCharset iso-8859-5 # # Unix Cyrillic #LocalCharset koi8-r # # MS Central Europe: Czech #LocalCharset windows-1250 # # MS DOS Cyrillic #LocalCharset cp866 # # MS Cyrillic #LocalCharset windows-1251 # # MS Arabic #LocalCharset windows-1256 # # Mac Cyrillic #LocalCharset x-mac-cyrillic # # ISO Greek #LocalCharset iso-8859-7 # # MS Greek #LocalCharset windows-1253 # # ISO Hebrew #LocalCharset iso-8859-8 # # MS Hebrew #LocalCharset windows-1255 # # ISO Baltic #LocalCharset iso-8859-4 #LocalCharset iso-8859-13 # # MS Baltic #LocalCharset windows-1257 # # ISO Turkish #LocalCharset iso-8859-9 # # MS Turkish #LocalCHarset windows-1254 ####################################################################### #ForceIISCharset1251 yes/no #This option is useful for users which deals with Cyrillic content and broken #(or misconfigured?) Microsoft IIS web servers, which tends to not report #charset correctly. This is really dirty hack, but if this option is turned on #it is assumed that all servers which reports as 'Microsoft' or 'IIS' have #content in Windows-1251 charset. #This command should be used only once in configuration file and takes global #effect. #Default: no #ForceIISCharset1251 no ########################################################################### # Ispell support commands. Detailed description is given in /doc/ispell.txt # Ispell commands MUST be given after LocalCharset definition. # Set ispell mode. Can be text (default) or db. If set to db then # Affix and Spell command should not be used. #IspellUsePrefixes yes/no # If enabled, indexer will use ispell prefixes, not only suffixes # Default: no #Ispellmode text # Load ispell affix file: #Affix # Load ispell dictionary file #Spell # File names are relative to mnoGoSearch /etc directory # Absolute paths can be also specified. # #Affix en en.aff #Spell en en.dict ########################################################################### #Phrase yes/no # Whether to index with phrase support. Default value is no. #Phrase no ########################################################################### #CrossWords yes/no # Whether to build CrossWords index # Default value is no #CrossWords no ########################################################################### # StopwordFile # Load stop words from the given text file. You may specify either absolute # file name or a name relative to mnoGoSearch /etc directory. You may use # several StopwordFile commands. # #StopwordFile stopwords.txt ########################################################################### # StopwordTable [...] # Load stop words from the given SQL table. You may use several # StopwordTable commands. This command has no effect work when compiled # without SQL database support. # StopwordTable stopword ####################################################################### # Word lengths. You may change default length range of words # stored in database. By default, words with the length in the # range from 1 to 32 are stored. Note that setting MaxWordLength more # than 32 will not work as expected. # #MinWordLength 1 #MaxWordLength 32 ####################################################################### # MaxDocSize bytes # Default value 1048576 (1 Mb) # Takes global effect for whole config file #MaxDocSize 1048576 ####################################################################### # HTTPHeader
# You may add your desired headers in indexer HTTP request # You should not use "If-Modified-Since","Accept-Charset" headers, # these headers are composed by indexer itself. # "User-Agent: mnoGoSearch/version" is sent too, but you may override it. # Command has global effect for all configuration file. # #HTTPHeader User-Agent: My_Own_Agent #HTTPHeader Accept-Language: ru, en #HTTPHeader From: webmaster@mysite.com ####################################################################### # ServerTable (SQL only, not supported with build-in database) # Load servers with all their parameters from the table "table_name". # Check an example of these tables structure in create/mysql/server.txt # You may use several arguments for this command: #ServerTable my_servers1 my_servers2 my_servers3 # or the only one argument: # #ServerTable server ####################################################################### #DeleteNoServer yes/no # Use it to choose whether delete or not those URLs which have no # correspondent "Server" commands. # Default value is "yes". #DeleteNoServer yes ########################################################################## # Section 2. # URL control configuration. ########################################################################## #Allow [Match|NoMatch] [NoCase|Case] [String|Regex] [ ... ] # Use this to allow URLs that match (doesn't match) given argument. # First three optional parameters describe the type of comparison. # Default values are Match, NoCase, String. # Use "NoCase" or "Case" values to choose case insensitive or case sensitive # comparison. # Use "Regex" to choose regular expression comparison. # Use "String" to choose string with wildcards comparison. # Widlcards are '*' for any number of characters and '?' for one character. # Note that '?' and '*' have special meaning in "String" match type. Please use # "Regex" to describe documents with '?' and '*' signs in URL. # "String" match is much faster than "Regex". Use "String" where it # is possible. # You may use several arguments for one 'Allow' command. # You may use this command any times. # Takes global effect for config file. # Note that mnoGoSearch automatically adds one "Allow regex .*" # command after reading config file. It means that allowed everything # that is not disallowed. # Examples # Allow everything: #Allow * # Allow everything but .php .cgi .pl extensions case insensitively using regex: #Allow NoMatch Regex \.php$|\.cgi$|\.pl$ # Allow .HTM extension case sensitively: #Allow Case *.HTM ########################################################################## #Disallow [Match|NoMatch] [NoCase|Case] [String|Regex] [ ... ] # Use this to disallow URLs that match (doesn't match) given argument. # The meaning of first three optional parameters is exactly the same # with "Allow" command. # You can use several arguments for one 'Disallow' command. # Takes global effect for config file. # # Examples: # Disalow URLs that are not in udm.net domains using "string" match: #Disallow NoMatch *.udm.net/* # Disallow any except known extensions and directory index using "regex" match: #Disallow NoMatch Regex \/$|\.htm$|\.html$|\.shtml$|\.phtml$|\.php$|\.txt$ # Exclude cgi-bin and non-parsed-headers using "string" match: #Disallow */cgi-bin/* *.cgi */nph-* # Exclude anything with '?' sign in URL. Note that '?' sign has a # special meaning in "string" match, so we have to use "regex" match here: #Disallow Regex \? # Exclude some known extensions using fast "String" match: Disallow *.b *.sh *.md5 *.rpm Disallow *.arj *.tar *.zip *.tgz *.gz *.z *.bz2 Disallow *.lha *.lzh *.rar *.zoo *.ha *.tar.Z Disallow *.gif *.jpg *.jpeg *.bmp *.tiff *.tif *.xpm *.xbm *.pcx Disallow *.vdo *.mpeg *.mpe *.mpg *.avi *.movie *.mov *.dat Disallow *.mid *.mp3 *.rm *.ram *.wav *.aiff *.ra Disallow *.vrml *.wrl *.png Disallow *.exe *.com *.cab *.dll *.bin *.class *.ex_ Disallow *.tex *.texi *.xls *.doc *.texinfo Disallow *.rtf *.pdf *.cdf *.ps Disallow *.ai *.eps *.ppt *.hqx Disallow *.cpt *.bms *.oda *.tcl Disallow *.o *.a *.la *.so Disallow *.pat *.pm *.m4 *.am *.css Disallow *.map *.aif *.sit *.sea Disallow *.m3u *.qt *.mov # Exclude Apache directory list in different sort order using "string" match: Disallow *D=A *D=D *M=A *M=D *N=A *N=D *S=A *S=D # More complicated case. RAR .r00-.r99, ARJ a00-a99 files # and unix shared libraries. We use "Regex" match type here: Disallow Regex \.r[0-9][0-9]$ \.a[0-9][0-9]$ \.so\.[0-9]$ ########################################################################## #CheckOnly [Match|NoMatch] [NoCase|Case] [String|Regex] [ ... ] # The meaning of first three optional parameters is exactly the same # with "Allow" command. # Indexer will use HEAD instead of GET HTTP method for URLs that # match/do not match given regular expressions. It means that the file # will be checked only for being existing and will not be downloaded. # Useful for zip,exe,arj and other binary files. # Note that you can disallow those files with commands given below. # You may use several arguments for one "CheckOnly" commands. # Useful for example for searching through the URL names rather than # the contents (a la FTP-search). # Takes global effect for config file. # # Check some known non-text extensions using "string" match: #CheckOnly *.b *.sh *.md5 #CheckOnly *.arj *.tar *.zip *.tgz *.gz #CheckOnly *.lha *.lzh *.rar *.zoo *.tar*.Z #CheckOnly *.gif *.jpg *.jpeg *.bmp *.tiff #CheckOnly *.vdo *.mpeg *.mpe *.mpg *.avi *.movie #CheckOnly *.mid *.mp3 *.rm *.ram *.wav *.aiff #CheckOnly *.vrml *.wrl *.png #CheckOnly *.exe *.cab *.dll *.bin *.class #CheckOnly *.tex *.texi *.xls *.doc *.texinfo #CheckOnly *.rtf *.pdf *.cdf *.ps #CheckOnly *.ai *.eps *.ppt *.hqx #CheckOnly *.cpt *.bms *.oda *.tcl #CheckOnly *.rpm *.m3u *.qt *.mov #CheckOnly *.map *.aif *.sit *.sea # # or check ANY except known text extensions using "regex" match: #CheckOnly NoMatch Regex \/$|\.html$|\.shtml$|\.phtml$|\.php$|\.txt$ ########################################################################## #HrefOnly [Match|NoMatch] [NoCase|Case] [String|Regex] [ ... ] # The meaning of first three optional parameters is exactly the same # with "Allow" command. # # Use this to scan a HTML page for "href" tags but not to index the contents # of the page with an URLs that match (doesn't match) given argument. # Commands have global effect for all configuration file. # # When indexing large mail list archives for example, the index and thread # index pages (like mail.10.html, thread.21.html, etc.) should be scanned # for links but shouldn't be indexed: # #HrefOnly */mail*.html */thread*.html # How to combine Allow, Disallow, CheckOnly, HrefOnly commands. # # indexer compares URLs against all these command arguments in the # order of their appearence in indexer.conf file. # If indexer find that URL matches some rule it will make a decision of what # to do with this URL, allow it, disallow it or use HEAD instead # of the GET method. So, you may use different Allow, Disallow, # CheckOnly, HrefOnly commands order. # If no one of these commands are given, mnoGoSearch will allow everything # by default. # # There are many possible combinations. Samples of two of them are here: # # Sample of first useful combination. # Disallow known non-text extensions (zip,wav etc), # then allow everything else. This sample is uncommented above (note that # there is actually no "Allow *" command, it is added automatically after # indexer.conf loading). # # Sample of second combination. # Allow some known text extensions (html, txt) and directory index ( / ), # then disallow everything else: # #Allow .html .txt */ #Disallow * ################################################################ # Section 3. # Mime types and external parsers. ################################################################ #UseRemoteContentType yes/no # This command specifies if the indexer should get content type # from http server headers (yes) or from it's AddType settings (no). # If set to 'no' and the indexer could not determine content-type # by using its AddType settings, then it will use http header. # Default: yes #UseRemoteContentType yes ################################################################ #AddType [String|Regex] [Case|NoCase] [...] # This command associates filename extensions (for services # that don't automatically include them) with their mime types. # Currently "file:" protocol uses these commands. # Use optional first two parameter to choose comparison type. # Default type is "String" "NoCase" (case insensitive string match with # '?' and '*' wildcards for one and several characters correspondently). # AddType text/plain *.txt *.pl *.js *.h *.c *.pm *.e AddType text/html *.html *.htm AddType image/x-xpixmap *.xpm AddType image/x-xbitmap *.xbm AddType image/gif *.gif # # You may also use quotes in mime type definition # for example to specify charset. e.g. Russian webmasters # often use *.htm extension for windows-1251 documents and # *.html for unix koi8-r documents: # #AddType "text/html; charset=koi8-r" *.html #AddType "text/html; charset=windows-1251" *.htm # # More complicated example for rar .r00-r.99 using "Regex" match: #AddType Regex application/rar \.r[0-9][0-9]$ # # Default unknown type for other extensions: AddType application/unknown *.* # Mime # # This is used to add support for parsing documents with mime types other # than text/plain and text/html. It can be done via external parser (which # must provide output in plain or html text) or just by substituting mime # type so indexer will understand it. # # and are standard mime types # is either text/plain or text/html # # Optional charset parameter used to change charset if needed. # Mime command understands case insensitive string match # with ? and * signs. You may use: # # Mime application/pdf* # # Command line may have $1 parameter which stands for temporary file name. # Some parsers can not operate on stdin, so indexer creates temporary file # for parser and it's name passed instead of $1. There are many ways to use parsers, # take a look into doc/parsers.txt for other parser types and parsers usage explanation. # Examples: # # # #Mime application/msword "text/plain; charset=cp1251" "catdoc $1" #Mime "application/pdf; charset=iso-8859-1" "text/plain" "pdftotext $1" #Mime application/x-troff-man text/plain "deroff" #Mime text/x-postscript text/plain "ps2ascii" #Mime application/vnd.ms-excel text/plain "xls2csv $1" #Mime "text/rtf*" text/html "rthc --use-stdout $1 2>/dev/null" ######################################################################### # Section 4. # Aliases configuration. ######################################################################### #Alias # You can use this command for example to organize search through # master site by indexing a mirror site. It is also usefull to # index your site from local file system. # mnoGoSearch will display URLs from while searching # but go to the while indexing. # This command has global indexer.conf file effect. # You may use several aliases in one indexer.conf. #Alias http://www.mysql.com/ http://mysql.udm.net/ #Alias http://www.site.com/ file:/usr/local/apache/htdocs/ ######################################################################### #AliasProg # AliasProg is an external program that can be called, that takes a URL, # and returns the appropriate alias to stdout. Use $1 to pass a URL. This # command has global effect for whole indexer.conf. # Example: #AliasProg "echo $1 | /usr/local/mysql/bin/replace http://localhost/ file:/home/httpd/" ####################################################################### # Section 5. # Servers configuration. ####################################################################### #Period