Setup Search Module
Lucene Search
Install the module with the package manager command: install-package BetterCms.Module.LuceneSearch
The module installs two workers. They run as asynchronous background processes and do not block the web application:
- Index source watcher: This worker scans the Better CMS "pages" table and adds new pages to the indexing queue.
- Indexing robot: This worker reads the list of pages in the indexing queue and crawls their URLs. New pages are crawled first, followed by failed pages and then already-crawled pages.
Use the following parameters in the search section of the cms.config file to configure the Lucene search module:
- LuceneWebSiteUrl: website URL, used as the prefix for the page URLs that are crawled
- LuceneFileSystemDirectory: directory where the Lucene index files are stored
- LucenePagesWatcherFrequency: time span that sets how often the worker looks for newly created pages. Set to 00:00:00 to disable the new pages watcher.
- LuceneIndexerPageFetchTimeout: page fetch timeout (how long the system waits for a page to respond). Default value: 00:01:00 (1 minute)
- LuceneIndexerFrequency: time span that sets how often the content indexer re-indexes page content. Set to 00:00:00 to disable the indexer.
- LuceneMaxPagesPerQuery: maximum number of pages re-indexed per query. Default value: 1000
- LucenePageExpireTimeout: expiration timeout for indexed pages.
- LuceneDisableStopWords: disables stop words such as ["a", "the", "of", ...] when indexing the content.
- LuceneSearchForPartOfWords: if set to true, matches inside words are also found (similar to LIKE '%query%' in SQL).
- LuceneIndexPrivatePages: if set to true, private pages are searched as well. This requires authorization (see LuceneAuthorizationUrl and LuceneAuthorizationForm).
- LuceneAuthorizationUrl: authorization URL to which user credentials are sent using the POST method. It may be the same URL as the login form (for example, /login/).
- LuceneAuthorizationForm: the authorization form's POST parameters and their values, one key per parameter, e.g. LuceneAuthorizationForm.UserName, LuceneAuthorizationForm.Password, LuceneAuthorizationForm.CustomField.
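Both worker frequencies accept 00:00:00 as an "off" value. As a sketch, a search section that switches off the new pages watcher and the content indexer (for example, while troubleshooting indexing) could look like the fragment below. All other Lucene keys are left out here for brevity. Whether this combination suits your deployment is an assumption, not a documented recommendation.

```xml
<search>
  <!-- Other Lucene keys omitted for brevity (assumption: they keep their configured values) -->
  <!-- 00:00:00 disables the worker that looks for newly created pages -->
  <add key="LucenePagesWatcherFrequency" value="00:00:00" />
  <!-- 00:00:00 disables the content indexer -->
  <add key="LuceneIndexerFrequency" value="00:00:00" />
</search>
```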
Example:
<search>
<add key="LuceneWebSiteUrl" value="http://bettercms.sandbox.mvc4.local/" />
<add key="LuceneFileSystemDirectory" value="../../../Lucene.BetterCms" />
<add key="LuceneIndexerFrequency" value="00:05:00" />
<add key="LuceneIndexerPageFetchTimeout" value="00:01:00" />
<add key="LucenePagesWatcherFrequency" value="00:05:00" />
<add key="LuceneMaxPagesPerQuery" value="1000" />
<add key="LucenePageExpireTimeout" value="00:00:00" />
<add key="LuceneDisableStopWords" value="true" />
<add key="LuceneSearchForPartOfWords" value="true" />
<add key="LuceneIndexPrivatePages" value="true" />
<add key="LuceneAuthorizationUrl" value="http://bettercms.sandbox.mvc4.local/login" />
<add key="LuceneAuthorizationForm.UserName" value="admin" />
<add key="LuceneAuthorizationForm.Password" value="admin" />
<add key="LuceneAuthorizationForm.RememberMe" value="true" />
</search>
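If private pages do not need to be searchable, the authorization settings can be dropped. The fragment below is a minimal sketch that indexes only public pages. The URL and directory are the sample values from the example above, not recommendations. It also assumes that keys left out fall back to their defaults, and that LuceneAuthorizationUrl and the LuceneAuthorizationForm.* keys are not needed once LuceneIndexPrivatePages is false.

```xml
<search>
  <add key="LuceneWebSiteUrl" value="http://bettercms.sandbox.mvc4.local/" />
  <add key="LuceneFileSystemDirectory" value="../../../Lucene.BetterCms" />
  <add key="LuceneIndexerFrequency" value="00:05:00" />
  <add key="LucenePagesWatcherFrequency" value="00:05:00" />
  <!-- Only public pages are indexed, so no credentials are configured -->
  <add key="LuceneIndexPrivatePages" value="false" />
  <!-- Assumption: authorization keys are unnecessary when private pages are not indexed -->
</search>
```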
Lucene worker output can be written to a separate log file. To do this, use the Lucene search module namespace, LuceneSearchModule, in the logging configuration.
The following NLog example writes the Lucene search module's messages to bettercms.search.log and all other messages to bettercms.log:
<nlog xmlns="http://www.nlog-project.org/schemas/NLog.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<targets>
[...]
<target name="log_file" xsi:type="File" fileName="${basedir}/logs/bettercms.log" archiveFileName="${basedir}/logs/error_log_${shortdate}_{#####}.log" layout="${longdate} ${message}${newline}${exception:format=message,tostring:maxInnerExceptionLevel=10:innerFormat=message,tostring}" concurrentWrites="true" archiveEvery="Day" archiveNumbering="Rolling" maxArchiveFiles="100" />
[...]
<target name="search_log_file" xsi:type="File" fileName="${basedir}/logs/bettercms.search.log" archiveFileName="${basedir}/logs/search_log_${shortdate}_{#####}.log" layout="${longdate} ${message}${newline}${exception:format=message,tostring:maxInnerExceptionLevel=10:innerFormat=message,tostring}" concurrentWrites="true" archiveEvery="Day" archiveNumbering="Rolling" maxArchiveFiles="100" />
[...]
</targets>
<rules>
<logger name="LuceneSearchModule" writeTo="search_log_file" minlevel="Trace" final="true" />
[...]
<logger name="*" writeTo="log_file" minlevel="Trace" maxlevel="Fatal" />
</rules>
</nlog>
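Because the LuceneSearchModule rule above is marked final="true", NLog stops evaluating later rules once it matches, so Lucene messages are written only to bettercms.search.log. If you also want them in the main log, one option is to drop the final attribute so that the catch-all rule matches them too. This is a sketch that reuses the target names defined in the example above. The elided [...] rules are assumed not to interfere.

```xml
<rules>
  <!-- Without final="true", NLog continues to the following rules after this one matches -->
  <logger name="LuceneSearchModule" writeTo="search_log_file" minlevel="Trace" />
  <!-- The catch-all rule therefore also writes Lucene messages to bettercms.log -->
  <logger name="*" writeTo="log_file" minlevel="Trace" maxlevel="Fatal" />
</rules>
```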
Google Site Search
Install the module with the package manager command: install-package BetterCms.Module.GoogleSiteSearch
To enable Google Site Search, you need a Google Site Search account (it can be registered here). It is a paid service; prices are available here.
Searches are performed with the following URL query: https://www.googleapis.com/customsearch/v1?key={0}&cx={1} (read more here). Its parameters are set in the search section of the cms.config file:
- GoogleSiteSearchApiKey: your Google API key (key in the URL).
- GoogleSiteSearchEngineKey: the search engine ID (cx in the URL).
Example:
<search>
<add key="GoogleSiteSearchApiKey" value="[BETTERCMS_GOOGLE_SEARCH_API_KEY]" />
<add key="GoogleSiteSearchEngineKey" value="[BETTERCMS_GOOGLE_SEARCH_ENGINE_KEY]" />
</search>
Installing either BetterCms.Module.GoogleSiteSearch or BetterCms.Module.LuceneSearch also installs the main search module, BetterCms.Module.Search, as a referenced module. It creates two widgets in the Search category: the Search input form widget and the Search results widget.
How to set up these widgets is described here.
To use the search module API, install the BetterCms.Module.Search.Api module with the package manager command: install-package BetterCms.Module.Search.Api