Solr Search Power for Joomla!: Solr4Joomla & SaaS4Joomla


Dec 4, 2013






















Documentation


Solr4Joomla

Components for integrating Apache Solr with Joomla






Table of contents

1. Overview
2. Installation
3. solR Search Component - com_solrsearch (Joomla Component)
    3.1 Indexing PDF Files
    3.2 Indexing the Joomla Database
    3.3 Indexing Blog Posts created with the Joomla Wordpress Plugin
    3.4 Solr Server Configuration
    3.5 Separator in CSV File
    3.6 Separator for meta data fields in “Keywords” (Articles)
    3.7 Defining SearchId and Metafield:Values
    3.8 Activating Access Level Checking
    3.9 Meta field for displaying as Headline for the standard Hit List
4. Hit List - mod_shi_hitlist (Joomla Module)
    4.1 General
    4.2 Configuration
    4.3 Visual Representation
    4.4 Displaying Documents from the Hit List
5. Facets - mod_shi_facets (Joomla Module)
    5.1 Configuration
    5.2 Visual Representation
6. Search input field - mod_shi_search (Joomla Module)
    6.1 Configuration
    6.2 Visual Representation
7. Search Dropdown - mod_shi_searchcombo (Joomla Module)
    7.1 Configuration
    7.2 Visual Representation
8. Date Search - mod_shi_searchdate (Joomla Module)
    8.1 Configuration
    8.2 Visual Representation
9. SolrSearchInitialiser - plg_shi_solrsearchinitialiser - (Joomla Plugin)
10. SearchId - Directly assign Search Requests to Menu Entries
11. Solr Configuration
    11.1 Joomla Component
    11.2 Solr Instance




1. Overview







2. Installation

If you are updating, please make sure that all plugins, components and modules of any prior version are removed manually before installing the new ones.

All extensions are installed in one step. During installation of the component “com_solrsearch”, the following extensions will additionally be installed:

- mod_shi_facets
- mod_shi_hitlist
- mod_shi_search
- mod_shi_searchcombo
- mod_shi_searchdate
- plg_shi_solrsearchinitialiser

The plugin “plg_shi_solrsearchinitialiser” will be activated automatically during installation.

With the current version, all extensions are uninstalled automatically when uninstalling “com_solrsearch”.

3. solR Search Component - com_solrsearch (Joomla Component)

3.1 Indexing PDF Files

The solR Search Component allows you to index PDF files. The solR Search UI can be opened in the Components menu of the Joomla backend.

The solR Search page shows an input field and a button to start the indexing process. Enter the root directory under which all PDF files are stored into the input field. Since only one root directory for PDF files can be specified, it is recommended to organize your PDF files accordingly. Please also note that the specified root directory has to be accessible as a URL within the web context of your Joomla site.


3.1.1 Using a CSV file to specify meta data

Most PDF files do not have proper meta data such as title or author. Solr4Joomla therefore supplies a mechanism to provide all necessary meta data externally through a CSV file. The CSV file has to be located in the specified root directory.

Please note that when using a CSV file, only the PDF files listed there will be indexed.

Within this CSV file, the first entry of each line must specify the name and relative path (relative to the root directory entered) of a PDF file. The subsequent entries within a line then define the meta data.


Example:

Specified root directory: /opt/pdf/archive

CSV file:

2002/01/FileToIndex1.pdf, year:2002, month:January
2003/01/FileToIndex2.pdf, year:2003, month:January



Field name and value of meta data are separated by a colon:

foo/bar/my.pdf, title:This is the Title


The separator between the individual meta fields can be configured in the Joomla backend (see section 3.5 Separator in CSV File).


Note: Solr automatically extracts meta data (e.g. author or title) from the indexed files. If you want to override the meta data that is stored in the indexed files, the meta field names have to be suffixed with “_prio”.

Example:

2002/01/FileToIndex.pdf, author_prio:John Smith, content_type_prio:PDF document
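
The line format described above can be checked before indexing with a few lines of script. The following Python sketch is not part of Solr4Joomla; the function name parse_meta_line is hypothetical, and splitting each entry at its first colon (so that values may themselves contain colons) is an assumption about how the component reads the file.

```python
def parse_meta_line(line, separator=","):
    """Split one CSV line into (relative_pdf_path, {field_name: value}).

    Hypothetical helper, not part of Solr4Joomla. The separator defaults
    to a comma; use the character configured in section 3.5 if it differs.
    """
    entries = [entry.strip() for entry in line.split(separator)]
    pdf_path, meta = entries[0], {}
    for entry in entries[1:]:
        if not entry:
            continue  # tolerate a trailing separator
        # Assumption: only the first colon separates name and value.
        name, _, value = entry.partition(":")
        meta[name.strip()] = value.strip()
    return pdf_path, meta


path, meta = parse_meta_line(
    "2002/01/FileToIndex.pdf, author_prio:John Smith, content_type_prio:PDF document"
)
print(path)
print(meta)
```

Note that with this simple format a value cannot contain the separator character itself, which is one reason to pick a separator that does not occur in your titles.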


3.2 Indexing the Joomla Database

Besides indexing PDF files, the solR Search Component can also index Joomla articles stored in the jos_content table of Joomla's database.

Only articles that have the published flag set will be indexed.


The following meta fields of an article will be indexed:

Article field        Indexed under meta key
Article ID           id
Title                title
Modified             last_modified
Author               author
Access Level         access
Section              section
Category             category
Start Publishing     publish_up
Finish Publishing    publish_down
Article Text         text
Keywords             keywords
Archived             archive


3.3 Indexing Blog Posts created with the Joomla Wordpress Plugin

Blog posts created with the Wordpress plugin for Joomla can usually not be searched with Joomla's standard search. Solr4Joomla, however, makes these Wordpress blog posts searchable.

Solr4Joomla indexes the posts stored in the table jos_wp_posts, including the corresponding keywords from the jos_wp_postmeta table. Only entries that have post_state “publish” and post_type “post” are indexed.

Note: This feature is available as an additional component only.


3.4 Solr Server Configuration

The Solr server configuration sets the parameters for connecting to the Solr server instance. Host, port and instance name must be set here.

“Limit” determines how many hits will be displayed on each page of the hit list (see section 4, Hit List).
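
To illustrate how these four settings fit together, the following Python sketch builds a search URL from them. It is an illustration only: the function name, the “http” scheme, the “/solr/<instance>/select” path layout and the page parameter are assumptions based on a typical Solr setup, not taken from the component's source code.

```python
from urllib.parse import urlencode


def build_select_url(host, port, instance, query, limit, page=1):
    """Build a Solr select URL from the connection settings.

    Assumption: the instance is reachable under /solr/<instance>/select.
    "limit" is the number of hits per page; "start" is the offset of the
    first hit on the requested page.
    """
    start = (page - 1) * limit
    params = urlencode({"q": query, "rows": limit, "start": start, "wt": "json"})
    return f"http://{host}:{port}/solr/{instance}/select?{params}"


# Third page of a hit list with 10 hits per page
print(build_select_url("localhost", 8983, "joomla", "joomla search", 10, page=3))
```

Opening such a URL in a browser is a quick way to verify that host, port and instance name are correct before configuring the modules.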




3.5 Separator in CSV File

In the configuration section “Separator for CSV File” you can set the character that will be used as a separator in each line of the CSV file (see Indexing PDF Files in section 3.1).



3.6 Separator for meta data fields in “Keywords” (Articles)

Configuration:

This defines the separator that will be used for meta data entered in “Metadata - Keywords” for articles.

Article:



3.7 Defining SearchId and Metafield:Values

In this section you can define a list of SearchIds. SearchIds are used to let a menu item execute an automatic search. To do this, a menu item has to be connected with one of the SearchIds defined in this list. If you want to apply a filter to the automatic search, fill in the “metafield:value” field.

Example: The menu item “Articles by John Doe” is added to the main menu. A click on the menu item should display only results by the author “john doe”. The SearchId could be defined as “authorsearch” and the “metafield:value” field would be “author:john doe”. After that, the SearchId “authorsearch” has to be assigned to the menu item (see section 10. SearchId).

If no filtering is necessary, the field “metafield:value” can be left blank.
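
Conceptually, the SearchId list is a lookup table from an identifier to an optional filter. The following Python sketch shows one way such an entry could be turned into a Solr filter expression. The names SEARCH_IDS and filter_query_for and the second table entry are hypothetical, and quoting the value as a phrase is an assumption; how the component itself builds the request is not documented here.

```python
# Hypothetical SearchId list: SearchId -> "metafield:value" (may be blank)
SEARCH_IDS = {
    "authorsearch": "author:john doe",
    "allarticles": "",  # no filtering
}


def filter_query_for(search_id, search_ids=SEARCH_IDS):
    """Return a Solr filter expression for a SearchId, or None if blank."""
    raw = search_ids[search_id].strip()
    if not raw:
        return None
    field, _, value = raw.partition(":")
    # Assumption: quote the value so that "john doe" is matched as one
    # phrase in the given field instead of two separate terms.
    value = value.strip().replace('"', '\\"')
    return f'{field.strip()}:"{value}"'


print(filter_query_for("authorsearch"))
print(filter_query_for("allarticles"))
```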





3.8 Activating Access Level Checking

When this option is activated, a user's access rights are checked before a search request is processed, and the hit list only lists those documents that the user is allowed to access.

If this option is deactivated, all documents will be listed in the hit list.





3.9 Meta field for displaying as Headline for the standard Hit List

In case the solR Hitlist module is not activated, a standard hit list will be displayed to show search results. The setting “Metafield for displaying as headline in case of standard hitlist” selects the meta field whose content is displayed as the title of a hit in that hit list.



4. Hit List - mod_shi_hitlist (Joomla Module)

The hitlist module is used to display the results of an executed search and to navigate to the content of each result.


4.1 General

In order to have a hit list displayed, the hit list must be assigned to a menu entry of type “solR Search » Searchresults » Display Layout”. To do this, a menu entry of this type (e.g. “My Hit List”) must be created.

After such a menu entry is created, add the “solR Hitlist” module and select the menu entry created before there.

In order to have the hit list placed properly, a placeholder should be inserted at an appropriate spot within your template. To do so, insert the following code into “index.php”, right above “<jdoc:include type="component" />”:

<jdoc:include type="modules" name="main_content" />

The new placeholder must additionally be registered in “templateDetails.xml”. To do so, enter the following line into the “Positions” list in “templateDetails.xml”:

<position>main_content</position>


4.2 Configuration

Pre meta fields: Here you can optionally enter one or more meta field names. The content of the entered meta field(s) will be displayed above the title of the hit entry. This is useful, for example, to display the last modification date (last_modified) or the author.

The syntax for specifying these meta fields is as follows:

Date of last Modification|last_modified,Author|author

That is, you specify a comma-separated list in which each entry consists of the label to be shown to the user, followed by a pipe symbol (“|”) and the name of the meta field.

Title meta field: Name of the meta field whose content will be displayed as the title of the hits in the hit list.

Meta fields: Here you can optionally enter one or more meta field names. The content of the entered meta field(s) will be displayed right below the title of the hit entry. This is useful, for example, to display the last modification date (last_modified) or the author.

The syntax for specifying these meta fields is as follows:

Date of last Modification|last_modified,Author|author

That is, you specify a comma-separated list in which each entry consists of the label to be shown to the user, followed by a pipe symbol (“|”) and the name of the meta field.

SearchID: SearchId to connect the hit list with.

Label: Text to be displayed at the top of the hit list.

Highlight Tag: The tag that is set in the configuration of the Solr server for displaying highlighted text. The tag name must be entered without the “<” and “>” characters. You will find the corresponding Solr configuration in “solrconfig.xml” under highlighting, formatter (see also “Highlighting” in section 11.2.3).

Hit Sorting: You can optionally specify one or more meta fields that can be used for sorting the hit list. If this setting is not left blank, a dropdown will be displayed in the hit list, from which the user can choose a sorting option.

The syntax for specifying sort fields is as follows:

Date of last Modification|last_modified,Author|author

That is, you specify a comma-separated list in which each entry consists of the text to be shown to the user in the dropdown, followed by a pipe symbol (“|”) and the name of the meta field to be used for sorting.

Display number of result hits: If you select “Yes”, the number of results returned by the search will be displayed on the hit list page. See “shi_hitlist_result_count” in section 4.3 for information on how to style this display.
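
The “Label|field” syntax used by the Pre meta fields, Meta fields and Hit Sorting settings is easy to get wrong by hand, so it can help to validate a value before pasting it into the module configuration. The following Python sketch is a hypothetical checker, not the module's own parser; rejecting entries without a pipe symbol is an assumption about what counts as valid.

```python
def parse_label_fields(setting):
    """Parse "Label|field,Label|field" into a list of (label, field) pairs."""
    pairs = []
    for entry in setting.split(","):
        entry = entry.strip()
        if not entry:
            continue  # blank setting or stray comma
        label, sep, field = entry.partition("|")
        if not sep:
            # Assumption: every entry needs a label and a field name.
            raise ValueError(f"missing '|' in entry: {entry!r}")
        pairs.append((label.strip(), field.strip()))
    return pairs


for label, field in parse_label_fields(
    "Date of last Modification|last_modified,Author|author"
):
    print(f"{label}: {field}")
```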


4.3 Visual Representation

The visual representation of the hit list can be configured by modifying the following CSS classes:

shi_hitlist_title: Defines rendering of a hit's title.

shi_hitlist_key_METAKEY: Defines rendering of a single meta field's label. The “METAKEY” part of the class name must be identical to one of the meta field names specified either in Pre meta fields or in Meta fields (e.g. shi_hitlist_key_last_modified or shi_hitlist_key_author).

shi_hitlist_value_METAKEY: Defines rendering of a single meta field's content. The “METAKEY” part of the class name must be identical to one of the meta field names specified either in Pre meta fields or in Meta fields (e.g. shi_hitlist_value_last_modified or shi_hitlist_value_author).

hitlistDivider: Defines the spacing and any separator to be displayed between hit entries in the hit list.

shi_hitlist_result_count: Defines rendering of the text indicating the number of hits returned from the search.

Paging / Navigation:

divHitlistNavigation: Defines rendering of the paging as a whole.

highlightActualHitPage: Defines rendering of the paging's current page.

disableNavigation: Defines rendering of the non-clickable pages of the paging.


4.4 Displaying Documents from the Hit List

The only special case to consider is PDF documents. When the user clicks a hit pointing to a PDF file, a new browser window is opened and the PDF file is loaded into that window.

If a hit entry points to a PDF file for which the user has insufficient access rights, a corresponding message is displayed. A message is also displayed when the document to be opened is unavailable.

The messages can be found and edited in the following file: “/components/com_solrsearch/display.php”, beginning at line 20 ($err_msg_norights and $err_msg_notfound).


5. Facets - mod_shi_facets (Joomla Module)

With the facets module, search results can be displayed and filtered by category.


5.1 Configuration

Meta field: The meta field to be used for faceting in this module.

Headline: Headline for the facet.

SearchID: SearchId to connect this facet with.

Show number of hits: If “Yes” is selected, the number of matching documents will be displayed for each facet entry.

Sorting: Defines how the facet entries will be sorted.

Label for cancelling filtering: Once the user has clicked on a facet entry, the hit list is filtered to display only hits that match this facet. A label with the text entered here will be displayed; the user can click it to cancel filtering and redisplay the hit list with the filter removed.

Facets entry must contain X results: Indicates how many results a facet entry must contain in order to be displayed.

Use as filter list: If “Yes” is selected, the behavior of the module changes completely. In this case the module displays a list of all facets clicked, that is, of all filters set. Following each entry in this list, an “X” is displayed in the frontend. By clicking this “X”, the user can remove the filter, and the hit list will be redisplayed without it.

If the “Use as filter list” mode is selected, all other input fields are ignored and do not have to be filled in.
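
Several of these settings have natural counterparts among Solr's standard faceting request parameters. The following Python sketch shows that correspondence; it is an assumption that the module maps its settings this way, and the function name facet_params is hypothetical. The sort value “lex” is simply the one used in the sample request handler in section 11.2.1.

```python
from urllib.parse import urlencode


def facet_params(meta_field, min_results=1, sort="lex"):
    """Return Solr faceting parameters for one facet module.

    Assumed mapping: "Meta field" -> facet.field,
    "Facets entry must contain X results" -> facet.mincount,
    "Sorting" -> facet.sort (value passed through unchanged).
    """
    return [
        ("facet", "true"),
        ("facet.field", meta_field),
        ("facet.mincount", str(min_results)),
        ("facet.sort", sort),
    ]


print(urlencode(facet_params("category", min_results=2)))
```

Appending the printed parameters to a select request against your Solr instance shows which facet entries a given field would produce.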


5.2 Visual Representation

The visual representation of facets can be configured by modifying the following CSS classes:

shi_facet_description: Defines rendering of the facets.

shi_facet_remove_filter: Defines rendering of the label for removing a facet.

When used as a filter list:

shi_facet_filterlist_description: Defines rendering of the description of the filter list.

shi_facet_filterlist_div: Defines rendering of the DIV element which contains the filter list.

shi_facet_filterlist_row: Defines rendering of a line in the filter list.

shi_facet_filterlist_key: Defines rendering of a filter's label.

shi_facet_filterlist_value: Defines rendering of a filter's value.

shi_facet_filterlist_remove: Defines rendering of the “X” displayed next to each entry in the filter list.


6. Search input field - mod_shi_search (Joomla Module)

The search module executes a search against the Solr server. After the search is executed, it redirects to the hit list and displays the results.

6.1 Configuration

Label search field: Text to be initially displayed in the search input field, e.g. “enter your query here”.

Label search button: Text on the search button.

Show search button: If “No” is selected, no search button will be displayed. The user must then start the search by hitting the enter key.

SearchID: SearchId to connect the search input field to.

Filter the search on following meta fields: Optionally, a comma-separated list of meta field names (e.g. author,title) can be supplied here. If meta fields are specified here, the search will only be done in these meta fields.

Show suggestions: If “Yes” is selected, a dropdown supplying word suggestions will be displayed in the frontend while the user types a query.

Suggestions meta key: Meta field from which the suggested words will be taken.

Suggestions count: Defines how many entries the suggestion dropdown will have.
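
Word suggestions of this kind are typically served by Solr's terms component, for which a “/terms” request handler is configured in section 11.2.1. The following Python sketch builds such a request from the two suggestion settings. It is an assumption that the module queries the handler in exactly this way; the function name and the base URL are hypothetical.

```python
from urllib.parse import urlencode


def suggestion_url(base_url, meta_key, typed, count):
    """Build a request to the /terms handler for word suggestions.

    Assumed mapping: "Suggestions meta key" -> terms.fl,
    "Suggestions count" -> terms.limit; the text typed so far is
    sent as terms.prefix.
    """
    params = urlencode({
        "terms.fl": meta_key,
        "terms.prefix": typed,
        "terms.limit": count,
        "wt": "json",
    })
    return f"{base_url}/terms?{params}"


print(suggestion_url("http://localhost:8983/solr/joomla", "title", "joo", 5))
```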


6.2 Visual Representation

shi_search_form: Defines rendering of the form which contains the input field and the submit button.

shi_search_input: Defines rendering of the input field of the search form.

shi_search_button: Defines rendering of the submit button of the search form.



7. Search Dropdown - mod_shi_searchcombo (Joomla Module)

With the search dropdown module, users can search on a list of predefined meta fields, which makes searches more precise.

7.1 Configuration

Label search field: Text to be initially displayed in the search input field, e.g. “enter your query here”.

Label search button: Text on the search button.

Meta fields: A comma-separated list of meta field labels and names (e.g. Author|author,Title|title). In the frontend, a dropdown will be displayed for each meta field label/name pair. The dropdown offers all words contained in the specified meta field for the user to select from.

SearchID: SearchId to connect the search dropdown with.

Show search field: If “Yes” is selected, an input field for entering a query will be displayed in addition to the dropdown(s).

Filter the search on following meta fields: Optionally, a comma-separated list of meta field names (e.g. author,title) can be supplied here. If meta fields are specified here, the search will only be done in these meta fields.

Activate test mode: If “No” is selected, all values for the dropdown(s) are loaded once and then stored in the session. This provides the best performance. If, however, the configuration changes, these changes will not be reflected in the current session. If “Yes” is selected, the values for the dropdown(s) are loaded from the Solr server whenever a dropdown is to be displayed. This mode is useful when testing configurations.

Standard logic interconnection: With this option you specify the boolean logic to be applied to values selected in multiple dropdowns.
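
The effect of the “Standard logic interconnection” setting can be pictured as joining one clause per dropdown with AND or OR. The following Python sketch illustrates the idea; the function name combine_selections is hypothetical, and the exact query the module sends to Solr is an assumption.

```python
def combine_selections(selections, operator="AND"):
    """Join dropdown selections into one Solr query string.

    selections maps a meta field name to the value chosen in its
    dropdown; dropdowns left unselected (empty value) are skipped.
    """
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    clauses = [
        f'{field}:"{value}"'
        for field, value in selections.items()
        if value
    ]
    return f" {operator} ".join(clauses)


chosen = {"author": "john doe", "category": "news", "title": ""}
print(combine_selections(chosen, "AND"))
print(combine_selections(chosen, "OR"))
```

With AND, a hit must match every selected value; with OR, matching any one of them is enough, so the hit list is usually longer.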



7.2 Visual Representation

shi_searchcombo_form: Defines rendering of the form which contains the dropdowns, the input field and the submit button.

shi_searchcombo_table: Defines rendering of the table which contains the individual dropdowns.

shi_searchcombo_input: Defines rendering of the input field of the search form.

shi_searchcombo_submit: Defines rendering of the submit button of the search form.



8. Date Search - mod_shi_searchdate (Joomla Module)

With the date search module, it is possible to run a date range search on the documents in the Solr index.

8.1 Configuration

Label search field: Text to be initially displayed in the search input field, e.g. “enter your query here”.

Label search button: Text on the search button.

Meta fields: A comma-separated list of meta field labels and names (e.g. Date of last modification|last_modified,Date of creation|created). In the frontend, a date search element will be displayed for each meta field label/name pair.

SearchID: SearchId to connect the date search with.

Show input field: If “Yes” is selected, an input field for entering a query will be displayed in addition to the date picker.

Filter the search on following meta fields: Optionally, a comma-separated list of meta field names (e.g. last_modified,created) can be supplied here. If meta fields are specified here, the search will only be done in these meta fields.

You would usually use the same meta fields as specified in “Meta fields”, but you are free to pick the values from one meta field and search for them in another.

Standard logic interconnection: With this option you specify the boolean logic to be applied to dates selected in multiple date search elements.

min. Year: Minimum year to be available in the date picker.

max. Year: Maximum year to be available in the date picker.
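
In Solr, a date range search is written as a range query on a date field, with both bounds given as UTC timestamps. The following Python sketch shows how two picked dates translate into that syntax. The function name is hypothetical, and treating the end date as inclusive up to 23:59:59 is an assumption about the module's behavior.

```python
from datetime import date


def date_range_clause(field, start, end):
    """Return a Solr range query covering start through end (inclusive).

    Solr date fields expect timestamps like 2012-01-01T00:00:00Z.
    """
    return (
        f"{field}:[{start.isoformat()}T00:00:00Z"
        f" TO {end.isoformat()}T23:59:59Z]"
    )


print(date_range_clause("last_modified", date(2012, 1, 1), date(2012, 12, 31)))
```

Clauses for several date search elements would then be joined with the boolean operator selected under “Standard logic interconnection”.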



8.2 Visual Representation

shi_searchdate_form: Defines rendering of the form which contains the input field and the submit button.

shi_searchdate_input: Defines rendering of the input field of the search form.

shi_searchdate_submit: Defines rendering of the submit button of the search form.


9. SolrSearchInitialiser - plg_shi_solrsearchinitialiser - (Joomla Plugin)

IMPORTANT: Should you want to uninstall the component or one of the modules, this plugin must be deactivated first.


10. SearchId - Directly assign Search Requests to Menu Entries

The SearchID can be assigned directly to a menu entry. This powerful feature allows you to assign a search request directly to a menu entry and thus build highly dynamic sites.

When the user selects such a menu entry, the search request (filter) stored with the SearchID will be executed and a hit list will be displayed immediately.

SearchIDs and their optional filter settings are described in “3.7 Defining SearchId and Metafield:Values”.

To display, for example, certain documents from an archive, create a menu entry of type “solR Search” (see “4. Hit List - mod_shi_hitlist (Joomla Module)”) and assign one of the SearchIDs created before.

Please make sure that the SearchID was set up correctly by checking the field “Link” in the menu entry's configuration form.



11. Solr Configuration

11.1 Joomla Component

The connection from the Joomla modules to the Solr server instance is configured in the component's configuration form (see “3.4 Solr Server Configuration”).


11.2 Solr Instance

11.2.1 Request Handler

The following Solr request handlers are required by the Joomla modules and have to be configured in “solrconfig.xml”.

<requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler" startup="lazy">
  <lst name="defaults">
    <!-- All the main content goes into "text"... if you need to return
         the extracted text or do highlighting, use a stored field. -->
    <str name="fmap.content">text</str>
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>

    <!-- capture link hrefs but ignore div attributes -->
    <str name="captureAttr">true</str>
    <str name="fmap.a">links</str>
    <str name="fmap.div">ignored_</str>
  </lst>
</requestHandler>

<requestHandler name="/terms" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <bool name="terms">true</bool>
  </lst>
  <arr name="components">
    <str>termsComponent</str>
  </arr>
</requestHandler>

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <!-- default values for query parameters -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="fl">*,score</str>

    <!-- Number of hits per page -->
    <int name="rows">10</int>
    <str name="qf">text</str>

    <!-- Highlighting -->
    <str name="hl">true</str>
    <str name="hl.fl">text</str>
    <str name="hl.fragsize">600</str>

    <!-- Facets -->
    <str name="facet">true</str>
    <str name="facet.sort">lex</str>
    <str name="facet.mincount">1</str>

    <!-- dgv facet fields -->
    <str name="facet.field">jahrgang</str>
    <str name="f.jahrgang.facet.sort">lex</str>
    <str name="facet.field">monat</str>
    <str name="f.monat.facet.sort">lex</str>

    <!-- Spellcheck -->
    <str name="spellcheck">true</str>
  </lst>

  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>


11.2.2 Teaser

The teaser (teaser text) is used to generate a preview text if no highlighting text is available. The teaser configuration has to be added to “schema.xml”:

<schema>
  ...
  <copyField source="text" dest="teaser" maxChars="350"/>
</schema>
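
The fallback from highlighting to teaser can be pictured with a short Python sketch that works on a Solr JSON response, where highlighted snippets are listed per document id under the “highlighting” key. The function name preview_text is hypothetical, and it is an assumption that the teaser is returned as a single string; the hit list module's actual implementation may differ.

```python
def preview_text(doc, highlighting, field="text"):
    """Pick the preview text for one hit.

    doc is one entry of response["response"]["docs"]; highlighting is
    response["highlighting"]. Highlighted snippets win; the teaser field
    is used only when no snippet is available.
    """
    snippets = highlighting.get(doc["id"], {}).get(field, [])
    if snippets:
        return " ... ".join(snippets)
    return doc.get("teaser", "")


docs = [
    {"id": "1", "teaser": "First 350 characters of article one"},
    {"id": "2", "teaser": "First 350 characters of article two"},
]
highlighting = {"1": {"text": ["a <b>solr</b> search"]}, "2": {}}
for doc in docs:
    print(preview_text(doc, highlighting))
```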


11.2.3 Highlighting

With highlighting, words entered in the query are marked (highlighted) in the preview text snippets that are displayed for each hit in the hit list. Highlighting is done by wrapping the words to be highlighted in a defined HTML tag (e.g. <b>…</b>).

The HTML tag to be used can be configured in “solrconfig.xml” in section “highlighting”:

<highlighting>
  <formatter name="html" class="org.apache.solr.highlight.HtmlFormatter" default="true">
    <lst name="defaults">
      <str name="hl.simple.pre"><![CDATA[<b>]]></str>
      <str name="hl.simple.post"><![CDATA[</b>]]></str>
    </lst>
  </formatter>
</highlighting>


11.2.4 Publish date

In order to enable searches for publishing date ranges of articles, the following fields have to be defined in “schema.xml”, section “fields”:

<fields>
  ...
  <field name="publish_up" type="date" indexed="true" stored="true"/>
  <field name="publish_down" type="date" indexed="true" stored="true"/>
  ...
</fields>