CSCI 572: Information Retrieval and Search Engines

Introduction to Apache Lucene/Solr


Summer 2010

May-20-10 CS572-Summer2010 CAM-2

Outline


What is Lucene/Solr?


Where did it come from?


What are the current versions of Lucene/Solr?


What can it do?



Apache Lucene


The brainchild of Doug Cutting


Free-text indexing library that implements most of
the functionality I’ve talked to you about


Query Models, Ranking, Indexing


Core API is implemented in Java


C/C++, Ruby, and Python APIs exist as well, but they have small
communities or are automatically generated


Initially Sourceforge, moved to Apache in 2001
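
To make the indexing, query, and ranking ideas concrete, here is a minimal sketch of an inverted index with a simple TF-IDF score. It is a toy illustration in Python, not Lucene's API or its actual scoring formula; the function names and sample documents are made up for this example.

```python
from collections import defaultdict
import math


def tokenize(text):
    """Lowercase and split on whitespace; real analyzers do far more."""
    return text.lower().split()


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def search(index, num_docs, query):
    """Rank documents by a simple sum of tf * idf over the query terms."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in any document
        idf = math.log(1 + num_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by document id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical sample documents.
docs = {
    1: "apache lucene indexing library",
    2: "apache solr search server",
    3: "lucene query ranking",
}
index = build_index(docs)
results = search(index, len(docs), "lucene ranking")
```

Document 3 matches both query terms and so ranks ahead of document 1, which matches only "lucene".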


Apache Solr


Originally developed at CNET


Web service layer built on top of the Lucene library


Provides a schema and an understanding of field types, with
conversion to and from their internal representation


Provides large-scale scalability; deployed on top of an
application server such as Tomcat or Jetty


Programming-language-independent APIs


Sharding, replication, faceting, highlighting, explain,
more-like-this, and other functionality provided easily


How to get started


Lucene (2.9.2 and 3.0.1 stable)


Put your Java hat on


Have Eclipse ready or your favorite IDE


Download lucene-core-<version>.jar from


http://repo1.maven.org/maven2/org/apache/lucene/


Download src and build from


http://www.apache.org/dyn/closer.cgi/lucene/java/


Check out some example Java code that demonstrates
indexing and querying from Otis Gospodnetic


http://onjava.com/pub/a/onjava/2003/01/15/lucene.html


How to get started


Solr


Grab a release of Solr (1.4.0 stable)


http://www.apache.org/dyn/closer.cgi/lucene/solr/



Unpack into e.g., /usr/local/solr


Deploy onto Tomcat


Install Tomcat into /usr/local/tomcat


Create a solr.xml context file and drop it into
/usr/local/tomcat/conf/Catalina/localhost/


Create the solr/home JNDI property in that file and point it to /usr/local/solr/solr


Start Tomcat


Head over to $solr/example/example-docs


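curl http://localhost:8983/solr/update -H 'Content-type:text/xml; charset=utf-8' --data-binary @artists.xml

(8983 is the port of Solr's bundled Jetty example; a default Tomcat install listens on 8080, so adjust the URL to match your deployment)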
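
The same update can be issued from a script. The sketch below builds the equivalent POST request with Python's standard library; it only constructs the request and does not send it. The helper name, the sample document, and its `id` field are assumptions for illustration, since the contents of artists.xml are not shown here. Note that added documents generally become searchable only after a commit, which can be posted to the same handler.

```python
import urllib.request


def build_update_request(xml_bytes, base_url="http://localhost:8983/solr"):
    """Build (but do not send) a POST to Solr's /update handler.

    base_url is an assumption; change host and port to match your server.
    """
    return urllib.request.Request(
        url=base_url + "/update",
        data=xml_bytes,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


# Hypothetical one-document update message in Solr's XML update format.
add_doc = b'<add><doc><field name="id">1</field></doc></add>'
add_request = build_update_request(add_doc)

# A commit makes previously added documents visible to searches.
commit_request = build_update_request(b"<commit/>")

# To actually send a request against a running server:
# urllib.request.urlopen(add_request).read()
```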


Modifying your schema.xml


Field Types


Analyzers


Tokenizers

http://wiki.apache.org/solr/SchemaXml
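
As a rough mental model, a field type's analyzer is a tokenizer followed by a chain of token filters. The Python sketch below imitates that pipeline; it is a conceptual toy, not Solr's implementation or configuration syntax, and the function names and stopword list are made up for this example.

```python
def whitespace_tokenizer(text):
    """Split raw text into tokens on whitespace."""
    return text.split()


def lowercase_filter(tokens):
    """Normalize every token to lowercase."""
    return [t.lower() for t in tokens]


def stop_filter(tokens, stopwords=frozenset({"the", "a", "of"})):
    """Drop very common words (a tiny, hypothetical stopword list)."""
    return [t for t in tokens if t not in stopwords]


def analyze(text, tokenizer, filters):
    """Run the tokenizer, then apply each filter in order."""
    tokens = tokenizer(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


tokens = analyze(
    "The Art of Search",
    whitespace_tokenizer,
    [lowercase_filter, stop_filter],
)
# tokens is now ['art', 'search']
```

Filter order matters: here the stop filter only matches "The" because the lowercase filter runs first.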




Solr Faceting


facet=on&facet.field=&facet.field=…


http://wiki.apache.org/solr/SimpleFacetParameters
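
A facet request is an ordinary search URL with `facet=on` and one `facet.field` parameter per field. The sketch below assembles such a URL in Python; the base URL, the helper name, and the field names `genre` and `country` are assumptions for illustration and would need to match fields defined in your own schema.

```python
from urllib.parse import urlencode


def facet_query_url(query, facet_fields,
                    base_url="http://localhost:8983/solr/select"):
    """Build a search URL that asks for facet counts on the given fields."""
    params = [("q", query), ("facet", "on")]
    # Repeat facet.field once per field to facet on.
    params += [("facet.field", field) for field in facet_fields]
    return base_url + "?" + urlencode(params)


# Hypothetical field names; "*:*" matches all documents.
url = facet_query_url("*:*", ["genre", "country"])
```

Passing the parameters as a list of pairs, rather than a dict, is what allows `facet.field` to appear more than once.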



Advanced Topics


Standing up cores


Sharding


Replication


Zookeeper and Cloud


Development currently in flux


Stick with release versions


Depending on trunk won’t really help


Lucene and Solr have merged


Wrapup


Lots more information at


http://lucene.apache.org


http://lucene.apache.org/solr/


http://lucene.apache.org/java/



Possible projects


Geospatial search


Improving existing code and contributing back to Apache SIS
and to Apache Solr


Improving date faceting


Rewriting the ResponseWriter framework


Acknowledgements


Material inspired by discussions and talks on the
Apache mailing lists for Solr and Lucene, and by
discussions with the rest of the Lucene community