Enterprise Search with ColdFusion Solr


30 Jun 2012


Dan Sirucek
cf.Objective 2012
May 2012
2
About Me
• Senior Learning Technologist at WellPoint, Inc
• Developer for 14 years
• Developing in ColdFusion for 8 years
• Started in SQL Server, ASP, ASP.NET, VB.NET
• Also work in Flash Builder/Flex, Java, and C#
3
Where We’ve Been:
Growth and Consolidation
WellPoint, Inc. was formed in 2004 as the result of a merger
between Anthem, Inc. and WellPoint Health Networks, creating
the nation’s largest health benefits company by membership

4
Where We Are:
National Scale
1 out of 9 Americans are covered by
WellPoint’s affiliated health plans
Note: Provider Network refers to BlueCard® PPO Network

Nation’s Largest Insurer
• ~34 million medical members


Total Revenue
• Nearly $60 billion


Provider Network Advantage
• ~94% Hospitals
• ~82% Primary Physicians
• ~84% Specialists


Blue Licensee
• 14 states
5
Agenda
• Problem and Goal
• Why Apache Solr for ColdFusion 9.0.1
• Solr Multi-core Overview
• Replication Overview
• Installation
• Replication Configuration
• Managing Collections on Multiple Solr Instances
• Extending ColdFusion Solr Schema
• Creating a Custom Search
• Q & A
• Resources
6
Problem and Goal
• Problem
• Slow search response
• Constant corruption issues
• Verity wasn’t scalable
• No redundancy
• Goal
• Improve search response
• Create an enterprise scalable solution
• Implement redundancy for high availability
• Maintain compatibility with <cfsearch /> & <cfindex /> tags
7
Why Apache Solr for ColdFusion 9.0.1
• Performance
• Fast, very fast
• Optimized for high-volume web traffic
• Scalable
• Distributed searches
• Replication
• Redundancy
• Replication supports
• Master
• Slave
• Repeater
8
Solution Architecture
9
Technologies Used
• Windows Server 2008 64 bit
• IIS 7.0
• Application Request Routing
• ColdFusion 9.0.1 Multi-server
• Apache Tomcat 6
• Master instance
• Apache Solr Standalone Installation for ColdFusion 9.0.1
• Slave instances
• Java SE JDK 1.6.0_26 64-bit
10
Solr Multi-core Overview
• Solr core = ColdFusion collection
• Multiple Cores
• Single Solr instance
• Each Solr core has its own configuration and index
• Unified administration
• Multi-core template
• A template is used for creating a new core (collection)
• The template contains a directory structure and the configuration files needed to create a new core
• Location: SolrInstallationDirectory\multicore\template
11
Solr Multi-core Template
• conf directory
• Contains configuration files used when creating a new Solr core
• Two key files:
 schema.xml
– Contains the details about which fields your index can contain
– How those fields should be dealt with when adding documents to the index
– How those fields should be dealt with when querying those fields
 solrconfig.xml
– Contains the configuration settings for the Solr core
– Used to configure replication
12
Solr Multi-core Template Continued
• conf directory continued
• Files referenced by schema.xml:
 protwords.txt
– Words that need protection from stemming
– e.g. “maine” is stemmed to “main”
 stopwords.txt
– Words not to index, e.g. a, an, and
 synonyms.txt
– Synonym groups e.g. GB,gib,gigabyte,gigabytes
– Mappings used for spelling corrections e.g. hippa => hipaa
13
Solr Multi-core Template Continued
• conf directory continued
• Optional file:
 solrcore.properties
– User defined properties to be referenced within solrconfig.xml
– Syntax – Property=Value
– File is referenced by default when present in conf directory
– Example:

• data directory
• Empty directory
• Solr will create the following directories the first time content is indexed
 index
 spellindex
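The property mechanism described above can be sketched as follows. This is an illustration in Python, not part of the talk: Solr performs the substitution itself when it loads solrconfig.xml. The property names match the ones used later in this deck, the values are hypothetical, and the `${name:default}` form is Solr's syntax for a fallback value.

```python
import re

def parse_properties(text):
    """Parse Property=Value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props

def substitute(template, props):
    """Replace ${name} or ${name:default} with the matching property value."""
    def resolve(match):
        name, _, default = match.group(1).partition(":")
        return props.get(name, default)
    return re.sub(r"\$\{([^}]+)\}", resolve, template)

# Hypothetical solrcore.properties contents (host and interval are made up).
SAMPLE = """
# solrcore.properties
MASTER_CORE_URL=http://solr-master.example.com:8080/solr
POLL_TIME=00:05:00
"""

props = parse_properties(SAMPLE)
print(substitute("${MASTER_CORE_URL}/mycore/replication", props))
print(substitute("${POLL_TIME:00:01:00}", props))
```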

14
Solr Replication Overview
• Replication Features
• Efficient and automated distribution of index additions, updates, and deletions
• Pull strategy allows for easy addition of slaves
• Configurable distribution interval allows tradeoff between timeliness and cache utilization; the interval is set by the slave instance
• Replication and automatic reloading of configuration files
• Works over HTTP
• Works across platforms with same configuration
• Replication Modes
• Master – optimized for indexing
• Slave – optimized for searches
• Repeater – used in WAN to reduce bandwidth between data centers
15
Solr Replication Considerations & Challenges
• Considerations
• Replication is not a server-level configuration
• Replication is configured at the Solr core (search collection) level
• New cores need to be created on all Solr instances
• Challenges
• Modify the multi-core template to implement replication when new cores are created
• Automate the creation of a Solr core on all Solr instances
• Create a consolidated view of cores on all instances
16
Solr Replication Requirements
• Basic Requirement
• One master Solr instance
• One or more slave Solr instances
• Configuration of replication request handlers on master and slave instances
• Replication Request Handler
• Configuration is handled in the solrconfig.xml
• Replication is defined by adding a request handler using XML syntax
• Settings are used to set the properties for the request handler
• Master and slave instances are both configured using a request handler, but use different attributes to define their roles
17
Master Replication Request Handler
• Replication request handler with all possible attributes
• Screen shot

18
Required Master Settings
• replicateAfter
• Configures when replication will be triggered
• Valid values: startup, commit, optimize
• If you use the startup option, also add a commit and/or optimize entry if you want replication to be triggered on future commits or optimizes
• Example:
19
Recommended Master Settings
• confFiles
• Used to specify configuration files to be replicated
• Comma delimited list of files to replicate
• Can be configured to rename files on replication
 Syntax – source_file_name.xml:destination_file_name.xml
• Example:
20
Optional Master Settings
• backupAfter
• Configures when a backup will be created
• Valid values: optimize, startup, commit
• maxNumberOfBackups
• Maximum number of backups to retain
• commitReserveDuration
• Default 10 seconds
• If commits are very frequent and network is slow, you can tweak this value
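The original slides showed the master handler as screenshots that did not survive. As a rough sketch only, the Python below renders a master replication request handler in the form documented on the Solr replication wiki, using the replicateAfter and confFiles settings described above. The file list is an assumption; adjust it to the files your cores actually use.

```python
import xml.etree.ElementTree as ET

MASTER_HANDLER_TEMPLATE = """\
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
{replicate_after}
    <str name="confFiles">{conf_files}</str>
  </lst>
</requestHandler>"""

def master_handler(replicate_after, conf_files):
    """Render a master replication handler fragment for solrconfig.xml."""
    # Each trigger gets its own replicateAfter entry.
    events = "\n".join(
        '    <str name="replicateAfter">{}</str>'.format(event)
        for event in replicate_after
    )
    return MASTER_HANDLER_TEMPLATE.format(
        replicate_after=events, conf_files=",".join(conf_files)
    )

xml_text = master_handler(
    ["startup", "commit"],
    ["schema.xml", "stopwords.txt", "synonyms.txt"],  # assumed file list
)
ET.fromstring(xml_text)  # raises ParseError if the fragment is malformed
print(xml_text)
```

Listing both startup and commit reflects the note above: startup alone would not trigger replication on later commits.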
21
Slave Replication Request Handler
• Slave replication request handler with all possible settings
• Add screen shot and high level notes
22
Required Slave Settings
• Configuration file
• solrconfig.xml
• masterUrl
• Sets the url of the Solr master instance
• ${solr.core.name} – system variable
• pollInterval
• Sets how often the slave polls the master for changes
• Considerations
 Frequency of updates to index
 Network Bandwidth
 Acceptable latency
23
Optional Slave Settings
• httpConnTimeout
• Sets connection timeout on the underlying HttpConnectionManager
• Default value 5000ms
• httpReadTimeout
• Sets timeout when fetching index from master
• Default value 10000ms
• httpBasicAuthUser
• Use if basic authentication is enabled on master
• httpBasicAuthPassword
• Use if basic authentication is enabled on master
• Compression
• Use only if your bandwidth is low
24
Slave Replication Configuration Examples
• Basic configuration example




• Using solrcore.properties configuration example
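The two example screenshots are missing from this transcript. The sketch below is a reconstruction under assumptions, not the author's exact configuration. It renders a slave handler in both styles: one with literal values and one that defers to solrcore.properties. The host name and core name are hypothetical, and the way MASTER_CORE_URL is combined with ${solr.core.name} is an assumption based on the property names used elsewhere in this deck.

```python
import xml.etree.ElementTree as ET

SLAVE_HANDLER_TEMPLATE = """\
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">{master_url}</str>
    <str name="pollInterval">{poll_interval}</str>
  </lst>
</requestHandler>"""

def slave_handler(master_url, poll_interval):
    """Render a slave replication handler fragment for solrconfig.xml."""
    return SLAVE_HANDLER_TEMPLATE.format(
        master_url=master_url, poll_interval=poll_interval
    )

# Basic form: values written directly into solrconfig.xml (hypothetical host/core).
basic = slave_handler(
    "http://solr-master.example.com:8080/solr/mycore/replication", "00:05:00"
)

# Properties form: Solr resolves the ${...} placeholders when the core loads.
with_props = slave_handler(
    "${MASTER_CORE_URL}/${solr.core.name}/replication", "${POLL_TIME}"
)

for fragment in (basic, with_props):
    ET.fromstring(fragment)  # raises ParseError if the fragment is malformed
    print(fragment)
```

The properties form is the one that suits a multi-core template, because the same file works for every new core without editing.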
25
Slave Solr Installation
• Slave Servers
• Windows Server 2008 (64-bit, 8 GB RAM)
• Install Java SE JDK 1.6.0_26 64-bit
 Note location of installation directory
– Example : D:\Apps\Java\jdk1.6.0_26
• Execute Apache Solr Standalone Installation for ColdFusion 9.0.1 installer
 Change Java Home from default to:
javaInstallationDirectory\jdk1.6.0_26\jre
– Example: D:\Apps\Java\jdk1.6.0_26\jre
26
Master Solr Installation
• Master Solr Server
• Windows Server 2008 (64-bit, 8 GB RAM)
• Download Java JDK 1.6.0_26 64-bit
• Download Apache Tomcat 6 32-bit/64-bit Windows Service Installer
• Execute Java JDK Installer
 Note installation directory
 Example: E:\Apps\java
• Execute the Tomcat 6 installer
 Java JRE – specify the jre in the jdk 1.6.0_26 installation
– Example: E:\Apps\Java\jdk1.6.0_26\jre
 Select installation directory
– Example: E:\Apps\tomcat6
27
Master Solr Installation Continued
• Master Solr Installation continued
• Create a solr directory – example E:\Apps\solr
• Copy the following from slave installation
 solr.war to solr directory
– installationDirectory\webapps\solr.war
 Multi-core directory to solr directory
– installationDirectory\multicore
• Configure Tomcat service
• Launch Configure Tomcat
• Java tab
• Set initial memory pool
• Set maximum memory pool

28
Configure Tomcat for Solr
• Stop Apache Tomcat 6 service
• Create solr context
• A Context is what Tomcat calls a web application
• Location: tomcatInstallDir\conf\Catalina\localhost\
• Create a solr.xml file
• Edit solr.xml and define Solr context
• Example:

• Start Apache Tomcat 6 service
• Launch Tomcat 6 - http://127.0.0.1:8080/manager/html

• Navigate to solr application

29
Tomcat 6 Web Application Manager
30
Slave Configuration
• Apache Solr for ColdFusion 9.0.1 runs on the Jetty servlet container
• Jetty Configuration
• Configuration file location
 SolrInstallationDirectory\etc\jetty.xml
• Connector system properties
 jetty.port – default = 8983
 jetty.host – default = not defined
• Default configuration listens only on 127.0.0.1
• Add jetty.host system property to the connector setting
 0.0.0.0 = listen on all IPs

Example:



31
Slave Jetty Configuration Continued
• Default connector configuration




• After update
32
Slave Service Configuration
• Service start up configuration
• Default Java maximum memory (heap) setting is 256 MB
 InstallationDirectory\solr.lax

• Adjust maximum memory setting -Xmx
• Add a minimum memory setting -Xms
• Example:
33
Master Solr Multi-core Template Configuration
• Create solrcore.properties
• Create a text file named solrcore.properties in the Solr multicore template directory
• Add two properties
 MASTER_CORE_URL=http://masterHostnameUrl:masterPort/solr
 POLL_TIME=hh:mm:ss
• Example:

• Create solrconfig_slave.xml
• Make a copy of solrconfig.xml in the master Solr multicore template directory
• Name the file solrconfig_slave.xml
34
Master Solr Multi-core Template Configuration Continued
• Configure solrconfig.xml for replication
• Add master and slave replication request handlers
• solrconfig.xml





• solrconfig_slave.xml

35
Slave Solr Multi-core Template Configuration
• solrcore.properties
• Copy solrcore.properties from the template/conf directory on master to the template/conf directory on slave
• solrconfig.xml
• Delete solrconfig.xml file in template/conf on slave
• Copy solrconfig_slave.xml from the template/conf directory on master to the template/conf directory on slave
• Rename solrconfig_slave.xml to solrconfig.xml on slave
36
Creating New Collections
• Collections (cores) need to be created on all Solr instances
• Use Solr API to create new cores
• REST-like API
• Create new core parameters
 action – CREATE
 name – name of new core
 instanceDir – directory path for new instance
 template – directory path for the core template
 wt – writer type
– Format of response
– Options: json, javabin, xml
– Default = xml

 version = 1



37
Creating New Collections Code
• In CF create an array of server instances
• Define collection name
38
Creating New Collections Code Continued
• Loop over server instance array
• Create collection on each instance
39
Collection Create Result Struct
• De-serialized file content (cfdump from previous slide)
• core – collection name
• responseHeader
 QTime – query time milliseconds
 status
• saved
 File path to multicore\solr.xml
 multicore\solr.xml file is used to store core names and instance directories
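A short Python sketch of checking such a result, assuming the response was requested with wt=json. The sample body is made up to match the keys listed above and is not output captured from a real server. A status of 0 in responseHeader indicates success.

```python
import json

def core_admin_succeeded(response_text):
    """Return (ok, core_name) from a core admin JSON response body."""
    result = json.loads(response_text)
    status = result.get("responseHeader", {}).get("status")
    return status == 0, result.get("core")

# Hypothetical response body; values are illustrative only.
SAMPLE_RESPONSE = r'{"responseHeader": {"status": 0, "QTime": 31}, "core": "hr_policies", "saved": "E:\\Apps\\solr\\multicore\\solr.xml"}'

ok, core = core_admin_succeeded(SAMPLE_RESPONSE)
print(ok, core)
```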
40
Solr Admin Master Replication
• Core admin
• Navigate to Replication


• Replication admin
• Index version
• Location
• Size

41
Solr Admin Slave Replication
• Core admin
• Navigate to Replication

• Replication admin
• Master
• Poll Interval
• Local Index
 Version & location
 Replication status
• Controls
 Disable Poll
 Replicate Now


42
Deleting Collections
• Collections (cores) should be deleted from all Solr instances
• Use Solr API to delete cores
• Delete core parameters
 action – UNLOAD
 core – name of core to delete
 wt – writer type
– Format of response
– Options: json, javabin, xml
– Default = xml
 version = 1



43
Delete Collections Code
• Loop over server instance array
• Delete collection on each instance
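As with creation, the original code screenshot is missing. A minimal Python sketch of the loop, using the UNLOAD parameters listed earlier; the instance URLs and core name are hypothetical and the /admin/cores path is assumed.

```python
from urllib.parse import urlencode

def core_unload_url(base_url, core, wt="json"):
    """Build the core admin URL that unloads a core from one Solr instance."""
    params = {"action": "UNLOAD", "core": core, "wt": wt, "version": "1"}
    return "{}/admin/cores?{}".format(base_url.rstrip("/"), urlencode(params))

# Hypothetical master and slave instances.
SOLR_INSTANCES = [
    "http://solr-master.example.com:8080/solr",
    "http://solr-slave1.example.com:8983/solr",
]

for base in SOLR_INSTANCES:
    # Each URL would be requested with an HTTP GET, once per instance.
    print(core_unload_url(base, "hr_policies"))
```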

44
Extend ColdFusion Solr Schema (cfcore)
• Reasons to extend/change default functionality
• Change default operator
 The default is OR
• Enable delete by key capability
• Enable case sensitivity on search
• Possible changes to schema.xml
• Default operator between words is OR
 Changing default operator to AND will reduce number of results




45
Extend ColdFusion Solr Schema – Enable Delete by Key
• Enable delete by key
• Default unique key is a system generated identifier
• Possible use case
 Use API to delete indexed content by the key value
• Changes
 Create a copy of schema.xml and name it schema_slave.xml
 Update the replication confFiles setting to use schema_slave.xml:schema.xml, so slaves receive schema_slave.xml renamed to schema.xml
 Changes to schema.xml
– Change the indexed attribute on the key field to true

– Change unique key from uid to key


Changing unique key on slave instances will break cfsearch tag
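Once key is the unique key on the master, a document can be removed with Solr's XML delete message. The Python sketch below only builds the message body. The example key is hypothetical, and posting it to the core's /update handler followed by a `<commit/>` message is standard Solr usage rather than something shown in the slides.

```python
from xml.sax.saxutils import escape

def delete_by_key_message(key):
    """Build the XML update message that deletes one document by unique key."""
    # escape() protects characters such as & and < that may appear in keys.
    return "<delete><id>{}</id></delete>".format(escape(key))

# Hypothetical key value, e.g. a file path used when the document was indexed.
print(delete_by_key_message("D:\\docs\\benefits & claims.pdf"))
```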


46
Extend ColdFusion Solr Schema – Enable case sensitivity on search

• Enable case sensitivity on search
• Default configuration uses a filter to change text to lower case
• Possible use case
 Search by title and retain case sensitivity
• Schema Change
 Comment out solr.LowerCaseFilterFactory
47
Creating a Custom Search
• Use case
• Return category facet counts
• Date range search
• Solr Search API
• Basic query parameters
 q – search query
 fq – filter query
 qt – query type – name of the request handler in solrconfig.xml
 start – start row
 rows – number of rows to return in response
 fl – comma delimited list of fields to include in response
 wt – writer type (response format)

48
Creating a Custom Search Continued
• Solr Search API continued
• Highlight parameters
 hl – enable highlighted snippets to be generated
 hl.fragsize – the size, in characters, of the snippets created by the highlighter
 hl.snippets – maximum number of snippets to generate per field
 hl.simple.pre – text which appears before highlighted term
 hl.simple.post – text which appears after highlighted term
• Facet parameters
 facet – enable facet counts in query response
 facet.field – specify a field which should be treated as a facet
 facet.mincount - minimum count to include facet in response
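To show how these parameters fit together for the stated use case (category facet counts plus a date range), here is a hedged Python sketch that assembles a select URL. The field names (key, title, category, modified), the highlight markup and the core URL are assumptions for illustration, not values from the talk.

```python
from urllib.parse import urlencode

def search_url(core_url, query, date_field, start_date, end_date,
               facet_field, start=0, rows=10):
    """Build a Solr select URL with a date-range filter, highlighting and facets."""
    params = [
        ("q", query),
        # Range filter, e.g. modified:[2012-01-01T00:00:00Z TO NOW]
        ("fq", "{}:[{} TO {}]".format(date_field, start_date, end_date)),
        ("start", start),
        ("rows", rows),
        ("fl", "key,title,score"),  # assumed field names
        ("wt", "json"),
        ("hl", "true"),
        ("hl.snippets", 2),
        ("hl.fragsize", 200),
        ("hl.simple.pre", "<strong>"),
        ("hl.simple.post", "</strong>"),
        ("facet", "true"),
        ("facet.field", facet_field),
        ("facet.mincount", 1),
    ]
    return "{}/select?{}".format(core_url.rstrip("/"), urlencode(params))

print(search_url(
    "http://solr-slave1.example.com:8983/solr/hr_policies",  # hypothetical core
    "benefits", "modified", "2012-01-01T00:00:00Z", "NOW", "category",
))
```

Searches are pointed at a slave instance in this sketch because slaves are the instances optimized for searching.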
49
Creating a Custom Search Continued
• JSON specific parameter
• json.nl
 Controls the output format of NamedList used for field faceting data
 flat (default) – flat array
– Example: [name1,val1, name2,val2]
 map – JSON object
– A JSON object is a hash; a NamedList can have repeated keys and preserves order, so some information may be lost
 arrarr – an array of two element arrays
– Example: [[name1,val1], [name2, val2], [name3,val3]]
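With the default flat format, the client has to pair up names and values itself. A small Python sketch of that step; the facet names and counts are made up.

```python
def flat_to_pairs(flat):
    """Convert [name1, val1, name2, val2, ...] into [(name1, val1), ...]."""
    if len(flat) % 2:
        raise ValueError("flat NamedList must have an even number of items")
    # Even positions hold names, odd positions hold the matching values.
    return list(zip(flat[0::2], flat[1::2]))

facet_counts = ["claims", 42, "benefits", 17, "pharmacy", 3]  # hypothetical data
for name, count in flat_to_pairs(facet_counts):
    print(name, count)
```

Requesting json.nl=arrarr instead yields the paired structure directly, which avoids this step on the client.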


50
Creating a Custom Search Code
• Code Review
51
Custom Search User Interface Example
52
Q & A







Dan Sirucek
Sr. Learning Technologist
Learning Technologies and Content
21555 Oxnard Dr
MS: CAAC08-088I
Woodland Hills, CA 91316
Tel (818) 234-8017
Mobile (323) 251-1236
www.wellpoint.com
dan.sirucek@wellpoint.com
53
Resources
• Apache Tomcat 6 -
http://tomcat.apache.org/download-60.cgi

• Apache Solr Standalone Installer for ColdFusion 9.0.1 -
http://www.adobe.com/support/coldfusion/downloads.html

• Java JDK 1.6.0_26 download -
http://www.oracle.com/technetwork/java/javase/downloads/jdk-6u26-download-400750.html

• Apache Solr -
http://lucene.apache.org/solr/

• Solr Wiki -
http://wiki.apache.org/solr/FrontPage

• Solr Replication -
http://wiki.apache.org/solr/SolrReplication

• Solr JSON Response Writer -
http://wiki.apache.org/solr/SolJSON#JSON_Query_Response_Format

• Solr Facet Parameters -
http://wiki.apache.org/solr/SimpleFacetParameters

• Solr Highlighting Parameters -
http://wiki.apache.org/solr/HighlightingParameters