Web Search Interfaces


Nov 17, 2013

by Ray Mooney

2

Web Search Interface


Web search engines of course need a web-based interface.


Search page must accept a query string and submit it within an HTML <form>.


Program on the server must process requests and
generate HTML text for the top ranked documents
with pointers to the original and/or cached web
pages.


Server program must also allow for requests for
more relevant documents for a previous query.

3

Submit Forms


HTML supports various types of program input in
forms, including:


Text boxes


Menus


Check boxes


Radio buttons


When the user submits a form, string values for the various parameters are sent to the server program for processing.


Server program uses these values to compute an
appropriate HTML response page.
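
To make the parameter passing concrete, here is a minimal sketch of how form fields can be encoded as name=value pairs (the application/x-www-form-urlencoded format browsers use by default for form submissions). The class and method names are hypothetical and chosen only for illustration; they are not part of the course code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class FormEncodingSketch {

    // Encodes form fields as name=value pairs joined by '&'.
    static String encode(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(urlEncode(field.getKey()))
                .append('=')
                .append(urlEncode(field.getValue()));
        }
        return body.toString();
    }

    // Spaces become '+', and reserved characters such as '/' become %XX escapes.
    private static String urlEncode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

With an insertion-ordered map holding query = "web search" and start = "0", encode would return the string query=web+search&start=0, which is what the server program then decodes back into parameter values.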

4

Simple Search Submit Form

<form action="http://titan.cs.utexas.edu:8082/servlet/irs.Search"
      method="POST">
<p>
<b> Enter your query: </b> <input type="text" name="query" size=40>
<p>
<b>Search Database: </b>
<select name="directory">
  <option selected value="/u/mooney/ir-code/corpora/cs-faculty/">
    UT CS Faculty
  <option value="/u/mooney/ir-code/corpora/yahoo-science/">
    Yahoo Science
</select>
<input type="hidden" name="start" value="0">
<br>
<br>
<input type="submit" value="Submit Query">
<input type="reset" value="Reset Form">
</form>

5

What’s a Servlet?


Java’s answer to CGI programming for processing
web form requests.


Program runs on Web server and builds pages on
the fly.


When would you use servlets?


Page is based on user-submitted data, e.g. search engines.


Data changes frequently, e.g. weather reports.


Page uses information from a database, e.g. online stores.


Requires running a web server that supports
servlets.


6

Apache Tomcat


Freely available web server with servlet support.


http://jakarta.apache.org/tomcat/



Currently installed in (tomcat_dir).


/usr/local/jakarta-tomcat-4.0.1



Starting and stopping the server.


user_install_dir/bin/startup.sh and shutdown.sh


7

Setting Up Your Environment


Add to your CLASSPATH:


tomcat_dir/lib/servlet.jar



Place your servlet classes in:


user_install_dir/webapps/WEB-INF/classes/


8

Basic Servlet Structure

import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class SomeServlet extends HttpServlet {

  // Handle get request
  public void doGet(HttpServletRequest request, HttpServletResponse response)
      throws ServletException, IOException {
    // request - access incoming HTTP headers and HTML form data
    // response - specify the HTTP response line and headers
    // (e.g. specifying the content type, setting cookies).
    PrintWriter out = response.getWriter();
    // out - send content to browser
  }
}

9

A Simple Servlet

import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class HelloWorld extends HttpServlet {

  public void doGet(HttpServletRequest request, HttpServletResponse response)
      throws ServletException, IOException {
    PrintWriter out = response.getWriter();
    out.println("Hello World");
  }
}

10

Running the Servlet


Place your classes in user_install_dir/… (on
previous slide)


Run servlet using:
http://host:portNumber/servlet/ServletName



e.g. http://penrose.cs.unca.edu:8835/servlet/HelloWorld


…/servlet/package_name.class_name


Restart the server if you recompile.


Class is loaded the first time servlet is accessed and
remains resident until server is restarted.

11

Generating HTML

import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class HelloWWW extends HttpServlet {

  public void doGet(HttpServletRequest request, HttpServletResponse response)
      throws ServletException, IOException {
    // Tell the browser that the response body is HTML
    response.setContentType("text/html");
    PrintWriter out = response.getWriter();
    out.println("<HTML>\n" +
                "<HEAD><TITLE>HelloWWW</TITLE></HEAD>\n" +
                "<BODY>\n" + "<H1>Hello WWW</H1>\n" +
                "</BODY></HTML>");
  }
}

12

HTML Post Form

<FORM ACTION="/servlet/hall.ThreeParams"
      METHOD="POST">
  First Parameter: <INPUT TYPE="TEXT" NAME="param1"><BR>
  Second Parameter: <INPUT TYPE="TEXT" NAME="param2"><BR>
  Third Parameter: <INPUT TYPE="TEXT" NAME="param3"><BR>
  <CENTER>
    <INPUT TYPE="SUBMIT">
  </CENTER>
</FORM>


13

Reading Parameters

public class ThreeParams extends HttpServlet {

  public void doGet(HttpServletRequest request, HttpServletResponse response)
      throws ServletException, IOException {
    response.setContentType("text/html");
    PrintWriter out = response.getWriter();
    out.println(… + "<UL>\n" +
                "  <LI>param1: " + request.getParameter("param1") + "\n" +
                "  <LI>param2: " + request.getParameter("param2") + "\n" +
                "  <LI>param3: " + request.getParameter("param3") + "\n" +
                "</UL>\n" + …);
  }

  public void doPost(HttpServletRequest request, HttpServletResponse response)
      throws ServletException, IOException {
    doGet(request, response);
  }
}

14

Form Example

15

Servlet Output

16

Reading All Parameters


List of all parameter names that have values:


Enumeration paramNames = request.getParameterNames();



Parameter names in unspecified order.



Parameters can have multiple values:


String[] paramVals = request.getParameterValues(paramName);



Array of param values associated with
paramName.
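
To illustrate why a parameter name can map to several values, the following sketch decodes a URL-encoded parameter string into a map from each name to the list of its values. It uses only the JDK and is not the servlet container's actual implementation; the class name ParameterMapSketch is an assumption made for this example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for what getParameterNames/getParameterValues expose.
final class ParameterMapSketch {

    // Parses "a=1&a=2&b=x" into {a=[1, 2], b=[x]}.
    static Map<String, List<String>> parse(String encoded) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        for (String pair : encoded.split("&")) {
            if (pair.isEmpty()) {
                continue; // skip empty segments such as those from "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String name = decode(eq < 0 ? pair : pair.substring(0, eq));
            String value = eq < 0 ? "" : decode(pair.substring(eq + 1));
            // A repeated name (e.g. several checked boxes) accumulates values.
            params.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        return params;
    }

    private static String decode(String text) {
        try {
            return URLDecoder.decode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

For the input topic=ir&topic=ml&query=web+search, the name topic maps to two values and query maps to the single value "web search". Several check boxes sharing one name produce exactly this kind of repeated parameter.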

17

Session Tracking


Typical scenario: shopping cart in online store.


Necessary because HTTP is a "stateless" protocol.


Common solutions: Cookies and URL-rewriting.


Session Tracking API allows you to:


Look up session object associated with current request.


Create a new session object when necessary.


Look up information associated with a session.


Store information in a session.


Discard completed or abandoned sessions.

18

Session Tracking API - I


Looking up a session object:


HttpSession session = request.getSession(true);



Pass true to create a new session if one does not exist.


Associating information with session:


session.setAttribute("user", request.getParameter("name"));


Session attributes can be of any type.


Looking up session information:


String name = (String) session.getAttribute("user");

19

Session Tracking API - II


getId


The unique identifier generated for the session.


isNew


true if the client (browser) has never seen the session.


getCreationTime


Time the session was created, in milliseconds since midnight, January 1, 1970 GMT.


getLastAccessedTime


Time the client last sent a request for this session, in milliseconds since midnight, January 1, 1970 GMT.


getMaxInactiveInterval


# of seconds session should go without access before
being invalidated.


Negative value indicates that session should never
timeout.
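
The bookkeeping behind these methods can be sketched with a plain in-memory store. This is not how any particular servlet container implements sessions; the class SessionStoreSketch, its fields, and the explicit "now" argument (used instead of the system clock so the behavior is easy to follow) are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical in-memory session store, for illustration only.
final class SessionStoreSketch {

    static final class Session {
        final String id = UUID.randomUUID().toString(); // cf. getId
        final long creationTime;                        // cf. getCreationTime
        long lastAccessedTime;                          // cf. getLastAccessedTime
        final int maxInactiveInterval;                  // seconds; negative = never time out
        final Map<String, Object> attributes = new HashMap<>();

        Session(long now, int maxInactiveInterval) {
            this.creationTime = now;
            this.lastAccessedTime = now;
            this.maxInactiveInterval = maxInactiveInterval;
        }

        boolean isExpired(long now) {
            return maxInactiveInterval >= 0
                && now - lastAccessedTime > maxInactiveInterval * 1000L;
        }
    }

    private final Map<String, Session> sessions = new HashMap<>();
    private final int defaultMaxInactiveSeconds;

    SessionStoreSketch(int defaultMaxInactiveSeconds) {
        this.defaultMaxInactiveSeconds = defaultMaxInactiveSeconds;
    }

    // Mirrors the idea of getSession(create): look up by id, drop the session
    // if it has been idle too long, and optionally create a fresh one.
    Session getSession(String id, boolean create, long now) {
        Session session = (id == null) ? null : sessions.get(id);
        if (session != null && session.isExpired(now)) {
            sessions.remove(id);
            session = null;
        }
        if (session == null && create) {
            session = new Session(now, defaultMaxInactiveSeconds);
            sessions.put(session.id, session);
        }
        if (session != null) {
            session.lastAccessedTime = now;
        }
        return session;
    }
}
```

In a real deployment the session id travels between browser and server in a cookie or a rewritten URL, which is what lets the server find the right entry on the next request.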


20

Simple Search Servlet


Based on directory parameter, creates or selects existing InvertedIndex for the appropriate corpus.


Processes the query with VSR to get ranked results.


Writes out HTML ordered list of 10 results starting at the rank of the start parameter.


Each item includes:


Link to the original URL saved by the spider in the top of
the document in a comment.


Name link with page <TITLE> extracted from file.


Additional link to the “absolute” cached file.


If not all retrievals have been shown yet, creates a submit form for “More Results” starting from the next ranked item.
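
The paging behavior described above can be sketched as follows. This is not the course's actual search servlet: ResultPageSketch, its render method, and the use of plain URL strings as results are assumptions for illustration, and the form simply posts back to the current page.

```java
import java.util.List;

// Hypothetical renderer for one page of ranked results.
final class ResultPageSketch {

    static final int PAGE_SIZE = 10;

    // Renders results [start, start + 10) as an ordered list, followed by a
    // "More Results" form when further results remain.
    static String render(String query, List<String> rankedUrls, int start) {
        int from = Math.max(0, Math.min(start, rankedUrls.size()));
        int end = Math.min(from + PAGE_SIZE, rankedUrls.size());
        StringBuilder html = new StringBuilder();
        html.append("<ol start=\"").append(from + 1).append("\">\n");
        for (int i = from; i < end; i++) {
            String url = escape(rankedUrls.get(i));
            html.append("<li><a href=\"").append(url).append("\">")
                .append(url).append("</a></li>\n");
        }
        html.append("</ol>\n");
        if (end < rankedUrls.size()) {
            // Hidden fields carry the query and the next starting rank.
            html.append("<form method=\"POST\">\n")
                .append("<input type=\"hidden\" name=\"query\" value=\"")
                .append(escape(query)).append("\">\n")
                .append("<input type=\"hidden\" name=\"start\" value=\"")
                .append(end).append("\">\n")
                .append("<input type=\"submit\" value=\"More Results\">\n")
                .append("</form>\n");
        }
        return html.toString();
    }

    // Escapes characters that are special in HTML text and attribute values.
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;")
                   .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

Because the next starting rank is carried in a hidden field, the server does not have to remember anything between requests; it only has to rerun the query and skip ahead.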

21

Simple Search Interface Refinements


Currently reprocesses query for “More results” requests.


Could store current ranked list with the user
session.


Could integrate relevance feedback
interaction.


Could provide “Get similar pages” request for each retrieved document (as in Google).


Just use given document text as a query.

22

Other Search Interface Refinements


Highlight search terms in the displayed document.


Provided in cached file on Google.
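
One simple way to highlight query terms is a case-insensitive regular-expression replacement, sketched below. The class name, the choice of <b> tags, and whole-word matching are assumptions for illustration; this is not a description of how any search engine actually does it, and it assumes the input is plain text rather than HTML markup.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical highlighter, for illustration only.
final class HighlightSketch {

    // Wraps each whole-word occurrence of any term in <b>...</b>.
    static String highlight(String text, List<String> terms) {
        StringBuilder alternatives = new StringBuilder();
        for (String term : terms) {
            if (term.isEmpty()) {
                continue; // an empty term would match everywhere
            }
            if (alternatives.length() > 0) {
                alternatives.append('|');
            }
            // Quote the term so regex metacharacters are matched literally.
            alternatives.append(Pattern.quote(term));
        }
        if (alternatives.length() == 0) {
            return text;
        }
        Pattern pattern = Pattern.compile(
            "\\b(" + alternatives + ")\\b", Pattern.CASE_INSENSITIVE);
        // $1 re-inserts the matched text with its original capitalization.
        return pattern.matcher(text).replaceAll("<b>$1</b>");
    }
}
```

Matching on word boundaries keeps a term like "bat" from being highlighted inside "combat", at the cost of missing morphological variants such as "bats" unless the terms are stemmed first.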


Allow for “advanced” search:


Phrasal search (“..”)


Mandatory terms (+)


Negated term (-)


Language preference


Reverse link


Date preference


Machine translation of pages.
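
The phrase, mandatory-term, and negated-term operators listed above can be recognized with a small query parser. The sketch below is one possible approach under simple assumptions (operators are a leading + or -, phrases are enclosed in straight double quotes); the class AdvancedQuerySketch and its field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for +term, -term and "quoted phrase" syntax.
final class AdvancedQuerySketch {

    final List<String> required = new ArrayList<>();
    final List<String> excluded = new ArrayList<>();
    final List<String> optional = new ArrayList<>();

    // An optional +/- prefix followed by a quoted phrase or a bare word.
    private static final Pattern TOKEN =
        Pattern.compile("([+-]?)(?:\"([^\"]*)\"|(\\S+))");

    static AdvancedQuerySketch parse(String query) {
        AdvancedQuerySketch parsed = new AdvancedQuerySketch();
        Matcher m = TOKEN.matcher(query);
        while (m.find()) {
            String prefix = m.group(1);
            // Group 2 is set for a quoted phrase, group 3 for a bare word.
            String term = (m.group(2) != null) ? m.group(2) : m.group(3);
            if (term.isEmpty()) {
                continue; // ignore empty phrases such as ""
            }
            if (prefix.equals("+")) {
                parsed.required.add(term);
            } else if (prefix.equals("-")) {
                parsed.excluded.add(term);
            } else {
                parsed.optional.add(term);
            }
        }
        return parsed;
    }
}
```

Parsing +austin "flying mammals" -hockey bats yields austin as a required term, hockey as an excluded term, and the phrase flying mammals plus the word bats as ordinary terms. A retrieval engine would then filter on the first two lists and rank on the third.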

23

Clustering Results


Group search results into coherent “clusters”:


“microwave dish”


One group on food recipes or cookware.


Another group on satellite TV reception.


“Austin bats”


One group on the local flying mammals.


One group on the local hockey team.


Northern Light groups results into “folders” based on a pre-established categorization of pages (like Yahoo or DMOZ categories).


Alternative is to dynamically cluster search results
into groups of similar documents.
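
As a rough illustration of dynamic clustering, the sketch below groups result snippets with single-pass ("leader") clustering using Jaccard similarity over word sets. This technique is chosen here only because it is short; the slides do not prescribe a clustering algorithm, and a real system would more likely compare weighted term vectors. The class name and the threshold parameter are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical single-pass clustering of search result snippets.
final class ResultClusteringSketch {

    // Each snippet joins the most similar existing cluster if the similarity
    // to that cluster's first member is at least the threshold; otherwise it
    // starts a new cluster.
    static List<List<String>> cluster(List<String> snippets, double threshold) {
        List<List<String>> clusters = new ArrayList<>();
        List<Set<String>> leaders = new ArrayList<>();
        for (String snippet : snippets) {
            Set<String> words = wordSet(snippet);
            int best = -1;
            double bestSimilarity = -1.0;
            for (int i = 0; i < leaders.size(); i++) {
                double similarity = jaccard(words, leaders.get(i));
                if (similarity > bestSimilarity) {
                    best = i;
                    bestSimilarity = similarity;
                }
            }
            if (best >= 0 && bestSimilarity >= threshold) {
                clusters.get(best).add(snippet);
            } else {
                leaders.add(words);
                List<String> fresh = new ArrayList<>();
                fresh.add(snippet);
                clusters.add(fresh);
            }
        }
        return clusters;
    }

    // Lower-cased set of the words in the text.
    static Set<String> wordSet(String text) {
        Set<String> words =
            new HashSet<>(Arrays.asList(text.toLowerCase().split("\\W+")));
        words.remove(""); // split can yield an empty leading token
        return words;
    }

    // |A intersect B| / |A union B|, defined as 0 when both sets are empty.
    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int unionSize = a.size() + b.size() - intersection.size();
        return unionSize == 0 ? 0.0 : (double) intersection.size() / unionSize;
    }
}
```

With a threshold of 0.3, snippets about microwave recipes and snippets about satellite TV dishes end up in separate groups even though they share the word "dish". The result depends on the order in which snippets arrive, which is a known weakness of single-pass clustering.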

24

User Behavior


Users tend to enter short queries.


Study in 1998 gave average length of 2.35 words.


Users tend not to use advanced search options.


Users need to be instructed on using more
sophisticated queries.