A web application for browsing research papers

coordinatedcapableSoftware and s/w Development

Nov 4, 2013 (3 years and 9 months ago)

70 views

A web application for browsing research papers

By: Rhea Dookeran 09’

Overview


Goal: To build an interactive web interface based on the
XML output of Amy’s Machine Learning algorithms.


Usage: A class tool for CS 349 that enables convenient
browsing and selection from a corpus of CS research
papers.


Technology



PHP


XML


HTML, CSS, JavaScript



Comparison


CiteSeer (
http://citeseer.ist.psu.edu/citeseer.html
)


Focuses on CS/IS literature

Positive


Allows you to search by through citations,
acknowledgments and Google docs.



Can assign tag to the document.

Negative


Hovering shows one abstract at a time.


Clicking on the link redirects you to a new page.


Abstract and list of citations


CiteSeer

Comparison


GoogleScholar (
http://scholar.google.com/
)

Positive


Large corpus


Does not cover one specific field of study.


Familiar


Follows the generic Google Search Engine Results Page (SERP)
layout. (Minus ads)

Negative


Hard to compare papers. No abstract: only shows the line where
search term/s appear. Have to click on link to learn more.





Google Scholar

Comparison


Citeulike (
http://www.citeulike.org/
)

Positive


Can add tags to documents and filter by common tags


Overall Web 2.0 style

Negative


Not visually appealing. Must scroll down to see results.


Must click on title to learn more (slide menu categories)


Citeulike

Our System’s Key Features


Allows user to compare papers side by side before choosing to
view the whole document. (slide menu and hover functionality)


Result filtering:


Author
-

List, click or search


Keyword
-

Click, cloud or search


Title
-

Cloud or search


Easy to use and visually appealing


Dynamic/intuitive interface


Menus scroll with user


PDF files open in a new tab (easily forgotten in other systems)



Not built over a relational database.




So how does it work?


XML (Extensible Markup Language)


Tag based data storage format


To allow one to describe a document in terms of its structure,
rather than its page layout


Used PHP’s Document Object Model (DOM) library


‘context
-
rich’


Tags can be defined to specify both element names and
attributes.


<summary filename=“Arasu
-
SearchingTheWeb.pdf”>


In essence, the tags describe a flat
-
file database



So how does it work?


PHP


Widely used scripting language that can be embedded
into HTML


Dynamic content using static file


Reading demonstrated how to use the DOM Library to
parse XML.


Runs on the web server and can run on most OS’s


Free download:


Wamp Server running on PC


Includes Apache, MySQL and PHP5 for windows


http://www.wampserver.com/en/


<papers>

<summary fileName="Arasu
-
SearchingTheWeb.pdf">


<title>Searching the Web</title>


<authors>



<author>Arvind Arasu</author>


<author>Junghoo Cho</author>


<author>Hector Garcia
-
Molina</author>


<author>Andreas Paeckpe</author>


<author>Sriram Raghavan</author>


</authors>


<abstract>


An overview of current Web search engine design. After introducing a generic search engine architecture,

we examine each engine component in turn. We cover crawling, local Web page storage, indexing, and the

use of link analysis for boosting search performance. The most common design and implementation

techniques for each of these components are presented. For this presentation we draw from the literature

and from our own experimental search engine testbed. Emphasis is on introducing the fundamental

concepts and the results of several performance analyses we conducted to compare different designs.


</abstract>


<keywords>


Authorities, crawling, HITS, indexing, information retrieval, link analysis, PageRank, search engine


</keywords>


</summary>



</papers>

So how does it work?


JavaScript (
Jquery
)


Used to create slide effects


One of many useful and extensive JavaScript libraries


“…simplifies HTML document traversing, event handling,
animating, and Ajax interactions for rapid web development.”


Code is concise thanks to “
chainability



Object
-
oriented programming design pattern


Every method within
jQuery

returns the query object itself,
allowing you to 'chain' upon it, for example:



menuYloc
=
parseInt($(name).css(“top”).substring(0,$(name).css(“top”).indexOf(“
px”)))


one line grabs the “100” out of “…top:100px;…” in
css

file


Free and easy to download

Summary


Process


3 major iterations


1.0
-

Basic display & browsing of all documents (10/page)


2.0
-

Basic display
s.t
. keywords were active filtering links


3.0


Active author, and keyword links


Floating side menus


list of authors and paper frequency, # to display


Basic search (by title, author or keyword) functionality


Cloud of most common words.


Speed Bumps


Dealing with non
-
alphanumeric ASCII characters


Figuring out the most appropriate term search algorithm


Version compatibility issues
-
>
wamp

server


Positive


Gained valuable experience using popular web application development tools.


Had fun designing!


Research sparked interest in semantic web


Future Plans


Add tag assignment functionality.


Integrate the system into the Semantic Web
using RDF.


Expand upon filtering functionality


Modularize code for use in other CS courses



Sources


XML and PHP


http://php.net/


http://www.ibm.com/developerworks/library/os
-
xmldomphp/


http://www.sis.pitt.edu/~mbsclass/standards/fischer/XML
Uses.html


PHP/Apache/
MySQL

PC download


http://www.wampserver.com/en


Jquery


http://jquery.com/