Running Head: Lab II - READ Product Prototype Specification

Lab II - READ Product Prototype Specification

Andrew Moss

CS411

Janet Brunelle

April 10, 2013

Version 1














Table of Contents

1 Introduction
  1.1 Purpose
  1.2 Scope
  1.3 Definitions, Acronyms, and Abbreviations
  1.4 References
  1.5 Overview
2 General Description
  2.1 Prototype Architecture Description
  2.2 Prototype Functional Description
  2.3 External Interfaces
    2.3.1 Hardware Interfaces
    2.3.2 Software Interfaces
    2.3.3 User Interface
    2.3.4 Communication Protocols and Interfaces
3 Specific Requirements
  3.1 Functional Requirements

List of Figures

Figure 1 - Major Functional Component Diagram
Figure 2 - User Process Flow
Figure 3 - Scraper Process Flow
Figure 4 - READ Site Map

List of Tables

Table 1 - Side-by-side Comparison of Real World Product and Prototype








1 Introduction

Publications are the primary method of distributing the results of research. There are approximately 4,600 universities (NCES, 2011) that “account for more than half of the basic research conducted in the United States” (McRobbie, 2012). Unfortunately, many of these institutions lack an efficient online resource for organizing and displaying both the publications resulting from their research and information about the grants that helped finance it. Such a system would give research universities and their departments, as well as the students and professors performing the research, increased recognition and awareness of their work.

One example of a university in need of an improved publication system is Old Dominion University (ODU) in Norfolk, Virginia. Its Computer Science Department (ODUCS), in particular, would benefit a great deal from a well-maintained online system for publications and grants, as it lacks one entirely. The department’s professors are burdened with manually updating their own web pages to publicize their recent publications. In the past, a single web page for the entire department was maintained by one member of its Systems Group. However, that page was last updated in 2008, likely because the process was slow, tedious, and manual.







1.1 Purpose



The team behind READ, a Repository for Electronic Aggregation of Documents, intends to alleviate the lack of quality online resources for displaying publications and grants. The READ system will use a scraper to provide researchers with a method of organizing their publications and grants in a format that allows for easy searching, sorting, filtering, and browsing. Additionally, content authors will be able to verify that the listed publications are actually their own work, in case READ mistakenly shows something written by another researcher with the same name.

1.2 Scope

A prototype READ system will be developed for ODUCS as a proof of concept to demonstrate its most basic capabilities. A prototype is necessary because of the time constraints placed on development. It will provide public and private user interfaces to the publication and grant databases, user controls for publication verification, and, most importantly, a scraper that gathers links to publications automatically at set intervals to minimize manual effort.

The READ prototype will use data from real ODUCS authors, and the database will be populated with their publications by Schaefer’s Scraper. It will offer nearly the same functionality as the Real World Product (RWP). Because development time is limited, the prototype will not feature graphical representations of data about publications and grants, nor will it implement a learning algorithm that automatically decides whether a publication likely belongs to a specific author. These differences are shown in Table 1.





Table 1 - Side-by-side Comparison of Real World Product and Prototype

| Feature | Real World Product | Prototype |
|---|---|---|
| Browsing capabilities | Ability to browse all grants and publications. | Ability to browse all grants and publications. |
| Publication filtering capabilities | Filtered by title, publisher, authors, publication date, date added, and keywords. | Filtered by title, publisher, authors, publication date, date added, and keywords. |
| Grant filtering capabilities | Filtered by title, funding agency, principal or co-principal investigator, start date, end date, and active state. | Filtered by title, funding agency, principal or co-principal investigator, start date, end date, and active state. |
| Add, edit, and delete publications and grants | Included. A thumbnail image and files may be associated with the document. Fields can be automatically filled in using a BibTeX document. | Included. A thumbnail image and files may be associated with the document. Fields can be automatically filled in using a BibTeX document. |
| Faculty page | Lists faculty and provides a link to each person’s profile page. | Not included. |
| Login interface | Linked to Old Dominion University Computer Science accounts. | Linked to Old Dominion University Computer Science accounts. |
| Profile page | Displays the author’s profile picture, job title, email address, personal webpage link, and publications and grants. Displays graphs. | Displays the author’s profile picture, job title, email address, personal webpage link, and publications and grants. Graphs not included. |
| Scraper | Will update the system with new publications and grants and alert users when one is added to the system under their name. | Will update the system with publications only and alert users when one is added to the system under their name. |
| Prediction algorithm | Predicts if the consumer has enough space to use the READ system. | Not included. |
| Administrative privileges | Administrators are able to edit, add, or remove anything in the system. | Administrators are able to edit, add, or remove anything in the system. |







1.3 Definitions, Acronyms, and Abbreviations


Administrator/Administrative User: A user with increased privileges for editing database content.

Author: A person who is able to add and edit publications and grants in the system under their name.

BibTeX: A plain-text file format for bibliographic reference information. It will be used to automatically fill in key information when uploading or editing publications and grants.

Computer Science (CS): An academic discipline based on advancing computing theory and algorithm development, which sometimes includes theory about software engineering methods.

Client application: In a client/server architecture, the module that takes input, creates queries to be processed by a server, and receives the results from the server.

Client/Server Architecture: A software engineering paradigm that separates functionality into a “client” application and a “server” application that interact.

CSS: A style sheet language used to specify the presentation of HTML pages.

Data Mining: The act of going through a source of input to find specific information.

Database Schema: A description of the structure of a database.

Funding Agency: The source of funds for research grants. These organizations usually have a limited amount of money to award to principal investigators who submit an accepted application for research funds.

Git: A software system for controlling and organizing software versioning.

GoogleScholar: Google Scholar provides a simple way to broadly search for scholarly literature.





Graphical User Interface (GUI): A computer interface composed of icons, text fields, menus, etc., through which a user interacts with a software application using a mouse and keyboard. The term is used to differentiate from a “command-line interface”, in which a user interacts with a software application solely through a text terminal.

JQuery Sparklines: A development library for the visualization of data.

ODU: Old Dominion University.

MicrosoftAcademic: Microsoft Academic Search is a free service developed by Microsoft Research to help scholars, scientists, students, and practitioners quickly and easily find academic content, researchers, institutions, and activities.

MySQL: A relational database management system.

Parse: A technical term usually used to describe the processing of a statement written in a programming language. It may be used more generally to describe the processing of any statement for specific meaning.

Perl: A widely-used programming language on the server side of web applications.

PHP: A widely-used programming language on the server side of web applications.

Principal Investigator (PI): The primary researcher on whom a research grant is bestowed, responsible for documenting the work and publishing research results.

Publication or Academic Publication: A document created by a faculty member to share research. Publications usually appear in academic journals, technical reports, and records of conference proceedings.

Query: A statement sent to the database to either change the database or get back results.

READ: Repository for Electronic Aggregation of Documents.




RSS: A system for subscribing to and distributing news.

Scraper: An automated application designed to scan a source of input, such as a document or a website, for pertinent information.

Server application: In a client/server architecture, the module that takes queries or requests from a client module, processes them, and returns the result to the client.

Software Compatibility: A description of whether different software, or versions of software, can communicate/interact.

SQL: A widely-used language for querying databases.

SQL injection: Performing unauthorized queries on a database for malicious purposes, typically by inserting SQL into user input.

User Authentication: The process of verifying the access credentials of a user of an automated system, usually accomplished by requesting a username and password combination.

Viewer: In the scope of this project, an outside person who wishes to query the information contained in the READ database.

Version Control: A method for organizing and recording different versions of documents that have been created over time.

Virtual Server: A software version of a hardware server.

Webserver: A group of applications run on a computer or virtual private server (VPS) in order to serve webpages and provide server-side computation for browser-based client applications. A web server is a constantly “on” resource whose sole or main job is to respond to HTTP requests from browsers.

XML: Extensible Markup Language.





1.4 References

McRobbie, Michael A. (2012, December 19). The Multibillion-Dollar Threat to Research Universities. From The Chronicle of Higher Education: http://chronicle.com/article/The-Multibillion-Dollar-Threat/136363/

Moss, Andrew. (2013). Lab I - READ Product Description. Norfolk, VA: Author.

National Center for Education Statistics. Degree-granting institutions and branches, by controls and level of institution and state or jurisdiction, 2010-11. From the Digest of Education Statistics: http://nces.ed.gov/programs/digest/d11/tables/dt11_280.asp

1.5 Overview


This product specification details the features, components, and capabilities of the READ prototype, as well as all necessary hardware and software. The following sections offer further information to that effect.









2 General Description

READ is an automated system that uses a database to store links to articles, the publications themselves, and information about the grants involved. It will allow anyone with Internet access to browse the lists of publications and filter them by author, date, keywords, and publication type. It will minimize manual effort on the part of authors by automatically finding their publications, making it easier to manage the work they have already done.

2.1 Prototype Architecture Description


The major functional components of the READ solution prototype are shown in Figure 1. The scraper will comb through a pre-defined list of specific web sites, searching for new publications by the author names given as input. It will then parse the results and export them to a MySQL database.

The database will store links to publications, information about the publications, and, in some cases, the publications themselves. It will also contain information about grants associated with those publications. Additionally, unique strings that identify the authors at the external web sites will be stored in the database.

The web interface contains both public and private sections. The latter will be accessible only to document authors and administrative staff, and will require user authentication before it can be viewed. The web interfaces will be written in a combination of jQuery/JavaScript and PHP.




Figure 1 - Major Functional Component Diagram

2.2 Prototype Functional Description


READ will allow anyone with Internet access to view publications, grants, and author profiles. To access more features, the user will have to log in with valid ODUCS Linux/Unix credentials. If invalid credentials are entered, the user will still be considered only a viewer. Upon successful authentication, the user will be identified as either an author or an administrator. An author will be able to edit her own publication and grant information, add missing publications and grants, and edit the information displayed on her public profile. An administrator will be able to edit or remove publication and grant information and edit anyone’s profile information. This process flow is visualized in Figure 2.

Figure 2 - User Process Flow



The scraper starts by searching external websites for publications by authors from the Computer Science Department. For each publication it finds, it checks whether the publication is already referenced in the database. If it is, the scraper checks whether the author for whom it was searching is listed as an owner/author of the paper. If the author is not already associated with the work in the database, the association is made but set to an unapproved status. Otherwise, the scraper resumes searching for publications.

If a scraped publication is not already in the database, it is added to the database and the author is added as its owner. However, the publication will not be viewable yet, because it will be in an unapproved status. A cron job that runs periodically will send e-mail notifications to authors when a publication has been attributed to them.

If an author denies ownership of a paper, the database will be queried to determine whether the publication is owned by any other author in the database. If so, the author for whom the scraper was originally searching is removed from the list of the publication’s owners. Otherwise, the paper will be removed from the database entirely. If the author confirms that she wrote the work in question, the database is queried to determine whether there are any other authors who should also be added to the list of owners. This chain of events is illustrated in Figure 3.








Figure 3 - Scraper Process Flow
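
The scraper process flow described in this section can be sketched in Python, the language used for the scraper. This is only an illustrative sketch under stated assumptions, not the READ implementation: a dictionary stands in for the MySQL database, and the names Publication, record_scraped, deny_ownership, and confirm_ownership are hypothetical. It also assumes that confirming ownership moves the association to an approved status, and it omits the follow-up query for additional owners.

```python
from dataclasses import dataclass, field


@dataclass
class Publication:
    title: str
    # Maps author name -> approved flag (False means "unapproved").
    owners: dict = field(default_factory=dict)


def record_scraped(db: dict, title: str, author: str) -> None:
    """Reconcile one scraped publication with the in-memory 'database'."""
    pub = db.get(title)
    if pub is None:
        # New publication: add it with the searched author, unapproved.
        db[title] = Publication(title, {author: False})
    elif author not in pub.owners:
        # Known publication, new association: add it as unapproved.
        pub.owners[author] = False
    # Otherwise nothing changes and the scraper moves on.


def deny_ownership(db: dict, title: str, author: str) -> None:
    """Handle an author stating that the paper is not theirs."""
    pub = db[title]
    others = [name for name in pub.owners if name != author]
    if others:
        # Another owner exists: remove only this author.
        pub.owners.pop(author, None)
    else:
        # No other owner: remove the paper entirely.
        del db[title]


def confirm_ownership(db: dict, title: str, author: str) -> None:
    """Assumed behavior: confirmation marks the association as approved."""
    db[title].owners[author] = True
```

Keeping the approval flag on the author-publication association, rather than on the publication itself, is one way to let a paper stay hidden for one author while already being approved for another.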


2.3 External Interfaces

2.3.1 Hardware Interfaces


READ will not require any custom-built hardware. Any device with Internet connectivity and a web browser can be used to test its functionality.

2.3.2 Software Interfaces


A physical server running the Microsoft Hyper-V hypervisor will host the virtual machine where the READ solution is being developed. The READ database will be hosted on a MySQL server, and the MySQL command-line client will be used to connect to the server instance. The READ web site uses Joomla, an open source content management system, and is written with a combination of PHP and jQuery/JavaScript. Python was used to write and modify the scraper.




2.3.3 User Interface

A site map showing the user interfaces can be seen in Figure 4.

Figure 4 - READ Site Map


















2.3.4 Communication Protocols and Interfaces

READ will only make use of TCP/IP.




3 Specific Requirements

3.1 Functional Requirements


UI Requirements (Jacob Phillmon and Marcus Zehr)

The UI is what a person using the READ system will actually see. It governs all the functions of the READ display and allows people to interact with the system. The UI will be used by several types of people, including viewers, authors, and administrators, and extra interface functionality will be provided for each. The UI must meet the following requirements:


3.1.1 Publications Query Page

This page is used to browse all publications in the system. Filters can be chosen to narrow the list down to publications of interest. Publications include many forms of academic media, including but not limited to articles in conference proceedings, journal articles, tech reports, and abstracts. They are usually based on research done by specific individuals. The Publications Query Page must serve the following functional requirements:


1. The page must initially display publications with the most recently published at the top of the page.
2. The page must allow the following filters for the publications displayed:
   a. Title
   b. Authors
   c. Date published
   d. Date added
   e. Keywords
   f. Publisher
3. The page must display the following information for publications:
   a. Title
   b. Authors
   c. Date published
   d. Date added to system
   e. Conference name, journal name, or TR number
   f. Volume number
   g. Number of pages
   h. Page numbers
   i. Abstract (if available)
   j. A clickable link to where the publication is located
   k. Thumbnail image

3.1.2 Grants Query Page

This page is used to browse all grants in the system. Filters can be chosen to narrow the list down to grants of interest. Grants are sums of money awarded to Old Dominion faculty for research. The Grants Query Page must serve the following functional requirements:


1. The page must initially display grants with the most recently granted at the top of the page.
2. The page must allow grants to be filtered in all of the following ways:
   a. Title
   b. Funding agency
   c. Principal or co-principal investigator
   d. Start date
   e. End date
   f. Active state
3. The page must display the following information for grants:
   a. Title
   b. Funding agency
   c. Award amount
   d. Principal and co-principal investigators
   e. Start date
   f. End date
   g. Division
   h. Award number
   i. Abstract
   j. A clickable link to where the grant is located


3.1.3 Main Page

The main page is the home page of the READ application. It is the first page that will be visited by anyone browsing the system. It shows the most recent documents in the system and provides navigation to other parts of the user interface. The Main Page must serve the following functional requirements:





1. The main page must display the most recent publications and grants that have been added to the system.
2. The number of publications and/or grants displayed must match the amount currently set by the system administrator; if no amount is set, the page defaults to a list of items published in the last 3 months.
3. It must allow navigation to the following pages:
   a. Grants page
   b. Publications page
   c. Login page
   d. Profile page
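
The selection rule for the main page (an administrator-set count, otherwise a three-month window) could look roughly like the Python sketch below. It is an illustration only: the function name recent_items, the (title, date) tuple layout, and the approximation of three months as 90 days are all assumptions, and the requirement does not say which date field the real system would compare.

```python
from datetime import date, timedelta


def recent_items(items, today, admin_limit=None):
    """Return the items to show on the main page, newest first.

    items: list of (title, item_date) tuples (hypothetical layout).
    admin_limit: count set by the administrator, or None when unset.
    """
    newest_first = sorted(items, key=lambda item: item[1], reverse=True)
    if admin_limit is not None:
        # The administrator's setting takes precedence.
        return newest_first[:admin_limit]
    # Fallback: items from the last 3 months, approximated here as 90 days.
    cutoff = today - timedelta(days=90)
    return [item for item in newest_first if item[1] >= cutoff]
```

Passing today as a parameter instead of calling date.today() inside the function keeps the rule easy to test with fixed dates.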


3.1.4 Login Page

The login page will allow registered users to log into the system for authentication purposes. Logged-in users will be able to edit publications or grants they own. The Login Page must serve the following functional requirements:

1. Provide an interface for a user to enter his or her login information.
2. All login information must be linked to Old Dominion University CS accounts.

3.1.5 Profile Page

The profile page will list information on the specific user currently logged in. This page is used to view any publications and grants associated with a specific user. Logged-in users can use the profile page to edit their own profile as well as any publications and grants they own. The Profile Page must serve the following functional requirements:





1. The profile page must display all grants and publications the user currently has in the system.
2. Provide the user the ability to select an option to edit information in grants and publications the user owns, after the user has logged into the system.
3. The profile page must display the following:
   a. Profile picture
   b. Job title
   c. Email addresses
   d. Personal webpage link
4. Provide the user the ability to select an option to edit the information displayed.
5. Provide the user the ability to select an option to add publications or grants manually into the system. This function must be blocked if the user has not logged in or is not the owner of the specified profile page.
6. Provide the user the ability to edit the profile page information, as well as grant and publication data. This function must be blocked if the user has not logged in or is not the owner of the specified profile page.

3.1.6 Publication Add Page

The Publication Add Page will allow the user to submit publications manually into the system. Logged-in users can use it to add publications if they do not wish to wait for the scraper to add them. The Publication Add Page must serve the following functional requirements:


1. Provide the user with the ability to enter publication fields manually.
2. Provide the user with the ability to submit a BibTeX document to automatically fill in various fields.
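
To illustrate how a submitted BibTeX document might pre-fill the publication form, here is a deliberately simplified Python sketch. It is not the READ implementation: parse_bibtex_entry is a hypothetical name, and the sketch assumes a single entry whose values are written as {braced text} without nested braces, "quoted text", or bare numbers. A production system would more likely rely on a full BibTeX parser.

```python
import re

# Matches the entry header, e.g. "@article{key,".
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")
# Matches "name = {value}", 'name = "value"', or "name = 123".
FIELD_RE = re.compile(r"(\w+)\s*=\s*(?:\{([^{}]*)\}|\"([^\"]*)\"|(\d+))")


def parse_bibtex_entry(text):
    """Extract the entry type, citation key, and simple fields from one entry."""
    header = ENTRY_RE.search(text)
    if header is None:
        raise ValueError("no BibTeX entry found")
    fields = {"entry_type": header.group(1).lower(), "key": header.group(2)}
    # Only the text after the header is scanned for fields.
    for name, braced, quoted, number in FIELD_RE.findall(text[header.end():]):
        fields[name.lower()] = (braced or quoted or number).strip()
    return fields
```

The returned dictionary could then be mapped onto form fields such as title, authors, and date published, leaving anything the entry does not supply for the user to enter manually.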

3.1.7 Grant Add Page

The Grant Add Page will allow the user to submit grants manually into the system. Logged-in users can use it to add grants if they do not wish to wait for the scraper to add them. The Grant Add Page must serve the following functional requirements:

1. Provide the user with the ability to enter grant fields manually.

3.1.8 Profile Edit Page

The profile editing page will allow the user to submit changes to their profile page. A logged-in user would use this page to edit displayed information such as their job title, additional email addresses, personal website link, or profile picture. The Profile Edit Page must serve the following functional requirements:

1. Provide the user the ability to alter existing profile information, including:
   a. Job title
   b. Email addresses
   c. Personal webpage link
2. Provide the user the ability to submit a profile picture.

3.1.9 Publication Edit Page

The publication editing page will allow the user to alter their own publication information. This page would be used to edit publications that are already stored in the system, mainly to fix any mistakes introduced during the scraping process. The Publication Edit Page must serve the following functional requirements:

1. Provide the user the ability to alter information within the publication data.
2. Provide the user the ability to submit a BibTeX document to automatically fill in various fields.
3. Provide the user the ability to remove the publication from the system if it is not their own.

3.1.10 Grant Edit Page

The grant editing page will allow the user to alter their own grant information. This page would be used to edit grants that are already stored in the system, mainly to fix any mistakes introduced during the scraping process. The Grant Edit Page must serve the following functional requirements:

1. Allow the user to review existing grant information.
2. Allow the user to alter information within the grant data.
3. Allow the user to remove a grant from the system if it is not their own.

3.1.11 Administration Page

The administration page houses all abilities that are restricted solely to system administrators. Administrators would use this page to edit system settings such as the number of grants and publications displayed per query page. The Administration Page must serve the following functional requirements:

1. Provide an administrator the ability to set the default number of publications displayed on the publications query page.
2. Provide an administrator the ability to set the default number of grants shown on the grants query page.
3. Provide an administrator the ability to set the default number of grants or publications displayed in the RSS feed on the main page.

3.2 System User Requirements: (Jacob Phillmon)



The system users are people who access the READ system through the UI and interact with it in many ways. There are three different types of users, each of which has unique privileges. The system users are made up of the following types and requirements:

3.2.1 Viewer Requirements

Viewers are able to view the system but are unable to edit any information within it. They are people who may wish to use the system to view publications and grants that are already in the system, but not add anything to it. Viewers must have the following capabilities:

1. Viewers must have access to the following pages:
   a. Main Page
   b. Publications Query Page
   c. Grants Query Page
   d. Profile Page
   e. Login Page
2. Viewers are able to view grants and publications stored within the system.
3. Viewers are able to view personal profile pages of registered users.

3.2.2 Author Requirements

Authors are able both to view the system and to edit the information to which they have access. They are the people who actually add publications and grants to the system under their own name. Authors must have the following capabilities:

1. Authors must have access to the following pages:
   a. Main Page
   b. Publications Query Page
   c. Grants Query Page
   d. Profile Page
   e. Login Page
2. Authors are able to view grants and publications stored within the system.
3. Authors are able to add grants and publications to the system manually.
4. Authors are able to edit grants and publications they own.
5. Authors are able to edit their own profile page.

3.2.3 Administrator Requirements

Administrators are able to view the system as well as edit any information displayed on the system. Administrators differ from authors in that they do not actually own any publications or grants in the system. They are, however, able to make adjustments to anything within the system. Administrators must have the following capabilities:

1. Administrators must have access to the following pages:
   a. Main Page
   b. Publications Query Page
   c. Grants Query Page
   d. Profile Page
   e. Login Page
   f. Administration Page
2. Administrators are able to view grants and publications stored within the system.
3. Administrators are able to add grants and publications to the system manually.
4. Administrators are able to edit any grant or publication stored within the system.
5. Administrators are able to edit any profile page.
6. Administrators are able to set the default number of publications and grants displayed per page.
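
The three user types and their privileges can be summarized as a small permission check. The Python sketch below is only one possible reading of the requirements in section 3.2; the Role enum, the function names, and the lowercase page identifiers are hypothetical and are not taken from the READ code.

```python
from enum import Enum


class Role(Enum):
    VIEWER = "viewer"
    AUTHOR = "author"
    ADMINISTRATOR = "administrator"


def can_edit_document(role, user, owners):
    """Return True when the user may edit a publication or grant."""
    if role is Role.ADMINISTRATOR:
        return True              # administrators may edit anything
    if role is Role.AUTHOR:
        return user in owners    # authors may edit only what they own
    return False                 # viewers may not edit


def can_view_page(role, page):
    """Return True when the role may open the named page."""
    public_pages = {"main", "publications", "grants", "profile", "login"}
    if page in public_pages:
        return True
    # The administration page is restricted to administrators.
    return page == "administration" and role is Role.ADMINISTRATOR
```

For example, can_edit_document(Role.AUTHOR, "lee", {"moss"}) is False because "lee" is not among the owners, while the same call with Role.ADMINISTRATOR is True.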

3.3 Backend User Interface (Jim Lawrence Calderon)

3.3.1 Publications Page:

The backend of this page will be responsible for querying the database for the
info
rmation pertaining to publications that will be displayed to the viewer.

1. By default, publications are queried to show the most recent publications first.
2. Alter results shown based on the following filters:
   a. Title
   b. Author
   c. Date published
   d. Keywords
   e. Publisher
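
One way the filtered query could be assembled is sketched below in Python. This is an illustration under assumptions, not the project's implementation: the function name and the mapping from filter names to columns are hypothetical, the column names are borrowed from the Papers table requirements, and only three of the filters are shown. The `%s` placeholders follow the parameter style used by common MySQL drivers for Python, so user input is never concatenated into the SQL text.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from filter names to Papers columns (an assumption).
FILTER_COLUMNS = {
    "title": "Title",
    "author": "AuthString",
    "year": "Year_Published",
}


def build_publication_query(filters: Dict[str, str]) -> Tuple[str, List[str]]:
    """Return a parameterized SELECT statement and its parameter list."""
    clauses: List[str] = []
    params: List[str] = []
    for name in sorted(filters):
        column = FILTER_COLUMNS.get(name)
        if column is None:
            raise ValueError(f"unsupported filter: {name}")
        if name == "year":
            # Exact match on the publication year.
            clauses.append(f"{column} = %s")
            params.append(filters[name])
        else:
            # Substring match for text filters.
            clauses.append(f"{column} LIKE %s")
            params.append(f"%{filters[name]}%")
    sql = "SELECT PID, Title FROM Papers"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    # Default ordering: most recent publications first.
    sql += " ORDER BY Year_Published DESC"
    return sql, params
```

With no filters the function returns only the default query, which matches requirement 1.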

3.3.2 Grants Page:

The backend of this page will be responsible for querying the database for the
information pertaining to grants that will be displayed to the viewer.

1. By default, grants are queried to show the most recent grants first.
2. Alter results shown based on the following filters:
   a. Funding agency



   b. Award amount
   c. Investigator type
      i. Principal
      ii. Co-Principal
   d. Start date
   e. End date
   f. Current activity status

3.3.3 Login:

1. The following must be verified:
   a. Username is alphanumeric
   b. Credentials entered by user exist in the database
   c. Password is correct
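
The three checks can be sketched in Python as follows. The function names and the in-memory `users` dictionary are hypothetical stand-ins for the database lookup. Storing a salted hash instead of the plain password is an assumption of this sketch, since the specification only states that a Password string is stored.

```python
import hashlib
import hmac
from typing import Dict


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a password hash with PBKDF2 (an assumption, not in the spec)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def verify_login(username: str, password: str, users: Dict[str, dict]) -> bool:
    # a. Username is alphanumeric (ASCII letters and digits only).
    if not (username.isascii() and username.isalnum()):
        return False
    # b. Credentials entered by the user exist in the database.
    record = users.get(username)
    if record is None:
        return False
    # c. Password is correct; compare in constant time.
    candidate = hash_password(password, record["salt"])
    return hmac.compare_digest(candidate, record["hash"])
```

Rejecting non-alphanumeric usernames before the lookup keeps malformed input away from the database layer.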

3.3.4 Editing:

The backend of this page will allow privileged users to update information on
publications and grants using a form with one field for each updatable attribute.

1. Update the following information within the database based on user input for:
   a. Publications
      i. Title
      ii. Author
      iii. Date published
      iv. Keywords
      v. Publisher
   b. Grants



      i. Funding Agency
      ii. Award Amount
      iii. Investigator Type
         1. Principal
         2. Co-Principal
      iv. Start Date
      v. End Date
      vi. Current Activity Status

3.3.5 Profile page:

The backend of this page will be responsible for querying the database for the
information belonging to the user who owns the particular profile.

1. Query the following information associated with the viewed profile page:
   a. Grants
   b. Publications
   c. Profile picture
   d. Job title
   e. Email address
   f. Personal webpage link
2. Display the queried information.
3. Update information within the database based on user input.

3.3.6 Profile Editing:

1. Update the following information within the database based on input:
   a. Profile Picture



   b. Job Title
   c. Email Address
   d. Personal Webpage Link

3.4 Database Requirements (Andrew Sprague and Andrew Moss)

The READ database will store all of the information that the READ system uses
to display results and to run the Schaefer Scraper. The following functional requirements
must be met:

3.4.1. Database must be made with MySQL

The READ database must be created using MySQL. Both the creation of tables and the
interfacing with tables will be done through a MySQL account. MySQL was chosen because it is
widely used and readily accessible.

3.4.2. Database must be normalized

The READ database must be normalized in order to keep it as efficient as possible.
Reducing redundant data should allow the database to grow without taking up
a large amount of space.

3.4.3. Auths Table

The READ database must include a table to store the authors' information. The table exists so
that the authors can be accessed by the Schaefer Scraper, which uses it to associate authors
with their documents, and so that their information may be displayed to viewers. The following
functional requirements must be met:

1. Authors must have an AID int as a primary key
2. Authors must have a String variable to hold Degree
3. Authors must have a String variable fname to hold the first name of the author



4. Authors must have a String variable lname to hold the last name of the author
5. Authors must have a UserName String variable
6. Authors must have a Password String variable
7. Authors must have a String variable Email to hold the email address
8. Authors must have a String variable Link to hold their personal webpage link
9. Authors must have a String variable Pic to hold the location of the profile picture
10. Authors must have a String variable Pos to hold their position with the department
11. Authors must have a Bit variable CurrentFaculty to indicate whether the author is a current faculty member
12. Authors must have a Bit variable Admin to determine if the author has administrator privileges

3.4.4. Papers Table

The READ database must include a table for storing information on publications. This
table must exist so that the Schaefer Scraper can store information in it and so that viewers
can access information on the papers. The following functional requirements must be
met:

1. Papers must have a PID int as a primary key
2. Papers must have a Date variable Paper_Date to hold the date the publication was added
3. Papers must include a String variable Title to hold the title of the paper
4. Papers must include a TID foreign key to Tags
5. Papers must include a String variable pData to hold information about the paper
6. Papers must include a String variable link to hold the URL where the paper is held



7. Papers must include a VarChar variable Clevel to show the clearance level of the paper. An ‘A’ will be stored for approved papers and a ‘U’ will be stored for unapproved papers.
8. Papers must include a String variable Abstract to hold the abstract of the paper.
9. Papers must include a String variable PubType to hold information on what kind of publication the record is.
10. Papers must include an Int variable Year_Published to keep the year when the paper was published.
11. Papers must include a String variable Date_Published to hold the month and day the paper was published.
12. Papers must include a String variable ConName to hold the name of the convention or journal that a paper was published in.
13. Papers must include an int variable Volume to hold the number of the convention or journal that the paper was published in.
14. Papers must include an int variable NumPages to record the number of pages in a publication.
15. Papers must include a String variable thumbnail to hold the address of a thumbnail uploaded to the system.
16. Papers must include a String variable AuthString to list the authors in the format that was given in the publication, and to simplify the process of citing authors not at the university.
17. Papers must include an int DOI to identify the paper's unique DOI number.
18. Every Paper must be associated with an author.



3.4.5. Grants Table

The READ database must include a grants table to store information on grants. The table must
not be filled out by any scraper; it must be filled out by authors. The table must also be
accessible to viewers who wish to see the faculty's grants. The following functional requirements
must be met:

1. Grants must have an int GID as a primary key
2. Grants must include a TID foreign key to Tags
3. Grants must include an Int variable StartYear to hold the start year of the grant
4. Grants must include an Int variable EndYear to hold the end year of the grant
5. Grants must include a String variable StartDate to hold the month and day a grant started
6. Grants must include a String variable EndDate to hold the month and day a grant ended
7. Grants must include a String variable OrgAttrib to hold the name of the organization receiving the grant
8. Grants must include a String variable FundAgency to hold the name of the agency giving the funds
9. Grants must include a String variable FundDirect to hold the name of the directorate providing the funds
10. Grants must include an Int variable AwardNum to hold the ID number that the funding agency placed on the grant
11. Grants must include an Int variable Amount to hold the amount of the grant
12. Grants must include a String variable GName to hold the name of the grant
13. Grants must have a foreign key PI to Authors AID



14. Grants must have a foreign key to CO_PI called COPIs
15. The same grant cannot show up multiple times


3.4.6. Tags Table

The READ database must include a tags table in order to store tag information on papers and
grants. The tags table must be filled out by the author for grants and by the Schaefer Scraper for
papers. The following functional requirements must be met:

1. Tags must have a TID to identify the grant or paper it belongs to
2. Tags must have a String Keyword to identify what the tag is
3. Every Tag record must be associated with either a grant or a paper
4. When a paper is deleted there must be a cascading deletion of tags
5. When a grant is deleted there must be a cascading deletion of tags

3.4.7. Owns Table

The READ database must include an Owns table to associate authors with papers. This table must
be organized in a way that allows multiple authors to be associated with one paper if
necessary. The following functional requirements must be met:

1. Owns must have a foreign key to Authors
2. Owns must have a foreign key to Papers
3. Owns must have an int Priority to determine which author has priority in edits
4. Owns may not have two instances where Authors and Papers are the same
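
The Owns requirements describe a many-to-many association with no duplicate pairs. The sketch below illustrates one way to enforce that, using Python's built-in `sqlite3` module purely as a stand-in so the example runs without a MySQL server. The composite primary key on (AID, PID) is an assumption of this sketch, the tables are cut down to a few columns, and the function names are hypothetical.

```python
import sqlite3


def create_owns_schema(conn: sqlite3.Connection) -> None:
    """Create reduced Authors, Papers, and Owns tables for illustration."""
    conn.executescript(
        """
        CREATE TABLE Authors (AID INTEGER PRIMARY KEY, lname TEXT);
        CREATE TABLE Papers (PID INTEGER PRIMARY KEY, Title TEXT NOT NULL);
        CREATE TABLE Owns (
            AID INTEGER NOT NULL REFERENCES Authors(AID),
            PID INTEGER NOT NULL REFERENCES Papers(PID),
            Priority INTEGER NOT NULL,
            PRIMARY KEY (AID, PID)  -- rejects a repeated author/paper pair
        );
        """
    )


def add_owner(conn: sqlite3.Connection, aid: int, pid: int, priority: int) -> bool:
    """Associate an author with a paper; return False if the pair exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO Owns (AID, PID, Priority) VALUES (?, ?, ?)",
                (aid, pid, priority),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Several authors can share one paper, while inserting the same author and paper twice is refused by the database itself rather than by application code.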




3.4.8. CO_PI Table

The READ database must include a CO_PI table to associate authors with grants. This table must
be organized in a way that allows multiple authors to be associated with one grant if
necessary. The following functional requirements must be met:

1. CO_PI must have a PI_Num
2. CO_PI must have a foreign key to Authors

3.4.9. SearchStrings Table

The READ database must include a SearchStrings table to store information on how to search
for the authors on different websites. The table must hold information that tells the system
what to search and how to search it for each author. The following functional requirements must
be met:

1. SearchStrings must have an AID foreign key to Authors
2. SearchStrings must have a VarChar string to identify the website they pertain to
3. SearchStrings must have a VarChar string to specify the author's site code

3.4.10. Important fields cannot be null

Some of the fields in the database cannot allow null values, because these values
are required for the database to run. The following fields must not be null:

1. Paper Title
2. Grant Title
3. OrgAttrib
4. Funding Agency
5. Funding Directorate
6. Agency Division



7. Award Number
8. Amount
9. PI
10. Keyword of Tags
11. Grant Start Date
12. Grant End Date
13. Any Primary Key

3.4.11. Some field values must be unique

1. User name
2. Primary keys

3.4.12. Dates must be stored as YYYY-MM-DD

Dates must be stored this way because it is the ISO 8601 standard format for writing dates.
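
As a small illustration, Python's standard `datetime.date` type already produces and parses this format. The two helper names below are hypothetical.

```python
from datetime import date


def to_storage_format(d: date) -> str:
    """Format a date as YYYY-MM-DD."""
    return d.isoformat()


def from_storage_format(text: str) -> date:
    """Parse an ISO 8601 date string; raises ValueError if it is malformed."""
    return date.fromisoformat(text)
```

A useful property of this format is that sorting the strings alphabetically also sorts them chronologically.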

3.4.13. The database must be accessed by the system through a MySQL account that has
limited privileges












Figure 5 - Database Schema




3.5 Microsoft Academic Research Scraper and Results Processing (Troy Connor and Philip McDonald)

3.5.1. Microsoft Academic Research Scraper

1. Cron job set to run once a month per user
2. Text file to split users into groups
3. Either Python or PHP to execute script
4. List of indexes from scraped sites per user in database table
5. Text file parser to read results from the text file where results were saved
6. PHP script that checks database for existing pubs/grants so no duplicates will be added
7. Regular expression text parser to compare title results (some titles are not labeled the same)
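
Requirement 7 does not say how titles are compared. A minimal Python sketch of one possible approach is shown below: normalize both titles with regular expressions and then compare them. The function names and the normalization rules (lowercasing, replacing punctuation with spaces, collapsing whitespace) are assumptions, and the character class keeps only ASCII letters and digits.

```python
import re


def normalize_title(title: str) -> str:
    """Lowercase a title, drop punctuation, and collapse whitespace."""
    lowered = title.lower()
    no_punct = re.sub(r"[^a-z0-9\s]", " ", lowered)
    return re.sub(r"\s+", " ", no_punct).strip()


def same_title(a: str, b: str) -> bool:
    """Treat two titles as equal if their normalized forms match."""
    return normalize_title(a) == normalize_title(b)
```

Comparing normalized forms tolerates differences in case and punctuation between sources, but not differences in wording.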

3.5.2. BibTex Results Parser (Philip McDonald)

1. Parser shall be triggered by initiation of scraper and input of search results.
2. The parser shall check for a valid BibTex file.
3. The parser shall fail if the file format is not valid BibTex.
4. If the parser fails due to invalid BibTex, this failure shall be written to a logfile specific to the parser component.
5. If the parser finds valid BibTex, this success shall be written to a logfile specific to the parser component.
6. The parser shall check the file for content.
7. If the parser fails to find any content in the BibTex file, this failure shall be written to a logfile specific to the parser component.



8. If the parser finds content in the BibTex file, all entries (a BibTex 'type' entry) shall be processed.
9. Each entry shall contain at least two fields:
   a. author
   b. title
10. If an entry contains both fields, the entry is 'valid' and the data shall be retained for use in the 'update database' component.
11. If an entry does not contain both fields described in requirement 9, then the entry is 'invalid' and shall not be retained for use in the 'update database' component.
12. If an entry is valid, it shall be checked for the following types:
   // use Bibtex names, refer to this
   a. 'article'
   b. 'inproceedings'
   c. 'book'
13. If an entry is of one of the types referenced in requirement 12, it will be checked for the following fields:
   a. 'year'
   b. 'volume'
   c. 'pages'
14. Certain fields are only included with specific types. The following requirements are per type.
   a. 'article' types shall be checked for the following fields:
      i. 'journal'
   b. 'inproceedings' types shall be checked for the following fields:
      i. 'booktitle'
   c. 'book' types shall be checked for the following fields:
      i. ???
   (In general, the results provided by MAS do not contain all fields as required by the BibTex standard.)
15. If entry data is found after checking the fields referenced in requirements 13 and 14.a-c, it shall be associated with the specific entry and retained for use in the "update database" component.
16. Fields shall be formatted according to the format specified in the database schema before being used in the "update database" component.

//reference schema requirements
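
The entry-level rules (requirements 9 through 15) can be sketched in Python as follows. The sketch assumes the BibTex file has already been parsed into dictionaries of the hypothetical shape `{"type": ..., "fields": {...}}`; it does not parse BibTex syntax itself, and the function names are illustrative. The field list for 'book' is left empty because the requirements leave it unspecified.

```python
from typing import Dict, List, Optional

REQUIRED_FIELDS = ("author", "title")
COMMON_OPTIONAL_FIELDS = ("year", "volume", "pages")
TYPE_SPECIFIC_FIELDS = {
    "article": ("journal",),
    "inproceedings": ("booktitle",),
    "book": (),  # unspecified in the requirements
}


def process_entry(entry: Dict) -> Optional[Dict[str, str]]:
    """Return the retained data for a valid entry, or None if it is invalid."""
    fields = entry.get("fields", {})
    # An entry is valid only if both required fields are present and non-empty.
    if any(not fields.get(name) for name in REQUIRED_FIELDS):
        return None
    retained = {name: fields[name] for name in REQUIRED_FIELDS}
    entry_type = entry.get("type", "").lower()
    retained["type"] = entry_type
    # Optional fields are only checked for the recognized entry types.
    if entry_type in TYPE_SPECIFIC_FIELDS:
        for name in COMMON_OPTIONAL_FIELDS + TYPE_SPECIFIC_FIELDS[entry_type]:
            if fields.get(name):
                retained[name] = fields[name]
    return retained


def process_entries(entries: List[Dict]) -> List[Dict[str, str]]:
    """Keep only the valid entries, in their original order."""
    results = (process_entry(entry) for entry in entries)
    return [result for result in results if result is not None]
```

Missing optional fields are simply omitted from the result, which leaves the decision about null values to the database updater.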

3.5.3. Database Updater (Philip McDonald)

1. The updater component shall be triggered by the parser component.
2. The updater shall perform the update process for all entries (the "set of entries") supplied by the parser.
3. If the set of entries is empty, the updater will not perform the updating process.
4. Each set of entries supplied by the parser shall be associated with a maximum of one author, the "current author".
5. The following requirements shall be met for each entry:
   a. The updater shall check for a duplicate publication using the data in the entry's 'title' field.
   b. If a duplicate publication is found, the updater shall check the author of the paper.
   c. If a duplicate publication is found, and the current author is not an owner of the duplicate publication, the current author shall be made an author of the publication.



   d. If a duplicate publication is found, and the current author is an owner of the duplicate publication, the entry shall be discarded.
   e. If the publication is not a duplicate, the following requirements shall be met:
      i. The 'Authors' table shall be updated to reflect ownership of the new publication.
      ii. The 'Paper' table shall be updated with a new row. The new row shall contain the data from the entry in the following format:
         1. The 'Title' attribute shall be written with the data from the entry's 'title' field.
         2. The 'Paper_Date' shall be written with the data from the entry's 'date' field.
         3. The 'Clevel' shall be set to the value corresponding to an "unapproved" publication.
         4. If a certain data item is not included in search results for a grant or publication, the database entry shall be left null.
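
The per-entry decision logic of the updater can be sketched in Python with in-memory dictionaries standing in for the database tables. The function name, the `papers` and `owners` structures, and the returned action labels are hypothetical. Exact title matching is used here for brevity, and the value 'U' for an unapproved paper comes from the Papers table requirements.

```python
from typing import Dict, List, Set


def update_database(
    entries: List[Dict[str, str]],
    current_author: int,
    papers: Dict[str, Dict],
    owners: Dict[str, Set[int]],
) -> List[str]:
    """Apply the updater rules to each entry and report what was done."""
    actions: List[str] = []
    for entry in entries:
        title = entry["title"]
        if title in papers:
            if current_author in owners[title]:
                # Duplicate already owned by the current author: discard.
                actions.append("discarded")
            else:
                # Duplicate owned by someone else: add the current author.
                owners[title].add(current_author)
                actions.append("owner_added")
        else:
            # New publication: insert it as unapproved; missing data stays null.
            papers[title] = {
                "Title": title,
                "Paper_Date": entry.get("date"),
                "Clevel": "U",
            }
            owners[title] = {current_author}
            actions.append("inserted")
    return actions
```

An empty set of entries produces no actions, which corresponds to requirement 3.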

3.5.4. Email Notifier (Troy Connor)



The email notifier is a tool that will alert the author of a publication found by the
scraper. It will allow the user to either approve or disapprove the publication that was
found. If approved, the publication will remain in the database and be viewable.
If disapproved, the publication will be deleted and will not be viewable in the
system. The email notifier will also alert authors about publications/grants that they have
uploaded. The email notifier must meet the following requirements:

1. An automatic email sent when publication is found for Author
2. Link in email to activate publication awaiting approval



3. Link in email to delete publication awaiting disapproval
4. Email notification to tell users to READ once a month
5. PHP to execute script to alert users
6. Cron job set to run email notification at intervals when required
7. Email sent to verify uploaded submission (grant or publication)
8. Email notification to alert user that profile has been changed
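
The approval email from requirements 1 through 3 can be sketched with Python's standard `email` package. The requirements name PHP for the production script, so this is only an illustration in Python. The base URL, the sender address, the `approve` and `delete` paths, and the `token` parameter are all hypothetical.

```python
from email.message import EmailMessage
from urllib.parse import urlencode

BASE_URL = "https://read.example.edu"  # hypothetical site address


def build_notification(
    to_address: str, paper_title: str, pid: int, token: str
) -> EmailMessage:
    """Build an email with one link to approve and one to delete a paper."""
    query = urlencode({"pid": pid, "token": token})
    message = EmailMessage()
    message["To"] = to_address
    message["From"] = "read-noreply@example.edu"  # hypothetical sender
    message["Subject"] = f"READ found a publication: {paper_title}"
    message.set_content(
        "\n".join(
            [
                f'The scraper found the publication "{paper_title}".',
                f"Approve: {BASE_URL}/approve?{query}",
                f"Delete: {BASE_URL}/delete?{query}",
            ]
        )
    )
    return message
```

The message object can then be handed to whatever mail transport the deployment uses. Sending is left out so that the sketch has no side effects.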