Essay/Assignment Cover Sheet



March 3, 2011



Programme / Intake

Bachelor in Information Technology

F/T

P/T


Subject Code

CSCI311

____

Subject Title:

Software Process Management


X

Submission Type:
ESSAY / SEMINAR / TUTORIAL PAPER / ASSIGNMENT



Individual Submission


Group Submission

Assignment Title:

Assignment 3



Student’s Name:

Melvyn Dominic Brown / S8203988J / 3934287_


Name in Full / NRIC* / UOW Student No.*


Wu Yang

/S8674996C/3934329
___


Name in Full / NRIC* / UOW Student No.*


Lee Wee Ying /



Name in Full / NRIC* / UOW Student No.*


Leow Lee Sze

/ _ _


Name in Full / NRIC* / UOW Student No.*



Khian Hong

/
___


Name in Full / NRIC* / UOW Student No.*








Contact No. or e-mail of group representative:

Melvyn_D_Brown@hotmail.com


X

Tutor’s Name:





PLAGIARISM: The penalty for deliberate plagiarism
is FAILURE in the subject. Plagiarism is cheating by
using the written ideas or submitted work of
someone else. The University of Wollongong has a
strong policy against plagiarism.


The University of Wollongong also endorses a policy of non-discriminatory language practice and presentation.


PLEASE NOTE: STUDENTS MUST RETAIN A COPY OF ANY WORK SUBMITTED


DECLARATION: I/We certify that this is entirely my/our own work, except where I/we have given fully-documented references to the work of others, and that the material contained in this essay has not previously been submitted for assessment in any formal course of study. I/we understand the definition and consequences of plagiarism.



Signature:







________________ ____________________


LATE SUBMISSION

A written explanation for the late submission is required, and the lecturer reserves the right not to mark the assignment if no valid reason is provided. Busy work commitments are not a valid reason, as it is the responsibility of the student to plan ahead; submission due dates were provided in the subject outline on commencement of the course.

For Official Use


Submission Deadline :




Received On :




(SIM's stamp or Lecturer's acknowledgement)



Received Late On:







Table of Contents

Revision History
Team Contribution
1.0 Introduction
    1.1 Objectives
    1.2 Requirements
    1.3 SDLC Model
2.0 Literature Review
    2.1 Search engine techniques
    2.2 Distributed System
3.0 Project Planning and management
    3.1 Team distribution
    3.2 Minutes of meeting
    3.3 Risk Management
4.0 System distribution
5.0 Test plans and system
6.0 Project Management Software
















Revision History


Date <DD/MM/YYYY>   Version   Description                            Author
                    1.0       Draft report structure                 Melvyn
                    1.1       Use cases draft, minutes of meetings   Wu Yang, Lee Sze, Khiang Hong
                    1.2                                              Lee Sze, Wu Yang, Eve, Melvyn
                    1.3                                              Eve, Khiang Hong
                    1.4                                              Eve, Wu Yang, Lee Sze, Melvyn, Khiang Hong
                    1.5       Consolidate and finalize report        Melvyn, Lee Sze



Team Contribution


NAME                   Contribution   Signature
Melvyn Dominic Brown   Contributed
Wu Yang                Contributed
Leow Lee Sze           Contributed
Low Wee Ying           Contributed
Khian Hong             Contributed







1.0 Introduction

1.1 Objectives

Building a distributed search engine

Each group is to develop a Distributed Search Engine, which can be either a web search engine that crawls the World Wide Web (WWW) or a desktop search engine that crawls the local hard disk drives.

After some research and discussion, our group decided to develop the web search engine that crawls the WWW, as we have more exposure to that area.


1.2 Requirements

We will run the distributed system on a single laptop using virtualization techniques. This saves cost and space for a small operation and is sufficient to demonstrate our web search engine.

The Search Engine must be able to search using the following functions:

• AND operation
• OR operation
• double quotes for exact word match search (e.g. “distributed search engine”)
• ( ) bracket operations (e.g. (once upon a time AND “happily ever after”))
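The four search functions listed above could be evaluated roughly as in the following sketch. It is only an illustration under stated assumptions, not the project's actual implementation: the names `build_index`, `tokenize`, `phrase_docs` and `search` are hypothetical, adjacent bare words are assumed to mean AND, AND is assumed to bind more tightly than OR, and the documents are held in a small in-memory dictionary rather than a crawled database.

```python
import re

def build_index(docs):
    """Positional inverted index: word -> {doc_id: [positions]}."""
    index = {}
    for doc_id, text in docs.items():
        for pos, word in enumerate(re.findall(r"\w+", text.lower())):
            index.setdefault(word, {}).setdefault(doc_id, []).append(pos)
    return index

def tokenize(query):
    """Split a query into '(', ')', 'AND', 'OR' and ("TERM", words) tokens."""
    query = query.replace("“", '"').replace("”", '"')  # accept curly quotes
    tokens = []
    for phrase, paren, word in re.findall(r'"([^"]*)"|([()])|(\w+)', query):
        if paren:
            tokens.append(paren)
        elif word in ("AND", "OR"):  # operators must be upper case (assumption)
            tokens.append(word)
        elif word:
            tokens.append(("TERM", [word.lower()]))
        else:  # quoted phrase: keep its words together for an exact match
            tokens.append(("TERM", re.findall(r"\w+", phrase.lower())))
    return tokens

def phrase_docs(index, words):
    """Documents in which `words` occur consecutively and in order."""
    if not words or any(w not in index for w in words):
        return set()
    hits = set()
    for doc_id, starts in index[words[0]].items():
        for start in starts:
            if all(start + i in index[w].get(doc_id, [])
                   for i, w in enumerate(words)):
                hits.add(doc_id)
                break
    return hits

def search(index, query):
    """Recursive-descent evaluation of AND, OR, quotes and brackets."""
    tokens = tokenize(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def factor():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of query")
        pos += 1
        if tok == "(":
            result = expr()
            if peek() != ")":
                raise ValueError("missing closing bracket")
            pos += 1
            return result
        if isinstance(tok, tuple):
            return phrase_docs(index, tok[1])
        raise ValueError(f"unexpected token {tok!r}")

    def and_expr():
        nonlocal pos
        result = factor()
        while peek() is not None and peek() not in (")", "OR"):
            if peek() == "AND":
                pos += 1
            result &= factor()  # adjacent terms are treated as AND
        return result

    def expr():
        nonlocal pos
        result = and_expr()
        while peek() == "OR":
            pos += 1
            result |= and_expr()
        return result

    result = expr()
    if pos != len(tokens):
        raise ValueError("unbalanced brackets")
    return result

docs = {
    1: "Once upon a time they lived happily ever after",
    2: "A distributed search engine crawls the web",
    3: "once upon a search engine",
}
index = build_index(docs)
print(sorted(search(index, '(once upon a time AND "happily ever after") OR distributed')))  # [1, 2]
```

Storing word positions in the index is what makes the exact-match (double quote) search possible; a plain word-to-document index would be enough for AND and OR alone.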



1.3 SDLC Model

The Waterfall model is considered an inflexible form of Software Development Life Cycle (SDLC), as earlier phases are rarely revisited or changed once the project has moved on to the next phase.

We recommend the Spiral model approach because it combines:

• some key aspects of the Waterfall model
• rapid prototyping methodologies

The Spiral model has the following processes:

• Formulate plans: identify the software targets, select the programs to implement the system, and clarify the project development restrictions.
• Risk analysis: assess the risks of the selected programs in order to identify and eliminate them.
• Implementation of the project: carry out the system development and verify the programs.





Image of a spiral model

Advantages of Spiral Model



• Design flexibility allows changes to be implemented at several stages of the project.
• Building up large systems in small segments makes cost calculation easier.
• Clients are involved in the development of each segment and so retain control over the direction and implementation of the project.
• Clients' knowledge of the project grows as the project grows, so that they can interface effectively with management.











1.4 Use Case Diagram










[Use case diagram: actor "user"; use case "Search keyword and view webpage"]

Use Case Element       Description
Actor Name             User
Use Case Description   User who runs and uses the web search function
Precondition           NA
Trigger                NA
Basic Flow             1) The actor enters the keywords that he wants to search for.
                       2) The actor selects a link from the search results to view the webpage.
Exception              1) If the actor enters an invalid keyword, or the database does not contain the
                          keyword, the engine returns an error message/invalid search result.




1.5 Activity Diagram



1. The Search Engine searches the database for the keyword entered by the user. If the keyword is not found, the search engine returns an error message/invalid search result.
2. If the search is successful, the results are displayed; if the search fails, the search activity exits.
3. If a search result is clicked, the webpage is displayed. If not, the search activity exits.
4. After the webpage is displayed, the search activity exits.

[Activity diagram. Nodes: Enter keyword(s) to be searched; Display Results; Display webpage; Exit. Transitions: keyword found; Invalid Keyword/Keywords Not Found; Search Result clicked.]



1.6 Sequence Diagram


1. The User inputs a keyword to the search engine.
2. The search engine searches for the keyword in the data file.
3. The data file returns the search results if there is any match in the data file.
4. The search engine displays the results to the User.
5. The User selects one of the links from the results.
6. The search engine sends a request signal to the website.
7. The website displays its webpage to the User.










[Sequence diagram. Lifelines: User, search engine, data file, website. Messages: 1: Input_keyword(); 2: Search for keyword(); 3: Return results(); 4: DisplayResults(); 5: SelectResults(); 6: SendRequest(); 7: DisplayWebsite()]



2.0 Literature Review


2.1 Search engine techniques

In the simplest terms, search engines collect data about a unique website by sending an electronic spider to visit the site and copy its content, which is stored in the search engine's database. Generally known as 'bots', these spiders are designed to follow links from one document to the next. As they copy and assimilate content from one document, they record links and send other bots to make copies of content on those linked documents.


Spiders are designed to read site content like you and I read a newspaper. Starting in the top left hand corner, a spider will read site content line by line from left to right. If columns are used (as they are in most sites), spiders will follow the left hand column to its conclusion before moving to central and right hand columns. If a spider encounters a link it can follow, it will record that link and send another spider bot to copy and record data found on the document the link leads to. The spider will proceed through the site until it records everything it can possibly find there.
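The crawling behaviour described above (copy a document, record its links, then visit the linked documents) can be sketched as a breadth-first crawl. This is a minimal illustration under stated assumptions, not the design of any particular search engine: `LinkParser`, `crawl` and the `fetch` callback are hypothetical names, a single queue stands in for dispatching additional spiders, and the example "site" is an in-memory dictionary so that no network access is needed.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkParser(HTMLParser):
    """Collects the href of every <a> tag in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; `fetch(url)` returns HTML text or None."""
    store = {}                  # url -> copied content (stands in for the database)
    queue = deque([start_url])  # links recorded but not yet visited
    seen = {start_url}          # never queue the same link twice
    while queue and len(store) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:        # unreachable document: skip it
            continue
        store[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return store

# Hypothetical four-page site used only for demonstration.
SITE = {
    "http://example.test/": '<a href="/a.html">A</a> <a href="b.html">B</a>',
    "http://example.test/a.html": '<a href="/">home</a> <a href="/c.html">C</a>',
    "http://example.test/b.html": "no links here",
    "http://example.test/c.html": '<a href="/missing.html">broken</a>',
}
pages = crawl("http://example.test/", SITE.get)
print(list(pages))
```

The `seen` set is what stops the crawl from looping when pages link back to each other, and `max_pages` bounds how much content is stored.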


Once a search spider finds your site, helping it get around is the first priority. This is best accomplished by providing easy to follow text links directed to the most important pages in the site at the bottom of each document. One of these text links should lead to a text-based sitemap, which lists and provides a text link to every document in the site. The sitemap can be the most basic page in the site as its purpose is more to direct spiders than help lost site visitors, though designers should keep site visitors in mind when creating the sitemap.


Offering spiders access to the areas of the site one wants them to access is half the battle. The other half is found in the site content. Search engines are supposed to provide their users with lists of documents that relate to user entered keyword phrases or queries. Search engines need to determine which of billions of documents is relevant to a small number of specific words. In order to do this, the search engine needs to know your site relates to those words.


There are four basic areas, or elements, a search engine looks at when examining a document. After the URL of a site, the first information a search spider records is the title of the site. Next, it examines the Description Meta tag. Both of these elements are found in the <head> section of the source code.
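As a rough illustration of how a spider might record the URL, title and Description Meta tag of a document, the sketch below uses Python's standard `html.parser` module. The names `HeadParser` and `describe` are hypothetical and chosen only for this example; real search engines are not known to work exactly this way.

```python
from html.parser import HTMLParser

class HeadParser(HTMLParser):
    """Records the <title> text and the description meta tag of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attrs = dict(attrs)
            # The name attribute is matched case-insensitively.
            if (attrs.get("name") or "").lower() == "description":
                self.description = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def describe(url, html):
    """Return the elements a spider would record for one document."""
    parser = HeadParser()
    parser.feed(html)
    return {
        "url": url,
        "title": parser.title.strip(),
        "description": parser.description,
    }

page = (
    "<html><head><title> Distributed Search </title>"
    '<meta name="Description" content="A demo page">'
    "</head><body>text</body></html>"
)
print(describe("http://example.test/", page))
```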


Titles should be written using the strongest keyword targets as the foundation. Some titles are written using two or three basic two-keyword phrases. A key to writing a good title is to remember that human readers will see the title as the reference link on the search engine results page. Don't overload your title with keyword phrases. Concentrate on the strongest keywords that best describe the topic of the document content.






Good content is the most important aspect of search engine optimization. The easiest and most basic rule of the trade is that search engine spiders can be relied upon to read basic body text 100% of the time. By providing a search engine spider with basic text content, SEOs offer the engines information in the easiest format for them to read. While some search engines can strip text and link content from Flash files, nothing beats basic body text when it comes to providing information to the spiders.


The last on-site element a spider examines when reading the site (and later relating the content to user queries) is the anchor text used in internal links. Using relevant keyword phrases in the anchor text is a basic SEO technique aimed at solidifying the search engine's perception of the relationship between documents and the words used to phrase the link.

Reference List

About the Author: Jim Hedger is a writer, speaker and search engine marketing expert based in Victoria BC. Jim writes and edits full-time for StepForth and is also an editor for the Internet Search Engine Database. He has worked as an SEO for over 5 years and welcomes the opportunity to share his experience through interviews, articles and speaking engagements.

The diagram below summarizes the points above.






2.2 Distributed System

Wu Yang to fill in this part.


















3.0 Project Planning and management

3.1 Team distribution



Team Lead: Is responsible for organizing the team and resolving problems and issues that arise. Meetings are organized every week to discuss updates, changes, problems and their solutions. He is also responsible for risk management and its solutions.

Software Architect: Is responsible for the use cases and sequence diagrams. He is to follow strictly the requirements that have been discussed in meetings and produce diagrams that meet those requirements.

Project Administrator: Is responsible for the meeting minutes and also takes on the software tester role, as we need an independent tester to test the software at its various stages.

Software Engineer: Is responsible for designing, coding and implementing the search engine system.

Software Tester: Is responsible for designing the tests and testing the software. The test plans and cases should be produced at a later stage so that testing can be conducted. She is also responsible for the Gantt chart.







3.2 Minutes of meeting


Wu Yang to put all her minutes of meetings here.


3.3 Risk Management

A risk plan is a list of all risks that threaten the project, along with a plan to mitigate some or all of those risks. The risk plan is an insurance policy against certain unforeseen situations.

In order to mitigate a risk, we first have to identify it, estimate its impact, brainstorm for a solution and then work on it.

In the early stage of software planning, we identified some risks that might be present. The risks that were identified are listed below.

Risk Assessment

Risk 1
    Risk: Limited expertise in the area of search engines.
    Impact: The project cannot be completed on time.
    Actions to be taken: Research search engines and how they work.
    Risk Open / Closed: Closed

Risk 2
    Risk: The database and programming language cannot produce the search engine that we want.
    Impact: The system cannot be produced and we will have to use a different programming language or database.
    Actions to be taken: Research, before starting on the project, whether the programming language and database can be used.
    Risk Open / Closed: Closed

Risk 3
    Risk: Lack of storage space for the information that is crawled.
    Impact: As it is the World Wide Web, we might run out of storage space.
    Actions to be taken: Analyze how much data will be used to fill up 100 gigabytes of storage space. Proceed to do the math and come up with an estimate of the storage space needed.
    Risk Open / Closed: Closed

Risk 4
    Risk: Will virtualization be better than an individual OS for each system?
    Impact: Virtualization will be slower and might cause lag to users at peak periods.
    Actions to be taken: Test the system during a peak period with a high number of users (100) accessing it at the same time.
    Risk Open / Closed: Open

Risk 5
    Risk: SQL injection to retrieve sensitive data from the system.
    Impact: Sensitive data stored in the database might be retrievable by an external user.
    Actions to be taken: Prevent SQL injection commands from being entered into the search engine.
    Risk Open / Closed: Open

Risk 6
    Risk: Will there be a backup system if the system goes down?
    Impact: If the system goes down, the search engine function cannot be accessed.
    Actions to be taken: We need a separate system located at a different location from the first system. This is for contingency purposes.
    Risk Open / Closed: Open
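One common way to address the SQL injection risk identified above is to pass the user's keyword to the database as a bound parameter instead of concatenating it into the SQL text. The sketch below illustrates the idea with Python's built-in `sqlite3` module. It is an assumption made for illustration only: the report does not state which database or language the project uses, and the `pages` table and the function names `make_db` and `find_pages` are hypothetical.

```python
import sqlite3

def make_db():
    """Create a small in-memory database of crawled pages (demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pages (url TEXT, body TEXT)")
    conn.executemany(
        "INSERT INTO pages VALUES (?, ?)",
        [
            ("http://example.test/a", "distributed search engine"),
            ("http://example.test/b", "spiral model notes"),
        ],
    )
    return conn

def find_pages(conn, keyword):
    """Return URLs whose body contains `keyword`.

    The keyword is bound with a `?` placeholder, so quotes or SQL
    keywords typed by the user are treated as data, never as SQL.
    """
    pattern = f"%{keyword}%"
    rows = conn.execute(
        "SELECT url FROM pages WHERE body LIKE ? ORDER BY url", (pattern,)
    )
    return [url for (url,) in rows]

conn = make_db()
print(find_pages(conn, "search"))
print(find_pages(conn, "x'; DROP TABLE pages; --"))
```

Note that `%` and `_` typed by the user still act as LIKE wildcards in this sketch; they widen the match but cannot change the structure of the query.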




4.0 System distribution


This section is about the virtualization (need to state how the virtualization is done).

“The amount of distribution supported, for instance, are the data distributed, or is the processing, or both? Does your system tolerate failures of individual components? 20%”

















5.0 Test plans and system

5.1 Test plans and cases




























5.2 System screen shots





