Why Desktop Search Engine - GoogleCode


30 Nov 2012




CSCI 311

Search Engine
Report

CSCI 311 Software Process Management



**********To be completed******************

1.2 Search Engine Type ----- Jizong
3.3 Desktop Search Engine Technologies ----- This one can be excluded.
6.2 Non-Functional Requirements ----- Crimson
7.2 Distribution ----- Jizong
7.3 Crawler
8.5 Network Activity Diagram ----- Jizong
8.7 Effort Estimation ----- Ziyun
8.8 Timeline ----- Crimson
11 Screenshots ----- Ziyun

Can someone add the log from Google Code (to show we really use SVN)?

Done!

**********To be completed******************




System Report




Table of Contents

1 Application Specification
1.1 Overview
1.2 Search Engine Type
1.3 IDE
1.4 Open Source Search Engine
1.5 Lucene Features
2 Why Desktop Search Engine
3 Introduction to Desktop Search Engines
3.1 What is a Desktop Search Engine
3.2 How Desktop Search Engines Work
3.3 Desktop Search Engine Technologies
4 List of Search Engines - Desktop Search
4.1 Main Features and Benefits
4.2 Building a Desktop Search Engine
4.3 Open-source Search Engines in the market
4.4 Use Lucene Indexing
5 Objective
5.1 Setting Goals and Objective
5.2 Sub Objective
5.3 Budget Breakdown
6 Functional and Non-Functional Specification
6.1 Functional Requirements
6.2 Non-Functional Requirements
7 Technical Specification
7.1 System Architecture
7.2 Distribution
7.3 Crawler
8 Project Planning
8.1 Software Methodology
8.1.1 Waterfall
8.1.2 Rational Unified Process (RUP)
8.1.3 Proposed Model
8.2 Risk and Counter Measure
8.3 Use-Case Diagram
8.4 Sequence Diagram
8.5 Network Activity Diagram
8.6 Role and Tabular Summary
8.7 Effort Estimation
8.8 Timeline
9 Test Cases
10 Version Control Ideology
10.1 Introduction
10.2 Type
10.3 Setup
10.3.1 Repository
10.3.2 Client
11 Screenshots
12 Appendices
12.1 Project Meeting Minutes
12.2 References





1 Application Specification

1.1 Overview

The application is a distributed search engine built on Apache Lucene version 3.4.0. It can search files on all local hard drives within a very short time, even when the volume of files is large. The search engine consists of two main parts that run independently of each other: the SearchIndexer and the SearchEngine.

SearchIndexer
Searches through all drives or folders and indexes files into a local index database. This database is used by the SearchEngine.

SearchEngine
Based on the index database created, the system can access the contents of files and search for any keyword. The results are returned once the search has completed.


Specification | Chosen Option
Search Engine Type | Desktop
Platform | Windows
Programming Language | Java
Integrated Development Environment (IDE) | NetBeans
Open Source Search Engine | Lucene


























1.2 Search Engine Type





1.3 IDE

NetBeans is a platform framework for Java desktop applications and an integrated development environment (IDE) for developing with Java, JavaScript and other common languages. The latest NetBeans IDE version is 7.0, which was used for project development. The NetBeans IDE is written in Java and can run anywhere a compatible JVM is installed, including Windows, Mac OS, Linux, and Solaris. A JDK is required for Java development functionality, but is not required for development in other common programming languages such as C, C++ and PHP.

There are other well-known Java development tools such as Eclipse and JCreator. However, we chose NetBeans for several reasons.

1. Everything we need is available

Many required features are available out of the box. We can just download the pack we need and start using it right away. No coding is needed for drag-and-drop controls, which saves us a lot of time.

2. Support for multiple packs

NetBeans is feature rich; we can select the appropriate packs for our different needs and resolve incompatible libraries easily.

3. Past experience

Most of the group members have experience using NetBeans for Java development of either console or GUI applications.



1.4 Open Source Search Engine

Below is the list of options our team shortlisted:

Name | Description
Sphinx | Sphinx is a free software search engine designed with indexing database content in mind. It currently supports MySQL and PostgreSQL natively. It is distributed under the terms of the GNU General Public License v2.
PhpDig | A web spider and search engine written in PHP, using a MySQL database and flat files. It builds a glossary of words found in indexed static and dynamic pages. On a search query, it displays a result page containing the search keys, ranked by occurrence. It includes a template system and can index PDF, Word, Excel, and PowerPoint documents using external tools.
Lucene | Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It offers full-text search and is cross-platform. Apache Lucene is an open source project available for free download.

We chose Lucene for the following reasons:

Active development community
This helps us pick up the search engine API quickly. References and help are easy to find on the internet.

Supported language
Lucene is written in Java, which our team is most comfortable with. This reduces research time. Being based on Java also makes our product platform independent.




1.5 Lucene Features

Features provided by Lucene:

Scalable, High-Performance Indexing
- over 95GB/hour on modern hardware
- small RAM requirements -- only 1MB heap
- incremental indexing as fast as batch indexing
- index size roughly 20-30% the size of text indexed

Powerful, Accurate and Efficient Search Algorithms
- ranked searching -- best results returned first
- many powerful query types: phrase queries, wildcard queries, proximity queries, range queries and more
- fielded searching (e.g., title, author, contents)
- date-range searching
- sorting by any field
- multiple-index searching with merged results
- allows simultaneous update and searching

Cross-Platform Solution
- Available as Open Source software under the Apache License, which lets you use Lucene in both commercial and Open Source programs
- 100%-pure Java
- Implementations in other programming languages available that are index-compatible
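
To make "ranked searching" concrete, here is a minimal plain-Java sketch that orders documents by how often a query term occurs in them. It is not Lucene's scoring formula or API; the class name `TermFrequencyRanker`, its methods and the sample documents are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: ranks documents by raw term frequency.
class TermFrequencyRanker {

    // Counts how many times `term` occurs in `text`, ignoring case.
    static int termFrequency(String text, String term) {
        int count = 0;
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.equals(term.toLowerCase())) {
                count++;
            }
        }
        return count;
    }

    // Returns the names of documents that contain `term`, best match first.
    static List<String> rank(Map<String, String> docs, String term) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : docs.entrySet()) {
            if (termFrequency(e.getValue(), term) > 0) {
                hits.add(e.getKey());
            }
        }
        // Sort by descending frequency; the sort is stable, so ties keep insertion order.
        hits.sort((a, b) -> Integer.compare(
                termFrequency(docs.get(b), term),
                termFrequency(docs.get(a), term)));
        return hits;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("a.txt", "lucene index");
        docs.put("b.txt", "lucene lucene lucene search");
        docs.put("c.txt", "nothing relevant here");
        System.out.println(rank(docs, "lucene")); // [b.txt, a.txt]
    }
}
```

Real ranking also weighs how rare a term is and how long each document is; raw counts are used here only to keep the idea visible.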


2 Why Desktop Search Engine

Our group decided to develop a desktop search engine rather than a web search engine because a desktop is simpler to handle than the web. A web search engine would need to deal with more issues, such as network protocols, page formatting, HTML parsing and so on. We therefore feel that we are able to implement and develop a desktop search engine within the given time frame with the experience and knowledge we have.



3 Introduction to Desktop Search Engines

3.1 What is a Desktop Search Engine

A desktop search engine is a tool that a user uses to search for files within the contents of their own computer. It is simply a software application that sorts through the large amount of data on one or more hard disks using some algorithm and tries to locate what the user is searching for as quickly and accurately as possible.

Desktop search is emerging as a concern for large firms for two main reasons: untapped productivity and security. A commonly cited statistic states that 80% of a company's data is locked up inside unstructured data: the information stored on an end user's PC, the files and directories they've created on a network, documents stored in repositories such as corporate intranets, and a multitude of other locations. Moreover, many companies have structured or unstructured information stored in older file formats to which they don't have ready access.

3.2 How Desktop Search Engines Work

A search engine works by indexing: it collects, parses, and stores data to facilitate fast and accurate information retrieval.

The purpose of storing an index is to optimize speed and performance in finding relevant documents for a search query. Without an index, the search engine would scan every document on the hard disk, which requires considerable processing power and takes much longer. For example, while an index of 10,000 documents can be queried within milliseconds, a sequential scan of every word in 10,000 large documents could take hours.
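
The difference between querying an index and scanning every document can be sketched with a tiny inverted index in plain Java. This is only an illustration of the idea, not how Lucene stores its index; the class `TinyInvertedIndex` and the sample file names are assumptions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of an inverted index; Lucene's real index is far more elaborate.
class TinyInvertedIndex {

    // Maps each lower-cased word to the set of document names containing it.
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Indexing step: split the text into words once and record where each word occurs.
    void add(String docName, String text) {
        for (String word : text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                postings.computeIfAbsent(word, k -> new TreeSet<>()).add(docName);
            }
        }
    }

    // Query step: a single map lookup instead of re-reading every document.
    Set<String> search(String word) {
        return postings.getOrDefault(word.toLowerCase(), Collections.<String>emptySet());
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add("report.txt", "Desktop search engine report");
        index.add("notes.txt", "Meeting notes about the search prototype");
        System.out.println(index.search("search"));  // [notes.txt, report.txt]
        System.out.println(index.search("crawler")); // []
    }
}
```

The cost of reading the documents is paid once, at indexing time; each later query touches only the entry for the word being searched.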


When indexing files, desktop search tools store three main types of data:

1) File and directory names

2) Metadata, such as titles, authors and comments in file types such as MP3, PDF and JPEG

3) Content of supported documents.
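
As a rough sketch of these three kinds of data, the plain-Java class below gathers them for a single file using `java.nio.file`. The class name `FileEntry` and its fields are hypothetical, only file-system metadata (size and last-modified time) is read, and treating `.txt` as the only supported content type is an assumption; format-specific metadata such as MP3 titles or PDF authors would need additional libraries.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

// Hypothetical record of what an indexer might store for one file.
class FileEntry {
    final String fileName;      // 1) file name
    final String directory;     // 1) directory name
    final long sizeBytes;       // 2) metadata (file-system level only in this sketch)
    final long lastModifiedMs;  // 2) metadata
    final String content;       // 3) content, only for supported (.txt) files

    FileEntry(String fileName, String directory, long sizeBytes,
              long lastModifiedMs, String content) {
        this.fileName = fileName;
        this.directory = directory;
        this.sizeBytes = sizeBytes;
        this.lastModifiedMs = lastModifiedMs;
        this.content = content;
    }

    // Reads the three kinds of data for a single file.
    static FileEntry from(Path file) throws IOException {
        BasicFileAttributes attrs = Files.readAttributes(file, BasicFileAttributes.class);
        String name = file.getFileName().toString();
        Path parent = file.toAbsolutePath().getParent();
        String content = name.toLowerCase().endsWith(".txt")
                ? new String(Files.readAllBytes(file), StandardCharsets.UTF_8)
                : "";
        return new FileEntry(name, parent == null ? "" : parent.toString(),
                attrs.size(), attrs.lastModifiedTime().toMillis(), content);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sample", ".txt");
        Files.write(tmp, "hello desktop search".getBytes(StandardCharsets.UTF_8));
        FileEntry e = from(tmp);
        System.out.println(e.fileName + " in " + e.directory
                + " (" + e.sizeBytes + " bytes): " + e.content);
        Files.delete(tmp);
    }
}
```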

For example, when the Google Desktop search application is installed, it starts indexing the PC's main drive. The process, which only takes place when the computer has been idle for 30 seconds or more, can take anywhere from several hours to a few days, depending on the volume of data.

After the drive is scanned, indexing takes place in real time with little effect on the computer's performance.

To perform a search, you simply type in the keywords or phrases you are looking for and click the search button. You will then get the results for all files that relate to your keywords.

There are different types of searches you can perform with a desktop search engine, for example keyword, phrase and Boolean searches. The example queries below describe each of these types of searches:


3.3 Desktop Search Engine Technologies


4 List of Search Engines - Desktop Search

Name | Platform | Remarks | License
Autonomy | Windows | IDOL Enterprise Desktop Search. | Proprietary, commercial
Beagle | Linux | Open source desktop search tool for Linux based on Lucene. | A mix of the X11/MIT License and the Apache License
Copernic Desktop | Windows | Considered best overall search engine in 2005 UW benchmark study. | Free for home use
Docco | Cross-platform (Java) | Based on Apache's indexing and search engine Lucene, and it requires a Java Runtime Environment. | BSD License
Docfetcher | Cross-platform | Open source desktop search tool for Windows and Linux, based on Apache Lucene. | Eclipse Public License
dtSearch Desktop | Windows | - | Proprietary (30 day trial)
Easyfind | Mac OS | - | Freeware
Everything | Windows | Find files and folders by name instantly on NTFS volumes. | Freeware
Google Desktop | Linux, Mac OS, Windows | Integrates with the main Google search engine page. 5.9 Release now supports x64 systems. | Freeware
GNOME Storage | Linux | Open Source desktop search tool for Unix/Linux. | GPL
imgSeek | Linux, Mac OS, Windows | Desktop content-based image search. | GPL v2
InSight Desktop Search | Windows | Metadata-based search utility. | Freeware
ISYS Search Software | Windows | ISYS: desktop search software. | Proprietary (14 day trial)
Likasoft Archivarius 3000 | Windows | - | Proprietary (30 day trial)
Meta Tracker | Linux, Unix | Open Source desktop search tool for Unix/Linux. | GPL v2
Recoll | Linux, Unix | Open Source desktop search tool for Unix/Linux. | GPL
Spotlight | Mac OS | Found in Apple Mac OS X "Tiger" and later OS X releases. | Proprietary
Strigi | Linux, Unix, Solaris, Mac OS X and Windows | Cross-platform open source desktop search engine. | LGPL v2
Terrier Search Engine | Linux, Mac OS, Unix | Desktop search for Windows, Mac OS X (Tiger), Unix/Linux. | MPL
Tropes Zoom | Windows | Semantic Search Engine. | Freeware and commercial
Windows Search | Windows | Part of Windows Vista and later OSs. Available as Windows Desktop Search for Windows XP and Server 2003. Does not support indexing UNC paths on x64 systems. | Proprietary, freewa



4.1 Main Features and Benefits

Normal search tools are extremely slow because they scan the hard disk for each search. This is very time consuming, as hard drives can contain hundreds of gigabytes of data. The time taken for every search is practically the same, be it for a normal text file or an email. For example, the ‘Windows search companion’ tool can search through Windows files and folders only, not e-mail or contact databases, and it is slow unless you enable the Indexing Service. With a desktop search engine, all you need to do is index the folders or drives that you want to search. The indexing might take quite a while, but after it is completed, your search results will be returned in just a few seconds. Every search thereafter takes about the same few seconds.


4.2 Building a Desktop Search Engine

We have created a simple desktop search engine with a graphical user interface (GUI) using Java Swing, in which the user can index folders and then search them. We kept it simple by indexing a specific folder rather than the whole hard disk, just to demonstrate how the Lucene API and the indexing work.


4.3 Open-source Search Engines in the market

Based on our group research, we have identified some of the available open source search engines which best fit our requirements.

A table comparing the indexing performance over this Twitter data set across the selected vertical search solutions:

Lucene was the only solution that produced an index that was smaller than the input data size. It shaves off an additional 5 megabytes if run in optimize mode, but at the cost of adding another ten seconds to indexing. Sphinx and Zettair index the fastest.


4.4 Use Lucene Indexing

Based on these preliminary results and anecdotal information collected from the web and people in the field, we chose to use Lucene (which is an IR library; a wrapper platform like Solr with Nutch adds dressings like snippets, crawlers and servlets), as it suits many vertical search indexing applications.

The reasons for doing so are as follows:



































5 Objective

5.1 Setting Goals and Objective

To develop a fully operational desktop search engine and complete documentation by 25th November 2011 that meet the requirements specified in CSCI311 Assignment 1. The estimated budget is $20000.


5.2 Sub Objective

In order to achieve our objective, we need to meet the sub-objectives stated below:

- Allocate the tasks evenly to all members in the team
- Reach consensus on decisions and tools to be used
- Thoroughly test the search engine
- Do not exceed the allocated budget




5.3 Budget Breakdown

Items | Cost | Quantity | Subtotal
Laptop | $2000 | 5 | $10000
Labour | $100 x 20 days | 5 | $10000
Total | | | $20000


…..










6 Functional and Non-Functional Specification

6.1 Functional Requirements

The system should be able to perform the following functions:

1. Crawl:
- The crawler must be able to crawl all available local drives.
- Must be able to search the content of txt files
- Must be able to index files with the desired suffixes

2. Query:
- Must be able to accept Boolean operators
- Must return the correct results
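
The crawl requirement can be sketched in plain Java as a recursive directory walk that keeps only files with a chosen suffix. This is a minimal illustration under assumptions, not the project's actual crawler: the class `SuffixCrawler` and its `crawl` method are hypothetical names, and unreadable entries are simply skipped.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

// Hypothetical crawler sketch; the class and method names are not from the project source.
class SuffixCrawler {

    // Walks `root` recursively and collects files whose names end with `suffix` (case-insensitive).
    static List<Path> crawl(Path root, String suffix) throws IOException {
        final String wanted = suffix.toLowerCase();
        final List<Path> found = new ArrayList<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (file.getFileName().toString().toLowerCase().endsWith(wanted)) {
                    found.add(file);
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) {
                // Skip unreadable files and folders instead of aborting the crawl.
                return FileVisitResult.CONTINUE;
            }
        });
        return found;
    }

    public static void main(String[] args) throws IOException {
        // With no argument, crawl the current directory.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : crawl(root, ".txt")) {
            System.out.println(p);
        }
    }
}
```

To cover every local drive, the same method could be called once for each root returned by `java.io.File.listRoots()`.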


6.2 Non-Functional Requirements

- A search query must return its results within 5 seconds 99.99% of the time
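
One way to check this response-time requirement against measured query times is sketched below in plain Java. The class `LatencyRequirement`, its methods and the sample measurements are assumptions for illustration; how the latencies are collected is left open.

```java
// Hypothetical check for the response-time requirement; the limits mirror the requirement above.
class LatencyRequirement {

    // Returns the fraction of measured query times (in ms) that finished within `limitMs`.
    static double fractionWithin(long[] latenciesMs, long limitMs) {
        if (latenciesMs.length == 0) {
            return 1.0; // no queries measured, nothing violated
        }
        int ok = 0;
        for (long ms : latenciesMs) {
            if (ms <= limitMs) {
                ok++;
            }
        }
        return (double) ok / latenciesMs.length;
    }

    // True when at least 99.99% of the queries returned within 5 seconds.
    static boolean meetsRequirement(long[] latenciesMs) {
        return fractionWithin(latenciesMs, 5_000) >= 0.9999;
    }

    public static void main(String[] args) {
        long[] sample = {120, 340, 95, 5_200}; // made-up measurements
        System.out.println(fractionWithin(sample, 5_000)); // 0.75
        System.out.println(meetsRequirement(sample));      // false
    }
}
```

Note that a 99.99% target tolerates a single slow query only once at least 10,000 queries have been measured, so a meaningful check needs a large sample.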





7 Technical Specification

7.1 System Architecture




7.2 Distribution

<How we are going to make it distributed>

7.3 Crawler

<the scope of the crawler>

<where is it supposed to run>




8 Project Planning

8.1 Software Methodology

8.1.1 Waterfall

The Waterfall model is a sequential design process in which the project phases flow from top to bottom. This model is suitable for software management because it provides certain artifacts at different stages of the model. Each stage is distinct and must be completed before proceeding to the next one.

Due to the rigid design, there is no feedback to previous stages. A change in requirements during implementation is very hard to accommodate. No working prototype is created until a late stage, which makes it harder to integrate and visualize the final product.







8.1.2 Rational Unified Process (RUP)

This model is an iterative software development process framework. The 4 phases of this model are:

- Inception:
  o Get basic requirements, identify business risks, establish scope of system
- Elaboration:
  o Design, implement and baseline an executable architecture; address major technical risks
- Construction:
  o Deploy internal and alpha releases, address user’s needs, end the phase by providing a fully functional release with supporting documentation
- Transition:
  o Ensure software meets user requirements; fine-tune, configure and install the final product

RUP is deliberately flexible, making use of the 4 phases, multiple disciplines and iterations. It is able to resolve the project risks associated with the client's evolving requirements, which require careful change request management. Less time is required for integration, as it goes on throughout the entire project.

The downside of this model is that it requires all team members to be experts in their field to develop software under this methodology. The development process is complex and may not be easy to understand.



8.1.3 Proposed Model

Our proposed model is the Waterfall model. Despite the disadvantages of the model, it is still suitable for this project.

Here are our key reasons for choosing the model:

- Well-defined requirements mean there will be little or no change during the entire project.
- The small project scale and tight schedule suit this model.
- It is easy to understand the progress of the project due to the distinct stages. This is useful for our team, as our experience in project management is limited.




8.2 Risk and Counter Measure

No | Risk | Description | Impact | Likelihood | Counter Measure | Applicable
1 | Duration | Duration for the entire project is tight | 5 | 5 | Develop project plan and monitor project progress closely | All
2 | Management | Team members’ knowledge of project management might be limited | 4 | 3 | Choose a simpler method and start off slowly | All
3 | Technical | Experience of developing software might be limited. | 3 | 3 | Use the programming language most members are proficient with. | System Architect, System Analyst
4 | Development cost | Project budget affects the tools we use to develop software | | | As there is no budget, we will opt to use open-source tools. | All
5 | Communication | As group members have other commitments, it will be hard to arrange meetings. | 4 | 2 | Meet regularly after lessons and hold online meetings to update on project progress via MSN and emails. | All
6 | Unrealistic project goal | If the project goal is unrealistic, planning for it will face many difficulties. | 4 | 3 | The project goal needs to be reviewed regularly according to the progress | All
7 | Poor reporting of project status | Poor reporting by team members will result in poor decisions by project management, which in turn affects project progress | 4 | 2 | Each module needs to be checked and integrated if possible. This helps to detect possible errors earlier. | System Analyst, tester, designer
8 | Exceed budget | This will affect the project cost | 5 | 1 | As there is no budget for this project, we will opt for options that are open-source | All
9 | Missing deadline | If a stage's deadline is delayed, the other stages cannot proceed. This will affect the entire project timeline. | 3 | 3 | Hold regular meetings to track team progress and make contingency plans for stages that are likely to be delayed | All


8.3 Use-Case Diagram





8.4 Sequence Diagram

8.5 Network Activity Diagram

8.6 Role and Tabular Summary

Roles and descriptions:

Project Manager
- Responsible for the progress of the project.
- Schedule meetings
- Allocate tasks to members
- Monitor and ensure progress to meet deadlines.

System Architect
- Design architecture
- Identify key components for the system

System Analyst
- Research existing solutions
- Design and implement modules

Tester
- Design test cases
- Carry out the tests to ensure functionality

Designer
- Design the user interface of the system to integrate the functionality

Due to the tight schedule and small project scope, team members have to take up multiple roles to speed up project progress. Below is the role allocation table:

Role | Boon Ping | Crimson | Jizong | Kelvin | Ziyun
Project Manager | | | | | x
System Architect | | x | x | |
System Analyst | x | x | x | x |
Tester | x | | | |
Designer | | | | x |
Documentation | x | x | x | x | x


8.7 Effort Estimation

8.8 Timeline




9 Test Cases

S/N | Test Case | Input | Expected Output | Actual Output | Result | Remarks
1 | Empty Field | empty | No results found | No results found | |
2 | Single word that does not exist | ASDQWEASD | No results found | No results found | |
3 | Single word | Microsoft | All TXT files containing Microsoft | All TXT files containing Microsoft | |
4 | SPACE | Microsoft Oracle | All TXT files containing Microsoft Oracle | All TXT files containing Microsoft Oracle | |
5 | AND | Microsoft AND Oracle | All TXT files containing Microsoft and Oracle | All TXT files containing Microsoft and Oracle | |
6 | OR | Microsoft OR Oracle | All TXT files containing either Microsoft or Oracle | All TXT files containing either Microsoft or Oracle | |
7 | - | Microsoft -Oracle | All TXT files containing Microsoft and without Oracle | All TXT files containing Microsoft and without Oracle | |
8 | “” | “Microsoft Oracle” | All TXT files containing the exact words Microsoft Oracle | All TXT files containing the exact words Microsoft Oracle | |
9 | AND and OR | Microsoft AND Oracle OR Apple | All TXT files containing Microsoft Oracle and Microsoft Apple | All TXT files containing Microsoft Oracle and Microsoft Apple | |
10 | AND and OR and - | Microsoft AND Oracle OR Apple -HP | All TXT files containing Microsoft Oracle without HP and Microsoft Apple without HP | All TXT files containing Microsoft Oracle without HP and Microsoft Apple without HP | |
11 | AND and OR and - and “” | Microsoft AND Oracle OR Apple -“HP” | All TXT files containing Microsoft Oracle without HP and Microsoft Apple without “HP” | All TXT files containing Microsoft Oracle without HP and Microsoft Apple without “HP” | |
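
The expected outputs in the test cases can be read as set operations over the files that contain each word. The plain-Java sketch below shows that reading for the query in test case 10, interpreted as Microsoft AND (Oracle OR Apple) without HP, which is how the expected output describes it. It is not Lucene's query parser; the class `BooleanSets`, its methods and the sample file names are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helpers: each set holds the names of files that contain one word.
class BooleanSets {

    // AND: files present in both sets.
    static Set<String> and(Set<String> a, Set<String> b) {
        Set<String> r = new TreeSet<>(a);
        r.retainAll(b);
        return r;
    }

    // OR: files present in either set.
    static Set<String> or(Set<String> a, Set<String> b) {
        Set<String> r = new TreeSet<>(a);
        r.addAll(b);
        return r;
    }

    // "-": files in `a` that are not in `b`.
    static Set<String> without(Set<String> a, Set<String> b) {
        Set<String> r = new TreeSet<>(a);
        r.removeAll(b);
        return r;
    }

    public static void main(String[] args) {
        Set<String> microsoft = new TreeSet<>(Arrays.asList("1.txt", "2.txt", "3.txt"));
        Set<String> oracle = new TreeSet<>(Arrays.asList("1.txt", "4.txt"));
        Set<String> apple = new TreeSet<>(Arrays.asList("2.txt", "3.txt"));
        Set<String> hp = new TreeSet<>(Arrays.asList("3.txt"));

        // Test case 10, read as: Microsoft AND (Oracle OR Apple), without HP.
        System.out.println(without(and(microsoft, or(oracle, apple)), hp)); // [1.txt, 2.txt]
    }
}
```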





10 Version Control Ideology

10.1 Introduction

Version control is a critical tool for a software development team because of the frequent changes to code and documents made during the entire project. It also allows multiple team members to update the same files at the same time.

Each change made to a document is recorded. This promotes accountability and makes it easier to solve problems by rolling back to an earlier version if a serious mistake is made.

The 2 main models for version control are the client-server model and the distributed model. In the client-server model, there is only a single repository on the server, and the team members use a client to update or retrieve the main copy in the repository. In the distributed approach, each team member works directly with their own repository, and changes are shared between repositories as a separate step.


10.2 Type

Open-source and Proprietary
As there is no budget for this project, we will choose to use open-source.

Client-server and Distributed
Main features of a distributed system:
- Easier to work without a network connection because you can commit changes to your own repository
- Possible to have multiple ‘central’ branches for different uses, such as development and stable branches
- Actions such as committing and viewing the history log are very fast, as there is no need to access the central server

Main features of client-server:
- Easier for a single person to keep control of the whole history and project access
- A master copy is kept centrally rather than having multiple competing versions

We decided to choose the traditional client-server approach over the distributed approach, as our team has some experience with it and its workflow is easier to understand. We do not have any restriction on network connection, so this shouldn’t be an issue.




10.3 Setup

10.3.1 Repository

Our ideal repository has to be:
- reliable, to allow team members to save changes anytime
- secure, to avoid unauthorized access to our source code and documents
- easy to set up, due to the tight project schedule
- easy to use, as not all of our team members are familiar with version control concepts

Our chosen repository is Google Code.

The project host can be set up via http://code.google.com/hosting/





10.3.2 Client

We have chosen TortoiseSVN as the client due to its ease of use. All team members can pick it up after a short period.

The TortoiseSVN client can be downloaded at http://tortoisesvn.net/. The URL to our repository is:

https://csci311-distributed-search-engine.googlecode.com/svn/trunk/

To check out, use the checkout function and enter the URL:







10.3.3 TortoiseSVN Log

Log Messages

Log Statistics



11 Screenshots

1. Text Box
   a. Enter the keywords for your search
   b. Operators such as AND, OR, -, “” and combinations of operators are supported
2. Search Button
   a. Click on this button to begin searching
3. Total Hits
   a. This is the total number of results returned from the search
4. File Name
   a. Name of a file that contains the keyword searched
   b. Double-click on the cell to open the file directly
5. Directory
   a. Directory where the file resides
   b. Double-click on the cell to open the directory



12 Appendices

12.1 Project Meeting Minutes

Meeting No: 01
Date: 28th October 2011
Location: School Lab Level 5-17C/D
Time / Duration: 2100 Hrs / 1 hour
Present: Crimson Thia, Kelvin Yap, Marcus Lin, Ng Boon Ping, Zhuo Jizong

Topics Discussed:
- Everyone to research on crawler
- Outlined project requirements

Raw Information (notes taken down during meeting):
- First of all, we need to understand what a crawler is. It is something we have been using often, but we have no experience in implementing one.
- Open source projects which we can refer to
- What are the requirements of the project, both functional and non-functional?
- What is the timeline given? Based on the timeline, we need to know what the tasks are in each phase of development


Meeting No: 02
Date: 4th November 2011
Location: School Lab Level 5-17C/D
Time / Duration: 2100 Hrs / 1 hour
Present: Crimson Thia, Kelvin Yap, Marcus Lin, Ng Boon Ping, Zhuo Jizong

Topics Discussed:
- Decision on Desktop or Web search engine
- Tasks Allocation
- Tools to be used

Raw Information (notes taken down during meeting):
- Based on our research and after discussion, we have decided to develop a desktop search engine. From the group members' perspective, a desktop search engine is easier to implement and test.
- Tasks are distributed to all the members
- Decision on which programming language to use: we decided to use Java, as there is more source code released in Java.
- Decision on which development tools to use: we have decided to use NetBeans, as it is comprehensive compared to other development tools.
- Decision on which version control to use: TortoiseSVN was chosen, as we have group members with a lot of experience using it.



Meeting No: 03
Date: 11th November 2011
Location: School Lab Level 5-17C/D
Time / Duration: 2100 Hrs / 1 hour
Present: Crimson Thia, Kelvin Yap, Marcus Lin, Ng Boon Ping, Zhuo Jizong

Topics Discussed:
- Prototype of search engine
- Documentation first draft
- Distributed system

Raw Information (notes taken down during meeting):
- Demonstration of basic indexing and searching using a console application. Discovered various problems such as duplicate indexing and failure to index.
- Integration of all documentation work done by group members. Reviewed the critical sections such as development methodology and application specification.
- Decision on the distribution techniques to be applied by our search engine.


Meeting No: 04
Date: 18th November 2011
Location: School Lab Level 5-17C/D
Time / Duration: 2100 Hrs / 1 hour
Present: Crimson Thia, Kelvin Yap, Marcus Lin, Ng Boon Ping, Zhuo Jizong

Topics Discussed:
- Demonstration of product
- Documentation second draft
- Distributed system

Raw Information (notes taken down during meeting):
- Demo of the fully functional indexing and searching with GUI and Boolean operators.
- Integration of all documentation work done by group members
- Update of project documentation






12.2 References

Apache Lucene, Apache Lucene - Overview, http://lucene.apache.org/java/docs/index.html

Apache Lucene, Lucene 3.4.0 core API, http://lucene.apache.org/java/3_4_0/api/core/index.html

TortoiseSVN, About TortoiseSVN, http://tortoisesvn.net/

Wikipedia The Free Encyclopedia, Index (search engine), http://en.wikipedia.org/wiki/Search_index

Wikipedia The Free Encyclopedia, Web search engine, http://en.wikipedia.org/wiki/Web_search_engine

Wikipedia The Free Encyclopedia, Desktop search, http://en.wikipedia.org/wiki/Desktop_search

Wikipedia The Free Encyclopedia, Systems development life-cycle, http://en.wikipedia.org/wiki/Systems_development_life-cycle

Wikipedia The Free Encyclopedia, Software development methodology, http://en.wikipedia.org/wiki/Software_development_methodologies