Oct 24, 2013

The Archaeotools project, faceted
classification and natural language processing
in an archaeological context.


University of York, April 2008

AHRC-EPSRC-JISC eScience research grants scheme:


AIM: To allow archaeologists to discover, share and analyse
datasets and legacy publications which have hitherto been very
difficult to integrate into existing digital frameworks

BUILDS UPON: Common Information
Environment Enhanced Geospatial browser

PARTNERS: Natural Language Processing
Research Group, Department of Computer
Science, University of Sheffield

Joint Information
Systems Committee


Three distinct Workpackages:

Workpackage 1 - Advanced Faceted Classification / Geo-spatial browser
1m+ records; 4 primary facets (What, Where, When and Media).

Workpackage 2 - Natural language processing / Data-mining of Grey Literature; plus tagging

Workpackage 3 - Data-mining of Historic Literature; plus geoXwalk
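To illustrate what browsing by the four primary facets (What, Where, When and Media) involves, here is a minimal sketch in Python. The records, field names and values are invented for illustration and are not taken from the project's data. It shows only the general idea of narrowing a result set by facet values and counting what remains under each facet.

```python
from collections import Counter

# Hypothetical records: field names and values are illustrative only.
RECORDS = [
    {"what": "BARROW", "where": "North Yorkshire", "when": "BRONZE AGE", "media": "text"},
    {"what": "BARROW", "where": "Wiltshire", "when": "NEOLITHIC", "media": "image"},
    {"what": "VILLA", "where": "North Yorkshire", "when": "ROMAN", "media": "text"},
    {"what": "CASTLE", "where": "North Yorkshire", "when": "MEDIEVAL", "media": "text"},
]

FACETS = ("what", "where", "when", "media")


def select(records, **chosen):
    """Keep only the records that match every chosen facet value."""
    return [r for r in records if all(r.get(f) == v for f, v in chosen.items())]


def facet_counts(records):
    """For each facet, count how many of the given records carry each value."""
    return {f: Counter(r[f] for r in records if r.get(f)) for f in FACETS}


# Narrow by two facets, then see what remains under the other facets.
hits = select(RECORDS, where="North Yorkshire", media="text")
counts = facet_counts(hits)
```

The counts are what a faceted browser would typically display next to each remaining facet value, so the user can see how many records a further selection would return.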


Datasets include:


National Monuments Records (Scotland, Wales, England)


Excavation Index (EH)


Archive Holdings


Local Authority Historic Environment Records



Thesauri include:


Thesaurus of Monument Types (TMT)


Thesaurus of Object Types


MIDAS Period list


UK Government list of administrative areas, County, District, Parish (CDP)


Not MIDAS

Work package 1

[Architecture diagram. Components shown: Oracle RDBMS; MIDAS XML Record (input); XML Docs of Thesaurus (input); Information Extraction; RDF Resource; Knowledge triple store; When, Where, What ontologies as entries to faceted index; Query; User Interface]
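As a rough illustration of the step from an XML record to triples and on to a faceted index, the sketch below uses Python's standard library only. The record layout, element names and identifier are hypothetical. Real MIDAS XML has its own schema, and the project's actual extraction and triple store are not described in these slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical record layout; real MIDAS XML uses its own schema.
RECORD_XML = """
<record id="rec-001">
  <what>ROUND BARROW</what>
  <when>BRONZE AGE</when>
  <where>Wiltshire</where>
</record>
"""


def record_to_triples(xml_text):
    """Return (subject, predicate, object) tuples for each non-empty child element."""
    root = ET.fromstring(xml_text)
    subject = root.get("id")
    triples = []
    for child in root:
        value = (child.text or "").strip()
        if value:
            triples.append((subject, child.tag, value))
    return triples


def build_facet_index(triples):
    """Map (facet, value) to the set of record ids carrying that value."""
    index = {}
    for subject, predicate, value in triples:
        index.setdefault((predicate, value), set()).add(subject)
    return index


triples = record_to_triples(RECORD_XML)
index = build_facet_index(triples)
```

A query on a facet value then becomes a dictionary lookup, for example `index[("when", "BRONZE AGE")]`.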

“WHAT”

Records that have no subject information: 19,269 of 1,001,407 records (2%)

Records that use terms not found in TMT, so these records cannot be indexed (6,442 unique terms): 101,507 of 1,001,407 records (10.1%)
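The two problem groups above (no subject term at all, and a subject term that the thesaurus does not contain) can be told apart with a simple lookup. The following is a minimal sketch, assuming a small stand-in term set in place of the real TMT and a basic case and whitespace normalisation. The project's own matching rules are not given in the slides.

```python
# Hypothetical stand-in for the thesaurus; the real TMT is far larger.
TMT_TERMS = {"ROUND BARROW", "VILLA", "CASTLE"}


def normalise(term):
    """Upper-case the term and collapse runs of whitespace."""
    return " ".join(term.upper().split())


def classify_subject(term, thesaurus=TMT_TERMS):
    """Return 'missing', 'unmatched' or 'indexable' for a record's subject term."""
    if term is None or not term.strip():
        return "missing"
    return "indexable" if normalise(term) in thesaurus else "unmatched"


def summarise(terms):
    """Tally the three outcomes and collect the distinct unmatched terms."""
    tally = {"missing": 0, "unmatched": 0, "indexable": 0}
    unmatched_terms = set()
    for term in terms:
        status = classify_subject(term)
        tally[status] += 1
        if status == "unmatched":
            unmatched_terms.add(normalise(term))
    return tally, unmatched_terms


tally, unmatched = summarise(
    ["Round  barrow", "", None, "Mystery mound", "villa", "mystery mound"]
)
```

The set of distinct unmatched terms corresponds to the kind of figure reported above as "unique terms", while the tally corresponds to the record counts.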

“WHEN”

Records that have no temporal information: 292,793 of 1,001,407 records (29.2%)

Records that use period terms not found in MIDAS, so these records cannot be indexed (457 types of irresolvable dates): 114,505 of 1,001,407 records (11.4%)

1066, 1001-1100, 11th Centuary, C11, 11C, Eleventh Century
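The six expressions listed above all fall within the eleventh century. As a hedged sketch of how such free-text dates might be mapped to a single century value, the Python below handles exactly these shapes (single year, year range, `C11`, `11C`, numeric ordinal, spelled-out ordinal) and tolerates misspellings such as "Centuary". The function names and rules are assumptions for illustration, not the project's method.

```python
import re

ORDINALS = [
    "first", "second", "third", "fourth", "fifth", "sixth", "seventh",
    "eighth", "ninth", "tenth", "eleventh", "twelfth", "thirteenth",
    "fourteenth", "fifteenth", "sixteenth", "seventeenth", "eighteenth",
    "nineteenth", "twentieth",
]
ORDINAL_WORDS = {word: n for n, word in enumerate(ORDINALS, start=1)}


def century_of_year(year):
    """AD century containing the year, e.g. 1001 to 1100 is the 11th."""
    return (year - 1) // 100 + 1


def to_century(text):
    """Return the AD century for a date expression, or None if not recognised."""
    s = text.strip().lower()
    m = re.fullmatch(r"\d{3,4}", s)  # single year, e.g. 1066
    if m:
        return century_of_year(int(s))
    m = re.fullmatch(r"(\d{3,4})\s*-\s*(\d{3,4})", s)  # year range
    if m:
        start = century_of_year(int(m.group(1)))
        end = century_of_year(int(m.group(2)))
        return start if start == end else None  # only ranges inside one century
    m = re.fullmatch(r"c(\d{1,2})", s) or re.fullmatch(r"(\d{1,2})c", s)  # C11, 11C
    if m:
        return int(m.group(1))
    m = re.fullmatch(r"(\d{1,2})(?:st|nd|rd|th)\s+centu\w*", s)  # 11th century
    if m:
        return int(m.group(1))
    m = re.fullmatch(r"([a-z]+)\s+centu\w*", s)  # eleventh century
    if m:
        return ORDINAL_WORDS.get(m.group(1))
    return None


variants = ["1066", "1001-1100", "11th Centuary", "C11", "11C", "Eleventh Century"]
centuries = [to_century(v) for v in variants]
```

Anything the rules do not recognise returns `None`, which mirrors the idea of a date that cannot be resolved and therefore cannot be indexed.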

“WHERE”

Records that have no spatial information: 11,126 of 1,001,407 records (1.1%)

Records that use terms not found in CDP, so these records cannot be indexed: 245,601 of 1,001,407 records (24.5%)
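The percentages on the WHAT, WHEN and WHERE slides can be rechecked from the reported counts. The short Python sketch below does that arithmetic. It assumes each pair of figures is read in slide order (records with no information first, records with unmatched terms second); the variable names are illustrative.

```python
TOTAL_RECORDS = 1_001_407

# Counts as reported per facet: (no information, terms not in the controlled list).
GAPS = {
    "what": (19_269, 101_507),
    "when": (292_793, 114_505),
    "where": (11_126, 245_601),
}


def share(count, total=TOTAL_RECORDS):
    """Percentage of all records, rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {
    facet: (share(missing), share(unmatched))
    for facet, (missing, unmatched) in GAPS.items()
}
```

The recomputed shares agree with the slides to one decimal place, except that the first WHAT figure comes out at 1.9%, which the slide rounds to 2%.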

linear


XML tagging of semantic content

CIDOC CRM

University Researchers

Local authority curators


http://ads.ahds.ac.uk/project/archaeotools/