NBII Metadata Clearinghouse - Ecoinformatics 2008 - Eionet Projects


Dec 7, 2013


U.S. Geological Survey

National Biological Information Infrastructure



Technical Overview:

NBII Metadata Clearinghouse




May 2008

Mike Frame

Topics for discussion


Metadata CH Background


New Metadata CH Design & Demo


Underlying Architecture



[Diagram: NBII services architecture, shown as layers. Labels recovered from the figure:

Consume, Integrated View: www.NBII.gov, PORTAL, My.NBII.gov

Distributed Services: Content Management; Integrated/Federated Search; Collaboration Services; Database and Web Services; Model Services; Geospatial Services. Component labels: ITIS, DIGR, Catalog, Thesaurus, Mapping, Geoparsing, Geo-referencing

Describe and Discover, Resource and Service Catalogs: Discovery Catalog Operations; Resource Catalog; Database and Web Services Catalog; Model Services Catalog; Geospatial Services Catalog; Geospatial Dataset Resource Clearinghouse. Standards labels: Dublin Core (plus), UDDI/WSDL ??, OGC/ISO, FGDC/ISO

Distributed Resources: Distributed Applications, Databases, Websites, Tools and Models]

Services Overview

NBII Metadata Resources

http://metadata.nbii.gov



Metadata Resources:

FGDC Metadata Program

Tool reviews

Training Opportunities

Resources for using the Standard

NBII Clearinghouse

7 Sections make up the FGDC Standard:

1. Identification Information
2. Data Quality Information
3. Spatial Data Organization Information
4. Spatial Reference Information
5. Entity and Attribute Information
6. Distribution Information
7. Metadata Reference Information

Some basic metadata facts about the FGDC Standard
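The seven sections map naturally onto a completeness check. The following is a minimal sketch, assuming a record is an XML document whose top-level children use the short element names of the FGDC standard (`idinfo`, `dataqual`, `spdoinfo`, `spref`, `eainfo`, `distinfo`, `metainfo`). The function name and the sample record are hypothetical and are not part of the Clearinghouse code.

```python
import xml.etree.ElementTree as ET

# Short element names of the seven top-level sections of the FGDC standard.
FGDC_SECTIONS = {
    "idinfo": "Identification Information",
    "dataqual": "Data Quality Information",
    "spdoinfo": "Spatial Data Organization Information",
    "spref": "Spatial Reference Information",
    "eainfo": "Entity and Attribute Information",
    "distinfo": "Distribution Information",
    "metainfo": "Metadata Reference Information",
}


def missing_sections(record_xml):
    """Return the titles of the sections that the record does not contain."""
    root = ET.fromstring(record_xml)
    present = {child.tag for child in root}
    return [title for tag, title in FGDC_SECTIONS.items() if tag not in present]


# Hypothetical, heavily abbreviated record used only for illustration.
SAMPLE = """<metadata>
  <idinfo><citation/></idinfo>
  <metainfo><metd>20080501</metd></metainfo>
</metadata>"""

for title in missing_sections(SAMPLE):
    print("missing:", title)
```

A check like this only reports which sections are absent; it does not validate the content of each section against the standard.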

NBII Metadata CH

Rationale for Metadata CH Redesign


User Feedback


Metadata creation


Metadata management


Metadata integration with data


Open architecture framework


Speed and Reliability


Data quality


Data visualization


License Costs


NBII Metadata CH provides:


Single portal to information contained in disparate data management systems


Free text, fielded, spatial, and temporal search capabilities


Allows individuals and database managers to distribute their data while maintaining complete control and ownership


Leverages investment in existing information systems and research


NBII is part of the Mercury Consortium @ ORNL

NBII CH: New Functionalities



Rich Client Interface


Combined search results (status page)


Filtering search results (facets)


Dynamic sorting of search results


Bookmark brief and full metadata pages


Based on open source technologies:


Lucene


Solr
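Faceted filtering and dynamic sorting, as listed above, correspond to standard Solr request parameters (`q`, `facet`, `facet.field`, `fq`, `sort`). The sketch below only builds such a request URL. The base URL and the field names `keyword`, `datasource`, and `title` are assumptions chosen for illustration, not the Clearinghouse's actual schema.

```python
from urllib.parse import urlencode


def build_search_url(base_url, text, facet_fields=(), filters=(), sort=None, rows=10):
    """Build a Solr select URL with optional facets, filter queries, and sorting."""
    params = [("q", text), ("rows", str(rows)), ("wt", "json")]
    if facet_fields:
        params.append(("facet", "true"))
        # One facet.field parameter per field whose value counts are wanted.
        params.extend(("facet.field", name) for name in facet_fields)
    # Each fq narrows the result set, e.g. after the user clicks a facet value.
    params.extend(("fq", query) for query in filters)
    if sort:
        params.append(("sort", sort))
    return base_url + "?" + urlencode(params)


# Hypothetical server location and field names.
url = build_search_url(
    "http://localhost:8983/solr/select",
    "salmon habitat",
    facet_fields=["keyword", "datasource"],
    filters=['datasource:"Partner A"'],
    sort="title asc",
)
print(url)
```

Because the full query state lives in the URL, the same string can also serve as a bookmark for a result page.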

NBII CH: New Functionalities (cont.)


SOA based design


Web services


RSS services for search results


Portlet support


Search Sharing support



Thesaurus Support


Seamless data ordering/data extraction with various data partners


Seamless data visualization integration with external visualization tools


Improved User Statistics Collection
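One of the items above is an RSS service for search results. As a rough sketch of the idea, the code below turns a list of result records into an RSS 2.0 feed. The record keys (`title`, `link`, `abstract`), the function name, and the example URLs are hypothetical; the actual service may be structured differently.

```python
import xml.etree.ElementTree as ET


def results_to_rss(query, results, search_url):
    """Serialize search results as an RSS 2.0 document (one item per record)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Search results for: " + query
    ET.SubElement(channel, "link").text = search_url
    ET.SubElement(channel, "description").text = "Metadata records matching the query"
    for record in results:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = record["title"]
        ET.SubElement(item, "link").text = record["link"]
        ET.SubElement(item, "description").text = record["abstract"]
    return ET.tostring(rss, encoding="unicode")


# Hypothetical result records.
SAMPLE_RESULTS = [
    {"title": "Salmon counts 2001", "link": "http://example.org/meta/1",
     "abstract": "Annual counts by river reach."},
    {"title": "Salmon habitat map", "link": "http://example.org/meta/2",
     "abstract": "Mapped spawning habitat."},
]

print(results_to_rss("salmon", SAMPLE_RESULTS, "http://example.org/search?q=salmon"))
```

A user who subscribes to such a feed sees new matching records without re-running the search by hand.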


The Clearinghouse is operated for NBII by the Oak Ridge National Laboratory

Over 38,000 records

41 partners contributing metadata records

Ability to search in a variety of ways

Redesigned in 2008


The NBII Clearinghouse

NBII CH Demo


NBII Clearinghouse interface:
http://mercdev3.ornl.gov/nbii3/


How does the NBII Clearinghouse work?




Metadata CH RSS

World Data Center


http://wdc.nbii.gov

NBII Metadata Clearinghouse Architecture

Metadata CH Architecture


CH Function of the NBII Metadata Program Operated by ORNL


NBII is 1 Organization in Mercury Consortium


Established relationship in 2001


Formerly based on “Blue Angel Technologies”


Currently based on Lucene/Solr Open Source Technologies

1. Principal investigators create detailed metadata and data files using local applications or ORNL-OME

2. NBII Mercury collects metadata and key data from contributing agencies’ servers distributed around the country and builds a centralized index

3. Remote users query the index via a Web-based browser

4. Metadata summaries are returned to the remote users, including links back to detailed information and data at the PIs’ server or data repository

5. Remote users select links to data of interest

6. Highly detailed data and documentation are downloaded directly from the contributing agency

[Diagram labels: Index; Users; Virtual Internet Database]

[Example P.I. Summary: John Smith. Product A: Container 1; 10/12/2003, Container 2; 01/20/2002, Container 3; 07/05/2001. Product B: Container 1; 03/05/1999. ….]

[Field labels: P.I. Name, Product Number, Product Title, Site, Subject Area, Thematic Area, Keywords, etc.]

Distributed Data Discovery and Access System
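The numbered workflow on this slide (harvest metadata into a central index, search the index, follow links back to the contributor) can be sketched in a few lines. This is an in-memory illustration under stated assumptions: the dictionary stands in for files on contributing servers, and the record layout (`title`, `pi`, `data`) and all URLs are hypothetical rather than Mercury's actual formats.

```python
import xml.etree.ElementTree as ET

# Stand-in for metadata files published on contributing servers (hypothetical).
PROVIDER_FILES = {
    "http://partner-a.example.org/meta/1.xml":
        "<record><title>Stream temperature 2001</title><pi>John Smith</pi>"
        "<data>http://partner-a.example.org/data/1.csv</data></record>",
    "http://partner-b.example.org/meta/7.xml":
        "<record><title>Forest plot inventory</title><pi>Jane Doe</pi>"
        "<data>http://partner-b.example.org/data/7.zip</data></record>",
}


def harvest(files):
    """Step 2: collect metadata from every provider and build a central index."""
    index = []
    for url, xml_text in files.items():
        root = ET.fromstring(xml_text)
        index.append({
            "title": root.findtext("title"),
            "pi": root.findtext("pi"),
            "metadata_url": url,                  # link back to the full record
            "data_url": root.findtext("data"),    # data stays with the contributor
        })
    return index


def search(index, term):
    """Steps 3 and 4: query the index and return summaries with links back."""
    term = term.lower()
    return [
        {"title": r["title"], "metadata_url": r["metadata_url"], "data_url": r["data_url"]}
        for r in index
        if term in r["title"].lower() or term in r["pi"].lower()
    ]


central_index = harvest(PROVIDER_FILES)
for hit in search(central_index, "stream"):
    # Steps 5 and 6: the user follows data_url to download from the contributor.
    print(hit["title"], "->", hit["data_url"])
```

Only summaries and links live in the central index; the detailed data never leaves the contributing server until a user requests it.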

[Diagram labels: Existing Database (three instances); Custom Export Program; Encrypted XML; Z39.50 or WS; Index]

Metadata exists in remote legacy databases using any platform, OS or RDBMS

Databases can be of different structures and content

Metadata are extracted into XML files yielding standardized data objects

Export programs are easily written and automated

These files can be remotely harvested via the Internet

Harvested metadata are combined at the central site, transformed (if needed), and indexed

Frequent, automated harvesting and complete re-building of the index keeps the aggregate database up to date

Users work with a single, simple, web-like interface to access all data simultaneously

No re-programming of existing systems required

Business as usual for contributing databases

A Virtual Aggregate Database
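To illustrate the claim that export programs are easy to write, here is a minimal sketch of one. It reads rows from a stand-in legacy database (an in-memory SQLite table) and writes them out as XML for harvesting. The table layout, element names, and function name are assumptions made for this example, and the encryption step shown in the diagram is deliberately left out.

```python
import sqlite3
import xml.etree.ElementTree as ET


def export_records(conn):
    """Dump every dataset row as a <record> element inside a <records> document."""
    root = ET.Element("records")
    cursor = conn.execute("SELECT id, title, keywords FROM datasets ORDER BY id")
    for row_id, title, keywords in cursor:
        record = ET.SubElement(root, "record", id=str(row_id))
        ET.SubElement(record, "title").text = title
        # The legacy table stores keywords as one comma-separated string.
        for word in keywords.split(","):
            ET.SubElement(record, "keyword").text = word.strip()
    return ET.tostring(root, encoding="unicode")


# Hypothetical legacy database with a single table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE datasets (id INTEGER PRIMARY KEY, title TEXT, keywords TEXT)")
conn.execute("INSERT INTO datasets VALUES (1, 'Bird survey 1999', 'birds, survey')")
conn.execute("INSERT INTO datasets VALUES (2, 'Wetland map', 'wetlands')")

xml_text = export_records(conn)
print(xml_text)
```

The existing database is only read, never modified, which is what lets contributors carry on with business as usual.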

NBII CH Design Diagram

[Diagram components recovered from the figure:

External Metadata (http, ftp, web crawl)

NBII CH Harvester

FGDC-BIO Transformed Files

DB updater tool (custom Java)

MySQL: Mercury3_harvests_nbii

XML Beans to extract the contents

Solr Indexer tool (custom Java)

Solr Schema for defining the fields

Index metadata records

SOLR Search Server: Extended Lucene Index

Solr Searcher (custom Java Spring)

UI, Web Service, RSS, Portlets]
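The indexing stage in this design extracts content from harvested FGDC records and hands it to Solr. The actual tools are custom Java; the sketch below shows the same idea in Python, converting one FGDC record into Solr's XML update message (`<add><doc><field name="...">`). The Solr field names `id`, `title`, `abstract`, and `keyword` are hypothetical, since the real schema is not given in these slides.

```python
import xml.etree.ElementTree as ET


def fgdc_to_solr_add(fgdc_xml, record_id):
    """Convert one FGDC record into a Solr <add> update message."""
    src = ET.fromstring(fgdc_xml)
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")

    def field(name, value):
        # Skip elements that are absent or empty in the source record.
        if value:
            ET.SubElement(doc, "field", name=name).text = value.strip()

    field("id", record_id)
    field("title", src.findtext("idinfo/citation/citeinfo/title"))
    field("abstract", src.findtext("idinfo/descript/abstract"))
    for key in src.findall("idinfo/keywords/theme/themekey"):
        field("keyword", key.text)
    return ET.tostring(add, encoding="unicode")


# Hypothetical, heavily abbreviated FGDC record.
SAMPLE_FGDC = """<metadata><idinfo>
  <citation><citeinfo><title>Bird survey 1999</title></citeinfo></citation>
  <descript><abstract>Point counts at 40 sites.</abstract></descript>
  <keywords><theme><themekey>birds</themekey><themekey>survey</themekey></theme></keywords>
</idinfo></metadata>"""

print(fgdc_to_solr_add(SAMPLE_FGDC, "nbii-0001"))
```

The resulting message would be sent to the Solr server's update handler over HTTP; that step is omitted here so the example runs without a server.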


Future Development


Phase II (May 2008 to September 2008):


Harvester engine to use open source tools (Remove COTS) (Phase I & II)


Portal integration through JSR-168 Portlet standard


Search portlets, portlets for recent datasets, most-searched words, etc.


Web service implementation (Phase I & II):


Thesaurus support (semantic web integration support)


Gazetteer web service implementation


OGC Catalog Service (include Web Mapping/Coverage/Feature Servers in search)


Universal Description, Discovery, and Integration (UDDI) Directory Services


Dynamic RSS support, including Geo-RSS support


ISO 19115 support


OpenSearch support


Documentation and Help (Phase I & II)


User Statistics Application modifications


Phase III (October 2008 to January 2009):


Save, Retrieve and Email user queries


Possible integration to OPeNDAP


Web Service Harvesting (OAI)


Internationalization


????



Search technology using Lucene/SOLR


Lucene


Overview


Who uses Lucene


Solr


Overview


Who uses Solr

Lucene Overview


High-performance, full-featured text search engine library written entirely in Java


Mature Apache Open Source Java Project


Index speed and integrity, search speed


Uses file-based full-text and inverted indexing


Is extremely fast with built-in caching


Can easily handle millions of documents


Very active mailing list for support
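The inverted indexing mentioned above is the core idea behind Lucene's search speed: each term maps to the set of documents that contain it, so a query touches only the relevant postings instead of scanning every document. The toy sketch below illustrates the data structure only. It is not Lucene's implementation, and it omits analysis, scoring, and on-disk storage.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_all(index, query):
    """Return ids of documents containing every term in the query (AND search)."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []


# Hypothetical mini-collection.
docs = {
    1: "bird survey data",
    2: "fish survey",
    3: "bird habitat map",
}
index = build_inverted_index(docs)
print(search_all(index, "bird survey"))
print(search_all(index, "survey"))
```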




Who uses Lucene





Wikipedia



MediaWiki



European Bioinformatics Institute


Liferay



Bigsearch.ca



Monster



Academic Archive On-line



Complete list:


http://en.wikipedia.org/wiki/Lucene


http://wiki.apache.org/lucene-java/PoweredBy


SOLR Overview


Open source enterprise search server based on the Lucene Java search library


Apache project, sub-project of Lucene


Advanced Full-Text Search Capabilities


Optimized for High Volume Web Traffic


Standards Based Open Interfaces: XML and HTTP


Solr uses Lucene search library and extends it

SOLR Overview (contd.)


A Real Data Schema, with Numeric Types, Date fields, Dynamic Fields


Dynamic Faceted Browsing and Filtering


Advanced, Configurable Text Analysis


Highly Configurable and User Extensible Caching


External Configuration via XML


Scalability: Efficient Replication to other Solr Search Servers


Administration Interface is available
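Faceted browsing relies on the counts Solr returns alongside the search results. With the JSON response writer in its default layout, each faceted field comes back as a flat list that alternates value and count. The sketch below turns that list into a dictionary. The field name `keyword` and the sample response are made up for illustration.

```python
import json


def facet_counts(response_text, field):
    """Extract {value: count} for one facet field from a Solr JSON response."""
    data = json.loads(response_text)
    flat = data["facet_counts"]["facet_fields"][field]
    # Default layout is [value1, count1, value2, count2, ...].
    return dict(zip(flat[0::2], flat[1::2]))


# Hypothetical response, trimmed to the part that matters here.
SAMPLE_RESPONSE = json.dumps({
    "response": {"numFound": 19, "docs": []},
    "facet_counts": {
        "facet_fields": {"keyword": ["birds", 12, "fish", 7]},
    },
})

counts = facet_counts(SAMPLE_RESPONSE, "keyword")
for value, count in counts.items():
    print(value, count)
```

A user interface can render these pairs as clickable filters, each of which adds a filter query to the next request.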

Who uses SOLR


CNET Reviews



shopper.com



AOL Music



Netflix



search.com



The Digital Commonwealth



mindquarry


For a complete list:
http://wiki.apache.org/solr/PublicServers


Mercury Instances Demo


NBII Clearinghouse interface:
http://mercdev3.ornl.gov/nbii3/


ORNL DAAC interface:
http://daac.ornl.gov/



LBA Mercury interface:
http://mercdev3.ornl.gov/lba3/


DADDI Mercury interface:
http://mercdev3.ornl.gov/daddi3/



GFIS RSS Portal interface:
http://www.gfis.net/gfis/home.faces





User Statistics Report Generation Tool

Open source Harvester Re-design (Aperture)

Questions, Comments

Mike Frame

865 576-3605

mike_frame@usgs.gov

Thanks to:

Giri Palanisamy

Systems Architect and Team Leader

Mercury Consortium


palanisamyg@ornl.gov

Vivian Hutchison

NBII Metadata Program Manager

vhutchison@usgs.gov