Data Clustering Engine

muttchess, Artificial Intelligence and Robotics

8 Nov 2013


Data Clustering Engine (DCE)
can be used with formatted or
unformatted data. The data
does not need to be cleaned
or scrubbed to achieve very
high quality results.
The quality and performance
are such that it can be used on
large and extreme volumes of
data, as well as for very high
risk data matching projects.
DATA CLUSTERING ENGINE
PERFORMS VERY THOROUGH DATA GROUPING
Thorough analysis of relationships
Data Clustering Engine is capable
of performing extremely thorough
analysis of files to discover the
relationships between people,
companies, addresses, products and
other entities, by the names, addresses,
dates, codes and other identifying
attributes recorded about them.
Easy rules selection
Selectable rules control the degree of
analysis applied to each file or clustering
project. They can be adjusted to suit
the level of matching and throughput
required, and the quality of the data.
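
As a rough illustration of how selectable rules can trade thoroughness against throughput, the sketch below defines three presets. The class `MatchRule`, the preset names and all numeric values are assumptions made for this example; they are not DCE rule names or settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchRule:
    """Hypothetical rule: how wide the search is and how strict the match is."""
    name: str
    candidate_window: int  # max candidate records compared per search key
    accept_score: float    # minimum similarity (0..1) needed to accept a match


# Illustrative presets only; names and values are invented for this sketch.
RULES = {
    "strict": MatchRule("strict", candidate_window=10, accept_score=0.95),
    "balanced": MatchRule("balanced", candidate_window=50, accept_score=0.85),
    "thorough": MatchRule("thorough", candidate_window=500, accept_score=0.75),
}


def accepts(rule: MatchRule, score: float) -> bool:
    """Return True when a similarity score passes the rule's threshold."""
    return score >= rule.accept_score
```

A wider candidate window and a lower threshold find more relationships at the cost of more comparisons, which is the kind of adjustment the text describes for poorer-quality data.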
Compensates for spelling
and word variation
The matching of names and addresses
compensates for spelling, keying, word
variation and sequence errors in the
data. It addresses the issues of multi-valued
fields, including compound names
and account structures. The matching
of dates and codes compensates for the
particular error and variation found in
this class of data. It supports data from
over 60 countries, in local character sets.
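
To make the idea of compensating for spelling and sequence variation concrete, here is a minimal sketch using Python's standard `difflib`. It is not the product's matching algorithm, which the brochure does not describe; the function names `normalize` and `name_similarity` are invented for this example.

```python
import re
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and sort words so word order is ignored."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(sorted(tokens))


def name_similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity that tolerates small spelling differences."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()
```

With this sketch, "John Smith" and "Smith, Jon" score high despite the missing letter and reversed order, while unrelated names score low.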
www.identitysystems.com
Application software to match or group multiple files
using names and addresses and identification data
IDENTITY SYSTEMS PRODUCTS
Data Clustering Engine
Handles large and extreme
volumes of data
It is a highly scalable system that
operates efficiently on single or
multi-processor computers running
Unix or NT. By using its own high-
performance database and multi-
threading techniques, large and
extreme volumes of data can be
processed, and it is not constrained
by operating system file size limits.
Achieves productive results quickly
With initial specialist help, Data
Clustering Engine can achieve
productive results in the user’s
environment in a few days. Because
of the rapid setup time, iterative
tuning with users can be based upon
actual results with production data,
rather than the typical lengthy design,
build, test and change process.
Rule Manager
The Rule Manager loads and manages
the Clustering rule definitions
under the control of a GUI Console
and Project Editor. Rule definitions
control input and output data views,
transformation logic, key-building,
search and matching rules & levels,
as well as cluster membership and
process rules.
Extract, Load, and Post
Utilities
The Extract and Load utilities read
data from flat files, or directly from
the user’s source database, and load
it into the Clustering database while
building the required SSA-NAME3 key
indexes. User-defined views specify
input file formats or database table
layouts and denormalization rules.
The Post utility extracts the Clustering
results to one or more output files
using a layout defined by the user.
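
The extract, load and post steps can be pictured with a small sketch. Everything here is an assumption for illustration: the fixed-width `VIEW` layout, the function names, and the CSV output format are not taken from DCE, and the key function passed to `load` stands in for the SSA-NAME3 key-building that the product performs.

```python
import csv
import io
from collections import defaultdict

# Hypothetical view: field name -> (start, end) offsets in a fixed-width record.
VIEW = {"id": (0, 6), "name": (6, 36), "city": (36, 56)}


def extract(lines, view):
    """Yield one dict per non-blank flat-file line, sliced according to the view."""
    for line in lines:
        if line.strip():
            yield {f: line[start:end].strip() for f, (start, end) in view.items()}


def load(records, key_fn):
    """Store records under a search key so later lookups avoid a full scan."""
    index = defaultdict(list)
    for rec in records:
        index[key_fn(rec)].append(rec)
    return index


def post(records, columns):
    """Write the chosen columns as CSV text, following a user-chosen layout."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```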
Clustering Engine
The Clustering Engine is the main
processing component. It performs
the actual Clustering process on
the data. Depending on the type of
project, this may involve clustering
all of the data in the Clustering
database, appending new data to
the database, or matching external
data against the database.
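
One common way to turn pairwise match decisions into clusters is a union-find structure; the sketch below shows that general technique, not DCE's internal method. The names `cluster` and `match_external` and the `is_match` callback are assumptions. For brevity it compares every pair of records, whereas a production system would use key indexes to limit the candidates.

```python
from itertools import combinations


def cluster(records, is_match):
    """Group records so that any two linked by a chain of matches share a cluster."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if is_match(records[i], records[j]):
            parent[find(j)] = find(i)

    groups = {}
    for i, rec in enumerate(records):
        groups.setdefault(find(i), []).append(rec)
    return list(groups.values())


def match_external(record, clusters, is_match):
    """Return the existing clusters that an external record matches."""
    return [c for c in clusters if any(is_match(record, m) for m in c)]
```

Appending new data corresponds to running the same match test for each new record against the existing clusters and merging where it succeeds.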
Console server
The Console server allows the
Clustering Engine to run on a
different computer, controlled by
a networked or remote Console
client.
Application software to run data grouping and clustering
projects using names and addresses and identification data.
Data Clustering Engine
Data Clustering Engine can be run either stand-alone or as
a component of a system or batch process. Its execution is
controlled by a GUI Console or by batch scripts.
SSA-NAME3
SSA-NAME3 is Identity Systems’ core
technology. It provides the intelligent
key-building algorithms, search
strategies and match decisions to the
Load Utility and Clustering Engine.
SSA-NAME3 makes available to the
DCE Standard Population rules for
over 60 countries and languages.
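
The SSA-NAME3 key-building algorithms are proprietary and are not described here. Purely to illustrate what a search key does, the sketch below builds a crude consonant-skeleton key per word, so that spelling variants and reordered words land under the same keys. The functions `token_key` and `build_keys` are invented for this example and handle only Latin-alphabet names.

```python
import re


def token_key(token: str) -> str:
    """Keep the first letter, drop later vowels and h/w/y, collapse repeats."""
    token = token.lower()
    key = token[0] + re.sub(r"[aeiouyhw]", "", token[1:])
    key = re.sub(r"(.)\1+", r"\1", key)  # "ll" -> "l"
    return key[:4].upper()


def build_keys(name: str) -> set:
    """Return one key per word, so word order does not affect candidate lookup."""
    return {token_key(t) for t in re.findall(r"[A-Za-z]+", name)}
```

Records that share at least one key become candidates for the more expensive match decision; records with no key in common are never compared.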
Clustering Rulebase and
Database
The Clustering Rulebase is where
the Clustering rules are loaded and
stored during a Clustering run. The
Clustering Database is where the data
is loaded and indexed and where the
Clustering process is performed. It is a
high-performance database designed
specifically to optimize Clustering
activity without the overheads of a
commercial database.
DCE Console
The DCE Console is the primary
administration tool for controlling
the Clustering rules and jobs. The
DCE Console manages various tasks
including monitoring the DCE Server,
editing and loading the Clustering
rules, defining and scheduling the
Clustering jobs, task management,
progress and logging. The Cluster
Viewer is launched from the Console.
Cluster Viewer
The Cluster Viewer is a Java GUI
application that allows a user to
review the output from a Clustering
run in a convenient screen-report
format.
Identity Systems supports identification data from over 60 countries, including all Latin-based character sets, Arabic,
Chinese (Traditional & Simplified), Cyrillic, Greek, Hebrew, Japanese, Thai and Unicode. Countries include:
Argentina
Australia
Belgium
Brazil
Canada
Chile
Czech Republic
Denmark
Finland
France
Germany
Greece
Hong Kong
Hungary
India
Indonesia
Ireland
Italy
Japan
Korea
Luxembourg
Malaysia
Mexico
Netherlands
Norway
New Zealand
Peru
Philippines
Poland
Portugal
Singapore
Spain
Sweden
Switzerland
Taiwan
Thailand
Turkey
United Kingdom
USA
Vietnam
© 2004-2007 Identity Systems, a Nokia company. All logos, brand and product names are or may be trademarks of their respective owners.
Prod_DCE_070227
For other locations and distributors, visit
www.identitysystems.com/contact.htm