Relational Evaluation Techniques


15 Oct 2013



Daniel McEnnis


Outline


Definition


Component Overview


Existing Approaches


Descriptions of the Components


Applications and Examples


Relational Evaluation Techniques: Definition


Experimental setup for evaluating the performance of algorithms whose data span more than one table or instance vector


Can use either relational algebra or hypergraph-based descriptions


Components


Data Acquisition


Ground Truth Acquisition


Cross-Validation Technique


Query Type


Scoring Metric


Significance Test


Existing Approaches


Machine Learning


Relational Machine Learning


TREC


Collaborative Filtering


ISMIR


Social Network Analysis


Machine Learning


Predetermined flat data, no sampling


Predetermined ground truth


Typically simple queries


Sophisticated cross-validation


Basic set-based metrics


No significance tests


Relational Machine Learning


Predetermined relational data


Predetermined ground truth


Predefined simple query


Sophisticated cross-validation


Basic set-based metrics


No significance tests


TREC


Predetermined flat data


Sophisticated ground truth sampling


Sophisticated queries


Machine-learning cross-validation


Ranked set-of-sets scoring


Simple significance tests


Collaborative Filtering


Predetermined flat/relational data


Predetermined ground truth


Simple, predefined query


No cross-validation


Sophisticated scoring metrics


No significance tests


ISMIR


Sampled flat data


Predetermined ground truth


Sophisticated queries


Machine-learning cross-validation


Simple set-based scoring metrics


Sophisticated significance tests


Social Network Analysis


Sophisticated data sampling


Sophisticated statistical techniques


Sequences of Choices


Experiments assembled plug-and-play from components


Different aspects are evaluated


Some algorithms simply don’t work


Extensive algorithm rewrites are sometimes needed


Data Acquisition


Data structure


Where is it?


What sampling technique to use?


Random Access


Snowball


Hypergraph Snowball


How much data is needed?
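Two of the sampling strategies named above can be sketched in a few lines of Python. This is a minimal illustration, assuming the data has already been loaded as an adjacency mapping from each actor to its neighbours; the names `random_sample`, `snowball_sample`, and the `waves` parameter are hypothetical, and the hypergraph variant (which would expand through hyperedges rather than pairwise links) is not shown.

```python
# Illustrative sketch; all function and parameter names are hypothetical.
import random
from collections import deque


def random_sample(actors, n, seed=0):
    """Uniform random sample of n actors; link structure is ignored."""
    rng = random.Random(seed)
    return set(rng.sample(sorted(actors), n))


def snowball_sample(adjacency, seeds, waves):
    """Breadth-first snowball sample: start from the seed actors and add
    every neighbour reached within `waves` hops."""
    sampled = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        actor, depth = frontier.popleft()
        if depth == waves:
            continue  # actor is on the outermost wave; do not expand it
        for neighbour in adjacency.get(actor, ()):
            if neighbour not in sampled:
                sampled.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return sampled


adjacency = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b", "e"], "e": ["d"]}
print(sorted(snowball_sample(adjacency, ["a"], waves=2)))  # ['a', 'b', 'c', 'd']
```

Random sampling keeps a link only when both of its endpoints happen to be drawn, so it tends to lose most of the relational structure; snowball sampling keeps neighbourhoods intact at the cost of biasing the sample toward the seeds.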


Ground Truth Acquisition


What is being tested?


TREC extended ground truth sampling


Structure of the output
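TREC-style ground truth is built by pooling: only documents that some participating system ranks near the top are shown to assessors, and unjudged documents are treated as non-relevant. A minimal sketch of that step, assuming each run is a ranked list of document ids for a single query; `build_pool`, `ground_truth`, `depth`, and the `judge` callback are hypothetical names, not TREC tooling.

```python
# Illustrative sketch of pooled ground-truth construction; names are hypothetical.
def build_pool(runs, depth):
    """Union of the top-`depth` documents from every system's ranked run."""
    pool = set()
    for ranking in runs.values():
        pool.update(ranking[:depth])
    return pool


def ground_truth(runs, depth, judge):
    """Ask `judge` (a stand-in for a human assessor) about pooled documents
    only; anything outside the pool is implicitly non-relevant."""
    return {doc for doc in build_pool(runs, depth) if judge(doc)}


runs = {"A": ["d1", "d2", "d3"], "B": ["d2", "d4", "d5"]}
print(sorted(build_pool(runs, depth=2)))  # ['d1', 'd2', 'd4']
```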


Cross-Validation


Actor-Based


Link-Based


Graph-Based


No Cross-Validation


Graph Notation


Actor definition


Link definition


Graph definition


Database table / instance vector equivalence


Foreign key / link equivalence
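The table/actor and foreign-key/link correspondence can be made concrete with a small data structure. This sketch assumes a toy schema held in Python dictionaries (`tables` maps a table name to its rows keyed by primary key; `foreign_keys` maps a `(table, column)` pair to the referenced table). The classes `Actor`, `Link`, `Graph`, the field `mode`, and `graph_from_tables` are hypothetical and only illustrate the mapping.

```python
# Illustrative sketch: rows become actors, foreign-key values become links.
# All class, field, and function names are hypothetical.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Actor:
    mode: str  # actor type: the table the row came from
    key: str   # the row's primary-key value


@dataclass(frozen=True)
class Link:
    relation: str  # the foreign-key column that produced the link
    source: Actor
    target: Actor


@dataclass
class Graph:
    actors: set = field(default_factory=set)
    links: set = field(default_factory=set)


def graph_from_tables(tables, foreign_keys):
    """Each row becomes an actor; each non-null foreign-key value becomes a link."""
    graph = Graph()
    for table, rows in tables.items():
        for key in rows:
            graph.actors.add(Actor(table, key))
    for (table, column), referenced in foreign_keys.items():
        for key, row in tables[table].items():
            target = row.get(column)
            if target is not None:
                graph.links.add(Link(column, Actor(table, key), Actor(referenced, target)))
    return graph


tables = {
    "user": {"u1": {}, "u2": {}},
    "artist": {"a1": {}, "a2": {}},
    "listen": {"l1": {"user": "u1", "artist": "a1"}, "l2": {"user": "u2", "artist": "a1"}},
}
foreign_keys = {("listen", "user"): "user", ("listen", "artist"): "artist"}
g = graph_from_tables(tables, foreign_keys)
print(len(g.actors), len(g.links))  # 6 4
```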


Actor Cross-Validation


Traditional Machine Learning approach


Divisions by database table


Folds usually assigned at random


Works well on flat data


Trouble with relational data
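Actor cross-validation can be sketched as a random, round-robin assignment of actors to folds. The sketch also lists the links whose two endpoints land in different folds, which is one way to see the trouble with relational data: each such link ties a training actor to a test actor. The function names and the seeded shuffle are assumptions made for illustration.

```python
# Illustrative sketch of actor-based folds; names are hypothetical.
import random


def actor_folds(actors, k, seed=0):
    """Shuffle the actors and deal them round-robin into k folds."""
    order = sorted(actors)
    random.Random(seed).shuffle(order)
    return {actor: i % k for i, actor in enumerate(order)}


def cross_fold_links(links, fold_of):
    """Links whose endpoints fall in different folds connect training
    data to test data."""
    return [(a, b) for a, b in links if fold_of[a] != fold_of[b]]


fold_of = {"a": 0, "b": 0, "c": 1}
print(cross_fold_links([("a", "b"), ("b", "c")], fold_of))  # [('b', 'c')]
```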


Link Cross-Validation


Rare machine learning approach


Divisions by foreign key reference


Less statistical independence than actor-based cross-validation


Works for collaborative filtering


Folds usually assigned at random
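Link cross-validation assigns the references themselves, rather than the actors, to folds. In collaborative filtering each link is a user-item rating, so the same user normally appears in both the training and the test folds, which is why this scheme has less statistical independence than actor-based folds. A minimal sketch with hypothetical names (`link_folds`, `split`):

```python
# Illustrative sketch of link-based folds; names are hypothetical.
import random


def link_folds(links, k, seed=0):
    """Shuffle the links (foreign-key references) and deal them into k folds."""
    order = list(links)
    random.Random(seed).shuffle(order)
    folds = [[] for _ in range(k)]
    for i, link in enumerate(order):
        folds[i % k].append(link)
    return folds


def split(folds, test_index):
    """Hold out one fold as test data; the remaining folds are training data."""
    test = folds[test_index]
    train = [link for i, fold in enumerate(folds) if i != test_index for link in fold]
    return train, test


# User-item ratings as links: each user's ratings end up spread across folds.
ratings = [(user, item) for user in ("u1", "u2") for item in ("i1", "i2", "i3")]
train, test = split(link_folds(ratings, k=3), test_index=0)
print(len(train), len(test))  # 4 2
```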


Graph Cross-Validation


Relational Machine Learning


Divisions by predetermined discrete graphs


Statistical independence


Non-learning-based approaches


Clustering-based fold generation
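When the data already splits into discrete graphs, whole connected components can be assigned to folds so that no link crosses a fold boundary. The sketch below finds components with union-find and greedily places each component, largest first, into the currently smallest fold. This is only the simplest case: when the data forms one large connected graph, a clustering step would have to replace `connected_components`. All names are hypothetical.

```python
# Illustrative sketch of graph-based folds; names are hypothetical.
def connected_components(actors, links):
    """Union-find over the links; returns a list of actor sets."""
    parent = {a: a for a in actors}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
    components = {}
    for a in actors:
        components.setdefault(find(a), set()).add(a)
    return list(components.values())


def graph_folds(actors, links, k):
    """Greedy balancing: the largest remaining component goes to the smallest fold."""
    folds = [set() for _ in range(k)]
    for component in sorted(connected_components(actors, links), key=len, reverse=True):
        min(folds, key=len).update(component)
    return folds


actors = list("abcdef")
links = [("a", "b"), ("b", "c"), ("d", "e")]
print([sorted(f) for f in graph_folds(actors, links, k=2)])  # [['a', 'b', 'c'], ['d', 'e', 'f']]
```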


No Cross-Validation


Standard overfitting problems


Useful after implied cross-validation


Query Type


Information need definition


Actor-based query


Set- or list-based query


Conditional queries


Scoring Metrics


Comparisons against ground truth


Set-based metrics


Rank-based metrics


List-based metrics


Set-Based Metrics


Recall and Precision


F-Measure


Mean Average Precision
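The set-based metrics can be sketched as follows, assuming the ground truth for each query is a set of relevant item ids. Average precision averages the precision at the rank of each relevant item, so it rewards systems that place relevant items early; its mean over queries is mean average precision (MAP). The function names are hypothetical.

```python
# Illustrative sketch of set-based scoring; names are hypothetical.
def precision_recall(retrieved, relevant):
    """Set-based precision and recall of a retrieved set against ground truth."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def average_precision(ranking, relevant):
    """Mean precision at the rank of each relevant item; relevant items
    that are never retrieved contribute zero."""
    relevant = set(relevant)
    hits, total = 0, 0.0
    for rank, item in enumerate(ranking, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(queries):
    """`queries` is a list of (ranking, relevant) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)


p, r = precision_recall(["a", "b", "c", "d"], ["a", "c", "e"])
print(p, r, f_measure(p, r))
```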


Ranked List Metrics


Pearson Correlation


Spearman’s Correlation


Mean Absolute Error


Linear Algebra Distance Metrics


Serendipity
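Several of the ranked-list metrics compare a predicted score vector with the true scores item by item. The sketch below covers Pearson correlation, Spearman's rho (Pearson correlation computed on average ranks, so ties are handled), mean absolute error, and Euclidean distance as one example of a linear-algebra distance. Serendipity has no single standard formula and is not sketched. The score lists are illustrative, not data from the source.

```python
# Illustrative sketch of ranked-list metrics; names and data are hypothetical.
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_absolute_error(predicted, actual):
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def euclidean_distance(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


predicted = [4.0, 3.5, 2.0, 5.0]
actual = [5, 3, 2, 4]
print(spearman(predicted, actual), mean_absolute_error(predicted, actual))
```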


Ordered List Metrics


Half-Life Utility


Kendall Tau


NDPM


Sequence Alignment Algorithms


Hamming Distance
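Ordered-list metrics compare positions rather than scores. This sketch shows Kendall's tau-a (assuming both lists order the same items with no ties), Hamming distance between two equal-length lists, and a half-life utility in the form commonly used in collaborative-filtering evaluation, where the item at rank `half_life` receives half the weight of the top-ranked item. NDPM and sequence-alignment scores are not shown. Names and parameters are hypothetical.

```python
# Illustrative sketch of ordered-list metrics; names are hypothetical.
from itertools import combinations


def kendall_tau(list_a, list_b):
    """Kendall's tau-a between two orderings of the same items (no ties)."""
    position_in_b = {item: i for i, item in enumerate(list_b)}
    concordant = discordant = 0
    # combinations() yields pairs in list_a order, so x precedes y in list_a.
    for x, y in combinations(list_a, 2):
        if position_in_b[x] < position_in_b[y]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


def hamming_distance(list_a, list_b):
    """Number of positions at which two equal-length lists differ."""
    if len(list_a) != len(list_b):
        raise ValueError("lists must have the same length")
    return sum(a != b for a, b in zip(list_a, list_b))


def half_life_utility(ratings_in_rank_order, neutral, half_life):
    """Sum of ratings above `neutral`, discounted so that the item at rank
    `half_life` gets half the weight of the top item (half_life > 1)."""
    return sum(max(r - neutral, 0) / 2 ** (j / (half_life - 1))
               for j, r in enumerate(ratings_in_rank_order))


print(kendall_tau(["a", "b", "c"], ["a", "c", "b"]))      # 0.3333333333333333
print(hamming_distance(["a", "b", "c"], ["a", "c", "b"]))  # 2
```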


Significance Tests


Pairwise Student’s t-test


ANOVA


ANOVA/Tukey-Kramer statistical test
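The statistics behind the first two tests can be computed with the standard library alone, assuming "pairwise" means comparing two algorithms on the same queries (a paired test) and that each algorithm produces one score per query. This sketch returns only the statistic and its degrees of freedom; turning them into p-values needs the t and F distributions (for example, SciPy's `scipy.stats.ttest_rel` and `scipy.stats.f_oneway` compute both steps). The Tukey-Kramer post-hoc comparison, which needs the studentized range distribution, is not shown. Function names and score lists are hypothetical.

```python
# Illustrative sketch of test statistics only; names and data are hypothetical.
import math
from statistics import mean, stdev


def paired_t(scores_a, scores_b):
    """Paired t statistic for two algorithms scored on the same queries."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1  # statistic, degrees of freedom


def one_way_anova_f(groups):
    """One-way ANOVA F statistic across several algorithms' score lists."""
    scores = [x for g in groups for x in g]
    grand_mean = mean(scores)
    k, n = len(groups), len(scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k)), (k - 1, n - k)


print(paired_t([3, 4, 5], [1, 2, 4]))
print(one_way_anova_f([[1, 2, 3], [4, 5, 6]]))  # (13.5, (1, 4))
```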


Evaluation Questions


Does the data contain time (a globally ordered sequence)?


Actor-, link-, graph-, or set-based queries?


List, set, or set-of-lists output?


Contextual or absolute question?


Statistical purity versus maximum information?


Music Recommendation


Example: Personalized Dynamic Tag Radio


LastFM profile data


LastFM tag data


Semantic Web data


Next-week data as ground truth


Conditional query


Graph cross-validation


Kendall Tau scoring metric


ANOVA/Tukey-Kramer statistical analysis



Conclusions


No one-size-fits-all approach


Data and ground truth set the framework


The question determines the final structure


Each discipline has a piece of the answer


Graph-RAT 0.5


Future Work


Finish exploring Social Network Analysis significance tests


Fully explore set-of-sets evaluation metrics


Debug the Graph-RAT cross-validation schedulers


Ease-of-use improvements to Graph-RAT