Detailed q2/Q2 results for 100 bootstraps for final runs with (38 + dummy features)



Ordered by correlation coefficient
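For reference, q2/Q2 in this kind of QSAR work is normally the predictive squared correlation coefficient, 1 - PRESS/SS; a minimal sketch of that computation (the example call in the comment uses made-up numbers, not the slide's data):

```python
import numpy as np

def q2(y_true, y_pred):
    """Predictive squared correlation: 1 - PRESS / total sum of squares."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    press = np.sum((y_true - y_pred) ** 2)           # predictive residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares about the mean
    return 1.0 - press / ss_tot

# e.g. q2([0.2, 0.5, 0.9], [0.25, 0.45, 0.80]) computed per bootstrap's validation set
```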

Example of a last-pass feature selection from 80 sensitivity bags

Random dummy variable

Bagged relative sensitivity from 80 bootstraps for the random dummy variable: descriptors with lower sensitivities will be eliminated in the next iteration

Descriptors that will be eliminated in the next iteration
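In code, the elimination rule shown here amounts to comparing each descriptor's bagged (bootstrap-averaged) relative sensitivity with that of the random dummy column; a small sketch using made-up sensitivity values, not the slide's data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative sensitivities: 80 bootstrap bags x (38 descriptors + 1 random dummy, last column)
sens_bags = rng.random((80, 39))

bagged = sens_bags.mean(axis=0)          # bagged relative sensitivity per input
dummy_sensitivity = bagged[-1]           # the random dummy variable's bagged sensitivity
keep = bagged[:-1] > dummy_sensitivity   # descriptors below the dummy get eliminated
print("descriptors kept for the next iteration:", np.flatnonzero(keep))
```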

STRIPMINER OPERATION MODE

Mode #6: feature selection with sensitivity analysis
(~1000 neural nets)
(Q2 = 0.46, all molecules)

Bootstraps with sensitivity analysis with a dummy variable for descriptor selection (480 → 39 descriptors)

Mode #0: train neural nets
300 bootstrap ANNs (300 neural nets trained)
Ensemble bagging for selected descriptors

Note: all ANNs are 39x13x11x1, trained to an error of 0.12, with 11 patterns in the validation set
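A hedged sketch of training one such bootstrap member, here approximated with scikit-learn's MLPRegressor rather than the Stripminer/MetaNeural tools; the 39-13-11-1 topology and the 11-pattern validation split follow the note above, while the data, seeds and stopping settings are placeholders:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X, y = rng.random((120, 39)), rng.random(120)      # placeholder data: 39 selected descriptors

# Hold out 11 patterns as the validation set used for the bagging weights (Q2 per net)
val_idx = rng.choice(len(X), size=11, replace=False)
train_idx = np.setdiff1d(np.arange(len(X)), val_idx)

# One bootstrap member: resample the training patterns with replacement
boot = rng.choice(train_idx, size=len(train_idx), replace=True)

# 39 inputs -> 13 -> 11 hidden units -> 1 output (the 39x13x11x1 topology)
net = MLPRegressor(hidden_layer_sizes=(13, 11), max_iter=5000, random_state=0)
net.fit(X[boot], y[boot])

# Q2 on the held-out validation patterns, later used as this net's bagging weight
resid = y[val_idx] - net.predict(X[val_idx])
q2_val = 1.0 - np.sum(resid ** 2) / np.sum((y[val_idx] - y[val_idx].mean()) ** 2)
```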

Mode #4: predict for the test set using bagging weights
(100x30/300 bags)
(3000 ANNs in user mode)
Bag prediction on the test set
Note: ensemble results are weighted by the Q2 calculated in Mode #0
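A minimal sketch of that Q2-weighted bag prediction, assuming each ensemble member exposes a predict() method and carries the validation Q2 computed in Mode #0 (the function name and the clipping of negative weights are illustrative choices, not the Stripminer implementation):

```python
import numpy as np

def bagged_prediction(nets, q2_scores, X_test):
    """Combine test-set predictions, weighting each member by its validation Q2."""
    preds = np.array([net.predict(X_test) for net in nets])      # shape: (n_nets, n_test)
    w = np.clip(np.asarray(q2_scores, dtype=float), 0.0, None)   # ignore members with Q2 <= 0
    return (w[:, None] * preds).sum(axis=0) / w.sum()
```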

Stripminer Neural Network Sensitivity Analysis with Dummy Feature

REPEAT
  REPEAT 100x
    Do a neural network bootstrap and calculate Q2 for the validation set
    (there is one random dummy feature; there is a validation set for bagging)
    Prepare the file for sensitivity analysis (can be up to 30 MB)
    Run the neural net in user mode for sensitivity analysis (MetaNeural)
    Calculate sensitivity results for 13 levels and tally the results in sen#.txt (SENSIT)
  CONTINUE
  Bag the sensitivities
  Reduce features by dropping those with lower sensitivity than the dummy
TEST (repeat until the dummy variable is the least sensitive feature)
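Read as an algorithm, the flowchart above is an iterative elimination loop. The sketch below shows only its control flow, assuming a hypothetical train_bootstrap_and_sensitivity() callback that stands in for the MetaNeural/SENSIT steps; it is an illustration, not the Stripminer implementation:

```python
import numpy as np

def select_features(X, y, train_bootstrap_and_sensitivity, n_bags=100, seed=0):
    """Repeat bootstrap sensitivity analysis until the random dummy column
    is the least sensitive input (the outer REPEAT ... TEST loop)."""
    rng = np.random.default_rng(seed)
    keep = np.arange(X.shape[1])                    # indices of surviving descriptors
    while True:                                     # REPEAT (outer loop)
        dummy = rng.random((len(X), 1))             # one random dummy feature
        Xd = np.hstack([X[:, keep], dummy])
        # REPEAT 100x: bootstrap, train, run sensitivity analysis, collect per-input sensitivities
        sens = np.array([train_bootstrap_and_sensitivity(Xd, y, rng) for _ in range(n_bags)])
        bagged = sens.mean(axis=0)                  # bag the sensitivities
        if bagged[-1] <= bagged[:-1].min():         # TEST: the dummy is the least sensitive input
            return keep
        keep = keep[bagged[:-1] > bagged[-1]]       # drop descriptors below the dummy
```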

Bagging and feature selection







[Figure: schematic neural network projecting molecular descriptors (molecular weight, H-bonding, boiling point, hydrophobicity, electrostatic interactions) through weighted connections (w11, w23, w34, ...) and hidden nodes onto the observable biological response]

Neural Network Sensitivity Analysis




Keep all inputs frozen at their median values
Turn one input at a time from 0 to 1
Monitor the variation in the outputs
Inputs that produce the largest output variation are the most sensitive, i.e. the most important
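A minimal sketch of that procedure for a model whose inputs are scaled to [0, 1]; the predict() interface and the 13 probe levels (matching the flowchart's sensitivity levels) are assumptions for illustration:

```python
import numpy as np

def input_sensitivities(predict, X, n_levels=13):
    """Freeze all inputs at their medians, sweep one input from 0 to 1,
    and rank inputs by the resulting output variation."""
    medians = np.median(X, axis=0)
    levels = np.linspace(0.0, 1.0, n_levels)
    sens = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        probe = np.tile(medians, (n_levels, 1))   # all inputs held at their median values
        probe[:, j] = levels                      # vary input j alone from 0 to 1
        out = np.asarray(predict(probe)).ravel()
        sens[j] = out.max() - out.min()           # larger output swing = more sensitive input
    return sens / sens.sum()                      # relative sensitivities
```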

Correlation-biased sum of the 30 best-correlated variables: seems to have a spurious correlation