Machine Learning

Usman Roshan

Dept. of Computer Science

NJIT

What is Machine Learning?

“Machine learning is programming computers to optimize a performance criterion using example data or past experience.” Intro to Machine Learning, Alpaydin, 2010

Examples:

Facial recognition

Digit recognition

Molecular classification

A little history

1946: ENIAC, the first general-purpose electronic computer, performs numerical computations

1950: Alan Turing proposes the Turing test. Can machines think?

1952: First game-playing program for checkers by Arthur Samuel at IBM. Later came knowledge-based systems such as ELIZA and MYCIN.

1957: Perceptron developed by Frank Rosenblatt. Perceptrons can be combined to form a neural network.

Early 1990’s: Statistical learning theory. Emphasizes learning from data instead of rule-based inference.

Current status: Used widely in industry; a combination of various approaches, but data-driven methods are prevalent.




Example up-close

Problem: Recognize images representing digits 0 through 9

Input: High-dimensional vectors representing images

Output: 0 through 9, indicating the digit the image represents

Learning: Build a model from “training data”

Predict “test data” with the model
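As a concrete sketch of this build-then-predict workflow, assuming scikit-learn and its bundled digits dataset (neither is specified in the slides; nearest neighbors is just one illustrative classifier):

```python
# Train a model on "training data", then predict "test data".
# Assumes scikit-learn is installed; dataset and classifier are illustrative.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

digits = load_digits()          # 8x8 digit images flattened to 64-dim vectors
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)                            # learn from training data
print("test accuracy:", model.score(X_test, y_test))  # evaluate on test data
```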


Data model

We assume that the data is represented by a set of vectors, each of fixed dimensionality.

Vector: a set of ordered numbers

We may refer to each vector as a datapoint and each dimension as a feature

Example:

A bank wishes to classify humans as risky or safe for a loan

Each human is a datapoint and is represented by a vector

Features may be age, income, mortgage/rent, education, family, current loans, and so on
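A minimal sketch of this vector representation in Python with NumPy; the feature values and label coding below are made up for illustration:

```python
import numpy as np

# One datapoint per applicant; each dimension is a feature.
# Feature order (illustrative): [age, income, mortgage/rent payment,
#                                years of education, family size, current loans]
applicant_a = np.array([34, 62000, 1400, 16, 3, 1])
applicant_b = np.array([51, 48000,  900, 12, 4, 2])

# A dataset is a matrix: one row per datapoint, one column per feature.
X = np.stack([applicant_a, applicant_b])  # shape (2, 6): 2 datapoints, 6 features
y = np.array([0, 1])                      # labels: 0 = safe, 1 = risky (assumed coding)
print(X.shape, y.shape)
```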

Machine learning datasets


NIPS 2003 feature selection contest


mldata.org


UCI machine learning repository

Machine Learning techniques we will learn in the course

Bayesian classification

Univariate and multivariate

Linear regression

Maximum likelihood estimation

Naïve Bayes

Feature selection

Dimensionality reduction

PCA

Clustering

Nearest neighbor

Decision trees and random forests

Linear discrimination

Logistic regression

Support vector machines

Kernel methods

Regularized risk minimization

Hidden Markov models

Graphical models

Perceptron and neural networks



In practice


Combination of various methods


Parameter tuning


Error trade-off vs. model complexity


Data pre-processing (see the sketch after this list)


Normalization


Standardization


Feature selection


Discarding noisy features
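As a sketch of the normalization and standardization steps above, assuming NumPy (these are the standard min-max and z-score transforms; nothing here is prescribed by the course):

```python
import numpy as np

X = np.array([[34., 62000.], [51., 48000.], [29., 75000.]])  # rows: datapoints

# Normalization (min-max): rescale each feature to the range [0, 1].
X_norm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Standardization (z-score): zero mean, unit variance per feature.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_norm)
print(X_std)
```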

Background


Basic linear algebra and probability (see the sketch after this list)


Vectors


Dot products


Eigenvector and eigenvalue


See Appendix of textbook for probability background


Mean


Variance


Gaussian/Normal distribution
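A quick NumPy sketch of these background concepts (illustrative only; the matrix and distribution parameters are made up):

```python
import numpy as np

# Vectors and dot products
u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])
print(np.dot(u, v))            # 1*4 + 2*5 + 3*6 = 32

# Eigenvectors and eigenvalues of a symmetric matrix A: A @ w = lam * w
A = np.array([[2.0, 1.0], [1.0, 2.0]])
lam, W = np.linalg.eigh(A)     # eigh handles symmetric matrices
print(lam)                     # eigenvalues: [1. 3.]

# Mean, variance, and the Gaussian/Normal distribution
samples = np.random.default_rng(0).normal(loc=5.0, scale=2.0, size=10_000)
print(samples.mean(), samples.var())   # close to mean 5 and variance 4
```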

Assignments


Implementation of basic classification algorithms with Perl and Python


Nearest Means (sketched after this list)


Naïve Bayes


K nearest neighbor


Cross validation scripts


Experiment with various algorithms on assigned datasets
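A minimal Python sketch of the nearest-means classifier from the list above (the NumPy-based structure is an assumption; the actual course scripts may be organized differently):

```python
import numpy as np

def train_nearest_means(X, y):
    """Compute one mean vector per class from the training data."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, means

def predict_nearest_means(X, classes, means):
    """Assign each datapoint to the class with the closest mean."""
    dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]

# Tiny usage example with made-up 2-D data
X_train = np.array([[0., 0.], [1., 0.], [5., 5.], [6., 5.]])
y_train = np.array([0, 0, 1, 1])
classes, means = train_nearest_means(X_train, y_train)
print(predict_nearest_means(np.array([[0.5, 0.2], [5.5, 4.8]]), classes, means))
# -> [0 1]
```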

Project

Experiment with NIPS 2003 feature selection datasets

Goal: achieve highest possible prediction accuracy with scripts we will develop through the course

Predict labels of given datasets with two different classifiers

Exams


One midterm exam


Final exam


What to expect on the exams:


Basic conceptual understanding of machine learning techniques


Be able to apply techniques to simple datasets


Basic runtime and memory requirements


Simple modifications

Grade breakdown


Assignments and project worth 50%


Exams worth 50%