Regression and
Machine Learning
Bianca Cung
Justin Hsueh
Levon Kolesnikov
Khang Lu
Least Squares
Linear Regression
An Introduction to Regression
What is Regression?
Type of Data Mining
Recall from Lecture 21: Data Mining is the
analysis of large amounts of data in order to
discover meaningful patterns
Regression models and analyzes the
correlation between several variables
Typically plotted on an XY graph
Ex: Linear Regression fits a straight line to the data
Predictions
Minimizing Errors
To make a better
model, we minimize
the errors
An error is considered
the distance between
the actual data and
the model data
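A minimal sketch of measuring error, assuming the "distance" is the vertical gap between each actual y value and the line's y value, and that the gaps are squared before being totalled (the least squares convention). The data points and the two candidate lines are hypothetical, not taken from the slides.

```python
# Hypothetical data points; not from the slides.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

def predict(a, b, x):
    """Model value for x under the line y = a + b*x."""
    return a + b * x

def errors(a, b, xs, ys):
    """Vertical distance between each actual y and the model's y."""
    return [y - predict(a, b, x) for x, y in zip(xs, ys)]

def sum_squared_error(a, b, xs, ys):
    """Total error; squaring stops positive and negative errors cancelling."""
    return sum(e * e for e in errors(a, b, xs, ys))

# Compare two candidate lines: the smaller total is the better model.
sse_flat = sum_squared_error(5.0, 0.0, xs, ys)   # horizontal line y = 5
sse_steep = sum_squared_error(0.0, 2.0, xs, ys)  # line y = 2x
print(sse_flat, sse_steep)
```

Here the line y = 2x leaves a much smaller total error than the flat line, so it is the better of the two candidates.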
Variance and the Best Fit Line
Variance: a measure of how far a set of
numbers is spread out
The regression line is drawn so that the total
error in its predictions for all of the
already known values is minimal
Whether the variance is large or small, as long as
the total error is at its minimum, the
line is the best fit
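As a rough sketch, the best fit line for one predictor can be computed directly with the standard least squares formulas (slope = Sxy / Sxx, intercept = mean(y) - slope * mean(x)). The function names and the data are illustrative assumptions; the data is generated from y = 1 + 2x so the result is easy to check.

```python
def mean(values):
    return sum(values) / len(values)

def variance(values):
    """Population variance: average squared distance from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def best_fit_line(xs, ys):
    """Return (intercept, slope) minimising the sum of squared vertical errors."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope

# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
intercept, slope = best_fit_line(xs, ys)
print(intercept, slope, variance(ys))
```

With noisy data the same function still returns the line with the smallest total squared error, whatever the variance of the points.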
Variance Affecting Predictions
Correlation and Causation
Remember: Regression is a model of correlation
between two or more variables
Correlation does not imply causation
Correlation: Shows a connection between 2 or
more things
Causation: Second event only arises if first event
occurs
Examples:
Correlation: Getting an A on the final correlates with
an A in the class
Causation: Getting a 90% or higher on a test causes
a grade of A on the test
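One common way to put a number on correlation is the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below is an illustration only: the exam and course scores are made up, and a value near 1 says the two move together, not that one causes the other.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(sxx * syy)

# Hypothetical scores: final exam vs. overall course score.
final_exam = [60.0, 70.0, 80.0, 90.0]
course = [65.0, 72.0, 81.0, 88.0]
r = pearson_r(final_exam, course)
print(r)
```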
Assumptions
Data is representative of the whole population
Error is a random variable
Constant variance
(Linear independence)
(Uncorrelated errors)
Other Types…
What if your data isn’t a straight line…?
Logarithmic Regression
Y = a + b (ln x)
Quadratic Regression
Y = a*x^2 + b*x + c
Power Regression
Y = a*x^b
Exponential Regression
Y = a*b^x
Try Other Transformations…
Divide by X
An example…
Transform the Y
Transform both Y and X
Y = a*exp(bx) is equivalent to…
ln Y = ln (a) + bx
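The log transform above turns an exponential fit into an ordinary straight-line fit. A minimal sketch, assuming every Y is positive; `fit_line` and `fit_exponential` are hypothetical helper names and the data is generated from Y = 2*exp(0.5x) so the recovered values can be checked.

```python
import math

def fit_line(xs, ys):
    """Least squares (intercept, slope) for y = intercept + slope*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

def fit_exponential(xs, ys):
    """Fit Y = a*exp(b*x) by regressing ln(Y) on x; requires every y > 0."""
    log_ys = [math.log(y) for y in ys]
    ln_a, b = fit_line(xs, log_ys)
    return math.exp(ln_a), b

# Hypothetical data generated from Y = 2*exp(0.5*x).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
a, b = fit_exponential(xs, ys)
print(a, b)
```

Note that this minimizes squared error in ln Y rather than in Y itself, so on noisy data it can differ slightly from a direct exponential fit.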
Multiple Regression
Mile Time   Gender   Height (inches)   Weight (lbs)   Age   GPA
10          1        62.2              120            20    3.3
11          0        64.5              166            21    2.8
7           1        70.1              132            18    4.0
8           0        75.0              133            23    1.6
14          0        58.9              121            19    3.7
10          0        68.8              100            25    3.5
Multiple Regression
Y = β_0 + β_1(gender) + β_2(height) + β_3(weight) + β_4(age) + β_5(gpa)
3+ variables and 1000+ observations
Use a computer!
As before, you can transform these variables if
the model does not fit
Be cautious about the independence of variables
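A sketch of what "use a computer" can mean here: the least squares coefficients solve the normal equations (X^T X) β = X^T y. The code below does this in plain Python with Gaussian elimination. The function names and the two-predictor data set are hypothetical; the data is generated from y = 1 + 2*x1 + 3*x2 so the answer is known. In practice a statistics package such as R would normally be used instead.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may not be independent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_multiple_regression(rows, ys):
    """Least squares coefficients [b0, b1, ...] for y = b0 + b1*x1 + ..."""
    X = [[1.0] + list(row) for row in rows]  # leading 1 is the intercept column
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * y for r, y in zip(X, ys)) for i in range(k)]
    return solve_linear_system(XtX, Xty)

# Hypothetical observations generated from y = 1 + 2*x1 + 3*x2.
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 6.0)]
ys = [1.0 + 2.0 * x1 + 3.0 * x2 for x1, x2 in rows]
coeffs = fit_multiple_regression(rows, ys)
print(coeffs)
```

The singular-system check is where dependent predictors show up: if one variable is an exact combination of the others, the normal equations have no unique solution.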
Machine Learning
An application of Regression
What is Machine learning?
Part of artificial intelligence
Creating algorithms allowing computers to evolve
behaviors based on empirical data
Regression
Automatically learn and recognize complex
patterns and make decisions
The difficulty is that the set of possible behaviors can
be too large
Supervised Learning: Regression Problem
Autonomous Driving
Learning algorithm: gradient descent
Digitizes the road ahead and records the
person’s steering directions
Once learned…
digitizes the road
feeds the image to its neural networks.
Measure each steering direction’s confidence
ALVINN: a system of artificial
neural networks
Gradient Descent
Similar to Directed Random
Search
Find an optimal hypothesis
function
Pick a random point on the graph
and step in the direction of
steepest descent.
Repeat until local minimum
reached
Update parameters
Continue until the hypothesis
function converges
This function has the least overall
error, at least locally
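The loop described above can be sketched for the simplest hypothesis function, a line y = theta0 + theta1*x, with mean squared error as the quantity being minimized. This is an illustration under assumptions: the learning rate, step count, and data are hypothetical, and the start point is fixed at (0, 0) rather than random so the run is repeatable.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Fit y = theta0 + theta1*x by batch gradient descent on mean squared error."""
    theta0, theta1 = 0.0, 0.0  # fixed start; the slides suggest a random one
    n = len(xs)
    for _ in range(steps):
        # Error of the current hypothesis on every data point.
        errs = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error.
        grad0 = 2.0 / n * sum(errs)
        grad1 = 2.0 / n * sum(e * x for e, x in zip(errs, xs))
        # Update both parameters together, stepping downhill.
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1

# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
theta0, theta1 = gradient_descent(xs, ys)
print(theta0, theta1)
```

For a linear hypothesis the error surface has a single minimum, so the local minimum found is also the global one. A learning rate that is too large makes the updates diverge instead of converge.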
Unsupervised Learning
Tries to find structure in unlabeled data
Clustering
Cocktail party Problem
An unsupervised learning algorithm can separate the two sources of
sound
In Matlab, this can be done in one line:
[W,s,v] = svd((repmat(sum(x.*x,1),size(x,1),1).*x)*x');
Checkers Program
Program plays checkers with itself
Sees which positions lead to a win
How to Make $$$ with
Machine Learning
Sentiment Analysis
Usually used in texts
(type of text classification)
Popular Machine Learning application
May be used in conjunction with speech
recognition as well
Relevance
Social Media
Twitter, Facebook, YouTube, Yelp, etc.
Track ad campaigns
Pattern Analysis
Machine doesn’t “understand” text
It only sees strings of characters
Favorable Reviews
The word “sweet” – favorable 46,
unfavorable 22
Repeat for other words “pleasant” (156)
Add values up – Naïve Bayes Classifier
Location of words
Algorithm not perfect – “bag of words”
Modifiers – “not”
Other difficulties
This ___ makes ____ look like a great
_____
Some sentences are difficult even for
humans
“good ___ but ____”
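The word-count idea can be sketched as a simplified, Naive-Bayes-style scorer: each word gets a value from how often it appears in favorable versus unfavorable reviews, and the values are added up. This is a sketch under assumptions, not a full Naive Bayes classifier (it ignores class priors and total word counts). Only the counts for "sweet" come from the slides; the other counts are made up.

```python
import math

# (favorable count, unfavorable count) per word.
# "sweet" uses the counts from the slide; the other rows are hypothetical.
WORD_COUNTS = {
    "sweet": (46, 22),
    "boring": (5, 40),
    "great": (80, 20),
}

def word_score(word):
    """Log-ratio of favorable to unfavorable counts, with add-one smoothing."""
    fav, unfav = WORD_COUNTS.get(word, (0, 0))
    return math.log((fav + 1) / (unfav + 1))

def review_score(text):
    """Bag-of-words score: sum the per-word values, ignoring word order."""
    return sum(word_score(w) for w in text.lower().split())

def classify(text):
    return "favorable" if review_score(text) > 0 else "unfavorable"

print(classify("sweet great"))
print(classify("boring"))
print(classify("not great"))  # misclassified: "not" carries no score here
```

The last call shows the bag-of-words weakness: because word order and modifiers are ignored, "not great" still scores as favorable.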
Process
Training data
Use training data to create model
Test model on new data
See accuracy of model
Refine model
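The process above can be sketched end to end with a toy word-count model: build the model from labeled training reviews, then measure accuracy on reviews it has not seen. All reviews, labels, and function names here are hypothetical, and the classifier is deliberately crude.

```python
from collections import Counter

def train(labeled_reviews):
    """Count how often each word appears under each label."""
    counts = {"favorable": Counter(), "unfavorable": Counter()}
    for text, label in labeled_reviews:
        counts[label].update(text.lower().split())
    return counts

def predict(model, text):
    """Pick the label whose training reviews used these words more often."""
    words = text.lower().split()
    fav = sum(model["favorable"][w] for w in words)
    unfav = sum(model["unfavorable"][w] for w in words)
    return "favorable" if fav >= unfav else "unfavorable"

def accuracy(model, labeled_reviews):
    """Fraction of reviews whose predicted label matches the true label."""
    correct = sum(1 for text, label in labeled_reviews
                  if predict(model, text) == label)
    return correct / len(labeled_reviews)

# Hypothetical training and test data.
training_data = [
    ("sweet and pleasant", "favorable"),
    ("great fun", "favorable"),
    ("dull and boring", "unfavorable"),
    ("boring mess", "unfavorable"),
]
test_data = [
    ("pleasant fun", "favorable"),
    ("dull mess", "unfavorable"),
]

model = train(training_data)
print(accuracy(model, test_data))
```

Keeping the test data separate from the training data is what makes the accuracy figure meaningful; refining the model means changing `train` or `predict` and measuring again.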
Improvements?
Sets of words
“this will blow your mind”
Group identical words together to improve
“goodness” value (enjoy + enjoyed)
Eliminate useless words such as “the”
Attach words to other words (adjectives to
nouns, etc)
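A rough sketch of these improvements as a preprocessing step: drop common filler words, strip a couple of suffixes so "enjoy" and "enjoyed" count as one word, and pair up neighboring words so short phrases can be scored as a unit. The stop-word list and the suffix rule are tiny hypothetical stand-ins for what a real system would use, and adjacent-word pairs are only a crude proxy for attaching adjectives to nouns.

```python
STOP_WORDS = {"the", "a", "an", "of", "and"}  # hypothetical, deliberately tiny

def crude_stem(word):
    """Strip a common suffix so 'enjoy' and 'enjoyed' share one key."""
    for suffix in ("ing", "ed"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lowercase, strip punctuation, drop stop words, and stem."""
    words = [w.strip(".,!?\"'") for w in text.lower().split()]
    return [crude_stem(w) for w in words if w and w not in STOP_WORDS]

def bigrams(tokens):
    """Pairs of adjacent words, so phrases can be counted as a unit."""
    return list(zip(tokens, tokens[1:]))

print(preprocess("I enjoyed the movie!"))
print(bigrams(preprocess("this will blow your mind")))
```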
Where is all this useful?
Business
Economics, Engineering, Marketing, Computer
Science
Physical Sciences
Physics, Astronomy, Chemistry
Health & Medicine
Genetics, Clinical Trials, Epidemiology,
Pharmacology
Government
Census, Law, National Defense
Environment
Agriculture, Ecology, Forestry, Animal Populations
Some fun resources…
Sheather, Simon – “A Modern Approach to Regression with R”
Exponential Regression Applet
http://science.kennesaw.edu/~plaval/applets/ERegression.html
Your Graphing Calculator
“R”
Support Vector Machine
ftp://ftp.cs.wisc.edu/mathprog/talks/csnatt.ppt
University of Illinois at Chicago
http://www.cs.uic.edu/~liub/FBS/sentimentanalysis.html