# Assignment #9 and Mini-Midterm

Oct 15, 2013

STAT 425 Modern Methods of Data Analysis (23 + 54 pts.)

Assignment 9: Treed Regression, and a Mini-Midterm

PROBLEM 1: PREDICTING THE AGE OF AN ABALONE

The age of an abalone is determined by cutting the shell through the cone, staining it, and counting the number of rings through a microscope, a boring and time-consuming task. Other measurements, which are easier to obtain, are often used to predict the age.

Further information, such as weather patterns and location (hence food availability), may be required to solve the problem.

Attribute Information:

Given are the attribute name, attribute type, measurement unit, and a brief description, followed by the variable name used in the data frame. The number of rings is the value to predict. These data are contained in the data frame Abalone.

Name / Data Type / Measurement Unit / Description / Variable in Abalone

Length / continuous / mm / longest shell measurement / length
Diameter / continuous / mm / perpendicular to length / diam
Height / continuous / mm / with meat in shell / height
Whole weight / continuous / grams / whole abalone / whole.weight
Shucked weight / continuous / grams / weight of meat / shucked.weight
Viscera weight / continuous / grams / gut weight (after bleeding) / visc.weight
Shell weight / continuous / grams / after being dried / shell.weight
Rings / integer / -- / +1.5 gives the age in years / Rings

a) Develop a model (gbm) to predict the number of rings. Show or explain how you chose the tuning parameters (the shrinkage, $J_m$, and $M$). (5 pts.)

b) Find the $R^2$ for your final model and estimate RMSEP using the residuals from the fit. Plot $y$ vs. $\hat{y}$ and $\hat{e}$ vs. $\hat{y}$. Also look at partial plots for the predictors using the gbm.plot() command. Discuss all. (6 pts.)
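As a rough illustration of the quantities asked for in part (b), here is a minimal base R sketch. It uses synthetic data and an `lm` fit purely as a stand-in for the final model; the names `x`, `y`, `fit`, `y_hat`, and `e_hat` are hypothetical, and for a gbm fit the fitted values would come from `predict()` rather than `fitted()`. Keep in mind that an RMSEP estimate based on training residuals tends to be optimistic.

```r
# Synthetic data for illustration only (not the Abalone data).
set.seed(425)
x <- runif(100)
y <- 1 + 2 * x + rnorm(100, sd = 0.3)

# Stand-in for the final model; any fitted model with predictions works.
fit <- lm(y ~ x)

y_hat <- fitted(fit)   # fitted values
e_hat <- y - y_hat     # residuals

# R^2 = 1 - SSE / SSTO
r2 <- 1 - sum(e_hat^2) / sum((y - mean(y))^2)

# Residual-based estimate of RMSEP (root mean squared residual).
rmsep_resid <- sqrt(mean(e_hat^2))

# Observed vs. fitted, and residuals vs. fitted.
plot(y_hat, y, xlab = "fitted", ylab = "observed")
abline(0, 1)
plot(y_hat, e_hat, xlab = "fitted", ylab = "residual")
abline(h = 0)

cat("R^2 =", round(r2, 3), " RMSEP (residual-based) =", round(rmsep_resid, 3), "\n")
```

The same two plots and summaries apply unchanged to any model once `y_hat` holds its predictions on the training data.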

c) Use treed regression (Cubist) to build a model to predict the number of rings. Use cross-validation to determine if boosting (i.e. committees > 1) helps prediction and choose an optimal value for the number of committees, $M$. Also plot $y$ vs. $\hat{y}$ and $\hat{e}$ vs. $\hat{y}$ for your final treed regression model. Discuss all. (6 pts.)

d) How does the RMSEP for the models above compare to that of MARS, RPART, bagged RPART, and Random Forests? (6 pts.)

PROBLEM 2: PREDICTING THE STRENGTH OF CONCRETE (MINI-MIDTERM)

Given below are the variable name, variable type, measurement unit, and a brief description. The regression problem is to predict the concrete compressive strength.

Cement -- quantitative -- kg in a m³ mixture -- Input Variable
Blast Furnace Slag -- quantitative -- kg in a m³ mixture -- Input Variable
Fly Ash -- quantitative -- kg in a m³ mixture -- Input Variable
Water -- quantitative -- kg in a m³ mixture -- Input Variable
Superplasticizer -- quantitative -- kg in a m³ mixture -- Input Variable
Coarse Aggregate -- quantitative -- kg in a m³ mixture -- Input Variable
Fine Aggregate -- quantitative -- kg in a m³ mixture -- Input Variable
Age -- quantitative -- Day (1~365) -- Input Variable
Concrete compressive strength -- quantitative -- MPa -- Output Variable (Y)

These data can be obtained from the UCI Machine Learning Repository under Concrete.

http://archive.ics.uci.edu/ml/machine-learning-databases/concrete/compressive/

Read it into Excel first and save it in comma-delimited (.CSV) format after shortening the variable names. Use the command below to read the dataset into R.

a) Develop models to predict concrete compressive strength. Use the following modeling approaches:

- OLS, possibly using ACE/AVAS to help find appropriate transformations
- Projection Pursuit
- MARS
- Neural networks
- RPART
- Bagged RPART
- Random Forests
- Trees
- Treed Regression

Be sure to include some discussion for each method on how you "tuned" the fit using that modeling approach. Be sure to use the same response for each! (36 points, 4 pts. each)

b) Identify which of the predictors are most important on the basis of the models you fit. Also give at least one visualization of the predictor "effects" from the models fit in part (a). Discuss all of this in practical terms. (6 pts.)

c) Using MC cross-validation, decide which modeling approach would be best to use to predict the compressive strength of concrete. Be sure that all MCCV functions have been fixed to perform similarly and correctly. Also make sure that you use the same response for each method, so the RMSEP values can be fairly compared. Put your results in a table and discuss. (12 pts.)
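One way to keep the MCCV comparison in part (c) fair is to run every method through a single function on the same random splits. The base R sketch below is only an illustration under stated assumptions: `mccv_rmsep` is a hypothetical helper (not one of the course MCCV functions), the data are synthetic, and two `lm` fits stand in for the modeling approaches being compared. The response is taken from the formula, so a transformed response is scored on the same scale for every method.

```r
# Monte Carlo cross-validation: repeatedly hold out a random fraction p of the
# rows, fit on the remaining rows, and record the RMSEP on the held-out rows.
mccv_rmsep <- function(formula, data, fit_fun, p = 0.25, B = 100) {
  n <- nrow(data)
  n_test <- floor(p * n)
  rmsep <- numeric(B)
  for (b in seq_len(B)) {
    test_idx <- sample(n, n_test)
    train <- data[-test_idx, , drop = FALSE]
    test <- data[test_idx, , drop = FALSE]
    fit <- fit_fun(formula, data = train)
    # Response as defined by the formula (handles e.g. log(y) ~ ...).
    y_test <- model.response(model.frame(formula, data = test))
    y_pred <- predict(fit, newdata = test)
    rmsep[b] <- sqrt(mean((y_test - y_pred)^2))
  }
  rmsep
}

# Synthetic data for illustration only (not the concrete data).
set.seed(425)
sim <- data.frame(x1 = runif(200), x2 = runif(200))
sim$y <- 2 + 3 * sim$x1 - 2 * sim$x2 + rnorm(200, sd = 0.5)

# Reset the seed before each call so both methods see identical splits.
set.seed(1)
rmsep_full <- mccv_rmsep(y ~ x1 + x2, sim, lm, B = 50)
set.seed(1)
rmsep_x1 <- mccv_rmsep(y ~ x1, sim, lm, B = 50)

# Summary table of mean RMSEP by method.
results <- data.frame(
  method = c("lm: y ~ x1 + x2", "lm: y ~ x1"),
  mean_RMSEP = c(mean(rmsep_full), mean(rmsep_x1))
)
print(results)
```

Resetting the seed before each call means differences in mean RMSEP reflect the methods rather than the luck of the splits; the same pattern extends to other fitting functions whose `predict()` method accepts `newdata`.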