Institute of Statistics
Title: Statistical Inference under the General Response Transformation
Heteroscedastic Robust Regression Model
Principal Investigator: Chih-Rung Chen
Sponsor: National Science Council
Keywords: Bayesian Inference, Power Transformation, Exponential Transformation,
Aranda-Ordaz Transformation, Heteroscedasticity
When there exist heteroscedastic errors and/or departures from normality in the
data, a popular approach is to transform the response. Originally, transforming the
response was proposed both as a means of achieving homoscedasticity and
approximate normality and for inducing a simpler linear model for the transformed
response (Box and Cox, 1964). In such situations, Box and Cox (1964) propose the
following response transformation normal homoscedastic regression model for
modeling independent continuous data:
h(y_i; λ) = f(x_i; β) + ε_i, i = 1, …, n,
where y_i is the observation for subject i, λ is a finite-dimensional transformation
parameter vector, h(·; λ) is a strictly increasing and differentiable transformation,
x_i is a known covariate vector for subject i, β is a finite-dimensional regression
parameter vector, f(·; β) is a regression function, and the ε_i are i.i.d. N(0, σ^2)
errors with unknown variance σ^2 > 0.
When both heteroscedastic errors and departures from normality cannot be
removed simultaneously in the data by any single transformation, the Box-Cox model
is further generalized to the following response transformation normal heteroscedastic
regression model for modeling independent continuous data:
h(y_i; λ) = f(x_i; β) + g(f(x_i; β), x_i; γ) ε_i, i = 1, …, n,
where γ is a variance parameter vector, g(·, ·; γ) is a positive weight function, and
the ε_i are i.i.d. N(0, 1) standardized errors.
However, if the range of the response transformation is different from R
(≡ (−∞, ∞)), the corresponding errors cannot be normally distributed. Commonly used
examples are the power transformations (Box and Cox, 1964), exponential
transformations (Manly, 1976), and Aranda-Ordaz transformations (Aranda-Ordaz,
1981). Moreover, the corresponding errors do not even have the same distribution,
because they may have different supports.
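To make the range issue concrete, the following minimal sketch evaluates the Box-Cox power transformation h(y; λ) = (y^λ − 1)/λ for y > 0 (with log y at λ = 0) and reports its range. The function names box_cox and box_cox_range are illustrative only and do not come from the project; the formula is the standard power family of Box and Cox (1964).

```python
import math


def box_cox(y: float, lam: float) -> float:
    """Box-Cox power transformation of a positive response y."""
    if y <= 0:
        raise ValueError("y must be positive")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


def box_cox_range(lam: float) -> tuple:
    """Open interval (lower, upper) of values h(y; lam) can take for y > 0."""
    if lam > 0:
        return (-1.0 / lam, math.inf)   # bounded below
    if lam < 0:
        return (-math.inf, -1.0 / lam)  # bounded above
    return (-math.inf, math.inf)        # log transformation covers all of R


# For lam = 0.5 the transformed response can never fall below -2,
# so an additive error on that scale cannot be exactly normal.
lower, upper = box_cox_range(0.5)
tiny = box_cox(1e-12, 0.5)
```

Only λ = 0 gives a range equal to R. For any other λ the error term is confined to a half-line whose endpoint depends on f(x_i; β), which is why the errors cannot share one normal distribution.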
Thus, Chen and Wang (2003) propose the following general transformation
truncated normal heteroscedastic regression model
h(y_i; λ) = f(x_i; β) + g(f(x_i; β), x_i; γ) ε_i, i = 1, …, n,
where the ε_i are independent standardized truncated normal errors with median 0.
In this project, we shall first utilize the likelihood function proposed in Chen and
Wang (2003) and then propose the following general transformation truncated normal
heteroscedastic regression model
h(y_i; λ) = f(x_i; β) + g(f(x_i; β), x_i; γ) ε_i, i = 1, …, n,
where y_i is the observation for subject i, λ is a finite-dimensional random
transformation parameter vector with normal (or truncated normal or vague) prior
distribution, h(·; λ) is a strictly increasing and differentiable transformation, x_i is a
known covariate vector for subject i, β is a finite-dimensional random regression
parameter vector with normal (or truncated normal or vague) prior distribution, f(·; β)
is a regression function, γ is a random variance parameter vector with inverse Wishart
(or truncated inverse Wishart or vague) prior distribution, g(·, ·; γ) is a positive weight
function, and the ε_i are independent standardized truncated normal errors with median 0.
Next, we shall propose the corresponding Markov chain Monte Carlo (MCMC)
procedures for posterior estimation, hypothesis testing, credible regions, and
prediction, and study the corresponding finite-sample and large-sample properties
of the proposed Bayesian regression model.
NSC96-2118-M-009-003 (96N214)


Title: A Study for Tolerance Interval
Principal Investigator: Lin-An Chen
Sponsor: National Science Council
Keywords:
The tolerance interval is often used to investigate whether a lot contains a given
percentage of acceptable products at some desired confidence. This paper shows that
this confidence, with the percentage fixed, is actually an unknown parameter, and that
the popularly used shortest version of the tolerance interval by Eisenhart et al. (1940)
is not capable of serving as a test statistic for the hypothesis that the unknown
confidence equals a desired constant q_0. A new test is shown to be more capable for
this purpose. Sample size determination based on this new test, which protects
the manufacturer's benefits and risks when the specification limits indicate a true
confidence well above and well below q_0, respectively, has been studied.
NSC96-2119-M-009-002 (96N215)


Title: Statistical Validation and Inferences of Endophenotypes (1/2)
Principal Investigator: Guan-Hua Huang
Sponsor: National Science Council
Keywords: Endophenotype, Genetic Analysis, Heritability, Variance Component Analysis
Endophenotypes, which involve the same biological pathways as diseases but
presumably are closer to the relevant gene action than diagnostic phenotypes, have
emerged as an important concept in the genetic studies of complex diseases. In this
project, we propose to develop a formal statistical methodology for validating
endophenotypes. The proposed method is motivated by the conditioning strategy
used for surrogate endpoints commonly seen in clinical research. Indices such as
the proportion of heritability explained, adjusted association, and relative
heritability are used as operational criteria of validation. In addition, we will
provide confidence intervals for these indices for making statistical inferences.
Using these confidence intervals, we will construct criteria to help us search for
a useful endophenotype. The usefulness of the proposed methods will be demonstrated
through computer simulations.
NSC96-2118-M-009-001-MY2 (96N185)


Title: The Bayesian Inference and Weighted Frequentist Inference (2/2)
Principal Investigator: Hui-Nien Hung
Sponsor: National Science Council
Keywords:
In statistical theory, the frequentist and Bayesian points of view are
different. Sometimes, for computing purposes, frequentist methods use a Bayesian
prior as a computing tool. However, Bayesian statisticians always treat those methods
as Bayesian methods and criticize them from the Bayesian point of view. We do not
think that is the right way. In many statistical problems, from the frequentist point
of view, there is no best solution (for example, a UMP test does not always exist). In
these situations, if we can put the “right” weight on the whole or part of the
parameter space, then we may find the best solution from the frequentist point of
view. There are three major points in this two-year project. The first is: even
though the weight function in the frequentist point of view and the prior function
in the Bayesian point of view are similar, the “best method” criteria of the two
approaches are different, the meaning of the “right weight” is different, and
sometimes the frequentist puts weight on only part of the parameter space. Therefore,
we need to think about the differences between the two methods. The second is: the
weight function or the prior function may be improper, and we may have an improper
posterior. From the frequentist point of view, we think that we need to take the
improper weight function to be the limit of a sequence of proper weight functions.
Unfortunately, if we choose different sequences to approach the same improper weight
function, we may obtain different results. Therefore, we will try to find a “good”
parameterization of the parameter space in order to find the “right” sequence of
weight functions to approach the improper weight function. The final goal is: we
will try to find, from the frequentist point of view, the “best” weight function on
the whole or part of the parameter space. This is not an easy problem; we hope to
make some progress within these two years.
NSC95-2118-M-009-004-MY2 (95R144-1)

Title: A Study on SPC Phase I Process Monitoring (2/2)
Principal Investigator: Shiau, J.-J. H.
Sponsor: National Science Council
Keywords: Univariate Control Charts, Multivariate Control Charts, Phase I Process
Monitoring, Overall False Alarm Rate, Family-wise Error Rate, False
Discovery Rate, Multiple Comparisons, Profile Monitoring
The implementation of SPC process monitoring usually consists of two phases,
Phase I and Phase II. The main task of Phase I is to detect and filter out the
out-of-control data points from the historical data so that the remaining in-control
data can be used to establish appropriate control limits for Phase II process
monitoring. Phase I is an iterative process that recalculates trial control limits
each time some data points are declared out of control and assignable causes are
found. However, the effectiveness of the traditional Phase I approach may be doubtful,
since existing out-of-control data points may inflate the variability estimate, and
hence some out-of-control data points may go undetected, which in turn affects the
performance of Phase II. To our knowledge, there is no statistical study on how
effective the Phase I approach is. In recent years, researchers have started
evaluating Phase I methods from the multiple comparisons point of view. However,
controlling the overall false alarm rate (also called the family-wise error rate,
FWER) of the whole Phase I monitoring at a certain level, say 0.05, and setting the
false alarm rate of each individual test by the Bonferroni approach creates the
problem of very low detection power for each of the individual tests, even when the
number of tests in Phase I is fairly small. In this project, we propose a detailed
study of Phase I performance, including assessing the effectiveness of the current
iterative approach and of the new approach of controlling the overall false alarm
rate. To remedy the above-mentioned problems, we propose (i) using robust methods for
choosing in-control data points and (ii) controlling the FDR (false discovery rate),
a popular criterion in bioinformatics, instead of the overall false alarm rate, to
obtain higher detection power. We will study the performance of the proposed method
theoretically and/or by simulation. In the first year of the project, we will
concentrate on commonly used univariate control charts such as X̄-R charts or
X̄-S charts. In the second year, we will focus on multivariate control charts, which
are much more complicated than the univariate case. If time and manpower permit, we
will extend the study further to profile monitoring.
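The contrast between FWER control and FDR control can be sketched as follows. This is a minimal illustration, not the project's procedure: the p-values are hypothetical, the function names are made up for this example, and the FDR rule shown is the standard Benjamini-Hochberg step-up procedure, which the abstract does not name.

```python
def bonferroni_reject(pvals, alpha=0.05):
    """Reject test i when p_i <= alpha / m (controls the FWER at alpha)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg_reject(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up rule (controls the FDR at alpha
    for independent tests)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Largest rank k with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k_max = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            reject[i] = True
    return reject


# Hypothetical p-values, one per Phase I sample point being tested.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
n_bonferroni = sum(bonferroni_reject(pvals))
n_bh = sum(benjamini_hochberg_reject(pvals))
```

On these ten p-values the Bonferroni rule flags one point and the Benjamini-Hochberg rule flags two, which illustrates the gain in detection power the abstract refers to.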
NSC95-2118-M-009-006-MY2 (95R146-1)

Title: Statistical Analysis of Large Genetic Networks in Yeast (1/2)
Principal Investigator: Lu, H. H.-S.
Sponsor: National Science Council
Keywords: System Biology, Computational Complexity, Dimension Reduction,
Multi-Dimensional Scaling (MDS), Cell Cycle, Clustering, Classification,
Registration, Diauxic Shift, Fermentation, Boolean Networks, Bayesian Networks,
Protein Interaction Network
Is it possible to develop simplified models to gain deep insights into large and
complex biological networks? This is a top challenge for system biology in the era of
post-genomic studies. We plan to develop statistical methods for this purpose. The
large genetic networks in yeast will be used as examples.
First of all, it is crucial to reduce the computational complexity of statistical
methods for analyzing large genetic networks. For instance, we plan to develop
improved methods with low computational complexity for dimension reduction
techniques, including multi-dimensional scaling (MDS) and related methods in
nonlinear dimension reduction. These improved methods will be applied to the
analysis of yeast cell cycles and their genetic networks.
Secondly, it is often necessary to develop statistical methods to analyze gene
expression curves for investigating large genetic networks. The gene expression
curves could have time shifts that need registration in clustering and classification.
We plan to develop statistical methods for analyzing the gene expression curves of
the diauxic shift in fermentation for yeast. Network analysis by Boolean and Bayesian
networks will be used for the follow-up analysis.
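As a toy illustration of what registering time-shifted curves involves, the sketch below estimates an integer time shift between two sampled curves by least squares. It is a simplified stand-in under stated assumptions, not the method proposed in the project: the function name best_shift, the sinusoidal test curves, and the restriction to integer shifts are all hypothetical choices made for this example.

```python
import math


def best_shift(reference, curve, max_shift):
    """Return the integer s in [-max_shift, max_shift] that minimizes the mean
    squared difference between reference[t] and curve[t + s]."""
    best_s, best_cost = 0, math.inf
    for s in range(-max_shift, max_shift + 1):
        # Only compare time points where both curves are observed.
        pairs = [(reference[t], curve[t + s])
                 for t in range(len(reference))
                 if 0 <= t + s < len(curve)]
        if not pairs:
            continue
        cost = sum((a - b) ** 2 for a, b in pairs) / len(pairs)
        if cost < best_cost:
            best_s, best_cost = s, cost
    return best_s


# Hypothetical expression profiles: the second curve lags the first by 3 steps.
reference = [math.sin(2 * math.pi * t / 20) for t in range(40)]
delayed = [math.sin(2 * math.pi * (t - 3) / 20) for t in range(40)]
estimated_lag = best_shift(reference, delayed, max_shift=5)
```

Once the shift is estimated, the delayed curve can be aligned to the reference before clustering or classification, so that curves with the same shape but different timing are grouped together.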
Finally, it is challenging to develop statistical methods for analyzing the
interaction patterns of large genetic networks. The interaction patterns could be
distinct, and the resulting observation types from various experimental techniques
will be different. Hence, we plan to develop statistical methods of estimation and
inference for analyzing interaction patterns in yeast protein interaction networks by
integrating databases from different experimental techniques and laboratories.
At the end of this long-term project, we will have developed improved statistical
methods with low computational complexity for dimension reduction, network analysis,
and interaction pattern analysis in yeast genetic networks. These methods can
be applied to study large genetic networks in humans and other species for the
investigation of system biology.
NSC96-2118-M-009-004-MY2 (96N186)

Title: Analysis of Instant Trend of the Price of a Stock by Solving the Model of the
Markov Chains in Random Environments
Principal Investigator: Nan Fu Peng
Sponsor: National Science Council
Keywords:
We use the model of Markov chains in random environments to analyze the
instant trend of the price of a stock. We first assume the price of a stock to be a
geometric Brownian motion. Conditional on the Brownian motion, the trend follows a
two-state Markov chain. Our goal is to find the finite-time distributions and the
limiting distributions of the stochastic transition probabilities of the Markov chain.
Extensions of this model are also explored.
NSC96-2118-M-009-002 (96N213)




Title: Exact Confidence Coefficients of Confidence Intervals for Discrete Distributions
Principal Investigator: Hsiuying Wang
Sponsor: National Science Council
Keywords:
For a confidence interval (L(X), U(X)) of a parameter θ in discrete distributions, the
coverage probability is a variable function of θ. The confidence coefficient is the
infimum of the coverage probabilities, inf_θ P_θ(θ ∈ (L(X), U(X))). Since we do not
know at which point in the parameter space the infimum coverage probability occurs,
the exact confidence coefficients are unknown. Besides confidence coefficients,
evaluation of a confidence interval can be based on the average coverage probability.
Usually, the exact average coverage probability is also unknown, and it is
approximated by taking the mean of the coverage probabilities at some randomly chosen
points in the parameter space. In this research, we plan to propose methodologies for
computing the exact confidence coefficients of confidence intervals for discrete
distributions in the first year, and to propose methodologies for computing the exact
average coverage probabilities of confidence intervals for discrete distributions in
the second year.
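The dependence of coverage on the parameter can be seen in a small sketch. It assumes a binomial model and the textbook Wald interval, neither of which is specified in the abstract, and the function names are illustrative. The coverage at a fixed p is computed exactly by summing the binomial probabilities of the outcomes whose interval contains p; the grid minimum only approximates the infimum, which is the difficulty the project addresses.

```python
import math


def wald_interval(x, n, z=1.96):
    """Textbook Wald interval for a binomial proportion (assumed example)."""
    p_hat = x / n
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half, p_hat + half


def coverage_probability(p, n, interval=wald_interval):
    """Exact P_p(p in (L(X), U(X))) for X ~ Binomial(n, p)."""
    total = 0.0
    for x in range(n + 1):
        lower, upper = interval(x, n)
        if lower < p < upper:
            total += math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)
    return total


def grid_minimum_coverage(n, grid_size=999, interval=wald_interval):
    """Smallest coverage over an equally spaced grid in (0, 1).
    This is only an upper bound on the true confidence coefficient."""
    grid = [(k + 1) / (grid_size + 1) for k in range(grid_size)]
    return min(coverage_probability(p, n, interval) for p in grid)


coverage_at_half = coverage_probability(0.5, 10)
approx_coefficient = grid_minimum_coverage(10)
```

For n = 10 the coverage at p = 0.5 is already below the nominal 0.95, and the grid minimum is lower still. A finer grid can only lower the bound, so a grid search by itself never certifies the exact confidence coefficient.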
NSC95-2118-M-009-011-MY2 (95R748-1)


Title: Semi-Parametric Estimation for Dependent Truncation Data (2/3)
Principal Investigator: WeiJing Wang
Sponsor: National Science Council
Keywords: Archimedean Copula Model, Semi-Parametric Inference, Truncation
In many useful applications, the variable of interest may be truncated by
another random variable. Most existing inference methods are derived under the
assumption that the truncation variable is independent of the variable of interest.
Although a couple of papers have discussed testing quasi-independence between
the two variables, assessing the underlying dependence relationship is still an open
problem in the literature. In this project, we assume that the dependence structure
follows a semi-parametric copula model. Objectives of statistical inference include
estimation of the marginal distribution functions, the truncation proportion, and the
association parameter. The whole problem is quite challenging since all of the above
three quantities are unknown and estimating each of them under truncation is not an
easy task. Simulations will be performed to assess the validity of the estimators and
evaluate their finite-sample performance. Large-sample theory for the proposed
methods will be developed.
NSC95-2118-M-009-005-MY3 (95R145-1)
