NPEBseq_helpx

richessewoozyBiotechnology

Oct 1, 2013 (3 years and 10 months ago)

146 views

NPEBseq: A method for DE analysis of RNA
-
seq count data


Yingtao Bi

February

8
, 201
3



1 Introduction

This vignette describes how to use NPEBseq

to evaluate differential expression using count data. This
package is currently in beta evaluation form, and subject
s

to continual improvement in the interface.


2 Install the packages

Download the source and
install
all three packages.


Availability
:
An

R

package containing examples and sample datasets is available at
http://bioinformatics.wistar.upenn.edu/NPEBseq


Install.packages() ###the path to the package

e.g.
install.packages("/home/yingtao/R/x86_64
-
unknown
-
linux
-
gnu
-
library/2.14/NPEBseq_1.0.tar.gz"
,

repos = NULL, type="source"
)

3
Preparations


NPEBseq requires a count data matrix.


4

NPEBseq: Example from the real data



4.1 Read the data into R


First, we read an example dataset and annotation files into R

Library(NPEBseq)

## the other two packages will be loaded automatically

data(genetable2)

## gene level data from R package DEXSeq

data(BrookExon
2
)

## exon level data from R package DEXSeq

data(egindex2) ## the mapping between genes and exons


>head(genetable2)

## there are three treated samples and four untreated samples


treated1fb treated2fb treated3fb untreated1fb untreated2fb untreated3fb untreated4fb

FBgn0036145 31

11 11 53 70 19 36

FBgn0036186 156 147 89 149 179 95 81

FBgn0036191 743 766 615 868 1015 594

593

FBgn0036192 57 31 29 38 76 26 24

FBgn0036196 165 182 135 163 286 200 224

FBgn0036294 3 2 1

12 13 4 9

4.2

Construct the input matrix



G1data<
-
genetable
2
[,1:3] ## treated samples

G2data<
-
genetable
2
[,4:7] # untreated samples

maxid1<
-
which.max(colSums(G1data)
)


maxid2<
-
which.max(colSums(G2data)
)


E1data<
-
BrookExon2[,1:3] ## treated samples

E2data<
-
BrookExon2 [,4:7] # untreated samples

maxid1<
-
which.max(colSums(E1data))

maxid2<
-
which.max(colSums(E2data))


4.3

Compute the prior distributions


## this part is relatively computationally intensive

##
both

Q1 and Q2 are included in the

genetable2
” dataset

, so you
can skip this step
..

Q1<
-
compu_prior(
G1data
[,maxid1],maxiter=3000,grid.length=1000)

## compute the prior distribution
of gene for the treated condition


Q2<
-
compu_prior(G2data[,maxid2],maxiter
=3000,grid.length=1000)

## compute the prior distribution
of gene for the untreated condition


head(Q1)


head(Q2)


E1<
-
compu_prior(E1data[,maxid1],maxiter=3000,grid.length=1000)

## compute the prior distribution of
exon for the treated condition

E2<
-
compu_
prior(E2data[,maxid2],maxiter=3000,grid.length=1000)

#### compute the prior distribution
of exon for the untreated condition



4.4

Find the DE genes



resg<
-
NPEBSeq_biordf(G1data,G2data,Q1,Q2)

## find DE genes


>head(resg)

## the output data.frame



ID

lgfc



pvalues

FDR

FBgn0024288 FBgn0024288 3.057526 2.220446e
-
16 2.678746e
-
14

FBgn0039827 FBgn0039827 2.746745 0.000000e+00 0.000000e+00

FBgn0031923 FBgn0031923

2.451836 3.022804e
-
12 2.762660e
-
10

FBgn0038082 FBgn0038082 2.451836 3.022804e
-
12 2.762660e
-
10

FBgn0085391 FBgn0085391 2.451836 3.022582e
-
12 2.762660e
-
10

FBgn0034434 FBgn0034434 2.434671 1.110223e
-
16 1.762333e
-
14




4.5

Find the differential usage of exon
s


resExons<
-
NPEBSeq_exon(G1data,G2data,E1data,E2data,Q1,Q2,E1,E2,egindex
2
)

## find differential
usage of exons


>head(resExons)

## the output data.frame


ID


pvalues FDR

FBgn0053464+FBgn0003425:005 FBgn0053464+FBgn0003425:005 0 0

FBgn0261451:039 FBgn0261451:039 0 0

FBgn0039566:004 FBgn0039566:004 0 0

FBgn0261451:028

FBgn0261451:028 0 0

FBgn0011747:001 FBgn0011747:001 0 0

FBgn0013733:005 FBgn0013733:005 0 0


sum(resExons$FDR<0.1)

## the number of exons with FDR less than 10%