Region-base Genome-wide Association Study (RGWAS) Perl script User Guide


Dec 13, 2013





System requirements: Plink and Perl installed

(If you have already installed Plink and Perl and added them to the system path, skip the following part.)
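
To check whether both tools are already reachable from the system path, a short script such as the following may help. It is only a sketch: it assumes the executables are named "plink" and "perl" (on Windows, plink.exe and perl.exe), and the helper name find_missing_tools is made up for this example.

```python
import shutil


def find_missing_tools(tools=("plink", "perl")):
    """Return the names in `tools` that cannot be found on the system path."""
    # shutil.which returns None when no matching executable is on the path.
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on the system path:", ", ".join(missing))
    else:
        print("Plink and Perl are both on the system path.")
```

If either name is reported as missing, follow the installation steps in this section.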



Plink home page: http://pngu.mgh.harvard.edu/~purcell/plink/

Plink download page: http://pngu.mgh.harvard.edu/~purcell/plink/download.shtml

Download the zip file for your platform and unzip it to a folder of your choice.



* If you are using Windows, add the Plink path to the default system environment path as follows:

- Right-click the "My Computer" icon and select "Properties" at the bottom.

- In the "System Properties" window, select the "Advanced" tab and click the "Environment Variables" button.

- In the bottom window, select the row whose "Variable" is "Path" and click "Edit".

- Go to the folder where plink.exe is and copy the path from the address bar.

- Go back to the "Edit System Variable" window, click the text box after "Variable value", type a semicolon ";" at the end of the text, then paste the copied path after the ";".

  Example:

  If plink.exe is at C:\program files\plink\plink.exe, then the copied path should be "C:\program files\plink".

  Then, if the text in "Variable value" is "C:\program files\java", the text after editing should be "C:\program files\java;C:\program files\plink".

- Click "OK" in all the windows and close all the command line windows (newly opened windows will pick up the updated path).
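
The edit to "Variable value" described above amounts to appending one more entry to a semicolon-separated list. The sketch below reproduces the example from this section; the function name append_to_path is hypothetical, and the script only builds the new string, it does not change any system setting.

```python
def append_to_path(variable_value, new_dir, sep=";"):
    """Return `variable_value` with `new_dir` appended, separated by `sep`."""
    # Drop empty entries that a trailing separator would produce.
    entries = [entry for entry in variable_value.split(sep) if entry]
    # Windows paths are case-insensitive, so compare in lower case
    # to avoid adding the same folder twice.
    if new_dir.lower() not in (entry.lower() for entry in entries):
        entries.append(new_dir)
    return sep.join(entries)


print(append_to_path(r"C:\program files\java", r"C:\program files\plink"))
# prints: C:\program files\java;C:\program files\plink
```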


ActivePerl download page: http://www.activestate.com/activeperl/

(Linux usually has Perl installed already; if so, skip the following part.)

- Go to the lower left part of the page and click the "ActivePerl download now" icon for Windows.

- Run the downloaded file.

- Add the path of perl.exe to the system path in the same way as for Plink.






Input file format:


Linkage format: standard .ped and .map files, as normally used by Plink.

Binary linkage format: .bed, .bim and .fam files, a binary format that saves memory for large data sets.

For details of the file formats, please refer to the Plink page: http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml
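
Before running the script, it can be useful to confirm that a complete file set exists for a given file name prefix. The following is a minimal sketch under that assumption; detect_input_format is a name chosen for this example, and its return values "bfile" and "file" simply mirror the two input options of the script.

```python
from pathlib import Path

LINKAGE_EXTS = (".ped", ".map")
BINARY_EXTS = (".bed", ".bim", ".fam")


def detect_input_format(prefix):
    """Return "bfile", "file", or None for the file set found under `prefix`."""
    def all_exist(extensions):
        # Every file of the set must be present for the format to count.
        return all(Path(str(prefix) + ext).is_file() for ext in extensions)

    if all_exist(BINARY_EXTS):
        return "bfile"
    if all_exist(LINKAGE_EXTS):
        return "file"
    return None
```

For example, detect_input_format("test") returns "bfile" when test.bed, test.bim and test.fam are all in the current folder.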


Output file format:


The output data is listed by region, ordered by region evaluation P.

For each region, the information is listed in the following order:



- Region evaluation P (considers both the region association P and the genotyping error P)

- Region association P

- Genotyping error probability

- Chromosome (1-22, excluding the sex chromosomes and the mitochondrial chromosome)

- Start position of the region (in kb)

- End position of the region (in kb; the window size is set by the user, default is 400kb)

(The following fields describe the independent SNPs in the region. The SNPs are listed in order of physical position.)



- SNP ID (rsxxxxxx)

- SNP physical position

- Distance from the previous SNP in the same region

- Conditional P (region conditional logistic P value of the SNP)

- Original P value (single-SNP association test P value)

- Original rank of the SNP

- Odds ratio (a value below 1 indicates that the minor allele is the protective allele for the disease)

- F_A: frequency of the minor allele (MAF) in affected samples

- F_U: frequency of the minor allele (MAF) in unaffected samples

- L95: lower bound of the 95% confidence interval for the odds ratio

- U95: upper bound of the 95% confidence interval for the odds ratio
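
The odds ratio, F_A and F_U fields are related. Assuming the reported value is the usual allelic odds ratio (as in the output of Plink's --assoc command; the script itself may derive it differently), it is the odds of the minor allele in affected samples divided by the odds in unaffected samples. The sketch below illustrates this with made-up frequencies; allelic_odds_ratio is a hypothetical helper.

```python
def allelic_odds_ratio(f_a, f_u):
    """Odds ratio of the minor allele from its frequency in cases and controls."""
    if not (0 < f_a < 1 and 0 < f_u < 1):
        raise ValueError("frequencies must be strictly between 0 and 1")
    odds_affected = f_a / (1 - f_a)
    odds_unaffected = f_u / (1 - f_u)
    return odds_affected / odds_unaffected


# Illustrative values only: the minor allele is rarer in affected samples,
# so the odds ratio is below 1 (protective minor allele).
print(round(allelic_odds_ratio(0.2, 0.3), 3))  # prints 0.583
```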



Commands in use:

--help       -h    help

--file       -f    normal linkage files (.ped and .map files)

--bfile      -b    binary linkage files (.bed, .bim and .fam files)

--out        -o    output file name (default: default.txt)

--pmin       -p    P value threshold for candidate SNPs (default: 0.01)
                   (A simple association test, the --assoc command in Plink, is performed on the whole SNP data set, and only the SNPs whose association P values reach the threshold (0.01 by default) are selected for further testing.)

--pind       -i    P value threshold for independent SNPs (default: 0.05)

--maxsnp     -m    maximum number of independent SNPs in the same region (default: 3)

--window     -w    window size in kb (default: 400kb)

--genoerror  -g    "no" to ignore the relative genotype error (by default, the genotype error probability is considered)
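
When the script is called from another program, the options above can be assembled into an argument list. The following is a sketch only: build_rgwas_command and its parameter names are invented for this example, and it assumes that each option is followed by its value as a separate argument. Options left as None are omitted so that the script's own defaults apply.

```python
def build_rgwas_command(prefix, binary=True, out=None, pmin=None, pind=None,
                        maxsnp=None, window=None, genoerror=True,
                        script="rgwas.pl"):
    """Build the argument list for a run of the script on the files under `prefix`."""
    # --bfile for .bed/.bim/.fam input, --file for .ped/.map input.
    command = ["perl", script, "--bfile" if binary else "--file", prefix]
    optional = (("--out", out), ("--pmin", pmin), ("--pind", pind),
                ("--maxsnp", maxsnp), ("--window", window))
    for flag, value in optional:
        if value is not None:
            command += [flag, str(value)]
    if not genoerror:
        # The guide documents "no" as the value that switches this off.
        command += ["--genoerror", "no"]
    return command


print(" ".join(build_rgwas_command("test", out="result.txt", window=200)))
# prints: perl rgwas.pl --bfile test --out result.txt --window 200
```

The resulting list can be passed to subprocess.run(command, check=True) from the folder that contains rgwas.pl and the input files.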

Test


The sample test files are in binary format: test.fam, test.bed, test.bim

The sample output file is Sample_output.txt

In a command line window, go to the folder where you put rgwas.pl and these files, and type:

perl rgwas.pl -b test

The command line window will keep updating while the script runs; please be patient, as the run takes several minutes.
The default output file name is output.txt