STATA_BC_PLINK.RJLA.NOV2007

Oct 2, 2013


:NEUROPSYCHIATRIC GENETICS


[BIOSTATISTICS|BIOINFORMATICS] CORE

BIOSTATISTIC/BIOINFORMATIC TOOLS FOR GENETICS DATA:

DATA MANAGEMENT AND ANALYSIS


RICHARD ANNEY

NEUROPSYCHIATRIC GENETICS RESEARCH GROUP

WORKSHEET, TUTORIALS AND SLIDES AVAILABLE ON

P:\Personal Folders\anneyr\stata9\talk

http://www.medicine.tcd.ie/psychiatry/research/neuropsychiatry/


Overview


STATA9

A STATISTICAL SOFTWARE PACKAGE
LESS PRETTY THAN SPSS GUI
POWERFUL AND “SCRIPT” FRIENDLY
LESS CLICKING AND DROP-DOWN … MORE SCRIPTING


STATA9: SET UP FOLDER STRUCTURE

SET UP FOLDERS TO STORE YOUR:
DO-FILES
  CR FILE
  AN FILE
DTA-FILES
LOG-FILES
INPUT-FILES (TXT)
OUTPUT-FILES
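
The folder layout above can be created in one step rather than by hand. The following is a minimal Python sketch, not part of the original talk; the folder names (`do`, `dta`, `log`, `input`, `output`) are hypothetical short names chosen here to mirror the file types listed on the slide.

```python
from pathlib import Path

# Hypothetical folder names, one per file type listed on the slide.
FOLDERS = ["do", "dta", "log", "input", "output"]


def make_project(root):
    """Create one sub-folder per file type under root and return the paths."""
    root = Path(root)
    created = []
    for name in FOLDERS:
        folder = root / name
        # exist_ok=True makes the call safe to repeat on an existing project.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling `make_project("my_study")` a second time leaves existing folders and files untouched, so it is safe to keep in a set-up script.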


PROBLEM 1: BASIC CASE-CONTROL ASSOCIATION STUDY

HOW DO I GET FILES INTO STATA?
HOW DO I MERGE MY DATA WITH ANOTHER FILE?
CAN I GENERATE A FEW BASIC STATISTICS ON MY MARKERS?
CAN I PERFORM A CASE-CONTROL STUDY?
IS MY QUANTITATIVE VARIABLE ASSOCIATED WITH A GENOTYPE?


STATA9: LOOK AT ME!! MAIN WINDOW

STATA9: LOOK AT ME!! DO-WINDOW

STATA9: LOOK AT ME!! MAIN WINDOW

STATA9: LOOK AT ME!! DTA-EDITOR WINDOW


PROBLEM 1: BASIC CASE-CONTROL ASSOCIATION STUDY

cr00 genotype_qtlsnp.do

1. ADDING TAB-DELIMITED TEXT FILES TO STATA USING THE INSHEET COMMAND, SORTING THE KEY VARIABLE USING THE SORT COMMAND AND SAVING AS *.DTA FILES USING THE SAVE COMMAND
2. CONVERTING “STRINGS” TO NUMBER VARIABLES USING THE GENERATE AND REPLACE COMMANDS
3. MERGING ON THE KEY VARIABLE USING THE MERGE COMMAND
4. TABULATING THE MERGE USING THE TABULATE COMMAND AND ORDERING VARIABLES USING THE ORDER COMMAND
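
The logic of the four steps above can be sketched outside Stata. The following Python example is an illustration only, not the contents of the do-file: the column names (`id`, `a1`, `a2`, `status`), the sample data and the helper names are all made up for this sketch. It reads two tab-delimited tables, sorts each on the key, recodes a string column to numbers, and merges on the key with a 1/2/3 indicator.

```python
import csv
import io


def read_tab(text, key):
    """Parse tab-delimited text and return the rows sorted on the key column."""
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    return sorted(rows, key=lambda r: r[key])


def recode(rows, column, codes):
    """Replace string labels with numbers; unknown labels become None (missing)."""
    for row in rows:
        row[column] = codes.get(row[column])
    return rows


def merge(master, using, key):
    """Match rows on key; tag 1 = first file only, 2 = second only, 3 = both."""
    m = {r[key]: r for r in master}
    u = {r[key]: r for r in using}
    merged = []
    for k in sorted(set(m) | set(u)):
        row = {**m.get(k, {}), **u.get(k, {})}
        row["_merge"] = 3 if (k in m and k in u) else (1 if k in m else 2)
        merged.append(row)
    return merged


# Made-up example data: a genotype table and a phenotype table.
GENO = "id\ta1\ta2\nS3\tA\tG\nS1\tA\tA\nS2\tG\tG\n"
PHENO = "id\tstatus\nS1\tcase\nS2\tcontrol\nS4\tcase\n"

genotypes = read_tab(GENO, "id")
phenotypes = recode(read_tab(PHENO, "id"), "status", {"control": 0, "case": 1})
combined = merge(genotypes, phenotypes, "id")
for row in combined:
    print(row["id"], row["_merge"])
```

Tabulating the indicator column shows at a glance how many samples matched and how many appear in only one of the two files.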


PROBLEM 1: BASIC CASE-CONTROL ASSOCIATION STUDY

THE COMBINED *.DTA FILE

THE TABULATE FUNCTION
1 = ONLY IN 1st FILE
2 = ONLY IN 2nd FILE
3 = IN BOTH 1st & 2nd FILE



PROBLEM 1: BASIC CASE-CONTROL ASSOCIATION STUDY

an00 genotype_qtlsnp.do

CREATING THE LOG FILE USING THE LOG COMMAND
OPENING THE *.DTA FILE USING THE USE COMMAND
CREATING GENOTYPE VARIABLES FROM ALLELE VARIABLES USING THE GTYPE PROTOCOL
TABULATING THE GENOTYPE VARIABLES USING THE TABULATE COMMAND
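
Building a genotype variable from two allele variables and then tabulating it can be illustrated as follows. This Python sketch is not the GTYPE protocol itself, whose details are not given on the slide; the `"0"` missing-allele code and the sample calls are assumptions made for the example.

```python
from collections import Counter


def genotype(a1, a2, missing="0"):
    """Join two allele calls into an unordered genotype label.

    Returns None when either allele is missing, so A/G and G/A are
    counted as the same genotype and missing calls are left out.
    """
    if a1 == missing or a2 == missing:
        return None
    return "/".join(sorted((a1, a2)))


# Made-up allele pairs for five samples; the last one is a failed call.
alleles = [("A", "G"), ("G", "A"), ("A", "A"), ("G", "G"), ("0", "0")]
genotypes = [genotype(a1, a2) for a1, a2 in alleles]

# Frequency table of the non-missing genotypes.
table = Counter(g for g in genotypes if g is not None)
for label in sorted(table):
    print(label, table[label])
```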


PROBLEM 1: BASIC CASE-CONTROL ASSOCIATION STUDY

1. TEST HWE USING GTAB COMMAND
2. TEST HWE USING GENHW COMMAND
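
For orientation, a Hardy-Weinberg equilibrium (HWE) test compares the observed genotype counts with those expected from the allele frequencies. The Python sketch below shows the standard one-degree-of-freedom chi-square version for a biallelic marker; it is an illustration of the idea and makes no claim about which test statistics GTAB or GENHW report.

```python
import math


def hwe_chisq(n_aa, n_ab, n_bb):
    """Chi-square HWE test (1 df) from the three genotype counts.

    Assumes a biallelic marker with both alleles observed; a
    monomorphic marker would give zero expected counts.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chisq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chisq / 2))
    return chisq, p_value


chisq, p_value = hwe_chisq(30, 40, 30)
print(round(chisq, 3), round(p_value, 4))
```

With counts of 30, 40 and 30 the expected counts are 25, 50 and 25, which is a deficit of heterozygotes relative to HWE.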


PROBLEM 1: BASIC CASE-CONTROL ASSOCIATION STUDY

1. TEST PAIR-WISE LINKAGE DISEQUILIBRIUM USING PWLD COMMAND
2. TEST ASSOCIATION WITH BINARY TRAIT USING GENCC COMMAND
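
As background to the binary-trait test, a basic allelic case-control comparison can be worked through by hand from a 2x2 table of allele counts. The Python sketch below computes the odds ratio and the Pearson chi-square (1 df); it illustrates the general approach and is not a description of the GENCC output. The counts are made up.

```python
import math


def allelic_test(case_a, case_b, control_a, control_b):
    """Odds ratio and Pearson chi-square (1 df) for a 2x2 allele-count table.

    case_a/case_b are the counts of alleles A and B in cases,
    control_a/control_b the same counts in controls.
    """
    a, b, c, d = case_a, case_b, control_a, control_b
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    chisq = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chisq / 2))
    return odds_ratio, chisq, p_value


# Made-up counts: 50 cases and 50 controls, two alleles each.
odds_ratio, chisq, p_value = allelic_test(60, 40, 40, 60)
print(odds_ratio, chisq, round(p_value, 4))
```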


PROBLEM 1: BASIC CASE-CONTROL ASSOCIATION STUDY

QTLSNP COMMAND MODELS
CODOMINANT (THREE MODELS)
DOMINANT
RECESSIVE
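
The inheritance models differ only in how a genotype is turned into a number before it is related to the quantitative trait. The Python sketch below shows one common coding, assumed here for illustration: additive (count of minor alleles, one of the codominant options), dominant and recessive, each followed by a least-squares slope. It does not reproduce the QTLSNP tests, and the trait values are made up.

```python
def code_genotype(n_minor, model):
    """Turn a minor-allele count (0, 1 or 2) into a model-specific score."""
    if model == "additive":
        return n_minor
    if model == "dominant":
        return 1 if n_minor >= 1 else 0
    if model == "recessive":
        return 1 if n_minor == 2 else 0
    raise ValueError(f"unknown model: {model}")


def slope(x, y):
    """Least-squares slope of y on x (change in trait per unit of score)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Made-up data: minor-allele counts and a quantitative trait for six people.
minor_counts = [0, 0, 1, 1, 2, 2]
trait = [10, 12, 11, 13, 20, 22]

effects = {}
for model in ("additive", "dominant", "recessive"):
    scores = [code_genotype(g, model) for g in minor_counts]
    effects[model] = slope(scores, trait)
    print(model, effects[model])
```

For the two binary codings the slope equals the difference in mean trait value between the two genotype groups.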


PROBLEM 1: BASIC CASE-CONTROL ASSOCIATION STUDY

1. TEST WHETHER A QUANTITATIVE VARIABLE IS ASSOCIATED WITH DIFFERENT INHERITANCE MODELS USING QTLSNP COMMAND - CODOMINANT


PROBLEM 1: BASIC CASE-CONTROL ASSOCIATION STUDY

1. TEST WHETHER A QUANTITATIVE VARIABLE IS ASSOCIATED WITH DIFFERENT INHERITANCE MODELS USING QTLSNP COMMAND - DOMINANT
2. NOT ASSOCIATED SO MINIMAL OUTPUT


PROBLEM 1: BASIC CASE-CONTROL ASSOCIATION STUDY

1. TEST WHETHER A QUANTITATIVE VARIABLE IS ASSOCIATED WITH DIFFERENT INHERITANCE MODELS USING QTLSNP COMMAND - RECESSIVE


BC|SNPmax©

DATABASE AND ANALYSIS PLATFORM
MASTER DATABASE FOR STORING ALL OUR “MASTER” GENETIC AND PHENOTYPE DATASETS
ONGOING PROCESS TO UPLOAD AND MANAGE DATA

BC|SNPmax: Structure

FIVE DOMAINS:
1. GENOTYPES/SNPS
2. MAPS
3. PEDIGREES
4. AFFECTION
5. PHENOTYPES



FROM OUTPUT TO GEN-FILE (VIA STATA)

TWO EXAMPLES
1. BASIC EXCEL FILE
2. TAQ-MAN FILE


FROM OUTPUT TO GEN-FILE (VIA STATA): BASIC EXCEL FILE

FROM OUTPUT TO GEN PED AFF-FILE (VIA STATA): BASIC EXCEL FILE


FROM OUTPUT TO GEN-FILE (VIA STATA): TAQ-MAN FILE


BC|SNPmax


BC|SNPmax: Types of Analysis

QUALITY
  PED-CHECK
  MERLIN
  BASIC MEASURES (MAF, HWE, CALL)
FAMILY-BASED
  MENDEL
  MERLIN
  GENEHUNTER
  SIMWALK
  FBAT/PBAT
  TRANSMIT
  QTDT
  PLINK
  HAPLOVIEW
  R-PACKAGE
CASE-CONTROL
  ALLELE ASSOCIATION
  MENDEL
  PHASE
  SNPHAP
  PLINK
  R-PACKAGE




BC|SNPmax: Types of Analysis

FOR MOST ANALYSES YOU NEED TO SELECT MATCHED:
GEN
PED
MAP
  b128 NOW UPLOADED
AFF



PLINK… GETTING STARTED


PLINK…

RUNNING PLINK FROM YOUR OWN COMPUTER

WHY?
1. MULTIPLE ANALYSES
2. KEEP A RECORD OF YOUR WORK IN BAT AND SCRPT
3. EASE OF USE
4. EASE OF REPEATING TASK
5. SCRIPTS NOT DROP-DOWN MENUS
6. RUNNING >1 CHROMOSOME (BC|SNPmax ADDRESSED)
7. POST-ANALYSIS INTEGRATION USING PERL AND STATA


PLINK…

FOLDER STRUCTURE
ANALYSIS
DATASET
OUTPUT




PLINK… DATASETS

PED & MAP
BINARY FILES
  BINARY PED (BED)
  BINARY MAP (BIM)
  FAMILY FILES (FAM)
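
To make the text formats concrete: a PED line holds six leading columns (family ID, individual ID, father ID, mother ID, sex, phenotype) followed by two alleles per marker, and a MAP line holds chromosome, marker ID, genetic distance and base-pair position. The Python sketch below parses one line of each; the dictionary keys and the sample lines are made up for illustration.

```python
def parse_map_line(line):
    """Split a four-column MAP line into its named fields."""
    chrom, snp, genetic_distance, position = line.split()
    return {
        "chr": chrom,
        "snp": snp,
        "cm": float(genetic_distance),
        "bp": int(position),
    }


def parse_ped_line(line):
    """Split a PED line into the six leading fields plus allele pairs."""
    fields = line.split()
    alleles = fields[6:]
    if len(alleles) % 2 != 0:
        raise ValueError("PED line must hold two alleles per marker")
    return {
        "fid": fields[0],
        "iid": fields[1],
        "father": fields[2],
        "mother": fields[3],
        "sex": fields[4],
        "phenotype": fields[5],
        "genotypes": [
            (alleles[i], alleles[i + 1]) for i in range(0, len(alleles), 2)
        ],
    }


# Made-up example lines: one marker entry and one individual with two markers.
marker = parse_map_line("1 rs123 0 1000")
person = parse_ped_line("FAM1 1 0 0 1 2 A A G T")
print(marker["snp"], person["genotypes"])
```

The binary fileset carries the same information: the BED file stores the genotypes in compact form, the BIM file is the marker map with allele names added, and the FAM file holds the six leading PED columns.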



EXAMPLE ANALYSES IN PLINK…

DATA TRANSFORMATION
DATA FILTERING AND PRUNING
DATA MERGING
SUMMARY STATS
  MISSINGNESS
  HWE
  MAF
  MENDEL ERRORS
INCLUSION THRESHOLDS
POPULATION STRATIFICATION
ASSOCIATION
  CASE/CONTROL
  QTL
  GxE
  NEW: MULTIPLE TESTING CORRECTION (--adjust)
FAMILY-BASED
  TDT
  POO
PERMUTATION
EPISTASIS
HAPLOTYPE ANALYSIS
  NEW: PROXY-ASSOCIATION (FROM SNP TO HAPLOTYPE)
R-PACKAGE
NEW: MODIFY OUTPUT
  PLOG10
  P<x
  GENOMIC CONTROL
  QQ-PLOT
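
Three of the output modifications listed above are simple transformations of a p-value column. The Python sketch below is an illustration, not PLINK code: it adds -log10(p), applies a Bonferroni correction (one of the simplest multiple-testing corrections) and keeps only results below a chosen threshold. The SNP names and p-values are made up.

```python
import math


def summarise(pvalues, threshold):
    """Add -log10(p) and a Bonferroni-corrected p, then keep rows with p < threshold.

    pvalues maps a SNP name to its unadjusted p-value; the number of
    tests used for the correction is the number of SNPs supplied.
    """
    n_tests = len(pvalues)
    rows = []
    for snp, p in pvalues.items():
        rows.append({
            "snp": snp,
            "p": p,
            "neg_log10_p": -math.log10(p),
            "bonferroni": min(1.0, p * n_tests),
        })
    rows.sort(key=lambda r: r["p"])  # most significant first
    return [r for r in rows if r["p"] < threshold]


# Made-up results for three SNPs.
results = summarise({"rs1": 0.001, "rs2": 0.2, "rs3": 0.04}, threshold=0.05)
for row in results:
    print(row["snp"], round(row["neg_log10_p"], 2), round(row["bonferroni"], 3))
```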


PLINK… : RUNNING TDT IN PLINK

CAN RUN FROM COMMAND LINE AND USING gPLINK (GUI)
RECOMMEND BAT AND SCRPT FILES
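
One way to keep a record of repeated runs is to generate the batch file rather than type each command. The Python sketch below builds one PLINK TDT command per chromosome using the `--bfile`, `--chr`, `--tdt` and `--out` options; the dataset path `dataset/mystudy`, the `output` folder and the output file naming are hypothetical choices made for this example.

```python
def tdt_command(dataset, chromosome, out_dir="output"):
    """Return the PLINK argument list for a TDT run on one chromosome."""
    return [
        "plink",
        "--bfile", dataset,           # binary fileset prefix (BED/BIM/FAM)
        "--chr", str(chromosome),     # restrict the run to one chromosome
        "--tdt",                      # family-based TDT analysis
        "--out", f"{out_dir}/tdt_chr{chromosome}",
    ]


def batch_script(dataset, chromosomes):
    """Return batch-file text with one PLINK command per line."""
    lines = [" ".join(tdt_command(dataset, c)) for c in chromosomes]
    return "\n".join(lines) + "\n"


# Hypothetical dataset prefix; chromosomes 1 and 2 only, to keep it short.
script = batch_script("dataset/mystudy", [1, 2])
print(script)
```

Saving the returned text as a batch file gives a rerunnable record of exactly which analyses were performed.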




PLINK… : SUMMARY TABLES IN STATA

INSHEET THE TDT.CLEAN FILE
ADD GENE NAMES
ADD CHROMOSOME POSITION
ADJUST OR TO RISK
GENERATE GRAPHS OF DATA
GENERATE TABLES BY GENE
GENERATE TABLES BY POSITION
GENERATE TABLES BY P-VALUE
SELECT COLUMNS FOR OTHER ANALYSES (GENMAPP)
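
The post-processing steps above amount to loading a results table, annotating it and re-sorting it. The Python sketch below illustrates this under several assumptions that are not stated on the slide: the cleaned file is taken to be tab-delimited with columns named `CHR`, `SNP`, `BP`, `OR` and `P`, "adjust OR to risk" is read as inverting odds ratios below 1 so that each is expressed for the risk allele, and the gene lookup is a made-up dictionary.

```python
import csv
import io


def load_results(text):
    """Read tab-delimited results and add a risk-oriented odds ratio."""
    rows = []
    for r in csv.DictReader(io.StringIO(text), delimiter="\t"):
        odds = float(r["OR"])
        rows.append({
            "snp": r["SNP"],
            "chr": int(r["CHR"]),
            "bp": int(r["BP"]),
            "or": odds,
            # Assumed interpretation: express every OR as >= 1.
            "risk_or": odds if odds >= 1 else 1 / odds,
            "p": float(r["P"]),
        })
    return rows


def add_gene_names(rows, gene_by_snp):
    """Attach a gene name to each row; unknown SNPs get an empty string."""
    for row in rows:
        row["gene"] = gene_by_snp.get(row["snp"], "")
    return rows


# Made-up results and gene annotation.
TEXT = "CHR\tSNP\tBP\tOR\tP\n1\trs1\t1000\t0.5\t0.01\n1\trs2\t2000\t1.5\t0.2\n2\trs3\t500\t2\t0.001\n"
rows = add_gene_names(load_results(TEXT), {"rs1": "GENE_A", "rs3": "GENE_B"})

by_p = sorted(rows, key=lambda r: r["p"])
by_position = sorted(rows, key=lambda r: (r["chr"], r["bp"]))
print([r["snp"] for r in by_p])
print([r["snp"] for r in by_position])
```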



THE END!