Bioinformatics Course


2 Oct 2013 (3 years and 8 months ago)

Sao Paulo, Brazil, June 9-11, 2011

Answers Exercise NCBI Entrez, Day 1

Exercise 3: The proto-oncogene MYC

Mutations in the gene of the tumor protein MYC are associated with different types of cancer. You will
now try to gather information related to this gene using the different databases within NCBI Entrez.

(a) Go to the NCBI Entrez website and find the RefSeq IDs of the mRNA transcript and the protein of the
human MYC gene.

Search for “MYC homo sapiens”, open the Entrez Gene page for id 4609, and scroll to the section “NCBI
Reference Sequences (RefSeq)”.

NM_002467.4 (mRNA) and NP_002458.2 (protein), myc proto-oncogene protein
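The two RefSeq identifiers above share one pattern: a two-letter prefix, an underscore, a number, and a version after the dot. A minimal Python sketch of how such an accession could be split follows. The function name `parse_refseq` and the two-entry prefix table are illustrative assumptions and not part of any NCBI tool.

```python
import re

# Illustrative subset of RefSeq prefixes; the real list is longer.
PREFIX_MEANING = {"NM": "mRNA", "NP": "protein"}

def parse_refseq(accession):
    """Split a versioned RefSeq accession such as 'NM_002467.4' into its parts."""
    match = re.fullmatch(r"([A-Z]{2})_(\d+)\.(\d+)", accession)
    if match is None:
        raise ValueError(f"not a versioned RefSeq accession: {accession!r}")
    prefix, number, version = match.groups()
    return {
        "prefix": prefix,
        "molecule": PREFIX_MEANING.get(prefix, "unknown"),
        "number": number,
        "version": int(version),
    }

for acc in ("NM_002467.4", "NP_002458.2"):
    print(acc, parse_refseq(acc))
```

The version number matters when comparing answers: the same transcript can appear with a higher version after a sequence update.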

(b) On which chromosome is the human MYC gene located?

Chromosome 8 (see “Genomic context” on the Entrez Gene page of id 4609)

(c) What is the UniGene ID of the human MYC gene?


Hs.202453 (see “Additional Links” on the Entrez Gene page of id 4609)

(d) How many sequences are part of this UniGene cluster?

496 sequences, composed of 14 mRNA sequences + 482 ESTs

(Go to the UniGene page for id Hs.202453; these numbers can change as new sequences are added.)

(e) Write down the GenBank ID of one of the sequences in the cluster.


(choose one of the identifiers under “mRNA sequences”)

(f) Which other names (no IDs!) are used for the MYC gene?

Synonyms are Myc, bHLHe39 and MYC (see the “Also known as” section of the Entrez Gene page of id 4609)

(g) To understand the function of MYC better you will investigate and compare several organisms. Make
a list of Entrez Gene identifiers of the MYC gene for 3 different organisms.

Click on “HomoloGene” in the right side menu (Links). There you get a list of genes identified as
putative homologs.

MYC, Homo sapiens, Entrez ID: 4609

MYC, P. troglodytes, Entrez ID: 464393

MYC, C. lupus, Entrez ID: 403924

MYC, B. taurus, Entrez ID: 511077

Myc, M. musculus, Entrez ID: 17869
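The same Gene records can also be requested programmatically through the NCBI E-utilities. The sketch below only builds the request URL for the `esummary` endpoint and does not contact the server. The dictionary `MYC_HOMOLOGS` and the function `gene_summary_url` are names chosen for this illustration; the IDs are the ones listed above.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

# Entrez Gene IDs of the MYC homologs listed in the answer above.
MYC_HOMOLOGS = {
    "Homo sapiens": 4609,
    "P. troglodytes": 464393,
    "C. lupus": 403924,
    "B. taurus": 511077,
    "M. musculus": 17869,
}

def gene_summary_url(gene_ids):
    """Build an esummary URL for one or more Entrez Gene IDs."""
    id_list = ",".join(str(gene_id) for gene_id in gene_ids)
    # safe="," keeps the comma-separated ID list readable in the URL.
    query = urlencode({"db": "gene", "id": id_list}, safe=",")
    return f"{EUTILS_BASE}/esummary.fcgi?{query}"

print(gene_summary_url(MYC_HOMOLOGS.values()))
```

Opening the printed URL in a browser should return a summary for each of the five genes, which makes it easy to compare the organisms side by side.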

(h) Use the OMIM database to find information on the disease which is associated with the MYC gene.

Burkitt lymphoma (Go to the Entrez Gene page of id 4609 and click on the link OMIM)

Exercise 4: Lung cancer

Lung cancer is a common form of cancer and has a few subtypes. The risk of getting lung cancer is
influenced by both genetic and environmental factors. A well-defined environmental factor
is tobacco smoking. We will now explore the Entrez databases to see what they can tell us about the
genetic factors that influence lung cancer.

(a) Try to find human genes that are related to lung cancer using the Entrez database(s). Which genes can
you find? Give their names, RefSeq mRNA IDs and RefSeq protein IDs.

A few examples:









Please note that this list is not complete and that other genes are also associated with lung cancer.

(b) One of the genes associated with lung cancer is BRAF. Have a look at the OMIM record of BRAF. As
you can see, BRAF has several allelic variations which increase the risk of developing cancer. Write down
two allelic variations of BRAF that are associated with lung cancer and find, for each variation, the PubMed
identifier of the paper describing this genetic variation.

(Hint: select “Allelic Variants” in the display options)






Naoki et al. (2002) identified a gly465-to-val (G465V) mutation in exon 11 of the
BRAF gene in 1 of 127 primary human lung adenocarcinomas (see 211980)
screened. Based on the revised numbering system of Kumar et al. (2003), the
GLY465VAL mutation has been renumbered as GLY466VAL.


Naoki et al. (2002) identified a leu596-to-arg (L596R) mutation in exon 15 of the
BRAF gene in 1 of 127 primary human lung adenocarcinomas (see 211980)
screened. Based on the revised numbering system of Kumar et al. (2003), the
LEU596ARG mutation has been renumbered as LEU597ARG.


In a non-small cell lung carcinoma, Brose et al. (2002) identified a leu596-to-val
(L596V) change in exon 15 of the BRAF gene. Based on the revised numbering
system of Kumar et al. (2003), the LEU596VAL mutation has been renumbered
as LEU597VAL.
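In the variants quoted here, the renumbering moves the amino acid position up by one (LEU596VAL becomes LEU597VAL). A small Python sketch of that bookkeeping follows. The function name `renumber_variant` is hypothetical, and the default offset of 1 is an assumption drawn only from the examples in this answer; it should not be treated as a general rule for every BRAF variant.

```python
import re

def renumber_variant(name, offset=1):
    """Shift the position in a name like 'LEU596VAL' by `offset` residues."""
    match = re.fullmatch(r"([A-Z]{3})(\d+)([A-Z]{3})", name)
    if match is None:
        raise ValueError(f"unexpected variant name: {name!r}")
    ref, position, alt = match.groups()
    # Only the position changes; the two amino acids stay the same.
    return f"{ref}{int(position) + offset}{alt}"

for old_name in ("GLY465VAL", "LEU596VAL"):
    print(old_name, "->", renumber_variant(old_name))
```

This also shows why position-based names are fragile: one change to the reference numbering renames every variant that follows the shift.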

(c) The BRAF gene is not only associated with lung cancer but also with other types of cancer. Which
types of cancer are these?

Melanoma, colon cancer, non-Hodgkin lymphoma

(d) OMIM uses two types of identifiers for the allelic variations. Describe these two. What are the
advantages and disadvantages of each identifier type?

OMIM has internal identifiers consisting of 6 digits (=the id of the OMIM record), a dot and 4
digits, for example 164757.0006. The alternative is an identifier that consists of the gene name and
the amino acid substitution, for example [BRAF, GLY466VAL], which means a
mutation of GLY (=glycine) to VAL (=valine) at position 466 of the BRAF protein. The advantage of
using the latter method is that the identifier has more biological meaning. A disadvantage of this
method is that the position of the amino acid is not guaranteed to stay the same in the future. If the
position changes, the name of the variant will change as well.
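The two identifier styles can be told apart mechanically, which makes the trade-off concrete: the numeric form carries only record and variant numbers, while the substitution form carries the gene, both amino acids and the position. The sketch below is an illustration under assumptions. The regular expressions follow the two examples given in this answer, and `describe_identifier` is a hypothetical helper, not an OMIM API.

```python
import re

# 6 digits, a dot, 4 digits, e.g. 164757.0006
ALLELIC_NUMBER = re.compile(r"(\d{6})\.(\d{4})")
# Gene name plus substitution, e.g. [BRAF, GLY466VAL]; brackets optional here.
SUBSTITUTION = re.compile(r"\[?\s*(\w+),\s*([A-Z]{3})(\d+)([A-Z]{3})\s*\]?")

def describe_identifier(text):
    """Classify an allelic variant identifier and return its components."""
    match = ALLELIC_NUMBER.fullmatch(text)
    if match:
        return {"type": "number", "record": match.group(1), "variant": match.group(2)}
    match = SUBSTITUTION.fullmatch(text)
    if match:
        gene, ref, position, alt = match.groups()
        return {"type": "substitution", "gene": gene, "ref": ref,
                "position": int(position), "alt": alt}
    raise ValueError(f"unrecognised identifier: {text!r}")

print(describe_identifier("164757.0006"))
print(describe_identifier("[BRAF, GLY466VAL]"))
```

The numeric identifier stays valid when the reference numbering changes, whereas the substitution form has to be rewritten.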