Vol. 23 no. 4 2007, pages 504–506
Sequence analysis
AllerTool: a web server for predicting allergenicity and allergic cross-reactivity in proteins
Zong Hong Zhang, Judice L. Y. Koh, Guang Lan Zhang, Khar Heng Choo, Martti T. Tammi and Joo Chuan Tong
Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613 and Department of Biological Sciences, National University of Singapore, 14 Science Drive 4, Singapore 117543
Received on August 21, 2006; revised on November 23, 2006; accepted on December 5, 2006
Advance Access publication December 6, 2006
Associate Editor: John Quackenbush
Summary: Assessment of potential allergenicity and patterns of cross-reactivity is necessary whenever novel proteins are introduced into the human food chain. Current bioinformatic methods in allergology focus mainly on the prediction of allergenic proteins and provide no information on cross-reactivity patterns among known allergens. In this study, we present AllerTool, a web server with essential tools for the assessment of predicted as well as published cross-reactivity patterns of allergens. The analysis tools include a graphical representation of allergen cross-reactivity information; a local sequence comparison tool that displays information on known cross-reactive allergens; a sequence similarity search tool for the assessment of cross-reactivity in accordance with the FAO/WHO Codex alimentarius guidelines; and an allergenicity prediction method based on support vector machines (SVM). Results of 10-fold cross-validation showed that the area under the receiver operating characteristic curve (A_ROC) of the SVM models is 0.90, with 86.00% sensitivity (SE) at a specificity (SP) of 86.00%.
Availability: AllerTool is freely available at http://research.i2r.a-star.
Atopic allergy and other hypersensitivity reactions are major causes of chronic ill health in affluent industrial nations, affecting up to 25% of the general population (Mekori, 1996; Nieuwenhuizen and Lopata, 2005). Allergy is caused by an adverse immunological reaction to causative agents, known as allergens, that are otherwise innocuous. The acute symptoms of allergy are usually due to the release of inflammatory mediators when an allergen cross-links immunoglobulin E (IgE) antibodies on mast cells or basophils (Sutton and Gould, 1993). This may be followed by a late-phase reaction characterized by an influx of T-cells, eosinophils and monocytes (Gould et al., 2003). Atopic individuals may have one or more manifestations of the disease, including asthma, conjunctivitis, dermatitis (eczema) and rhinitis (hay fever), and some experience severe, life-threatening anaphylaxis.
Methods for assessing potential allergenicity are essential whenever new proteins are brought into contact with humans, whether through food or other modes of exposure. The current joint recommendation of the World Health Organization (WHO) and the Food and Agriculture Organization (FAO) is a decision-tree scheme that compares the local sequence similarity of a query protein against known allergenic proteins (FAO/WHO, 2003). Two decision criteria have been proposed for the assessment of allergenic potential: identity of six or more contiguous amino acids, or a minimum of 35% sequence identity over a window of 80 amino acids. Several research groups, including Gendel (1998, 2002), Stadler and Stadler (2003) and Fiers et al. (2004), have developed computational tools to scan for sequences that satisfy these criteria.
These tools are useful for standardized prediction of the potential allergenicity of proteins according to the current recommendations of the FAO/WHO Expert Consultation. However, more complex techniques are needed, because the six-amino-acid rule is non-specific and the 35% identity criterion is too stringent to find most true allergens (Li et al., 2004; Hileman et al., 2002; Stadler and Stadler, 2003; Silvanovich et al., 2006).
More sophisticated bioinformatic tools for detecting motifs among allergenic sequences have recently been described. Zorzet et al. (2002) combined the FASTA3 algorithm with a k-nearest-neighbour (kNN) classifier to assess potential food protein allergenicity. Soeria-Atmadja et al. (2004) extended the study to a larger set of allergens using a combination of a kNN classifier, a Bayesian linear Gaussian classifier and a Bayesian quadratic Gaussian classifier. Li et al. (2004) demonstrated the use of the wavelet transform to predict potential allergens. Björklund et al. (2005) introduced allergen-representative peptides for the detection of potentially allergenic proteins. Cui et al. (2006) and Saha and Raghava (2006) reported the use of support vector machines (SVM) for the prediction of novel allergenic proteins.
In this paper, we present AllerTool, a web server providing essential tools for assessing predicted as well as published allergic cross-reactivity patterns of clinically relevant protein allergens. Three different programs are available for assessing the potential allergenicity of protein sequences: (1) a sequence similarity search tool for the assessment of allergenicity in accordance with the FAO/WHO Codex alimentarius guidelines; (2) an SVM-based method for predicting the allergenicity of proteins with little or no similarity to known allergens; and (3) a modification of BLAST that displays cross-reactive allergens. In addition, AllerTool provides potential cross-reactivity information for a query sequence through a graphical representation of the cross-reactivity network of similar proteins. The main purpose of AllerTool is to support molecular studies of allergens and the assessment of allergic responses and allergic cross-reactivity.
To whom correspondence should be addressed.
© The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please
by guest on October 1, 2013 from
Allergen data were extracted from the International Union of Immunological Societies (IUIS) Allergens website (http://www. and stored in the ALLERDB database (Zhang et al., manuscript in preparation; Templar/DB/Allergen/). The dataset contains all IUIS allergens and isoallergens that have protein sequences available in public sequence databases or publication references. In total, it comprises 373 allergens, 260 isoallergens and 128 instances of reported cross-reactivity collected from the literature and verified using the text-mining tool ABK (Miotto et al., 2005).
Analysis tools
AllerTool and its web interface are written in C/C++ and Perl and run on a SunOS 5.9 UNIX system with an Apache web server. It comprises four integrated tools for assessing the potential allergenicity of protein sequences: XR-BLAST, XR-Graph, ALR-SCAN and ALR-SVM.
XR-BLAST (Koh et al., 2004) is a local sequence comparison tool based on BLAST 2.2.3 (Altschul et al., 1997) that outputs information on allergens with reported cross-reactivity to each individual match. A sample output of XR-BLAST is given in Figure 1.
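As a rough illustration of the XR-BLAST idea, the following sketch annotates BLAST hits with their reported cross-reactive partners. It makes two assumptions. First, it expects BLAST tabular output in the 12-column layout written by legacy BLAST with `-m 8` or by BLAST+ with `-outfmt 6`. Second, it uses a hypothetical in-memory cross-reactivity table. The function names, the E-value cut-off and the allergen labels are illustrative only and are not taken from AllerTool or ALLERDB.

```python
from typing import Dict, List, NamedTuple, Set, Tuple


class Hit(NamedTuple):
    subject: str   # identifier of the matched allergen sequence
    pident: float  # percent identity of the local alignment
    evalue: float  # expectation value of the alignment


def parse_tabular(lines: List[str]) -> List[Hit]:
    """Parse 12-column BLAST tabular output:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        cols = line.rstrip("\n").split("\t")
        hits.append(Hit(subject=cols[1], pident=float(cols[2]), evalue=float(cols[10])))
    return hits


def annotate_cross_reactivity(
    hits: List[Hit],
    cross_reactive: Dict[str, Set[str]],
    max_evalue: float = 1e-5,  # assumed significance cut-off, not AllerTool's
) -> List[Tuple[str, float, List[str]]]:
    """Attach the reported cross-reactive partners to each significant hit."""
    report = []
    for hit in hits:
        if hit.evalue <= max_evalue:
            partners = sorted(cross_reactive.get(hit.subject, set()))
            report.append((hit.subject, hit.pident, partners))
    return report


# Hypothetical cross-reactivity table and BLAST lines (not ALLERDB data).
XR = {"AllergenA": {"AllergenB", "AllergenC"}, "AllergenB": {"AllergenA"}}
blast_lines = [
    "query1\tAllergenA\t78.50\t120\t25\t1\t1\t120\t5\t124\t2e-40\t150.2",
    "query1\tAllergenD\t30.00\t60\t40\t2\t10\t70\t3\t62\t0.5\t25.0",
]
for subject, pident, partners in annotate_cross_reactivity(parse_tabular(blast_lines), XR):
    print(subject, pident, partners)  # prints: AllergenA 78.5 ['AllergenB', 'AllergenC']
```

In this sketch the cross-reactivity lookup is kept separate from the similarity search. That separation lets the annotation table be updated independently of the BLAST database.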
XR-Graph (
en) is a visualization tool for the graphical representation of allergen cross-reactivity information. Each graph displays allergens (boxes) that are related by reported cross-reactivity (links). This visual tool enables users to identify possible allergen cross-reactivity relationships that have not been reported before, and it has potential uses in the development of novel allergy diagnostic approaches. A sample output of XR-Graph is shown in Figure 2.
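The caption of Figure 2 notes that possible cross-reactivity relationships may be inferred from missing links. The sketch below shows one simple way to do this. It is offered as an assumption, not as XR-Graph's actual algorithm. It flags every pair of allergens that is not directly linked but shares at least one reported cross-reactive partner, and it ranks those pairs by the number of shared partners. The allergen labels and edges are placeholders, not IUIS data.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def build_graph(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets built from reported cross-reactivity pairs."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def candidate_missing_links(graph: Dict[str, Set[str]]) -> List[Tuple[str, str, List[str]]]:
    """Return (allergen1, allergen2, shared_partners) for unlinked pairs that share
    at least one cross-reactive partner, most shared partners first."""
    candidates = []
    for u, v in combinations(sorted(graph), 2):
        if v in graph[u]:
            continue  # cross-reactivity already reported for this pair
        shared = graph[u] & graph[v]
        if shared:
            candidates.append((u, v, sorted(shared)))
    candidates.sort(key=lambda c: (-len(c[2]), c[0], c[1]))
    return candidates


# Placeholder network: A-B, B-C, A-D, D-C and C-E are "reported" links.
edges = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "C"), ("C", "E")]
for u, v, shared in candidate_missing_links(build_graph(edges)):
    print(f"{u}-{v} unreported, shared partners: {shared}")
```

Such candidates are only hypotheses. A shared neighbour says nothing about which epitopes are involved, so any inferred link would still need experimental confirmation.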
ALR-SCAN (Koh et al., 2004) is a sequence similarity search tool that reports sequence similarity in accordance with the current FAO/WHO recommendation for the assessment of allergenicity. Both rules are implemented: identity over six contiguous amino acids, and >35% identity over a stretch of 80 amino acids. Users can submit a protein of interest to ALR-SCAN, which returns the list of matches that satisfy either rule. A sample query and its output are shown in Figure 3.
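The two FAO/WHO rules implemented by ALR-SCAN can be sketched as follows. This is a simplified illustration, not ALR-SCAN's code. The 80-residue rule is evaluated here with ungapped window comparisons, whereas assessments under the guidelines normally use a gapped alignment program such as FASTA or BLAST, so this version can miss matches that require gaps. The function names are hypothetical and the toy sequences are not real allergens.

```python
from typing import Dict, List, Tuple

KMER = 6                # rule 1: six contiguous identical residues
WINDOW = 80             # rule 2: window length in residues
IDENTITY_CUTOFF = 0.35  # rule 2: identity must exceed 35%


def shares_kmer(query: str, allergen: str, k: int = KMER) -> bool:
    """Rule 1: True if any k-residue stretch of the query occurs verbatim in the allergen."""
    allergen_kmers = {allergen[i:i + k] for i in range(len(allergen) - k + 1)}
    return any(query[i:i + k] in allergen_kmers for i in range(len(query) - k + 1))


def best_window_identity(query: str, allergen: str, window: int = WINDOW) -> float:
    """Rule 2, ungapped simplification: highest fraction of identical positions
    between any window of the query and any equally long window of the allergen."""
    best = 0.0
    for i in range(len(query) - window + 1):
        q = query[i:i + window]
        for j in range(len(allergen) - window + 1):
            a = allergen[j:j + window]
            identical = sum(1 for x, y in zip(q, a) if x == y)
            best = max(best, identical / window)
    return best  # stays 0.0 when either sequence is shorter than the window


def alr_scan_sketch(query: str, allergens: Dict[str, str]) -> List[Tuple[str, bool, float]]:
    """Return (name, rule1_hit, best_identity) for allergens satisfying either rule."""
    hits = []
    for name, seq in allergens.items():
        rule1 = shares_kmer(query, seq)
        identity = best_window_identity(query, seq)
        if rule1 or identity > IDENTITY_CUTOFF:
            hits.append((name, rule1, identity))
    return hits


# Toy sequences for illustration only.
toy_allergens = {"toyA": "AD" * 40, "toyB": "ADDD" * 20, "toyC": "GGGAYIAKQGG"}
print(alr_scan_sketch("AC" * 40, toy_allergens))    # toyA passes rule 2 only (50% identity)
print(alr_scan_sketch("MKTAYIAKQR", toy_allergens))  # toyC passes rule 1 (shares AYIAKQ)
```

The double loop costs on the order of L_query x L_allergen x 80 comparisons per allergen. That is acceptable for a sketch, but it is one reason production tools rely on indexed alignment programs instead.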
Fig. 1. An example of XR-BLAST output.
Fig. 2. An example of XR-Graph output for Aln g 1. Reported cross-reactivity patterns are represented by links. Possible cross-reactivity relationships may be inferred from missing links.
Fig. 3. An example of ALR-SCAN output.
Fig. 4. An example of ALR-SVM output.
ALR-SVM is a tool for predicting protein allergenicity from a global description of the amino acid sequence, using an SVM as the prediction engine (Cui et al., 2006; Fig. 4). The training dataset consists of 460 allergens and 560 non-allergens. The testing dataset includes 114 allergens and 140 non-allergens derived from […] = 9343 (Björklund et al., 2005). These were selected using a debiasing strategy based on the sequence similarity of protein sequences that are commonly found in consumed food and have no records in existing allergen databases (Saha and Raghava, 2006). Allergens therefore make up 45% of the testing dataset, and non-allergens the remaining 55%. Different kernel functions (linear, polynomial, radial and sigmoid) were explored to improve the prediction accuracy of the SVM models.
ALR-SVM uses a third-degree polynomial kernel function, with sequences encoded using descriptors derived from amino acid composition.
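The next sketch shows the two ingredients named above: a 20-dimensional amino acid composition descriptor and a third-degree polynomial kernel. It is a sketch under assumptions, not ALR-SVM's implementation. The kernel form K(x, y) = (gamma * x.y + coef0)^3 follows the LIBSVM convention, and the values gamma = 1 and coef0 = 1 are assumptions, because the paper does not report ALR-SVM's kernel parameters. The support vectors and coefficients would come from training with a standard solver such as LIBSVM or scikit-learn, which is not shown.

```python
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aa_composition(seq: str) -> List[float]:
    """Fraction of each standard amino acid in seq; non-standard letters are ignored."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    total = 0
    for residue in seq.upper():
        if residue in counts:
            counts[residue] += 1
            total += 1
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def poly_kernel(x: Sequence[float], y: Sequence[float],
                degree: int = 3, gamma: float = 1.0, coef0: float = 1.0) -> float:
    """Polynomial kernel (gamma * <x, y> + coef0) ** degree; gamma and coef0 are assumed."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree


def svm_decision(x: Sequence[float], support_vectors: Sequence[Sequence[float]],
                 dual_coefs: Sequence[float], bias: float) -> float:
    """f(x) = sum_i (alpha_i * y_i) * K(sv_i, x) + b; here f(x) > 0 is read as 'allergen'."""
    return sum(c * poly_kernel(sv, x) for sv, c in zip(support_vectors, dual_coefs)) + bias


# Usage with placeholder parameters (a real model would supply these after training).
query = aa_composition("MKTAYIAKQRQISFVKSHFSRQ")
print(svm_decision(query, [aa_composition("AAC")], [1.0], -1.0))
```

A composition descriptor has a fixed length regardless of protein length, which is what lets a standard SVM compare sequences that cannot be aligned. The price is that it discards residue order.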
The A_ROC value is 0.90 (DB/AllerTool/Algorithms.html). Using amino acid composition as input for training and testing, ALR-SVM predicts allergenic proteins with a sensitivity (SE) of 86.00% and a specificity (SP) of 86.00%. These values are comparable to those of the SVM approach of Saha and Raghava (2006) (SE = 85.02%, SP = 84.00%) and of allergen-representative peptides (SE = 81.00%, SP = 90.00%; Björklund et al., 2005), and they outperform the motif-based approach using MEME/MAST software (SE = 93.94%, SP = 33.34%; Saha and Raghava, 2006). For sequences predicted to be allergenic, a list of high-similarity IUIS allergen sequences and reported cross-reactivity information is provided.
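The evaluation figures quoted above (A_ROC of 0.90, SE and SP of 86.00%) follow from standard definitions: SE = TP/(TP + FN) and SP = TN/(TN + FP). The area under the ROC curve equals the probability that a randomly chosen allergen receives a higher score than a randomly chosen non-allergen, with ties counted as one half. The sketch below computes these measures, together with simple unstratified k-fold splits. It assumes real-valued decision scores and a decision threshold of 0. It is not the evaluation code used for ALR-SVM, and the scores in the usage lines are hypothetical.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def sensitivity_specificity(scores: Sequence[float], labels: Sequence[int],
                            threshold: float = 0.0) -> Tuple[float, float]:
    """SE = TP/(TP+FN), SP = TN/(TN+FP); labels are 1 (allergen) or 0 (non-allergen)."""
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted_allergen = score > threshold
        if label == 1:
            if predicted_allergen:
                tp += 1
            else:
                fn += 1
        elif predicted_allergen:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC area as P(allergen score > non-allergen score), counting ties as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_splits(n: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for unstratified k-fold cross-validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, sorted(folds[i])


# Hypothetical decision scores, for illustration only.
scores = [2.0, 1.5, -0.5, 0.3, -1.0, -2.0]
labels = [1, 1, 1, 0, 0, 0]
print(sensitivity_specificity(scores, labels))
print(roc_auc(scores, labels))
```

Because the ROC area is threshold-free, it complements the single SE/SP pair, which depends on where the decision threshold is placed.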
With the advent of genetically modified proteins in foods, therapeutics and biopharmaceuticals (Saha and Raghava, 2006), AllerTool provides a new service for assessing predicted as well as published cross-reactivity patterns of novel proteins. ALR-SCAN is useful for assessing allergenicity in proteins according to the FAO/WHO Codex alimentarius guidelines. However, concerns have been raised about the validity of the current FAO/WHO guidelines (Li et al., 2004; Hileman et al., 2002). Various groups, including Silvanovich et al. (2006) and Stadler and Stadler (2003), reported that matches of six contiguous amino acids found by short sequence searches largely arise by chance and add little value to allergy assessments of newly expressed proteins. More sophisticated techniques are therefore needed to screen proteins for allergenicity.
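A back-of-the-envelope estimate shows why short six-residue matches can arise by chance. Assume, unrealistically, that residues are independent and uniformly distributed over the 20 amino acids. Then the expected number of identical k-residue matches between a query of length L and a database of sequences with lengths M_i is lambda = (L - k + 1) * sum_i (M_i - k + 1) * 20^(-k). Under a Poisson approximation, the probability of at least one match is roughly 1 - exp(-lambda). Real proteins have skewed compositions and low-complexity regions, so this uniform model underestimates how often chance matches occur. The query length and database size in the usage lines are hypothetical.

```python
import math
from typing import Sequence


def expected_random_kmer_matches(query_len: int, db_lengths: Sequence[int],
                                 k: int = 6, alphabet: int = 20) -> float:
    """Expected exact k-mer matches under a uniform, independent-residue model."""
    query_windows = max(query_len - k + 1, 0)
    db_windows = sum(max(m - k + 1, 0) for m in db_lengths)
    return query_windows * db_windows * alphabet ** (-k)


def prob_at_least_one(expected: float) -> float:
    """Poisson approximation to P(at least one chance match)."""
    return 1.0 - math.exp(-expected)


# Hypothetical scenario: a 300-residue query against 600 allergens of 300 residues each.
lam = expected_random_kmer_matches(300, [300] * 600)
print(f"expected chance 6-mer matches: {lam:.2f}")
print(f"P(at least one): {prob_at_least_one(lam):.2f}")
```

Even under this optimistic model, a database of a few hundred allergens makes a chance six-residue hit quite plausible for an ordinary query. That is in line with the criticism of the six-amino-acid rule.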
ALR-SVM has been developed to capture non-linear characteristics that may be encoded within allergenic protein sequences. Future work will focus on the development of additional methods to support and refine the prediction of cross-reactivity.
The authors thank Prof. Vladimir Brusic (UQ, Australia) for critically reading the manuscript.
Conflict of Interest: none declared.
Altschul et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.
Björklund et al. (2005) Supervised identification of allergen-representative peptides for in silico detection of potentially allergenic proteins. Bioinformatics,
Cui et al. (2006) Computer prediction of allergen proteins from sequence-derived protein structural and physicochemical properties. Mol. Immunol., In press.
FAO/WHO (2003) Codex Principles and Guidelines on Foods Derived from Biotechnology. Joint FAO/WHO Food Standards Programme, Rome, Italy.
Fiers et al. (2004) Allermatch, a webtool for the prediction of potential allergenicity according to current FAO/WHO Codex alimentarius guidelines. BMC Bioinformatics, 5, 133–138.
Gendel,S.M. (1998) The use of amino acid sequence alignments to assess potential allergenicity of proteins used in genetically modified foods. Adv. Food Nutr. Res.,
Gendel,S.M. (2002) Sequence analysis for assessing potential allergenicity. Ann. N.Y.
Hileman et al. (2002) Bioinformatic methods for allergenicity assessment using a comprehensive allergen database. Int. Arch. Allergy Immunol., 128, 280–291.
Koh et al. (2004) BioWare: a framework for bioinformatics data retrieval, annotation and publishing. In ACM SIGIR Workshop on Search and Discovery in Bioinformatics (SIGIRBIO), Sheffield, UK, July 2004.
Li et al. (2004) Predicting allergenic proteins using wavelet transform.
Mekori,Y.A. (1996) Introduction to allergic diseases. Crit. Rev. Food Sci. Nutr., 36,
Miotto et al. (2005) Supporting the curation of biological databases with reusable text mining. Genome Inform., 16, 32–44.
Nieuwenhuizen,N.E. and Lopata,A.L. (2005) Fighting food allergy: current approaches.
Saha,S. and Raghava,G.P.S. (2006) AlgPred: prediction of allergenic proteins and mapping of IgE epitopes. Nucleic Acids Res., 34, W202–W209.
Silvanovich et al. (2006) The value of short amino acid sequence matches for prediction of protein allergenicity. Toxicol. Sci., 90, 252–258.
Soeria-Atmadja et al. (2004) Statistical evaluation of local alignment features predicting allergenicity using supervised classification algorithms. Int. Arch. Allergy
Stadler,M.B. and Stadler,B.M. (2003) Allergenicity prediction by protein sequence. FASEB J., 17, 1141–1143.
Stothard,P. (2000) The Sequence Manipulation Suite: JavaScript programs for analyzing and formatting protein and DNA sequences. Biotechniques, 28, 1102–1104.
Sutton,B.J. and Gould,H.J. (1993) The human IgE network. Nature, 366, 421–428.
Zorzet et al. (2002) Prediction of food protein allergenicity: a bioinformatic learning systems approach. In Silico Biol., 2, 525–534.