Read, analyze, and visualize genomic and proteomic data
Bioinformatics Toolbox provides algorithms and visualization techniques for Next Generation Sequencing
(NGS), microarray analysis, mass spectrometry, and gene ontology. Using toolbox functions, you can read
genomic and proteomic data from standard file formats such as SAM, FASTA, CEL, and CDF, as well as from
online databases such as the NCBI Gene Expression Omnibus and GenBank. You can explore and visualize this
data with sequence browsers, spatial heatmaps, and clustergrams. The toolbox also provides statistical techniques
for detecting peaks, imputing values for missing data, and selecting features.
You can combine toolbox functions to support common bioinformatics workflows. You can use ChIP-Seq data to
identify transcription factor binding sites; analyze RNA-Seq data to identify differentially expressed genes; identify copy
number variants and SNPs in microarray data; and classify protein profiles using mass spectrometry data.
Next Generation Sequencing analysis and browser
Sequence analysis and visualization, including pairwise and multiple sequence alignment and peak detection
Microarray data analysis, including reading, filtering, normalizing, and visualization
Mass spectrometry analysis, including preprocessing, classification, and marker identification
Phylogenetic tree analysis
Graph theory functions, including interaction maps, hierarchy plots, and pathways
Data import from genomic, proteomic, and gene expression files, including SAM, FASTA, CEL, and CDF, and from databases such as NCBI and GenBank
NGS browser (top), circular DNA map (bottom), and secondary structure of RNA sequence (left). Bioinformatics Toolbox
includes a variety of tools for visualizing sequence data.
Next Generation Sequencing Analysis
Bioinformatics Toolbox provides algorithms and visualization techniques for Next Generation Sequencing
analysis. The toolbox enables you to analyze whole genomes while performing calculations at base-pair
resolution. You can use the NGS browser to visualize and investigate short-read alignments from either
single-end or paired-end reads. You can also build custom analysis routines, as shown in the following examples:
Exploring Protein-DNA Binding Sites from Paired-End ChIP-Seq Data
Perform a genome-wide analysis of a transcription factor.
Identifying Differentially Expressed Genes from RNA-Seq Data
Load RNA-Seq data and test for differential expression using a statistical model.
Visualizing and Investigating Short-Read Alignment
Using the NGS browser, you can verify and investigate the alignment of short-read sequences in support of
analyses that measure genetic variation and gene expression. The NGS browser lets you:
Visualize short-read data aligned to a nucleotide reference sequence
Compare multiple data sets aligned against a common reference sequence
View coverage of different bases and regions of the reference sequence
Investigate quality and other details of aligned reads
Identify mismatches due to base-calling errors or polymorphisms
Visualize insertions and deletions
Retrieve feature annotations relative to a specific region of the reference sequence
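The coverage view rests on a simple quantity: how many aligned reads overlap each reference position. The Python sketch below computes it under stated assumptions (reads given as hypothetical (start, length) pairs, 0-based, with no insertions, deletions, or clipping). It illustrates the idea and is not how the NGS browser is implemented.

```python
# Illustrative sketch; `coverage` is a hypothetical helper, not a toolbox API.
from itertools import accumulate


def coverage(reads, ref_length):
    """Per-base read depth from (start, length) placements: 0-based, ungapped."""
    # Difference array: +1 where a read begins, -1 just past where it ends.
    diff = [0] * (ref_length + 1)
    for start, length in reads:
        if start >= ref_length:
            continue  # read lies entirely past the reference end
        end = min(start + length, ref_length)
        diff[start] += 1
        diff[end] -= 1
    # A running sum turns the boundary markers into depth at each position.
    return list(accumulate(diff[:-1]))


# Three hypothetical reads on a 10-base reference.
print(coverage([(0, 4), (2, 4), (5, 3)], 10))  # [1, 1, 2, 2, 1, 2, 1, 1, 0, 0]
```

The difference-array approach touches each read once and the reference once, so it scales linearly rather than with the total number of aligned bases.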
NGS browser, showing single nucleotide polymorphisms (SNPs) in bold. You can display multiple tracks of data, examine peaks,
identify insertions and deletions, and inspect read quality.
Custom plot mapping E-box motifs to peaks in a wavelet denoised signal.
Storing and Managing Short-Read Sequence Data
The data sets used in Next Generation Sequencing analysis are often too large to fit into physical memory.
Bioinformatics Toolbox provides specialized data containers that enable you to analyze entire genomes.
The BioIndexedFile object lets you access the contents of text files containing nonuniform-sized entries such
as sequences, annotations, and cross references to the data set. You can generate these objects from tables, flat
files, or application-specific formats such as SAM, FASTA, and FASTQ.
The BioMap class stores information from short-read sequences, including sequence headers, read sequences,
quality scores, and data about alignment and mapping to a single reference sequence. You can use object
properties and methods to explore, access, filter, and manipulate the data contained in a BioMap object.
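The idea behind an indexed-file container can be sketched in a few lines of Python: scan the file once, record the byte offset where each entry begins, and later seek straight to the entry you need instead of loading the whole file into memory. The function names `build_index` and `read_entry` are hypothetical, FASTA is used only as a convenient example format, and this is not the toolbox's implementation.

```python
# Illustrative sketch of offset indexing; hypothetical helpers, not a toolbox API.
import io


def build_index(handle):
    """Map each FASTA header ID to the byte offset where its record starts."""
    index = {}
    offset = handle.tell()
    for line in iter(handle.readline, b""):
        if line.startswith(b">"):
            # The ID is the first whitespace-delimited token after '>'.
            index[line[1:].split()[0].decode()] = offset
        offset = handle.tell()
    return index


def read_entry(handle, index, key):
    """Seek directly to one record and return its sequence without rescanning."""
    handle.seek(index[key])
    handle.readline()  # skip the header line
    parts = []
    for line in iter(handle.readline, b""):
        if line.startswith(b">"):
            break  # reached the next record
        parts.append(line.strip().decode())
    return "".join(parts)


data = b">r1 first\nACGT\nAC\n>r2\nGGGTTT\n"
handle = io.BytesIO(data)  # stands in for a large file opened in binary mode
idx = build_index(handle)
print(idx)                         # {'r1': 0, 'r2': 18}
print(read_entry(handle, idx, "r2"))  # GGGTTT
```

Only the small offset table stays in memory; the entries themselves are read on demand, which is what makes this pattern usable for files larger than physical memory.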
Microarray Data Analysis and Visualization
Bioinformatics Toolbox enables you to analyze and interpret raw microarray data.
You can use several methods for normalizing microarray data, including lowess, global mean, median absolute
deviation (MAD), and quantile normalization. You can apply these methods to the entire microarray chip or to
specific regions or blocks. Filtering and imputation functions let you clean raw data before analysis.
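Quantile normalization, one of the methods listed above, forces every chip to share the same intensity distribution: each value is replaced by the mean, across chips, of the values holding the same rank. A minimal Python sketch, assuming each chip is a list of equal length with probes in the same order; ties are broken by position rather than averaged, a simplification that other implementations may handle differently.

```python
# Illustrative sketch; `quantile_normalize` is a hypothetical helper, not a toolbox API.


def quantile_normalize(columns):
    """Quantile-normalize equal-length lists, one list per chip."""
    n = len(columns[0])
    # Reference distribution: entry r is the mean of the r-th smallest values.
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(col[r] for col in sorted_cols) / len(columns) for r in range(n)]
    normalized = []
    for col in columns:
        # Rank the probes of this chip (stable sort, so ties keep their order).
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


chips = [[5.0, 2.0, 3.0, 4.0], [4.0, 1.0, 4.0, 2.0], [3.0, 4.0, 6.0, 8.0]]
for chip in quantile_normalize(chips):
    print(chip)
```

After normalization every chip contains exactly the same set of values; only their assignment to probes differs.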
Data Analysis and Visualization
Bioinformatics Toolbox lets you perform background adjustments and calculate gene (probe set) expression
values from Affymetrix microarray probe-level data using Robust Multi-Array Average (RMA) and GC Robust
Multi-Array Average (GCRMA) procedures. You can apply circular binary segmentation to array CGH data and
estimate the false discovery rate of multiple hypothesis testing of gene expression data from a microarray
experiment. You can also perform rank-invariant set normalization on either probe intensities for multiple
Affymetrix CEL files or gene expression values from two different experimental conditions.
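Estimating the false discovery rate can be illustrated with the Benjamini-Hochberg procedure, one standard way of adjusting p-values from many simultaneous tests. This is a Python sketch of that procedure only; the toolbox's own FDR routine may use a different estimator, and `bh_adjust` is a hypothetical name.

```python
# Illustrative Benjamini-Hochberg sketch; not necessarily the toolbox's FDR method.


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.20]  # made-up p-values for four genes
print(bh_adjust(pvals))
```

Genes whose adjusted value falls below a chosen level (for example 0.05) form a list in which, under the procedure's assumptions, the expected fraction of false positives is at most that level.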
Specialized routines for visualizing microarray data include volcano plots, box plots, log-log plots, I-R plots, and
spatial heatmaps of the microarray. You can also visualize ideograms with G-banding patterns.
Volcano plot of microarray data showing significance versus gene expression ratio.
Using routines from Statistics Toolbox, you can classify your results, perform hierarchical and K-means
clustering, and represent your microarray data in statistical visualizations, such as 2D clustergrams with optimal
leaf ordering, heatmaps, principal component plots, and classification trees.
Copy number alterations (left) calculated and viewed alongside ideograms (right) using Bioinformatics Toolbox.
Mass Spectrometry Data Analysis
Bioinformatics Toolbox provides a set of functions for mass spectrometry data analysis. These functions enable
preprocessing, classification, and marker identification from SELDI, MALDI, LC/MS, and GC/MS data.
Preprocessing functions include baseline correction, smoothing, calibration, and resampling. You can align raw
spectra along the m/z axis and perform retention-time alignment on LC/MS and GC/MS data. You can plot
multiple spectra simultaneously.
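Two of these preprocessing steps, smoothing and baseline correction, can be sketched with the simplest possible filters: a centered moving average and subtraction of a rolling minimum. These Python functions are illustrative stand-ins under that assumption and are not the algorithms the toolbox uses; the function names are hypothetical.

```python
# Illustrative sketch; not the toolbox's preprocessing functions.


def moving_average(intensities, half_width):
    """Centered moving average; the window shrinks at the edges of the spectrum."""
    n = len(intensities)
    smoothed = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        window = intensities[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed


def subtract_baseline(intensities, half_width):
    """Crude baseline correction: subtract a rolling minimum from each point.
    Results are never negative because every window contains the point itself."""
    n = len(intensities)
    return [
        intensities[i] - min(intensities[max(0, i - half_width):min(n, i + half_width + 1)])
        for i in range(n)
    ]


spectrum = [12, 13, 12, 30, 48, 29, 12, 13, 12]  # made-up intensities with one peak
print(moving_average(subtract_baseline(spectrum, 3), 1))
```

The baseline window should be wider than any peak; otherwise the rolling minimum rises into the peak and removes part of it.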
Label-free differential proteomics and metabolomics analysis using Bioinformatics Toolbox.
You can smooth, align, and normalize spectra and then use classification and statistical learning tools to create
classifiers and identify potential biomarkers.
Identifying Significant Features and Classifying Protein Profiles
See classification of mass spectrometry data and statistical tools that can be used to look for
potential disease markers and proteomic pattern diagnostics.
Graph Theory, Statistical Learning, and Gene Ontology
Graph Theory and Visualization
Bioinformatics Toolbox enables you to apply basic graph theory to sparse matrices. You can create, view, and
manipulate graphs such as interaction maps, hierarchy plots, and pathways. You can determine and view shortest
paths in graphs, test for cycles in directed graphs, and find isomorphism between two graphs.
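Shortest paths and cycle tests are standard graph algorithms. The Python sketch below assumes a sparse graph stored as a dictionary of dictionaries (node to {neighbor: weight}), uses Dijkstra's algorithm (so weights must be non-negative), and detects directed cycles with a three-color depth-first search. The names are hypothetical and this is not the toolbox's API.

```python
# Illustrative sketch; hypothetical helpers, not toolbox graph functions.
import heapq


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns (distance, path), or (inf, []) if unreachable."""
    dist, prev, visited = {source: 0}, {}, set()
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry
        visited.add(node)
        if node == target:
            break
        for nbr, w in graph.get(node, {}).items():
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr], prev[nbr] = nd, node
                heapq.heappush(heap, (nd, nbr))
    if target not in dist:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


def has_cycle(graph):
    """True if the directed graph contains a cycle (three-color DFS)."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def visit(node):
        color[node] = GRAY  # node is on the current DFS stack
        for nbr in graph.get(node, {}):
            state = color.get(nbr, WHITE)
            if state == GRAY:  # back edge closes a cycle
                return True
            if state == WHITE and visit(nbr):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(graph))


g = {"A": {"B": 1, "C": 4}, "B": {"C": 2, "D": 5}, "C": {"D": 1}, "D": {}}
print(shortest_path(g, "A", "D"))  # (4, ['A', 'B', 'C', 'D'])
print(has_cycle(g))                # False
```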
Statistical Learning and Visualization
Bioinformatics Toolbox provides functions that build on the classification and statistical learning algorithms in Statistics Toolbox, including:
Support vector machine (SVM) and K-nearest neighbor classifiers
Functions for setting up cross-validation experiments and measuring the performance of different classifiers
Interactive tools for feature selection, mapping, and displaying hierarchy plots and pathways
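A K-nearest neighbor classifier paired with a cross-validation loop shows how these pieces fit together. The Python sketch below assumes numeric feature vectors, Euclidean distance, majority voting, and unshuffled interleaved folds; the function names and the two-cluster data are made up for illustration.

```python
# Illustrative sketch; hypothetical helpers and data, not toolbox functions.
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points closest to x (Euclidean)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def cross_val_accuracy(X, y, k, folds):
    """Fraction of samples classified correctly when each fold is held out once."""
    n = len(X)
    correct = 0
    for f in range(folds):
        test_idx = set(range(f, n, folds))  # every folds-th sample, no shuffling
        train = [i for i in range(n) if i not in test_idx]
        tX, ty = [X[i] for i in train], [y[i] for i in train]
        for i in test_idx:
            correct += knn_predict(tX, ty, X[i], k) == y[i]
    return correct / n


X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
y = ["healthy"] * 4 + ["disease"] * 4
print(cross_val_accuracy(X, y, k=3, folds=4))  # 1.0
```

Holding out each fold once gives an accuracy estimate on samples the classifier never saw, which is the point of setting up cross-validation before comparing classifiers.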
Bioinformatics Toolbox enables you to access the Gene Ontology database from within MATLAB, parse Gene
Ontology annotation files, and obtain subsets of the ontology such as ancestors, descendants, or relatives.
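Ancestor and descendant queries amount to traversals of the ontology's directed acyclic graph. In the Python sketch below, the parent map and the term names t1 to t4 are hypothetical; a real ontology would be parsed from an ontology file first, and relationship types (such as is_a and part_of) are not distinguished here.

```python
# Illustrative sketch; hypothetical terms and helpers, not toolbox functions.


def ancestors(parents, term):
    """All terms reachable by following parent links upward from term."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(parents.get(t, ()))
    return seen


def descendants(parents, term):
    """Invert the parent map into a child map, then walk downward the same way."""
    children = {}
    for child, ps in parents.items():
        for p in ps:
            children.setdefault(p, []).append(child)
    return ancestors(children, term)


# t4 has two parents, so the structure is a DAG rather than a tree.
parents = {"t4": ["t2", "t3"], "t2": ["t1"], "t3": ["t1"], "t1": []}
print(sorted(ancestors(parents, "t4")))    # ['t1', 't2', 't3']
print(sorted(descendants(parents, "t1")))  # ['t2', 't3', 't4']
```

The `seen` set matters because, in a DAG, the same ancestor can be reached along several paths.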
Bioinformatics Toolbox provides sequence analysis and visualization tools for genomic and proteomic sequence
data. You can perform a variety of analyses, including multiple sequence alignments and the building and
interactive viewing and manipulation of phylogenetic trees.
The toolbox provides functions, objects, and methods for sequence analysis, including pairwise sequence,
sequence profile, and multiple sequence alignment. These include:
Implementations of standard algorithms for local and global sequence alignment, such as the Needleman-Wunsch, Smith-Waterman, and profile hidden Markov model algorithms
Progressive multiple sequence alignment
Graphical representations of alignment results matrices
Standard scoring matrices, such as the PAM and BLOSUM matrix families
Consensus sequence calculation and sequence logo display
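The Needleman-Wunsch algorithm fills a dynamic-programming table of best prefix-alignment scores and then traces back through it. This Python sketch assumes a simple match/mismatch score and a linear gap penalty rather than a substitution matrix such as BLOSUM or affine gaps; the parameter values are arbitrary and the function is not the toolbox's alignment API.

```python
# Illustrative Needleman-Wunsch sketch; scoring parameters are assumptions.


def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    def score(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # F[i][j] = best score for aligning a[:i] with b[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                          F[i - 1][j] + gap, F[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

Smith-Waterman differs mainly in clamping each cell at zero and tracing back from the highest-scoring cell, which yields a local rather than global alignment.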
Sequence Utilities and Statistics
The toolbox lets you manipulate and analyze your sequences to gain a deeper understanding of your data. You can:
Convert DNA or RNA sequences to amino acid sequences using the genetic code
Perform statistical analysis on the sequences and search for specific patterns within a sequence
Apply restriction enzymes and proteases to perform in silico digestion of sequences, or create random sequences for test cases
Predict minimum free energy secondary structure of RNA sequences
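Translation and restriction digestion are simple enough to sketch directly. The Python below builds the standard genetic code (NCBI translation table 1) from its compact TCAG-ordered string, translates in reading frame 1 only, and cuts only the top strand at an exact recognition site. `translate` and `digest` are hypothetical names, and EcoRI (G^AATTC, cut after the first base) is used as a familiar example enzyme.

```python
# Illustrative sketch; hypothetical helpers, not toolbox functions.
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated TTT, TTC, TTA, TTG, TCT, ... (TCAG order).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate DNA or RNA in frame 1; '*' marks stops, 'X' unrecognized codons."""
    dna = seq.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def digest(seq, site, cut_offset):
    """Cut the top strand wherever site occurs, cut_offset bases into the site."""
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start:pos + cut_offset])
        start = pos + cut_offset
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments


print(translate("AUGGCCUAA"))                        # MA*
print(digest("AAGAATTCTTGAATTCC", "GAATTC", 1))  # ['AAG', 'AATTCTTG', 'AATTCC']
```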
The toolbox enables you to visualize sequences and alignments. You can view linear or circular maps of
sequences annotated with GenBank features. You can visualize secondary structure diagrams of an RNA
sequence. Interactive viewers let you explore and modify pairwise and multiple sequence alignments.
Phylogenetic Tree Analysis
The toolbox enables you to create and edit phylogenetic trees. You can calculate pairwise distances between
aligned or unaligned nucleotide or amino acid sequences using a broad range of similarity metrics such as
Jukes-Cantor, p-distance, alignment-score, or a user-defined distance method. Phylogenetic trees are constructed
using hierarchical linkage with a variety of techniques, including neighbor joining, single and complete linkage,
and Unweighted Pair Group Method with Arithmetic Mean (UPGMA).
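The Jukes-Cantor correction converts the observed proportion of differing sites p into an estimated number of substitutions per site, d = -3/4 ln(1 - 4p/3), and UPGMA repeatedly merges the two closest clusters, averaging distances weighted by cluster size. The Python sketch below assumes gap-free aligned sequences and returns only the tree topology in Newick form; the function names are hypothetical and this is not the toolbox's implementation.

```python
# Illustrative sketch; hypothetical helpers, not toolbox functions.
import math


def jukes_cantor(s1, s2):
    """Jukes-Cantor distance between aligned, equal-length, gap-free sequences."""
    if len(s1) != len(s2) or not s1:
        raise ValueError("sequences must be aligned and non-empty")
    p = sum(x != y for x, y in zip(s1, s2)) / len(s1)  # p-distance
    if p >= 0.75:
        return math.inf  # the correction is undefined at or beyond saturation
    return -0.75 * math.log(1 - 4 * p / 3)


def upgma(labels, matrix):
    """UPGMA on a symmetric distance matrix; returns a Newick topology
    (branch lengths omitted to keep the sketch short)."""
    clusters = {i: (name, 1) for i, name in enumerate(labels)}  # id -> (newick, size)
    dist = {(i, j): matrix[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(labels)
    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)  # closest pair of clusters
        (ni, si), (nj, sj) = clusters.pop(i), clusters.pop(j)
        # Distance to the merged cluster: size-weighted mean of the old distances.
        merged = {
            k: (si * dist[min(i, k), max(i, k)] + sj * dist[min(j, k), max(j, k)]) / (si + sj)
            for k in clusters
        }
        dist = {key: d for key, d in dist.items() if i not in key and j not in key}
        for k, d in merged.items():
            dist[k, next_id] = d  # every existing id is smaller than next_id
        clusters[next_id] = (f"({ni},{nj})", si + sj)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


m = [[0, 2, 6, 10], [2, 0, 6, 10], [6, 6, 0, 10], [10, 10, 10, 0]]
print(upgma(["a", "b", "c", "d"], m))  # (d,(c,(a,b)));
```

In practice the matrix would be filled with a pairwise distance such as `jukes_cantor` computed over every pair of aligned sequences.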
The toolbox supports weighting and rerooting trees, calculating subtrees, and calculating the canonical form of
trees. The phylogenetic tree viewer lets you prune, reorder, and rename branches; explore distances; and read or
write Newick-formatted files. You can also use the annotation tools in MATLAB to create presentation-quality plots.
Protein Feature Analysis
The toolbox provides protein sequence analysis techniques, including routines for calculating properties of a
peptide sequence such as atomic composition, isoelectric point, and molecular weight. You can determine the
amino acid composition of protein sequences, cleave a protein with an enzyme, and create backbone plots and
Ramachandran plots of PDB data. You can use the Sequence Tool to view the properties of an amino acid
sequence or use the Molecule Viewer to display and manipulate 3D molecular structures.
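Amino acid composition and molecular weight follow directly from the sequence. This Python sketch assumes an unmodified linear peptide in one-letter code and sums approximate average residue masses plus one water molecule for the termini; published mass tables differ slightly in the last decimal places, and the function names are hypothetical.

```python
# Illustrative sketch; hypothetical helpers, not toolbox functions.
from collections import Counter

# Approximate average residue masses in daltons; published tables differ slightly.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once for the free N- and C-termini


def composition(peptide):
    """Count of each amino acid in a one-letter-code sequence."""
    return Counter(peptide.upper())


def molecular_weight(peptide):
    """Average molecular weight (Da) of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide.upper()) + WATER


print(composition("GAGA"))
print(round(molecular_weight("GA"), 2))  # 146.15
```

Unknown letters raise a `KeyError`, which is deliberate here: silently skipping them would report a misleading weight.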
Data Import and Application Deployment
File Formats and Database Access
You can access standard file formats for biological data, online databases, and Web sites. Bioinformatics Toolbox
enables you to:
Read sequence data from standard file formats, including FASTA, PDB, and SCF
Read microarray data from file formats such as Affymetrix DAT, EXP, CEL, CHP, and CDF files; ImaGene results format data; Agilent Feature Extraction Software files; and GenePix GPR and GAL files
Read data from online databases such as GenBank, EMBL, NCBI BLAST, and PDB
Import data directly from the NCBI Gene Expression Omnibus Web site with a single command
Read cytogenetic banding information from NCBI ideograms or UCSC cytoband text files
Read mass spectrometry data from mzXML and JCAMP-DX files
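UCSC cytoband text files are tab-separated tables with five columns: chrom, chromStart, chromEnd, name, and gieStain. A minimal Python reader might look like the following; the `Band` type, the function name, and the sample rows are made up for illustration, and this is not the toolbox's import function.

```python
# Illustrative sketch; hypothetical type, helper, and data, not a toolbox function.
import csv
import io
from typing import NamedTuple


class Band(NamedTuple):
    chrom: str
    start: int  # 0-based, half-open coordinates as in other UCSC tables
    end: int
    name: str
    stain: str  # Giemsa stain class, e.g. gneg, gpos50, acen


def read_cytobands(handle):
    """Parse tab-separated chrom/chromStart/chromEnd/name/gieStain rows."""
    bands = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        chrom, start, end, name, stain = row[:5]
        bands.append(Band(chrom, int(start), int(end), name, stain))
    return bands


# Made-up rows in the same layout (not real band coordinates).
text = "chrZ\t0\t1000\tp2\tgneg\nchrZ\t1000\t1500\tp1\tacen\nchrZ\t1500\t4000\tq1\tgpos50\n"
for band in read_cytobands(io.StringIO(text)):
    print(band.name, band.end - band.start, band.stain)
```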
Sharing Algorithms and Deploying Applications
MATLAB provides tools that let you turn your data analysis program into a customized software application.
These include tools for building graphical interfaces, a visual integrated development environment, and a profiler.
MATLAB application deployment products let you integrate your MATLAB algorithms with existing C, C++, and
Java applications, deploy the developed algorithms and custom interfaces as standalone applications, convert
MATLAB algorithms into Microsoft .NET or COM components that can be accessed from any COM-based
application, and create Microsoft Excel add-ins.
You can integrate MATLAB with commonly used bioinformatics tools such as BioPerl, SOAP-based Web
services, and COM plug-ins.
Calling BioPerl Functions from MATLAB
Pass arguments from MATLAB to Perl scripts and pull BLAST search data back to MATLAB.
© 2012 The MathWorks, Inc. MATLAB and Simulink are registered trademarks of The MathWorks, Inc. See
for a list of
additional trademarks. Other product or brand names may be trademarks or registered trademarks of their respective holders.