manual - Broad Institute


Dec 13, 2013






CellProfiler Analyst 2.0 (CPA 2.0) provides tools for exploring and analyzing cell
phenotypes from high-throughput, image-based screens.

Chief among these is Classifier: through supervised machine learning, Classifier can be
quickly trained to recognize complicated and subtle phenotypes, enabling automatic
scoring of hundreds of millions of cells.

CPA 2.0's other tools include a tool for flexible exploration of plate and microarray
data, and a sophisticated tool for visualizing biological images composed of multiple
channels.

Images may be conveniently viewed while exploring
phenotype populations as well as image and object measurements.

CPA is designed to use data from CellProfiler, but other formats are supported and may
be converted.

The project was developed by Adam Fraser and Thouis R. Jones of the Broad Institute
Imaging Platform and grew out of earlier work begun in the laboratories of
David M. Sabatini (Whitehead Institute for Biomedical Research) and
Polina Golland (MIT Computer Science and Artificial Intelligence Laboratory).

I. Classifier

I.A About Classifier

Classifier was developed at The Broad Institute Imaging Platform and is
distributed under the GNU General Public License version 2. (See LICENSE.txt)

Classifier has been primarily tested on MacOS 10.4 and 10.5, but it has now
been shown to function on Windows XP as well.

Classifier was designed to work with data processed by CellProfiler, though any
experiment that measures object features from images should be easily
adaptable if it stores its data in a similar form.

I.B Preliminary Data Requirements

Classifier requires access to the following data sources:

A per-image table and a per-object table containing measurements and
metadata. These may reside either in a MySQL database, an SQLite database, or
in individual comma-separated value (CSV) files.


used to create the data tables.

Images can be stored locally or remotely and accessed via HTTP.
Image files are expected to be monochromatic and represent a single
channel. However, any number of images may be combined by adding path and
filename columns to the per-image table of your database for each channel.
Configuration details for each of these data sources must first be specified in a user
properties file.

Step 1: Setting up the Properties File

The properties file is a simple .txt file that contains all the configuration information
necessary for CPA to access your database (or CSV files) and process your images.
Each setting is stored on a line in the form: field = value(s). Lines that begin with a #
are comments and are ignored by CPA.
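
To make the line format concrete, here is a minimal Python sketch of reading such a
file. It is an illustration only, not CPA's own parser: the function name
parse_properties is hypothetical, each value is kept as one raw string (how CPA splits
multiple values is not covered here), and the file path in the example is made up.

```python
def parse_properties(text):
    """Parse 'field = value(s)' lines, skipping blank lines and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        field, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected 'field = value', got: {raw_line!r}")
        # Keep the value as a raw string; splitting it further is left to the caller.
        settings[field.strip()] = value.strip()
    return settings


# Illustrative content only; the path is a placeholder.
example = """
# Database settings
db_type = sqlite
db_sqlite_file = /path/to/my_experiment.db
"""
settings = parse_properties(example)
print(settings["db_type"])
```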

If you are using CellProfiler for your image analysis, a properties file may be
automatically generated at the end of your pipeline by the ExportToDatabase module.
CPA 2.0 is not compatible with properties files from CellProfiler Analyst version 1.0, but
the two formats may be easily converted by

Note: A completed sample properties file appears at the end of this section.

I.B.1.a Database Information

First, tell Classifier how to access your database. Required fields are listed below for
accessing different data sources. Values surrounded with <> need to be filled in by you.

To connect to a MySQL database:

The suggested mode of data storage/access is a MySQL database; however, this is
not the only supported data source.

db_type = mysql

db_port = 3306

db_host = <your host>


db_name = <your database name>

db_user = <your user name>

db_passwd = <your password>

To connect to an SQLite database:

SQLite provides another mode of data storage in which the tables are stored in one big
file somewhere on your computer.

db_type = sqlite

db_sqlite_file = <path to your SQLite db file>

To access CSV data written by CellProfiler’s ExportToDatabase module:

One of the easiest ways to analyze data from CellProfiler in CPA is to use this mode of
input. You must specify the path to the .SQL file written by the ExportToDatabase
module. When you run CPA with these settings, it will use this file and the CSV files
output by ExportToDatabase to create an SQLite database file in your home directory.
This could take a long time for larger databases, but it only needs to be done once.

db_type = sqlite

db_sql_file = <path to .SQL file output by ExportToDatabase>

To access data stored in 2 CSV files:

In this case, you may tell CPA to find your per-image and per-object tables in 2
comma-separated values (CSV) files. When you run CPA with these settings, it will take
the data from the 2 CSV files and insert it into an SQLite database file in your home
directory. This could take a long time for larger databases, but it only needs to be
done once.

db_type = sqlite

image_csv_file = <path to your per-image csv>

object_csv_file = <path to your per-object csv>
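
The general idea of loading two CSV files into SQLite can be sketched in Python as
follows. This is not CPA's import code. It assumes each CSV file starts with a header
row of column names, and the table names per_image and per_object, the helper
load_csv_into_table, and the sample rows are all hypothetical. Values are stored as
text because the sketch declares no column types.

```python
import csv
import os
import sqlite3
import tempfile


def load_csv_into_table(conn, table_name, csv_path):
    """Create table_name from the CSV header row and insert every data row."""
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)  # assumption: first row holds the column names
        columns = ", ".join(f'"{name}"' for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE "{table_name}" ({columns})')
        conn.executemany(
            f'INSERT INTO "{table_name}" VALUES ({placeholders})', reader
        )
    conn.commit()


# Demo with tiny made-up CSV files written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    image_csv = os.path.join(tmp, "per_image.csv")
    object_csv = os.path.join(tmp, "per_object.csv")
    with open(image_csv, "w", newline="") as f:
        f.write("ImgID,GFP_path,GFP_file\n1,path,gfp1.tif\n")
    with open(object_csv, "w", newline="") as f:
        f.write("ImgID,ObjID,X_Coord,Y_Coord\n1,1,3.243,125.234\n1,2,411.12,50.001\n")

    # An in-memory database keeps the demo self-cleaning; pass a file path
    # instead to keep the result on disk.
    conn = sqlite3.connect(":memory:")
    load_csv_into_table(conn, "per_image", image_csv)
    load_csv_into_table(conn, "per_object", object_csv)

object_count = conn.execute('SELECT COUNT(*) FROM "per_object"').fetchone()[0]
print(object_count)
```

Because the import writes a database once and later runs can reuse it, the cost of
reading large CSV files is paid a single time.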

In your properties file,

per-image TABLE:

The per-image table requires 1 column for a _unique_ image ID and 2 columns for
each channel represented in your images: One column for the image path, and
one column for the image filename (which may include multiple path elements).

For example, if you took images of cells stained with GFP and Hoechst, you
would have 2 channels and your per-image table would look something like this:



| ImgID | GFP_path | GFP_file | Hoechst_path | Hoechst_file | other cols |
| 1     | path     | gfp1.tif | path         | hoechst1.tif | ...        |
| 2     | path     | gfp2.tif | path         | hoechst2.tif | ...        |
| ...   |
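
As an illustration of how the two columns per channel fit together, the Python sketch
below builds one file location per channel from a single per-image row. The helper
channel_locations is hypothetical, the <channel>_path and <channel>_file column names
simply follow the example table above, and joining the two parts with "/" is an
assumption rather than a description of how CPA resolves images.

```python
import posixpath


def channel_locations(row, channels):
    """Map each channel name to its path joined with its filename."""
    return {
        # posixpath uses "/" on every platform, which also suits HTTP locations.
        channel: posixpath.join(row[f"{channel}_path"], row[f"{channel}_file"])
        for channel in channels
    }


# One row of the example per-image table, written as a dictionary.
row = {
    "ImgID": 1,
    "GFP_path": "path",
    "GFP_file": "gfp1.tif",
    "Hoechst_path": "path",
    "Hoechst_file": "hoechst1.tif",
}
locations = channel_locations(row, ["GFP", "Hoechst"])
print(locations)
```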



The per_object table requires 4 columns: a foreign key image ID column, a
_unique_ object ID column, a column for the object x-location, and a column for
the object y-location. The location columns should contain values in pixel
coordinates for where each object falls in its parent image.



| ImgID | ObjID | X_Coord | Y_Coord | other cols |
| 1     | 1     | 3.243   | 125.234 | ...        |
| 1     | 2     | 411.12  | 50.001  | ...        |
| ...   |
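
The relationship between the two tables can be sketched with Python's sqlite3 module.
This is an illustrative schema, not the one CPA or CellProfiler creates: the table
names, column types, and the single GFP channel are assumptions, and the column names
are taken from the example tables above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE per_image (
        ImgID    INTEGER PRIMARY KEY,
        GFP_path TEXT,
        GFP_file TEXT
    );
    CREATE TABLE per_object (
        ImgID   INTEGER NOT NULL REFERENCES per_image (ImgID),  -- parent image
        ObjID   INTEGER NOT NULL,
        X_Coord REAL,  -- pixel coordinates within the parent image
        Y_Coord REAL
    );
    """
)

conn.execute("INSERT INTO per_image VALUES (1, 'path', 'gfp1.tif')")
conn.executemany(
    "INSERT INTO per_object VALUES (?, ?, ?, ?)",
    [(1, 1, 3.243, 125.234), (1, 2, 411.12, 50.001)],
)

# Look up every object in image 1 together with the file it was measured in.
rows = conn.execute(
    """
    SELECT o.ObjID, o.X_Coord, o.Y_Coord, i.GFP_file
    FROM per_object AS o
    JOIN per_image AS i ON i.ImgID = o.ImgID
    WHERE o.ImgID = ?
    ORDER BY o.ObjID
    """,
    (1,),
).fetchall()
print(rows)
```

The foreign key is what lets a tool go from any object back to its parent image, and
the pixel coordinates locate the object within that image.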