manual - Broad Institute


Dec 13, 2013


COVER PAGE

COVER INFO (if necessary, see old CPA manual)

TOC

INTRODUCTION


CellProfiler Analyst 2.0 (CPA 2.0) provides tools for exploring and analyzing cell
phenotypes from high-throughput, image-based screens.


Chief among these is Classifier: through supervised machine learning, Classifier can
be quickly trained to recognize complicated and subtle phenotypes, enabling automatic
scoring of hundreds of millions of cells.


CPA 2.0's other tools include a PlateMapBrowser for flexible exploration of plate and
microarray data, and a sophisticated ImageViewer for visualizing biological images
composed of multiple channels.


Images may be conveniently viewed while exploring
phenotype populations as well as image and object measurements.




CPA is designed to use data from CellProfiler, but other formats are supported and may
be converted.


(Edit as desired:) The project was developed by Adam Fraser and Thouis R. Jones of
the Broad Institute Imaging Platform and grew out of earlier work begun in the
laboratories of David M. Sabatini (Whitehead Institute for Biomedical Research) and
Polina Golland (MIT Computer Science and Artificial Intelligence Laboratory).


I. Classifier


I.A About Classifier


Classifier was developed at The Broad Institute Imaging Platform and is
distributed under the GNU General Public License version 2. (See LICENSE.txt)


Classifier has been tested primarily on Mac OS X 10.4 and 10.5, but it has also
been shown to function on Windows XP.


Classifier was designed to work with data processed by CellProfiler, though any
experiment that measures object features from images should be easily
adaptable if it stores its data in a similar form.



I.B Preliminary Data Requirements


Classifier requires access to the following data sources:




A per-image table and a per-object table containing measurements and
metadata. These may reside in a MySQL database, in an SQLite database, or
in individual comma-separated value (CSV) files.




The images used to create the data tables. These can be stored either locally or
remotely and accessed via HTTP. Image files are expected to be monochromatic
and represent a single channel. However, any number of images may be
combined by adding path and filename columns to the per-image table of your
database for each channel.
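As a rough illustration of how per-channel path and filename columns might be combined into full image locations, here is a minimal Python sketch. The column naming scheme (`<channel>_path`, `<channel>_file`), the helper names, and the example host are assumptions made for this illustration; they are not taken from CPA's own code.

```python
def image_location(path, filename):
    """Join a path (local directory or HTTP URL) and a filename with one slash."""
    if not path:
        return filename
    return path.rstrip("/") + "/" + filename.lstrip("/")


def channel_locations(row, channels):
    """Map each channel name to its image location for one per-image row.

    Assumes hypothetical columns named '<channel>_path' and '<channel>_file'.
    """
    return {
        channel: image_location(row[channel + "_path"], row[channel + "_file"])
        for channel in channels
    }


# One per-image row with two channels: one served over HTTP, one stored locally.
row = {
    "ImgID": 1,
    "GFP_path": "http://example.org/images/plate1/",
    "GFP_file": "gfp1.tif",
    "Hoechst_path": "/data/plate1",
    "Hoechst_file": "hoechst1.tif",
}
locations = channel_locations(row, ["GFP", "Hoechst"])
print(locations)
```

Each channel stays a separate monochrome file; only the lookup of where each file lives is sketched here.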


Configuration details for each of these data sources must first be specified in a
user-defined properties file.


I.B.1 Step 1: Setting up the Properties File


The properties file is a simple .txt file that contains all the configuration information
necessary for CPA to access your database (or CSV files) and process your images.
Each setting is stored on a line in the form: field = value(s). Lines that begin with a #
are comments and are ignored by CPA.
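To make the line format concrete, the following Python sketch reads "field = value(s)" lines and skips comments. It is only an illustration of the format described above, not CPA's actual parser; the function name `parse_properties` is hypothetical, and values are kept as raw strings rather than split into multiple values.

```python
def parse_properties(text):
    """Parse 'field = value(s)' lines, skipping blank lines and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        field, sep, value = line.partition("=")
        if not sep:
            raise ValueError("expected 'field = value', got: %r" % raw_line)
        settings[field.strip()] = value.strip()
    return settings


example = """
# Database connection settings
db_type = mysql
db_port = 3306
"""
settings = parse_properties(example)
print(settings)
```

Note that every value comes back as a string, so a numeric setting such as the port would still need to be converted by the caller.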


If you are using CellProfiler for your image analysis, a properties file may be
automatically generated at the end of your pipeline by the ExportToDatabase module.
CPA 2.0 is not compatible with properties files from CellProfiler Analyst version 1.0, but
files may easily be converted between the two formats by hand.


Note: A completed sample properties file appears at the end of this section.


I.B.1.a Database Information


First, tell Classifier how to access your database. The required fields for each
supported data source are listed below. Values surrounded by <> need to be filled in by
you.


To connect to a MySQL database:

The suggested mode of data storage and access is a MySQL database; however, this is
not the only supported data source.


db_type = mysql

db_port = 3306

db_host = <your host name>

db_name = <your database name>

db_user = <your user name>

db_passwd = <your password>


To connect to an SQLite database:

SQLite provides another mode of data storage, in which all the tables are stored in a
single file on your computer.


db_type = sqlite

db_sqlite_file = <path to your SQLite db file>


To access CSV data written by CellProfiler’s ExportToDatabase module:

One of the easiest ways to analyze data from CellProfiler in CPA is to use this mode of
input. You must specify the path to the .SQL file written by the ExportToDatabase
module. When you run CPA with these settings, it will use this file and the CSV files
output by ExportToDatabase to create an SQLite database file in your home directory.
This could take a long time for larger databases, but it only needs to be done once.


db_type = sqlite

db_sql_file = <path to .SQL file output by ExportToDatabase>


To access data stored in 2 CSV files:

In this case, you may tell CPA to find your per-image and per-object tables in 2
comma-separated values (CSV) files. When you run CPA with these settings, it will take
the data from the 2 CSV files and insert it into an SQLite database file in your home
directory. This could take a long time for larger databases, but it only needs to be
done once.


db_type = sqlite

image_csv_file = <path to your per-image csv>

object_csv_file = <path to your per-object csv>
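The one-time CSV-to-SQLite import described above could look roughly like the following Python sketch. It is not CPA's own import code. It assumes each CSV file starts with a header row of column names, uses the hypothetical table names `per_image` and `per_object`, and writes to an in-memory database instead of a file in your home directory so that it can be run without side effects.

```python
import csv
import io
import sqlite3


def load_csv_into_table(conn, table, csv_file):
    """Create `table` from the CSV header row and insert the remaining rows.

    Assumes the first CSV row holds column names. Columns are created
    without a declared type, so values are stored as the text read from
    the file.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    columns = ", ".join('"%s"' % name for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute('CREATE TABLE "%s" (%s)' % (table, columns))
    conn.executemany(
        'INSERT INTO "%s" VALUES (%s)' % (table, placeholders), reader
    )
    conn.commit()


# Stand-ins for the two CSV files; real use would pass open file objects.
image_csv = io.StringIO("ImgID,GFP_path,GFP_file\n1,path,gfp1.tif\n2,path,gfp2.tif\n")
object_csv = io.StringIO("ImgID,ObjID,X_Coord,Y_Coord\n1,1,3.243,125.234\n1,2,411.12,50.001\n2,1,10.5,20.25\n")

conn = sqlite3.connect(":memory:")
load_csv_into_table(conn, "per_image", image_csv)
load_csv_into_table(conn, "per_object", object_csv)
print(conn.execute("SELECT COUNT(*) FROM per_object").fetchone()[0])
```

Because the database is built once and then reused, the cost of reading large CSV files is paid only on the first run.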




In your properties file,


per-image TABLE:

The per-image table requires 1 column for a _unique_ image ID and 2 columns for
each channel represented in your images: one column for the image path, and
one column for the image filename (which may include multiple path elements).

For example, if you took images of cells stained with GFP and Hoechst, you
would have 2 channels and your per-image table would look something like this:



Example_per-image_Table

+------------------------------------------------------------------------+
| ImgID | GFP_path | GFP_file | Hoechst_path | Hoechst_file | other cols |
|-------+----------+----------+--------------+--------------+------------|
| 1     | path     | gfp1.tif | path         | hoechst1.tif | ...        |
| 2     | path     | gfp2.tif | path         | hoechst2.tif | ...        |
| ...                                                                    |
+------------------------------------------------------------------------+
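As a sketch of the per-image layout, the example table could be created in SQLite as follows. The table name `per_image`, the column types, and the use of a primary key to enforce the unique image ID are assumptions made for illustration; your own schema may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE per_image (
        ImgID        INTEGER PRIMARY KEY,  -- unique image ID
        GFP_path     TEXT,                 -- one path column per channel
        GFP_file     TEXT,                 -- one filename column per channel
        Hoechst_path TEXT,
        Hoechst_file TEXT
    )
""")
conn.executemany(
    "INSERT INTO per_image VALUES (?, ?, ?, ?, ?)",
    [
        (1, "path", "gfp1.tif", "path", "hoechst1.tif"),
        (2, "path", "gfp2.tif", "path", "hoechst2.tif"),
    ],
)

# A second row with an existing ImgID violates the uniqueness requirement.
try:
    conn.execute(
        "INSERT INTO per_image VALUES (1, 'path', 'dup.tif', 'path', 'dup.tif')"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)
```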


PER_OBJECT TABLE:

The per_object table requires 4 columns: a foreign key image ID column, a
_unique_ object ID column, a column for the object x-location, and a column for
the object y-location. The location columns should contain values in pixel
coordinates for where each object falls in its parent image.




Example_Per_Object_Table

+------------------------------------------------+
| ImgID | ObjID | X_Coord | Y_Coord | other cols |
|-------+-------+---------+---------+------------|
| 1     | 1     | 3.243   | 125.234 | ...        |
| 1     | 2     | 411.12  | 50.001  | ...        |
| ...                                            |
+------------------------------------------------+
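The relationship between the two tables can be sketched in SQLite as below: each object row points back to its parent image through the image ID, so a join recovers which image file an object's pixel coordinates refer to. The table names, column types, and the reduced per-image table are assumptions for illustration only. SQLite records the REFERENCES clause but only enforces it when foreign key support is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE per_image (ImgID INTEGER PRIMARY KEY, GFP_file TEXT)")
conn.execute("""
    CREATE TABLE per_object (
        ImgID   INTEGER REFERENCES per_image(ImgID),  -- foreign key to parent image
        ObjID   INTEGER,                              -- object ID
        X_Coord REAL,                                 -- x-location in pixels
        Y_Coord REAL                                  -- y-location in pixels
    )
""")
conn.execute("INSERT INTO per_image VALUES (1, 'gfp1.tif')")
conn.executemany(
    "INSERT INTO per_object VALUES (?, ?, ?, ?)",
    [(1, 1, 3.243, 125.234), (1, 2, 411.12, 50.001)],
)

# Look up every object together with the image file it was measured in.
rows = conn.execute("""
    SELECT i.GFP_file, o.ObjID, o.X_Coord, o.Y_Coord
    FROM per_object AS o
    JOIN per_image AS i ON o.ImgID = i.ImgID
    ORDER BY o.ObjID
""").fetchall()
for row in rows:
    print(row)
```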