Creating a connection pool - cassandra-tesis

jockeyrope, Internet and Web Applications

2 Feb 2013


Hector:

A high-level Cassandra Java client.

Cassandra is a highly available, column-oriented database: http://incubator.apache.org/cassandra/

Hector is the greatest warrior in Greek mythology, Troy's defender and brother of Cassandra.


http://en.wikipedia.org/wiki/Hector


http://en.wikipedia.org/wiki/Cassandra






This client provides:





o high-level, simple object-oriented interface to Cassandra
o failover behavior on the client side
o connection pooling for improved performance and scalability
o JMX counters for monitoring and management
o load balancing




The work was initially inspired by http://code.google.com/p/cassandra-java-client/ but has since taken off in different directions.



Source:
http://github.com/rantav/hector/blob/master/README




/**
 * Insert a new value keyed by key
 * @param key Key for the value
 * @param value the String value to insert
 */
public void insert(final String key, final String value) throws Exception {
    // execute(), createColumnPath(), bytes() and COLUMN_NAME are assumed to come from
    // the enclosing example class or static imports (not shown in this excerpt)
    execute(new Command<Void>() {
        public Void execute(final Keyspace ks) throws Exception {
            ks.insert(key, createColumnPath(COLUMN_NAME), bytes(value));
            return null;
        }
    });
}



/**
 * Get a string value.
 * @return The string value; null if no value exists for the given key.
 */
public String get(final String key) throws Exception {
    return execute(new Command<String>() {
        public String execute(final Keyspace ks) throws Exception {
            try {
                return string(ks.getColumn(key, createColumnPath(COLUMN_NAME)).getValue());
            } catch (NotFoundException e) {
                return null; // a missing column is reported as NotFoundException
            }
        }
    });
}



/**
 * Delete a key from cassandra
 */
public void delete(final String key) throws Exception {
    execute(new Command<Void>() {
        public Void execute(final Keyspace ks) throws Exception {
            ks.remove(key, createColumnPath(COLUMN_NAME));
            return null;
        }
    });
}


Here are the high-level features of Hector, currently hosted on GitHub.

A high-level object-oriented interface to Cassandra. As noted before, Cassandra’s out-of-the-box client is a Thrift client, which isn’t always that nice and clean to work with. I wanted to provide a higher-level and cleaner API. This part was mainly inspired by the aforementioned cassandra-java-client. The API is defined in the Keyspace interface; see, for example, methods such as Keyspace.insert() and Keyspace.getColumn().



Failover support. Cassandra is a distributed data store and can tolerate one or even several hosts going down. However, out of the box, Thrift provides no failover support for clients. What if the client is configured to connect to a Cassandra host that just happens to be down right now?

In Hector, if a client is connected to one host in the ring and this host goes down, the client will automatically and transparently search for other available hosts to perform its operation before giving up and returning an error to its user. There are currently 3 ways to configure the failover policy: FAIL_FAST (no retry, just fail if there are errors, nothing smart), ON_FAIL_TRY_ONE_NEXT_AVAILABLE (try one more host before giving up) and ON_FAIL_TRY_ALL_AVAILABLE (try all available hosts before giving up). See CassandraClient.FailoverPolicy.
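The three policies differ only in how many additional hosts are tried after the first failure. The following plain-Java sketch illustrates that idea; it is not Hector's implementation, and the FailoverPolicy enum, the FailoverClient class and the host names are hypothetical stand-ins.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical policy enum mirroring the three options described above (not Hector's API).
enum FailoverPolicy {
    FAIL_FAST(0),                                  // no retry
    ON_FAIL_TRY_ONE_NEXT_AVAILABLE(1),             // one extra host
    ON_FAIL_TRY_ALL_AVAILABLE(Integer.MAX_VALUE);  // every remaining host

    final int extraHosts; // hosts to try after the first one fails

    FailoverPolicy(int extraHosts) {
        this.extraHosts = extraHosts;
    }
}

// Hypothetical client that runs an operation against hosts in order, failing over per policy.
class FailoverClient {
    private final List<String> hosts;
    private final FailoverPolicy policy;

    FailoverClient(List<String> hosts, FailoverPolicy policy) {
        this.hosts = hosts;
        this.policy = policy;
    }

    <T> T execute(Function<String, T> operation) {
        // widen to long so that Integer.MAX_VALUE + 1 cannot overflow
        int attempts = (int) Math.min((long) policy.extraHosts + 1, hosts.size());
        RuntimeException lastError = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return operation.apply(hosts.get(i));
            } catch (RuntimeException e) {
                lastError = e; // host looked down: remember why and try the next one
            }
        }
        throw new IllegalStateException("gave up after " + attempts + " host(s)", lastError);
    }
}
```

Trying hosts strictly in list order keeps the sketch short; a real client would more likely vary the starting host per request and skip hosts already known to be down.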



Connection pooling. This is a real necessity for high-scale applications. The usual pattern for DAOs (Data Access Objects) is a large number of small reads and writes. Clients cannot afford to open a new connection for each and every request, not only because of the overhead of the TCP handshake (Thrift uses TCP), but also because closed sockets remain in TIME_WAIT, so a client may easily run out of available sockets if it operates fast enough. This part was also inspired by cassandra-java-client but was improved in my version. Hector provides connection pooling and a nice framework that manages all its gory details.
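The core of client-side pooling is to open a fixed number of connections once and lend them out per request. A minimal JDK-only sketch follows; SimplePool is a hypothetical name, the type parameter C stands in for a Thrift connection, and the 5-second borrow timeout is an arbitrary assumption.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical fixed-size pool: C stands in for a Thrift connection to a Cassandra node.
class SimplePool<C> {
    private final BlockingQueue<C> idle;

    SimplePool(int size, Supplier<C> connectionFactory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(connectionFactory.get()); // pay the connect cost once, at startup
        }
    }

    /** Borrows a connection, runs the operation with it, and always hands it back. */
    <T> T withConnection(Function<C, T> operation) throws InterruptedException {
        C conn = idle.poll(5, TimeUnit.SECONDS); // wait (up to an assumed 5 s) for a free connection
        if (conn == null) {
            throw new IllegalStateException("no connection available: pool exhausted");
        }
        try {
            return operation.apply(conn);
        } finally {
            idle.offer(conn); // reuse rather than close: no new handshake, no socket left in TIME_WAIT
        }
    }

    int idleCount() {
        return idle.size();
    }
}
```

Returning the connection in a finally block keeps the pool size stable even when an operation throws; a production pool would also validate or replace connections that failed, which this sketch skips.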



JMX support. It’s a widely known fact that applications have a life of their own. You built it to do X but it does Y because you didn’t expect Z to happen. Running an application without the ability to monitor it is like walking blindfolded on a dark highway; sooner or later you’ll get hit by something. Hector exposes JMX for many important runtime metrics, such as the number of available connections, idle connections, error statistics and more.
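One way to publish counters like these with only the standard javax.management API is sketched below. PoolStats, its attribute names and the example.pool domain are hypothetical, not Hector's MBeans; the sketch implements DynamicMBean directly so the read-only attributes are declared explicitly.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicInteger;
import javax.management.*;

// Hypothetical pool statistics published over JMX (not Hector's MBean).
class PoolStats implements DynamicMBean {
    final AtomicInteger active = new AtomicInteger(); // connections currently borrowed
    final AtomicInteger idle = new AtomicInteger();   // connections waiting in the pool
    final AtomicInteger errors = new AtomicInteger(); // failed operations so far

    /** Registers this instance with the JVM's platform MBeanServer under an assumed domain. */
    ObjectName register(String poolName) {
        try {
            ObjectName name = new ObjectName("example.pool:type=PoolStats,name=" + poolName);
            ManagementFactory.getPlatformMBeanServer().registerMBean(this, name);
            return name;
        } catch (JMException e) {
            throw new IllegalStateException("JMX registration failed", e);
        }
    }

    @Override
    public Object getAttribute(String attribute) throws AttributeNotFoundException {
        switch (attribute) {
            case "ActiveConnections": return active.get();
            case "IdleConnections":   return idle.get();
            case "ErrorCount":        return errors.get();
            default: throw new AttributeNotFoundException(attribute);
        }
    }

    @Override
    public AttributeList getAttributes(String[] attributes) {
        AttributeList result = new AttributeList();
        for (String attribute : attributes) {
            try {
                result.add(new Attribute(attribute, getAttribute(attribute)));
            } catch (AttributeNotFoundException ignored) {
                // unknown names are simply omitted from the result
            }
        }
        return result;
    }

    @Override
    public void setAttribute(Attribute attribute) throws AttributeNotFoundException {
        throw new AttributeNotFoundException("read-only attribute: " + attribute.getName());
    }

    @Override
    public AttributeList setAttributes(AttributeList attributes) {
        return new AttributeList(); // nothing is writable, so nothing was set
    }

    @Override
    public Object invoke(String actionName, Object[] params, String[] signature)
            throws ReflectionException {
        throw new ReflectionException(new NoSuchMethodException(actionName));
    }

    @Override
    public MBeanInfo getMBeanInfo() {
        MBeanAttributeInfo[] attrs = {
            new MBeanAttributeInfo("ActiveConnections", "int", "Connections in use", true, false, false),
            new MBeanAttributeInfo("IdleConnections", "int", "Idle pooled connections", true, false, false),
            new MBeanAttributeInfo("ErrorCount", "int", "Failed operations", true, false, false),
        };
        return new MBeanInfo(PoolStats.class.getName(), "Connection pool statistics",
                attrs, null, null, null);
    }
}
```

Once registered, the attributes can be browsed under example.pool with jconsole or any other JMX client attached to the JVM.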



Support for the Command design pattern, which lets clients concentrate on their business logic while Hector takes care of the required plumbing. This is demonstrated in the code above.
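The idea behind the pattern can be shown without Hector at all. In this hypothetical sketch a HashMap stands in for the Keyspace, and CommandRunner.execute() is the single place where plumbing such as connection handling and error wrapping would live; Command, CommandRunner and the counters are illustrative names, not Hector's classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical command interface: one method that holds only the business logic.
interface Command<T> {
    T execute(Map<String, String> keyspace) throws Exception;
}

// Hypothetical runner that owns the plumbing around every command.
class CommandRunner {
    private final Map<String, String> keyspace = new HashMap<>(); // stand-in for a Keyspace
    int executions = 0; // how many commands have been run
    int inFlight = 0;   // commands currently running

    <T> T execute(Command<T> command) {
        executions++;
        inFlight++; // a real client would borrow a pooled connection here
        try {
            return command.execute(keyspace);
        } catch (Exception e) {
            // uniform error handling: callers see one unchecked exception type
            throw new IllegalStateException("command failed", e);
        } finally {
            inFlight--; // ...and return the connection here, even on failure
        }
    }
}
```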


Source:
http://prettyprint.me/2010/02/23/hector-a-java-cassandra-client/



Pelops:

In Greek mythology, Cassandra is captured by the triumphant king Agamemnon after the fall of Troy and bears him two sons, Pelops and Teledamus. This Java client library is Pelops’ namesake, nicknamed “Cassandra’s beautiful son” because it offers a beautiful way to code against the Cassandra database. This is a quick introduction to the library.

You can find the open source code here: http://pelops.googlecode.com/


Objectives

Pelops was born to improve the quality of Cassandra code across a complex commercial project that makes extensive use of the database. The main objectives of the library are:



o To faithfully expose Cassandra’s API in a manner that is immediately understandable to anyone: simple, but beautiful
o To completely separate low-level concerns such as connection pooling from data processing code
o To eliminate “dressing code”, so that the semantics of data processing stand clear and obvious
o To accelerate development through intellisense, function overloading and powerful high-level methods
o To implement strategies like load balancing based upon the per-node running operation count
o To include robust error handling and recovery that does not mask application-level logic problems
o To track the latest Cassandra releases and features without causing breaking changes
o To define a long-lasting paradigm for those writing client code


Up and running in 5 minutes

To start working with Pelops and Cassandra, you need to know three things:

1. How to create a connection pool, typically once at startup
2. How to write data using the Mutator class
3. How to read data using the Selector class

It’s that easy!


Creating a connection pool

To work with a Cassandra cluster, you need to start off by defining a connection pool. This is typically done once in the startup code of your application. Sometimes you will define more than one connection pool. For example, in our project, we use two Cassandra database clusters, one which uses random partitioning for data storage, and one which uses order-preserving partitioning for indexes. You can create as many connection pools as you need.

To create a pool, you need to specify a name, a list of known contact nodes (the library can automatically detect further nodes in the cluster, but see notes at the end), the network port that the nodes are listening on, and a policy which controls things like the number of connections in your pool.


Here a pool is created with default policies:

Pelops.addPool(
    "Main",
    new String[] { "cass1.database.com", "cass2.database.com", "cass3.database.com" },
    9160,
    new Policy());
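Because each later operation refers to a pool only by name, the name is what selects the cluster. A hypothetical registry along those lines is sketched below; PoolRegistry, PoolConfig and the idx1.database.com host are illustrative only, and the sketch merely records configuration without opening any connections.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical named-pool registry: callers pass only a pool name, and the name
// decides which cluster (contact nodes + port) they talk to. No connections are opened.
final class PoolRegistry {
    static final class PoolConfig {
        final List<String> contactNodes;
        final int port;

        PoolConfig(List<String> contactNodes, int port) {
            this.contactNodes = contactNodes;
            this.port = port;
        }
    }

    private static final Map<String, PoolConfig> POOLS = new ConcurrentHashMap<>();

    private PoolRegistry() {}

    /** Defines a pool once, typically at startup; redefining a name is rejected. */
    static void addPool(String name, String[] contactNodes, int port) {
        PoolConfig config = new PoolConfig(Arrays.asList(contactNodes.clone()), port);
        if (POOLS.putIfAbsent(name, config) != null) {
            throw new IllegalStateException("pool already defined: " + name);
        }
    }

    /** Resolves a pool name to the cluster it points at. */
    static PoolConfig lookup(String name) {
        PoolConfig config = POOLS.get(name);
        if (config == null) {
            throw new IllegalArgumentException("unknown pool: " + name);
        }
        return config;
    }
}
```

With a second pool such as "Indexes" pointing at the order-preserving cluster, data and index code differ only in the pool name they pass.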


Using a Mutator

The Mutator class is used to make mutations to a keyspace (which in SQL speak translates as making changes to a database). You ask Pelops for a new mutator, and then specify the mutations you wish to make. These are sent to Cassandra in a single batch when you call its execute method.

To create a mutator, you must specify the name of the connection pool you will use and the name of the keyspace you wish to mutate. Note that the pool determines what database cluster you are talking to.


Mutator mutator = Pelops.createMutator("Main", "SupportTickets");


Once you have the mutator, you start specifying changes.


/**
 * Write multiple sub-column values to a super column...
 * @param rowKey The key of the row to modify
 * @param colFamily The name of the super column family to operate on
 * @param colName The name of the super column
 * @param subColumns A list of the sub-columns to write
 */
mutator.writeSubColumns(
    userId,
    "L1Tickets",
    UuidHelper.newTimeUuidBytes(), // using a UUID value that sorts by time
    mutator.newColumnList(
        mutator.newColumn("category", "videoPhone"),
        mutator.newColumn("reportType", "POOR_PICTURE"),
        mutator.newColumn("createdDate", NumberHelper.toBytes(System.currentTimeMillis())),
        mutator.newColumn("capture", jpegBytes),
        mutator.newColumn("comment")));


/**
 * Delete a list of columns or super columns...
 * @param rowKey The key of the row to modify
 * @param colFamily The name of the column family to operate on
 * @param colNames The column and/or super column names to delete
 */
mutator.deleteColumns(
    userId,
    "L1Tickets",
    resolvedList);


After specifying the changes, you send them to Cassandra in a single batch by calling execute. This takes the Cassandra consistency level as a parameter.


mutator.execute(ConsistencyLevel.ONE);


Note that if you need to know that a particular mutation operation has completed successfully before initiating some subsequent operation, then you should not batch your mutations together. Since you cannot reuse a mutator after it has been executed, you should create two or more mutators, and execute them with at least a QUORUM consistency level.

Browse the Mutator class to see the methods and overloads that are available.
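The queue-then-execute behaviour of a mutator, including the rule that it cannot be reused, can be illustrated in memory. BatchMutator and its method names are hypothetical, and a Map stands in for a single row; unlike Pelops, the sketch sends nothing over the network and has no notion of consistency level.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical in-memory illustration of batch-then-execute semantics (not Pelops' Mutator).
// A Map<String, String> stands in for one row: column name -> value.
class BatchMutator {
    private final List<Consumer<Map<String, String>>> pending = new ArrayList<>();
    private boolean executed = false;

    BatchMutator writeColumn(String column, String value) {
        pending.add(row -> row.put(column, value)); // queued, not applied yet
        return this;
    }

    BatchMutator deleteColumn(String column) {
        pending.add(row -> row.remove(column)); // queued, not applied yet
        return this;
    }

    /** Applies all queued changes in order; the mutator cannot be reused afterwards. */
    void execute(Map<String, String> row) {
        if (executed) {
            throw new IllegalStateException("already executed: create a new mutator");
        }
        executed = true;
        for (Consumer<Map<String, String>> change : pending) {
            change.accept(row);
        }
        pending.clear();
    }
}
```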


Using a Selector

The Selector class is used to read data from a keyspace. You ask Pelops for a new selector, and then read data by calling its methods.


Selector selector = Pelops.createSelector("Main", "SupportTickets");


Once you have a selector instance, you can start reading data using its many overloads.


/**
 * Retrieve a super column from a row...
 * @param rowKey The key of the row
 * @param columnFamily The name of the column family containing the super column
 * @param superColName The name of the super column to retrieve
 * @param cLevel The Cassandra consistency level with which to perform the operation
 * @return The requested SuperColumn
 */
SuperColumn ticket = selector.getSuperColumnFromRow(
    userId,
    "L1Tickets",
    ticketId,
    ConsistencyLevel.ONE);

// Thrift names are byte[] (ticketId is assumed to be one too), so compare contents, not references
assert Arrays.equals(ticketId, ticket.name);

// enumerate sub-columns
for (Column data : ticket.columns) {
    String name = new String(data.name, StandardCharsets.UTF_8); // assumes UTF-8 column names
    byte[] value = data.value;
}


/**
 * Retrieve super columns from a row
 * @param rowKey The key of the row
 * @param columnFamily The name of the column family containing the super columns
 * @param colPredicate The super column selector predicate
 * @param cLevel The Cassandra consistency level with which to perform the operation
 * @return A list of matching columns
 */
List<SuperColumn> allTickets = selector.getSuperColumnsFromRow(
    userId,
    "L1Tickets",
    Selector.newColumnsPredicateAll(true, 10000),
    ConsistencyLevel.ONE);


/**
 * Retrieve super columns from a set of rows.
 * @param rowKeys The keys of the rows
 * @param columnFamily The name of the column family containing the super columns
 * @param colPredicate The super column selector predicate
 * @param cLevel The Cassandra consistency level with which to perform the operation
 * @return A map from row keys to the matching lists of super columns
 */
Map<String, List<SuperColumn>> allTicketsForFriends = selector.getSuperColumnsFromRows(
    Arrays.asList(new String[] { "matt", "james", "dom" }), // the friends
    "L1Tickets",
    Selector.newColumnsPredicateAll(true, 10000),
    ConsistencyLevel.ONE);


/**
 * Retrieve a page of super columns composed from a segment of the sequence of super columns in a row.
 * @param rowKey The key of the row
 * @param columnFamily The name of the column family containing the super columns
 * @param startBeyondName The sequence of super columns must begin with the smallest super column name
 *        greater than this value. Pass null to start at the beginning of the sequence.
 * @param orderType The scheme used to determine how the column names are ordered
 * @param reversed Whether the scan should proceed in descending super column name order
 * @param count The maximum number of super columns that can be retrieved by the scan
 * @param cLevel The Cassandra consistency level with which to perform the operation
 * @return A page of super columns
 */
List<SuperColumn> pageTickets = selector.getPageOfSuperColumnsFromRow(
    userId,
    "L1Tickets",
    lastIdOfPrevPage, // null for first page
    Selector.OrderType.TimeUUIDType, // ordering defined in this super column family
    true, // "blog order": newest first
    10, // count shown per page
    ConsistencyLevel.ONE);


There are a huge number of selector methods and overloads which expose the full power of Cassandra, and others, like the paginator methods, that make otherwise complex tasks simple. Browse the Selector class to see what is available.
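The startBeyondName, reversed and count parameters amount to an exclusive-start, bounded scan over sorted column names. The hypothetical RowPager below sketches that logic with a NavigableMap standing in for a row; it orders names as plain strings, whereas Cassandra orders them with the column family's comparator (TimeUUIDType in the paging example earlier).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

// Hypothetical sketch of exclusive-start, bounded paging over one row's sorted column names.
final class RowPager {
    private RowPager() {}

    static <V> List<String> page(NavigableMap<String, V> row, String startBeyondName,
                                 boolean reversed, int count) {
        NavigableMap<String, V> view = reversed ? row.descendingMap() : row;
        if (startBeyondName != null) {
            // keep only names strictly beyond startBeyondName in scan order
            view = view.tailMap(startBeyondName, false);
        }
        List<String> names = new ArrayList<>();
        for (String name : view.keySet()) {
            if (names.size() == count) {
                break; // page is full
            }
            names.add(name);
        }
        return names;
    }
}
```

To fetch the next page, pass the last name returned by the previous call as startBeyondName, in the same way lastIdOfPrevPage is used in the Selector example.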


Other stuff

All the main things you need to start using Pelops have been covered, and with your current knowledge you can easily feel your way around Pelops inside your IDE using intellisense. Some final points that will be useful to keep in mind if you want to work with Pelops:



o If you need to perform deletions at the row key level, use an instance of the KeyDeletor class (call Pelops.createKeyDeletor).
o If you need metrics from a Cassandra cluster, use an instance of the Metrics class (call Pelops.createMetrics).
o To work with Time UUIDs, which are globally unique identifiers that can be sorted by time and which you will find very useful throughout your Cassandra code, use the UuidHelper class.
o To work with numbers stored as binary values, use the NumberHelper class.
o To work with strings stored as binary values, use the StringHelper class.
o Methods in the Pelops library that cause interaction with Cassandra throw the standard Cassandra exceptions.
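For readers curious about what such helpers have to do, the following JDK-only sketch encodes a long as 8 big-endian bytes and builds a version-1 (time-based) UUID. BinaryHelpers is a hypothetical class, not Pelops' NumberHelper or UuidHelper, and its random clock-sequence and node bits are an assumption standing in for a MAC-address-based node id.

```java
import java.nio.ByteBuffer;
import java.util.UUID;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical JDK-only helpers; NOT Pelops' NumberHelper/UuidHelper implementations.
final class BinaryHelpers {
    // 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01)
    private static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;

    private BinaryHelpers() {}

    /** Encodes a long as 8 big-endian bytes (ByteBuffer's default byte order). */
    static byte[] toBytes(long value) {
        return ByteBuffer.allocate(8).putLong(value).array();
    }

    /** Decodes 8 big-endian bytes back into a long. */
    static long toLong(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    /** Builds a version-1 (time-based) UUID for the given Unix time in milliseconds. */
    static UUID newTimeUuid(long unixMillis) {
        long ts = unixMillis * 10_000 + UUID_EPOCH_OFFSET;   // 100-ns ticks since 1582
        long msb = (ts & 0xFFFFFFFFL) << 32                   // time_low
                 | ((ts >>> 32) & 0xFFFFL) << 16              // time_mid
                 | 0x1000L | ((ts >>> 48) & 0x0FFFL);         // version 1 + time_hi
        // random clock sequence and node bits, with the IETF variant (binary 10) on top
        long lsb = (ThreadLocalRandom.current().nextLong() & 0x3FFFFFFFFFFFFFFFL)
                 | 0x8000000000000000L;
        return new UUID(msb, lsb);
    }

    /** Serialises a UUID to its 16-byte big-endian form, e.g. for use as a column name. */
    static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }
}
```

Note that java.util.UUID.compareTo does not order version-1 UUIDs by time, because the low-order time bits come first; compare timestamp() values instead, which matches how Cassandra's TimeUUIDType sorts by the embedded timestamp.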


The Pelops design secret

One of the key design decisions that, at the time of writing, distinguishes Pelops is that the data processing code written by developers does not involve connection pooling or management. Instead, classes like Mutator and Selector borrow connections to Cassandra from a Pelops pool for just the periods that they need to read and write to the underlying Thrift API. This has two advantages.

Firstly, and most obviously, code becomes cleaner and developers are freed from connection management concerns. But, more subtly, this also enables the Pelops library to manage connection pooling completely by itself and, for example, keep track of how many outstanding operations are currently running against each cluster node.

This, for example, enables Pelops to perform more effective client load balancing by ensuring that new operations are performed against the node on which it currently has the fewest outstanding operations running. Because of this architectural choice, it will even be possible to offer strategies in the future where, for example, nodes are actually queried to determine their load.
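The least-outstanding-operations strategy can be sketched in a few lines: keep one in-flight counter per node, send each new operation to the node with the smallest count, and decrement when the operation finishes. LeastLoadedBalancer and the node names are hypothetical; this is not Pelops' code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical least-outstanding-operations balancer (not Pelops' code).
class LeastLoadedBalancer {
    // in-flight operation count per node, kept in the order the nodes were listed
    private final Map<String, AtomicInteger> inFlight = new LinkedHashMap<>();

    LeastLoadedBalancer(String... nodes) {
        for (String node : nodes) {
            inFlight.put(node, new AtomicInteger());
        }
    }

    /** Returns the node with the fewest running operations; ties go to the earliest listed node. */
    private String leastLoadedNode() {
        String best = null;
        int bestCount = Integer.MAX_VALUE;
        for (Map.Entry<String, AtomicInteger> entry : inFlight.entrySet()) {
            int count = entry.getValue().get();
            if (count < bestCount) {
                best = entry.getKey();
                bestCount = count;
            }
        }
        return best;
    }

    /** Runs the operation on the least-loaded node, counting it as outstanding while it runs. */
    <T> T run(Function<String, T> operation) {
        String node;
        synchronized (this) { // pick and reserve atomically so concurrent callers spread out
            node = leastLoadedNode();
            inFlight.get(node).incrementAndGet();
        }
        try {
            return operation.apply(node);
        } finally {
            inFlight.get(node).decrementAndGet(); // operation finished: the node is less loaded
        }
    }

    int outstanding(String node) {
        return inFlight.get(node).get();
    }
}
```

Sequential calls all land on the first node, since every counter is back to zero between them; the strategy only spreads work when operations overlap, which is the concurrent situation it is meant for.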

To see how the library abstracts connection pooling away from the semantics of data processing, take a look at the execute method of Mutator and the tryOperation method of Operand. This is the foundation upon which Pelops greatly improves over existing libraries that have modelled connection management on pre-existing SQL database client libraries.


Source:
http://ria101.wordpress.com/2010/06/11/pelops-the-beautiful-cassandra-database-client-for-java/

Thrift:

Thrift is a software framework for scalable cross-language services development. It combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, Smalltalk, and OCaml.

Source:
http://incubator.apache.org/thrift/