Spring Batch

Christopher Jeffers


August 2012


Agenda


Intro to Spring Batch and Use-Cases


Spring Batch Technical Explanation


Architecture


The Batch Job


Skipping and Retrying Steps


Scaling Features


Spring Batch Evaluation


Solving Use-Cases


Benefits


Issues


Integration Options


Future Steps







Spring Batch Overview


Lightweight framework designed to enable the development
of robust batch applications used in enterprise systems


As a part of Spring, it builds on the ease of use of the POJO-based
development approach, while making it easy for developers to use
more advanced enterprise services when necessary


Provides reusable functions that are essential in processing
large volumes of data


Provides scaling features, including multi-threading and
massive parallelism for Spring Batch Jobs



Batch Use-Cases


DataRoomBatch


Physically delete all rows marked for deletion from a given
bucket (DeepSix)


Rerun user documents through publishing workflow


Proactive auditing of the environment


Public Records Batch Processing


User inputs file with search criteria for many individuals
and program searches database for changes in
information, returning a report of hits to user


Read, Process, and Write sequence


Satisfies Government and Corporate requirements




Reason for Spring Batch POC


Current batch system for public records is not
powerful enough to handle very large requests


Have had to turn away customers because of this


A more powerful and flexible batch solution could
solve this problem



Architecture


Layered architecture


The application layer contains all batch jobs and custom code


Batch Core contains runtime classes necessary to launch and
control a batch job


Batch Infrastructure contains common readers and writers,
and services used by both the application and the core
framework

http://static.springsource.org/spring-batch/reference/html/spring-batch-intro.html


The Batch Job


A Job entity encapsulates an entire batch process


A Job is composed of Steps, each of which encapsulates a
phase of the batch job


A Step can be as simple or as complex as the developer wants

http://static.springsource.org/spring-batch/reference/html/domain.html


Chunk Processing


Typical Spring Batch Step


Read, Process, Write sequence


Multiple items are read and processed before being
written as a “chunk”


Size of chunk declared in configuration (commit-interval)

http://static.springsource.org/spring-batch/reference/html/configureStep.html
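The read, process, write sequence described above can be sketched in plain Java. This is only an illustration of the chunking idea, not the Spring Batch API: the class name ChunkSketch and the method run are hypothetical, and the commitInterval parameter merely stands in for the commit-interval setting named on the slide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Plain-Java illustration of chunk-oriented processing; not the Spring Batch API.
class ChunkSketch {

    /**
     * Reads and processes items one at a time, buffers the results, and
     * "writes" them in groups of at most commitInterval items.
     * Returns the chunks in the order they were written.
     */
    static <I, O> List<List<O>> run(Iterator<I> reader,
                                    Function<I, O> processor,
                                    int commitInterval) {
        if (commitInterval < 1) {
            throw new IllegalArgumentException("commitInterval must be at least 1");
        }
        List<List<O>> written = new ArrayList<>();
        List<O> chunk = new ArrayList<>();
        while (reader.hasNext()) {
            chunk.add(processor.apply(reader.next()));  // read, then process
            if (chunk.size() == commitInterval) {
                written.add(chunk);                     // write the full chunk
                chunk = new ArrayList<>();
            }
        }
        if (!chunk.isEmpty()) {
            written.add(chunk);                         // write the final partial chunk
        }
        return written;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "c", "d", "e");
        List<List<String>> chunks = run(input.iterator(), s -> s.toUpperCase(), 2);
        System.out.println(chunks);
    }
}
```

With five input items and a commit interval of 2, the sketch writes two full chunks followed by one partial chunk.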


Step Flow


Steps can be configured to flow sequentially or
conditionally


Allows for some complex jobs

http://static.springsource.org/spring-batch/reference/html/configureStep.html
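One way to picture conditional flow is a table of transitions keyed by step name and exit status. The sketch below is plain Java written for illustration only. StepFlowSketch, step, on, and run are hypothetical names, and the status strings "COMPLETED" and "FAILED" are assumptions chosen for the example. Real Spring Batch jobs declare flow in the job configuration instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Plain-Java illustration of sequential and conditional step flow; not the Spring Batch API.
class StepFlowSketch {
    private final Map<String, Supplier<String>> steps = new HashMap<>();
    private final Map<String, String> transitions = new HashMap<>();

    /** Registers a step whose body returns an exit status string. */
    StepFlowSketch step(String name, Supplier<String> body) {
        steps.put(name, body);
        return this;
    }

    /** After step "from" exits with "status", continue with step "to". */
    StepFlowSketch on(String from, String status, String to) {
        transitions.put(from + "/" + status, to);
        return this;
    }

    /**
     * Runs from the start step and follows matching transitions until none applies.
     * Returns the names of the executed steps in order. The sketch assumes every
     * referenced step is registered and the transitions contain no cycle.
     */
    List<String> run(String start) {
        List<String> executed = new ArrayList<>();
        String current = start;
        while (current != null) {
            executed.add(current);
            String status = steps.get(current).get();
            current = transitions.get(current + "/" + status);
        }
        return executed;
    }

    public static void main(String[] args) {
        StepFlowSketch flow = new StepFlowSketch()
                .step("load", () -> "FAILED")
                .step("process", () -> "COMPLETED")
                .step("recover", () -> "COMPLETED")
                .on("load", "COMPLETED", "process")
                .on("load", "FAILED", "recover");
        System.out.println(flow.run("load"));
    }
}
```

A purely sequential job is the special case in which every step has a single transition on success.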


Job Repository


The JobRepository performs CRUD operations on the
metadata relating to Job and Step execution


Example: Job Parameters, Job/Step status, etc.

http://static.springsource.org/spring-batch/reference/html/domain.html



Step Skipping


An item is skipped if an exception listed in the
configuration is thrown, rather than failing the Step and
stopping the batch execution


Used for exceptions that will be thrown on every
attempt to process the item


FileNotFoundException, Parse Exceptions, etc.


SkipListener can be used to log skipped items
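The skip behavior can be sketched in plain Java as a loop that tolerates a configured exception type up to a limit. This is an assumption-laden illustration, not Spring Batch code: SkipSketch, parseAll, and the skipLimit parameter are hypothetical, and NumberFormatException stands in for whatever parse exception a real job would list as skippable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of skip logic; not the Spring Batch API.
class SkipSketch {

    /** Outcome of a run: the parsed values and the raw lines that were skipped. */
    static class Result {
        final List<Integer> written = new ArrayList<>();
        final List<String> skipped = new ArrayList<>();
    }

    /**
     * Parses each line as an integer. A line that fails to parse is skipped
     * instead of stopping the run, until more than skipLimit lines have failed.
     */
    static Result parseAll(List<String> lines, int skipLimit) {
        Result result = new Result();
        for (String line : lines) {
            try {
                result.written.add(Integer.parseInt(line.trim()));
            } catch (NumberFormatException e) {   // the only "skippable" exception here
                if (result.skipped.size() >= skipLimit) {
                    throw new IllegalStateException("skip limit exceeded at: " + line, e);
                }
                result.skipped.add(line);         // a listener-style hook could log here
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Result result = parseAll(Arrays.asList("1", "oops", "3"), 1);
        System.out.println("written=" + result.written + " skipped=" + result.skipped);
    }
}
```

Skipping suits deterministic failures such as a malformed record, because trying the same input again would fail the same way.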




Retrying Steps


If an exception listed in the configuration is thrown, the
operation is attempted again


Used for exceptions that may not be thrown on
every attempt of the Step


ConcurrencyFailureException, DeadlockLoserDataAccessException, etc.


Can set a limit on number of retries


RetryListener can be used to log retried items


RetryTemplate can be used to further customize
retry logic
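A retry limit can be sketched in plain Java as a loop that repeats an operation while a transient exception keeps occurring. The names RetrySketch, withRetry, TransientException, and maxAttempts are hypothetical and exist only for this illustration. The sketch does not use RetryTemplate or any other Spring class.

```java
import java.util.function.Supplier;

// Plain-Java illustration of retry logic with an attempt limit; not the Spring Batch API.
class RetrySketch {

    /** Stand-in for a transient failure such as a database deadlock. */
    static class TransientException extends RuntimeException {
        TransientException(String message) {
            super(message);
        }
    }

    /**
     * Calls the operation until it succeeds or maxAttempts attempts have failed
     * with a TransientException. Any other exception propagates immediately.
     */
    static <T> T withRetry(Supplier<T> operation, int maxAttempts) {
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.get();
            } catch (TransientException e) {
                if (attempt >= maxAttempts) {
                    throw e;  // retries exhausted
                }
                // a listener-style hook could log the failed attempt here
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = withRetry(() -> {
            if (++calls[0] < 3) {
                throw new TransientException("resource busy");
            }
            return "succeeded after " + calls[0] + " attempts";
        }, 5);
        System.out.println(result);
    }
}
```

Unlike skipping, retrying only makes sense when a later attempt may succeed, which is why the slide lists concurrency and deadlock exceptions as examples.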



Scaling Features (Single Process)


Multi-Threaded Jobs or Steps


Using Spring’s TaskExecutor object


Parallel Steps


Using split flows and a TaskExecutor in Job configuration

http://static.springsource.org/spring-batch/reference/html/scalability.html
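The idea behind parallel steps can be approximated with the JDK's own ExecutorService, which plays a role comparable to the TaskExecutor mentioned above. This is a hedged sketch rather than a Spring Batch split flow: SplitFlowSketch and runInParallel are hypothetical names, and each "step" is just a Callable that returns a status string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Plain-Java illustration of running independent steps in parallel; not the Spring Batch API.
class SplitFlowSketch {

    /** Runs the steps concurrently and returns their results in submission order. */
    static List<String> runInParallel(List<Callable<String>> steps, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<String> results = new ArrayList<>();
            // invokeAll blocks until every step has finished
            for (Future<String> future : executor.invokeAll(steps)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for steps", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a step failed", e.getCause());
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Callable<String>> steps = new ArrayList<>();
        steps.add(() -> "step1 done on " + Thread.currentThread().getName());
        steps.add(() -> "step2 done on " + Thread.currentThread().getName());
        System.out.println(runInParallel(steps, 2));
    }
}
```

The job continues only after all parallel branches have returned, which mirrors how a split rejoins before the next step.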


Scaling Features (Multi-Process)


Remote Chunking


Splits Step processing across multiple processes, using
some middleware to communicate

http://static.springsource.org/spring-batch/reference/html/scalability.html


Scaling Features (Multi-Process)


Step Partitioning


Splits input and executes remote steps in parallel


PartitionHandler sends StepExecution requests to remote
steps


Partitioner generates the input for new step executions

http://static.springsource.org/spring-batch/reference/html/scalability.html
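The input-splitting half of partitioning can be illustrated with a plain-Java helper that divides a range of row ids into contiguous sub-ranges, one per worker. This is a sketch under assumptions, not Spring Batch's Partitioner interface: RangePartitionSketch, partition, and the gridSize parameter are hypothetical names chosen for the example, and dispatching the ranges to remote steps is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of splitting input into partitions; not the Spring Batch API.
class RangePartitionSketch {

    /**
     * Splits the inclusive id range [min, max] into at most gridSize contiguous
     * ranges. Each element of the result is a two-element array {from, to}.
     */
    static List<long[]> partition(long min, long max, int gridSize) {
        if (gridSize < 1) {
            throw new IllegalArgumentException("gridSize must be at least 1");
        }
        List<long[]> ranges = new ArrayList<>();
        long total = max - min + 1;
        long size = (total + gridSize - 1) / gridSize;  // ceiling division
        for (long from = min; from <= max; from += size) {
            ranges.add(new long[] {from, Math.min(from + size - 1, max)});
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (long[] range : partition(1, 10, 3)) {
            System.out.println(range[0] + ".." + range[1]);
        }
    }
}
```

Each range could then be handed to a separate step execution, so that the workers read disjoint slices of the input.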


Job Flow with Client/Server and Partitioning



Solving the Use-Cases


DataRoomBatch (DeepSix Example)


Bucket is input to JdbcCursorItemReader


Create an Item Processor to check if the row is marked for
deletion and delete it if so


Item Writer could be empty or used to output statistics


Partitioning easily done by dividing up number of rows per
partition




Solving the Use-Cases


Public Records Batch Processing


Input file is input to FlatFileItemReader


Custom Item Processor to search the database for hits


Custom Item Writer to compile report of search results


Following step to send report to user


Easy to implement a Partitioner for the input file
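The shape of this use-case can be sketched in plain Java with an in-memory map standing in for the database. Everything here is hypothetical and for illustration only: PublicRecordsSketch, buildReport, the one-name-per-line input format, and the sample data are assumptions. A real job would use FlatFileItemReader plus custom processor and writer classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of the read, search, report sequence; not the Spring Batch API.
class PublicRecordsSketch {

    /** Returns one report line for each input name that has a recorded change. */
    static List<String> buildReport(List<String> inputLines, Map<String, String> changesByName) {
        List<String> report = new ArrayList<>();
        for (String line : inputLines) {               // "reader": one search criterion per line
            String name = line.trim();
            if (name.isEmpty()) {
                continue;
            }
            String change = changesByName.get(name);   // "processor": search for a hit
            if (change != null) {
                report.add(name + ": " + change);      // "writer": add the hit to the report
            }
        }
        return report;
    }

    public static void main(String[] args) {
        Map<String, String> changes = new HashMap<>();
        changes.put("Jane Doe", "address changed");    // made-up sample data
        List<String> input = Arrays.asList("Jane Doe", "John Roe");
        System.out.println(buildReport(input, changes));
    }
}
```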





Benefits of Spring Batch


Part of Spring Framework


Allows easy integration with other Spring features


General simplicity offered by Spring


Step flow customizable


Basic Item Readers and Writers already available


Features available for monitoring Jobs and Steps


Many scaling options available




Issues with Spring Batch


No built-in scheduler


Not a big issue; scheduler libraries are easily integrated


Potentially a lot of XML configuration


Business logic across Java and XML files can complicate
debugging and maintenance


Annotations can help


Anything but very basic components will need to be
created as new classes




Helpful Integration Options


Spring Batch Admin


Web-based administration console


Contains Spring Batch Integration, allowing use of Spring
Integration messages to launch and monitor jobs


Scheduler (cron, Spring Scheduling, Quartz)


Clustering Framework (Hadoop, GridGain, Terracotta)


Ideal for improving horizontal scaling


Spring Data Hadoop is a fairly new Spring feature that
helps integrate Spring with Hadoop



Future Steps


Get Spring Batch set up with a clustered environment


Evaluate performance


Figure out dynamic load balancing


Play around with more features and integration options


Spring Batch Admin, manual job restarting, etc.


Implement Spring Batch Admin into Cobalt GUI?


Look more into the information stored in the metadata database
and figure out how to use it for monitoring/managing jobs


Look into Partitioning and how much must be done to
implement sending partitions off to remote machines


Look into job/step timeout



Questions?