Predictive audit: transactions status prediction

munchsistersAI and Robotics

Oct 17, 2013 (4 years and 27 days ago)



Abstract

In the new business era, economical storage and inexpensive processing costs expand the need for data assurance and the application of analytics in the audit field. Continuous auditing enables auditors to perform more frequent audits on the full population, not just a sample, and thus could fulfill business needs. Furthermore, auditors are required by SAS 99 to assess the risk of material misstatement due to fraud. Continuous auditing and continuous monitoring could alert auditors to irregularities found in a live stream of economic data. This allows management and auditors to investigate any potential problems before they escalate. The objective of this paper is to identify anomalous business transactions and provide an alert beforehand. Predictive audit could identify the areas that have a high possibility of anomalies or fraud. One of the major issues in the sales and revenue cycle is channel stuffing. Employees may try to boost sales to raise their performance and compensation. At the time of sale, a number of variables, such as the sales employee's performance and the product's attributes, can be used to predict the future status of that sales transaction. Depending on its particular characteristics, a sales transaction has the potential to be either a normal transaction or a problematic one. Several machine learning techniques, including decision trees, logistic regression and support vector machines, are applied to the sales transaction data. A set of indicators that predict whether a sales transaction will be cancelled in the future is identified, and the predictive ability of the models is compared.



1. Introduction

In a real-time economy, businesses operate 24/7, and companies extensively adopt integrated software and modern technologies (Vasarhelyi et al. 2010). Internal and external auditors have to adapt accordingly. Audit timing, nature and extent are more rigorous. In a traditional audit, auditors have to sample data or documents for testing and generalize the results to the population. This is due to limited access, limited resources, and lack of technology. Additionally, the role of the auditor has changed over time. In addition to the financial statement audit, auditors have to monitor internal controls and the company's core activities and operating processes, for example, to meet SOX requirements. As businesses become more complex and continue expanding, auditors face a vast amount of complicated data to be processed and analyzed. The emergence of continuous auditing and continuous control monitoring could greatly facilitate audit work in a real-time economy. Auditors could perform audit testing more frequently, on the entire population, and on a more timely basis. Continuous auditing is able to process full populations of data without resorting to sampling. Transaction verification components in the continuous audit are able to filter out exceptional transactions and bring them to the attention of auditors and management (Kogan et al. 2010).

Apart from the conventional audit with a pile of paper documents, a number of manual tests and a periodic review, internal auditors can now utilize more technology-enabled tools to support their work. Automated audit testing began in the 1960s with the embedded audit module methodology. In the 1980s, computer-assisted audit techniques (CAATs) were deployed for substantive tests on large electronic data sets (Coderre, 2006). Vasarhelyi and Halper (1991) developed a continuous auditing and continuous control monitoring system to monitor a large paperless real-time billing system at AT&T Bell Laboratories. In 2000, Glover et al. conducted a survey of software usage trends of internal auditors worldwide with 2,700 Institute of Internal Auditors (IIA) members. They found that the use of software to extract and import data had increased rapidly since the 1998 survey. Almost half of the respondents used continuous monitoring software to find trends, to create exception reports, to detect fraud and to locate duplicate transactions. This revolution shows that technology has an important role in assisting many types of audits. Continuous auditing and continuous control monitoring change the audit paradigm to more frequent review and automated audit where possible. They also enable auditors to test the full population, not just a sample, and can produce more timely reports to support management decisions and concerns. According to an audit maturity model developed by Vasarhelyi et al. (2010, working paper), a mature continuous audit stage has critical meta-control structures and benchmarks, and auditors will execute the audit only by exception. The system will operate in an assured mode and have a warning or an alarm to call the attention of auditors and management in case irregular activities or transactions are found.

Applying technology to audit work would help internal auditors, especially in the data-rich modern age. Correct understanding and proper analysis of data would uncover interesting patterns, trends, weaknesses or irregularities in data and processes. The objective of this paper is to identify anomalous business transactions and provide an alert beforehand. This "predictive audit" (Vasarhelyi et al. 2011) could help auditors and management to block a problem before it spreads. It is better to look forward to a potential problem, and maybe block it, than to just look back at erroneous historical data. At the time of sale, indicators can be used to predict the status of sales transactions. The particular characteristics of a sales transaction determine its potential to be either normal or problematic.


2. Background

More timely auditing and monitoring would help the company detect and resolve errors, problems or anomalies before they get out of control. It would be of great help if auditors and management got timely warning of suspicious activities. With continuous auditing and continuous monitoring, the systems will create alarms that alert internal auditors or relevant persons to deviations of controls from the baseline. The alarm and warning system could be implemented in any business process to assist auditors and management in monitoring the systems. This paper illustrates the application of this feature to a front-office operation such as sales activities in a revenue cycle.

In recent years, fraud has been a growing concern for companies and regulators. SAS 99 emphasizes its importance by requiring auditors to assess the risk of material misstatement due to fraud. The Enron case, for example, showed that untimely fraud discovery could allow a company to fail. Vasarhelyi et al. (2002) suggested that continuous auditing would have been able to detect the abnormal nature of Enron's special purpose entities and could have alerted auditors and management in a more timely manner.

Business processes that involve money are more prone to fraud than other areas. Sales in a revenue cycle is an important area in business processes that is closely involved with money. Many companies have incentive programs with cash compensation based on sales. Thus, there is a possibility that employees may try to fraudulently increase their sales to enhance their compensation. Sales transactions can legitimately be cancelled and reimbursed afterwards by customers for many reasons. These cancellations and reimbursements may be illegitimate, however, if sales employees try to unethically boost sales to reach targets or earn cash bonuses. One form of sales exaggeration for a period is channel stuffing. Salespeople of the company in this study receive compensation based on their total number of sales transactions. Thus, they may try to increase the number of sales transactions in several ways. For example, sales employees may sell a product of low monetary value in many transactions and let customers cancel later, after a certain period. Customers can cancel the transactions and get their money back, or they can complain about the products and request reimbursement. These cancelled sales are not deducted from the bonuses or group targets. Thus, the company incurs extra costs. These could be avoided if there were some form of predictive screening.

This study aims to predict the sales transaction status at the time of customer purchase. When a customer purchases a product, as soon as the information is entered into the system, the system could predict the future status of that sales transaction, in particular, whether it will be cancelled or not. Each sales transaction has different characteristics, including the properties of the transaction itself, seller characteristics, and buyer qualifications. Accordingly, by examining all relevant factors together, if there is any suspicious transaction, as determined by a pre-determined numeric suspicion threshold, the system could block its processing, pass it over to audit or operational personnel for approval, and include it in a warning or alarm report. Several classification techniques in machine learning are used to create prediction models that predict the outcome of the sales transaction.
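The screening step described above can be illustrated with a minimal Python sketch. It assumes that some fitted classifier has already produced a cancellation score between 0 and 1 for each sale. The threshold value, the `Sale` record and the `screen` function are hypothetical names chosen for illustration; the paper does not specify them.

```python
from dataclasses import dataclass

# Hypothetical threshold: the paper does not state a value.
SUSPICION_THRESHOLD = 0.7


@dataclass
class Sale:
    sale_id: str
    cancel_probability: float  # score from any fitted classifier (assumed)


def screen(sales, threshold=SUSPICION_THRESHOLD):
    """Split sales into those processed normally and those held for review."""
    processed, held = [], []
    for sale in sales:
        if sale.cancel_probability >= threshold:
            held.append(sale)       # blocked and listed on the alarm report
        else:
            processed.append(sale)  # continues through normal processing
    return processed, held


# Made-up scores, for illustration only.
sales = [Sale("S1", 0.12), Sale("S2", 0.91), Sale("S3", 0.70)]
processed, held = screen(sales)
alarm_report = [s.sale_id for s in held]
```

Raising the threshold holds fewer transactions for review at the cost of letting more suspicious ones through, which is the cost-benefit trade-off an auditor would have to tune.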


Research questions development

Audit by exception allows auditors to plan and execute work based on internal control evaluation, and to focus more on specific areas that need attention. Generally, the system is considered materially correct until an alarm arises (Vasarhelyi et al. 2004). The sooner management or auditors acknowledge a problem, the earlier the problem can be resolved, at reduced cost to the organization. AT&T Bell Laboratories implemented a continuous assurance model called the Continuous Process Auditing System (CPAS) to monitor a real-time billing system. It triggers alarms, which are selectively escalated to auditors and management when transactions exceed a predefined threshold or certain events occur (Vasarhelyi and Halper, 1991).

The ability to predict or estimate conditions facilitates the work of management and auditors. Management could operate the business or implement internal controls in a preventive manner, while auditors could execute audits on a predictive rather than a detective basis. Several researchers examine how decision support systems help in estimating potential risk, especially in the auditing area, such as audit risk and client risk. Bell and Smith (2002) use a procedure to evaluate client risk for external audit work on several factors. This would help auditors to determine whether they will accept the potential client. If the likelihood of risk can be identified in advance or predicted, it could aid management, auditors or relevant personnel in decision making or in dealing with potential problems.


Most prior work studied the indicators of potential risks and problems in an aggregate view; very few studies work at the transaction level (Kogan et al. 2011). These studies, for example, tried to predict client risk to aid decision making on client acceptance and to identify audit risk for the engagement. At the transaction level, this paper proposes prediction models to forecast business transaction outcomes. Considering a number of a transaction's attributes, the models will determine whether the transaction will succeed or fail. The results of the prediction models could warn management to pay attention to those transactions that have negative results. This leads to the research question:

What prediction model(s) will more accurately forecast business transaction outcomes?

Prediction models are proposed using several machine learning techniques. Each prediction model has advantages and disadvantages that have to be traded off. The accuracy of the prediction result is one of the most important factors in judging the quality of a model. A precise forecast would benefit the decisions of auditors and management.


3. Literature review

Sales forecast studies

The revenue cycle is one of the most important business areas. Every for-profit company is very concerned with its sales numbers. The major source of revenue of a business is its sales activities. Thus, sales forecasting is critical for business planning, strategy, supply chain and more. A number of prior studies examine various perspectives on sales forecasting using different techniques and variables. Winters (1960), the most classic work, introduces the exponentially weighted moving average model with seasonal and trend smoothing and states that the desired characteristics of a forecast are that it be quick, cheap and easy. Thus his model includes only past sales history and does not include any external factors.
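To make the idea of forecasting from past sales alone concrete, the following Python sketch implements only the basic exponentially weighted moving average. It is a simplification: it deliberately omits the seasonal and trend terms that Winters' full model adds, and the function name, the smoothing constant `alpha` and the sales figures are illustrative assumptions.

```python
def exponential_smoothing_forecast(history, alpha=0.3):
    """One-step-ahead forecast from an exponentially weighted moving average.

    Simplified sketch: no trend or seasonal component is modelled.
    """
    if not history:
        raise ValueError("history must contain at least one observation")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = history[0]
    for observed in history[1:]:
        # Recent observations get weight alpha; older ones decay geometrically.
        level = alpha * observed + (1.0 - alpha) * level
    return level


monthly_sales = [100.0, 110.0, 105.0, 120.0]  # made-up illustrative figures
forecast = exponential_smoothing_forecast(monthly_sales, alpha=0.5)
```

The only input is the sales history itself, which is what makes this family of models quick and cheap but blind to external factors.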

Nowadays, more factors are included in sales forecast models. Current research concentrates on developing new models and finding more efficient algorithms for sales forecasting. Fisher and Raman (1996) use the historical data of previous products mixed with expert opinion to do the sales prediction. They propose a new model to estimate demand densities for the fashion skiwear of a manufacturing company. The success and failure of newly launched products is studied by Garber et al. (2004). It is difficult to obtain enough sales data for a new product to enable reliable sales prediction. The authors include spatial data to render a better prediction. This spatial information is available as soon as the product is launched and sold. The cross entropy is calculated as a measure, and logistic regression is run to classify the cases. When the entropy is plotted, a successful product and a failed product have visibly different patterns. As a result, the model successfully predicts 14 out of 16 products. Demand for fashion products is challenging to forecast because they have short product lifetimes, long lead times and fluctuating demand. Moreover, historical demand information is not available.

Chang and Wang (2006) employ a fuzzy back-propagation network (FBPN) to forecast monthly sales in the printed circuit board (PCB) industry. FBPN is the integration of fuzzy logic and artificial neural network algorithms. Stepwise regression analysis and the fuzzy Delphi method were applied to select variables related to the sales forecast from three domains: market demand, macroeconomics, and industrial production. The authors found that both the stepwise regression and fuzzy Delphi methods perform better when a tendency factor is included, and that the fuzzy Delphi technique outperforms the stepwise method. Finally, FBPN is compared with three other methods: Grey forecasting, multiple regression analysis and the back-propagation network. They conclude that among the four models, FBPN is the best, with 97.61% prediction accuracy.

Cadez et al. (2001) propose probabilistic modeling to make inferences about individual behavior (profiles) given transaction data from a large set of individuals over a period of time. The behavior here focuses on the likelihood that an individual will purchase a particular item. In their paper, a model-based approach is applied to the profiling problem. A flexible probabilistic mixture model for the transactions is proposed and compared with baseline models based on raw or adjusted histogram techniques. The data are separated into two time periods for training and testing. The log-probability (logp score) of the transactions is used to evaluate the predictive power of the models. Customers with a relatively high logp score per item are the most predictable ones. This score can also be used to identify interesting and unusual purchasing behavior of customers.

Data on newly registered automobiles in Germany (1992-2007) are used to test the sales forecast model of Brühl et al. (2009). Yearly, monthly, and quarterly data are compared using multiple linear regression (MLR) for linear trend estimation and a support vector machine (SVM) for non-linear trend estimation. The results show that the non-linear (SVM) model outperforms the linear model and that quarterly data have the lowest prediction error. The problems with the yearly model are the very small data set and limited information content. The major problem for the monthly model is that most of the exogenous variables used in the model are not collected monthly, so substitute or average values are used.



Sales forecast with machine learning techniques

Machine learning is a computerized technique that learns from a sample of data or historical information and uses the discovered patterns to make predictions about new data. Morwitz and Schmittlein (1992) use segmentation methods to improve the accuracy of sales forecasting based on purchase intention theory. They segment heterogeneous individuals into homogeneous subgroups under the hypothesis that consumers are heterogeneous in purchase intention and in the realization of those intentions. Their segmentation methods are a priori segmentation, CART, discriminant analysis and k-means cluster analysis. The results show that after segmenting consumers into similar groups, the average forecast errors are reduced and more accurate sales forecasts are obtained. In their study, discriminant analysis as the segmentation method reduced the average percentage forecast error the most.

New and unique products like songs, movies and books usually do not have past sales information, and the relevant data are not available in advance. In this case, data from diverse prior products could be used for preliminary sales forecasting, and the forecast could then be updated later when data for the product become available. Lee et al. (2003) use a hierarchical Bayesian formulation of the logistic diffusion model to forecast prelaunch weekly sales of individual music albums and update the forecast post-launch, when sales data become available, using a sampling/importance resampling algorithm. In the hierarchy, the first level of the sales prediction model is the album level and the second level is the underlying characteristics of artists and albums. The study finds that the prelaunch forecast with album characteristics gives a better result than the one without album characteristics, and that the forecast improves significantly after the first week of sales, once the new data are used to update the model.


Thomassey and Fiordaliso (2006) develop a hybrid model based on clustering and decision trees to forecast mid-term sales for the textile industry. The prediction is processed in two stages. First, clustering is applied to produce sales profiles. Next, each new product is assigned to a sales profile by a decision tree. This methodology uses the sales behavior of past products to identify a possible pattern for a new product, which has no historical data. Using k-means clustering, the number of clusters is set between 2 and 20. Then a decision tree, C4.5, is applied to each set of cluster results, and the absolute error is computed to select the number of clusters that produces the most accurate classification.

Even though there are several studies in the area of sales forecasting, there seem to be none that predict the status of a sale, especially at the transaction level, in the auditing field. Moreover, the main objective of this study is not to predict the future income of the company but to predict the sales transaction status and to use the result as an alarm for internal auditors and relevant business staff to further investigate irregularities in sales transactions and employee performance. There is no prior work that uses sales prediction as a warning indicator in a continuous auditing environment.


4. Data

The data sets are from one of the largest banks in the world. The data consist of sales-related information: sales and cancellation transactions of a special saving account offered by the company, and sales employee records. A special saving account lets customers deposit an equal amount of money with the bank every period, usually on a monthly basis. Customers can select how much money (the installment) they want to deposit and how many periods the contract runs. An installment can be a very small or a large amount, according to the customer's preference, but this money cannot be withdrawn before the contract ends. If a customer cancels the contract before it ends, the bank will return the money to the customer at a discounted rate. However, after buying a special saving account, a customer can cancel, suspend or request reimbursement of the purchase if he or she is not satisfied with the product. If the bank gets a complaint from a customer, it may decide to refund those payments.

The data sets cover transactions from November 2009 to April 2010. These transactions are from all branches country-wide. The main sales transaction table (Base_table) has 607,189 records. The data set also includes additional information related to sales transactions in separate tables. Those tables are:

1. Registration of employees: Personal information about each employee.
2. Complaints: Customers can make a complaint to the company if they are not satisfied with the product after purchase.
3. Reimbursements: When customers are not satisfied with the product, they can ask to cancel those purchases. In some cases, the company reimburses them for their purchase. This table contains seven months of reimbursement data, from October 2009 to April 2010.

Even though special saving account cancellation transactions over the six-month period amount to only 6.67% of total sales, a preliminary analysis of cancellations by date shows that they increase over the period, from approximately 110 transactions per day at the beginning of the period to 470 transactions per day at the end (Graph 1).




5. Model Development

Channel stuffing is a malpractice that inflates the sales figures of a company by shipping far more products to distributors than they can actually sell to customers. This makes a company look healthier than it really is because of the increased sales figures and accounts receivable. Normally, the company will offer long-term credit or return conditions to the distributors to make them accept the large amount of goods. In this sales prediction study, employees may push sales to increase the number of sales transactions and let customers cancel or seek reimbursement later. Therefore, the variables selected for sales status prediction are related to the total number of sales, cancellation and reimbursement transactions.

The important criterion for selecting attributes is that all the selected attributes have to be known at the time of prediction. For example, the status of the current sales transaction is not known at the time a customer makes a purchase; this is the outcome that will be predicted by the model. Thus, the status of the current sales transaction cannot be included among the prediction variables, unlike the value and number of installments of the current transaction, which are known at the time of sale. Other attributes that are available at the time of purchase describe the past performance of sales employees, for example, the total number of sales transactions that an employee has made in the past and the total number of sales transactions with complaints that an employee has ever had. The attributes selected as variables for prediction are as follows.


1. Ratio of sale cancellation by sales employee (D_CANC_RA)
2. Ratio of sale cancellation and reimbursement by sales employee (D_RESS_RA)
3. Ratio of matched sale by sales employee (D_CASA_RA)
4. Ratio of sale to inactive account by sales employee (D_INAT_RA)
5. Ratio of sale with complaint by sales employee (D_RECL_RA)
6. Ratio of sale to another employee by sales employee (D_FUNC_RA)
7. Number of sales transactions by employee (C_A_SALE)


These attributes are known at the time of sale, so they are good candidates for the analysis. Most of the variables are historical sales information for each employee. These variables are normalized by being calculated as ratios. This avoids bias and makes the data comparable across employees.
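A minimal Python sketch of how such per-employee ratios might be derived is shown below. The paper does not give the exact formulas, so the sketch assumes each ratio is a count divided by the employee's total number of past sales. The field names `employee`, `cancelled` and `complaint` are hypothetical, and only two of the ratio variables are shown.

```python
from collections import defaultdict


def employee_ratios(past_sales):
    """Return per-employee sale counts and cancellation/complaint ratios.

    past_sales: iterable of dicts with keys 'employee', 'cancelled' and
    'complaint' (hypothetical field names). Only sales made before the
    prediction date should be passed in, so that every attribute is
    known at the time of sale.
    """
    counts = defaultdict(lambda: {"sales": 0, "cancelled": 0, "complaint": 0})
    for sale in past_sales:
        c = counts[sale["employee"]]
        c["sales"] += 1
        c["cancelled"] += int(sale["cancelled"])
        c["complaint"] += int(sale["complaint"])
    return {
        emp: {
            "C_A_SALE": c["sales"],
            # Assumed definition: share of the employee's sales later cancelled.
            "D_CANC_RA": c["cancelled"] / c["sales"],
            # Assumed definition: share of the employee's sales with a complaint.
            "D_RECL_RA": c["complaint"] / c["sales"],
        }
        for emp, c in counts.items()
    }


# Made-up history, for illustration only.
history = [
    {"employee": "E1", "cancelled": True, "complaint": False},
    {"employee": "E1", "cancelled": False, "complaint": False},
    {"employee": "E1", "cancelled": True, "complaint": True},
    {"employee": "E1", "cancelled": False, "complaint": False},
    {"employee": "E2", "cancelled": False, "complaint": False},
]
ratios = employee_ratios(history)
```

Expressing the counts as ratios puts an employee with four sales and one with four thousand on the same scale, which is the comparability the normalization is meant to achieve.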


Several algorithms are applied to the data set to predict the status of the sales transactions. Only algorithms suitable for a nominal or categorical outcome were selected. They are the classification tree, logistic regression, and the support vector machine. The validation method used is 10-fold cross-validation.
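The 10-fold cross-validation procedure can be sketched in plain Python as follows. This is an illustration of the general technique, not the authors' tooling: the helper names are hypothetical, and a trivial majority-class predictor stands in for the classification tree, logistic regression and support vector machine.

```python
import random


def k_fold_indices(n_records, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint, near-equal folds
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validated_accuracy(records, labels, fit, predict, k=10):
    """Average accuracy over k folds; fit and predict are caller-supplied."""
    scores = []
    for train, test in k_fold_indices(len(records), k):
        model = fit([records[i] for i in train], [labels[i] for i in train])
        hits = sum(predict(model, records[i]) == labels[i] for i in test)
        scores.append(hits / len(test))
    return sum(scores) / len(scores)


# Stand-in model: always predict the most frequent training label.
def fit_majority(xs, ys):
    return max(set(ys), key=ys.count)


def predict_majority(model, x):
    return model


# Made-up data: 18 non-cancelled (0) and 2 cancelled (1) transactions.
toy_records = list(range(20))
toy_labels = [0] * 18 + [1] * 2
baseline_accuracy = cross_validated_accuracy(
    toy_records, toy_labels, fit_majority, predict_majority, k=10
)
```

Each record is used for testing exactly once and for training in the other nine folds, so the averaged score uses all of the data without testing a model on records it was trained on.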


6. Results and Analysis


To compare the results among algorithms, several measurements are considered. They are the percentage of instances that the model classified correctly (accuracy), error rate, specificity, recall, precision and false alarm rate.
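The paper does not spell out formulas for these measurements. The Python sketch below uses the standard confusion-matrix definitions, which are an assumption here. In particular, the false alarm rate is taken as FP / (FP + TN), the complement of specificity, which is consistent with the reported tables, where specificity and false alarm rate sum to 100%. Which class the authors treat as "positive" is not stated.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard measures from confusion-matrix counts (assumed definitions)."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    return {
        "accuracy": accuracy,                # correctly classified share
        "error_rate": 1 - accuracy,          # misclassified share
        "specificity": tn / (tn + fp),       # true negative rate
        "recall": tp / (tp + fn),            # true positive rate
        "precision": tp / (tp + fp),         # share of flagged cases that are right
        "false_alarm_rate": fp / (fp + tn),  # equals 1 - specificity
    }


# Made-up counts, for illustration only.
metrics = classification_metrics(tp=80, fp=10, tn=90, fn=20)
```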

For the accuracy of the models, the first run of the analysis gave a very high percentage of correctly classified instances, more than 90 percent, for all algorithms. This is a signal of some abnormality, because the results are too good for all algorithms. Thus, the data were re-evaluated and found to suffer from the unbalanced data problem. Of the total 607,189 records, 566,753 (93.34%) are non-cancelled sales transactions, while the other 40,436 (6.66%) are cancelled sales transactions. With such a split, a model that labels every transaction as non-cancelled would already be 93.34% accurate.


There are two approaches to dealing with the unbalanced data problem. One approach is to weight the data so that cancelled and non-cancelled transactions carry more nearly equal weight, and to add a penalty for incorrectly classified results. Another approach is to select a balanced sub-sample containing the same number of cancelled and non-cancelled transactions. In this case, both approaches were tried and different results were obtained.


In the data weighted approach, the weight was assigned to the smaller portion of the data: cancelled sales transactions. There is no specific method to calculate the right weight for the data, but several values can be tried to find the best result. The ratio between cancelled and non-cancelled data is 1:14, so this was the first weight value tried. With the data weighted approach, the result of the classification tree, J48, dropped considerably, to 64.23% correctly classified instances, with a large number of false negatives. With the logistic algorithm, after adjusting the weight, the result is 70.16% correctly classified instances, which is better than the J48 result. However, it still generated a large number of false negatives. The support vector machine algorithm gave the best result among the three algorithms. The model correctly classifies 79.36% of instances and has a much smaller number of false negatives than the other models. The summary results of the data weighted approach for all algorithms are shown in Table 1.
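The arithmetic behind the 1:14 weight can be checked with a short Python sketch. The record counts come from the paper; the helper functions and the weighted-error formula are illustrative assumptions, since the paper does not describe how its software applied the weights.

```python
N_NON_CANCELLED = 566_753  # from the paper
N_CANCELLED = 40_436       # from the paper

imbalance_ratio = N_NON_CANCELLED / N_CANCELLED  # roughly 14 to 1
minority_weight = round(imbalance_ratio)


def instance_weight(is_cancelled, weight=minority_weight):
    """Weight of one record: cancelled records count `weight` times."""
    return float(weight) if is_cancelled else 1.0


def weighted_error(labels, predictions, weight=minority_weight):
    """Share of total instance weight that is misclassified (assumed form)."""
    total = wrong = 0.0
    for actual, predicted in zip(labels, predictions):
        w = instance_weight(actual, weight)
        total += w
        wrong += w * (actual != predicted)
    return wrong / total


# After weighting, the two classes carry nearly the same total weight.
weighted_cancelled_mass = N_CANCELLED * minority_weight

# One missed cancellation among three records dominates the weighted error.
example_error = weighted_error([True, False, False], [False, False, False])
```

Under this weighting a classifier can no longer score well by predicting "non-cancelled" for everything, because each missed cancellation costs as much as fourteen misclassified normal sales.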


Another approach to dealing with the unbalanced data problem is to select a balanced sub-sample from the data set. Cancelled and non-cancelled sales transactions are randomly selected from the population in equal numbers: 30,000 records were selected from the cancelled sales transactions and another 30,000 from the non-cancelled sales transactions. The classification tree, J48, correctly classified 67.54% of instances. The logistic regression algorithm correctly classified 65.29% of instances and also has a large number of false positives. The support vector machine correctly classified 64.64% of instances and has a large number of false positives. The summary results of the sub-sample approach for all algorithms are shown in Table 2.
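Drawing a balanced sub-sample can be sketched in Python as below. The function name, the `is_cancelled` callback and the toy records are hypothetical; in the paper the per-class sample size is 30,000, while the example uses a much smaller made-up data set.

```python
import random


def balanced_subsample(records, is_cancelled, per_class, seed=0):
    """Randomly draw per_class cancelled and per_class non-cancelled records."""
    rng = random.Random(seed)
    cancelled = [r for r in records if is_cancelled(r)]
    kept = [r for r in records if not is_cancelled(r)]
    if min(len(cancelled), len(kept)) < per_class:
        raise ValueError("not enough records in one of the classes")
    # Sampling without replacement from each class separately.
    sample = rng.sample(cancelled, per_class) + rng.sample(kept, per_class)
    rng.shuffle(sample)  # avoid an ordering that reveals the class
    return sample


# Made-up population: one record in fifteen is cancelled, as a rough analogue
# of the imbalance in the paper's data.
population = [{"id": i, "cancelled": i % 15 == 0} for i in range(3000)]
sample = balanced_subsample(population, lambda r: r["cancelled"], per_class=100)
```

The price of this approach is that most non-cancelled records are discarded, so the model sees far less of the majority class than the weighting approach does.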


The preliminary results show that each approach has both pros and cons. The data weighted approach has a higher accuracy rate, lower error rate, higher precision rate and higher F-measure, while the sub-sample approach has a higher specificity rate, higher recall rate, lower false alarm rate and lower F-measure. These results could imply that the overall result of the data weighted approach is better. However, the choice is a trade-off between the cost and the benefit of investigating suspicious transactions. If auditors and management prefer not to investigate too many transactions, to avoid interrupting the process or for any other reason, they can select the prediction method that has the lower type I error.
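The F-measure is mentioned in the comparison but is not listed in Tables 1 and 2. Assuming the authors used the standard definition, the harmonic mean of precision and recall, it can be recomputed from the precision and recall values reported in those tables, as in this Python sketch. The resulting figures are derived here, not taken from the paper.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %) as reported in Tables 1 and 2 of the paper.
weighted = {
    "J48": (94.98, 65.12),
    "Logistic": (95.28, 71.58),
    "SVM": (94.84, 82.37),
}
subsample = {
    "J48": (66.29, 71.39),
    "Logistic": (62.47, 76.56),
    "SVM": (64.64, 81.58),
}

f_weighted = {m: f_measure(p, r) for m, (p, r) in weighted.items()}
f_subsample = {m: f_measure(p, r) for m, (p, r) in subsample.items()}

# For each model, is the derived F-measure higher under data weighting?
weighted_is_higher = {m: f_weighted[m] > f_subsample[m] for m in f_weighted}
```

Under this assumed definition the derived F-measure is higher for the data weighted approach for all three models, which agrees with the comparison stated in the text.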




7. Conclusion

An alarm or warning system in continuous auditing is a helpful feature that calls the attention of auditors and management to a problem. The ideal situation is that a problem is identified and automatically solved as soon as possible, before it propagates into other processes. Sales activities and the compensation of sales employees, which is based on sales transactions, are front-office processes with inherent risk. The volume of sales is enormous and continuous by nature. Prediction models using machine learning techniques are created to predict the status of each sales transaction. The results could alert auditors and management to possible fraud or irregularities in transactions. Predictive audit will let them monitor controls and detailed transactions on a preventive basis.


Graph

Graph 1: Special saving account cancellation transaction summary by date




[Graph not reproduced. Vertical axis: number of cancellation transactions per day, scaled from 0 to 700. Horizontal axis: transaction date, with ticks every five days from 11/4 to 4/28.]

Tables

Table 1: Models comparison of data weighted approach with 1:14 ratio

| Model / Measurements (%) | Accuracy | Error rate | Specificity | Recall | Precision | False alarm rate |
|---|---|---|---|---|---|---|
| J48 | 64.23 | 35.77 | 51.72 | 65.12 | 94.98 | 48.28 |
| Logistic | 70.16 | 29.84 | 50.30 | 71.58 | 95.28 | 49.70 |
| Support vector machine | 79.36 | 20.64 | 37.20 | 82.37 | 94.84 | 62.80 |





Table 2: Models comparison of balanced sub-sample approach

| Model / Measurements (%) | Accuracy | Error rate | Specificity | Recall | Precision | False alarm rate |
|---|---|---|---|---|---|---|
| J48 | 67.54 | 32.46 | 63.69 | 71.39 | 66.29 | 36.31 |
| Logistic | 65.29 | 34.71 | 54.02 | 76.56 | 62.47 | 45.98 |
| Support vector machine | 64.64 | 35.36 | 47.70 | 81.58 | 64.64 | 52.30 |





Bibliography


Bell, T. B., and E. F. Smith. 2002. KRISK: A computerized decision aid for client acceptance and continuance risk assessments. Auditing: A Journal of Practice & Theory 21:97-113.

Brühl, B., M. Hülsmann, D. Borscheid, C. Friedrich, and D. Reith. 2009. A Sales Forecast Model for the German Automobile Market Based on Time Series Analysis and Data Mining Methods. Advances in Data Mining. Applications and Theoretical Aspects:146-160.

Cadez, I. V., P. Smyth, and H. Mannila. 2001. Probabilistic modeling of transaction data with applications to profiling, visualization, and prediction.

Chang, P. C., and Y. W. Wang. 2006. Fuzzy Delphi and back-propagation model for sales forecasting in PCB industry. Expert Systems with Applications 30 (4):715-726.

Coderre, D. 2006. A continuous view of accounts. Internal Auditor:25-31.

Fisher, M., and A. Raman. 1996. Reducing the cost of demand uncertainty through accurate response to early sales. Operations Research 44 (1):87-99.

Garber, T., J. Goldenberg, B. Libai, and E. Muller. 2004. From density to destiny: Using spatial dimension of sales data for early prediction of new product success. Marketing Science 23 (3):419-428.

Kogan, A., M. G. Alles, M. A. Vasarhelyi, and J. Wu. 2010. Analytical Procedures for Continuous Data Level Auditing: Continuity Equations. Unpublished working paper.

Lee, J., P. Boatwright, and W. A. Kamakura. 2003. A Bayesian model for prelaunch sales forecasting of recorded music. Management Science 49 (2):179-196.

Morwitz, V. G., and D. Schmittlein. 1992. Using Segmentation to Improve Sales Forecasts Based on Purchase Intent: Which "Intenders" Actually Buy? Journal of Marketing Research 29 (4):391-405.

Thomassey, S., and A. Fiordaliso. 2006. A hybrid sales forecasting system based on clustering and decision trees. Decision Support Systems 42 (1):408-421.

Vasarhelyi, M. A., M. G. Alles, and A. Kogan. 2004. Principles of analytic monitoring for continuous assurance. Journal of Emerging Technologies in Accounting 1:1-21.

Vasarhelyi, M. A., M. G. Alles, S. Kuenkaikaew, and J. Littley. 2010. The Acceptance and Adoption of Continuous Auditing by Internal Auditors: A Micro Analysis. Working paper.

Vasarhelyi, M. A., M. G. Alles, and K. T. Williams. 2010. Continuous Assurance for the Now Economy. Institute of Chartered Accountants in Australia.

Vasarhelyi, M. A., and F. B. Halper. 1991. The continuous audit of online systems. Auditing: A Journal of Practice & Theory 10:110-125.

Vasarhelyi, M. A., A. Kogan, and M. G. Alles. 2002a. Would Continuous Auditing Have Prevented The Enron Mess? The CPA Journal 72 (7):80.

Vasarhelyi, M. A., Teeter, R., Warren, D.J., and Titera, B. 2011. The lego audit. Unpublished working paper, Rutgers Business School.

Winters, P. R. 1960. Forecasting sales by exponentially weighted moving averages. Management Science 6 (3):324-342.