The First Annual Special Topics in Strategic and Emerging Technologies in Accounting Conference:


Text Analytics in Accounting

March 1, 2013


Conference Chair and Editor

Ingrid E. Fisher, University at Albany-SUNY.



Sponsored by Deloitte, the Strategic and Emerging Technologies Section of the American Accounting Association, and the University at Albany School of Business Department of Accounting and Law, with the support of CaseWareIdea

Held at the SUNY Global Center, 116 East 55th Street, New York, NY





Authors retain all copyrights.

Papers are working papers.

Please do not quote or reference without permission of the paper author.

Table of Contents and Schedule

The SUNY Global Center, NY, NY

Registration: 8:00-8:30

Opening Remarks: 8:30-8:35

Ingrid E. Fisher, Conference Chair, University at Albany-SUNY


Keynote Address: 8:35-9:30

Text Categorization for Document Review

Ali Hadjarian, Ph.D., Senior Manager, Data Analytics Practice of Deloitte Financial Advisory Services LLP

Nate Huber-Fliflet, Manager, Discovery Practice of Deloitte Financial Advisory Services LLP

Coffee Break: 9:30-10:00

Panel Session: 10:00-11:00 Can text analysis and unstructured data help in the audit process?

Miklos A. Vasarhelyi (chair), KPMG Professor of Accounting Information Systems, Director, RARC-CAR Lab, Rutgers University

James R. Littley, Principal, KPMG LLP

Trevor Stewart, Partner (Retired), Deloitte; Senior Fellow, CAR Lab, Rutgers University

Wen Ling Hsu, Lead Member Technical Staff, Applied Data Mining, AT&T Laboratories

Eric Cohen, PwC, XBRL Global Technical Leader

Anthony DeSantis, Principal, Data Analytics Practice of Deloitte Financial Advisory Services LLP

Paper Session 1: 11:00-12:00

The SEC’s Impact on Earnings Disclosure Readability

Michele D. Meckfessel, Case Western Reserve University

Textual Information in the SEC XBRL Filings and Companies' Business Nature

Roman Chychyla and Alexander Kogan, Rutgers University

Lunch and Speaker: 12:00-1:00

String Matching and Fraud Detection

Jianping Zhang, Ph.D., Senior Manager, Data Analytics Practice of Deloitte Financial Advisory Services LLP

Anthony DeSantis, CFE, Principal, Data Analytics Practice of Deloitte Financial Advisory Services LLP

Sara Vandermark, Senior Manager, Data Analytics Practice of Deloitte Financial Advisory Services LLP



Paper Session 2: 1:00-2:00

Using Text Analytics, Semantic Web Technology, and XBRL to Retrieve and Integrate Business Information in Chinese

C. Janie Chang, University of San Diego, Chi-Chun Chou, University of Taipei, and Jacob Peng, The University of Michigan-Flint

Does text in MD&A improve Return-on-Equity forecasts?

Khrystyna Bochkay and Carolyn B. Levine, Rutgers University

Paper Session 3: 2:00-3:00

Are Quarterly Reports More Informative Than Annual Reports in Fraud Detection?

Sunita Goel, Siena College

Do Auditors Consider Non-Financial Factors in Pre Engagement Stage?

Kyunghee Yoon, Rutgers University

Short Break: 3:00-3:15

Paper Session 4: 3:15-4:45

Automatic Classification of Accounting Literature

Vasundhara Chakraborty, Ramapo College, Victoria Chiu, Rutgers University, and Miklos Vasarhelyi, Rutgers University

Effect of Industry Type and Firm Size on Pension Footnote Reporting

Vasundhara Chakraborty, Ramapo College, and Miklos Vasarhelyi, Rutgers University

Content Analysis and Sentiment Detection in Online Reviews of Tax Professionals: A Comparison of SentiStrength, LIWC2007, and Diction 6.0

Candace L. Witherspoon, Valdosta State, and Dan N. Stone, University of Kentucky

Folksonomy Tags: Survey and Extensions (Possible Presentation via Skype)

Daniel E. O'Leary, University of Southern California


Closing Remarks: 4:45-4:55

Ingrid E. Fisher, Conference Chair, University at Albany-SUNY


Hors d'oeuvres Reception: 5:00-6:30






Speaker Profiles

Ali Hadjarian, Ph.D.

Ali Hadjarian is a Senior Manager in the Analytics practice of Deloitte Financial Advisory Services LLP. He has over 17 years of industry experience in advanced data and text analytics and a doctoral degree in the same area. Prior to joining Deloitte, he served as a technology advisor to the U.S. Government and led a number of high-profile analytics projects with the SEC, the IRS and the PCAOB.

Nate Huber-Fliflet

Nate is a Manager in the Discovery practice of Deloitte Financial Advisory Services LLP. With more than 8 years of experience consulting with Am Law 100 law firms and corporations in the Banking, Automotive, and Oil & Gas industries, Nate specializes in developing and implementing discovery management strategies, administering non-linear attorney reviews incorporating the use of analytics, and helping companies with discovery technology design and implementation.

Dr. Jianping Zhang

Dr. Jianping Zhang has more than 20 years of academic and industry R&D experience in machine learning and text and data mining. He published more than 100 technical papers and held two patents in these areas. He also developed a number of innovative data and text mining products for government and industry applications and served as a program committee member for many conferences. Prior to joining Deloitte, he was a Principal Scientist and Data Mining group lead at the MITRE Corporation, where he led several predictive analytics efforts to develop innovative approaches for financial fraud detection. Prior to that, he was Chief Architect at AOL from 2004 to 2006, where he led efforts to develop a web content categorization service, web filtering services, and a web user profiling platform. From 2000 to 2004, he was a Lead AI Scientist at MITRE, where he worked on various data mining projects for different US Government agencies. From 1990 to 2000, he was an Assistant/Associate Professor at Utah State University. He obtained his PhD in Computer Science from the University of Illinois at Urbana-Champaign.

Anthony DeSantis, CFE

Mr. DeSantis is a Principal in the Data Analytics practice within Deloitte Financial Advisory Services LLP with more than 13 years of experience specializing in the forensic analysis of electronic data, primarily for the purpose of identifying fraud, waste, abuse and corruption. His technical experience includes the analysis of complex structured and unstructured data; the management and implementation of information management systems; the design and operation of relational databases in investigations, litigations, claims processing and settlement administration environments; and the use of databases and advanced analytic methodologies for identifying indicators or creating risk profiles of fraud and corruption. Mr. DeSantis is a Certified Fraud Examiner and has presented on International eDiscovery Issues, Investigative Data Considerations, Technology to Detect Potential Violations of the U.S. Foreign Corrupt Practices Act, Healthcare Data Mining and Utilizing Data Visualization to Detect Fraud, Waste and Abuse.


Sara Vandermark

Ms. Vandermark is a Senior Manager in the Data Analytics practice of Deloitte Financial Advisory Services LLP with over 7 years of experience with relational database technology, related programming languages and reporting utilities, and industry-standard ERP and enterprise-wide accounting systems in investigative, litigation, and compliance environments. Much of her experience to date has focused on Anti-Fraud and Anti-Money Laundering Compliance, Bankruptcy Claims Administration, Product Diversion / Parallel Trade, the U.S. Foreign Corrupt Practices Act and the development of related analytical tools. She has worked in industries ranging from financial services and healthcare to electronics and engineering.



The SEC’s Impact on Earnings Disclosure Readability

Author: Michele D. Meckfessel
University: Case Western Reserve University
E-mail: michele.meckfessel@case.edu



Abstract:

This research documents the SEC’s impact on the accessibility of an influential piece of firm financial information, quarterly earnings disclosures, through changes in the regulatory environment (Plain English Guidelines, Reg. FD and SOX) during the 1997-2007 timeframe. In spite of guidance and regulations designed to increase the accessibility of financial information, earnings disclosure readability decreased. Interestingly, in quarters in which firms experience increases in operating earnings or beat analyst earnings forecasts, quarterly earnings disclosure readability increases. This research shows that firm managers are more motivated to improve the readability of quarterly earnings disclosures by market factors than by market regulators.

Keywords: readability, earnings disclosure, SEC regulation, beating analyst forecasts











Textual Information in the SEC XBRL Filings and Companies' Business Nature

Author: Roman Chychyla
University: Rutgers, The State University of New Jersey
E-mail: rchychyla@gmail.com

Author: Alexander Kogan
University: Rutgers, The State University of New Jersey
E-mail: kogan@andromeda.rutgers.edu

Abstract:

The SEC recently began requiring U.S. GAAP filers to prepare their filings in the XBRL reporting format. For each reporting concept in an XBRL filing, the filer should provide a succinct textual description. In this study, we ask whether such textual information in an XBRL 10-K filing is indicative of the business nature of the company. We proxy the business nature of a company by its industry and size, and develop text similarity measures for company filings. The results show that 1) company filings are more similar within industries than across the whole population of filings, and 2) the filing similarity of smaller companies differs from that of larger companies within the same industry. This supports the premise of our research question. In addition, we find that the textual description of the same reporting concept is fairly similar across different filings, and that the level of this similarity is not positively related to the frequency of the concept in the filings.
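
The abstract does not say which text similarity measures were developed. As a rough illustration only, the sketch below assumes cosine similarity over word counts and compares the average similarity within one industry to the average over all filings. The function names and the sample label text are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_pairwise_similarity(texts):
    """Average cosine similarity over all unordered pairs of texts."""
    pairs = [(i, j) for i in range(len(texts)) for j in range(i + 1, len(texts))]
    if not pairs:
        return 0.0
    return sum(cosine_similarity(texts[i], texts[j]) for i, j in pairs) / len(pairs)


# Hypothetical concept descriptions, invented for illustration only.
filings = {
    "bank_a": "loans and leases net of allowance interest income on deposits",
    "bank_b": "interest income on loans net of allowance for loan losses",
    "pharma_a": "research and development expense for clinical trials",
}

within_banking = mean_pairwise_similarity([filings["bank_a"], filings["bank_b"]])
whole_population = mean_pairwise_similarity(list(filings.values()))
print(within_banking > whole_population)
```

On this toy data the two bank filings share most of their vocabulary, so the within-industry average exceeds the population average, which mirrors the direction of the first reported result without reproducing the authors' method.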










Using Text Analytics, Semantic Web Technology, and XBRL to Retrieve and Integrate Business Information in Chinese

Author: C. Janie Chang
University: San Diego State University
E-mail: jchang@mail.sdsu.edu

Author: Chi-Chun Chou
University: National Taipei College of Business

Author: Jacob Peng
University: The University of Michigan-Flint
E-mail: jcpeng@umflint.edu



Abstract:

Given the demand for complete and relevant business information, the complementary textual data in annual financial reports plays a critical role in decision making, in addition to the numerical data in the financial statements. However, due to formatting differences, it has been difficult to process the textual data and to integrate the two types of data. Therefore, using appropriate information technology to redesign and/or remodel information disclosure mechanisms has drawn tremendous attention from capital market participants. Since the Greater China area has gained increased importance in the global capital market, this study adopts technologies in Chinese text analytics for classifying semi-structured or unstructured textual data related to business strategies, to enhance the effectiveness of textual data retrieval. Furthermore, this study uses semantic web technology to allow users to integrate XBRL financial data with nonfinancial information in the financial reports or management discussion and analysis through the design science methodology.

Keywords: text analytics; business information retrieval; integration of financial and non-financial information; XBRL




Does text in MD&A improve Return-on-Equity forecasts?

Author: Khrystyna Bochkay
University: Rutgers, The State University of New Jersey
E-mail: bochkaykhrystyna@yahoo.com

Author: Carolyn B. Levine
University: Rutgers, The State University of New Jersey



Abstract:

The Management Discussion and Analysis (MD&A) requirements are intended to provide "prospective textual disclosure... with particular emphasis on the registrant’s prospects for the future" (Securities and Exchange Commission, 1989). This paper tests the extent to which MD&A disclosures improve quantitative forecasts of future prospects. We estimate and compare a set of earnings forecasting models based solely on quantitative factors with expanded models that include qualitative factors extracted from the MD&A section. To represent text numerically for use in statistical forecasting models, we employ the bag-of-words (BOW) approach, which identifies and counts the words appearing in the MD&A section. Because of the large number of words used, standard linear regression models cannot be applied, and we use kernel ridge regressions to overcome the dimensionality problem. We find that MD&A was uninformative (i.e., forecasts enhanced by text were not significantly better than forecasts based on quantitative inputs alone) in the pre-reform period of 2001-2002. In contrast, in the post-reform period the narrative disclosures in MD&A significantly help to improve forecasts. This is one of the first papers to provide some empirical evidence on the success (and outcome) of recent MD&A regulatory reforms.




Keywords: Text Mining; Earnings Forecasting; Kernel Ridge Regression; MD&A
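
To make the bag-of-words and kernel ridge regression steps concrete, here is a minimal pure-Python sketch. It assumes a linear kernel, no intercept, and a single ridge penalty, none of which the abstract specifies. The documents and target values are hypothetical. The point it illustrates is that the linear system being solved has one row per document rather than one per word, which is how the kernel form sidesteps the dimensionality problem.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Count how often each word appears in the text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def linear_kernel(a, b):
    """Inner product of two word-count vectors (an assumed kernel choice)."""
    return float(sum(a[w] * b[w] for w in a.keys() & b.keys()))


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_kernel_ridge(docs, targets, ridge=1.0):
    """Solve (K + ridge * I) alpha = y, where K is the n-by-n Gram matrix."""
    vectors = [bag_of_words(d) for d in docs]
    n = len(vectors)
    gram = [
        [linear_kernel(vectors[i], vectors[j]) + (ridge if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]
    return vectors, solve(gram, targets)


def predict(model, doc):
    """Prediction is a kernel-weighted sum over the training documents."""
    vectors, alphas = model
    x = bag_of_words(doc)
    return sum(a * linear_kernel(v, x) for v, a in zip(vectors, alphas))


# Hypothetical MD&A snippets and next-period ROE values, for illustration only.
docs = ["growth strong demand", "loss decline weak demand", "strong growth growth"]
roe = [0.10, -0.05, 0.15]

model = fit_kernel_ridge(docs, roe, ridge=1.0)
print(predict(model, "strong growth") > predict(model, "weak decline loss"))
```

A real application would add the quantitative predictors alongside the text features and choose the penalty by out-of-sample validation; this sketch covers the text side only.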




Are Quarterly Reports More Informative Than Annual Reports in Fraud Detection?

Author: Sunita Goel
University: Siena College
E-mail: sgoel@siena.edu



Abstract:

In this study we conduct a comparative qualitative analysis of the textual content of quarterly reports and annual reports to examine whether quarterly reports contain more information about fraud, and whether the text of quarterly reports provides new insights into how companies portray themselves when committing fraud. When benchmarked against the results reported for annual reports in Goel et al. (2010), our results show a modest increase in the association between the textual content of quarterly reports and fraud. We also look at some of the more advanced techniques of content analysis to decipher evidence of fraud or deception embedded in quarterly reports. The basic premise of our research is that organizations tend to camouflage negative findings to make them sound less damaging. The real intent of the writer is hidden in the content but can be revealed through structured content analysis. A simple word frequency count does not provide the level of precision necessary to detect deception or fraud in company filings. We examine this problem using a new corpus of quarterly reports of firms where fraud has occurred and juxtapose it with reports of firms where fraud has not been detected. Our results show a statistically significant difference in the content of the two sets of data.

Keywords: quarterly reports; textual content; content analysis; fraud detection








Do Auditors Consider Non-Financial Factors in Pre Engagement Stage?

Textual information of 10K and Press release and Auditor Resignation

Author: Kyunghee Yoon
University: Rutgers, The State University of New Jersey
E-mail: yoonk05@gmail.com


Abstract:

In the planning stage of an audit engagement, audit firms consider various factors in order to avoid potential engagement risks. There is therefore considerable scope for research on which factors affect an auditor's client acceptance decision in the pre-engagement stage. In practice, however, the factors auditors consider at this stage cannot be evaluated directly. Auditors also tend to estimate a potential client's risk profile using not only financial but also non-financial factors, yet there is little to no research on which non-financial factors audit firms consider in the pre-engagement stage. In this paper, I therefore compare the non-financial factors of companies whose auditors resigned (resigned companies) with those of companies that dismissed their auditors, to find factors auditors might consider in order to avoid potential engagement risks. Specifically, I compare the text of the 10-K filings and press releases of resigned companies and matched companies. I expect significant differences in language use between resigned companies and matched companies that dismissed their auditors, from which I can infer that audit firms tend to evaluate non-financial factors to avoid risks. Using machine learning methods, I classify the textual information of the two groups with lists of negative, positive, litigious, and uncertain words; overall, these word lists can distinguish the two groups. Additionally, I develop a logistic regression model that predicts resignation companies from the frequencies of the listed words in news articles and from control variables based on financial factors; the results indicate that non-financial factors play a role in the model. This research therefore implies that auditors tend to consider non-financial factors as well, and the paper concludes that non-financial factors are considered strategically in the market.
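
As a hedged illustration of the word-list step, the sketch below turns a text into the share of its words that fall in each of the four categories. Shares like these could serve as inputs to a classifier or a logistic regression. The tiny word lists are hypothetical placeholders and not the lists used in the paper, and the function name is invented for this example.

```python
import re

# Hypothetical, deliberately tiny word lists; the paper's actual lists are not given.
WORD_LISTS = {
    "negative": {"loss", "decline", "impairment", "adverse"},
    "positive": {"growth", "improve", "strong", "gain"},
    "litigious": {"lawsuit", "litigation", "plaintiff", "settlement"},
    "uncertain": {"may", "uncertain", "approximately", "contingent"},
}


def category_frequencies(text):
    """Return, per category, the fraction of tokens found in that word list."""
    tokens = re.findall(r"[a-z]+", text.lower())
    total = len(tokens)
    if total == 0:
        return {name: 0.0 for name in WORD_LISTS}
    return {
        name: sum(token in words for token in tokens) / total
        for name, words in WORD_LISTS.items()
    }


features = category_frequencies("The lawsuit may cause a loss")
print(features)
```

Dividing by the total token count keeps long and short documents comparable, which matters when 10-K filings and press releases of very different lengths are scored side by side.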






Automatic Classification of Accounting Literature

Author: Vasundhara Chakraborty
University: Ramapo College of New Jersey
E-mail: vchakrab@ramapo.edu

Author: Victoria Chiu
University: Rutgers, The State University of New Jersey
E-mail: vchiu626@gmail.com

Author: Miklos Vasarhelyi
University: KPMG Professor of AIS, Rutgers, The State University of New Jersey


Abstract:

This paper explores the possibility of using semantic parsing, information retrieval and data mining techniques to automatically classify accounting research. Literature taxonomization plays a critical role in understanding a discipline’s knowledge attributes and structure. Traditional research classification is a manual process that is considerably time consuming and may introduce inconsistent classifications by different experts. To address this classification issue, this study conducted three experiments to seek the most effective and accurate method of classifying the attributes of accounting publications. We found the results of the third experiment most rewarding: classification accuracy reached 87.27% with decision trees and rule-based algorithms applied. Findings in the first and second experiments also provided valuable implications for automatic literature classification, e.g., abstracts are better measures to use than keywords, and balancing under-represented subclasses does not contribute to more accurate classifications. The results of all three experiments also suggest that expanding the article sample size is key to strengthening automatic classification accuracy. Overall, this line of research seems very promising and would have several collateral benefits and applications.






Effect of Industry Type and Firm Size on Pension Footnote Reporting

Author: Vasundhara Chakraborty
University: Ramapo College of New Jersey
E-mail: vchakrab@ramapo.edu

Author: Miklos Vasarhelyi
University: Rutgers, The State University of New Jersey
E-mail: miklosv@andromeda.rutgers.edu


Abstract:

The objective of this chapter is to examine whether (i) the type of industry or (ii) firm size has an impact on the volume of pension disclosure of firms and on the selection of line items used by firms for pension reporting. The parsing tool developed as an integral part of the hybrid method proposed and demonstrated in Chakraborty & Vasarhelyi (2010) is applied in this study as well. Two experiments are conducted within this study. Natural language processing techniques are applied to 10-K statements of firms from three different groups of industries (namely, pharmaceutical/healthcare, technology) to perform content analysis and capture specific trends emerging in pension disclosure reporting, testing industry type as a significant factor in the first experiment and firm size as the significant factor in the second experiment. The preliminary results of the first experiment suggest that industry type does not play any significant role in causing differences in the pension disclosure reporting of firms. However, preliminary results from the second experiment indicate that firm size is a significant factor in determining the pension reporting trends of companies. Finally, uncommon terms found in the second experiment, which were used rarely by firms, indicate the use of new line items by companies.







Content Analysis and Sentiment Detection in Online Reviews of Tax Professionals: A Comparison of SentiStrength, LIWC2007, and Diction 6.0

Author: Candace L. Witherspoon
University: Valdosta State University
E-mail: clwitherspoon@valdosta.edu

Author: Dan N. Stone
University: University of Kentucky
E-mail: dstone@uky.edu



Abstract:

This study develops methods to assist in identifying factors that influence how tax clients perceive service quality by investigating the extent to which SentiStrength, LIWC2007, and Diction 6.0 accurately classify tax domain-specific terminology and detect sentiment in the tax setting. Results indicate that while off-the-shelf programs assess sentiment poorly, a customized version of SentiStrength assesses sentiment well. Furthermore, software may inaccurately classify tax vocabulary and client sentiment because of tax-specific language idiosyncrasies. This study contributes to the literature by assessing the accuracy of automated content analysis in the tax domain, providing a methodological foundation for social analytic and taxation research.

Keywords: sentiment analysis, content analysis, social analytics, tax professionals







Folksonomy Tags: Survey and Extensions

Author: Daniel E. O'Leary
University: University of Southern California
E-mail: oleary@usc.edu


Abstract:

This paper investigates tags and folksonomies with particular focus on issues of concern with their use in business settings. Tags employ short amounts of text to capture the essence of some document, picture, etc. The paper reviews previous research on tagging and folksonomies and how tags are used. Empirical analysis of delicious.com tags is used at a number of points in the paper to provide insights into a range of concepts. For example, we examine multiple word tags and symbol-based tags. A Bayesian model of tags and tag reliability is generated. The use of entropy as a tool to facilitate use of tags for generation and evolution of taxonomies and ontologies is discussed. Finally, we examine the use of tags in business settings for knowledge management.
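
The abstract does not describe how entropy is applied to tags. One plausible reading, offered here only as an assumption, is Shannon entropy over the tags users assign to a single resource: low entropy means users agree on a few tags, and high entropy means usage is spread out. The function name and the sample tags are hypothetical.

```python
import math
from collections import Counter


def tag_entropy(tags):
    """Shannon entropy, in bits, of the empirical distribution of tags."""
    counts = Counter(tags)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Each tag with share p contributes p * log2(1 / p) bits.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical tag assignments for two bookmarked resources.
agreed = ["xbrl", "xbrl", "xbrl", "xbrl"]
scattered = ["xbrl", "audit", "fraud", "tax"]

print(tag_entropy(agreed), tag_entropy(scattered))
```

Under this reading, a resource whose tags have low entropy might be a stable candidate for a taxonomy node, while high entropy could flag a concept whose vocabulary is still evolving.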













