American Accounting Association DOI: 10.2308/ajpt-50009
Financial Statement Fraud Detection: An Analysis of Statistical and Machine Learning Algorithms
Johan Perols
SUMMARY: This study compares the performance of six popular statistical and machine learning models in detecting financial statement fraud under different assumptions of misclassification costs and ratios of fraud firms to nonfraud firms. The results show, somewhat surprisingly, that logistic regression and support vector machines perform well relative to an artificial neural network, bagging, C4.5, and stacking. The results also reveal some diversity in predictors used across the classification algorithms. Out of 42 predictors examined, only six are consistently selected and used by different classification algorithms: auditor turnover, total discretionary accruals, Big 4 auditor, accounts receivable, meeting or beating analyst forecasts, and unexpected employee productivity. These findings extend financial statement fraud research and can be used by practitioners and regulators to improve fraud risk models.
Keywords: analytical auditing; financial statement fraud; fraud detection; fraud predictors; classification algorithms.
Data Availability: A list of fraud companies used in this study is available from the author upon request. All other data sources are described in the text.
Johan Perols is an Assistant Professor at the University of San Diego. This study is based on one of my three dissertation papers completed at the University of South Florida. I thank my dissertation co-chairs, Jacqueline Reck and Kaushal Chari, and committee members, Uday Murthy and Manish Agrawal. I am also grateful to Robert Knechel (associate editor), two anonymous reviewers, and Ann Dzuranin for their helpful suggestions. An earlier version of this paper was awarded the 2009 AAA Information System Section Outstanding Dissertation Award. Editor’s note: Accepted by Robert Knechel.
Submitted: November 2008 Accepted: March 2010 Published Online: May 2011
¹ The ACFE (2008) report provides estimates of occupational fraud cost, median cost per fraud category, and number of cases. To derive the estimate for total cost of financial statement fraud, it was assumed that the relative differences among the fraud categories in mean costs are similar to differences in median costs.
The cost of financial statement fraud is estimated at $572 billion¹ per year in the U.S. (Association of Certified Fraud Examiners [ACFE] 2008). In addition to direct costs, financial statement fraud negatively affects employees and investors and undermines the reliability of corporate financial statements, which results in higher transaction costs and less efficient markets. Auditors, both through self-regulation and legislation, are responsible for providing reasonable assurance that financial statements are free of material misstatement caused by fraud. Earlier auditing standards, i.e., Statement on Auditing Standards (SAS) No. 53, only indirectly addressed this responsibility through references to “irregularities” (American Institute of Certified Public Accountants [AICPA] 1988). However, more recent auditing standards, SAS No. 82 and later, make this responsibility explicit. Auditors must provide “reasonable assurance about whether the financial statements are free of material misstatements, whether caused by error or fraud” (AICPA 1997, AU 110.02). To improve fraud detection, recent accounting research (e.g., Lin et al. 2003; Kirkos et al. 2007) has focused on testing the utility of various statistical and machine learning algorithms, such as logistic regression and artificial neural networks (ANN), in detecting financial statement fraud. This research is important since the financial statement fraud domain is unique. Distinguishing characteristics that make this domain unique...