A Critical Analysis of Databases Used in Financial Misconduct Research

Link to article on SSRN

The electronic availability of data on financial restatements, class action lawsuits, and regulatory actions has facilitated significant advances in our understanding of the causes and effects of financial misconduct. Nearly 100 published studies examining aspects of financial misconduct rely on data from one of four publicly available databases: the Government Accountability Office (GAO) and Audit Analytics (AA) databases of restatement announcements, the Securities Class Action Clearinghouse (SCAC) database of securities class action lawsuits, and the Securities and Exchange Commission’s series of Accounting and Auditing Enforcement Releases (AAERs). In this paper we describe and document five types of potential problems in these databases that, if not recognized and addressed, can affect the validity and interpretation of empirical findings in this area.

First, each database typically misidentifies the initial event through which news of the misconduct is revealed to the public. The average staleness of the recorded date ranges from 150 calendar days (for the SCAC database) to 1,017 calendar days (for the AAER data series). Second, the events in each database capture an average of only 6% to 36% of the value-relevant information that pertains to a given case of misconduct. Third, these databases suffer from high rates of errors of omission. For example, the GAO database omits 31% of the cases of material misrepresentation that fall within the period it covers and for which the firm issued one or more restatement announcements; even for the cases of misconduct that the database correctly identifies, it omits 53% of the relevant restatement announcements. Fourth, each database contains duplicate events for the same underlying instance of misconduct, which can induce problems from lack of independence or exacerbate the incomplete information problem. Finally, each database contains many events that are unrelated to financial misconduct. Whether any particular event should be included in a sample depends on the researcher’s objective. However, we show that for some simple criteria (e.g., does this event identify a case of financial fraud?), the rate of false positives ranges from 31% (for the AAER data series) to 98% (for the AA database).
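As a rough illustration of two of the measures above, the sketch below computes an average date staleness and a false-positive rate. It assumes that staleness is the number of calendar days between the earliest public revelation and the date a database records, and that a false positive is an event that fails the researcher's inclusion criterion. The record layout, the function names, and every value are hypothetical and are not taken from the paper.

```python
from datetime import date
from statistics import mean

# Hypothetical records: (case id, date recorded in the database,
# earliest public revelation date). Values are invented for illustration.
records = [
    ("case_a", date(2004, 6, 1), date(2004, 3, 3)),
    ("case_b", date(2005, 1, 10), date(2005, 1, 10)),
    ("case_c", date(2006, 9, 30), date(2005, 9, 30)),
]


def staleness_days(recorded: date, initial: date) -> int:
    """Calendar days by which the recorded date lags the initial revelation."""
    return (recorded - initial).days


def false_positive_rate(meets_criterion: list[bool]) -> float:
    """Share of database events that fail the inclusion criterion."""
    if not meets_criterion:
        raise ValueError("need at least one event")
    return sum(not ok for ok in meets_criterion) / len(meets_criterion)


lags = [staleness_days(recorded, initial) for _, recorded, initial in records]
average_staleness = mean(lags)

# Hypothetical screening result: True means the event identifies financial fraud.
flags = [True, False, False, True, False]
fp_rate = false_positive_rate(flags)
```

A researcher would replace the invented records with hand-collected initial revelation dates matched to each database event; the calculation itself is only a date difference and a proportion.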

Whether these database features pose material challenges for a research study depends on the particular research question. We show, however, that the potential for error is economically significant. In event studies, the stale date problem alone causes these databases to understate, by 56% to 73%, the average loss in share value when misconduct is revealed. We conclude with suggestions for how to minimize the potential for data-related error in future research.
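One way to read the understatement figure is as the fraction of the share-value loss at the initial revelation date that goes unmeasured when the event study is centered on the database's later date. The sketch below works through that arithmetic under this assumed definition; the function name and the two abnormal-return inputs are hypothetical and do not come from the paper.

```python
def understatement(loss_at_initial_date: float, loss_at_database_date: float) -> float:
    """Fraction of the initial-date loss missed when the database date is used.

    Both arguments are average abnormal returns, e.g. -0.20 for a 20% loss.
    This definition is an assumption made for illustration.
    """
    if loss_at_initial_date == 0:
        raise ValueError("loss at the initial date must be nonzero")
    return 1 - loss_at_database_date / loss_at_initial_date


# Invented inputs: a 20% average loss at the true initial revelation date,
# but only a 7% average loss measured at the stale database date.
missed_share = understatement(-0.20, -0.07)
```

With these invented inputs, roughly two thirds of the loss would be missed, which falls inside the 56% to 73% range reported above; the inputs were chosen only to make the arithmetic concrete.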

