By: Gary Bauer


CAEs in the Middle East are increasingly expressing interest in deploying Data Analytics to detect fraud and irregularity. Popular audit software contains basic Data Analytics tests, and many CAEs are keen to explore how these can be developed and tailored.


Proactive Data Analytics is one of the principal tools in fraud detection and prevention. The Association of Certified Fraud Examiners (ACFE) 2014 Report to the Nations found it to be one of the most effective anti-fraud controls.

Of the 18 Anti-Fraud Controls selected by the ACFE, Proactive Data Monitoring/Analysis was found to be the most effective at limiting the duration and cost of fraud schemes. Victim organizations with this control experienced losses that were 60% lower, and schemes that were 50% shorter in duration, than organizations without it.

I set out in this article some of the considerations for CAEs wishing to embark on a Data Analytics program, including what it involves, the data that can be used, the types of tests to be run, how to interpret the test results, who should do the work, and some of the problems and pitfalls in running a Data Analytics program.

Data Analytics’ benefits in fraud detection are known to many CAEs. Data Analytics does not rely on sampling (and fraud in general does not lend itself to easy extrapolation across data populations), and can be easily redeployed, or even deployed on a continuous basis.

Data Analytics is also useful in fraud prevention. Simply knowing that Internal Audit is running a series of tests designed to identify fraud and irregularity will surely dampen the enthusiasm of potential fraudsters.


What is ‘Data Analytics and Fraud Detection’?

We talk here of a set of tests that can be deployed across company data, designed to detect irregularities that may be indicative of fraud. It is important to realize that the tests themselves do not show fraud, only indicators. Exceptions generated by tests must be investigated to determine whether the underlying transaction is fraudulent, or whether there is some other explanation.

It is important, when dealing with stakeholders, that this difference is clearly and consistently articulated. A basic set of Data Analytics tests will generate a lot of exceptions. Most of these exceptions will not be fraud.

We talk here of Data Analytics on ‘structured’ data. Structured data is simply data that is stored in fixed fields. Data from an ERP will almost always be structured data. Unstructured data can include photos, graphics, webpages, email, PDFs, PPTs and Word documents. Semi-structured data is a hybrid of the two and includes tags that are attached to unstructured data, for example keywords that are tagged onto photos and metadata attached to Word documents.

For the most part, CAEs will be interested in starting a Data Analytics program based on deploying tests on structured data. Forensic investigations focus a lot on unstructured information; however, Data Analytics involves analysis, not investigation. The advantage of starting with Data Analytics on structured data is that the outcomes are easily understood and accepted by stakeholders and the tests almost always yield interesting results, irrespective of whether they turn out to be fraud or not.


Types of tests

CAEs have the option of a suite of over 100 tests in areas such as Procure-to-Pay, Order-to-Cash, Finance, Human Resources (including Payroll) and Bidding & Contracting. Many of these are basic tests that internal audit may already run and which are embedded in popular auditing software. Others are more complicated. The two- or three-word test description (for example, ‘Vendors Paid Early’) will often indicate the utility of the test. CAEs may also be aware of tests involving Benford’s Law, which – assuming an adequate population – can yield more fraud-focused results.
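To illustrate the Benford’s Law test mentioned above, here is a minimal Python sketch rather than a definitive implementation. It compares the share of each leading digit in a list of amounts with the share Benford’s Law predicts, log10(1 + 1/d). The function names are hypothetical, and the sketch assumes amounts are recorded to two decimal places.

```python
import math
from collections import Counter

def leading_digit(amount):
    """Return the first non-zero digit of an amount, or None if there is none."""
    # Assumption: amounts are recorded to two decimal places.
    for ch in f"{abs(amount):.2f}":
        if ch in "123456789":
            return int(ch)
    return None

def benford_deviation(amounts):
    """Return (digit, observed share, expected share, difference) for digits 1-9."""
    digits = [d for d in map(leading_digit, amounts) if d is not None]
    counts = Counter(digits)
    total = len(digits)
    rows = []
    for d in range(1, 10):
        expected = math.log10(1 + 1 / d)  # Benford's expected share for digit d
        observed = counts[d] / total if total else 0.0
        rows.append((d, observed, expected, observed - expected))
    return rows
```

Digits whose observed share sits well above the expected share would be candidates for a closer look, bearing in mind that the comparison is only meaningful on an adequately large population.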

Tests can be broken down into a number of broad types:

  • Tests that are run on a single set of data – such as transactional invoice data from the ERP.
  • Tests that are run on a combination of data sets from the same platform – such as vendor master and transactional data from the ERP.
  • Tests that are run on a combination of data sets from different platforms – such as vendor master data and third party vendor data.

Knowing what type of test you are running will be useful in preparing you for the volume and extent of exceptions detected. Tests that are run on data from different data sources face a higher risk of yielding exceptions that are due to data quality rather than being genuine exceptions to be examined. Extra care needs to be taken when reviewing results that rely on matching text (such as names of vendors).
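As an illustration of why text matching needs care, the sketch below normalises vendor names before comparing an ERP list with a third party list. It is a simplified example under stated assumptions: the list of legal-form tokens is hypothetical and incomplete, the function names are my own, and the approach only handles names written in Latin script.

```python
import re

# Hypothetical list of legal-form tokens to ignore; extend it for your own data.
LEGAL_SUFFIXES = {"llc", "ltd", "limited", "inc", "co", "company", "est"}

def normalise_name(name):
    """Lower-case a name, drop punctuation and remove legal-form tokens."""
    text = name.lower().replace(".", "")      # "L.L.C." becomes "llc"
    text = re.sub(r"[^a-z0-9]+", " ", text)   # other punctuation becomes a space
    return " ".join(t for t in text.split() if t not in LEGAL_SUFFIXES)

def match_vendor_names(erp_names, third_party_names):
    """Return (erp_name, third_party_name) pairs that agree once normalised."""
    lookup = {}
    for tp_name in third_party_names:
        key = normalise_name(tp_name)
        if key:  # skip names that normalise to nothing
            lookup.setdefault(key, []).append(tp_name)
    matches = []
    for erp_name in erp_names:
        for tp_name in lookup.get(normalise_name(erp_name), []):
            matches.append((erp_name, tp_name))
    return matches
```

With this sketch, an illustrative name such as "Al-Noor Trading L.L.C." would match "AL NOOR TRADING LLC", a pair that an exact text comparison would miss. Matches produced this way are still only candidates and should be reviewed before being treated as the same entity.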


Entity data to be used

Most entities will run tests on their ERP data. CAEs should give thought to other ‘stand-alone’ data that might be captured within the organization but not fed into the ERP. This can include databases that Functions use for their work, but which have little to do with the financial records. Examples include HR data, such as candidate applications, test results, offers and rankings. Other examples might exist in procurement, particularly around vendor and bidder acceptance and onboarding, while Logistics and Supply Chain may keep data about vehicle movements and staff rosters. Internal audit should have an appreciation of the databases that Functions maintain in their day-to-day work.


Third party data

Data Analytics can be very interesting when third party data is used in conjunction with ERP data. Third party data in this sense includes corporate registry and business directory information and the like. Third party data can show legal and beneficial owners of vendors and customers and contain useful information such as date of establishment, turnover and profit, addresses and other important identifiers.

Of course, in the Middle East, there is a paucity of this information, and what is available can be difficult to compile in a readily-usable format. One solution to this – and which could be a recommendation coming out of a Data Analytics exercise – is for the company itself to start compiling this information at the vendor take-on process or remediate its vendor database over time.


Interpreting the results

Data Analytics tests generate a lot of exceptions, particularly if a few years of data is analysed.

The first level of review should focus on whether the test results make sense, or whether they are the result of problems while executing tests, or poor data quality.

Once satisfied that the results are genuine exceptions, the review should focus on what to look into, particularly for tests with a large number of exceptions. Different tests will generate different volumes of results. A test matching employee phone numbers with vendor phone numbers should hopefully yield few results; whereas a test designed to highlight missing information in the vendor master file will inevitably generate a long list of exceptions.
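The employee-to-vendor phone number test described above could be sketched along the following lines. This is an illustration only: the field names (`employee_id`, `vendor_id`, `phone`) are hypothetical, and comparing the last nine digits is an assumed simplification to cope with differing country-code and leading-zero formats.

```python
import re

def phone_key(raw, digits=9):
    """Reduce a phone number to its last `digits` digits, or None if too short."""
    only_digits = re.sub(r"\D", "", raw or "")
    return only_digits[-digits:] if len(only_digits) >= digits else None

def shared_phone_numbers(employees, vendors):
    """Return (employee_id, vendor_id) pairs whose phone numbers share a key."""
    by_phone = {}
    for emp in employees:
        key = phone_key(emp["phone"])
        if key:
            by_phone.setdefault(key, []).append(emp["employee_id"])
    hits = []
    for ven in vendors:
        key = phone_key(ven["phone"])
        for emp_id in by_phone.get(key, []):
            hits.append((emp_id, ven["vendor_id"]))
    return hits
```

Each pair returned is an exception to be looked into, not a finding: a shared switchboard number, for instance, could explain a match.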

Where should this analysis start? Unless a stakeholder has imposed a constraint on how the data is to be analysed, you are free to set your own practical criteria for determining the exceptions to be investigated.

Bear in mind that potentially all of your exceptions could involve fraud – it is highly unlikely they will; however, you won’t know until you’ve looked into them.

It can be useful to prepare a summary sheet of vendors that appear in a number of transactions, containing data on overall spending per vendor, their location and the type of expenditure. From this, it can be seen which vendors appear in which tests. Tests can also be weighted, so for example an exception from a ‘split invoice test’ is worth more than an exception from a ‘large quarterly change test’.
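One way to picture the weighting idea is the short sketch below, which totals a weighted score per vendor and lists the tests each vendor appears in. The weights, test names and function name are hypothetical and would need to reflect your own view of which tests matter most.

```python
from collections import defaultdict

# Hypothetical weights: a higher number means the test is considered more telling.
TEST_WEIGHTS = {"split_invoice": 3, "large_quarterly_change": 1}

def rank_vendors(exceptions, weights=TEST_WEIGHTS, default_weight=1):
    """Rank vendors by weighted exception score.

    `exceptions` is an iterable of (vendor_id, test_name) pairs.
    Returns (vendor_id, score, tests_hit) tuples, highest score first.
    """
    scores = defaultdict(int)
    tests_hit = defaultdict(set)
    for vendor_id, test_name in exceptions:
        scores[vendor_id] += weights.get(test_name, default_weight)
        tests_hit[vendor_id].add(test_name)
    ranked = [(v, scores[v], sorted(tests_hit[v])) for v in scores]
    # Sort by descending score, then vendor id so the order is stable.
    return sorted(ranked, key=lambda row: (-row[1], row[0]))
```

The ranked list can then sit alongside the spending, location and expenditure-type columns of the summary sheet to help decide where to start.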


First line, second line or third line?

Who should look at the exceptions? CAEs may, on the one hand, wish to keep control of the process, particularly in the early stages; on the other, they may not wish to reassign existing resources to manage the program.

One approach could be that internal audit starts and refines the program (including investigating select results) and ensures that it is meeting expectations, and then moves to a stage where it conducts the tests and sends select exceptions back to the first line for follow up.


Problems and pitfalls

There are a handful of problems that CAEs should be aware of in embarking on a Data Analytics exercise.

In order to gain management buy-in, CAEs may wish to first focus on straightforward tests against structured data. This is likely to yield less confusing, more easily understood results in a shorter time frame. From there, efforts can potentially be made on semi- and unstructured data.

Finding the data within the organization can also present problems. The data in the ERP is probably relatively straightforward, but data kept and maintained by functions (outside of the ERP) can also be useful. Often finding the owners of data, and gaining their acceptance for its use, can present problems.

Data feeds must also be organised. There are a couple of methods for this and care should be taken that underlying data is not altered in the process.
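One simple safeguard, offered here as a sketch rather than a prescribed method, is to record a checksum of each data extract when it is taken and to recompute it before the tests are run. The example uses SHA-256 from Python’s standard `hashlib` module; the function names are hypothetical.

```python
import hashlib

def sha256_of_chunks(chunks):
    """Return the SHA-256 hex digest of an iterable of byte chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()

def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so that large extracts need not fit in memory."""
    def read_chunks():
        with open(path, "rb") as handle:
            while True:
                chunk = handle.read(chunk_size)
                if not chunk:
                    break
                yield chunk
    return sha256_of_chunks(read_chunks())

def extract_unchanged(path, recorded_digest):
    """True if the file still matches the digest recorded at extraction time."""
    return sha256_of_file(path) == recorded_digest
```

A matching digest indicates the file has not changed since the digest was recorded; it says nothing about whether the extract was complete or accurate in the first place.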

Data quality could well be the chief problem faced. Broadly, the data used will not have been prepared for the purpose of running specific queries to identify anomalies. However, it may still be useful, or at least adequate. Data may be missing fields, may not have been fully completed, may be outdated, or it may have been overwritten with no historical data or audit trail of changes. This will be important to know when running tests and interpreting results. For example, a test to detect ‘payments made to bank accounts not assigned to vendors’ may yield a great number of exceptions if historical information on bank accounts is not kept.
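The bank account example can be made concrete with a small sketch. The record layout, identifiers and account numbers below are invented for illustration; the point is only that the same payments yield more exceptions when the vendor master holds current accounts alone than when superseded accounts are also available.

```python
def clean_account(account):
    """Strip whitespace and upper-case an account number for comparison."""
    return "".join(account.split()).upper()

def payments_to_unassigned_accounts(payments, vendor_accounts):
    """Return payments whose bank account is not on file for the paid vendor.

    `vendor_accounts` maps vendor_id to the account numbers held on file.
    """
    exceptions = []
    for payment in payments:
        on_file = {clean_account(a)
                   for a in vendor_accounts.get(payment["vendor_id"], [])}
        if clean_account(payment["bank_account"]) not in on_file:
            exceptions.append(payment)
    return exceptions

# Illustrative data only: P1 was paid to an account the vendor has since replaced.
payments = [
    {"payment_id": "P1", "vendor_id": "V1", "bank_account": "ACC-OLD-001"},
    {"payment_id": "P2", "vendor_id": "V1", "bank_account": "ACC-NEW-002"},
    {"payment_id": "P3", "vendor_id": "V1", "bank_account": "ACC-XYZ-999"},
]
current_only = {"V1": ["ACC-NEW-002"]}                 # no history kept
with_history = {"V1": ["ACC-NEW-002", "ACC-OLD-001"]}  # history kept
```

Run against `current_only`, the test flags both P1 and P3; run against `with_history`, it flags only P3. Without the history, the reviewer has to establish separately that P1 went to a legitimate former account.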

Finally, it is vital to understand the data that is captured in each field. For example, is the ‘invoice date’ the date of the invoice as per the vendor, the date it was approved or the date it was input to the ERP? Without knowing this, what you believe a test reveals and what it actually reveals may be two different things.
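To show how much the meaning of a date field can matter, the sketch below applies a simplified ‘Vendors Paid Early’ check to one payment, first treating the invoice date as the vendor’s date and then as the date of entry into the ERP. The 30-day terms, the five-day tolerance, the dates and the function name are all assumptions made for illustration.

```python
from datetime import date, timedelta

def paid_early(invoice_date, payment_date, terms_days=30, tolerance_days=5):
    """True if payment was made more than `tolerance_days` before the due date."""
    due_date = invoice_date + timedelta(days=terms_days)
    return (due_date - payment_date).days > tolerance_days

# Illustrative dates for a single invoice.
vendor_invoice_date = date(2015, 3, 1)   # date printed on the vendor's invoice
erp_input_date = date(2015, 3, 20)       # date the invoice was keyed into the ERP
payment_date = date(2015, 4, 10)

flag_using_vendor_date = paid_early(vendor_invoice_date, payment_date)
flag_using_input_date = paid_early(erp_input_date, payment_date)
```

Measured from the vendor’s date, the payment falls after the due date and is not flagged; measured from the ERP input date, the same payment appears nine days early and becomes an exception.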



Data Analytics is a proven tool in fraud detection and prevention programs and has captured the attention of CAEs in the region. A Data Analytics program can be deployed in several stages, focusing on early quick wins from structured data, mindful of data availability and quality. The effort required to interpret results should not be underestimated.



GARY BAUER is Managing Director and Regional Head of Forensic Services for Protiviti in the Middle East.