Framing AI Audits: As more organizations implement artificial intelligence, internal auditors need a framework for reviewing these systems.

Author: Applegate, Dennis

Artificial intelligence (AI) is transforming business operations in myriad ways, from helping companies set product prices to extending credit based on customer behavior. Although the technology is still in its nascent stage, organizations are using AI to rank money-laundering schemes by degree of risk based on the nature of the transaction, according to a July EY analytics article. Others are leveraging AI to predict employee expense abuse based on the expense type and vendors involved. Small wonder that McKinsey & Company estimates that the technology could add $13 trillion per year in economic output worldwide by 2030.

If AI is not on internal audit's risk assessment radar now, it will be soon. As AI transitions from experimental to operational, organizations will increasingly use it to predict outcomes supporting management decision-making. Internal audit departments will need to provide management assurance that the predicted outcomes are reasonable by assessing AI risks and testing system controls.


AI uses two types of technologies for predictive analytics--static systems and machine learning. Static systems are relatively straightforward to audit, because with each system iteration, the predicted outcome will be consistent based on the datasets processed and the algorithm involved. If an algorithm is designed to add a column of numbers, it remains the same regardless of the number of rows in the column. Internal auditors normally test static systems by comparing the expected result to the actual result.
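The expected-versus-actual test described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed audit procedure: the function names `column_total` and `audit_static_test` and the sample test cases are hypothetical, chosen only to mirror the column-of-numbers example.

```python
def column_total(values):
    """Static algorithm: add a column of numbers."""
    return sum(values)


def audit_static_test(algorithm, test_cases):
    """Run each test case and collect those where actual != expected."""
    exceptions = []
    for inputs, expected in test_cases:
        actual = algorithm(inputs)
        if actual != expected:
            exceptions.append((inputs, expected, actual))
    return exceptions


# Hypothetical test cases: (input column, expected total).
# The algorithm is the same whether the column has 0, 3, or 1,000 rows.
test_cases = [
    ([100, 250, 50], 400),
    ([10] * 1000, 10000),
    ([], 0),
]

exceptions = audit_static_test(column_total, test_cases)
print("Exceptions found:", exceptions)
```

Because the algorithm is static, rerunning the same test cases should always produce the same list of exceptions, which is what makes this kind of system comparatively easy to audit.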

By contrast, there is no such thing as an expected result in machine learning systems. Results are based on probability rather than absolute correctness. For example, the results of a Google search that float to the top of the list are those that are most often selected in prior searches, reflecting the most-clicked links but not necessarily the preferred choice. Because the prediction is based on millions of previous searches, the probability is high--though not necessarily certain--that one of those top links is an acceptable choice.
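A deliberately simplified sketch of this idea follows. It assumes that ranking depends only on how often each link was clicked in the past, which is an illustration of the probability concept and not a description of how Google's search engine actually works. The link names and click counts are made up.

```python
from collections import Counter


def rank_by_click_share(click_log):
    """Rank links by their share of past clicks (an estimated probability)."""
    counts = Counter(click_log)
    total = sum(counts.values())
    # most_common() orders links from most-clicked to least-clicked.
    return [(link, n / total) for link, n in counts.most_common()]


# Hypothetical history of which link users clicked for one search term.
click_log = ["link_a"] * 6 + ["link_b"] * 3 + ["link_c"] * 1

ranking = rank_by_click_share(click_log)
for link, share in ranking:
    print(f"{link}: {share:.0%} of past clicks")
```

The top-ranked link is the most likely acceptable choice given past behavior, but nothing in the calculation guarantees it is the right answer for any particular user.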

Unlike static systems, the Google algorithm itself may evolve, resulting in potentially different outcomes for the same question when asked at different intervals. In machine learning, the system "learns" what the best prediction should be, and that prediction will be used in the next system iteration to establish a new set of outcome probabilities. The very unpredictability of the system output increases audit risk absent effective controls over the validity of the prediction. For that reason, internal auditors should consider a range of issues, risks, controls, and tests when providing assurance for an AI business system that uses machine learning for its predictions.
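The following toy model shows why the same question can yield different answers at different times. It is a sketch under simple assumptions: the hypothetical `ClickModel` class "learns" only by counting clicks, whereas real machine learning systems update in far more complex ways.

```python
class ClickModel:
    """Toy learning system: predicts the link clicked most often so far."""

    def __init__(self):
        self.counts = {}

    def learn(self, link):
        # Each observed click updates the model's internal state.
        self.counts[link] = self.counts.get(link, 0) + 1

    def predict(self):
        # The prediction is whichever link has the most clicks to date.
        return max(self.counts, key=self.counts.get)


model = ClickModel()
for link in ["link_a", "link_a", "link_b"]:
    model.learn(link)
first_prediction = model.predict()   # asked at an earlier point in time

for link in ["link_b", "link_b"]:
    model.learn(link)
second_prediction = model.predict()  # same question, asked later

print(first_prediction, second_prediction)
```

Nothing about the question changed between the two calls; only the model's learned state did. A test that compares output to a fixed expected result would therefore pass at one point in time and fail at another.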


The proficiency and due professional care standards of the International Professional Practices Framework require internal auditors to understand AI concepts and terms, as well as the phases of development, when planning an AI audit (see "Three Phases of Development" on this page). Because data fuels these systems, auditors must understand AI approaches to data analysis, including their effect on the system algorithm and its precision in generating outcome probabilities.

Features define the kinds of data that would help a system generate the best outcome. If the system objective is to flag employee expense reports for review, the features selected would be those that help predict the highest payment risk. These could include the nature of the business expense, vendors and dollar amounts involved, day and time reported, employee position, prior transactions, management authorization, and budget impact. A data scientist with expertise in this business problem would set the confidence level and predictive values and then let the system learn which features best determine the expense reports to flag.

Labels represent data points that a system would use to name a past outcome. For instance, based on historical data, one of the labels for entertainment expenses might be "New York dinner theater on Saturday night." The system then would know such expenses were incurred for this purpose on that night in the past and would use this data point to predict likely expense reports that might require close review before payment.
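To make features and labels concrete, the sketch below treats each past expense report as a record with two features (`expense_type` and `day`) and one label. It assumes, for illustration only, that the label records whether a past report turned out to need review. All field names and records are hypothetical, and the per-feature rate is a stand-in for what a real system would learn.

```python
from collections import defaultdict

# Hypothetical historical expense reports: two features plus one label.
history = [
    {"expense_type": "entertainment", "day": "Saturday", "needed_review": True},
    {"expense_type": "entertainment", "day": "Saturday", "needed_review": True},
    {"expense_type": "entertainment", "day": "Tuesday", "needed_review": False},
    {"expense_type": "travel", "day": "Tuesday", "needed_review": False},
    {"expense_type": "travel", "day": "Saturday", "needed_review": True},
    {"expense_type": "travel", "day": "Monday", "needed_review": False},
]


def review_rate_by_feature(records, feature, label="needed_review"):
    """For each value of a feature, compute the share of labeled-positive records."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for record in records:
        value = record[feature]
        totals[value] += 1
        positives[value] += int(record[label])
    return {value: positives[value] / totals[value] for value in totals}


by_day = review_rate_by_feature(history, "day")
by_type = review_rate_by_feature(history, "expense_type")
print("Review rate by day:", by_day)
print("Review rate by expense type:", by_type)
```

In this made-up data, the day of the week separates past outcomes more sharply than the expense type does, which is the kind of signal a learning system would weigh when deciding which reports to flag. An auditor reviewing such a system would want to confirm that the historical labels themselves are accurate, since every probability is derived from them.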

Feature engineering delimits the features selected to a critical few. Rather than provide a correct solution to a given problem, such as which business expense reports contain errors or fraud, machine learning calculates the probability that a given...
