The vast amount of data generated by businesses and the growth of data warehouses and legacy systems have created a treasure trove of information that can be mined for meaningful insights into fraud indicators, emerging risks, and business performance. Companies such as Amazon, Facebook, Google, and Netflix are built on foundations of data exploration and mining.
Data mining, which includes text mining, is the discovery of information without a previously formulated hypothesis: it uncovers relationships, patterns, and trends hidden in large data sets. It uses methods at the convergence of artificial intelligence, machine learning, statistics, and database systems. With the advent of big data, this once-niche research discipline, developed in the 1980s, has become a powerful tool.
There are no roadmaps or directions in data mining. Instead, it requires thinking outside the box to come up with a range of scenarios. Questions such as "What are the risks?", "What opportunities exist for business improvements?", "How can this data be leveraged?", and "What fraudulent activities can occur?" can guide the development of algorithms.
Data Mining Techniques
The most common techniques used in data mining are predictive modeling, data segmentation, neural networks, link analysis, and deviation detection.
Predictive modeling uses "if-then" rules to build algorithms. For example, during a loan audit, auditors can create rules that identify which customers in a specific age range (18-25, for instance) with balances exceeding US$5,000 are likely to default.
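The loan rule above can be sketched in a few lines of Python. This is a minimal illustration, not a production model: the Loan record, the function names, and the sample customers are all hypothetical, and the age range and balance threshold are simply the figures from the example.

```python
from dataclasses import dataclass


@dataclass
class Loan:
    """Hypothetical loan record used only for this illustration."""
    customer_id: str
    age: int
    balance: float  # outstanding balance in US$


def is_high_default_risk(loan, min_age=18, max_age=25, balance_limit=5000.0):
    # IF the borrower is aged 18-25 AND the balance exceeds US$5,000,
    # THEN flag the loan as a likely default.
    return min_age <= loan.age <= max_age and loan.balance > balance_limit


def flag_loans(loans):
    # Return the IDs of every loan that satisfies the rule.
    return [loan.customer_id for loan in loans if is_high_default_risk(loan)]


# Made-up sample data, for demonstration only.
sample_loans = [
    Loan("C001", 22, 7200.0),
    Loan("C002", 24, 3100.0),   # in the age range, but balance too low
    Loan("C003", 41, 9800.0),   # balance high, but outside the age range
    Loan("C004", 19, 5000.01),  # just over the threshold
]

flagged = flag_loans(sample_loans)
print(flagged)
```

In practice the thresholds would be derived from historical default data rather than fixed by hand, and the flagged list would be a starting point for audit testing rather than a conclusion.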
Data segmentation involves partitioning data into segments or clusters of similar records. Also called clustering, this technique lets auditors see common factors underlying each segment. For example, a marketing audit can group customers into segments such as residents of urban neighborhoods and residents of affluent areas where wealthier, older people live.
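One common way to form such segments is k-means clustering, shown here as an assumption, since the article does not name a specific algorithm. The sketch below groups made-up customer records by age and income. The data, the choice of two clusters, and the simple "first k records" starting centroids are all illustrative choices.

```python
import math


def k_means(points, k, max_iter=100):
    """Plain k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of its members, until nothing changes."""
    # Deterministic start (an assumption made for reproducibility):
    # use the first k points as the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = [None] * len(points)
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the clustering has converged
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids


# Hypothetical customers: (age, annual income in US$ thousands).
customers = [(25, 30), (27, 35), (30, 32), (62, 120), (65, 110), (70, 130)]

labels, centroids = k_means(customers, k=2)
print(labels)
print(centroids)
```

With real data, the features would normally be rescaled first so that one attribute, such as income, does not dominate the distance calculation, and the number of clusters would be chosen by examining the data.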
Neural networks are a type of artificial intelligence that uses case-based reasoning and pattern recognition to simulate the way the brain processes, stores, or learns information. In fraud detection, neural networks can learn the characteristics of fraud schemes by comparing new data to stored data and detecting hidden patterns.
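The idea of learning fraud characteristics from stored cases can be illustrated with a single artificial neuron, the simplest building block of a neural network. This is a deliberately small sketch and not a full network: the transaction features, the labels, and the training settings are hypothetical, and a real fraud model would use many more records, features, and layers.

```python
import math


def sigmoid(z):
    # Squash any number into the range 0..1 so it reads as a score.
    return 1.0 / (1.0 + math.exp(-z))


def fraud_score(x, weights, bias):
    # Weighted sum of the inputs, passed through the sigmoid.
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)


def train_neuron(samples, labels, epochs=500, lr=0.5):
    """Fit one neuron by stochastic gradient descent on the log loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = fraud_score(x, weights, bias) - y
            # Nudge each weight against the error it contributed to.
            weights = [w - lr * error * v for w, v in zip(weights, x)]
            bias -= lr * error
    return weights, bias


# Hypothetical stored cases: (amount scaled to 0..1, 1 if after hours).
history = [(0.1, 0), (0.2, 0), (0.3, 0), (0.8, 1), (0.9, 1), (0.95, 1)]
known_fraud = [0, 0, 0, 1, 1, 1]  # 1 = confirmed fraud

weights, bias = train_neuron(history, known_fraud)

# Score a new transaction against what was learned from the stored cases.
new_score = fraud_score((0.85, 1), weights, bias)
print(round(new_score, 3))
```

A score above 0.5 means the new transaction resembles the stored fraud cases more than the legitimate ones. It indicates where to look, not proof of fraud.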
Link analysis establishes links between records or sets of records. Such links are called associations. Examples include customers buying one product at a specific time and then a different product a few hours later or a vendor supplying a raw material and...