Mining for information gold: data mining offers the RIM professional an opportunity to contribute to knowledge discovery in databases in a substantial way.

Author: Firestone, Joseph M.

During the late 1980s, several trends in computing led to the development of large, physically centralized, structured databases called data warehouses, which were intended for decision support. These trends included the emergence of client-server technology, the growing popularity of structured query language (SQL), the gravity of the "islands of information" problem, and the inaccessibility of much of the structured information "hidden away" in both legacy and SQL transactional databases. SQL-querying technology, however, was not sufficient to deliver the hoped-for information value. The 1990s saw both the rapid growth of data warehousing and the development and spread of new technologies for getting useful information out of those surprisingly unwieldy first-generation data warehouses.

One of these new technologies is data mining, a term based on the idea that very large databases are "mountains" of information that can be "mined" for "nuggets" of great value if the right technology is applied. During the 1980s, and increasingly during the 1990s, data mining technology became available in the form of statistical and artificial intelligence-based models and computing algorithms. Additionally, new software technology was being developed for integrating distributed systems based on object and web technology. The result of this confluence of need and technology has been a continually growing data mining industry containing scores of new companies selling to large, mid-sized--and even small--organizations. Several sectors are interested in data mining, including banking, medicine, insurance, retailing, and government. Data mining supports many goals, such as reducing costs, enhancing or reusing research, increasing sales, and detecting fraud.

The image suggested by the term "data mining" is an attractive one, but, unfortunately, it may not be very informative to those records and information management (RIM) professionals who need to know what data mining means for them. RIM managers need answers to these questions:

* What is the process context of data mining?

* What is its value for RIM managers?

* What is the relationship between data mining and knowledge discovery in databases (KDD)?

* How does one get started in a data mining process?

* In what direction is this fast-moving field going?

The most compelling reason for RIM managers to take an interest in data mining is simply this: the "data" in "data mining" are, for the most part, records created in the normal course of business of any organization. Records, then, become the structured data foundation for the data mining process.

What Is Data Mining?

Definitions of data mining abound, and they vary among practitioners. (See Sidebar, "Definitions of Data Mining" on page 50.)

Selecting just one of the definitions is not as important as realizing that people will use the term data mining in at least the four ways described in the sidebar. It will be up to information managers to decide which meaning their organization assigns to it. Definition 3 is used in this article because it has the advantage of distinguishing "data mining" from traditional analyses by emphasizing its automated character in generating patterns and relationships. It also clearly distinguishes data mining from knowledge discovery by emphasizing the much broader character of KDD as an overarching process, including steps distinct from data mining and relying more heavily on human interaction.

What Is the Process Context of Data Mining?

The process context leads to the more comprehensive process of KDD within which data mining occurs. KDD starts with problems--seeking them in routine situations, recognizing them, and clearly articulating them. It continues with gathering information about a problem and its potential solutions. At that point, hypotheses or models are developed that are central to the solution. There are many alternative ways of developing models, including intuition, a literature review, mathematical modeling, and...
