The new face of enterprise search: bridging structured and unstructured information

New technology has expanded the scope of the search engine, opening the doors for searches beyond the traditional.

Author: Delgado, Joaquin


Enterprise search allows employees to search and retrieve information needed to accomplish professional activities in a manner compliant with their enterprise's information sharing and access control policy. Such information includes publicly available information, proprietary enterprise information, and private employee information available on the employee's desktop.

As of August 12, 2005, Yahoo estimated that there were more than 20 billion pages on the Internet. In addition, billions of e-mails are exchanged each day between corporations, and the volume of information stored in corporate intranets, file systems, document management systems, and other enterprise applications is growing faster than ever. Consequently, the task of finding a reference in a six-month-old e-mail or retrieving the latest memo published on a given subject becomes more complex and time-consuming each day.

Concurrent with increasing volume, the information available to corporate workers is originating from a larger and more diverse set of sources. Information appears on public and semi-public information networks such as blogs and Really Simple Syndication (RSS) feeds, for example, while within corporations, scanning and optical character recognition processes render paper archives into digital formats.

The most unsettling diversification of data sources, however, stems from the extended role, scope, and reach assigned to enterprise search software. First-generation enterprise search engines were limited to searching a single data source or, in the most complex cases, to combining several internal indexes or federating queries to several underlying search engines. With the emergence of new, more powerful search engines, scope has expanded to provide "universal" search, including the ability to seamlessly access data stored in enterprise applications such as databases and enterprise resource planning systems.
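The federated approach mentioned above, in which one query is sent to several underlying search engines and the results are combined, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Hit`, `make_toy_engine`, and `federated_search` are hypothetical, the "engines" are toy in-memory term counters, and dividing each engine's scores by its own top score is just one simple way to make rankings from different engines roughly comparable.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Hit:
    source: str   # which underlying engine produced the hit
    doc_id: str   # identifier of the document within that engine
    score: float  # relevance score (engine-specific until normalized)


# Assumption: an "engine" is any function from a query string to a list of hits.
Engine = Callable[[str], List[Hit]]


def make_toy_engine(source: str, docs: Dict[str, str]) -> Engine:
    """Build a stand-in engine that scores documents by raw term counts."""
    def search(query: str) -> List[Hit]:
        terms = query.lower().split()
        hits = []
        for doc_id, text in docs.items():
            words = text.lower().split()
            score = float(sum(words.count(term) for term in terms))
            if score > 0:
                hits.append(Hit(source, doc_id, score))
        return hits
    return search


def federated_search(query: str, engines: List[Engine], limit: int = 10) -> List[Hit]:
    """Send the query to every engine and merge the results into one ranking."""
    merged: List[Hit] = []
    for engine in engines:
        hits = engine(query)
        if not hits:
            continue
        # Scale each engine's scores by its best hit so that scores
        # from different engines fall in the same (0, 1] range.
        top = max(hit.score for hit in hits)
        merged.extend(Hit(h.source, h.doc_id, h.score / top) for h in hits)
    # Highest score first; ties are broken by source and document id.
    merged.sort(key=lambda h: (-h.score, h.source, h.doc_id))
    return merged[:limit]


if __name__ == "__main__":
    mail = make_toy_engine("mail", {"m1": "quarterly budget memo", "m2": "lunch plans"})
    wiki = make_toy_engine("wiki", {"w1": "budget budget policy", "w2": "budget overview"})
    for hit in federated_search("budget memo", [mail, wiki]):
        print(hit)
```

A production system would also have to handle timeouts, duplicate documents across sources, and per-user access control, none of which this sketch attempts.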

Such expanded scope and functionality imply that structured data (e.g., in a database) and unstructured data, although stored in different sources, can and should now be managed jointly. The increasing diversity of applications that rely on search as a cornerstone points to another implication. In addition to solving traditional "finding" problems, search is now also being used for information integration, discovery, collaboration and knowledge management, compliance, and records management.

Enterprise search must thus go beyond traditional document-based information retrieval. Enterprise information is managed by myriad secure applications and systems that impose an extra layer of abstraction over the original data source. This layer determines protocols, access methods, data control and display, as well as the business logic that ultimately shapes output to end-users. Understanding the concept of enterprise search requires a fresh look at documents and data as they exist in today's organizations, the mechanisms for discovering content, and the services that enable users to perform better, more relevant searches.

Information Restructuring: Revisiting the "Document" Model

To fully understand current enterprise search mechanisms, it is critical to realize one thing: modern information systems constantly restructure information, transitioning back and forth between structured and unstructured data. Normal electronic documents, or what are referred to as static documents, are the digital equivalent of paper documents; they are more or less permanent--or at least stable--snapshots of information saved in document repositories and handled, for example, through word processing and electronic mail. In addition to those static documents, there is now an increasing number of transient or virtual documents that are generated on the fly and constitute temporary renderings of live data.
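The difference between a static document and a transient one can be made concrete with a small Python sketch. Everything here is an assumption made for illustration: the `orders` table, its columns, and the function names `make_demo_db` and `render_transient_document` are hypothetical, and an in-memory SQLite database stands in for whatever back-end system an enterprise application might use. The point is only that the HTML "document" is never stored; it is rebuilt from live data on each request.

```python
import sqlite3
from html import escape


def make_demo_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database holding one hypothetical order."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # lets rows be read by column name
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, customer TEXT, status TEXT, total REAL)"
    )
    conn.execute("INSERT INTO orders VALUES (1, 'A&B Corp', 'shipped', 129.5)")
    return conn


def render_transient_document(conn: sqlite3.Connection, order_id: int) -> str:
    """Render one database row as an HTML page generated on the fly."""
    conn.row_factory = sqlite3.Row
    row = conn.execute(
        "SELECT id, customer, status, total FROM orders WHERE id = ?",
        (order_id,),
    ).fetchone()
    if row is None:
        raise LookupError(f"no order with id {order_id}")
    # Each structured field becomes one table row in the rendered page.
    fields = "".join(
        f"<tr><th>{escape(key)}</th><td>{escape(str(row[key]))}</td></tr>"
        for key in row.keys()
    )
    return (
        f"<html><body><h1>Order {row['id']}</h1>"
        f"<table>{fields}</table></body></html>"
    )


if __name__ == "__main__":
    conn = make_demo_db()
    print(render_transient_document(conn, 1))
    # The underlying data changes, so the next rendering differs:
    conn.execute("UPDATE orders SET status = 'delivered' WHERE id = 1")
    print(render_transient_document(conn, 1))
```

Because the page reflects whatever the row contains at request time, a search engine that indexed only a saved copy of the first rendering would miss the later status change, which is why transient documents call for different discovery mechanisms than static ones.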

Most enterprise applications are now web-enabled, allowing users to interface with them via a regular web browser. By accessing these applications over the Internet, users are able to perform transactions and retrieve information in the form of transient documents, which are dynamically generated HTML "screens" or PDF "reports." Transient documents are often composites of text, images, and structured fields assembled from various back-end systems such as relational databases and file systems. These virtual documents are usually mapped to rows within database tables, views, or query results. The declarative and semi-structured nature of Extensible Markup Language (XML) makes it the preferred vehicle for transmitting, transforming, and rendering data into...
