itmWEB: Putting the Data Warehouse on the Intranet


..information technology management..

white paper


Putting the Data Warehouse on the Intranet

by Richard Tanler

Corporations recognize that information placed in the hands of decision makers is a powerful tool. To meet the decision maker's nearly insatiable appetite for information, data is being extracted from operational systems and placed in data warehouses. The data warehouse contains historical data organized by key business dimensions. For example, the data warehouse for a retailer contains daily product sales for each store. The data warehouse for a bank contains customer information for each bank service. Each warehouse summarizes individual transactions into time-series data for monitoring and analyzing performance.


Delivery of data warehouse information to decision makers throughout the enterprise and around the world has been an expensive challenge. The World Wide Web offers a solution. Private corporate Intranet sites are the fastest growing segment of the Web. The opportunity is to introduce a new level of collaborative information sharing among decision makers. Intranets currently manage unstructured content - text, image and audio data types - as "static" HTML documents. With the right tools and the right architecture, the data warehouse can be made accessible over the enterprise Intranet, forming the basis for a comprehensive enterprise information infrastructure. There are three important advantages of such an infrastructure:

  • Intranet Economics
  • Information Integration
  • User Collaboration

Intranet Economics
The cost of client/server computing is high when communications, support and other hidden costs are considered. In fact, some studies suggest that client/server computing is more costly than mainframe computing. Certainly the personal computer has become bloated with processing power, memory, software and user-managed files of considerable proportions. The result is a "fat" client architecture. The Intranet will change the economics of supporting a large population of knowledge workers. The Intranet is a "thin" client architecture. Not only does the Intranet reduce communications costs, but the personal computer may very well be replaced with a low-cost Intranet device. It has been projected that a $500 thin client machine may be available in 1996.


The thin client model requires server distribution of applications software. This is where Sun's Java plays a role. Java allows software to be served to the Intranet browser in code fragments or applets. Application software no longer needs to be resident on every client. Rather, application software is acquired only when needed, for as long as it is needed, for a specific application. Application software licensing will shift from per seat pricing (licensed for each personal computer) to a server-centric licensing model. The result is likely to be lower application software licensing costs. Thus, the economics of the Intranet are lower communications costs, less expensive thin client hardware and reduced application software licensing costs.


Information Integration
One of the most valuable assets of the enterprise is the operational data used in managing day-to-day business activities. This numeric data provides frequent measures of performance. By developing a data warehouse, corporations are organizing the data in a way that makes the data useful to decision makers. When the data warehouse is put on the Intranet, users can toggle between structured data analysis (producing reports in columns and rows) and unstructured browsing. The marketing manager at a retailer can pull up an advertising image and a report on sales of products featured in the ad on the screen.


User Collaboration
The Intranet is as much about communicating at an interactive level as it is about making structured and unstructured content easily accessible. There are few that would disagree with the view that decision making improves with timely, accurate and complete information. The Intranet influences how the ideas and experience of a workgroup are exchanged as part of knowledge sharing.

 

Types of Information on the Intranet

 Types of Information on the Intranet

 

Today, most users can communicate via a corporate e-mail system. While an e-mail system allows text files to be exchanged, it does not facilitate true collaboration. LotusNotes is one step closer to a collaborative method of exchanging valuable information, but the focus is still textual file sharing. True collaboration requires interactive sharing of information in such a way that the recipient can continue an analysis or branch off in an entirely new direction without assistance. For example, if I receive a report from another Intranet user, I should immediately be able to drill down or drill up on any report dimension, pivot and rotate the results, add additional calculations as part of my analysis and then pass my work to others in the organization - this requires dynamic report creation based on data stored in the warehouse.


True collaboration for business decision making requires a higher level of interactive analysis and knowledge sharing than exists today in most text-oriented groupware products.


Requirements For Putting The Data Warehouse On The Intranet
Data warehouses employ relational database management systems that use SQL to retrieve rows and columns of numeric data, while unstructured content is managed as HTML documents. The challenge in putting the data warehouse on the Intranet is in properly enabling SQL data warehouse access from HTML browsers. Four application software services must be provided:

  • Analytic Layer
  • File Management
  • Security
  • Agents

Analytic Layer
Putting structured data content on the Intranet requires a server-resident analytic layer to generate SQL "on the fly," perform computations and format reports based on user requests. In essence, a specialized structured content Web server is required to support data warehouse access from an HTML browser client initiated request.


The analytic layer shares some capabilities with spreadsheet software. The power of a spreadsheet is derived from a user's ability to author custom calculations based on facts stored in cells in the spreadsheet. For example, facts (numeric data) are stored in two dimensions - letters A, B, C, D... and numbers 1, 2, 3, 4... Combining the two dimensions, A1, provides a unique address. This unique address can be used to create a formula for a required calculation, A1-B1. Often the number of calculated rows and columns in a spreadsheet application far outnumber the number of stored facts.


The spreadsheet provides two additional powerful capabilities. First, users can easily replicate formulas for calculations down a column or across a row. Second, the calculation logic is automatically updated if new rows or columns of data are inserted into the spreadsheet.


The data warehouse has multiple dimensions - product, market, customer, outlet, vendor, period, etc. - as opposed to the spreadsheet. The combination of multiple data warehouse dimensions still provides a unique address. The analytic layer on the structured content Web server allows users to apply calculations, based on database dimensions, to create more useful reports. Furthermore, calculations once authored can be shared with other users much like calculation formulas are replicated within a spreadsheet. And, the calculation logic is maintained as the data warehouse is updated each day, week or month.


Like the spreadsheet, reports that users request from a data warehouse often contain more calculated rows and columns than raw data. Without a robust analytic layer in front of the data warehouse, the user is limited to a simple listing of stored data elements. The analysis layer is key to addressing business questions that users must answer.


File Management
Collaboration requires interactive analysis and knowledge sharing. A report requested by one user is valuable when it is shared with other users to gain their insight and ideas. The recipients should be able to continue the analysis initiated by the original author. In this way, the analysis process becomes an interactive exchange. Many users can pursue different analysis paths from a common starting point. To meet the challenge of providing interactive analysis of a data warehouse, users must have access and be able to change their copy of the logic used to create the report. A sophisticated server-based file management system is required to support user collaboration at this level.


Security
The liberal sharing of information and collaboration among users on the Intranet immediately raises data security issues. The data warehouse should be viewed as one of the enterprise's most valuable assets. Data must be secured, but if too tightly controlled, the value of the warehouse will never be fully realized.


The security issue is indeed complex. To illustrate, a sales vice president has authorization to view financial data at a U.S., region and territory level. At the territory level, the financial data would include all salary data for sales representatives. If the sales vice president creates a report and decides to share it, the regional managers should have access to only territory information for their region, and be blocked from accessing territory information for other regions. In other words, if a user is not authorized to receive and access a report, then that user must not be able to view the report or drill into areas where the user does not have authorization.


Again, there is a subtle, but important, capability that is being described. Reports that are created from data stored in the warehouse should not simply be shared as text files. All reports should include the underlying logic, giving the recipient the ability to immediately analyze and modify the report as well as the logic and assumptions supporting the analyses. For effective collaboration, reports must be shared throughout the workgroup and enterprise. If the recipients are not authorized, the access to report logic should be denied.


Encryption of data can provide a higher level of security than is generally available for business applications. For example, utilizing the Netscape Secure Socket Layer (SSL) and a Commerce Server, data passing between the client and server can be encrypted. This enables business users to run important applications over unsecured communications lines without worrying about an intruder tapping into the network and viewing the transmitted information.


Agents
One of the common complaints about e-mail and even phone mail is that the mail box fills up faster than a user has time to isolate and address the really important issues. Agents are intended to work on behalf of users to isolate important information sought by a user. An agent can be triggered by some pre-defined event or at a specified time interval. The agent sends an alert to notify specific users on a "need to know basis." Agents must have the ability to run continually as background processes on an Intranet logic server because each user is almost always disconnected from the Web server. This provides a means of automating the routine analysis process. And, when the user signs on to the network, the agents must be smart enough to notify users of conditions that occurred during the disconnect. Because data warehouses tend to grow exponentially, it is critical that agents proactively monitor and manage activities, alerting decision makers only when specific conditions exist. It is unrealistic to believe decision makers would be productive by aimlessly data surfing through potentially hundreds of gigabytes of data looking for valuable insight. The decision maker should be free to concentrate on the immediate and critical issues while the system ensures that developing conditions will not go undetected.


An example of the use of triggers, agents and alerts is a forecast accuracy monitoring application. When each week's actual sales are updated to the data warehouse, the system automatically calculates the mean absolute percent error between forecasted and actual results over the previous six weeks for every product. The mean absolute percent error is a simple way of representing forecast error over time. An alert is sent to each marketing manager responsible for the product forecast if a threshold for the mean absolute percent error calculation is exceeded. The marketing manager can then take appropriate actions to avert out-of-stocks or excessive inventory build-up.

Structured/Unstructured Content Server

Structured/Unstructured Content Server

Application Architecture
The Common Gateway Interface (CGI) facility of Web server software provides a method to execute server-resident software. Building secure applications for the Intranet requires a well thought out security strategy as well as the appropriate application architecture. Most Web applications provide all users with the same access permissions. The information available is either of low or no level of confidentiality. Examples include human resource forms for insurance, vacation requests, and company phone lists as well as market information.


Business users require a system that maps them to their server account by verifying user names and passwords. When server applications are run, they will have access to their files secured by user, group and world permission levels. The same issue exists with database security. Users must be mapped to the appropriate entity in the relational database in order to control the data that a user can access. And, because the number of users may be large, the administration of the security system should be centralized at the server and minimized to the extent possible.


A second issue with the CGI interface is that it does not offer a continuous connection to the database. As a result, it is impossible to support an application requiring multiple interactive queries - a data warehousing requirement. One approach to solving this problem is to employ a message-based protocol between the client browser and the server-resident analytic layer using the CGI. By mapping a user to a server account, a continuous connection is maintained between the logic layer and database during iterative queries. This facilitates the execution of efficient SQL query strategies and computational routines to meet the user requirements for structured content analysis.


Conclusion
The personal computer era transformed a generation of users into electronic "knowledge seekers". As knowledge seekers, users sought to load their personal computers with as much data and software as possible. Users wanted to be self-sufficient analysts, in fact they had to be self-sufficient.


The network computer era allows users to evolve into "knowledge sharers" and emphasizes the powerful advantage of collaborative problem resolution. Knowledge sharing requires the free flow of all types of information among users, not just text file transfer, but an interactive data analysis capability that encourages the exchange of experience and ideas.

The Intranet should be the basis for rethinking the enterprise information infrastructure.

Copyright © 1997 Richard Tanler. All Rights Reserved.

Used by Permission.

Link to Info Advantage


Return to Spotlight Archive





The itmWEB Site™, Copyright © 2006, itmWEB Media Corporation,
All Rights Reserved -
webmaster@itmweb.com