New Year 1998   

Avoiding the Mistakes

The keys to building a successful data warehouse

There is little doubt that data warehouses are making their way into the consciousness of both IS and non-IS management. After all, the potential benefits of a decision support system can be appealing, especially if the IS department is unable to supply requested product or sales information in a reasonable timeframe. In my previous article, I outlined the concepts and components necessary to build a data warehouse. In this issue, we discuss some of the factors and elements that need to be present in order to increase the chances for a successful implementation.

Make no mistake, the development of a data warehouse can be a long and complicated one. You will probably start by creating a small, specialized data mart. Typically, this data mart will focus on a single topic (usually sales information) and be used to give executives a more up-to-date view of the organization than the monthly reports they already get. As you go through the development of the pilot data mart, the first truth you will encounter is that the data warehouse process is never complete. Every piece of data you give the user will cause them to ask for two more. Also, you will discover discrepancies, inconsistencies and incompleteness in the corporate database that you were afraid to admit to in the past. For the rest of the article, we will consider some of the other issues, both technical and political, that can make or break your warehousing projects.

Championing the Cause

There is a reason for placing this particular factor first. Studies have shown that having an appropriate person driving the data warehouse project greatly increases the chances for success. Just to make sure we’re all clear on the concept, a champion is a well-positioned, influential executive who has the desire to see a decision support system succeed and the political skill and power to make it happen. Without such support, most data warehouse projects are destined to flounder on the shoals of irrelevance.

If you need more convincing, consider for a moment one of the more difficult aspects of a data warehouse project - performing a cost/benefit analysis. The money that will go into this project can be easily quantified. It is the benefits that are a little less tangible. After all, you are attempting to give managers tools to help them retrieve the information necessary to help them perform their jobs. This doesn't have a positive bottom line impact unless the resulting decisions improve. And that is a tough standard to be measured against. After all, you're really suggesting that their decisions are currently not the best, a premise that can cause the calmest manager to get defensive. With the presence of a strong supporter, you have a better chance of avoiding these problems by having a figure of authority set the terms and define the results of the project. This will not only help get the project approved in the first place, but can ensure that the affected managers will provide at least a minimum level of assistance.

Manage Expectations

This particular piece of wisdom applies to every significant project I have worked on. There is nothing like the feeling of sweating the details of a long, complicated application, delivering it to the users with pride and having them be disappointed by what it doesn't do. The solution is to not let the users expect the world when all you're giving them is a continent.

Data warehousing does not cure the common cold. No surprise there. Decision support systems do not generate good business decisions. Probably none of us are surprised by this. Implementing an ad hoc query tool for your users will not allow them to retrieve any information they want in any shape they want. This statement runs counter to some of the marketing literature your users will digest. And it will be the first of the battles you will face in managing their expectations of the completed system. Before you get too far into the project, you need to sit down with your users and outline, in detail, the kinds and depth of information that will be available.

Taking the management of expectations a step further, it is a good idea to get the users involved in the selection of the querying tool. After the IS staff has narrowed down the choices to two or three, set up a prototype of each using some of the existing data. Then let the users test it out. After all, they are the ones who will be using the tool on a daily basis. It only makes sense to give them a say in their destiny.

Don't Include the Kitchen Sink

There will be an urge to include most of the data from your operational database in your data warehouse. This feeling will be exacerbated by demands from your users once they get their hands on the data marts you have developed as a pilot project. After all, the taste they received from the pilot could be like a small sip of water to a thirsty man. Resist the temptation to give in to these demands without a proper plan in place. Otherwise, you will end up with a data warehouse system that resembles your existing legacy application.

Additionally, it is not a good idea to get the users to provide the information they are looking for. I have seen data warehousing project leaders give their users a list of the files and fields in the operational database and ask them to indicate which fields they would like to see included in the warehouse. The result is inevitably an overwhelming list of fields most of which have dubious value.

Instead of presenting the users with a list, might I suggest a sit-down discussion of the data they require to assist with their decision-making process. The result of this interview will be a definition of the necessary data values put into words that the users understand. It will then be up to the IS staff to determine the mapping between the required fields and the current data structure.

Don't Need It - Already Have It

"We don't need a separate data warehouse. Our users can use English/Retrieve/Inform to get the information they need." Does this sound like someone in your organization? There are a couple of fallacies with this attitude. First, it requires the users to have an in-depth knowledge of the structures in which the data is stored. As well, they need to understand the query language and how it is used to retrieve the desired data. While it may be possible to teach users how to use LIST, SELECT and SORT, it will not be the effortless journey to information nirvana that the users were hoping for. And none of the LIST/SELECT/SORT training will prepare them to navigate through the multivalue minefield. There are some tools available designed to help users create ad hoc queries against a multivalue database. But none of them have the graphical interface that non-technical users need to fully utilize the information stored in the database.

The second misconception is that operational and warehouse data can successfully be placed in the same database. I know that it seems redundant to store what is essentially the same data in multiple locations, but a single data structure cannot efficiently support both types of access. For example, consider an ORDER file containing a multivalued association of order line information. This structure is quite efficient for processing transactions. But, if a user needs to know the sales dollar volume by part number, the multivalued structure's weakness shows through. The query used to retrieve this information includes an exploded sort - one of the slowest types of processing in our environment. And if you want the sales dollars report to place the high volume items first, then you have another problem.

In a database optimized for analytical, rather than transactional, processing, a technique of summarization, aggregation and denormalization is used to improve performance. Or to put it another way, the queries used on a transactional database are designed to be run hundreds of times. The analytical database expects a query to be run one time only. And we need to organize the data accordingly.

Factor Percent
Upper management support
User involvement
Having a business need
User support
Use of a methodology, modelling
Defined, understandable goals
Good, clean data
Managing expectations
38
23
20
14
13
11
10
10

Figure 1 - Critical Success Factors in Data Warehousing Projects

A Field by Any Other Name...

As someone who has been involved with the nuts and bolts of a data warehousing project, I can say that one of the most trying elements is coming up with the business definition for each of the attributes stored in the database. Consider for a moment what your definition of 'sales' would be. If you have an accounting background, it is billed revenue less any returns. To the shipping department, it is the items that need to be delivered to the customer. To those of you with a customer service bent, it is the amount of purchase commitments that have been received.

You might guess that resolving these different points of view could be challenging. You would be correct. Getting executives from different departments to agree on a single definition for common business terms is like pulling teeth. Yet it is crucial factor in determining the usefulness of the data warehouse. If a user generates the sales for a specific part number and the result uses a different definition than they expect, the blame will be placed squarely on the data warehouse. And as the confidence in the accuracy of the warehouse decreases, so does the need to have one.

How Does Your Warehouse Grow

Once you have designed your analytical database, populated it with the necessary data from the operational database and implemented the latest and greatest in data access tools, you sit back and heave a sigh of relief. After all, your work is complete. Wrong. Studies have shown that the growth in the number of users accessing a data warehouse is exponential over the first 12 months after implementation. As part of your planning, you need to account for a similar level of growth.

There are generally three ways that this growth manifests itself. First, the system on which the data warehouse resides needs to be of sufficient power to quickly handle the queries that will be generated. Second, the network over which access to the data warehouse is gained has the potential to become a bottleneck. Finally, there is the data itself. My experience is that the number of fields in the data warehouse is always increasing. As users see the kind of data they can retrieve and the types of manipulations they can perform, they will constantly come up with new data requirements. You could (and probably will) make a career out of satisfying the needs of the users in this area.

Of course, this is not a complete list. There are many more factors which contribute to the success (or failure) of your data warehouse project. Figure 1 above (taken from Data Warehousing: A Framework and Survey of Current Practices by Hugh J. Watson and Barbara J. Haley, published in the January 1997 issue of the Journal of Data Warehousing) shows the critical success factors for a data warehouse project. By being aware of these factors, you can help ensure a long and profitable life for your decision support system.

Sally Hober
TAG Consulting Inc

Sally Hober is a senior consultant at TAG Consulting, a consulting company that specializes in helping businesses utilize computer systems to gain a competitive advantage. Sally has 7 years of experience in the MultiValue database (i.e. UniVerse, UniData and Pick) market. She specializes in the design and development of Data Warehouses, as well as technical training in many areas, including Data Warehousing, Managing Software Projects, UniVerse Objects and ODBC.

Sally was formerly a Technical resource Manager and Consulting Project Manager at BLACKSMITH Corporation. While at BLACKSMITH, Sally was involved in managing large-scale software development projects and co-ordinating a staff of 11 analysts

 


Last Updated: 09 November 1998

Home | | What's New | | News | | Articles | | Workspace | | Pick of the Web | | InfoCentre | | Parting Shots | | About Us | | Subscribe |