Summer 1998   

An Introduction to Data Mining

Pat Gilbrough, Management Information Tools, Inc

With all the elaborate terms being thrown at you such as Business Intelligence Tools, EIS, DSS, data cubes, star schema and a host of others, I would like to explain what Data Mining is without the complex jargon that is usually associated with processes of this type. Data mining is a process that can be very complex, but in most cases, can bring a great benefit to business in a simple to use form. This article is directed at business users that incorporate MultiValue based systems.

 

What is Data Mining?

The system that performs the day-to-day functions of a business such as order entry, accounting and inventory control is called the "operational system" and the information it creates is called "operational data". Data mining is used to bring awareness of the substance between different data elements present in a business' operational data. Data mining is a "discovery" process, in that you uncover information you would typically not find without data mining. Unlike using pre-established computer reports to get information, data mining has no fixed presentation of data and allows the user to create inquiries based on what information he or she requires at the time.

 

"Discovery" Through Data Mining

The following is an example of data mining in a sales application. The user may start by bringing up a total of the overall sales of the company comparing this year's sales with last year's sales over the same time span (figure 1 below).

 

Subject Level Subject Description Jan-Apr 1998 Sales Jan-Apr 1997 Sales Trend Difference
Company ABC Company 8,640,000 8,287,000 4.3% 353,000

Figure 1 - Initial data view

Once this information is presented, the user decides that a breakdown of the company's individual branch totals is desired. This initiates a "drill-down" function. The ability to perform drill-down functions is one of the most important aspects of a data mining system. Figure 2 (below) shows the information after drilling down to the branch level.

Subject Level Subject Description Jan-Apr 1998 Sales Jan-Apr 1997 Sales Trend Difference
Company ABC Company 8,640,000 8,287,000 4.3% 353,000
Branch
 
 
Seattle
San Francisco
Portland
5,000,000
2,000,000
1,640,000
4,800,000
1,500,000
1,987,000
4.2%
33.3%
-17.5%
200,000
500,000
-347,000

 

Figure 2 - Drill-down to Branch level

This is how the "discovery" process works. When the user first looked at the overall company totals, it showed that the business's sales have increased.

With the uncovering of the decline in sales in the Portland branch, showing the breakdown of the company totals by branch paints a very different picture of the sales situation.

Now the user can continue with data research to get a better focus on where the sales declines have been occurring. In Figure 3 (below), the user "explores" Portland's total by department.

Subject Level Subject Description Jan-Apr 1998 Sales Jan-Apr 1997 Sales Trend Difference
Company ABC Company 8,640,000 8,287,000 4.3% 353,000
Branch Portland 1,640,000 1,987,000 -17.5% -347,000
Department
 
 
Home Electronics
Women's Shoes
Athletic Equipment
800,000
720,000
120,000
957,000
820,000
210,000
-16.4%
-12.2%
-42.9%
-157,000
-100,000
-90,000

 

Figure 3 - Drill-down to Department level

Seeing that home electronics has the greatest dollar decline, the user then explores, (performs a drill-down function), on that department and fine tunes the information down to the product group level, shown in Figure 4 (below).

Subject Level Subject Description Jan-Apr 1998 Sales Jan-Apr 1997 Sales Trend Difference
Company ABC Company 8,640,000 8,287,000 4.3% 353,000
Branch Portland 1,640,000 1,987,000 -17.5% -347,000
Department Home Electronics 800,000 957,000 -16.4% -157,000
Product Group
 
 
 
 
Computers
Televisions
Stereo Systems
Camcorders
Portable Phones
260,000
210,000
150,000
100,000
80,000
350,000
250,000
165,000
110,000
82,000
-25.7%
-16.0%
-9.1%
-9.1%
-2.4%
-90,000
-40,000
-15,000
-10,000
-2,000

 

Figure 4 - Drill-down to Product Group level

 

This shows the power of a basic data mining system. Rather than believe that the company was doing well because of a 4.3% increase in sales, the user was able to quickly "discover" that a serious sales decline exists and where the core of that problem is.

There are four "columns" of numeric data in our example. With the flexibility allowed by a data mining system, the number of available columns could be several hundred. The types of columns that would be available for a sales data mining application could be sales year-to-date, sales month-to-date, sales each month for the last few years, standard deviation of sales, profit margins, average sales amounts by order and/or by sales order line, order count, etc.

Data mining systems also need a spreadsheet-like ability to enable the creation of columns that are a result of formulas between data elements. In the above examples, the Trend and Difference columns are not stored values and are calculated as they are being displayed.

Database Requirements for Data Mining

With drill-down requests being driven in an ad-hoc fashion, the database supplying the information to data mining must be very flexible. In our example, the user happened to ask for information in the order: branch, then department, and then product line, but another inquiry could just as well be product line, then branch, and then product.

Because of this "slicing and dicing" style of inquiry, the database used by the data mining system must be extensively cross-referenced to link all the associated summary items together.

Another critical element in the database is that information to be accessed has been pre-summarized and would require little to none "on the fly" summarization. Each drill-down function in our example should take no more than a few seconds and preferably have a sub-second response time. Taking longer than a few seconds per request takes away from the purpose and usefulness of data mining.

The time element, summarization and extensive cross-referencing aspect of the database that data mining requires is what define many of the requirements of a data warehouse.

Key Features of Data Mining

Easy to use - Data mining must be very easy to use and not require extensive training. It is important for the user to be able to focus his or her thought process on researching data quickly rather than trying to figure out how to run a program. The more difficult the running of the data mining system, the more likely it will not be used by the people that may need it the most.

Accessibility - The real value of data mining is giving this tool to the people that have a direct need for it and that can take direct action from what is learned by the data research. Give it to each salesman in your company to research the activity of his or her assigned customers. Give it to the inventory control person to quickly find dead stock in the inventory. Give it to accounting to help in establishing budgets. Empower employees with this powerful tool to improve their business efforts.

Fast Response Times - It is critical that the "slicing and dicing" of information can occur in a matter of seconds to encourage the powerful research capabilities of data mining. Managers and executives are much less apt to use a system to its potential when they have to wait longer than a few seconds between responses, especially with the interactive environment required with exploring data.

Up-to-Date Information - Using data that is several months or even weeks old reduces the power of data mining. The more up-to-date the information is, the more likely a plan can be implemented to improve a situation that has been presented by the data mining system. In our example, the success of possibly slowing the decline in sales in the discovered area would probably depend on discovering this information while the problem is occurring.

Other Benefits of a Data Mining System

Lower Cost of Data Processing. With information supplied to users with their own data research, the need for custom report and hardcoded inquiry programming is less. Programmers can focus their attention on operational issues such as transaction entry and processing versus information output requests.

Reduction in Computer Generated Hardcopy Reports. Many computer reports are now generated to show a manager the totals in a given area of business, (i.e. the "bottom line" on a sales report). With data mining, that same manager would start his inquiry by showing the "bottom line" totals on a screen and then research more of the detail behind that total if he or she so desires. Printing of a report would become more of an on-demand action and would probably be printed from the data mining system.

Reduced Load on Operational System. Data mining uses a database that has been summarized either off-line or off-hours. With the use of data mining to get information that was previously achieved with traditionally burdensome sort, select and reporting programs, the overall load on the operational business system would be reduced during the business day.

Data Mining Methods for MultiValue Systems

The cost and complexity of implementation and software for a data mining/data warehousing environment for MultiValue based computers are quite varied. The MITS system has been implemented for less than $10,000 on several MultiValue systems. It is designed to store the metadata of the data warehouse and run data mining on the same computer as the operational system. MITS was designed as a simple to use system for the average MultiValue computer user. A version of MITS is now in place that includes the option of storing and processing the data warehouse on a separate Windows NT system attached to the operational business computer.

The most common way of setting up a data mining/data warehousing system is to transmit transactional data, such as sales orders, from an operational system to a separate Windows-based computer. The Windows system then performs the transformation of the operational data into the metadata of a data warehouse. This is the method used by Vmark with their DataStage transformation product. With DataStage handling the building of the data warehouse, Vmark recommends the third party product PowerPlay from Cognos for the data mining functions. PowerPlay is one of the most popular Windows-based data mining programs available today.

Future of Data Mining Systems

The data mining example used in this article is a simple, but very powerful depiction of what can be accomplished when business users are given a different way of looking at all the data they have been accumulating over the years.

As businesses realize that a data mining/data warehousing system is not overwhelmingly complex and is affordable, more and more of these businesses will be including this beneficial technology in their future data processing plans.

While implementing this new technology at many businesses, it has been quite a positive experience to witness business users being "unleashed" on a refreshingly new way to present computer information.

Pat Gilbrough
Management InformationTools, Inc

Pat Gilbrough is President of Management Information Tools, Inc. MITI specialises in Data Warehousing and Data Mining for MultiValue-based systems and is the distributor of the MITS Data Mining System. Pat has over 18 years experience in the MultiValue industry as a designer/programmer, consultant and President and founder of Progressive Systems, Inc. - a MultiValue developer and distributor of systems for wholesale distributors. Pat can be contacted by e-mail at patg@mgmt-info-tools.com or by telephone at +(206) 789 8313.

 

Management Information Tools, Inc
2400 NW 80th Street
Suite 140
Seattle
WA 98117 USA
Tel: +(206) 789 8313
Fax: +(206) 782 8045
e-mail: patg@mgmt-info-tools.com

http://www.mgmt-info-tools.com
In the UK, contact the MITS authorised distributor: EdgTech Systems Limited
37 North Bar Street
Banbury
Oxfordshire
OX16 0TH
Tel: (01295) 279279
Fax: (01295) 279179
e-mail: sales@edgtech.co.uk

 


Last Updated: 5 November 1998

Home | | What's New | | News | | Articles | | Workspace | | Pick of the Web | | InfoCentre | | Parting Shots | | About Us | | Subscribe |