Summer 1998
|
With all the elaborate terms being thrown at you such as Business Intelligence Tools, EIS, DSS, data cubes, star schema and a host of others, I would like to explain what Data Mining is without the complex jargon that is usually associated with processes of this type. Data mining is a process that can be very complex, but in most cases, can bring a great benefit to business in a simple to use form. This article is directed at business users that incorporate MultiValue based systems.
The system that performs the day-to-day functions of a business such as order entry, accounting and inventory control is called the "operational system" and the information it creates is called "operational data". Data mining is used to bring awareness of the substance between different data elements present in a business' operational data. Data mining is a "discovery" process, in that you uncover information you would typically not find without data mining. Unlike using pre-established computer reports to get information, data mining has no fixed presentation of data and allows the user to create inquiries based on what information he or she requires at the time.
The following is an example of data mining in a sales application. The user may start by bringing up a total of the overall sales of the company comparing this year's sales with last year's sales over the same time span (figure 1 below).
Subject Level | Subject Description | Jan-Apr 1998 Sales | Jan-Apr 1997 Sales | Trend | Difference |
Company | ABC Company | 8,640,000 | 8,287,000 | 4.3% | 353,000 |
Once this information is presented, the user decides that a breakdown of the company's individual branch totals is desired. This initiates a "drill-down" function. The ability to perform drill-down functions is one of the most important aspects of a data mining system. Figure 2 (below) shows the information after drilling down to the branch level.
Subject Level | Subject Description | Jan-Apr 1998 Sales | Jan-Apr 1997 Sales | Trend | Difference |
Company | ABC Company | 8,640,000 | 8,287,000 | 4.3% | 353,000 |
Branch |
Seattle San Francisco Portland |
5,000,000 2,000,000 1,640,000 |
4,800,000 1,500,000 1,987,000 |
4.2% 33.3% -17.5% |
200,000 500,000 -347,000 |
This is how the "discovery" process works. When the user first looked at the overall company totals, it showed that the business's sales have increased.
With the uncovering of the decline in sales in the Portland branch, showing the breakdown of the company totals by branch paints a very different picture of the sales situation.
Now the user can continue with data research to get a better focus on where the sales declines have been occurring. In Figure 3 (below), the user "explores" Portland's total by department.
Subject Level | Subject Description | Jan-Apr 1998 Sales | Jan-Apr 1997 Sales | Trend | Difference |
Company | ABC Company | 8,640,000 | 8,287,000 | 4.3% | 353,000 |
Branch | Portland | 1,640,000 | 1,987,000 | -17.5% | -347,000 |
Department |
Home Electronics Women's Shoes Athletic Equipment |
800,000 720,000 120,000 |
957,000 820,000 210,000 |
-16.4% -12.2% -42.9% |
-157,000 -100,000 -90,000 |
Seeing that home electronics has the greatest dollar decline, the user then explores, (performs a drill-down function), on that department and fine tunes the information down to the product group level, shown in Figure 4 (below).
Subject Level | Subject Description | Jan-Apr 1998 Sales | Jan-Apr 1997 Sales | Trend | Difference |
Company | ABC Company | 8,640,000 | 8,287,000 | 4.3% | 353,000 |
Branch | Portland | 1,640,000 | 1,987,000 | -17.5% | -347,000 |
Department | Home Electronics | 800,000 | 957,000 | -16.4% | -157,000 |
Product Group |
Computers Televisions Stereo Systems Camcorders Portable Phones |
260,000 210,000 150,000 100,000 80,000 |
350,000 250,000 165,000 110,000 82,000 |
-25.7% -16.0% -9.1% -9.1% -2.4% |
-90,000 -40,000 -15,000 -10,000 -2,000 |
This shows the power of a basic data mining system. Rather than believe that the company was doing well because of a 4.3% increase in sales, the user was able to quickly "discover" that a serious sales decline exists and where the core of that problem is.
There are four "columns" of numeric data in our example. With the flexibility allowed by a data mining system, the number of available columns could be several hundred. The types of columns that would be available for a sales data mining application could be sales year-to-date, sales month-to-date, sales each month for the last few years, standard deviation of sales, profit margins, average sales amounts by order and/or by sales order line, order count, etc.
Data mining systems also need a spreadsheet-like ability to enable the creation
of columns that are a result of formulas between data elements. In the above
examples, the Trend and Difference columns are not stored values and are calculated
as they are being displayed.
With drill-down requests being driven in an ad-hoc fashion, the database supplying the information to data mining must be very flexible. In our example, the user happened to ask for information in the order: branch, then department, and then product line, but another inquiry could just as well be product line, then branch, and then product.
Because of this "slicing and dicing" style of inquiry, the database used by the data mining system must be extensively cross-referenced to link all the associated summary items together.
Another critical element in the database is that information to be accessed has been pre-summarized and would require little to none "on the fly" summarization. Each drill-down function in our example should take no more than a few seconds and preferably have a sub-second response time. Taking longer than a few seconds per request takes away from the purpose and usefulness of data mining.
The time element, summarization and extensive cross-referencing aspect of the database that data mining requires is what define many of the requirements of a data warehouse.
Easy to use - Data mining must be very easy to use and not require extensive training. It is important for the user to be able to focus his or her thought process on researching data quickly rather than trying to figure out how to run a program. The more difficult the running of the data mining system, the more likely it will not be used by the people that may need it the most.
Accessibility - The real value of data mining is giving this tool to the people that have a direct need for it and that can take direct action from what is learned by the data research. Give it to each salesman in your company to research the activity of his or her assigned customers. Give it to the inventory control person to quickly find dead stock in the inventory. Give it to accounting to help in establishing budgets. Empower employees with this powerful tool to improve their business efforts.
Fast Response Times - It is critical that the "slicing and dicing" of information can occur in a matter of seconds to encourage the powerful research capabilities of data mining. Managers and executives are much less apt to use a system to its potential when they have to wait longer than a few seconds between responses, especially with the interactive environment required with exploring data.
Up-to-Date Information - Using data that is several months or even weeks old reduces the power of data mining. The more up-to-date the information is, the more likely a plan can be implemented to improve a situation that has been presented by the data mining system. In our example, the success of possibly slowing the decline in sales in the discovered area would probably depend on discovering this information while the problem is occurring.
Lower Cost of Data Processing. With information supplied to users with their own data research, the need for custom report and hardcoded inquiry programming is less. Programmers can focus their attention on operational issues such as transaction entry and processing versus information output requests.
Reduction in Computer Generated Hardcopy Reports. Many computer reports are now generated to show a manager the totals in a given area of business, (i.e. the "bottom line" on a sales report). With data mining, that same manager would start his inquiry by showing the "bottom line" totals on a screen and then research more of the detail behind that total if he or she so desires. Printing of a report would become more of an on-demand action and would probably be printed from the data mining system.
Reduced Load on Operational System. Data mining uses a database that has been summarized either off-line or off-hours. With the use of data mining to get information that was previously achieved with traditionally burdensome sort, select and reporting programs, the overall load on the operational business system would be reduced during the business day.
The cost and complexity of implementation and software for a data mining/data warehousing environment for MultiValue based computers are quite varied. The MITS system has been implemented for less than $10,000 on several MultiValue systems. It is designed to store the metadata of the data warehouse and run data mining on the same computer as the operational system. MITS was designed as a simple to use system for the average MultiValue computer user. A version of MITS is now in place that includes the option of storing and processing the data warehouse on a separate Windows NT system attached to the operational business computer.
The most common way of setting up a data mining/data warehousing system is to transmit transactional data, such as sales orders, from an operational system to a separate Windows-based computer. The Windows system then performs the transformation of the operational data into the metadata of a data warehouse. This is the method used by Vmark with their DataStage transformation product. With DataStage handling the building of the data warehouse, Vmark recommends the third party product PowerPlay from Cognos for the data mining functions. PowerPlay is one of the most popular Windows-based data mining programs available today.
The data mining example used in this article is a simple, but very powerful depiction of what can be accomplished when business users are given a different way of looking at all the data they have been accumulating over the years.
As businesses realize that a data mining/data warehousing system is not overwhelmingly complex and is affordable, more and more of these businesses will be including this beneficial technology in their future data processing plans.
While implementing this new technology at many businesses, it has been quite
a positive experience to witness business users being "unleashed"
on a refreshingly new way to present computer information.
Pat Gilbrough
Management InformationTools, Inc
Pat Gilbrough is President of Management Information Tools, Inc. MITI specialises
in Data Warehousing and Data Mining for MultiValue-based systems and is the
distributor of the MITS Data Mining System. Pat has over 18 years experience
in the MultiValue industry as a designer/programmer, consultant and President
and founder of Progressive Systems, Inc. - a MultiValue developer and distributor
of systems for wholesale distributors. Pat can be contacted by e-mail at patg@mgmt-info-tools.com
or by telephone at +(206) 789 8313.
Management Information Tools, Inc
Last Updated: 5 November 1998