WHAT IS DATA MINING?
The cost of storing and processing data has fallen dramatically in recent years and, as a result, the amount of data stored in electronic form has grown at an explosive rate. A case in point is Wal-Mart. The retail giant recently installed an NCR WorldMark 5100M “massively parallel processing server” and upgraded a second NCR 5100M (Chester, 1999). Together, the two machines take Wal-Mart's data warehouse from 7.5 terabytes to more than 24 terabytes. The system collects and analyzes item information from approximately 2,900 stores to track buying trends department by department, shelf by shelf, and item by item. It supports more than 30 applications and handles some 50,000 queries per week.
With the creation of large databases came the possibility of analyzing the data stored in them. The term data mining was originally used to describe the process through which previously undiscovered patterns in data were identified. This definition has since been stretched beyond these limits to...
- Bradley, P., Fayyad, U., and Mangasarian, O. (1999). “Mathematical Programming for Data Mining: Formulations and Challenges,” INFORMS Jl. Computing, 11.
- Chester, T. (1999). “Sizes of some large databases,” Fallbrook, California (also available via http://sd.znet.com/~schester/facts/databases_sizes.html/).
- Data Mining News (1998). Vol. 1, No. 17.
- Data Mining News (1998). Vol. 1, No. 22.
- Data Mining News (1999). Vol. 2, No. 10.
- Hayes, H. (1999). “The New and Improved Tax Collector,” CNN Interactive, April 7.
- Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P. (1996). “From Data Mining to Knowledge Discovery in Databases,” AI Magazine, Fall.
- Fayyad, U. and Bradley, P. (1998). “Tutorial on Data Mining and Knowledge Discovery in Databases, II,” at INFORMS Fall National Meeting, Seattle, Washington.
- Garfinkel, S. (1998). “Threats to privacy: new science called datamining could benefit consumers–or harass them,” The Boston Globe, July 16.
- Grossman, R., Kasif, S., Moore, R., Rocke, D., and Ullman, J. (1999). “Data Mining Research: Opportunities and Challenges,” a report of three Workshops on Mining Large, Massive and Distributed Data, National Science Foundation, Washington, DC.
- Hoffman, T. (1998). “Banks Turn to IT to Reclaim Most Profitable Customers,” Computer World, December 7.
- Hoffman, T. (1999). “Insurers Mine Age-Appropriate Offering,” Computer World, April 19.
- IBM (1999). “Data Mining: An IBM Overview,” Boulder, Colorado (also available via http://direct.boulder.ibm.com/bi/info/overview/htm).
- Mangasarian, O. (1965). “Linear and Nonlinear Separation of Patterns by Linear Programming,” Operations Research, 13, 444–452.
- Thyfault, M. (1999). “Data Mining for Audio–Dragon Systems Uses Speech Recognition to Cull Data from Call-Center Tapes,” Information Week, February 8.
- Two Crows Corp. (1999). “Introduction to Data Mining and Knowledge Discovery” (2nd ed.), Potomac, Maryland (also available via http://direct.boulder.ibm.com/bi/info/overview.htm).