UBDM 2006
Second Workshop on
Utility-Based Data Mining
August 20, 2006 in Philadelphia, Pennsylvania

Held in conjunction with
The 12th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining (KDD 2006)

New: The workshop proceedings are now available (2.3MB).
You can also view individual papers (see the program below).

Consider submitting an article to DMKD's special issue on Utility-Based Data Mining


Workshop Description

Motivation
Early work in data mining did not address the complex circumstances in which models are built and applied. It was assumed that a fixed amount of data was available and only simple objectives such as predictive accuracy were considered. Over time, it became clear that these assumptions were unrealistic and that various utility factors related to acquiring data, building models, and applying models had to be considered. The machine learning and data mining communities responded with research on active learning, which focused on methods for cost-effective acquisition of information for the training data, and research on cost-sensitive learning, which considered the costs and benefits associated with using the learned knowledge and how these costs and benefits should be factored into the data mining process.

All the different stages of the data mining process impact the ultimate utility of the knowledge derived from data mining. The utility of acquiring data, extracting a model, and applying the acquired knowledge must be considered. For example, in the data acquisition phase the costs of obtaining informative and accurate data may be considered to help identify the most cost-effective information. Similarly, utility considerations also impact the assessment of the decisions made based on the learned knowledge. Simple assessment measures like predictive accuracy have given way to economic utility measures, such as profitability and return on investment.
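As a minimal sketch of the contrast between predictive accuracy and an economic utility measure, consider a direct-marketing style decision. All numbers and the function names `expected_profit` and `decide` are hypothetical, chosen only for illustration, and the benefit and cost are assumed to be known and identical for every case.

```python
# Illustrative sketch only: hypothetical probabilities, benefit, and cost.

def expected_profit(p_respond, benefit, cost):
    """Expected profit of acting on one case (e.g., mailing one offer)."""
    return p_respond * benefit - cost

def decide(p_respond, benefit, cost):
    """Act only when the expected profit is positive."""
    return expected_profit(p_respond, benefit, cost) > 0

# Predicted response probabilities for four hypothetical customers.
probs = [0.02, 0.10, 0.30, 0.60]
benefit, cost = 20.0, 1.0  # break-even probability is cost / benefit = 0.05

# Accuracy-oriented rule: act only when a response is more likely than not.
accuracy_rule = [p >= 0.5 for p in probs]

# Utility-oriented rule: act whenever the expected profit is positive.
utility_rule = [decide(p, benefit, cost) for p in probs]

def total_expected_profit(actions):
    """Sum expected profit over the cases that are acted on."""
    return sum(
        expected_profit(p, benefit, cost)
        for p, act in zip(probs, actions)
        if act
    )

profit_accuracy = total_expected_profit(accuracy_rule)
profit_utility = total_expected_profit(utility_rule)
print(accuracy_rule, profit_accuracy)
print(utility_rule, profit_utility)
```

Under these assumed numbers the utility-oriented rule acts on more cases than the 0.5 threshold does, because a low-probability response can still be worth pursuing when its benefit is large relative to the cost of acting.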

Goals
As was the case for the first workshop, this workshop will bring together researchers who currently contribute to different utility aspects of the data mining process. Our goal is to promote an examination of all of the utility factors that affect data mining and their interaction, in order to continue to encourage the field to go beyond what has been accomplished individually in the areas of active learning and cost-sensitive learning. In addition, this workshop will continue to explore the types of utility factors and new methods for incorporating utility considerations in both predictive and descriptive data mining tasks. We welcome recent work on Value of Information analysis over graphical models.

This workshop will focus on real-world experiences as well as existing and new research methods and results. Attendance is not limited to paper authors, and we strongly encourage interested researchers from related areas to attend. This will be a full-day workshop and will include invited talks, paper presentations, short position statements and two panel discussions.

Workshop Topics
  • Types of utility factors in data mining
    • What utility factors arise in the context of data mining?
    • What assessment metrics are used in response to these utility factors?
    • Can the use of utility factors help address previously studied problems in data mining, such as the problems of learning rare classes and learning from skewed distributions?
  • Algorithms
    • Approaches for information acquisition, data preprocessing, mining and knowledge application that incorporate relevant utility factors. This includes work in active learning/sampling and cost-sensitive learning.
    • Approaches for adapting predictive and descriptive data mining tasks such as predictive modeling, clustering and link analysis to incorporate utility factors.
  • Interaction of utility factors throughout the data mining process
    • Work towards a comprehensive framework for incorporating utility factors to benefit the entire data mining process. This includes techniques which take into account dependencies between different phases of the process to maximize the utility of more than a single phase. For example, methods for acquiring training data which take into account the costs of errors in addition to the cost of data; or methods for the extraction of predictive patterns which take into account the cost of test features necessary at prediction time.
  • Applications
    • What existing data mining applications have taken utility factors into account?
    • How are the relevant utility factors for a given application determined and measured?
    • What methods do these applications use to take utility factors into consideration?
    • How do utility factors and the methods for dealing with them vary according to the specific problem addressed (e.g., by industry)?

Program
8:30 - 8:40 Opening Remarks and Welcome
8:40 - 9:20 Invited Talk: Budgeted Learning of Probabilistic Classifiers (Talk Slides)
Russell Greiner
9:20 - 9:40 Maximizing Classifier Utility when Training Data is Costly
Gary Weiss and Ye Tian
9:40 - 10:00 Prediction Games in Infinitely Rich Worlds
Omid Madani
10:00 - 10:30 Break
10:30 - 10:50 Efficient Mining of Temporal High Utility Itemsets from Data Streams
Vincent Tseng, Chun-Ron Chu and Tyne Liang
10:50 - 11:10 A Unified Framework for Utility-Based Measures for Mining Itemsets
Hong Yao, Howard Hamilton and Liqiang Geng
11:10 - 11:30 Assessing the Interestingness of Discovered Knowledge Using a Principled Objective Approach
Robert Hilderman
11:30 - 11:50 Utility-Based Anonymization for Privacy Preservation with Less Information Loss
Jian Xu, Wei Wang, Jian Pei, Xiaoyuan Wang, Baile Shi and Ada Wai-Chee Fu
11:50 - 12:00 Discussion
12:00 - 1:30 Lunch
1:30 - 2:10 Invited Talk: Reinforcement Learning and Utility-Based Decisions (Talk Slides)
Michael Littman
2:10 - 2:30 Beyond Classification and Ranking: Constrained Optimization of the ROI
Lian Yan and Patrick Baldasare
2:30 - 2:50 Pricing Based Framework for Benefit Scoring
Nitesh Chawla and Xiangning Li
2:50 - 3:10 Maximum Profit Mining and Its Application in Software Development
Charles Ling, Victor Sheng, Tilmann Bruckhaus and Nazim Madhavji
3:10 - 3:30 Discussion
3:30 - 4:00 Break
4:00 - 4:40 Panel Discussion: Research Directions and New Applications of Utility-Based Data Mining
Russell Greiner, Michael Littman, Dragos Margineantu, Lian Yan, Gerald Fahner
4:40 - 4:50 Concluding Remarks

Submission Guidelines
All submissions must be sent electronically, by the submission deadline of June 26, 2006 (extended!), to the workshop contact, Bianca Zadrozny, at the following email address: bianca@ic.uff.br. Submissions should be in PDF or PostScript format, should be at most 10 pages long, and should use the ACM SIG Proceedings Templates. In addition to technical papers, we encourage the submission of position papers (which generally should be at most 6 pages).

Submitted papers will be reviewed by members of the program committee. Accepted papers will be presented at the workshop and published in the workshop proceedings and in the ACM Digital Library. Authors will be notified of the acceptance or rejection of their papers by July 5, 2006. Camera-ready versions of the papers are due July 14, 2006.

We are also guest editing a special issue of the Data Mining and Knowledge Discovery Journal on Utility-Based Data Mining. Authors are encouraged to submit extended versions of the workshop papers to the special issue on UBDM.

Please do not hesitate to email the workshop contact if you have any questions.

Important Dates

   Note: These are the final dates.
June 26, 2006 (extended!) Deadline for electronic submission of full papers
July 5, 2006 Notification of accepted papers
July 9, 2006 Copyright form for ACM Digital Library
July 14, 2006 Camera Ready Copies
August 20, 2006 UBDM Workshop

Workshop Co-Chairs

   Note: for inquiries please send email to bianca@ic.uff.br
Bianca Zadrozny Federal Fluminense University, Brazil
Maytal Saar-Tsechansky University of Texas at Austin
Gary M. Weiss Fordham University, Bronx, New York

Program Committee

Naoki Abe IBM Research
Alina Beygelzimer IBM Research
Nitesh Chawla University of Notre Dame
Ian Davidson State University of New York at Albany
Chris Drummond National Research Council (Ottawa)
Wei Fan IBM Research
Tom Fawcett Stanford Computational Learning Laboratory
Howard Hamilton University of Regina
Robert Hilderman University of Regina
Rob Holte University of Alberta
Aleksander Kolcz AOL Inc.
Charles Ling University of Western Ontario
Dragos Margineantu Boeing Company
Prem Melville IBM Research
Ion Muslea Language Weaver, Inc.
Claudia Perlich IBM Research
Xingquan Zhu University of Vermont