Preface
We have witnessed explosive growth in the application of Data Mining (DM) technologies in a growing number of areas of business, government and science. Two of the most important business areas are finance, in particular in banks and insurance companies, and e-business, such as web portals, e-commerce and ad management services.
In spite of the close relationship between research and practice in Data Mining, it is not easy to find information on some of the most important issues involved in real-world applications of DM technology, from business and data understanding to evaluation and deployment. Papers often describe research that was developed without taking into account the constraints imposed by the motivating application. When these constraints are taken into account, they are frequently not discussed in detail because the paper must focus on the method. Therefore, knowledge that could be useful to those who would like to apply the same approach to a related problem is not shared.
In 2007, we organized a workshop with the goal of attracting contributions that address some of these issues. The Data Mining for Business workshop was held in conjunction with the 11th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), in Nanjing, China.
This book contains extended versions of a selection of papers from that workshop. Because of the importance of these two application areas, we have selected papers that are mostly related to finance and e-business. The chapters of this book cover the whole range of issues involved in the development of DM projects, including those mentioned earlier, which are often not described. Some of these papers describe applications, offering insights into how domain-specific knowledge was incorporated into the development of the DM solution and into the issues involved in integrating that solution into the business process. Other papers illustrate how the rapid development of IT, such as blogs and RSS feeds, opens up many interesting opportunities for Data Mining, and propose solutions to exploit them.
These papers are complemented by others that describe applications in other important and related areas, such as intrusion detection, economic analysis and business process mining. The successful development of DM applications depends on methodologies that facilitate the integration of domain-specific knowledge and business goals into the more technical tasks. This issue is also addressed in this book.
This book clearly shows that Data Mining projects should not be regarded as independent efforts but should rather be integrated into broader projects that are aligned with the company's goals. In most cases, the output of a DM project is a solution that must be integrated into the organization's information system and, therefore, into its (decision-making) processes.
Additionally, the book stresses the need for DM researchers to keep up with the pace of development in IT, identify potential applications and develop suitable solutions. We believe that the flow of new and interesting applications will continue for many years.
Another interesting observation that can be made from this book is the growing maturity of the field of Data Mining in China. In the last few years we have observed spectacular growth in the activity of Chinese researchers both abroad and in China. Some of the contributions in this volume show that this technology is increasingly used by people who do not have a DM background.
To conclude, this book presents a collection of papers that illustrates the importance of maintaining close contact between Data Mining researchers and practitioners. For researchers, it is useful to understand how the application context creates interesting challenges but, at the same time, imposes constraints that must be taken into account if their work is to have greater practical impact. For practitioners, it is important not only to be aware of the latest developments in DM technology, but also to maintain an ongoing dialogue with the research community in order to identify new opportunities for applying existing technologies and for developing new ones.
We believe that this book will be of interest not only to Data Mining researchers and practitioners, but also to students who wish to gain an idea of the practical issues involved in Data Mining. We hope that our readers will find it useful.
Porto, Bradford, Hangzhou, Osaka and Nanjing – May 2008
Carlos Soares, Yonghong Peng, Jun Meng, Takashi Washio, Zhi-Hua Zhou