Presentations at Conferences and Other Locations

"Mopping up the flood of data with web services" by Gary Wiggins

ACS Central Regional Meeting, May 18, 2006. (Knowledge Management/Data Mining Symposium)

Abstract: The core of informatics is the relationship between the meaning of information and its representation as data. Our burgeoning ability to both generate and collect scientific information has created a number of challenges. The principal challenges are how to extract meaningful, sometimes latent, novel information and how to manage that information. The new field of Knowledge Discovery in Databases (KDD) seeks to provide answers to both of these problems. Data mining involves a strong machine-learning component to semi-automatically detect useful information. Unfortunately, scientists have no set of standard tools at their disposal to conduct data mining. Furthermore, the creation and implementation of sophisticated algorithms and heuristics generally lie outside the expertise of scientists. Another distinguishing characteristic of scientific information is the much greater role played by "metadata" (data about data). The metadata problem has long vexed the management of scientific information. With a fixed-table approach, programs must be rewritten frequently to keep pace with changes in data representations. The huge increase in data volumes streaming from modern laboratory instruments has magnified these problems. Informatics schools train students to deal with scientific data handling problems, utilizing not only locally generated data but also the full spectrum of databases and resources available on the Web. Indiana University's approach to a Chemical Informatics and Cyberinfrastructure Collaboratory will be presented as an example of one solution to the data deluge problem.