Chemical Informatics at Indiana University
Testimonial from a former student
The Chemical Informatics program at Indiana University provided me with the skills and experience necessary to enter the workforce. With a background in chemical research before starting at Indiana, I was only prepared for a small segment of the industry. At Indiana, I had the opportunity to learn new tools to apply to chemical information, such as database design and language, programming, data curation, usability testing, and human-computer interaction. I was also able to collaborate with other students on projects in which we built tools to handle chemical data, and to work on my own project for my thesis "Combinatorial Study of a Purine-based Computational Library and the Effects of Cisplatin Binding". The experience at Indiana University was well worthwhile, and the faculty was highly motivated and keen to work with the students, providing an exciting and nurturing environment. I would do it all again if I did not have a California mortgage payment! --Leah Sandvoss, MS in Chemical Informatics, 2004
About the Program
Chemical informatics is the application of computer technology to chemistry in all of its manifestations. Much of the current use of cheminformatics techniques is in the drug industry, but chemical informatics is now being applied to problems across the full range of chemistry. Chemical informaticians often work with massive amounts of data. They construct information systems that help chemists make sense of the data, attempting to predict the properties of chemical substances from a sample of data, much as Mendeleev did many years ago when he accurately predicted the existence and properties of unknown elements in the periodic table. Thus, through the application of information technology, chemical informatics helps chemists organize and analyze known scientific data and extract new information from that data to assist in the development of novel compounds, materials, and processes.
The field of chemical informatics has become increasingly important in the last few years as drug companies and other life science research organizations turn their attention from genomics to proteomics and integrative informatics techniques. Combinatorial chemistry and high-throughput screening are being applied to chemical research at an ever-increasing rate, so the volume of data that chemists and life science researchers must handle is overwhelming. The chemical informatics graduate program teaches you to construct information systems that use the chemical structure as the organizing principle for generating and evaluating chemically related data and information. Those trained in chemical informatics provide the tools to acquire, organize, and evaluate data, yielding new insights for further chemical research. Chemical informatics companies combine molecular simulation and data analysis techniques with high-quality graphical visualization to obtain stunning results. Chemical informatics thus helps chemists investigate new problems as well as organize and analyze existing data.
People who work in chemical informatics may concentrate on computational chemistry and molecular modeling, chemical structure coding and searching, chemical data visualization, or a number of other areas of specialization. Indeed, the various computer graphics codes for chemical structures that let us both view and search chemical structures via the computer were developed by chemical informaticians. Chemical and pharmaceutical companies are in great need of people with such skills. Methods and tools used in cheminformatics include:
- Structure/Activity or Structure/Property Relationships (QSAR, QSPR)
- Genetic Algorithms
- Statistical Tools (e.g., recursive partitioning)
- Data Analysis Tools
- Visualization Techniques
- Chemical Markup Language (CML)
- Web Services
- Computational Chemistry / Molecular Modeling
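Similarity searching over binary fingerprints is one familiar example of chemical structure coding and searching. Each molecule is encoded as a set of structural-feature bits, and candidate molecules are ranked by the Tanimoto coefficient, which is the number of shared bits divided by the number of distinct bits. The Python sketch below is illustrative only. The bit positions and compound names are hypothetical, and in practice the fingerprints would be generated by a chemistry toolkit rather than typed in by hand.

```python
# Minimal sketch of fingerprint similarity searching.
# Assumption: each fingerprint is represented as the set of "on" bit positions.
# The bit numbers and compound names below are hypothetical, for illustration only.


def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto (Jaccard) coefficient: shared bits / total distinct bits."""
    union = fp_a | fp_b
    if not union:
        return 0.0  # two empty fingerprints: define similarity as 0
    return len(fp_a & fp_b) / len(union)


def rank_by_similarity(
    query: set[int], library: dict[str, set[int]]
) -> list[tuple[str, float]]:
    """Return (name, score) pairs sorted from most to least similar to the query."""
    scores = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    query = {1, 4, 7, 12, 20}
    library = {
        "compound_A": {1, 4, 7, 12, 21},
        "compound_B": {2, 5, 9},
        "compound_C": {1, 4, 7, 12, 20, 33},
    }
    for name, score in rank_by_similarity(query, library):
        print(f"{name}: {score:.2f}")
    # prints compound_C: 0.83, compound_A: 0.67, compound_B: 0.00 (one per line)
```

The Tanimoto coefficient is only one of several similarity measures. The choice of fingerprint affects the ranking at least as much as the choice of coefficient.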
Faculty:
- Mu-Hyun (Mookie) Baik
- Rajarshi Guha (Visiting Assistant Professor, 2007-2009)
- David J. Wild
Adjunct Faculty:
- Dimitris K. Agrafiotis (J&J Pharmaceutical Research and Development)
- John M. Barnard (Digital Chemistry)
- Robert D. Clark (Tripos)
- David E. Clemmer (Indiana University Department of Chemistry)
- Thompson N. Doman (Eli Lilly and Company)
- Kelsey Forsythe (IUPUI Department of Chemistry)
- Gary M. Hieftje (Indiana University Department of Chemistry)
- John C. Huffman (Indiana University Department of Chemistry)
- Peter J. Ortoleva (Indiana University Department of Chemistry)
- Gary Wiggins (Indiana University School of Informatics)
- Faming Zhang (Indiana University Department of Chemistry)
Graduate Courses for the MS in Chemical Informatics or PhD in Informatics (Chemical Informatics track)
- INFO I571 Chemical Information Technology
- INFO I572 Computational Chemistry and Molecular Modeling
- INFO I573 Programming Techniques for Science Informatics
- INFO I533 Seminar in Chemical Informatics
- INFO I553 Independent Study in Chemical Informatics
- INFO I647-I657 Seminar in Chemical Informatics I and II
- INFO I693 M.S. Thesis/Project in Chemical Informatics
Program Requirements
PhD in Informatics (Chemical Informatics track)
MS in Chemical Informatics (36 cr.)
Core Courses (12 cr.)
- INFO I501 Introduction to Informatics (3 cr.)
- INFO I502 Information Management (3 cr.)
- INFO I571 Chemical Information Technology (3 cr.)
- INFO I572 Computational Chemistry and Molecular Modeling (3 cr.)
Electives (18 cr.)
No more than 6 credits of electives may be in biochemistry, biology, or chemistry.
Capstone Project (6 cr.)
- INFO I693 Thesis/Project in Chemical Informatics (6 cr.)
Prerequisites
A sound knowledge of chemistry and excellent facility in computer science are required of an effective practitioner in the cheminformatics field. Prospective students for graduate study in chemical informatics are expected to have some training both in chemistry and in informatics or computer science. If background in either area is insufficient, additional coursework may be necessary to ensure reasonable progress through the programs.
Students with a Bachelor's Degree in Computer Science, Informatics, or Other Information Fields
Upon entering the program, students with undergraduate degrees in any information-based field should either already have, or quickly acquire during their graduate study, the chemistry knowledge covered in an undergraduate minor in chemistry. A typical chemistry minor includes:
- General Chemistry with laboratory (two semesters)
- Organic Chemistry (one semester)
Plus two other courses chosen from:
- Analytical Chemistry (one semester) OR
- Biological Chemistry or Biochemistry (one semester) OR
- Physical Chemistry (one semester)
Computer-trained students may enter on a chemistry fast track if necessary and receive graduate credit for one or more of the chemistry courses that cover key topics. Organic chemistry is especially important. In addition, students must be sure that they have at least the computer science skills listed in the following section, especially programming, discrete structures, and data structures.
Students with a Bachelor's Degree in Chemistry (B.A. or B.S.)
Upon entering the program, students with undergraduate degrees in chemistry or biochemistry should either already have, or quickly acquire during their graduate study, the computer science knowledge covered in an undergraduate minor in computer science. A typical computer science minor includes:
- Introduction to programming and algorithm design and analysis
- Introduction to software systems (object-oriented programming language, operating system interface, building and maintaining large projects)
- Discrete structures (including trees and lists, graph algorithms, the relational data model, propositional and predicate logic)
- Data structures (structure and use of storage media, methods of representing structured data, and techniques for operating on data structures)
Chemistry-trained students may enter on a computer science fast track if necessary and receive graduate credit for one or more of the computer science courses that cover key topics. Programming, discrete structures, and data structures are especially important.
Recent Classes at Indiana University
I533 Molecular Informatics, the Data Grid, and an Introduction to e-Science presented essential topics for effectively integrating cheminformatics and bioinformatics techniques and databases into the research practices of academic and other research groups. Modules on interfacing with the Grid, database modeling and design, and targeted programming were among the key topics included. Material in the seminar was related to the projects underway in the NIH-funded Chemical Informatics and Cyberinfrastructure Collaboratory (ChemBioGrid), and students had an opportunity to participate in those projects.
I571 Chemical Information Technology. Chemical structure and data representation and search systems; chemical information and database systems: commercial chemical information databases, 2D and 3D representation and analysis, laboratory information management systems, electronic notebooks; chemical informatics: software development, artificial intelligence, high-throughput screening analysis, molecular modeling, industry-specific topics, research, web service technologies; relationship to bioinformatics, genomics, and proteomics.
I572 Computational Chemistry and Molecular Modeling. Experimental aspects and computer models of molecules and their behavior in gas and condensed phases; quantum and molecular mechanics; implicit and explicit solvation models; conformational analysis; geometry optimization methods; molecular dynamics and Monte Carlo simulations; de novo design techniques; quantitative structure-activity relationships (QSAR); pharmacophore modeling; comparative molecular field analysis (CoMFA); structure-based design; docking and scoring; molecular diversity and combinatorial libraries; molecular similarity; chemogenomics and systems sciences; practical aspects of molecular modeling: computable quantities, cost and efficiency, hardware, software, human aspects.
I573 Programming for Science Informatics is a highly practical course that covers the techniques used in the development of Chemical Informatics and related life sciences software, including programming with chemistry toolkits & libraries, domain-specific application of client-server systems, web services, high performance computing, and design of software for scientists. (Note that from 2007, this class will be known as I573 Programming Techniques for Chemical & Life Science Informatics.)
I590 Information Retrieval from Chemistry and Life Sciences Databases. Students will become well acquainted with the broad array of chemistry and life sciences databases and will be able to choose the most appropriate database(s) for a given question. They will be conversant with the STN Messenger command language, will have a good grasp of the search options available in the databases covered in the course, and will have a firm understanding of the structure of the most important databases.
Specific goals are: to learn the main vendors or providers of database services in chemistry and the life sciences, to become expert in using the top databases in the disciplines, to understand the advantages that database searching by command language has over searching by front-end software packages, to learn the sources of online or offline help that can be used for a quick overview of the full range of available databases in the respective disciplines, to learn the aspects of a chemical substance that lend themselves to coding and retrieval in a chemistry database, and to distinguish the added value that commercial versions of databases provide in comparison to free implementations of those databases.
I590 Scientific Applications of XML. XML is becoming the de facto language for data communication and interchange, and its acceptance is increased by the availability of a variety of tools for editing, validating, and transforming XML data. In the context of scientific data, XML is increasingly recognized as the backbone for integrating and sharing data produced by scientific instruments and software tools. Solutions include domain-specific markup languages such as GENEXML as well as standard markup languages including SBML, GAML, and AnIML. This course will take a practical, XML-centered approach to the issues of scientific data integration and sharing. In the first part of the course, we will study XML as well as the standards used to define the structure of XML documents (e.g., DTDs, XML Schema, RELAX NG) and to transform XML data (e.g., XSLT). Approaches for transforming database information into XML and vice versa will also be studied. In the second part, we will cover the structure and applications of selected scientific markup languages. In addition, we will examine case studies in which XML-based solutions are proposed to solve specific data integration problems in bioinformatics, laboratory informatics, and other domains. The course will be taught in a computer lab to facilitate hands-on application of data integration techniques and software tools.
I647 Seminar in Chemical Informatics I: Bridging Bioinformatics and Chemical Informatics explored areas where chemistry and biology touch with respect to the databases and computational tools used in chemoinformatics and bioinformatics. Material in the seminar was related to the projects underway in the NIH-funded Chemical Informatics and Cyberinfrastructure Collaboratory (ChemBioGrid), and students had an opportunity to participate in those projects.
C649 Service Architectures and Science: Tools and Technology for Computational Science. Computational science has emerged as the third leg of science, along with theory and experiment. Biology, chemistry, physics, economics, and medicine all have strong computational science research arms. This science often requires substantial collaboration between computer scientists and domain specialists. The domain scientist often has deep domain expertise, while the computer scientist brings expertise in issues like security, threading and concurrency, and distributed computing. The seminar looked at the key elements of contemporary, grid-based technology for computational science.


