David Wild Update June 2006
From Chemical Informatics and Cyberinfrastructure Collaboratory
Web Service Infrastructure
- New services
- DTP Database Screen Search
- Protein Database
- ToxTree
- New VOTABLES services (TAB->VOTABLE, VOTABLE->TAB)
- Distributed Drug Discovery in progress
- Decision to use tab files (for the moment)
- Mk.1 Tab File, Mk.2 VOTABLES, Mk.3 CML?
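As a rough illustration of the TAB->VOTABLE and VOTABLE->TAB conversions listed above, the sketch below turns tab-delimited text into a minimal VOTable document and back. It is not the code behind the services: the function names `tab_to_votable` and `votable_to_tab` are hypothetical, the first row is assumed to hold column names, and every column is assumed to be a variable-length string.

```python
import csv
import io
import xml.etree.ElementTree as ET


def tab_to_votable(tab_text: str) -> str:
    """Convert tab-delimited text (first row = column names) to a minimal VOTable."""
    rows = list(csv.reader(io.StringIO(tab_text), delimiter="\t"))
    header, records = rows[0], rows[1:]

    votable = ET.Element("VOTABLE", version="1.1")
    resource = ET.SubElement(votable, "RESOURCE")
    table = ET.SubElement(resource, "TABLE")
    for name in header:
        # Assumption: every column is treated as a variable-length string.
        ET.SubElement(table, "FIELD", name=name, datatype="char", arraysize="*")

    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for record in records:
        tr = ET.SubElement(tabledata, "TR")
        for value in record:
            ET.SubElement(tr, "TD").text = value
    return ET.tostring(votable, encoding="unicode")


def votable_to_tab(votable_xml: str) -> str:
    """Convert a VOTable produced by tab_to_votable back to tab-delimited text."""
    root = ET.fromstring(votable_xml)
    header = [field.get("name") for field in root.iter("FIELD")]
    lines = ["\t".join(header)]
    for tr in root.iter("TR"):
        # Empty cells come back as None from ElementTree, so map them to "".
        lines.append("\t".join(td.text or "" for td in tr.iter("TD")))
    return "\n".join(lines) + "\n"
```

For example, `votable_to_tab(tab_to_votable("id\tsmiles\n1\tCCO\n"))` gives back the original tab text. A real converter would also need to infer numeric datatypes and handle XML namespaces, which this sketch ignores.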
Workflows
- Implemented key HTS Data Analysis workflow (Workflow 2):
SCREEN SEARCH -> FILTER -> (TOXTREE) -> DIVKMEANS -> VOTABLES -> VOPLOT
- Taverna 1.4 now available including visual workflow builder
- Pilot project with Plale/Gannon to illustrate how our Taverna framework can be easily integrated with other environments
- See Workflows for more information
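The linear shape of Workflow 2, with its optional TOXTREE step, can be sketched as a list of stages applied in order. This is only a toy model of the idea, not the Taverna workflow itself: `run_workflow`, `activity_filter`, `annotate` and `build_stages` are hypothetical names, the records are made-up dictionaries, and the annotation rule is a placeholder rather than a toxicity prediction.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Stage = Callable[[List[Record]], List[Record]]


def run_workflow(records: List[Record], stages: List[Stage]) -> List[Record]:
    """Pass the records through each stage in order, like one linear workflow."""
    for stage in stages:
        records = stage(records)
    return records


def activity_filter(threshold: float) -> Stage:
    """Hypothetical FILTER stage: keep records at or above an activity threshold."""
    def stage(records: List[Record]) -> List[Record]:
        return [r for r in records if r["activity"] >= threshold]
    return stage


def annotate(key: str, func: Callable[[Record], object]) -> Stage:
    """Hypothetical annotation stage, standing in for an optional step."""
    def stage(records: List[Record]) -> List[Record]:
        return [{**r, key: func(r)} for r in records]
    return stage


def build_stages(threshold: float, include_annotation: bool) -> List[Stage]:
    """Assemble the stage list; the optional stage is added only on request."""
    stages = [activity_filter(threshold)]
    if include_annotation:
        # Placeholder rule only; it is NOT a real toxicity prediction.
        stages.append(annotate("flag", lambda r: len(r["smiles"]) > 5))
    return stages
```

Calling `run_workflow(hits, build_stages(0.5, True))` on a list of records with `smiles` and `activity` keys runs the filter followed by the optional annotation. Clustering, table conversion and plotting stages would be appended to the list in the same way.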
Data Mining of DTP Database
- Major review of current work, and identification of areas of opportunity
- Characterization of database in terms of diversity, compound profiles, similarity to HTS datasets
- Collaboration with Faming Zhang
Visualization and Interaction Tools
- First portlet application in Gridsphere - BCI clustering portlet
- Developments to PubChemSR - similarity searching
- Usability experiment on PubChem/Chmoogle/ChemDB completed. Developing a wider package of usability experiments (outside grant...)
Methods
- Demonstrated ability to cluster PubChem in 5-6 hours on AVIDD (20 procs)
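For readers unfamiliar with divisive k-means, the general idea can be sketched as follows: start with one cluster and repeatedly split the largest cluster in two with 2-means until the requested number of clusters is reached. This is a generic, single-process sketch on plain numeric points, not the DivKmeans implementation used here and not the parallel AVIDD setup; the function names and the "split the largest cluster" rule are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple

Point = Sequence[float]


def _dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def _two_means(points: List[Point], rng: random.Random,
               iterations: int = 20) -> Tuple[List[Point], List[Point]]:
    """Split points into two groups with plain 2-means (Lloyd iterations)."""
    centers = rng.sample(points, 2)
    groups: Tuple[List[Point], List[Point]] = ([], [])
    for _ in range(iterations):
        groups = ([], [])
        for p in points:
            nearest = 0 if _dist2(p, centers[0]) <= _dist2(p, centers[1]) else 1
            groups[nearest].append(p)
        if not groups[0] or not groups[1]:
            break  # degenerate split, e.g. duplicate starting points
        centers = [_mean(groups[0]), _mean(groups[1])]
    return groups


def divisive_kmeans(points: List[Point], k: int, seed: int = 0) -> List[List[Point]]:
    """Top-down clustering: keep bisecting the largest cluster until k clusters exist."""
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        clusters.sort(key=len)
        largest = clusters.pop()
        if len(largest) < 2:
            clusters.append(largest)  # nothing left to split
            break
        left, right = _two_means(largest, rng)
        if not left or not right:
            # Fallback for degenerate splits: cut the cluster in half by position.
            half = len(largest) // 2
            left, right = largest[:half], largest[half:]
        clusters.extend([left, right])
    return clusters
```

Chemical clustering would normally work on fingerprints with a similarity measure such as Tanimoto rather than Euclidean distance on coordinates; the Euclidean version is used here only to keep the sketch short.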
Outreach
- Began collaborating with Michigan MACE
- Wrapping around PDBBind Dataset
- Mookie - Heather Carlson
- Late Summer Collaboration Meeting
- See Indiana Michigan Collaboration for more information
- Potential connection with OpenEye
- Potential connection with Jake Chen
- ACS presentation accepted for September
- Microsoft presentation September/October
- Publication (Wild, Wiggins) in Drug Discovery Today, May 2006, on chemoinformatics education (in addition to JCIM article)
- Publication (Wild, Wu, Zaharevitz) on DTP Data Mining, submit July
- Publication on Web Service Infrastructure, submit July
Six Month Outlook
- Summary so far
- Implemented 12 chemoinformatics web services (each service can provide multiple functions)
- Implemented 2 drug discovery related workflows using these services
- Several prototype visualization / interaction tools
- Started mining the DTP Tumor Cell Line Set
- Demonstrated feasibility of clustering entire PubChem dataset
- Began collaborations with PMR group, Michigan MACE, Faming Zhang, DTP
- Posters at spring ACS, presentations at fall ACS and Microsoft, 2 papers in preparation
- Must have
- Scientific successes using workflows
- Faming Zhang Kinase collaboration
- DTP data mining with PDB
- Analysis of MLSCN data in PubChem (Workflow 2)
- Expanded Workflow 2
- Add in 2D structure viewer to VOPlot
- ToxTree
- Any QSAR models available
- PDBBind
- 2 more workflows
- A non-trivial portlet interface talking to multiple workflow streams
- OSCAR success with PMR collaboration (Matt Stahl concurs)
- Publications on DTP Data Mining and Web Service Infrastructure
- Some success with Michigan collaboration (PDBBind, Workshop, Chemoinformatics Course)
- Nice to have
- Execution environment for Taverna, including ability to wrap workflows as web services and possibly BPEL support
- R statistical and QSAR services
- Something with a .NET interface
- Scientific Workflow using reaction database from Distributed Drug Discovery
- Publication on DivKmeans
Points for discussion
- Use of OEChem and potential for NIH buyout
- Joint workshop with MACE
