smi23d - 3D Coordinate Generation


smi23d consists of two programs that convert one or more SMILES strings to 3D structures. The first program, smi2sdf, generates a set of rough 3D coordinates using an iterative refinement procedure. The second, mengine, optimizes these coordinates with the MMFF94 force field to produce a reasonable 3D structure. In addition to generating an optimized structure, the code can optionally calculate some molecular properties such as XlogP, dipole moment and vibrational properties. By default these are not calculated. The resulting coordinates are written in SD format.

Important Features

We have used these programs to convert 17M structures from PubChem to 3D. These structures are available in the Pub3D database. In addition, we have provided a web form where you can paste a set of SMILES strings and get back a set of 3D coordinates in SD format.

Updates

Downloads

The source code for the programs can be obtained from the CICC-Grid Sourceforge SVN repository by doing
svn co https://cicc-grid.svn.sourceforge.net/svnroot/cicc-grid/cicc-grid/smi23d/trunk smi23d
If you'd rather not download and compile the code, we've made a few binary builds available. Note that you'll also need to download the parameter files: mmxconst.prm and mmff94.prm.
Platform                                Binaries
Intel OS X                              smi2sdf, mengine
Linux, Fedora Core 5, Intel 64 bit      smi2sdf, mengine

Tobias Kind has kindly provided a version for Windows (compiled using Cygwin).

Compiling

The code is written in C and should not require any library beyond the standard C library. The build system uses SCons, a Python-based alternative to Makefiles. The sources are available from the CICC-Grid Sourceforge SVN repository. You can check out the smi23d trunk as follows:
svn co https://cicc-grid.svn.sourceforge.net/svnroot/cicc-grid/cicc-grid/smi23d/trunk smi23d
This will create a directory called smi23d. Change into the directory and then type
scons
This will compile the code and place the binary executables, along with the parameter files, into the build directory. Note that the binaries are built with no optimization: it has been observed that using -O2 on 32-bit Linux leads to binaries that give wrong results.

However, using the -ffast-math option with gcc does lead to significant speedups. Tobias Kind also reports a set of gcc parameters that leads to a 5x speedup for some test cases. Because that set employs the -O3 optimization level, it carries the same risk as other optimized builds and might give non-reproducible errors in some cases.

Usage

In general, one starts from a set of SMILES strings that are to be converted to an SD file containing 3D coordinates. The first step is to generate a set of rough 3D coordinates:
smi2sdf -o rough.sdf -p mmxconst.prm mymols.smi
You can choose any name for the output file. You should give the full path to the parameter file; if no path is specified, the program looks for the file in the current working directory. The program also generates a number of extra files, including an error log (error.log) and a SMILES file containing the entries from the input file that could not be processed because they are inorganic.

Once you have generated a set of rough coordinates, we use this file (in our example rough.sdf) as input to mengine to get the optimized 3D coordinates.

mengine -p mmff94.prm -c mmxconst.prm -o opt.sdf rough.sdf
This will print its output to standard output and place the optimized coordinates in opt.sdf. mengine can evaluate a number of molecular properties, including dipole moment, XlogP and various vibrational properties. By default these are not calculated. Run mengine with no arguments to get the help page, which describes how to evaluate each of these properties.
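
The two steps above can be chained from a small driver script. The following is a minimal Python sketch, not part of smi23d itself: the function names build_commands and run_pipeline are hypothetical, the command-line options are exactly those shown above, and it assumes both binaries are on the PATH and that both parameter files sit in one directory. It also assumes the programs report failure through a non-zero exit status, which this page does not confirm.

```python
import subprocess
from pathlib import Path


def build_commands(smiles_file, param_dir, rough_sdf="rough.sdf", opt_sdf="opt.sdf"):
    """Return the smi2sdf and mengine command lines as argument lists.

    Hypothetical helper: it only assembles the options documented above.
    """
    param_dir = Path(param_dir)
    mmx = str(param_dir / "mmxconst.prm")   # used by both programs
    mmff = str(param_dir / "mmff94.prm")    # used by mengine only
    smi2sdf_cmd = ["smi2sdf", "-o", rough_sdf, "-p", mmx, str(smiles_file)]
    mengine_cmd = ["mengine", "-p", mmff, "-c", mmx, "-o", opt_sdf, rough_sdf]
    return smi2sdf_cmd, mengine_cmd


def run_pipeline(smiles_file, param_dir):
    """Run smi2sdf, then mengine, stopping at the first failing step."""
    for cmd in build_commands(smiles_file, param_dir):
        # check=True raises CalledProcessError on a non-zero exit status
        # (assumption: the programs signal errors that way).
        subprocess.run(cmd, check=True)
```

Calling run_pipeline("mymols.smi", "/path/to/params") would then leave the optimized structures in opt.sdf in the current directory. Building the argument lists separately from running them keeps the command lines easy to inspect or log before anything is executed.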

If you don't want to download and compile the program, we provide a SOAP web service that converts SMILES strings to SDF (returned as a string). You can get the WSDL and documentation describing the arguments that the service requires. We have also made available a web page client (limited to 200 molecules at a time).

Benchmarks

Since the programs do not aim to generate a low energy conformer, comparison with other programs can be difficult. Since we do not have access to other coordinate generators, we welcome anybody who can perform such comparisons. We have done some timing benchmarks using molecules from PubChem ranging in size from 5 to 100 heavy atoms. You can view plots of time versus heavy atom count for smi2sdf and mengine.
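
To run a similar timing study on your own molecules, you need the heavy atom count of each structure. The sketch below is one way to obtain it from an SD file in Python. It is not the script used for the benchmarks above: the function names are hypothetical, and it assumes fixed-column V2000 records (atom count in the first three characters of the counts line, element symbol in columns 32 to 34 of each atom line) separated by "$$$$" lines.

```python
def _count_heavy(record):
    """Count non-hydrogen atoms in one V2000 record (a list of lines)."""
    # The fourth line is the counts line; characters 1-3 give the atom count.
    n_atoms = int(record[3][0:3])
    atom_lines = record[4:4 + n_atoms]
    # The element symbol sits in columns 32-34 of each atom line.
    return sum(1 for line in atom_lines if line[31:34].strip() != "H")


def heavy_atom_counts(sdf_text):
    """Return a list with the heavy atom count of each record in an SD file."""
    counts = []
    record = []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":       # end of the current record
            if len(record) > 3:          # skip empty or truncated records
                counts.append(_count_heavy(record))
            record = []
        else:
            record.append(line)
    return counts
```

For example, heavy_atom_counts(open("opt.sdf").read()) gives one count per molecule, which can be paired with per-molecule run times measured separately.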

License

The software is available under the Apache 2.0 license.

Acknowledgements

Thanks to Kevin Gilbert, who wrote the code. Rajarshi Guha updated the code to handle command-line arguments and SDF format details.