smi23d consists of two programs that can be used to convert one or more SMILES
strings to 3D. The first step uses a program called
smi2sdf, which generates a
set of rough 3D coordinates using an iterative refinement procedure. These coordinates
are then optimized with an MMFF94 force field by
mengine to generate a reasonable 3D structure.
In addition to generating an optimized structure, the code can optionally calculate some
molecular properties such as XlogP, dipole moment and vibrational properties. By default
these are not calculated. The resultant coordinates are written in SD format.
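Since the output is an SD file, a quick sanity check is to count how many records it contains and compare that with the number of input SMILES. The sketch below relies only on the SD format convention that each record ends with a line containing `$$$$`; the function name is ours and is not part of smi23d.

```python
def count_sd_records(text: str) -> int:
    """Count the records in SD-format text.

    Each record in an SD file is terminated by a line holding only '$$$$'.
    """
    return sum(1 for line in text.splitlines() if line.strip() == "$$$$")
```

For example, `count_sd_records(open("opt.sdf").read())` gives the number of structures that made it through both steps.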
Important Features
- The programs do not aim to identify the lowest energy (or even a low energy)
conformer. The output of this procedure is simply a reasonable 3D structure for a given molecule,
which can be used as a starting point for other investigations.
- If your molecules are polycyclic aromatics, it is a good idea to provide kekulized SMILES,
rather than aromatic SMILES.
We have used these programs to convert 17M structures from PubChem to 3D. These
structures are available in the Pub3D
database. In addition we have provided a
web form where you can paste a set of SMILES and get
back a set of 3D coordinates in SD format.
Downloads
The source code for the programs can be obtained from the CICC-Grid Sourceforge
SVN repository by doing
svn co https://cicc-grid.svn.sourceforge.net/svnroot/cicc-grid/cicc-grid/smi23d/trunk smi23d
If you'd rather not download and compile the code, we've made a few binary builds available.
Note that you'll also need to download the parameter files:
mmxconst.prm and
mmff94.prm.
Tobias Kind has kindly provided
a version
for Windows (compiled using Cygwin).
Compiling
The code is written in C and should not require any library beyond the standard C library. The build
system uses
SCons, a Python-based alternative to Makefiles.
The sources are available from the CICC-Grid Sourceforge SVN repository. You can check out
the smi23d trunk by doing the following
svn co https://cicc-grid.svn.sourceforge.net/svnroot/cicc-grid/cicc-grid/smi23d/trunk smi23d
This will create a directory called
smi23d. Change into the directory and then type
scons
This will compile the code and place the binary executables along with the parameter files
into the
build directory. Note that the binaries are built with no optimization: using -O2
on 32-bit Linux has been observed to produce binaries that give wrong results.
However, using the -ffast-math option to gcc does lead to significant speedups. Tobias
Kind also
reports a set of gcc parameters that leads to a 5x speedup for some test cases. Because that
set employs the -O3 optimization level, it might, as with -O2 above, give non-reproducible
errors in some cases.
Usage
In general one would start from a set of SMILES strings which are to be converted to
an SD file containing 3D coordinates. The first step is to generate a set of rough
3D coordinates
smi2sdf -o rough.sdf -p mmxconst.prm mymols.smi
You can give the output file any name. The -p option takes the full path
to the parameter file; if it is not specified, the program looks for the file in the current working directory. The
program will generate a number of extra files, including an error log (
error.log) and a SMILES file
containing the entries from the input file that could not be processed due to being inorganic.
Once you have generated a set of rough coordinates, use this file (in our example rough.sdf)
as input to mengine to get the optimized 3D coordinates.
mengine -p mmff94.prm -c mmxconst.prm -o opt.sdf rough.sdf
This will print output to STDOUT and place the optimized coordinates into
opt.sdf. mengine
can evaluate a number of molecular properties, including dipole moment, XlogP and various
vibrational properties. By default these are not calculated. Run mengine with no arguments to get the
help page which describes how to evaluate each of these properties.
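If you want to drive the two steps from a script, something like the following sketch may help. It only assembles the two command lines shown above and runs them in order. It assumes that both binaries are on your PATH and that they return a nonzero exit status on failure, neither of which is guaranteed here; the function names are ours and are not part of smi23d.

```python
import subprocess
from pathlib import Path


def smi2sdf_cmd(smiles_file, rough_sdf, mmx_prm):
    """Build the smi2sdf command line (rough 3D coordinates)."""
    return ["smi2sdf", "-o", str(rough_sdf), "-p", str(mmx_prm), str(smiles_file)]


def mengine_cmd(rough_sdf, opt_sdf, mmff_prm, mmx_prm):
    """Build the mengine command line (MMFF94 optimization)."""
    return ["mengine", "-p", str(mmff_prm), "-c", str(mmx_prm),
            "-o", str(opt_sdf), str(rough_sdf)]


def convert(smiles_file, opt_sdf, param_dir, rough_sdf="rough.sdf"):
    """Run both steps; param_dir must hold mmxconst.prm and mmff94.prm."""
    # Use absolute paths, since the programs expect the full path
    # to the parameter files.
    param_dir = Path(param_dir).resolve()
    mmx = param_dir / "mmxconst.prm"
    mmff = param_dir / "mmff94.prm"
    # check=True raises CalledProcessError on a nonzero exit status
    # (assumption: the binaries signal failure that way).
    subprocess.run(smi2sdf_cmd(smiles_file, rough_sdf, mmx), check=True)
    subprocess.run(mengine_cmd(rough_sdf, opt_sdf, mmff, mmx), check=True)
```

A call such as `convert("mymols.smi", "opt.sdf", "build")` would then correspond to the two commands in this section.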
In case you don't want to download and compile the program, we provide a SOAP web service that can be
used to convert SMILES strings to SDF (returned as a string). You can get the
WSDL and
documentation describing the arguments that the service requires. We have also made available a web page
client (limited to 200 molecules at a time).
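Because the web page client accepts at most 200 molecules at a time, a larger set has to be split before pasting it in. A minimal sketch of that splitting step follows; the helper name is ours, and it does not talk to the service itself.

```python
def batches(smiles, size=200):
    """Yield consecutive slices of at most `size` SMILES strings."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(smiles), size):
        yield smiles[start:start + size]
```

Each yielded list can be joined with newlines and submitted as one request.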
Benchmarks
Since the programs do not aim to generate a low energy conformer, comparison with other programs
can be difficult. Since we do not have access to other coordinate generators, we welcome anybody
who can perform such comparisons. We have done some timing benchmarks using molecules
from
PubChem ranging in size from 5 to 100
heavy atoms. You can view plots of time versus heavy atom count for
smi2sdf
and
mengine.
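If you want to bin your own timings by heavy atom count, the count can be read from each record of the SD file. The sketch below assumes V2000 molfile records (atom count in the first three columns of the fourth line, element symbol in columns 32 to 34 of each atom line) and treats every atom other than hydrogen as heavy. The function name is ours and is not part of smi23d.

```python
def heavy_atom_count(record: str) -> int:
    """Count non-hydrogen atoms in a single V2000 molfile record."""
    lines = record.splitlines()
    # Line 4 is the counts line; its first three columns hold the atom count.
    n_atoms = int(lines[3][0:3])
    # Atom lines follow; the element symbol sits in columns 32-34.
    symbols = (line[31:34].strip() for line in lines[4:4 + n_atoms])
    return sum(1 for symbol in symbols if symbol not in ("H", "D", "T"))
```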
License
The software is available under the Apache 2.0 license. You can get a copy of the license
here.