Extract all molecules from the SMILES file

Question

I am working with .smiles files. The SMILES file format is described here: http://en.wikipedia.org/wiki/Chemical_file_format#SMILES

I want to get all atoms from a SMILES file. That means that if there is one "C" atom, the 4 "H" atoms connected to it should be listed as well.

While searching I found that some Python modules can parse the SMILES format, but they do not return the attached hydrogen atoms (e.g. they give only "C" and not the 4 "H" atoms associated with that C atom).

How can I find all atoms, including the bound H atoms, using Python?
An example of a SMILES string that needs to be converted to all atoms, including bound H atoms:

[H]OC([H])([H])[C@@]1([H])C([H])=C([H])[C@@]([H])(n2c([H])nc3c(nc(nc23)N([H])[H])N([H])C2([H])C([H])([H])C2([H])[H])C1([H])[H]

Thanks in advance.

python bioinformatics biopython cheminformatics

sam 12 Feb At 5:59 am

5 answers

pradyunsg · Answer 1 · 2013-02-12T09:38:42+0000

See Open Babel.

The Open Babel site has useful links.

See also this blog by Kasper Steinmann on chemistry with Python (it uses Open Babel, though not for everything).

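Update: see this code (untested):

from openbabel import pybel  # Open Babel 3.x; with 2.x use "import pybel"

mymol = pybel.readstring("smi",
    "[H]OC([H])([H])[C@@]1([H])C([H])=C([H])[C@@]([H])(n2c([H])nc3c(nc(nc23)"
    "N([H])[H])N([H])C2([H])C([H])([H])C2([H])[H])C1([H])[H]")
mymol.addh()  # adds hydrogens in place and returns None
print([atom.atomicnum for atom in mymol.atoms])  # 1 = H, 6 = C, 7 = N, 8 = O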

chupvl · Answer 2 · 2013-02-12T07:07:14+0000

"I want to get all atoms from a smile file. This means that if there is one "C" atom, it means that 4 "H" atoms will be connected to them."

This assumption is incorrect: depending on its other bonds, a carbon atom may carry fewer hydrogens, for example 1, 2 or 3.
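To make that concrete, here is a minimal sketch of the usual valence bookkeeping: the number of hydrogens on a neutral atom is its default valence minus the bond orders it already uses. The names `DEFAULT_VALENCE` and `implicit_hydrogens` are made up for this illustration, and charges, aromatic atoms and higher valences are deliberately ignored, so a real toolkit should still be preferred.

```python
# Default valences for a few neutral atoms; a simplification that ignores
# charges, aromaticity and atoms with several allowed valences.
DEFAULT_VALENCE = {"C": 4, "N": 3, "O": 2}


def implicit_hydrogens(symbol, bond_orders):
    """Hydrogens needed to fill the default valence of `symbol`,
    given the orders of its bonds to non-hydrogen neighbours."""
    missing = DEFAULT_VALENCE[symbol] - sum(bond_orders)
    return max(missing, 0)


print(implicit_hydrogens("C", []))      # carbon in methane
print(implicit_hydrogens("C", [1]))     # CH3 carbon in ethane
print(implicit_hydrogens("C", [2]))     # CH2 carbon in ethene
print(implicit_hydrogens("C", [2, 2]))  # carbon in CO2
```

So only an isolated carbon such as the one in methane gets 4 hydrogens; every bond to another heavy atom reduces the count.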

Try openbabel, CDK or similar library for cheminformatics.

But why do you need all the atoms from the file?

Klaus-Dieter Warzecha · Answer 3 · 2013-06-12T07:24:01+0000

To get the molecular weight of a compound given as SMILES, the Python Open Babel bindings should be enough:

from openbabel import pybel  # Open Babel 3.x; with 2.x use "import pybel"

mol = next(pybel.readfile("smi", "stuff.smi"))  # first molecule in the file
print(mol.molwt)
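As a cross-check that needs no toolkit: the SMILES string in the question already writes every hydrogen as its own [H] atom, so the atoms can be tallied with a regular expression. This is only a sketch under that assumption. It does not handle hydrogen counts inside brackets such as [NH2], the names `count_atoms` and `molecular_weight` are made up for the illustration, and the atomic weights are rounded standard values.

```python
import re
from collections import Counter

# Rounded standard atomic weights; only the elements in the example are listed.
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}

# Either a bracket atom such as [H] or [C@@], or a bare atom from the
# SMILES organic subset (aromatic atoms are written in lowercase).
ATOM_TOKEN = re.compile(r"\[([^\]]+)\]|(Cl|Br|[BCNOPSFI]|[bcnops])")

SMILES = ("[H]OC([H])([H])[C@@]1([H])C([H])=C([H])[C@@]([H])(n2c([H])nc3c(nc(nc23)"
          "N([H])[H])N([H])C2([H])C([H])([H])C2([H])[H])C1([H])[H]")


def count_atoms(smiles):
    """Count element symbols, assuming all hydrogens are explicit [H] atoms."""
    counts = Counter()
    for match in ATOM_TOKEN.finditer(smiles):
        if match.group(1) is not None:
            # Inside brackets: skip isotope digits, keep the element symbol,
            # ignore chirality marks and charges.
            symbol = re.match(r"\d*([A-Za-z][a-z]?)", match.group(1)).group(1)
        else:
            symbol = match.group(2)
        counts[symbol.capitalize()] += 1
    return counts


def molecular_weight(counts):
    """Sum the atomic weights of the counted atoms."""
    return sum(ATOMIC_WEIGHT[symbol] * n for symbol, n in counts.items())


counts = count_atoms(SMILES)
print(dict(counts))
print(round(molecular_weight(counts), 2))
```

For SMILES with implicit hydrogens this shortcut undercounts, which is exactly where Open Babel or a similar library is needed.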

BioGeek · Answer 4 · 2013-02-12T09:05:39+0000

Try frowns, a chemoinformatics toolkit designed for quickly developing chemistry-related algorithms. It is written almost entirely in Python, with a small portion written in C++.

Jayaram · Answer 5 · 2014-02-11T15:19:31+0000

RDKit is a well-established cheminformatics library for Python.

To read a molecule from SMILES:

from rdkit import Chem

m = Chem.MolFromSmiles('[H]OC([H])([H])[C@@]1([H])C([H])=C([H])[C@@]([H])(n2c([H])nc3c(nc(nc23)N([H])[H])N([H])C2([H])C([H])([H])C2([H])[H])C1([H])[H]')

After reading the SMILES into an RDKit molecule, you can do pretty much everything. See the documentation for more details.
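For the hydrogens asked about in the question, a short sketch along these lines should work, assuming RDKit is installed. RDKit keeps hydrogens implicit after parsing, and `Chem.AddHs` turns them into real atoms. The helper name `atoms_with_hydrogens` is made up for this example.

```python
from rdkit import Chem


def atoms_with_hydrogens(smiles):
    """Return the element symbol of every atom, hydrogens included."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError("could not parse SMILES: %r" % smiles)
    mol = Chem.AddHs(mol)  # make the implicit hydrogens explicit atoms
    return [atom.GetSymbol() for atom in mol.GetAtoms()]


print(atoms_with_hydrogens("C"))  # methane: one carbon plus its hydrogens
```

The same function can be called with the long SMILES string from the question.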
