Extract all atoms from a SMILES file

I am working with .smiles files. The file structure is described at http://en.wikipedia.org/wiki/Chemical_file_format#SMILES

I want to get all atoms from a SMILES file. That is, if there is a single "C" atom, four "H" atoms are bonded to it.

While searching, I found some Python modules that can parse the SMILES format, but they do not return the attached hydrogen atoms (e.g., they give only "C" and not the four "H" atoms bonded to that C atom).

How can I find all atoms, including the bonded H atoms, using Python?
An example of a SMILES string that needs to be converted to all atoms, including the bonded H atoms:

[H]OC([H])([H])[C@@]1([H])C([H])=C([H])[C@@]([H])(n2c([H])nc3c(nc(nc23)N([H])[H])N([H])C2([H])C([H])([H])C2([H])[H])C1([H])[H]

      

Thanks in advance.

+3




5 answers


See Open Babel.

Useful links on the Open Babel site



See also this blog by Kasper Steinmann on chemistry with Python (some posts use Open Babel, not all).

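Update: see this code (untested):

from openbabel import pybel  # Open Babel 3.x; with 2.x use "import pybel"

# Adjacent string literals are concatenated; the final "[H]" was missing its "]".
mymol = pybel.readstring("smi",
    "[H]OC([H])([H])[C@@]1([H])C([H])=C([H])[C@@]([H])(n2c([H])nc3c(nc(nc23)"
    "N([H])[H])N([H])C2([H])C([H])([H])C2([H])[H])C1([H])[H]")
mymol.addh()  # adds hydrogens in place and returns None, so print afterwards
print(mymol.formula)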
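Because the example string in the question already writes every hydrogen as an explicit [H] atom, a rough library-free tally is also possible. The sketch below is only an illustration, not a SMILES parser: the function name count_written_atoms and both regular expressions are made up for this example, and it counts only atoms that are actually written in the string (bracket atoms, their in-bracket H counts, and unbracketed organic-subset symbols). It does not add implicit hydrogens to unbracketed atoms, so a toolkit is still needed for general input.

```python
import re
from collections import Counter

# One token per written atom: a bracket atom such as [H] or [C@@],
# or an unbracketed organic-subset symbol (two-letter symbols first).
ATOM_TOKEN = re.compile(r"\[[^\]]*\]|Br|Cl|[BCNOPSFI]|[bcnops]")

# Start of a bracket atom: optional isotope, element symbol,
# optional @ / @@ chirality, optional hydrogen count (H, H2, ...).
BRACKET_ATOM = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]{1,2})@{0,2}(?:H(\d*))?")


def count_written_atoms(smiles):
    """Count the atoms that are explicitly written in a SMILES string."""
    counts = Counter()
    for token in ATOM_TOKEN.findall(smiles):
        if token.startswith("["):
            match = BRACKET_ATOM.match(token)
            if match is None:
                continue  # e.g. a wildcard [*]; skipped in this sketch
            symbol, h_count = match.group(1), match.group(2)
            counts[symbol.capitalize()] += 1
            if h_count is not None:
                # "H" alone means one hydrogen, "H3" means three
                counts["H"] += int(h_count or 1)
        else:
            # aromatic lower-case symbols count as the same element
            counts[token.capitalize()] += 1
    return counts


example = ("[H]OC([H])([H])[C@@]1([H])C([H])=C([H])[C@@]([H])(n2c([H])nc3c(nc(nc23)"
           "N([H])[H])N([H])C2([H])C([H])([H])C2([H])[H])C1([H])[H]")
print(dict(count_written_atoms(example)))
```

For the string from the question this tallies 14 C, 18 H, 6 N and 1 O.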

      

+6




Quoting the question: "I want to get all atoms from a SMILES file. That is, if there is a single 'C' atom, four 'H' atoms are bonded to it."

This assumption is incorrect: depending on its other bonds, a carbon atom may carry only 1, 2, or 3 hydrogen atoms (or none at all).

Try Open Babel, CDK, or a similar cheminformatics library.
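To make the valence point concrete, here is a minimal sketch of the usual rule for unbracketed, non-aromatic atoms in SMILES: the implicit hydrogen count is the smallest normal valence that accommodates the atom's bonds, minus the sum of its bond orders. The table and the function name implicit_hydrogens are illustrative assumptions, not part of any library, and aromatic atoms and bracket atoms follow different rules that are not covered here.

```python
# Normal valences of the SMILES "organic subset" (aliphatic atoms only).
NORMAL_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
    "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}


def implicit_hydrogens(symbol, bond_order_sum):
    """Implicit H count for an unbracketed aliphatic atom.

    bond_order_sum is the sum of the orders of the bonds written
    explicitly for this atom (single = 1, double = 2, triple = 3).
    """
    for valence in NORMAL_VALENCES[symbol]:
        if valence >= bond_order_sum:
            return valence - bond_order_sum
    return 0  # more bonds than any normal valence allows: no implicit H


print(implicit_hydrogens("C", 0))  # lone "C" (methane): 4
print(implicit_hydrogens("C", 1))  # each carbon in "CC" (ethane): 3
print(implicit_hydrogens("C", 2))  # each carbon in "C=C" (ethene): 2
print(implicit_hydrogens("C", 4))  # carbon in "O=C=O": 0
```

So "four hydrogens per carbon" holds only for a carbon with no other bonds.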



But why do you need all the atoms from the file?

+3




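To get the molecular weight of a compound specified as SMILES, the Python Open Babel bindings should be enough:

from openbabel import pybel  # Open Babel 3.x; with 2.x use "import pybel"

# readfile returns an iterator over molecules; take the first one
mol = next(pybel.readfile("smi", "stuff.smi"))
print(mol.molwt)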

      

+3




Try frowns, a chemoinformatics toolkit designed for quickly developing chemistry-related algorithms. It is written almost entirely in Python, with a small portion in C++.

+2




RDKit is a well-established cheminformatics library for Python.

To read a molecule from SMILES:

from rdkit import Chem

m = Chem.MolFromSmiles('[H]OC([H])([H])[C@@]1([H])C([H])=C([H])[C@@]([H])(n2c([H])nc3c(nc(nc23)N([H])[H])N([H])C2([H])C([H])([H])C2([H])[H])C1([H])[H]')
# MolFromSmiles drops explicit hydrogens by default; AddHs returns a copy with them as real atoms
mh = Chem.AddHs(m)

      

After reading the SMILES into an RDKit molecule, you can do pretty much everything. See the documentation for more details.

0

