Extract all molecules from a SMILES file
I am working with .smiles files. The file format is described at http://en.wikipedia.org/wiki/Chemical_file_format#SMILES
I want to get all atoms from a SMILES file, including hydrogens. For example, if the file contains a single "C" atom, then 4 "H" atoms are bonded to it.
While searching, I found several Python modules that can parse the SMILES format, but they do not report the attached hydrogen atoms (e.g. they return only "C", not the 4 "H" atoms bonded to that C atom).
How can I find all atoms, including the bonded H atoms, using Python?
An example of a SMILES string that needs to be expanded to all atoms, including bonded H atoms:
[H]OC([H])([H])[C@@]1([H])C([H])=C([H])[C@@]([H])(n2c([H])nc3c(nc(nc23)N([H])[H])N([H])C2([H])C([H])([H])C2([H])[H])C1([H])[H]
Thanks in advance.
See Open Babel.
Useful links on the Open Babel site:
- "Guides"
- Capabilities
- Using Python with Open Babel
- Other tools, including obprop
See also:
Kasper Steinmann's blog on chemistry with Python (some posts, though not all, use Open Babel)
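Update: see this code (untested):
from openbabel import pybel  # Open Babel 3.x; on 2.x use "import pybel"

mymol = pybel.readstring("smi",
    "[H]OC([H])([H])[C@@]1([H])C([H])=C([H])[C@@]([H])(n2c([H])nc3c(nc(nc23)"
    "N([H])[H])N([H])C2([H])C([H])([H])C2([H])[H])C1([H])[H]")
mymol.addh()  # adds missing hydrogens in place; it returns None, so do not print it
for atom in mymol.atoms:  # hydrogens are now ordinary atoms in the list
    print(atom.atomicnum)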
"I want to get all atoms from a smile file. This means that if there is one "C" atom, it means that 4 "H" atoms will be connected to them."
This assumption is incorrect: a carbon atom can carry 1, 2, or 3 hydrogen atoms (or none), depending on its other bonds.
Try Open Babel, CDK, or a similar cheminformatics library.
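To illustrate why the hydrogen count varies, here is a minimal sketch of the usual SMILES rule for unbracketed, non-aromatic atoms: hydrogens fill the atom up to the lowest default valence that is at least the sum of its explicit bond orders. The function name `implicit_hydrogens` and the table name are made up for this example, and the sketch deliberately ignores aromatic atoms, charges, and bracket atoms, which a real toolkit handles for you.

```python
# Default valences of the SMILES "organic subset" (unbracketed atoms).
# Hypothetical helper table for illustration only.
DEFAULT_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,),
    "P": (3, 5), "S": (2, 4, 6),
    "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}


def implicit_hydrogens(element, bond_order_sum):
    """Return the implied H count for a non-aromatic organic-subset atom.

    bond_order_sum is the sum of the orders of the bonds written explicitly
    in the SMILES string (single = 1, double = 2, triple = 3).
    """
    for valence in DEFAULT_VALENCES[element]:
        if valence >= bond_order_sum:
            return valence - bond_order_sum
    # More bonds than any default valence allows: no hydrogens are implied.
    return 0


if __name__ == "__main__":
    # A carbon with 0, 1, 2, 3 or 4 explicit bonds carries 4, 3, 2, 1 or 0 H.
    for bonds in range(5):
        print("C with", bonds, "bond(s):", implicit_hydrogens("C", bonds), "H")
```

So the "C" in methane (`C`) gets four hydrogens, each carbon in ethane (`CC`) gets three, and each carbon in ethene (`C=C`) gets two.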
But why do you need all the atoms from the file?
RDKit is a well-established cheminformatics library for Python.
To read a molecule from a SMILES string:
from rdkit import Chem
m = Chem.MolFromSmiles('[H]OC([H])([H])[C@@]1([H])C([H])=C([H])[C@@]([H])(n2c([H])nc3c(nc(nc23)N([H])[H])N([H])C2([H])C([H])([H])C2([H])[H])C1([H])[H]')
After reading the SMILES into an RDKit molecule, you can do pretty much anything with it. See the documentation for more details.
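For the original question, a sketch along these lines should list every atom, hydrogens included. It assumes RDKit is installed; `atom_counts` is a name invented for this example. By default `Chem.MolFromSmiles` folds hydrogens into their heavy atoms, so `Chem.AddHs` is used to turn them back into real atoms in the graph.

```python
from collections import Counter


def atom_counts(smiles):
    """Count atoms per element, including hydrogens, for a SMILES string."""
    # Imported here so the module still loads where RDKit is not installed.
    from rdkit import Chem

    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        # MolFromSmiles returns None when the string cannot be parsed.
        raise ValueError("could not parse SMILES: %r" % smiles)
    mol_with_h = Chem.AddHs(mol)  # make every hydrogen an explicit atom
    return Counter(atom.GetSymbol() for atom in mol_with_h.GetAtoms())
```

For example, `atom_counts("CCO")` (ethanol) gives 2 C, 6 H, and 1 O. To walk over the individual atoms instead of counting them, iterate over `mol_with_h.GetAtoms()` and use `atom.GetNeighbors()` to see which heavy atom each hydrogen is bonded to.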