How to make an organic chemistry retrosynthesis problem generator in Python?

I'm trying to learn Python by creating a simple program that generates a typical practice problem that organic chemistry students face on exams: the retrosynthesis question.

For those unfamiliar with this type of question: the student is given the starting material and the final product of a series of chemical reactions, then asked to determine which reagents/reactions were applied to the starting material to obtain the final product.

Sometimes you are only given the final product and asked to list the reactions required to synthesize it, subject to some constraints (e.g. start only from compounds containing five or fewer carbon atoms, use only alcohols, etc.).

I've done some research so far, and I think RDKit with Python is a good place to start. My plan is to use the SMILES format to read molecules (since I can manipulate it like a string) and then define a function for each reaction. Eventually I will need a database of chemical species that the program can randomly select from (for the starting materials in each problem). The program then selects a random species from the database, applies a number of reactions to it (3-5, specified by the user), and displays the final product. The user then solves the problem on their own, and the program shows the path (using images of the intermediates and printing the reagents used to obtain them). That's basically it.
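A minimal sketch of the generation loop described above, assuming RDKit is installed. Instead of one Python function per reaction, it stores reactions as data (a reagent label plus a reaction SMARTS) and walks forward from a random starting material for a user-chosen number of steps; the recorded path doubles as the answer key. The starting materials, the templates, and the names STARTING_MATERIALS, REACTIONS, run_step and generate_problem are hypothetical and chemically simplified, and drawing the intermediates (e.g. with RDKit's Draw module) is left out.

```python
import random

from rdkit import Chem
from rdkit.Chem import AllChem

# Hypothetical toy data: a few primary alkyl bromides as starting materials.
STARTING_MATERIALS = ["CCCBr", "CCCCBr", "CC(C)CBr"]

# Reactions stored as (reagent label, reaction SMARTS) pairs instead of one
# Python function per reaction. Templates are simplified for illustration.
REACTIONS = [
    ("NaOH", "[CX4;H2,H3:1][Cl,Br,I]>>[C:1]O"),    # SN2: halide -> alcohol
    ("NaCN", "[CX4;H2,H3:1][Cl,Br,I]>>[C:1]C#N"),  # SN2: halide -> nitrile
    ("PBr3", "[CX4;H2:1][OX2H1]>>[C:1]Br"),        # primary alcohol -> bromide
    ("PCC", "[CX4;H2:1][OX2H1:2]>>[C:1]=[O:2]"),   # primary alcohol -> aldehyde
]


def run_step(smiles, rxn_smarts):
    """Apply one reaction template; return sorted canonical product SMILES."""
    rxn = AllChem.ReactionFromSmarts(rxn_smarts)
    products = set()
    for product_set in rxn.RunReactants((Chem.MolFromSmiles(smiles),)):
        for product in product_set:
            try:
                Chem.SanitizeMol(product)  # products come back unsanitized
            except Exception:
                continue  # skip chemically invalid outcomes
            products.add(Chem.MolToSmiles(product))
    return sorted(products)


def generate_problem(n_steps, rng):
    """Random forward walk. Returns [(reagent, smiles), ...]: the first entry is
    (None, starting material) and the last SMILES is the target for the student."""
    current = rng.choice(STARTING_MATERIALS)
    path = [(None, current)]
    for _ in range(n_steps):
        options = [(label, product)
                   for label, smarts in REACTIONS
                   for product in run_step(current, smarts)]
        if not options:
            break  # dead end: nothing in the table applies
        label, current = rng.choice(options)
        path.append((label, current))
    return path


rng = random.Random(42)
path = generate_problem(rng.randint(3, 5), rng)
print("Target:", path[-1][1])
for reagent, smiles in path[1:]:
    print(f"{reagent} -> {smiles}")
```

In this sketch a dead end (no template applies, as for the nitrile or aldehyde here) just ends the walk early; a fuller generator might retry from a new starting material instead.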

But once I started actually coding the functions, I ran into some problems. First, it is very tedious to write a function for each individual reaction. Second, while SMILES can handle almost any molecular complication thrown at it (stereochemistry, geometry, etc.), the same molecule can be written as several different SMILES strings, which causes problems with specific reactions. Third, I use the "replace" method to manipulate SMILES strings, and this causes problems with regiospecific reactions that I want to make general.
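The second and third problems have standard RDKit answers: compare molecules by their canonical SMILES rather than by the raw strings, and express a transformation as a reaction SMARTS run with RunReactants rather than with str.replace. Storing each reaction as a SMARTS string also eases the first problem, since a new reaction becomes one line of data. This sketch assumes RDKit; canonical, substitute_by_string, substitute_by_template and the bromide-to-alcohol template are illustrative names, not part of RDKit.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def canonical(smiles):
    """Return RDKit's canonical SMILES for `smiles` (None if it does not parse)."""
    mol = Chem.MolFromSmiles(smiles)
    return None if mol is None else Chem.MolToSmiles(mol)


def substitute_by_string(smiles):
    # Fragile approach: only works if the input happens to be written with "CBr".
    return smiles.replace("CBr", "CO")


# Illustrative template: replace a bromine on any sp3 carbon with OH.
# No selectivity rules here; it only demonstrates the mechanism.
BR_TO_OH = AllChem.ReactionFromSmarts("[CX4:1]Br>>[C:1]O")


def substitute_by_template(smiles):
    """Apply the template to a molecule; return sorted canonical products."""
    products = set()
    for product_set in BR_TO_OH.RunReactants((Chem.MolFromSmiles(smiles),)):
        for product in product_set:
            Chem.SanitizeMol(product)
            products.add(Chem.MolToSmiles(product))
    return sorted(products)


print(canonical("CCO"), canonical("OCC"), canonical("C(O)C"))  # CCO CCO CCO
print(substitute_by_string("CCBr"), substitute_by_string("BrCC"))  # CCO BrCC
print(substitute_by_template("CCBr"), substitute_by_template("BrCC"))  # ['CCO'] ['CCO']
```

Because the template matches atoms and bonds rather than characters, it gives the same product however the input was written, and comparing canonical strings reliably answers "is this the same molecule?".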

For example, SN2 reactions work well with primary alkyl halides but not at all with tertiary ones (steric hindrance). How would I write a function for this reaction?
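A sketch of how the steric requirement can live in the pattern rather than in Python logic, assuming RDKit: the SMARTS only matches a halide on a methyl or primary sp3 carbon, so tertiary (and, in this simplified version, secondary) halides yield no SN2 product. SN2_TEMPLATE and sn2 are hypothetical names, and passing the nucleophile as the SMILES fragment it becomes in the product is an illustrative choice that lets one template cover many SN2 reactions.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Hypothetical SN2 template. The carbon bearing the halide must be sp3 (X4) and
# carry two or three hydrogens (primary or methyl). Secondary and tertiary
# halides do not match, so they give no SN2 product. "{nucleophile}" is replaced
# by the SMILES of the group that ends up attached to that carbon.
SN2_TEMPLATE = "[CX4;H2,H3:1][Cl,Br,I]>>[C:1]{nucleophile}"


def sn2(substrate, nucleophile):
    """Return sorted canonical SMILES of SN2 products (empty if no site qualifies)."""
    rxn = AllChem.ReactionFromSmarts(SN2_TEMPLATE.format(nucleophile=nucleophile))
    products = set()
    for product_set in rxn.RunReactants((Chem.MolFromSmiles(substrate),)):
        for product in product_set:
            Chem.SanitizeMol(product)
            products.add(Chem.MolToSmiles(product))
    return sorted(products)


print(sn2("CCCCBr", "O"))      # 1-bromobutane + hydroxide -> butan-1-ol
print(sn2("CC(C)(C)Br", "O"))  # tert-butyl bromide: [] (no SN2)
print(sn2("CCBr", "C#N"))      # bromoethane + cyanide -> propanenitrile
print(sn2("CBr", "OC"))        # methyl bromide + methoxide -> dimethyl ether
```

Hindered primary substrates such as neopentyl halides still match this pattern; the SMARTS can be tightened with recursive $(...) expressions if that matters for your problem set.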

Another problem: I want the reactions to be labeled with the appropriate reagents, so I decided to name the functions after the reagents used. But this becomes problematic when a reagent can take different forms (for example, Grignard reagents).
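One way to handle a reagent with many forms is to make the reagent data instead of part of a function name: a single reaction record holds a label template and a two-reactant SMARTS, and the specific Grignard reagent is passed in as a SMILES string plus a display name. This is a sketch assuming RDKit; the Reaction dataclass, run, GRIGNARD_ADDITION and the label format are hypothetical, and the template models only single addition to aldehydes and ketones followed by aqueous workup.

```python
from dataclasses import dataclass

from rdkit import Chem
from rdkit.Chem import AllChem


@dataclass(frozen=True)
class Reaction:
    name: str    # human-readable reaction name
    label: str   # reagent text for the answer key; "{reagent}" is filled in later
    smarts: str  # reaction SMARTS; the second reactant template is the reagent


# Hypothetical template: the carbonyl carbon must not be single-bonded to a
# heteroatom, which restricts it to aldehydes and ketones (esters, acids,
# amides and acyl halides are excluded). The product is the alcohol after workup.
GRIGNARD_ADDITION = Reaction(
    name="Grignard addition",
    label="1) {reagent}, Et2O; 2) H3O+",
    smarts="[CX3;!$(C-[O,N,S,F,Cl,Br,I]):1]=[O:2].[#6:3][Mg][Cl,Br,I]"
           ">>[C:1](-[#6:3])-[O:2]",
)


def run(reaction, substrate, reagent_name, reagent_smiles):
    """Return (label, sorted canonical product SMILES) for one substrate/reagent pair."""
    rxn = AllChem.ReactionFromSmarts(reaction.smarts)
    reactants = (Chem.MolFromSmiles(substrate), Chem.MolFromSmiles(reagent_smiles))
    products = set()
    for product_set in rxn.RunReactants(reactants):
        for product in product_set:
            try:
                Chem.SanitizeMol(product)
            except Exception:
                continue  # skip chemically invalid outcomes
            products.add(Chem.MolToSmiles(product))
    return reaction.label.format(reagent=reagent_name), sorted(products)


# The same Reaction object serves every Grignard reagent; only the data changes.
for name, smiles in [("CH3MgBr", "C[Mg]Br"), ("EtMgBr", "CC[Mg]Br"),
                     ("PhMgBr", "c1ccccc1[Mg]Br")]:
    print(run(GRIGNARD_ADDITION, "CC(C)=O", name, smiles))
```

Adding another organometallic (or another label style) then means adding a data entry, not a new function.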

I feel like there is a better, less repetitive and less tedious way to accomplish this task. I'm looking for a nudge in the right direction.



2 answers


This is a rather ambitious task, and you are not the first to undertake it. Notable examples were/are:

  • LHASA, originally developed by E. J. Corey's group at Harvard University

  • WODCA, developed by J. Gasteiger's group at the University of Erlangen

  • CHIRON, developed by S. Hanessian's group at the University of Montreal



These projects represent several decades of development effort, but I have no reliable information about their current state.



It can help to look for free or, if possible, commercial software (written in Python) that solves the same problem or a closely related one, examine its functionality and its approach to the problem, and, if possible, get its source code. I find this helpful in many ways.


