How to make a Python organic chemistry retrosynthesis generator?
I'm trying to learn Python by writing a simple program that generates the kind of practice problem organic chemistry students typically face on exams: the retrosynthesis question.
For those unfamiliar with this type of question: the student is given the initial and final species of a series of chemical reactions, then asked to determine which reagents/reactions were applied to the starting material to obtain the final product.
Sometimes you are only given the final product and asked to list the reactions required for the synthesis, subject to some constraints (start only with a compound containing 5 carbon atoms or fewer, use only alcohols, etc.).
I've done some research so far, and I think RDKit with Python is a good place to start. My plan is to use the SMILES format to read molecules (since I can manipulate it like a string) and then define a function for each reaction. Eventually I will need a database of chemical species that the program can randomly pick from (for the initial and final species in the problem). The program then selects a random species from the database, applies a series of reactions to it (3-5, as specified by the user), and displays the final product. The user then works out the problem themselves, and the program shows the solution path (displaying images of the intermediates and printing the reagents used to obtain them). Simple, basically.
But once I started actually coding the functions, I ran into some problems. First, it is very tedious to write a function for each individual reaction. Second, while SMILES can handle almost every molecular complication thrown at it (stereochemistry, geometry, etc.), it has several valid forms for certain molecules, and this gives me problems with specific reactions. Third, I use the "replace" method to manipulate SMILES strings, and this causes problems when I have regiospecific reactions that I want to make general.
For example: SN2 reactions work well with primary alkyl halides but not at all with tertiary ones (steric hindrance). How would I write a function for this reaction?
Another problem: I want the reactions to be labeled with the appropriate reagents, so I decided to name the functions after the reagents used. But this becomes problematic when a reagent can take different forms (for example, Grignard reagents).
I feel like there is a better, less repetitive and tedious way to accomplish this task. I'm looking for a nudge in the right direction.
This is a rather ambitious task, and you are not the first to undertake it. Notable examples were/are:
- LHASA, originally developed by the E. J. Corey group at Harvard University
- WODCA, developed by J. Gasteiger's group at the University of Erlangen
- CHIRON, developed by the S. Hanessian group at the University of Montreal
Several decades of development effort have gone into these projects, but I have no reliable information about their current state.
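As a smaller nudge on the concrete problems in the question, one option is to stop editing SMILES strings with "replace" and describe each reaction as an RDKit reaction SMARTS template stored in a table keyed by its reagent label. The sketch below is only an illustration under assumptions of mine: the `REACTIONS` table, its two entries, and the helper names `canonical`, `apply_reaction` and `apply_route` are hypothetical, and the templates ignore stereochemistry and competing reactions. The `[CH2:1]` pattern restricts the SN2 template to carbons bearing exactly two hydrogens, so tertiary halides are simply not matched, and canonical SMILES sidesteps the "several forms for one molecule" issue.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# Hypothetical reaction table: reagent label -> reaction SMARTS template.
# The label is plain data, so it is independent of any function name.
REACTIONS = {
    # [CH2:1] matches only a carbon with exactly two hydrogens, so this
    # template fires on primary alkyl halides and skips secondary and
    # tertiary ones (methyl halides, CH3-X, are also skipped here).
    "NaOH (SN2)": "[CH2:1][Cl,Br,I]>>[C:1]O",
    # Primary alcohol -> aldehyde (simplified; no over-oxidation handling).
    "PCC": "[CH2:1][OH:2]>>[C:1]=[O:2]",
}


def canonical(smiles):
    """Return RDKit's canonical SMILES so equivalent strings compare equal."""
    return Chem.MolToSmiles(Chem.MolFromSmiles(smiles))


def apply_reaction(label, smiles):
    """Apply one labelled reaction; return a sorted list of product SMILES."""
    rxn = AllChem.ReactionFromSmarts(REACTIONS[label])
    mol = Chem.MolFromSmiles(smiles)
    products = set()
    for product_set in rxn.RunReactants((mol,)):
        for product in product_set:
            Chem.SanitizeMol(product)  # recompute valences and implicit Hs
            # Round-trip through SMILES so every result is in canonical form.
            products.add(canonical(Chem.MolToSmiles(product)))
    return sorted(products)


def apply_route(smiles, labels):
    """Apply reactions in order, following the first product of each step."""
    route = [canonical(smiles)]
    for label in labels:
        products = apply_reaction(label, route[-1])
        if not products:
            raise ValueError(f"{label} does not apply to {route[-1]}")
        route.append(products[0])
    return route


if __name__ == "__main__":
    # 1-bromobutane -> butan-1-ol -> butanal, with the reagent for each step.
    steps = ["NaOH (SN2)", "PCC"]
    route = apply_route("CCCCBr", steps)
    for reagent, product in zip(steps, route[1:]):
        print(f"{reagent}: {product}")
```

With a table like this, adding a reaction means adding one line of data rather than a new function, and a problem generator could pick random labels whose templates return at least one product for the current intermediate.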