Chematica (Synthia) – a new software tool for synthesis planning that combines artificial and expert intelligence to outperform both

by: Iryna Boiko Track « In Silico Drug Design » Strasbourg-Milan-Paris, 2023

Last year, I had an amazing opportunity to attend a lecture by Bartosz A. Grzybowski, a Polish scientist, who introduced his group’s ground-breaking work on an advanced synthesis planning tool – Chematica, now commercialized by Merck KGaA as Synthia.

Chematica’s synthetic pathway designs have reached a level where they are indistinguishable from those created by humans, and sometimes even surpass them in terms of efficiency and elegance. Several complex natural product syntheses proposed by the algorithm have been successfully realized in the lab.[1]

The success of Chematica can be attributed to the combination of machine learning techniques with an expert-based approach. Over nearly a decade, the authors manually identified approximately 100,000 reaction types, and the implementation of the software took almost 20 years from its initial conception. Machine learning algorithms were used for popular reactions, where large amounts of data are available. Cross-reactivities and conflicting groups were also encoded into each reaction rule, and quantum chemistry and molecular mechanics calculations were occasionally incorporated. This hybrid model demonstrated superior performance compared to purely expert-based or purely ML-based software.[2,3]

A distinguishing feature of Chematica is its scoring functions, which help navigate the vast networks of synthetic possibilities. At each step, the software must select the most feasible retrosynthetic pathway to prevent combinatorial explosion (Figure 1).

The scoring functions evaluate both the reactions and the sets of generated substrates. The chemicals’ scoring function (CSF) accounts for variables such as the number of stereocenters, the number of rings, and the SMILES length of each substrate, so that large, complex synthons are penalized. The reaction scoring function (RSF) approximates the difficulty of a particular operation based on conflicting or fragile functional groups, possibilities for non-selectivity, and the need for protective groups. Each variable, weighted by a user-defined coefficient, raises the score of less favorable pathways. The RSF and CSF are summed, and the pathway with the lowest total score is selected.[2]
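To make the idea of summing the two scores concrete, here is a minimal Python sketch. It is only an illustration of the description above, not Chematica's actual functions: the class names, field names, linear form and default coefficients are all my own assumptions, and the structural counts are supplied by hand rather than computed from the molecules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Substrate:
    # Hypothetical container; counts are entered by hand in this sketch.
    smiles: str
    n_stereocenters: int
    n_rings: int


@dataclass(frozen=True)
class Step:
    # Hypothetical description of one retrosynthetic step.
    name: str
    substrates: tuple
    n_conflicting_groups: int = 0
    n_nonselective_sites: int = 0
    n_protections_needed: int = 0


def csf(substrates, w_stereo=1.0, w_ring=1.0, w_len=0.1):
    """Chemicals' score: larger, more complex substrates score higher."""
    return sum(
        w_stereo * s.n_stereocenters + w_ring * s.n_rings + w_len * len(s.smiles)
        for s in substrates
    )


def rsf(step, w_conflict=5.0, w_select=3.0, w_protect=4.0):
    """Reaction score: harder operations score higher."""
    return (
        w_conflict * step.n_conflicting_groups
        + w_select * step.n_nonselective_sites
        + w_protect * step.n_protections_needed
    )


def total_score(step):
    """Sum of RSF and CSF; lower means more favorable."""
    return rsf(step) + csf(step.substrates)


def best_step(steps):
    """Pick the candidate step with the lowest total score."""
    return min(steps, key=total_score)


# Two made-up candidates for the same disconnection.
simple = Step("simple", (Substrate("CCO", 0, 0), Substrate("c1ccccc1Br", 0, 1)))
complex_ = Step(
    "complex",
    (Substrate("C[C@H](O)C1CCCCC1", 1, 1),),
    n_protections_needed=1,
)
print(best_step([simple, complex_]).name)
```

The keyword arguments play the role of the user-defined coefficients: raising `w_protect`, for example, would push the search away from steps that need protecting groups.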

However, because focusing on one step at a time may lead to a dead end later on, Chematica explores the network both ‘wide’ and ‘deep’ at the same time. It also considers tandem reactions and ‘tactical combinations’: two-step sequences that initially increase structural complexity but enable simplification later. Notably, Chematica is not biased towards reactions commonly reported in the literature, which allows it to assign high ranks to newly developed or specialized reactions and leads to more elegant solutions than purely ML-driven software.[2,3]
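The danger of a purely step-by-step choice can be shown with a toy search. The sketch below is a generic best-first (uniform-cost) search over partial routes, which I use here as a stand-in; it is not Chematica's published algorithm. The function name, the `expansions` dictionary, the molecule labels and the step scores are all hypothetical.

```python
import heapq
from itertools import count


def best_first_route(target, expansions, purchasable, max_iterations=10_000):
    """Return (total_score, steps) for the cheapest complete route, or None.

    expansions: dict mapping a molecule to a list of
                (step_score, [precursor molecules]) options.
    purchasable: set of molecules that need no further disconnection.
    Step scores are assumed non-negative.
    """
    tie = count()  # tie-breaker so the heap never compares tuples of steps
    # Each entry: (accumulated score, tie-breaker, unsolved molecules, steps so far)
    frontier = [(0.0, next(tie), (target,), ())]
    for _iteration in range(max_iterations):
        if not frontier:
            return None
        score, _tie, open_mols, steps = heapq.heappop(frontier)
        open_mols = tuple(m for m in open_mols if m not in purchasable)
        if not open_mols:
            return score, list(steps)
        mol, rest = open_mols[0], open_mols[1:]
        # A molecule with no known expansion is a dead end: nothing is pushed.
        for step_score, precursors in expansions.get(mol, []):
            heapq.heappush(
                frontier,
                (
                    score + step_score,
                    next(tie),
                    rest + tuple(precursors),
                    steps + ((mol, tuple(precursors)),),
                ),
            )
    return None


# Toy network: the cheapest first step (T -> A + B) leads to an expensive follow-up.
expansions = {
    "T": [(1.0, ["A", "B"]), (2.0, ["C"])],
    "A": [(5.0, ["X"])],
    "C": [(1.0, ["X", "Y"])],
}
purchasable = {"B", "X", "Y"}
print(best_first_route("T", expansions, purchasable))
```

A greedy planner would take the first step scored 1.0 and end up with a total of 6.0, whereas the search above returns the route through C with a total of 3.0, because the priority queue keeps shallow and deep partial routes in the same frontier and always continues from the cheapest one.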

Grzybowski claims that they have effectively taught the computer the rules of chemistry. Does this mean that organic chemists will lose their jobs soon? We shall see, but one thing is certain: chemoinformaticians will be in high demand.

Figure 1. Synthetic options during iterative retron-to-synthon expansion around the scabrolide A target. Only a few initial expansions are shown.[3]

[1] B. Mikulak-Klucznik, P. Gołębiowska, A. A. Bayly, O. Popik, T. Klucznik, S. Szymkuć, E. P. Gajewska, P. Dittwald, O. Staszewska-Krajewska, W. Beker, T. Badowski, K. A. Scheidt, K. Molga, J. Mlynarski, M. Mrksich, B. A. Grzybowski, Nature 2020, 588, 83–88 (https://doi.org/10.1038/s41586-020-2855-y).
[2] B. A. Grzybowski, T. Badowski, K. Molga, S. Szymkuć, WIREs Comput. Mol. Sci. 2023, 13:e1630 (https://doi.org/10.1002/wcms.1630).
[3] K. Molga, S. Szymkuć, B. A. Grzybowski, Acc. Chem. Res. 2021, 54, 1094–1106 (https://doi.org/10.1021/acs.accounts.0c00714).