Microsoft MOLeR: A New Age of Artificial Molecule Generation
Nov 9, 2022
Ashesh Anand

Technological progress does more than make daily life easier; it reshapes entire fields. Medical science in particular has improved global health more than ever before. New ideas and discoveries appear almost daily, and each new generation of technology builds on and improves upon its predecessor.

One of the many applications that technology has turned into reality is the discovery of therapeutic drugs.

MoLeR is a graph-based generative model of molecules that naturally supports scaffolds as the initial seed of the generation process. According to its developers, MoLeR performs comparably to state-of-the-art methods on unconstrained molecular optimization tasks and outperforms them on scaffold-based tasks, while being an order of magnitude faster to train and sample from than existing approaches. The developers also show how several seemingly minor design choices affect overall performance.

The MoLeR Model:

In MoLeR, molecules are represented as graphs, with atoms as vertices and bonds as edges. The model is trained in the autoencoder paradigm: an encoder, which is a graph neural network (GNN), compresses an input molecule into a so-called latent code, and a decoder reconstructs the original molecule from this code. Because the decoder must decompress a short encoding into a graph of arbitrary size, reconstruction is designed to be sequential.
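
To make the graph view concrete, here is a minimal sketch of how a molecule can be stored as a graph and compressed into a fixed-size vector by message passing. It is not MoLeR's encoder: the atom features, the number of rounds, and the helper names (`neighbors`, `encode`) are illustrative assumptions, and a real GNN would apply learned weight matrices and nonlinearities where this toy simply sums neighbor states.

```python
from typing import Dict, List, Tuple

# A molecule as a graph: atoms are vertices, bonds are edges.
# Toy example: ethanol (C-C-O) with hydrogens omitted.
ATOMS: List[str] = ["C", "C", "O"]
BONDS: List[Tuple[int, int]] = [(0, 1), (1, 2)]

# Hypothetical one-hot atom features; real models use richer, learned features.
FEATURES: Dict[str, List[float]] = {"C": [1.0, 0.0], "O": [0.0, 1.0]}


def neighbors(n_atoms: int, bonds: List[Tuple[int, int]]) -> List[List[int]]:
    """Build an adjacency list from an undirected bond list."""
    adj: List[List[int]] = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    return adj


def encode(atoms: List[str], bonds: List[Tuple[int, int]], rounds: int = 2) -> List[float]:
    """Toy message-passing encoder: in each round, every atom adds the sum of its
    neighbors' current states to its own state; mean pooling then gives a
    fixed-size code regardless of how many atoms the molecule has."""
    states = [list(FEATURES[a]) for a in atoms]
    adj = neighbors(len(atoms), bonds)
    for _ in range(rounds):
        new_states = []
        for i, state in enumerate(states):
            message = [sum(states[j][k] for j in adj[i]) for k in range(len(state))]
            new_states.append([s + m for s, m in zip(state, message)])
        states = new_states
    # Mean-pool over atoms so the code length depends only on the feature size.
    return [sum(col) / len(states) for col in zip(*states)]
```

Because atom states are mean-pooled, molecules of any size map to codes of the same length, which is what gives the decoder a short, fixed-size encoding to start from.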

In each step, the partially built graph is extended by adding new atoms or bonds. At every step, the decoder bases its prediction only on the partial graph and the latent code, rather than on its own previous predictions. Because the order of construction is randomized, MoLeR is trained to build the same molecule in many different orders.
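
The training consequence of this design can be sketched in a few lines: pick a random order in which a molecule could be assembled, then turn every prefix of that order into an independent (partial graph, next atom) training example. This is a simplified illustration: the helper names are hypothetical, only the choice of the next atom is modeled (the real decoder also predicts bonds and, as described later, whole motifs), and the latent code is omitted.

```python
import random
from typing import List, Set, Tuple

Bond = Tuple[int, int]
Step = Tuple[Set[int], List[Bond], int]  # (atoms so far, bonds among them, next atom)


def random_generation_order(n_atoms: int, bonds: List[Bond], rng: random.Random) -> List[int]:
    """Random order in which a connected molecule could be built: start at any
    atom, then repeatedly add a random atom bonded to the atoms placed so far."""
    order = [rng.randrange(n_atoms)]
    placed: Set[int] = set(order)
    while len(order) < n_atoms:
        # Atoms not yet placed that share a bond with the partial graph.
        frontier = sorted(
            {b for a, b in bonds if a in placed and b not in placed}
            | {a for a, b in bonds if b in placed and a not in placed}
        )
        nxt = rng.choice(frontier)  # assumes the molecular graph is connected
        order.append(nxt)
        placed.add(nxt)
    return order


def training_steps(order: List[int], bonds: List[Bond]) -> List[Step]:
    """One independent example per step: the target depends only on the partial
    graph (and, in MoLeR, on the latent code, omitted here), never on earlier
    model predictions."""
    steps: List[Step] = []
    for t in range(1, len(order)):
        partial_atoms = set(order[:t])
        partial_bonds = [(a, b) for a, b in bonds if a in partial_atoms and b in partial_atoms]
        steps.append((partial_atoms, partial_bonds, order[t]))
    return steps
```

Different seeds yield different valid orders for the same molecule, mirroring how MoLeR sees one molecule built in many orders. Since no example depends on an earlier prediction, all of them can be processed together as one batch.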

How Does MoLeR Optimize Molecule Generation?

After training, MoLeR itself has no notion of "molecular optimization." However, as with other latent-space models, an off-the-shelf black-box optimization method can be run over its space of latent codes. This was not feasible with CGVAE, the team's earlier model, because it used a considerably more complex graph encoding. The developers chose Molecular Swarm Optimization (MSO) because it produces state-of-the-art results for latent space optimization in other models, and it also works well for MoLeR. On new benchmark tasks that resemble real drug discovery projects built around large scaffolds, the developers found that MSO combined with MoLeR outperformed existing methods.
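
MSO builds on particle swarm optimization (PSO). The sketch below is plain PSO over a continuous latent space, not the MSO package itself: `toy_score` is a hypothetical stand-in for "decode the latent vector into a molecule and score it", and the hyperparameters (inertia 0.7, acceleration coefficients 1.5, 20 particles) are common textbook defaults, not values taken from the MoLeR work.

```python
import random
from typing import Callable, List

Vector = List[float]


def toy_score(z: Vector) -> float:
    """Hypothetical stand-in for decoding z and scoring the molecule.
    It peaks at z = (1, -2)."""
    return -((z[0] - 1.0) ** 2 + (z[1] + 2.0) ** 2)


def particle_swarm(score: Callable[[Vector], float], dim: int, n_particles: int = 20,
                   iterations: int = 100, inertia: float = 0.7, c_personal: float = 1.5,
                   c_global: float = 1.5, seed: int = 0) -> Vector:
    """Maximize `score` over a continuous space with basic global-best PSO."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-3.0, 3.0) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]          # each particle's best position so far
    best_val = [score(p) for p in pos]
    g = max(range(n_particles), key=lambda i: best_val[i])
    global_pos, global_val = best_pos[g][:], best_val[g]  # swarm-wide best
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity mixes momentum with pulls toward personal and global bests.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (global_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = score(pos[i])
            if val > best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val > global_val:
                    global_pos, global_val = pos[i][:], val
    return global_pos
```

In a real setup only `score` would change: it would decode each latent point with the trained model and evaluate the resulting molecule. This is why any black-box optimizer that works on vectors can be plugged in without the model knowing anything about optimization.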

The Evolution That Led Up to the MoLeR Model:

CGVAE (Constrained Graph Variational Autoencoder), an earlier generative model of molecules developed by the same researchers, performed well on simple synthetic tasks. However, two issues limited its applicability in practical drug development:

First, it cannot naturally be restricted to exploring only molecules that contain a given substructure (called the scaffold). Second, because it generates molecules atom by atom, it struggles to reproduce essential structures such as complex ring systems.

In MoLeR, because the decoder must decompress a short encoding into a graph of arbitrary size, reconstruction is sequential: each step adds new atoms or bonds to the partially constructed graph.

At each step, the decoder's prediction depends only on the partial graph and the latent code, not on its earlier predictions. The researchers point out that drug compounds are typically built from larger structural motifs rather than random combinations of atoms, much as sentences are composed of words rather than random sequences of letters. Unlike CGVAE, MoLeR first identifies these common building blocks in the data and then learns to extend a partial molecule with entire motifs rather than single atoms. Because the order in which a molecule is assembled is arbitrary, MoLeR is also trained to build the same molecule in several different orders.

As a result, MoLeR not only needs fewer steps to produce drug-like compounds but also builds them in a way that is closer to how chemists think about constructing molecules.
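
The step savings from motifs can be illustrated with a toy example. It assumes molecules have already been split into labeled fragments (a real pipeline would fragment the molecular graph itself), treats any multi-atom fragment seen at least twice as a motif, and counts one step per added atom or motif while ignoring bond-formation steps. The dataset, labels, threshold, and function names are all hypothetical and do not reproduce MoLeR's actual motif extraction.

```python
from collections import Counter
from typing import List, Set, Tuple

Fragment = Tuple[str, int]  # (fragment label, number of heavy atoms)

# Hypothetical, pre-fragmented toy molecules; labels and sizes are illustrative.
DATASET: List[List[Fragment]] = [
    [("benzene", 6), ("C", 1), ("O", 1)],
    [("benzene", 6), ("piperidine", 6), ("N", 1)],
    [("benzene", 6), ("piperidine", 6), ("C", 1)],
    [("indole", 9), ("C", 1)],
]


def build_motif_vocabulary(dataset: List[List[Fragment]], min_count: int = 2) -> Set[str]:
    """Keep multi-atom fragments that occur at least `min_count` times as motifs."""
    counts = Counter(label for mol in dataset for label, size in mol if size > 1)
    return {label for label, c in counts.items() if c >= min_count}


def generation_steps(molecule: List[Fragment], motifs: Set[str]) -> Tuple[int, int]:
    """Return (atom-by-atom steps, motif-based steps), counting one step per
    added atom or motif and ignoring bond steps."""
    atom_steps = sum(size for _, size in molecule)
    # A known motif is added in one step; anything else falls back to single atoms.
    motif_steps = sum(1 if label in motifs else size for label, size in molecule)
    return atom_steps, motif_steps
```

With this toy data the vocabulary is `{"benzene", "piperidine"}`. The second molecule needs 13 atom-by-atom steps but only 3 motif-based steps, while the indole-containing molecule gains nothing because its ring system is too rare to enter the vocabulary.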

A scaffold is a core substructure of a molecule that has already shown promising properties. Drug discovery campaigns usually focus on a small region of chemical space: they first identify a scaffold and then consider only compounds that contain it as a subgraph. MoLeR's decoder design allows an arbitrary scaffold to be integrated seamlessly by using it as the initial state of the decoding loop. Because randomized generation orders during training teach MoLeR to complete arbitrary subgraphs, it is well suited to targeted, scaffold-based exploration.
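
Using a scaffold as the initial state can be sketched as a decoding loop that starts from a given partial graph instead of an empty one. The policy here is a hypothetical hard-coded rule standing in for MoLeR's learned decoder, and real decoding also chooses attachment points and bond types. The point is only that atoms and bonds copied from the scaffold are never removed, so every output contains the scaffold as a subgraph.

```python
from typing import Callable, List, Optional, Set, Tuple

Bond = Tuple[int, int]
Graph = Tuple[List[str], Set[Bond]]  # (atom labels, bonds between atom indices)
# A policy maps (partial graph, latent code) to (new atom label, atom to attach to),
# or None to stop. In MoLeR this role is played by a trained network.
Policy = Callable[[Graph, List[float]], Optional[Tuple[str, int]]]


def decode(latent: List[float], policy: Policy, scaffold: Graph, max_steps: int = 50) -> Graph:
    """Extend `scaffold` one atom at a time; the input scaffold is copied, not edited."""
    atoms: List[str] = list(scaffold[0])
    bonds: Set[Bond] = set(scaffold[1])
    for _ in range(max_steps):
        action = policy((atoms, bonds), latent)
        if action is None:  # the policy decided the molecule is complete
            break
        label, attach_to = action
        atoms.append(label)
        bonds.add((attach_to, len(atoms) - 1))
    return atoms, bonds


def toy_policy(graph: Graph, latent: List[float]) -> Optional[Tuple[str, int]]:
    """Hypothetical rule: attach carbons to atom 0 until there are int(latent[0]) atoms."""
    atoms, _ = graph
    return ("C", 0) if len(atoms) < int(latent[0]) else None
```

For example, decoding with latent `[5.0]` from the scaffold `(["C", "C", "N"], {(0, 1), (1, 2)})` appends two carbons bonded to atom 0, while a latent of `[2.0]` returns the scaffold unchanged.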

Although MoLeR itself has no notion of "molecular optimization," the researchers note that an off-the-shelf black-box optimization method can search its space of latent codes. In this study they used Molecular Swarm Optimization (MSO), which has produced state-of-the-art results for latent space optimization with other models, and their findings show that it is also effective for MoLeR. On new benchmark tasks modeled on real drug development projects involving large scaffolds, the combination of MSO and MoLeR outperformed current models.

Results: Efficiency of MoLeR

Speed is measured as the number of molecules processed per second during training and inference. For this comparison, the developers did not subsample generation steps, so every model processes all of them, even though MoLeR could be trained on only a subset. MoLeR is substantially faster than all baselines in both training and inference, thanks to its simpler formulation and to training on all generation steps in parallel.
