Digitalizing materials synthesis

My research seeks to accelerate the discovery of materials by integrating high-performance computing (HPC), machine learning (ML), and atomistic simulations to model materials synthesis. Recently, my work has focused on the following areas:

  • designing catalysts and molecular sieves using automated software pipelines, ML, and human-computer interaction
  • embedding physical principles in ML models to improve their robustness and interpretability
  • discovering synthesis recipes for materials using ML and literature extraction
  • developing computational representations and algorithms for chemical systems


Below, I highlight a few projects where computation meets materials synthesis and design.

A priori catalyst design


The diversity of material compositions, structures, and surfaces offers untapped potential for discovering heterogeneous catalysts. However, the breadth of this chemical space makes it challenging to design catalysts for a given chemical reaction, especially when synthesis routes are taken into account. My research uses databases, simulations, and machine learning to provide theoretical insights that help explore this structural and synthetic space.

As an example, I explored the synthesis space of zeolites using high-throughput simulations and data-driven approaches. The resulting insights explain phase selectivity, quantify active-site distributions, and simplify the design of synthesis routes, with the aim of bypassing trial-and-error in zeolite synthesis. These results were unified in a publicly available web platform, validated through experimental collaborations, and led to the discovery of new catalysts that balance lower synthesis costs, improved stability, and structural control.
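As a rough illustration of how simulated energies can be turned into a phase-selectivity ranking, the sketch below scores candidate synthesis additives by the gap between their binding energy to a target zeolite framework and to its strongest competitor. It assumes each candidate already has one simulated binding energy per framework; the metric, the candidate names, and all numbers are hypothetical placeholders, not the published method or data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Candidate:
    """A hypothetical synthesis additive with simulated binding energies.

    energies maps a zeolite framework code to a binding energy
    (more negative = stronger binding; units are arbitrary here).
    """
    name: str
    energies: dict[str, float]


def selectivity(candidate: Candidate, target: str) -> float:
    """Energy gap between the target framework and its strongest competitor.

    Negative values mean the additive binds the target more strongly than any
    competing framework; the more negative, the more selective.
    """
    competitors = [e for fw, e in candidate.energies.items() if fw != target]
    return candidate.energies[target] - min(competitors)


def rank_candidates(candidates: list[Candidate], target: str) -> list[Candidate]:
    """Order candidates from most to least selective toward the target."""
    return sorted(candidates, key=lambda c: selectivity(c, target))


# Illustrative placeholder numbers, not simulation results.
cands = [
    Candidate("template_A", {"CHA": -15.0, "AEI": -13.5, "LEV": -12.0}),
    Candidate("template_B", {"CHA": -14.0, "AEI": -16.0, "LEV": -11.0}),
    Candidate("template_C", {"CHA": -17.0, "AEI": -12.0, "LEV": -13.0}),
]
print([c.name for c in rank_candidates(cands, "CHA")])
# ['template_C', 'template_A', 'template_B']
```

Using the gap to the strongest competitor, rather than the raw binding energy, rewards candidates that favor the target over every alternative phase, not just candidates that bind strongly everywhere.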

Scientific machine learning


Machine learning and deep learning are transforming many areas of research. In the applied sciences in particular, substantial effort is devoted to producing interpretable and robust models. However, tools such as neural networks still act largely as black boxes with limited extrapolation power. My work embeds physical principles into ML models to improve their interpretability, robustness, and data efficiency.

In one example, I combined ideas from uncertainty quantification and adversarial attacks to improve the simulation of materials dynamics with ML. The proposed method guides how neural network potentials explore chemical space, improving their robustness through an iterative, data-efficient approach (an interactive explanation of the method can be accessed here). I also collaborated on developing ML approaches for active learning and configurational space sampling.
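The core idea can be sketched in one dimension, assuming that uncertainty is measured as the variance of an ensemble of models and that the attack maximizes this uncertainty weighted by a Boltzmann factor, so that new geometries are uncertain but still physically likely. The "ensemble" below is a hypothetical set of analytic double-well functions standing in for neural network potentials, and central finite differences replace the automatic differentiation a real implementation would use; this is not the published code.

```python
from __future__ import annotations

import math
from statistics import mean, pvariance

# Hypothetical stand-in for an ensemble of neural network potentials: every
# member reproduces the double well (x**2 - 1)**2 at the "training" minima
# x = +/-1, but members disagree more the further x moves from them.
COEFFS = [-0.2, 0.0, 0.2]


def ensemble_energies(x: float) -> list[float]:
    u = (x**2 - 1.0) ** 2
    return [(1.0 + c) * u for c in COEFFS]


def log_adversarial_loss(x: float, kT: float) -> float:
    """log[ variance(x) * exp(-E_mean(x) / kT) ].

    The ensemble variance rewards uncertain geometries; the Boltzmann factor
    penalizes high-energy, physically unlikely ones. The partition function is
    a constant and is dropped; taking the log does not change the maximizer.
    """
    energies = ensemble_energies(x)
    return math.log(pvariance(energies)) - mean(energies) / kT


def adversarial_attack(x0: float, kT: float = 0.125, lr: float = 1e-3,
                       steps: int = 500, h: float = 1e-6) -> float:
    """Gradient ascent on a perturbation delta applied to the seed geometry x0."""
    delta = 0.0
    for _ in range(steps):
        x = x0 + delta
        # Central finite difference in place of automatic differentiation.
        grad = (log_adversarial_loss(x + h, kT)
                - log_adversarial_loss(x - h, kT)) / (2 * h)
        delta += lr * grad
    return x0 + delta


print(adversarial_attack(1.05))  # close to sqrt(1.5), about 1.2247
```

For this toy loss the maximizer satisfies (x² - 1)² = 2kT, so a seed next to the training minimum at x = 1 is pushed to x ≈ 1.2247 for kT = 0.125. In an iterative scheme, geometries found this way would be labeled with reference calculations and added to the training set before retraining the ensemble.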

Literature-enabled materials synthesis


The scientific literature contains an immense amount of latent knowledge, often accessible only through years of dedicated study. Automating the extraction of insights from this corpus can accelerate discovery in materials science. However, correlating materials properties reported in the literature with theoretical insights requires representations that make these data interpretable.

In my work, I often use the literature to propose and validate new features and theories. For instance, I proposed a graph-theoretical “order parameter” for diffusionless transformations in zeolites that explains most reported cases of such polymorphic transformations. The theory also shows that other phase transitions in these materials are poorly predicted by local features such as structural units. In another example, I used over six decades of literature to validate a theory of zeolite phase competition. Because the information in these papers is not used to train the model, it serves as an independent ground-truth dataset, painstakingly curated by the field, against which different hypotheses can be tested. Beyond connecting theoretical insights with literature data, I have also collaborated on extracting, cleaning, and analyzing these datasets with unsupervised learning.
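One simple way to quantify how similar two network topologies are is to count the bond (edge) insertions and deletions needed to turn one graph into the other after the best possible relabeling of its nodes. The sketch below computes that distance by brute force on tiny toy graphs; it illustrates graph similarity in general and is not the published order parameter. In this picture, a small distance would mean only a few bonds must break and re-form. A practical version would also have to handle periodic crystal graphs and avoid the factorial cost of trying every permutation.

```python
from itertools import permutations


def edge_edit_distance(edges_a, edges_b, n_nodes):
    """Minimum number of edge insertions plus deletions that turn graph A into
    graph B, minimized over every relabeling of A's nodes.

    Brute force over n_nodes! permutations, so only usable for tiny graphs.
    Both graphs are undirected and share the node set 0..n_nodes-1.
    """
    a = {frozenset(e) for e in edges_a}
    b = {frozenset(e) for e in edges_b}
    best = len(a) + len(b)  # upper bound: delete every edge of A, insert every edge of B
    for perm in permutations(range(n_nodes)):
        # Relabel A's nodes with this permutation.
        mapped = {frozenset(perm[i] for i in edge) for edge in a}
        # Edges present in exactly one graph must be inserted or deleted.
        best = min(best, len(mapped ^ b))
    return best


square = [(0, 1), (1, 2), (2, 3), (3, 0)]
relabeled_square = [(0, 2), (2, 1), (1, 3), (3, 0)]
path = [(0, 1), (1, 2), (2, 3)]
print(edge_edit_distance(square, relabeled_square, 4))  # 0: same topology
print(edge_edit_distance(square, path, 4))              # 1: break one bond
```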

Software and algorithms for the chemical sciences


Automating data pipelines in the chemical sciences can expedite tasks such as simulations, inference, and data sharing. However, connecting data from different sources and deploying calculations to HPC centers requires substantial engineering work.

My computational projects rely extensively on software and algorithm development for simulating and analyzing materials (see software page). I have developed docking algorithms, benchmarks, neural network force fields, and human-computer interaction tools for visualizing and improving the performance of atomistic simulations. These packages are integrated through database infrastructures, some of which are built with Django and PostgreSQL, while the calculations themselves run on HPC centers.
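A minimal sketch of the kind of provenance tracking such a database provides: every calculation row points back to the structure it was run on, so any result can be traced to its inputs. To stay self-contained, it swaps Django and PostgreSQL for Python's built-in sqlite3; the table names, columns, and method labels are hypothetical, and the HPC submission step is not shown.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE structure (
    id INTEGER PRIMARY KEY,
    formula TEXT NOT NULL,
    geometry TEXT NOT NULL          -- JSON-encoded coordinates
);
CREATE TABLE calculation (
    id INTEGER PRIMARY KEY,
    structure_id INTEGER NOT NULL REFERENCES structure(id),
    method TEXT NOT NULL,           -- label for the simulation settings
    status TEXT NOT NULL DEFAULT 'pending',
    energy REAL                     -- filled in when the job finishes
);
"""


def connect(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # enforce structure_id links
    conn.executescript(SCHEMA)
    return conn


def add_structure(conn, formula, geometry):
    cur = conn.execute(
        "INSERT INTO structure (formula, geometry) VALUES (?, ?)",
        (formula, json.dumps(geometry)),
    )
    return cur.lastrowid


def submit(conn, structure_id, method):
    """Register a pending calculation; a job manager would dispatch it."""
    cur = conn.execute(
        "INSERT INTO calculation (structure_id, method) VALUES (?, ?)",
        (structure_id, method),
    )
    return cur.lastrowid


def complete(conn, calc_id, energy):
    conn.execute(
        "UPDATE calculation SET status = 'done', energy = ? WHERE id = ?",
        (energy, calc_id),
    )


def results(conn):
    """Join finished calculations back to the structures they came from."""
    return conn.execute(
        """SELECT s.formula, c.method, c.energy
           FROM calculation c JOIN structure s ON s.id = c.structure_id
           WHERE c.status = 'done' ORDER BY c.id"""
    ).fetchall()


conn = connect()
sid = add_structure(conn, "H2O", [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]])
cid = submit(conn, sid, "hypothetical-force-field")
complete(conn, cid, -14.2)  # placeholder energy, not a computed value
print(results(conn))  # [('H2O', 'hypothetical-force-field', -14.2)]
```

In a Django/PostgreSQL setup, the same two tables would typically be declared as Django models with a foreign key between them, and whatever process monitors the HPC jobs would update `status` and `energy`; that arrangement is an assumption here rather than a description of the actual infrastructure.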