We seek to accelerate materials discovery beyond screening by designing synthesis routes along with new materials. To do that, we integrate high-performance computing (HPC), machine learning (ML), and atomistic simulations to model complex design principles and chemical phenomena.


Some of our research projects include:

  • designing catalysts and molecular sieves using automated software pipelines, ML, and human-computer interaction
  • embedding physical principles in ML models to improve their robustness and interpretability
  • discovering synthesis recipes for materials using ML and literature extraction
  • developing computational representations and algorithms for chemical systems

Nanoporous materials design


The diversity of compositions, structures, and surfaces found in nanoporous materials gives them potential for a wide variety of applications. At the same time, the breadth of this chemical space poses a challenge when designing catalysts for given chemical reactions, especially when synthesis routes are taken into account. Our research provides theoretical insights to explore this structural and synthesis space using databases, simulations, and machine learning.

Examples of our work include exploring the synthesis of zeolites using high-throughput simulations, molecular descriptors, and machine learning, which led to the discovery of new catalysts that balance lower synthesis costs, improved stability, and structural control. More recently, we modeled the inorganic synthesis conditions of zeolites and obtained results that agree well with the literature.

Atomistic machine learning


Artificial intelligence and machine learning (ML) are reshaping many areas of knowledge, including materials research. Particularly in the applied sciences, substantial efforts are devoted to producing interpretable and robust models. Our work embeds physical principles into ML models to improve their interpretability, robustness, and data efficiency in the field of materials science.

For example, we proposed new ways to improve the robustness of ML models for atomistic simulations by combining ideas from uncertainty quantification, chemical sampling, and adversarial attacks. This combination of deep learning and the physical sciences also helps explain why some ML models are more robust than others in production simulations.

Literature-enabled materials synthesis


The scientific literature contains an immense amount of latent information, often accessible only through years of dedicated study. Automating the extraction of knowledge from this corpus can accelerate materials discovery, but correlating the materials properties reported in the literature with theoretical insights requires developing representations to interpret that data.

In our work, we combine representation learning with literature extraction to explain existing phenomena. For instance, we developed a graph-theoretical “order parameter” for diffusionless transformations in zeolites that accounts for most of the cases of such polymorphic transformations in the literature. In another example, we used over six decades of literature to validate a theory on phase competition in nanoporous materials. Because the information contained in papers is not used as training data for the model, it represents a ground-truth dataset painstakingly curated by the field, and can be used for testing different hypotheses.

Automating and scaling up materials simulations


Automating data pipelines in the chemical sciences can expedite tasks such as simulations, inference, and data sharing. However, connecting data from different sources and deploying calculations to HPC centers requires substantial engineering work.

Our computational projects rely extensively on software and algorithm development for simulating and analyzing materials (see software page). We have developed computational platforms to facilitate the simulation of materials in distributed computing environments. Thanks to its modular design, our software can be adapted to simulate diverse materials systems.