Research Scientist at Google DeepMind

I am a research scientist at Google DeepMind working on AI alignment. My goal is to develop interpretable and trustworthy AI systems that learn from human feedback.

I received my PhD from ETH Zurich, where I was part of the Learning & Adaptive Systems Group supervised by Prof. Andreas Krause and Dr. Katja Hofmann. My dissertation, “Algorithmic Foundations for Safe and Efficient Reinforcement Learning from Human Feedback”, focused on developing methods for learning safely and efficiently from human feedback. Before that, I received a master’s degree in Data Science from ETH Zurich and a bachelor’s degree in physics from the University of Cologne in Germany.

My research aims to build safe, robust, and interpretable artificial intelligence (AI). Currently, I work primarily on Reinforcement Learning from Human Feedback (RLHF), which I consider a key ingredient for building safe AI. My work in this area has two goals: first, making RLHF more sample-efficient via active learning, and second, using constraint models in addition to reward models to specify tasks, particularly in contexts where safety is a critical concern. Recently, my interests have expanded to other areas, including interpretability, specifically the mechanistic understanding of neural network models, and red-teaming models before and during deployment. Through my work, I strive to ensure that the AI models we deploy today and in the coming years are safe, robust, and transparent.