I am a doctoral student in the Learning & Adaptive Systems Group at ETH Zurich, supervised by Prof. Andreas Krause and Dr. Katja Hofmann. I am part of the Microsoft Swiss Joint Research Center and an associated PhD student of the ETH AI Center. During my PhD, I was an intern at DeepMind and at the Center for Human-Compatible AI, UC Berkeley. Previously, I received a master’s degree in Data Science from ETH Zurich and a bachelor’s degree in physics from the University of Cologne.
My research aims to build safe, robust, and interpretable artificial intelligence (AI). Currently, I work primarily on Reinforcement Learning from Human Feedback (RLHF), which I believe is a key ingredient for building safe AI. My work in this context has two goals: first, making RLHF more sample-efficient via active learning; and second, using constraint models alongside reward models to specify tasks, particularly in contexts where safety is a critical concern. Recently, my interests have expanded to other areas, including interpretability, specifically the mechanistic understanding of neural networks, and red-teaming models before and during deployment. Through my work, I strive to ensure that the AI models we deploy today and in the coming years are safe, robust, and transparent.