Communication Compression for Tensor Parallel LLM Inference
Inter-GPU communication compression for Large Language Models reduces time-to-first-token by up to 2x with negligible degradation in model performance.
I am a PhD candidate in Prof. Schoellig’s lab at the Technical University of Munich. My main interests include control for complex systems, reinforcement learning, and robotics. Previously, I completed an M.Sc. degree in Robotics at TUM and wrote my master’s thesis with Prof. Ijspeert. I have also spent time at Robert Bosch GmbH and in Formula Student. I am a member of AI Grid.
A single gait cycle of expert demonstration significantly improves the achieved reward and the visual appearance of the learned gait in locomotion tasks, even for transfer tasks.
3D-printed chip for mechanoactivation at the single-cell level to study intracellular calcium signaling and translocation.
A Franka Emika FR3 robotic manipulator builds LEGO structures.
Teaching and study tool for TU Dresden with 3500 users. Awarded the German Digital Changemaker Award.
Building a robot for autonomous outdoor exploration and artifact detection.
KIRO2024 was the first German AI-in-robotics conference.
Check out the Huberman Lab podcast for a healthy lifestyle and science-based tools for everyday life, and the Lex Fridman podcast for broadening your worldview and exploring socially relevant topics. I also like to look at DailyArt.