- Title
- [Seminar] GenRobot: Building VLA Foundation Models to Advance General-Purpose Service Robotics / Dr. Jianlong Fu (MSRA)
- Author
- School of Advanced Computing
- Date posted
- 2025.09.29
- Last modified
- 2025.09.29
- Category
- Seminar
- Post content

Date and time: Wednesday, October 1, 2025, from 11:00 AM
Venue: Engineering Hall 4, Room D915
Speaker: Dr. Jianlong Fu (Microsoft Research Asia)
Title: GenRobot: Building VLA Foundation Models to Advance General-Purpose Service Robotics
Abstract:
This work explores the design and implementation of a Robot GPT model capable of performing diverse tasks to assist in people’s daily lives. To this end, we propose HALO, an advanced long-horizon robotic foundation model for object manipulation. The key innovation lies in enabling robots to generalize across a wide variety of tasks. Specifically, the HALO model introduces a joint vision–action embedding space, where robotic actions are encoded into rich, discrete symbolic representations. Based on this unified cross-modal symbolic space, we integrate large language models (LLMs) with multimodal foundation models, enabling the generation of both visual predictions and action sequences to accomplish robotic tasks. To further enhance scalability, the model is designed to be trained jointly on robotic demonstrations and human operation videos, thus leveraging the vast amount of human video data available online. After pretraining, the model can adapt to new objects, environments, and robotic platforms through few-shot learning techniques. We develop models of different scales (1B, 3B, and 7B parameters) and train them on progressively larger datasets, with the ultimate goal of building a highly scalable foundation Robot GPT model.
Bio:
Jianlong Fu is a Principal Research Manager in the Multimedia Computing Group at Microsoft Research Asia. His research interests include computer vision and robot learning, with a focus on multimodal generative AI, especially multimedia content generation and perception for images, videos, and embodied agents. He was named one of the MIT Technology Review 2023 China Intelligent Computing Innovators, received the title of IEEE Distinguished Lecturer 2024, and won five Best Paper Awards at flagship international multimedia conferences such as ACM Multimedia. His innovations have been deployed in core Microsoft products, including Windows, Office, Azure, Bing, and Edge.

