Seminar: Classical signal processing based tricks yield SOTA results for low bit-width arithmetic in Large Language Models
Author: Administrator | Posted: 2024-11-20 15:00
Date: Monday, November 25, 2024, 11:00-
Location: 제4공학관 (Engineering Building 4), Room D503
Title: Classical signal processing based tricks yield SOTA results for low bit-width arithmetic in Large Language Models
Speaker: Vikas Singh, Vilas Distinguished Achievement Professor, University of Wisconsin-Madison
Abstract: Transformer-based vision and language models are ubiquitous in modern AI, not only driving advances in science but also sitting at the heart of products that touch millions of users each day. Despite these impressive capabilities, the memory and compute footprint of training and serving such models is quite large. The compute infrastructure needed is expensive, and the size of a model is often proportional to its latency and energy consumption. One effective strategy is to use low bit-width arithmetic instead of 16- or 32-bit floating-point operations, which, assuming appropriate hardware/compiler support, drastically improves latency and reduces the energy footprint. But there is no free lunch: arbitrary quantization of model parameters to low bit-width leads to a severe drop in performance, and identifying the correct quantization can often involve solving a challenging optimization problem. In this talk, I will cover some new results showing how revisiting classical ideas from coding theory/signal processing, specifically frames and their variants, suggests natural ways in which the quantization problem can be reformulated via a change-of-basis argument. I will show promising results on most publicly available models on HuggingFace, where in some cases calculations based on fewer than 3 bits (and down to 2 bits) approach the performance profile of 8-bit or 16-bit models. If time permits, I will cover another related result on recasting a broader range of matrix multiplication operations in Transformer models via a simple unpacking/basis-expansion reformulation.
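
To make the change-of-basis idea in the abstract concrete, below is a minimal NumPy sketch, not the speaker's actual method. It compares direct 2-bit round-to-nearest quantization of a weight matrix containing a few large outlier entries against quantizing after a random orthogonal change of basis and rotating back. The matrix size, the synthetic outlier model, the per-column scales, and the use of a dense random rotation (rather than the structured frames the talk is about) are all illustrative assumptions.

# A minimal sketch (an illustration, not the speaker's method): direct 2-bit
# quantization vs. quantization after a random orthogonal change of basis.
import numpy as np

rng = np.random.default_rng(0)

def quantize_uniform(W, bits):
    # Round-to-nearest uniform quantization with one scale per column.
    levels = 2 ** bits - 1
    lo = W.min(axis=0, keepdims=True)
    scale = (W.max(axis=0, keepdims=True) - lo) / levels
    return np.round((W - lo) / scale) * scale + lo

# A weight matrix with a handful of large outlier entries, a pattern
# commonly reported in Transformer layers (assumed here for illustration).
W = rng.standard_normal((256, 256))
W[rng.integers(0, 256, 20), rng.integers(0, 256, 20)] *= 25.0

# (1) Quantize directly in the original basis.
err_direct = np.linalg.norm(W - quantize_uniform(W, 2)) / np.linalg.norm(W)

# (2) Change of basis: a random orthogonal Q (via QR). Rotating spreads the
# outlier mass across many coordinates, shrinking the dynamic range that the
# coarse 2-bit grid has to cover.
Q, _ = np.linalg.qr(rng.standard_normal((256, 256)))
W_hat = Q.T @ quantize_uniform(Q @ W, 2)  # quantize in the new basis, rotate back

err_rotated = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
print(f"relative error, direct 2-bit:  {err_direct:.3f}")
print(f"relative error, rotated 2-bit: {err_rotated:.3f}")

Since Q is orthogonal, the quantization error incurred in the rotated basis is preserved exactly when rotating back, so any error reduction in the new basis carries over to the original weights. A dense random rotation adds a full extra matrix multiply; presumably the appeal of frames and their variants, as the abstract suggests, is obtaining a similarly favorable basis with structured transforms that are cheap to apply, but that is the subject of the talk rather than this sketch.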
Bio: Vikas is a Vilas Distinguished Achievement Professor at the University of Wisconsin-Madison and also a (part-time) Researcher at Google DeepMind. His research group focuses on theoretical and applied research in machine learning, computer vision, and statistical image analysis (with a focus on neuroimaging). He received the NSF CAREER award, NSF's most prestigious award for junior faculty, many years ago; more recently, a result from his group received the Best Paper Award at ECCV 2022, one of the flagship venues in computer vision. He is singularly proud of the graduate researchers who have trained with him and gone on to establish prominent independent research careers of their own, in academia and industry.