Title: What Makes Looped Transformers Perform Better Than Non-Recursive Ones (Provably)
Speaker: Jiaye Teng (滕佳烨)
Affiliation: Shanghai University of Finance and Economics
Time: 2025-11-13, 14:00-15:00
Venue: Room 1513, East Main Building, Guanghua Towers (光华楼东主楼1513)
Abstract: While looped transformers (Looped-Attn) often outperform standard transformers (Single-Attn) on complex reasoning tasks, their theoretical advantage remains unclear. Guided by empirical observations of distinct sample- and Hessian-level dynamics, we interpret this phenomenon through loss landscape geometry, extending the River-Valley model to distinguish U-shaped (flat) and V-shaped (steep) valleys. We conjecture that Looped-Attn's recursive architecture induces a River-V-Valley inductive bias; under this bias, our theoretical derivations guarantee improved loss convergence along the river via valley hopping and encourage the learning of complex patterns, compared with the River-U-Valley bias of Single-Attn. Building on this insight, we propose SHIFT (Staged HIerarchical Framework for Progressive Training), which accelerates the training of Looped-Attn while maintaining comparable performance.
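For readers unfamiliar with the two architectures the abstract compares, the following minimal PyTorch sketch illustrates the structural difference only: Single-Attn stacks distinct blocks applied once each, while Looped-Attn reuses one weight-tied block recursively. All class names, dimensions, and hyperparameters here are illustrative assumptions, not the speaker's implementation.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """One pre-norm transformer block: self-attention + MLP, both residual."""
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.norm2(x))

class SingleAttn(nn.Module):
    """Non-recursive baseline: `depth` distinct blocks, each applied once."""
    def __init__(self, depth=4, d_model=64):
        super().__init__()
        self.blocks = nn.ModuleList(Block(d_model) for _ in range(depth))

    def forward(self, x):
        for blk in self.blocks:
            x = blk(x)
        return x

class LoopedAttn(nn.Module):
    """Looped variant: a single weight-tied block applied `loops` times."""
    def __init__(self, loops=4, d_model=64):
        super().__init__()
        self.block = Block(d_model)  # one set of parameters, reused each pass
        self.loops = loops

    def forward(self, x):
        for _ in range(self.loops):
            x = self.block(x)  # same weights at every iteration (recursion)
        return x

x = torch.randn(2, 10, 64)  # (batch, sequence, d_model)
print(SingleAttn()(x).shape, LoopedAttn()(x).shape)  # both: torch.Size([2, 10, 64])
```

Note that the two models trace the same depth-4 computation graph at inference, but Looped-Attn has roughly a quarter of the parameters; the talk's claim concerns how this weight tying shapes the loss landscape during training.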
Bio: Jiaye Teng is an Assistant Professor in the School of Statistics and Data Science at Shanghai University of Finance and Economics. His research focuses on theoretical machine learning, including generalization theory and conformal prediction. He received his Ph.D. from the Institute for Interdisciplinary Information Sciences at Tsinghua University and was a visiting scholar at Princeton University. His honors include Tsinghua University's Outstanding Graduate and Outstanding Doctoral Dissertation awards, and he was nominated for the 2025 CCF Theoretical Computer Science Doctoral Dissertation Incentive Program. He is the founder of the FAI-Seminar, a seminar series on artificial intelligence. Homepage: www.tengjiaye.com
Poster: 滕佳烨 学术报告.jpg
Tel: 021-65648958
Email: am_admin@fudan.edu.cn
Address: Building D1, Bay Valley II, Yangpu District, Shanghai, China
Postcode: 200438