Title: What Makes Looped Transformers Perform Better Than Non-Recursive Ones (Provably)
Speaker: Jiaye Teng (滕佳烨)
Affiliation: Shanghai University of Finance and Economics
Time: 2025-11-13, 14:00-15:00
Venue: Room 1513, East Main Building, Guanghua Towers (光华楼东主楼1513)
Abstract: While looped transformers (Looped-Attn) often outperform standard transformers (Single-Attn) on complex reasoning tasks, their theoretical advantage remains unclear. Guided by empirical observations of distinct sample- and Hessian-level dynamics, we interpret this phenomenon through loss landscape geometry, extending the River-Valley model to distinguish U-shaped (flat) and V-shaped (steep) valleys. We conjecture that Looped-Attn's recursive architecture induces a River-V-Valley inductive bias; under this bias, our theoretical derivations guarantee improved loss convergence along the river via valley hopping and encourage the learning of complex patterns, compared with the River-U-Valley bias of Single-Attn. Building on this insight, we propose SHIFT (Staged HIerarchical Framework for Progressive Training), which accelerates the training of Looped-Attn while maintaining comparable performance.
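For readers unfamiliar with the two architectures the abstract compares, the following minimal PyTorch sketch illustrates the structural difference only: Single-Attn stacks distinct blocks applied once each, while Looped-Attn reuses one weight-tied block recursively. All class names, dimensions, and hyperparameters here are illustrative assumptions, not the speaker's implementation.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """One pre-norm transformer block: self-attention + MLP, both residual."""
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.mlp(self.norm2(x))

class SingleAttn(nn.Module):
    """Non-recursive baseline: `depth` distinct blocks, each applied once."""
    def __init__(self, depth=4, d_model=64):
        super().__init__()
        self.blocks = nn.ModuleList(Block(d_model) for _ in range(depth))

    def forward(self, x):
        for blk in self.blocks:
            x = blk(x)
        return x

class LoopedAttn(nn.Module):
    """Looped variant: a single weight-tied block applied `loops` times."""
    def __init__(self, loops=4, d_model=64):
        super().__init__()
        self.block = Block(d_model)  # one set of parameters, reused each pass
        self.loops = loops

    def forward(self, x):
        for _ in range(self.loops):
            x = self.block(x)  # same weights at every iteration (recursion)
        return x

x = torch.randn(2, 10, 64)  # (batch, sequence, d_model)
print(SingleAttn()(x).shape, LoopedAttn()(x).shape)  # both: torch.Size([2, 10, 64])
```

Note that the two models trace the same depth-4 computation graph at inference, but Looped-Attn has roughly a quarter of the parameters; the talk's claim concerns how this weight tying shapes the loss landscape during training.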
Bio: Jiaye Teng is an Assistant Professor in the School of Statistics and Data Science at Shanghai University of Finance and Economics. His research focuses on theoretical machine learning, including generalization theory and conformal prediction. He received his Ph.D. from the Institute for Interdisciplinary Information Sciences at Tsinghua University and was a visiting scholar at Princeton University. His honors include Tsinghua University's Outstanding Graduate and Outstanding Doctoral Dissertation awards, and he was nominated for the 2025 CCF Theoretical Computer Science Doctoral Dissertation Incentive Program. He is the founder of the FAI-Seminar, a seminar series on artificial intelligence. Homepage: www.tengjiaye.com
Poster: 滕佳烨 学术报告.jpg
Tel: 021-65648958
Email: am_admin@fudan.edu.cn
Address: Building D1, Bay Valley II, Yangpu District, Shanghai, China
Postcode: 200438