Project Page

HeadRouter

Dynamic Head-Weight Routing for Task-Adaptive Audio Token Pruning in Large Audio Language Models

Peize He, Yaodi Luo, Xiaoqian Liu, Xuyang Liu, Jiahang Deng, Yaosong Du, Li Bangyu, Xiyan Gui, Yuxuan Chen, Linfeng Zhang


Overview

HeadRouter is a training-free audio token pruning method for large audio language models. It exploits task-dependent attention-head behavior, routing token importance scoring toward semantic, acoustic, or mixed head-weight profiles without updating any model parameters.

Overview heatmap of task-dependent audio head behavior
Task-level behavior overview motivating adaptive routing across semantic and acoustic audio workloads.
Task Adaptive: Routes head weights according to input-dependent audio behavior.
Training Free: Requires no extra model training or parameter updates.
Audio Focused: Targets long-context audio understanding and token redundancy.
Compression Friendly: Maintains strong performance under aggressive pruning ratios.

Method

HeadRouter combines position-bias-reduced text-to-audio probing with dynamic head-weight routing. The router softly mixes task profiles to score and retain the most informative audio tokens for each input.

HeadRouter method pipeline
HeadRouter pipeline: estimate task-aware head behavior, mix routing profiles, and prune audio tokens.
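
The routing step described above can be illustrated with a minimal NumPy sketch. It is not the released implementation: the function name `route_and_prune`, the array shapes, the use of a softmax over router logits, and the top-k selection rule are all assumptions made for illustration, and the page does not specify how the profiles or router logits are obtained.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    exp = np.exp(shifted)
    return exp / exp.sum()


def route_and_prune(attn, profiles, router_logits, keep_ratio):
    """Score audio tokens with softly mixed head weights and keep the top ones.

    All arguments are hypothetical stand-ins:
      attn          -- (H, T) text-to-audio attention, one row per head
      profiles      -- (P, H) head-weight profiles (e.g. semantic, acoustic)
      router_logits -- (P,) per-input routing logits
      keep_ratio    -- fraction of the T audio tokens to retain
    """
    mix = softmax(router_logits)       # (P,) soft mixture over profiles
    head_weights = mix @ profiles      # (H,) input-specific head weights
    scores = head_weights @ attn       # (T,) importance score per audio token
    num_keep = max(1, int(round(keep_ratio * attn.shape[1])))
    # Take the highest-scoring tokens, then restore temporal order.
    keep = np.sort(np.argsort(scores)[-num_keep:])
    return keep, scores


# Toy example: 2 heads, 4 audio tokens, 2 profiles.
attn = np.array([[0.1, 0.4, 0.3, 0.2],    # head 0
                 [0.4, 0.1, 0.1, 0.4]])   # head 1
profiles = np.array([[1.0, 0.0],          # "semantic" profile (assumed)
                     [0.0, 1.0]])         # "acoustic" profile (assumed)

keep_semantic, _ = route_and_prune(attn, profiles, np.array([10.0, -10.0]), 0.5)
keep_acoustic, _ = route_and_prune(attn, profiles, np.array([-10.0, 10.0]), 0.5)
print(keep_semantic, keep_acoustic)
```

In this toy setup the same attention map yields different retained tokens depending on which profile the router favors, which is the behavior a single fixed head profile cannot provide.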

Head Behavior Analysis

Representative visualizations show why one fixed head profile is insufficient: semantic and acoustic tasks exhibit different selectivity patterns and separable head-behavior clusters.

Selectivity heatmap

Head selectivity is more diffuse for semantic tasks, while acoustic tasks concentrate on smaller groups of highly selective heads.

Attention head selectivity heatmap

t-SNE of head behavior

Per-sample head-behavior vectors form structured task clusters, supporting input-adaptive routing.

t-SNE visualization of head behavior

Results

HeadRouter improves the trade-off between compression and task performance by preserving task-relevant audio tokens more consistently than fixed or task-agnostic pruning strategies.

Motivation and oracle comparison

Oracle comparison across pruning methods

Local comparison

Comparison of local pruning methods

Efficiency

Efficiency comparison

Ablation

Ablation study bar chart

Citation

Citation information will be added after release.

@article{headrouter2026,
  title  = {HeadRouter: Dynamic Head-Weight Routing for Task-Adaptive Audio Token Pruning in Large Audio Language Models},
  author = {He, Peize and others},
  year   = {2026}
}