HeadRouter

News

2026.07.17: The curated HeadRouter code and calibrated profiles are available.

2026.07.10: HeadRouter was accepted by ACM MM 2026.

2026.04.30: The paper was released on arXiv.

Overview

HeadRouter is a training-free audio token pruning method for large audio language models. It builds on task-dependent attention-head behavior and routes token-importance scoring toward semantic, acoustic, or mixed head-weight profiles.

Task Adaptive: Routes head weights according to input-dependent audio behavior.

Training Free: Requires no extra model training or parameter updates.

Audio Focused: Targets long-context audio understanding and token redundancy.

Compression Friendly: Maintains strong performance under aggressive pruning ratios.

Code

The release contains the routing equations, calibrated profiles, a Qwen2.5-Omni integration, and a configurable single-audio example.

git clone https://github.com/DabDans/HeadRouter.git
cd HeadRouter
conda create -n headrouter python=3.10 -y
conda activate headrouter
pip install -e .

python examples/qwen2_5_omni_inference.py \
  --model Qwen/Qwen2.5-Omni-3B \
  --audio /path/to/example.wav \
  --pruning-ratio 0.6
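
To make the `--pruning-ratio` flag concrete, here is a small arithmetic sketch. It assumes the ratio is the fraction of audio tokens that are removed; the repository defines the flag's actual semantics. `tokens_kept` is a hypothetical helper and not part of the HeadRouter package.

```python
def tokens_kept(num_audio_tokens: int, pruning_ratio: float) -> int:
    """Audio tokens retained, ASSUMING the ratio is the fraction removed."""
    if not 0.0 <= pruning_ratio < 1.0:
        raise ValueError("pruning_ratio must be in [0, 1)")
    # Round to the nearest integer and always keep at least one token.
    return max(1, round(num_audio_tokens * (1.0 - pruning_ratio)))


if __name__ == "__main__":
    # Under this assumption, a 750-token clip pruned at 0.6 keeps 300 tokens.
    print(tokens_kept(750, 0.6))
```

Under this reading, higher ratios mean more aggressive compression, so 0.6 would leave 40% of the audio tokens for the language model to attend to.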


Method

HeadRouter combines position-bias-reduced text-to-audio probing with dynamic head-weight routing. The router softly mixes task profiles to score and retain the most informative audio tokens for each input.
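
The soft mixing and token retention described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the released implementation: the function names, the profile values, and the use of a softmax over router logits are all hypothetical, and the position-bias-reduced probing step is taken as given (its output is the `head_attention` matrix).

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_profiles(router_logits, profiles):
    """Blend per-head weight profiles with router probabilities (assumed softmax)."""
    probs = softmax(router_logits)
    num_heads = len(profiles[0])
    return [
        sum(p * profile[h] for p, profile in zip(probs, profiles))
        for h in range(num_heads)
    ]


def score_tokens(head_attention, head_weights):
    """head_attention[h][t] is the text-to-audio attention of head h on token t."""
    num_tokens = len(head_attention[0])
    return [
        sum(w * row[t] for w, row in zip(head_weights, head_attention))
        for t in range(num_tokens)
    ]


def keep_top_tokens(scores, keep):
    """Indices of the `keep` highest-scoring tokens, returned in temporal order."""
    ranked = sorted(range(len(scores)), key=lambda t: scores[t], reverse=True)
    return sorted(ranked[:keep])


if __name__ == "__main__":
    # Hypothetical profiles over three heads: semantic, acoustic, mixed.
    profiles = [[0.7, 0.2, 0.1], [0.1, 0.2, 0.7], [1 / 3, 1 / 3, 1 / 3]]
    head_weights = mix_profiles([2.0, 0.0, 0.0], profiles)
    head_attention = [
        [0.1, 0.5, 0.1, 0.2, 0.1],
        [0.2, 0.2, 0.2, 0.2, 0.2],
        [0.4, 0.1, 0.1, 0.1, 0.3],
    ]
    scores = score_tokens(head_attention, head_weights)
    print(keep_top_tokens(scores, keep=2))
```

Returning the kept indices in temporal order is a design choice of this sketch: it keeps the surviving audio tokens in their original sequence, which matters for models that rely on positional information.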

Head Behavior Analysis

Representative visualizations show why one fixed head profile is insufficient: semantic and acoustic tasks exhibit different selectivity patterns and separable head-behavior clusters.

Selectivity heatmap

Semantic tasks are more diffuse, while acoustic tasks concentrate on smaller groups of highly selective heads.
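
One way to quantify how "diffuse" or "selective" a head is uses the entropy of its attention over audio tokens. The sketch below is one common proxy and may not match the selectivity measure used in the paper; `head_selectivity` is a hypothetical name.

```python
import math


def head_selectivity(attention):
    """1 minus the normalized entropy of one head's attention over audio tokens.

    Values near 0.0 indicate diffuse (near-uniform) attention; values near 1.0
    indicate attention concentrated on very few tokens.
    """
    total = sum(attention)
    if total <= 0 or len(attention) < 2:
        return 0.0
    probs = [a / total for a in attention]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(len(probs))


if __name__ == "__main__":
    print(head_selectivity([0.25, 0.25, 0.25, 0.25]))  # diffuse head
    print(head_selectivity([0.97, 0.01, 0.01, 0.01]))  # selective head
```

Computing this score per head and per task would give a matrix comparable in spirit to a selectivity heatmap, with rows for tasks and columns for heads.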

t-SNE of head behavior

Per-sample head-behavior vectors form structured task clusters, supporting input-adaptive routing.
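
If head-behavior vectors cluster by task, a simple input-adaptive router can compare a sample's vector with per-task centroids and turn the similarities into soft weights. The following is a minimal sketch under that assumption; the cosine similarity, the temperature value, and the centroid names are illustrative choices, not the calibrated procedure from the release.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def routing_weights(behavior, centroids, temperature=0.1):
    """Soft routing weights from similarity to each task centroid (assumed rule)."""
    sims = [cosine(behavior, c) / temperature for c in centroids.values()]
    peak = max(sims)
    exps = [math.exp(s - peak) for s in sims]
    total = sum(exps)
    return {name: e / total for name, e in zip(centroids, exps)}


if __name__ == "__main__":
    # Hypothetical two-dimensional centroids for illustration only.
    centroids = {"semantic": [1.0, 0.0], "acoustic": [0.0, 1.0]}
    print(routing_weights([0.9, 0.1], centroids))
```

A lower temperature makes the routing closer to a hard task assignment, while a higher one blends the profiles more evenly for ambiguous inputs.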

Results

HeadRouter improves the trade-off between compression and task performance by preserving task-relevant audio tokens more consistently than fixed or task-agnostic pruning strategies.

[Figures: motivation and oracle comparison across pruning methods; local comparison; efficiency; ablation]

Citation

Please cite either the ACM MM 2026 paper or its arXiv version.

ACM MM 2026

@inproceedings{he2026headrouter,
  title     = {HeadRouter: Dynamic Head-Weight Routing for Task-Adaptive Audio Token Pruning in Large Audio Language Models},
  author    = {He, Peize and Luo, Yaodi and Liu, Xiaoqian and Liu, Xuyang and Deng, Jiahang and Du, Yaosong and Li, Bangyu and Gui, Xiyan and Chen, Yuxuan and Zhang, Linfeng},
  booktitle = {Proceedings of the 34th ACM International Conference on Multimedia},
  year      = {2026}
}

arXiv

@misc{he2026headrouterarxiv,
  title         = {HeadRouter: Dynamic Head-Weight Routing for Task-Adaptive Audio Token Pruning in Large Audio Language Models},
  author        = {He, Peize and Luo, Yaodi and Liu, Xiaoqian and Liu, Xuyang and Deng, Jiahang and Du, Yaosong and Li, Bangyu and Gui, Xiyan and Chen, Yuxuan and Zhang, Linfeng},
  year          = {2026},
  eprint        = {2604.23717},
  archivePrefix = {arXiv},
  primaryClass  = {cs.SD},
  url           = {https://arxiv.org/abs/2604.23717}
}