Most sequence tasks model three-dimensional hidden layer features. Dual-Path time-domain and time-frequency-domain speech enhancement models instead use four-dimensional hidden layer features. These models are effective and have few parameters, but the extra dimension makes them computationally demanding. We propose ZipEnhancer, a Dual-Path Down-Up Sampling-based Zipformer for Monaural Speech Enhancement. It applies Down-Up sampling along both the time and frequency axes to reduce computational cost. We adopt the ZipformerBlock as the core block and design Dual-Path DownSampleStacks that symmetrically scale down and then scale back up. We also use the ScaleAdam optimizer and the Eden learning rate scheduler to further improve performance. Our model achieves new state-of-the-art results on the DNS 2020 Challenge and VoiceBank+DEMAND datasets, with a perceptual evaluation of speech quality (PESQ) of 3.69 and 3.63 respectively. It uses 2.04M parameters and 62.41G FLOPs, and it outperforms other methods at similar complexity levels.
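A small NumPy sketch makes the abstract's cost argument concrete. It assumes a hidden feature map of shape (batch, channels, time, frequency). It halves the time and frequency axes by taking a learned weighted average over each pair of neighboring frames, then restores the original resolution by repetition. As we understand it, this resembles the simple downsampling and upsampling modules of the original Zipformer. It is not ZipEnhancer's implementation. The function names, the sizes T=201 and F=101, and the attention-cost proxy are all hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()


def downsample(x: np.ndarray, axis: int, factor: int, logits: np.ndarray) -> np.ndarray:
    """Weighted average of each group of `factor` neighbors along `axis`.

    The weights softmax(logits) are shared by all positions and channels
    (an assumption modeled on Zipformer-style downsampling). The axis is
    padded by repeating its last frame so its length divides by `factor`.
    """
    x = np.moveaxis(x, axis, -1)
    pad = (-x.shape[-1]) % factor
    if pad:
        x = np.concatenate([x, np.repeat(x[..., -1:], pad, axis=-1)], axis=-1)
    groups = x.reshape(*x.shape[:-1], -1, factor)  # (..., n_out, factor)
    y = (groups * softmax(logits)).sum(axis=-1)    # (..., n_out)
    return np.moveaxis(y, -1, axis)


def upsample(x: np.ndarray, axis: int, factor: int, length: int) -> np.ndarray:
    """Repeat each frame `factor` times along `axis`, then crop to `length`."""
    y = np.repeat(x, factor, axis=axis)
    return np.take(y, np.arange(length), axis=axis)


def dual_path_attention_cost(c: int, t: int, f: int) -> int:
    """Hypothetical cost proxy: time-path attention (T x T) for each of F bins
    plus frequency-path attention (F x F) for each of T frames, C channels."""
    return c * (f * t * t + t * f * f)


B, C, T, F = 1, 16, 201, 101  # illustrative sizes only
x = np.random.default_rng(0).standard_normal((B, C, T, F))
logits = np.zeros(2)  # learnable in a real model; zeros give a plain mean

# Down: halve time (axis 2), then frequency (axis 3).
z = downsample(downsample(x, axis=2, factor=2, logits=logits), axis=3, factor=2, logits=logits)
# Up: restore the original time and frequency lengths.
y = upsample(upsample(z, axis=2, factor=2, length=T), axis=3, factor=2, length=F)

print(x.shape, z.shape, y.shape)  # (1, 16, 201, 101) (1, 16, 101, 51) (1, 16, 201, 101)
ratio = dual_path_attention_cost(C, T, F) / dual_path_attention_cost(C, *z.shape[2:])
print(round(ratio, 2))  # 7.83
```

With these illustrative sizes, the proxy drops by about 7.8x. Halving both axes gives close to 8x because attention along each path is quadratic in its own axis and linear in the other. Position-wise layers such as feed-forward modules scale with T·F, so they gain only about 4x.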
More audio samples can be found at https://github.com/ZipEnhancer/ZipEnhancer.
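The abstract also mentions the Eden learning rate scheduler, which comes from the Zipformer recipes in k2/icefall. As we understand that implementation, Eden decays the rate smoothly with both the batch index and the epoch, after a short linear warmup. The standalone function below is a sketch under that assumption. Its defaults (lr_batches=7500, lr_epochs=3.5, warmup_batches=500, warmup_start=0.5) and the base_lr of 0.045 are hypothetical and are not claimed to be ZipEnhancer's settings.

```python
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5,
            warmup_batches: float = 500.0, warmup_start: float = 0.5) -> float:
    """Eden-style learning rate (sketch; defaults are hypothetical).

    lr = base_lr * ((b^2 + B^2) / B^2)^(-1/4) * ((e^2 + E^2) / E^2)^(-1/4) * warmup
    where b is the batch index, e the epoch, B = lr_batches, E = lr_epochs.
    """
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    if batch >= warmup_batches:
        warmup = 1.0
    else:
        # Linear ramp from warmup_start up to 1.0 over the first warmup_batches.
        warmup = warmup_start + (1.0 - warmup_start) * batch / warmup_batches
    return base_lr * batch_factor * epoch_factor * warmup


for batch, epoch in [(0, 0), (250, 0), (7500, 1), (30000, 4), (100000, 15)]:
    print(batch, epoch, f"{eden_lr(0.045, batch, epoch):.5f}")
```

Once training is well past lr_batches and lr_epochs, each factor behaves like an inverse square root of its own counter. The rate therefore keeps decaying smoothly instead of dropping in discrete steps.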
| Sample | Noisy | Clean | FRCRN | MFNet | MP-SENet | ZipEnhancerS (Ours) | ZipEnhancerM (Ours) |
|---|---|---|---|---|---|---|---|
| Sample 1 | (audio) | (audio) | (audio) | (audio) | (audio) | (audio) | (audio) |
| Sample 2 | (audio) | (audio) | (audio) | (audio) | (audio) | (audio) | (audio) |
| Sample 3 | (audio) | (audio) | (audio) | (audio) | (audio) | (audio) | (audio) |

| Scene | Noisy | Clean | DB-AIAT | CMGAN | MP-SENet | ZipEnhancerS (λ=0.2, Ours) | ZipEnhancerS (λ=0, Ours) |
|---|---|---|---|---|---|---|---|
| Sample 1 | (audio) | (audio) | (audio) | (audio) | (audio) | (audio) | (audio) |
| Sample 2 | (audio) | (audio) | (audio) | (audio) | (audio) | (audio) | (audio) |
| Sample 3 | (audio) | (audio) | (audio) | (audio) | (audio) | (audio) | (audio) |

| Sample | Noisy | Clean | ZipEnhancerS | S2 | S3 | S4 |
|---|---|---|---|---|---|---|
| Sample 1 | (audio) | (audio) | (audio) | (audio) | (audio) | (audio) |
| Sample 2 | (audio) | (audio) | (audio) | (audio) | (audio) | (audio) |

| Sample | S5 | S6 | S7 | S8 | S(AdamW) |
|---|---|---|---|---|---|
| Sample 1 | (audio) | (audio) | (audio) | (audio) | (audio) |
| Sample 2 | (audio) | (audio) | (audio) | (audio) | (audio) |

Acknowledgment: The template for this GitHub page is adapted from https://yxlu-0102.github.io/MP-SENet/.
@misc{wang2025zipenhancerdualpathdownupsamplingbased,
title={ZipEnhancer: Dual-Path Down-Up Sampling-based Zipformer for Monaural Speech Enhancement},
author={Haoxu Wang and Biao Tian},
year={2025},
eprint={2501.05183},
archivePrefix={arXiv},
primaryClass={cs.SD},
url={https://arxiv.org/abs/2501.05183},
}