ZipEnhancer: Dual-Path Down-Up Sampling-based Zipformer for Monaural Speech Enhancement

Speech Lab, Alibaba Group, China

Abstract

Unlike other sequence tasks, which model three-dimensional hidden-layer features, Dual-Path time-domain and time-frequency-domain speech enhancement models are effective and use few parameters, but they are computationally demanding because of their four-dimensional hidden-layer features. We propose ZipEnhancer, a Dual-Path Down-Up Sampling-based Zipformer for Monaural Speech Enhancement, which incorporates time- and frequency-domain Down-Up sampling to reduce computational cost. We introduce the ZipformerBlock as the core block and propose Dual-Path DownSampleStacks that symmetrically scale down and scale up. We also adopt the ScaleAdam optimizer and the Eden learning rate scheduler to further improve performance. Our model achieves new state-of-the-art results on the DNS 2020 Challenge and Voicebank+DEMAND datasets, with perceptual evaluation of speech quality (PESQ) scores of 3.69 and 3.63, respectively, using 2.04M parameters and 62.41G FLOPS, outperforming other methods of similar complexity.
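To give a rough feel for why Down-Up sampling along both the time and frequency axes lowers the computational cost, the sketch below works on a single T x F grid standing in for one channel of one utterance. It is an illustration under stated assumptions, not the ZipEnhancer implementation: average pooling and cell repetition are stand-ins for whatever sampling operators the model uses, and `dual_path_attention_cost` is a hypothetical toy cost model that assumes attention cost grows quadratically with sequence length.

```python
def downsample(grid, t_factor, f_factor):
    """Average-pool a T x F grid (list of rows) by the given factors.

    Assumption: plain average pooling, used here only for illustration.
    """
    T, F = len(grid), len(grid[0])
    assert T % t_factor == 0 and F % f_factor == 0
    out = []
    for t0 in range(0, T, t_factor):
        row = []
        for f0 in range(0, F, f_factor):
            block = [grid[t][f]
                     for t in range(t0, t0 + t_factor)
                     for f in range(f0, f0 + f_factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def upsample(grid, t_factor, f_factor):
    """Restore the original resolution by repeating each cell."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(f_factor)]
        out.extend([list(wide) for _ in range(t_factor)])
    return out


def dual_path_attention_cost(T, F):
    """Toy cost model (hypothetical): the time path attends over F
    sequences of length T, the frequency path over T sequences of
    length F, each with quadratic cost in sequence length."""
    return F * T * T + T * F * F


T, F = 8, 4
grid = [[float(t * F + f) for f in range(F)] for t in range(T)]

small = downsample(grid, 2, 2)      # 4 x 2 grid processed by the inner block
restored = upsample(small, 2, 2)    # back to 8 x 4 for the outer layers

full_cost = dual_path_attention_cost(T, F)
small_cost = dual_path_attention_cost(T // 2, F // 2)
print(len(small), len(small[0]), len(restored), len(restored[0]))
print(full_cost, small_cost)
```

In this toy setting, halving both axes cuts the pairwise-interaction count of the inner block by a factor of eight, while the symmetric upsampling step returns the features to their original resolution. The FLOPS figure quoted in the abstract comes from the paper, not from this cost model.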


I. Audio Samples of Speech Enhancement


More audio samples can be found at https://github.com/ZipEnhancer/ZipEnhancer.


DNS Challenge Dataset


Noisy | Clean | FRCRN | MFNet | MP-SENet | ZipEnhancerS (Ours) | ZipEnhancerM (Ours)
Sample 1
Sample 2
Sample 3

VoiceBank+DEMAND Dataset


Scene | Noisy | Clean | DB-AIAT | CMGAN | MP-SENet | ZipEnhancerS (λ=0.2, Ours) | ZipEnhancerS (λ=0, Ours)
Sample 1
Sample 2
Sample 3


II. Different Model Configurations and Ablation Study on the DNS 2020 Dataset


Noisy | Clean | ZipEnhancerS | S2 | S3 | S4
Sample 1
Sample 2


S5 | S6 | S7 | S8 | S (AdamW)
Sample 1
Sample 2


Acknowledgment: The template of this GitHub page is adapted from https://yxlu-0102.github.io/MP-SENet/.


BibTeX

@misc{wang2025zipenhancerdualpathdownupsamplingbased,
      title={ZipEnhancer: Dual-Path Down-Up Sampling-based Zipformer for Monaural Speech Enhancement}, 
      author={Haoxu Wang and Biao Tian},
      year={2025},
      eprint={2501.05183},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2501.05183}, 
    }