A Semantically Consistent Dataset for Data-Efficient Query-Based Universal Sound Separation

━ Submitted to ICML 2026 ━

[Paper 📝]

[Code ⚙️]

Overview of the proposed automated pipeline. The framework consists of three coupled stages: (1) ontology reconstruction and data preprocessing, (2) single-source semantic-acoustic alignment, and (3) super-resolution-based standardization.

Hive Dataset

Dataset composition across sources.

Mixture type distribution (2-5 mix).

Label frequency statistics.

Performance & Efficiency

Separation performance results.

Efficiency comparison across models.

Demo Samples

Below we show inference results from different models on four mixture types (2mix, 3mix, 4mix, 5mix).

For each mixture type, we present five test samples. Hive-trained versions of AudioSep and FlowSep are available and can be selected via the Model Weights dropdown next to each model.

Select a model, mix type, and sample to view the full comparison.


Acknowledgements

The website template is borrowed from Colorful Image Colorization and Nerfies.