June 14, 2025: Baseline system is available.
| Date (AoE time) | Event |
|---|---|
| June 15, 2025 | Challenge launch: baseline system & development metadata release |
| July 22, 2025 | Evaluation metadata release |
| August 1, 2025 | Final results & code submission |
| August 8, 2025 | Results announcement |
| August 15, 2025 | GC paper submission deadline (special session) |
| August 22, 2025 | GC paper acceptance notification |
For the APSIPA ASC 2025 grand challenge "City and Time-Aware Semi-supervised Acoustic Scene Classification", we provide a development dataset comprising approximately 24 hours of audio recordings from the Chinese Acoustic Scene (CAS) 2023 dataset. This challenge introduces previously unused contextual metadata that accompanies each recording:
- City information: identification of the recording location among 22 diverse Chinese cities (e.g., Xi'an, Beijing, Shanghai)
- Timestamp information: recording time precise to the year, month, day, hour, minute, and second
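As a rough sketch of how this metadata might be used, the snippet below parses a hypothetical metadata row into city and time-of-day features; the field names and timestamp format are assumptions for illustration and may differ from the released metadata files.

```python
from datetime import datetime

# Hypothetical metadata row for one 10-second clip; the actual column
# names and formats in the released metadata may differ.
row = {
    "filename": "clip_0001.wav",
    "city": "Xi'an",
    "timestamp": "2023-05-12 08:30:45",  # assumed "YYYY-MM-DD HH:MM:SS" format
}

# Parse the timestamp into a datetime object for feature extraction.
ts = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M:%S")

# Simple contextual features a model could condition on.
city = row["city"]
hour_of_day = ts.hour          # 0-23, e.g. useful for day/night cues
month = ts.month               # seasonal context (April-September 2023)

print(city, hour_of_day, month)
```

Such features could, for example, be embedded and fused with audio features in a multimodal classifier.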
The CAS 2023 dataset is a large-scale dataset for research on environmental acoustic scenes. It covers 10 common acoustic scenes, with a total duration of over 130 hours. Each audio clip is 10 seconds long and is accompanied by metadata on the recording location and timestamp. The data collection spanned April 2023 to September 2023 and covered 22 different cities across China.
Acoustic scenes (10): Bus, Airport, Metro, Restaurant, Shopping mall, Public square, Urban park, Traffic street, Construction site, Bar
More details can be found at https://arxiv.org/abs/2402.02694.
The audio recordings of the development dataset can be found at https://zenodo.org/records/10616533.
The audio recordings of the evaluation dataset can be found at https://zenodo.org/records/10820626.
Metadata for the development and evaluation datasets will be released at https://github.com/JishengBai/APSIPA2025GC-ASC/tree/main/metadata.
The baseline system for the APSIPA ASC 2025 GC "City and Time-Aware Semi-supervised Acoustic Scene Classification" challenge is based on a multimodal semi-supervised framework with a pre-trained SE-Trans model. The baseline code is released on GitHub. Systems will be ranked by macro-average accuracy (the average of the class-wise accuracies). If two teams achieve the same score on the evaluation dataset, the team with the smaller model size will be ranked higher.
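The ranking metric above can be sketched as follows: compute the accuracy within each class separately, then take the unweighted mean over classes. This is a minimal illustrative implementation, not the official scoring script.

```python
from collections import defaultdict

def macro_average_accuracy(y_true, y_pred):
    """Mean of per-class accuracies (macro-average accuracy).

    Each class contributes equally regardless of how many clips it has,
    unlike plain (micro) accuracy.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)

# Toy example: "Bus" is 2/2 correct, "Metro" is 1/2 correct,
# so the macro-average accuracy is (1.0 + 0.5) / 2 = 0.75.
y_true = ["Bus", "Bus", "Metro", "Metro"]
y_pred = ["Bus", "Bus", "Metro", "Bus"]
print(macro_average_accuracy(y_true, y_pred))  # 0.75
```

Note that with class-imbalanced evaluation data this can differ noticeably from overall clip-level accuracy.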