Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and ACE-KiSing

In singing voice synthesis (SVS), generating singing voices from musical scores faces challenges due to limited data availability. This study proposes a unique strategy to address the data scarcity in SVS. We employ an existing singing voice synthesizer for data augmentation, complemented by detailed manual tuning, an approach not previously explored in data curation, to reduce instances of unnatural voice synthesis. This innovative method has led to the creation of two expansive singing voice datasets, ACE-Opencpop and ACE-KiSing, which are instrumental for large-scale, multi-singer voice synthesis. Through thorough experimentation, we establish that these datasets not only serve as new benchmarks for SVS but also enhance SVS performance on other singing voice datasets when used as supplementary resources. The corpora, pre-trained models, and their related training recipes are publicly available at ESPnet-Muskits (https://github.com/espnet/espnet).
