Training Dataset
The official training release includes approximately 16K hours of bilingual speech and around 3M segments for context-aware CoT-TTS research and system development.
Data & models
This page provides the official training dataset and two baseline models for Track 1 and Track 2.
Text-context baseline: a reproducible 0.6B-parameter Qwen3-style model trained with a three-stage strategy for context-aware reasoning and speech generation.
It takes speaker-labeled textual dialogue context, together with the target text and reference speech, as input.
Audio-context baseline: a reproducible 0.6B-parameter Qwen3-style model trained with a three-stage strategy for reasoning from acoustic dialogue history.
It takes continuous multi-speaker audio context, together with the target text and reference speech, as input.
All resources will be released through the official challenge website. Models and data are provided for academic research and challenge participation only. Submitted systems must be self-contained and include all required files, because the official evaluation runs without internet access.