Context Input
Provide dialogue context, target text, and a reference speech sample as the conditioning signal for style inference.
Challenge tasks
Given dialogue context, target text, and a reference speech sample, the system should produce a reasoning analysis and generate a speech waveform that is consistent with both that analysis and the reference speaker's timbre.
Before synthesis, the system analyzes the causes and consequences in the dialogue context and summarizes the speaking style the target audio should have.
Generate speech that preserves the reference speaker timbre while matching the inferred speaking manner and scene context.
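The flow described above (context input, reasoning before synthesis, then speech generation) can be sketched as a small interface. This is only an illustrative sketch, not the official submission format: the names `ChallengeInput`, `ChallengeOutput`, and `run_pipeline`, the field names, and the use of plain float lists for audio are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ChallengeInput:
    """Conditioning signal for one sample (hypothetical field names)."""
    dialogue_context: List[str]   # preceding dialogue turns, as text
    target_text: str              # the text to be spoken
    reference_audio: List[float]  # reference speaker waveform samples


@dataclass
class ChallengeOutput:
    """What a system returns for one sample (hypothetical field names)."""
    reasoning: str         # analysis plus speaking-style summary
    waveform: List[float]  # generated speech samples


def run_pipeline(
    sample: ChallengeInput,
    reason: Callable[[List[str], str], str],
    synthesize: Callable[[str, str, List[float]], List[float]],
) -> ChallengeOutput:
    # Step 1: produce the reasoning analysis before any audio is generated.
    reasoning = reason(sample.dialogue_context, sample.target_text)
    # Step 2: synthesize speech conditioned on the target text, the
    # reasoning, and the reference audio (for speaker timbre).
    waveform = synthesize(sample.target_text, reasoning, sample.reference_audio)
    return ChallengeOutput(reasoning=reasoning, waveform=waveform)
```

Passing `reason` and `synthesize` as callables keeps the sketch independent of any particular model; a real system would plug its own reasoning and synthesis components in here.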
The official evaluation is designed to measure both the quality of the generated speech and the reliability of the model's reasoning process. Each submission will be assessed through a unified pipeline that combines automatic speech metrics, multimodal LLM-based judgment, and human subjective evaluation, ensuring that the final ranking reflects naturalness, contextual appropriateness, reasoning quality, and speech-reasoning consistency.
Speech quality, intelligibility, speaker similarity, prosody, expression, and efficiency.
Contextual understanding, internal logical coherence, and informativeness of the reasoning.
Contextual coherence, reasoning accuracy, informativeness, naturalness, and speech-reasoning consistency.