Chun-Yi Kuan

"Silence, I discover, is something you can actually hear." — Kafka on the Shore

Hello! I am a Ph.D. student at National Taiwan University (NTU), where I am a member of the Speech Processing and Machine Learning (SPML) Lab, advised by Prof. Hung-yi Lee. My research focuses on building trustworthy audio-aware large language models, with an emphasis on hallucination, abstention, and robust audio-language alignment. I am also interested in controllable audio generation, including instruction-guided text-to-audio and text-to-speech systems.

News

  • 2026.06 I made a little book game — Guess My Bookshelf.
  • 2026.06 Excited to share that our paper, Improving Text-to-Audio Instruction Following via Fine-Grained Feedback from Audio-Aware Large Language Models, has been accepted as a long paper at INTERSPEECH 2026 🇦🇺 🐨 🦘.

Selected Publications

  1. Improving Text-to-Audio Instruction Following via Fine-Grained Feedback from Audio-Aware Large Language Models

    Chun-Yi Kuan, Siwon Kim, Byeonggeun Kim, Suyoun Kim, Bo-Ru Lu, Qingming Tang, Ankur Gandhe, Hung-yi Lee, Chieh-Chi Kao, Chao Wang

    Interspeech 2026

    TL;DR: Uses fine-grained feedback from audio-aware LLMs to make text-to-audio models follow instructions more faithfully.

  2. AQUA-Bench: Beyond Finding Answers to Knowing When There Are None in Audio Question Answering

    Chun-Yi Kuan, Hung-yi Lee

    ICASSP 2026

    TL;DR: A benchmark that tests whether audio QA models know when a question has no answer, not just when they can find one.

  3. Game-Time: Evaluating Temporal Dynamics in Spoken Language Models

    Kai-Wei Chang†, En-Pei Hu†, Chun-Yi Kuan, Wenze Ren, Wei-Chih Chen, Guan-Ting Lin, Yu Tsao, Shao-Hua Sun, Hung-yi Lee, James Glass

    ICASSP 2026

    TL;DR: Benchmarks whether spoken language models can handle timing, tempo, and synchronized speech in real-time conversation.

  4. AQAScore: Evaluating Semantic Alignment in Text-to-Audio Generation via Audio Question Answering

    Chun-Yi Kuan, Kai-Wei Chang, Hung-yi Lee

    arXiv preprint · 2026

    TL;DR: Scores text-to-audio alignment from an audio-aware LLM's confidence in answering 'Yes' to targeted questions, catching fine-grained mismatches that similarity metrics like CLAPScore miss.

  5. Walking Through Uncertainty: An Empirical Study of Uncertainty Estimation for Audio-Aware Large Language Models

    Chun-Yi Kuan, Wei-Ping Huang, Hung-yi Lee

    arXiv preprint · 2026

    TL;DR: A systematic study of uncertainty estimation for audio-aware LLMs, finding that semantic and verification-based methods win on general reasoning but that their advantage breaks down on hallucination and unanswerable-question benchmarks.

  6. From Alignment to Advancement: Bootstrapping Audio-Language Alignment with Synthetic Data

    Chun-Yi Kuan, Hung-yi Lee

    IEEE TASLP · 2025

    TL;DR: Bootstraps audio–language alignment with synthetic data to push audio-aware LLMs from basic alignment toward stronger reasoning.

  7. Teaching Audio-Aware Large Language Models What Does Not Hear: Mitigating Hallucinations through Synthesized Negative Samples

    Chun-Yi Kuan, Hung-yi Lee

    Interspeech 2025

    TL;DR: Curbs hallucinations in audio-aware LLMs by teaching them what is NOT in the audio using synthesized negative samples.

  8. Gender Bias in Instruction-Guided Speech Synthesis Models

    Chun-Yi Kuan, Hung-yi Lee

    NAACL 2025

    TL;DR: Audits and quantifies gender bias in instruction-guided speech synthesis models.

  9. Can Large Audio-Language Models Truly Hear? Tackling Hallucinations with Multi-Task Assessment and Stepwise Audio Reasoning

    Chun-Yi Kuan, Hung-yi Lee

    ICASSP 2025

    TL;DR: Probes whether audio-LLMs truly 'hear' via multi-task assessment and stepwise audio reasoning to reduce hallucinations.

  10. Speech-Copilot: Leveraging Large Language Models for Speech Processing via Task Decomposition, Modularization, and Program Generation

    Chun-Yi Kuan†, Chih-Kai Yang†, Wei-Ping Huang, Ke-Han Lu, Hung-yi Lee

    SLT 2024

    TL;DR: Speech-Copilot lets an LLM solve speech tasks by decomposing them into modular, callable programs.

  11. Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models

    Chun-Yi Kuan, Wei-Ping Huang, Hung-yi Lee

    Interspeech 2024

    TL;DR: Shows that large audio-language models often hallucinate objects/sounds, and frames the object-hallucination problem.
