Chun-Yi Kuan
"Silence, I discover, is something you can actually hear." — Kafka on the Shore
Hello! I am a Ph.D. student at National Taiwan University (NTU), where I am a member of the Speech Processing and Machine Learning (SPML) Lab, advised by Prof. Hung-yi Lee. My research focuses on building trustworthy audio-aware large language models, with an emphasis on hallucination, abstention, and robust audio-language alignment. I am also interested in controllable audio generation, including instruction-guided text-to-audio and text-to-speech systems.
News
- 2026.06 I made a little book game — Guess My Bookshelf.
- 2026.06 Excited to share that our paper, Improving Text-to-Audio Instruction Following via Fine-Grained Feedback from Audio-Aware Large Language Models, has been accepted as a long paper at INTERSPEECH 2026 🇦🇺 🐨 🦘.
Selected Publications
- Improving Text-to-Audio Instruction Following via Fine-Grained Feedback from Audio-Aware Large Language Models
TL;DR: Uses fine-grained feedback from audio-aware LLMs to make text-to-audio models follow instructions more faithfully.
- AQUA-Bench: Beyond Finding Answers to Knowing When There Are None in Audio Question Answering
TL;DR: A benchmark that tests whether audio QA models know when a question has no answer, not just whether they can find one.
- Game-Time: Evaluating Temporal Dynamics in Spoken Language Models
TL;DR: Benchmarks whether spoken language models can handle timing, tempo, and synchronized speech in real-time conversation.
- AQAScore: Evaluating Semantic Alignment in Text-to-Audio Generation via Audio Question Answering
TL;DR: Scores text-to-audio alignment from an audio-aware LLM's confidence in answering 'Yes' to targeted questions, catching fine-grained mismatches that similarity metrics like CLAPScore miss.
- Walking Through Uncertainty: An Empirical Study of Uncertainty Estimation for Audio-Aware Large Language Models
TL;DR: A systematic study of uncertainty estimation for audio-aware LLMs, finding that semantic and verification-based methods perform best on general reasoning, but their advantage breaks down on hallucination and unanswerable-question benchmarks.
- From Alignment to Advancement: Bootstrapping Audio-Language Alignment with Synthetic Data
TL;DR: Bootstraps audio-language alignment with synthetic data to push audio-aware LLMs from basic alignment toward stronger reasoning.
- Teaching Audio-Aware Large Language Models What Does Not Hear: Mitigating Hallucinations through Synthesized Negative Samples
TL;DR: Curbs hallucinations in audio-aware LLMs by teaching them what is NOT in the audio using synthesized negative samples.
- Gender Bias in Instruction-Guided Speech Synthesis Models
TL;DR: Audits and quantifies gender bias in instruction-guided speech synthesis models.
- Can Large Audio-Language Models Truly Hear? Tackling Hallucinations with Multi-Task Assessment and Stepwise Audio Reasoning
TL;DR: Probes whether audio-LLMs truly 'hear' through multi-task assessment, and reduces hallucinations with stepwise audio reasoning.
- Speech-Copilot: Leveraging Large Language Models for Speech Processing via Task Decomposition, Modularization, and Program Generation
TL;DR: Speech-Copilot lets an LLM solve speech tasks by decomposing them into modular, callable programs.
- Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models
TL;DR: Shows that large audio-language models often hallucinate objects and sounds, and frames object hallucination as a problem for these models.