Chun-Yi Kuan

Hello! I’m Chun-Yi, a second-year Ph.D. student at the NTU Speech Processing and Machine Learning (SPML) Lab, supervised by Prof. Hung-yi Lee.

My research focuses on multimodal large language models, exploring how to establish robust audio-language alignment to address recent trustworthiness issues, such as the hallucination of sound events in audio.

My previous research centered on text-guided speech generation tasks, investigating how to use textual information to guide the generation of high-quality speech with desired styles and prosody.

news

Jan 18, 2026	Our paper, “AQUA-Bench: Beyond Finding Answers to Knowing When There Are None in Audio Question Answering,” has been accepted to ICASSP 2026. See you in Barcelona!
May 28, 2025	Our paper, “Teaching Audio-Aware Large Language Models What Does Not Hear: Mitigating Hallucinations through Synthesized Negative Samples,” has been accepted to Interspeech 2025 🇳🇱.
Apr 15, 2024	Excited to share 🔱 Speech Trident - Awesome Speech LM

selected publications

In Progress

AQAScore: Evaluating Semantic Alignment in Text-to-Audio Generation via Audio Question Answering

Chun-Yi Kuan, Kai-Wei Chang, and Hung-yi Lee

arXiv preprint arXiv:2601.14728, 2026

@article{kuan2026aqascore,
  title = {AQAScore: Evaluating Semantic Alignment in Text-to-Audio Generation via Audio Question Answering},
  author = {Kuan, Chun-Yi and Chang, Kai-Wei and Lee, Hung-yi},
  journal = {arXiv preprint arXiv:2601.14728},
  year = {2026},
  bibtex_show = true,
}

AQUA-Bench: Beyond Finding Answers to Knowing When There Are None in Audio Question Answering

Chun-Yi Kuan and Hung-yi Lee

In ICASSP 2026 - 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2026

@inproceedings{kuan2026aquabench,
  title = {AQUA-Bench: Beyond Finding Answers to Knowing When There Are None in Audio Question Answering},
  author = {Kuan, Chun-Yi and Lee, Hung-yi},
  booktitle = {ICASSP 2026 - 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year = {2026},
  organization = {IEEE},
  bibtex_show = true,
}

TASLP

From Alignment to Advancement: Bootstrapping Audio-Language Alignment With Synthetic Data

Chun-Yi Kuan and Hung-yi Lee

IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2025

@article{Kuan2025FromAlignment,
  author = {Kuan, Chun-Yi and Lee, Hung-yi},
  title = {From Alignment to Advancement: Bootstrapping Audio-Language Alignment With Synthetic Data},
  journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
  year = {2025},
  volume = {33},
  pages = {4604--4619},
  doi = {10.1109/TASLPRO.2025.3626233},
  publisher = {IEEE},
  bibtex_show = true,
}

Teaching Audio-Aware Large Language Models What Does Not Hear: Mitigating Hallucinations through Synthesized Negative Samples

Chun-Yi Kuan and Hung-yi Lee

In 2025 Conference of the International Speech Communication Association (INTERSPEECH), 2025

@inproceedings{kuan2025teaching,
  title = {Teaching Audio-Aware Large Language Models What Does Not Hear: Mitigating Hallucinations through Synthesized Negative Samples},
  author = {Kuan, Chun-Yi and Lee, Hung-yi},
  booktitle = {2025 Conference of the International Speech Communication Association (INTERSPEECH)},
  year = {2025},
  organization = {ISCA},
  bibtex_show = true,
}

Gender Bias in Instruction-Guided Speech Synthesis Models

Chun-Yi Kuan and Hung-yi Lee

In 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics, 2025

@inproceedings{kuan2025genderbiasinstructionguidedspeech,
  title = {Gender Bias in Instruction-Guided Speech Synthesis Models},
  author = {Kuan, Chun-Yi and Lee, Hung-yi},
  year = {2025},
  primaryclass = {cs.CL},
  bibtex_show = true,
  booktitle = {2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics}
}

Can Large Audio-Language Models Truly Hear? Tackling Hallucinations with Multi-Task Assessment and Stepwise Audio Reasoning

Chun-Yi Kuan and Hung-yi Lee

In ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025

@inproceedings{kuan2024cao,
  title = {Can Large Audio-Language Models Truly Hear? Tackling Hallucinations with Multi-Task Assessment and Stepwise Audio Reasoning},
  author = {Kuan, Chun-Yi and Lee, Hung-yi},
  booktitle = {ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year = {2025},
  organization = {IEEE},
  bibtex_show = true,
}

Speech-Copilot: Leveraging Large Language Models for Speech Processing via Task Decomposition, Modularization, and Program Generation

Chun-Yi Kuan, Chih-Kai Yang, Wei-Ping Huang, Ke-Han Lu, and Hung-yi Lee

In IEEE Spoken Language Technology Workshop 2024 (SLT), 2024

@inproceedings{kuan2024speechcopilotleveraginglargelanguage,
  title = {Speech-Copilot: Leveraging Large Language Models for Speech Processing via Task Decomposition, Modularization, and Program Generation},
  author = {Kuan, Chun-Yi and Yang, Chih-Kai and Huang, Wei-Ping and Lu, Ke-Han and Lee, Hung-yi},
  year = {2024},
  booktitle = {IEEE Spoken Language Technology Workshop 2024 (SLT)},
  url = {https://arxiv.org/abs/2407.09886},
  bibtex_show = true,
  organization = {IEEE},
}

Large Language Model as an Assignment Evaluator: Insights, Feedback, and Challenges in a 1000+ Student Course

Cheng-Han Chiang, Wei-Chih Chen, Chun-Yi Kuan, Chienchou Yang, and Hung-yi Lee

In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Nov 2024

@inproceedings{chiang-etal-2024-large,
  title = {Large Language Model as an Assignment Evaluator: Insights, Feedback, and Challenges in a 1000+ Student Course},
  author = {Chiang, Cheng-Han and Chen, Wei-Chih and Kuan, Chun-Yi and Yang, Chienchou and Lee, Hung-yi},
  booktitle = {Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing},
  month = nov,
  year = {2024},
  address = {Miami, Florida, USA},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2024.emnlp-main.146},
  doi = {10.18653/v1/2024.emnlp-main.146},
  pages = {2489--2513},
  bibtex_show = true,
}

Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models

Chun-Yi Kuan, Wei-Ping Huang, and Hung-yi Lee

In 2024 Conference of the International Speech Communication Association (INTERSPEECH), 2024

@inproceedings{kuan2024understanding,
  title = {Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models},
  author = {Kuan, Chun-Yi and Huang, Wei-Ping and Lee, Hung-yi},
  booktitle = {2024 Conference of the International Speech Communication Association (INTERSPEECH)},
  pages = {1--6},
  year = {2024},
  organization = {ISCA},
  bibtex_show = true,
}

Towards General-Purpose Text-Instruction-Guided Voice Conversion

Chun-Yi Kuan, Chen-An Li, Tsu-Yuan Hsu, Tse-Yang Lin, Ho-Lam Chung, Kai-Wei Chang, Shuo-Yiin Chang, and Hung-yi Lee

In 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2023

@inproceedings{kuan2023towards,
  title = {Towards General-Purpose Text-Instruction-Guided Voice Conversion},
  author = {Kuan, Chun-Yi and Li, Chen-An and Hsu, Tsu-Yuan and Lin, Tse-Yang and Chung, Ho-Lam and Chang, Kai-Wei and Chang, Shuo-Yiin and Lee, Hung-yi},
  booktitle = {2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
  pages = {1--8},
  year = {2023},
  organization = {IEEE},
  bibtex_show = true,
}