Chun-Yi Kuan


Hello! I’m Chun-Yi, a first-year Ph.D. student at the NTU Speech Processing and Machine Learning (SPML) Lab, supervised by Prof. Hung-yi Lee.

My research focuses on multimodal large language models, exploring how to establish robust audio-language alignment to address recent trustworthiness issues, such as hallucinations related to sound events in audio. I’m also involved in phases 1 and 2 of the Dynamic-SUPERB project, which benchmarks large audio-language models across universal speech, audio, and music tasks.

My previous research centered on text-guided speech generation, investigating how to use textual information to guide the generation of high-quality speech with the desired style and prosody.

news

May 28, 2025 Our paper, “Teaching Audio-Aware Large Language Models What Does Not Hear: Mitigating Hallucinations through Synthesized Negative Samples”, has been accepted to Interspeech 2025 🇳🇱.
Jul 16, 2024 🚀 Excited to share our real-world deployment of LLMs as automatic assignment evaluators in our Intro to Generative AI course at NTU, with over 1000 students! The effort was led by Prof. Hung-yi Lee, with tremendous contributions from Cheng-Han Chiang as the head TA; his dedication was crucial to the success of this work. Check out our findings and insights here: https://arxiv.org/abs/2407.05216
Apr 15, 2024 Excited to share 🔱 Speech Trident: Awesome Speech LM

selected publications

  1. Teaching Audio-Aware Large Language Models What Does Not Hear: Mitigating Hallucinations through Synthesized Negative Samples
    Chun-Yi Kuan and Hung-yi Lee
    2025
  2. Gender Bias in Instruction-Guided Speech Synthesis Models
    Chun-Yi Kuan and Hung-yi Lee
    2025
  3. Can Large Audio-Language Models Truly Hear? Tackling Hallucinations with Multi-Task Assessment and Stepwise Audio Reasoning
    Chun-Yi Kuan and Hung-yi Lee
    In ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2025
  4. Speech-Copilot: Leveraging Large Language Models for Speech Processing via Task Decomposition, Modularization, and Program Generation
    Chun-Yi Kuan, Chih-Kai Yang, Wei-Ping Huang, and 2 more authors
    2024
  5. Large Language Model as an Assignment Evaluator: Insights, Feedback, and Challenges in a 1000+ Student Course
    Cheng-Han Chiang, Wei-Chih Chen, Chun-Yi Kuan, and 2 more authors
    In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Nov 2024
  6. Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models
    Chun-Yi Kuan, Wei-Ping Huang, and Hung-yi Lee
    Nov 2024
  7. Towards General-Purpose Text-Instruction-Guided Voice Conversion
    Chun-Yi Kuan, Chen-An Li, Tsu-Yuan Hsu, and 5 more authors
    In 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Nov 2023