About Me
I'm Zhishuo Zhao, a Ph.D. candidate in the School of Computer Science at Sichuan University. My research focuses on multimodal learning, affective computing, and robust speech understanding under real-world conditions. Specifically, I work on multimodal sentiment analysis, cross-modal contrastive optimization, and noise-resilient speech recognition. I aim to build interpretable and generalizable models that bridge language, vision, and audio for emotionally aware AI systems. In practice, I combine algorithm design with system implementation, with experience spanning the full research pipeline, from model prototyping to academic writing and real-world deployment.
🚀 What I’m Doing
- **Web Development**: Building responsive front-end applications with React and Vue.
- **Mobile Apps**: Crafting cross-platform apps using Flutter and React Native.
- **Data Science**: Analyzing data and training ML models in Python.
- **Music**: Playing the cello and composing ambient pieces.
🔥 News
- [2025] 🎉 Our paper "AV-RISE: Hierarchical Cross-Modal Denoising for Learning Robust Audio-Visual Speech Representation" was accepted at ACM MM 2025.
- [2025] 📖 Our paper "WinNet: Make Only One Convolutional Layer Effective for Time Series Forecasting" was published at ICIC 2025.
- [2024] 🎉 Our paper "AMG-AVSR: Adaptive Modality Guidance for Audio-Visual Speech Recognition via Progressive Feature Enhancement" was accepted at ACML 2024.