About Me
I'm Zhishuo Zhao, a Ph.D. candidate in the School of Computer Science at Sichuan University. My research focuses on multimodal learning, affective computing, and robust speech understanding under real-world conditions. Specifically, I work on multimodal sentiment analysis, cross-modal contrastive optimization, and noise-resilient speech recognition. I aim to build interpretable and generalizable models that bridge language, vision, and audio for emotionally aware AI systems. In practice, I combine algorithm design with system implementation, with experience spanning the full research pipeline, from model prototyping to academic writing and real-world deployment.
🚀 What I’m Doing
- **Web Development**: Building responsive front-end applications with React and Vue.
- **Mobile Apps**: Crafting cross-platform apps using Flutter and React Native.
- **Data Science**: Analyzing data and training ML models in Python.
- **Music**: Playing the cello and composing ambient pieces.
🔥 News
- [2025] 🎉 Our paper "AV-RISE: Hierarchical Cross-Modal Denoising for Learning Robust Audio-Visual Speech Representation" was accepted at ACM MM 2025.
- [2025] 📖 Our paper "WinNet: Make Only One Convolutional Layer Effective for Time Series Forecasting" was published at ICIC 2025.
- [2024] 🎉 Our paper "AMG-AVSR: Adaptive Modality Guidance for Audio-Visual Speech Recognition via Progressive Feature Enhancement" was accepted at ACML 2024.