Zhihan Zhu (朱旨函)

I am a third-year undergraduate at the Artificial Intelligence Lab of the Southern University of Science and Technology (SUSTech), advised by Prof. Zhihai He (IEEE Fellow). My research focuses on generative AI, particularly diffusion models and flow matching. I am also broadly interested in multimodal large language models (MLLMs), LLM agents, and long-video understanding and visual memory.

My long-term goal is to develop unified visual models that integrate visual understanding and generation within a single framework.

Education

Publications

HyGRAIL: Cost-Aware and Evidence-Grounded Scientific Hypothesis Discovery over Knowledge Graphs

Yihang Sun, Zhihan Zhu, Zhiyuan Jiang, Jingyi Ge, Zixuan Li, Jiaxuan You
Under Review · EMNLP 2026
A cost-aware GNN–LLM framework for scientific hypothesis discovery over knowledge graphs. It integrates heterogeneous graph triage, knowledge-graph evidence retrieval, and LLM-based hypothesis review to reliably verify sparse and ambiguous candidate relations.

Runge-Kutta Approximation and Decoupled Attention for Rectified Flow Inversion and Semantic Editing

Weiming Chen, Zhihan Zhu, Yijia Wang, Zhihai He
Under Review · IEEE Transactions on Image Processing, 2025 · arXiv:2509.12888
We propose a high-order inversion method for rectified flow models based on a Runge–Kutta solver. Combined with Decoupled Diffusion Transformer Attention (DDTA), it achieves state-of-the-art reconstruction fidelity and precise semantic control.

Generative Semantic Coding for Ultra-Low Bitrate Visual Communication and Analysis

Weiming Chen, Yijia Wang, Zhihan Zhu, Zhihai He
Under Review · IEEE Transactions on Image Processing, 2025 · arXiv:2510.27324
A generative semantic coding framework that combines deep compression with rectified-flow generation, supporting both ultra-low-bitrate visual communication and downstream visual analysis.

Latent Bias Alignment for High-Fidelity Diffusion Inversion in Real-World Image Reconstruction and Manipulation

Weiming Chen, Qifan Liu, Siyi Liu, Yijia Wang, Zhihan Zhu, Zhihai He
Under Review · IEEE Transactions on Circuits and Systems for Video Technology, 2026 · arXiv:2603.23903

Patents

Honors & Awards

Technical Skills

Programming: Python, C/C++, Java, MATLAB
Deep Learning: PyTorch, Hugging Face Diffusers, Stable Diffusion, FLUX, SDXL, DiT, ControlNet, VAE, GNNs, vLLM, verl, LLM/MLLM pipelines
Systems: Linux, Git, CUDA, HPC/LSF, Multi-GPU Training, LaTeX
Languages: Chinese (native), English (IELTS 6.5)