Weihao XUAN (宣 偉豪)

I'm a Ph.D. student at the Machine Learning and Statistical Data Analysis Lab (杉山・横矢・石田研究室), The University of Tokyo (東京大学), where I'm very fortunate to be advised by Prof. Naoto Yokoya. I'm also in the Junior Research Associate (JRA) program at the RIKEN Center for Advanced Intelligence Project.

I obtained a master's degree in Computer Science from Waseda University. Prior to that, I received my bachelor's degree with First-Class Honours in Mechanical Engineering from the University of Leeds, United Kingdom.

My research focuses on natural language understanding, particularly post-training for LLMs and VLMs. I'm also passionate about AI4Science (Earth Observation & Medical). I collaborate very closely with my friends Heli Qi and Junjue Wang (UTokyo), as well as several friends working on LLMs in the United States, Singapore, and Japan.

Publications

Preprints

  1. Xuan, W.*, Zeng, Q.*, Qi, H., Wang, J., & Yokoya, N. (2025). Seeing is Believing, but How Much? A Comprehensive Analysis of Verbalized Calibration in Vision-Language Models. arXiv preprint arXiv:2505.20236. * indicates co-first authors
  2. Xuan, W., Yang, R., Qi, H., Zeng, Q., Xiao, Y., Feng, A., Liu, D., Xing, Y., Wang, J., Gao, F., Lu, J., Jiang, Y., Li, H., Li, X., Yu, K., Dong, R., Gu, S., Li, Y., Xie, X., Juefei-Xu, F., Khomh, F., Yoshie, O., Chen, Q., Teodoro, D., Liu, N., Goebel, R., Ma, L., Marrese-Taylor, E., Lu, S., Iwasawa, Y., Matsuo, Y., & Li, I. (2025). MMLU-ProX: A multilingual benchmark for advanced large language model evaluation. arXiv preprint arXiv:2503.10497.
  3. Xuan, W.*, Wang, J.*, Qi, H., Chen, Z., Zheng, Z., Zhong, Y., Xia, J., & Yokoya, N. (2025). DynamicVL: Benchmarking Multimodal Large Language Models for Dynamic City Understanding. arXiv preprint arXiv:2505.21076. * indicates co-first authors
  4. Wang, J.*, Xuan, W.*, Qi, H., Liu, Z., Liu, K., Wu, Y., Chen, H., Song, J., Xia, J., Zheng, Z., & Yokoya, N. (2025). DisasterM3: A Remote Sensing Vision-Language Dataset for Disaster Damage Assessment and Response. arXiv preprint arXiv:2505.21089. * indicates co-first authors
  5. Chen, H., Song, J., Dietrich, O., Broni-Bediako, C., Xuan, W., Wang, J., Shao, X., Wei, Y., Xia, J., Lan, C., Schindler, K., & Yokoya, N. (2025). BRIGHT: A globally distributed multimodal building damage assessment dataset with very-high-resolution for all-weather disaster response. arXiv preprint arXiv:2501.06019.
  6. Ning, C., Gan, W., Xuan, W., & Yokoya, N. (2025). Is pre-training applicable to the decoder for dense prediction? arXiv preprint arXiv:2503.07637.
  7. Xiao, A.*, Xuan, W.*, Qi, H., Xing, Y., Yokoya, N., & Lu, S. (2024). Segment anything with multiple modalities. arXiv preprint arXiv:2408.09085. * indicates co-first authors

Conference Papers

  1. Ning, C., Xuan, W., Gan, W., & Yokoya, N. (2025). LR2Depth: Large-region aggregation at low resolution for efficient monocular depth estimation. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2025). Accepted.
  2. Song, J., Chen, H., Xuan, W., Xia, J., & Yokoya, N. (2024). SynRS3D: A synthetic dataset for global 3D semantic understanding from monocular remote sensing imagery. In The Thirty-eighth Conference on Neural Information Processing Systems (NeurIPS 2024). Spotlight Paper [Top 3.1%]
  3. Xiao, A.*, Xuan, W.*, Qi, H., Xing, Y., Ren, R., Zhang, X., Shao, L., & Lu, S. (2024). Cat-sam: Conditional tuning for few-shot adaptation of segment anything model. In European Conference on Computer Vision (ECCV 2024) (pp. 189-206). Oral Paper [Top 2.3%, 200/8585]. * indicates co-first authors
  4. Xiao, A., Huang, J., Xuan, W., Ren, R., Liu, K., Guan, D., El Saddik, A., Lu, S., & Xing, E. P. (2023). 3D semantic segmentation in the wild: Learning generalized models for adverse-condition point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023) (pp. 9382-9392).
  5. Xuan, W., Ren, R., Wu, S., & Chen, C. (2022). Maskvo: Self-supervised visual odometry with a learnable dynamic mask. In 2022 IEEE/SICE International Symposium on System Integration (SII) (pp. 225-231). IEEE.
  6. Lin, Y., Xuan, W., Ren, R., & Liu, J. (2021). On a discrete-time network SIS model with opinion dynamics. In 2021 60th IEEE Conference on Decision and Control (CDC) (pp. 2098-2103). IEEE.
  7. Xuan, W., Ren, R., Paré, P. E., Ye, M., Ruf, S., & Liu, J. (2020). On a network SIS model with opinion dynamics. IFAC-PapersOnLine, 53(2), 2582-2587.

Journal Papers

  1. Xiao, A., Xuan, W., Wang, J., Huang, J., Tao, D., Lu, S., & Yokoya, N. (2025). Foundation models for remote sensing and Earth Observation: A survey. IEEE Geoscience and Remote Sensing Magazine. Accepted.

Professional Activities

Reviews

Conference: NeurIPS, CVPR, ICCV, ICCVW, ACMMM, IROS, ICDL, SII, CPHS
Journal: Pattern Recognition, ISPRS Journal of Photogrammetry and Remote Sensing, IEEE Transactions on Geoscience and Remote Sensing