
I am an Assistant Professor at the University of Tokyo, Japan. My research interests are computer vision, multimedia processing, and data-centric AI. I have worked on OCR, including multilingual text recognition and synthetic visual text generation. I am currently focusing on large language models (LLMs) and large multimodal models (LMMs).
CV | Email | Google Scholar | LinkedIn | GitHub
Work experience
Education
Publications

MangaVQA and MangaLMM: A Benchmark and Specialized Model for Multimodal Manga Understanding
- Jeonghun Baek*, Kazuki Egashira*, Shota Onohara*, Atsuyuki Miyai*, Yuki Imajuku, Hikaru Ikuta, Kiyoharu Aizawa. (*Equal contribution)
- arXiv preprint 2025
[Paper] [Code]

Toward Responsible Federated Large Language Models: Leveraging a Safety Filter and Constitutional AI
- Eunchung Noh*, Jeonghun Baek*. (*Equal contribution)
- arXiv preprint 2025
[Paper]

Harnessing PDF Data for Improving Japanese Large Multimodal Models
- Jeonghun Baek, Akiko Aizawa, Kiyoharu Aizawa.
- Association for Computational Linguistics (ACL), Findings, 2025
[Paper] [Code (placeholder)]

JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation
- Shota Onohara*, Atsuyuki Miyai*, Yuki Imajuku*, Kazuki Egashira*, Jeonghun Baek*, Xiang Yue, Graham Neubig, Kiyoharu Aizawa. (*Equal contribution)
- Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL), 2025, and Neural Information Processing Systems (NeurIPS) EvalEval Workshop (oral), 2024
[Project page]

Leveraging LLM for Detecting and Explaining LLM-generated Code in Python Programming Courses
- Jeonghun Baek, Tetsuro Yamazaki, Akimasa Morihata, Junichiro Mori, Yoko Yamakata, Kenjiro Taura, Shigeru Chiba.
- ACM Special Interest Group on Computer Science Education (SIGCSE) Technical Symposium, poster, 2025
[Paper]

Cross-Lingual Learning in Multilingual Scene Text Recognition
- Jeonghun Baek, Yusuke Matsui, Kiyoharu Aizawa.
- International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024
[Paper] [Code]

Character Image Combination for Multilingual Scene Text Recognition: Can We Make High-Performance Synthetic Data Without Fonts?
- Jeonghun Baek, Eunchung Noh, Yusuke Matsui, Kiyoharu Aizawa.
- International Conference on Computer Vision (ICCV) Workshop Towards the Next Generation of Computer Vision Datasets (TNGCV) and Doctoral Consortium (ICCVDC), 2023

COO: Comic Onomatopoeia Dataset for Recognizing Arbitrary or Truncated Texts
- Jeonghun Baek, Yusuke Matsui, Kiyoharu Aizawa.
- European Conference on Computer Vision (ECCV) (acceptance rate 28.4%, 1650/5803), 2022
[Paper] [Code]

What If We Only Use Real Datasets for Scene Text Recognition? Toward Scene Text Recognition With Fewer Labels
- Jeonghun Baek, Yusuke Matsui, Kiyoharu Aizawa.
- Computer Vision and Pattern Recognition (CVPR) (acceptance rate 23.7%, 1663/7015), 2021
[Paper] [Code]