I am an Assistant Professor at the University of Tokyo, Japan. My research interests are computer vision, multimedia processing, and data-centric AI. I have worked on OCR problems such as multilingual text recognition and synthetic visual text generation. Currently, I am focusing on large language models (LLMs) and large multimodal models (LMMs).

CV | Email | Google Scholar | LinkedIn | GitHub

Work experience

Education

Publications

(*: Equal contribution)

MangaVQA and MangaLMM: A Benchmark and Specialized Model for Multimodal Manga Understanding

Exploring LMM-as-a-Judge for Image Harmonization Evaluation

Enhancing Safety Judgment on LLM Responses via Text-to-Image Generation

Toward Responsible Federated Large Language Models: Leveraging a Safety Filter and Constitutional AI

Harnessing PDF Data for Improving Japanese Large Multimodal Models

JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation

Leveraging LLM for Detecting and Explaining LLM-generated Code in Python Programming Courses

Cross-Lingual Learning in Multilingual Scene Text Recognition

Character Image Combination for Multilingual Scene Text Recognition: Can We Make High-Performance Synthetic Data Without Fonts?
