I am an Assistant Professor at the University of Tokyo, Japan. My research interests are computer vision, multimedia processing, and data-centric AI. I have worked on OCR problems such as multilingual text recognition and synthetic visual text generation. Currently, I am focusing on large language models (LLMs) and large multimodal models (LMMs).

CV | Email | Google Scholar | LinkedIn | GitHub

Work experience

Education

Publications

MangaVQA and MangaLMM: A Benchmark and Specialized Model for Multimodal Manga Understanding

Toward Responsible Federated Large Language Models: Leveraging a Safety Filter and Constitutional AI

Harnessing PDF Data for Improving Japanese Large Multimodal Models

JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation

Leveraging LLM for Detecting and Explaining LLM-generated Code in Python Programming Courses

Cross-Lingual Learning in Multilingual Scene Text Recognition

Character Image Combination for Multilingual Scene Text Recognition: Can We Make High-Performance Synthetic Data Without Fonts?

COO: Comic Onomatopoeia Dataset for Recognizing Arbitrary or Truncated Texts

What If We Only Use Real Datasets for Scene Text Recognition? Toward Scene Text Recognition With Fewer Labels
