I am an Assistant Professor at the University of Tokyo, Japan. I am interested in computer vision, multimedia processing, and data-centric AI. I have worked on OCR tasks such as multilingual text recognition and synthetic visual text generation. Currently, I am focusing on large language models (LLMs) and large multimodal models (LMMs).
CV | email | Google Scholar | LinkedIn | GitHub
The University of Tokyo, Japan, Apr. 2024 - Present
I work at the Mathematics and Informatics Center, Graduate School of Information Science and Technology.
Mantra Inc., Japan, Jun. 2023 - Mar. 2024
I worked with Ryota Hinami on recognizing onomatopoeia text in Japanese comics using LMMs for comic translation (about 8 hours per week).
The University of Tokyo, Japan, Apr. 2023 - Mar. 2024
I worked on several text recognition projects and obtained research funding from the Japanese government.
Google Research, Oct. 2022 - Jan. 2023
As a student researcher, I worked on the Google OCR team (16 hours per week, from the Google Japan office or from home). I surveyed the TextVQA task (Visual Question Answering that requires reading text in images) and implemented some of the baselines with Yasuhisa Fujii.
Clova AI Research, NAVER Corp., South Korea, Jan. 2018 - Mar. 2020
I developed scene text recognition (STR) models, which recognize text in natural scene images.
Language Analytics, NCSOFT Corp., South Korea, Apr. 2016 - Dec. 2017
I developed a sentence embedding model for question/document clustering and a text style transfer model for colloquial text generation.