I am working at the Aizawa-Yamakata-Matsui Lab as a postdoc. I am interested in computer vision, multimedia processing, and data-centric AI. I have worked on OCR tasks such as multilingual text recognition and synthetic visual text generation. Currently, I am focusing on large language models (LLMs) and large multimodal models (LMMs).
CV | Email | Google Scholar | LinkedIn | GitHub
Mantra Inc., Japan, Jun. 2023 - Present
I am working with Ryota Hinami on recognizing onomatopoeic text in Japanese comics for comic translation using LMMs (about 8 hours per week).
The University of Tokyo, Japan, Apr. 2023 - Present
I am working on several projects related to text recognition. I have received funding from the Japanese government.
Google Research, Oct. 2022 - Jan. 2023
As a student researcher, I worked on the Google OCR team (16 hours per week, from the Google Japan office or from home). With Yasuhisa Fujii, I surveyed the TextVQA task (visual question answering with text recognition) and implemented some of the baselines.
Clova AI Research, NAVER Corp., South Korea, Jan. 2018 - Mar. 2020
Developed a scene text recognition (STR) model, which recognizes text in natural scenes.
Language Analytics, NCSOFT Corp., South Korea, Apr. 2016 - Dec. 2017
Developed a sentence embedding model for question/document clustering and a text style transfer model for colloquial text generation.