Result: MMCode: Benchmarking Multimodal Large Language Models for Code Generation with Visually Rich Programming Problems

Title:

MMCode: Benchmarking Multimodal Large Language Models for Code Generation with Visually Rich Programming Problems

Authors:

Li, Kaixin; Tian, Yuchen; Hu, Qisheng; Luo, Ziyang; Huang, Zhiyong; Ma, Jing

Publisher Information:

2024-04-15 2024-09-26

Document Type:

Electronic Resource

Index Terms:

Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Software Engineering, text

URL:

http://arxiv.org/abs/2404.09486

Availability:

Open access content.

Other Numbers:

COO oai:arXiv.org:2404.09486
1438546338

Contributing Source:

CORNELL UNIV
From OAIster®, provided by the OCLC Cooperative.

Accession Number:

edsoai.on1438546338

Database:

OAIster

Further Information

Programming often involves converting detailed and complex specifications into code, a process during which developers typically use visual aids to convey concepts more effectively. While recent Large Multimodal Models have demonstrated remarkable abilities in visual reasoning and mathematical tasks, little work has investigated whether these models can effectively interpret visual elements for code generation. To this end, we present MMCode, the first multimodal coding dataset for evaluating algorithmic problem-solving skills in visually rich contexts. MMCode contains 3,548 questions and 6,620 images collected from real-world programming challenges on 10 code competition websites; the problems are highly challenging because of their extreme demand for reasoning abilities. Our experimental results show that current state-of-the-art models struggle to solve these problems. The results highlight the lack of powerful vision-code models, and we hope MMCode can serve as an inspiration for future work in this domain. The data and code are publicly available at https://github.com/likaixin2000/MMCode.
Comment: EMNLP 2024
