Using Large Language Models for Generating Multi-Level Commit Messages: Insights from Professor Partha Pratim Das’s Recent Research Publication
In this article, Professor Partha Pratim Das, from the Department of Computer Science at Ashoka University, talks about his recently published research paper, “Using Large Language Models for multi-level commit message generation for large diffs.” The study highlights the frustration developers experience when a large set of code changes arrives with commit messages that are of poor quality, inconsistent, too short, or missing key information. It then explores how recent advances in Large Language Models (LLMs), such as GPT-4 and LLaMA, could address this problem.
In software development today, hundreds or even thousands of changes are made to a codebase every day. Each change is stored as a commit in a version control system like Git, and ideally, every commit is accompanied by a short description called a commit message. These messages tell other developers what has changed and why. Unfortunately, many commit messages are vague (“fixed bug” or “minor update”) or are missing altogether. Research shows that nearly half of commit messages lack important details, and some 14% are empty. This makes it difficult for developers to understand past work, track down bugs, or maintain large projects.
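To make the problem concrete, here is a toy heuristic, illustrative only and not from the paper, for flagging low-information commit messages. The boilerplate word list and the word-count threshold are arbitrary choices made for this sketch.

```python
# Illustrative only: a toy heuristic for spotting low-information commit
# messages, in the spirit of the problem the study describes. The phrase
# list and length threshold are arbitrary assumptions, not from the paper.
VAGUE_PATTERNS = {"fix", "fixed bug", "minor update", "update", "wip", "changes"}

def is_low_information(message: str, min_words: int = 4) -> bool:
    """Flag a commit message as vague if it is empty, matches a common
    boilerplate phrase, or is very short."""
    text = message.strip().lower().rstrip(".")
    if not text:
        return True
    if text in VAGUE_PATTERNS:
        return True
    return len(text.split()) < min_words

messages = [
    "",                                                        # empty -> flagged
    "fixed bug",                                               # boilerplate -> flagged
    "Refactor session cache to evict entries after TTL expiry" # descriptive -> kept
]
flagged = [m for m in messages if is_low_information(m)]
```

Running a check like this over a project's history quickly shows how common the "fixed bug"-style messages the article describes really are.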
This collaborative study by Mr. Abhishek Kumar and Ms. Sandhya Sankar, under the joint supervision of Professor Partha Pratim Chakrabarti of the Department of Computer Science and Engineering, IIT Kharagpur, and Professor Partha Pratim Das of the Department of Computer Science, Ashoka University, addresses this challenge by asking: Can Large Language Models (LLMs) generate better commit messages automatically, especially for large and complex code changes?
The researchers focused on three key questions:
- How effective are LLMs at generating commit messages for large code changes?
- Is a single commit message enough when many files are changed?
- Would a new multi-level approach, combining general and file-specific messages, be more useful to developers?
To seek answers to these questions, the team carried out the research in the following three phases:
- Dataset Preparation: The team curated a dataset of nearly 1,000 real-world software commits, focusing on large code changes (some with up to 8,000 tokens of content). They then removed trivial or auto-generated messages (such as version updates) to ensure quality.
- Commit Message Generation: In the second phase, they used state-of-the-art LLMs (GPT-4o, LLaMA 3.1, and Mistral) to generate two types of messages: general diff-level messages that summarise the whole commit, and file-level messages for each changed file. For both, the team used specially designed prompts to guide the models.
- Evaluation: In the final phase, the team combined automatic metrics (such as BLEU, ROUGE, METEOR, and CIDEr) with human evaluations by expert developers. They also surveyed 50 developers to gather real-world opinions on single-level versus multi-level commit messages.
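The multi-level idea above can be sketched in a few lines: split a unified diff into per-file chunks, then build one diff-level prompt for the whole commit and one prompt per file. The prompt wording here is hypothetical (the paper's actual prompts are not reproduced), and the LLM call itself is omitted.

```python
# A minimal sketch, not the authors' implementation: group a unified diff
# by file and construct diff-level and file-level prompts for an LLM.
def split_diff_by_file(diff: str) -> dict[str, str]:
    """Group unified-diff lines under their 'diff --git' headers."""
    files: dict[str, list[str]] = {}
    current = None
    for line in diff.splitlines():
        if line.startswith("diff --git"):
            # e.g. "diff --git a/app.py b/app.py" -> "app.py"
            current = line.split(" b/")[-1]
            files[current] = []
        if current is not None:
            files[current].append(line)
    return {path: "\n".join(lines) for path, lines in files.items()}

def build_prompts(diff: str) -> tuple[str, dict[str, str]]:
    """Return one diff-level prompt and a dict of file-level prompts.
    The instructions are placeholder wording, not the paper's prompts."""
    per_file = split_diff_by_file(diff)
    diff_level = "Summarise the overall intent of this commit in one line:\n" + diff
    file_level = {
        path: f"Describe the change made to {path}:\n{chunk}"
        for path, chunk in per_file.items()
    }
    return diff_level, file_level

example_diff = (
    "diff --git a/app.py b/app.py\n+print('hello')\n"
    "diff --git a/util.py b/util.py\n-x = 1\n+x = 2\n"
)
general, per_file = build_prompts(example_diff)
# per_file holds one prompt for app.py and one for util.py
```

Each prompt would then be sent to the model of choice; the multi-level output is the diff-level summary followed by the per-file descriptions.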
Completion of these three phases revealed the following key findings:
- LLMs outperform older methods: Traditional rule-based or shallow machine learning models performed poorly, while modern LLMs generated more accurate and relevant commit messages.
- GPT-4o and LLaMA 3.1 stand out: LLaMA 70B achieved the best results in automated tests, but human evaluators often preferred the output of GPT-4o. Interestingly, LLMs sometimes produced better messages than the original human-written ones.
- Developers prefer multi-level messages: 64% of surveyed developers preferred the proposed multi-level approach, saying it improved clarity and saved time during code reviews, especially for large commits.
The research shows that AI can play a practical role in software engineering by reducing the burden of writing detailed commit messages. It benefits developers in multiple ways: faster onboarding of new team members, better debugging through effectively tracing bugs back to their source, and improved collaboration, as clear documentation minimises misunderstandings among developer groups.
Speaking about the core of the study, Professor Partha Pratim Das notes, “Our study shows that multi-level commit message generation with AI offers a real solution to an everyday pain point in software development.”
Professor Das also underlines that future research in this field could extend this approach by capturing relationships between changes across multiple files, exploring lightweight open-source LLMs to reduce costs, and developing new evaluation methods that better reflect human preferences.
To conclude, poor-quality, inconsistent, and vague commit messages make it hard for developers to understand past work, track down bugs, or collaborate on large projects, ultimately reducing their productivity and wasting potential.
The study demonstrates that LLMs can enhance developer productivity and software maintainability. By combining general summaries with detailed file-level insights, developers gain a clearer, more structured view of code changes. It is believed that this approach can become a standard feature in future development tools, making collaboration smoother and software projects easier to manage.
Edited by Priyanka, Academic Communications, Research and Development Office, Ashoka University
Readers can access the original publication here: Using Large Language Models for multi-level commit message generation for large diffs.
Authors: Abhishek Kumar, Sandhya Sankar, Partha Pratim Das, Partha Pratim Chakrabarti
This blog has been adapted from the original research article, available here: doi.org/10.1016/j.infsof.2025.107831