How can we make AI-generated Pull Request (PR) reviews better? This post explores what a good review should cover and how AI, specifically Large Language Models (LLMs), can help improve the process.
An AI-generated Pull Request for Issue To PR
A good PR review should:

- understand what the PR is trying to do
- check whether the code changes actually address it
- consider how the changes affect the rest of the codebase
- verify that the code runs

I'll use a specific PR as an example and explain how multiple AI agents could work together to handle these tasks.
Underlying issue for PR #239.
To know what a PR is trying to do, we can look at the linked GitHub Issue. In this PR, the body points to an issue about reducing the output from the `search_code` tool that LLMs use. An AI agent could follow that link or use a tool to access the issue and figure out the PR's goal.
If there are no linked issues, the LLM would have to make an educated guess and state it clearly, so a human reviewer can verify it.
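As a rough sketch of what such a tool could look like (the helper name and the regex for issue references are my own assumptions, not taken from the project), an agent could fetch the PR body and follow the first issue it mentions:

```typescript
import { Octokit } from "@octokit/rest";

const octokit = new Octokit({ auth: process.env.GITHUB_TOKEN });

// Hypothetical helper: given a PR, find the issue it references and return
// the issue's title and body so an agent can infer the PR's goal.
async function getLinkedIssue(owner: string, repo: string, prNumber: number) {
  const { data: pr } = await octokit.rest.pulls.get({
    owner,
    repo,
    pull_number: prNumber,
  });

  // Look for references like "Closes #123" or "#123" in the PR body.
  const match = pr.body?.match(/#(\d+)/);
  if (!match) return null; // no linked issue: the agent must guess and say so

  const { data: issue } = await octokit.rest.issues.get({
    owner,
    repo,
    issue_number: Number(match[1]),
  });

  return { title: issue.title, body: issue.body };
}
```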
Code changes for PR #239
Next, we need to see if the code changes fix the issue. In this PR, edits were made to the `searchCode` function in `lib/github/search.ts`.
Looking at the codebase, `searchCode` is a low-level function used by tools in the `tools` folder, which LLMs rely on. The problem is that editing `searchCode` directly affects everything that uses it, not just the tools. This could cause issues elsewhere. It might be better to change the tools instead, so other parts of the code can still use `searchCode` as needed and filter the output themselves if they want.
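To make the distinction concrete, here is a minimal sketch of that alternative: the `searchCode` signature and result shape below are assumptions for illustration, not copied from the project. The LLM-facing tool wraps the low-level function and trims the output, leaving other callers untouched.

```typescript
// Assumed shape of a search result; the real type in lib/github/search.ts may differ.
interface SearchResult {
  path: string;
  snippet: string;
}

declare function searchCode(query: string): Promise<SearchResult[]>;

// Instead of trimming output inside searchCode itself, the tool wraps it and
// reduces the payload the LLM sees.
async function searchCodeTool(query: string, maxResults = 10): Promise<string> {
  const results = await searchCode(query);
  return results
    .slice(0, maxResults) // cap the number of results
    .map((r) => `${r.path}\n${r.snippet.slice(0, 500)}`) // truncate long snippets
    .join("\n---\n");
}
```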
For an AI to catch this, it needs to understand how the codebase fits together. Something like a dependency graph or a syntax tree of the code might help LLMs see these connections more clearly. That's worth looking into.
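One way to approximate this today is to walk the project with a tool like ts-morph (a wrapper around the TypeScript compiler API) and list every place that references the changed function. This is only a sketch, and it assumes `searchCode` is a top-level function declaration:

```typescript
import { Project } from "ts-morph";

// Find every reference to searchCode so an agent can judge the blast radius
// of editing it directly.
const project = new Project();
project.addSourceFilesAtPaths("**/*.ts");

const searchFile = project.getSourceFileOrThrow("lib/github/search.ts");
const searchCodeFn = searchFile.getFunctionOrThrow("searchCode");

for (const ref of searchCodeFn.findReferencesAsNodes()) {
  const file = ref.getSourceFile().getFilePath();
  const line = ref.getStartLineNumber();
  console.log(`searchCode referenced in ${file}:${line}`);
}
```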
The best way to check whether the code actually works is to run it. Tools like automated codespaces and other disposable dev environments can make this easier for an agent.
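A minimal sketch of that last step, assuming the repository uses an `npm test` script (the branch name and commands are placeholders, not the project's actual setup): check out the PR branch in a disposable environment, run the tests, and keep the output for the reviewing agent.

```typescript
import { execSync } from "node:child_process";

// Check out the PR branch and run the test suite, capturing output either way
// so the agent can include results (or failures) in its review.
function runPrChecks(prBranch: string): { passed: boolean; log: string } {
  try {
    execSync(`git fetch origin ${prBranch} && git checkout ${prBranch}`, {
      stdio: "pipe",
    });
    const log = execSync("npm ci && npm test", { encoding: "utf8" });
    return { passed: true, log };
  } catch (err: any) {
    // A non-zero exit code lands here; keep whatever output we got.
    return { passed: false, log: String(err.stdout ?? err.message) };
  }
}
```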