The other day we had an AI hackathon day in the office, basically a do-whatever day as long as it was related to AI. Fun! Part of my work as system architect at MAJORITY is to do code reviews, but it's impossible to review everything. Wouldn't it be great if an AI could help out? Let's find out! (PS. If you don't care about how to set it up, feel free to skip ahead to the PR Review Feedback.)
We use Azure DevOps both to host our code and to run CI/CD pipelines, and we already have an automated PR review pipeline (which runs tests, checks code coverage and does automated code cleanup with JetBrains). The stuff I'm mainly involved in is our backend, which consists of C# microservices.
At the time I did not have an API key for OpenAI GPT-4 ready. But luckily there's a GPT-4 model available in Azure as well, under Azure OpenAI. As of writing you have to fill out a form to get access, but we had already done that earlier, so the process was just a couple of clicks in the UI.
With the OpenAI resource in place, it's super easy to create a "deployment" of a base model, which is what we need to connect to. The model management is handled in Azure OpenAI Studio rather than the portal, but it's still just a couple of clicks. I had several different models to choose from, and to start with I used the gpt-4 model.
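(If you want to verify that the deployment is reachable before wiring up any pipeline, a few lines of C# with the Azure.AI.OpenAI NuGet package will do. This is only a quick sketch of mine, not something the extension requires; the endpoint, key and prompt below are made-up placeholders:)

using Azure;
using Azure.AI.OpenAI;

// Placeholders - use your own resource endpoint, key and deployment name.
var client = new OpenAIClient(
    new Uri("https://my-resource.openai.azure.com/"),
    new AzureKeyCredential("<aoi-api-key>"));

var options = new ChatCompletionsOptions
{
    DeploymentName = "gpt-4", // the deployment created in Azure OpenAI Studio
    Messages =
    {
        new ChatRequestSystemMessage("You are a code reviewer. Comment on the diff below."),
        new ChatRequestUserMessage("public double CalculatePi(int decimals){ return 3.14; }")
    }
};

var response = await client.GetChatCompletionsAsync(options);
Console.WriteLine(response.Value.Choices[0].Message.Content);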
To send the relevant code in the PR for review by our new GPT-4 deployment we need to add an extension in DevOps. In the Visual Studio Marketplace there are many to choose from, but all of them are very fresh and mostly seem to be forks/clones of each other...
My simple requirements were that it should connect to the Azure OpenAI deployment created above and look reasonably trustworthy. After looking through the Git history of a bunch of them I settled on AI Pull Request Reviewer by Blogic.
The easy part done!
The final step of the setup is to make a PR pipeline and trigger it on PRs. In my case I had to be a bit careful since I didn't want to roll it out on all our code. To avoid that I forked one of our repos with a real microservice and added a .azuredevops/pr-pipeline.yml file to its master branch with this content:
jobs:
- job: aitest
  displayName: AI Testing
  pool:
    name: dev-buildpipeline-agents
  timeoutInMinutes: 5
  steps:
  - checkout: self
    persistCredentials: true
  - task: AIPullRequestReview@1
    inputs:
      api_key: "$(aoi-api-key)"
      model: "$(aoi-model)"
      aoi_endpoint: "$(aoi-endpoint)"
I also had to set up the three pipeline variables (btw, the UI for adding a variable cannot have been tested by real users; I've seen others, and myself, forget to press save so many times).
Done - let's try some code!
To easily test this I made a new PR on my forked repo, which added the following code to a service class in one of our C# microservices:
public async Task TestAiPRReview(){
    Console.WriteLine("hellow?");
    Console.WriteLine($"Pi with 5 decimals: {CalcualtePi(5)}");
}

public double CalculatePi(int decimals){
    if (decimals == 0)
        return 3;
    else
        return 3.14;
}
In my mind this would be easy for it to review: it's a more or less useless method to calculate π, with no need for any context, no tricky concurrency issues, no unreliable third parties, etc. On top of that, my implementation leaves a lot of room for improvement.
Less than a minute later I had my first PR comment:
Project Collection Build Service (MAJORITY)

Code Consistency: 5 🚩
- Misspelling in CalcualtePi (should be CalculatePi).
- Inconsistent capitalization in Console.WriteLine("hellow?"); ('hellow' should be 'Hello').

Improvement:
Console.WriteLine("Hello?");
Console.WriteLine($"Pi with 5 decimals: {CalculatePi(5)}");

Security: 1
- Logging 'Pi with 5 decimals' may expose internals, consider if necessary.

Readability: 5 🚩
- Typo in method name affects readability and function invocation.

Best Practices: 2
- Method CalculatePi is too simplistic and does not accurately calculate pi.
- Missing named constants for pi values.

Improvement:
private const double PI_APPROXIMATION = 3.14;
// ...
public double CalculatePi(int decimals){
    if (decimals == 0)
        return 3;
    else
        return PI_APPROXIMATION;
}

Additional:
- The patch adds placeholder code (Console.WriteLine("Hello?");) that seems intended for debugging rather than production.
- The CalculatePi method does not follow rule 6, as it is not an accurate calculation of pi. A more robust algorithm or library should be used.
Ok, let's break it down in a review of the code review:
Project Collection Build Service (MAJORITY)

Code Consistency: 5 🚩
- Misspelling in CalcualtePi (should be CalculatePi).
- Inconsistent capitalization in Console.WriteLine("hellow?"); ('hellow' should be 'Hello').

Improvement:
Console.WriteLine("Hello?");
Console.WriteLine($"Pi with 5 decimals: {CalculatePi(5)}");
Very reasonable complaints, and I didn't even add the spelling mistakes on purpose! I will apply these fixes.
Security: 1
- Logging 'Pi with 5 decimals' may expose internals, consider if necessary.
This is nonsense! I will just ignore this part.
Readability: 5 🚩
- Typo in method name affects readability and function invocation.
Ok, I already accepted the mistake and will fix.
Best Practices: 2
- Method CalculatePi is too simplistic and does not accurately calculate pi.
- Missing named constants for pi values.

Improvement:
private const double PI_APPROXIMATION = 3.14;
// ...
public double CalculatePi(int decimals){
    if (decimals == 0)
        return 3;
    else
        return PI_APPROXIMATION;
}
Very true. But I'm not sure about this improvement. Let's skip it for now.
Additional:
- The patch adds placeholder code (Console.WriteLine("Hello?");) that seems intended for debugging rather than production.
Agreed. I only added this to make debugging easier (theoretically; I never even compiled my changes).
- The CalculatePi method does not follow rule 6, as it is not an accurate calculation of pi. A more robust algorithm or library should be used.
Right, a repeat of number 5. But I'll still ignore it for now.
Updated code:
public async Task TestAiPRReview(){
    Console.WriteLine("Hello?");
    Console.WriteLine($"Pi with 5 decimals: {CalculatePi(5)}");
}

public double CalculatePi(int decimals){
    if (decimals == 0)
        return 3;
    else
        return 3.14;
}
And not surprisingly, it still had some comments. This time I will break it down directly:
- Code Consistency: 1 - The new method CalculatePi should use braces consistently as per the surrounding code.
Yeah maybe... but I will ignore this.
- Readability: 1 - Descriptive names should be used instead of TestAiPRReview. This name doesn't indicate the method's purpose.
I agree, naming is important. But also difficult, and I think the name is ok, so I will ignore this as well.
- Best Practices: 2 - Avoid magic numbers (3 and 3.14); use named constants instead.

private const double PiToZeroDecimals = 3.0;
private const double PiToOneDecimal = 3.1; // Assume you meant to round to one decimal place

public double CalculatePi(int decimals){
    if (decimals == 0)
        return PiToZeroDecimals;
    else
        return PiToOneDecimal;
}
I have no idea why it thinks it's better to round off to 3.1 instead of my previous approximation of 3.14. But OK, I guess I should follow its advice; it is, after all, an AI trained on more or less all human knowledge!
By now I think you can guess how it goes. But surprise, surprise, this time it has a great insight. But let's not get ahead of ourselves. This is the updated code:
public async Task TestAiPRReview(){
    Console.WriteLine("Hello?");
    Console.WriteLine($"Pi with 5 decimals: {CalculatePi(5)}");
}

private const double PiToZeroDecimals = 3.0;
private const double PiToOneDecimal = 3.1; // Assume you meant to round to one decimal place

public double CalculatePi(int decimals){
    if (decimals == 0)
        return PiToZeroDecimals;
    else
        return PiToOneDecimal;
}
And just as before, after a short wait, it produced the following review (with my inline comments):
Best Practices: 2
- Non-descriptive method name 'TestAiPRReview'.
- 'CalculatePi' doesn't scale beyond 1 decimal, use a more general solution.
I agree!
Readability: 3
- Magic numbers 3.0 and 3.1; provide a method for pi calculation or fetch from a math library.
Also agree. Magic numbers should be avoided, especially when they are used for no good reason.
Best Practices Example:
private const double BasePiValue = 3.0; // or use a math library to get Pi

public double CalculatePi(int decimals) {
    return Math.Round(Math.PI, decimals);
}
Nice! Math.Round and Math.PI are much better than the previous hard-coded values! I'm not really sure what the BasePiValue constant is useful for, but since Math.PI seems so smart I will trust it on the constant as well.
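For the record, the suggested Math.Round approach does behave the way you would hope. A quick sanity check (my own, never part of the PR):

Console.WriteLine(Math.Round(Math.PI, 0)); // prints 3
Console.WriteLine(Math.Round(Math.PI, 5)); // prints 3.14159
// Worth knowing: Math.Round(double, int) throws ArgumentOutOfRangeException
// if the decimals argument is outside 0-15.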
The same continued for the next rounds of feedback, but to save everyone's time I won't go into detail and instead skip directly to feedback round #9. Have you ever encountered such a persistent (human) code reviewer before? Even after 8 rounds of fixes it still came back with suggestions!
This is what the methods looked like after my last commit.
private const int PiPrecision = 5;

#if DEBUG
/// <summary>
/// Debugging method that writes pi to the log to help test PR Code Review flow.
/// </summary>
public void TestCalculatePi(){
    _logger.LogInformation("Hello?");
    _logger.LogInformation($"Pi with {PiPrecision} decimals: {CalculatePi(PiPrecision)}");
}
#endif

private const double MathPi = Math.PI; // Or a remark why a specific base value is needed

/// <summary>
/// Return Pi calculated to the specified number of decimals given as argument.
/// </summary>
/// <param name="decimals"></param>
/// <returns></returns>
public double CalculatePi(int decimals) {
    try {
        return Math.Round(Math.PI, decimals);
    }
    catch (Exception e) {
        _logger.LogError("An error occurred while calculating Pi: {Exception}", e);
        return MathPi;
    }
}
Now it actually started to run out of things to comment on, with no Security or Readability issues, yay!
- Code Consistency: 2 - The comment "// Or a remark why a specific base value is needed" is against minimal comments rule unless necessary for understanding.
I agree... It's just that I was advised by a smart AI to add it there in the first place...
- Security: NF
- Readability: NF
- Error Handling: 3 - Using a fallback value directly returning MathPi may conceal the issue rather than properly handling it. Alternative: throw a custom exception.
Oh yes, let the caller deal with the mess.
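For completeness, this is roughly what the suggested alternative would look like. The PiCalculationException type is hypothetical, just my sketch of the reviewer's idea:

// Hypothetical custom exception, sketching the reviewer's suggestion.
public class PiCalculationException : Exception
{
    public PiCalculationException(string message, Exception innerException)
        : base(message, innerException) { }
}

public double CalculatePi(int decimals) {
    try {
        return Math.Round(Math.PI, decimals);
    }
    catch (Exception e) {
        // Surface the failure to the caller instead of hiding it behind a fallback value.
        throw new PiCalculationException($"Failed to calculate Pi to {decimals} decimals", e);
    }
}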
- Best Practices: 4 - Use of preprocessor directive #if DEBUG is questionable. This kind of code should generally not reach production.
This was my own idea, I have to admit. Somewhere in the middle, maybe review round 5 or 6, it remarked that TestCalculatePi looked like it was only for debugging and shouldn't be put in production, so I added the #if DEBUG there... Sorry!
When I showed this to my co-workers I had this conversation on Slack:
Colleague:
> We are not gonna use it, are we? 😂
Me:
> Only on JD's code 😉
(JD is our very AI-passionate CTO. Did I mention that we already have another AI hackathon planned? We're on our way to AI Hackathon #3.)
I would say there's quite a bit left before it can match the more traditional analyzers that are available. At least those give sensible suggestions most of the time. To be fair, I suspect a "professional" AI code review tool would perform better than something "only" using a custom prompt with ChatGPT. But that is for another time.
In my experience the most valuable feedback in a code review session is not the easy details but the harder issues, the kind that has come up in the last couple of code reviews I've done. Maybe a future AI can help with those as well, but it's not there yet, at least. And I haven't even touched the softer benefits of code reviews, like shared ownership, knowledge sharing etc., which arguably are even more important.
Victor 2024-02-24