Prompting LLMs to Modify Existing Code using ASTs

By Jacob Sheehy on 2024-05-16

We're going to see how we can make useful code modifications with LLMs. I like fast visuals, so here is a short video in which we continuously edit an HTML file by prompting GPT-4o using the techniques described in this post:

If this is already interesting to you, consider signing up for the early-alpha-testing waitlist. The site I demo above is not ready for use yet, but if you sign up you'll be the first to know when it is. I will never share your email, and I'll only email you with really big and important updates or when the site is ready for you.

Ok let's go!

It's awkward to modify existing code with LLMs

Ever write some code with the help of ChatGPT and notice that it starts off well but the experience quickly degrades? There are a few reasons. One is obviously context size (but enough has been written about that; we're not talking about context windows in this post!).

Another reason is the chat paradigm underlying most LLM coding assistants like GPT-4o: they will talk to you in your human language (and write raw code in your programming language), but they are trained to be helpful assistants, not coders. This is evident in the work OpenAI has done with RLHF to make ChatGPT so good at chatting with you, following instructions, responding to queries, etc.

The best way to get raw code out of GPT-4 is to use the API (not ChatGPT) and have it only write code. (You can have a chain where the first prompt is natural-language planning if you want to get it thinking about the code first.) But still, the LLM has a hard time making changes to your existing code. For example, if you ask ChatGPT to modify some of your code, it will hand back a mix of new code and fragments of your existing code, with a whole bunch of "// existing code here" placeholders intermingled throughout.

It should be obvious why this is frustrating: you have to compare your code with the new code line by line to see what changed. You cannot wholesale copy-paste it (nor should you!) because new code is intermingled with placeholders standing in for pieces of your old code.

I've seen some people offer solutions to this online, like prompting the LLM to produce git patch files that can then be applied automatically. This is pretty cool, for sure - but I find it too far removed from the actual structure of the code. I think something more robust could be built. This is my attempt.

But could LLMs modify Abstract Syntax Trees?

Programming languages are for humans - ASTs are for machines. We write code in plain text, but the syntax must be perfect (or close to it! I'm looking at you, JavaScript 👀) because it gets parsed into an AST data structure for the compiler. Prompting the LLM to write code that modifies an AST may work better than having it modify the source code directly. Early results are positive - at least in simple cases.
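To make the text-versus-tree distinction concrete, here is a tiny sketch using Python's built-in ast module (Python rather than JavaScript purely for brevity): the parser turns flat source text into a structured tree that code - including LLM-written code - can inspect and rewrite.

```python
import ast

# The same one-line program, seen two ways: as text and as a tree.
source = "total = price * 1.13"

tree = ast.parse(source)

# The parser has already recovered the structure a compiler needs:
# an assignment whose right-hand side is a multiplication.
print(ast.dump(tree, indent=2))

# Walking the tree gives us every node, independent of formatting.
node_types = [type(node).__name__ for node in ast.walk(tree)]
print(node_types)
```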

Let's try it!

...with an extremely simple example

Let's say you want to modify a simple html file with nothing but a <p> tag.

    <p>Test paragraph</p>

Let's file a ticket to make the text in our paragraph bold. Of course we could just add <b> tags. But we're here to illustrate the concept, so for today we'll get the LLM to do the work.

We'll prompt the LLM with something like this (my actual prompt is much longer and more detailed, and includes other instructions we'll talk about later):

Instead of writing raw html code you will write python beautifulsoup code to generate the changes. Task: make the text in the paragraph bold. The file content is: <p>Test paragraph</p>

Here is the code we get back:

from bs4 import BeautifulSoup

# Parse the input HTML string with BeautifulSoup
soup = BeautifulSoup('<p>Test Paragraph</p>', 'html.parser')

# Find the paragraph tag
paragraph_tag = soup.find('p')

# Make the text within the paragraph bold by wrapping it in a <b> tag
if paragraph_tag:
    paragraph_tag.string.wrap(soup.new_tag('b'))

# Print the modified HTML
print(soup.prettify())

When we run that code we get this output:

<p>
  <b>
    Test Paragraph
  </b>
</p>

Looks good! But that was real simple - what about something more complicated? (Yes, I know - an HTML document tree isn't technically an AST!) Let's get something a bit more interesting going.
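An aside on plumbing first: in my setup the LLM's Python isn't pasted anywhere by hand - the system executes it and captures what it prints, and that printed output becomes the new file content. Here is a minimal (and deliberately unsandboxed - more on security later) sketch of that capture step, with a trivial stand-in for the generated code so the example is self-contained:

```python
import contextlib
import io

# Stand-in for the LLM's response; the real system would receive
# something like the BeautifulSoup code above as a string.
generated_code = """
content = '<p>Test paragraph</p>'
print(content.replace('<p>', '<p><b>').replace('</p>', '</b></p>'))
"""

# Run the generated code and capture its stdout - whatever it prints
# is treated as the modified file.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    exec(generated_code, {})

new_file_content = buffer.getvalue()
```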

Modifying JavaScript with ASTs

Let's modify a simple ROT-13 demo, 'cause we like secret codes but want to keep it simple. Let's change it to ROT-4, which we just made up. While more interesting than the HTML above, this is still a simple modification - but let's see how it goes.

This time we'll give a more complete and detailed prompt. Some of the instructions exist here to fit into a larger system, like allowing us to easily execute the code automatically (we'll run it in nodejs). Some pieces are still removed for brevity, but it should get the point across:

Given that the file already exists and is a JavaScript file, you're tasked with writing modification code using Esprima. This modification code should be self-contained and designed to be executed dynamically. Your task is to modify the existing JavaScript code in a way that adds specific functionality or makes changes as described.

The variable input_string will already exist in your scope and it will contain the existing JavaScript content and will be available for your use. Do not declare or modify input_string yourself. Instead, focus on generating the necessary Esprima code to parse input_string as an AST (Abstract Syntax Tree), apply your modifications to this AST, and then use Escodegen to generate the modified JavaScript code. Your task is: change this rot-13 demo site to use rot-4. The input_string is: [input_string]

Our input_string is the existing js code that was generated by GPT-4o to make a ROT-13 demo. It is:

function applyROT13() {
    var input = document.getElementById('inputText').value;
    var output = input.replace(/[a-zA-Z]/g, function(c) {
        return String.fromCharCode(
            (c <= 'Z' ? 90 : 122) >= (c = c.charCodeAt(0) + 13) ? c : c - 26
        );
    });
    document.getElementById('outputText').textContent = output;
}

When we run our prompt with this input, we get the following output:

const esprima = require('esprima');
const escodegen = require('escodegen');

let AST = esprima.parseScript(input_string);

function traverse(node, func) {
    func(node);
    for (let key in node) {
        if (node.hasOwnProperty(key)) {
            let child = node[key];
            if (typeof child === 'object' && child !== null) {
                traverse(child, func);
            }
        }
    }
}

traverse(AST, function(node) {
    if (node.type === 'Literal' && node.value === 13) {
        node.value = 4;
    }
});

console.log(escodegen.generate(AST));

Interesting! Reading that...well, I don't like it much. It does work, of course; here is the output once we use nodejs to run it:

function applyROT13() {
    var input = document.getElementById('inputText').value;
    var output = input.replace(/[a-zA-Z]/g, function (c) {
        return String.fromCharCode((c <= 'Z' ? 90 : 122) >= (c = c.charCodeAt(0) + 4) ? c : c - 26);
    });
    document.getElementById('outputText').textContent = output;
}

...which is...kind of correct. It doesn't update the function name to match (though I didn't specifically ask it to, so...). However, it is unpleasant to see that the AST code from GPT-4o could go badly wrong: what if there were another literal value of 13 in the code? The LLM would have to write something much more complicated to narrow down which literal needs to change, and that extra machinery would be both a point of failure and a waste of resources. It doesn't feel very robust. But of course, the LLM knew the full context of our problem and would have been 'aware' that this solution would produce the correct output, without additional checks.
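That fragility is easy to demonstrate in miniature. Here is a sketch in Python (my pipeline uses Esprima on JavaScript, so this is a translation of the idea, not the actual code) contrasting a naive rewrite-every-13 transformer with one scoped to a single function; the sample source and the function name apply_rot13 are made up for illustration:

```python
import ast

# Made-up sample: the cipher uses 13, but so does an unrelated constant.
source = """
def apply_rot13(c):
    return (c + 13) % 26

MAX_RETRIES = 13
"""

class NaiveShift(ast.NodeTransformer):
    # Rewrites EVERY literal 13 - including MAX_RETRIES, which has
    # nothing to do with the cipher. This is the failure mode above.
    def visit_Constant(self, node):
        if node.value == 13:
            return ast.copy_location(ast.Constant(value=4), node)
        return node

class ScopedShift(ast.NodeTransformer):
    # Only rewrites literals inside the cipher function, so unrelated
    # 13s elsewhere in the file survive.
    def visit_FunctionDef(self, node):
        if node.name == "apply_rot13":
            return NaiveShift().visit(node)
        return node

tree = ScopedShift().visit(ast.parse(source))
print(ast.unparse(tree))
```

The scoped version costs more tokens and more reasoning from the LLM, which is exactly the trade-off that makes the naive version tempting when the model already "knows" there is only one 13.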

Here is a demo video of CodePlusEqualsAI building this ROT-13 to ROT-4 demo from start to finish. Notice that the task is completed in less than 1 minute and most of that is me (a human) typing.

So, is it worth it?

Well, it's definitely interesting. But my implementation needs some work. I showed an example that mostly worked, but there are others that don't work as well. It does improve continually, however, as I refine the prompts and the infrastructure behind it.

There are security concerns too - you know, running code your AI writes without reading it (as I do in the demo video) is not advised. And on top of that, there are environmental concerns: surely we can't be wasting energy running an LLM to produce long AST code that simply adds <b> tags to a two-word paragraph. What if this idea scales up and we all do this? Hrmph.

However, I can tell already that there is a lot to explore here. With better prompts and better infrastructure / scaffolding, techniques like this can be chained together to produce more automated systems that iterate on code, making code modifications much faster than humans could.

And by using ASTs, we know right away when the modification code isn't going to run: we can capture any error at execution time and send it back to the LLM. Often, after just a single retry, the LLM produces working AST modification code that we can run to get our code change done.

Thanks for reading! Want more of this?

This is my first blog post on this topic, and the first real announcement of my side project. Consider signing up for the waitlist (at the top of the page, under the video) to hear more in the future and get first dibs at trying out CodePlusEqualsAI. Direct contact: this project is so new I haven't even registered an email at this domain yet. For now, if you want to contact me, email jacob@codeplusequalsai.com

Back to Blog