Lately, I've been playing around with LLMs to write code. I find that they're great at generating small self-contained snippets. Unfortunately, anything more than that requires a human to evaluate LLM output and come up with suitable follow-up prompts. Most examples of "GPT wrote X" are this - a human serves as a REPL for the LLM, carefully coaxing it to a functional result. This is not to undersell this process - it's remarkable that it works. But can we go further? Can we use an LLM to generate ALL the code for a complex program ALL at once without any human intervention?
Writing a VSCode Extension
To test GPT-4's ability to generate a complex program, I prompted it to create a VSCode extension that lets the user adjust the heading level of selected Markdown text. This task requires:
- Domain-specific knowledge about how to scaffold and expose a program to VSCode
- Mixing multiple languages and platforms: VSCode extensions are written in TypeScript, which requires writing configuration for TypeScript, Node.js, and VSCode
- Generating multiple files
- Generating scaffolding to debug, build, and run code
Setup
For this experiment, I used GPT-4 for all generation purposes. I find it to be the most effective among current-day models.
In addition, I make use of the smol-ai framework to generate code.
smol-ai
Description from the README:
This is a prototype of a "junior developer" agent (aka smol dev) that scaffolds an entire codebase out for you once you give it a product spec, but does not end the world or overpromise AGI. instead of making and maintaining specific, rigid, one-shot starters, like create-react-app, or create-nextjs-app, this is basically create-anything-app where you develop your scaffolding prompt in a tight loop with your smol dev.
I like smol-ai because of its simplicity. The entire code generation logic is in a single Python file consisting of three primary functions:
- generate a list of files that are needed to carry out the prompt (e.g. package.json, index.js, ...)
- generate a list of shared dependencies that are needed to carry out the prompt (e.g. axios, react, ...)
- for each file in the generated file list, generate code that would go into the file, making use of shared dependencies if applicable
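The three steps above can be sketched roughly as follows. This is not smol-ai's actual code (which is Python): the Complete type, the scaffold function, and the prompt wording are all hypothetical, and the LLM call is modelled as a plain synchronous callback only to keep the sketch self-contained.

```typescript
// Hypothetical stand-in for an LLM call: prompt in, completion text out.
// A real client would be asynchronous; it is synchronous here for brevity.
type Complete = (prompt: string) => string;

// Sketch of the three-step flow described above (not smol-ai's real code).
function scaffold(spec: string, complete: Complete): Map<string, string> {
  // Step 1: ask for the list of files, expected back as a JSON array of paths.
  const filePaths: string[] = JSON.parse(
    complete(`List the files needed for: ${spec}. Reply with a JSON array of file paths.`)
  );

  // Step 2: ask for the names that the generated files will share.
  const sharedDependencies = complete(
    `Spec: ${spec}\nFiles: ${filePaths.join(", ")}\nList the shared dependencies.`
  );

  // Step 3: generate each file, passing the shared dependencies as context.
  const generated = new Map<string, string>();
  for (const filePath of filePaths) {
    generated.set(
      filePath,
      complete(
        `Spec: ${spec}\nShared dependencies: ${sharedDependencies}\nWrite the contents of ${filePath}.`
      )
    );
  }
  return generated;
}
```

Because every file is generated in its own call, the shared dependencies text is the only thing keeping the files consistent with each other.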
Note that smol-ai also provides other capabilities like the ability to debug errors and generate a prompt based on the current code base. These are not explored in this post.
Prompt
I started with a basic prompt describing my problem statement:
python main_no_modal.py "a vscode extension that lets the user adjust the heading level of the selected text. it should have three commands: increase heading level, decrease heading level, and set heading level"
Note that the prompt is short and general. I wanted to see how well GPT could do without feeding in any additional details or domain-specific knowledge beyond what was in the training set.
Output
After a few minutes spent generating code, smol-ai created the following files:
.
├── extension
│ ├── package.json
│ ├── src
│ │ ├── commands
│ │ │ ├── decreaseHeadingLevel.ts
│ │ │ ├── increaseHeadingLevel.ts
│ │ │ └── setHeadingLevel.ts
│ │ └── extension.ts
└── shared_dependencies.md
Note that shared_dependencies.md is created as a best guess of what common dependencies will be. In our case, it created a file with the following contents:
the app is: a vscode extension that lets the user adjust the heading level of the selected text. it should have three commands: increase heading level, decrease heading level, and set heading level
the files we have decided to generate are: tsconfig.json, package.json, utils.ts, extension.ts
Shared dependencies:
1. Exported variables:
- adjustHeadingLevel (from utils.ts)
2. Data schemas:
- None
3. ID names of DOM elements:
- None
4. Message names:
- None
5. Function names:
- increaseHeadingLevel
- decreaseHeadingLevel
- setHeadingLevel
6. Dependencies:
- None
Code
extension.ts is the entry point of a vscode extension. In it, we register the three commands that we specified in the prompt.
import * as vscode from 'vscode';
import { increaseHeadingLevel } from './commands/increaseHeadingLevel';
import { decreaseHeadingLevel } from './commands/decreaseHeadingLevel';
import { setHeadingLevel } from './commands/setHeadingLevel';
export function activate(context: vscode.ExtensionContext) {
  context.subscriptions.push(
    vscode.commands.registerCommand('extension.increaseHeadingLevel', increaseHeadingLevel),
    vscode.commands.registerCommand('extension.decreaseHeadingLevel', decreaseHeadingLevel),
    vscode.commands.registerCommand('extension.setHeadingLevel', setHeadingLevel)
  );
}
export function deactivate() {}
Below is the logic inside of decreaseHeadingLevel.ts:
import * as vscode from 'vscode';
export async function decreaseHeadingLevel() {
  const editor = vscode.window.activeTextEditor;
  if (!editor) {
    return;
  }
  const document = editor.document;
  const selection = editor.selection;
  const selectedText = document.getText(selection);
  const headingRegex = /^(#+)\s(.*)$/;
  const match = selectedText.match(headingRegex);
  if (!match) {
    vscode.window.showErrorMessage('Invalid selection. Please select a heading to decrease its level.');
    return;
  }
  const currentLevel = match[1].length;
  if (currentLevel === 1) {
    vscode.window.showErrorMessage('Cannot decrease heading level. Already at the lowest level.');
    return;
  }
  const newLevel = currentLevel - 1;
  const newHeading = '#'.repeat(newLevel) + ' ' + match[2];
  await editor.edit((editBuilder) => {
    editBuilder.replace(selection, newHeading);
  });
}
The code checks for an active editor and if one exists, gets the selected text and looks for a markdown heading via regex. If a header is found and the current header level isn't already at the lowest level, it decreases the heading level.
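The heading logic is easier to reason about once it is separated from the editor API. Below is a minimal sketch of the same steps as a pure function. The name decreaseHeadingText is hypothetical and not part of the generated code, and it returns null in the two places where the generated code shows an error message.

```typescript
// Same regex as the generated code: one or more '#', one whitespace character, then the title.
const HEADING_REGEX = /^(#+)\s(.*)$/;

// Hypothetical pure version of the generated logic: returns the rewritten
// heading, or null when the text is not a heading or is already at level 1.
function decreaseHeadingText(selectedText: string): string | null {
  const match = selectedText.match(HEADING_REGEX);
  if (!match) {
    return null; // not a single-line markdown heading
  }
  const currentLevel = match[1].length;
  if (currentLevel === 1) {
    return null; // no '#' left to remove
  }
  return "#".repeat(currentLevel - 1) + " " + match[2];
}

console.log(decreaseHeadingText("### Setup"));   // "## Setup"
console.log(decreaseHeadingText("# Title"));     // null
console.log(decreaseHeadingText("plain text"));  // null
```

One consequence of this regex is that a selection spanning more than one line never matches, so the command only works when exactly one heading line is selected.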
At first glance, there is nothing amiss with the logic. It executes the command and checks for edge cases. It even provides helpful error messages which already puts it ahead of most human-generated programs...
Testing the extension
To test this extension, we need to successfully execute the following steps:
- Install Dependencies
- Compile Code
- Run Extension
Step 1 - Install
We run into our first issue when trying to install dependencies.
$ yarn
Couldn't find any versions for "vscode-test" that matches "^1.6.2"
? Please choose a version of "vscode-test" from this list: (Use arrow keys)
❯ 1.6.1
Issue 1 - Couldn't find vscode-test
An inspection of package.json returns the following:
{
  "name": "adjust-heading-level",
  ...
  "engines": {
    "vscode": "^1.62.0"
  },
  "devDependencies": {
    "@types/node": "^14.17.0",
    "@types/vscode": "^1.62.0",
    "typescript": "^4.4.2",
    "vscode": "^1.1.37",
    "vscode-test": "^1.6.2"
  },
}
The vscode engine field determines the minimum version of vscode that the extension is compatible with. The present-day (as of 2023-05-23) engine version is 1.78. The 1.62.0 version was released on October 21st, 2021.
This is close to GPT4's knowledge cutoff date:
GPT-4 generally lacks knowledge of events that have occurred after the vast majority of its data cut off (September 2021)
The vscode-test version of 1.6.2 seems suspiciously similar to 1.62, which suggests that GPT likely hallucinated the number.
In any case, this is easy enough to fix by specifying the correct version number and re-installing:
- "vscode-test": "^1.6.2"
+ "vscode-test": "^1.6.1"
Re-running the install process is successful the second time around.
$ yarn
...
[3/5] 🚚 Fetching packages...
[4/5] 🔗 Linking dependencies...
[5/5] 🔨 Building fresh packages...
✨ Done in 4.31s.
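For context on the failed install: a caret range such as ^1.6.2 accepts any version with the same major number that is at least 1.6.2, which is why the published 1.6.1 did not qualify. Here is a simplified sketch of that rule, assuming plain major.minor.patch versions with a major of 1 or higher (the full semver rules also cover 0.x versions and pre-release tags). The helpers parseVersion and satisfiesCaret are illustrative and not part of yarn or npm.

```typescript
// Parse "1.6.2" into [1, 6, 2]. Assumes a plain major.minor.patch string.
function parseVersion(version: string): [number, number, number] {
  const [major, minor, patch] = version.split(".").map(Number);
  return [major, minor, patch];
}

// Simplified caret check, valid only for ranges with major >= 1:
// same major version, and not lower than the version named in the range.
function satisfiesCaret(version: string, range: string): boolean {
  const [major, minor, patch] = parseVersion(version);
  const [minMajor, minMinor, minPatch] = parseVersion(range.replace(/^\^/, ""));
  if (major !== minMajor) {
    return false;
  }
  if (minor !== minMinor) {
    return minor > minMinor;
  }
  return patch >= minPatch;
}

console.log(satisfiesCaret("1.6.1", "^1.6.2")); // false
console.log(satisfiesCaret("1.6.1", "^1.6.1")); // true
```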
Step 2 - Build
Because typescript is a compiled language, we need to execute a build step to compile the code to javascript. The package.json comes with the following scripts:
"scripts": {
  "vscode:prepublish": "npm run compile",
  "compile": "tsc -p ./",
  "watch": "tsc -watch -p ./",
  "postinstall": "node ./node_modules/vscode/bin/install",
  "test": "npm run compile && node ./node_modules/vscode/bin/test"
},
We can build the code by running the compile script. This is where we run into our next issue:
$ yarn compile
warning package.json: No license field
warning adjust-heading-level@0.1.0: The engine "vscode" appears to be invalid.
$ tsc -p ./
error TS5057: Cannot find a tsconfig.json file at the specified directory: './'.
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
Issue 2 - Cannot find a tsconfig.json
Typescript requires a tsconfig.json file to compile into javascript. If you remember our initial file layout, there is no tsconfig.json file.
.
├── extension
│ ├── package.json
│ ├── src
│ │ ├── commands
│ │ │ ├── decreaseHeadingLevel.ts
│ │ │ ├── increaseHeadingLevel.ts
│ │ │ └── setHeadingLevel.ts
│ │ └── extension.ts
└── shared_dependencies.md
We can remediate this by adding the config and re-building. But now we run into more issues:
$ tsc --init
$ yarn compile
src/commands/decreaseHeadingLevel.ts:1:25 - error TS2307: Cannot find module 'vscode' or its corresponding type declarations.
1 import * as vscode from 'vscode';
~~~~~~~~
src/commands/decreaseHeadingLevel.ts:30:24 - error TS7006: Parameter 'editBuilder' implicitly has an 'any' type.
30 await editor.edit((editBuilder) => {
~~~~~~~~~~~
src/commands/increaseHeadingLevel.ts:1:25 - error TS2307: Cannot find module 'vscode' or its corresponding type declarations.
...
Found 7 errors
Issue 3 - Cannot find modules
The reason typescript can't find the module vscode is because of the syntax we use for import statements:
// this is failing
import * as vscode from 'vscode';
// this would work
import vscode from 'vscode';
The reason for the different syntax comes from the differences between CommonJS and ES Modules in how they export dependencies, as well as how typescript transpiles those exports. The maddening quirks of module compatibility could be a blog post unto themselves. For now, we can fix the issue by disabling esModuleInterop inside of tsconfig.json:
@@ -71,7 +71,7 @@
- "esModuleInterop": true, /* Emit additional JavaScript to ease support for importing CommonJS modules. This enables 'allowSyntheticDefaultImports' for type compatibility. */
+ "esModuleInterop": false, /* Emit additional JavaScript to ease support for importing CommonJS modules. This enables 'allowSyntheticDefaultImports' for type compatibility. */
Note that esModuleInterop was changed to true by default since typescript 4.4. This was released on March 15, 2022 - after GPT4's knowledge cutoff date.
Let's try building again. Just one more error this time:
$ yarn compile
src/commands/setHeadingLevel.ts:2:10 - error TS2305: Module '"../extension"' has no exported member 'adjustHeadingLevel'.
2 import { adjustHeadingLevel } from '../extension';
Found 1 error.
Issue 4 - No exported member
This last error comes from trying to import a function that does not exist.
Specifically, the following logic in setHeadingLevel.ts:
import * as vscode from 'vscode';
import { adjustHeadingLevel } from '../extension';
export async function setHeadingLevel() {
...
}
GPT is prone to being "optimistic" (that is, lying) about declaring its dependencies. It sometimes calls or imports functions that do not exist. This is one of those cases.
We can fix this by removing the dependency and manually adding the logic inside of setHeadingLevel:
@@ -1,5 +1,4 @@
import * as vscode from 'vscode';
-import { adjustHeadingLevel } from '../extension';
export async function setHeadingLevel() {
const editor = vscode.window.activeTextEditor;
@@ -14,6 +13,12 @@ export async function setHeadingLevel() {
vscode.window.showErrorMessage('invalidSelection');
return;
}
+ const headingRegex = /^(#+)\s(.*)$/;
+ const match = selectedText.match(headingRegex);
+ if (!match) {
+ vscode.window.showErrorMessage('Invalid selection.');
+ return;
+ }
const inputOptions: vscode.InputBoxOptions = {
prompt: 'setHeadingLevelPrompt',
@@ -31,6 +36,16 @@ export async function setHeadingLevel() {
if (headingLevel) {
const newHeadingLevel = parseInt(headingLevel);
- adjustHeadingLevel(editor, selection, selectedText, newHeadingLevel);
+
+ const newHeading = '#'.repeat(newHeadingLevel) + ' ' + match[2];
+
+ await editor.edit((editBuilder) => {
+ editBuilder.replace(selection, newHeading);
+ });
}
}
Note that most of the code was lifted from decreaseHeadingLevel.ts.
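The duplication could be avoided with a shared helper along the lines of the adjustHeadingLevel function that GPT imagined. The sketch below is one hypothetical shape for it, written as a pure function over text so that it does not depend on the editor API. The name rewriteHeading and its signature are assumptions, not code from the generated extension.

```typescript
// Hypothetical shared helper: applies computeLevel to the current heading level
// and returns the rewritten heading, or null if the text is not a heading or
// the computed level is not a positive integer.
function rewriteHeading(
  text: string,
  computeLevel: (currentLevel: number) => number
): string | null {
  const match = text.match(/^(#+)\s(.*)$/);
  if (!match) {
    return null;
  }
  const newLevel = computeLevel(match[1].length);
  if (!Number.isInteger(newLevel) || newLevel < 1) {
    return null;
  }
  return "#".repeat(newLevel) + " " + match[2];
}

// Each command then reduces to a one-line level calculation.
const setLevel = (text: string, level: number) => rewriteHeading(text, () => level);
const decreaseLevel = (text: string) => rewriteHeading(text, (n) => n - 1);

console.log(setLevel("# Intro", 3));    // "### Intro"
console.log(decreaseLevel("## Intro")); // "# Intro"
console.log(decreaseLevel("# Intro"));  // null
```

The integer check also covers the case where the user types something that parseInt cannot parse, since NaN is rejected rather than producing an empty heading prefix.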
Let's build again. This time, it's successful 🎉
$ tsc -p ./
✨ Done in 0.80s.
But does it run?
Step 3 - Run
Note that GPT provided no instructions on how to run the extension. Or on how to install or build the extension for that matter. It is simple enough to do if you've built vscode extensions before but this can be a barrier to entry for newcomers.
Running a vscode extension requires that you go to the "Run and Debug" panel and launch the vscode extension task when the extension.ts file is active in the editor.
This launches a new vscode window with our extension installed. This also errors out as soon as I try invoking a command.
Command 'Increase Heading Level' resulted in an error: command 'adjust-heading-level.increaseHeadingLevel' was not found
Issue 5 - Commands not found
VSCode knows about commands when they are declared inside of package.json.
Our package.json declares the following commands:
"activationEvents": [
"onCommand:adjust-heading-level.increaseHeadingLevel",
"onCommand:adjust-heading-level.decreaseHeadingLevel",
"onCommand:adjust-heading-level.setHeadingLevel"
],
...
"contributes": {
"commands": [
{
"command": "adjust-heading-level.increaseHeadingLevel",
"title": "Increase Heading Level"
},
{
"command": "adjust-heading-level.decreaseHeadingLevel",
"title": "Decrease Heading Level"
},
{
"command": "adjust-heading-level.setHeadingLevel",
"title": "Set Heading Level"
}
]
}
After being declared inside of package.json, these commands also need to be registered inside the extension.
Our extension.ts:
export function activate(context: vscode.ExtensionContext) {
  context.subscriptions.push(
    vscode.commands.registerCommand('extension.increaseHeadingLevel', increaseHeadingLevel),
    vscode.commands.registerCommand('extension.decreaseHeadingLevel', decreaseHeadingLevel),
    vscode.commands.registerCommand('extension.setHeadingLevel', setHeadingLevel)
  );
}
Do you see the issue?
The typescript file declares commands as extension.{COMMAND} but the package.json declares them as adjust-heading-level.{COMMAND}.
We can fix this by adjusting package.json to match the code. While the fix itself is simple, diagnosing the issue correctly takes some domain knowledge about where to look.
@@ -1,5 +1,5 @@
{
"displayName": "Adjust Heading Level",
"description": "A VSCode extension that lets the user adjust the heading level of the selected text.",
"version": "0.1.0",
@@ -10,23 +10,20 @@
"Other"
],
"activationEvents": [
- "onCommand:adjust-heading-level.increaseHeadingLevel",
- "onCommand:adjust-heading-level.decreaseHeadingLevel",
- "onCommand:adjust-heading-level.setHeadingLevel"
],
"main": "./src/extension.js",
"contributes": {
"commands": [
{
- "command": "adjust-heading-level.increaseHeadingLevel",
+ "command": "extension.increaseHeadingLevel",
"title": "Increase Heading Level"
},
{
- "command": "adjust-heading-level.decreaseHeadingLevel",
+ "command": "extension.decreaseHeadingLevel",
"title": "Decrease Heading Level"
},
{
- "command": "adjust-heading-level.setHeadingLevel",
+ "command": "extension.setHeadingLevel",
"title": "Set Heading Level"
}
]
NOTE: I also used this change to remove the activationEvents entries. These determine when a vscode extension is activated. For command-based activation, vscode is now able to detect the events automatically, so they no longer need to be declared manually.
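This kind of mismatch between package.json and the registered command IDs is mechanical enough to check automatically. Here is a minimal sketch, assuming the IDs passed to registerCommand have been collected into an array by hand. The Manifest type and findUnregisteredCommands are hypothetical, and the type covers only the part of package.json that the check needs.

```typescript
// Only the part of package.json this check needs.
interface Manifest {
  contributes: { commands: { command: string; title: string }[] };
}

// Returns the command IDs declared in the manifest that the extension never registers.
function findUnregisteredCommands(manifest: Manifest, registered: string[]): string[] {
  const registeredIds = new Set(registered);
  return manifest.contributes.commands
    .map((entry) => entry.command)
    .filter((id) => !registeredIds.has(id));
}

// The declarations from the generated package.json.
const manifest: Manifest = {
  contributes: {
    commands: [
      { command: "adjust-heading-level.increaseHeadingLevel", title: "Increase Heading Level" },
      { command: "adjust-heading-level.decreaseHeadingLevel", title: "Decrease Heading Level" },
      { command: "adjust-heading-level.setHeadingLevel", title: "Set Heading Level" },
    ],
  },
};

// The IDs that the generated extension.ts actually registers.
const registered = [
  "extension.increaseHeadingLevel",
  "extension.decreaseHeadingLevel",
  "extension.setHeadingLevel",
];

// All three declared IDs are reported, because none of them is registered.
console.log(findUnregisteredCommands(manifest, registered));
```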
Let's try running again and increasing the header level.
Well, that's not supposed to happen 😟
Issue 6 - The Decreasing Increase
Instead of increasing the header, we are getting the header level decreased.
Let's take a look at increaseHeadingLevel.ts:
import * as vscode from 'vscode';
export async function increaseHeadingLevel() {
  const editor = vscode.window.activeTextEditor;
  if (!editor) {
    return;
  }
  const document = editor.document;
  const selection = editor.selection;
  const selectedText = document.getText(selection);
  const headingRegex = /^(#+)\s(.*)$/;
  const match = selectedText.match(headingRegex);
  if (!match) {
    vscode.window.showErrorMessage('Invalid selection. Please select a valid heading.');
    return;
  }
  const currentLevel = match[1].length;
  const newLevel = Math.max(1, currentLevel - 1);
  const newText = '#'.repeat(newLevel) + ' ' + match[2];
  await editor.edit((editBuilder) => {
    editBuilder.replace(selection, newText);
  });
}
Do you see the issue?
There is a bug caused by a single character diff.
@@ -19,7 +19,7 @@ export async function increaseHeadingLevel() {
}
const currentLevel = match[1].length;
- const newLevel = Math.max(1, currentLevel - 1);
+ const newLevel = Math.max(1, currentLevel + 1);
const newText = '#'.repeat(newLevel) + ' ' + match[2];
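A single assertion would have caught this bug. The sketch below restates the corrected logic as a pure function and checks it. Note that once the sign is fixed, Math.max(1, ...) no longer does anything useful, since the result is always at least 2. The sketch caps the level at 6 instead, on the assumption that the target is standard Markdown, which defines six heading levels. The names increaseHeadingText, MAX_HEADING_LEVEL, and assertEqual are illustrative and not part of the generated code.

```typescript
// Assumption: standard Markdown headings, which go from level 1 to level 6.
const MAX_HEADING_LEVEL = 6;

// Hypothetical pure version of the corrected logic, with an upper bound added.
function increaseHeadingText(selectedText: string): string | null {
  const match = selectedText.match(/^(#+)\s(.*)$/);
  if (!match) {
    return null; // not a single-line markdown heading
  }
  const newLevel = Math.min(MAX_HEADING_LEVEL, match[1].length + 1);
  return "#".repeat(newLevel) + " " + match[2];
}

// Tiny assertion helper so the example needs no test framework.
function assertEqual(actual: string | null, expected: string | null): void {
  if (actual !== expected) {
    throw new Error(`expected ${String(expected)} but got ${String(actual)}`);
  }
}

assertEqual(increaseHeadingText("# Title"), "## Title");       // fails with the original "- 1"
assertEqual(increaseHeadingText("###### Deep"), "###### Deep"); // already at the cap
assertEqual(increaseHeadingText("no heading"), null);
```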
Let's compile and run it again.
It works 🥳
Thoughts
So how did we do? We got a working extension. We got it to accomplish the goal set out in our prompt.
The journey to this point was not "automatic". We ran into many issues along the way. Lacking prior knowledge of typescript, node.js and vscode, these issues would have taken a while to debug.
And even though we were able to get working code, there are still many improvements to be made:
- there are no instructions on how to develop, use, or publish the extension
- there is no .gitignore for typescript/javascript/vscode artifacts
- there is no launch.json file that configures running the extension in development
- there are no tests
- there is no code reuse
Some Stats
GPT generated 9 files that cover ~100 lines of typescript, ~180 lines json, and 17 lines of markdown.
$ cloc --exclude-dir=node_modules,out --not-match-f=package-lock.json --not-match-f=prompt.md --include-ext=ts,json,md .
15 text files.
13 unique files.
7 files ignored.
github.com/AlDanial/cloc v 1.92 T=0.01 s (986.5 files/s, 36610.4 lines/s)
-------------------------------------------------------------------------------
Language files blank comment code
-------------------------------------------------------------------------------
JSON 4 8 0 181
TypeScript 4 22 0 98
Markdown 1 8 0 17
-------------------------------------------------------------------------------
SUM: 9 38 0 296
-------------------------------------------------------------------------------
The final file tree
$ tree --gitfile extension/.gitignore
.
├── extension
│ ├── package.json
│ ├── src
│ │ ├── commands
│ │ │ ├── decreaseHeadingLevel.ts
│ │ │ ├── increaseHeadingLevel.ts
│ │ │ └── setHeadingLevel.ts
│ │ └── extension.ts
│ ├── tsconfig.json
│ └── yarn.lock
├── prompt.md
└── shared_dependencies.md
Out of the ~300 lines generated, we had to modify/add ~18 lines in order to make everything work.
Takeaways
GPT was able to generate most of the code using a naive prompt with no domain-specific context.
Some things to note:
- GPT4 does great with code in its index but will likely generate bad logic if the underlying specs have changed since its knowledge cutoff date (September 2021)
- GPT4 can hallucinate subtle bugs. In the increaseHeadingLevel.ts case, it was a one-character difference that caused the extension to do the exact opposite of what the command was supposed to do
- GPT4 is great at scaffolding boilerplate but domain expertise still matters (for now). This is especially true when building on tech that has changed since GPT4's cutoff date
- GPT4 introduces yet another abstraction layer for programming. We now have 7 translation layers for the case of writing typescript (which can easily be doubled when involving containers or VMs) 🤦
Future Directions
I did the initial experiment with a naive general prompt and no additional context. There is lots of room for improvement. Some next steps:
- include every issue that we encountered when trying to run the extension as a detail in the prompt for GPT to watch out for
- generalize this by indexing and summarizing the vscode extension docs to compensate for new information that is not in GPT's current index
- explore doing this by chaining an LLM that has access to today's context (e.g. Bard) with GPT
- generate tests to validate logic and have GPT autocorrect itself if tests fail
- generate a checklist for what a high-quality vscode extension looks like and have GPT verify and autocorrect the artifacts it generates
NOTE: I've already run a subset of these steps and was able to get the error count to zero on the first generation. Will need to see if it generalizes to other examples. Look out for details in a future post
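The test-and-autocorrect idea from the list above can be sketched as a simple retry loop. Everything here is hypothetical: Generate stands in for an LLM call that also receives the previous failure messages, RunTests stands in for whatever test runner is used, and both are synchronous only to keep the example self-contained.

```typescript
// Hypothetical LLM call: produces code from a spec plus the previous failures.
type Generate = (spec: string, previousErrors: string[]) => string;

// Hypothetical test runner: returns failure messages; an empty array means pass.
type RunTests = (code: string) => string[];

// Regenerate until the tests pass or the attempt budget runs out.
function generateUntilPassing(
  spec: string,
  generate: Generate,
  runTests: RunTests,
  maxAttempts: number
): { code: string; attempts: number } | null {
  let errors: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const code = generate(spec, errors);
    errors = runTests(code);
    if (errors.length === 0) {
      return { code, attempts: attempt };
    }
  }
  return null; // budget exhausted without a passing result
}
```

The attempt cap matters: without it, a spec the model cannot satisfy would loop forever.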
A Note to the reader
I'm a solo YC founder that is currently pivoting into building useful tools in the LLM space. If this is something that you're passionate about and want to be a part of, either as a cofounder (I'm looking for someone with a technical background in sales/product/go-to-market) or as a founding team member, please reach out.