Lessons from Creating a VSCode Extension with GPT-4

May 25, 2023

Lately, I've been playing around with LLMs to write code. I find that they're great at generating small self-contained snippets. Unfortunately, anything more than that requires a human to evaluate LLM output and come up with suitable follow-up prompts. Most examples of "GPT wrote X" are this - a human serves as a REPL for the LLM, carefully coaxing it to a functional result. This is not to undersell this process - it's remarkable that it works. But can we go further? Can we use an LLM to generate ALL the code for a complex program ALL at once without any human intervention?

Writing a VSCode Extension

To test GPT-4's ability to generate a complex program, I prompted it to create a VSCode extension that lets the user adjust the heading level of selected Markdown text. This task requires:

- Domain-specific knowledge about how to scaffold and expose a program to VSCode

- Mixing multiple languages and platforms: VSCode extensions are written in TypeScript, which requires writing configuration for TypeScript, Node.js, and VSCode

- Generating multiple files

- Generating scaffolding to debug, build, and run code

Setup

For this experiment, I used GPT-4 for all generation purposes. I find it to be the most effective among current-day models.

In addition, I make use of the smol-ai framework to generate code.

smol-ai description from the README:

This is a prototype of a "junior developer" agent (aka smol dev) that scaffolds an entire codebase out for you once you give it a product spec, but does not end the world or overpromise AGI. instead of making and maintaining specific, rigid, one-shot starters, like create-react-app, or create-nextjs-app, this is basically create-anything-app where you develop your scaffolding prompt in a tight loop with your smol dev.

I like smol-ai because of its simplicity. The entire code generation logic is in a single Python file consisting of three primary functions:

- generate a list of files that are needed to carry out the prompt (e.g. package.json, index.js, ...)
- generate a list of shared dependencies that are needed to carry out the prompt (e.g. axios, react, ...)
- for each file in the generated file list, generate code that would go into the file, making use of shared dependencies if applicable
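To make the three steps concrete, here is a rough sketch of the same flow in TypeScript. This is not smol-ai's actual code (which is Python): the Llm type, the prompt wording, and generateCodebase are all made up for illustration, and the stub model only exists so the example runs without an API key.

```typescript
// Hypothetical stand-in for a chat model call; not a real smol-ai or OpenAI API.
type Llm = (prompt: string) => Promise<string>;

async function generateCodebase(spec: string, llm: Llm): Promise<Record<string, string>> {
  // Step 1: ask for the list of file paths, one per line.
  const fileList = (await llm(`List the files needed for: ${spec}`))
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);

  // Step 2: ask for the names and dependencies the files should agree on.
  const shared = await llm(
    `Describe the shared dependencies for: ${spec}\nFiles: ${fileList.join(", ")}`
  );

  // Step 3: generate each file on its own, passing the shared context along.
  const files: Record<string, string> = {};
  for (const path of fileList) {
    files[path] = await llm(`Write ${path} for: ${spec}\nShared dependencies:\n${shared}`);
  }
  return files;
}

// Stub model so the sketch can run offline (assumption: real output would be code).
const stubLlm: Llm = async (prompt) => {
  if (prompt.startsWith("List the files")) return "package.json\nsrc/extension.ts";
  if (prompt.startsWith("Describe")) return "function names: increaseHeadingLevel";
  return `// generated for prompt: ${prompt.split("\n")[0]}`;
};

generateCodebase("a vscode extension", stubLlm).then((files) => {
  console.log(Object.keys(files));
});
```

In a flow shaped like this, each file comes from a separate model call in step 3, so the only thing tying the files together is the shared-dependencies text from step 2.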

Note that smol-ai also provides other capabilities like the ability to debug errors and generate a prompt based on the current code base. These are not explored in this post.

Prompt

I started with a basic prompt describing my problem statement:

python main_no_modal.py "a vscode extension that lets the user adjust the heading level of the selected text. it should have three commands: increase heading level, decrease heading level, and set heading level"

Note that the prompt is short and general. I wanted to see how well GPT could do without feeding in any additional details or domain-specific knowledge beyond what was in the training set.

Output

After a few minutes spent generating code, smol-ai created the following files:

.
├── extension
│   ├── package.json
│   ├── src
│   │   ├── commands
│   │   │   ├── decreaseHeadingLevel.ts
│   │   │   ├── increaseHeadingLevel.ts
│   │   │   └── setHeadingLevel.ts
│   │   └── extension.ts
└── shared_dependencies.md

Note that shared_dependencies.md is created as a best guess of what common dependencies will be. In our case, it created a file with the following contents:

the app is: a vscode extension that lets the user adjust the heading level of the selected text. it should have three commands: increase heading level, decrease heading level, and set heading level

the files we have decided to generate are: tsconfig.json, package.json, utils.ts, extension.ts

Shared dependencies:

1. Exported variables:
   - adjustHeadingLevel (from utils.ts)

2. Data schemas:
   - None

3. ID names of DOM elements:
   - None

4. Message names:
   - None

5. Function names:
   - increaseHeadingLevel
   - decreaseHeadingLevel
   - setHeadingLevel

6. Dependencies:
    - None

Code

extension.ts is the entry point of a vscode extension. In it, we register the three commands that we specified in the prompt.

import * as vscode from 'vscode';
import { increaseHeadingLevel } from './commands/increaseHeadingLevel';
import { decreaseHeadingLevel } from './commands/decreaseHeadingLevel';
import { setHeadingLevel } from './commands/setHeadingLevel';

export function activate(context: vscode.ExtensionContext) {
    context.subscriptions.push(
        vscode.commands.registerCommand('extension.increaseHeadingLevel', increaseHeadingLevel),
        vscode.commands.registerCommand('extension.decreaseHeadingLevel', decreaseHeadingLevel),
        vscode.commands.registerCommand('extension.setHeadingLevel', setHeadingLevel)
    );
}

export function deactivate() {}

Below is the logic inside of decreaseHeadingLevel.ts

import * as vscode from 'vscode';

export async function decreaseHeadingLevel() {
    const editor = vscode.window.activeTextEditor;
    if (!editor) {
        return;
    }

    const document = editor.document;
    const selection = editor.selection;
    const selectedText = document.getText(selection);

    const headingRegex = /^(#+)\s(.*)$/;
    const match = selectedText.match(headingRegex);

    if (!match) {
        vscode.window.showErrorMessage('Invalid selection. Please select a heading to decrease its level.');
        return;
    }

    const currentLevel = match[1].length;
    if (currentLevel === 1) {
        vscode.window.showErrorMessage('Cannot decrease heading level. Already at the lowest level.');
        return;
    }

    const newLevel = currentLevel - 1;
    const newHeading = '#'.repeat(newLevel) + ' ' + match[2];

    await editor.edit((editBuilder) => {
        editBuilder.replace(selection, newHeading);
    });
}

The code checks for an active editor and if one exists, gets the selected text and looks for a markdown heading via regex. If a header is found and the current header level isn't already at the lowest level, it decreases the heading level.
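To see what the regex accepts, the matching step can be pulled out into a plain function that runs without VSCode. The pattern is copied from the generated code; parseHeading and the Heading type are names I made up for this sketch.

```typescript
// Same pattern as the generated command: one or more '#', one whitespace
// character, then the rest of the line as the title.
const headingRegex = /^(#+)\s(.*)$/;

interface Heading {
  level: number;
  title: string;
}

// Returns the parsed heading, or null when the selection is not a single heading line.
function parseHeading(selectedText: string): Heading | null {
  const match = selectedText.match(headingRegex);
  if (!match) {
    return null;
  }
  return { level: match[1].length, title: match[2] };
}

console.log(parseHeading("## Setup"));   // { level: 2, title: 'Setup' }
console.log(parseHeading("Setup"));      // null: no leading '#'
console.log(parseHeading("## Setup\n")); // null: '.' does not match the trailing newline
console.log(parseHeading("# A\n## B"));  // null: without the 'm' flag only a one-line selection matches
```

So the command only works when the selection is exactly one heading line with no trailing newline. Selecting several headings at once falls through to the error message.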

At first glance, there is nothing amiss with the logic. It executes the command and checks for edge cases. It even provides helpful error messages which already puts it ahead of most human-generated programs...

Testing the extension

To test this extension, we need to successfully execute the following steps:

1. Install Dependencies
2. Compile Code
3. Run Extension

Step 1 - Install

We run into our first issue when trying to install dependencies.

$ yarn

Couldn't find any versions for "vscode-test" that matches "^1.6.2"
? Please choose a version of "vscode-test" from this list: (Use arrow keys)
❯ 1.6.1

Issue 1 - Couldn't find vscode-test

Inspecting package.json shows the following:

{
  "name": "adjust-heading-level",
  ...
  "engines": {
    "vscode": "^1.62.0"
  },
  "devDependencies": {
    "@types/node": "^14.17.0",
    "@types/vscode": "^1.62.0",
    "typescript": "^4.4.2",
    "vscode": "^1.1.37",
    "vscode-test": "^1.6.2"
  },
}

The vscode engine field sets the minimum version of vscode that the extension supports. The present-day (as of 2023-05-23) version is 1.78. The 1.62.0 version was released on October 21st, 2021.

This roughly lines up with GPT4's knowledge cutoff date:

GPT-4 generally lacks knowledge of events that have occurred after the vast majority of its data cut off (September 2021)

The vscode-test version of 1.6.2 looks suspiciously similar to 1.62, which suggests that GPT hallucinated the version number.

In any case, this is easy enough to fix by specifying a version that exists and re-installing:

-   "vscode-test": "^1.6.2"
+   "vscode-test": "^1.6.1"

Re-running the install process is successful the second time around.

$ yarn

...
[3/5] 🚚  Fetching packages...
[4/5] 🔗  Linking dependencies...
[5/5] 🔨  Building fresh packages...
✨  Done in 4.31s.
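Why did yarn reject ^1.6.2 when 1.6.1 exists? A caret range only accepts versions at or above the stated one (and below the next major version). Here is a simplified sketch of that rule. It is my own toy check, not the semver library: satisfiesCaret and parseVersion are hypothetical names, and the logic only covers ranges whose major version is 1 or higher, with no pre-release tags.

```typescript
// Parse "MAJOR.MINOR.PATCH" into numbers. Pre-release tags are out of scope here.
function parseVersion(version: string): [number, number, number] {
  const [major, minor, patch] = version.split(".").map(Number);
  return [major, minor, patch];
}

// Simplified caret check, valid only when the range's major version is >= 1:
// ^1.6.2 means ">= 1.6.2 and < 2.0.0".
function satisfiesCaret(version: string, range: string): boolean {
  const [major, minor, patch] = parseVersion(version);
  const [baseMajor, baseMinor, basePatch] = parseVersion(range.replace(/^\^/, ""));
  if (major !== baseMajor) {
    return false;
  }
  if (minor !== baseMinor) {
    return minor > baseMinor;
  }
  return patch >= basePatch;
}

console.log(satisfiesCaret("1.6.1", "^1.6.2")); // false: 1.6.1 is below the requested minimum
console.log(satisfiesCaret("1.6.1", "^1.6.1")); // true
```

Since 1.6.1 was the only version yarn offered, no published version could satisfy the range GPT wrote.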

Step 2 - Build

Because typescript is a compiled language, we need to execute a build step to compile the code to javascript. The package.json comes with the following scripts:

"scripts": {
    "vscode:prepublish": "npm run compile",
    "compile": "tsc -p ./",
    "watch": "tsc -watch -p ./",
    "postinstall": "node ./node_modules/vscode/bin/install",
    "test": "npm run compile && node ./node_modules/vscode/bin/test"
  },

We can build the code by running the compile script. This is where we run into our next issue:

$ yarn compile
warning package.json: No license field
warning adjust-heading-level@0.1.0: The engine "vscode" appears to be invalid.
$ tsc -p ./
error TS5057: Cannot find a tsconfig.json file at the specified directory: './'.
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.

Issue 2 - Cannot find a tsconfig.json

Running tsc -p ./ requires a tsconfig.json file in the project directory. If you remember our initial file layout, there is no tsconfig.json file, even though shared_dependencies.md lists it among the files to generate.

.
├── extension
│   ├── package.json
│   ├── src
│   │   ├── commands
│   │   │   ├── decreaseHeadingLevel.ts
│   │   │   ├── increaseHeadingLevel.ts
│   │   │   └── setHeadingLevel.ts
│   │   └── extension.ts
└── shared_dependencies.md

We can remediate this by adding the config and re-building. But now we run into more issues:

$ tsc --init
$ yarn compile

src/commands/decreaseHeadingLevel.ts:1:25 - error TS2307: Cannot find module 'vscode' or its corresponding type declarations.

1 import * as vscode from 'vscode';
                          ~~~~~~~~

src/commands/decreaseHeadingLevel.ts:30:24 - error TS7006: Parameter 'editBuilder' implicitly has an 'any' type.

30     await editor.edit((editBuilder) => {
                          ~~~~~~~~~~~

src/commands/increaseHeadingLevel.ts:1:25 - error TS2307: Cannot find module 'vscode' or its corresponding type declarations.

...

Found 7 errors

Issue 3 - Cannot find modules

The reason typescript can't find the module vscode is because of the syntax we use for import statements:

// this is failing
import * as vscode from 'vscode';

// this would work
import vscode from 'vscode';

The reason for the different syntax comes from the differences between CommonJS and ES Modules and how they export dependencies, as well as how typescript transpiles those exports. The ~~maddening hellscape~~ quirks in module compatibility can be a blog post unto itself. For now, we can fix the issue by disabling esModuleInterop inside of tsconfig.json:

@@ -71,7 +71,7 @@
-    "esModuleInterop": true,                             /* Emit additional JavaScript to ease support for importing CommonJS modules. This enables 'allowSyntheticDefaultImports' for type compatibility. */
+    "esModuleInterop": false,                             /* Emit additional JavaScript to ease support for importing CommonJS modules. This enables 'allowSyntheticDefaultImports' for type compatibility. */

Note that GPT never generated a tsconfig.json, so this setting did not come from the model: the file written by tsc --init turns esModuleInterop on explicitly.

Let's try building again. Just one more error this time:

$ yarn compile

src/commands/setHeadingLevel.ts:2:10 - error TS2305: Module '"../extension"' has no exported member 'adjustHeadingLevel'.

2 import { adjustHeadingLevel } from '../extension';

Found 1 error.

Issue 4 - No exported member

This last error comes from trying to import a function that does not exist.

Specifically, the following logic in setHeadingLevel.ts:

import * as vscode from 'vscode';
import { adjustHeadingLevel } from '../extension';

export async function setHeadingLevel() {
  ...
}

GPT is prone to ~~lie~~ be optimistic about declaring its dependencies. It sometimes calls or imports functions that do not exist. This is one of those cases.

We can fix this by removing the dependency and manually adding the logic inside of setHeadingLevel

@@ -1,5 +1,4 @@
 import * as vscode from 'vscode';
-import { adjustHeadingLevel } from '../extension';
 
 export async function setHeadingLevel() {
     const editor = vscode.window.activeTextEditor;
@@ -14,6 +13,12 @@ export async function setHeadingLevel() {
         vscode.window.showErrorMessage('invalidSelection');
         return;
     }
+    const headingRegex = /^(#+)\s(.*)$/;
+    const match = selectedText.match(headingRegex);
+    if (!match) {
+        vscode.window.showErrorMessage('Invalid selection.');
+        return;
+    }
 
     const inputOptions: vscode.InputBoxOptions = {
         prompt: 'setHeadingLevelPrompt',
@@ -31,6 +36,16 @@ export async function setHeadingLevel() {
 
     if (headingLevel) {
         const newHeadingLevel = parseInt(headingLevel);
-        adjustHeadingLevel(editor, selection, selectedText, newHeadingLevel);
+    
+        const newHeading = '#'.repeat(newHeadingLevel) + ' ' + match[2];
+    
+        await editor.edit((editBuilder) => {
+            editBuilder.replace(selection, newHeading);
+        });
     }
 }

Note that most of the code was lifted from decreaseHeadingLevel.ts.
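The compiler catches this kind of phantom import, but a lightweight check could also run inside the generation loop before anything is written to disk. Below is a rough sketch of that idea. findMissingImports and resolveImport are names I made up, the check is regex-based, and it only understands named imports from relative paths; a real tool would use the TypeScript compiler API instead.

```typescript
// Resolve a relative import such as '../extension' against the importing file's path.
function resolveImport(fromFile: string, specifier: string): string {
  const parts = fromFile.split("/").slice(0, -1); // directory of the importing file
  for (const segment of specifier.split("/")) {
    if (segment === "..") {
      parts.pop();
    } else if (segment !== ".") {
      parts.push(segment);
    }
  }
  return parts.join("/") + ".ts";
}

// Report named imports from relative paths that the target file does not export.
// Only handles `import { a, b } from './x'` and `export function|const|class name`.
function findMissingImports(files: Record<string, string>): string[] {
  const problems: string[] = [];
  for (const path of Object.keys(files)) {
    const importPattern = /import\s*\{([^}]*)\}\s*from\s*['"](\.[^'"]*)['"]/g;
    let match: RegExpExecArray | null;
    while ((match = importPattern.exec(files[path])) !== null) {
      const target = resolveImport(path, match[2]);
      const targetSource = files[target] ?? "";
      const names = match[1].split(",").map((n) => n.trim()).filter((n) => n.length > 0);
      for (const name of names) {
        const exported = new RegExp(
          `export\\s+(async\\s+)?(function|const|class)\\s+${name}\\b`
        );
        if (!exported.test(targetSource)) {
          problems.push(`${path}: '${name}' is not exported by ${target}`);
        }
      }
    }
  }
  return problems;
}

// Trimmed-down versions of the two generated files from this post.
const generated: Record<string, string> = {
  "src/extension.ts": "export function activate() {}\nexport function deactivate() {}",
  "src/commands/setHeadingLevel.ts":
    "import * as vscode from 'vscode';\n" +
    "import { adjustHeadingLevel } from '../extension';\n" +
    "export async function setHeadingLevel() {}",
};

console.log(findMissingImports(generated));
// [ "src/commands/setHeadingLevel.ts: 'adjustHeadingLevel' is not exported by src/extension.ts" ]
```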

Let's build again. This time, it's successful 🎉

$ tsc -p ./
✨  Done in 0.80s.

But does it run?

Step 3 - Run

Note that GPT provided no instructions on how to run the extension. Or on how to install or build the extension for that matter. It is simple enough to do if you've built vscode extensions before but this can be a barrier to entry for newcomers.

Running a vscode extension requires that you go to the "Run and Debug" panel and launch the vscode extension task when the extension.ts file is active in the editor.

This launches a new vscode window with our extension installed. It errors out as soon as I try to invoke a command:

Command 'Increase Heading Level' resulted in an error: command 'adjust-heading-level.increaseHeadingLevel' was not found

Issue 5 - Commands not found

VSCode knows about commands when they are declared inside of package.json.

Our package.json declares the following commands:

  "activationEvents": [
    "onCommand:adjust-heading-level.increaseHeadingLevel",
    "onCommand:adjust-heading-level.decreaseHeadingLevel",
    "onCommand:adjust-heading-level.setHeadingLevel"
  ],
  ...
  "contributes": {
    "commands": [
      {
        "command": "adjust-heading-level.increaseHeadingLevel",
        "title": "Increase Heading Level"
      },
      {
        "command": "adjust-heading-level.decreaseHeadingLevel",
        "title": "Decrease Heading Level"
      },
      {
        "command": "adjust-heading-level.setHeadingLevel",
        "title": "Set Heading Level"
      }
    ]
  }

After declaring inside of package.json, these commands also need to be registered inside the extension.

Our extension.ts

export function activate(context: vscode.ExtensionContext) {
    context.subscriptions.push(
        vscode.commands.registerCommand('extension.increaseHeadingLevel', increaseHeadingLevel),
        vscode.commands.registerCommand('extension.decreaseHeadingLevel', decreaseHeadingLevel),
        vscode.commands.registerCommand('extension.setHeadingLevel', setHeadingLevel)
    );
}

Do you see the issue?

The typescript file registers commands as extension.{COMMAND} but the package.json declares them as adjust-heading-level.{COMMAND}.

We can fix this by adjusting package.json to match the code. While the fix itself is simple, diagnosing the issue correctly takes some domain knowledge about where to look.

@@ -1,5 +1,5 @@
 {
   "displayName": "Adjust Heading Level",
   "description": "A VSCode extension that lets the user adjust the heading level of the selected text.",
   "version": "0.1.0",
@@ -10,23 +10,20 @@
     "Other"
   ],
   "activationEvents": [
-    "onCommand:adjust-heading-level.increaseHeadingLevel",
-    "onCommand:adjust-heading-level.decreaseHeadingLevel",
-    "onCommand:adjust-heading-level.setHeadingLevel"
   ],
   "main": "./src/extension.js",
   "contributes": {
     "commands": [
       {
-        "command": "adjust-heading-level.increaseHeadingLevel",
+        "command": "extension.increaseHeadingLevel",
         "title": "Increase Heading Level"
       },
       {
-        "command": "adjust-heading-level.decreaseHeadingLevel",
+        "command": "extension.decreaseHeadingLevel",
         "title": "Decrease Heading Level"
       },
       {
-        "command": "adjust-heading-level.setHeadingLevel",
+        "command": "extension.setHeadingLevel",
         "title": "Set Heading Level"
       }
     ]

NOTE: I also used this change to remove the activationEvents. These determine when vscode activates an extension. For command-based activation, vscode can now detect the events automatically, so they no longer need to be declared manually.
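A mismatch between the declared and registered command ids is easy to check for mechanically. Here is a small sketch of such a check. unregisteredCommands and registeredCommandIds are hypothetical helpers of mine, and the regex assumes the ids are passed to registerCommand as plain string literals, as they are in the generated extension.ts.

```typescript
// Shape of the relevant part of package.json.
interface Manifest {
  contributes: { commands: { command: string; title: string }[] };
}

// Collect the ids passed as string literals to registerCommand in the extension source.
function registeredCommandIds(source: string): string[] {
  const ids: string[] = [];
  const pattern = /registerCommand\(\s*['"]([^'"]+)['"]/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    ids.push(match[1]);
  }
  return ids;
}

// Return the command ids that package.json declares but the code never registers.
function unregisteredCommands(manifest: Manifest, source: string): string[] {
  const registered = registeredCommandIds(source);
  return manifest.contributes.commands
    .map((entry) => entry.command)
    .filter((id) => registered.indexOf(id) === -1);
}

// The generated manifest and extension.ts from this post, trimmed down.
const manifest: Manifest = {
  contributes: {
    commands: [
      { command: "adjust-heading-level.increaseHeadingLevel", title: "Increase Heading Level" },
      { command: "adjust-heading-level.decreaseHeadingLevel", title: "Decrease Heading Level" },
      { command: "adjust-heading-level.setHeadingLevel", title: "Set Heading Level" },
    ],
  },
};
const extensionSource = `
  vscode.commands.registerCommand('extension.increaseHeadingLevel', increaseHeadingLevel),
  vscode.commands.registerCommand('extension.decreaseHeadingLevel', decreaseHeadingLevel),
  vscode.commands.registerCommand('extension.setHeadingLevel', setHeadingLevel)
`;

console.log(unregisteredCommands(manifest, extensionSource)); // all three declared ids
```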

Let's try running again and increasing the header level.

Well, that's not supposed to happen 😟

Issue 6 - The Decreasing Increase

Instead of increasing the heading level, the command decreases it.

Let's take a look at increaseHeadingLevel.ts

import * as vscode from 'vscode';

export async function increaseHeadingLevel() {
    const editor = vscode.window.activeTextEditor;
    if (!editor) {
        return;
    }

    const document = editor.document;
    const selection = editor.selection;
    const selectedText = document.getText(selection);

    const headingRegex = /^(#+)\s(.*)$/;
    const match = selectedText.match(headingRegex);

    if (!match) {
        vscode.window.showErrorMessage('Invalid selection. Please select a valid heading.');
        return;
    }

    const currentLevel = match[1].length;
    const newLevel = Math.max(1, currentLevel - 1);
    const newText = '#'.repeat(newLevel) + ' ' + match[2];

    await editor.edit((editBuilder) => {
        editBuilder.replace(selection, newText);
    });
}

Do you see the issue?

The bug comes down to a single character: a - where there should be a +.

@@ -19,7 +19,7 @@ export async function increaseHeadingLevel() {
     }
 
     const currentLevel = match[1].length;
-    const newLevel = Math.max(1, currentLevel - 1);
+    const newLevel = Math.max(1, currentLevel + 1);
     const newText = '#'.repeat(newLevel) + ' ' + match[2];

Let's compile and run it again.

It works 🥳
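One thing the fix leaves open: Math.max(1, currentLevel + 1) has no upper bound, and Markdown headings only go up to six # characters. A shared helper could handle both directions and clamp at both ends. The sketch below is my own suggestion rather than anything GPT generated. shiftHeadingLevel is a hypothetical name, and it keeps the post's convention that "increasing" the level means adding a #.

```typescript
const MIN_LEVEL = 1;
const MAX_LEVEL = 6; // Markdown ATX headings stop at six '#' characters

// Move a heading line up or down by `delta` levels, clamped to the valid range.
// Returns null when the text is not a single heading line.
function shiftHeadingLevel(text: string, delta: number): string | null {
  const match = text.match(/^(#+)\s(.*)$/);
  if (!match) {
    return null;
  }
  const newLevel = Math.min(MAX_LEVEL, Math.max(MIN_LEVEL, match[1].length + delta));
  return "#".repeat(newLevel) + " " + match[2];
}

console.log(shiftHeadingLevel("## Setup", 1));     // ### Setup
console.log(shiftHeadingLevel("# Setup", -1));     // # Setup (already at the minimum)
console.log(shiftHeadingLevel("###### Setup", 1)); // ###### Setup (already at the maximum)
console.log(shiftHeadingLevel("Setup", 1));        // null
```

Because the function is pure, it can be unit tested without launching VSCode, and all three commands could call it instead of repeating the regex logic. Note that it clamps silently, whereas the generated decrease command shows an error message at level 1.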

Thoughts

So how did we do? We got a working extension. We got it to accomplish the goal set out in our prompt.

The journey to this point was not "automatic". We ran into many issues along the way. Without prior knowledge of typescript, node.js and vscode, these issues would have taken a while to debug.

And even though we were able to generate working code, there are still many improvements to be made:

- there are no instructions on how to develop, use, or publish the extension
- there is no .gitignore for typescript/javascript/vscode artifacts
- there is no launch.json file that configures running the extension in development
- there are no tests
- there is no code reuse

Some Stats

GPT generated 9 files that cover ~100 lines of typescript, ~180 lines of json, and 17 lines of markdown.

$ cloc --exclude-dir=node_modules,out --not-match-f=package-lock.json --not-match-f=prompt.md --include-ext=ts,json,md .
      15 text files.
      13 unique files.
       7 files ignored.

github.com/AlDanial/cloc v 1.92  T=0.01 s (986.5 files/s, 36610.4 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
JSON                             4              8              0            181
TypeScript                       4             22              0             98
Markdown                         1              8              0             17
-------------------------------------------------------------------------------
SUM:                             9             38              0            296
-------------------------------------------------------------------------------

The final file tree

$ tree --gitfile extension/.gitignore
.
├── extension
│   ├── package.json
│   ├── src
│   │   ├── commands
│   │   │   ├── decreaseHeadingLevel.ts
│   │   │   ├── increaseHeadingLevel.ts
│   │   │   └── setHeadingLevel.ts
│   │   └── extension.ts 
│   ├── tsconfig.json
│   └── yarn.lock
├── prompt.md
└── shared_dependencies.md

Out of the ~300 lines generated, we had to modify/add ~18 lines in order to make everything work.

Takeaways

GPT was able to generate most of the code using a naive prompt with no domain-specific context.

Some things to note:

- GPT4 does great with code in its training data but will likely generate bad logic if the underlying specs have changed since its knowledge cutoff date (September 2021)
- GPT4 can hallucinate subtle bugs. In the increaseHeadingLevel.ts case, it was a one-character difference that caused the extension to do the exact opposite of what the command was supposed to do
- GPT4 is great at scaffolding boilerplate but domain expertise still matters (for now). This is especially true when building on tech that has changed since GPT4's cutoff date
- GPT4 introduces yet another abstraction layer for programming. We now have 7 translation layers for the case of writing typescript (which can easily be doubled when involving containers or VMs) 🤦‍♂
Turtles…

Future Directions

I did the initial experiment with a naive general prompt and no additional context. There is lots of room for improvement. Some next steps:

- include every issue that we encountered when trying to run the extension as a detail in the prompt for GPT to watch out for
  - generalize this by indexing and summarizing the vscode extension docs to cover new information that is not in GPT's training data
    - explore doing this by chaining an LLM that has access to today's context (e.g. Bard) with GPT
- generate tests to validate logic and have GPT autocorrect itself if tests fail
- generate a checklist for what a high-quality vscode extension looks like and have GPT verify and autocorrect the artifacts it generates

NOTE: I've already run a subset of these steps and was able to get the error count to zero on the first generation. I still need to see whether it generalizes to other examples. Look out for details in a future post.

A Note to the reader

I'm a solo YC founder who is currently pivoting into building useful tools in the LLM space. If this is something that you're passionate about and want to be a part of, either as a cofounder (I'm looking for someone with a technical background in sales/product/go-to-market) or as a founding team member, please reach out.

6 Comments

chris

May 26, 2023

hey! our startup has also pivoted into this space recently. shoot me an email: chris@lunasec.io or you can join our discord https://discord.gg/znyraHeTBt


May 25, 2023 (Author)

That's fair, though it also highlights another issue: since these foundation models are trained on human-generated content, the training data itself is also littered with these "ambiguous" problem statements that can bias the results. Most of the time, it's a "feature not a bug" as people say things that are technically wrong but semantically understandable to other humans. Foundation models are something of a mixed beast.

Bit by Bit (by Nimbus)