Lessons from Creating a VSCode Extension with GPT-4 (Part II)
The more things change, the more they stay the same
NOTE: This is Part II of a blog series that explores writing code with LLMs. See here for Part I of the series.
This post continues where we left off in Part I. It builds on the previous work to generate a fully functional vscode extension without human intervention.
All the code used in this experiment is open source and published on GitHub. Specifically:
A fork of smol-ai customized for creating VS Code extensions: smol-vscode-developer
VS Code extension: repository, published extension
Recap
Previously, we used GPT-4 to generate code to create a vscode extension from scratch. This involved lots of iteration and manual intervention. Most of the issues came from a lack of coherence (variables generated in one file were named something different in another file) and GPT's knowledge cutoff date (it has only been trained on data available up to September 2021).
To make up for these deficiencies, we start with an updated prompt that accounts for all the issues we encountered in part I.
Prompt from part I:
the app is: a vscode extension that lets the user adjust the heading level of the selected text. it should have three commands: increase heading level, decrease heading level, and set heading level
Important details:
- make sure there is a tsconfig.json file present
- created a shared `utils.ts` file that exports the `adjustHeadingLevel` method to adjust headings
- in `package.json`, use the `1.6.1` dependency for `vscode-test`
- make sure to include `.vscode/launch.json` as one of the mandatory files
- it should have a "extensionHost" launch task
- it should have "--disable-extensions" as one of the "args"
Updated Prompt:
a vscode extension that lets the user adjust the heading level of the selected text. it should have three commands: increase heading level, decrease heading level, and set heading level
created a shared `utils.ts` file that exports the `adjustHeadingLevel` method to adjust headings
when registering commands, use an identifier that would match the following regex: "extension.(increaseHeading|decreaseHeading|setHeading)"
Important details:
- make sure the following files are present:
- tsconfig.json
- .gitignore
- README.md
- src/test/runTest.ts
- make sure the following constraints are observed in ".gitignore"
- ignore javascript files
- make sure the following constraints are observed in "runTest.ts"
- import "mocha" directly (eg. `import Mocha from 'mocha'`)
- import glob as named export (eg. `import {glob} from 'glob'`)
- do not import modules that are not needed
- use node builtin "assert" library for assertions
- DO NOT use parameters that have implicit 'any' type - add proper types or add an explicit 'any' type
- make sure the following shared dependencies are present
- mocha for testing
- make sure that the README has the following sections:
- Overview: what this project is about
- Quickstart: how to use this extension
- Development Guide: how to develop this extension
- make sure the following dependencies are present in `package.json`
- @vscode/test-electron: ^2.3.2
- glob: ^7.1.4
- dependencies required for testing (eg. mocha)
- dependencies required for types (eg. @types/mocha, @types/glob)
- make sure the following dependencies are not present in `package.json`
- "vscode-test"
- make sure the following constraints are observed in `package.json`
- when referring to an extension command, use the format "extension.{command-name}". DO NOT use the extension name as the command prefix
- when adding "activationEvents", do not include "onCommand" directives
- make sure to include `.vscode/launch.json` as one of the mandatory files
- it should have a "extensionHost" launch task
- it should have "--disable-extensions" as one of the "args"
- the "preLaunchTask" task should be "npm: compile"
- make sure that you have tests for the user-specified functionality
- for all typescript code, generate correct imports according to `"esModuleInterop": true` is set in `tsconfig.json`
- when exporting modules, ALWAYS use a named export, NEVER a default export. if the module is the only export in the file, the module name MUST BE IDENTICAL to the filename
- when importing modules that we have written, ALWAYS import the names of modules we are using
- when importing modules, ONLY import modules that will be used
- when calling a function from 'shared_dependencies', ALWAYS use the given type signature
There are a lot of changes - let's go over each section and explain what's happening.
Specifying Files
We start by specifying all the files that GPT might miss. Files like `.gitignore` are not core to creating a vscode extension but are necessary for developers to sanely manage a git repo. We include them directly in the prompt since we don't want developers to have to add these "boilerplate" details every time.
- make sure the following files are present:
- tsconfig.json
- .gitignore
- README.md
- src/test/runTest.ts
Specifying File Contents
We also need to add additional constraints depending on the file.
For example, the `.gitignore` won't always ignore javascript unless explicitly specified in the prompt.
- make sure the following constraints are observed in ".gitignore"
- ignore javascript files
We also encounter issues when generating a README - whether or not it covers setup instructions is not deterministic. We can make it deterministic by being prescriptive about the sections we expect.
- make sure that the README has the following sections:
- Overview: what this project is about
- Quickstart: how to use this extension
- Development Guide: how to develop this extension
Specifying Dependencies
We explicitly list out dependencies that GPT might miss. These involve dependencies introduced after GPT's cutoff date (eg. `@vscode/test-electron`) and development dependencies.
- make sure the following shared dependencies are present
- mocha for testing
- make sure the following dependencies are present in `package.json`
- @vscode/test-electron: ^2.3.2
- glob: ^7.1.4
- dependencies required for testing (eg. mocha)
- dependencies required for types (eg. @types/mocha, @types/glob)
NOTE: Having examples matters. Without the `(eg. ...)` in parentheses, GPT generates NO dev dependencies:
- @vscode/test-electron: ^2.3.2
- - dependencies required for testing
+ - dependencies required for testing (eg. mocha)
- - dependencies required for types
+ - dependencies required for types (eg. @types/mocha)
We also list dependencies that GPT should not include. These are dependencies that have been deprecated since GPT's cutoff date.
- make sure the following dependencies are not present in `package.json`
- "vscode-test"
Specifying Custom Constraints
We explicitly tell GPT about changes that are not in its index. In this case, the fact that VSCode no longer requires `onCommand` to be part of the extension manifest. We also prescribe how extension commands are identified in the app.
- make sure the following constraints are observed in `package.json`
- when referring to an extension command, use the format "extension.{command-name}". DO NOT use the extension name as the command prefix
- when adding "activationEvents", do not include "onCommand" directives
NOTE: Order matters. Using the mental model of "if this then that" helps with specifying constraints. For the diff below, the first directive did not work but the second does.
- - do not include "onCommand" directives when adding "activationEvents"
+ - when adding "activationEvents", do not include "onCommand" directives
We also need to explicitly tell GPT about the `preLaunchTask`, as it has the tendency to generate `npm: watch` when left to its own devices. Because the watch command never exits, launching the app in development mode would otherwise hang.
- make sure to include `.vscode/launch.json` as one of the mandatory files
- it should have a "extensionHost" launch task
- it should have "--disable-extensions" as one of the "args"
- the "preLaunchTask" task should be "npm: compile"
Specifying Code
We explicitly tell GPT about what coding style to use. This is because GPT has no context for the contents of the files that it generates. Javascript has multiple ways of exporting a module (eg. named export vs default export). GPT will more likely than not use one style for exporting and a different one for importing.
- for all typescript code, generate correct imports according to `"esModuleInterop": true` is set in `tsconfig.json`
- when exporting modules, ALWAYS use a named export, NEVER a default export. if the module is the only export in the file, the module name should be identical to the filename
- when importing modules that we have written, ALWAYS import the names of modules we are using
- when importing modules, ONLY import modules that will be used
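As a concrete illustration, the sketch below shows the import/export style these constraints steer GPT towards. It is illustrative only - the actual generated files may differ - and it assumes the shared `adjustHeadingLevel` helper whose signature is covered in the next section.

```typescript
// Illustrative sketch of src/extension.ts (not the verbatim generated output)
import * as vscode from 'vscode';
// import only the names we use, matching the named export in utils.ts
import { adjustHeadingLevel } from './utils';

export function activate(context: vscode.ExtensionContext) {
  context.subscriptions.push(
    vscode.commands.registerCommand('extension.increaseHeading', () => {
      const editor = vscode.window.activeTextEditor;
      if (!editor) {
        return;
      }
      const selection = editor.selection;
      const text = editor.document.getText(selection);
      // replace the selection with the adjusted heading text
      editor.edit((editBuilder) => {
        editBuilder.replace(selection, adjustHeadingLevel(text, 1));
      });
    })
  );
}
```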
Specifying Type Signatures
We provide additional details when specifying shared dependencies. Just generating the function names is not enough as GPT will hallucinate different ways of calling the shared function.
The following is a modification of the original prompt to generate explicit type annotations for any shared functions.
...
Now that we have a list of files, we need to understand what dependencies they share.
Name and briefly describe what is shared between the files we are generating, including exported variables, data schemas, and function signatures.
Exclusively focus on the names of the shared dependencies, and do not add any other explanation.
+ For function signatures, include the function name and the input and output parameters. Add type annotations to the function signatures
This adds the following annotation to shared functions.
Shared dependencies:
- - adjustHeadingLevel (function name in utils.ts)
+ - adjustHeadingLevel function signature: adjustHeadingLevel(text: string, adjustment: number): string
NOTE: If you are generating code that does not have type signatures, you can still tweak the prompt to generate "typescript types for shared dependencies"
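To make the signature concrete, here is a minimal sketch of an implementation that matches it. This is an assumption about how the helper could work, not the exact code GPT generated.

```typescript
// Illustrative sketch of the shared helper in src/utils.ts
export function adjustHeadingLevel(text: string, adjustment: number): string {
  return text
    .split('\n')
    .map((line) => {
      const match = line.match(/^(#{1,6})\s/);
      if (!match) {
        // leave non-heading lines untouched
        return line;
      }
      // clamp the adjusted level to the markdown heading range of 1-6
      const level = Math.min(6, Math.max(1, match[1].length + adjustment));
      return '#'.repeat(level) + line.slice(match[1].length);
    })
    .join('\n');
}
```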
Specifying Test Details
We need to specify many more implementation details for testing. There are more changes here than in other sections for the following reasons:
VSCode made a major change in how tests should be written (after GPT's knowledge cutoff)
the libraries used in testing (eg. `glob` and `mocha`) have changed major versions and function signatures since GPT's knowledge cutoff
GPT does not generate dependencies for tests unless explicitly prompted
The following prompt tries to address most of these issues. We hardcode example values and get very prescriptive about what to do and what not to do.
- make sure the following constraints are observed in "runTest.ts"
- import "mocha" directly (eg. `import Mocha from 'mocha'`)
- import glob as named export (eg. `import {glob} from 'glob'`)
- do not import modules that are not needed
- use node builtin "assert" library for assertions
- DO NOT use parameters that have implicit 'any' type - add proper types or add an explicit 'any' type
- make sure that you have tests for the user-specified functionality
NOTE: Despite the hints, the tests unfortunately still do not compile most of the time. I timeboxed how long I could spend getting this to work, so getting GPT to correctly generate tests is left as an exercise for the reader.
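For reference, a hand-written test that satisfies these constraints might look roughly like the sketch below. It assumes the `adjustHeadingLevel` behavior sketched earlier and the `suite`/`test` globals that mocha's "tdd" UI provides in the standard VS Code extension test harness.

```typescript
// Illustrative sketch of src/test/suite/utils.test.ts (not the generated output)
import * as assert from 'assert';
import { adjustHeadingLevel } from '../../utils';

suite('adjustHeadingLevel', () => {
  test('increases the heading level by one', () => {
    assert.strictEqual(adjustHeadingLevel('## Title', 1), '### Title');
  });

  test('does not go below heading level 1', () => {
    assert.strictEqual(adjustHeadingLevel('# Title', -1), '# Title');
  });
});
```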
Results
Running the prompt at the top of this write-up should result in a working vscode extension (most of the time). You can pull down the code from here and follow the README to try it out yourself.
NOTE: The generated tests might need manual tweaking - the extension itself should function.
You can find the output of our generation here. This is also published in the vscode marketplace and you can download the extension here to try it out yourself.
To check if this would work for a different task, I also generated an extension to format markdown. The prompt is as follows:
a vscode extension that lets the user bold, italicise and strikethrough the selected markdown text. it should have three commands: "bold selection", "italicise selection", and "strikethrough selection"
when registering commands, use an identifier that would match the following regex: "extension.(boldSelection|italiciseSelection|strikethroughSelection)"
...
The `...` contains the same prompt details that we covered in this write-up. This worked on the first try (with the caveat that the tests do not work and require tweaking).
The code for this extension is here.
Lessons Learned
little changes in the prompt lead to big changes in the outcome
Order matters. GPT is a next-word predictor. It reads context from left to right. When you want GPT to do something, tell it when it should act before what it should do (eg. "when X happens, do Y" works much better than "do Y when X happens")
Examples help. Instead of saying `include test dependencies`, bias with an example: `include test dependencies (eg. mocha)`
watch out for non-deterministic outputs
Be prescriptive when there is ambiguity. Treat GPT as a smart intern. Tell it exactly what to do, especially when there are multiple ways of doing something (eg. how to import/export modules in javascript)
Use types. This builds on being prescriptive. When you want GPT to reuse a function, make sure the context includes the function signature
do not trust and always verify
Verify the output, ideally using automated tests. Even when you're explicit about everything, there is still a chance that GPT makes things up. In our case, even when the type signature of `adjustHeadingLevel` was set, GPT still generated an invalid usage of the function when creating a unit test.
Parting Thoughts
The approach we took to generate code is what I call the "do-then-check" strategy. We tell GPT to do a thing, GPT does the thing, we check if the thing is right, and then we add anything that wasn't right as additional instructions to the prompt so that GPT does the thing right the next time.
Getting a functional solution for even a basic app using this strategy requires a huge amount of context. This is to overcome the following problems:
dealing with information that is not in the LLM index (past its knowledge cutoff date)
dealing with "coherence" - making sure that code generated across different runs did not conflict
Both issues reduce to the same underlying question: how do you find the right information to add to a limited context window?
Coincidentally, this is a problem space that I've been working in for the past decade. My initial startup was built around the mission of helping humans organize and make sense of information. Given that the information we have available far exceeds our own context window, how do we find the right information when needed?
I strongly believe that these problems are related. It is likely that the same means by which humans can deal with information overload also apply to LLMs.
Much more on this topic in the near future 😇