For the sake of accountability, I am trying to write regular check-ins on my work in progress. See all check-ins here.
A pilot with cybernetic arms, flight captain hat, standing at a drafting table drawing a blueprint, Stable Diffusion 1.5
Time's flown by, with much to share. Today I'll dive into a project I've been working on for the past two months, Draftpilot.
Ever since playing with ChatGPT, I've been fascinated by the potential of LLMs to assist with work. In my case, a large part of the work is writing code, and LLMs seem pretty good at that, so I've been working on building an AI junior engineer that can help me with my work.
If you've ever asked ChatGPT to write code, you'll know that it's an amazing experience, and yet it could be so much better.
- the first thing is that it gets a fair amount wrong. Sometimes even very big things - for example, I asked it to write a Python file parser and it gave me some logic for matching
- it also has no clue about your project - what language you use, existing libraries, patterns, and functions
- it has no memory - every new request is starting from scratch
- and obviously it can't actually modify files in your codebase, compile and run tests, and so on
As a result, as a pretty good engineer, I only really go to ChatGPT when I need help in an unfamiliar domain - and there, I find it difficult to trust the answers I get. (Recent examples: building a slide-out panel in React with flexbox, questions about GitHub's API, fixing an infinite loop in a function.) For most smaller tasks, I'll just do it myself rather than ask ChatGPT.
Draftpilot is a web app where you can have a conversation that results in code changes and pull requests. Here's a request I made today:
Here's what it made:
Of course, this wasn't the finished product, and I still needed to iterate myself to figure out what I wanted. But Draftpilot gave me a great first draft (while I did other stuff) that helped move the project forward.
Chat with your code
Draftpilot also has a code-aware chat-mode for larger projects where you need advice more than code:
This is similar to GitHub's Copilot Chat feature, which is in private beta, but in my experience:
- This uses GPT-4 and consistently provides more thoughtful answers
- It can read your entire codebase, not just the few files you've worked on recently
- If you like a suggestion, you can say "do it" to switch to editing mode
- It remembers things - when you give the bot feedback, it can recall what it learned in future requests.
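The last point can be pictured as a small store of feedback notes that gets searched before each new request. This is only a sketch of the idea, not how Draftpilot actually implements it: the `Learnings` class, its method names, and the word-overlap scoring are all hypothetical, and a real system would more likely use embeddings than keyword matching.

```python
import re
from typing import List, Set


def _words(text: str) -> Set[str]:
    """Lowercase a string and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class Learnings:
    """Hypothetical feedback memory: stores notes, recalls the relevant ones."""

    def __init__(self) -> None:
        self._notes: List[str] = []

    def add_feedback(self, note: str) -> None:
        # Called whenever the user corrects or instructs the bot.
        self._notes.append(note)

    def recall(self, request: str, limit: int = 3) -> List[str]:
        # Score each note by how many words it shares with the request,
        # drop notes with no overlap, and return the best matches first.
        request_words = _words(request)
        scored = [(len(request_words & _words(note)), note) for note in self._notes]
        relevant = [pair for pair in scored if pair[0] > 0]
        relevant.sort(key=lambda pair: pair[0], reverse=True)
        return [note for _, note in relevant[:limit]]

    def build_prompt(self, request: str) -> str:
        # Prepend recalled notes so the model sees them with the new request.
        notes = self.recall(request)
        if not notes:
            return request
        bullet_list = "\n".join(f"- {note}" for note in notes)
        return f"Things learned from earlier feedback:\n{bullet_list}\n\nRequest: {request}"
```

In use, you would call `add_feedback` after each correction and `build_prompt` before sending the next request to the model. Plain word overlap is noisy (common words like "the" count as matches), which is one reason this stays a toy.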
Of course, it's still early days for all of these efforts, so I'm not going to say it's always going to be better than GitHub's. There's probably room for multiple interfaces - one in your IDE, one in your browser, one in your terminal, one in Slack, one on your phone.
How it works
There are two core ideas in tension for Draftpilot:
AI-assisted coding needs to be conversational. Software engineering needs to be precise, yet natural language is full of ambiguity, so an interactive dialogue is crucial to getting the best result. (I wrote an article on chatting with AI that explores this in more detail)
And yet, as much should be automated as possible. AI "thinking" is cheap compared to humans, so it's worth spending extra AI cycles to improve the quality of output that gets to the human.
When you create a request, Draftpilot checks out your repository and spins up a few agents. The first plans a course of action, with the ability to do research by searching the codebase and the internet. Next, an editing agent applies the plan to the code - this is one of the hardest parts of the entire endeavor, and still not perfect, as LLMs don't like following exact directions. Finally, a validating agent checks that the code compiles and does what it says.
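To make the three stages concrete, here is a minimal Python sketch of a plan, edit, validate loop. It is not Draftpilot's actual code: the names (`Plan`, `plan_request`, `apply_plan`, `validate`, `handle_request`), the dictionary model of a codebase, and the retry limit are all assumptions made for illustration, and the "agents" are plain functions standing in for LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Assumption: a codebase is modeled as {file path: file contents}.
Codebase = Dict[str, str]


@dataclass
class Plan:
    """What the (hypothetical) planning agent hands to the editing agent."""
    request: str
    files_to_edit: List[str] = field(default_factory=list)


def plan_request(request: str, codebase: Codebase) -> Plan:
    # Stand-in for the planning agent. Its "research" is a naive keyword
    # search over the codebase rather than an LLM with search tools.
    words = [w.lower() for w in request.split() if len(w) > 3]
    hits = [path for path, text in codebase.items()
            if any(w in text.lower() for w in words)]
    return Plan(request=request, files_to_edit=sorted(hits))


def apply_plan(plan: Plan, codebase: Codebase,
               edit: Callable[[str, str], str]) -> Codebase:
    # Stand-in for the editing agent. `edit` takes (request, file contents)
    # and returns new contents; in practice it would be an LLM call.
    updated = dict(codebase)
    for path in plan.files_to_edit:
        updated[path] = edit(plan.request, codebase[path])
    return updated


def validate(codebase: Codebase) -> List[str]:
    # Stand-in for the validating agent. It only checks that Python files
    # parse; it does not check that the change does what was asked.
    errors = []
    for path, text in codebase.items():
        if path.endswith(".py"):
            try:
                compile(text, path, "exec")
            except SyntaxError as exc:
                errors.append(f"{path}: {exc.msg}")
    return errors


def handle_request(request: str, codebase: Codebase,
                   edit: Callable[[str, str], str],
                   max_attempts: int = 2) -> Tuple[Codebase, List[str]]:
    # Plan once, then retry the edit step if validation fails. Retrying
    # only helps when `edit` is non-deterministic, as an LLM call would be.
    plan = plan_request(request, codebase)
    errors: List[str] = []
    for _ in range(max_attempts):
        candidate = apply_plan(plan, codebase, edit)
        errors = validate(candidate)
        if not errors:
            return candidate, []
    # Every attempt failed validation: keep the original code, report why.
    return codebase, errors
```

The design choice worth noting is that a failed validation never reaches the human as a code change: the extra AI attempts are spent before anything is handed back.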
All of this is done by a worker process which currently runs in the cloud but soon (as you can read above) will be able to run on your own file system, letting you partner with AI to modify your working directory directly.
Draftpilot will attempt any request you throw at it, but due to how it works, there are a few limitations to keep in mind:
- for cost reasons, there is a limit on how many files it will edit, hence it's not good for whole-codebase refactors. I don't really want to teach it how to
- it's not good at vague and open-ended requests - but then again, neither is a junior engineer. I've chosen to focus on supporting detailed prompts for now. Perhaps in the future a "product manager agent" can be the first line of conversation, helping users shape their requests.
- it struggles with modifying complex HTML / JSX. For example, "put a button under this other button". I think this is for two reasons - navigating a tree of nested divs is challenging (imagine doing it yourself without looking at the browser), and it's hard to match a user's verbal description to the code.
The ideal format for this type of edit might be a Chrome extension where you can pick the exact location in the DOM.
Finally, I'm excited to announce that Draftpilot is open source! You can find the main repository here: draftpilot/draftpilot.
Once the web UI / backend is working well in hosted form, I plan to open-source that as well so companies can have an entirely self-hosted version of Draftpilot.
One of the major complaints about AI tools is the lack of transparency into where your data goes and who has access to it. Self-hosting means that only you and OpenAI will have access to your code.
(While I can't make any claims for OpenAI, I feel safe with the fact that (1) they have such high volumes of requests that it'll be difficult for them to isolate and steal a single user's code, and (2) as the largest AI provider, they'll face all the scrutiny of regulators and the public.)
Sign me up!
I'm onboarding users to Draftpilot now. If you're interested, reach out at firstname.lastname@example.org and I'll get you set up.
All hail our robot overlords.
Daft-punk style robot assistant, jet pilot helmet, writing software, hacker house, glowing computer screen, Stable Diffusion 1.5