Sharpen your AI tools for dull work

5 September 2024

Give me six hours to chop down a tree, and I will spend the first four sharpening the axe.
Apocryphal Abraham Lincoln

This modern proverb is often discussed in programming context as learning to use the most effectively the most foundational tools that might provide a speedup that compounds. These can be mastering:

typing speed
shell tools
an efficient text editor (vim/emacs)
new tools that outperform older ones
a good work process, to maximize productivity and minimize interruptions (getting things done, etc.)

Any of these represent a significant part of a working day, so efficiency should result in a greater productivity during the day, every day, resulting in greater gains in the long run.

A tree asking to be chopped down

Where the analogy completely fails is that sharpening an axe has predictably good results (or, not sharpening an axe has predictably terrible results). It is hard to evaluate how much more emacs might really provide over VS Code (or other “modern” IDEs), especially given how much yak-shaving there is in the community. For new tools the situation is worse, as the issues with the new technology might not be documented at all, requiring to painfully learn to work around these. It is hard to estimate how much time this will take.

AI throws a wrench in that axe production line

For example, typing speed matters much less because AI autocomplete is pretty good, which also reduces the gain from text editing speed. Difficult shell commands can quickly be AI generated, requiring minimal knowledge to understand them. New technologies can quickly be explored by letting the AI write the code. Why invest time in a learning a skill that might be obsolete in a few ~~years~~ months?

AI might be the best sharpening tool there is

Except that the results might be unreliable.

In the case of a shell script, a simple rm -rf $VAR/ could wipe the whole system if the variable is not set. Regular expressions can be very tricky, so introducing AI-generated ones in the code might result in very tricky bugs down the line. Would you add ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$ to your codebase?

For productivity; who would let the AI classify and respond to emails, unsupervised? It might miss implicit context or misunderstand the subtle nuances of the language.

The most evident problem is that it might generate answers that look very convincing, but ultimately don’t work. For example, in one case ChatGPT suggested a Python library to handle dynamic arguments. After checking the library, that did not seem to be the case. When asked about this, it suggested a workaround by overriding some internal classes. Yet again, an implementation limitation made the code crash. While looking for ways to work around this limitation in the debugger, ChatGPT stopped being helpful and repeated the same hacks with absolute confidence. I got back to the most foundational tool of all, the RTFM¹: it explained that dynamic arguments were not supported by design.

I wrote a pipeline to help proofread articles, but the results are underwhelming. My favorite proofreader rated the help as very minimal.

While working on a game feature using an LLM², I wrote a prompt that worked very well. But then a very small variation in that prompt failed 19 times over 20, without any “logical” reason for it to do so. After small tweaks, the variation also became accurate.

Drawing is another case where it looks like it can really help. But obtaining quality results requires a lot of time to fine-tune a prompt, explore generations, and clean up. Meanwhile, the process is still exploratory rather than directed, so drawing is still more efficient to get a determined (imaginative) result. Recently Flux has shown stunning result, so the situation might improve soon enough. Although that depends on other tools such as ControlNet to follow suit.

Keyword is “might”

New LLMs are released at a great pace, and new ground-breaking techniques are introduced every month. The frontier of what is possible keeps getting expanded, yet AI output is still in that awkward phase that might be oddly incredible in some regards yet completely unacceptable in others. In any case, any result should be rigorously scrutinized.

It is fun, can do cool things that challenge the way we think about different domains, but AI is not an “axe sharpening” tool yet. For me, without a doubt, AI has been the biggest time-sink of all.

Read The Flamboyant Manual ↩
Large Language Model, or ChatGPT-like models ↩