Faking human music with computers.
I got really interested in the pop2piano program when I came upon it. I love piano songs, and what I found most interesting was that it generates a MIDI file! That means you can edit the result like any score (using MuseScore, for example) and fix the errors. As with AI drawing, the output can then serve as a base for a human to work on. It opens a path to assistive AI, rather than purely creative AI, in the musical realm.
However, the program was offered as an online Google Colab notebook rather than something to run locally. So I first spent an evening adapting the code in a fork so that it runs as a standard CLI tool that you can install with pipx and use in scripts. Once I was done, I ran it on one of our band’s songs, Heroic. The results were not too bad:
Elated, I thought about making a whole cover album with it, so I ran it on other Cosmoose songs, and the results were absolutely horrible. The training set is ‘pop’, and clearly Cosmoose is ‘pop’, but there are different ways to interpret this term. In terms of structure, you expect something fairly simple with lots of repetition. In terms of production and arrangement, you expect some common instruments and layers. And, maybe most notably, you expect some common chords (the infamous I–V–vi–IV progression). I think Cosmoose is non-standard enough on these last two points that it entirely trips up the algorithm’s expectation of what a song should be.
That brings us to the usual problem with AI. Cosmoose has released more than 50 songs (including remixes), and an album would need at least 10 ‘good’ covers. For each song, the result is nondeterministic, and there are 23 different variants to choose from. 50 × 23 is already 1,150 candidates, and that is before rerunning anything. In simple words, you have to sift through a mass of garbage to find the good stuff.
And let me tell you, listening to the worst examples gets tiring very fast. The worst tendency it has is banging the same note over and over again, so that’s what you’ll hear mostly with the bad outputs.
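One way to cut down the listening would be to flag the note-banging outputs automatically before they ever reach your ears. This is only a minimal sketch, not something I actually ran: it assumes the MIDI pitches have already been extracted into a plain list of integers (with whatever MIDI library you prefer), and the function names and thresholds are made up for illustration and would need tuning.

```python
from collections import Counter
from typing import Sequence


def longest_run(pitches: Sequence[int]) -> int:
    """Length of the longest streak of identical consecutive pitches."""
    best = current = 0
    previous = None
    for pitch in pitches:
        # Extend the streak if the pitch repeats, otherwise start a new one.
        current = current + 1 if pitch == previous else 1
        best = max(best, current)
        previous = pitch
    return best


def dominant_pitch_ratio(pitches: Sequence[int]) -> float:
    """Share of all notes taken by the single most frequent pitch."""
    if not pitches:
        return 0.0
    most_common_count = Counter(pitches).most_common(1)[0][1]
    return most_common_count / len(pitches)


def looks_like_note_banging(
    pitches: Sequence[int],
    max_run: int = 16,       # hypothetical threshold, tune by ear
    max_ratio: float = 0.5,  # hypothetical threshold, tune by ear
) -> bool:
    """Flag outputs that hammer one note, either in a row or overall."""
    return (
        longest_run(pitches) > max_run
        or dominant_pitch_ratio(pitches) > max_ratio
    )
```

A filter like this would not find the good covers, but it could at least throw away the most obvious garbage so that only the plausible candidates are left to listen to.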
Even if you manage this, there will still be a lot of mistakes requiring fixing in the outputs. I’ve not entirely killed the idea, but I don’t see it as reasonable yet.
So I wondered what other songs would work well. The provided examples lean on mainstream K-pop and Avril Lavigne, which don’t interest me much. So I picked a few songs spanning a wide spectrum of genres to test pop2piano, with only one chance per song. A few genres obviously didn’t work (anything too close to breakbeat or glitch), and I could not get any decent result on metal either. Well, that’s not entirely true: a song by Daisuke Ishiwatari and one by Rhapsody worked pretty well, but these are more conventional and melodic forms of metal. I tried Rolo Tomassi, Gost, Wolfmother, Static-X and a bunch of others, but hardly anything was listenable. Because I like Mick Gordon a lot, I tried many of his songs, and the result was consistently terrible.
Still, some songs went surprisingly well. I thus wondered: is there a way to share the ‘good outputs’ on YouTube without too much effort? That would require automating the cover image, the video, and the upload. It turns out that most of these steps can be done with simple command lines, leveraging Stable Diffusion, ImageMagick and FFmpeg. So I just needed some bash and Python glue to chain everything together. The whole pipeline is in this gist.
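To give an idea of how little glue the video step needs, here is a rough sketch of turning a cover image and an audio file into a video with FFmpeg. It is an illustration, not a copy of the gist: the function names and file names are hypothetical, and it assumes the piano MIDI has already been rendered to an audio file by some other tool.

```python
import subprocess
from pathlib import Path
from typing import List


def still_image_video_cmd(cover: Path, audio: Path, output: Path) -> List[str]:
    """Build an FFmpeg command that shows one image for the length of the audio."""
    return [
        "ffmpeg",
        "-y",                   # overwrite the output file if it exists
        "-loop", "1",           # repeat the single cover image as video frames
        "-i", str(cover),
        "-i", str(audio),
        "-c:v", "libx264",
        "-tune", "stillimage",  # x264 tuning for static content
        "-c:a", "aac",
        "-b:a", "192k",
        "-pix_fmt", "yuv420p",  # pixel format most players can decode
        "-shortest",            # stop when the audio ends, not the looped image
        str(output),
    ]


def render_video(cover: Path, audio: Path, output: Path) -> None:
    """Run FFmpeg and raise if it exits with an error."""
    subprocess.run(still_image_video_cmd(cover, audio, output), check=True)
```

Calling `render_video(Path("cover.png"), Path("song.wav"), Path("song.mp4"))` would then produce the file to upload. Keeping the command builder separate from the call makes it easy to print and check the command without running FFmpeg. One thing to watch: with `yuv420p`, libx264 needs the image width and height to be even numbers, so the cover may have to be resized first (ImageMagick can do that).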
So, how good were the results? One indication could be how many original songs are detected by the YouTube algorithm. These videos get marked as using copyrighted content. At this point, we have:
Out of 20 songs, that’s 1 in 4 matched to the original. Interestingly, FLiP is matched to the anime OP that uses the song rather than to the song itself (which has under 1K views on YouTube). To me, WEG is the most surprising, both because the algorithm managed to match the original given how non-standard it is, and because it is the only real indie result.
For the illustrations, note that these are all ‘first’ results except in two cases where the output was not really SFW – “scarred and battled” was too gory, “angel of doom” too damn lewd. I find a lot of these bad AI thumbnails to be frankly hilarious. Just look at these freakish Kirby abominations!
It was a fun project, but pop2piano isn’t quite there yet. Then again, no AI system is, for now at least. For instance, an image generated by Midjourney can look nice, but seeing a whole gallery of such images tends to have a very jarring effect because of the particular nature of AI errors. This very inhuman effect feels like looking down an endless well of SEO spam. After a while, you wonder what the point of art even is, not in a thought-provoking, questioning and productive way, but in a very nihilistic one, where you feel helpless under the impending deluge of self-generated meaningless content. Or maybe that’s just me.
At least, now there’s an automated pipeline for producing content. With an automated pipeline for consuming content, we can close the loop and all go back to doing our stuff.