Music Enhancement using AI Models

Hi everyone! Today I’d like to share some experiments I’ve been working on over the past two months. My goal was to explore how far AI can be pushed in restoring an audience tape recording of a live concert.

The source I chose was a Dire Straits show, recorded from the audience on June 22, 1983. You can check out the full concert recording here: YouTube link.

I picked this concert for two reasons: first, it’s one of the few complete audience recordings from that tour, and second, Dire Straits’ sound lends itself well to restoration. The clean guitar tones, keyboards, piano, drums, bass, and occasional saxophone give a wide but manageable palette. Distorted guitars, by contrast, are much harder to work with due to their dense harmonics.

One limitation is that I only had access to compressed audio from YouTube, so some detail was already lost to encoding. That made the challenge even more interesting.


The Process

When working with AI, there’s no single “correct” path. Here’s the workflow I followed:

Step 1: Train a custom model
I built models from scratch using standard deep learning architectures (U-Net, self-attention). Training datasets were relatively small, prepared in Cubase AI 7 by EQ’ing well-recorded material to resemble the audience tape. Augmentations like random EQ and speed changes helped increase variation. The goal was to teach the model to reconstruct a cleaner signal from noisy input.
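To give a feel for the degradation side of that dataset prep, here is a rough sketch of the two augmentations mentioned (random EQ and speed changes). This is an illustrative stand-in written in NumPy, not the actual training pipeline; the control frequencies and gain ranges are my own placeholder choices:

```python
import numpy as np

def random_eq(audio: np.ndarray, sr: int, rng: np.random.Generator) -> np.ndarray:
    """Apply a random smooth gain curve across the spectrum (a crude EQ)."""
    spec = np.fft.rfft(audio)
    freqs = np.fft.rfftfreq(len(audio), d=1.0 / sr)
    # Random gains (in dB) at a few control frequencies, interpolated over the spectrum.
    points = np.array([0.0, 200.0, 1000.0, 4000.0, sr / 2])
    gains_db = rng.uniform(-6.0, 6.0, size=len(points))
    curve_db = np.interp(freqs, points, gains_db)
    return np.fft.irfft(spec * 10 ** (curve_db / 20.0), n=len(audio))

def random_speed(audio: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Resample by a random factor near 1.0 (naive linear interpolation)."""
    factor = rng.uniform(0.95, 1.05)
    n_out = int(len(audio) / factor)
    x_new = np.linspace(0, len(audio) - 1, n_out)
    return np.interp(x_new, np.arange(len(audio)), audio)

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(44100) / 44100)  # 1 s test tone
degraded = random_eq(random_speed(clean, rng), 44100, rng)
```

In the real pipeline, pairs of (degraded, clean) audio like this become the model's input and target.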

Step 2: Apply the model
Run the trained model on the audience recording to enhance the base signal.
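Since a full concert is far too long to process in one pass, inference like this is typically done in overlapping chunks with a crossfade. Below is a generic sketch of that chunking scheme (the `model` here is a stand-in identity function, not my trained network):

```python
import numpy as np

def enhance(audio, model, chunk=4096, hop=2048):
    """Run `model` (any chunk-in, chunk-out function) over a long recording
    using 50% overlap and a Hann crossfade to avoid seams at chunk edges."""
    window = np.hanning(chunk)
    out = np.zeros(len(audio))
    norm = np.zeros(len(audio))
    for start in range(0, len(audio) - chunk + 1, hop):
        seg = audio[start:start + chunk]
        out[start:start + chunk] += model(seg) * window
        norm[start:start + chunk] += window
    return out / np.maximum(norm, 1e-8)  # normalize by the summed windows

audio = np.random.default_rng(1).standard_normal(44100)
identity = lambda x: x  # stand-in for the trained model
restored = enhance(audio, identity)
```

With the identity "model", the interior of the output reconstructs the input exactly, which is a handy sanity check before plugging in a real network.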

Step 3: Source separation with Demucs v4
I used Demucs v4 (Meta's music source separation model) to split the enhanced audio into four stems: vocals, drums, bass, and other (everything else).
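For reference, Demucs is available as a pip package with a simple CLI; a typical invocation of the v4 hybrid transformer model looks like this (the filename is a placeholder):

```shell
pip install demucs

# Split the enhanced recording with the v4 "htdemucs" model
demucs -n htdemucs enhanced_concert.wav

# Stems land in separated/htdemucs/enhanced_concert/
#   vocals.wav  drums.wav  bass.wav  other.wav
```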

Step 4: Drum separation
I trained another model to break down the drum stem into individual elements (kick, snare, toms, overheads).
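My drum model is learned, but the basic idea of decomposing a drum stem can be illustrated with a crude, non-learned stand-in: a fixed frequency-band split (kick lives mostly in the lows, snare/toms in the mids, cymbals and overheads up top). The band edges below are rough placeholder values:

```python
import numpy as np

def split_drums(stem: np.ndarray, sr: int) -> dict:
    """Crude frequency-band decomposition of a drum stem.
    A trained model learns far better, source-aware masks; this
    fixed split just illustrates the decomposition idea."""
    spec = np.fft.rfft(stem)
    freqs = np.fft.rfftfreq(len(stem), d=1.0 / sr)
    bands = {
        "kick": freqs < 150,
        "snare_toms": (freqs >= 150) & (freqs < 2000),
        "overheads": freqs >= 2000,
    }
    return {name: np.fft.irfft(spec * mask, n=len(stem))
            for name, mask in bands.items()}

sr = 44100
stem = np.random.default_rng(2).standard_normal(sr)
parts = split_drums(stem, sr)
recon = sum(parts.values())  # the bands partition the spectrum
```

Because the masks partition the spectrum, the parts sum back to the original stem, so nothing is lost in the split.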

Step 5: Vocal improvement
A dereverberation model was applied to vocals to reduce reverb and add body. In places where the “cleaned” version introduced artifacts, I kept the original vocal track instead.
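The fallback logic in that step ("keep the original where the cleaned version misbehaves") can be sketched as a per-segment blend. In practice I judged artifacts by ear; the energy-ratio heuristic below is purely hypothetical, just to show the mechanism:

```python
import numpy as np

def blend_safe(original, cleaned, sr, seg_s=0.5, thresh=2.0):
    """Per segment, fall back to the original vocal wherever the
    cleaned version deviates too much in RMS energy (a simple,
    hypothetical artifact heuristic; real detection was done by ear)."""
    seg = int(seg_s * sr)
    out = cleaned.copy()
    for start in range(0, len(original), seg):
        o = original[start:start + seg]
        c = cleaned[start:start + seg]
        e_o = np.sqrt(np.mean(o ** 2)) + 1e-12
        e_c = np.sqrt(np.mean(c ** 2)) + 1e-12
        if max(e_c / e_o, e_o / e_c) > thresh:
            out[start:start + seg] = o  # revert this segment
    return out

sr = 1000
orig = np.full(4000, 0.1)
clean = orig.copy()
clean[1000:1500] *= 10  # simulate a blown-up artifact region
result = blend_safe(orig, clean, sr)
```

In a real mix you would also crossfade at segment boundaries rather than hard-switching, to avoid clicks.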

Step 6: Add high-frequency detail with AudioSR
AudioSR is an audio super-resolution model that can add high-end content. I used it cautiously on the instrumental tracks (everything except bass, drums, vocals). Occasionally, it confused piano notes with plucked strings, but the artifacts were minor.
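To make clear what "adding high-end content" means, here is a deliberately naive, non-AI illustration: mirroring the octave below a cutoff into the band above it, attenuated. AudioSR instead *generates* plausible high-frequency content with a learned model, which is why it can also make mistakes like the piano/plucked-string confusion; the cutoff and gain below are placeholder values:

```python
import numpy as np

def naive_bandwidth_extend(audio, sr, cutoff=8000, gain_db=-12.0):
    """Illustrative (non-AI) bandwidth extension: copy the octave below
    `cutoff` into the band above it, attenuated by `gain_db`."""
    spec = np.fft.rfft(audio)
    freqs = np.fft.rfftfreq(len(audio), d=1.0 / sr)
    lo = (freqs >= cutoff / 2) & (freqs < cutoff)
    hi = (freqs >= cutoff) & (freqs < 2 * cutoff)
    n = min(lo.sum(), hi.sum())
    gain = 10 ** (gain_db / 20.0)
    spec[np.flatnonzero(hi)[:n]] += spec[np.flatnonzero(lo)[:n]] * gain
    return np.fft.irfft(spec, n=len(audio))

sr = 44100
t = np.arange(sr) / sr
sig = np.sin(2 * np.pi * 6000 * t)       # band-limited input
ext = naive_bandwidth_extend(sig, sr)    # gains a mirrored partial above 8 kHz
```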

Step 7: Mixing
Finally, I mixed everything in Cubase AI 7, using free plugins from Melda Productions, Voxengo, and Tokyo Dawn Records. This stage included compression, EQ, and plenty of manual volume automation.
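The manual volume automation was all done by hand in Cubase, but the underlying operation is just a gain envelope interpolated between breakpoints, which can be sketched in a few lines (the breakpoint values here are arbitrary examples):

```python
import numpy as np

def automate_volume(audio, sr, breakpoints):
    """Apply volume automation: `breakpoints` are (time_s, gain) pairs,
    linearly interpolated across the track, like drawing an automation
    curve in a DAW."""
    times = np.arange(len(audio)) / sr
    t, g = zip(*breakpoints)
    return audio * np.interp(times, t, g)

sr = 1000
audio = np.ones(3000)
# Duck to half volume at 1 s, back to unity by 2 s.
ducked = automate_volume(audio, sr, [(0.0, 1.0), (1.0, 0.5), (2.0, 1.0)])
```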


Results

The end product was a more focused, polished sound that still retained the “live audience” character. A few highlights:

  • Vocals became much more present.
  • Drums, especially cymbals and hi-hats, had clearer detail.
  • Bass regained its low-end weight, thanks to harmonics that the model used to reconstruct missing fundamentals.
  • The sections that benefited most were sparse passages, such as a lone hi-hat, or moments when the band played at a lower volume. In those sections the restored detail is quite good: the custom model fills in the bass guitar/synth, while AudioSR enhances the high end.
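The bass point above rests on a property of harmonic sounds: a note with partials at 2·f0, 3·f0, 4·f0 implies a fundamental at f0 even if that fundamental was lost to the recording. Here is a minimal sketch of that inference (not my model, which learns this implicitly): estimate f0 from the spacing of spectral peaks on a synthetic bass note whose fundamental is missing.

```python
import numpy as np

def infer_fundamental(audio, sr):
    """Estimate a missing fundamental from the spacing of its harmonics."""
    mag = np.abs(np.fft.rfft(audio * np.hanning(len(audio))))
    freqs = np.fft.rfftfreq(len(audio), d=1.0 / sr)
    # Crude peak picking: local maxima above 10% of the strongest peak.
    peaks = [i for i in range(1, len(mag) - 1)
             if mag[i] > mag[i - 1] and mag[i] > mag[i + 1]
             and mag[i] > 0.1 * mag.max()]
    spacings = np.diff(freqs[np.array(peaks)])
    return float(np.median(spacings))  # harmonic spacing ~= f0

# A 55 Hz bass note with its fundamental missing: only harmonics 2..6.
sr = 44100
t = np.arange(sr) / sr
note = sum(np.sin(2 * np.pi * 55 * k * t) / k for k in range(2, 7))
f0 = infer_fundamental(note, sr)  # close to 55.0 Hz
```

Once f0 is known, the missing low end can be re-synthesized or boosted; a trained model does this jointly with everything else.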

Some new nuances are audible that weren’t clear in the raw audience tape. However, if the original recording masked or lost notes due to distortion, reverb, or mic limitations, those cannot truly be recovered. As one of my old teachers used to say: “What the microphone didn’t capture won’t be restored.”


Verdict

AI can’t work miracles, but it does push the boundaries of what’s possible in music restoration. With enough iteration and patience, it’s possible to achieve results that were out of reach even a decade ago.

Of course, there are tradeoffs. On instruments like bass or kick drum, missing fundamentals can often be inferred and reconstructed. But for harmonically rich instruments like piano (where 10+ notes may sound simultaneously), full restoration is still extremely challenging.

Looking ahead, I believe AI restoration will lean more on generative techniques. That means audience recordings may one day sound nearly professional — but with a caveat: some of the “restored” details may actually be fabricated, not historically accurate.


Listen for Yourself


Cheers,
Gabriel Fernandez

Note: I used ChatGPT to correct grammar and polish the writing.

Apologies for the excess high end on the restored version. It didn't sound excessive on my studio monitors, but it comes through more strongly on consumer-grade playback devices (TVs, portable speakers, etc.).
