What Is Audio to MIDI? How AI Turns Sound into Notes

You record a guitar riff on your phone, or you hum a melody that’s been stuck in your head all day. Now you want the actual notes: something you can edit, swap to a piano, slow down, or turn into sheet music. That jump from a sound recording to playable notes is exactly what “audio to MIDI” does. This guide explains what MIDI is, how it differs from an audio file, and how AI listens to a recording and writes out the notes.

First, what is MIDI?

MIDI stands for Musical Instrument Digital Interface. The key thing to understand is that a MIDI file does not contain any recorded sound. It contains instructions.

Think of MIDI like sheet music for computers. Instead of storing the audio of a note being played, it stores facts about each note: its pitch, when it starts, how long it lasts, and how hard it is played.

A MIDI file is a list of these note events. When you press play, your software reads the instructions and chooses an instrument to perform them. That’s why the same MIDI file can sound like a grand piano, a synth bass, or a string section. The notes stay the same; only the voice playing them changes.
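To make "a list of note events" concrete, here is a minimal Python sketch. The `NoteEvent` class, its field names, and the `note_name` helper are hypothetical and exist only for illustration. A real MIDI file stores separate note-on and note-off messages timed in ticks rather than seconds, so treat this as a simplified picture rather than the actual file format.

```python
from dataclasses import dataclass

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


@dataclass(frozen=True)
class NoteEvent:
    """One note, described by facts rather than by recorded sound."""
    pitch: int       # MIDI note number, 0-127 (60 is middle C)
    start: float     # seconds from the beginning of the piece
    duration: float  # seconds the note is held
    velocity: int    # 1-127, how hard the note is played


def note_name(pitch: int) -> str:
    """Name a MIDI note number, using the common convention where 60 is C4."""
    return f"{NOTE_NAMES[pitch % 12]}{pitch // 12 - 1}"


# A three-note riff. Nothing here says "piano" or "synth":
# the instrument is chosen at playback time.
riff = [
    NoteEvent(pitch=60, start=0.0, duration=0.5, velocity=90),
    NoteEvent(pitch=64, start=0.5, duration=0.5, velocity=80),
    NoteEvent(pitch=67, start=1.0, duration=1.0, velocity=100),
]

for note in riff:
    print(f"{note_name(note.pitch):>3} at {note.start:.1f}s for {note.duration:.1f}s")
```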

Audio file vs. MIDI file: why the difference matters

An audio file (MP3, WAV, M4A) is a recording. It’s a frozen snapshot of sound waves, baked together. If a recording has a piano and a voice mixed down, you can’t cleanly pull the piano back out, and you can’t change which piano it is. You can trim it or add effects, but the performance itself is locked.

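A MIDI file is editable down to the individual note. Because it stores instructions rather than sound, you get a few things an audio recording can’t give you: you can fix or move single notes, swap the instrument, slow the tempo down, or turn the notes into sheet music.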

That editability is the whole point. Audio tells you what was played. MIDI lets you change it.
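As a rough illustration of why note data is so easy to change, the sketch below transposes a melody and slows it down with two one-line functions. The `Note` tuple and both helper names are made up for this example. In practice you would make these edits in a DAW or with a MIDI library.

```python
from typing import List, NamedTuple


class Note(NamedTuple):
    pitch: int       # MIDI note number
    start: float     # seconds
    duration: float  # seconds


def transpose(notes: List[Note], semitones: int) -> List[Note]:
    """Shift every note up or down by a number of semitones."""
    return [n._replace(pitch=n.pitch + semitones) for n in notes]


def stretch(notes: List[Note], factor: float) -> List[Note]:
    """Scale all timing; a factor of 2.0 plays the melody at half speed."""
    return [
        n._replace(start=n.start * factor, duration=n.duration * factor)
        for n in notes
    ]


melody = [Note(60, 0.0, 0.5), Note(62, 0.5, 0.5), Note(64, 1.0, 1.0)]
octave_up = transpose(melody, 12)   # same tune, one octave higher
half_speed = stretch(melody, 2.0)   # same pitches, twice as slow
```

Doing either of these to an audio recording means processing the waveform itself, which tends to leave audible artifacts. With note data it is plain arithmetic.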

So what does “audio to MIDI” mean?

Audio to MIDI is the process of listening to a recording and writing out the notes it contains as a MIDI file. The technical name for this is automatic music transcription, often shortened to AMT. It’s the machine doing the job a trained musician does when they listen to a song and write down the notes by ear.

You feed in a sound, whether that’s a full song, a single instrument line, or even a hummed melody, and you get back a standard .mid file. You can try it right now: convert audio to MIDI in your browser, with nothing to install.

How does the AI actually do it?

At a high level, an audio-to-MIDI model has to answer three questions about the sound, over and over, moment by moment:

  1. What pitches are sounding? The model analyzes the frequencies in the audio. A musical note has a fundamental frequency plus a stack of overtones, and the model learns to recognize those patterns and map them back to specific notes.
  2. When does a note begin? This is called onset detection. The AI looks for the moment a new note is struck, which is what separates two repeated notes from one long held note.
  3. How long does the note last? Once a note starts, the model tracks how long that pitch keeps sounding before it stops, giving you the note’s duration.

Put those together for the whole recording and you get a grid of notes with pitch, start time, and length. That grid becomes the MIDI file.
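Here is a toy sketch of that last step, assuming the model's answers have already been reduced to one pitch per time frame plus an onset flag. That is a strong simplification: real models output probabilities for many pitches per frame, and the function name and input format below are invented for illustration. The point is to show how the onset flag separates two repeated notes from one long held note.

```python
from typing import List, Optional, Tuple


def frames_to_notes(
    pitches: List[Optional[int]],  # detected MIDI pitch per frame, None for silence
    onsets: List[bool],            # True where a new note is struck
    frame_seconds: float,          # length of one analysis frame
) -> List[Tuple[int, float, float]]:
    """Group frame-level answers into (pitch, start, duration) notes."""
    notes = []
    current: Optional[int] = None
    start_frame = 0
    for i, pitch in enumerate(pitches):
        changed = pitch != current or onsets[i]
        if current is not None and changed:
            # The sounding note ends here: its pitch changed, it stopped,
            # or the same pitch was struck again.
            notes.append((current, start_frame * frame_seconds,
                          (i - start_frame) * frame_seconds))
            current = None
        if pitch is not None and changed:
            current, start_frame = pitch, i
    if current is not None:
        # Close a note that is still sounding at the end of the recording.
        notes.append((current, start_frame * frame_seconds,
                      (len(pitches) - start_frame) * frame_seconds))
    return notes


pitches = [60, 60, 60, 60, None, 62, 62]
onsets = [True, False, True, False, False, True, False]
notes = frames_to_notes(pitches, onsets, frame_seconds=0.25)
# notes == [(60, 0.0, 0.5), (60, 0.5, 0.5), (62, 1.25, 0.5)]
```

Without the onset at the third frame, the first four frames would merge into a single one-second C instead of two half-second ones.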

The harder trick is hearing more than one note at a time. A single melody line is monophonic, one note at a time, and is the easiest case. A chord, where several notes ring together, is polyphonic, and pulling apart overlapping pitches that share overtones is genuinely difficult. Modern models are trained on large amounts of music specifically to handle this.
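A small calculation shows why shared overtones cause trouble. The sketch below uses the standard equal-temperament formula (A4 = 440 Hz = MIDI note 69) to find which notes the first six harmonics of A2 and E3 land closest to. The function names are illustrative, and real instruments have overtones that are not perfectly harmonic, so this is an idealized picture.

```python
import math
from typing import List


def freq_to_midi(freq_hz: float) -> int:
    """Nearest MIDI note number for a frequency, with A4 = 440 Hz = note 69."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))


def harmonic_pitches(fundamental_hz: float, count: int) -> List[int]:
    """MIDI notes nearest to the first `count` harmonics of a fundamental."""
    return [freq_to_midi(fundamental_hz * k) for k in range(1, count + 1)]


a2 = harmonic_pitches(110.0, 6)    # A2 and its overtones
e3 = harmonic_pitches(164.81, 6)   # E3 (approximate frequency) and its overtones
shared = sorted(set(a2) & set(e3))

print(a2)      # [45, 57, 64, 69, 73, 76]
print(e3)      # [52, 64, 71, 76, 80, 83]
print(shared)  # [64, 76]
```

Both notes put energy at E4 (64) and E5 (76), so when A2 and E3 ring together a model has to decide whether those frequencies are overtones or extra notes someone actually played.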

The converter on this site uses Basic Pitch, an open-source model released by Spotify’s Audio Intelligence Lab, which runs in your browser through TensorFlow.js. Basic Pitch is polyphonic, so it can hear chords and multiple simultaneous notes, and it’s instrument-agnostic, meaning it isn’t locked to one type of sound. Because everything runs client-side, your audio is decoded locally and never uploaded to a server.

What audio to MIDI is good for

Once you have the notes as MIDI, a lot opens up.

The output is a standard .mid file, so it opens in the tools you already use: DAWs like Ableton, FL Studio, Logic, and GarageBand, and notation apps like MuseScore.
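To give a feel for how compact a standard .mid file is, here is a sketch that builds one by hand with only Python's standard library. It follows the Standard MIDI File layout (an `MThd` header chunk followed by one `MTrk` track chunk), but the `build_midi` function, the fixed velocity of 96, and the back-to-back note layout are choices made for this example, not how any particular converter writes its output.

```python
import struct
from typing import List, Tuple


def varlen(value: int) -> bytes:
    """Encode a delta time as a MIDI variable-length quantity."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(out))


def build_midi(notes: List[Tuple[int, int]], ticks_per_beat: int = 480) -> bytes:
    """Build a format-0 MIDI file from (pitch, length_in_ticks) pairs.

    Notes are played one after another on channel 0.
    """
    track = bytearray()
    for pitch, length in notes:
        track += varlen(0) + bytes([0x90, pitch, 96])      # note on, velocity 96
        track += varlen(length) + bytes([0x80, pitch, 0])  # note off after `length` ticks
    track += varlen(0) + bytes([0xFF, 0x2F, 0x00])         # end-of-track meta event
    # Header fields: chunk length 6, format 0, one track, ticks per quarter note.
    header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, ticks_per_beat)
    return header + b"MTrk" + struct.pack(">I", len(track)) + bytes(track)


# C, E, G: two quarter notes and a half note.
data = build_midi([(60, 480), (64, 480), (67, 960)])
print(len(data), "bytes")
```

Writing `data` to a file opened in binary mode (for example one named `arpeggio.mid`) should give something a DAW or notation app can open. With no tempo event in the file, players fall back to the default of 120 beats per minute.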

The honest limits

Audio to MIDI is impressive, but it is not magic, and it helps to know where it struggles.

The cleaner and more isolated the audio, the better the result. A solo piano recording, a single vocal line, or one instrument played clearly will transcribe far more accurately than a dense, fully produced track. When many instruments, vocals, reverb, and percussion are layered together, their frequencies overlap and the model has to guess, so a busy full mix is the hardest case. Drums and heavily distorted or noisy sounds also don’t map neatly onto clean pitched notes.

So expect to do some cleanup. Even a good transcription usually needs a few fixes in your DAW: a stray note here, a timing nudge there. The tool gets you most of the way, and editable MIDI is exactly what lets you fix the rest.
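Two of the most common fixes can be sketched in a few lines: dropping notes too short to be real, and nudging start times onto a grid. The `Note` tuple, the function names, and the thresholds below are hypothetical. In practice you would do this by ear in your DAW, since a very short note is sometimes intentional.

```python
from typing import List, NamedTuple


class Note(NamedTuple):
    pitch: int       # MIDI note number
    start: float     # seconds
    duration: float  # seconds


def drop_short_notes(notes: List[Note], min_duration: float) -> List[Note]:
    """Remove notes shorter than `min_duration`, which are often stray detections."""
    return [n for n in notes if n.duration >= min_duration]


def snap_starts(notes: List[Note], grid: float) -> List[Note]:
    """Move each start time to the nearest multiple of `grid` seconds."""
    return [n._replace(start=round(n.start / grid) * grid) for n in notes]


raw = [
    Note(60, 0.02, 0.5),   # slightly late
    Note(85, 0.30, 0.03),  # a 30 ms blip, probably not a real note
    Note(62, 0.49, 0.5),   # slightly early
]
cleaned = snap_starts(drop_short_notes(raw, min_duration=0.05), grid=0.25)
# cleaned == [Note(60, 0.0, 0.5), Note(62, 0.5, 0.5)]
```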

If you want to try it on your own recording, here’s a walkthrough: how to convert audio to MIDI step by step. The whole thing is free, needs no sign-up, and runs entirely in your browser.