Can you compare two similar songs based on their wav files?
I have a large library of older music (1920s, 30s, 40s, etc.) with a lot of duplicates, and I would like to identify the duplicates and give them the same MP3 tag information. Because the music was recorded so long ago, two copies of a song may sound the same to the human ear while the recordings themselves differ slightly (quieter, more static, etc.).
I am currently parsing some of the music with pydub and can generate a wav file, remove the silence at the beginning and end of each song, and compress the dynamic range of the music. What I would like is to be able to compare the wav files so that, if they are similar enough, I can assume they are the same song and give them the same tags.
Is it possible to run the wav file data through something like scipy and numpy to compare / correlate the data with good accuracy, using something like a Fourier transform / FFT? I know it can be done with a system such as dejavu, but it is quite resource-intensive and uses a lot of database storage. Since I have access to the raw files and do not need to match against microphone input, I would rather do something simpler.
You need an audio hash or audio fingerprint. All such methods are "heavy" (resource-intensive), because they have to decompress the audio and extract data from it.
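To give an idea of what a lightweight fingerprint could look like with numpy alone, here is a minimal sketch. It is not how dejavu or any particular library works. It splits the signal into overlapping frames, sums the FFT energy in a few log-spaced frequency bands, and keeps only the sign of the energy differences between neighbouring bands and neighbouring frames. Two recordings are then compared by the fraction of differing bits. The function names, frame size, band count and frequency range are all assumptions chosen for illustration, and any decision threshold would have to be tuned on your own library.

```python
import numpy as np

def band_energies(samples, rate, frame_len=4096, hop=2048,
                  n_bands=17, f_lo=300.0, f_hi=3000.0):
    """Return a (frames, n_bands) array of spectral energy per band.

    All parameter defaults are illustrative assumptions, not tuned values.
    f_hi must stay below rate / 2.
    """
    samples = np.asarray(samples, dtype=np.float64)
    if samples.ndim == 2:                      # (n, channels) -> mono
        samples = samples.mean(axis=1)
    if len(samples) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(samples) - frame_len) // hop
    # Index matrix that picks out overlapping frames.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = samples[idx] * np.hanning(frame_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / rate)
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)   # log-spaced band edges
    bands = [power[:, (freqs >= lo) & (freqs < hi)].sum(axis=1)
             for lo, hi in zip(edges[:-1], edges[1:])]
    return np.stack(bands, axis=1)

def fingerprint(samples, rate):
    """Boolean array of shape (frames - 1, n_bands - 1)."""
    e = band_energies(samples, rate)
    d = e[:, :-1] - e[:, 1:]       # difference across neighbouring bands
    return (d[1:] - d[:-1]) > 0    # then across neighbouring frames

def bit_error_rate(fp_a, fp_b):
    """Fraction of differing bits over the frames both fingerprints share."""
    n = min(len(fp_a), len(fp_b))
    return float(np.mean(fp_a[:n] != fp_b[:n]))
```

Because only the signs of energy differences are kept, a uniformly quieter copy produces the same bits, and moderate static flips only some of them. Unrelated recordings should land near a bit error rate of 0.5. With pydub, the samples can be obtained with `np.array(segment.get_array_of_samples())` after converting the segment to mono. Downsampling first keeps memory use small, since the sketch builds every frame in memory at once. It also assumes both files start at the same point, so if the silence trimming is not exact you would want to try a few frame offsets and keep the lowest error rate.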