Can you compare two similar songs based on their wav files?

I have a large library of older music (1920s, 30s, 40s, etc.) with a lot of duplicates, and I would like to identify the duplicates and organize them with the same MP3 tag information. Because the music was recorded long ago, two copies may sound the same to the human ear while their recordings differ slightly (quieter, more static, etc.).

I am currently processing some of the music with pydub and can generate a WAV file, remove the silence at the beginning and end of each song, and compress its dynamic range. What I would like to do next is compare the WAV files, so that if two are similar enough I can assume they are the same song and give them the same tags.
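For readers who want to reproduce that preprocessing without pydub, here is a minimal numpy sketch. It assumes each file has already been decoded to a mono float array. The names `trim_silence` and `normalize` are hypothetical, and the -40 dB threshold (relative to the peak) is an assumption; unlike pydub's chunk-based silence detection, this checks individual samples.

```python
import numpy as np

def trim_silence(samples: np.ndarray, threshold_db: float = -40.0) -> np.ndarray:
    """Drop leading and trailing samples quieter than threshold_db below the peak."""
    peak = np.max(np.abs(samples)) if len(samples) else 0.0
    if peak == 0:
        return samples[:0]  # the whole signal is silent
    # Convert the dB threshold to a linear amplitude relative to the peak.
    threshold = peak * 10 ** (threshold_db / 20)
    loud = np.flatnonzero(np.abs(samples) > threshold)
    return samples[loud[0]:loud[-1] + 1]

def normalize(samples: np.ndarray) -> np.ndarray:
    """Remove DC offset and scale to unit RMS so quieter transfers line up."""
    centered = samples - np.mean(samples)
    rms = np.sqrt(np.mean(centered ** 2))
    return centered / rms if rms > 0 else centered
```

Calling `normalize(trim_silence(samples))` on each decoded file gives signals at a comparable level. Note that unit-RMS scaling only equalizes overall loudness; it is not the same as the dynamic range compression pydub applies.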

Is it possible to run the WAV data through something like scipy and numpy to compare or correlate it with good accuracy, for example using a Fourier transform (FFT)? I know it can be done with a full system such as dejavu, but that is quite resource-intensive and uses a lot of database storage. Since I have the raw files rather than microphone recordings, I would prefer something simpler.
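A minimal numpy sketch of the FFT correlation asked about here: it computes the normalized cross-correlation of two mono signals and returns the best score and the offset (in samples) where it occurs. The function name `xcorr_similarity` is hypothetical, and the sketch assumes both files are decoded to mono arrays at the same sample rate; downsampling to a few kHz first keeps the FFTs small. Raw-waveform correlation is fragile for this use case: if one transfer ran at a slightly different speed, as can happen with old records, the waveforms drift out of alignment and the peak drops even though the songs match.

```python
import numpy as np

def xcorr_similarity(a: np.ndarray, b: np.ndarray) -> tuple[float, int]:
    """Peak normalized cross-correlation between two mono signals.

    Returns (score, lag). By Cauchy-Schwarz the score is at most 1; it is close
    to 1 when `a` is a delayed, rescaled copy of `b`. A positive lag means `a`
    starts `lag` samples later than `b`.
    """
    # Remove DC offset and divide by both energies so that a quieter
    # transfer of the same recording still scores high.
    a = a - a.mean()
    b = b - b.mean()
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    if norm == 0:
        return 0.0, 0
    # Zero-pad to at least len(a) + len(b) - 1 so the circular correlation
    # computed via the FFT equals the linear one (no wrap-around).
    n = len(a) + len(b) - 1
    nfft = 1 << (n - 1).bit_length()
    spectrum = np.fft.rfft(a, nfft) * np.conj(np.fft.rfft(b, nfft))
    xc = np.fft.irfft(spectrum, nfft) / norm
    # Index k holds sum_n a[n + k] * b[n]; indices past len(a) are negative lags.
    idx = int(np.argmax(xc))
    lag = idx if idx < len(a) else idx - nfft
    return float(xc[idx]), lag
```

A score near 1 suggests the same recording. What counts as "similar enough" is a threshold you would have to choose from known duplicate pairs.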



1 answer


You need an audio hash or audio fingerprint. These are all "heavy" (resource-intensive), because they have to decode the audio and extract features from it.
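The cost can be kept down by reducing each frame to a few bits. The sketch below is loosely modeled on the Philips robust audio hash (Haitsma and Kalker), which is a different and simpler scheme than the spectrogram peak-pair hashing dejavu uses. It splits each frame into log-spaced bands between 300 and 2000 Hz and records, for each pair of adjacent bands, whether their energy difference grew since the previous frame. Because only signs of energy differences are kept, a uniformly quieter copy gives the same bits and moderate noise flips only a few. The function names, frame and hop sizes, band count, and the shift search in `similarity` are assumptions for illustration, not tuned values.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def fingerprint(samples: np.ndarray, sr: int, frame_len: int = 2048, hop: int = 256,
                n_bands: int = 17, fmin: float = 300.0, fmax: float = 2000.0) -> np.ndarray:
    """Boolean matrix with one row of n_bands - 1 bits per pair of consecutive frames."""
    if len(samples) < frame_len + hop:
        raise ValueError("signal too short for two frames")
    # Overlapping frames (a strided view), then one windowed copy.
    frames = sliding_window_view(samples, frame_len)[::hop] * np.hanning(frame_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    # Log-spaced band edges; bins outside [fmin, fmax) get index -1 or n_bands.
    edges = np.geomspace(fmin, fmax, n_bands + 1)
    band = np.digitize(freqs, edges) - 1
    energy = np.stack([power[:, band == b].sum(axis=1) for b in range(n_bands)], axis=1)
    band_diff = energy[:, :-1] - energy[:, 1:]   # E(n, m) - E(n, m+1)
    return (band_diff[1:] - band_diff[:-1]) > 0  # did it grow since frame n-1?

def similarity(fp_a: np.ndarray, fp_b: np.ndarray, max_shift: int = 50,
               min_overlap: int = 50) -> float:
    """Best fraction of matching bits over frame offsets in [-max_shift, max_shift]."""
    best = 0.0
    for shift in range(-max_shift, max_shift + 1):
        a = fp_a[shift:] if shift >= 0 else fp_a
        b = fp_b if shift >= 0 else fp_b[-shift:]
        n = min(len(a), len(b))
        if n < min_overlap:  # too little overlap for a meaningful score
            continue
        best = max(best, float(np.mean(a[:n] == b[:n])))
    return best
```

Unrelated material matches about half the bits, so duplicates should stand well above 0.5; choose the exact threshold by testing on pairs you already know are duplicates. Packed with `np.packbits`, a three-minute song at 8 kHz with these settings needs roughly 11 KB, so fingerprints can be computed once per file and stored.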


