How to determine if an audio track is a Dolby Pro Logic II downmix

I'm trying to figure out whether there is a way to determine if an AAC-encoded audio track was encoded with Dolby Pro Logic II data. Is there a way to examine the file to see this information? For example, I encoded a media file in HandBrake with (truncated to the audio options) -E av_aac -B 320 --mixdown dpl2

and this is the output that mediainfo shows for the audio track:

Audio #1
ID                                       : 2
Format                                   : AAC
Format/Info                              : Advanced Audio Codec
Format profile                           : LC
Codec ID                                 : 40
Duration                                 : 2h 5mn
Bit rate mode                            : Variable
Bit rate                                 : 321 Kbps
Channel(s)                               : 2 channels
Channel positions                        : Front: L R
Sampling rate                            : 48.0 KHz
Compression mode                         : Lossy
Stream size                              : 288 MiB (3%)
Title                                    : Stereo / Stereo
Language                                 : English
Encoded date                             : UTC 2017-04-11 22:21:41
Tagged date                              : UTC 2017-04-11 22:21:41

      

but I can't tell if there is anything in this output that would suggest it is encoded with DPL2 data.

2 answers


TL;DR: it is possible; it might be easier if you are a programmer.

Since the encoded information is just a stereo pair, there is no guaranteed way to detect a Dolby Pro Logic II (DPL2) signal unless you specifically store your own metadata that says "this is a DPL2 file". But you can probably make a pretty good guess.
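For example, if you control the encoding step, one low-tech way to make the question answerable later is to stamp your own tag into the container when you create the file. A minimal sketch, assuming ffmpeg is installed and using made-up file names and an arbitrary tag value (this is not any official DPL2 flag):

    # Stamp a custom marker tag into the container at remux time so a later
    # "is this DPL2?" check is trivial. Tag value and file names are placeholders.
    import subprocess

    subprocess.run(
        ["ffmpeg", "-i", "movie.m4v", "-c", "copy",
         "-metadata", "comment=Dolby Pro Logic II downmix",
         "movie-tagged.m4v"],
        check=True)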

All of the older analog Dolby Surround formats, including DPL2, carry surround information in a stereo pair by inverting the phase of the surround channel (or surrounds) and then mixing it into the original left and right channels. Dolby Surround decoders, including DPL2, attempt to recover this information by inverting the phase of one of the two channels and then looking for similarities in those pairs of signals. This is done either simply, as in the original Dolby Surround, or with the similarities artificially steered much harder to the left or right, or to the front or back, as in DPL2.
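To make that phase trick concrete, here is a toy numpy illustration (my own sketch, not Dolby's actual encoder): center-panned content survives the sum of the two channels, while anti-phase surround content only reappears in their difference.

    # Toy illustration of matrix-surround phase encoding -- not Dolby's real
    # encoder, just the principle a passive decoder relies on.
    import numpy as np

    rate = 48000
    t = np.arange(rate) / rate                 # one second of samples
    center = np.sin(2 * np.pi * 440 * t)       # content panned to the center
    surround = np.sin(2 * np.pi * 330 * t)     # content destined for the surrounds

    g = 0.707                                  # roughly -3 dB
    lt = g * center + g * surround             # left total: surround in phase
    rt = g * center - g * surround             # right total: surround anti-phase

    front = (lt + rt) / 2                      # sum: surround cancels out
    back = (lt - rt) / 2                       # difference: center cancels out

    print("front ~ center?  ", round(np.corrcoef(front, center)[0, 1], 3))
    print("front ~ surround?", round(np.corrcoef(front, surround)[0, 1], 3))
    print("back ~ surround? ", round(np.corrcoef(back, surround)[0, 1], 3))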

So the trick is to determine whether significant data is stored in the surround channel(s). I'll sketch out a method for you that might work, and I'll try to express it without writing any code, but it's up to you how to implement and tune it.

  • Trim the first N seconds or so of program content out into a stereo file, where N is between one and thirty. Call this file Input.
  • Mix the stereo Input channels down to a new mono file at -3dB per channel. Call this file Center.
  • Split the left and right Input channels into separate files. Call these Left and Right.
  • Invert the Right channel. Call this file RightInvert.
  • Mix the Left and RightInvert channels down to a new mono file at -3dB per channel. Call this file Surround.
  • Determine the RMS and peak dB of the Surround file.
  • If the RMS or peak dB of the Surround file is below a "tolerance", stop; the original file is either mono or center-panned and therefore contains no surround information. You'll need to experiment with a number of DPL2 and non-DPL2 sources to find out what the tolerances are, but after a dozen or so files the numbers should become clear. I'm guessing around -30 dB or so.
  • Invert the Center file into a new file. Call this file CenterInvert.
  • Mix the CenterInvert file into the Surround file at 0dB (both CenterInvert and Surround should be mono). Call this new file SurroundInvert.
  • Determine the RMS and peak dB of the SurroundInvert file.
  • If the RMS and/or peak dB of the SurroundInvert file is below a "tolerance", stop; your original source contains panned left or right front information, not surround information. Again, you'll need to experiment with DPL2 and non-DPL2 sources to find the tolerances; I'm guessing around -35 dB or so.
  • If you've gotten this far, your original input likely contains surround information and is therefore likely a member of the Dolby Surround family of encodings.


I've written this algorithm out so that you can do each of these steps with a specific command in sox. If you want to be fancier, instead of doing the RMS/peak step in sox, you can run ebur128 and check the LUFS levels against your tolerances. If you want to be fancier still, when creating the Surround and Center files you can filter out all frequencies above 7 kHz and de-emphasize them, just like a real DPL2 decoder does.
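If it helps to see the steps strung together in one place, here is a rough Python/numpy sketch of the amplitude-domain version (my own rendering of the list above, not a reference implementation). It assumes you have already decoded the AAC track to a 16-bit stereo WAV, for example with ffmpeg, and it uses the guessed -30/-35 dB tolerances, which you would still have to calibrate:

    # Rough sketch of the amplitude-domain heuristic described above.
    # Decode the AAC first, e.g.:  ffmpeg -i input.m4v -t 30 probe.wav
    import sys
    import wave

    import numpy as np

    def read_stereo_wav(path, seconds=30):
        # Return (left, right) float arrays from a 16-bit stereo WAV.
        with wave.open(path, "rb") as w:
            assert w.getnchannels() == 2 and w.getsampwidth() == 2
            frames = w.readframes(min(w.getnframes(), seconds * w.getframerate()))
        pcm = np.frombuffer(frames, dtype=np.int16).astype(np.float64) / 32768.0
        return pcm[0::2], pcm[1::2]

    def rms_db(x):
        return 20 * np.log10(np.sqrt(np.mean(x ** 2)) + 1e-12)

    left, right = read_stereo_wav(sys.argv[1])
    g = 10 ** (-3 / 20)                        # -3 dB per channel

    center = g * left + g * right              # "Center"   = L + R
    surround = g * left - g * right            # "Surround" = L + inverted R
    surround_invert = surround - center        # "SurroundInvert" = Surround + CenterInvert

    if rms_db(surround) < -30:                 # tolerance guess from the list
        print("Little L-R energy: mono or center-only, no surround information")
    elif rms_db(surround_invert) < -35:        # tolerance guess from the list
        print("L-R energy looks like hard-panned front content, not surround")
    else:
        print("Significant out-of-phase energy: plausibly a Dolby Surround family downmix")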

To keep this algorithm simple, I've sketched it out entirely in the amplitude domain. The calculation of the Surround level would probably be a lot more accurate in the frequency domain, if you know how to calculate the magnitude and angle of FFT bins and you use windows of 30 to 100 ms. But the cheapo version above should get you started.
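For what a frequency-domain variant could look like (again just my own sketch of the idea, with arbitrary thresholds): over short windows, call an FFT bin "surround-like" when the two channels carry similar magnitude but sit roughly 180 degrees out of phase, and look at what fraction of the busy bins that is.

    # Sketch of a frequency-domain measure: fraction of FFT bins where the two
    # channels have similar magnitude but are roughly anti-phase. Thresholds
    # here are arbitrary starting points, not calibrated values.
    import numpy as np

    def surround_bin_ratio(left, right, rate, win_ms=50):
        n = int(rate * win_ms / 1000)
        window = np.hanning(n)
        surroundish = total = 0
        for start in range(0, len(left) - n, n):
            fl = np.fft.rfft(window * left[start:start + n])
            fr = np.fft.rfft(window * right[start:start + n])
            mag_l, mag_r = np.abs(fl), np.abs(fr)
            busy = np.maximum(mag_l, mag_r) > 1e-4          # skip near-silent bins
            phase_diff = np.angle(fl[busy] * np.conj(fr[busy]))
            similar_mag = np.abs(20 * np.log10((mag_l[busy] + 1e-12) /
                                               (mag_r[busy] + 1e-12))) < 6
            anti_phase = np.abs(np.abs(phase_diff) - np.pi) < np.pi / 6
            surroundish += np.count_nonzero(similar_mag & anti_phase)
            total += np.count_nonzero(busy)
        return surroundish / max(total, 1)

A ratio noticeably higher than what plain stereo material gives you would hint at matrix-encoded surround; the threshold, like the tolerances above, has to come from experimenting with known DPL2 and non-DPL2 files.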

One last caveat. AAC is a modern psychoacoustic codec, which means it likes to play games with stereo phasing and imaging to achieve its compression. So I suspect that the mere act of encapsulating a DPL2 downmix in an AAC stream is likely to mangle some of the imaging present in DPL2. Frankly, neither DPL2 nor AAC belongs anywhere in this pipeline. If you have to store an analog stream that was originally encoded with DPL2, do so in a lossless format such as WAV or FLAC, not AAC.

At the time of this writing, the operating concepts of Dolby Pro Logic (I) are described here. These basic concepts still apply to DPL2; the operating concepts of DPL2 are here.


If a file has more than two channels, you can assume with some confidence that they are there for surround purposes, although they could just be multiple tracks. In that case it is up to the playback system to do with the channels as it "thinks" best (if the file header doesn't tell it what to do).

But your file is stereo. If you want to know whether it is a virtual surround file, you could look at the encoder field in the header to see which encoder was used. That might help a little, though not much: first, the encoder field is often left blank, and second, the encoder does not have to be the same program as the one that mixed down the surround data. That is, the mixer first produces raw PCM data and then feeds it to some encoder to create the compressed file (AAC or whatever). Also, there are many applications, and many versions of each, that could have been used, so keeping track of what each encoder implies would be frustrating.
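For completeness, peeking at whatever tags survived is easy if ffprobe (or mediainfo) is installed; note that the mediainfo output in the question shows no DPL2-related tag at all, which is exactly the problem. A minimal sketch:

    # Dump container- and stream-level tags with ffprobe; the "encoder" or
    # "comment" tags, if present at all, are the only places a hint could live.
    import json
    import subprocess
    import sys

    probe = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", "-show_streams", sys.argv[1]],
        capture_output=True, text=True, check=True)
    info = json.loads(probe.stdout)

    print("container tags:", info.get("format", {}).get("tags", {}))
    for stream in info.get("streams", []):
        if stream.get("codec_type") == "audio":
            print("audio stream tags:", stream.get("tags", {}))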

However, you can deduce with more than 60% certainty whether something is virtual surround or not by examining the data itself. This is advanced DSP, and machine learning could even be used to speed it up. You would need to find out whether the stereo signals contain certain HRTFs (head-related transfer functions). This can be done by examining the intensity differences and delays between occurrences of the same sound in the time domain, and the harmonic characteristics (characteristic frequency shifts) in the frequency domain. You need to do both, because one without the other may just tell you that something is very good stereo, not virtual surround. I don't know whether there are HRTFs that have already been mapped out somewhere, or whether you would need to do that yourself.
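As a very rough taste of the time-domain half of that (only an illustration, nowhere near real HRTF detection): estimate the inter-channel delay of a short excerpt with cross-correlation; persistent small delays combined with matching spectral colouration are the kind of cues you would be hunting for.

    # Toy estimate of inter-channel delay via cross-correlation. Use a short
    # excerpt (tens of milliseconds); np.correlate in "full" mode is O(N^2),
    # so long signals would need an FFT-based correlation instead.
    import numpy as np

    def interchannel_delay_ms(left, right, rate, max_ms=1.0):
        max_lag = int(rate * max_ms / 1000)
        corr = np.correlate(left, right, mode="full")
        mid = len(right) - 1                       # index of zero lag
        search = corr[mid - max_lag: mid + max_lag + 1]
        lag = int(np.argmax(search)) - max_lag     # > 0 means left arrives later
        return 1000.0 * lag / rate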

This is a very difficult task that takes a long time to get right, and performance will also be a problem.



With this method you could also unmix a stereo downmix back into something close to the original surround channels. But other methods are used to convert stereo to surround, and they sound good too.

If you are determined to attempt this kind of detection, set aside six months or more of hard work if there are no HRTF mappings available, or a few weeks if there are; prepare for a lot of stress, and good luck. I have done something similar. It's a killer.

If you want an out-of-the-box solution, then there is no real answer to your question, unless the header happens to give you an encoder field, and that encoder is distinctive and known to be used only for converting surround sound to stereo. I don't think anyone has done this from the actual data in the way I described, or if they have, it is part of a commercial product. What you want to do is rarely needed, but it can be done.

Oh, and BTW, try googling HRTF inversion; that might provide some help.
