Muxing compressed frames from VTCompressionSession with audio data into MPEG2-TS container for network streaming
I'm working on a project that involves capturing H.264 encoded frames from a VTCompressionSession in iOS 8, muxing them together with AAC or PCM audio from the microphone into a playable MPEG2-TS stream, and sending it over a socket in real time with minimal latency (i.e. (almost) no buffering).
After watching the presentation on the new VideoToolbox APIs in iOS 8 and doing some research, I think it's safe to assume that:
- The encoded frames you get from the VTCompressionSession are not in Annex B format, so I need to convert them somehow (all the explanations I've seen so far are too vague, so I'm not sure exactly how to do it, i.e. replace each NAL unit's length header with a 3- or 4-byte start code); a rough sketch of my understanding follows this list.
- The encoded frames you get from the VTCompressionSession are actually an elementary stream, so I will first need to wrap them in a packetized elementary stream (PES) before they can be multiplexed.
- I will also need an AAC or PCM elementary stream from the microphone data (I assume PCM will be easier since no encoding is required), which I don't know how to do.
- I will also need a library like libmpegts for multiplexing the packetized elementary streams, or perhaps ffmpeg (using the libavcodec and libavformat libraries).
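To make the first point concrete, here is my current (untested) understanding of the AVCC-to-Annex-B conversion. writeAnnexBFromSampleBuffer and the emit callback are just placeholder names I made up, and the sketch assumes 4-byte length prefixes, which should really be read from the NALUnitHeaderLength returned by the format description:

#import <CoreMedia/CoreMedia.h>

// Rough sketch of AVCC -> Annex B: emit SPS/PPS from the format description,
// then replace each NAL unit's 4-byte length prefix with a start code.
// (In practice the parameter sets only need to precede keyframes.)
static void writeAnnexBFromSampleBuffer(CMSampleBufferRef sampleBuffer,
                                        void (^emit)(const uint8_t *bytes, size_t length))
{
    static const uint8_t startCode[4] = {0x00, 0x00, 0x00, 0x01};

    // Parameter sets (SPS/PPS) live in the format description, not in the data buffer.
    CMFormatDescriptionRef desc = CMSampleBufferGetFormatDescription(sampleBuffer);
    size_t parameterSetCount = 0;
    CMVideoFormatDescriptionGetH264ParameterSetAtIndex(desc, 0, NULL, NULL, &parameterSetCount, NULL);
    for (size_t i = 0; i < parameterSetCount; i++) {
        const uint8_t *parameterSet = NULL;
        size_t parameterSetSize = 0;
        CMVideoFormatDescriptionGetH264ParameterSetAtIndex(desc, i, &parameterSet, &parameterSetSize, NULL, NULL);
        emit(startCode, sizeof(startCode));
        emit(parameterSet, parameterSetSize);
    }

    // Walk the AVCC buffer: each NAL unit is prefixed with a big-endian length field.
    CMBlockBufferRef blockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
    char *data = NULL;
    size_t totalLength = 0;
    CMBlockBufferGetDataPointer(blockBuffer, 0, NULL, &totalLength, &data);
    size_t offset = 0;
    while (offset + 4 <= totalLength) {
        uint32_t nalUnitLength = CFSwapInt32BigToHost(*(uint32_t *)(data + offset));
        emit(startCode, sizeof(startCode));
        emit((const uint8_t *)(data + offset + 4), nalUnitLength);
        offset += 4 + nalUnitLength;
    }
}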
I am new to this. Can I get guidance on what would be the correct approach to achieve this?
Is there an easier way to implement this using Apple APIs (like AVFoundation)?
Is there any similar project I can take as a reference?
Thanks in advance!
I will also need a library like libmpegts for multiplexing the packetized elementary streams, or perhaps ffmpeg (using the libavcodec and libavformat libraries).
From what I can gather, there is no way to mux TS with AVFoundation or its related frameworks. It looks like it could be done by hand, but I am trying to use the Bento4 library to accomplish the same task as you. I assume libmpegts, ffmpeg, GPAC, libav or any other such library would work too, but I don't like their APIs.
Basically, I am following Mp42Ts.cpp, ignoring the Mp4 parts and just looking at the Ts parts.
This question fooobar.com/questions/2182949 / ... has a whole outline of how to feed it the video and how to implement the audio. If you run into trouble, ask a more specific question.
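If it helps to see what the "Ts parts" actually involve: a packetized elementary stream is just the raw H.264 (or AAC) data with a small header in front carrying the timestamp, and the muxer then slices that into 188-byte TS packets together with the PAT/PMT tables. A very simplified sketch of the PES header (PTS only, video stream id; writePESHeader is just an illustration of what libmpegts or Bento4 do for you internally, not part of either API):

// Simplified sketch: prepend a PES header with a PTS to one access unit.
// Real muxers also handle DTS, TS packetization, PAT/PMT and continuity counters.
static size_t writePESHeader(uint8_t *out, uint64_t pts90kHz)
{
    uint8_t *p = out;
    *p++ = 0x00; *p++ = 0x00; *p++ = 0x01;        // packet_start_code_prefix
    *p++ = 0xE0;                                  // stream_id: first video stream
    *p++ = 0x00; *p++ = 0x00;                     // PES_packet_length: 0 = unbounded (allowed for video in TS)
    *p++ = 0x80;                                  // '10' marker bits, no scrambling/priority flags
    *p++ = 0x80;                                  // PTS_DTS_flags = '10' (PTS only)
    *p++ = 0x05;                                  // PES_header_data_length: 5 bytes of PTS follow
    *p++ = 0x21 | ((pts90kHz >> 29) & 0x0E);      // '0010' + PTS[32..30] + marker bit
    *p++ = (pts90kHz >> 22) & 0xFF;               // PTS[29..22]
    *p++ = 0x01 | ((pts90kHz >> 14) & 0xFE);      // PTS[21..15] + marker bit
    *p++ = (pts90kHz >> 7) & 0xFF;                // PTS[14..7]
    *p++ = 0x01 | ((pts90kHz << 1) & 0xFE);       // PTS[6..0] + marker bit
    return p - out;                               // Annex B / ADTS payload follows this header
}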
Hope this serves as a good starting point for you.
I will also need an AAC or PCM elementary stream from the microphone data (I assume PCM will be easier since no encoding is involved), which I don't know how to do.
Getting microphone data as AAC is very easy. Something like this:
NSError *error = nil;

// Microphone input for the capture session.
AVCaptureDevice *microphone = [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeAudio];
_audioInput = [AVCaptureDeviceInput deviceInputWithDevice:microphone error:&error];
if (_audioInput == nil) {
    NSLog(@"Couldn't open microphone %@: %@", microphone, error);
    return NO;
}

// Raw audio sample buffers are delivered to the delegate on this queue.
_audioProcessingQueue = dispatch_queue_create("audio processing queue", DISPATCH_QUEUE_SERIAL);
_audioOutput = [[AVCaptureAudioDataOutput alloc] init];
[_audioOutput setSampleBufferDelegate:self queue:_audioProcessingQueue];

// The asset writer input configured for AAC does the actual encoding.
NSDictionary *audioOutputSettings = @{
    AVFormatIDKey: @(kAudioFormatMPEG4AAC),
    AVNumberOfChannelsKey: @(1),
    AVSampleRateKey: @(44100.),
    AVEncoderBitRateKey: @(64000),
};
_audioWriterInput = [AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeAudio outputSettings:audioOutputSettings];
_audioWriterInput.expectsMediaDataInRealTime = YES;
if (![_writer canAddInput:_audioWriterInput]) {
    NSLog(@"Couldn't add audio input to writer");
    return NO;
}
[_writer addInput:_audioWriterInput];

[_captureSession addInput:_audioInput];
[_captureSession addOutput:_audioOutput];
- (void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection
{
    // sampleBuffer holds LPCM audio from the microphone; appending it to the
    // AAC-configured writer input is what produces the encoded AAC.
    if (_audioWriterInput.readyForMoreMediaData) {
        [_audioWriterInput appendSampleBuffer:sampleBuffer];
    }
}
I am assuming that you are already using an AVCaptureSession for your camera; you can use the same capture session for the microphone.
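The snippet also assumes a _writer (an AVAssetWriter) has already been created before canAddInput: is called. A minimal, hypothetical setup might look like the following; outputURL is just a placeholder, and note that AVAssetWriter writes MP4/MOV-style files, not MPEG2-TS, so the TS muxing still has to happen separately:

// Hypothetical setup for the _writer and _captureSession the snippet assumes.
NSError *error = nil;
_captureSession = [[AVCaptureSession alloc] init];
_writer = [[AVAssetWriter alloc] initWithURL:outputURL      // placeholder URL
                                    fileType:AVFileTypeMPEG4
                                       error:&error];
if (_writer == nil) {
    NSLog(@"Couldn't create asset writer: %@", error);
}
// Remember to call startWriting and startSessionAtSourceTime: before appending buffers.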