Muxing compressed frames from VTCompressionSession with audio data into MPEG2-TS container for network streaming
I'm working on a project that involves capturing H.264 encoded frames from a VTCompressionSession in iOS 8, muxing them together with AAC or PCM audio from the microphone into a playable MPEG2-TS stream, and sending it over a socket in real time with minimal latency (i.e. (almost) no buffering).
After watching the presentation on the new VideoToolbox APIs in iOS 8 and doing some research, I think it's safe to assume that:
- The encoded frames you get from the VTCompressionSession are not in Annex B format, so I need to convert them somehow (all the explanations I've seen so far are too vague, so I'm not sure exactly how to do it, i.e. replace each NAL unit's length header with a 3- or 4-byte start code); a rough sketch of my understanding follows this list.
- The encoded frames you get from the VTCompressionSession are actually an elementary stream, so I will first need to wrap them in a packetized elementary stream (PES) before they can be multiplexed.
- I will also need an AAC or PCM elementary stream from the microphone data (I assume PCM will be easier since no encoding is required), which I don't know how to do.
- I will also need a library like libmpegts for multiplexing the packetized elementary streams, or perhaps ffmpeg (using the libavcodec and libavformat libraries).
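To make the first point concrete, here is my current (untested) understanding of the AVCC-to-Annex-B conversion. writeAnnexBFromSampleBuffer and the emit callback are just placeholder names I made up, and the sketch assumes 4-byte length prefixes, which should really be read from the NALUnitHeaderLength returned by the format description:

#import <CoreMedia/CoreMedia.h>

// Rough sketch of AVCC -> Annex B: emit SPS/PPS from the format description,
// then replace each NAL unit's 4-byte length prefix with a start code.
// (In practice the parameter sets only need to precede keyframes.)
static void writeAnnexBFromSampleBuffer(CMSampleBufferRef sampleBuffer,
                                        void (^emit)(const uint8_t *bytes, size_t length))
{
    static const uint8_t startCode[4] = {0x00, 0x00, 0x00, 0x01};

    // Parameter sets (SPS/PPS) live in the format description, not in the data buffer.
    CMFormatDescriptionRef desc = CMSampleBufferGetFormatDescription(sampleBuffer);
    size_t parameterSetCount = 0;
    CMVideoFormatDescriptionGetH264ParameterSetAtIndex(desc, 0, NULL, NULL, &parameterSetCount, NULL);
    for (size_t i = 0; i < parameterSetCount; i++) {
        const uint8_t *parameterSet = NULL;
        size_t parameterSetSize = 0;
        CMVideoFormatDescriptionGetH264ParameterSetAtIndex(desc, i, &parameterSet, &parameterSetSize, NULL, NULL);
        emit(startCode, sizeof(startCode));
        emit(parameterSet, parameterSetSize);
    }

    // Walk the AVCC buffer: each NAL unit is prefixed with a big-endian length field.
    CMBlockBufferRef blockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
    char *data = NULL;
    size_t totalLength = 0;
    CMBlockBufferGetDataPointer(blockBuffer, 0, NULL, &totalLength, &data);
    size_t offset = 0;
    while (offset + 4 <= totalLength) {
        uint32_t nalUnitLength = CFSwapInt32BigToHost(*(uint32_t *)(data + offset));
        emit(startCode, sizeof(startCode));
        emit((const uint8_t *)(data + offset + 4), nalUnitLength);
        offset += 4 + nalUnitLength;
    }
}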
I am new to this. Can I get guidance on what would be the correct approach to achieve this?
Is there an easier way to implement this using Apple APIs (like AVFoundation)?
Is there any similar project I can take as a reference?
Thanks in advance!
I will also need a library like libmpegts for multiplexing the packetized elementary streams, or perhaps ffmpeg (using the libavcodec and libavformat libraries).
From what I can gather, there is no way to mux TS with AVFoundation or its related frameworks. It looks like it could be done by hand, but I am trying to use the Bento4 library to accomplish the same task as you. I assume libmpegts, ffmpeg, GPAC, libav or any other such library would work too, but I don't like their APIs.
Basically, I am following Mp42Ts.cpp, ignoring the Mp4 parts and just looking at the Ts parts.
This question fooobar.com/questions/2182949 / ... has a whole outline of how to feed it the video and how to implement the audio. If you run into trouble, ask a more specific question.
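If it helps to see what the "Ts parts" actually involve: a packetized elementary stream is just the raw H.264 (or AAC) data with a small header in front carrying the timestamp, and the muxer then slices that into 188-byte TS packets together with the PAT/PMT tables. A very simplified sketch of the PES header (PTS only, video stream id; writePESHeader is just an illustration of what libmpegts or Bento4 do for you internally, not part of either API):

// Simplified sketch: prepend a PES header with a PTS to one access unit.
// Real muxers also handle DTS, TS packetization, PAT/PMT and continuity counters.
static size_t writePESHeader(uint8_t *out, uint64_t pts90kHz)
{
    uint8_t *p = out;
    *p++ = 0x00; *p++ = 0x00; *p++ = 0x01;        // packet_start_code_prefix
    *p++ = 0xE0;                                  // stream_id: first video stream
    *p++ = 0x00; *p++ = 0x00;                     // PES_packet_length: 0 = unbounded (allowed for video in TS)
    *p++ = 0x80;                                  // '10' marker bits, no scrambling/priority flags
    *p++ = 0x80;                                  // PTS_DTS_flags = '10' (PTS only)
    *p++ = 0x05;                                  // PES_header_data_length: 5 bytes of PTS follow
    *p++ = 0x21 | ((pts90kHz >> 29) & 0x0E);      // '0010' + PTS[32..30] + marker bit
    *p++ = (pts90kHz >> 22) & 0xFF;               // PTS[29..22]
    *p++ = 0x01 | ((pts90kHz >> 14) & 0xFE);      // PTS[21..15] + marker bit
    *p++ = (pts90kHz >> 7) & 0xFF;                // PTS[14..7]
    *p++ = 0x01 | ((pts90kHz << 1) & 0xFE);       // PTS[6..0] + marker bit
    return p - out;                               // Annex B / ADTS payload follows this header
}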
Hope this serves as a good starting point for you.
I will also need an AAC or PCM elementary stream from the microphone data (I assume PCM will be easier since no encoding is involved), which I don't know how to do.
Getting microphone data as AAC is very easy. Something like this:
NSError *error = nil;

// Microphone input for the capture session.
AVCaptureDevice *microphone = [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeAudio];
_audioInput = [AVCaptureDeviceInput deviceInputWithDevice:microphone error:&error];
if (_audioInput == nil) {
    NSLog(@"Couldn't open microphone %@: %@", microphone, error);
    return NO;
}

// Raw audio sample buffers are delivered to the delegate on this queue.
_audioProcessingQueue = dispatch_queue_create("audio processing queue", DISPATCH_QUEUE_SERIAL);
_audioOutput = [[AVCaptureAudioDataOutput alloc] init];
[_audioOutput setSampleBufferDelegate:self queue:_audioProcessingQueue];

// The asset writer input configured for AAC does the actual encoding.
NSDictionary *audioOutputSettings = @{
    AVFormatIDKey: @(kAudioFormatMPEG4AAC),
    AVNumberOfChannelsKey: @(1),
    AVSampleRateKey: @(44100.),
    AVEncoderBitRateKey: @(64000),
};
_audioWriterInput = [AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeAudio outputSettings:audioOutputSettings];
_audioWriterInput.expectsMediaDataInRealTime = YES;
if (![_writer canAddInput:_audioWriterInput]) {
    NSLog(@"Couldn't add audio input to writer");
    return NO;
}
[_writer addInput:_audioWriterInput];

[_captureSession addInput:_audioInput];
[_captureSession addOutput:_audioOutput];
- (void)captureOutput:(AVCaptureOutput *)captureOutput didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer fromConnection:(AVCaptureConnection *)connection
{
    // sampleBuffer holds LPCM audio from the microphone; appending it to the
    // AAC-configured writer input is what produces the encoded AAC.
    if (_audioWriterInput.readyForMoreMediaData) {
        [_audioWriterInput appendSampleBuffer:sampleBuffer];
    }
}
I am assuming that you are already using an AVCaptureSession for your camera; you can use the same capture session for the microphone.
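The snippet also assumes a _writer (an AVAssetWriter) has already been created before canAddInput: is called. A minimal, hypothetical setup might look like the following; outputURL is just a placeholder, and note that AVAssetWriter writes MP4/MOV-style files, not MPEG2-TS, so the TS muxing still has to happen separately:

// Hypothetical setup for the _writer and _captureSession the snippet assumes.
NSError *error = nil;
_captureSession = [[AVCaptureSession alloc] init];
_writer = [[AVAssetWriter alloc] initWithURL:outputURL      // placeholder URL
                                    fileType:AVFileTypeMPEG4
                                       error:&error];
if (_writer == nil) {
    NSLog(@"Couldn't create asset writer: %@", error);
}
// Remember to call startWriting and startSessionAtSourceTime: before appending buffers.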