PS: Some ideas are best started early and then improved gradually.
MediaCodec is the Android component used to access the platform's low-level codecs. It is typically used together with MediaExtractor, MediaSync, MediaMuxer, MediaCrypto, MediaDrm, Image, Surface, and AudioTrack. MediaCodec is practically standard for hardware decoding in Android players, but whether a software or hardware codec is actually used depends on how MediaCodec is configured. The following introduces MediaCodec from several angles, with the main content as follows:
- Types handled by MediaCodec
- The process of encoding and decoding with MediaCodec
- The lifecycle of MediaCodec
- Creating MediaCodec
- Initializing MediaCodec
- Data processing methods of MediaCodec
- Adaptive playback support
- Exception handling of MediaCodec
Types handled by MediaCodec#
MediaCodec supports three kinds of data: compressed data, raw audio data, and raw video data. All three can be processed through ByteBuffer, which is the "buffer" referred to below. For raw video data, a Surface can be used to improve codec performance; the raw data then cannot be accessed directly, but raw video frames can still be accessed through ImageReader, with the corresponding YUV data and other information obtained via Image.
Compressed Buffer: The decoder's input buffers and the encoder's output buffers contain compressed data of the type given by MediaFormat's KEY_MIME. For video this is usually a single compressed video frame; for audio it is typically a single encoded access unit, usually containing a few milliseconds of audio, depending on the format.
Raw Audio Buffer: A raw audio buffer contains a whole frame of PCM audio data, that is, one sample for each channel, in channel order. Each PCM audio sample is a 16-bit signed integer or a float, in native byte order. To receive floating-point PCM in a raw audio buffer, configure the format as follows:
mediaFormat.setInteger(MediaFormat.KEY_PCM_ENCODING, AudioFormat.ENCODING_PCM_FLOAT);
The method to check for floating-point PCM in MediaFormat is as follows:
static boolean isPcmFloat(MediaFormat format) {
    return format.getInteger(MediaFormat.KEY_PCM_ENCODING, AudioFormat.ENCODING_PCM_16BIT)
            == AudioFormat.ENCODING_PCM_FLOAT;
}
To extract a channel from a buffer containing 16-bit signed integer audio data, you can use the following code:
// Assumes the buffer PCM encoding is 16 bit.
short[] getSamplesForChannel(MediaCodec codec, int bufferId, int channelIx) {
    ByteBuffer outputBuffer = codec.getOutputBuffer(bufferId);
    MediaFormat format = codec.getOutputFormat(bufferId);
    ShortBuffer samples = outputBuffer.order(ByteOrder.nativeOrder()).asShortBuffer();
    int numChannels = format.getInteger(MediaFormat.KEY_CHANNEL_COUNT);
    if (channelIx < 0 || channelIx >= numChannels) {
        return null;
    }
    short[] res = new short[samples.remaining() / numChannels];
    for (int i = 0; i < res.length; ++i) {
        res[i] = samples.get(i * numChannels + channelIx);
    }
    return res;
}
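The same deinterleaving logic can be exercised outside of MediaCodec. Below is a minimal, self-contained sketch (the class and method names are illustrative, not part of the Android API) that extracts one channel from an interleaved 16-bit PCM buffer:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.ShortBuffer;
import java.util.Arrays;

public class PcmDeinterleave {
    // Extracts one channel from interleaved 16-bit PCM samples,
    // mirroring the per-channel indexing in getSamplesForChannel() above.
    static short[] extractChannel(ByteBuffer pcm, int numChannels, int channelIx) {
        if (channelIx < 0 || channelIx >= numChannels) {
            return null;
        }
        ShortBuffer samples = pcm.order(ByteOrder.nativeOrder()).asShortBuffer();
        short[] res = new short[samples.remaining() / numChannels];
        for (int i = 0; i < res.length; ++i) {
            res[i] = samples.get(i * numChannels + channelIx);
        }
        return res;
    }

    public static void main(String[] args) {
        // A stereo frame laid out as L0 R0 L1 R1 L2 R2.
        short[] interleaved = {10, 20, 11, 21, 12, 22};
        ByteBuffer buf = ByteBuffer.allocate(interleaved.length * 2).order(ByteOrder.nativeOrder());
        for (short s : interleaved) buf.putShort(s);
        buf.flip();
        System.out.println(Arrays.toString(extractChannel(buf, 2, 0))); // [10, 11, 12]
        System.out.println(Arrays.toString(extractChannel(buf, 2, 1))); // [20, 21, 22]
    }
}
```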
Raw Video Buffer: In ByteBuffer mode, the layout of a video buffer is determined by the value of MediaFormat's KEY_COLOR_FORMAT. The supported color formats can be queried through the relevant methods of MediaCodecInfo. A video codec may support three kinds of color formats:
- native raw video format: marked by the COLOR_FormatSurface constant of CodecCapabilities; it can be used with an input or output Surface.
- flexible YUV buffers: corresponding to the COLOR_FormatYUV420Flexible constant of CodecCapabilities; they can be used with an input or output Surface, and also in ByteBuffer mode via getInputImage/getOutputImage.
- other specific formats: usually only supported in ByteBuffer mode. Some color formats are vendor-specific; others are defined in CodecCapabilities.
Since Android 5.1, all video codecs support flexible YUV 4:2:0 buffers. The MediaFormat#KEY_WIDTH and MediaFormat#KEY_HEIGHT keys specify the size of the video frame, but in most cases the video occupies only part of the frame, described by a crop rectangle.
The crop rectangle of the raw output image is read from the output format using the keys below. If these keys are absent, the video occupies the entire video frame. The frame size, prior to applying any MediaFormat#KEY_ROTATION, can be calculated as follows:
MediaFormat format = decoder.getOutputFormat(…);
int width = format.getInteger(MediaFormat.KEY_WIDTH);
if (format.containsKey("crop-left") && format.containsKey("crop-right")) {
    width = format.getInteger("crop-right") + 1 - format.getInteger("crop-left");
}
int height = format.getInteger(MediaFormat.KEY_HEIGHT);
if (format.containsKey("crop-top") && format.containsKey("crop-bottom")) {
    height = format.getInteger("crop-bottom") + 1 - format.getInteger("crop-top");
}
The process of encoding and decoding with MediaCodec#
MediaCodec first obtains an empty input buffer, fills it with the data to be encoded or decoded, and submits the filled buffer to the codec for processing. Once the data has been consumed, the input buffer is released; the encoded or decoded result is then retrieved from an output buffer, which is released after use.
The corresponding APIs for each stage are as follows:
// Get the index of an available input buffer
public int dequeueInputBuffer(long timeoutUs)
// Get the input buffer
public ByteBuffer getInputBuffer(int index)
// Submit the filled input buffer to the codec's queue
public final void queueInputBuffer(int index, int offset, int size, long presentationTimeUs, int flags)
// Get the index of an encoded or decoded output buffer
public final int dequeueOutputBuffer(BufferInfo info, long timeoutUs)
// Get the output buffer
public ByteBuffer getOutputBuffer(int index)
// Release the output buffer
public final void releaseOutputBuffer(int index, boolean render)
The lifecycle of MediaCodec#
MediaCodec has three states: Executing, Stopped, and Released. The Executing and Stopped states each have three sub-states: Executing consists of Flushed, Running, and End-of-Stream, while Stopped consists of Uninitialized, Configured, and Error. The lifecycle diagrams of MediaCodec are shown below:
(Diagrams: MediaCodec lifecycle in synchronous mode and in asynchronous mode.)
As the diagrams above show, transitions between the three states are triggered by start, stop, reset, release, and so on. The lifecycle differs slightly depending on how MediaCodec processes data: in asynchronous mode, the codec enters the Running sub-state immediately after start, and if it has been flushed back to the Flushed sub-state, start must be called again before it returns to Running. The key APIs for the sub-state transitions are as follows:
- Stopped state
// Create MediaCodec to enter Uninitialized sub-state
public static MediaCodec createByCodecName (String name)
public static MediaCodec createEncoderByType (String type)
public static MediaCodec createDecoderByType (String type)
// Configure MediaCodec to enter Configured sub-state; crypto and descrambler will be explained later
public void configure(MediaFormat format, Surface surface, MediaCrypto crypto, int flags)
public void configure(MediaFormat format, @Nullable Surface surface,int flags, MediaDescrambler descrambler)
// The codec enters the Error sub-state when it encounters an error while encoding or decoding
- Executing state
// Enter Flushed sub-state immediately after start
public final void start()
// Enter Running sub-state when the first input buffer is dequeued
public int dequeueInputBuffer (long timeoutUs)
// When the input buffer is queued with the end-of-stream flag, the codec will transition to End-of-Stream sub-state
// At this point, MediaCodec will not accept other input buffers but will generate output buffers
public void queueInputBuffer (int index, int offset, int size, long presentationTimeUs, int flags)
- Released state
// Release MediaCodec to enter Released state after encoding or decoding is completed
public void release ()
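To make the sub-state transitions concrete, here is a toy model of the lifecycle as a small state machine. It is a simplification for illustration, not the framework's actual implementation, and it models the synchronous-mode behavior where start lands in the Flushed sub-state:

```java
public class CodecLifecycle {
    enum State { UNINITIALIZED, CONFIGURED, FLUSHED, RUNNING, END_OF_STREAM, ERROR, RELEASED }

    private State state = State.UNINITIALIZED; // createByCodecName() etc. start here

    void configure() {
        if (state != State.UNINITIALIZED) throw new IllegalStateException();
        state = State.CONFIGURED;
    }
    void start() {
        if (state != State.CONFIGURED && state != State.FLUSHED) throw new IllegalStateException();
        state = State.FLUSHED; // in asynchronous mode this would go straight to RUNNING
    }
    void dequeueInputBuffer() {
        if (state == State.FLUSHED) state = State.RUNNING; // first dequeued input buffer
    }
    void queueEndOfStream() { state = State.END_OF_STREAM; }
    void flush() { state = State.FLUSHED; }
    void stop() { state = State.UNINITIALIZED; }
    void release() { state = State.RELEASED; }

    State state() { return state; }

    public static void main(String[] args) {
        CodecLifecycle c = new CodecLifecycle();
        c.configure();
        c.start();
        c.dequeueInputBuffer();
        System.out.println(c.state()); // RUNNING
        c.queueEndOfStream();
        c.flush();                     // back to FLUSHED; sync mode may keep going
        c.stop();
        c.release();
        System.out.println(c.state()); // RELEASED
    }
}
```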
Creating MediaCodec#
As mentioned earlier, when creating MediaCodec, it enters the Uninitialized sub-state. The creation methods are as follows:
// Create MediaCodec
public static MediaCodec createByCodecName (String name)
public static MediaCodec createEncoderByType (String type)
public static MediaCodec createDecoderByType (String type)
When using createByCodecName, you can leverage MediaCodecList to obtain supported codecs. Below is how to obtain an encoder for a specified MIME type:
/**
 * Query the encoder for the specified MIME type
 */
fun selectCodec(mimeType: String): MediaCodecInfo? {
    val mediaCodecList = MediaCodecList(MediaCodecList.REGULAR_CODECS)
    val codecInfos = mediaCodecList.codecInfos
    for (codecInfo in codecInfos) {
        if (!codecInfo.isEncoder) continue
        val types = codecInfo.supportedTypes
        for (type in types) {
            if (type.equals(mimeType, true)) {
                return codecInfo
            }
        }
    }
    return null
}
MediaCodecList also provides methods to look up a codec for a given format:
// Get the encoder for the specified format
public String findEncoderForFormat (MediaFormat format)
// Get the decoder for the specified format
public String findDecoderForFormat (MediaFormat format)
The MediaFormat passed to these methods must not contain a frame rate. If one has been set, clear it with format.setString(MediaFormat.KEY_FRAME_RATE, null) before the lookup.
A brief note on MediaCodecList, mentioned above: it makes it easy to list all codecs supported by the current device. When creating a MediaCodec, you must choose a codec that supports the target format, that is, the selected codec must support the corresponding MediaFormat. Each codec is wrapped in a MediaCodecInfo object, which lets you inspect the codec's characteristics, such as whether it is hardware accelerated and whether it is a software or hardware codec. Commonly used methods are as follows:
// Is it software-only
public boolean isSoftwareOnly ()
// Is it provided by the Android platform (false) or by the vendor (true)
public boolean isVendor ()
// Does it support hardware acceleration
public boolean isHardwareAccelerated ()
// Is it an encoder or a decoder
public boolean isEncoder ()
// Get the supported types of the current codec
public String[] getSupportedTypes ()
// ...
Software and hardware decoding are fundamentals of audio and video development. Using MediaCodec does not automatically mean hardware decoding: whether hardware or software is used depends on the codec selected. In general, vendor-provided codecs, such as Qualcomm's (qcom), are hardware codecs, while platform-provided codecs, such as those carrying the google or android label, are software codecs. Below are some codecs from my phone (MI 10 Pro):
// Vendor codecs (typically hardware)
OMX.qcom.video.encoder.heic
OMX.qcom.video.decoder.avc
OMX.qcom.video.decoder.avc.secure
OMX.qcom.video.decoder.mpeg2
OMX.qti.video.decoder.h263sw
c2.qti.avc.decoder
...
// Platform codecs (software)
OMX.google.gsm.decoder
c2.android.aac.decoder
c2.android.aac.decoder
c2.android.aac.encoder
c2.android.aac.encoder
c2.android.amrnb.decoder
c2.android.amrnb.decoder
...
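The naming convention above can be captured as a small heuristic. Note this is only a rule of thumb derived from the names (the "sw" suffix some vendors use marks software implementations); from API 29 on, MediaCodecInfo.isHardwareAccelerated() and isSoftwareOnly() are the authoritative answers:

```java
public class CodecNameHeuristic {
    // Rough guess from codec-name prefixes: platform prefixes ("OMX.google.",
    // "c2.android.") imply software; vendor prefixes usually imply hardware,
    // except names ending in "sw", which vendors use for software codecs.
    static boolean looksLikeSoftware(String name) {
        String n = name.toLowerCase();
        return n.startsWith("omx.google.")
                || n.startsWith("c2.android.")
                || n.endsWith("sw");
    }

    public static void main(String[] args) {
        System.out.println(looksLikeSoftware("OMX.qcom.video.decoder.avc"));   // false
        System.out.println(looksLikeSoftware("c2.android.aac.decoder"));       // true
        System.out.println(looksLikeSoftware("OMX.qti.video.decoder.h263sw")); // true
    }
}
```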
Initializing MediaCodec#
After creating MediaCodec, it enters the Uninitialized sub-state. At this point, some settings need to be made, such as specifying MediaFormat. If using asynchronous data processing, the MediaCodec.Callback must be set before configure. The key APIs are as follows:
// 1. MediaFormat
// Create MediaFormat
public static final MediaFormat createVideoFormat(String mime,int width,int height)
// Enable or disable features; see MediaCodecInfo.CodecCapabilities for details
public void setFeatureEnabled(@NonNull String feature, boolean enabled)
// Parameter settings
public final void setInteger(String name, int value)
// 2. setCallback
// If using asynchronous data processing, set MediaCodec.Callback before configure
public void setCallback (MediaCodec.Callback cb)
public void setCallback (MediaCodec.Callback cb, Handler handler)
// 3. Configuration
public void configure(MediaFormat format, Surface surface, MediaCrypto crypto, int flags)
public void configure(MediaFormat format, @Nullable Surface surface,int flags, MediaDescrambler descrambler)
The configure method above takes several parameters: surface is the Surface the decoder will render to, and flags specifies whether the codec is configured as an encoder or a decoder. crypto and descrambler relate to content decryption: for example, some premium videos require specific keys to decode, and the content is only decrypted after the user's login is verified; otherwise, pay-to-watch videos could simply be downloaded and redistributed. More details can be found under digital rights management (DRM) for audio and video.
In addition, certain formats, such as AAC audio and the MPEG-4, H.264, and H.265 video formats, contain codec-specific data needed to initialize MediaCodec. When decoding these compressed formats, this data must be submitted to MediaCodec after start and before any frame data, by marking the corresponding queueInputBuffer call with the BUFFER_FLAG_CODEC_CONFIG flag. Codec-specific data can also be set in MediaFormat using a ByteBuffer, as follows:
// csd-0, csd-1, csd-2 similarly
val bytes = byteArrayOf(0x00.toByte(), 0x01.toByte())
mediaFormat.setByteBuffer("csd-0", ByteBuffer.wrap(bytes))
Keys such as csd-0 and csd-1 can be obtained from the MediaFormat returned by MediaExtractor#getTrackFormat. Codec-specific data set this way is submitted to MediaCodec automatically during start, so there is no need to submit it directly. However, if flush is called before any output buffer or output format change has been returned, the submitted codec-specific data is lost, and it must then be re-submitted via queueInputBuffer with the BUFFER_FLAG_CODEC_CONFIG flag.
Android defines per-format codec-specific data buffers (csd-0, csd-1, csd-2); to configure a MediaMuxer track correctly, they must also be set on the track format. Parameter sets, and the codec-specific-data sections marked with (*) in the official table, must begin with the "\x00\x00\x00\x01" start code.
When an encoder emits this information, it marks the corresponding output buffer with BUFFER_FLAG_CODEC_CONFIG; such a buffer contains codec-specific data, not media data.
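As a concrete example of what csd-0 holds, the two-byte AAC AudioSpecificConfig can be computed from the profile, sample rate, and channel count. This sketch covers only the plain two-byte case (no extensions), with class and method names chosen for illustration:

```java
public class AacCsd {
    // Sampling frequency index table from the AAC specification.
    static final int[] SAMPLE_RATES = {
            96000, 88200, 64000, 48000, 44100, 32000, 24000,
            22050, 16000, 12000, 11025, 8000, 7350
    };

    // Builds the 2-byte AudioSpecificConfig used as csd-0 for plain AAC:
    // 5 bits audioObjectType | 4 bits samplingFrequencyIndex | 4 bits channelConfiguration | 3 zero bits
    static byte[] buildCsd0(int audioObjectType, int sampleRate, int channelCount) {
        int freqIdx = -1;
        for (int i = 0; i < SAMPLE_RATES.length; i++) {
            if (SAMPLE_RATES[i] == sampleRate) { freqIdx = i; break; }
        }
        if (freqIdx < 0) throw new IllegalArgumentException("unsupported sample rate");
        byte[] csd = new byte[2];
        csd[0] = (byte) ((audioObjectType << 3) | (freqIdx >> 1));
        csd[1] = (byte) (((freqIdx & 1) << 7) | (channelCount << 3));
        return csd;
    }

    public static void main(String[] args) {
        byte[] csd0 = buildCsd0(2, 44100, 2); // AAC-LC, 44.1 kHz, stereo
        System.out.printf("0x%02X 0x%02X%n", csd0[0], csd0[1]); // 0x12 0x10
    }
}
```

The resulting bytes would be wrapped with ByteBuffer.wrap and set as "csd-0" on the MediaFormat, as shown above.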
Data processing methods of MediaCodec#
Each created codec maintains a set of input and output buffers, and data can be processed either synchronously or asynchronously. The API differs by version: since API 21, that is, Android 5.0, the per-buffer ByteBuffer methods are recommended; before that, only the ByteBuffer-array methods (getInputBuffers/getOutputBuffers) were available.
Data processing with MediaCodec mainly involves obtaining input and output buffers, submitting data to the codec, and releasing output buffers. The key APIs, which differ between synchronous and asynchronous modes, are as follows:
// Get input buffer (synchronous)
public int dequeueInputBuffer (long timeoutUs)
public ByteBuffer getInputBuffer (int index)
// Get output buffer (synchronous)
public int dequeueOutputBuffer (MediaCodec.BufferInfo info, long timeoutUs)
public ByteBuffer getOutputBuffer (int index)
// In asynchronous mode, input and output buffer indices are delivered through MediaCodec.Callback callbacks
public void setCallback (MediaCodec.Callback cb)
public void setCallback (MediaCodec.Callback cb, Handler handler)
// Submit data
public void queueInputBuffer (int index, int offset, int size, long presentationTimeUs, int flags)
public void queueSecureInputBuffer (int index, int offset, MediaCodec.CryptoInfo info, long presentationTimeUs, int flags)
// Release output buffer
public void releaseOutputBuffer (int index, boolean render)
public void releaseOutputBuffer (int index, long renderTimestampNs)
The following describes the ByteBuffer methods available since Android 5.0.
As of Android 5.0, the ByteBuffer-array methods are deprecated; the official documentation notes that the per-buffer ByteBuffer methods are more optimized. So, when the device allows it, prefer the ByteBuffer APIs and use asynchronous mode for data processing. Reference code for the synchronous and asynchronous processing modes follows:
- Synchronous processing mode
MediaCodec codec = MediaCodec.createByCodecName(name);
codec.configure(format, …);
MediaFormat outputFormat = codec.getOutputFormat(); // option B
codec.start();
for (;;) {
    int inputBufferId = codec.dequeueInputBuffer(timeoutUs);
    if (inputBufferId >= 0) {
        ByteBuffer inputBuffer = codec.getInputBuffer(inputBufferId);
        // Fill the input buffer with valid data
        …
        codec.queueInputBuffer(inputBufferId, …);
    }
    int outputBufferId = codec.dequeueOutputBuffer(…);
    if (outputBufferId >= 0) {
        ByteBuffer outputBuffer = codec.getOutputBuffer(outputBufferId);
        MediaFormat bufferFormat = codec.getOutputFormat(outputBufferId); // option A
        // bufferFormat is identical to outputFormat
        // Process or render the output buffer once it is ready
        …
        codec.releaseOutputBuffer(outputBufferId, …);
    } else if (outputBufferId == MediaCodec.INFO_OUTPUT_FORMAT_CHANGED) {
        // The output format has changed; subsequent data follows the new format.
        // If getOutputFormat(outputBufferId) is used per buffer, there is no need
        // to watch for this change.
        outputFormat = codec.getOutputFormat(); // option B
    }
}
codec.stop();
codec.release();
For specific examples, refer to the previous article: Camera2, MediaCodec recording mp4.
- Asynchronous processing mode
MediaCodec codec = MediaCodec.createByCodecName(name);
MediaFormat mOutputFormat; // member variable
codec.setCallback(new MediaCodec.Callback() {
    @Override
    void onInputBufferAvailable(MediaCodec mc, int inputBufferId) {
        ByteBuffer inputBuffer = codec.getInputBuffer(inputBufferId);
        // Fill inputBuffer with valid data
        …
        codec.queueInputBuffer(inputBufferId, …);
    }

    @Override
    void onOutputBufferAvailable(MediaCodec mc, int outputBufferId, …) {
        ByteBuffer outputBuffer = codec.getOutputBuffer(outputBufferId);
        MediaFormat bufferFormat = codec.getOutputFormat(outputBufferId); // option A
        // bufferFormat is equivalent to mOutputFormat
        // outputBuffer is ready to be processed or rendered.
        …
        codec.releaseOutputBuffer(outputBufferId, …);
    }

    @Override
    void onOutputFormatChanged(MediaCodec mc, MediaFormat format) {
        // Subsequent data will conform to the new format.
        // Can be ignored if getOutputFormat(outputBufferId) is used per buffer.
        mOutputFormat = format; // option B
    }

    @Override
    void onError(…) {
        …
    }
});
codec.configure(format, …);
mOutputFormat = codec.getOutputFormat(); // option B
codec.start();
// wait for processing to complete
codec.stop();
codec.release();
When the data to be processed ends (end of stream), the end must be marked. Either set flags to BUFFER_FLAG_END_OF_STREAM when submitting the last valid input buffer, or submit an extra empty input buffer with the end-of-stream flag set after the last valid one. After that, no more input buffers can be submitted unless the codec is flushed, stopped, or restarted. Output buffers continue to be returned until the end of stream is finally signaled on the output side, via dequeueOutputBuffer or the BufferInfo passed to Callback#onOutputBufferAvailable carrying the same end-of-stream flag.
If an input Surface is used as the codec's input, there are no accessible input buffers: buffers are submitted to the codec automatically from the Surface, effectively skipping the manual input step. The input Surface is created with createInputSurface. Calling signalEndOfInputStream sends the end-of-stream signal, after which the Surface immediately stops submitting data to the codec. The key APIs are as follows:
// Create input Surface, must be called after configure and before start
public Surface createInputSurface ()
// Set input Surface
public void setInputSurface (Surface surface)
// Send end-of-stream signal
public void signalEndOfInputStream ()
Similarly, if an output Surface is used, the related functionalities of the output buffers will be replaced. You can set a Surface as the codec's output using setOutputSurface, and choose whether to render each output buffer on the output Surface. The key APIs are as follows:
// Set output Surface
public void setOutputSurface (Surface surface)
// false means do not render this buffer, true means render this buffer with the default timestamp
public void releaseOutputBuffer (int index, boolean render)
// Render this buffer with the specified timestamp
public void releaseOutputBuffer (int index, long renderTimestampNs)
Adaptive playback support#
When MediaCodec is used as a video decoder, you can check whether the decoder supports adaptive playback, that is, whether the decoder supports seamless resolution changes, as follows:
// Check if a feature is supported, CodecCapabilities#FEATURE_AdaptivePlayback corresponds to adaptive playback support
public boolean isFeatureSupported (String name)
The adaptive playback feature is only activated when the decoder is configured to decode onto a Surface. During video decoding, after start or flush, only key frames can be decoded independently; these are usually called I-frames, and the remaining frames are decoded relative to them. For H.264/H.265, the independently decodable frames are IDR frames; for VP8/VP9, they are keyframes.
Different decoders have different capabilities for supporting adaptive playback, and the processing after seek operations is also different. This part will be organized after further practical experience.
Exception handling of MediaCodec#
As for exception handling when using MediaCodec, the main case is CodecException, which is generally caused by an internal codec error such as corrupt media content, hardware failure, or resource exhaustion. The following methods help decide how to proceed:
// true means it can be recovered through stop, configure, start
public boolean isRecoverable ()
// true means a temporary issue, encoding or decoding operations will be retried later
public boolean isTransient ()
If both isRecoverable and isTransient return false, the codec must be reset, or released and re-created, before trying again. The two methods never both return true.
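The recommended handling can be summarized as a small decision function. The Action values here are made up for illustration; the real exception type is MediaCodec.CodecException:

```java
public class CodecErrorPolicy {
    enum Action { RETRY_LATER, STOP_CONFIGURE_START, RESET_OR_RELEASE }

    // Maps the two CodecException flags to the documented recovery path.
    // Per the documentation, isTransient() and isRecoverable() are never both true.
    static Action decide(boolean isTransient, boolean isRecoverable) {
        if (isTransient) return Action.RETRY_LATER;            // temporary: retry the operation later
        if (isRecoverable) return Action.STOP_CONFIGURE_START; // recover via stop, configure, start
        return Action.RESET_OR_RELEASE;                        // reset, or release and re-create
    }

    public static void main(String[] args) {
        System.out.println(decide(true, false));  // RETRY_LATER
        System.out.println(decide(false, true));  // STOP_CONFIGURE_START
        System.out.println(decide(false, false)); // RESET_OR_RELEASE
    }
}
```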