Collecting audio data with AudioRecord and muxing it into MP4

This article introduces the use of AudioRecord in Android audio and video development. The example builds on the previous article, which used MediaCodec to record MP4, and adds AudioRecord so that the captured audio data is muxed into the MP4 file.

The main content of this article is as follows:

  1. Introduction to AudioRecord
  2. Lifecycle of AudioRecord
  3. Reading audio data from AudioRecord
  4. Direct buffer and byte order (optional)
  5. Using AudioRecord

Introduction to AudioRecord#

AudioRecord is the audio recording tool in Android for capturing audio from the hardware device; the application pulls the audio data from it. It is generally used to obtain raw audio data in PCM format and allows the audio to be processed in real time while recording.

The parameters and descriptions for creating AudioRecord are as follows:

// Create AudioRecord
public AudioRecord (int audioSource, 
                int sampleRateInHz, 
                int channelConfig, 
                int audioFormat, 
                int bufferSizeInBytes)
  • audioSource: Represents the audio source, defined in MediaRecorder.AudioSource, such as the main microphone, MediaRecorder.AudioSource.MIC.
  • sampleRateInHz: Represents the sampling rate in hertz, meaning the number of samples per second for each channel. Among common sampling rates, only 44100Hz can ensure normal use on all devices. The actual sampling rate can be obtained through getSampleRate. This sampling rate is not the playback sampling rate of the audio content; for example, audio with a sampling rate of 8000Hz can be played on a device with a sampling rate of 48000Hz, and the corresponding platform will automatically handle the sampling rate conversion, so it will not play at 6 times the speed.
  • channelConfig: Represents the number of channels, defined in AudioFormat. Among common channels, only mono AudioFormat.CHANNEL_IN_MONO can ensure normal use on all devices. Others, such as AudioFormat.CHANNEL_IN_STEREO, represent dual channels, i.e., stereo.
  • audioFormat: Represents the format of the audio data returned by AudioRecord. For linear PCM, it reflects the size of each sample (8, 16, 32 bits) and the representation (integer, floating-point). Audio formats are defined in AudioFormat, and among common audio data formats, only AudioFormat.ENCODING_PCM_16BIT can ensure normal use on all devices. Formats like AudioFormat.ENCODING_PCM_8BIT cannot guarantee normal use on all devices.
  • bufferSizeInBytes: Represents the size of the buffer for writing audio data. This value cannot be less than the size of getMinBufferSize, which is the minimum buffer size required by AudioRecord. Otherwise, it will lead to AudioRecord initialization failure. This buffer size does not guarantee smooth recording under load; a larger value can be chosen if necessary.
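
Putting the parameters together, a minimal creation sketch might look like the following (a sketch, assuming the RECORD_AUDIO permission has already been granted; the variable names are illustrative):

// 44100Hz / mono / 16-bit is the combination guaranteed to work on all devices
val sampleRate = 44100
val channelConfig = AudioFormat.CHANNEL_IN_MONO
val audioFormat = AudioFormat.ENCODING_PCM_16BIT
// Minimum buffer size required by AudioRecord; a negative value indicates invalid parameters
val minBufferSize = AudioRecord.getMinBufferSize(sampleRate, channelConfig, audioFormat)
// Use a buffer larger than the minimum for more headroom under load
val audioRecord = AudioRecord(
    MediaRecorder.AudioSource.MIC,
    sampleRate,
    channelConfig,
    audioFormat,
    minBufferSize * 2
)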

Lifecycle of AudioRecord#

The lifecycle states of AudioRecord are STATE_UNINITIALIZED, STATE_INITIALIZED, RECORDSTATE_RECORDING, and RECORDSTATE_STOPPED, corresponding to uninitialized, initialized, recording, and stopped. The transitions are summarized below:

A brief explanation:

  1. Before creation or after release, AudioRecord enters the STATE_UNINITIALIZED state.
  2. When creating AudioRecord, it enters the STATE_INITIALIZED state.
  3. Calling startRecording enters the RECORDSTATE_RECORDING state.
  4. Calling stop enters the RECORDSTATE_STOPPED state.

How do you obtain the state of AudioRecord? Use getState and getRecordingState. To ensure correct usage, check the state before operating on the AudioRecord object.
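
For example, a defensive sketch of the whole lifecycle, using the audioRecord created above, could look like this:

// Only start recording if initialization succeeded
if (audioRecord.state == AudioRecord.STATE_INITIALIZED) {
    audioRecord.startRecording()          // -> RECORDSTATE_RECORDING
}
// ... read audio data on a worker thread ...
if (audioRecord.recordingState == AudioRecord.RECORDSTATE_RECORDING) {
    audioRecord.stop()                    // -> RECORDSTATE_STOPPED
}
audioRecord.release()                     // -> STATE_UNINITIALIZED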

Reading audio data from AudioRecord#

AudioRecord provides three ways to read audio data, as follows:

// 1. Read audio data, audio format is AudioFormat#ENCODING_PCM_8BIT
int read(@NonNull byte[] audioData, int offsetInBytes, int sizeInBytes)
// 2. Read audio data, audio format is AudioFormat#ENCODING_PCM_16BIT
int read(@NonNull short[] audioData, int offsetInShorts, int sizeInShorts)
// 3. Read audio data into a ByteBuffer (must be a direct buffer, see below)
int read(@NonNull ByteBuffer audioBuffer, int sizeInBytes)

On success, the return value is the number of bytes (or shorts) read, which is greater than or equal to 0. Common error codes returned when reading audio data are as follows:

  1. ERROR_INVALID_OPERATION: Indicates that AudioRecord is not properly initialized.
  2. ERROR_BAD_VALUE: Indicates that the parameters are invalid.
  3. ERROR_DEAD_OBJECT: Indicates that the object is no longer valid and needs to be recreated; if some audio data was already transferred when the failure occurred, the error code is not returned by the current call but by the next read.

The above three read functions read audio data from hardware audio devices. The main difference between the first two is the audio format, which is 8-bit and 16-bit, corresponding to quantization levels of 2^8 and 2^16.

The third read function records audio data into a direct buffer (DirectBuffer). If the passed buffer is not a direct buffer, the call will keep returning 0, so the audioBuffer parameter must be a DirectBuffer; otherwise audio data cannot be read correctly. Note that the buffer's position is not advanced by the read, the format of the data in the buffer depends on the format specified when creating the AudioRecord, and the bytes are stored in native byte order.
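
A minimal sketch of the third overload, reusing the audioRecord and bufferSize from the examples above:

// The buffer must be allocated with allocateDirect, otherwise read keeps returning 0
val audioBuffer = ByteBuffer.allocateDirect(bufferSize)
val result = audioRecord.read(audioBuffer, bufferSize)
if (result > 0) {
    // The buffer's position is not advanced by read, so it still points at the start of the data
    val data = ByteArray(result)
    audioBuffer.get(data, 0, result)
    audioBuffer.clear()                   // reset position before the next read
}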

Direct buffer and byte order#

The two concepts of direct buffer and byte order have been mentioned above; here is a brief explanation:

Direct buffer#

DirectBuffer is part of NIO. Here is a brief look at some differences between a normal buffer and a direct buffer.

  • Normal buffer
ByteBuffer buf = ByteBuffer.allocate(1024);
public static ByteBuffer allocate(int capacity) {
    if (capacity < 0)
        throw new IllegalArgumentException();
    return new HeapByteBuffer(capacity, capacity);
}

As can be seen, a normal buffer is allocated on the JVM heap and managed by the JVM, which means it can be garbage collected when appropriate. Garbage collection involves copying and compacting memory, which affects performance to some extent.

  • Direct buffer
ByteBuffer buf = ByteBuffer.allocateDirect(1024);
public static ByteBuffer allocateDirect(int capacity) {
    // Android-changed: Android's DirectByteBuffers carry a MemoryRef.
    // return new DirectByteBuffer(capacity);
    DirectByteBuffer.MemoryRef memoryRef = new DirectByteBuffer.MemoryRef(capacity);
    return new DirectByteBuffer(capacity, memoryRef);
}

The above is the Android implementation of DirectBuffer, which is allocated in native memory outside the garbage-collected heap. Allocating and releasing this kind of buffer is relatively expensive, so it is generally used for large, long-lived buffers, where it can bring a noticeable performance improvement. Whether a buffer is direct can be checked with isDirect.

Byte order#

Byte order refers to the order in which the bytes of a multi-byte value are stored in memory. There are two main byte orders, BIG-ENDIAN and LITTLE-ENDIAN, commonly referred to as network byte order and native byte order respectively:

  • Native byte order usually means LITTLE-ENDIAN (little-endian, low byte first): the low-order bytes are placed at the lower addresses in memory and the high-order bytes at the higher addresses, which is the opposite of network byte order.
  • Network byte order generally refers to the byte order used in the TCP/IP protocol, as the TCP/IP protocol defines the byte order as BIG-ENDIAN, so network byte order generally refers to BIG-ENDIAN.
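
For example, when interpreting 16-bit PCM samples that AudioRecord has written in native byte order, the ByteBuffer (which defaults to BIG_ENDIAN) should be switched to native order first. A sketch, where pcmBuffer and bufferSize are illustrative names:

// ByteBuffer defaults to BIG_ENDIAN; AudioRecord writes samples in native order
val pcmBuffer = ByteBuffer.allocateDirect(bufferSize).order(ByteOrder.nativeOrder())
// On most Android devices nativeOrder() returns LITTLE_ENDIAN
val firstSample: Short = pcmBuffer.getShort(0)   // a 16-bit PCM sample, read in native order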

Using AudioRecord#

In the previous article, Camera2 and MediaCodec were used to record MP4 with video only, focusing on the use of MediaCodec. Here, audio recording with AudioRecord is added on top of the video recording and muxed into the MP4 file. The key steps are as follows:

  1. Start a dedicated thread that uses AudioRecord to read audio data from the hardware; using a separate thread avoids blocking and stutter. The complete code is linked at the end of the article (see AudioEncode2). The reading thread is as follows:
/**
 * Audio reading Runnable
 */
class RecordRunnable : Runnable{
    override fun run() {
        val byteArray = ByteArray(bufferSize)
        // Recording state -1 indicates default state, 1 indicates recording state, 0 indicates stop recording
        while (recording == 1){
            val result = mAudioRecord.read(byteArray, 0, bufferSize)
            if (result > 0){
                val resultArray = ByteArray(result)
                System.arraycopy(byteArray, 0, resultArray, 0, result)
                quene.offer(resultArray)
            }
        }
        // Custom end of stream data
        if (recording == 0){
            val stopArray = byteArrayOf((-100).toByte())
            quene.offer(stopArray)
        }
    }
}

If only AudioRecord is used to record audio, the data read here can simply be written to a file.
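
For illustration, here is a minimal sketch of consuming the queue and writing the raw PCM to a file, assuming quene is a LinkedBlockingQueue<ByteArray> as in the project; the file path is illustrative:

FileOutputStream(File(filesDir, "record.pcm")).use { fos ->
    while (true) {
        val data = quene.take()           // blocks until the reading thread offers data
        // The single byte -100 is the custom end-of-stream marker from RecordRunnable
        if (data.size == 1 && data[0] == (-100).toByte()) break
        fos.write(data)
    }
}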

  2. To mux the captured audio data into the MP4 file, it must first be encoded to AAC. The audio encoder is configured as follows:
// Audio data encoder configuration
private fun initAudioCodec() {
    L.i(TAG, "init Codec start")
    try {
        val mediaFormat =
            MediaFormat.createAudioFormat(
                MediaFormat.MIMETYPE_AUDIO_AAC,
                RecordConfig.SAMPLE_RATE,
                2
            )
        mAudioCodec = MediaCodec.createEncoderByType(MediaFormat.MIMETYPE_AUDIO_AAC)
        mediaFormat.setInteger(MediaFormat.KEY_BIT_RATE, 96000)
        mediaFormat.setInteger(
            MediaFormat.KEY_AAC_PROFILE,
            MediaCodecInfo.CodecProfileLevel.AACObjectLC
        )
        mediaFormat.setInteger(MediaFormat.KEY_MAX_INPUT_SIZE, 8192)
        mAudioCodec.setCallback(this)
        mAudioCodec.configure(mediaFormat, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE)
    } catch (e: Exception) {
        L.i(TAG, "init error:${e.message}")
    }
    L.i(TAG, "init Codec end")
}

For details on encoding, refer to the previous two articles in this series.

Audio encoding here uses MediaCodec in asynchronous mode. The full code is not pasted here, but take care to check the relevant conditions when filling and releasing buffers: if input buffers are not released in time, no input buffer will be available and audio encoding will fail. The end of the stream also needs to be handled.
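
To make this concrete, here is a rough sketch of the input side of the asynchronous callback, not the full implementation; presentationTimeUs() is a hypothetical helper for computing the timestamp:

override fun onInputBufferAvailable(codec: MediaCodec, index: Int) {
    // Simplified: take whatever PCM data is queued (may be empty) and feed it to the encoder
    val data = quene.poll() ?: ByteArray(0)
    val inputBuffer = codec.getInputBuffer(index) ?: return
    inputBuffer.clear()
    if (data.size == 1 && data[0] == (-100).toByte()) {
        // The custom end-of-stream marker from the reading thread
        codec.queueInputBuffer(index, 0, 0, presentationTimeUs(), MediaCodec.BUFFER_FLAG_END_OF_STREAM)
    } else {
        inputBuffer.put(data)
        codec.queueInputBuffer(index, 0, data.size, presentationTimeUs(), 0)
    }
}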

  3. File muxing uses MediaMuxer. Before starting the MediaMuxer, make sure that both the video track and the audio track have been added:
override fun onOutputFormatChanged(codec: MediaCodec, format: MediaFormat) {
    L.i(TAG, "onOutputFormatChanged format:${format}")
    // Add audio track
    addAudioTrack(format)
    // Start MediaMuxer only if both audio and video tracks are added
    if (RecordConfig.videoTrackIndex != -1) {
        mAudioMuxer.start()
        RecordConfig.isMuxerStart = true
        L.i(TAG, "onOutputFormatChanged isMuxerStart:${RecordConfig.isMuxerStart}")
    }
}
// Add audio track
private fun addAudioTrack(format: MediaFormat) {
    L.i(TAG, "addAudioTrack format:${format}")
    RecordConfig.audioTrackIndex = mAudioMuxer.addTrack(format)
    RecordConfig.isAddAudioTrack = true
}
// ...
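
On the output side, the encoded AAC frames are written to the muxer once it has started. A simplified sketch, reusing the names shown above (mAudioMuxer, RecordConfig):

override fun onOutputBufferAvailable(codec: MediaCodec, index: Int, info: MediaCodec.BufferInfo) {
    val outputBuffer = codec.getOutputBuffer(index) ?: return
    // Only write once MediaMuxer has started, skipping codec-config buffers and empty buffers
    if (RecordConfig.isMuxerStart && info.size > 0
        && (info.flags and MediaCodec.BUFFER_FLAG_CODEC_CONFIG) == 0) {
        mAudioMuxer.writeSampleData(RecordConfig.audioTrackIndex, outputBuffer, info)
    }
    codec.releaseOutputBuffer(index, false)
    if ((info.flags and MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0) {
        // Stop and release the muxer elsewhere once both audio and video reach end of stream
    }
}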

The usage of AudioRecord is basically as described above.
