This article introduces the use of AudioRecord in Android audio and video development. The example builds on the previous article's MediaCodec recording of MP4, using AudioRecord to capture audio and synthesize the audio data into the MP4 file. It is part of a series of articles on Android audio and video development.
The main content of this article is as follows:
- Introduction to AudioRecord
- Lifecycle of AudioRecord
- Reading audio data from AudioRecord
- Direct buffer and byte order (optional)
- Using AudioRecord
Introduction to AudioRecord#
AudioRecord is Android's tool for recording audio: it captures audio from the hardware device, with the application pulling the audio data out of it. It is generally used to obtain raw audio data in PCM format, and it allows the audio data to be processed in real time while recording.
The parameters and descriptions for creating AudioRecord
are as follows:
// Create AudioRecord
public AudioRecord (int audioSource,
int sampleRateInHz,
int channelConfig,
int audioFormat,
int bufferSizeInBytes)
- audioSource: the audio source, defined in MediaRecorder.AudioSource, such as the common main-microphone source MediaRecorder.AudioSource.MIC.
- sampleRateInHz: the sampling rate in hertz, i.e. the number of samples per second for each channel. Among the common sampling rates, only 44100Hz is guaranteed to work on all devices; the actual sampling rate can be obtained through getSampleRate. This is not the playback sampling rate of the content: audio sampled at 8000Hz can be played on a device running at 48000Hz, and the platform automatically handles the sample-rate conversion, so it will not play at 6 times the speed.
- channelConfig: the channel configuration, defined in AudioFormat. Among the common configurations, only mono AudioFormat.CHANNEL_IN_MONO is guaranteed to work on all devices; others, such as AudioFormat.CHANNEL_IN_STEREO, represent dual channels, i.e. stereo.
- audioFormat: the format of the audio data returned by AudioRecord. For linear PCM it reflects the size of each sample (8, 16, or 32 bits) and its representation (integer or floating point). Audio formats are defined in AudioFormat; among the common formats, only AudioFormat.ENCODING_PCM_16BIT is guaranteed to work on all devices, while formats like AudioFormat.ENCODING_PCM_8BIT are not.
- bufferSizeInBytes: the size of the buffer that audio data is written into. This value must not be smaller than getMinBufferSize, the minimum buffer size required by AudioRecord, otherwise AudioRecord initialization fails. This buffer size does not guarantee smooth recording under load; a larger value can be chosen if necessary. A minimal creation sketch follows this list.
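Putting the parameters together, here is a minimal creation sketch following the "guaranteed on all devices" notes above. The variable names and the doubled buffer size are illustrative, not from the original project, and recording requires the RECORD_AUDIO permission:
import android.media.AudioFormat
import android.media.AudioRecord
import android.media.MediaRecorder

val sampleRate = 44100
val channelConfig = AudioFormat.CHANNEL_IN_MONO
val audioFormat = AudioFormat.ENCODING_PCM_16BIT

// The minimum buffer AudioRecord requires; anything smaller fails initialization
val minBufferSize = AudioRecord.getMinBufferSize(sampleRate, channelConfig, audioFormat)
// Use a multiple of the minimum to keep recording smooth under load
val bufferSize = minBufferSize * 2

val audioRecord = AudioRecord(
    MediaRecorder.AudioSource.MIC, // main microphone
    sampleRate,
    channelConfig,
    audioFormat,
    bufferSize
)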
Lifecycle of AudioRecord#
The lifecycle states of AudioRecord are STATE_UNINITIALIZED, STATE_INITIALIZED, RECORDSTATE_RECORDING, and RECORDSTATE_STOPPED, corresponding to uninitialized, initialized, recording, and stopped, as shown in the diagram below:
A brief explanation:
- Before creation or after release, AudioRecord is in the STATE_UNINITIALIZED state.
- When AudioRecord is created successfully, it enters the STATE_INITIALIZED state.
- Calling startRecording moves it to the RECORDSTATE_RECORDING state.
- Calling stop moves it to the RECORDSTATE_STOPPED state.
So how do you obtain the state of AudioRecord? The lifecycle state and the recording state can be queried through getState and getRecordingState respectively. To ensure correct usage, check the state before operating on the AudioRecord object.
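As a small sketch of such checks, continuing with the audioRecord created above:
// getState: whether creation succeeded; getRecordingState: whether currently recording
if (audioRecord.state == AudioRecord.STATE_INITIALIZED) {
    audioRecord.startRecording()
}
// ... later, when recording should end ...
if (audioRecord.recordingState == AudioRecord.RECORDSTATE_RECORDING) {
    audioRecord.stop()
}
audioRecord.release() // back to STATE_UNINITIALIZED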
Reading audio data from AudioRecord#
AudioRecord
provides three ways to read audio data, as follows:
// 1. Read audio data, audio format is AudioFormat#ENCODING_PCM_8BIT
int read(@NonNull byte[] audioData, int offsetInBytes, int sizeInBytes)
// 2. Read audio data, audio format is AudioFormat#ENCODING_PCM_16BIT
int read(@NonNull short[] audioData, int offsetInShorts, int sizeInShorts)
// 3. Read audio data, see later chapters
int read(@NonNull ByteBuffer audioBuffer, int sizeInBytes)
A successful read returns the number of units read, which is greater than or equal to zero. The common error codes are:
- ERROR_INVALID_OPERATION: the AudioRecord is not properly initialized.
- ERROR_BAD_VALUE: the parameters are invalid, i.e. they do not describe a valid region of the array or buffer.
- ERROR_DEAD_OBJECT: the object is no longer valid and needs to be recreated. Note that this code is not returned when some audio data has already been transferred by the call; in that case it is returned by the next read.
All three read functions read audio data from the hardware audio device. The main difference between the first two is the audio format, 8-bit versus 16-bit, corresponding to 2^8 and 2^16 quantization levels respectively.
The third read function records audio data into a direct buffer (DirectBuffer). If the buffer passed in is not a DirectBuffer, the call will continuously return 0, so when using this variant the audioBuffer parameter must be a DirectBuffer, otherwise no audio data can be read. The buffer's position remains unchanged after the call; the format of the data in the buffer is the format specified for the AudioRecord, and the bytes are stored in native byte order.
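A short sketch of this direct-buffer variant, reusing the audioRecord and bufferSize from the creation sketch above; setting the buffer's order to the native byte order so multi-byte samples can be interpreted correctly:
import java.nio.ByteBuffer
import java.nio.ByteOrder

val audioBuffer = ByteBuffer.allocateDirect(bufferSize).order(ByteOrder.nativeOrder())
check(audioBuffer.isDirect) // a non-direct buffer would make read() keep returning 0

val result = audioRecord.read(audioBuffer, bufferSize)
if (result > 0) {
    // position is unchanged by read(), so the PCM bytes start at index 0
    val chunk = ByteArray(result)
    audioBuffer.get(chunk, 0, result)
    audioBuffer.rewind() // ready for the next read
}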
Direct buffer and byte order#
The two concepts of direct buffer and byte order have been mentioned above; here is a brief explanation:
Direct buffer#
DirectBuffer
is part of NIO. Here is a brief look at some differences between a normal buffer and a direct buffer.
- Normal buffer
ByteBuffer buf = ByteBuffer.allocate(1024);
public static ByteBuffer allocate(int capacity) {
if (capacity < 0)
throw new IllegalArgumentException();
return new HeapByteBuffer(capacity, capacity);
}
As can be seen, a normal buffer is allocated on the heap and managed by the JVM, which means it can be garbage collected at an appropriate time. Garbage collection involves memory compaction (moving objects), which can affect performance to some extent.
- Direct buffer
ByteBuffer buf = ByteBuffer.allocateDirect(1024);
public static ByteBuffer allocateDirect(int capacity) {
// Android-changed: Android's DirectByteBuffers carry a MemoryRef.
// return new DirectByteBuffer(capacity);
DirectByteBuffer.MemoryRef memoryRef = new DirectByteBuffer.MemoryRef(capacity);
return new DirectByteBuffer(capacity, memoryRef);
}
The above is the implementation of DirectBuffer in Android: the buffer is allocated in native memory. Obtaining and releasing this kind of buffer is relatively expensive, but it resides outside the garbage-collected heap, so it is generally used for large, long-lived buffers, where it can bring significant performance improvements. Whether a buffer is a DirectBuffer can be determined with isDirect.
Byte order#
Byte order refers to the way the bytes of a multi-byte value are stored in memory. There are two byte orders, BIG-ENDIAN and LITTLE-ENDIAN, commonly referred to as network byte order and native byte order respectively:
- Native byte order here is LITTLE-ENDIAN (little-endian): the low-order byte is placed at the lower memory address and the high-order byte at the higher address. This is the host order of common Android hardware (ARM and x86 are little-endian), the opposite of network byte order.
- Network byte order generally refers to the byte order used by the TCP/IP protocol suite, which defines it as BIG-ENDIAN, so network byte order generally means BIG-ENDIAN.
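A quick illustration of the difference, storing the same 16-bit value under both orders:
import java.nio.ByteBuffer
import java.nio.ByteOrder

val value: Short = 0x1234

// BIG-ENDIAN: high-order byte at the lower address -> [0x12, 0x34]
val big = ByteBuffer.allocate(2).order(ByteOrder.BIG_ENDIAN).putShort(value).array()

// LITTLE-ENDIAN: low-order byte at the lower address -> [0x34, 0x12]
val little = ByteBuffer.allocate(2).order(ByteOrder.LITTLE_ENDIAN).putShort(value).array()

// The platform's native order; little-endian on common Android hardware
println(ByteOrder.nativeOrder())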
Using AudioRecord#
In the previous article, Camera2, MediaCodec recording mp4, only video was recorded, with the focus on the use of MediaCodec. Here, on top of that video recording, AudioRecord is used to add audio recording and synthesize the audio into the MP4 file. The key steps are as follows:
- Start a dedicated thread that uses AudioRecord to read audio data from the hardware; reading on a separate thread avoids stuttering. The complete code example is provided at the end of the article, see AudioEncode2. The reading Runnable is as follows:
/**
 * Audio reading Runnable
 */
class RecordRunnable : Runnable {
    override fun run() {
        val byteArray = ByteArray(bufferSize)
        // Recording state: -1 is the default state, 1 is recording, 0 is stopped
        while (recording == 1) {
            val result = mAudioRecord.read(byteArray, 0, bufferSize)
            if (result > 0) {
                val resultArray = ByteArray(result)
                System.arraycopy(byteArray, 0, resultArray, 0, result)
                queue.offer(resultArray)
            }
        }
        // Custom end-of-stream marker
        if (recording == 0) {
            val stopArray = byteArrayOf((-100).toByte())
            queue.offer(stopArray)
        }
    }
}
If you are only using AudioRecord to record audio, each chunk of data can simply be written to a file as it is read.
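For example, a minimal sketch of a writer loop that drains the queue into a raw .pcm file, assuming queue is a LinkedBlockingQueue<ByteArray> filled by the RecordRunnable above; the function name and output path are illustrative:
import java.io.FileOutputStream

fun writePcmToFile(path: String) {
    FileOutputStream(path).use { fos ->
        while (true) {
            val data = queue.take() // blocks until the recording thread offers a chunk
            // Stop on the custom end-of-stream marker
            if (data.size == 1 && data[0] == (-100).toByte()) break
            fos.write(data)
        }
    }
}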
- To synthesize the recorded audio data into the MP4 file, it first needs to be encoded. The audio encoder is configured as follows:
// Audio data encoder configuration
private fun initAudioCodec() {
L.i(TAG, "init Codec start")
try {
val mediaFormat =
MediaFormat.createAudioFormat(
MediaFormat.MIMETYPE_AUDIO_AAC,
RecordConfig.SAMPLE_RATE,
2
)
mAudioCodec = MediaCodec.createEncoderByType(MediaFormat.MIMETYPE_AUDIO_AAC)
mediaFormat.setInteger(MediaFormat.KEY_BIT_RATE, 96000)
mediaFormat.setInteger(
MediaFormat.KEY_AAC_PROFILE,
MediaCodecInfo.CodecProfileLevel.AACObjectLC
)
mediaFormat.setInteger(MediaFormat.KEY_MAX_INPUT_SIZE, 8192)
mAudioCodec.setCallback(this)
mAudioCodec.configure(mediaFormat, null, null, MediaCodec.CONFIGURE_FLAG_ENCODE)
} catch (e: Exception) {
L.i(TAG, "init error:${e.message}")
}
L.i(TAG, "init Codec end")
}
For the details of encoding, refer to the previous two articles in this series.
Here, MediaCodec's asynchronous processing mode is used to encode the audio data. The full code is not pasted here, but it is important to check the relevant conditions when filling and releasing buffers: if input buffers are never released, the codec eventually has no available InputBuffer and audio encoding fails. The end of the stream must also be handled explicitly.
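A sketch of what the input side of the asynchronous callback can look like, reusing the queue and end-of-stream marker from the RecordRunnable above; presentationTimeUs() is a hypothetical helper standing in for however the project derives timestamps:
override fun onInputBufferAvailable(codec: MediaCodec, index: Int) {
    val inputBuffer = codec.getInputBuffer(index) ?: return
    val data = queue.take() // blocks until audio data is available
    if (data.size == 1 && data[0] == (-100).toByte()) {
        // Custom end-of-stream marker: signal EOS so the codec can flush
        codec.queueInputBuffer(
            index, 0, 0, presentationTimeUs(), MediaCodec.BUFFER_FLAG_END_OF_STREAM
        )
    } else {
        inputBuffer.clear()
        inputBuffer.put(data)
        // Queue the buffer back to the codec; forgetting this starves the encoder
        codec.queueInputBuffer(index, 0, data.size, presentationTimeUs(), 0)
    }
}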
- File synthesis uses MediaMuxer. Before starting the MediaMuxer, ensure that both the video track and the audio track have been added.
override fun onOutputFormatChanged(codec: MediaCodec, format: MediaFormat) {
L.i(TAG, "onOutputFormatChanged format:${format}")
// Add audio track
addAudioTrack(format)
// Start MediaMuxer only if both audio and video tracks are added
if (RecordConfig.videoTrackIndex != -1) {
mAudioMuxer.start()
RecordConfig.isMuxerStart = true
L.i(TAG, "onOutputFormatChanged isMuxerStart:${RecordConfig.isMuxerStart}")
}
}
// Add audio track
private fun addAudioTrack(format: MediaFormat) {
L.i(TAG, "addAudioTrack format:${format}")
RecordConfig.audioTrackIndex = mAudioMuxer.addTrack(format)
RecordConfig.isAddAudioTrack = true
}
// ...
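On the output side, encoded samples are written only once the muxer has started; a sketch following the naming of the snippet above:
override fun onOutputBufferAvailable(
    codec: MediaCodec, index: Int, info: MediaCodec.BufferInfo
) {
    val outputBuffer = codec.getOutputBuffer(index) ?: return
    // Codec config data is already carried by the MediaFormat, so it is not muxed
    val isCodecConfig = info.flags and MediaCodec.BUFFER_FLAG_CODEC_CONFIG != 0
    if (RecordConfig.isMuxerStart && !isCodecConfig && info.size > 0) {
        outputBuffer.position(info.offset)
        outputBuffer.limit(info.offset + info.size)
        mAudioMuxer.writeSampleData(RecordConfig.audioTrackIndex, outputBuffer, info)
    }
    codec.releaseOutputBuffer(index, false)
}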
That covers the basic usage of AudioRecord.