The first two articles covered the basic usage of OpenGL ES and its coordinate system mapping. Next, we will use MediaPlayer together with OpenGL ES to implement basic video rendering and video frame correction, with the main content as follows:
- SurfaceTexture
- Rendering Video
- Frame Correction
SurfaceTexture#
SurfaceTexture was introduced in Android 3.0. It does not display the image stream directly; instead it captures frames from the stream and exposes them as an external texture to OpenGL. The image stream mainly comes from camera previews and video decoding, which makes secondary processing of the stream possible, such as filters and effects. You can think of SurfaceTexture as the combination of a Surface and an OpenGL ES texture.
The Surface created from a SurfaceTexture is the data producer, and the SurfaceTexture itself is the corresponding consumer: the Surface receives media data and forwards it to the SurfaceTexture. When updateTexImage is called, the content of the texture object created by the SurfaceTexture is updated to the latest image frame; that is, the frame is converted into a GL texture and bound to the GL_TEXTURE_EXTERNAL_OES texture target. updateTexImage must be called on the thread that owns the OpenGL ES context, typically in onDrawFrame.
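To make the producer/consumer relationship concrete, here is a minimal sketch using the android.graphics.SurfaceTexture and android.view.Surface APIs (variable names are mine, not from the project's source):
// textureId is an OES texture ID generated with glGenTextures (shown later in this article)
// Consumer: a SurfaceTexture backed by that OES texture ID
val surfaceTexture = SurfaceTexture(textureId)
// Producer: the Surface handed to MediaPlayer (or a camera preview)
val surface = Surface(surfaceTexture)
// Later, on the GL thread (typically in onDrawFrame): latch the newest frame into the texture
surfaceTexture.updateTexImage()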
Rendering Video#
Everyone should be very familiar with how MediaPlayer plays videos, so I won't elaborate here. With the introduction of SurfaceTexture in the previous section, implementing video rendering using OpenGL ES is very simple. Define the vertex coordinates and texture coordinates as follows:
// Vertex coordinates
private val vertexCoordinates = floatArrayOf(
    1.0f, 1.0f,
    -1.0f, 1.0f,
    -1.0f, -1.0f,
    1.0f, -1.0f
)
// Texture coordinates
private val textureCoordinates = floatArrayOf(
    1.0f, 0.0f,
    0.0f, 0.0f,
    0.0f, 1.0f,
    1.0f, 1.0f
)
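Before drawing, these arrays are usually wrapped in direct java.nio FloatBuffers so OpenGL ES can read them; a minimal sketch of that conversion (the extension-function name is mine):
private fun FloatArray.toFloatBuffer(): FloatBuffer =
    ByteBuffer.allocateDirect(size * 4)        // 4 bytes per float
        .order(ByteOrder.nativeOrder())        // native byte order, as required by OpenGL
        .asFloatBuffer()
        .apply {
            put(this@toFloatBuffer)
            position(0)
        }

private val vertexBuffer = vertexCoordinates.toFloatBuffer()
private val textureBuffer = textureCoordinates.toFloatBuffer()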
Texture coordinates must correspond to vertex coordinates one to one. To briefly recap: vertex coordinates use the OpenGL coordinate system, whose origin is at the center of the screen, while texture coordinates follow the screen convention, with the origin in the upper left corner. Generate the texture ID and activate and bind it as follows:
/**
 * Generate texture ID
 */
fun createTextureId(): Int {
    val tex = IntArray(1)
    GLES20.glGenTextures(1, tex, 0)
    if (tex[0] == 0) {
        throw RuntimeException("create OES texture failed, ${Thread.currentThread().name}")
    }
    return tex[0]
}

/**
 * Activate and bind the OES texture,
 * which converts from YUV to RGB automatically
 */
fun activeBindOESTexture(textureId: Int) {
    // Activate the texture unit
    GLES20.glActiveTexture(GLES20.GL_TEXTURE0)
    // Bind the texture ID to the texture target of the texture unit
    GLES20.glBindTexture(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, textureId)
    // Set texture parameters
    GLES20.glTexParameterf(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, GL10.GL_TEXTURE_MIN_FILTER, GL10.GL_NEAREST.toFloat())
    GLES20.glTexParameterf(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, GL10.GL_TEXTURE_MAG_FILTER, GL10.GL_LINEAR.toFloat())
    GLES20.glTexParameterf(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, GL10.GL_TEXTURE_WRAP_S, GL10.GL_CLAMP_TO_EDGE.toFloat())
    GLES20.glTexParameterf(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, GL10.GL_TEXTURE_WRAP_T, GL10.GL_CLAMP_TO_EDGE.toFloat())
    Log.d(TAG, "activeBindOESTexture: texture id $textureId")
}
Here the texture ID is bound to the texture target of the texture unit, and the chosen texture target is GL_TEXTURE_EXTERNAL_OES, which automatically completes the conversion from YUV to RGB. Now let's look at the shaders. The vertex shader receives the texture coordinates and saves them to vTextureCoordinate for use by the fragment shader, as follows:
// Vertex shader
attribute vec4 aPosition; // Vertex coordinates
attribute vec2 aCoordinate; // Texture coordinates
varying vec2 vTextureCoordinate;
void main() {
    gl_Position = aPosition;
    vTextureCoordinate = aCoordinate;
}
// Fragment shader
#extension GL_OES_EGL_image_external : require
precision mediump float;
varying vec2 vTextureCoordinate;
uniform samplerExternalOES uTexture; // OES texture
void main() {
    gl_FragColor = texture2D(uTexture, vTextureCoordinate);
}
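For reference, shader compilation and program linking boil down to a small helper like the following; this is a sketch rather than the project's exact code, and error checking (glGetShaderiv / glGetProgramiv) is omitted:
fun loadShader(type: Int, source: String): Int {
    val shader = GLES20.glCreateShader(type)
    GLES20.glShaderSource(shader, source)
    GLES20.glCompileShader(shader)
    return shader
}

fun createProgram(vertexSource: String, fragmentSource: String): Int {
    val program = GLES20.glCreateProgram()
    GLES20.glAttachShader(program, loadShader(GLES20.GL_VERTEX_SHADER, vertexSource))
    GLES20.glAttachShader(program, loadShader(GLES20.GL_FRAGMENT_SHADER, fragmentSource))
    GLES20.glLinkProgram(program)
    return program
}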
The full code for Shader compilation, Program linking, and usage is not repeated here; it was covered in the previous articles (a minimal helper is sketched above), and you can also check the source code at the end. The renderer is defined as follows:
class PlayRenderer(
    private var context: Context,
    private var glSurfaceView: GLSurfaceView
) : GLSurfaceView.Renderer,
    VideoRender.OnNotifyFrameUpdateListener, MediaPlayer.OnPreparedListener,
    MediaPlayer.OnVideoSizeChangedListener, MediaPlayer.OnCompletionListener,
    MediaPlayer.OnErrorListener {

    companion object {
        private const val TAG = "PlayRenderer"
    }

    private lateinit var videoRender: VideoRender
    private lateinit var mediaPlayer: MediaPlayer
    private val projectionMatrix = FloatArray(16)
    private val viewMatrix = FloatArray(16)
    private val vPMatrix = FloatArray(16)

    // Used for video aspect ratio calculation, see below
    private var screenWidth: Int = -1
    private var screenHeight: Int = -1
    private var videoWidth: Int = -1
    private var videoHeight: Int = -1

    override fun onSurfaceCreated(gl: GL10?, config: EGLConfig?) {
        L.i(TAG, "onSurfaceCreated")
        GLES20.glClearColor(0f, 0f, 0f, 0f)
        videoRender = VideoRender(context)
        videoRender.setTextureID(TextureHelper.createTextureId())
        videoRender.onNotifyFrameUpdateListener = this
        initMediaPlayer()
    }

    override fun onSurfaceChanged(gl: GL10?, width: Int, height: Int) {
        L.i(TAG, "onSurfaceChanged > width:$width,height:$height")
        screenWidth = width
        screenHeight = height
        GLES20.glViewport(0, 0, width, height)
    }

    override fun onDrawFrame(gl: GL10) {
        L.i(TAG, "onDrawFrame")
        gl.glClear(GL10.GL_COLOR_BUFFER_BIT or GL10.GL_DEPTH_BUFFER_BIT)
        videoRender.draw(vPMatrix)
    }

    override fun onPrepared(mp: MediaPlayer?) {
        L.i(OpenGLActivity.TAG, "onPrepared")
        mediaPlayer.start()
    }

    override fun onVideoSizeChanged(mp: MediaPlayer?, width: Int, height: Int) {
        L.i(OpenGLActivity.TAG, "onVideoSizeChanged > width:$width ,height:$height")
        this.videoWidth = width
        this.videoHeight = height
    }

    override fun onCompletion(mp: MediaPlayer?) {
        L.i(OpenGLActivity.TAG, "onCompletion")
    }

    override fun onError(mp: MediaPlayer?, what: Int, extra: Int): Boolean {
        L.i(OpenGLActivity.TAG, "error > what:$what,extra:$extra")
        return true
    }

    private fun initMediaPlayer() {
        mediaPlayer = MediaPlayer()
        mediaPlayer.setOnPreparedListener(this)
        mediaPlayer.setOnVideoSizeChangedListener(this)
        mediaPlayer.setOnCompletionListener(this)
        mediaPlayer.setOnErrorListener(this)
        mediaPlayer.setDataSource(Environment.getExternalStorageDirectory().absolutePath + "/video.mp4")
        mediaPlayer.setSurface(videoRender.getSurface())
        mediaPlayer.prepareAsync()
    }

    // Notify request to render
    override fun onNotifyUpdate() {
        glSurfaceView.requestRender()
    }

    fun destroy() {
        mediaPlayer.stop()
        mediaPlayer.release()
    }
}
The VideoRender in the code above mainly handles the actual rendering operations, which are quite similar to the code in the previous article, so it is not included in full here.
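For orientation only, here is a stripped-down sketch of the SurfaceTexture-related part of such a renderer. The member names follow the calls made from PlayRenderer above (setTextureID, getSurface, onNotifyFrameUpdateListener, draw); everything else is an assumption, and the actual drawing code is elided:
class VideoRender(private val context: Context) : SurfaceTexture.OnFrameAvailableListener {

    interface OnNotifyFrameUpdateListener {
        fun onNotifyUpdate()
    }

    var onNotifyFrameUpdateListener: OnNotifyFrameUpdateListener? = null
    private lateinit var surfaceTexture: SurfaceTexture

    fun setTextureID(id: Int) {
        surfaceTexture = SurfaceTexture(id)                  // consumer side, backed by the OES texture
        surfaceTexture.setOnFrameAvailableListener(this)
    }

    fun getSurface(): Surface = Surface(surfaceTexture)      // producer side, handed to MediaPlayer

    override fun onFrameAvailable(st: SurfaceTexture?) {
        onNotifyFrameUpdateListener?.onNotifyUpdate()        // a new frame arrived, ask for a render pass
    }

    fun draw(mvpMatrix: FloatArray) {
        surfaceTexture.updateTexImage()                      // latch the latest frame (GL thread only)
        // ... use the program, upload uMVPMatrix, set attributes, glDrawArrays ...
    }
}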
When using OpenGL ES for video rendering, you need to call SurfaceTexture's updateTexImage method to update the image frame, and this method must be called in the OpenGL ES context. You can set the rendering mode of GLSurfaceView to RENDERMODE_WHEN_DIRTY to avoid continuous drawing, and only call requestRender when onFrameAvailable reports that new data is available, which reduces unnecessary work.
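A minimal sketch of the GLSurfaceView setup for this render mode, assuming it happens in OpenGLActivity.onCreate:
// In OpenGLActivity.onCreate(), roughly:
val glSurfaceView = GLSurfaceView(this).apply {
    setEGLContextClientVersion(2)                        // OpenGL ES 2.0
    setRenderer(PlayRenderer(this@OpenGLActivity, this))
    // Only draw when requestRender() is called; must be set after setRenderer()
    renderMode = GLSurfaceView.RENDERMODE_WHEN_DIRTY
}
setContentView(glSurfaceView)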
Let's look at the original video rendering effect image:
Frame Correction#
The video above is played full screen, but the screen resolution and the video resolution differ, so the video image is stretched. We therefore need to calculate an appropriate display size for the video frame from the screen resolution and the video resolution. Coordinate mapping and the basic fix for a deformed triangle were covered earlier; video is handled the same way, since a video frame is essentially just a rectangle.
Projection mainly comes in two forms, orthographic and perspective. Orthographic projection is generally used for rendering 2D images, such as ordinary video rendering, while perspective projection makes near objects appear larger and far objects smaller and is typically used for 3D rendering, such as VR. Therefore, we use orthographic projection to correct the image.
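Both projections are provided by android.opengl.Matrix; as a quick comparison, here is a sketch with placeholder frustum bounds (not the values used below):
val projection = FloatArray(16)
// Orthographic: parallel projection, no perspective distortion, suited to 2D video
Matrix.orthoM(projection, 0, -1f, 1f, -1f, 1f, 3f, 5f)
// Perspective: the same frustum bounds, but near objects appear larger than far ones
Matrix.frustumM(projection, 0, -1f, 1f, -1f, 1f, 3f, 5f)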
First, let's look at the changes to the Shader, mainly in the vertex shader, as follows:
attribute vec4 aPosition;
attribute vec2 aCoordinate;
uniform mat4 uMVPMatrix;
varying vec2 vTextureCoordinate;
void main() {
    gl_Position = uMVPMatrix * aPosition;
    vTextureCoordinate = aCoordinate;
}
The key is computing the matrix uMVPMatrix, which is the product of the projection matrix and the view matrix. Matrix operations in OpenGL ES are done with the Matrix utility class; for orthographic projection, Matrix.orthoM generates the projection matrix, calculated as follows:
// Calculate video scaling ratio (projection matrix)
val screenRatio = screenWidth / screenHeight.toFloat()
val videoRatio = videoWidth / videoHeight.toFloat()
val ratio: Float
if (screenWidth > screenHeight) {
    if (videoRatio >= screenRatio) {
        ratio = videoRatio / screenRatio
        Matrix.orthoM(
            projectionMatrix, 0,
            -1f, 1f, -ratio, ratio, 3f, 5f
        )
    } else {
        ratio = screenRatio / videoRatio
        Matrix.orthoM(
            projectionMatrix, 0,
            -ratio, ratio, -1f, 1f, 3f, 5f
        )
    }
} else {
    if (videoRatio >= screenRatio) {
        ratio = videoRatio / screenRatio
        Matrix.orthoM(
            projectionMatrix, 0,
            -1f, 1f, -ratio, ratio, 3f, 5f
        )
    } else {
        ratio = screenRatio / videoRatio
        Matrix.orthoM(
            projectionMatrix, 0,
            -ratio, ratio, -1f, 1f, 3f, 5f
        )
    }
}
The code above chooses the orthographic projection parameters from the screen aspect ratio and the original video aspect ratio, much like image scaling; the guiding principle is that the video image must be fully displayed within the screen boundaries. The ratio value becomes one of the boundaries of the orthographic frustum. For example, on my phone, with a screen resolution of 1080 x 2260 and a video resolution of 1080 x 540 (so the screen width equals the video width, to keep things simple), ratio = videoRatio / screenRatio = (1080 / 540) / (1080 / 2260) = 2260 / 540 ≈ 4.18. Clearly, if we took the screen height as the baseline, scaling the video height to 2260 would make the video width 4520, far wider than the screen; therefore we adapt to the video width instead. Now let's look at the camera position settings:
// Set camera position (view matrix)
Matrix.setLookAtM(
    viewMatrix, 0,
    0.0f, 0.0f, 5.0f, // Camera position
    0.0f, 0.0f, 0.0f, // Target position
    0.0f, 1.0f, 0.0f  // Camera up direction
)
The direction pointing out of the screen is the positive z-axis. The camera position (0, 0, 5) means the camera sits 5 units in front of the screen along the z-axis; this distance must fall within the near and far values of the frustum, otherwise nothing is visible, which here means between 3 and 5. The target position (0, 0, 0) is the screen itself, i.e. the plane formed by the x and y axes, and the up direction (0, 1, 0) is the positive y-axis. Finally, calculate the projection and view transformation by multiplying projectionMatrix and viewMatrix into vPMatrix:
// Calculate projection and view transformation
Matrix.multiplyMM(vPMatrix, 0, projectionMatrix, 0, viewMatrix, 0)
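During drawing, vPMatrix is then uploaded to the vertex shader's uMVPMatrix uniform; a short sketch, where program is the assumed handle of the linked program:
val uMVPMatrixLocation = GLES20.glGetUniformLocation(program, "uMVPMatrix")
GLES20.glUniformMatrix4fv(uMVPMatrixLocation, 1, false, vPMatrix, 0)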
To correct the image, the original video size is needed; it can be obtained in MediaPlayer's onVideoSizeChanged callback and used to initialize the matrix data. Now let's look at the effect after frame correction:
With that, video rendering with OpenGL ES is complete. You can get the source code with the keyword [RenderVideo].