OpenGL ES rendering to play video

The first two articles covered the basic usage of OpenGL ES and its coordinate-system mapping.

Next, we will use MediaPlayer together with OpenGL ES to implement basic video rendering and video frame correction. The main content is as follows:

  1. SurfaceTexture
  2. Rendering Video
  3. Frame Correction

SurfaceTexture#

SurfaceTexture was introduced in Android 3.0. It does not display an image stream directly; instead it captures frames from the stream as an external texture for OpenGL. The image stream typically comes from a camera preview or video decoding, which makes secondary processing, such as filters and effects, possible. You can think of SurfaceTexture as a combination of a Surface and an OpenGL ES texture.

The Surface created from a SurfaceTexture is the data producer, and the SurfaceTexture is the corresponding consumer. The Surface receives media data and feeds it to the SurfaceTexture. When updateTexImage is called, the texture object owned by the SurfaceTexture is updated to the latest image frame: the frame is converted into a GL texture and bound to the GL_TEXTURE_EXTERNAL_OES texture target. updateTexImage must only be called on the thread that owns the OpenGL ES context, typically from onDrawFrame.
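
As a minimal sketch of this relationship (textureId and mediaPlayer are placeholders for objects introduced later, and error handling is omitted), the wiring looks roughly like this:

// The texture ID must be an OES texture created on the GL thread.
val surfaceTexture = SurfaceTexture(textureId)
// The Surface built from it is the producer side; hand it to the media source.
mediaPlayer.setSurface(Surface(surfaceTexture))
// The consumer side is notified whenever the producer queues a new frame.
surfaceTexture.setOnFrameAvailableListener {
    // e.g. request a new render pass here
}
// Later, on the GL thread (typically in onDrawFrame):
surfaceTexture.updateTexImage()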

Rendering Video#

Everyone should be very familiar with how MediaPlayer plays videos, so I won't elaborate here. With the introduction of SurfaceTexture in the previous section, implementing video rendering using OpenGL ES is very simple. Define the vertex coordinates and texture coordinates as follows:

// Vertex coordinates  
private val vertexCoordinates = floatArrayOf(  
    1.0f, 1.0f,  
    -1.0f, 1.0f,  
    -1.0f, -1.0f,  
    1.0f, -1.0f  
)  
// Texture coordinates  
private val textureCoordinates = floatArrayOf(  
    1.0f, 0.0f,  
    0.0f, 0.0f,  
    0.0f, 1.0f,  
    1.0f, 1.0f  
)
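
These arrays are usually wrapped into direct FloatBuffers before being handed to glVertexAttribPointer; this step is not shown in the article, so the following is only a sketch (it requires java.nio.ByteBuffer, ByteOrder and FloatBuffer):

// Wrap a float array into a direct FloatBuffer in native byte order
// (4 bytes per float), as expected by glVertexAttribPointer.
private fun toFloatBuffer(data: FloatArray): FloatBuffer {
    return ByteBuffer.allocateDirect(data.size * 4)
        .order(ByteOrder.nativeOrder())
        .asFloatBuffer()
        .apply {
            put(data)
            position(0)
        }
}

private val vertexBuffer = toFloatBuffer(vertexCoordinates)
private val textureBuffer = toFloatBuffer(textureCoordinates)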

Texture coordinates must correspond to vertex coordinates. Briefly: vertex coordinates use the OpenGL normalized coordinate system with the origin at the center of the screen, while texture coordinates use the image coordinate system with the origin at the upper-left corner. Generate the texture ID and activate and bind it as follows:

/**  
 * Generate texture ID  
 */  
fun createTextureId(): Int {  
    val tex = IntArray(1)  
    GLES20.glGenTextures(1, tex, 0)  
    if (tex[0] == 0) {  
        throw RuntimeException("create OES texture failed, ${Thread.currentThread().name}")  
    }  
    return tex[0]  
}  
  
/**  
 * Activate and bind the OES texture  
 * GL_TEXTURE_EXTERNAL_OES handles the YUV to RGB conversion automatically  
 */  
fun activeBindOESTexture(textureId: Int) {  
    // Activate texture unit  
    GLES20.glActiveTexture(GLES20.GL_TEXTURE0)  
    // Bind texture ID to the texture target of the texture unit  
    GLES20.glBindTexture(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, textureId)  
    // Set texture parameters  
    GLES20.glTexParameterf(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, GL10.GL_TEXTURE_MIN_FILTER, GL10.GL_NEAREST.toFloat())  
    GLES20.glTexParameterf(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, GL10.GL_TEXTURE_MAG_FILTER, GL10.GL_LINEAR.toFloat())  
    GLES20.glTexParameterf(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, GL10.GL_TEXTURE_WRAP_S, GL10.GL_CLAMP_TO_EDGE.toFloat())  
    GLES20.glTexParameterf(GLES11Ext.GL_TEXTURE_EXTERNAL_OES, GL10.GL_TEXTURE_WRAP_T, GL10.GL_CLAMP_TO_EDGE.toFloat())  
    Log.d(TAG, "activeBindOESTexture: texture id $textureId")  
}

Bind the texture ID to the texture target of the texture unit. The chosen texture target here is GL_TEXTURE_EXTERNAL_OES, which can automatically complete the conversion from YUV format to RGB. Now let's look at the shaders, where the vertex shader receives texture coordinates and saves them to vTextureCoordinate for use by the fragment shader, as follows:

// Vertex shader  
attribute vec4 aPosition; // Vertex coordinates  
attribute vec2 aCoordinate; // Texture coordinates  
varying vec2 vTextureCoordinate;  
void main() {  
    gl_Position = aPosition;  
    vTextureCoordinate = aCoordinate;  
}  
  
// Fragment shader  
#extension GL_OES_EGL_image_external : require  
precision mediump float;  
varying vec2 vTextureCoordinate;  
uniform samplerExternalOES uTexture; // OES texture  
void main() {  
    gl_FragColor = texture2D(uTexture, vTextureCoordinate);  
}

The code for Shader compilation, Program linking, and usage has been introduced in previous articles, so only a brief sketch follows here; the full version is in the source code at the end.
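
The helper names compileShader and linkProgram below are illustrative, not taken from the article's source:

// Compile a single shader (GLES20.GL_VERTEX_SHADER or GL_FRAGMENT_SHADER) from source.
fun compileShader(type: Int, source: String): Int {
    val shader = GLES20.glCreateShader(type)
    GLES20.glShaderSource(shader, source)
    GLES20.glCompileShader(shader)
    val status = IntArray(1)
    GLES20.glGetShaderiv(shader, GLES20.GL_COMPILE_STATUS, status, 0)
    if (status[0] == 0) {
        val log = GLES20.glGetShaderInfoLog(shader)
        GLES20.glDeleteShader(shader)
        throw RuntimeException("compile shader failed: $log")
    }
    return shader
}

// Link the vertex and fragment shaders into a program and start using it.
fun linkProgram(vertexShader: Int, fragmentShader: Int): Int {
    val program = GLES20.glCreateProgram()
    GLES20.glAttachShader(program, vertexShader)
    GLES20.glAttachShader(program, fragmentShader)
    GLES20.glLinkProgram(program)
    val status = IntArray(1)
    GLES20.glGetProgramiv(program, GLES20.GL_LINK_STATUS, status, 0)
    if (status[0] == 0) {
        val log = GLES20.glGetProgramInfoLog(program)
        GLES20.glDeleteProgram(program)
        throw RuntimeException("link program failed: $log")
    }
    GLES20.glUseProgram(program)
    return program
}

The renderer is defined as follows: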

class PlayRenderer(
    private var context: Context,
    private var glSurfaceView: GLSurfaceView
) : GLSurfaceView.Renderer,
    VideoRender.OnNotifyFrameUpdateListener, MediaPlayer.OnPreparedListener,
    MediaPlayer.OnVideoSizeChangedListener, MediaPlayer.OnCompletionListener,
    MediaPlayer.OnErrorListener {
    companion object {
        private const val TAG = "PlayRenderer"
    }
    private lateinit var videoRender: VideoRender
    private lateinit var mediaPlayer: MediaPlayer
    private val projectionMatrix = FloatArray(16)
    private val viewMatrix = FloatArray(16)
    private val vPMatrix = FloatArray(16)
    // Used for video aspect ratio calculation, see below
    private var screenWidth: Int = -1
    private var screenHeight: Int = -1
    private var videoWidth: Int = -1
    private var videoHeight: Int = -1

    override fun onSurfaceCreated(gl: GL10?, config: EGLConfig?) {
        L.i(TAG, "onSurfaceCreated")
        GLES20.glClearColor(0f, 0f, 0f, 0f)
        videoRender = VideoRender(context)
        videoRender.setTextureID(TextureHelper.createTextureId())
        videoRender.onNotifyFrameUpdateListener = this
        initMediaPlayer()
    }

    override fun onSurfaceChanged(gl: GL10?, width: Int, height: Int) {
        L.i(TAG, "onSurfaceChanged > width:$width,height:$height")
        screenWidth = width
        screenHeight = height
        GLES20.glViewport(0, 0, width, height)
    }

    override fun onDrawFrame(gl: GL10) {
        L.i(TAG, "onDrawFrame")
        gl.glClear(GL10.GL_COLOR_BUFFER_BIT or GL10.GL_DEPTH_BUFFER_BIT)
        videoRender.draw(vPMatrix)
    }

    override fun onPrepared(mp: MediaPlayer?) {
        L.i(OpenGLActivity.TAG, "onPrepared")
        mediaPlayer.start()
    }

    override fun onVideoSizeChanged(mp: MediaPlayer?, width: Int, height: Int) {
        L.i(OpenGLActivity.TAG, "onVideoSizeChanged > width:$width ,height:$height")
        this.videoWidth = width
        this.videoHeight = height
    }

    override fun onCompletion(mp: MediaPlayer?) {
        L.i(OpenGLActivity.TAG, "onCompletion")
    }

    override fun onError(mp: MediaPlayer?, what: Int, extra: Int): Boolean {
        L.i(OpenGLActivity.TAG, "error > what:$what,extra:$extra")
        return true
    }

    private fun initMediaPlayer() {
        mediaPlayer = MediaPlayer()
        mediaPlayer.setOnPreparedListener(this)
        mediaPlayer.setOnVideoSizeChangedListener(this)
        mediaPlayer.setOnCompletionListener(this)
        mediaPlayer.setOnErrorListener(this)
        mediaPlayer.setDataSource(Environment.getExternalStorageDirectory().absolutePath + "/video.mp4")
        mediaPlayer.setSurface(videoRender.getSurface())
        mediaPlayer.prepareAsync()
    }
    // Notify request to render
    override fun onNotifyUpdate() {
        glSurfaceView.requestRender()
    }

    fun destroy() {
        mediaPlayer.stop()
        mediaPlayer.release()
    }
}

The VideoRender in the above code handles the actual drawing, which is quite similar to the code in the previous article, so the full class is not repeated here; only its SurfaceTexture-related portion is sketched below.
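
This sketch assumes the OnNotifyFrameUpdateListener interface and the setTextureID, getSurface and draw methods used by PlayRenderer above; buffer, shader and attribute handling are omitted:

class VideoRender(private val context: Context) : SurfaceTexture.OnFrameAvailableListener {

    interface OnNotifyFrameUpdateListener {
        fun onNotifyUpdate()
    }

    var onNotifyFrameUpdateListener: OnNotifyFrameUpdateListener? = null
    private var textureId = 0
    private lateinit var surfaceTexture: SurfaceTexture
    private lateinit var surface: Surface

    // Called from onSurfaceCreated with the OES texture ID.
    fun setTextureID(id: Int) {
        textureId = id
        surfaceTexture = SurfaceTexture(textureId)
        surfaceTexture.setOnFrameAvailableListener(this)
        surface = Surface(surfaceTexture)
    }

    // The Surface handed to MediaPlayer.setSurface(...) as the data producer.
    fun getSurface(): Surface = surface

    // A new frame was queued by the producer; ask for a redraw on the GL thread.
    override fun onFrameAvailable(st: SurfaceTexture?) {
        onNotifyFrameUpdateListener?.onNotifyUpdate()
    }

    // Called from onDrawFrame on the GL thread.
    fun draw(mvpMatrix: FloatArray) {
        // Update the OES texture with the latest video frame (GL thread only).
        surfaceTexture.updateTexImage()
        // ... bind the program, upload mvpMatrix and the vertex/texture buffers, then draw ...
    }
}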

When rendering video with OpenGL ES, you need to call SurfaceTexture's updateTexImage method to update the image frame, and it must be called on the thread that owns the OpenGL ES context. To avoid drawing continuously, set the render mode of GLSurfaceView to RENDERMODE_WHEN_DIRTY and call requestRender only when onFrameAvailable signals that new data is available; this cuts out unnecessary work.
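
On the GLSurfaceView side this wiring might look as follows; this is only a sketch, and the view ID and activity structure are assumptions:

// In the Activity: request an OpenGL ES 2.0 context, install the renderer,
// and only draw when requestRender() is called.
val glSurfaceView = findViewById<GLSurfaceView>(R.id.glSurfaceView)
glSurfaceView.setEGLContextClientVersion(2)
val playRenderer = PlayRenderer(this, glSurfaceView)
glSurfaceView.setRenderer(playRenderer)
glSurfaceView.renderMode = GLSurfaceView.RENDERMODE_WHEN_DIRTY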

Let's first look at the original, uncorrected rendering result:

[Image: original video rendering, stretched to fill the screen]

Frame Correction#

The video above is rendered full screen, but the screen resolution and the video resolution differ, so the picture is stretched. We therefore need to compute a suitable display size for the video frame from the screen resolution and the video resolution. The earlier article on coordinate-system mapping showed how to correct this kind of deformation for a triangle; a video frame is no different, since it is essentially a rectangle.

Projection comes in two main forms: orthographic and perspective. Orthographic projection is generally used for rendering 2D images, such as ordinary video, while perspective projection makes near objects appear larger and distant objects smaller and is typically used for 3D rendering, such as VR. Here we use orthographic projection to correct the frame.

First, let's look at the modifications to the Shader, mainly the changes in the vertex shader, as follows:

attribute vec4 aPosition;  
attribute vec2 aCoordinate;  
uniform mat4 uMVPMatrix;  
varying vec2 vTextureCoordinate;  
void main() {  
    gl_Position = uMVPMatrix * aPosition;  
    vTextureCoordinate = aCoordinate;  
}

The key is to calculate uMVPMatrix, the product of the projection matrix and the view matrix. In OpenGL ES on Android, matrix operations are performed with the Matrix utility class; for orthographic projection, Matrix.orthoM generates the projection matrix, calculated as follows:

// Calculate video scaling ratio (projection matrix)  
val screenRatio = screenWidth / screenHeight.toFloat()  
val videoRatio = videoWidth / videoHeight.toFloat()  
val ratio: Float  
if (screenWidth > screenHeight) {  
    if (videoRatio >= screenRatio) {  
        ratio = videoRatio / screenRatio  
        Matrix.orthoM(  
            projectionMatrix, 0,  
            -1f, 1f, -ratio, ratio, 3f, 5f  
        )  
    } else {  
        ratio = screenRatio / videoRatio  
        Matrix.orthoM(  
            projectionMatrix, 0,  
            -ratio, ratio, -1f, 1f, 3f, 5f  
        )  
    }  
} else {  
    if (videoRatio >= screenRatio) {  
        ratio = videoRatio / screenRatio  
        Matrix.orthoM(  
            projectionMatrix, 0,  
            -1f, 1f, -ratio, ratio, 3f, 5f  
        )  
    } else {  
        ratio = screenRatio / videoRatio  
        Matrix.orthoM(  
            projectionMatrix, 0,  
            -ratio, ratio, -1f, 1f, 3f, 5f  
        )  
    }  
}

The code above chooses the orthographic projection parameters from the screen aspect ratio and the original video aspect ratio; the calculation is similar to image scaling. One guiding principle is that the video frame must be displayed completely within the screen. The ratio value defines the boundary of the orthographic clipping volume. For example, on my phone: to simplify, the screen width equals the video width, with a screen resolution of 1080 × 2260 and a video resolution of 1080 × 540, so ratio is 2260 / 540 ≈ 4.18. If instead we took the screen height as the baseline, a video scaled to a height of 2260 would be 4520 wide, far exceeding the screen width. Therefore we adapt based on the video width. Now let's look at the camera position settings:

// Set camera position (view matrix)  
Matrix.setLookAtM(  
    viewMatrix, 0,  
    0.0f, 0.0f, 5.0f, // Camera position  
    0.0f, 0.0f, 0.0f, // Target position  
    0.0f, 1.0f, 0.0f // Camera up direction  
)

The direction pointing out of the screen is the positive z-axis. The camera position (0, 0, 5) places the camera 5 units in front of the screen along that axis. The distance from the camera to the rendered plane must fall within the near/far range given to orthoM, otherwise nothing is visible; in this case that range is 3 to 5. The target position (0, 0, 0) is the screen itself, i.e. the plane formed by the x and y axes, and the up vector (0, 1, 0) points along the positive y-axis. Finally, merge projectionMatrix and viewMatrix into vPMatrix through matrix multiplication to obtain the combined projection and view transformation:

// Calculate projection and view transformation  
Matrix.multiplyMM(vPMatrix, 0, projectionMatrix, 0, viewMatrix, 0)
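
Inside the draw call, vPMatrix then has to reach the uMVPMatrix uniform of the vertex shader. A minimal sketch, assuming program holds the linked program handle:

// Look up the uniform in the linked program, then upload the combined matrix.
val uMVPMatrixLocation = GLES20.glGetUniformLocation(program, "uMVPMatrix")
GLES20.glUniformMatrix4fv(uMVPMatrixLocation, 1, false, vPMatrix, 0)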

To correct the frame, the original video size is required; it can be obtained in MediaPlayer's onVideoSizeChanged callback and used to initialize the matrix data. Now let's look at the effect after frame correction:

[Image: video rendering after frame correction]

With that, video rendering using OpenGL ES is complete. Reply with the keyword [RenderVideo] to get the source code.
