7 January 2012

Setting up a deferred shader

Background
See also part 2 about deferred rendering.

The idea with a deferred shader is to use two (or more) shader stages. The first stage will render to internal buffers, but with typically more information than is usually shown on screen. The second stage will use the internal buffers to create the final screen image.

Note the difference between deferred shading and deferred lighting. Deferred lighting only does the lighting in the second (deferred) stage. Information about the geometry is not saved, so the geometry needs to be rendered again. It can still be efficient, as the depth buffer is reused.

If many effects are added, such as lighting and other pixel transformations, it may be a disadvantage to do everything in a single render stage (a forward renderer). The reason is that a lot of GPU processing power can be spent computing effects for pixels that are later thrown away because they turn out to be occluded. Another advantage of a deferred shader is that all drawn objects get their light effects from the same algorithms, even if they use separate first stage shaders (as long as they produce the correct data for the second stage).
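The argument can be made concrete with a back-of-envelope count of lighting evaluations. The sketch below is a hypothetical model, not a measurement: the names SceneStats, forwardLightEvals and deferredLightEvals are made up for illustration. It assumes that the forward renderer gets no help from early depth rejection or a depth pre-pass, and it ignores the cost of writing and reading the internal buffers in the deferred case.

```cpp
#include <cstdint>

// Hypothetical scene description, for illustration only.
struct SceneStats {
    std::uint64_t screenPixels;  // number of pixels in the window
    double overdraw;             // average number of fragments rasterized per pixel
    std::uint64_t lights;        // lights evaluated for every shaded fragment
};

// Forward rendering: every rasterized fragment runs the full lighting,
// including fragments that are later overwritten by something closer.
double forwardLightEvals(const SceneStats& s) {
    return static_cast<double>(s.screenPixels) * s.overdraw *
           static_cast<double>(s.lights);
}

// Deferred shading: lighting runs once per screen pixel in the second stage.
double deferredLightEvals(const SceneStats& s) {
    return static_cast<double>(s.screenPixels) *
           static_cast<double>(s.lights);
}
```

In this model the ratio between the two counts is simply the overdraw factor. Real numbers depend heavily on draw order and hardware.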

A disadvantage of a deferred shader is that transparent objects are more difficult to handle. One way is to simply draw the transparent objects after the deferred stage. In my case, I draw the transparent objects also in the deferred stage.

In the following, I will show an example of how it can be implemented. I am using one FBO (frame buffer object), one depth buffer as a render buffer and four colour buffers. The FBO is not a buffer on its own. It is a container object, much like vertex array objects. When an FBO is bound, all drawing will go to the attached buffers of the FBO instead of the visible screen. There are two different types of buffers that can be attached: textures and render buffers. A texture is used when the result of the operation shall be used as a texture in another rendering stage. A render buffer, on the other hand, can't be sampled by a shader. A way to use the result from a render buffer after the draw operation is glReadPixels() or glBlitFramebuffer().

Setting up the FBO
This has to be done again if the screen size changes. As the depth buffer isn't used again after the FBO drawing, it is allocated in a render buffer.

glGenFramebuffers(1, &fboName);
glGenRenderbuffers(1, &fDepthBuffer);

// Bind the depth buffer
glBindRenderbuffer(GL_RENDERBUFFER, fDepthBuffer);
glRenderbufferStorage(GL_RENDERBUFFER, GL_DEPTH_COMPONENT24, width, height);

// Generate and bind the texture for diffuse
glGenTextures(1, &fDiffuseTexture);
glBindTexture(GL_TEXTURE_2D, fDiffuseTexture);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA,
    GL_UNSIGNED_BYTE, NULL);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);

// Generate and bind the texture for positions
glGenTextures(1, &fPositionTexture);
glBindTexture(GL_TEXTURE_2D, fPositionTexture);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA32F, width, height, 0, GL_RGBA, GL_FLOAT,
    NULL);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);

// Generate and bind the texture for normals
glGenTextures(1, &fNormalsTexture);
glBindTexture(GL_TEXTURE_2D, fNormalsTexture);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA16F, width, height, 0, GL_RGBA, GL_FLOAT,
    NULL);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);

// Generate and bind the texture for blending data
glGenTextures(1, &fBlendTexture);
glBindTexture(GL_TEXTURE_2D, fBlendTexture);
glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA, width, height, 0, GL_RGBA,
    GL_UNSIGNED_BYTE, NULL);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);

Now the buffers have been allocated, and have to be attached to the FBO.

// Bind the FBO so that the next operations will be bound to it.
glBindFramebuffer(GL_FRAMEBUFFER , fboName);
glFramebufferRenderbuffer(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_RENDERBUFFER, fDepthBuffer);
// Attach the texture to the FBO
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, fDiffuseTexture, 0);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT1, GL_TEXTURE_2D, fPositionTexture, 0);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT2, GL_TEXTURE_2D, fNormalsTexture, 0);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT3, GL_TEXTURE_2D, fBlendTexture, 0);

GLenum fboStatus = glCheckFramebufferStatus(GL_FRAMEBUFFER);
if (fboStatus != GL_FRAMEBUFFER_COMPLETE) {
    printf("DeferredLighting::Init: FrameBuffer incomplete: 0x%x\n", fboStatus);
    exit(1);
}
glBindFramebuffer(GL_FRAMEBUFFER , 0);

As can be seen, the colour buffers are texture buffers. They have an initialized size, but no initialized data. The values of GL_TEXTURE_MIN_FILTER and GL_TEXTURE_MAG_FILTER don't really matter, as the final screen will have the same size as the internal buffers, so there will be no magnification or reduction. The reduction filter still has to be set, however, as its default is GL_NEAREST_MIPMAP_LINEAR, which requires mipmaps that these textures don't have. The default for magnification is GL_LINEAR, though.
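To see why the default reduction filter is a problem, it helps to count how many mipmap levels a mipmap filter would expect. The setup code only allocates level 0 with glTexImage2D, so a filter that needs the whole chain would find the texture incomplete. The helper below is a small sketch; the function name fullMipLevelCount is made up for illustration.

```cpp
#include <algorithm>

// Number of levels in a full mipmap chain for a width x height texture,
// i.e. floor(log2(max(width, height))) + 1. Assumes both sizes are >= 1.
int fullMipLevelCount(int width, int height) {
    int largest = std::max(width, height);
    int levels = 1;          // level 0 always exists
    while (largest > 1) {
        largest /= 2;        // each level halves the size, rounding down
        ++levels;
    }
    return levels;
}
```

For a 1920x1080 buffer this gives 11 levels, of which only the first is allocated here. Setting the filter to GL_LINEAR or GL_NEAREST avoids the need for the other ten.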

The FBO is bound using glBindFramebuffer. There are three possible targets: GL_DRAW_FRAMEBUFFER, GL_READ_FRAMEBUFFER and GL_FRAMEBUFFER. It is recommended that GL_FRAMEBUFFER is used when the FBO is defined, and that GL_DRAW_FRAMEBUFFER or GL_READ_FRAMEBUFFER is bound when the FBO is used.

Some explanation is needed why I use 4 colour buffers. These buffers will consume many megabytes of GPU memory, and should be kept to a minimum. However, with modern graphics cards, the problem is smaller. The fDiffuseTexture will contain the colour of the material. As the original textures are of type GL_RGBA, this buffer can as well be GL_RGBA. The fPositionTexture will store the world coordinates of the pixel. For this, we need higher precision (GL_RGBA32F). The coordinates are needed in the deferred shader to compute distances to lamps and other objects. The fNormalsTexture buffer stores the normals. In this case, a limited precision is good enough (GL_RGBA16F). The normals are needed to compute effects of directional light and lamps. Finally, there is also a fBlendTexture buffer. The blending can also be done in a separate render stage after the deferred shader (remember to reuse the depth buffer if that is the case). But I use the blending data for some special effects in the deferred shader.
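The memory cost of the four colour buffers can be estimated from the formats in the setup code. The sketch below assumes that the unsized GL_RGBA format is stored with 8 bits per channel, which is common but ultimately up to the driver. The depth render buffer is left out, as its storage size also varies between implementations. The constant and function names are made up for illustration.

```cpp
#include <cstddef>

// Assumed storage size per pixel for each colour buffer format.
constexpr std::size_t kBytesRGBA8   = 4;   // GL_RGBA with GL_UNSIGNED_BYTE (assumed 8 bits per channel)
constexpr std::size_t kBytesRGBA16F = 8;   // GL_RGBA16F, four 16 bit floats
constexpr std::size_t kBytesRGBA32F = 16;  // GL_RGBA32F, four 32 bit floats

// Diffuse + position + normals + blend.
constexpr std::size_t colourBytesPerPixel() {
    return kBytesRGBA8 + kBytesRGBA32F + kBytesRGBA16F + kBytesRGBA8;
}

// Total size of the four colour buffers for a given window size.
constexpr std::size_t colourBufferBytes(std::size_t width, std::size_t height) {
    return width * height * colourBytesPerPixel();
}
```

Under these assumptions a 1920x1080 window needs 32 bytes per pixel, or 66,355,200 bytes (a little over 63 MiB) for the colour buffers alone. Half of that is the position buffer.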

First stage shader
The first stage vertex shader looks like this:

#version 130 // This corresponds to OpenGL 3.0
precision mediump float;
uniform mat4 projectionMatrix;
uniform mat4 modelMatrix;
uniform mat4 viewMatrix;
in vec3 normal;
in vec2 texCoord;
in vec4 vertex;
in float intensity; // sun light
in float ambientLight;
out vec3 fragmentNormal;
out vec2 fragmentTexCoord;
out float extIntensity;
out float extAmbientLight;
out vec3 position;
void main(void)
{
   fragmentTexCoord = texCoord;
   fragmentNormal = normalize((modelMatrix*vec4(normal, 0.0)).xyz);
   gl_Position = projectionMatrix * viewMatrix * modelMatrix * vertex;
   position = vec3(modelMatrix * vertex); // Copy position to the fragment shader
   extIntensity = intensity/255.0;        // Scale the intensity from [0..255] to [0..1].
   extAmbientLight = ambientLight/255.0;
}

To map output from the first fragment shader stage, I do as follows. This has to be done before the shader program is linked.

glBindFragDataLocation(prg, 0, "diffuseOutput");
glBindFragDataLocation(prg, 1, "posOutput");
glBindFragDataLocation(prg, 2, "normOutput");
glBindFragDataLocation(prg, 3, "blendOutput");

The names are the output names of the fragment shader. A layout qualifier could also have been used, but it is not available in OpenGL 3.0. The shader is executed twice: first for normal materials, and second for transparent materials. The second time will only have the blendOutput enabled. The blending uses premultiplied alpha, which makes the operation associative. For this example, the same shader is used for opaque objects and transparent objects, but eventually they should be split into two. The first stage fragment shader looks as follows.

#version 130 // This corresponds to OpenGL 3.0
precision mediump float;
uniform sampler2D firstTexture;
in vec3 fragmentNormal;
in vec2 fragmentTexCoord;
in vec3 position;       // The world coordinate, as given by the vertex shader
out vec4 diffuseOutput; // layout(location = 0)
out vec4 posOutput;     // layout(location = 1)
out vec4 normOutput;    // layout(location = 2)
out vec4 blendOutput;   // layout(location = 3)
void main(void)
{
   posOutput.xyz = position;   // Position given by the vertex shader
   normOutput = vec4(fragmentNormal, 0);
   vec4 clr = texture(firstTexture, fragmentTexCoord);
   float alpha = clr.a;
   if (alpha < 0.1)
       discard;   // Optimization that will not change the depth buffer
   blendOutput.rgb = clr.rgb * clr.a; // Pre multiplied alpha
   blendOutput.a = clr.a;
   diffuseOutput = clr;
}
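The associativity of premultiplied alpha is what allows the transparent layers to be collected in a separate buffer and combined with the opaque colour later. The following CPU-side sketch illustrates this. The type Rgba and the functions premultiply and over are made up for illustration; over computes what blending with glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA) would compute.

```cpp
struct Rgba { float r, g, b, a; };

// Convert straight alpha to premultiplied alpha, as done for blendOutput.
Rgba premultiply(Rgba c) {
    return { c.r * c.a, c.g * c.a, c.b * c.a, c.a };
}

// "src over dst" for premultiplied colours:
// result = src * 1 + dst * (1 - src.a), applied to all four channels.
Rgba over(Rgba src, Rgba dst) {
    float k = 1.0f - src.a;
    return { src.r + dst.r * k, src.g + dst.g * k,
             src.b + dst.b * k, src.a + dst.a * k };
}
```

Blending two transparent layers directly onto an opaque colour gives the same result as first blending them into a buffer cleared to (0,0,0,0) and then applying that buffer over the opaque colour. With straight (not premultiplied) alpha this regrouping would not be valid in general.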

Deferred stage shader
The vertex shader is very simple. It is used only to draw two triangles covering the whole window. The main work will be done in the fragment shader. The visible range of OpenGL normalized device coordinates is [-1,+1] for x and y. The position information forwarded to the fragment shader has to be in the range [0,1], as it is used as a texture coordinate for lookups in the textures. The triangles are defined in the range [0,1], which I transform to the range [-1,+1]. This is a simple operation with no need for a transformation matrix.


#version 130 // This corresponds to OpenGL 3.0
precision mediump float;
in vec4 vertex;
out vec2 position;
void main(void)
{
   gl_Position = vertex*2-1;
   gl_Position.z = 0.0;
   // Copy position to the fragment shader. Only x and y are needed.
   position = vertex.xy;
}
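The same mapping can be checked on the CPU. The sketch below mirrors the shader above for x and y. The quad layout kQuad is an assumption about how two triangles covering [0,1] could be defined; the actual vertex data is not shown in this article. The names Vec2, toClipSpace and toTexCoord are made up for illustration.

```cpp
struct Vec2 { float x, y; };

// Hypothetical quad: two triangles covering [0,1] x [0,1].
const Vec2 kQuad[6] = {
    {0.0f, 0.0f}, {1.0f, 0.0f}, {1.0f, 1.0f},
    {0.0f, 0.0f}, {1.0f, 1.0f}, {0.0f, 1.0f},
};

// Same operation as 'vertex*2-1' in the shader: [0,1] becomes [-1,+1].
Vec2 toClipSpace(Vec2 v) {
    return { v.x * 2.0f - 1.0f, v.y * 2.0f - 1.0f };
}

// The untransformed value is forwarded and used as texture coordinate.
Vec2 toTexCoord(Vec2 v) {
    return v;
}
```

The corner (0,0) ends up at (-1,-1) on the screen and reads the texel at (0,0), and the corner (1,1) ends up at (+1,+1) and reads the texel at (1,1), so every screen pixel samples the matching position in the internal buffers.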


The fragment shader for the deferred stage looks as follows. Some simplifications have been made to keep the listing short. Other lighting effects are easy to add, e.g. material properties for reflection. The specular glare should not be the same for all materials. Other things that can be added are information about ambient light and sun light, which would also need to be prepared in the first render stage. More texture buffers can be allocated for this, but there is unused space available already in the current buffers (i.e. the alpha channels). The input textures are the ones filled in through the FBO in the first stage.


#version 130 // This corresponds to OpenGL 3.0
precision mediump float;
uniform sampler2D diffuseTex; // The color information
uniform sampler2D posTex;     // World position
uniform sampler2D normalTex;  // Normals
uniform sampler2D blendTex;   // A bitmap with colors to blend with.
uniform vec3 camera;          // The coordinate of the camera
in vec2 position;             // The screen position in the range [0,1]
out vec4 fragColor;           // layout(location = 0)
void main(void)
{
   // Load data, stored in textures, from the first stage rendering.
   vec4 diffuse = texture(diffuseTex, position.xy);
   vec4 blend = texture(blendTex, position.xy);
   vec4 worldPos = texture(posTex, position.xy);
   vec4 normal = texture(normalTex, position.xy);
   // Use information about lamp coordinate (not shown here), the pixel
   // coordinate (worldPos.xyz), the normal of this pixel (normal.xyz)
   // to compute a lighting effect.
   // Use this lighting effect to update 'diffuse'. The values 'lamp' and
   // 'specularGlare' are the results of that computation (not shown here).
   vec4 preBlend = diffuse * lamp + specularGlare;
   // Manual blending, using premultiplied alpha.
   fragColor = blend + preBlend*(1-blend.a);
   // Some debug features. Enable any of them to get a visual representation
   // of an internal buffer.
   // fragColor = (normal+1)/2;
   // fragColor = diffuse;
   // fragColor = blend;
   // fragColor = worldPos; // Scaling may be needed to range [0,1]
   // fragColor = lamp*vec4(1,1,1,1);
}
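The lamp computation is left out of the listing above. As an illustration of how the world position and the normal from the internal buffers could be used, here is a CPU-side sketch of one simple possibility: a Lambert factor with a linear falloff to zero at a given radius. This is an assumption made for the example, not the computation used in my shader, and the names Vec3, sub, dot and lampFactor are made up. The normal is assumed to have unit length.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

Vec3 sub(Vec3 a, Vec3 b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Hypothetical lamp term in the range [0,1].
// worldPos and normal correspond to the values read from posTex and normalTex.
float lampFactor(Vec3 worldPos, Vec3 normal, Vec3 lampPos, float radius) {
    Vec3 toLamp = sub(lampPos, worldPos);
    float dist = std::sqrt(dot(toLamp, toLamp));
    if (dist >= radius) return 0.0f;   // Out of reach of the lamp
    if (dist == 0.0f) return 1.0f;     // Lamp is exactly at the surface point
    // Cosine of the angle between the normal and the direction to the lamp,
    // clamped so that surfaces facing away get no light.
    float lambert = std::max(0.0f, dot(normal, toLamp) / dist);
    return lambert * (1.0f - dist / radius);
}
```

A surface at the origin facing straight up, with a lamp 2 units above it and a radius of 4, gets the factor 0.5. The same lamp placed below the surface gives 0.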

Execute the drawing every frame
Now everything has been prepared, and can be used for every frame update. Clear the FBO buffers from the previous frame:

glBindFramebuffer(GL_DRAW_FRAMEBUFFER, fboName);
GLenum windowBuffClear[] = { GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1, GL_COLOR_ATTACHMENT2, GL_COLOR_ATTACHMENT3 };
glDrawBuffers(4, windowBuffClear); // Select all buffers
glClearColor(0.0f, 0.0f, 0.0f, 0.0f); // Set everything to zero.
glClear(GL_COLOR_BUFFER_BIT|GL_DEPTH_BUFFER_BIT);

Execute the first render stage, which will fill out the internal buffers with data:

EnableRenderProgramStage1();
GLenum windowBuffOpaque[] = { GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1,  GL_COLOR_ATTACHMENT2, GL_NONE };
// Do not produce any blending data on the 4th render target.
glDrawBuffers(4, windowBuffOpaque);
DrawTheWorld(); // Will also produce depth data in the depth buffer

GLenum windowBuffTransp[] = { GL_NONE, GL_NONE, GL_NONE, GL_COLOR_ATTACHMENT3 };
glDrawBuffers(4, windowBuffTransp); // Only update blending buffer
glEnable(GL_BLEND);
// Use alpha 1 for source, as the colours are premultiplied by the alpha.
glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA);
// The depth buffer shall not be updated.
glDepthMask(GL_FALSE);
DrawAllSortedTransparent();
glDepthMask(GL_TRUE);
glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA); // Restore the application's usual setting (not the OpenGL initial state)
glDisable(GL_BLEND);
glBindFramebuffer(GL_DRAW_FRAMEBUFFER, 0);

The output from the first render stage is now available in the texture buffers. Execute the second render stage, the deferred shader.

EnableRenderProgramDeferredStage();
// The depth buffer from stage 1 is not used now as the FBO is unbound.

glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
SetupDeferredStageUniforms();
glEnableVertexAttribArray(fVertexIndex);

glActiveTexture(GL_TEXTURE3);
glBindTexture(GL_TEXTURE_2D, fBlendTexture);

glActiveTexture(GL_TEXTURE2);
glBindTexture(GL_TEXTURE_2D, fNormalsTexture);

glActiveTexture(GL_TEXTURE1);
glBindTexture(GL_TEXTURE_2D, fPositionTexture);

glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, fDiffuseTexture);

DrawSimpleSquare();

glDisableVertexAttribArray(fVertexIndex);

The result

The material colour information.

Positional data.

Normals.

Final result. In this picture, blending data, lamps, fog effects and ambient light are also used.


Update history

2012-09-13 Added reference to deferred lighting. Clarified some distinction between using GL_FRAMEBUFFER, GL_DRAW_FRAMEBUFFER and GL_READ_FRAMEBUFFER. Cleaned up the fragment shader of the first stage.
2012-10-26 Added reference to part 2.