Denoising Sprite Fright

At the end of the production we were facing a high level of noise in some of the key shots. This is how we handled the denoising.

Update
26 Nov 2021
10 min read

From early in the production, we decided to use Open Image Denoise (OIDN), which is integrated in Cycles, to get noise-free frames with a relatively low sample count.

One big challenge that we faced at the end of production was a relatively high level of noise in certain shots that had already been rendered in final quality. To make things worse, tripling the sample count did little to improve the situation: the noise was still there. So we had to come up with a creative solution that would make our renders noise-free with relatively little effort, while re-rendering as little as possible.

A comparison of some of the especially noisy shots that needed additional attention in denoising.

The Problem

To come up with a working idea, we had to understand exactly what was causing the noise. The issue was not that individual frames had leftover noise or artifacts. OIDN does a very good job of interpreting the individual noisy render passes to generate a clean image, even with low sample counts. Where it all falls apart is in animation, which is a well-known issue with the current implementation of the denoiser. With a low sample count there is simply not enough information in the pixels for the denoiser to make an accurate guess at what the theoretical noise-free render would look like, so it takes some freedom in filling in the gaps. That is fine when you look at a single frame, but across a sequence of frames the missing information can be filled in differently each time, and those differences show up as jitter.

On surfaces with little detail, the noise pattern moving through the image becomes visible, as can be seen on the rock here.

For that reason, it was important from the beginning to render with a constant seed. As long as the noise pattern stays the same, the denoiser creates predictable results and interprets the frames the same way, resulting in stable noise-free frames. Unfortunately, that is only true as long as the camera stays static. The noise pattern is constant in screen space, so as soon as the camera pans, rotates, zooms or moves at all, the pattern stays stuck to the screen while the scene moves underneath it, which destroys the illusion of a truly clean moving image.
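As a rough illustration of the constant-seed setup, the sketch below sets the two Cycles scene properties that control the sampling seed, `seed` and `use_animated_seed`. The helper name `lock_noise_seed` and the seed value are hypothetical, and the article does not state how the seed was actually configured in production. A stand-in object replaces the Blender scene so the snippet also runs outside Blender.

```python
from types import SimpleNamespace


def lock_noise_seed(scene, seed=0):
    """Keep the Cycles sampling pattern identical on every frame.

    `scene` is expected to expose `scene.cycles.seed` and
    `scene.cycles.use_animated_seed`, as a Blender scene does.
    The seed value itself is an arbitrary assumption.
    """
    scene.cycles.seed = seed
    # An animated seed would change the noise pattern per frame.
    scene.cycles.use_animated_seed = False
    return scene


# Stand-in for a Blender scene so the sketch runs without Blender.
demo_scene = SimpleNamespace(
    cycles=SimpleNamespace(seed=7, use_animated_seed=True)
)
lock_noise_seed(demo_scene)
print(demo_scene.cycles.seed, demo_scene.cycles.use_animated_seed)
```

Inside Blender, the same helper would be called with the real scene, for example `lock_noise_seed(bpy.context.scene)`.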

More Samples

One obvious thing that we tried was to simply increase the sample count. While that naturally helps, the gain in quality was hardly worth the increased render time: even with 6x the samples (and therefore 6x the render time), the video was still not noise-free. We did end up doing this selectively in a couple of places by border rendering the noisy parts, but we couldn't afford to do it everywhere.
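The poor return on extra samples matches the usual rule of thumb for Monte Carlo rendering, where the noise in the raw render falls roughly with the square root of the sample count. The small calculation below is only that rule of thumb, not a measurement from the production, and it says nothing about how the denoiser reacts to the remaining noise. The function name `relative_noise` is made up for this example.

```python
import math


def relative_noise(sample_factor):
    """Approximate noise level relative to the baseline render after
    multiplying the sample count by `sample_factor`, assuming the
    error shrinks with the square root of the number of samples."""
    return 1.0 / math.sqrt(sample_factor)


for factor in (3, 6):
    print(f"{factor}x samples -> about {relative_noise(factor):.2f}x the noise")
```

Under this assumption, 6x the render time still leaves around 40% of the original noise amplitude, which is why brute force alone gets expensive so quickly.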

Renders with 800 and 5000 samples in comparison.

Super Resolution Rendering

Another method that we tried, instead of scaling up the sample count, was to scale up the resolution of the frames and reduce the samples proportionally. By doubling the number of pixels while cutting the sample count in half, we could render at the same cost per frame but with an advantage for the denoiser. Rendering with more pixels provides much more detailed auxiliary information, such as the surface normals, that the denoiser takes into account. This information is basically free, as it needs only a single sample. That way, details that would usually be smaller than a single pixel can be treated much more reliably by the denoising algorithm. The denoised image is then scaled back down to the original resolution, in the hope that the extra detail carries over.
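The trade-off can be checked with a bit of arithmetic. The sketch below assumes that render cost is proportional to pixels times samples, and uses a hypothetical 1920x1080 frame at 800 samples as the baseline. The function names are made up for this example.

```python
import math


def supersample_settings(width, height, samples, pixel_factor=2.0):
    """Scale the pixel count by `pixel_factor` and divide the samples
    by the same factor, so pixels * samples stays roughly constant."""
    side = math.sqrt(pixel_factor)  # per-axis scale for the pixel factor
    new_width = round(width * side)
    new_height = round(height * side)
    new_samples = max(1, round(samples / pixel_factor))
    return new_width, new_height, new_samples


def render_cost(width, height, samples):
    """Crude cost model: total number of samples traced per frame."""
    return width * height * samples


base = (1920, 1080, 800)  # hypothetical baseline, not the film's settings
scaled = supersample_settings(*base)
cost_ratio = render_cost(*scaled) / render_cost(*base)
print(scaled, cost_ratio)
```

Doubling the pixel count means scaling each axis by about 1.41, not by 2. Doubling both axes would quadruple the pixel count and would need a quarter of the samples to stay at the same cost.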

In a still frame, the AI denoiser visibly retains more fine detail when denoising at an increased resolution. (The bark and mushrooms show this especially well here.)

This method did produce some solid, measurable improvements in areas that can capitalize on the additional information (areas with small-scale detail), but it is still not stable over time in the areas that were especially problematic, those with a lot of transparency or volume passes. We started looking into it too late in the rendering process to take advantage of it, and it would not have helped with the most jarring noise issues anyway.

The issue of general jittering noise in motion is not solved using this method. (Especially noticeable in the moss.)

So how do we get rid of the noise now?

In general, the idea is to use the noise-free nature of a single frame to stabilize an entire sequence. To pull that off, Andy started quite early on to build proxy meshes that loosely follow the shape of the noisy background in 3D. A single rendered frame could then be projected onto such a mesh and rendered out for the entire sequence.

Projection of a single rendered frame onto manually created meshes.

This method gives a very nice result, but it is quite cumbersome to set up, and any parallax in the noise-free areas that the proxy mesh does not account for is lost. Its viability also depends heavily on the layout of the shot and the camera movement.

Z-Depth Projection

Finally, let's talk about the method that we ended up using to save a couple of important, especially noisy shots from jarring jitter. It's based on the same idea of projecting a still frame onto a proxy mesh, except that the proxy mesh is generated fully automatically. This is done with a technique comparable to a simplified version of photogrammetry. Because the image that the projection is based on is a 3D rendering, we have basically all of the information we need to recreate the position of each pixel in 3D space.

Breakdown of the z-depth projection method using geometry nodes to generate the projection mesh.

In theory we could just use the actual geometry that was used to render the shot and project the render back onto it. But volumes, transparency passes and instancing make it hard to isolate only the geometry that is actually needed, and it would be difficult to single out the geometry that is actually visible from the angle the image is projected from.

But if we output the depth pass of the render together with the information about the camera, we already have everything we need to make an exact reconstruction of the rendered geometry as seen from the camera. Taking into account just the camera's position, rotation, focal length and aspect ratio, we can use the z-depth to calculate the 3D position of every single pixel of the image. The mesh can then be created dynamically using geometry nodes.
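The calculation described above can be sketched in plain Python. This is not the geometry nodes setup used on the film, only an illustration of the math under several assumptions: a pinhole camera without lens shift, square pixels, a horizontal sensor fit, a camera that looks down its local -Z axis with +Y up (Blender's convention), depth values measured along the viewing axis rather than along the ray, and image rows ordered bottom to top. All function and parameter names are made up for this example.

```python
def unproject_pixel(px, py, depth, width, height, focal_mm, sensor_mm):
    """Return the camera-space position of pixel (px, py).

    Assumes `depth` is measured along the viewing axis. If the pass
    stored the ray length instead, the ray direction would have to be
    normalized before scaling by the depth.
    """
    # Pixel center as an offset from the image center, in units of the
    # image width (horizontal sensor fit assumed).
    x_ndc = (px + 0.5) / width - 0.5
    y_ndc = ((py + 0.5) / height - 0.5) * (height / width)
    # At distance 1 the image plane is sensor_mm / focal_mm wide.
    scale = sensor_mm / focal_mm
    return (x_ndc * scale * depth, y_ndc * scale * depth, -depth)


def camera_to_world(point, rotation, location):
    """Apply a 3x3 rotation matrix (rows) and a translation."""
    return tuple(
        sum(rotation[r][c] * point[c] for c in range(3)) + location[r]
        for r in range(3)
    )


def depth_to_grid_mesh(depth_rows, focal_mm, sensor_mm, rotation, location):
    """Build one vertex per pixel and one quad per 2x2 pixel block."""
    height, width = len(depth_rows), len(depth_rows[0])
    vertices = []
    for py, row in enumerate(depth_rows):  # row 0 is the bottom row
        for px, depth in enumerate(row):
            cam = unproject_pixel(px, py, depth, width, height,
                                  focal_mm, sensor_mm)
            vertices.append(camera_to_world(cam, rotation, location))
    faces = []
    for py in range(height - 1):
        for px in range(width - 1):
            i = py * width + px
            # Counter-clockwise as seen from the camera.
            faces.append((i, i + 1, i + width + 1, i + width))
    return vertices, faces


IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
verts, faces = depth_to_grid_mesh(
    [[1.0, 1.0], [1.0, 1.0]], 36.0, 36.0, IDENTITY, (0.0, 0.0, 0.0)
)
print(verts, faces)
```

A real setup would also have to deal with depth discontinuities, for example by dropping quads whose corners differ too much in depth, so that foreground and background are not bridged by stretched faces.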

That creates a highly detailed 3D representation of the background in the shot with fully baked lighting information. It can then be rendered out again from the moving shot camera and composited together with the original shot, leaving an almost perfectly noise-free result.
For the compositing it's important to note that anything that wasn't baked into the projected representation, like the moving shadows and occlusions of characters, needs to be masked back in from the original render, which loses the denoising effect for that area.
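The compositing step can be thought of as a per-pixel blend between the clean projection and the original render. The sketch below is a simplified model of that idea, not the actual compositor setup. The names `projected_alpha` (coverage of the re-rendered projection) and `character_mask` (areas with character shadows or occlusion) are assumptions made for this example.

```python
def composite_pixel(original, projected, projected_alpha, character_mask):
    """Blend one RGB pixel.

    The clean projection is used where it has coverage and where no
    character shadow or occlusion has to be kept. Everywhere else the
    original (noisier) render shows through.
    """
    weight = projected_alpha * (1.0 - character_mask)
    return tuple(
        weight * p + (1.0 - weight) * o
        for o, p in zip(original, projected)
    )


noisy = (0.42, 0.40, 0.38)
clean = (0.40, 0.40, 0.40)
print(composite_pixel(noisy, clean, 1.0, 0.0))  # clean background
print(composite_pixel(noisy, clean, 1.0, 1.0))  # character shadow kept
print(composite_pixel(noisy, clean, 0.0, 0.0))  # hole in the projection
```

The last case corresponds to areas the projection camera never saw: with no coverage, the original pixel is used and the noise stays.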

You can take a look at an example file showing this method here.

Screen capture of the geometry nodes setup using the described method, as it was used for one of the shots in Sprite Fright.

Shortcomings

It's important to point out that in our case the conditions were particularly favorable for using this method. There are a couple of things that could simply be improved in the future, but here I want to mention some inherent shortcomings.

It is only possible to use this method with fully static backgrounds. A budget decision early on in the production was to have no moving foliage in any of the backgrounds (except for the opening and closing shots). Without this decision, jittering noise would have been an issue in basically every day-time shot, and we would not have been able to denoise them with the projection method either.
Another requirement concerns the materials and lighting conditions of the reconstructed scene. If there are a lot of reflections or other material properties that depend on the viewing angle, the illusion can quickly fall apart, as the reconstruction produces a fully static result. The same is true for moving shadows and light sources.
While the 3D reconstruction allows for some freedom in the camera movement of the shot, the fact that it is made from a single camera position leaves open areas that are missing information and will stay noisy in the final composite. So a smart viewpoint needs to be chosen, and the method works best when the camera is not moving too much. Of course, this could also be addressed with a setup using multiple projection cameras.

Further Ideas

For this method specifically, I think there are multiple ways to further improve the usability and the quality of the results.

One thing that I was planning initially was to use all the frames of the already rendered sequences, mapping them all into 3D and averaging the colors of the result to get an even cleaner image.
Going further, one could even take all frames and directly solve a 3D model from the input to use for recreating the rendered data. Meshing point clouds is also used in photogrammetry and has many applications beyond denoising.
To avoid manually compositing characters, or specifically their shadows, into the cleaned plates, some masking could be done procedurally by looking at deviations from the clean base over clusters of pixels. That would need a lot of fine-tuning and was not worth investigating in our case due to time constraints and scope.
Another idea that would expand the use of baking render data into generated geometry would be to find a way to render the occluded geometry as well. This could be done either by slicing the geometry with camera clipping, or with a smarter approach that skips a certain number of initial bounces of the camera rays in the renderer. That way, several layers of geometry could be reconstructed and baked from the same camera angle using this method. The reconstructions of the different ray depths could then be combined into a single one that can be navigated much more freely, as surfaces that would be hidden from the projection camera are revealed. That could be very useful for creating interactive rendered snapshots of 3D scenes.

Proof of concept for an x-ray rendering setup that allows skipping light bounces to reveal hidden geometry.
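The procedural masking idea mentioned above, looking at deviations from the clean base over clusters of pixels, could look roughly like the sketch below. This was not implemented for the film. It is only one possible reading of the idea, using grayscale images as nested lists, square tiles as the "clusters", and an arbitrary threshold. All names and values are hypothetical.

```python
def deviation_mask(frame, clean, tile=4, threshold=0.1):
    """Flag tiles where the original frame deviates from the clean plate.

    The signed difference is averaged per tile, so zero-mean render
    noise largely cancels out, while a coherent change such as a
    character's shadow survives the averaging and exceeds the threshold.
    """
    height, width = len(frame), len(frame[0])
    mask = []
    for ty in range(0, height, tile):
        row = []
        for tx in range(0, width, tile):
            diffs = [
                frame[y][x] - clean[y][x]
                for y in range(ty, min(ty + tile, height))
                for x in range(tx, min(tx + tile, width))
            ]
            mean_diff = sum(diffs) / len(diffs)
            row.append(abs(mean_diff) > threshold)
        mask.append(row)
    return mask


# Clean plate: flat gray, 4 rows by 8 columns (two 4x4 tiles).
clean_plate = [[0.5] * 8 for _ in range(4)]
# Left tile: alternating +/-0.2 noise. Right tile: a darker shadow.
noisy_frame = [
    [0.7 if (x + y) % 2 == 0 else 0.3 for x in range(4)] + [0.2] * 4
    for y in range(4)
]
print(deviation_mask(noisy_frame, clean_plate))
```

In the example, the noisy left tile is left to the clean projection, while the shadowed right tile is flagged so the original render can be masked back in there. As the article notes, making this robust on real footage would take a lot of fine-tuning.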

I think in general this idea of retaining some of the inherent 3D information of a rendered image for the compositing step is something that could be used much more in Blender to allow for a more dynamic compositing workflow, be it for a denoising application like this one, depth-based masking, camera-independent compositing, etc.

The flexibility of geometry nodes, which already makes this workflow possible, also demonstrates how the different areas of Blender can greatly advance a workflow when they work together, and compositing is in no way an exception.
