Saturday, January 10, 2009

Virtual Geometry Images

Geometry images are one of those ideas so simple you ask yourself "Why didn't I think of this?" I'll admit it isn't the topic of much discussion concerning the "more geometry" problem for the next generation. They work great for compression but they don't inherently solve any of the other problems. Multi-chart geometry images have a complicated zipping procedure that is also invalid if a part is rendered at a different resolution.

A year ago, when I was researching a solution to "more geometry" on DirectX 9 level hardware, I came across this paper, which was in line with the direction I was thinking. The idea is an extension of virtual textures: alongside the textures there is another layer that is a geometry image. For every texture page that is brought in there is a geometry image page with it. By decomposing the scene into a seamless texture atlas you are also doing a Reyes-like split operation. The splitting is a preprocess and the dicing is done in real time. The paper also explains an elegant seam solution.

My plan for how to get this running really fast was to use instancing. With virtual textures every page is the same size, which simplifies many things. Detail is controlled in a way similar to a quad tree: the same size pages just cover less of the surface and there are more of them. If we mirror this with geometry images, every time we wish to use a patch of geometry it will be a fixed-size grid of quads. This works perfectly with instancing if the actual position data is fetched from a texture, as geometry images imply. The geometry you are instancing is then a grid of quads whose vertex data is only texture coordinates from 0 to 1. The per-instance data is passed in with a stream and the appropriate frequency divider. This carries data such as patch world space position, patch texture position and scale, edge tessellation amount, etc.
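
A minimal sketch of that setup, assuming a shared unit grid and a per-instance stream (the struct names and fields are my own, not from the paper):

```cpp
#include <cstdint>
#include <vector>

struct GridVertex { float u, v; };       // the only per-vertex data: UV in [0,1]

struct PatchInstance {                   // fed through the instance stream
    float worldPos[3];                   // patch placement (or an index to a transform)
    float texOffset[2];                  // where this page lives in the texture cache
    float texScale;                      // how much of the chart this page covers
    float edgeTessellation[4];           // per-edge factors to stitch neighboring LODs
};

// Build the shared grid once; every patch instance reuses the same buffers.
void BuildUnitGrid(int quadsPerEdge,
                   std::vector<GridVertex>& verts,
                   std::vector<uint16_t>& indices)
{
    const int vertsPerEdge = quadsPerEdge + 1;
    for (int y = 0; y < vertsPerEdge; ++y)
        for (int x = 0; x < vertsPerEdge; ++x)
            verts.push_back({ float(x) / quadsPerEdge, float(y) / quadsPerEdge });

    for (int y = 0; y < quadsPerEdge; ++y)
        for (int x = 0; x < quadsPerEdge; ++x) {
            uint16_t i0 = uint16_t(y * vertsPerEdge + x);
            uint16_t i1 = uint16_t(i0 + 1);
            uint16_t i2 = uint16_t(i0 + vertsPerEdge);
            uint16_t i3 = uint16_t(i2 + 1);
            indices.insert(indices.end(), { i0, i2, i1, i1, i2, i3 });
        }
}

// In the vertex shader the 0..1 UV is scaled and offset by the instance data and
// used to fetch the actual position from the geometry image page.
```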

If patch tessellation is tied to the texture resolution, this provides the benefit that no page table needs to be maintained for the textures. It does mean there may be a high amount of tessellation in a flat area merely because texture resolution was required there. Textures and geometry can be at different resolutions but still be tied, such as the texture being 2x the size of the geometry image. This doesn't really affect the system.
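
A rough sketch of the tied case, assuming 128x128 texture pages and geometry image pages at half that resolution (the 2x ratio above); the names are illustrative, but the point is that a page request alone determines both the texture page and the patch tessellation, so no separate geometry LOD bookkeeping is needed:

```cpp
constexpr int kTexturePageSize   = 128;                     // texels per page edge (assumption)
constexpr int kGeometryPageSize  = kTexturePageSize / 2;    // geometry image texels per page edge
constexpr int kQuadsPerPatchEdge = kGeometryPageSize - 1;   // fixed dicing of every patch

struct PageRequest  { int mip, x, y; };                 // which virtual texture page is needed
struct PatchRequest { int mip, x, y; int gridQuads; };  // the geometry page that comes with it

// Every streamed-in texture page implies exactly one geometry image page at the
// same (mip, x, y) address, diced into the same fixed-size grid of quads.
PatchRequest PatchForTexturePage(const PageRequest& page)
{
    return { page.mip, page.x, page.y, kQuadsPerPatchEdge };
}
```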

If the performance is there to have the two at the same resolution, a new trick becomes available. Vertex density will match pixel density, so all pixel work can be pushed to the vertex shader. This gets around the quad problem with tiny triangles. If you aren't familiar with this, all pixel processing on modern GPUs gets grouped into 2x2 quads. Unused pixels in the quad get processed anyway and thrown out. This means that if you have many pixel-size triangles, your pixel performance will approach 1/4 the speed. If the processing is done in the vertex shader instead, this problem goes away. At this point the pipeline is looking similar to Reyes.

If this is not a possibility for performance reasons, and it's likely not, the geometry patches and the texture can be untied. This allows the geometry to tessellate in detailed areas and not in flat areas. The texture page table will need to come back though, which is unfortunate.

Geometry images were first designed for compression, so disk space should be a pretty easy problem. One issue though is edge pixels. Between each page the edge pixels need to match exactly, otherwise there will be cracks. This can be handled by losslessly compressing just the edge and using normal lossy image compression for the interiors. As the patches mip down they will be using shared data from disk, so this shouldn't be an issue. It should be stored uncompressed in memory though, or the crack problem will return.
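
A sketch of what that on-disk split might look like, assuming a 64x64 page and a raw 16-bit border (the layout and sizes are my assumptions):

```cpp
#include <cstdint>
#include <vector>

struct GeometryImagePage {
    static constexpr int kSize = 64;          // texels per page edge (assumption)

    // Outer ring of texels stored raw at full precision (xyz position per texel),
    // so two adjacent pages always decode to bit-identical positions along their
    // shared edge and no cracks appear between patches.
    std::vector<uint16_t> losslessBorder;

    // The (kSize - 2)^2 interior texels go through an ordinary lossy image codec.
    std::vector<uint8_t>  lossyInteriorBitstream;
};

// Decoding writes the lossy interior first, then stamps the exact border values
// over the outer ring; the decoded page is kept uncompressed in memory.
```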

Unfortunately vertex texture fetch performance, at least on current console hardware, is very slow. There is a high amount of latency, and triangles are not processed in parallel either. With DirectX 11 tessellators it sounds like they will be processed in parallel. I do not know whether vertex texture fetch will be up to the speed of a pixel texture fetch; I would sure hope so. I need to read the specs for both the API and this new hardware before I can postulate on exactly how this scheme could be done with tessellators instead of instanced patches, but I think it will work nicely. I also have to give the disclaimer that I have not implemented this, so the performance and details of the implementation are not yet known.

To compare this scheme with the others, it has some advantages. Given that it is still triangle rasterization, dynamic objects are not a problem. To make this work with animated meshes it will probably need bone indices and weights stored in a texture along with the position. This can be contained to an animation-only geometry pool. It doesn't have the advantage subd meshes have that you can animate just the control points. That advantage may not work that well anyway, because you need a fine-grained cage to get good animation control, which increases patch count, draw count, and the tessellation of the lowest detail LOD (the cage itself).
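
A hypothetical texel layout for that animation-only pool (the channel choices are my assumptions, just to make the idea concrete):

```cpp
#include <cstdint>

struct AnimatedGeometryTexel {
    uint16_t position[3];    // quantized object-space position, as in a static page
    uint8_t  boneIndex[4];   // up to four bone influences per surface point
    uint8_t  boneWeight[4];  // normalized weights, summing to 255
};
// Static geometry skips the skinning channels and pays only for position;
// the vertex shader fetches this texel and skins it like a conventional skinned mesh.
```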

Its ability to LOD is better than subd meshes but not as good as voxels. The reason for this is that the charts a model has to be split up into are usually quite a bit bigger than the patches of a subd mesh. This really depends on how intricate the mesh is though. It scales the same way subd meshes do, just with a different multiplier. Things like terrain will work very well. Things like foliage work terribly.

Tools side, anything can be converted into this format. Writing the tool unfortunately looks very complicated. The difficulty primarily lies with the texture parametrization required to build the seamless texture atlas. After the UVs are calculated the rest should be pretty straightforward.

I do like this format better than subd meshes with displacement maps, but it's still not ideal. Tiny triangles start to lose the benefits of rasterization: there will be overdraw and triangles missing the centers of pixels. More important, I think, is that it doesn't handle all geometry well, so it doesn't give you the advantage of telling your artists they can make any model, place it however they want, and have no impact on the performance of the game. Once they start making trees or fences you might as well go back to how they used to work, because this scheme will run even slower than the old way. The same can be said for subd meshes btw.

To sum it up I think it's a pretty cool system but it's not perfect and doesn't solve all the problems.

Thursday, January 8, 2009

More Geometry

There has been a lot of talk lately about the next generation of hardware and how we are going to render things. The primary topic seems to be "How are we going to render a lot more geometry than we can now?" There are two approaches that are getting a lot of attention. The first is subdivision surfaces with displacement maps and the second is voxel ray casting.

Here are some others that are getting a bit of attention.

Ray casting against triangles
Intel's ray tracer
Cuda ray tracer

Point splatting
Far Voxels
QSplat
Atom

Progressive meshes
Progressive Buffers
View dependent progressive mesh on the GPU

Progressive Buffers is one of my favorite papers. It's one that I keep coming back to time and time again.

Otoy
interview

Who knows exactly what is going on in Otoy. It almost seems like they are being deliberately confusing. It's for a game engine, it's for movie CG, it's lightstage (which I thought was a non-commercial product), it's a server side renderer, it's a web 3d viewer / streaming video.

From what I have gathered, it generates an unstructured point cloud on the GPU and creates a point hierarchy. It uses this for ray tracing, not just for eye rays but for shadows and reflections as well. The reflections are massively cached; it's not clear how. I can't figure out how this is working with full animations like they have in their videos. Either that would require regenerating the points, which makes ray casting into it kind of pointless, or it has to deal with holes. Whatever they are doing, the results are very impressive.

Subdivision surfaces with displacement maps
This has a lot of powerful people behind it. Both nVidia and AMD are pushing it, along with DirectX 11 API support through the hull shader, tessellator, and domain shader. I'm not a big fan of this. First off, it's only really useful for data amplification, not data reduction. For example, our studio and many others are now using the subd cage for the in-game models for our characters. That means the lowest tessellation level the subd surface can get to, the subd cage, is the same poly count as our current near-LOD character meshes. That makes subdivision surfaces not useful at all for LODing at moderate distances. This can be reduced some, but likely not by enough to solve the problem. It also looks really complicated to implement and rife with problems. It requires artist input to create models that work well with it. The data imported from the modelling package can be specific to that package. It's hard to beat the generality of a triangle soup.

The plus side is there is no issue with multiple moving meshes or deforming. They are fully animatable. To allow good animation control the tessellation of the cage may need to be higher. In theory every piece of geometry in your current engine could be replaced with subd models with displacement maps and the rest of your pipeline would work exactly the same.

Check out some work in this direction from Michael Bunnell of Fantasy Lab:
GPU Gems 2 chapter
Fantasy Lab

Voxel ray casting
This has been made popular by John Carmack, who has described it as his planned approach for idTech6. Considering he's always at the head of the pack, this should give it pretty strong backing. John refers to his implementation as a sparse voxel octree (SVO). The idea is to extend his current virtual texturing mip blocks to 3D with a "mip" hierarchy of voxels stored as an octree. The way this is even remotely reasonable is that you only need to store the important data; there is no data in empty space. This is very different from most scientific and medical applications, which require the whole data block. This structure is great for compression. Geometry compression turns into image compression, which is a well studied problem with effective solutions. LODing works from the whole screen being a single voxel down to subpixel detail. To render it, every screen pixel casts a ray into the octree until it hits a voxel the size of a pixel or smaller. This means LODing reduces both rendering cost and memory. If you don't need the voxel data you don't need it in memory.
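
The per-pixel traversal might look roughly like this CPU-side sketch; the node layout, the front-to-back descent, and the pixel-footprint test are my own assumptions, not idTech6's actual implementation:

```cpp
#include <algorithm>
#include <utility>

struct Vec3 { float x, y, z; };

// Hypothetical sparse node: children that don't exist are null (empty space),
// and a node with no children is a solid voxel carrying one baked color.
struct OctreeNode {
    Vec3  center;
    float halfSize;
    const OctreeNode* children[8];
    Vec3  color;
};

// Ray vs. axis-aligned cube slab test. invDir must have no zero components.
static bool Intersect(const OctreeNode& n, const Vec3& o, const Vec3& invDir, float& tEnter)
{
    float t0 = 0.0f, t1 = 1e30f;
    auto slab = [&](float ro, float c, float id) {
        float lo = (c - n.halfSize - ro) * id;
        float hi = (c + n.halfSize - ro) * id;
        if (lo > hi) std::swap(lo, hi);
        t0 = std::max(t0, lo);
        t1 = std::min(t1, hi);
    };
    slab(o.x, n.center.x, invDir.x);
    slab(o.y, n.center.y, invDir.y);
    slab(o.z, n.center.z, invDir.z);
    tEnter = t0;
    return t0 <= t1;
}

// Cast one eye ray, descending front to back and stopping as soon as the node's
// footprint is no larger than the pixel's footprint at that distance.
// pixelSizeAtUnitDist is the world-space width of one pixel at distance 1.
static const OctreeNode* CastRay(const OctreeNode* node, const Vec3& o, const Vec3& invDir,
                                 float pixelSizeAtUnitDist)
{
    if (!node) return nullptr;
    float t;
    if (!Intersect(*node, o, invDir, t)) return nullptr;

    bool isLeaf = true;
    for (int i = 0; i < 8; ++i)
        if (node->children[i]) { isLeaf = false; break; }

    // This voxel is already pixel sized or smaller on screen: shade it.
    if (isLeaf || 2.0f * node->halfSize <= pixelSizeAtUnitDist * std::max(t, 1e-4f))
        return node;

    // Otherwise sort the existing children by entry distance and recurse front to back.
    struct ChildHit { float t; int index; } hits[8];
    int count = 0;
    for (int i = 0; i < 8; ++i) {
        float ct;
        if (node->children[i] && Intersect(*node->children[i], o, invDir, ct))
            hits[count++] = { ct, i };
    }
    std::sort(hits, hits + count,
              [](const ChildHit& a, const ChildHit& b) { return a.t < b.t; });
    for (int k = 0; k < count; ++k)
        if (const OctreeNode* hit =
                CastRay(node->children[hits[k].index], o, invDir, pixelSizeAtUnitDist))
            return hit;
    return nullptr;
}
```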

I like this approach because it gives one elegant way of handling textures, geometry, streaming, compression and LODing all in one system. There are no demands on the input data. Anything can be converted into voxels. Due to a streaming structure very similar to virtual texturing it allows unique geometry and unique texturing. This means there are no restrictions on the artists. There is little way an artist can impact performance or memory with assets they create. That puts the art and visuals in the artists hands and the technical decisions in the engineers hands.

There are some problems. Ray tracing has always been slow. Ray casting is a search, whereas rasterization is binning, and binning is almost always faster than searching. In a highly parallel environment the synchronization required by binning may tip the scales in searching's favor. As triangles shrink, multiple triangles landing in one bin, or triangles missing bins entirely, also erode rasterization's speed advantage. It has now been demonstrated to be fast enough to render on current hardware at 60fps, which should be enough proof to let this concern slide a bit. It does mean the scene will only be able to be rendered once; it's unlikely there will be power left to render a shadow map, or ray trace to the light for that matter. My guess is John's intent is to have the lighting fully baked into the texture, like how idTech5 works currently. This also cuts down on the required memory, as only one color is required per voxel.

Memory is another possible problem. Jon Olick's demo of one character using an SVO required ~1 GB of video memory, which was not enough to completely hide paging. His plan to decrease this size was entropy encoding, which means each child's data is based on its parent's data. As far as I'm aware this is only going to work if you use a kd-tree style restart traversal, which is slower than the other alternatives. Otherwise he would need to evaluate the voxel data for the whole stack whenever he wishes to draw the pixel.

The most important problem is that it doesn't work with dynamic meshes. The scheme I believe John Carmack is planning on using is a static world with baked lighting, with all dynamic objects and characters using traditional triangle meshes. I expect this to work out well. The performance of this type of situation has been shown to be there, so it's not too risky to pursue this direction. Still, there is something about it that bugs me. You release the constraints on the environment artists but leave the other artists with the same problems. If you handle texturing inherently with voxels, does that mean he needs to keep virtual texturing around for everything else? Treating the world and the dynamic objects in it separately has been required in the past with lightmaps and vertex lighting. To bring back this Hanna-Barbera effect in an even more complicated way leaves a bad taste in my mouth. I'm really looking for a uniform solution for geometry and textures.

For more detailed information see Jon Olick's Siggraph presentation. He is a former employee of id. Also check out the interview that started this all off.

The brick-based voxel implementations seem like a better solution to me than keeping the tree uniform. This means the leaves are a different type than the interior nodes: they consist of a fixed-size brick of voxels. Being in a brick format has many advantages. It allows free filtering and hardware volume texture compression through DXT formats.
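
To make the node/leaf split concrete, here is a hypothetical layout; the brick size and fields are my assumptions, not taken from the papers linked below:

```cpp
#include <cstdint>

constexpr int kBrickSize = 8;          // 8x8x8 voxels per leaf brick (assumption)

struct InteriorNode {
    uint32_t childIndex[8];            // indices into a node pool, 0 = empty child
};

struct BrickLeaf {
    // One color per voxel. In practice this block would live DXT-compressed inside
    // a large volume-texture atlas and be referenced by atlas coordinates, so the
    // hardware provides trilinear filtering and decompression for free.
    uint8_t rgba[kBrickSize][kBrickSize][kBrickSize][4];
};
```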

Check out these for brick approaches:
Gigavoxels
GPU ray casting of voxels