An enhanced OpenGL renderer for Unreal Tournament News Archive
6-24-2007
I added a few updates to the D3D8 renderer. The NoAATiles option is supported. The 16-bit textures mode uses 555 instead of 565 textures by default. There are a few other minor changes.
Version 1.3 for UT: utd3d8r13.zip (102 KB).
Version 1.2 for Deus Ex (works with Deus Ex version 1112fm): dxd3d8r12.zip (102 KB).
Version 1.1 for Rune (works with Rune version 1.07 or compatible): runed3d8r11.zip (103 KB).
The source code package for this version of the D3D8 renderer is utd3d8r13src.zip (56 KB). It contains MSVC8 project files. If rebuilding the source with this compiler, make sure to use service pack 1 to avoid a code generation bug with some of the SSE code. Also make sure to apply the UTGLR_NO_APP_MALLOC changes to the copy of UnFile.h that comes with the headers in the Core/Inc directory to avoid problems with certain debug features and sstream class usage.
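The 555 versus 565 choice mentioned in this entry comes down to how three 8-bit channels are packed into 16 bits. The sketch below is only an illustration of the two layouts, not code from the renderer; the function names are hypothetical and the channels are simply truncated to their top bits. One commonly cited property of 555 is that all three channels are quantized identically, so neutral grays stay neutral, though the entry above does not state the reason for the change.

```cpp
#include <cassert>
#include <cstdint>

// Pack 8-bit channels into a 5-6-5 layout (green keeps one extra bit).
// Hypothetical helper for illustration; channels are truncated, not rounded.
inline std::uint16_t ExamplePackRGB565(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    return static_cast<std::uint16_t>(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

// Pack 8-bit channels into a 5-5-5 layout (the top bit is left unused).
inline std::uint16_t ExamplePackRGB555(std::uint8_t r, std::uint8_t g, std::uint8_t b) {
    return static_cast<std::uint16_t>(((r >> 3) << 10) | ((g >> 3) << 5) | (b >> 3));
}

// Small self-check: white fills all 16 bits in 565 but only 15 bits in 555.
inline void ExamplePackingChecks() {
    assert(ExamplePackRGB565(255, 255, 255) == 0xFFFF);
    assert(ExamplePackRGB555(255, 255, 255) == 0x7FFF);
}
```

In the 565 layout the green field has 64 levels while red and blue have 32, so a gray input does not land on matching steps in all three channels.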
3-11-2007
Version 3.4 is released. It adds a new option that should eliminate the HUD corruption that can occur when antialiasing is enabled. It also contains a number of other changes. This version is built with a new compiler, Microsoft Visual C++ 2005 Express Edition with service pack 1. To make installation simpler to deal with, it uses the statically linked runtime library, which makes the file a bit larger.
Changes in version 3.4:
- Added the new NoAATiles option to avoid HUD corruption with antialiasing enabled.
- Fixed a bug with fragment programs and masked textures with detail textures.
- Fragment program version of single pass fog mode.
- Other fragment program related changes.
- Single pass detail texture path for the less common no lightmap case.
- Built with a different compiler.
- Removed the TruForm code.
- A few other minor changes.
Version 3.4 for UT: utglr34.zip (111 KB).
Version 1.8 for Deus Ex (works with Deus Ex version 1112fm): dxglr18.zip (111 KB).
Version 1.2 for Rune (works with Rune version 1.07 or compatible): runeglr12.zip (112 KB).
Enabling NoAATiles should eliminate the HUD corruption that can occur when antialiasing is enabled. As the name implies, it disables antialiasing when drawing tiles. Note that corruption with antialiasing enabled can still occur on the logo background if using Entry.unr on startup. This background isn't made of tiles (when talking about it from the renderer perspective). There may be some hardware/drivers where NoAATiles won't work if they don't support the feature it uses.
I added new fragment programs, plus a few new vertex programs, which should allow it to always stay in fragment program mode for the entire frame in most cases. I didn't see any significant performance impact from this change though. As mentioned before, the single pass fog fragment program path by itself appears to run a little faster than the fixed function version on my system. If I didn't mess up any of the checks, it should also allow for single pass fog support on video cards from vendors besides ATI and NVIDIA (the fixed function path needed vendor specific extensions that didn't become standard or get picked up by other vendors).
With a lot of new fragment program code, there is a higher chance of other new bugs in this area. To check for any potential problems in cases like this, the UseFragmentProgram option can be toggled while the game is running to check for any differences that shouldn't be there. Note that there are a few known differences such as DetailMax=2 only working in certain modes and single pass detail texturing looking different in some cases with OneXBlending disabled and certain combinations of textures. To use fragment program mode, vertex program mode must also be enabled.
I removed the TruForm code since it interfered with other changes. ATI pulled TruForm support from their drivers a long time ago. If you look closely enough at the source for this version, you'll notice that it's not really doing anything new that appears to interfere with the TruForm code. However, I only released this version since a few of the changes I made in a branch with some experimental code could be useful to have in a general release. Other changes there required modifications that meant it was a good time to drop the TruForm code. It was easy and low cost to leave in place as long as there were no nearby changes, but otherwise it made sense to drop since it's no longer supported in ATI's drivers.
2-24-2007
Single pass fragment program detail textures are buggy if they show up on a masked texture. I didn't test it much, but this problem may only show up if OneXBlending is also enabled. This might be fixed in a future version, but otherwise just disable fragment programs to work around this one.
I'm looking at converting everything, except for maybe a few obscure combinations, to use fragment programs just in case vertex and fragment program on/off mode switches happen to cost a bit more than switching between different programs in some cases. I added a bunch of bugs with masked textures while working on this, but hopefully those are fixed now.
If I do finish the fragment program conversion, I'll probably put together a new Rune build. I doubt it benefits from any of the new rendering paths, but I'd prefer to keep it working if built from the latest source, and some of these changes could break it unless a few extra Rune specific updates are added too.
With fragment programs it's possible to support single pass fog without having to use an NVIDIA or ATI specific extension that this mode currently requires. Note that this doesn't apply to Rune since it handles fog differently. Although the main point of this would be to support this mode on chipsets from other vendors, the fragment program version did run a little faster on my GeForce 6 series video card in some very limited tests.
I'll probably look at using a new compiler before building a possible future version since it looks like Visual C++ 2005 won't mess up on the SSE code now that SP1 is out. It probably won't make much of a difference with the current code, but there are a few other good reasons to move to something newer.
2-4-2007
Version 3.3 is released. It adds fragment program support, which is only used for a few detail texture paths in this one. However, the only useful one is probably single pass with DetailMax=2. I'm not sure if it will be generally faster across all hardware configurations, but it's usually a bit faster than the no fragment program multipass path for DetailMax=2 in the few tests I ran on my system. It may be good to avoid this one for online play, at least for a while, as anticheat checks across all servers may not be kept so up to date these days.
It looks like Visual C++ 2005 SP1 fixes the code generation bug with unaligned SSE loads (compiler version 14.00.50727.762).
If it wasn't for the one single pass fragment program path that can do the two detail texture layers, it may not have been worth doing a new release with this one. Other than that, it's really just some testability and infrastructure updates for the next big thing, except I'm not sure if that one will ever happen. If not, there's always a chance the fragment program support might help with something else. And, although fairly limited in scope, it's one more use of advanced graphics features for a couple older games.
1-28-2007
runeglr09.zip, dxglr12.zip, and dxglr13.zip join the list of obsolete and no longer hosted files.
11-29-2006
I built a new Deus Ex D3D8 renderer that I think fixes the save game thumbnail problem. I also built a new Deus Ex OpenGL renderer with similar changes, but it looks like there's some other problem with save game thumbnails there, so they probably still won't work with this one.
5-16-2006
I decided to bump up the release date for the D3D8 renderers a little bit. My video card decided to start dying and I wanted to get the final testing done before it became completely unusable if possible. I don't like even hinting at pre-announcing things for projects like this anyway. There's too much risk of something coming up and things not coming out as planned. What I can say with far more certainty however is that this is most likely my final renderer code update, at least in the short to mid term. I might work on documentation like stuff a little more, but I don't like writing documentation...
I think I fixed a problem with the projection matrix that I didn't quite set up correctly. Hopefully this takes care of problems with the z far plane being a bit too close. This would cause things a certain far enough distance away to not draw. Although it's hopefully fixed, or at least better now, even if the same as the OpenGL renderer, there still might be the chance of z far clipping on a really huge map. I added clamping for colors in the DrawTile() path. I'm not sure it's needed anywhere in UT, but it's good to have to be safe in case it's needed in some case I don't know about. I also added a few optimizations and other minor updates that were added to the last OpenGL renderer release.
This D3D8 renderer release comes with source code. So, if you're wondering what the source looks like or you want to compare it to the source for the OpenGL renderer, now you can. More importantly, if I happened to miss something major, now someone else can easily add some fixes, features, or try to make it work with some other game if the opportunity came up. The source package is in the file utd3d8r12src.zip (54 KB).
1-15-2006
Version 3.2 is released. It has some new SSE2 code in a few places. Minor improvements were made to some of the existing assembly code. A few rdtsc instructions used for profiling that negatively impacted performance a bit too much in some cases were removed. A few other mostly minor changes were made.
While looking through older renderer code (D3D/Glide stuff), I noticed that the DrawTile path was clamping colors. I didn't do this in the BufferTileQuads path up until now. Although I never saw it causing any problems, I added clamping code in version 3.2 just in case. This is a little slower of course (even though still faster than previous similar code), but pulling the rdtsc instructions in the same area should help balance things out. I also added some SSE2 code to the DrawTile path for buffering color data. This code can do the clamping at no added cost and is a bit faster than the previous code, both clamping and no clamping versions.
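As a rough illustration of how SSE2 can make the clamping essentially free, the sketch below converts four float color components to bytes using the saturating pack instructions, which clamp as a side effect of narrowing. This is not the renderer's code: the function names, the 0.0 to 1.0 input range, and the byte order are assumptions made for the example, and a scalar version is included for comparison and for builds without SSE2.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#define EXAMPLE_HAVE_SSE2 1
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Scalar reference: scale to 0..255, clamp explicitly, round to nearest.
inline std::uint32_t ExamplePackColorScalar(const float* rgba) {
    assert(rgba != nullptr);
    std::uint32_t out = 0;
    for (int i = 0; i < 4; ++i) {
        float v = rgba[i] * 255.0f;
        v = std::min(255.0f, std::max(0.0f, v));
        out |= static_cast<std::uint32_t>(std::lrintf(v)) << (8 * i);
    }
    return out;
}

// SSE2 version: the two saturating packs do the clamping, so there is no
// separate compare or branch. Assumes finite inputs whose scaled values fit
// in a 32-bit integer; NaN or huge values are not handled here.
inline std::uint32_t ExamplePackColorClamped(const float* rgba) {
#ifdef EXAMPLE_HAVE_SSE2
    assert(rgba != nullptr);
    __m128 c = _mm_mul_ps(_mm_loadu_ps(rgba), _mm_set1_ps(255.0f));
    __m128i i32 = _mm_cvtps_epi32(c);            // float -> int32 (round to nearest)
    __m128i i16 = _mm_packs_epi32(i32, i32);     // int32 -> int16, signed saturation
    __m128i u8  = _mm_packus_epi16(i16, i16);    // int16 -> uint8, clamps to 0..255
    return static_cast<std::uint32_t>(_mm_cvtsi128_si32(u8));
#else
    return ExamplePackColorScalar(rgba);
#endif
}
```

Out-of-range inputs such as 2.0 or -1.0 come out as 255 and 0 respectively in both versions, with the first component ending up in the low byte of the result.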
The source code now includes VC6 project files with various updates. Unfortunately, this release won't build correctly with the VC8 compiler version 14.00.50727.42 due to a code generation bug with optimizations enabled. The register allocator problem in previous compiler releases, where it did far too many extra moves with SSE/SSE2 intrinsics, appears to have been at least partially fixed in VC8, well sort of. Now it generates incorrect code when telling it to do unaligned loads. I see no good way to work around this since the problem still occurs with just a simple single load followed by single store code snippet. So, if you try to build the renderer with any VC8 compiler with this problem, either disable all of the SSE/SSE2 code with the ifdef, or make the necessary modifications to only remove the SSE2 code in the DrawTile path.
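For readers unfamiliar with the pattern, here is a minimal sketch of the kind of "single unaligned load followed by a single store" snippet described above, wrapped in a compile-time switch with a plain C++ fallback. The macro name EXAMPLE_USE_SSE and the function name are hypothetical; the renderer's actual ifdef is not named in this entry, so check the source package for it.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical switch for this example only; undefine it to take the
// non-SSE path, which is the kind of workaround suggested above.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#define EXAMPLE_USE_SSE 1
#endif

#ifdef EXAMPLE_USE_SSE
#include <xmmintrin.h>  // SSE intrinsics
#endif

// Copy four floats between addresses that need not be 16-byte aligned.
inline void ExampleCopyFourFloats(const float* src, float* dst) {
    assert(src != nullptr && dst != nullptr);
#ifdef EXAMPLE_USE_SSE
    __m128 v = _mm_loadu_ps(src);   // unaligned load
    _mm_storeu_ps(dst, v);          // unaligned store
#else
    std::memcpy(dst, src, 4 * sizeof(float));
#endif
}
```

Copying from an address such as `buffer + 1` and comparing the result against the fallback path is one simple way to check whether a given compiler build handles the unaligned load correctly.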
12-4-2005
The last thing I was working on was replacing a couple major functions in render.dll. I never got it to be completely stable and it didn't handle a few special effects correctly, but I was able to run a few tests with it on frames that it did draw identically. After a while, I decided to not bother with trying to finish it since I don't play anymore, but I was able to test a few things of interest. Most of the details don't really matter, but without spending too much time optimizing parts of it (after spending a lot of time trying to make it work the same as the original...), I was able to increase the frame rate by up to 5% in mesh heavy frames.
Fixing the TruForm problem with incorrectly applying it to non-character meshes only took adding a new flag bit and a few simple checks. The other problem with corrupt triangles that spanned the edge of the screen when TruForm was enabled was automatically fixed as part of the optimization to not spend time clipping these in software. Of course now that ATI has pulled TruForm support from their drivers, fixing these glitches isn't so important anymore.
I was supposed to be done with renderer updates, but I might put together one more test build. Although it may have some other (hopefully minor) side effects, I might know how to avoid one of the major remaining game speed problems. This is the one that causes problems on systems that dynamically vary the speed of the CPU's timestamp counter. The major classes of systems affected by this include ones with Pentium M, certain newer P4, and certain newer K8 CPUs.
This isn't really anything that's fixable nicely in the renderer, but by tweaking the right internal flag, this problem might be avoidable, and it's easiest for me to build a setting to do this into a renderer. Unfortunately it will end up with only an up to 1 ms resolution timer instead. So at least it will be stable, but I'm not sure how smoothly it'll work. I observed significant interactions with the frame rate limiter in the renderer in some tests I ran, but the game still seemed to run okay. The better fix is to add an option to use QPC if present, but this code isn't in the renderer, so I can't fix it there (though it would likely be easy to patch the right part of some other binary for this one).
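To make the "use QPC if present" idea concrete, here is a small sketch of a time source that does not depend on the CPU's timestamp counter rate. It is not taken from the game or the renderer; the names are hypothetical. On Windows it tries QueryPerformanceFrequency and QueryPerformanceCounter, and everywhere else (or if those calls fail) it falls back to std::chrono::steady_clock.

```cpp
#include <cassert>
#include <chrono>

#ifdef _WIN32
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#endif

// Current time in seconds from a monotonic source (hypothetical helper).
inline double ExampleNowSeconds() {
#ifdef _WIN32
    LARGE_INTEGER freq, now;
    // Both calls report failure through their return value, which is how
    // "if present" can be checked at run time.
    if (QueryPerformanceFrequency(&freq) && freq.QuadPart > 0 &&
        QueryPerformanceCounter(&now)) {
        return static_cast<double>(now.QuadPart) /
               static_cast<double>(freq.QuadPart);
    }
#endif
    using Clock = std::chrono::steady_clock;
    return std::chrono::duration<double>(
        Clock::now().time_since_epoch()).count();
}

// Measures elapsed time since construction.
class ExampleStopwatch {
public:
    ExampleStopwatch() : start_(ExampleNowSeconds()) {}

    double ElapsedSeconds() const {
        const double elapsed = ExampleNowSeconds() - start_;
        assert(elapsed >= 0.0);  // a monotonic source should never run backwards
        return elapsed;
    }

private:
    double start_;
};
```

A frame loop would create one stopwatch and read `ElapsedSeconds()` each frame; the point is only that the tick rate is queried from the OS rather than assumed from the CPU clock.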
6-12-2005
Deus Ex renderer
I built a new renderer for Deus Ex. It's the 3.0 code built to work with Deus Ex. The file is dxglr14.zip (works with Deus Ex version 1112fm). OneXBlending is enabled by default in this renderer, but if the brightness looks off, in addition to GammaOffset adjustments, also make sure OneXBlending=True in the [OpenGLDrv.OpenGLRenderDevice] section of your DeusEx.ini file. A new option called SceneNodeHack was added (in the previous version). Enabling this may work around some minor problems, though it wasn't tested extensively, so there's a chance it might cause other problems.
Old files
I removed a few old files. This includes old versions of the Deus Ex renderer dxglr10.zip and dxglr11.zip, and the one renderer u1glr10.zip that I built for Unreal Gold. For Unreal Gold, and other versions of Unreal, use the newer renderer from OldUnreal.
5-15-2005
I ran a few tests on the D3D8 renderer built for D3D9, which only required minor modifications. With D3D9, V Sync in a window control is available and it has access to a more rational z-bias implementation. It also tends to run slightly slower.
5-7-2005
I built a new version of the D3D8 renderer. This one is a bit faster with interleaved vertex/color data, larger vertex buffers, and the BufferTileQuads code added. BufferTileQuads is enabled by default in this renderer since not having it hurts D3D a lot more than OpenGL, and I don't have to be concerned about any backwards compatibility issues. I also added a few more features and some minor optimizations. The file is utd3d8r10.zip.
I didn't add paletted texture support to this renderer, so if you have a GeForce1-4 series video card, you should make sure to use the OpenGL renderer and enable the settings that tell it to use paletted textures (these are disabled by default). Also, on other video cards with good enough OpenGL driver support, the OpenGL renderer may be better.
Performance differences between this renderer and the OpenGL renderer are fairly small on my system, though it does tend to be a little slower. It may be possible to improve this in some cases by interleaving the texture arrays, but this is a lot of extra work, so I may not try it. It doesn't help that D3D seems to have poor small batch performance in general due to intrinsic design/implementation characteristics. There's no avoiding this after a certain point since UT has fairly low geometric complexity.
So, D3D is far simpler compared to OpenGL in the feature set it supports on the API side and yet ends up with far worse small batch performance. In various places in the renderer, it's possible to get moderate performance with a minor amount of work using OpenGL, but with D3D, it requires extra work just to make it work at all and end up with poor performance. With either API, it's possible to get higher performance by adding more advanced buffering schemes such as actor triangle buffering, clipped actor triangle buffering, BufferTileQuads, etc. This D3D renderer will be far slower than the OpenGL renderer for line drawing since it lacks advanced buffering in this area. This shouldn't be a problem with the editor because I don't support it with this renderer anyway since selection support is not implemented. Hopefully line drawing isn't used too heavily, or at all, outside the editor.
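The general idea behind buffering schemes like the ones named above can be sketched as follows: queue geometry that shares the same state into one interleaved vertex buffer and submit it in a single draw call, rather than issuing one small draw call per quad. This is a simplified, hypothetical model, not the renderer's BufferTileQuads code; the class, the vertex layout, and the use of a texture ID as the only state are all assumptions, and the draw call is just counted instead of being sent to an API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Interleaved vertex: position, packed color, and texture coordinates together.
struct ExampleVertex {
    float x, y, z;
    std::uint32_t color;
    float u, v;
};

class ExampleQuadBatcher {
public:
    explicit ExampleQuadBatcher(std::size_t maxQuads) : maxQuads_(maxQuads) {
        assert(maxQuads > 0);
        verts_.reserve(maxQuads * 4);
    }

    // Queue one quad. A texture change or a full buffer forces a flush first.
    void AddQuad(int textureId, const ExampleVertex (&quad)[4]) {
        const bool full = verts_.size() / 4 >= maxQuads_;
        if (!verts_.empty() && (textureId != currentTexture_ || full)) {
            Flush();
        }
        currentTexture_ = textureId;
        verts_.insert(verts_.end(), quad, quad + 4);
    }

    // Submit everything queued so far as one batch.
    void Flush() {
        if (verts_.empty()) return;
        ++drawCalls_;  // a real renderer would issue one draw call here
        verts_.clear();
    }

    std::size_t DrawCalls() const { return drawCalls_; }

private:
    std::size_t maxQuads_;
    std::vector<ExampleVertex> verts_;
    int currentTexture_ = -1;
    std::size_t drawCalls_ = 0;
};
```

With a large enough buffer, ten quads using one texture followed by five using another end up as two draw calls instead of fifteen, which is where the savings come from when per-call overhead is high.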
z-buffer issues
Like the OpenGL renderer, this D3D renderer may have problems with far away decals flickering due to z-buffer precision issues if only a 24-bit z-buffer is available. It doesn't support w-buffering either, though it looks like a lot of newer video cards don't support this feature anyway. It's probably possible to work around this problem in the renderer, though it may not be anything I'll add. Of course if all these new GPUs/VPUs didn't drop support for 32-bit z-buffers, this wouldn't be a problem.
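A back-of-the-envelope way to see why far away decals are the first thing to flicker: with a standard perspective projection, the stored depth is d(z) = f/(f-n) * (1 - n/z) for near plane n and far plane f, so the smallest distance difference the buffer can tell apart grows with the square of the distance. The helper below is a sketch under that assumption, treating the z-buffer as plain fixed point with the given number of bits; the function name is hypothetical and the renderer's actual near and far values are not stated here.

```cpp
#include <cassert>
#include <cmath>

// Approximate smallest eye-space depth difference a fixed-point z-buffer with
// `bits` bits can resolve at distance z, for near plane n and far plane f.
// Uses d(z) = f/(f-n) * (1 - n/z), so dd/dz = f*n / ((f-n) * z*z).
inline double ExampleDepthResolution(double z, double n, double f, int bits) {
    assert(n > 0.0 && f > n && z >= n && z <= f && bits > 0);
    const double steps = std::ldexp(1.0, bits);            // 2^bits depth values
    const double slope = (f * n) / ((f - n) * z * z);      // depth change per unit z
    return 1.0 / (steps * slope);
}
```

Under this model, going from 24 to 32 bits makes the resolvable step 256 times smaller at any given distance, and a decal offset from its surface by less than the resolvable step at that distance can no longer be ordered reliably.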
4-26-2005
I finally decided to learn Direct3D in case knowing it would be good for a future job. Porting the renderer only added a few days, with a lot of that time spent dealing with things D3D makes difficult, so I tried building one that uses D3D. D3D has gotten better in recent versions, but some areas are still problematic. I'm sure glad I never used D3D7 or earlier.
This renderer will most likely be slower than the OpenGL one on ATI, NVIDIA, or other graphics cards that at least have reasonably good OpenGL drivers. I also left out a few likely significant optimizations in the current build that may limit its performance. I guess I'll find out later if fixing these can bring it up to the speed of the OpenGL renderer on my system. It uses D3D8 and since it uses certain advanced features, it will not function on various older video cards. Also, due to certain SDK complications, I think it ends up requiring at least DirectX 8.1, which I believe means it will not support Win95.
I added single pass fog mode to this one, since it happened to be easy with D3D. The required blend mode on the OpenGL side requires one extension for NVIDIA, another extension for ATI, and probably just isn't there for various other video cards since providing a standard way to access it on the fixed function side seems to have been forgotten about. It's too bad some of the other vendors didn't at least add support for the ATI version of the extension since it doesn't really add much and their hardware probably supports it all. I'll check the standard extensions again sometime, but I don't think the functionality required for single pass fog in UT is there.
I'm checking a large number of caps bits/values in this build, but a few checks are still missing. I'll probably fix a few of these later, but may leave a few of the more complicated ones out.
Windowed mode, windowed mode resizing, and surviving through various mode switches should work, but some things in this area get awfully difficult to support and test when using D3D. Windowed mode screen shots hopefully work okay, including without crashing in various special cases when the window isn't fully within the screen. D3D still makes something basic like grabbing a copy of what got rendered far too difficult in cases like this.
This initial build of this renderer supports a large number of features, but some are missing at this time.
- Selection support for UnrealEd isn't there. I may never add it, so don't use it with the editor (other functionality should work, but it's not really usable there without this feature).
- S3TC support is there.
- 16 bit texture support is there, but I did the conversions using simple clipping instead of proper rounding.
- Texture aspect ratio restrictions aren't checked yet, so if a card has any specific requirements here, it may just crash when trying to load certain textures (there's a good chance this isn't an issue on any cards new enough to run this renderer though).
- V Sync on or off request only works full screen. D3D8 doesn't allow something basic like V Sync on or off to be requested when in windowed mode. I believe this got fixed in D3D9.
- All the texture filtering modes and LOD bias should work.
- No paletted texture support, and I'm not sure I'll ever add it to this one.
- Lots of other features are supported, but a few others are not.
11-29-2004
Version 2.8 is released. It contains a couple bug fixes, basic support for 16-bit textures, and various other changes.
The rare SinglePassDetail with OneXBlending disabled bug is fixed. The fix may also optimize away a few low cost state changes.
The bug with a few incorrect gradients showing up in the console that can occur when precaching is enabled is fixed. It actually resulted in a number of textures getting unnecessary higher quality filtering, so this fix could speed things up in some cases, though without higher quality filtering modes enabled, it may make little to no difference on a number of video cards. This one was broken due to previous optimizations, though in a number of cases, the CPU savings may still have been more beneficial than any potential loss due to unnecessary high quality texture filtering. It was also somewhat difficult to fix, which is one reason why it remained broken for so long.
The new option for 16-bit textures is Use16BitTextures. The Use4444Textures option is gone. If mostly video card limited rather than CPU limited, using this new option should speed things up at the expense of reduced texture quality, which varies from case to case. In many cases, there is only minor quality loss. In other cases, like with various skyboxes and coronas, there is often major quality loss.
This basic 16-bit texture support was kept simple by just sending BGRA8 textures to the OpenGL driver and telling it to use RGB5, or RGB5_A1 if masked. It could be made faster if the renderer converted the textures to 16-bit before sending them to the driver, but I didn't want to deal with added complexity in this area right now for various reasons. So, the performance of some aspects of this new feature relies on good format conversion code in the driver, and in some cases it's not there. Enabling this feature will also reduce brightness a little bit, though it's fairly minor (much more noticeable with the old 4444 textures option). From reading the OpenGL specification, it sounds like the color components are supposed to be rounded to nearest during the conversion, but with the NVIDIA, ATI, and Intel drivers I tested, they were truncated, which causes the slight brightness reduction on average.
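The brightness effect of truncation versus rounding can be checked with a few lines of arithmetic. The sketch below assumes the conversion is modeled as scaling an 8-bit value by 31/255 and then either dropping or rounding the fraction, and that a 5-bit value is expanded back by the inverse scale. How any particular driver actually performs the conversion is not known here, and the function names are hypothetical.

```cpp
#include <cassert>

// 8-bit component to 5 bits, dropping the fractional part (truncation).
inline int ExampleTo5BitTruncate(int v8) {
    assert(v8 >= 0 && v8 <= 255);
    return (v8 * 31) / 255;
}

// 8-bit component to 5 bits, rounded to nearest.
inline int ExampleTo5BitRound(int v8) {
    assert(v8 >= 0 && v8 <= 255);
    return (v8 * 31 + 127) / 255;
}

// Expand a 5-bit component back to the 0..255 scale.
inline double ExampleFrom5Bit(int v5) {
    assert(v5 >= 0 && v5 <= 31);
    return v5 * 255.0 / 31.0;
}

// Mean brightness over all 256 input values after a round trip through 5 bits.
inline double ExampleMeanAfterRoundTrip(int (*quantize)(int)) {
    double sum = 0.0;
    for (int v = 0; v < 256; ++v) {
        sum += ExampleFrom5Bit(quantize(v));
    }
    return sum / 256.0;
}
```

In this model rounding keeps the mean at the original 127.5, while truncation pulls it a few levels lower, since every value except full white is rounded down. For example, an input of 254 maps to 30 when truncated and to 31 when rounded.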
I ran some specific tests on BGRA8 to RGB5 and RGB5_A1 conversion performance on NVIDIA, ATI, and Intel OpenGL drivers. The results are:
| Driver | Conversion performance |
| --- | --- |
| Year or so old NVIDIA drivers on my old system | Good |
| Current ATI drivers | Bad |
| Current Intel drivers | Worse |

| Configuration | Temperature |
| --- | --- |
| With V Sync | ~53 °C |
| With V Sync and with frame rate limit of 85 | ~49 °C |
| Without V Sync and with frame rate limit of 85 | ~46 °C |
Copyright 2002-2006 Chris Dohnal