An enhanced OpenGL renderer for Unreal Tournament: News Archive

New experimental Deus Ex renderer build that should fix the problems with flickering lighting on the trees. Also fixed the pixel lighting shader to normalize the incoming normals. It usually didn't make a huge difference, but this had been overlooked since the shader was initially created.

Built a new experimental Deus Ex renderer (beta page). Fixed another problem with handling PF_Modulated in certain cases. This didn't fix the problems with flickering lighting on trees, and sometimes other objects, when using the per pixel mesh lighting path.

Also a few other mostly minor updates to the general renderer code:
- NoMaskedS3TC option removed. Always uses RGBA DXT1. This matches the only option for DXT1 in D3D.
- GL_NV_multisample_filter_hint extension support removed. Don't consider this one very useful anymore.
- A few 227 editor related updates that were general renderer code fixes.
- MaxLogUOverV and MaxLogVOverU config settings removed. These are set internally now.
- Larger default maximum allowed texture size when S3TC is not in use.
- Potential NVIDIA driver bug workaround for the major graphics corruption after windowed / full screen switch issue. Suspect this may be fixed in newer drivers now, but was easy to add.
- RequestHighResolutionZ option removed. Modified code to attempt to get a 32-bit, 24-bit, or 16-bit z-buffer in that order.
- If the first mipmap pointer is set to NULL in SetTexture(), skip looking at the others.

Although I consider these to be useful updates, I don't consider them critical enough for a new binary release right now. They're mostly non-essential cleanup. Not sure if going to look at doing anything else with the OpenGL renderer code in the near future though, so prepared a source code only release of the main renderer code with these changes: (111 KB). May merge these updates into a future D3D9 renderer build that may contain some additional changes.

Uploaded the last Direct3D8 renderer versions, but I wouldn't recommend using these. Consider them obsolete and superseded by the Direct3D9 renderers.

Major graphics corruption across fullscreen / windowed switches went away on my system after upgrading to NVIDIA driver version 185.85 from 182.50. In the broken case, it appears that the new OpenGL rendering context created after the switch may have incorrectly retained some non-default internal state related to CLIENT_ACTIVE_TEXTURE from the just deleted context. I may still implement the simple workaround for this if I ever get around to updating the experimental Deus Ex beta renderer, but I'm not planning a new release for just this. Hopefully it's just an OpenGL driver bug that has already been fixed. Alternatively, you can try enabling UseVertexProgram and UseFragmentProgram to avoid it if you're using older drivers, or if you still see this with newer drivers on different hardware than what I have.

New D3D9 renderer builds with the changes from version 3.5 of the OpenGL renderer. These binaries were built with a newer compiler and require Windows 2000 or later.

Version 1.1 for UT: (105 KB).
Version 1.1 for Deus Ex (works with Deus Ex version 1112fm): (105 KB).
Version 1.1 for Rune (works with Rune version 1.07 or compatible): (106 KB).

The source code package for this version of the D3D9 renderer is (64 KB). It contains MSVC9 project files. If using this source code, make sure to apply the UTGLR_NO_APP_MALLOC changes to the copy of UnFile.h that comes with the headers in the Core/Inc directory to avoid problems with certain debug features and sstream class usage.

Old files
I removed a few old files.

Version 3.5 is released. These binaries were built with a newer compiler and require Windows 2000 or later.

Changes in version 3.5:
- z-buffer writes disabled for screen flashes.
- Fixed ortho view editor selection.
- Frame rate limit option requests higher resolution scheduling on Win32 (in case something else didn't already set this up).
- Changed a few default settings.
- A few other minor changes.

Version 3.5 for UT: (110 KB).
Version 1.9 for Deus Ex (works with Deus Ex version 1112fm): (110 KB).
Version 1.3 for Rune (works with Rune version 1.07 or compatible): (111 KB).

ZRangeHack will be enabled by default for UT if not already present in the ini file, but this one may still need to be watched a little more closely. There are a couple of cases I know of where it has minor side effects. However, since most video cards these days support only 24-bit rather than 32-bit z-buffers, it's needed to avoid decal flickering in the distance in many common cases, unless other parts of the game engine are modified to draw decals a little further away.

Old files
I removed a few old files.

Finished a D3D9 renderer. Its feature set is very similar to the most recent OpenGL renderer. Like the D3D8 renderer, it doesn't support selection in the editor.

- Fixed a minor bug with masked texture blending and OneXBlending disabled that's present in the latest D3D8 renderer.
- All pixel shaders are version 2.0 (used if UseFragmentProgram is enabled). Many could easily be done with 1.x, but 1.x wouldn't offer much over fixed function outside of the most complicated pixel shaders.
- Added SceneNodeHack to all builds and set to enabled by default.
- Picks up SinglePassDetail paths for base texture only plus detail texture. Latest OpenGL renderer has this, but latest D3D8 renderer does not.

Version 1.0 for UT: (106 KB).
Version 1.0 for Deus Ex (works with Deus Ex version 1112fm): (106 KB).
Version 1.0 for Rune (works with Rune version 1.07 or compatible): (107 KB).

The source code package for this version of the D3D9 renderer is (64 KB). It contains MSVC8 project files. If rebuilding the source with this compiler, make sure to use service pack 1 to avoid a code generation bug with some of the SSE code. Also make sure to apply the UTGLR_NO_APP_MALLOC changes to the copy of UnFile.h that comes with the headers in the Core/Inc directory to avoid problems with certain debug features and sstream class usage.

New D3D8 renderer builds to fix a bug related to internal texture tracking that can occur when handling a lost D3D device. This one is unlikely to cause any noticeable problems in previous builds unless using single pass fog or single pass detail texture modes. If it does occur in previous builds, a flush command from the console will fix things.

Version 1.4 for UT: (102 KB).
Version 1.3 for Deus Ex (works with Deus Ex version 1112fm): (102 KB).
Version 1.2 for Rune (works with Rune version 1.07 or compatible): (103 KB).

The source code package for this version of the D3D8 renderer is (56 KB). It contains MSVC8 project files. If rebuilding the source with this compiler, make sure to use service pack 1 to avoid a code generation bug with some of the SSE code. Also make sure to apply the UTGLR_NO_APP_MALLOC changes to the copy of UnFile.h that comes with the headers in the Core/Inc directory to avoid problems with certain debug features and sstream class usage.

Built a new experimental Deus Ex renderer (beta page). It contains a few fixes, features, and minor improvements.

I added a few updates to the D3D8 renderer. The NoAATiles option is supported. The 16-bit textures mode uses 555 instead of 565 textures by default. There are a few other minor changes.

Version 1.3 for UT: (102 KB).
Version 1.2 for Deus Ex (works with Deus Ex version 1112fm): (102 KB).
Version 1.1 for Rune (works with Rune version 1.07 or compatible): (103 KB).

The source code package for this version of the D3D8 renderer is (56 KB). It contains MSVC8 project files. If rebuilding the source with this compiler, make sure to use service pack 1 to avoid a code generation bug with some of the SSE code. Also make sure to apply the UTGLR_NO_APP_MALLOC changes to the copy of UnFile.h that comes with the headers in the Core/Inc directory to avoid problems with certain debug features and sstream class usage.

I think some modifications will need to be made to the HDR mode in the not-a-joke April 1st beta renderer. Not entirely unexpected. It's still unfortunate that, instead of getting HDR for free by pulling lighting into the fragment program, I have to look at adding extra calculations in an attempt to keep it from looking too different in various cases. HDR is supposed to make things look brighter when there are bright lights, but without it, some of those lights may have been set far brighter than they're supposed to be, since the extra brightness had little noticeable effect. Specular could be a problem without more detailed material properties built into the engine and levels. I think I'll look into significantly limiting specular while trying not to break anything else too badly in this mode. Hopefully this mode can still work to a lesser extent where it's supposed to, including cases like allowing bright colored lights to have a greater impact on the color of the meshes they illuminate.

I'm not sure when I'll get a chance to look at this code again, so it may be best to leave PixelLightHDR disabled in most cases for now. I think this should leave things looking very similar to the existing lighting in most cases. It probably won't look too much different in screenshots unless looking at certain cases where the vertex lighting doesn't work so well. It should make things look a bit smoother when lit objects are moving. I'm hoping it won't be a major problem in cases where it does occur, but in some cases certain objects could end up with darker corners or edges that might look better if lit up a little bit more.

If you want the same lighting, but have lots of mesh polys on screen, the new mesh path without pixel lighting enabled might be worth testing. Although it's not designed to be a pixel perfect match, as long as there aren't any bugs (possibly due to less common cases I missed and didn't modify it to handle), it should be close enough that the differences won't be noticeable unless doing detailed compares on screenshots. I haven't taken much time to benchmark the new mesh path across various cases, but if you're CPU limited with lots of mesh polys on screen, it should be faster.

As some already know, UT and Deus Ex aren't the fastest when it comes to handling high poly counts. They do a lot of per vertex processing on the CPU. Even if there's no fog, it still takes around 50 instructions per mesh vertex to figure that out on the CPU side. I haven't looked at optimizing this case yet, but a few other far more significant cases are taken care of in the new mesh code, along with the more streamlined renderer interface it uses. There's also some SSE code in the new mesh code that should help a bit. Unlike the SSE/SSE2 code in the renderer, some of this SSE code should make more of a difference. Fast approximate reciprocal square roots and floating point compares are good.

April 1st. Time for that experimental Deus Ex renderer with some new per pixel lighting features. It's on the beta page. Requires a video card with advanced feature support that doesn't exist yet. No, that part's not right. A video card with OpenGL vertex program and fragment program support should be enough (DirectX 9 class video hardware). Remember that this one is an experimental beta build. It may not run smoothly and has a much higher probability of having rendering bugs in certain areas.

Version 3.4 is released. It adds a new option that should eliminate the HUD corruption that can occur when antialiasing is enabled. It also contains a number of other changes. This version is built with a new compiler, Microsoft Visual C++ 2005 Express Edition with service pack 1. To make installation simpler to deal with, it uses the statically linked runtime library, which makes the file a bit larger.

Changes in version 3.4:
- Added the new NoAATiles option to avoid HUD corruption with antialiasing enabled.
- Fixed a bug with fragment programs and masked textures with details textures.
- Fragment program version of single pass fog mode.
- Other fragment program related changes.
- Single pass detail texture path for the less common no lightmap case.
- Built with a different compiler.
- Removed the TruForm code.
- A few other minor changes.

Version 3.4 for UT: (111 KB).
Version 1.8 for Deus Ex (works with Deus Ex version 1112fm): (111 KB).
Version 1.2 for Rune (works with Rune version 1.07 or compatible): (112 KB).

Enabling NoAATiles should eliminate the HUD corruption that can occur when antialiasing is enabled. As the name implies, it disables antialiasing when drawing tiles. Note that corruption with antialiasing enabled can still occur on the logo background if using Entry.unr on startup. This background isn't made of tiles (when talking about it from the renderer perspective). There may be some hardware/drivers where NoAATiles won't work if they don't support the feature it uses.

I added new fragment programs, plus a few new vertex programs, which should allow it to always stay in fragment program mode for the entire frame in most cases. I didn't see any significant performance impact from this change though. As mentioned before, the single pass fog fragment program path by itself appears to run a little faster than the fixed function version on my system. If I didn't mess up any of the checks, it should also allow for single pass fog support on video cards from vendors besides ATI and NVIDIA (the fixed function path needed vendor specific extensions that didn't become standard or get picked up by other vendors).

With a lot of new fragment program code, there is a higher chance of other new bugs in this area. To check for any potential problems in cases like this, the UseFragmentProgram option can be toggled while the game is running to check for any differences that shouldn't be there. Note that there are a few known differences such as DetailMax=2 only working in certain modes and single pass detail texturing looking different in some cases with OneXBlending disabled and certain combinations of textures. To use fragment program mode, vertex program mode must also be enabled.

I removed the TruForm code since it interfered with other changes. ATI pulled TruForm support from their drivers a long time ago. If you look closely at the source for this version, you'll notice nothing new that appears to interfere with the TruForm code. However, I only released this version because a few of the changes I made in a branch with some experimental code could be useful to have in a general release, and other changes there required modifications that made this a good time to drop TruForm. It was easy and low cost to leave in place with no nearby changes, but otherwise it made sense to drop since ATI's drivers no longer support it.

Single pass fragment program detail textures are buggy if they show up on a masked texture. I didn't test it much, but this problem may only show up if OneXBlending is also enabled. This might be fixed in a future version, but otherwise just disable fragment programs to work around this one.

I'm looking at converting everything, except for maybe a few obscure combinations, to use fragment programs just in case vertex and fragment program on/off mode switches happen to cost a bit more than switching between different programs in some cases. I added a bunch of bugs with masked textures while working on this, but hopefully those are fixed now.

If I do finish the fragment program conversion, I'll probably put together a new Rune build. I doubt it benefits from any of the new rendering paths, but I'd prefer to keep it working if built from the latest source, and some of these changes could break it unless a few extra Rune specific updates are added too.

With fragment programs it's possible to support single pass fog without having to use the NVIDIA or ATI specific extensions that this mode currently requires. Note that this doesn't apply to Rune since it handles fog differently. Although the main point of this would be to support the mode on chipsets from other vendors, the fragment program version did run a little faster on my GeForce 6 series video card in some very limited tests.

I'll probably look at using a new compiler before building a possible future version since it looks like Visual C++ 2005 won't mess up on the SSE code now that SP1 is out. It probably won't make much of a difference with the current code, but there are a few other good reasons to move to something newer.

Version 3.3 is released. It adds fragment program support, which is only used for a few detail texture paths in this one. However, the only useful one is probably single pass with DetailMax=2. I'm not sure if it will be generally faster across all hardware configurations, but it's usually a bit faster than the no fragment program multipass path for DetailMax=2 in the few tests I ran on my system. It may be good to avoid this one for online play, at least for a while, as anticheat checks across all servers may not be kept so up to date these days.

It looks like Visual C++ 2005 SP1 fixes the code generation bug with unaligned SSE loads (compiler version 14.00.50727.762).

If it wasn't for the one single pass fragment program path that can do the two detail texture layers, it may not have been worth doing a new release with this one. Other than that, it's really just some testability and infrastructure updates for the next big thing, except I'm not sure if that one will ever happen. If not, there's always a chance the fragment program support might help with something else. And, although fairly limited in scope, it's one more use of advanced graphics features for a couple older games.

1-28-2007: A few more files join the list of obsolete and no longer hosted files.

I built a new Deus Ex D3D8 renderer that I think fixes the save game thumbnail problem. I also built a new Deus Ex OpenGL renderer with similar changes, but it looks like there's some other problem with save game thumbnails there, so they probably still won't work with this one.

Despite not actively working on this project anymore, I've still been a bit short on time over the past few months. Since I messed up the projection matrix in previous D3D8 renderers, I'd recommend only using the latest version 1.2, though I'll leave the version 1.1 link around for a while.

Although it might not break anything in UT, it's also best to use the latest OpenGL and Direct3D8 renderers because of the color clamping fix I added to the BufferTileQuads path in these. The only thing I currently know it breaks with the old code is coronas going multi-color in Unreal when getting close enough to them, but there's always some chance it might break something else too.

I noticed one map where enabling ZRangeHack partially breaks skybox rendering, though it's not too major in this case. If you have a 24-bit z-buffer maximum, which is very likely these days, it's a choice between enabling this option or having far away decals flicker due to z-fighting with an OpenGL renderer or the Direct3D8 renderer. Enabling this option also partially breaks wireframe first person weapon rendering, but I don't consider that a major problem.

There are a few settings that get mixed in with the device specific renderer settings but are controlled exclusively by higher level code (probably render.dll). Unless going entirely for speed rather than quality and features, you may want to enable some or all of these:
- HighDetailActors
- Coronas
- ShinySurfaces
- VolumetricLighting
These will all come up disabled with a new D3D8 renderer install, or if you delete all settings in the .ini file to get defaults back, etc.

If you have problems with slowdowns when larger animated textures are in view (often water or flame textures) and you don't mind losing the animation, you can try enabling NoFractalAnim in the Display section of Advanced Options. Large per frame texture uploads may be slow on some systems and disabling this feature can avoid them.

I decided to bump up the release date for the D3D8 renderers a little bit. My video card decided to start dying and I wanted to get the final testing done before it became completely unusable if possible. I don't like even hinting at pre-announcing things for projects like this anyway. There's too much risk of something coming up and things not coming out as planned. What I can say with far more certainty, however, is that this is most likely my final renderer code update, at least in the short to mid term. I might work on documentation-like stuff a little more, but I don't like writing documentation...

I think I fixed a problem with the projection matrix that I didn't quite set up correctly. Hopefully this takes care of problems with the z far plane being a bit too close, which would cause things a certain far enough distance away to not draw. Although it's hopefully fixed, or at least better now and the same as the OpenGL renderer, there still might be a chance of z far clipping on a really huge map. I added clamping for colors in the DrawTile() path. I'm not sure it's needed anywhere in UT, but it's good to have to be safe in case it's needed somewhere I don't know about. I also added a few optimizations and other minor updates that were added to the last OpenGL renderer release.

This D3D8 renderer release comes with source code. So, if you're wondering what the source looks like or you want to compare it to the source for the OpenGL renderer, now you can. More importantly, if I happened to miss something major, now someone else can easily add some fixes, features, or try to make it work with some other game if the opportunity came up. The source package is in the file (54 KB).

I built another Glide renderer with a few more minor efficiency updates. I also added code to support wireframe (only tested on a Voodoo2, not an original Voodoo, so hopefully there are no clipping related problems). Updated binaries and source code are available.

I think I found out why some coronas display an incorrect faintly visible box in areas where they should be transparent in OpenGL and D3D, but not in Glide. The texture GenFX.LensFlar.3 contains the RGB color (1, 1, 1) instead of (0, 0, 0) in areas where it should be transparent. I believe this problem gets corrected by another bug in the Glide renderer. The fast x86 version of the appFloor() function in UT ends up subtracting 1 from every other integer (probably would have been good to bias it a little to move the error case to very slightly under every other integer). In the case of the lens flare texture in question, this turns all those 1s in the palette into 0s, which avoids the problem. It also means each palette entry only selects from 7 bits instead of 8 bits in this case, though in some quick tests I ran, I couldn't notice any perceptible difference due to this sometimes lost low bit. In cases where the full range of a color channel isn't used, the incorrect rounding is far less likely to make as much of a difference (though still some limited cases it could frequently change things a little). This is based on my analysis of the OpenUT Glide renderer source, which is a little out of date, but there's a very good chance this part of it is still the same in the UT Glide renderer.

Since I was looking through the Glide renderer source in more detail anyway, I added a few tweaks to it and built a new one. The OpenUT Glide renderer doesn't contain line drawing code, and I didn't try to add it, so some render debug modes won't work right with it (wireframe models won't draw). Hopefully this doesn't break anything critical. It also won't draw detailed debug stats. Other than this, I didn't notice anything different, though I also didn't spend a lot of time looking. So, if you're interested in trying a tweaked Glide renderer, I built one that should be a little faster. Make sure to keep a backup of the original renderer, especially since I know this one doesn't implement a couple of things (that are hopefully not really needed). I'm not sure how useful it'll be in 2006..., but perhaps there are still a number of Voodoo3s out there in older systems. I tested it a little on my old Voodoo2 and it seemed to work okay. I'm not sure I'm interested in spending any more time on this one, but if you notice any obvious bugs you can send me the info. Hopefully it would be something minor or easy to fix; otherwise I'd probably just say I won't try to fix it, and you can go back to the original one.

Version 3.2 is released. It has some new SSE2 code in a few places. Minor improvements were made to some of the existing assembly code. A few rdtsc instructions used for profiling that negatively impacted performance a bit too much in some cases were removed. A few other mostly minor changes were made.

While looking through older renderer code (D3D/Glide stuff), I noticed that the DrawTile path was clamping colors. I didn't do this in the BufferTileQuads path up until now. Although I never saw it causing any problems, I added clamping code in version 3.2 just in case. This is a little slower of course (even though still faster than previous similar code), but pulling the rdtsc instructions in the same area should help balance things out. I also added some SSE2 code to the DrawTile path for buffering color data. This code can do the clamping at no added cost and is a bit faster than the previous code, both clamping and no clamping versions.

The source code now includes VC6 project files with various updates. Unfortunately, this release won't build correctly with the VC8 compiler version 14.00.50727.42 due to a code generation bug with optimizations enabled. The register allocator problem from previous compiler releases, which generated far too many extra moves with SSE/SSE2 intrinsics, appears to have been at least partially fixed in VC8; well, sort of. Now it generates incorrect code when told to do unaligned loads. I see no good way to work around this since the problem still occurs with just a simple single load followed by a single store. So, if you try to build the renderer with any VC8 compiler with this problem, either disable all of the SSE/SSE2 code with the ifdef, or make the necessary modifications to remove only the SSE2 code in the DrawTile path.

The simple workaround I tried for timer related problems has other not so minor side effects, so I won't be releasing it. So, for multiple reasons, UT is likely to have trouble on a significant number of systems with Pentium M or Athlon 64 X2 processors, and possibly some other less common configurations. There are a couple ways I might be able to write something that can avoid these timer related problems externally, but I'd rather not. These potential solutions are more difficult to implement and/or potentially slightly unreliable compared to the proper fix. The proper solution is fairly easy to implement, but it's core engine stuff so you should refer this one to UTPG or Epic. The timer APIs to use are supported all the way back through Win95 and WinNT 3.1 so there wouldn't be any need for special dynamic linking to preserve backwards compatibility.

I discovered something else of interest while experimenting with the timer workaround. It looks like parts of UT's typically always on internal profiling code may be using the rdtsc instruction a bit too much when it comes to running efficiently on P4s. This instruction is very slow on P4s, and although I'd need to do some better tests with a reliable external time source to know for sure, I have good reason to believe that this code may slow down the frame rate by 5%+ in various demos I use for testing. What I can say for sure is that getting rid of a couple of close proximity rdtsc instructions in the lower half renderer side of the buffered actor triangle code path was able to improve the frame rate by a little over 1% on my P4 2.8C in select frames with a lot of mesh triangles (I usually use 6-7 bots on screen and nearby for these tests). If I ever do another renderer release, these instructions are gone in a couple places since it's fair to consider these areas a subset of other profiling times for the buffered paths. This still leaves a lot of likely similar slow cases elsewhere however. These profiling times aren't essential during normal game operation, so it's just a matter of having a simple flag/setting to disable them (if not doing better updates that might not make this so significant...). If an existing flag already used in the current code were split, this could be implemented with no additional overhead compared to what's already there.

The last thing I was working on was replacing a couple major functions in render.dll. I never got it to be completely stable and it didn't handle a few special effects correctly, but I was able to run a few tests with it on frames that it did draw identically. After a while, I decided to not bother with trying to finish it since I don't play anymore, but I was able to test a few things of interest. Most of the details don't really matter, but without spending too much time optimizing parts of it (after spending a lot of time trying to make it work the same as the original...), I was able to increase the frame rate by up to 5% in mesh heavy frames.

Fixing the TruForm problem with incorrectly applying it to non-character meshes only took adding a new flag bit and a few simple checks. The other problem with corrupt triangles that spanned the edge of the screen when TruForm was enabled was automatically fixed as part of the optimization to not spend time clipping these in software. Of course now that ATI has pulled TruForm support from their drivers, fixing these glitches isn't so important anymore.

I was supposed to be done with renderer updates, but I might put together one more test build. Although it may have some other (hopefully minor) side effects, I might know how to avoid one of the major remaining game speed problems. This is the one that causes problems on systems that dynamically vary the speed of the CPU's timestamp counter. The major classes of systems affected by this include ones with Pentium M, certain newer P4, and certain newer K8 CPUs.

This isn't really anything that's fixable nicely in the renderer, but by tweaking the right internal flag, this problem might be avoidable, and it's easiest for me to build a setting to do this into a renderer. Unfortunately it will end up with only an up to 1 ms resolution timer instead. So at least it will be stable, but I'm not sure how smoothly it'll work. I observed significant interactions with the frame rate limiter in the renderer in some tests I ran, but the game still seemed to run okay. The better fix is to add an option to use QPC if present, but this code isn't in the renderer, so I can't fix it there (though likely would be easy to patch the right part of some other binary for this one).

Numerous versions of ATI's drivers up to and including the current version 5.8 have issues that may affect gamma correction for UT in OpenGL mode, and various other games and applications. This isn't anything I can fix. If gamma correction doesn't work, try moving the in game gamma slider back and forth as this will sometimes fix it. Switching out of full screen mode and back again may also sometimes work around the problem.

Now that I added a second monitor, ATI's gamma correction problems are even worse. Their control panel reports that the hotkeys to adjust gamma for full screen 3D don't work with the desktop extended to another monitor. I disabled the second monitor and the hotkeys still didn't work with D3D games where I had seen them working before. I never had any luck with them working with various OpenGL applications the few times I'd tried. So, the hotkeys don't work with anything I use now and yet their drivers still mess with SetDeviceGammaRamp.

ATI drivers have had glitchy hardware gamma ramp support for around a year and a half now (since version 3.10 I believe). You can ask ATI if they ever intend to fix it.

A detail texture issue that comes up every once in a while may be due to a hardware bug in ATI R300 family chipsets. This is the one that may look similar to a mipmap line when using bilinear filtering, but is actually something different. I'm not going to take the time to try to prove this one in the case of UT, but I wouldn't be surprised if this issue only shows up on ATI R300 and later hardware (unless avoiding it by using single pass detail texture mode).

ATI has other driver bugs and issues. Although they took care of the major stability problems a while ago, their OpenGL support continues to be weak or broken in various areas.

I created a list of bugs I won't fix because they're not in the renderer. Some of the things that end up on this list may be due to video driver bugs. Most are due to things elsewhere in the game engine code. Only the lower half of the renderer was open sourced. There are things in the upper half of the renderer that I can't really fix. Other issues may be caused by other code in the game engine that isn't in the upper half of the renderer, but as long as it's not in the lower half of the renderer, it's probably nothing I can fix.


Deus Ex renderer
I built a new renderer for Deus Ex. It's the 3.0 code built to work with Deus Ex. The file is (works with Deus Ex version 1112fm). OneXBlending is enabled by default in this renderer, but if the brightness looks off, in addition to GammaOffset adjustments, also make sure OneXBlending=True in the [OpenGLDrv.OpenGLRenderDevice] section of your DeusEx.ini file. A new option called SceneNodeHack was added (in the previous version). Enabling this may work around some minor problems, though it wasn't tested extensively, so there's a chance it might cause other problems.

Old files
I removed a few old files. This includes old versions of the Deus Ex renderer, and the one renderer that I built for Unreal Gold. For Unreal Gold, and other versions of Unreal, use the newer renderer from OldUnreal.

I ran a few tests on the D3D8 renderer built for D3D9, which only required minor modifications. With D3D9, control over V Sync in windowed mode is available, and it has access to a more rational z-bias implementation. It also tends to run slightly slower.

I built a new version of the D3D8 renderer. This one is a bit faster with interleaved vertex/color data, larger vertex buffers, and the BufferTileQuads code added. BufferTileQuads is enabled by default in this renderer since not having it hurts D3D a lot more than OpenGL, and I don't have to be concerned about any backwards compatibility issues. I also added a few more features and some minor optimizations. The file is

I didn't add paletted texture support to this renderer, so if you have a GeForce1-4 series video card, you should make sure to use the OpenGL renderer and enable the settings that tell it to use paletted textures (these are disabled by default). Also, on other video cards with good enough OpenGL driver support, the OpenGL renderer may be better.

Performance differences between this renderer and the OpenGL renderer are fairly small on my system, though it does tend to be a little slower. It may be possible to improve this in some cases by interleaving the texture arrays, but this is a lot of extra work, so I may not try it. It doesn't help that D3D seems to have poor small batch performance in general due to intrinsic design/implementation characteristics. There's no avoiding this after a certain point since UT has fairly low geometric complexity.

So, D3D is far simpler than OpenGL in the feature set it supports on the API side, and yet it ends up with far worse small batch performance. In various places in the renderer, it's possible to get moderate performance with a minor amount of work using OpenGL, but with D3D, it takes extra work just to get things working at all, and you still end up with poor performance. With either API, it's possible to get higher performance by adding more advanced buffering schemes such as actor triangle buffering, clipped actor triangle buffering, BufferTileQuads, etc. This D3D renderer will be far slower than the OpenGL renderer for line drawing since it lacks advanced buffering in this area. This shouldn't be a problem with the editor, because I don't support the editor with this renderer anyway since selection support is not implemented. Hopefully line drawing isn't used too heavily, or at all, outside the editor.

z-buffer issues
Like the OpenGL renderer, this D3D renderer may have problems with far away decals flickering due to z-buffer precision issues if only a 24-bit z-buffer is available. It doesn't support w-buffering either, though it looks like a lot of newer video cards don't support this feature anyway. It's probably possible to work around this problem in the renderer, though it may not be anything I'll add. Of course if all these new GPUs/VPUs didn't drop support for 32-bit z-buffers, this wouldn't be a problem.

I finally decided to learn Direct3D in case knowing it would be good for a future job. Porting the renderer only added a few days, with a lot of that time spent dealing with things D3D makes difficult, so I tried building one that uses D3D. D3D has gotten better in recent versions, but some areas are still problematic. I'm sure glad I never used D3D7 or earlier.

This renderer will most likely be slower than the OpenGL one on ATI, NVIDIA, or other graphics cards that at least have reasonably good OpenGL drivers. I also left out a few likely significant optimizations in the current build that may limit its performance. I guess I'll find out later if fixing these can bring it up to the speed of the OpenGL renderer on my system. It uses D3D8 and since it uses certain advanced features, it will not function on various older video cards. Also, due to certain SDK complications, I think it ends up requiring at least DirectX 8.1, which I believe means it will not support Win95.

I added single pass fog mode to this one, since it happened to be easy with D3D. The required blend mode on the OpenGL side requires one extension for NVIDIA, another extension for ATI, and probably just isn't there for various other video cards since providing a standard way to access it on the fixed function side seems to have been forgotten about. It's too bad some of the other vendors didn't at least add support for the ATI version of the extension since it doesn't really add much and their hardware probably supports it all. I'll check the standard extensions again sometime, but I don't think the functionality required for single pass fog in UT is there.

I'm checking a large number of caps bits/values in this build, but a few checks are still missing. I'll probably fix a few of these later, but may leave a few of the more complicated ones out.

Windowed mode, windowed mode resizing, and surviving through various mode switches should work, but some things in this area get awfully difficult to support and test when using D3D. Windowed mode screen shots hopefully work okay, including without crashing in various special cases when the window isn't fully within the screen. D3D still makes something basic like grabbing a copy of what got rendered far too difficult in cases like this.

This initial build of this renderer supports a large number of features, but some are missing at this time.
- Selection support for UnrealEd isn't there. I may never add it, so don't use it with the editor (other functionality should work, but it's not really usable there without this feature).
- S3TC support is there.
- 16 bit texture support is there, but I did the conversions using simple truncation instead of proper rounding.
- Not checking texture aspect ratio restrictions yet, so if there are any specific requirements here, it may just crash when trying to load certain textures (there's a good chance this isn't an issue on any cards new enough to run this renderer though).
- V Sync on or off request only works full screen. D3D8 doesn't allow something basic like V Sync on or off to be requested when in windowed mode. I believe this got fixed in D3D9.
- All the texture filtering modes and LOD bias should work.
- No paletted texture support, and I'm not sure I'll ever add it to this one.
- Lots of other features are supported, but a few others are not.

Various things.

Broken TruForm support in the renderer
TruForm support in the renderer is broken for a few reasons. Consider it an experimental and incomplete feature right now. There is no easy fix for the problem where player models don't look good with TruForm enabled. There are two other fixes that could be made to other parts of the game engine code that would correct two other outstanding issues.

Higher level rendering code clips actor polygons to the edge of the screen. This destroys information contained in normals needed to implement TruForm correctly. This will lead to polygons that cross the edge of the screen having minor to potentially severe graphical corruption. This is trivial to fix, but the code that does it isn't in the part of the renderer that was open sourced, so it's nothing I can fix right now. Also, with many video card/driver combinations, letting the driver or hardware deal with clipping polygons that are partially clipped by the edge of the screen should speed things up a little.

Once actor triangles make it to DrawGouraudPolygon in the renderer, there's no good way to tell if they're from a player actor that should have TruForm applied or some other actor that should not have TruForm applied. It would probably be fairly easy for higher level rendering code to use a new PolyFlags bit to tag triangles from objects that should have TruForm applied if enabled. This could reliably eliminate problems with weapons and other objects that look bad with TruForm applied. Note that the TruFormMinVertices setting attempts to solve these problems, but cannot do so reliably, and while it can fix some cases, it will also break others.

Linux builds
I never seem to hear any good news about attempts to build the updated renderer on Linux. Unfortunately, I can only provide limited help in this area. I do try to keep the code cross platform friendly, but it's unlikely that I will be able to attempt to build it on other platforms anytime in the near future. I know the current code won't compile as is with gcc, but I expect that only minor syntax fixups and removal of a few non-essential features should be enough to make the updates I added both compile cleanly and work correctly.

The first major step in attempting to build the updated renderer on Linux is to make sure you can build the original renderer code before I added any updates to it. This will ensure that there are no major existing issues before going forward and attempting to use the new code. If any problems are encountered at this stage, it's not really anything I can help with much because I don't have a local build environment for this platform, didn't write this code, etc. If there's some problem like the ut432 header files not quite matching the current Linux version of UT and causing trouble because of this, that falls into the "I didn't break it and I can't fix it" category.

You'll need to use a compatible version of gcc. Unfortunately, ABI changes mean you will almost certainly not be able to use a newer version of gcc (unless the rest of the game were compiled with it of course).

You can easily ifdef out the SSE code I added because whatever compiler you use probably won't support it. This is not a major loss since the SSE code I added only provides very minor speedups.

There's a chance there will be problems with the sstream code I used for the debug stream when using an older gcc and/or older libraries. Although it requires numerous changes to remove it, this code is non-essential, and the changes should be simple.

There are good reasons to try to get an updated renderer working on Linux if you run UT natively there. Besides just being very obsolete at this point, the original OpenGL renderer code contains a couple of fairly major design/implementation issues that would be good to have fixed.

Version 2.8 is released. It contains a couple bug fixes, basic support for 16-bit textures, and various other changes.

The rare SinglePassDetail with OneXBlending disabled bug is fixed. The fix may also optimize away a few low cost state changes.

The bug with a few incorrect gradients showing up in the console that can occur when precaching is enabled is fixed. It actually resulted in a number of textures getting unnecessary higher quality filtering, so this fix could speed things up in some cases, though without higher quality filtering modes enabled, it may make little to no difference on a number of video cards. This one was broken due to previous optimizations, though in a number of cases, the CPU savings may still have been more beneficial than any potential loss due to unnecessary high quality texture filtering. It was also somewhat difficult to fix, which is one reason why it remained broken for so long.

The new option for 16-bit textures is Use16BitTextures. The Use4444Textures option is gone. If you're mostly video card limited rather than CPU limited, using this new option should speed things up at the expense of reduced texture quality, which varies from case to case. In many cases, there is only minor quality loss. In other cases, like with various skyboxes and coronas, there is often major quality loss.

This basic 16-bit texture support was kept simple by just sending BGRA8 textures to the OpenGL driver and telling it to use RGB5, or RGB5_A1 if masked. It could be made faster if the renderer converted the textures to 16-bit before sending them to the driver, but I didn't want to deal with added complexity in this area right now for various reasons. So, the performance of some aspects of this new feature relies on good format conversion code in the driver, and in some cases it's not there. Enabling this feature will also reduce brightness a little bit, though it's fairly minor (much more noticeable with the old 4444 textures option). From reading the OpenGL specification, it sounds like the color components are supposed to be rounded to nearest during the conversion, but with the NVIDIA, ATI, and Intel drivers I tested, they were truncated, which causes the slight brightness reduction on average.

I ran some specific tests on BGRA8 to RGB5 and RGB5_A1 conversion performance on NVIDIA, ATI, and Intel OpenGL drivers. The results are:

Year or so old NVIDIA drivers on my old system: Good
Current ATI drivers: Bad
Current Intel drivers: Worse

I'm not really surprised that old NVIDIA OpenGL drivers are still superior to current drivers from various other vendors in a number of areas. What did surprise me is just how bad certain parts of ATI's and Intel's OpenGL drivers are. Although this may not be the highest priority path when it comes to performance compared to 16-bit textures coming from 16-bit source data, I'd still consider it to be of moderate importance and something one would hope would be handled reasonably efficiently by the OpenGL driver.

Fortunately, since only a subset of textures is converted to 16-bit when Use16BitTextures is enabled, using precaching should catch most of them. However, it looks like animated textures fall into the 16-bit conversion group, but if going for speed over quality, these may already be disabled.

I didn't add the more complicated dynamic scaling of 16-bit textures that the D3D and Glide renderers have. I don't think this will be a major loss since lightmaps are still 32-bit even with this new option enabled (and they're low resolution, so not converting them to 16-bit impacts speed and memory usage less compared to the other textures that are converted). Also, with the coronas and skyboxes that tend to take the largest quality hit with 16-bit textures, the dynamic scaling code may have made little to no difference due to them often having a wide dynamic range (or more specifically, high maximum color values).

I ran a few benchmarks on an Intel 865G integrated graphics subsystem with dual channel DDR400. Although it's of course fairly slow, if using the right combination of a low enough resolution and not too many high quality features, it's quite usable. Unfortunately, single pass detail texture mode ran a lot slower. Even though there are some tradeoffs with this mode, I got the feeling that quad texture performance was unusually low for some reason or another. That's too bad because in theory it was supposed to help by trading quad texturing on a larger number of pixels against what is likely to be more expensive read/modify/write blending on a fairly large portion of these pixels when doing dual texture two pass rendering. Vertex program mode didn't work correctly with the latest drivers, though at least it didn't cause a system lockup. It would have been interesting to see how it compared.

I changed a few other things in this build of the renderer. This includes some tweaks to the vertex programs, cleaning up some old junk in the multipass detail texture code, and various minor general optimizations.

I may have fixed the problem with UnrealEd not restoring gamma on exit, but I still need to review the changes to make sure they have little risk of causing new problems. It doesn't help that ATI's drivers still do odd stuff with gamma correction. They're getting close to a year of breaking things in this area to various degrees. There's something odd about their installer for 4.9 too, as I had to temporarily rename my bin directory to prevent it from failing.

I don't think there's any easy way to fix the 16-bit z-buffer problems without using a w-buffer. I can sort of half fix it, but it's not really good enough to be of much use, so I'll either leave the new code ifdef'ed out for a bit or just delete it. W-buffers are supported through D3D on some cards, but I've never seen them supported through OpenGL.

Although I wasn't specifically looking for it, I noticed a minor blending problem with detail textures when not using the single pass mode. It's a subtle line that keeps a constant distance from the viewpoint and is caused by a very minor brightness difference. I doubt I'll take the time to try to confirm it, but I wonder if this might be due to the minor blending bug I've read is present in R300 family ASICs. This anomaly does not show up on my GeForce4.

It looks like ATI has left hardware gamma ramp support for various full screen OpenGL apps a bit broken in their 4.1 drivers. This problem affects UT OpenGL and first showed up in their 3.10 drivers. Using the start button to switch back to the desktop and then switching back to full screen UT may be able to work around this problem.

I may add frame rate limiting. V Sync should work though. I just spent an hour or so play testing V Sync at 75 Hz on my primary system and it works just fine. It's an Intel P4, ATI 9800, and WinXP. I also had no problems with V Sync on my old system with an NVIDIA Ti4200.

If you don't want to use V Sync for whatever reason, then for online games, just use a lower netspeed of around 10000 or so to prevent UT from running too fast, which can cause it to work incorrectly. For many online games, I'd expect this to make zero difference besides frame rate limiting, as I don't think online servers with a MaxClientRate of over 10000 are very common. Even on servers that do have a higher MaxClientRate, the server would actually have to want to send more than 10000 bytes per second for a netspeed of 10000 to limit anything. I'm not sure what kind of tick rate and gameplay situation would be required for this, as I've never seen it happen. Of course I've probably never played on a server with a high enough player count and/or high enough tick rate, and with a high enough MaxClientRate to ever find out.

ATI's current drivers are not good about sharing the CPU while waiting on V Sync. NVIDIA fixed this a long time ago, though I haven't checked lately if it has stayed fixed. Not sharing the CPU is bad for multitasking performance. On modern PCs, keeping the processor busy doing some sort of spin wait while waiting on V Sync can also increase power consumption.

The frame rate limiting code I've been experimenting with will share the CPU. In this case, whether or not it does so is controlled by the implementation of the sleep function in the UT engine. It could either spin or pass the sleep call down to the OS. The following numbers from whatever my motherboard uses to get CPU temperature are from just running around CTF-Coret single player with 16 bots. The game was run at 85 Hz in a window.
With V Sync: ~53° C
With V Sync and with frame rate limit of 85: ~49° C
Without V Sync and with frame rate limit of 85: ~46° C
Under heavier load, there will be less idle CPU time available and these numbers should eventually converge. But even in games with a lot going on, there is still a lot of potential for wasted energy or otherwise useful CPU time on average unless the frame rate is constantly stuck below 85 Hz, or some other set limit.

I'm probably not going to be fixing that assertion that pops up when switching 16/32 bit color depth in the Video tab in game. For a couple of reasons, it's not easy to fix. It is possible to change this setting by manually editing the ini file of course.

For at least a few releases now, the DLL MSVCP60.dll is required. If you get any error message about trouble finding this file, send me an email. I may upload a copy of it eventually after researching how common it is to have this file installed. In many cases, you'll have it installed already from some other piece of software.

In version 1.5, I added a new option to convert all DXT1 compressed textures to DXT3 format on upload. This can be used to work around bad DXT1 texture quality on NVIDIA GeForce1 - GeForce4 series video cards. The DXT3 textures take twice as much texture memory as the DXT1 textures though. If you are interested in playing around with this setting, take a look at the TexDXT1ToDXT3 option in the [New options] section. If you're looking for a good comparison texture, the sky texture in dm-kgalleon looks particularly bad on the NVIDIA cards with the bad DXT1 quality. On the other hand, many of the other DXT1 textures still look very good, so it might not be worth the extra memory cost in some cases.

This Unreal Developer Network page has some good examples of bad looking DXT1 textures on NVIDIA cards.

Copyright 2002-2010 Chris Dohnal