I’m departing from the usual subject of these blog posts this week to do some technical writing which people might find useful, mainly because a lot of the information on this seems to be scattered all over the Internet. One of the issues with our Mac port of Clockwork Empires, and the one which has kept it from launching when we wanted it to, is that we discovered – a bit late in our schedule – that we couldn’t fix all of the shader issues on OS X without porting the entire game from what we were using (OpenGL’s “compatibility” mode plus a ton of extensions, dating back to when this codebase was started ages and ages ago) to what is called OpenGL Core 3.2, which is part of the OpenGL Architecture Review Board’s attempt to remove all of the “cruft” that had accumulated in the API over the years. In the process of doing this, they removed a lot of the nice stuff about OpenGL that made it a good teaching tool, which is sort of annoying; you can’t just throw together a program any more and get stuff up on the screen, not without doing a large quantity of work. Oh well.
On OS X Mavericks, you can only get access to OpenGL features newer than version 2.1 if you create a core OpenGL context; they aren’t even accessible as extensions, which is just flat out weird. OS X’s OpenGL implementation has been charitably described as “a mess” by everybody involved for some time; the issue seems to be that all the driver engineering teams at Apple are now cranking out OpenGL implementations for your iPhone (and these are quite good) while neglecting the desktop platform. The situation used to be *worse* before Mavericks, if you can believe it. The net result of this is that I had to take a legacy codebase, with numerous peccadilloes, and port it to what is almost, but not quite, a new graphics API.
Ugh.
The first step to any porting job is to try not to do it all at once, if you have the option. Instead, limp from point A to point B: draw up a list of all the things that break when moving from compatibility OpenGL to core OpenGL, change them one at a time so you can test them, then – only at the last minute – switch the core OpenGL profile on. I chose to do things in a certain order, which I think makes sense, but YMMV.
First, an inventory of advantages that I had going into this:
- We already moved away from the OpenGL matrix stack (see below);
- Our code is already set up for efficient batching, for the most part. Every time we render an object from the front end (scene graph) we push it to the back end (enormous circular ring buffer of render commands) in more or less the manner prescribed by the Firaxis LORE talk which you can read, uh, here. (As used in Civilization V! Except we do it in OpenGL! Uh!) This means we have a bunch of central points to fix, but not as many as we would if we had draw calls scattered all over the place.
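To make the front end / back end split a bit more concrete, here is a minimal sketch of a ring buffer of render commands. This is not the actual Clockwork Empires code: `RenderCommand`, `CommandRing`, and `drain` are made-up names, the command fields are guesses at the sort of thing a command might carry, and the `draw` callable stands in for wherever the real OpenGL calls would go.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical command record; the real engine's layout is not shown in this post.
struct RenderCommand {
    std::uint32_t shaderId;
    std::uint32_t vertexBufferId;
    std::uint32_t firstVertex;
    std::uint32_t vertexCount;
};

// Fixed-capacity ring buffer: the front end pushes, the back end pops.
template <std::size_t Capacity>
class CommandRing {
public:
    // Returns false when the ring is full so the caller can flush or stall.
    bool push(const RenderCommand& cmd) {
        if (count_ == Capacity) return false;
        commands_[(head_ + count_) % Capacity] = cmd;
        ++count_;
        return true;
    }

    // Returns false when there is nothing left to draw.
    bool pop(RenderCommand& out) {
        if (count_ == 0) return false;
        out = commands_[head_];
        head_ = (head_ + 1) % Capacity;
        --count_;
        return true;
    }

    std::size_t size() const { return count_; }

private:
    std::array<RenderCommand, Capacity> commands_{};
    std::size_t head_ = 0;   // index of the oldest queued command
    std::size_t count_ = 0;  // number of queued commands
};

// Back end: pop every queued command and hand it to `draw`.
// In a real renderer `draw` is the one place that touches OpenGL,
// which is why a port only has a few central points to fix.
template <std::size_t Capacity, typename DrawFn>
std::size_t drain(CommandRing<Capacity>& ring, DrawFn&& draw) {
    std::size_t issued = 0;
    RenderCommand cmd{};
    while (ring.pop(cmd)) {
        draw(cmd);
        ++issued;
    }
    return issued;
}
```

The point of the design is that everything GL-specific lives behind `draw`, so swapping compatibility calls for core ones happens in one spot rather than all over the scene graph.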
So, what did we actually need to fix?
- Fix the darned font rendering. Font rendering in Clockwork Empires used a weird texture mode due to a bodge job trying to get SDL_TTF to speak OpenGL’s language. We passed RGBA texture data to glTexImage2D() but with an internal format of GL_ALPHA – somehow, magically, this worked when I wrote it many years ago, and I never went back to fix it. GL_ALPHA is deprecated, and gone entirely from the core profile; if you can’t display fonts, you’re doomed.
- Create a vertex array object, somewhere, to just… exist. Remember that whole “hey, we created the idea of Core OpenGL profiles to remove cruft” thing that we started this exercise with? Well, vertex array objects are the new cruft that got added back to the specification. The idea is supposed to be that you can quickly switch which vertex states are enabled and which vertex states are disabled by simply switching VAOs – and this would be really cool. However, you need to have one VAO bound at all times in order for anything to render; and, as Eric Lengyel convincingly shows here, they’re just broken. Despite the fact that they should make games faster, they make them slower, and the reaction that Eric Lengyel seems to have gotten from developer relations on this subject is that he was screwing things up, and that “it works here.” Eric then proceeded to show in exhausting detail that no, he wasn’t screwing up, and since then… crickets. Meanwhile, we’re stuck with VAOs. So create one.
- Move everything not using shaders to using shaders. In particular, our UI code didn’t use shaders and fell back on the fixed function pipeline, so I rolled a shader and made it work.
- Move everything not using VBOs to VBOs. We did most of this previously for AZDO amongst other things – which isn’t working yet with Core OpenGL 3.2, alas – but the main culprit, again, was the UI.
- Move everything from using the OpenGL matrix stack to using your own matrix stack. You should do this anyway; I did this about six months ago because, as it turns out, the OpenGL matrix stack is actually slow as heck. I rolled two sets of functions – one of which looked like the old OpenGL matrix stack, except it didn’t submit anything, and one of which did the actual upload. I then sprinkled the “actually set your matrix” function before any actual draw call.
- Move everything away from using the fixed function client-side array functions (glVertexPointer, etc) to using glVertexAttribPointer(). This was time-consuming. One thing that will save your bacon here is defining your layout in the vertex shader (a sensible place for it!) instead of in C++ (a not sensible place for it); shove that block in an include file, stick it at the top of every shader.
- Move everything away from using gl_FragData and gl_FragColor to using the new out semantics. In compatibility mode, the shader compiler will yell at you but do its thing anyway; in core mode, the shader compiler (at least on NVidia cards) will throw a warning and then just won’t draw anything.
- Make sure all the shaders compile without errors or warnings.
- Remove any old references to things that now have no semantic meaning. The suspects: calls to enable or disable alpha testing; calls to enable or disable texture units with glDisable(GL_TEXTURE_2D); etc. None of this has any point any more, just get rid of it.
- Run everything through gDebugger a few times with “break on OpenGL error” enabled to see what turns up, fix any errors you see.
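The matrix stack item in the list above is the one that is easiest to show in code. What follows is a minimal sketch, not our actual implementation: `Mat4`, `MatrixStack`, and `flush` are names invented for this example, only `translate` is implemented out of the old glTranslate/glRotate/glScale family, and the dirty flag is an addition of mine to skip redundant uploads. The `upload` callable is a stand-in for the real thing, which in practice would be something like a call to glUniformMatrix4fv() on the currently bound shader.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// 4x4 matrix stored column-major, the layout OpenGL expects.
using Mat4 = std::array<float, 16>;

inline Mat4 identity() {
    Mat4 m{};  // zero-initialised
    m[0] = m[5] = m[10] = m[15] = 1.0f;
    return m;
}

// Returns a * b for column-major matrices.
inline Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (std::size_t col = 0; col < 4; ++col)
        for (std::size_t row = 0; row < 4; ++row)
            for (std::size_t k = 0; k < 4; ++k)
                r[col * 4 + row] += a[k * 4 + row] * b[col * 4 + k];
    return r;
}

class MatrixStack {
public:
    MatrixStack() : stack_(1, identity()) {}

    // These mirror the old glPushMatrix/glPopMatrix/glLoadIdentity/glTranslatef
    // calls, but only touch CPU-side state; nothing is submitted to OpenGL.
    void push() { stack_.push_back(stack_.back()); }
    void pop() {
        if (stack_.size() > 1) {
            stack_.pop_back();
            dirty_ = true;
        }
    }
    void loadIdentity() {
        stack_.back() = identity();
        dirty_ = true;
    }
    void translate(float x, float y, float z) {
        Mat4 t = identity();
        t[12] = x;
        t[13] = y;
        t[14] = z;
        stack_.back() = multiply(stack_.back(), t);  // post-multiply, like glTranslatef
        dirty_ = true;
    }
    const Mat4& top() const { return stack_.back(); }

    // The "actually set your matrix" step: call this before each draw call.
    // Returns true if the matrix was handed to `upload`, false if unchanged.
    template <typename UploadFn>
    bool flush(UploadFn&& upload) {
        if (!dirty_) return false;
        upload(stack_.back());
        dirty_ = false;
        return true;
    }

private:
    std::vector<Mat4> stack_;
    bool dirty_ = true;  // true until the first upload
};
```

Keeping the old-looking functions means the calling code barely changes; the only new thing you have to remember is the `flush` before every draw.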
At this point, I thought I’d removed everything from the game that used the old compatibility code, or at least stubbed it out. I turned on the OpenGL core mode, and fired up the game. The UI displayed, but nothing else, which meant that the problem was *somewhere* in the deferred renderer. Now it was time to go through and figure out what we forgot, starting by dumping out all the frame buffer objects for the deferred renderer.
First discovery: for some reason, blend modes weren’t blending everything in the compatibility mode, but were on the core mode, so things were being rendered with various exciting levels of transparency. I have *no* idea what changed here, but I just removed all alpha blending from the main loop of the deferred renderer (which should have been off anyhow!) and the output of the frame buffer seems okay, but… nope, nothing.
Second discovery: okay, take the entire deferred renderer apart, top to bottom, trying to figure out why it wasn’t rendering and only deferring. Fix shader bugs, nothing. Set the shaders to only do certain things, like “just dump a texture”? Nothing. Swear loudly, drink coffee, google “nothing bloody well draws you @#$@er” and get reminded (thanks to an ancient blogpost by Eric Lengyel) that Core OpenGL 3.2 no longer supports drawing quads. Why? I don’t know. Most hardware usually has specific silicon to set up and draw quads, and it’s definitely something that OpenGL’s professional market would like, where quads are basically excellent. I suspect it’s due to something in the geometry shader that is making it unhappy. Once the quads were replaced with triangles, though, we ended up in business – the game draws on Core OpenGL 3.2, albeit with a few small graphical errors caused by typos, things not set correctly in shaders, and so forth.
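If your vertex data is already laid out as quads, four vertices apiece, one low-effort way to get rid of them is to leave the vertices alone and draw through an index buffer that splits each quad into two triangles. This is a sketch of that idea rather than what our renderer does verbatim; `quadsToTriangleIndices` is a name invented for this example, and it assumes each quad's vertices are stored consecutively in the usual GL_QUADS order.

```cpp
#include <cstdint>
#include <vector>

// Builds indices that draw `quadCount` quads as two triangles each.
// Quad q is assumed to own vertices 4q .. 4q+3, in GL_QUADS order.
std::vector<std::uint32_t> quadsToTriangleIndices(std::uint32_t quadCount) {
    std::vector<std::uint32_t> indices;
    indices.reserve(static_cast<std::size_t>(quadCount) * 6);
    for (std::uint32_t q = 0; q < quadCount; ++q) {
        const std::uint32_t base = q * 4;
        // Triangles (0,1,2) and (0,2,3) keep the winding order of the quad.
        indices.push_back(base);
        indices.push_back(base + 1);
        indices.push_back(base + 2);
        indices.push_back(base);
        indices.push_back(base + 2);
        indices.push_back(base + 3);
    }
    return indices;
}
```

The resulting indices go into an element array buffer and get drawn with glDrawElements() using GL_TRIANGLES and GL_UNSIGNED_INT. Since the pattern is the same for every quad, you can build one big index buffer at startup and share it between everything that used to draw quads.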
The last step is just fixing small stuff, and then making sure we actually get the correct function pointers to all of our functions on OS X, and then hopefully we’re in business. Right now Micah is busy removing “ARB” from a bunch of extension requests, as when Apple says “Core OpenGL” they mean business and won’t even pass you anything if you dare to ask for an extension.
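For the curious, the name fix-up amounts to something like the following. This is a sketch under the assumption that the entry points are looked up by name from a table of strings; `stripArbSuffix` is a name invented for this example, and how the stripped name gets turned into a function pointer depends on your loader, so that part is left out.

```cpp
#include <string>

// Turns an extension-style entry point name into its core name by
// dropping a trailing "ARB", e.g. "glBindBufferARB" -> "glBindBuffer".
// Names without the suffix are returned unchanged.
std::string stripArbSuffix(const std::string& name) {
    const std::string suffix = "ARB";
    if (name.size() > suffix.size() &&
        name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0) {
        return name.substr(0, name.size() - suffix.size());
    }
    return name;
}
```

Be warned that this is not a blind search-and-replace job: some functions were renamed or changed when they were promoted to core (glCreateShaderObjectARB became glCreateShader, for instance), so each one still needs checking against the core specification.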
I hope this is useful to my fellow OpenGL people out there. For the rest of you – we’ll be back to our usual blogging about occult stabbings and fishpeople next week.