Well, my grand idea of using a transparent proxy died a violent death last week, thanks to cookies.
Wikipedia's solution, which is what makes a caching proxy workable in front of rendered PHP/HTML, is to send a "Vary: Cookie" header, so that a cached page is only shared between users with identical cookies. Logged-out Wikipedians have no cookies, so they all get the cached pages. (Note also that nothing about the page is specific to that logged-out user - nothing at all. Something to keep in mind!)
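To make the "Vary: Cookie" behaviour concrete, here is a minimal Python sketch of a shared cache that treats the request's Cookie header as part of the cache key. It is not how Wikipedia's proxies are actually built: the class, its method names and the cookie values are hypothetical, and a real proxy compares stored header values rather than using a plain dictionary. The usage lines show the catch: cookie-less visitors share one entry, but any per-visitor cookie produces a separate entry and a fresh backend render.

```python
from typing import Callable, Dict, Optional, Tuple


class VaryCookieCache:
    """Toy shared cache that honours "Vary: Cookie" by keying on the Cookie header."""

    def __init__(self, render: Callable[[str, str], str]) -> None:
        self._render = render                    # stands in for the PHP backend
        self._store: Dict[Tuple[str, str], str] = {}
        self.backend_hits = 0                    # how often the backend had to render

    def get(self, url: str, cookie_header: Optional[str]) -> str:
        # Requests without cookies all share the key (url, ""), so every
        # cookie-less, logged-out visitor gets the same cached copy.
        key = (url, cookie_header or "")
        if key not in self._store:
            self.backend_hits += 1
            self._store[key] = self._render(url, cookie_header or "")
        return self._store[key]


cache = VaryCookieCache(lambda url, cookie: f"<html>{url}</html>")
cache.get("/wiki/Main_Page", None)           # miss: rendered by the backend
cache.get("/wiki/Main_Page", None)           # hit: same (url, "") key
cache.get("/wiki/Main_Page", "visitor=111")  # miss: different cookie
cache.get("/wiki/Main_Page", "visitor=222")  # miss again
print(cache.backend_hits)  # 3
```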
We, however, keep cookies. Google Analytics keeps cookies on our behalf (so we can track stuff), vBulletin keeps cookies (to try to make read/unread thread tracking work when you're logged out), and we keep cookies so we can count how many unique users are online. (The front page of the forums tells you how many logged-out users are online.)
So that idea is a bust.
It turns out, though, that it's fairly easy to adjust the main CGP pages to cache the whole rendered page to disk. For the PHP programmers out there, check out the ob_start() family of functions. They let you do cool things like grab everything the page has rendered so far and do stuff with it. We use them all over the place, from interacting with vBulletin to rewriting image sources so they work over SSL. It's a great way to hack in some extra functionality, and this time it's for caching.
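Again, we've only been trialling this for logged-out users. A few astute people have noticed that some actions are temporarily "forgotten" when they log out. That's why: once you're logged out, you may be served a cached copy of the page that was saved before your action.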
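The capture-and-save approach translates roughly as follows. This is a Python analogue, not PHP and not CGP's actual code: contextlib.redirect_stdout stands in for PHP's ob_start()/ob_get_clean(), and the cache directory, the five-minute TTL and the function names are assumptions for illustration. Logged-in requests bypass the cache entirely, while logged-out requests get a saved copy until it expires, which is also how a logged-out visitor can briefly see a page that predates their last action.

```python
import contextlib
import hashlib
import io
import os
import tempfile
import time
from typing import Callable

# Hypothetical settings; a real site would use a fixed directory and its own TTL.
CACHE_DIR = tempfile.mkdtemp(prefix="page-cache-")
CACHE_TTL_SECONDS = 300


def capture_output(render: Callable[[], None]) -> str:
    """Run render() and return everything it printed (like ob_start()/ob_get_clean())."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        render()
    return buf.getvalue()


def cache_path(url: str) -> str:
    # Hash the URL so any URL maps to a safe, fixed-length filename.
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return os.path.join(CACHE_DIR, digest + ".html")


def serve_page(url: str, logged_in: bool, render: Callable[[], None]) -> str:
    if logged_in:
        # Logged-in pages may be personalised, so never cache them.
        return capture_output(render)

    path = cache_path(url)
    try:
        if time.time() - os.path.getmtime(path) < CACHE_TTL_SECONDS:
            with open(path, encoding="utf-8") as f:
                return f.read()  # fresh enough: serve the saved copy
    except FileNotFoundError:
        pass  # nothing cached for this URL yet

    html = capture_output(render)
    # Write to a temporary file, then swap it into place with os.replace()
    # (atomic on POSIX) so a concurrent reader never sees a half-written page.
    with tempfile.NamedTemporaryFile("w", dir=CACHE_DIR, delete=False,
                                     encoding="utf-8") as tmp:
        tmp.write(html)
    os.replace(tmp.name, path)
    return html


# Usage: a stand-in page renderer that counts how often it actually runs.
calls = []

def render_front_page() -> None:
    calls.append(1)
    print("<html>front page</html>")

serve_page("/", logged_in=False, render=render_front_page)  # rendered, saved
serve_page("/", logged_in=False, render=render_front_page)  # read from disk
serve_page("/", logged_in=True, render=render_front_page)   # rendered, not cached
print(len(calls))  # 2
```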
I've also been going through the CGP code, cleaning up various bits and pieces and combining chunks of cached data wherever possible. Right now, rendering a page can take so many separate DB and memcache queries that it's downright scary! Memcache is fast, but every request is still a network round trip, and the sheer number of them adds that extra bit of latency. The "networks" code in particular got a good clean-up today, so let me know if you notice any oddities.
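Here is a hedged sketch of why combining cached chunks helps. The counting client is an in-memory stand-in for memcache, and the part names and loaders are hypothetical placeholders, not CGP's real data. Storing parts under separate keys costs one round trip per part on every page view, while bundling them under one key costs a single round trip once the cache is warm.

```python
from typing import Any, Callable, Dict


class CountingCache:
    """In-memory stand-in for a memcache client that counts round trips."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self.round_trips = 0

    def get(self, key: str) -> Any:
        self.round_trips += 1
        return self._data.get(key)  # None means "not cached"

    def set(self, key: str, value: Any) -> None:
        self.round_trips += 1
        self._data[key] = value


Loaders = Dict[str, Callable[[int], Any]]


def fetch_parts_separately(cache: CountingCache, user_id: int, loaders: Loaders) -> Dict[str, Any]:
    """One cache key per part: one round trip per part even when warm."""
    result = {}
    for name, load in loaders.items():
        key = f"user:{user_id}:{name}"
        value = cache.get(key)
        if value is None:
            value = load(user_id)  # would be a DB query on a real site
            cache.set(key, value)
        result[name] = value
    return result


def fetch_parts_combined(cache: CountingCache, user_id: int, loaders: Loaders) -> Dict[str, Any]:
    """All parts bundled under one key: a single round trip when warm."""
    key = f"user:{user_id}:bundle"
    bundle = cache.get(key)
    if bundle is None:
        bundle = {name: load(user_id) for name, load in loaders.items()}
        cache.set(key, bundle)
    return bundle


# Hypothetical page parts; each loader just builds a placeholder string.
loaders: Loaders = {name: (lambda uid, n=name: f"{n} for user {uid}")
                    for name in ("profile", "friends", "badges", "stats", "recent")}
cache = CountingCache()
fetch_parts_separately(cache, 42, loaders)  # warm up the per-part keys
fetch_parts_combined(cache, 42, loaders)    # warm up the bundle key

cache.round_trips = 0
fetch_parts_separately(cache, 42, loaders)
print(cache.round_trips)  # 5

cache.round_trips = 0
fetch_parts_combined(cache, 42, loaders)
print(cache.round_trips)  # 1
```

The trade-off is coarser invalidation: changing any one part means rebuilding or deleting the whole bundle.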
On a note related to my last entry, I've written a new hacked-up priority engine (again!!). This one works from an entirely different premise: instead of trying to mathematically "solve" a rather complex problem, it iterates over a large set of ad:zone priority combinations, scores each one, and keeps going until it finds a better solution. In effect, it brute-forces the problem. So far it's running OK, and I'm hoping that it (along with another neat trick that's not up for discussion today) will solve the majority of our priority issues.
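A toy version of the brute-force idea, under loudly stated assumptions: the delivery model (each zone splits its traffic among its ads in proportion to priority), the scoring rule (squared miss against each ad's impression target) and every ad, zone and traffic figure are invented for illustration. This exhaustive search is not the actual CGP engine, which may sample combinations rather than enumerate them all.

```python
import itertools
from typing import Dict, List, Optional, Sequence, Tuple

Pair = Tuple[str, str]  # (ad, zone)


def simulate(priorities: Dict[Pair, int], zone_traffic: Dict[str, int]) -> Dict[str, float]:
    """Assumed delivery model: each zone splits its impressions among its ads
    in proportion to their priority weights."""
    delivered: Dict[str, float] = {}
    for zone, traffic in zone_traffic.items():
        in_zone = {ad: p for (ad, z), p in priorities.items() if z == zone}
        total = sum(in_zone.values())
        for ad, p in in_zone.items():
            delivered[ad] = delivered.get(ad, 0.0) + (traffic * p / total if total else 0.0)
    return delivered


def score(priorities: Dict[Pair, int], zone_traffic: Dict[str, int],
          targets: Dict[str, float]) -> float:
    """Lower is better: squared miss against each ad's impression target."""
    delivered = simulate(priorities, zone_traffic)
    return sum((delivered.get(ad, 0.0) - target) ** 2 for ad, target in targets.items())


def brute_force(pairs: List[Pair], levels: Sequence[int], zone_traffic: Dict[str, int],
                targets: Dict[str, float]) -> Tuple[Optional[Dict[Pair, int]], float]:
    """Try every assignment of a priority level to every ad:zone pair and keep the best."""
    best: Optional[Dict[Pair, int]] = None
    best_score = float("inf")
    for combo in itertools.product(levels, repeat=len(pairs)):
        candidate = dict(zip(pairs, combo))
        s = score(candidate, zone_traffic, targets)
        if s < best_score:  # strictly better than anything seen so far
            best, best_score = candidate, s
    return best, best_score


# Made-up example: ad_b also runs alone in zone2, so zone1 should favour ad_a 3:1.
pairs = [("ad_a", "zone1"), ("ad_b", "zone1"), ("ad_b", "zone2")]
zone_traffic = {"zone1": 1000, "zone2": 500}
targets = {"ad_a": 750.0, "ad_b": 750.0}
best, best_score = brute_force(pairs, (1, 2, 3), zone_traffic, targets)
print(best_score)               # 0.0
print(best[("ad_a", "zone1")])  # 3
```

Exhaustive search grows as len(levels) ** len(pairs), so an engine working over many ad:zone pairs would have to limit or sample the combinations it tries.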
(As with the last engine, low-pri campaigns aren't taken into account yet. It shouldn't be too hard to add them with this engine.)