We have made a slight change to how we store TrueAchievement Scores in the database
Preamble: Keeping TrueAchievements as fast as possible is a challenge
Good morning
TrueAchievements experienced its annual surge of registrations in December as we launched My Year on Xbox, and of course every registration brings more traffic as well as more gameplay data to process!
TrueAchievements is unlike the vast majority of gaming websites, as we show personalised versions of many of our pages that show the user’s own progress in that game, achievement or walkthrough, plus of course the user’s profile page which collates not only their data but that of all their friends. This means that we cannot “cache” these pages in our CDN, they have to be created on the fly every time someone (or a web crawler or bot) visits one.
To add to the complications, because of my initially “simple idea” that the TA score should reflect the rarity of the achievement in each game, we have to recalculate the scores of the achievements, games and gamers virtually every day.
On top of that, we have probably the most complicated Xbox leaderboard system on the planet, where you can find your scores on up to 17,000 different leaderboards, most of which need to be pre-built every day to make them fast enough to view on the site.
So, we have a lot of traffic (around 1m pageviews a day from humans alone) viewing lots of pages that we can’t cache, along with huge amounts of number-crunching running against the data that’s being shown on those pages (we currently have 3,287,979,385 tracked achievements across 232,541,069 games in the database).
This resulted in huge slowdowns on the site during December, and you may well have noticed pages timing out, scans taking a long time, or just general sluggishness while browsing the site.
Of course, these challenges aren’t new; I’ve been rewriting or refactoring the database pretty much since the day it launched. A database designed for 5,000 users doesn’t work well when there are 1.2m users. It’s also quite enjoyable work, as you can measure your changes and see how much impact they are having pretty much immediately. It gives me a sense of enormous well-being, and then I’m happy for the rest of the day (Parklife!).
Performance changes we have made since December
In order to get things running smoothly again, I have made a lot of changes since the middle of December:
The TrueAchievement score of an achievement is now a whole number in the database
This is perhaps the simplest change from a back-end perspective, but it is probably the one you will notice first. The TA scores of achievements have always been stored to 4 decimal places, but then rounded down before being displayed everywhere on the website. So an achievement might show as 17 TA Score, but in the database it was stored as 17.3862. Now it is stored as 17.
While you won’t notice this change at an achievement level, you might see your total TA for a game drop, as the fractions of a TA are removed from your total score for that game. We are processing the games over the next week or so, and during that time you might see a difference between the max TA of a game and your personal TA in that game, even if you have completed it. This should all be sorted by the weekend as we work through the games.
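To make the drop concrete, here is a minimal sketch comparing the two approaches. It assumes the old game total was the sum of the unrounded 4-decimal-place scores, rounded down only for display, and that the new whole-number score is the old value rounded down (17.3862 becomes 17). The function names and scores are hypothetical, not TrueAchievements code.

```python
import math

def old_game_total(scores: list[float]) -> int:
    """Assumed old approach: add up the 4-d.p. decimals, round down only for display."""
    return math.floor(sum(scores))

def new_game_total(scores: list[float]) -> int:
    """New approach: each achievement is stored as a whole number first, then summed."""
    return sum(math.floor(s) for s in scores)

# Hypothetical scores for two unlocked achievements, shown on the site as 16 TA and 7 TA.
unlocked = [16.6, 7.5]
print(old_game_total(unlocked))  # 24: the "how is 16 + 7 = 24?" case
print(new_game_total(unlocked))  # 23: the fractions are dropped, so the total matches what is shown
```

The 1 TA difference is exactly the leftover fractions (0.6 + 0.5) that used to add up behind the scenes.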
There are a huge number of benefits that come from this change:
- It’s no longer confusing for users. One of the first questions we are asked is “how come I have unlocked an achievement worth 16 TA and an achievement worth 7 TA but my score is 24 for the game?”. This was due to the rounding in the back-end, but having to explain that constantly is a pain
- Storage is reduced: storing an integer takes about a third of the space of the decimal we were using previously, and we generally have 3 sets of TA scores (due to DLC settings) against every game, gamer’s game, leaderboard, contest and gamer, as well as every achievement
- Processing is faster when we add all of the scores up every time someone is scanned
- We no longer have to update every gamer’s records for a game when the achievement scores have changed by less than 1. This is probably the biggest performance gain from this tweak
- We no longer have to round the values everywhere on the website
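The update-skipping benefit can be sketched as a simple comparison: after the daily recalculation, only achievements whose whole-number score actually changed require the related gamer records to be rewritten. This is an illustrative sketch with hypothetical names, assuming scores are rounded down to whole numbers. Note that in this sketch a change smaller than 1 that crosses a whole number (say 16.9 to 17.1) would still count as a change.

```python
import math

def achievements_needing_update(stored: dict[int, int], recalculated: dict[int, float]) -> list[int]:
    """Return IDs of achievements whose whole-number TA score has actually changed.

    stored: achievement ID -> whole-number TA score currently in the database
    recalculated: achievement ID -> freshly calculated (fractional) TA score
    """
    changed = []
    for achievement_id, new_score in recalculated.items():
        new_whole = math.floor(new_score)  # the value that would actually be stored
        if stored.get(achievement_id) != new_whole:
            changed.append(achievement_id)
    return changed

stored = {1: 17, 2: 7, 3: 42}
recalculated = {1: 17.9, 2: 6.8, 3: 42.1}
print(achievements_needing_update(stored, recalculated))  # [2]
```

With fractional storage, all three achievements would have "changed" and every gamer holding them would need updating. With whole numbers, only achievement 2 does.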
The recalculations for every game are happening over the coming days, and during that time you might notice differences between the total score for a game and your score, even if you’ve completed it. These scores will normalise during the course of this week.
Some site leaderboards now have a minimum Gamerscore requirement
The daily site leaderboard build had been creeping up to almost 3 hours every day, and 5 hours on the day the weekly boards are made. The site noticeably slows down while these boards are building, so I have been looking at various ways to optimise that process.
The first thing I did was to only include gamers with a Gamerscore of at least 20,000 on the genre/platform leaderboards. Previously we were including around 350,000 gamers who were below this threshold. This was a huge amount of processing for gamers who, given their low Gamerscore, probably don’t care that they were 207,976th on the Xbox One First Person Shooter Leaderboard. If anything, they’d probably rather not know that at all 🙂 And if they do want to be included on those leaderboards again, it is very quick and easy to get above 20k GS these days.
Every registered gamer is still included in all of the main site leaderboards.
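The eligibility rule can be sketched as a filter applied before any ranking work: genre/platform boards pass the 20,000 Gamerscore minimum, while the main boards pass no minimum. Only the threshold comes from this post; the `Gamer` fields, `build_board` function and sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Threshold from the post; the field names below are hypothetical.
GENRE_BOARD_MIN_GAMERSCORE = 20_000

@dataclass
class Gamer:
    gamertag: str
    gamerscore: int
    genre_ta: int  # this gamer's TA score on one genre/platform board

def build_board(
    gamers: list[Gamer],
    score_of: Callable[[Gamer], int],
    min_gamerscore: int = 0,
) -> list[tuple[int, str]]:
    """Rank gamers at or above min_gamerscore by score_of, highest first.

    Ties simply keep their input order; a real board would need shared ranks.
    """
    eligible = [g for g in gamers if g.gamerscore >= min_gamerscore]
    ranked = sorted(eligible, key=score_of, reverse=True)
    return [(rank, g.gamertag) for rank, g in enumerate(ranked, start=1)]

gamers = [
    Gamer("GamerA", 150_000, 900),
    Gamer("GamerB", 8_000, 950),   # below 20k GS: left off genre/platform boards
    Gamer("GamerC", 25_000, 400),
]

# Genre/platform board: the Gamerscore filter runs before any ranking work.
genre_board = build_board(gamers, lambda g: g.genre_ta, GENRE_BOARD_MIN_GAMERSCORE)
# Main board (ranked by Gamerscore here): no minimum, so every registered gamer appears.
main_board = build_board(gamers, lambda g: g.gamerscore)

print(genre_board)  # [(1, 'GamerA'), (2, 'GamerC')]
print(main_board)   # [(1, 'GamerA'), (2, 'GamerC'), (3, 'GamerB')]
```

Filtering first means the excluded gamers cost nothing on each of the thousands of genre/platform boards, which is where the savings multiply.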
I’ve also rewritten a lot of the leaderboard build calls and generation to make the summary tables smaller and faster, and done various tweaks around how the leaderboard build is distributed across the server cores.
After all of those tweaks, the daily leaderboards are now building in less than 45 minutes, down from 3+ hours in December.
Gamer blogs are now cached
We have some quite popular bloggers on TA, and our bloggers often post huge lists of links to TA pages in their blogs. When rendering these blogs, we would parse these links and then check the viewer’s progress in any games or achievements before showing them. This added potentially huge numbers of database calls (some blogs had around 1,000 links in them!) just to show a single blog. So we have decided to cache these blogs and no longer show the viewer’s progress. This is a minor loss of functionality, but it protects the site from effectively being DDoS’d whenever a popular blog with hundreds of links in it is posted.
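Here is a minimal sketch of the caching idea, assuming each blog has a last-edited timestamp that can be used to detect changes. The expensive render runs once per version of a blog instead of once per view, and since one copy is shared by everyone, it cannot include per-viewer progress. `BlogCache` and `render_blog` are hypothetical names.

```python
from typing import Callable

class BlogCache:
    """Render each blog once per edit and share the HTML with every viewer.

    Because one copy is shared, it cannot contain per-viewer progress.
    """

    def __init__(self, render: Callable[[int], str]) -> None:
        self._render = render
        # blog ID -> (last-edited timestamp, rendered HTML)
        self._entries: dict[int, tuple[float, str]] = {}

    def get(self, blog_id: int, last_edited: float) -> str:
        cached = self._entries.get(blog_id)
        if cached is not None and cached[0] == last_edited:
            return cached[1]  # cache hit: no link parsing, no database calls
        # Miss (first view, or the blog was edited): do the expensive render once
        # and replace any older version of this blog.
        html = self._render(blog_id)
        self._entries[blog_id] = (last_edited, html)
        return html

render_calls: list[int] = []

def render_blog(blog_id: int) -> str:
    # Hypothetical stand-in for parsing the blog's links and building its HTML.
    render_calls.append(blog_id)
    return f"<article>blog {blog_id}</article>"

cache = BlogCache(render_blog)
for _ in range(1_000):          # 1,000 views of the same blog...
    cache.get(42, last_edited=1.0)
print(len(render_calls))        # 1: ...but only one render
cache.get(42, last_edited=2.0)  # the author edits the blog
print(len(render_calls))        # 2
```

This in-process dictionary is only for illustration. A site running on several servers would more likely use a shared cache with its own eviction policy.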
Some panels are now only viewable when you are logged in
Our traffic from bots and scrapers has dramatically increased over the last few years. According to our hosting company, Cloudflare, in the last 24 hours we have had around 3.5m requests to TrueAchievements from verified or suspected bots. We are blocking some of the most obviously nefarious ones, but some of them are actually beneficial for us to let through (search engines, Discord and Twitter preview cards, etc.). However, these useful bots don’t need to see the very complicated panels (such as your friend feed), so we have set some of these very heavy panels to only be viewable if you are logged into the site. This means the bots can still read the pages, but they aren’t putting huge stress on the servers.
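The gating can be sketched as a render step that always outputs the public page content but only runs the heavy, personalised panel when the request belongs to a logged-in user. The `Request` type and render functions are hypothetical stand-ins, not the site's actual code.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Request:
    user_id: Optional[int]  # None for anonymous visitors, including bots and crawlers

def render_friend_feed(user_id: int) -> str:
    # Hypothetical stand-in for the expensive, personalised database queries.
    return f"<section>Friend feed for user {user_id}</section>"

def render_page(request: Request, main_content: str) -> str:
    # The public content is always rendered, so search engines and
    # link-preview bots can still read the page.
    parts = [main_content]
    if request.user_id is not None:
        parts.append(render_friend_feed(request.user_id))
    else:
        # Cheap placeholder instead of the heavy panel.
        parts.append("<section>Log in to see your friend feed</section>")
    return "\n".join(parts)

bot_page = render_page(Request(user_id=None), "<main>Game page</main>")
member_page = render_page(Request(user_id=7), "<main>Game page</main>")
print("Friend feed for user" in bot_page)       # False
print("Friend feed for user 7" in member_page)  # True
```

Checking the login state before calling the panel means anonymous traffic never triggers the expensive queries at all. Simply hiding the panel's output would not achieve that.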
There may be some more tweaks to come
The TGN dev team have devoted the whole of January to performance work to try to speed up as much as possible on the sites. The vast majority of this work will not result in any noticeable changes from a user perspective, apart from hopefully a gain in speed. If we do make any more functionality tweaks, you will be able to read about them first in the TA Discord server, so please join that if you haven’t already and want to stay up to date on our performance work.
Happy New Year!