Out with the Old, In with the New!

After dreaming about it for two years or so, I have switched from using a normal MySQL database hosted by AWS to using AWS’s serverless database, Aurora Serverless. I expected this to be a mildly traumatic changeover, but all of the trauma was caused by Oracle’s stupidity and none at all by AWS or the new database.

I exported the current MySQL database to a dump file, created an Aurora Serverless database, imported the dump file, and told the downloader, the API, and the Express server to use it. And it seems to work. The only way I can tell the difference is to look at the performance logs in AWS RDS to see which database says it’s doing stuff.
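
Because Aurora Serverless speaks the MySQL protocol, repointing the services was really just a connection change. Here’s a minimal sketch of what that looks like, assuming the services read the endpoint from environment variables – the variable names and the mysql2 client are my illustration, not necessarily the real setup:

    import * as mysql from "mysql2/promise";

    // Swap EXTSTATS_DB_HOST (hypothetical name) from the old RDS instance to
    // the Aurora Serverless cluster endpoint; nothing else about the
    // connection needs to change.
    const pool = mysql.createPool({
      host: process.env.EXTSTATS_DB_HOST,
      user: process.env.EXTSTATS_DB_USER,
      password: process.env.EXTSTATS_DB_PASSWORD,
      database: process.env.EXTSTATS_DB_NAME,
    });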

On the other hand I did hope that the switch might help with a performance issue I’d been having, and it didn’t. Hmph. Back to the drawing board on that one.

Database Usage Since Creation

Notice in the graphs that database capacity was high during the initial data import – this was AWS automatically allocating the required capacity. Then after the import it dropped down again. There were some bursts of CPU while I was doing some testing, and the particularly high ones are the troublesome pages I’m debugging.

I Killed It

Some time ago – it was probably last year – I had a hardware failure on the old stats server in my study, and the site was down for a few weeks. During that time I did a thing I never wanted to do, and ported the old site into the cloud. That became stats3.drfriendless.com.

This has always been a problematic service, because:

  • I never wanted to do it
  • I hate the technology that it uses
  • I can’t remember how the old site works
  • The database is badly designed
  • The site is slow
  • It’s very expensive

And I could go on whining all day. But then add to that:

  • The expensive service was underpowered for what I was trying to do
  • And it didn’t work anyway and I couldn’t be arsed finding out why

so today I finally pulled the pin and shut it down. stats3.drfriendless.com now points to the same server as stats.drfriendless.com.

So now I am back down to two sites, both of which are cheaper. I won’t miss stats3; it was not a good solution.

Like a Bought One!

I had a pleasant tech experience yesterday. That’s sufficiently rare that I need to write about it.

I’ve been working with a technology called GraphQL, which is a bit like SQL except it works with structured data, e.g. a Facebook user, who has a list of friends and a list of posts, and the posts have users who liked them, and the friends have posts of their own, and so on, not to mention the cats. GraphQL lets you write a query (like SQL does) that you send to a server, which executes it and returns the data.
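
As a tiny example, here’s roughly what such a query looks like, written the way a client would send it. All of the field names are invented:

    // One round trip fetches a user, their friends, and who liked each
    // friend's posts, shaped exactly as the query asks.
    const query = `{
      user(id: "42") {
        name
        friends {
          name
          posts {
            title
            likedBy { name }
          }
        }
      }
    }`;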

That sounds easy. The hard bit for me is that because I own the data, I have to write the bit that retrieves the structured data from the database. GraphQL makes that easy enough, so I can usually just do a query on a table, give it all the data, and it sorts out what it wants. People say bad things about it, but I think GraphQL is a wonderful technology.

However it’s not always that easy. I have a concept in my database called a GeekGame. It’s the relationship of one geek with one game. It includes values like that geek’s rating for that game, and whether they own it or not. It does not include the name of the game, or how many players it’s for – that data goes in the game.

So with GraphQL I can ask for a GeekGame, and I can say that I want to get whether the geek owns it, but I don’t want the rating. But until yesterday, one thing I could NOT do, which I should have been able to, was get the game as well. I would have liked to be able to say “give me all the games owned by this geek, with the rating and the name”.
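
In query terms, this is what I wanted to be able to send – my field names here are a guess, not the real schema:

    // Rating and ownership live on the GeekGame; the name lives on the game.
    // One query should be able to fetch both.
    const query = `{
      geekGames(geek: "Friendless", owned: true) {
        rating
        game { name }
      }
    }`;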

The problem was that GraphQL would ask me for all of the GeekGames, and then after retrieving them, it would find out which games it wanted, and ask me for them one by one. And as there were hundreds of them, that would generate a lot of queries (which cost me money and take a long time). And in fact I couldn’t get it to work at all. Quite a few times the database got cranky at me and refused to talk to me any more until I found out about the FLUSH HOSTS command and ran that. That was a learning experience!

But there’s a technology called DataLoader, made at Facebook (they invented GraphQL as well), which promised to collect all of the single queries and give them to me in a batch. Then I could do one query on the game table, give it to GraphQL, and it would sort the rest out. So I thought I’d give it a go.
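
The shape of it is roughly this – the Game type and the bulk query helper are my inventions, not the real Extended Stats code:

    import DataLoader from "dataloader";

    interface Game { bggid: number; name: string; }

    // Hypothetical bulk query: one SELECT ... WHERE bggid IN (...) for all IDs.
    declare function loadGamesByIds(ids: readonly number[]): Promise<Game[]>;

    // DataLoader collects every load() call made during one tick of the event
    // loop and hands all the keys to this batch function in one array.
    const gameLoader = new DataLoader<number, Game>(async (ids) => {
      const rows = await loadGamesByIds(ids);
      // Results must be returned in the same order as the requested keys.
      const byId = new Map<number, Game>();
      for (const g of rows) byId.set(g.bggid, g);
      return ids.map((id) => byId.get(id) ?? new Error(`no game ${id}`));
    });

    // In the GeekGame resolver this looks like a single fetch, but it's batched:
    //   game: (geekGame) => gameLoader.load(geekGame.gameId)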

O. M. G.

It worked, first time. I don’t even mean it worked after I fixed all of my stupid mistakes. I mean I copied from the example, implemented my bulk query, and kablammo, it did what it was supposed to.

So I was kinda pleased about that.

I could then convert the Monthly Page over to use GraphQL, and then today I could add the How Much Do You Play New Releases? chart to that table. For that, I needed the publication year of the game, which that page did not previously load. But with GraphQL it’s simply a matter of changing the data query URL to say you want that field as well.
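
Concretely, that change amounts to one more field in the query string (names illustrative, as before):

    // Before: the Monthly page only asked for each game's name.
    const before = `{ geekGames(geek: "Friendless") { quantity game { name } } }`;

    // After: one extra field, and the publication year comes back too.
    const after = `{ geekGames(geek: "Friendless") { quantity game { name yearPublished } } }`;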

I feel I’m on a bit of a roll with the new features now. I’ve stabilised on Angular 9 (with Ivy, the new compiler technology), some version of Vega that seems to not break, and GraphQL. According to my Trello board I have about 10 features remaining to be feature-compatible with the old system. Then with any luck I can get to inventing cool new stuff.

Fiddling with Stuff

Rome Burns, AD64. This is fine.

When I was writing the “Best Days in Gaming” table the other day, I realised that there were three parameters I used which were sort of arbitrary. And I also realised that it probably wouldn’t be too hard to allow users to change them – Angular is really good at that sort of stuff.

So last night I sat down and wrote the code, and I really liked how it turned out. It has gone live already so you can check it out. I had to give the button to get to it a name, so I called it “Fiddle”.
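
A sketch of the shape of it – the parameter names and defaults are invented, not the real component, and ngModel needs FormsModule imported by the declaring module:

    import { Component, EventEmitter, Output } from "@angular/core";

    @Component({
      selector: "fiddle-panel",
      template: `
        <label>Min plays <input type="number" [(ngModel)]="minPlays" (change)="fiddle()" /></label>
        <label>Top N <input type="number" [(ngModel)]="topN" (change)="fiddle()" /></label>
        <label>Since year <input type="number" [(ngModel)]="sinceYear" (change)="fiddle()" /></label>
      `,
    })
    export class FiddlePanelComponent {
      minPlays = 5;
      topN = 20;
      sinceYear = 2005;
      @Output() changed = new EventEmitter<{ minPlays: number; topN: number; sinceYear: number }>();

      // The chart component listens for (changed) and redraws itself.
      fiddle(): void {
        this.changed.emit({ minPlays: this.minPlays, topN: this.topN, sinceYear: this.sinceYear });
      }
    }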

And then today I decided I could do the same thing for a couple of other charts on that page, so now the Florence Nightingale diagram and the Most Played Games by Year chart have Fiddle buttons as well. I think this will become a thing.

I think my next task in this direction should be saving the fiddled values to the user’s account (if you’re logged in). That is the whole point of logging in, after all.

I Had ONE Job

At last, I am getting my COVID dividend. By which I mean, 3 months after my industry collapsed in a screaming insolvent heap, I get a day of leave. With any luck there will be a few more to come, as I have a heap due to me.

I have many ambitions for this one day, and at 4pm it looks like I will be achieving basically zero of them, except for a blog post. But I did log in to the “Elastic Beanstalk” server. I call it that because I had plans to use Elastic Beanstalk to host it, until I found out how much Elastic Beanstalk costs. Now it’s just a tiny EC2 running an Express server.

So I logged in and noticed that there were a lot of upgrades waiting for me to do. I have this issue at work as well, and I don’t do the upgrades because we don’t want one mysterious problem to take down the entire company. It’s a small risk, but it’s a real one. Due to some sort of AWS failure we were down for 22 hours once, and many customers got very antsy about that.

So I did the upgrades and rebooted the machine. Then I ran some graphics software called The Gimp, which logged me out, and I lost all of my windows. And after the trauma of figuring out what was going on there, I forgot that I had ever touched that server.

No, not that gimp!

But what I did not do was restart the Express server process after the reboot. Which meant that when users went to the front page of Extended Stats, the little box where you can search for your user name didn’t work. Ooh, bad look.

Luckily a new user emailed me to say that although I had told him that his account was ready, he could not find it.

That feature is about the only part of Extended Stats that can actually be down. It and the blog are the only servers. Everything else is serverless, and although I can certainly screw it up, it can’t just be not started. If extstats.drfriendless.com is ever down it’s either because I didn’t pay the bill or AWS is down. And if AWS is down, you won’t be worrying about my site because all hell will break loose.

AWS is down, as seen from Sydney

Well, at least y’all know I’m alive. I’m going to go try to do some programming for a bit.

Testing Times

Only three months after the sudden death of our entire industry, work is starting to slow down. I’ve dealt with all the requirements the customers suddenly had with respect to refunds, finished all the work we had promised to customers who can still pay us, and have made a couple of the more risky changes that needed to be made. For example, I switched our user interface over to use CSS Grid, which OMG is so lovely! This is sort of like replacing the skeleton of a living creature, and I am proud to say I did it with only one blindingly stupid mistake.

Now I’m onto the less urgent jobs, i.e. the ones that have been undone for years despite being ticking time bombs. So you know, things like minesweeping the front yard, putting locks on the doors, and making sure that for any third party package we know which version of it we’re using (and even better, that it’s a somewhat recent one).

One of the dreary consequences of that last task was dealing with the antiquated version of one of the test frameworks we use. We have four different test frameworks which do mostly the same thing. I am not a connoisseur of test frameworks, and it seems neither were the developers before me. But the particular problem with this one was that the old version of Scala that it wanted clashed with the old version of Scala that we actually use. So I wanted to upgrade the testing framework, from the antiquated version to a merely old-fashioned version. However there was no documentation for the old-fashioned version, and it seemed to be different to the antiquated version. That’s why Earl had to die. I mean, that’s why I had to get rid of it.

Now despite not being a connoisseur, there is one testing framework I admire because it’s not as shit as the others, and that’s Mockito (with JUnit of course). So I decided to rewrite a bunch of Scala unit tests into Java using Mockito. Well… that was easier said than done. There were a few tricky things built into the Scala framework, so I had to figure out how to replicate the trickery, and Scala itself has some tricky things which are about as entertaining and as useful as Vegemite on the toilet seat. But I battled on and got maybe 50 tests rewritten in two days. Now I am an empty husk of a man but still unfairly overweight.

This story is going somewhere. Gradually.

So this weekend I’ve been doing some stuff with the stats website, and one thing I’d like to do is group plays by location. I do download the location of plays, but then I normalise those plays and store them in another table, and until today I did not retain the location information. And as all of the good charts run from normalised plays data, I had to have it there.

I’m glad you asked, Dr Evil. A normalised play is a play of an expansion or a base game. If the play is of an expansion, and we can figure out what the base game is, then there is also a normalised play for that base game. So for any expansion whose base game is known, there end up being at least as many base game plays as expansion plays.

Yeah, that didn’t make any sense. Let me explain with board games. You record a play of Race for the Galaxy: The Gathering Storm on BGG. That is not normalised, because you didn’t record a play of Race for the Galaxy as well. We know you must have, but you didn’t say so. So part of the normalisation process is recording that play of Race for the Galaxy as well.

Similarly if you record a play of The Lord of the Rings: The Card Game – Nightmare Deck: Shadow and Flame 3 times, then you must have played the Khazad-dûm expansion 3 times, and so you must have played Lord of the Rings: The Card Game 3 times as well. So from that play we infer plays of the others. When I ask the database “how many times did she play Khazad-dûm?” I don’t need to notice that there are 3 plays of Shadow and Flame, because the database already has it. That’s why I do normalisation. It can get a bit complex.
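
As a sketch, the inference is just a walk up the expansion tree. The table and field names here are invented, and the real algorithm also has to avoid double counting base game plays you actually recorded:

    // Maps an expansion's game ID to its base game's ID, where known.
    declare const basegameOf: Map<number, number>;

    interface Play { gameId: number; quantity: number; }

    // One recorded play becomes the play itself plus an inferred play of every
    // ancestor: Shadow and Flame -> Khazad-dûm -> The Lord of the Rings: The
    // Card Game.
    function normalise(play: Play): Play[] {
      const result: Play[] = [play];
      let baseId = basegameOf.get(play.gameId);
      while (baseId !== undefined) {
        result.push({ gameId: baseId, quantity: play.quantity });
        baseId = basegameOf.get(baseId);
      }
      return result;
    }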

Which is all very well and good except when you add locations in as well. Let’s say you record a play of Carcassonne: The River and a play of Carcassonne. How many times did you play Carcassonne? At the moment, Extended Stats infers that you played Carcassonne one time, and you used the expansion The River. But if you record a play of Carcassonne: The River at home, and a play of Carcassonne on boardgamearena.com, how many times did you play Carcassonne? Well, twice, is my guess. So that’s what Extended Stats needs to do.
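
In code terms, that means the merge step of normalisation has to key plays on location as well as game and date. A sketch of the idea, with invented field names:

    interface NormalisedPlay { gameId: number; date: string; location: string; quantity: number; }

    // Location is now part of the key, so a recorded Carcassonne play on
    // boardgamearena.com no longer absorbs the Carcassonne play inferred from
    // Carcassonne: The River at home – they stay as two plays.
    function keyOf(p: NormalisedPlay): string {
      return `${p.gameId}|${p.date}|${p.location}`;
    }

    function merge(plays: NormalisedPlay[]): NormalisedPlay[] {
      const merged = new Map<string, NormalisedPlay>();
      for (const p of plays) {
        const existing = merged.get(keyOf(p));
        if (existing === undefined) {
          merged.set(keyOf(p), { ...p });
        } else {
          // A recorded base game play and one inferred from an expansion at
          // the same place and time are the same play, so keep the larger count.
          existing.quantity = Math.max(existing.quantity, p.quantity);
        }
      }
      return Array.from(merged.values());
    }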

So with the addition of location information, the normalisation algorithm needed to be updated. And as I mentioned earlier, it’s a bit of a complex algorithm. I’m a pretty bold and capable programmer, but I am also a cautious and not stupid one, so I was wary of messing with the code without being sure I got it right. This is what tests are for, so I needed to write some.

The problem was that Extended Stats is written in TypeScript, and I am not familiar with any TypeScript testing frameworks at all. So I asked Mr Google, and he said the best one these days is called Jest, so I looked it up. It seemed easy enough (because hey, I spent two days converting Scala tests to Java tests, I know what to expect), so I started to install it.

And that was where it got a bit messy. JavaScript documentation (TypeScript is like a fancy new version of JavaScript, but it’s still the same ecosystem) assumes that you know a whole bunch of JavaScript stuff, which I often don’t. I’m just a programmer, not Stack Overflow. So there was a lot of guessing and muddling and maybe some swearing, and when I found the remnants of my previous attempt to use a JavaScript testing framework – Mocha – in which I had written the comment “what a fucking retarded ecosystem”, I joyously deleted that. But then after learning a few things, I got it to work.
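
The tests themselves were then pleasantly mundane. Something like this, assuming a location-aware version of the normalise sketch from earlier (the module path and game IDs are invented):

    import { normalise } from "./normalise"; // hypothetical module path

    // Hypothetical game IDs for illustration.
    const CARCASSONNE = 1;
    const THE_RIVER = 2;

    test("an expansion play infers a base game play at the same location", () => {
      const plays = normalise({ gameId: THE_RIVER, quantity: 1, date: "2020-06-20", location: "home" });
      expect(plays).toContainEqual({ gameId: CARCASSONNE, quantity: 1, date: "2020-06-20", location: "home" });
    });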

So then I was able to proceed with changing the code to support locations. I have that working now, but I haven’t sent it live yet because while I’m on a roll there are some other things I’d like to fix.

Some games have really nasty expansion trees. Agricola, for example, has a revised edition. Some of the expansions are suitable for either edition. So if you record a play of Agricola: Agricola Meets Newdale, I can’t say what base game you played. However if you also record a play of Agricola: Farmers of the Moor (revised edition), then I can assume that you played Agricola (revised edition), with two expansions. I don’t think my code can handle that yet, so I’m going to make sure it does before I bother deploying a new version.
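
One way to handle that – and this is a sketch of the idea, not the shipped code – is to intersect the candidate base games of every expansion played together:

    // All candidate base games for an expansion; Agricola Meets Newdale maps
    // to both editions of Agricola. (Invented structure, for illustration.)
    declare const basegamesOf: Map<number, Set<number>>;

    function chooseBaseGame(expansionIds: number[]): number | undefined {
      let candidates: Set<number> | undefined;
      for (const id of expansionIds) {
        const bases = basegamesOf.get(id) ?? new Set<number>();
        candidates = candidates === undefined
          ? bases
          : new Set(Array.from(candidates).filter((b) => bases.has(b)));
      }
      // One survivor (e.g. revised Agricola, once revised Farmers of the Moor
      // is in the mix) means we can infer the base game; otherwise we can't say.
      if (candidates !== undefined && candidates.size === 1) {
        return Array.from(candidates)[0];
      }
      return undefined;
    }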

So that is what I’ve been up to – details details details! I’ve also done a couple of new charts recently, which I am quite pleased with, but I haven’t blogged about them. The UI work is always nice because it shows concrete results, but the excellence needs to continue all the way down.