A Problem and a Plan

In case you are thinking that I am neglecting the Extended Stats project, well you are probably right. When I started the project in 2006 or whatever, my aims were to do the stats thing, but also to play with some technologies. At the time the technology was Python, then it became web-hosting, then it became cloud tech. The technologies were things I was interested in that I wasn’t using otherwise, but would like to.

These days I have a job using those technologies – this job requires most of the skills I learnt at my old job way back in 2006, plus all of the stuff I learnt building the various Extended Stats websites. On any given day I could be writing Java, Scala or TypeScript, writing Angular, improving user experience, optimising database queries, writing documentation (I LIKE writing documentation), writing financial reports (I do NOT LIKE writing reports) or building cloud infrastructure.

In June I rebuilt all of our (i.e. work’s) cloud infrastructure to be inside a virtual private cloud. I knew nothing about how to do this, but over several months I figured out all the things, and then one evening I turned off the old servers, cloned the database into the VPC, and then started up new servers. This required a migration from Java 7 to Java 8, from Ubuntu to Amazon Linux 2, from a self-hosted ElasticSearch to a cloud-hosted one, and many many other small things that I had worked on for the previous two years. There was some small collateral damage but given that I literally just replaced everything, it was mild and the project was an astonishing success.

This had to be done in June because in July we had our busiest month of work ever, with respect to server load. Customers were hammering us, and the site would occasionally break due to some limitation being reached in some new way. And when our site breaks, there are literally crowds of people waiting for it get better. So I was a bit stressed. I couldn’t get a day off, and as the site was busy on weekends I was occasionally fixing stuff then as well.

Meanwhile, Extended Stats was quietly breaking in its own way.

The way the downloader works is that there’s a module called Downloader which scrapes data from BGG. It passes it to a module called Inside (because it is inside my VPC) which writes it to the database. The database is quite small because it’s (supposed to be) the most expensive component of the system. But what seems to happen is that all of the files being downloaded come at approximately the same time, and sometimes the database gets overwhelmed and the update fails. So Inside fails, it tells Downloader it failed, and Downloader says “no problem I will try again soon.” But when it tries again the problem is still happening, so a whole lot of computing happens (which costs me $$) for no result.

This is coincidentally one of the ways that the system at work was failing – with the hammering from the customers, the site was maybe 50% busier than it had ever been before, so most bits were doing 50% more work. But occasionally there would be some interaction where a task that should take a second would blow out to an hour – I suspect some sort of database contention, but I don’t know how to prove that.

At work I seem to have solved the last problem by adding an index to a seemingly unrelated table – something that was taking 5 seconds now takes 0.1 second, and incidentally fixes the unexplained hour-long blowouts in another part of the system. As a perfectionist I am not happy with my understanding of the system, but as a human being I’m pretty fucking stoked that I stopped shit breaking. So I have a chance to relax a bit.

The PLAN is that in my relaxation time I can work on Extended Stats. But as explained, the problems with Extended Stats, and most of the technologies, are the same as work. So it’s not the thing I could do that relaxes me the most… but here I am writing a blog to show you that I am still on the project. Work owes me a bunch of days off and I intend to claim some.

So the plan with Extended Stats is to replace the Inside module with a module called InsideQueue. AWS has a service called SQS – simple queueing service – where you can put data items onto a queue with one program and take them off with another. The Downloader will put its data from BGG onto a queue, and InsideQueue will take them off one by one and make the database changes. This is a pattern we use at work which works well, as long as we separate the important short tasks from the slow ones.

So apart from procrastinating and writing a blog post (oops, I mean “communicating with the users”) that is what I’m doing today. I apologise to all of the people waiting to be added to the site – it’s kinda futile to do that while the downloader’s not working anyway. OK, let’s see if I can make this work…

The Downloader Went Crazy So I Shot It

I’ve just been looking at some problems with the site, and I noticed that the downloader is doing its weird thing. The graph of it looks like this:

So for the last week it has been very busy downloading data from BGG. I mean, INSANELY busy. That has probably cost me about $A200, when I should be able to do the work for approximately free.

This happens monthly, but I don’t think it’s usually that bad. But this time I was afraid the bill would get much worse, so I destroyed the downloader completely. I will have to rethink how it works, in light of the fact that the current plan is not working well at all.

For anyone who is thinking this is a bad sign, yes, it is a bad sign. At this point fixing Extended Stats is feeling a lot like work and not a lot like fun.

Postscript: It seems to be something like when I go to update statistics on a bunch of games, the database is unavailable because it is handling too many connections already. So it is failing to update 5000 games in 69 million attempts to write to the database. I think I will need to put the database writes into a queue.

Yes I Am Still Alive, For The Time Being Anyway

I know it has been a really long time between blog posts. And I haven’t done any coding on the since I Don’t Know When. Yes, I can hear you, Gentle Reader, saying “WHAT THE FUCK IS UP WITH THAT?”

Well let me just say first of all is that the reason I am even taking the time now to post is that I’m in isolation with COVID (omicron strain, I expect) and I’m running out of things to do. I’ve been here nearly 9 days, so if I was going to have any problems I would have had them by now. I have had a very mild case.

Other than that, what have I been doing? Well, a lot of work. My job is as the sole techie for a small company, so I do the programming, graphic design, database admin, cloud configuration, documentation, requirements gathering and troubleshooting. It’s well within my skill set but with so many different things to do I am a bit pulled from pillar to post, solving a different sort of stupid problem every day.

I do all those same things for Extended Stats as well, and the real problem is that when I have spare time I just don’t want to do it any more. Yes, Extended Stats uses newer technology, but if you’ve seen one inexplicably stupid problem trying to connect to a cloud database, you’ve seen ’em all. Not that it’s any use to you because the new problem seems to contradict everything you previously learnt no matter how much it is. But this means that the part of my brain which needs to Figure Tech Stuff Out is completely over it by the time I finish work, and I am not motivated to work on Extended Stats.

For example, one of the periodic tasks I have to do is to look in the Extended Stats database and look at the errors about deleted BGG users. Extended Stats assumes it has the wrong URL and tries again later, hundreds of thousands of times. And yes, I should fix that, but I’m never clear whether the user has really gone or Aldie has changed something, so I err on the side of caution. I went to do that task last night – it should take 10 minutes – and spent 2 hours trying to figure out how to connect to my database.

I used to connect from my work laptop where I had the correct credentials set up, but that laptop died in December (and due to something stupid in the current Ubuntu software, it still hates being rebooted) and I lost that config. So I had to rebuild it, and I kept getting it wrong. But it was really hard to tell because AWS’s networking and MySQL’s user security only ever tell you Computer Says No, Do This Udemy Course To Find Out Why. So, you know, 2 hours to do a 10 minute maintenance job, when is there time to do coding?

But anyway, the other insidious thing that is preventing me from looking at the site ever is MMORPGs. I’ve been a massive fan of Skyrim for 10 years, and Extended Stats has survived two previous Skyrim obsessions. Early last year I decided it was time for Skyrim VR, so I bought a VR headset – as it would turn out, almost the last one available in Sydney.

Skyrim VR turned out to be a flop because for some reason I couldn’t put my weapon away – it could be a bug in the game or a fault in the controller. But we really got into Beat Saber and I played that for a while.

I was still hankering for an RPG fix, so I downloaded the free Neverwinter game. Oh wow, it was so complicated! I had a whole new thing to learn, so I spent months doing that. I replaced my 10 year old TV, got a headset to chat online, etc, and ended up becoming a guildmaster. But then the Neverwinter developers decided that they had to revise the whole statistics system. It’s not like they had got the previous one working properly anyway. So they revised the whole game, introduced even more bugs, and devalued all of the things I had earned and that I had paid for.

That’s no way to treat customers, and I recognised when I was being played for a chump, so I stopped playing. But now that I knew a lot more about MMOs, I decided to bite the bullet and try out Elder Scrolls Online, which is the same IP as Skyrim.

ESO is a very good game, and the development team seems to be a lot more considerate and competent than the Neverwinter rabble. For example, some online ESO doco is 5 years old but not out of date. In many games, for example World of Warcraft, players need to get the newest stuff or their characters become relatively weak, but ESO tries to avoid doing that. Yes, there are dissenting opinions on the capabilities and ethics of the developers, but many of them come from people who have no money, are not programmers, and think that the world owes them video games. I’m very happy playing ESO.

Which does not help you guys One Little Bit. Nor does it help my board gaming, either – I haven’t even been playing solo games. But rest assured I’ve been doing Extended Stats for 15 years now and I don’t see any reason to stop. One day it will become the project I am obsessed with, again.

The Rainbow Button

Does this say “geek” to you?

Today I decided to tackle the problem of the Geek button. I have known for a long time that there was a bit of a navigability issue, where it was enough to get to the data you wanted, but apart from the navigation bar, all of the pages with features on them were dead-ends. So it was a little bit hard to stay on the site and explore. So I needed some way to get back to the Geek page from the feature pages. I will need more such things as time goes by, but I felt this one was really missing.

The problems were things like:

  • what does this button look like?
  • where does it go?
  • how does it work?

These are not usually issues you consider when using a site, unless something is wrong – only then do you notice.

I realised in the last week or so that the way to show what the button did was by colour. Yes, the site has a lot of colours. That’s a deliberate choice, and if you disagree please send me your copy of The Oracle of Delphi. This is a site about games and I feel that it should look like a game rather than say, a newspaper, or a chainsaw massacre. And it should go on the nav bar despite the other problems that would cause (see below).

So I needed to make a colourful button, and I decided on diagonal stripes. I am still not sold on diagonal stripes, but it can be replaced when I get a better idea – I feel that the diagonals don’t interfere with the text, whereas orthogonal shapes might. And as my image editing program is broken on the new laptop, I decided to write a program to generate the image for me.

And then it remained to make the button work. Longtime readers with good memories and insomnia may remember me posting about the nav bar and Gatsby and server-side rendering… but let’s say you don’t. The navigation bar and all of the things it does are built on my laptop using software called React and Gatsby, and copied to AWS, and they are mostly static and don’t use JavaScript.

The login button, which is part of that system, uses JavaScript but is also pretty painful to work with. However my problem was that the Geek button (and the page buttons which I dreamed of adding below it) would need to know who the current geek was, and they needed to pick that up from the page (and in particular, from the URL). So I create the rainbow button and its submenu on my laptop, copy that HTML across to AWS, and then when the page runs there’s some JavaScript which looks at the URL, decides whether to show that menu or not, and fills the links into the menu options. It’s somewhere between cunning and gross.

Does this say “board game” to you?

I wonder if some day people will confuse my colour scheme for support the LGBTIQ cause. It’s not, but also I am cool with that if it happens. Most especially if it pisses off some RWNJ deplorable…

Out with the Old, In with the New!

After dreaming about it for two years or so, I have switched from using a normal MySQL database hosted by AWS to using AWS’s serverless database, Aurora Serverless. I expected this to be a mildly traumatic changeover, but all of the trauma was caused by Oracle’s stupidity and nothing at all from AWS or the new database.

I exported the current MySQL database to a dump file, created an Aurora Serverless database, imported the dump file, and told the downloader, the API, and the Express server to use it. And it seems to work. The only way I can tell the difference is to look at the performance logs in AWS RDS to see which database says it’s doing stuff.

On the other hand I did hope that the switch might help with a performance issue I’d been having, and it didn’t. Hmph. Back to the drawing board on that one.

Database Usage Since Creation

Notice in the graphs, database capacity was high during the initial data import – this was AWS automatically allocating the required capacity. The after the import it dropped down again. There were some bursts of CPU while I was doing some testing, and the particularly high ones are the troublesome pages I’m debugging.

I Killed It

Some time ago – it was probably last year – I had a hardware failure on the old stats server in my study, and the site was down for a few weeks. During that time I did a thing I never wanted to do, and ported the old site into the cloud. That became stats3.drfriendless.com.

This has always been a problematic service, because:

  • I never wanted to do it
  • I hate the technology that it uses
  • I can’t remember how the old site works
  • The database is badly designed
  • The site is slow
  • It’s very expensive

And I could go on whining all day. But then add to that:

  • The expensive service was underpowered for what I was trying to do
  • And it didn’t work anyway and I couldn’t be arsed finding out why

so today I finally pulled the pin and shut it down. stats3.drfriendless.com now points to the same server as stats.drfriendless.com.

So now I am back down to two sites, both of which are cheaper. I won’t miss stats3, it was not a good solution.

Like a Bought One!

I had a pleasant tech experience yesterday. That’s a sufficiently rare experience that I need to write about it.

I’ve been working with a technology called GraphQL, which is a bit like SQL expect it works with structured data, e.g. a Facebook user, who has a list of friends and a list of posts, and the post have users who liked them, and the friends have posts of their own, and so on, not to mention the cats. GraphQL lets you write a query (like SQL does) that you send to a server which executes it and returns the data.

That sounds easy. The hard bit for me is that because I own the data, I have to write the bit that retrieves the structured data from the database. GraphQL makes that easy enough, so I can usually just do a query on a table, give it all the data, and it sorts out what it wants. People say bad things about it, but I think GraphQL is a wonderful technology.

However it’s not always that easy. I have a concept in my database called a GeekGame. It’s the relationship of one geek with one game. It includes values like that geek’s rating for that game, and whether they own it or not. It does not include the name of the game, or how many players it’s for – that data goes in the game.

So with GraphQL I can ask for a GeekGame, and I can say that I want to get whether the geek owns it, but I don’t want the rating. But until yesterday, one thing I could NOT do which I should have been able to was get the game as well. So I would have liked to say “give me all the games owned by this geek, with the rating and the name”.

The problem was that GraphQL would ask me for all of the GeekGames, and then after retrieving them, it would find out which games it wanted, and ask me for them one by one. And as there were hundreds of them, that would generate a lot of queries (which cost me money and take a long time). And in fact I couldn’t get it to work at all. Quite a few times the database got cranky at me and refused to talk to me any more until I found out about the FLUSH HOSTS command and ran that. That was a learning experience!

But, there’s a technology called DataLoader, which was made at Facebook (they invented GraphQL as well), which promised to collect all of the single queries, and give them to me in a batch. Then I could do one query on the game table, give it to GraphQL, and it would sort the rest out. So I thought Id give it a go.

O. M. G.

It worked, first time. I don’t even mean it worked after I fixed all of my stupid mistakes. I mean I copied from the example, implemented my bulk query, and kablammo, it did what it was supposed to.

So I was kinda pleased about that.

I could then convert the Monthly Page over to use GraphQL, and then today I could add the How Much Do You Play New Releases? chart to that table. For that, I needed the publication year of the game, which that page did not previously load. But with GraphQL it’s simply a matter of changing the data query URL to say you want that field as well.

I feel I’m on a bit of a roll with the new features now. I’ve stabilised on Angular 9 (with Ivy, the new compiler technology), some version of Vega that seems to not break, and GraphQL. According to my Trello board I have about 10 features remaining to be feature-compatible with the old system. Then with any luck I can get to inventing cool new stuff.

Fiddling with Stuff

Rome Burns, AD64. This is fine.

When I was writing the “Best Days in Gaming” table the other day, I realised that there were 3 parameters that I used which were sort of arbitrary. And I also realised that it probably wouldn’t be too hard to allow users to change them – Angular is really good at that sort of stuff.

So last night I sat down and wrote the code, and I really liked how it turned out. It has gone live already so you can check it out. I had to give the button to get to it a name, so I called it “Fiddle”.

And then today I decided I could do the same thing for a couple of other charts on that page, so now the Florence Nightingale diagram and the Most Played Games by Year chart have Fiddle buttons as well. I think this will become a thing.

I think my next task in this direction should be saving the fiddled values to the user’s account (if you’re logged in). That is the whole point of logging in, after all.

I Had ONE Job

At last, I am getting my COVID dividend. By which I mean, 3 months after my industry collapsed in a screaming insolvent heap, I get a day of leave. With any luck there will be a few more to come, as I have a heap due to me.

I have many ambitions for this one day, and at 4pm it looks like I will be achieving basically zero of them, except for a blog post. But I did log in to the “Elastic Beanstalk” server. I call it that because I had plans to use Elastic Beanstalk to host it, until I found out how much Elastic Beanstalk costs. Now it’s just a tiny EC2 running an Express server.

So I logged in and noticed that there were a lot of upgrades waiting for me to do. I have this issue at work as well, and I don’t do the upgrades because we don’t want one mysterious problem to take down the entire company. It’s a small risk, but it’s a real one. Due to some sort of AWS failure we were down for 22 hours once, and many customers got very antsy about that.

So I did the upgrades and rebooted the server. Then I ran some graphics software called The Gimp which logged me out and I lost all of my windows. And after the trauma of figuring out what was going on there, I forgot that I had ever touched that server.

No, not that gimp!

But what I did not do was restart the server. Which meant that when users went to the front page of Extended Stats, the little box where you can search for your user name didn’t work. Ooh, bad look.

Luckily a new user emailed me to say that although I had told him that his account was ready, he could not find it.

That feature is about the only part of Extended Stats that can actually be down. It and the blog are the only servers. Everything else is serverless, and although it can happen that I can screw it up, it can’t be just not started. If extstats.drfriendless.com is ever down it’s either because I didn’t pay the bill or AWS os down. And if AWS is down, you won’t be worrying about my site because all hell will break loose.

AWS is down, as seen from Sydney

Well, at least y’all know I’m alive. I’m going to go try to do some programming for a bit.

Testing Times

Only three months after the sudden death of our entire industry, work is starting to slow down. I’ve dealt with all the requirements the customers suddenly had with respect to refunds, finished all the work we had promised to customers who can still pay us, and have made a couple of the more risky changes that needed to be made. For example, I switched our user interface over to use CSS Grid, which OMG is so lovely! This is sort of like replacing the skeleton of a living creature, and I am proud to say I did it with only one blindingly stupid mistake.

Now I’m onto the less urgent jobs, i.e. the ones that have been undone for years despite being ticking time bombs. So you know, things like minesweeping the front yard, putting locks on the doors, and making sure that for any third party package we know which version of it we’re using (and even better, that it’s a somewhat recent one).

One of the dreary consequences of that last task was dealing with the antiquated version of one of the test frameworks we use. We have four different test frameworks which do mostly the same thing. I am not a connoisseur of test frameworks, and it seems neither were the developers before me. But the particular problem with this one was that the old version of Scala that it wanted clashed with the old version of Scala that we actually use. So I wanted to upgrade the testing framework, from the antiquated version to a merely old-fashioned version. However there was no documentation for the old-fashioned version, and it seemed to be different to the antiquated version. That’s why Earl had to die. I mean, that’s why I had to get rid of it.

Now despite not being a connoisseur, there is one testing framework I admire because it’s not as shit as the others, and that’s Mockito (with JUnit of course). So I decided to rewrite a bunch of Scala unit tests into Java using Mockito. Well… that was easier said than done. There were a few tricky things built into the Scala framework, so I had to figure out how to replicate the trickery, and Scala itself has some tricky things which are about as entertaining and as useful as Vegemite on the toilet seat. But I battled on and got maybe 50 tests rewritten in two days. Now I am an empty husk of a man but still unfairly overweight.

This story is going somewhere. Gradually.

So this weekend I’ve been doing some stuff with the stats website, and one thing I’d like to do is group plays by location. I do download the location of plays, but then I normalise those plays and store them in another table, and I did not until today retain the location information. And as all of the good charts run from normalised plays data, I had to have it there.

I’m glad you asked, Dr Evil. A normalised play is a play of an expansion or a base game. If the play is of an expansion, and we can figure out what the base game is, then there is also a normalised play for that base game. Any plays of expansions for which there are base game plays have at least as many base game plays as expansion plays.

Yeah, that didn’t make any sense. Let me explain with board games. You record a play of Race for the Galaxy: The Gathering Storm on BGG. That is not normalised, because you didn’t record a play of Race for the Galaxy as well. We know you must have, but you didn’t say so. So part of the normalisation process is recording that play of Race for the Galaxy as well.

Similarly if you record a play of The Lord of the Rings: The Card Game – Nightmare Deck: Shadow and Flame 3 times, then you must have played the Khazad-dûm expansion 3 times, and so you must have played Lord of the Rings: The Card Game 3 times as well. So from that play we infer plays of the others. When I ask the database “how many times did she play Khazad-dûm?” I don’t need to notice that there are 3 plays of Shadow and Flame, because the database already has it. That’s why I do normalisation. It can get a bit complex.

Which is all very well and good except when you add locations in as well. Let’s say you record a play of Carcassonne: The River and a play of Carcassonne. How many times did you play Carcassonne? At the moment, Extended Stats infers that you played Carcassonne one time, and you used the expansion The River. But if you record a play of Carcassonne: The River at home, and a play of Carcassonne on boardgamearena.com, how many times did you play Carcassonne? Well, twice, is my guess. So that’s what Extended Stats needs to do.

So with the addition of location information, the normalisation algorithm needed to be updated. And as I mentioned earlier, it’s a bit of a complex algorithm. I’m a pretty bold and capable programmer, but I am also a cautious and not stupid one, so I was wary of messing with the code without being sure I got it right. This is what tests are for, so I needed to write some.

The problem was that Extended Stats is written in TypeScript, and I am not familiar with any TypeScript testing frameworks at all. So I asked Mr Google, and he said the best one these days is called Jest, so I looked it up and it seemed easy enough (because hey, I spent two days converting Scala tests to Java tests, I know what to expect) so I started to install it.

And that was where it got a bit messy. JavaScript documentation (TypeScript is like a fancy new version of JavaScript, but it’s still the same ecosystem) assumes that you know a whole bunch of JavaScript stuff, which I often don’t. I’m just a programmer, not stackoverflow. So there was a lot of guessing and muddling and maybe some swearing, and when I found the remnants of my previous attempt to use a JavaScript testing framework – mocha – in which I had written the comment “what a fucking retarded ecosystem” – I joyously deleted that. But then after learning a few things, I got it to work.

So then I was able to proceed with changing the code to support locations. I have that working now, but I haven’t sent it live yet because while I’m on a roll there are some other things I’d like to fix.

Some games have really nasty expansion trees. Agricola, for example, has a revised edition. Some of the expansions are suitable for either edition. So if you record a play of Agricola: Agricola Meets Newdale, I can’t say what base game you played. However if you also record a play of Agricola: Farmers of the Moor (revised edition), then I can assume that you played Agricola (revised edition), with two expansions. I don’t think my code can handle that yet, so I’m going to make sure it does before I bother deploying a new version.

So that is what I’ve been up to – details details details! I’ve also done a couple of new charts recently, which I am quite pleased with, but I haven’t blogged about them. The UI work is always nice because it shows concrete results, but the excellence needs to continue all the way down.