I Figured Out the Kaboom!

On the weekend, after I noticed that the database was worn out, I updated some code to try to make the database operations for saving game data less expensive. When I turned the downloader back on, the situation was just as bad as before: much worse than it had been last month, and I didn't really know what had changed.

I noticed in the Lambda logs that a lot of updates of game data were timing out after 30 seconds, which I thought was odd. There might be a couple of dozen SQL statements involved, which should not take that long. So I added some logging to the appropriate Lambda to see which bits might be taking a long time.
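The logging was nothing fancy, just timing each step so the slow one shows up in CloudWatch. Something along these lines, where the step names are illustrative rather than my actual code:

```typescript
// Time a step of the update Lambda and log how long it took, so the
// CloudWatch logs show which part of the update is slow.
async function timed<T>(label: string, work: () => Promise<T>): Promise<T> {
  const start = Date.now();
  try {
    return await work();
  } finally {
    console.log(`${label} took ${Date.now() - start}ms`);
  }
}

// Hypothetical usage inside the update-game Lambda:
// await timed("download game from BGG", () => downloadGame(gameId));
// await timed("recalculate ranking score", () => updateRankingScore(conn, gameId));
```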

The answer was that the statement where I calculate a game’s score for the rankings table was taking maybe 8 seconds. Actually I think it takes longer if there are more things hitting the same table (the one which records geeks’ ratings of games), so 30 seconds is not unbelievable. But it is bad.

I checked out the indexes on that table, and realised that I had relatively recently removed an index on that table – the index that lets me quickly find the ratings for a game. So it seems that the lack of that index was slowing the calculation of rankings down, and hence causing updates to games to wreck the database.
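Putting the index back is basically a one-liner. To be clear, the table and column names below are stand-ins rather than my real schema; this is just the shape of the fix:

```typescript
import { createConnection } from "mysql2/promise";

async function restoreRatingsIndex() {
  // connection details are placeholders
  const conn = await createConnection({ host: "localhost", user: "stats", database: "extstats" });

  // without an index on the game column, the rankings calculation has to
  // scan every rating row for every game it touches
  await conn.query("CREATE INDEX ix_ratings_game ON ratings (game)");

  // the sort of per-game aggregate the rankings code runs, which the index speeds up
  const [rows] = await conn.query(
    "SELECT COUNT(*) AS votes, AVG(rating) AS average FROM ratings WHERE game = ?",
    [42]
  );
  console.log(rows);
  await conn.end();
}
```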

So I put that index back, but there was a reason I took it off. MySQL InnoDB (not sure what that even means) tables have a problem where, if you do lots of inserts into the same table, you can get deadlocks between the updates to the table and the updates to its indexes. I figured I didn't need that index so much, so I took it off to fix some deadlocks I was seeing. Silly me! Now I suppose the deadlocks will come back at some point.

Next time though, I hope to remember how important that index is. I’ll just rewrite the code that was deadlocking to do smaller transactions and retry if it fails.
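Roughly what I mean by smaller transactions with retries: insert the ratings in small batches, each in its own transaction, and if one hits a deadlock just run it again. The table name, batch size, and the mysql2 client are assumptions for the sake of the sketch:

```typescript
import { Connection } from "mysql2/promise";

// MySQL error codes that are worth retrying rather than failing the Lambda
const RETRYABLE = ["ER_LOCK_DEADLOCK", "ER_LOCK_WAIT_TIMEOUT"];

async function withRetry<T>(work: () => Promise<T>, attempts = 3): Promise<T> {
  for (let i = 1; ; i++) {
    try {
      return await work();
    } catch (err: any) {
      if (i >= attempts || !RETRYABLE.includes(err.code)) throw err;
      // back off briefly, then retry the whole (small) transaction
      await new Promise(resolve => setTimeout(resolve, 100 * i));
    }
  }
}

async function saveRatings(conn: Connection, gameId: number, ratings: Array<[number, number]>) {
  const batchSize = 100;
  for (let start = 0; start < ratings.length; start += batchSize) {
    const batch = ratings.slice(start, start + batchSize);
    await withRetry(async () => {
      await conn.beginTransaction();
      try {
        // bulk insert one small batch of (game, geek, rating) rows
        await conn.query("INSERT INTO ratings (game, geek, rating) VALUES ?", [
          batch.map(([geek, rating]) => [gameId, geek, rating]),
        ]);
        await conn.commit();
      } catch (err) {
        await conn.rollback();
        throw err;
      }
    });
  }
}
```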

POGO Bounces Back!

One of my favourite features on the old site is the Plays of Games Owned Graph. I'd been intending to put it on the new site for a long time, and have finally done it! As far as I can recall it has all the same features as the old one, including the click-through to BGG.

This graph is now on the Owned page. It’s available now on test.drfriendless.com, and tomorrow on extstats.drfriendless.com.

Kaboom!

Hmm, something bad happened yesterday. Database CPU went very high, the database's capacity was exceeded, and it used up all of its burst capacity. That means the database is worn out for a few hours until it recovers. I wonder what went wrong? I suspect one of the Lambdas went crazy and ran too many times, but I don't have a good idea why that would happen. For the moment I have turned off the downloader so it will stop hassling the database.

These graphs show database performance. The top-right one is probably a cause – lots of incoming connections – and the bottom-right one is a consequence. In particular, the blue line diving into the ground is a bad thing.

Looking at the Lambda invocations, it seems about every 35 days there’s a spike, and yesterday’s spike was the biggest ever.

Taking a closer look at the spike, we can see that it was the oranges and the greens wot dunnit. Greens are downloading data about a game from BGG, and orange is storing that game in the database.

I just checked the code, and each game is updated every 839 hours (there’s a boring reason for that). So, that would be what’s causing the problem – every 35 days, I go to update 66000 games, which causes 66000 Lambdas to download data from BGG (sorry Aldie) and then 66000 Lambdas try to update the database. It seems I need some more dithering.
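By dithering I mean adding some randomness to when each game next comes due, so the 66000 updates smear out over a few days instead of all landing in the same hour. A sketch of the idea, with the jitter figure as a placeholder:

```typescript
const UPDATE_INTERVAL_HOURS = 839; // the boring number from the code
const JITTER_HOURS = 72;           // placeholder: up to three days early or late

// When a game is updated, schedule its next update roughly-but-not-exactly
// 839 hours away, so the whole collection stops coming due at the same moment.
function nextUpdateTime(lastUpdated: Date): Date {
  const jitter = (Math.random() - 0.5) * 2 * JITTER_HOURS;
  const hours = UPDATE_INTERVAL_HOURS + jitter;
  return new Date(lastUpdated.getTime() + hours * 60 * 60 * 1000);
}
```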

A Massive Overreaction!

I blogged a couple of weeks ago about my unsatisfying experience with React. I then got to pondering what I might do with React – I liked the navigation bar I built, and would like to continue using React for little bits of the page outside of the main applications, like the login button.

One of my goals with the system design has been to allow other developers to write pages, by which I mean the data presentation components. Angular and React usually call those things Single Page Applications (SPAs), by which they mean you don’t keep loading new HTML pages as you click around and do stuff. What they don’t typically mention is that because SPAs don’t play well with each other they tend to be You Must Write The Whole Page Using This And No Other JavaScript Applications.

If you try to put two Angular applications on the same page, they both try to install something called zone.js, which can only be installed once. So don’t put two Angular apps on the same page.

If you try to put two React apps on the same page, they both include React libraries. And if the two apps use different versions of the React libraries, then they interfere in unpredictable ways.

The way I discovered this was by rewriting the login button in React and putting it on the same page as the navigation bar. Due to quirks of fate, each used a different version of the React library, and it didn't work. I consulted the React guys on Reddit, and they suggested I was doing it wrong and should just write the whole page in React. I didn't want to do that, because what if some other developer wants to write a data presentation component in React? Then they would need to match the React version of the hosting page. I am an extremely stubborn person when it comes to implementing a plan, so that was not going to happen.

I continued thinking about this, and about how the navigation bar didn't really need React at all, and I had the idea of server-side rendering. SSR is when you run the JavaScript to generate HTML before sending the page out, so you send more HTML and less JavaScript. And there's a technology called GatsbyJS which is designed specifically for writing server-rendered sites in React, so I decided to give it a go. (I also tried one called NextJS but I did not like where that was going.)

Previously, the HTML pages, e.g. index.html, rankings.html, were written using a technology called Mustache, which is just a template language. If I wanted the navigation bar I would just put in {{> navbar}}. See how those curly brackets look like mustaches? That’s the joke.
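For anyone who hasn't met Mustache, this is roughly how it worked. The template text and the view object here are made up for the example; the real pages lived in their own files:

```typescript
import * as Mustache from "mustache";

// a partial: the navbar fragment shared by every page
const navbar = '<nav class="navbar">{{siteName}}</nav>';

// a page template that pulls the navbar in with {{> navbar}}
const page = `
<html>
  <body>
    {{> navbar}}
    <h1>{{title}}</h1>
  </body>
</html>`;

// Mustache.render(template, view, partials) produces the final HTML
const html = Mustache.render(
  page,
  { siteName: "Extended Stats", title: "Rankings" },
  { navbar }
);
console.log(html);
```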

So to convert to Gatsby I pretty much had to convert my HTML to React JSX, which is basically JavaScript code which looks like XML. That wasn't so hard. But then Gatsby gets miffed if it doesn't own the whole world, and if you want to refer to things which are outside the Gatsby world you have to use a feature called dangerouslySetInnerHTML. Being the daredevil and mule-headed SOB that I am, I did that, and pretty much got the site being generated from Gatsby.
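In practice that looks something like this. The component name and the embedded HTML below are stand-ins, not my real navbar or login markup:

```tsx
import * as React from "react";

// markup that lives outside the React world, e.g. a mount point that some
// plain JavaScript (or an Angular app) will wire up after the page loads
const outsideWorldHtml = '<div id="login-button">Log in</div>';

// dangerouslySetInnerHTML tells React to drop the raw HTML in as-is
export const OutsideWorld: React.FC = () => (
  <div dangerouslySetInnerHTML={{ __html: outsideWorldHtml }} />
);
```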

There was a hiccup when I generated the Gatsby site – remember the point of server-side rendering is to do the JavaScript on the server, not in the browser – and Gatsby stuffed a whole bunch of its own JavaScript into the page to preload pages it thought the user might go to next. I was pretty annoyed by that – if I want JavaScript in my pages, I’ll put it there! Luckily Gatsby has decent facilities for hacking the result, so I figured out how to tell it to throw away all of that JavaScript I didn’t ask for. I do resent always ending up working on the most advanced topics on my first day with a new technology.
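For the curious, the hacking happens in gatsby-ssr.js, which lets you rewrite the rendered page before it's written out. This is from memory of the Gatsby version I was using, so treat the details as approximate rather than gospel:

```typescript
// gatsby-ssr.js (sketch): drop the script tags and preload links that Gatsby
// adds to each page, leaving just the HTML and CSS it rendered.
const isScriptOrPreload = (el: any) =>
  el.type === "script" ||
  (el.type === "link" && el.props && el.props.rel === "preload");

export const onPreRenderHTML = ({
  getHeadComponents,
  replaceHeadComponents,
  getPostBodyComponents,
  replacePostBodyComponents,
}: any) => {
  replaceHeadComponents(getHeadComponents().filter((el: any) => !isScriptOrPreload(el)));
  replacePostBodyComponents(getPostBodyComponents().filter((el: any) => !isScriptOrPreload(el)));
};
```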

And that was when my IDE (my smart code editor) stopped coping. I had most of Extended Stats in one GitHub repository. So I would open the project in the IDE (I use WebStorm for this) and it would try to find all my code and figure out what bits referred to what other bits, which is very handy when you want to know whether something is used or not. However, with 3 separate CloudFormation stacks for Lambdas, Gatsby, and a dozen Angular applications, it would get confused sorting all that out. As far as I could tell it would take an hour to reindex the code, and during that time it wouldn't allow me to paste more code in. That was unacceptable, so I decided to move the client module out into another repository and project.

Great, except that something had sabotaged the Gatsby project so that GitHub ignored it. GitHub is the cloud site where I store my code, and if my code's not there it's only a hard drive crash away from ceasing to exist altogether. So I had to convince GitHub to store the Gatsby code. I never did figure out what was going on, but I copied the code to a different place and pretended it was new, and that worked.

And then after that worked, I could get the login button working, and then I could build a replacement site using Gatsby. The login button is tricky, because it does require JavaScript to work, and I can't write that JavaScript in Angular or React; it has to be what they call VanillaJS. But that's OK, I had a few versions of that code lying around already, so I just copied it into React's dangerouslySetInnerHTML drama. And now it all works, mostly!

I deployed that version this morning, and it just made the daily sync with the CDN, so the Gatsby version of the site went live a short time ago. I just noticed that the user page is calling itself the Selector Test page, which I will have to fix. But overall I’m happy with this experiment. I still have to delete a few things that have become obsolete, like older versions of the login button and the nav bar, and I will fix up the CSS so the pages don’t look so cluttered and jumbled, but I feel this solution is better than the Mustache one. And I guess I can now really claim to have some experience with React.

This Ain’t the Panama Canal

“A Man, a Plan, a Canal – Panama”

The lovely palindrome suggests that the building of the Panama Canal was planned and executed in a rather neat and reversible way. So when I consider my development process, which is kinda haphazard, I feel a bit inadequate and envious. Fortunately, a quick glance at the Wikipedia entry on the building of the Panama Canal suggests that its 33-year construction was not without drama, difficulty or incompetence. Ferdinand de Lesseps, the man with the plan, was sentenced to jail for his role, but never actually served the sentence.

So that makes me feel better. The first Extended Stats site started as a few Python scripts. I spent years running them nightly and uploading them to free hosting sites, which were universally shitty, resulting in my decision to host the site at home. Maybe only a year after that I rewrote the scripts as a proper web site. Between the shitty free hosting and the world’s mistrust of dynamic DNS, the site name kept changing. As web users continue to expect more (e.g. https) and I become more professional in my approach, it continues to evolve.

The same thing is happening with the new site. I spent a good deal of 2015 and 2016 thinking "I should make a new site", then a good deal of 2017 thinking "hmm, what would be the best way?" And then I spent a few more months trying to get those ideas to work. It was only in June 2018 that the technology started to work for me. About then I realised that Node.js, which I had chosen to write the server side in, was not one of my talents. So I wrote some fairly bad Node code for a while, and maybe I am still doing so, though it's been a few months since I last found a way I like better.

Similar things happened with the web pages. Although I am competent with Angular, when I started I didn't know what I wanted on the pages, and I still don't, really. However they are slowly converging. Whenever I write a new page there are some bits I like and some bits that are horrible. When I'm done, I take the bits that I like and put them into a library on NPM, which makes it easier to re-use those ideas on later pages, or on revisions of existing pages.

So this morning I sat down, determined to put the Plays Of Games Owned table into the new site. The first task was to find the right place for it, which would be an existing page that already has the correct data. I looked at the Owned Games page, and noticed that it was the first page where I had implemented the pattern of the page loading a set of data and multiple components rendering their views on it. However, some code from that page had been put into a library, so I fixed up the page to use the library, and stuck in a loader component while I was there (the coloured bobbles that indicate that the data hasn't arrived yet).
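For anyone curious what that pattern looks like, here's a sketch in Angular. All of the component and field names are illustrative; the real Extended Stats components are more involved:

```typescript
import { Component, OnInit } from "@angular/core";

interface OwnedGame {
  name: string;
  plays: number;
}

// The page loads the data once; each child component renders its own view of it.
// The loader, table and chart components are assumed to be defined elsewhere.
@Component({
  selector: "owned-games-page",
  template: `
    <loading-bobbles *ngIf="!games"></loading-bobbles>
    <ng-container *ngIf="games">
      <owned-games-table [games]="games"></owned-games-table>
      <pogo-chart [games]="games"></pogo-chart>
    </ng-container>
  `,
})
export class OwnedGamesPageComponent implements OnInit {
  games?: OwnedGame[];

  async ngOnInit() {
    // one fetch for the whole page; every component works off the same data
    this.games = await fetch("/api/ownedGames").then(r => r.json());
  }
}
```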

However the Owned Games page doesn't have plays data, so it's not the right page for that table. Next I considered the Favourites table, which does include plays. The selector isn't quite right though, as the Favourites table defaults to games played and rated, and has nothing to do with owning them.

The Favourites page was one of the first pages to use Vega charts, so it had a Charts button that you could click on to get to the charts. That preceded the way I do it now, where each chart appears on the page as its own thing and uses the data the page provides. So I moved those charts out onto the main Favourites page, and wired them up to receive the data. Then I noticed that the Favourites page was the one where you can change the selector, but the new selector was injected into the table rather than into the page. So I went back to the library and changed the way the config component worked, so now you can change the selector on the Favourites page and the table and all of the charts update. It's still clunky, but it works. Oh, and the Favourites page had one of the old-style tables without the handy header bar, so I fixed that up as well.
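The wiring change boils down to the config component telling the page about the new selector, and the page reloading the data that everything else renders. A rough Angular sketch, with hypothetical component names and a made-up API endpoint:

```typescript
import { Component, EventEmitter, Output } from "@angular/core";

// the config component just announces the new selector
@Component({
  selector: "selector-config",
  template: `
    <input #sel placeholder="e.g. rated AND played" />
    <button (click)="changed.emit(sel.value)">Apply</button>
  `,
})
export class SelectorConfigComponent {
  @Output() changed = new EventEmitter<string>();
}

// the page reloads the data, and the table and every chart pick it up
@Component({
  selector: "favourites-page",
  template: `
    <selector-config (changed)="reload($event)"></selector-config>
    <favourites-table [games]="games"></favourites-table>
    <plays-chart [games]="games"></plays-chart>
  `,
})
export class FavouritesPageComponent {
  games: unknown[] = [];

  async reload(selector: string) {
    this.games = await fetch(`/api/favourites?selector=${encodeURIComponent(selector)}`).then(r => r.json());
  }
}
```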

And that’s why it’s now late afternoon and I have not yet achieved the first thing I set out to do today. The site has evolved a little, and I look forward to wiring the selector changer into other pages, and hopefully making it nicer when I do so. I’ll see if I get to try again tomorrow.

“A man, no plan, a clusterfuck, a clean up.”