Happy Holiday!

So, this is Christmas. I don’t know what Christmas is like for you, but it’s quiet for me these days. My family lives interstate, my wife is not a Christian, and my idea of a good time is fixing bugs. So we spent Christmas at home and did very little.

However, I did get a bit of a chance to work on the site. I managed to:

  • remove deleted BGG users
  • discover that one of the Lambdas I’ve been invoking apparently never existed (whut? maybe I got distracted before I finished it?)
  • write a new email response for new users
  • send email responses to 20 or 30 users
  • add a “number of plays” column to the updates page, so you can compare the count to BGG easily
  • fix the SSL certificate on the old server
  • install a new kernel on the old server
  • write a blog post so you know I’m not dead!

Tomorrow is also a holiday, although we will be having people over. Maybe I will be able to ignore them and finish the new feature that I’ve had on my smaller laptop for about the last 3 months. I wish that every day could be Christmas Day, where people do their own things and don’t hassle me… happy holidays, team!

The State of the Site

I just received an email from an annoyed user of the site, saying that they had not been able to access the old site for ages. I thought that was a bit harsh as I am now running 3 versions of the site, but then I realised that it was my fault: after repairing the physical machine that the old site runs on, I had forgotten to point the hostname back at it. So it turns out that I’m forgetting how it’s all configured, and it’s entirely reasonable that other people will be confused as well. So let’s go through it all from the beginning.

There’s the PC in my study. It’s running stats.drfriendless.com (since I fixed it a minute ago). I suspect it is doing quite a poor job of it, but since I have to get past a bunch of people to get to it to find out, I won’t know for sure – my wife and my niece work in that room. The HTTPS on that system is a pain, and I have to go in there to fix it again soon.

There’s a virtual machine which duplicates that PC, and it’s called stats3.drfriendless.com. It’s more reliable than stats.drfriendless.com as it is not subject to physical problems and does not run over my home WiFi. So I think in the future the VM will become stats.drfriendless.com – probably in summer, when the PC overheats, that will become a necessity. The downside is that it’s a bit expensive to run that system – maybe $50 / month – but I consider that a tax I have to pay because I have not got the new site doing as much as the old site yet. (Hmm… and I just discovered that some pages that load on stats.drfriendless.com don’t work on stats3.drfriendless.com. *sigh*)

And then there’s the (mostly) serverless system, extstats.drfriendless.com. I can’t see that breaking due to anything except funding problems. Serverless architecture is just so robust – if that site goes down, probably half the internet is down with it. However that site still needs a lot of work to be done.

The other thing to consider is the state of me. I haven’t been doing so much work on the site for a couple of months, and it’s hard to say why. I am spending a lot of time taking my dog to the vet – this is the same dog as I’ve had since the site started, so she is very old now. So even if I wake up energetic and enthusiastic on a Saturday morning to get some work done on the site, at about 9am I have to go for a very very slow walk to the park and back, and usually then drive her to the vet for an arthritis injection. It breaks my concentration, occupies my time, and makes me sad. She has been such a wonderful dog for such a long time, and now she’s almost an invalid that we have to take care of. So when she finally shuffles off this mortal coil, my life will change, and I hope I will take more time to do my own things.

And then there’s work. Goodness me, I am a devoted and conscientious and competent worker! But that means that my head is full of work stuff, and not so full of stats stuff, and it’s a bit hard to motivate myself to think about a whole different set of problems just because the name of the day changed. But the problems also interbreed – it’s quite common that my boss will forward me some email from AWS telling us to deal with some problem, and I have got the same one at home. So I experiment at home to make sure I know how to solve it!

When I’m feeling slack I always console myself with the thought that the old site was built over about a 5 year period, and we went through years of problems while I sorted out hosting. We got through that, and I’m sure the new site will get through its current problems as well.

One Step Back, Two Steps Forward

I’m so far behind that I have to blog about what happened last weekend. My wife was away so I had a bit of time to myself which I planned to use doing some work on the site, instead of acting like a human being and having a life.

The first plan was to fix the Login button on the front page. What happens is that I display both a Login and a Logout button, then I make a call to the server to find out whether you are logged in or not, and then I remove the irrelevant button. That was working, but it was very slow due to Lambda startup time, so the two buttons both displayed for a disturbingly long time. So I wanted to move the API for whether you’re logged in or not into the “Elastic Beanstalk” server. I put that in quotes because although I still call it that, Elastic Beanstalk is not involved at all. Elastic Beanstalk is a service which would start another one of those servers if the first got overwhelmed. I had that working until I discovered that Elastic Beanstalk costs extra money, and as it was unnecessary I took it out.

Anyway, I moved that service as planned, and it didn’t work. It turns out that API Gateway – the bit where I say which server or Lambda each URL goes to – has very firm, poorly documented opinions about HTTP headers transiting through itself. And since the cookie which says whether you are logged in goes in a header, it didn’t want to send it. In fact I still don’t have that working. And as much as I understand how HTTP and CORS and API Gateway work, I’m not sure how it is ever going to work, although it seems very much like the sort of thing that should. So after many many hours, that problem got put on the backburner.

The check for whether you’re logged in or not is a bit quicker, but it always says you’re not logged in, because the cookie is never sent to the server. I’ll figure it out.
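One common reason a cookie goes missing on a cross-origin call is the CORS rule for credentials. Browsers only include cookies on cross-origin requests when the client code opts in (for example with `credentials: "include"` in fetch), and they only expose the response when the server names the exact origin rather than `*` and sets `Access-Control-Allow-Credentials: true`. Whether that is the actual problem here is unknown, and API Gateway has its own configuration on top of this. The sketch below only illustrates the server side of the rule; the function name, the allow-list and the `https` scheme are assumptions.

```python
from typing import Dict, Optional

# Hypothetical allow-list; the scheme and the exact set of origins are assumed.
ALLOWED_ORIGINS = {"https://extstats.drfriendless.com"}


def cors_headers(request_origin: Optional[str]) -> Dict[str, str]:
    """Build CORS response headers that permit cookies for known origins."""
    if request_origin not in ALLOWED_ORIGINS:
        # Unknown or missing origin: send no CORS headers at all,
        # so the browser refuses to hand the response to the page.
        return {}
    return {
        # Must be the exact origin; "*" is rejected when credentials are used.
        "Access-Control-Allow-Origin": request_origin,
        "Access-Control-Allow-Credentials": "true",
        # Tell caches that the response varies with the Origin header.
        "Vary": "Origin",
    }
```

A handler would merge these headers into every response, including the reply to the preflight OPTIONS request.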

The other thing I worked on last weekend was also supposed to be easy. It was the Plays of Games Owned by Published Year graph, which looks like this:

This graph goes on the Owned page, which I hadn’t touched in a while, so I started by updating the libraries I use to make it. Well, that was a mistake. For four hours I juggled library versions, checked commit logs, and researched why the heck I was getting the wrong version of TypeScript. OK, the wrong version of TypeScript was totally my fault, but the rest of it was not and it was annoying.

But then I got the page running again, and I could add the new graph. That just involved confusion about how Vega allocates colours to data points, and a few things being upside down and the wrong size and so on, as happens in this sort of work. And then it was finished and I sent it out to the site and dragged my sorry backside to bed. It really does get a bit wearing, doing things that don’t work, all the time.

But of course this morning I was optimistic and naive again, and at 7am I started adding another feature to the Owned page. This time I didn’t need to upgrade any of the libraries, but I did discover that some of the data I wanted to display, in particular whether you had a game for trade, was not available in the data set I was using, so I had to fix that. And then I decided that rather than improving the REST data set, I should use the GraphQL data set and improve that. So I ended up creating a new version of extstats-core, a new version of extstats-angular, a new release of the API, and converted the whole page over to GraphQL. And then, only 6 hours after I started (some of which were spent on domestic duties), I got the “Why Do You Even Own This?” table working on the new site.

On the whole though, today’s mucking around was much more productive than last weekend’s. GraphQL is kinda lovely, and much easier to work with than REST, and the changes I made today should be useful for all of the other pages as well. I think about 4 of them out of 9 or 10 use GraphQL so far. However I think if I do some more work this weekend I’ll probably just try to get some features out, rather than mucking with stuff.

Lift and Shift

As mentioned in the previous post, the old Extended Stats server blew a foofoo valve and needs parts. The fan I ordered on eBay arrived and when I went to install it I discovered that I’d ordered the wrong size. This is one reason I hate hardware. Although I love the details of software, the details of hardware bore me senseless. Why can’t they just make it out of Lego so I can hack something together rather than having to figure out exactly what thing I want?

Anyway, I ordered another fan and it hasn’t arrived yet. In the meantime the idea of doing a lift and shift continued to grow in my mind. A “lift and shift” in cloud computing is when you take an existing service and dump it in the cloud without adapting it to cloud architecture at all. It’s kind of the worst, most expensive, way to get into the cloud, but it’s also the quickest.

Another of the things I use the site for (other than being there for you guys) is experimenting with stuff. There’s a type of database that I’d like to try at work, which would be a huge risk for the company, but I can muck with Extended Stats for a few hours without anyone losing too much money. I mentioned the idea of experimenting on the stats site to my boss, and I suspect I might be able to get him to throw me some funding… so yeah, that’s gonna happen.

So as you might guess I did the lift and shift. It took a few hours, spread over two weekends, as, as usual, everything in Django seems to have changed between releases. And then I had an annoying problem where the web server couldn’t find the code, even though everything was configured perfectly, and that took me a good 3 hours to figure out – my home directory had permissions which prevented the web server from seeing the code! Late last night I got that sorted, and the new site started to work.
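That kind of permissions problem can be checked mechanically. The sketch below (a hypothetical helper, not something from the site's code) walks every directory above a file and reports the ones that lack the "other" execute bit, which is what a web server running as a different user, outside the owning group, needs in order to traverse them. It assumes a Unix-style filesystem and ignores ACLs and group membership.

```python
import stat
from pathlib import Path
from typing import List


def blocked_ancestors(path: str) -> List[str]:
    """Return directories above `path` that 'other' users cannot traverse."""
    target = Path(path).resolve()
    blocked = []
    # Path.parents runs from the nearest parent up to the root,
    # so reverse it to report problems from the root downwards.
    for directory in reversed(target.parents):
        mode = directory.stat().st_mode
        # Without the other-execute bit, a process running as an
        # unrelated user cannot enter the directory at all.
        if not mode & stat.S_IXOTH:
            blocked.append(str(directory))
    return blocked
```

Called on the path of the code the web server is supposed to load, a home directory with mode 700 would show up in the returned list.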

Then this morning I spent another hour or so getting the downloader to work. And then, in a small brain explosion, I told git (the version control system) to undo all of my changes… so then I had to go and fix all the things again, which was not at all what I had planned.

But now it seems that http://stats3.drfriendless.com is working OK. You might notice that there’s an S missing from HTTP. That’s another problem for me to sort out! I’ve come to realise that I don’t particularly enjoy solving AWS configuration problems, so I’m not really looking forward to that one. I could do it with a load balancer (which is expensive) or with a CloudFront CDN (which is annoying to configure but I suspect I’m getting OK at it by now). Or I could try to do SSL termination in Apache – actually, nah, that’s the sort of thing that will just make more work for me later.

Well, that’s how I spend my weekends! I’ve been hearing cries from the crowd asking for utilisation numbers on the new site, and I have no idea why I haven’t done them yet. So I might try to do some programming and get those happening. I like programming, it’s better than configuring web sites.

Login is Useful!

… for small values of useful.

I’ve been working on login again recently. As I say every time, login is hard to work on because it doesn’t work on the test system and it doesn’t work on the development system, due to the authentication service saying “no, you can only login to extstats.drfriendless.com from extstats.drfriendless.com” which is eminently sensible but a bit frustrating. So I have to make the change, send it live, wait up to 24 hours for the CDN to put it on the site, and then I can test my code.

Now when I log in, I get this (we are looking at the three buttons on the right):

The presentation still needs some work – that does not come naturally to me! The orange button is a link to the user page, and the yellow button is a link to the geek page for Friendless.

The user page is only available to you if you’re logged in. It looks like this:

The Buddy Groups aren’t used yet, not quite. What is a bit useful is the list of BGG user names above them. In that area, I can enter the list of BGG users that I am, or just ones that I like to stalk. And then when I’m logged in, those names turn into yellow links. And those will take you directly to the geek page for those users.

I’m not yet happy with the layout of the buttons, but as with all things on the site I’ll figure it out eventually.

Now that I’ve had this first success with user data, I hope to plumb it in in more places in the site, so that logging in becomes a useful thing to do.

That’s the Way the Money Goes!

I’ve just figured out a new way to work the AWS CloudWatch graphs, so rather than just graphing the number of Lambda invocations, I can graph the total duration per Lambda. For Lambda, I pay for each invocation, but if an invocation goes long it counts for multiple. So I’m sort of paying for duration as well.
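The idea behind the graph is just a sum of durations grouped by function, which is roughly what a Sum statistic over a per-function duration metric gives. As a minimal sketch, assuming the invocation records are already available as (function name, duration in milliseconds) pairs, which is an assumption about the data shape and not how CloudWatch is actually queried:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def total_duration_ms(invocations: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sum invocation durations (in milliseconds) per function name."""
    totals: Dict[str, float] = defaultdict(float)
    for function_name, duration_ms in invocations:
        totals[function_name] += duration_ms
    return dict(totals)


def rank_by_duration(totals: Dict[str, float]) -> List[Tuple[str, float]]:
    """Order functions from most to least total time, the likely cost drivers first."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

A function that is invoked rarely but runs long can outrank one that is invoked constantly but finishes quickly, which is exactly what an invocation count hides.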

This graph is for the last 2 weeks, and shows that inside-dev-processPlaysResult is taking by far the most time. That’s the one that takes plays scraped from BGG and writes them to the database. I’ll take a look at that code. It is a bit on the complex side, as it’s the bit that infers plays of base games from plays of expansions, but I can usually find something to optimise.
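To give a feel for what "inferring plays of base games from plays of expansions" involves, here is a deliberately simplified sketch. It is not the code in inside-dev-processPlaysResult; the function name, the data shapes and the game names are made up, and it ignores the harder cases, such as a base game and its expansion both being logged for the same session.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def infer_base_plays(
    plays: Iterable[Tuple[str, int]],
    expansion_to_base: Dict[str, str],
) -> Counter:
    """Count plays per game, crediting each expansion play to its base game too."""
    totals: Counter = Counter()
    for game, quantity in plays:
        totals[game] += quantity
        base = expansion_to_base.get(game)
        if base is not None:
            # A play of an expansion implies a play of its base game.
            totals[base] += quantity
    return totals


# Hypothetical data: two plays of an expansion, one direct play of the base game.
plays = [("Expansion A", 2), ("Base Game", 1), ("Other Game", 3)]
inferred = infer_base_plays(plays, {"Expansion A": "Base Game"})
```

Here `inferred` credits "Base Game" with three plays: one logged directly and two inferred from the expansion.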

Looking at the same graph for the past 4 weeks, all we can see is the Kaboom! Everything else literally pales into insignificance compared to that bug. Cool!

Sorry I am having too much fun graphing AWS performance to graph board game stuff :-).

Cleanin’ Out My Closet

Nah, I’m not going to go all Eminem and aggressive and stuff. I’ve literally been cleaning stuff up today. It was a great, productive day. There were a couple of users that I added over a week ago, and before advising them that their pages were ready, I decided to check whether they were, and they weren’t. This is the sort of bug that cannot be tolerated. With 3034 users, stuff’s got to work without me watching it.

So I hunted down what the problem was, and discovered that I’d modified some SQL in a buggy way a couple of weeks ago, and then swallowed the error so that I never noticed. So I fixed the SQL and the users started being created.

But they still weren’t coming through properly, so I investigated further. There were half a dozen or so users who had deleted themselves from BGG, so I was unable to process them. Yet I kept trying to, every minute. So I deleted them. And then there was one user whose BGG collection is so big that BGG just tells me it’s too big. I’m not sure what to do about that.

You will notice in the graph below of Lambda invocations that there was a solid orange band at the bottom. That was just doing broken things over and over. Oh, and by the way, I pay for the height of this graph – Lambda invocations cost some tiny amount of money. The right hand end of the graph shows how much the orange band decreased after fixing that stuff up. It will cost me a bit this month (like, a dollar), but next month it should be better.

These sorts of problems can’t be allowed to persist. So I wrote some code to send errors to the database. When errors happen in the hundreds of thousands of lambda invocations per month, I don’t necessarily notice them. If I write them to the database I can at least find them. With any luck I will find the next similar problem faster.
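A minimal sketch of the idea, assuming a SQL database and using SQLite only so the example is self-contained (the site's real database, table name and columns are not described here, so `errors` and its fields are hypothetical). The point is to record the failure and then re-raise it, so the error is no longer swallowed silently:

```python
import functools
import sqlite3
import traceback
from datetime import datetime, timezone


def make_error_logger(conn: sqlite3.Connection):
    """Return a decorator that records exceptions in an `errors` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS errors ("
        "happened_at TEXT, source TEXT, message TEXT, detail TEXT)"
    )

    def log_errors(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                conn.execute(
                    "INSERT INTO errors VALUES (?, ?, ?, ?)",
                    (
                        datetime.now(timezone.utc).isoformat(),
                        func.__name__,
                        str(exc),
                        traceback.format_exc(),
                    ),
                )
                conn.commit()
                raise  # Re-raise so the failure is still visible to the caller.

        return wrapper

    return log_errors
```

Wrapping each handler with the returned decorator means a query such as `SELECT source, COUNT(*) FROM errors GROUP BY source` shows which function is failing over and over.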

So then after cleaning that up, the new users started working properly, and I had a clear conscience. I then started emailing people who have been waiting to be added to the site to tell them that there was a new site and they were added. I emailed 280 people, some of whom had been waiting for 18 months. I hope they still play board games.

Anyway, whether or not they still remember who I am, it was nice to get 280 messages out of my inbox, and to have that weight of guilt lifted from my shoulders after such a long time. On the other hand, I’ve increased my potential active users by 280, and that might reveal some other problems. I don’t expect it will be too much, as the architecture I’ve chosen is nothing if not scalable, but you never know. The database is a non-scalable weak link, but I think the impact of users is trivial compared to the impact of the downloader.

And then, because I’m hyperactive or something (not to mention that the weather outside was a bit yukky, so I wasn’t tempted to do anything else), I updated my spreadsheet of ongoing costs. It was 3 months behind.

May 2019 shows a jump in Lambda costs, due to the Kaboom I blogged about previously. It was only $6, but it was an architectural problem that was going to stick around and cost more each month until I dealt with it. Hence why things like that get my attention sooner than actual useful features, and get blogged about.

The kaboom happens about every 35 days, which puts the next one in the first few days of July. Due to the dithering I put in, and fixing the huge database index bug, I don’t expect a big kaboom, just more of a tremor. And due to the continued effect of the dithering, that will become less every 35 days.
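The mechanics of dithering can be sketched in a few lines. This is an illustration under assumptions, not the site's scheduling code: the function name, the uniform distribution and the two-day jitter are all made up, and only the 35-day period comes from the text above.

```python
import random
from datetime import datetime, timedelta


def next_update(
    last_update: datetime,
    period_days: float = 35.0,
    jitter_days: float = 2.0,  # assumed value, purely illustrative
    rng=random,
) -> datetime:
    """Schedule the next update one period later, nudged by a random offset."""
    offset = rng.uniform(-jitter_days, jitter_days)
    return last_update + timedelta(days=period_days + offset)
```

If every item starts out scheduled for the same instant, one round of this spreads them over a window of a few days. Each later round adds another independent offset, so the items keep drifting apart, which is why the tremor should shrink with every cycle.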

The next thing I’m hoping to work on is the update schedule page. It’s not a headline feature, more of a necessary evil. Also I logged a play from October last year, today, so I need it myself. And of course so does everyone from time to time.

Work also continues on the login stuff that I mentioned in the blog post about sticking the cookie. Now that the cookie is working, I need all the bits of code to use it properly, or they won’t be able to access user data. And then I want to write more code which reads and writes user-specific data so I can realise some benefits from all of that mucking around.

Auth0 tells me I have 138 users with accounts, which I think is pretty wonderful since having an account is of little use. But it’s supposed to be a feature, so let me make it that way!

Fiddle Faddle!

I’ve been quiet for a couple of weeks, but I’ve been persistently working on the Plays by Month page. I recently added a couple of tables to the Plays page, that should have been on Plays By Month, so I moved them across. Of course it wasn’t quite a perfect match, and it turned out to be a lot more fiddly than I anticipated. There are so many numbers!

For example, on that page the “plays for a month” could mean the plays in the month, the cumulative plays forever until that month, or the cumulative plays from the start of the year until that month. There’s meant to be synergy between the tables, but it turns out there’s just complexity and confusion. Anyway, it’s done now, and I can get onto some more interesting problems.
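The three meanings are easy to mix up, so here is a small sketch that computes all of them side by side. The input shape, a list of (year, month, quantity) tuples, and the key names are assumptions for illustration; months with no plays are simply omitted from the result.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Month = Tuple[int, int]  # (year, month)


def monthly_counts(plays: Iterable[Tuple[int, int, int]]) -> Dict[Month, Dict[str, int]]:
    """For each month with plays, return the three different 'plays for a month'."""
    in_month: Counter = Counter()
    for year, month, quantity in plays:
        in_month[(year, month)] += quantity

    result: Dict[Month, Dict[str, int]] = {}
    all_time = 0
    year_to_date = 0
    current_year = None
    for year, month in sorted(in_month):
        if year != current_year:
            # A new year resets the year-to-date total but not the all-time one.
            current_year = year
            year_to_date = 0
        count = in_month[(year, month)]
        year_to_date += count
        all_time += count
        result[(year, month)] = {
            "in_month": count,
            "year_to_date": year_to_date,
            "all_time": all_time,
        }
    return result
```

Keeping the three numbers in one record per month makes it harder for two tables to disagree about which one they are showing.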

Those tables from the Plays page will eventually be removed and replaced with other things that use the data that the page has.

Scheming

I decided I needed a loading widget for the pages where a great deal of data needs to be loaded (i.e. all the useful ones). So after copying some stuff and fiddling with it, I came up with this:

I kinda like it. I like the colours. If I knew anything about graphic design I’d know how to proceed from that to make a colour scheme for the site.

To the uninitiated, developing a web site like this is just all beer and coding and living the high life from the Patreon proceeds. I have other responsibilities though! I’m fully conscious that the site looks a bit amateur, but that’s because I am indeed an amateur in many aspects of web development. I’ve been reading books about colour, and design – in the aesthetic sense, not in a how-to-do-it-in-HTML sense. So I’m hoping that over time some sense of style will infect my brain, and I’ll apply it to the site.

Much Frustration

Sometimes it would be more fun to bang my head against a brick wall rather than be a programmer. However programming pays better and I don’t have much experience in headbanging. This is no time for a career change.

I managed to make some progress on the User page. That’s the one you get to after you log in and click on your user name. It seems that it can now store your BGG name and your buddy lists, and then display them again afterwards. It has taken WAY TOO LONG to get that working.

I then continued work on the Plays page – the one that uses the data about all the games you’ve played. I’m working on a chart to compare how many new games you’ve played, and when, with the same numbers for your buddies from a buddy list. I’ve had the data gathering working for a few weeks, and I have just finished the chart. Well, sort of.

The charting tool, Vega, is the most beautiful, argumentative, contrary piece of shit software I’ve ever used. After spending a couple of hours getting the chart to work in the debugger, I had to address why it didn’t work in the page. My first idea was to upgrade some versions of some things.

That was a complete disaster. Vega has internal conflicts, so now my code can’t compile and won’t run. Using the latest versions of everything, it doesn’t work. Using the same versions of everything from this morning, it doesn’t work. I have turned it off and on again so many times… so now I’m bored and I’m going to read a book instead. This happens every time with Vega; I seem to be missing something.