July 2019 – EXTENDED STATS BLOG

July 27, 2019

Terraform

I can imagine the despair out there in gamer land, when you realise that this is another post about technology and not about the site actually doing something. Yeah… sorry about that. My nerdiness is definitely trending towards tech at the moment.

Anyway, Terraform has nothing to do with Mars. It’s software for describing the infrastructure you want set up in Amazon Web Services. For example, for Extended Stats, it might include stuff like this:

there’s a database
there’s a Lambda which runs every 10 minutes, which looks at a file on pastebin to see who the users are
there’s a Lambda which looks at BGG to get data about users
there’s a host called extstats.drfriendless.com
there’s an API Gateway which connects the host extstats.drfriendless.com to some Lambdas.
and so on… there would be a lot of it.

At the moment I use a technology called Serverless to do that sort of thing, but I am not very good at it. On the other hand, after a couple of days of Terraform, I’m really getting into it. It seems better designed and easier to use.

Sadly I suspect it would be a lot of work to change from Serverless to Terraform for Extended Stats, though I will have to one day. Extended Stats is a poor combination of Serverless and ad-hoc undocumented changes made in AWS. This is fine for Extended Stats as it is, but it would be difficult for someone to reproduce the site.

In particular, the tricky bit is specifying the security permissions, which allow different bits of the project to use / modify / read other bits. For this Terraform stuff I’m doing, I’m very carefully doing even that in Terraform. For Serverless, I have not been so careful (because it was too annoyingly difficult!).

Anyway, it’s cool. One day I will get to use it in anger. Tomorrow though, I suspect I will have to write some features for you.

July 12, 2019

GraphQL

So there’s this technology called GraphQL (graph query language). It’s a way of getting exactly the data you need from a database using HTTP. That’s a vast oversimplification, but I am not writing an academic paper here.

When I started the site I couldn’t imagine a need for it, because for each page I know exactly what data I need, so I figured I’d just write API methods that retrieved that data. (That’s what’s happening when you see those 8 pulsing blobs.)

But then I wrote the Comparative Plays page, and that thing requires a boatload of data. At first, it required all of the plays of all of the geeks, and all of the data about all of the games that they’d played. And then I tried to open the page with geeks Friendless, jmdsplotter, and Nap16, and AWS Lambda said “BZZZT! That’s more data than I can return!” That was when I realised I would need to think harder.

So I did an experiment where I fetched the data for Friendless, ferrao and Simonocles. That was 1,462,103 bytes, which is a LOT of data for just one graph. So I mucked around and rewrote the code to only send the first play of each game, and that was 632,447 bytes. And then rather than send year, month and date for a play separately, I just sent YMD, e.g. 20190712. That got it down to 587,717 bytes. And then I realised that I was doing a lot of futzing and there was still a lot of data coming that I didn’t really need, for example the minimum player count for games, and so on. So I decided to think about it very hard for a while.
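To make those two trimming steps concrete, here is a minimal sketch in TypeScript. It is not the site's actual code: the `Play` shape and the function names `toYmd`, `fromYmd` and `firstPlays` are made up for illustration.

```typescript
// Hypothetical shape of a play record; the field names are assumptions.
interface Play {
  game: number;  // BGG game id
  year: number;
  month: number; // 1-12
  date: number;  // day of the month, 1-31
}

// Pack year, month and day into one integer, e.g. 2019-07-12 becomes 20190712.
function toYmd(year: number, month: number, date: number): number {
  return year * 10000 + month * 100 + date;
}

// Unpack a YMD integer back into its three parts.
function fromYmd(ymd: number): { year: number; month: number; date: number } {
  return {
    year: Math.floor(ymd / 10000),
    month: Math.floor(ymd / 100) % 100,
    date: ymd % 100,
  };
}

// Keep only the earliest play of each game, and send one YMD number
// instead of three separate date fields.
function firstPlays(plays: Play[]): { game: number; ymd: number }[] {
  const earliest = new Map<number, number>();
  for (const p of plays) {
    const ymd = toYmd(p.year, p.month, p.date);
    const seen = earliest.get(p.game);
    if (seen === undefined || ymd < seen) {
      earliest.set(p.game, ymd);
    }
  }
  const result: { game: number; ymd: number }[] = [];
  earliest.forEach((ymd, game) => result.push({ game, ymd }));
  return result;
}
```

Because YMD integers sort in date order, comparing them with `<` is enough to find the earliest play.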

I then slowly realised that selecting the values you want was one of the things GraphQL offered. So I did some more reading and I did not get how it was supposed to work, but I did find a very nice tutorial written by a bearded gentleman about how to use GraphQL on AWS Lambda. So I sat down one evening earlier this week and copied that code into my API, and after an hour or so got the basic example working. And that was when it clicked what it was doing, so I then proceeded to implement a query for plays following the same pattern. I got it working that evening, in a shambolic kind of fashion. The next evening I came back and cleaned it up and was able to rerun the query for Friendless, ferrao and Simonocles, and it came back with 214,470 bytes. Woohoo! And in transit, that gets gzipped! So I was pretty content with that (because remember, I pay for bytes which come out of the server).
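For anyone wondering what "selecting the values you want" buys you, here is a rough sketch. It does not use a GraphQL library at all. It just mimics the effect a selection has on the response, with a made-up `Game` shape, sample values, and a hypothetical helper called `selectFields`.

```typescript
// Hypothetical full record the API might hold for a game; field names are assumptions.
interface Game {
  bggid: number;
  name: string;
  minPlayers: number;
  maxPlayers: number;
  playTime: number;
}

// A GraphQL query asking for only two fields might look something like
//   { games(geeks: ["Friendless"]) { bggid name } }
// where the schema (games, geeks, bggid, name) is invented for this example.

// Copy only the requested fields from each row, which is roughly
// what a GraphQL selection does to the response.
function selectFields<T, K extends keyof T>(rows: T[], fields: K[]): Pick<T, K>[] {
  return rows.map((row) => {
    const out = {} as Pick<T, K>;
    for (const f of fields) {
      out[f] = row[f];
    }
    return out;
  });
}

// Sample data, for illustration only.
const games: Game[] = [
  { bggid: 13, name: "Catan", minPlayers: 3, maxPlayers: 4, playTime: 120 },
  { bggid: 822, name: "Carcassonne", minPlayers: 2, maxPlayers: 5, playTime: 45 },
];

const slim = selectFields(games, ["bggid", "name"]);
const fullBytes = JSON.stringify(games).length;
const slimBytes = JSON.stringify(slim).length;
console.log(`full: ${fullBytes} bytes, selected: ${slimBytes} bytes`);
```

The point is that the client names the fields, so the server never has to guess which ones are worth sending.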

And then the next day I updated the Comparative Plays page to use GraphQL, which wasn’t as easy as it sounded – AWS blatantly lied about the error it was giving me – but in the end it was all good.

Well, there was one moment of neurotic anguish. It turns out that for a game, in that graph, all I need is its number and name. (In fact all I need is the name, OMG OMG OMG, no I will worry about that later.) However I have a nice set of definitions of what a Play is, what a Game is, what a GeekGame is (it’s the relationship between a geek and a game, e.g. the rating), and now with GraphQL I’m saying I’m going to call it a game, but it’s really just a number and a name. And TypeScript is not overly pleased with that, and pedantic programmers aren’t either. So I will have to think about my neuroses.
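One possible way to calm TypeScript down, and this is only a sketch and not what the site does, is to give the cut-down thing its own name using `Pick`. The `Game` fields and the `GameSummary` and `label` names here are assumptions for illustration.

```typescript
// Hypothetical full definition of a game; the field names are assumptions.
interface Game {
  bggid: number;
  name: string;
  minPlayers: number;
  maxPlayers: number;
}

// A narrower type for what the graph actually receives: just the number and the name.
type GameSummary = Pick<Game, "bggid" | "name">;

// Code that only needs those two fields accepts the summary,
// and a full Game still fits because it has both fields too.
function label(game: GameSummary): string {
  return `${game.name} (#${game.bggid})`;
}

const fromGraphQL: GameSummary = { bggid: 13, name: "Catan" };
const full: Game = { bggid: 822, name: "Carcassonne", minPlayers: 2, maxPlayers: 5 };

console.log(label(fromGraphQL)); // Catan (#13)
console.log(label(full));        // Carcassonne (#822)
```

That way the thing called a Game stays a full Game, and the number-and-name version is honest about what it is.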

But now I’m like a man with a hammer! So many things I’ve done in the API can be done better in some other way. Everything should be rewritten! But believe it or not, even I know that’s a bad idea. So I have to restrain myself. Nevertheless there is still that thrill of having discovered a wonderful thing that will keep me interested for a while yet.

July 7, 2019

Login is Useful!

… for small values of useful.

I’ve been working on login again recently. As I say every time, login is hard to work on because it doesn’t work on the test system and it doesn’t work on the development system, due to the authentication service saying “no, you can only login to extstats.drfriendless.com from extstats.drfriendless.com” which is eminently sensible but a bit frustrating. So I have to make the change, send it live, wait up to 24 hours for the CDN to put it on the site, and then I can test my code.

Now when I log in, I get this (we are looking at the three buttons on the right):

The presentation still needs some work – that does not come naturally to me! The orange button is a link to the user page, and the yellow button is a link to the geek page for Friendless.

The user page is only available to you if you’re logged in. It looks like this:

The Buddy Groups aren’t used yet, not quite. What is a bit useful is the list of BGG user names above them. In that area, I can enter the list of BGG users that I am, or just ones that I like to stalk. And then when I’m logged in, those names turn into yellow links. And those will take you directly to the geek page for those users.

I’m not yet happy with the layout of the buttons, but as with all things on the site I’ll figure it out eventually.

Now that I’ve had this first success with user data, I hope to plumb it into more places on the site, so that logging in becomes a useful thing to do.

July 5, 2019

“Holiday” (it would be so nice)

My wife has leave at the moment, so she suggested that we go on a holiday interstate. I do not have leave at the moment, but I’m the one who does the long-distance driving so I was tangled up in this plan. I often work from home so my boss said it was OK, so next thing you know I’ve driven a thousand kilometres and am staying at a holiday unit at the beach. Here is the view from the balcony.

Of course, I am not on holiday, so I’m busy coding away in between dropping the family at the shops and the train station and so on. I have a couple of pretty big deadlines so I really am busy.

Even worse, the internet here is not as good as at home. For work stuff that’s generally OK, but I’ve done a couple of updates to extstats.drfriendless.com, and it takes quite a while to upload 50 megabytes. I did manage to finish an update to the user page (the one you get to if you click on your name when you are logged in). It still looks pretty bad because I can’t work Angular Material very well yet, but I think it’s more useable. I’ve also been trying to start using some of the configuration you can set on that page, but I have not really had the time to make a lot of progress. Worst holiday ever.

On the other hand, I was watching my wife web-surf last night, and discovered that the h-index is a popular statistic amongst academics. An academic’s h-index is the highest number n such that they have n papers which have been cited at least n times each. And of course being academics, whose performance is measured by their h-index and similarly absurdly trivial metrics, they think way too much about this sort of thing.
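As a quick sketch of that definition in TypeScript (the function name `hIndex` is just for illustration, and the counts could be citations per paper or plays per game):

```typescript
// The h-index is the largest n such that at least n of the counts are n or more.
function hIndex(counts: number[]): number {
  // Sort a copy from biggest to smallest so the input is not modified.
  const sorted = counts.slice().sort((a, b) => b - a);
  let h = 0;
  // The (h+1)-th biggest count must be at least h+1 for the index to grow.
  while (h < sorted.length && sorted[h] >= h + 1) {
    h++;
  }
  return h;
}

console.log(hIndex([10, 8, 5, 4, 3])); // 4
```

In that example four of the counts are 4 or more, but there are not five counts of 5 or more, so the answer is 4.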

For example, they have a g-index. The g-index is the largest number n such that their n most-cited papers have been cited on average at least n times each. We don’t have that metric. NOT YET!
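Here is a small sketch of that one too, under the assumption that n can’t be bigger than the number of papers (or games) you have. The name `gIndex` is made up for this example. "On average at least n each" for the top n is the same as saying the top n add up to at least n squared.

```typescript
// The g-index is the largest n such that the n biggest counts total at least n * n.
function gIndex(counts: number[]): number {
  // Sort a copy from biggest to smallest so the input is not modified.
  const sorted = counts.slice().sort((a, b) => b - a);
  let total = 0;
  let g = 0;
  for (let i = 0; i < sorted.length; i++) {
    total += sorted[i];
    const n = i + 1;
    if (total >= n * n) {
      g = n;
    }
  }
  return g;
}

console.log(gIndex([9, 1, 1])); // 3
```

With counts of 9, 1 and 1 the h-index is only 1, but the g-index is 3, because the one big number pulls the average up.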

They also have a rational h-index, which is approximately the h-index but with some indication of how close you are to getting to the next number. So we definitely want that! The formula (which took a while to track down) is:

say your h-index is h
and the minimum number of plays you could play to get it to h+1 is n
then your rational h-index h_r is:

h_r = h + 1 - n / (2h + 1)

and of course you keep the fractional part. OMG, I am so excited! I can’t wait to implement it! But right now, I have to have a “holiday”.
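A first sketch of how that might be computed from a list of play counts per game, assuming that a game you haven’t played yet counts as zero plays. The names `hIndex`, `playsToNextH` and `rationalHIndex` are made up for illustration.

```typescript
// The h-index is the largest n such that at least n of the counts are n or more.
function hIndex(counts: number[]): number {
  const sorted = counts.slice().sort((a, b) => b - a);
  let h = 0;
  while (h < sorted.length && sorted[h] >= h + 1) {
    h++;
  }
  return h;
}

// Minimum number of extra plays needed to lift the h-index from h to h + 1:
// top up the h + 1 most-played games until each has at least h + 1 plays.
function playsToNextH(counts: number[], h: number): number {
  const sorted = counts.slice().sort((a, b) => b - a);
  const target = h + 1;
  let needed = 0;
  for (let i = 0; i < target; i++) {
    // Assumption: a game not played yet counts as 0 plays.
    const plays = i < sorted.length ? sorted[i] : 0;
    needed += Math.max(0, target - plays);
  }
  return needed;
}

// h_r = h + 1 - n / (2h + 1)
function rationalHIndex(counts: number[]): number {
  const h = hIndex(counts);
  const n = playsToNextH(counts, h);
  return h + 1 - n / (2 * h + 1);
}

console.log(rationalHIndex([10, 8, 5, 4, 3])); // approximately 4.67
```

In that example h is 4 and it takes 3 more plays to reach 5 (one on the game with 4 plays, two on the game with 3), so h_r is 5 - 3/9. The most n can ever be is 2h + 1, which is why the result never drops below h.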