Like a Bought One!

I had a pleasant tech experience yesterday. That’s sufficiently rare that I need to write about it.

I’ve been working with a technology called GraphQL, which is a bit like SQL except that it works with structured data, e.g. a Facebook user, who has a list of friends and a list of posts, and the posts have users who liked them, and the friends have posts of their own, and so on, not to mention the cats. GraphQL lets you write a query (like SQL does) that you send to a server, which executes it and returns the data.

That sounds easy. The hard bit for me is that because I own the data, I have to write the bit that retrieves the structured data from the database. GraphQL makes that easy enough, so I can usually just do a query on a table, give it all the data, and it sorts out what it wants. People say bad things about it, but I think GraphQL is a wonderful technology.

However it’s not always that easy. I have a concept in my database called a GeekGame. It’s the relationship of one geek with one game. It includes values like that geek’s rating for that game, and whether they own it or not. It does not include the name of the game, or how many players it’s for – that data goes in the game.
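
To make that concrete, here’s a TypeScript sketch of how the two concepts split. The field names are my illustration, not necessarily the site’s actual schema:

```typescript
// Hypothetical field names for illustration. Per-relationship data
// lives on GeekGame; per-game data lives on Game.
interface GeekGame {
  geek: string;    // which geek
  bggid: number;   // which game
  rating: number;  // that geek's rating for that game
  owned: boolean;  // whether they own it
}

interface Game {
  bggid: number;
  name: string;        // the name of the game
  minPlayers: number;  // how many players it's for...
  maxPlayers: number;  // ...at most
}
```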

So with GraphQL I can ask for a GeekGame, and I can say that I want to get whether the geek owns it, but that I don’t want the rating. But until yesterday, one thing I could NOT do, though I should have been able to, was get the game as well. I would have liked to say “give me all the games owned by this geek, with the rating and the name”.
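
In GraphQL, that request looks something like this. Again, the exact field names are my guesses at a schema, not the real one:

```graphql
# "Give me all the games owned by this geek, with the rating and the name."
query {
  geekGames(geek: "Friendless", owned: true) {
    rating
    game {
      name
    }
  }
}
```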

The problem was that GraphQL would ask me for all of the GeekGames, and then after retrieving them, it would find out which games it wanted, and ask me for them one by one – the classic “N+1 queries” problem. And as there were hundreds of them, that would generate a lot of queries (which cost me money and take a long time). In fact I couldn’t get it to work at all. Quite a few times the database got cranky at me and refused to talk to me any more, until I found out about MySQL’s FLUSH HOSTS command and ran that. That was a learning experience!

But there’s a technology called DataLoader, which was made at Facebook (who invented GraphQL as well), and which promised to collect all of those single queries and give them to me in a batch. Then I could do one query on the game table, give the results to GraphQL, and it would sort the rest out. So I thought I’d give it a go.
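
The pattern looks roughly like this – a minimal sketch, reusing the GeekGame and Game shapes from the earlier sketch, with a hypothetical loadGamesByIds function that does one bulk SELECT against the game table:

```typescript
import DataLoader from "dataloader";

// Hypothetical: one SELECT ... WHERE bggid IN (...) against the game table.
declare function loadGamesByIds(ids: number[]): Promise<Game[]>;

// DataLoader collects all the load() calls made while resolving a query
// and hands the keys to this batch function in one go.
const gameLoader = new DataLoader<number, Game>(async (ids) => {
  const games = await loadGamesByIds([...ids]); // one query, not hundreds
  const byId = new Map<number, Game>();
  for (const g of games) byId.set(g.bggid, g);
  // DataLoader requires results in the same order as the keys.
  return ids.map((id) => byId.get(id) ?? new Error(`no game ${id}`));
});

// The GeekGame resolver now looks like a per-game fetch, but the
// loader batches all of those fetches behind the scenes.
const resolvers = {
  GeekGame: {
    game: (gg: GeekGame) => gameLoader.load(gg.bggid),
  },
};
```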

O. M. G.

It worked, first time. I don’t even mean it worked after I fixed all of my stupid mistakes. I mean I copied from the example, implemented my bulk query, and kablammo, it did what it was supposed to.

So I was kinda pleased about that.

I could then convert the Monthly Page over to use GraphQL, and then today I could add the “How Much Do You Play New Releases?” chart to that table. For that, I needed the publication year of the game, which that page did not previously load. But with GraphQL it’s simply a matter of changing the data query URL to say you want that field as well.

I feel I’m on a bit of a roll with the new features now. I’ve stabilised on Angular 9 (with Ivy, the new compiler technology), some version of Vega that seems to not break, and GraphQL. According to my Trello board I have about 10 features remaining to be feature-compatible with the old system. Then with any luck I can get to inventing cool new stuff.

Terraform

I can imagine the despair out there in gamer land, when you realise that this is another post about technology and not about the site actually doing something. Yeah… sorry about that. My nerdiness is definitely trending towards tech at the moment.

Anyway, Terraform is nothing to do with Mars. It’s software for specifying, as code, what your Amazon Web Services setup should look like. For example, for Extended Stats, that specification might include stuff like this:

  • there’s a database
  • there’s a Lambda which runs every 10 minutes, which looks at a file on pastebin to see who the users are
  • there’s a Lambda which looks at BGG to get data about users
  • there’s a host called extstats.drfriendless.com
  • there’s an API Gateway which connects the host extstats.drfriendless.com to some Lambdas.
  • and so on… there would be a lot of it. The sketch below gives the flavour.
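
In Terraform’s own configuration language (HCL), a couple of those bullet points might look roughly like this. This is a sketch with invented names and details, not the site’s real configuration:

```hcl
# There's a database.
resource "aws_db_instance" "extstats" {
  engine         = "mysql"
  instance_class = "db.t2.small"
}

# There's a Lambda which downloads data.
# (The IAM role resource it references is omitted for brevity.)
resource "aws_lambda_function" "downloader" {
  function_name = "extstats-downloader"
  runtime       = "nodejs12.x"
  handler       = "downloader.handler"
  filename      = "downloader.zip"
  role          = aws_iam_role.downloader.arn
}

# ...and it runs every 10 minutes.
resource "aws_cloudwatch_event_rule" "every_ten_minutes" {
  schedule_expression = "rate(10 minutes)"
}

resource "aws_cloudwatch_event_target" "run_downloader" {
  rule = aws_cloudwatch_event_rule.every_ten_minutes.name
  arn  = aws_lambda_function.downloader.arn
}
```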

At the moment I use a technology called Serverless to do that sort of thing, but I am not very good at it. On the other hand, after a couple of days of Terraform, I’m really getting into it. It seems better designed and easier to use.

Sadly I suspect it would be a lot of work to change from Serverless to Terraform for Extended Stats, though I will have to one day. Extended Stats is a poor combination of Serverless and ad-hoc undocumented changes made in AWS. This is fine for Extended Stats as it is, but it would be difficult for someone to reproduce the site.

In particular, the tricky bit is specifying the security permissions, which allow different bits of the project to use / modify / read other bits. For this Terraform stuff I’m doing, I’m very carefully doing even that in Terraform. For Serverless, I have not been so careful (because it was too annoyingly difficult!).

Anyway, it’s cool. One day I will get to use it in anger. Tomorrow though, I suspect I will have to write some features for you.

GraphQL

So there’s this technology called GraphQL (graph query language). It’s a way of getting exactly the data you need from a database using HTTP. That’s a vast oversimplification, but I am not writing an academic paper here.

When I started the site I couldn’t imagine a need for it, as for the pages I know exactly what data I need, so I figured I’d just write API methods that retrieved that data. (That’s what’s happening when you see those 8 pulsing blobs.)

But then I wrote the Comparative Plays page, and that thing requires a boatload of data. At first, it required all of the plays of all of the geeks, and all of the data about all of the games that they’d played. And then I tried to open the page with geeks Friendless, jmdsplotter, and Nap16, and AWS Lambda said “BZZZT! That’s more data than I can return!” That was when I realised I would need to think harder.

So I did an experiment where I fetched the data for Friendless, ferrao and Simonocles. That was 1,462,103 bytes, which is a LOT of data for just one graph. So I mucked around and rewrote the code to only send the first play of each game, and that was 632,447 bytes. And then rather than send year, month and date for a play separately, I just sent YMD, e.g. 20190712. That got it down to 587,717 bytes. And then I realised that I was doing a lot of futzing, and that there was still a lot of data coming through that I didn’t really need, for example the minimum player count for games, and so on. So I decided to think about it very hard for a while.
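
The YMD trick is just arithmetic – presumably something like this, packing three fields into one number:

```typescript
// Pack year, month and day into a single number: (2019, 7, 12) -> 20190712.
function toYMD(year: number, month: number, day: number): number {
  return year * 10000 + month * 100 + day;
}

// Unpack it again on the browser side.
function fromYMD(ymd: number): { year: number; month: number; day: number } {
  return {
    year: Math.floor(ymd / 10000),
    month: Math.floor(ymd / 100) % 100,
    day: ymd % 100,
  };
}
```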

I then slowly realised that selecting the values you want was one of the things GraphQL offered. So I did some more reading and I did not get how it was supposed to work, but I did find a very nice tutorial written by a bearded gentleman about how to use GraphQL on AWS Lambda. So I sat down one evening earlier this week and copied that code into my API, and after an hour or so got the basic example working. And that was when it clicked what it was doing, so I then proceeded to implement a query for plays following the same pattern. I got it working that evening, in a shambolic kind of fashion. The next evening I came back and cleaned it up and was able to rerun the query for Friendless, ferrao and Simonocles, and it came back with 214,470 bytes. Woohoo! And in transit, that gets gzipped! So I was pretty content with that (because remember, I pay for bytes which come out of the server).
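
I believe the pattern from that tutorial amounts to something like this sketch, using the apollo-server-lambda package. The schema and the retrievePlays helper here are invented stand-ins for my real ones:

```typescript
import { ApolloServer, gql } from "apollo-server-lambda";

interface Play {
  game: number;
  ymd: number;
  quantity: number;
}

// Hypothetical: one bulk query against the plays table.
declare function retrievePlays(geeks: string[]): Promise<Play[]>;

const typeDefs = gql`
  type Play {
    game: Int!
    ymd: Int!
    quantity: Int!
  }
  type Query {
    plays(geeks: [String!]!): [Play!]!
  }
`;

const resolvers = {
  Query: {
    plays: (_: unknown, args: { geeks: string[] }) => retrievePlays(args.geeks),
  },
};

const server = new ApolloServer({ typeDefs, resolvers });

// API Gateway invokes this handler; Apollo does the GraphQL bits.
export const graphqlHandler = server.createHandler();
```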

And then the next day I updated the Comparative Plays page to use GraphQL, which wasn’t as easy as it sounded – AWS blatantly lied about the error it was giving me – but in the end it was all good.

Well, there was one moment of neurotic anguish. It turns out that for a game, in that graph, all I need is its number and name. (In fact all I need is the name, OMG OMG OMG, no I will worry about that later.) However I have a nice set of definitions of what a Play is, what a Game is, what a GeekGame is (it’s the relationship between a geek and a game, e.g. the rating), and now with GraphQL I’m saying I’m going to call it a game, but it’s really just a number and a name. And TypeScript is not overly pleased with that, and pedantic programmers aren’t either. So I will have to think about my neuroses.
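
One possible way out (not what the site does today) is TypeScript’s Pick utility type: admit that the query returns only a slice of a Game, and give the slice its own name.

```typescript
// Using the Game shape sketched earlier: the query result is honestly
// typed as just a number and a name, not dressed up as a whole Game.
type GameSummary = Pick<Game, "bggid" | "name">;

const g: GameSummary = { bggid: 822, name: "Carcassonne" };
```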

But now I’m like a man with a hammer! So many things I’ve done in the API can be done better in some other way. Everything should be rewritten! But believe it or not, even I know that’s a bad idea. So I have to restrain myself. Nevertheless there is still that thrill of having discovered a wonderful thing that will keep me interested for a while yet.

So You Can Take That Cookie…

I’ve been working on the login button for a few days. This is not because I want to, but because I discovered that the way I was handling login was regarded as bad practice. When a user logs in, Auth0 sends me a thing called a JWT (JSON Web Token), which is effectively information about who that user is and what privileges they get. So I was getting that and storing it in browser local storage where other parts of the site could retrieve it later.

It turns out that’s bad, because third party code that I use on the site might look into the browser local storage, get the JWT out, and send it off somewhere else for Nefarious Purposes (TM). Well, we don’t want nefarious porpoises around here. So the better way to do it is for me to send the JWT to my server, and for the server to set a cookie reminding me of who you are. That sounds easy enough.

But oh goodness me, the drama! Because my site is extstats.drfriendless.com, and my server is api.drfriendless.com, which are different, they don’t trust each other unless I do all sorts of “yeah, it’s OK, they’re my friend” stuff in the code. That’s called CORS (Cross-Origin Resource Sharing), and although it’s not so complicated, it’s just too boring to remember.

And you can’t do CORS and cookie stuff with the API Gateway Lambda integration (well, not very easily using the Serverless framework); you have to use the lambda-proxy integration. Which is OK, but it means everything in the code has to be much more explicit. So I did all that.

And then it still didn’t work. I could see the Set-Cookie header coming back from my server, but Chrome denied it existed. Firefox said it existed, but ignored it. So I poked around for a bit longer, and found out that if you set an expiry time on a cookie, Chrome throws it away. Why? I have no idea. It just does. So I have to set the maximum age for the cookie instead.
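
Putting the pieces together, the lambda-proxy response ends up looking roughly like this sketch. The cookie name, values and helper are invented; the shape of the headers is the point:

```typescript
// Hypothetical helper: checks the JWT with Auth0 and returns a session token.
declare function establishSession(jwt: string): Promise<string>;

export const handler = async (event: { body: string }) => {
  const session = await establishSession(JSON.parse(event.body).jwt);
  return {
    statusCode: 200,
    headers: {
      // CORS: extstats.drfriendless.com and api.drfriendless.com are
      // different origins, so the browser demands this handshake.
      "Access-Control-Allow-Origin": "https://extstats.drfriendless.com",
      "Access-Control-Allow-Credentials": "true",
      // Max-Age rather than Expires, so Chrome keeps the cookie.
      "Set-Cookie": `extstats=${session}; Domain=drfriendless.com; Max-Age=2592000; Secure; HttpOnly`,
    },
    body: JSON.stringify({ loggedIn: true }),
  };
};
```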

And then finally I got the cookie set. And by then I had kinda forgotten what I was trying to achieve. Like a chump!

So I think now the cookie is working as intended, but I have to change the code on the pages to use it properly. At the moment the user page (the one you get to if you click on your user name under the Logout button) is broken, and is awaiting the CDN’s pleasure to be fixed.

Overall I quite like this solution. I feel I have more control over where data is going, and I understand how it works. It has just been pretty painful to get to this point!

A Bad Reaction

I may have mentioned several times on this blog that I use a technology called Angular to write the web pages on the site. That’s what I use at work so I have some experience with it. There’s a competing technology called React, which I have heard a lot about, in that I have subscribed to various React podcasts and so on, and generally opened myself up to it in the hope that understanding would come without too much effort.

Several BGG users who use React have told me “Angular’s dead! React has won that battle!” So, in response to such taunting, and also in response to a weakness in Angular where I can’t have two Angular applications on the same page, I decided to rewrite the navigation bar in React. The nav bar is a terribly simple component, and it couldn’t be that hard, could it?

Indeed, it was not. React (like Angular) has a nice feature where I can change the code and see it instantly changed in my web browser. So, following my usual process of one Google search for one line of code, I got it working to my satisfaction. And it was so simple I really did not need React, but maybe some other ideas I have will justify my choice in the future.
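
For the curious, a React nav bar really is about that simple. A sketch, with invented links rather than the site’s actual menu:

```tsx
import React from "react";

// The whole component: render some links.
const NavBar: React.FC = () => (
  <nav>
    <a href="/favourites.html">Favourites</a>
    <a href="/plays.html">Plays</a>
    <a href="/about.html">About</a>
  </nav>
);

export default NavBar;
```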

So that was good, I had some working React; now I just needed to update my build process to take the JavaScript that React produced and put it onto my web site. React produced 3 files, with extremely unhelpful obfuscated names that change every time you change anything. This is bad because the rest of my site, the pages which load those files, does not change very much. I edit those pages by hand, and there’s no way I’m going to keep editing unhelpful obfuscated file names into them on a regular basis.

No problem! There’s a thing called webpack which does that. Oh yeah, except for the problem that webpack is badly documented, hard to configure, and keeps changing. My first attempt to use it involved copying some configuration I found on the internet (it seemed like a good idea at the time). And that seemed to work at first, but then it stopped. So I had to poke around and debug a whole lot of things to figure out what was going on, and that took WAY TOO LONG.
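
For what it’s worth, the part of the webpack configuration that matters to me is the bit that pins the output to a stable name instead of a hashed one. A sketch, with illustrative paths and loaders:

```js
// webpack.config.js – a sketch, not my exact configuration.
const path = require("path");

module.exports = {
  entry: "./src/navbar.tsx",
  output: {
    path: path.resolve(__dirname, "dist"),
    filename: "navbar.js", // always the same name, no content hash
  },
  resolve: { extensions: [".tsx", ".ts", ".js"] },
  module: {
    rules: [{ test: /\.tsx?$/, use: "ts-loader", exclude: /node_modules/ }],
  },
};
```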

Of course, this could have been a whole lot easier if React had not generated the unhelpful obfuscated file names, but the React developers refuse to understand why anyone might not want that. It seems that React is the whole universe, and once the code works in any fashion, their job is done. They have no concern with fitting in with larger processes. Nevertheless despite my fury on that particular point, I liked React and I’ll use it again. Webpack, on the other hand, needs a bullet in the head.

Well, I hope this blog post provides me some catharsis! I’ve spent several evenings reading up on how webpack works when I never cared to know in the first place.

One final word. Angular is not dead. In my time outside the React world, I heard a lot about state management. There’s a technology called Redux which is used to help with that. It’s totally irrelevant in the Angular world, because we don’t have the state management problem – the solution is built in. So it worries me a bit that React requires a complex (and by all accounts, really boring) solution to a non-problem. Angular does have its problems (I nominate zone.js) but I’d say otherwise the technologies are equally capable, but different. And different is OK.

Swaggering, or Staggering?

During the week (well, last week, I think), a strange thing happened. I got an email from GitHub, where I keep the source code for the site, saying that someone wanted to fix a bug in the site. So I had a look at what they suggested and it made perfect sense, so I accepted the change. So since then that developer and I have been chatting, and now I have a collaborator. Or at the very least, a second opinion on many things. This is very odd for a doctor of friendlessness such as myself.

Now I have been a programmer longer than most people have been alive (source), so I have a great many opinions about programming things, but for much of the technology used in Extended Stats Serverless, I’m quite new to it. And being a humble motherfucker with a big ass appetite to learn, I’m very happy to take advice on these techs.

So one thing that has been on my mind is a product called Swagger (now handed over to the OpenAPI Initiative, which is a very dull name). Swagger is a system for describing APIs. An API is an application programming interface, which means it’s a way to talk to an application by programming. These days it has mostly come to mean a way to talk to a web application by making calls to the web site. One way to use an API is as the back-end to a web site, which is what I do.

When you go to extstats.drfriendless.com, all of that stuff is being loaded from AWS S3 (think of that as Amazon’s hard drive). Then the code runs in your web browser and talks to api.drfriendless.com to retrieve data – api.drfriendless.com implements the API. So what I was doing with Swagger was documenting what you could say to the API and what it would say back.
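
Swagger/OpenAPI documents are written in YAML (or JSON). To give the flavour, here’s a made-up fragment in the same spirit, describing the findgeeks call I mention further down – the real document is about a thousand lines of this:

```yaml
openapi: 3.0.0
info:
  title: Extended Stats API
  version: 1.0.0
paths:
  /findgeeks/{partial}:
    get:
      summary: Find up to 10 geeks whose names match a partial string.
      parameters:
        - name: partial
          in: path
          required: true
          schema:
            type: string
      responses:
        "200":
          description: Matching geek names.
          content:
            application/json:
              schema:
                type: array
                items:
                  type: string
```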

That doesn’t sound hard. And indeed it wasn’t, to do the sort of half-assed job that I did. Nevertheless it came to about a thousand lines, which took quite a while to do. I’m sure you’re desperate to see what I produced, so here it is:

https://app.swaggerhub.com/apis/DrFriendless/ExtendedStatsServerless/1.0.0

Yeah. Although the swaggerhub interface is nice, it’s still pretty dry stuff. Nevertheless I felt it was an essential bit of due diligence that I should do. And while doing it I found a few cases where I thought “huh? why did I do it like that?”, which I will probably go back and redo to be more sensible. So that’s a good thing.

In Which Jack Climbs the Elastic Beanstalk

Woohoo, I got Elastic Beanstalk to work! The “Express & EB” sticky note has been removed! Actually I got Elastic Beanstalk to work on about Tuesday, and today I’ve been learning more about getting Express to do stuff.

I don’t know what it all means either.

The first thing to do was a URL to find geeks with a particular partial name. For example, http://eb.drfriendless.com/findgeeks/Fri will give you up to 10 geeks whose names start with “Fri”. And http://eb.drfriendless.com/findgeeks/boy will give you up to 10 geeks whose names start with “boy” – except there are none, so it will instead give you up to 10 geeks with “boy” somewhere in their name. This is how autocomplete fields work, but I haven’t written the autocomplete field yet so I can’t show you what the plan is.
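
In Express, that prefix-then-substring logic is pleasantly small. A sketch, assuming a hypothetical queryGeeks helper that runs a LIKE query against the geeks table:

```typescript
import express from "express";

// Hypothetical: returns geek names matching a SQL LIKE pattern.
declare function queryGeeks(pattern: string): Promise<string[]>;

const app = express();

app.get("/findgeeks/:partial", async (req, res) => {
  const partial = req.params.partial;
  // Prefer names that start with the partial string...
  let geeks = await queryGeeks(`${partial}%`);
  // ...and if there are none, fall back to names that contain it.
  if (geeks.length === 0) {
    geeks = await queryGeeks(`%${partial}%`);
  }
  res.json(geeks.slice(0, 10));
});

app.listen(3000);
```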

Update: the autocomplete field is now working.

Once More Into the Breach!

Cry ‘Havoc,’ and let slip the dogs of war;
That this foul deed shall smell above the earth
With carrion men, groaning for burial.

It really has been trench warfare this week. My last post was about the database crash. That took a day to get better by itself, but after discussing the matter with the AWS people on reddit, I decided that this was definitive proof that the database was too small, so I dumped it and got a bigger one. Which is a shame, because I think I already paid $79 for that small one.

[Chart: Database Statistics – the database hovers between life and death for a week]

Anyway, the bigger one is still not very big, but if I recall correctly it will cost about $20 / month. When I get some funding for the project I’ll probably upgrade again, but for the moment I’m struggling along with this one.

The graph above shows CPU used in orange. It’s good when that’s high; it means I’m doing stuff. The blue and green lines are the ones that broke the database during the crash, and they must not be allowed to touch the bottom. In particular, notice that when the blue line hit the bottom it stayed there for most of the day, and the site was broken. So let’s not do that.

So in response to this problem, I made some changes so that I can control the amount the downloader is doing from the AWS console. So in the graph, if the orange line goes down and the green line goes up, that’s because I turned off the downloader. And then later I turn it back on again. The initial download of games is about half done, so I expect another week or two of this!

The Favourites Table

On the other hand, the good news is that there are plays in the database, so I started using them. My project yesterday was the favourites table, for which I had to write a few methods to retrieve plays data. That bit is working just fine, and the indexes I have on the plays make it very fast.

The table comes with documentation which explains what the less obvious columns are, and the column headers have tooltips. There are other things about the table, like the pagination, which still annoy me, but I’m still thinking about what I want there. Some sort of mega-cool table with bunches of features which is used in all the different table features on the site…

That was a major advance, so I decided today to follow up with some trench warfare, and had another shot at authentication. This is so that you can login to the site IF YOU WANT TO. I went back to trying to use Auth0, which has approximately the world’s most useless documentation. When I implement a security system I want to know:

  • where do the secrets go?
  • how can I trust them?
  • what do I have to do?

Auth0 insists on telling you to type some stuff in and it will all work. It doesn’t say where to type stuff in, or what “working” means, or what I have to do. I know security is complicated, but that doesn’t mean you shouldn’t even try to explain it; it means you have to be very clear. It’s so frustrating.

[Screenshot: the authentication dialog – you can sign in, but why would you?]

But anyway, after a lot of failures I got this thing called Auth0.Lock “working”, in the sense that when you click Login it comes up, you can type in a username and password, and then it’s happy. I get told some stuff in the web page about who you are.

The remaining problems with this are:

  • when the web page tells the server “I logged in as this person”, how do I know the web page isn’t lying? Never trust stuff coming to the server from a web page.
  • there are pieces of information that the client can tell the server, and then the server can ask Auth0 “is this legit?”… but I am not yet getting those pieces of information. (There’s a sketch of that check after this list.)
  • I have to change all of the login apparatus in the web page once you’ve logged in, to say that you’re now logged in and you could log out. But that’s not really confusing, that’s just work.
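
The check I’m after is, I believe, standard JWT verification: the server fetches Auth0’s published public keys and verifies the token’s signature, so the web page can’t just make a user up. A sketch using the jsonwebtoken and jwks-rsa libraries (the tenant URL is invented):

```typescript
import jwt from "jsonwebtoken";
import jwksClient from "jwks-rsa";

// Auth0 publishes its signing keys at a well-known URL.
const client = jwksClient({
  jwksUri: "https://example-tenant.auth0.com/.well-known/jwks.json",
});

// Look up the public key matching the key ID in the token's header.
function getKey(header: jwt.JwtHeader, callback: jwt.SigningKeyCallback): void {
  client.getSigningKey(header.kid!, (err, key) => {
    callback(err, key?.getPublicKey());
  });
}

// Resolves to the user's Auth0 ID if the token is legit; rejects if not.
function verifyLogin(token: string): Promise<string> {
  return new Promise((resolve, reject) => {
    jwt.verify(token, getKey, { algorithms: ["RS256"] }, (err, decoded) => {
      if (err || !decoded) return reject(err); // the web page was lying
      resolve((decoded as jwt.JwtPayload).sub as string);
    });
  });
}
```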

One of the changes I had to make to get this going was to change extstats.drfriendless.com from HTTP to HTTPS. That should have been a quick operation, as I did the same for www.drfriendless.com, but I screwed it up and it took over an hour. HTTPS is better for everybody, except that the bit which adds the ‘s’ on is a CDN (content delivery network) which caches my pages, so whenever I make a change to extstats.drfriendless.com I need to invalidate the caches and then wait for them to repopulate. And that’s a pain.

Nevertheless, I’m pretty optimistic that Auth0 will start playing more nicely with me now that I’m past the first 20 hurdles. Once I get that going, I’ll be able to associate with your login identity stuff like which features you want to see. And then I will really have to implement some more features that are worth seeing.