Once More Into the Breach!

Cry ‘Havoc,’ and let slip the dogs of war;
That this foul deed shall smell above the earth
With carrion men, groaning for burial.

It really has been trench warfare this week. My last post was about the database crash. That took a day to get better by itself, but after discussing the matter with the AWS people on reddit, I decided that this was definitive proof that the database was too small, so I dumped it and got a bigger one. Which is a shame, because I think I already paid $79 for that small one.

Database Statistics
The database hovers between life and death for a week

Anyway, the bigger one is still not very big, but if I recall correctly it will cost about $20 / month. When I get some funding for the project I’ll probably upgrade again, but for the moment I’m struggling along with this one.

The graph above shows CPU used in orange. It’s good when that’s high, as it means I’m doing stuff. The blue and green lines are the ones that broke the database during the crash, and they must not be allowed to touch the bottom. In particular, notice that when the blue line hit the bottom it stayed there for most of the day, and the site was broken. So let’s not do that.

So in response to this problem, I made some changes so that I can control how much work the downloader does from the AWS console. So in the graph, if the orange line goes down and the green line goes up, that’s because I turned off the downloader. And then later I turn it back on again. The initial download of games is about half done, so I expect another week or two of this!
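For the technically curious, one way to build that kind of console-twiddlable knob is a Systems Manager parameter that the downloader reads before each batch. This is a minimal sketch of the idea, with an invented parameter name, rather than the real code:

```typescript
import * as AWS from 'aws-sdk';

const ssm = new AWS.SSM();

// Read the throttle setting from a Systems Manager parameter, which can be
// edited in the AWS console without redeploying anything. A value of 0
// effectively turns the downloader off.
async function downloadsPerMinute(): Promise<number> {
  const result = await ssm.getParameter({ Name: '/extstats/downloader/rate' }).promise();
  return parseInt(result.Parameter?.Value ?? '0', 10);
}
```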

The Favourites Table

On the other hand, the good news is that there are plays in the database, so I started using them. My project yesterday was the favourites table, for which I had to write a few methods to retrieve plays data. That bit is working just fine, and the indexes I have on the plays make it very fast.

The table comes with documentation which explains what the trickier columns mean, and the column headers have tooltips. There are other things about the table, like the pagination, which still annoy me, but I’m still thinking about what I want there. Some sort of mega-cool table with bunches of features which gets used everywhere the site needs a table…

That was a major advance, so I decided today to follow up with some trench warfare, and had another shot at authentication. This is so that you can log in to the site IF YOU WANT TO. I went back to trying to use Auth0, which has approximately the world’s most useless documentation. When I implement a security system I want to know:

  • where do the secrets go?
  • how can I trust them?
  • what do I have to do?

Auth0 insists on telling you to type some stuff in and it will all work. It doesn’t say where to type stuff in, or what working means, or what I have to do. I know security is complicated, but that doesn’t mean you shouldn’t even try to explain it, it means you have to be very clear. It’s so frustrating.

Authentication dialog
You can sign in but why would you?

But anyway, after a lot of failures I got this thing called Auth0.Lock “working”, in the sense that when you click Login it comes up, you can type in a username and password, and then it’s happy. I get told some stuff in the web page about who you are.

The remaining problems with this are:

  • when the web page tells the server “I logged in as this person”, how do I know the web page isn’t lying? Never trust stuff coming to the server from a web page.
  • there are pieces of information that the client can tell the server, and then the server can ask Auth0 “is this legit?”… but I am not yet getting those pieces of information. (There’s a sketch of what that check might look like just after this list.)
  • I have to change all of the login apparatus in the web page once you’ve logged in, to say that you’re now logged in and you could log out. But that’s not really confusing, that’s just work.
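For reference, the server-side check in the second point is supposed to look something like this. It’s a sketch assuming the standard Auth0 JWT setup, with the tenant domain as a placeholder: the web page sends its id_token to the server, and the server verifies the token’s signature against the public keys Auth0 publishes, so a lying web page can’t forge one.

```typescript
import * as jwt from 'jsonwebtoken';
import jwksClient from 'jwks-rsa'; // assumes esModuleInterop

// Auth0 publishes its token-signing keys at a well-known URL per tenant.
const client = jwksClient({
  jwksUri: 'https://YOUR_TENANT.auth0.com/.well-known/jwks.json', // placeholder
});

// Find the public key matching the key ID in the token's header.
function getKey(header: jwt.JwtHeader, callback: jwt.SigningKeyCallback): void {
  client.getSigningKey(header.kid, (err, key) => {
    callback(err, key?.getPublicKey());
  });
}

// Resolves with the token's claims if the signature checks out; rejects if
// the web page was lying about who logged in.
function verifyLogin(idToken: string): Promise<object> {
  return new Promise((resolve, reject) => {
    jwt.verify(idToken, getKey, { algorithms: ['RS256'] }, (err, decoded) =>
      err ? reject(err) : resolve(decoded as object));
  });
}
```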

One of the changes I had to make to get this going was to change extstats.drfriendless.com from http to https. That should have been a quick operation as I did the same for www.drfriendless.com, but I screwed it up and it took over an hour. Https is better for everybody, except that the bit which adds the ‘s’ on is a CDN (content delivery network) which caches my pages, so whenever I make a change to extstats.drfriendless.com I need to invalidate the caches and then wait for them to repopulate. And that’s a pain.
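For the record, the cache-clearing step is one API call. A sketch, assuming the CDN is CloudFront (which is what AWS gives you) and with the distribution ID left as a parameter:

```typescript
import * as AWS from 'aws-sdk';

// Ask CloudFront to throw away its cached copies of everything, so the next
// request fetches the fresh version. The invalidation takes a few minutes.
async function invalidateCache(distributionId: string): Promise<void> {
  const cloudfront = new AWS.CloudFront();
  await cloudfront.createInvalidation({
    DistributionId: distributionId,
    InvalidationBatch: {
      CallerReference: `deploy-${Date.now()}`, // must be unique per request
      Paths: { Quantity: 1, Items: ['/*'] },
    },
  }).promise();
}
```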

Nevertheless, I’m pretty optimistic that Auth0 will start playing more nicely with me now that I’m past the first 20 hurdles. Once I get that going, I’ll be able to associate with your login identity stuff like what features you want to see. And then I will really have to implement some more features that are worth seeing.

She Cannae Take Any More, Cap’n!

So, a couple of hours after I wrote the blog post last night saying how everything was going full steam ahead, it all blew up. This morning, many bits of the system which were working just fine are failing. This points to the database, which is at the heart of everything, and all indications are that it broke at about midnight.

Graphs of database failure

I had a poke around, and eventually found the BurstBalance metric. In the top right graph, it’s the orange one that dives into the ground and bounces up.

What seems to happen is that if you overuse your database (in particular, the database’s disk), you eat into your overuse credits, i.e. the burst balance. And at midnight I ran out of burst balance, so the database stopped responding.

Well, that’s something I learned today. At least now I know to watch this when the system is under proper load. It’s also a good indication of when it’s time to fork out for a bigger database.
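“Watch this” can even be automated. Here’s a sketch of a CloudWatch alarm on BurstBalance – the names, threshold, and SNS topic are all invented, but the metric is the real one from the graphs above:

```typescript
import * as AWS from 'aws-sdk';

// Set up an alarm that fires when the burst balance dips below 20%,
// well before it hits the bottom and the site breaks.
async function alarmOnBurstBalance(dbInstance: string, snsTopicArn: string): Promise<void> {
  const cloudwatch = new AWS.CloudWatch();
  await cloudwatch.putMetricAlarm({
    AlarmName: `${dbInstance}-burst-balance-low`,
    Namespace: 'AWS/RDS',
    MetricName: 'BurstBalance',
    Dimensions: [{ Name: 'DBInstanceIdentifier', Value: dbInstance }],
    Statistic: 'Average',
    Period: 300,            // five-minute samples
    EvaluationPeriods: 2,   // two bad samples in a row before alarming
    Threshold: 20,          // percent of burst balance remaining
    ComparisonOperator: 'LessThanThreshold',
    AlarmActions: [snsTopicArn], // e.g. an SNS topic that emails me
  }).promise();
}
```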

Full Steam Ahead!

I’ve been beavering away on plays downloads. There were a couple of bugs so the downloader was stuck for most of the week, but this afternoon I got it working properly (this time for sure), so I cranked up the pace. I told the system to do 100 downloads per minute. It had 231900 to do, so it’ll still take a while – maybe 4 days, unless there are more bugs.

Anyway, that’s to complete that job. As some plays have already been downloaded, I’ve started populating the SdJ (Spiel des Jahres) column in the War Table, and Total Plays in the Rankings table.

To populate the SdJ column I needed to know what the series were, so I coded up the bit that downloads the series metadata as well. I’ve dumped the old Catan / Carcassonne / Command & Colours etc. series, as they were getting silly and nobody cares. And if somebody does care I can put them back.

When the plays data is ready, I’ll get the War Table and the Rankings Table working properly, and then I’ll be in a position to implement some of the other features of the old system. I have a plan in mind for constructing pages of features, which I will experiment with a bit when I have some features to do it with.

I hope it gets more fun after this. I mean, it’s kinda fun for me to see it all coming together, but it’ll be better when it’s fun for you guys too. Here’s a pretty picture:

Graph of Lambda invocations over time showing sharp increase
Lambda invocations take a salmon leap

Watch Out, There Might Be COOKIES.

I was watching ABC TV this morning, and some commentators from “Download This Show” were talking about digital privacy. They mentioned that cookies were used to track you all across the internet and invade your privacy blah blah blah. As I work closely with digital marketing people I have to know a bit about that sort of thing, and I’m not scared of it, but I figure I should tell you guys a bit about potential privacy things.

First of all, cookies. The old site has cookies. I use them to store information you want me to know about your preferences, e.g. screen size and what features you want on your custom page. Whenever you come to the site, your browser sends me the cookies and I look in them to see what to do. I don’t know your identity, though it is extremely likely that, if you’ve put your BGG user name in the cookie, you’re that same person. But you could put my BGG user name in there if you like. That’s about all cookies are good for.
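If you’re curious what that looks like in code, reading a cookie is about as mundane as it gets. A sketch (the cookie name is invented):

```typescript
// Pull one named value out of the cookies the browser sent along.
function readPreference(name: string): string | undefined {
  return document.cookie
    .split('; ')
    .map(pair => pair.split('='))
    .find(([key]) => key === name)?.[1];
}

// Whatever name you claimed to be – there's no proof attached.
const bggUsername = readPreference('bgg_username');
```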

However, the privacy stuff gets a bit more interesting once you combine cookies with “pixels” and tracking. A pixel is pretty much exactly what it says: the page includes a one-pixel image which is far too tiny to be seen. However, that pixel is loaded from Facebook. So when your web browser goes to load that pixel from Facebook, Facebook gets told “Facebook user John Farrell requests this pixel because he’s looking at extstats.drfriendless.com” or whatever. So then Facebook knows what you’re looking at, because the person who made the website (me) put the Facebook pixel on the page.

This is very, very common on the internet. You’d be appalled. I have a browser plugin called WASP.inspector which tells me how many trackers a page installs. news.com.au just downloaded 295 of them in one page. Practically every site you go to has such stuff on it.

Now what the Facebook pixel does is report back to Facebook that you went to this page. It doesn’t tell me who you are; that information stays with Facebook. But it does mean that if someone goes to Facebook advertising and says “I want to sell stuff to people who like board games”, Facebook can identify you as such a person. Facebook does not tell the seller who you are, they just take that seller’s ad and stick it on your page. So it’s really only Facebook who knows everything about you, and hey, you knew that Facebook knew that stuff anyway, didn’t you?

So, on drfriendless.com I use this thing called Google Tag Manager, which is a place where I can configure all of the pixels I want to dump on you, i.e. all of the third parties who will find out that you visited my site. As I have no particular need to tell Facebook that you were there, there is no Facebook pixel in my GTM, so Facebook does not know that you came to DrFriendless.com. The only thing I do use is Google Analytics.

Analytics is a Google product which tells me a bit about my visitors. Some of the graphs it produces are shown below. Now you know that a person like me who creates a site like this wants those stats! So that is the evil privacy-invading tracking that I’m doing.

As for the future, I intend to be very transparent about privacy. The new General Data Protection Regulation in Europe kinda requires it, as I am operating in the European market, and I think the rules they have are sensible. So I’ll comply with them as much as I can from the ground up.

I also intend to allow users to log in to the site. This won’t be a requirement to get your stats, that will be public as always, but there are some ideas I have that require that I associate data with you, and for that I need to know your identity. And like the cookies on the old site, there will be only circumstantial evidence that links your account on drfriendless.com to your BGG account.

And as for pixels, I think it’s fair to say that I’ll tell you if I add more tracking pixels to tag manager. It might be, for example, that I add the ability to play games online to the site, and one day I need to advertise on Facebook to get more users. So then I could add the Facebook pixel to the site, and tell Facebook “I want more people who are like the people I already have”. I think that would be a reasonable use case.

Oh, and finally before I go, I should tell you that you can block this tracking stuff. I used to run Chrome extensions called Ghostery and Disconnect that stop the pixels, although I don’t know how. I know they worked though, because for days I couldn’t get work’s Google Tag Manager integration happening on my laptop. I eventually realised that if I wanted to test tracking pixels I had to stop blocking them. Hence now wherever I go all of my privacy gets invaded. On the other hand, Facebook ads do occasionally show me things I want, which is a pleasant change.


Let There Be Plays!

Sometime recently the downloading of games completed (again), and the list of which things are expansions of which other things became complete-ish. That meant that I could now download plays, and then infer plays of base games from plays of expansions. So at about 10 o’clock this morning I turned on downloading of plays.

Coincidentally at about 10 o’clock this morning I started leaking database connections. I discovered that I could change the maximum connections that the database allows, so I went from 66 to 200, which just meant that I leaked all of those as well.

So I must have put a connection leak in somewhere, but as it’s now 10:30pm, the connection leak is going to stay there until tomorrow evening when I get time to look at it. And it means that most of the site is broken because it can’t get data from the database. On the other hand, 47000 plays have downloaded successfully. I expect there will be in the order of 20 million, so if I don’t get that bug out it’ll take forever.
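When I do get to it, the fix will almost certainly have the standard shape below – a sketch assuming a mysql2 promise pool, not the actual site code. The point is that every path out of the work, including the ones that throw, has to give the connection back:

```typescript
import * as mysql from 'mysql2/promise';

// Borrow a connection, do the work, and guarantee it goes back to the pool.
async function withConnection<T>(
  pool: mysql.Pool,
  work: (conn: mysql.PoolConnection) => Promise<T>,
): Promise<T> {
  const conn = await pool.getConnection();
  try {
    return await work(conn);
  } finally {
    conn.release(); // runs even when work() throws – this is what plugs the leak
  }
}
```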

An Offering of War

The Offerings of War - statue outside the Art Gallery of New South Wales

“The Offerings of War” is a statue at the entrance to the Art Gallery of New South Wales. I’m not a really arty person, but I do like that statue and many other exhibits in that gallery. However that is not what this blog is about.

Whenever I can get a spare 36 hours I’ve been working on the new site. The downloader is working quite well, chugging away gathering data, and it had caught up to me, in that it had downloaded everything I had told it to and I needed to tell it to do some more stuff. In particular it needs to download plays. It knows there are 231064 (actual number) instances of a geek having played games in a month that it needs to get data on, but I hadn’t written that code yet. So I started doing that.

Getting the raw plays data is relatively easy, but Extended Stats does not run from raw data. Extended Stats tries to be smart and says “Hmm, he played Roll for the Galaxy: Ambition, so he must have played Roll for the Galaxy as well. I’ll put that in.” Which is all well and good but I hadn’t done the bit where I record which game is an expansion of which other game, so I had to do that first. So now I am reprocessing 61575 games to find out what all the expansions are. It’s working well, but there are 45036 to go.
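The inference itself is simple once the expansion relationships are in place. A sketch, with invented data shapes:

```typescript
// Hypothetical shapes – the real schema is more complicated than this.
interface Play { geek: string; game: number; quantity: number; }

// expansionToBase maps an expansion's game ID to its base game's ID,
// built from the expansion metadata being reprocessed at the moment.
function inferBasePlays(plays: Play[], expansionToBase: Map<number, number>): Play[] {
  const inferred: Play[] = [];
  for (const play of plays) {
    const base = expansionToBase.get(play.game);
    const alreadyLogged = plays.some(p => p.geek === play.geek && p.game === base);
    // He played the expansion, so he must have played the base game too.
    if (base !== undefined && !alreadyLogged) {
      inferred.push({ geek: play.geek, game: base, quantity: play.quantity });
    }
  }
  return inferred;
}
```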

And then I wrote the plays logic, but I won’t start running it seriously until the expansions are done, or it will look silly. And until the plays are downloaded, there will be gigantic holes in the data.

I was a bit stymied in that direction, so I turned my efforts to the user interface. From the user point of view, the web pages are all there is to Extended Stats, so I have to deliver some of them at some point. One thing I was missing was the War Table. Originally called the Pissing War Table, and renamed to suit the sensibilities of those who weren’t raised in the Australian scrub like I was, the War Table is where geeks can compare their geek cred. It also serves as a directory of everyone on the site and a way to find a link to your personal page. Here’s the new page:

http://extstats.drfriendless.com/wartable.html

The War Table boasts a number of cool features. First of all, I plan to reuse the table component in a lot of pages, so I forked my own version of it. The plan is that I will be able to upgrade the table, which will improve all of the places which use it – for example, I would like a search box which you can use to find a particular entry in the table. However no such upgrading was done today.

I then added tooltips to the column headers, because I know that these columns can be a bit meaningless if you don’t know what they are. And then I went the whole hog and added a Documentation button, which opens up into a few tabs of doco where I can explain exactly what’s going on.

I’m not completely pleased with the War Table. It doesn’t look exactly how I want it. The pager buttons down the bottom are the wrong colour and I can’t figure out why. There are rounded corners on things when I want square. However I’ll probably bring those pager buttons up to the top of the table, and somehow fit the Documentation button, the page flicker, the page size chooser, and the search box all into one row. However domestic duties call, and I must make an Offering of Peace to Scrabblette.

It Don’t Matter Who You Are, Just So Long As You Are There…

I’ve been working on more invisible stuff. That’s why although it looks like there’s no change, I’m still exhausted and pissed off. This afternoon’s adventure was with a user login capability. As it’s security stuff, it’s confusing and mostly seems like useless guff… but as I know so little about it, I just code how I’m told.

Now the old site doesn’t have user logins, so you might think that the new site doesn’t need them either. On the other hand the old site uses way too many cookies, and I’d like a more robust solution than that. Also I have ideas for very cool features that produce user-specific information that they would want to keep and edit later. So I have to have user login. It won’t be required to use the site, but it will be required for features that need to store information on a user’s behalf.

There are third-party packages that can handle these things for you. Auth0 is one of them, so I jammed in some Angular code to allow users to log in to the site with their Facebook or Google credentials. And that sort of worked a bit. But then Auth0 called me to tell me someone had logged in and it all went pear-shaped. See, Auth0 calls me back at a URL that I specify, and it sticks the user’s credentials on the end of the URL. However, as that URL went through an API Gateway to get to my Lambda, AWS lost the extra bits of the URL and I didn’t get the information I needed. I think that’s a design flaw on their part. Apparently there are ways to get around that, but as I was trying to understand them I found another option.

AWS has a service called AWS Mobile which offers a suite of user-attached features, such as authentication, profile photos, and a bunch of other guff that I ignored. You see, my requirements are trivial beyond belief – I just want to know whether this person logging in is the same person who logged in some other time. I don’t need their name, email address, blah blah blah. I just need an opaque token that I can save in the database, and when I see that token again I get the settings out of the database and start using them for that user. Nobody seems to design for such a simple use case.
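To be concrete about the opaque token idea: the tokens these services issue carry a “sub” (subject) claim, which is a stable anonymous identifier for the user, and that’s really all I need. A sketch, with a Map standing in for the database:

```typescript
interface UserSettings { features: string[]; }

// Stand-in for the real database table keyed by identity.
const settingsByIdentity = new Map<string, UserSettings>();

// The `sub` claim identifies the same person across logins without telling
// me anything else about them – which is exactly the simple use case I want.
function settingsFor(token: { sub: string }): UserSettings {
  let settings = settingsByIdentity.get(token.sub);
  if (settings === undefined) {
    settings = { features: [] };
    settingsByIdentity.set(token.sub, settings);
  }
  return settings;
}
```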

Anyway, I signed up to my own site using the AWS Mobile widget, and I appeared in the user pool on the back-end. Hooray! The widget doesn’t behave very well, so I’ll have to explore that, and I still haven’t figured out how to get the opaque token I was wanting, but the documentation seems nice. Though I do get sick of being told how to install stuff, I’d much prefer to hear what it does and how it’s used.

So that was my Sunday afternoon. Stuff is progressing, slowly slowly.

Why Rewrite?

DrFriendless holds his face in his hands, having just knocked down Villa Paletti
The agony of incompetence

One question I often ask myself is “why would you rewrite working software? Have you not read the millions of articles saying it’s a bad idea?”

Well, yeah… but. I started writing Extended Stats in 2004 or so, when Python was a relatively new programming language and I was very excited about using it. The plan was to just do a bit of scripting with this easy language, and see some cool numbers. At first I was generating HTML pages, and then I moved on to generating them overnight and bulk uploading to some free hosting site, and then in maybe 2008 I recoded most of the system to run in a web server and make pages up on the fly, much as it is today. And then I stopped mucking with the infrastructure, and jammed a whole bunch of features on and off until 2015.

Then in 2015 I moved states and jobs, and learned a whole bunch of new things that I would very much like to muck with. Also – and don’t laugh, this is true – the study in our new house is too hot in summer and too cold in winter, and I hate sitting at my PC. I ended up getting a laptop which I proceeded to use more than my PC. Extended Stats runs on the PC and so is kinda bound to it, so it didn’t get any love.

When I did drag myself into the study to look at Extended Stats, I found that I was struggling to decode the Python that I’d written 10 years previously. Python doesn’t have static types, and types are a strong indication of the structure of a piece of code. So I’d be in the guts of the play calculation stuff, trying to figure out how it worked, without many clues from the code. And that’s bad. I know many companies are using Python for big projects, but I think that’s a bad plan. I am a doctor in this stuff, and I’ve been writing Python for maybe 18 years, so I think my opinion is not without foundation.

So, with me not wanting to go to my PC, and not wanting to look at the code, and being excited about other technologies, Extended Stats got no love. However I was still interested in the data set, and wrote a couple of applications in Kotlin to do some cool stuff with it.

And then I got a job working with Amazon Web Services. AWS is a cloud computing platform. The cloud is a great place to host software. It is a much, much better place than the PC in my hot and cold study. You can get to the cloud from your laptop! I had experimented with cloud-hosting Extended Stats before – I got one version running on an EC2 (just a virtual computer), and one version running on an EC2 with an external database on RDS. However as I had no money at the time to pay hosting fees, and I didn’t really see why hosting like that would be significantly better than hosting on my PC, I didn’t bother with that plan. However, that little bit of experience with AWS proved invaluable when I ended up getting a job working with software that is hosted like that.

And then Amazon invented this thing called Lambda. Back in the 1930s, the mathematician Alonzo Church invented a thing called the lambda calculus, which is a mathematical model based on functions. Lambda (λ) is the symbol used to denote the start of a function. What AWS invented is a way to put your code in the cloud and let it be run without worrying about EC2s or any other sort of virtual machine. Basically you write the code, and they run it, and stuff happens. At first I was sceptical that it would be viable, but it seems that it is, as they’re making a big business out of it. Furthermore it is very cheap.

That has led to this thing called serverless computing, where rather than write a program in any traditional sense, or even a bunch of services hosted inside a web server or other container, you just write Lambdas. And then you tie them together with string and chewing gum, and hey presto, you have an application that runs in the cloud. If you need to run one Lambda a month, that’s fine. If you need to run a thousand simultaneously, that’s fine too.
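To give a feel for how little “just write Lambdas” involves, a whole deployable unit can be one exported function. A sketch, assuming the Lambda sits behind API Gateway:

```typescript
import { APIGatewayProxyEvent, APIGatewayProxyResult } from 'aws-lambda';

// One Lambda: AWS invokes this function when a request arrives, and there is
// no server, container, or process of mine running in between times.
export async function handler(event: APIGatewayProxyEvent): Promise<APIGatewayProxyResult> {
  const geek = event.queryStringParameters?.geek ?? 'nobody';
  return {
    statusCode: 200,
    body: JSON.stringify({ message: `Stats for ${geek} would go here.` }),
  };
}
```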

That brings me to another problem with Extended Stats. I’m approaching the 3000 user mark, and the bandwidth required is making an impression on our home quota. As we’re watching more TV across the internet now, and downloading PlayStation games, there could be problems. And I’d love to take Extended Stats up to 10000 or 100000 users. And that would be a problem.

So implementing the system with AWS Lambda has these advantages:

  • infinite scalability
  • does not use my home bandwidth
  • can be accessed from my laptop, even if I’m on holidays
  • does not require me to go to the study
  • fun to play with
  • different Lambdas can be written in different languages so I can muck with new things.

That solves most of the problems I was having, except for the disadvantage:

  • all the code has to be different.

But hey, I’m a programmer! I can fix that bit! In fact I love doing that bit.

I came up with this plan about halfway through 2017, and started mucking around a bit towards the end of the year. However there’s not really a standard way to do that sort of architecture yet, so I had to do some experiments which were often abysmal failures. That got a bit depressing so my interest sort of drifted in and out. Then in June this year I decided to have another bash at some of the boring bits, and got it to work. And then I got some more bits to work, and next thing you know the whole plan was coming together. So then I started a blog to keep people up-to-date on where the project was at.

Now, I admit, I could have kept some of the Python. There is a variant of Python that uses type annotations. But even Python is broken these days. After version 2.7 of Python (I think I started on 1.5.2 or something), they went to version 3, which was radically different. So even if I did keep the Python there was a lot of rewriting to do. So… nope.

As it turns out I’m doing most of the coding in TypeScript on Node.js, which I didn’t know much about and am still not very impressed with, even though it’s doing the job.

I did decide to keep the SQL schema. They have NoSQL databases these days, but a lot of the Extended Stats data really is relational, and there’s not so much value in changing to NoSQL despite it being flavour of the month. Although the guys at the Sydney MongoDB meetup are really nice and I enjoy going to see them. And then, when I say “I decided to keep the SQL schema”, what I really mean is “I started with the old SQL schema, didn’t like a lot of things and changed them pretty drastically.” Creators gonna create, you know.

I really like where the new system is going. As mentioned in an earlier post, I got my monthly bill and it’s cheap, I’m fixing a lot of the bad architectural choices the first system evolved into, and I’m learning oh-so-much stuff and have oh-so-many great ideas. I hope you all can enjoy this as much as I do.

Ooh, I got the bill!


One of the thrilling things about cloud computing is sticker shock. It’s OK once you’ve established your pattern of usage and know roughly how much you’ll pay each month, but for an experimenter like me, strange things can happen. Like that time last year I tried something out, it didn’t work, then a month later I got a bill for $40 because I hadn’t deleted a thing and I didn’t even know what the thing did. So it was that this month I was a bit worried, as the site had been running somewhat properly for a few weeks.

So after that stress, $12 is good news! It suggests that I’m doing something right and that I can keep this up.

I’ll explain what the bits are. RDS is Relational Database Service – the database. AWS does automatic backups and so on – we tried it at work the other day, it’s pretty nice – so for that money they’re keeping our data safe. EC2 is the virtual computer that the blog runs on. It’s a tiny one, so it could be free, but I think I used up my quota of free EC2s already. Route 53 is the DNS (Domain Name System) service, i.e. the bit that tells the world where stats.drfriendless.com, extstats.drfriendless.com, api.drfriendless.com and blog.drfriendless.com are – they are all different computers, if they exist at all. Data transfer covers the cost of downloading from BGG; 4c seems reasonable. Of the “other” 5c, 4c is for S3 (Simple Storage Service), which is like a big hard drive, and 1c is for API Gateway, which is the bit that implements api.drfriendless.com by turning API requests into calls to Lambdas.

There’s no charge for Lambdas, as I seem to be under the free limit still. When I get more stuff going I might have to pay 20c for that. However that is where I’ve put most of my effort recently – trying to keep them small, and trying to make them do their job and not run uselessly.

I love it when a plan comes together.

Collections and Mechanics

There has been much activity on the site this weekend. I guess I should explain a little how the site works… I’m using an architecture called “serverless”, where I code each thing that the site does separately (as a thing called a Lambda). So far there are 17 different Lambdas that all beaver away. Most of them at the moment are cooperating to populate the database. Amazon charges me money depending on how many times they are all invoked and how big they are – here’s a graph of the number of invocations:

I received an email from Amazon saying “you’ve used so many of those it’s almost not free any more”. That wasn’t alarming in itself, but what it told me was “the less attention you pay to costs, the more this will cost you”. So I did some research on how to make the Lambdas smaller, how many I would need, and so on. My guess at the moment is that Lambdas will cost me about $1 per thousand users per month. As long as I’m careful / skilful.

So that was why I spent some time doing invisible stuff that made the site cheaper to run. At the moment it looks like the database will be by far the most expensive part of the site (maybe $200 / year), and my plan is to eventually design that cost away. So it’s all good as long as I pay attention.

But after doing that I had a look at funding the site with ads. It’ll never be cost-free, and I don’t plan it to be profitable, but I *would* like the site to be cost-neutral. I was going to put Amazon Affiliate ads on the site, but Amazon said “nope, drfriendless.com is not an appropriate name for putting our ads on”, so that plan died very quickly. So then I signed up for Google Adsense, and the job of actually sticking those ads in is yet to be done.

Just have a look at this page and see if one is there yet:

http://extstats.drfriendless.com/

That’s the stats stats page that I can quickly look at to see whether stuff is progressing. It is progressing, slowly, but I don’t want to crank up the rate of work until more things are working. By my calculations, at full speed I would need to fire off 100 tasks every minute to support 3000 users. At the moment I’m just doing 10 per minute.

What does stuff working even mean? The site has been downloading game information for a couple of days, but it was throwing away the categories and mechanics. So I added the ability to record those. I also cleaned up the database so that I don’t store categories and mechanics by name everywhere, which will make this site more efficient than the old one (and hopefully faster, goodness me the old site is slow). That involved some work with foreign key constraints and async/await stuff which is a bit beyond the scope of this blog, but made me happy – there’s a sketch of the flavour of it below. And now categories and mechanics are going into the database.
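For the curious, “not storing them by name everywhere” means the names live once in their own table and everything else refers to them by ID. A sketch of the async/await flavour of it, with invented table names and assuming MySQL on RDS via mysql2:

```typescript
import * as mysql from 'mysql2/promise';

// Record a game's mechanics: each name goes into the mechanics table once,
// and the join table holds only IDs, kept honest by foreign key constraints.
async function saveMechanics(pool: mysql.Pool, gameId: number, mechanics: string[]): Promise<void> {
  for (const name of mechanics) {
    // Ensure the mechanic exists exactly once in its own table.
    await pool.execute('INSERT IGNORE INTO mechanics (name) VALUES (?)', [name]);
    const [rows] = await pool.execute('SELECT id FROM mechanics WHERE name = ?', [name]);
    const mechanicId = (rows as Array<{ id: number }>)[0].id;
    // The join table stores only the IDs, never the names.
    await pool.execute(
      'INSERT IGNORE INTO game_mechanics (game_id, mechanic_id) VALUES (?, ?)',
      [gameId, mechanicId],
    );
  }
}
```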

The next step will be some sort of War Page and some sort of Front Page (with game rankings). I’ve decided those are the minimal features I require before I tell new users that the site exists. And the War Page will link to the user-specific pages, as the old site does.

Well, that’s enough chit-chat, I’m going back into the code mines.