Why Rewrite?

DrFriendless holds his face in his hands, having just knocked down Villa Paletti
The agony of incompetence

One question I often ask myself is “why would you rewrite working software? Have you not read the millions of articles saying it’s a bad idea?”

Well, yeah… but. I started writing Extended Stats in 2004 or so, when Python was a new programming language, and I was very excited about using this new language. The plan was to just do a bit of scripting with this easy language, and see some cool numbers. At first I was generating HTML pages, and then I moved on to generating them overnight and bulk uploading to some free hosting site, and then in maybe 2008 I recoded most of the system to run in a web server and make pages up on the fly, much as it is today. And then I stopped mucking with the infrastructure, and jammed in a whole bunch of features, on and off, until 2015.

Then in 2015 I moved states and jobs, and learned a whole bunch of new things that I would very much like to muck with. Also – and don’t laugh, this is true – the study in our new house is too hot in summer and too cold in winter, and I hate sitting at my PC. I ended up getting a laptop which I proceeded to use more than my PC. Extended Stats runs on the PC and so is kinda bound to it, so it didn’t get any love.

When I did drag myself into the study to look at Extended Stats, I found that I was struggling to decode the Python that I’d written 10 years previously. Python doesn’t have static types, and types are a strong indication of the structure of a piece of code. So I’d be in the guts of the play calculation stuff, trying to figure out how it worked, without many clues from the code. And that’s bad. I know many companies are using Python for big projects, but I think that’s a bad plan. I am a doctor in this stuff, and I’ve been writing Python for maybe 18 years, so I think my opinion is not without foundation.

So, with me not wanting to go to my PC, and not wanting to look at the code, and being excited about other technologies, Extended Stats got no love. However I was still interested in the data set, and wrote a couple of applications in Kotlin to do some cool stuff with it.

And then I got a job working with Amazon Web Services. AWS is a cloud computing platform. The cloud is a great place to host software. It is a much, much better place than the PC in my hot and cold study. You can get to the cloud from your laptop! I had experimented with cloud-hosting Extended Stats before – I got one version running on an EC2 instance (just a virtual computer), and one version running on an EC2 instance with an external database on RDS. However, as I had no money at the time to pay hosting fees, and I didn’t really see why hosting like that would be significantly better than hosting on my PC, I didn’t bother with that plan. Still, that little bit of experience with AWS proved invaluable when I ended up getting a job working with software that is hosted like that.

And then Amazon invented this thing called Lambda. Back in the 1930s, the mathematician Alonzo Church invented a thing called lambda calculus which is a mathematical model based on functions. Lambda (λ) is the symbol used to denote the start of a function. What AWS invented is a way to put your code in the cloud and let it be run without worrying about EC2s or any other sort of virtual machine. Basically you write the code, and they run it, and stuff happens. At first I was sceptical that it would be viable, but it seems that it is as they’re making a big business out of it. Furthermore it is very cheap.

That has led to this thing called serverless computing, where rather than write a program in any traditional sense, or even a bunch of services hosted inside a web server or other container, you just write Lambdas. And then you tie them together with string and chewing gum, and hey presto, you have an application that runs in the cloud. If you need to run one Lambda a month, that’s fine. If you need to run a thousand simultaneously, that’s fine too.
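To make "you just write Lambdas" a bit more concrete, here is a minimal sketch of what one might look like in TypeScript. It is not taken from the Extended Stats code: the `GeekRequest` and `HttpResponse` shapes, the `geek` query parameter handling and the message text are all assumptions made up for illustration, and a real API Gateway event carries many more fields.

```typescript
// Hypothetical request and response shapes, invented for this sketch.
interface GeekRequest {
  queryStringParameters?: { [name: string]: string | undefined } | null;
}

interface HttpResponse {
  statusCode: number;
  body: string;
}

// Pure function holding the logic, so it can be run without any cloud setup.
function buildResponse(event: GeekRequest): HttpResponse {
  const params = event.queryStringParameters;
  const geek = params ? params["geek"] : undefined;
  if (!geek) {
    return { statusCode: 400, body: JSON.stringify({ error: "missing geek parameter" }) };
  }
  return { statusCode: 200, body: JSON.stringify({ geek: geek, message: "stats for " + geek }) };
}

// The function the platform would invoke once per request.
// In a real deployment it would be exported so the platform can find it.
async function handler(event: GeekRequest): Promise<HttpResponse> {
  return buildResponse(event);
}
```

The point of the split is that the handler itself stays tiny: it holds no state between calls, which is what lets the platform run one copy a month or a thousand at once.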

That brings me to another problem with Extended Stats. I’m approaching the 3000 user mark, and the bandwidth required is making an impression on our home quota. As we’re watching more TV across the internet now, and downloading PlayStation games, there could be problems. And I’d love to take Extended Stats up to 10000 or 100000 users. And that would be a problem.

So implementing the system with AWS Lambda has these advantages:

  • infinite scalability
  • does not use my home bandwidth
  • can be accessed from my laptop, even if I’m on holidays
  • does not require me to go to the study
  • fun to play with
  • different lambdas can be written in different languages so I can muck with new things.

That solves most of the problems I was having, except for the disadvantage:

  • all the code has to be different.

But hey, I’m a programmer! I can fix that bit! In fact I love doing that bit.

I came up with this plan about halfway through 2017, and started mucking around a bit towards the end of the year. However there’s not really a standard way to do that sort of architecture yet, so I had to do some experiments which were often abysmal failures. That got a bit depressing so my interest sort of drifted in and out. Then in June this year I decided to have another bash at some of the boring bits, and got it to work. And then I got some more bits to work, and next thing you know the whole plan was coming together. So then I started a blog to keep people up-to-date on where the project was at.

Now, I admit, I could have kept some of the Python. There is a variant of Python that uses types. But even Python is broken these days. After version 2.7 of Python (I think I started on 1.5.2 or something), they went to version 3 which was radically different. So even if I did keep the Python there was a lot of rewriting to do. So… nope.

As it turns out I’m doing most of the coding in TypeScript on Node.js, which I didn’t know much about and am still not very impressed with, even though it’s doing the job.

I did decide to keep the SQL schema. They have NoSQL databases these days, but a lot of the Extended Stats data really is relational, and there’s not so much value in changing to NoSQL despite it being flavour of the month. Although the guys at the Sydney MongoDB meetup are really nice guys and I enjoy going to see them. And then, when I say “I decided to keep the SQL schema”, what I really mean is “I started with the old SQL schema, didn’t like a lot of things and changed them pretty drastically.” Creators gonna create, you know.

I really like where the new system is going. As mentioned in an earlier post, I got my monthly bill and it’s cheap, I’m fixing a lot of the bad architectural choices the first system evolved into, and I’m learning oh-so-much stuff and have oh-so-many great ideas. I hope you all can enjoy this as much as I do.

Ooh, I got the bill!

 

One of the thrilling things about cloud computing is sticker shock. It’s OK once you’ve established your pattern of usage and know roughly how much you’ll pay each month, but for an experimenter like me, strange things can happen. Like that time last year I tried something out, it didn’t work, then a month later I got a bill for $40 because I hadn’t deleted a thing and I didn’t even know what the thing did. So it was that this month I was a bit worried, as the site had been running somewhat properly for a few weeks.

So after that stress, $12 is good news! It suggests that I’m doing something right and that I can keep this up.

I’ll explain what the bits are. RDS is Relational Database Service – the database. AWS does automatic backups and so on – we tried it at work the other day, it’s pretty nice – so for that money they’re keeping our data safe. EC2 is the virtual computer that the blog runs on. It’s a tiny one, so it could be free, but I think I used up my quota of free EC2s already. Route 53 is the DNS (Domain Name System), i.e. the bit that tells the world where stats.drfriendless.com, extstats.drfriendless.com, api.drfriendless.com and blog.drfriendless.com are – they are all different computers, if they exist at all. Data transfer is the cost of downloading from BGG; 4c seems reasonable. Of the “other” 5c, 4c is for S3 (Simple Storage Service), which is like a big hard drive, and 1c is for API Gateway, which is the bit that implements api.drfriendless.com by turning API requests into calls to Lambdas.

There’s no charge for Lambdas, as I seem to be under the free limit still. When I get more stuff going I might have to pay 20c for that. However that is where I’ve put most of my effort recently – trying to keep them small, and trying to make them do their job and not run uselessly.

I love it when a plan comes together.

Collections and Mechanics

There has been much activity on the site this weekend. I guess I should explain a little how the site works… I’m using an architecture called “serverless”, where I code each thing that the site does separately (as a thing called a Lambda). So far there are 17 different Lambdas that all beaver away. Most of them at the moment are cooperating to populate the database. Amazon charges me money depending on how many times they are all invoked and how big they are – here’s a graph of the number of invocations:

I received email from Amazon saying “you’ve used so many of those it’s almost not free any more”. That wasn’t alarming in itself, but what it told me was “the less attention you pay to costs, the more this will cost you”. So I did some research on how to make the Lambdas smaller, and how many I would need and so on. My guess at the moment is that Lambdas will cost me about $1 per thousand users per month. As long as I’m careful / skilful.

So that was why I spent some time doing invisible stuff that made the site cheaper to run. At the moment it looks like the database will be by far the most expensive part of the site (maybe $200 / year), and my plan is to eventually design that cost away. So it’s all good as long as I pay attention.
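For the curious, those two guesses can be turned into a back-of-envelope projection. This is only a sketch: it assumes Lambda cost grows linearly with the number of users and that the database stays at a flat $200 a year, neither of which is guaranteed, and it ignores the smaller items on the bill (EC2, Route 53, data transfer and so on). The function and constant names are made up for the example.

```typescript
// Figures quoted in the post; both are rough guesses, not measured prices.
const LAMBDA_DOLLARS_PER_1000_USERS_PER_MONTH = 1;
const DATABASE_DOLLARS_PER_YEAR = 200;

// Assumes Lambda cost scales linearly with users and the database cost is flat.
function estimateYearlyCost(users: number): number {
  const lambdaPerYear = (users / 1000) * LAMBDA_DOLLARS_PER_1000_USERS_PER_MONTH * 12;
  return lambdaPerYear + DATABASE_DOLLARS_PER_YEAR;
}

for (const users of [3000, 10000, 100000]) {
  console.log(users + " users: about $" + estimateYearlyCost(users) + " per year");
}
```

Under those assumptions the database dominates until the user count gets very large, which is why designing that cost away matters more than shaving the Lambdas further.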

But after doing that I had a look at funding the site with ads. It’ll never be cost-free, and I don’t plan it to be profitable, but I *would* like the site to be cost-neutral. I was going to put Amazon Affiliate ads on the site, but Amazon said “nope, drfriendless.com is not an appropriate name for putting our ads on”, so that plan died very quickly. So then I signed up for Google Adsense, and the job of actually sticking those ads in is yet to be done.

Just have a look at this page and see if one is there yet:

http://extstats.drfriendless.com/

That’s the stats stats page that I can quickly look at to see whether stuff is progressing. It is progressing, slowly, but I don’t want to crank up the rate of work until more things are working. By my calculations, at full speed I would need to fire off 100 tasks every minute to support 3000 users. At the moment I’m just doing 10 per minute.
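Those numbers can be unpacked with a little arithmetic. The post doesn't say what counts as a task or how the 100-per-minute figure was worked out, so the sketch below only shows what the quoted figures imply if the per-user workload stays constant; the function names are invented for the example.

```typescript
const MINUTES_PER_DAY = 24 * 60;

// Tasks each user gets per day at a given overall task rate.
function tasksPerUserPerDay(tasksPerMinute: number, users: number): number {
  return (tasksPerMinute * MINUTES_PER_DAY) / users;
}

// Users a given task rate could support at a fixed per-user workload.
function usersSupported(tasksPerMinute: number, perUserPerDay: number): number {
  return Math.floor((tasksPerMinute * MINUTES_PER_DAY) / perUserPerDay);
}

// 100 tasks a minute spread over 3000 users.
const fullSpeedPerUser = tasksPerUserPerDay(100, 3000);

// The same per-user workload at the current 10 tasks a minute.
const usersAtCurrentRate = usersSupported(10, fullSpeedPerUser);

console.log(fullSpeedPerUser + " tasks per user per day at full speed");
console.log(usersAtCurrentRate + " users supportable at the current rate");
```

In other words, full speed works out to 48 tasks per user per day, and the current rate would keep up with about 300 users at that same workload.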

What does stuff working even mean? The site has been downloading game information for a couple of days, but it was throwing away the categories and mechanics. So I added the ability to record those. I also cleaned up the database so that I don’t store categories and mechanics by name everywhere, which will make this site more efficient than the old one (and hopefully faster; goodness me, the old site is slow). That involved some work with foreign key constraints and async / await stuff which is a bit beyond the scope of this blog, but made me happy. And now categories and mechanics are going into the database.
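The idea of not storing names everywhere can be sketched in memory, without a database. This is an illustration only: the real schema isn't shown here, so `LookupTable`, `GameCategoryRow` and `linkGame` are hypothetical names, and in the actual system the lookup would be a table with the link rows pointing at it through a foreign key.

```typescript
// Hypothetical row shapes; the real schema is not shown in the post.
interface LookupRow { id: number; name: string; }
interface GameCategoryRow { gameId: number; categoryId: number; }

// Stands in for a categories (or mechanics) table: each name is stored once.
class LookupTable {
  private readonly idsByName = new Map<string, number>();
  private nextId = 1;

  // Return the existing id for a name, or allocate a new one.
  idFor(name: string): number {
    const existing = this.idsByName.get(name);
    if (existing !== undefined) {
      return existing;
    }
    const id = this.nextId++;
    this.idsByName.set(name, id);
    return id;
  }

  rows(): LookupRow[] {
    const result: LookupRow[] = [];
    this.idsByName.forEach((id, name) => result.push({ id: id, name: name }));
    return result;
  }
}

// Stands in for the link table: games refer to categories by id, not by name.
function linkGame(gameId: number, names: string[], table: LookupTable): GameCategoryRow[] {
  return names.map(name => ({ gameId: gameId, categoryId: table.idFor(name) }));
}
```

Two games that share a category end up pointing at the same id, so the name itself lives in exactly one place.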

The next step will be some sort of War Page and some sort of Front Page (with game rankings). I’ve decided those are the minimal features I require before I tell new users that the site exists. And the War Page will link to the user-specific pages, as the old site does.

Well, that’s enough chit-chat, I’m going back into the code mines.

The Mostly Useless Collection Widget Just Got a Bit Nicer But No Less Useless

In the world of software development, it’s a common thing that you will get some piece of packaged software, stick it into your application, and you’ll get that functionality for free. It’s almost as common a thing that you’ll get some piece of packaged software, stick it into your application, spend 2 hours trying to get it to work, then decide that it’s a piece of crap that doesn’t really work at all. So that’s what I’ve been doing this morning. There’s a rudimentary collection widget that you can currently see on a page like this:

http://extstats.drfriendless.com/collection.html?geek=tallboy

(stick your own BGG user ID in the URL). Today’s job, which took a mere 6 hours, was to put the data in a table which supports sorting and paging. I hope not every task on this project takes 12 times longer than it should.
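Stripped of the widget itself, sorting and paging comes down to something like the sketch below. It is not the code behind the collection page, which uses a packaged table component; the `CollectionRow` columns and the `sortAndPage` function are assumptions made up to show the idea.

```typescript
// Hypothetical columns; the real collection data has its own shape.
interface CollectionRow {
  name: string;
  rating: number;
  plays: number;
}

// Sort a copy of the rows by one column, then return a single page of them.
// Pages are numbered from zero.
function sortAndPage(
  rows: CollectionRow[],
  key: keyof CollectionRow,
  descending: boolean,
  page: number,
  pageSize: number
): CollectionRow[] {
  const sorted = rows.slice().sort((a, b) => {
    if (a[key] < b[key]) return descending ? 1 : -1;
    if (a[key] > b[key]) return descending ? -1 : 1;
    return 0;
  });
  const start = page * pageSize;
  return sorted.slice(start, start + pageSize);
}

const sampleRows: CollectionRow[] = [
  { name: "Agricola", rating: 8, plays: 20 },
  { name: "Carcassonne", rating: 7, plays: 50 },
  { name: "Tigris & Euphrates", rating: 9, plays: 5 },
];

// First page of two rows, most-played first.
console.log(sortAndPage(sampleRows, "plays", true, 0, 2).map(row => row.name));
```

Sorting a copy rather than the original array means the table can be re-sorted by another column without losing the order the data arrived in.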

Hey, There’s a Blog!

I need to start onboarding people to the new site, because some people have been waiting months for any acknowledgement from me that they can have stats. And if I’m going to give users a site under development, I’d better at least have some way of telling them how that development is going. So, there’s a blog. I am not a very arty person so it’s likely to be a kinda clunky boring blog.

Anyway, this evening I’ve been working on getting some bugs out of the downloader. Some people on the user list have deleted their BGG accounts, and the downloader was very confused about why they had no games. So I had to get those people off the list and then get the downloader to delete them from the database.

I also added a dashboard so I can see which Lambda functions are giving errors. That has been extremely useful – you can see the errors dropped off at the end of the night when I’d got some of the bugs out.

For those who know about AWS, the blog is hosted on a teensy weensy EC2 instance. So that bit is not serverless.