Dithering

Well, I had a bit of a disaster during the week. On Thursday morning my laptop stopped charging, and now it’s in the shop being repaired for a hefty fee. It’ll be back some time during the week.

Luckily I have this other, smaller, laptop, so I can still do my work. It only took 7 hours to update it with all the work stuff… well, I guess that’s quicker than a lot of my other problems get resolved.

The bad news is that this laptop seems to be too underpowered to work on Extended Stats, which is not that big of a project, but it is a kinda complex one – it includes about a dozen Angular applications, and a couple of dozen Lambdas in a variety of locations, and I think the editor gets confused by the non-standard organisation.

Even worse, one of the bits of code that’s on the broken laptop is the autocomplete demo, which is one reasonably small thing that I could potentially have worked on on the smaller laptop. So that sucks as well.

The lumpy bits aren’t so lumpy!

Nevertheless, there is something to talk about. The work I did last weekend on putting the code in Elastic Beanstalk was aimed at preventing the really spiky bits in the Lambda performance graph (because they cost me money). Over the last couple of days we’ve been going through a period which should be spiky, and it has definitely changed.

By the way, these bits correspond to the update of all the users’ collections and played games, which happens every three days. I did two things.

First of all, I rewrote the bit that was appearing light blue in the bottom graph, because as explained in this post:

Bugs Bugs Bugs!

it was costing me a lot of money.

The second thing I did was to change “every 3 days” to “about every 3 days”. So when I schedule an update, I don’t schedule it for exactly 3 days later, because all that does is preserve the periodic load; instead I nudge the time by a small random amount. And as you can see, that flattened out the load a bit, and should continue to do so. I was calling this “dithering” to myself, but having now checked Wikipedia, dithering may be something more specific than that. Nevertheless, that plan is sort of working.
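For the curious, here’s a minimal sketch of the idea, with made-up numbers and a made-up function name (the real scheduling lives in the update code):

```typescript
// Instead of scheduling "exactly 3 days from now", schedule "3 days give or
// take up to 12 hours". Over a few cycles the users' updates drift apart
// instead of all landing in the same periodic spike.
const THREE_DAYS_MS = 3 * 24 * 60 * 60 * 1000;
const MAX_JITTER_MS = 12 * 60 * 60 * 1000;

function nextUpdateTime(lastUpdate: Date): Date {
  // random offset in [-MAX_JITTER_MS, +MAX_JITTER_MS)
  const jitter = (Math.random() * 2 - 1) * MAX_JITTER_MS;
  return new Date(lastUpdate.getTime() + THREE_DAYS_MS + jitter);
}

console.log(nextUpdateTime(new Date()).toISOString());
```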

I need to stop the green line from bottoming out.

On the other hand, as Tevye would say, it didn’t seem to really help the database performance. The orange line is how hard the database is working, and making that change seemed to spread the peak out, but increase its total area. And then that caused the green line to hit the bottom for longer. And that’s bad. I’m hoping that with more dithering the green line might not get to the bottom at all.

I miss my big laptop. You don’t know what you’ve got till it’s gone.

Foiled Again!

As it’s Monday morning and I’m supposed to be going to work, this one will be short. I spent yesterday working on the autocomplete functionality, and after a great deal of mucking about with Express security, I got it working. It looks lovely! However, when I went to put it on the site, the application I use to copy all of my HTML and so on up to AWS just refused to do anything. Computer says no.

That’s the “fun” and frustration of this project. The tech is pretty much bleeding edge, so stuff just fails quite often. My test site, http://test.drfriendless.com, is completely broken at the moment!

In Which Jack Climbs the Elastic Beanstalk

Woohoo, I got Elastic Beanstalk to work! The “Express & EB” sticky note has been removed! Actually I got Elastic Beanstalk to work on about Tuesday, and today I’ve been learning more about getting Express to do stuff.

I don’t know what it all means either.

The first thing to do was a URL to find geeks by a partial name. For example, http://eb.drfriendless.com/findgeeks/Fri will give you up to 10 geeks whose names start with “Fri”. And http://eb.drfriendless.com/findgeeks/boy will give you up to 10 geeks whose names start with “boy”; because there are none, it will instead give you up to 10 geeks with “boy” somewhere in their name. This is how autocomplete fields work, but I haven’t written the autocomplete field yet, so I can’t show you what the plan looks like.
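The route itself is simple enough. Here’s a sketch of the shape of it, with an in-memory list standing in for the real database query, so the details are illustrative rather than exact:

```typescript
import express from "express";

const app = express();

// Stand-in for the real user table; the actual server queries the database.
const geeks: string[] = ["Friendless", "Friedrich", "Fribble", "boardgamer"];

// Up to 10 geeks whose names start with the partial name; if there are none,
// fall back to up to 10 geeks with the partial name anywhere in their name.
app.get("/findgeeks/:partial", (req, res) => {
  const partial = req.params.partial.toLowerCase();
  let matches = geeks.filter(g => g.toLowerCase().startsWith(partial));
  if (matches.length === 0) {
    matches = geeks.filter(g => g.toLowerCase().includes(partial));
  }
  res.json(matches.slice(0, 10));
});

app.listen(process.env.PORT || 3000);
```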

Update: here is the working autocomplete field:

Soldiering On

I penetrated the impenetrable twaddle! After getting very frustrated last week with AWS security, I did some research, and finally found a description of a few things that I could understand. I now basically understand what a virtual private cloud is, and sort of how it works. That was really awesome news on Monday. Then on Tuesday we had some system problems at work and I got distracted sorting out other (AWS) stuff.

I was finally able to get back to the Express on Elastic Beanstalk project today. As I now understood the VPC stuff, I configured EB to use the VPC I already had. And then I commanded “Work now!” And behold, it did not.

It sure would be nice if this graph was a flat line at the “OK” level

I screwed around with it for a few hours, until I came to the conclusion that the reason it wasn’t working was because it thought it wasn’t working. No really, that makes sense.

Elastic Beanstalk is designed for implementing a web site (which looks like one computer) by using a bunch of computers in the background. If one of those bunch of computers goes bad for some reason, EB throws it out and replaces it with a new one. So to do this, EB has to know how to tell whether one of those computers is bad. And for a long time, I had that wrong.
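To make that concrete: the load balancer pings a health-check URL every so often, and if it doesn’t get a 200 back it decides the instance is sick and eventually replaces it. Something like this, though the actual path is whatever the environment is configured to check:

```typescript
import express from "express";

const app = express();

// The EB load balancer periodically requests a health-check URL and expects a
// 200 response; anything else (or no handler at all) makes the instance look
// "bad", and EB will cycle it even though the rest of the app is fine.
app.get("/health", (_req, res) => {
  res.status(200).send("OK");
});

app.listen(process.env.PORT || 3000);
```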

However even after sorting that out, I still can’t get it to work. I’m beginning to suspect it’s a networking problem again – although Express “works” when I run it on my own machine, it doesn’t work when I connect to the load balancer (the part of EB which farms work off to the worker machines), nor when I try to connect to the individual machines directly. My recollection of how we do this at work is that that *should* work. So I’m back to trying to figure out networking. And given that I’ve only known that stuff since Monday, maybe I don’t understand it as well as I thought…

I’ve also been considering whether I should actually get back to working on functionality (i.e. pretty things) instead of this AWS navel-gazing. But I should not. The bill for October (which I still have not blogged about) was a bit high, and I need to get this project working to cut down the costs. While it’s costing me too much, this project is at risk, so I need to get that stuff sorted.

The good news is that I was plagued by problems like this for about a year while I was trying to get this project off the ground, and with perseverance and experience I managed to figure them out. I feel like I’m making progress, even if it’s just developing a callus where I’m bashing my head on the wall.

Sticky Notes!

Cherry Blossoms with To-Do List, a montage by John Farrell

I’m struggling to get back into my motivated groove for Extended Stats since all the holiday stuff. Although my wife is away and hence is unable to distract me, the dog is not and she demands an inordinate amount of attention. I’m also doing a lot of cooking of things that I can’t make for my vegetarian wife. This evening’s nasi goreng was pretty good if I do say so myself. But not much coding got done.

Today’s project was the top-right sticky note, which says “Express & EB”. Express is node.js software for writing a web server, and EB is Elastic Beanstalk, an AWS technology for scaling web servers – essentially you tell it “Here’s my code, put it on a machine. If it gets busy start some more machines running the same code.” I use EB at work, but I’ve never used Express before.

I’ve been getting the feeling that AWS Lambda is a little expensive for some of the things I want to use it for.  In particular, the note below “Express & EB” which says “geek buddies”. The plan for that note is that if you’re logged in you’ll be able to tell the site which other users are in your game group / family / other peer group. And the plan for that involves an autocomplete field where you start typing a BGG user’s name, and I give you all the valid completions for what you’re typing.

That’s easy enough to code, but it usually involves one HTTP call for each character typed, and as I pay for each HTTP call to a Lambda, I’m not very keen on that. So the plan is to make an Express server which can handle very tiny calls like that for a pretty much flat rate of a couple of bucks a month. That should decrease my Lambda costs (which I should also write a post about).

However, today’s plan to run Express on Elastic Beanstalk foundered at the bit where I tell Elastic Beanstalk about the database. Omigod, AWS security is basically impenetrable twaddle. At least, it is to me. At work we tell the servers about the database a different way which I don’t want to use here, so I would like to configure the database in EB. I ended up with EB and the database both in virtual private clouds, but in different virtual private clouds. Is that good? I don’t know. It was like when you can’t find two socks the same colour. So in the end I got cranky and dumped the EBs and will try to do some reading so I can get half a clue for the next attempt.

In happier news, I did manage to fix a bug. BGG started sending ratings for games people hadn’t rated as “N/A” instead of just leaving them out entirely, so I was trying to store “N/A” as a number, which broke stuff. Thanks to a new user for pointing out to me that something was broken.
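The fix boils down to treating “N/A” (or anything else that isn’t a number) as “no rating” rather than a value to store. Roughly this, though the real parsing code lives in the Lambda that reads the BGG data:

```typescript
// Parse a rating from BGG, treating "N/A" and other non-numeric values as "unrated".
function parseRating(raw: string): number | undefined {
  if (raw === "N/A") return undefined;
  const rating = parseFloat(raw);
  return isNaN(rating) ? undefined : rating;
}

console.log(parseRating("7.5")); // 7.5
console.log(parseRating("N/A")); // undefined
```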

The next bug I found was that if you have many, many plays in a month, for example 3750, BGG tells me to stop asking for so many files all the time. As the Lambda wants to get its job done quickly, because I’m being billed for the time it runs, I can’t just sit around and do nothing. So I need to come up with a plan for dealing with many pages of plays over several Lambda invocations. Something nice-ish will spring to mind eventually, I suppose. That technique could possibly also be adapted for users with very large BGG collections, e.g. 100000 games, which I have seen. Where’s my thinking cap? I need my thinking cap.
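One shape that plan could take (purely hypothetical, none of this exists yet) is to have the Lambda process pages until it’s close to its time budget, then hand the remaining pages to a fresh asynchronous invocation of itself rather than sitting around doing nothing:

```typescript
import { LambdaClient, InvokeCommand } from "@aws-sdk/client-lambda";

const lambda = new LambdaClient({});
const TIME_BUDGET_MS = 60_000; // stop well before the Lambda timeout

// Hypothetical job payload: whose plays, which month, and where we're up to.
interface PlaysJob {
  geek: string;
  month: string;
  page: number;
}

export async function handler(job: PlaysJob): Promise<void> {
  const start = Date.now();
  let page = job.page;
  let morePages = true;

  // Process as many pages as fit in this invocation's time budget.
  while (morePages && Date.now() - start < TIME_BUDGET_MS) {
    morePages = await processPlaysPage(job.geek, job.month, page); // hypothetical helper
    page++;
  }

  // Anything left over becomes a new asynchronous invocation of this same Lambda.
  if (morePages) {
    await lambda.send(new InvokeCommand({
      FunctionName: process.env.AWS_LAMBDA_FUNCTION_NAME,
      InvocationType: "Event",
      Payload: Buffer.from(JSON.stringify({ ...job, page })),
    }));
  }
}

// Placeholder: fetch and store one page of plays from BGG, and report whether
// there are more pages still to fetch.
async function processPlaysPage(geek: string, month: string, page: number): Promise<boolean> {
  return false;
}
```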