Wednesday, November 6, 2013

When will Sagemath Cloud hit 100,000 user accounts?

The Sagemath Cloud is an online environment for computational mathematics. Get an account, log in, and your web browser transforms into almost everything you need to study algebra, calculus, numerics, statistics, number theory, game theory, (computational aspects of) physics, chemistry, and all other quantitative sciences. It's built around all sorts of tools and utilities offered in a proper Linux environment. All files can be shared with collaborators and edited in real time without stepping on each other's toes. Oh wait, there is also a LaTeX editor: provided you know LaTeX, qualitative sciences are also covered ;-)

But beware, this post covers only the very first step: accounts. I'll explain how I created this page and the plot below, and how I let them update automatically every hour, all done solely in SMC!

SMC accounts over time


Sagemath Cloud is very public about what is happening on its servers. This stats link gives you the raw data for the current overall load of the machines. Over the last few weeks, I did the following: every hour, I downloaded this stats JSON file, parsed it, accumulated some interesting numbers, and stored the processed data in a CSV file.

I did all this in SMC, because it allows you to run your own crontab files. Cron periodically checks these files to figure out whether there is a job to do. Just enter a magical line in your own crontab file, and the given command runs at the times you specify. You do not have to be logged in!

In my case, it's like this:
First, edit your crontab file:
$ crontab -e
Then enter this line:
0 * * * * python $HOME/get.py

The $HOME is important, because you have to specify the full path to your script. (By the way, if you have to start something when the SMC server reboots, use the @reboot descriptor.)

So, what is get.py doing? It uses the wonderful requests Python library to retrieve and parse the stats url and extracts some data. Then it appends a line to a CSV file.
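A minimal sketch of what such a script could look like follows. It is not the original get.py: the URL is a placeholder, and the JSON keys ("accounts", "projects") and the CSV file name are assumptions made up for illustration.

```python
import csv
import os
from datetime import datetime, timezone

# Placeholder, NOT the real SMC stats URL.
STATS_URL = "https://example.invalid/stats"
# Hypothetical output file and hypothetical JSON keys.
CSV_PATH = os.path.join(os.path.expanduser("~"), "stats.csv")
FIELDS = ["accounts", "projects"]


def fetch_stats(url=STATS_URL):
    """Download the stats document and return it as a dict."""
    # Third-party library; imported here so the CSV helper below
    # can be used even where requests is not installed.
    import requests

    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.json()


def append_row(stats, csv_path=CSV_PATH, now=None):
    """Append one timestamped line to the CSV, writing a header first if the file is new."""
    now = now or datetime.now(timezone.utc)
    is_new = not os.path.exists(csv_path)
    with open(csv_path, "a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["timestamp"] + FIELDS)
        row = [now.strftime("%Y-%m-%d %H:%M:%S")]
        row += [stats.get(key) for key in FIELDS]
        writer.writerow(row)
```

The hourly cron job would then boil down to a single call, append_row(fetch_stats()). Writing the timestamp as the first column keeps the file easy to parse as a time series later on.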

Two minutes later (the crontab line starts with a "2"), another script is called which processes this CSV file. It reads the columns via pandas, parses the dates into a proper time series, and lets me do all sorts of analysis, transformations, and plots with it. For example, the plot shown above overlays the raw time series with OLS fits done by statsmodels for selected time ranges (where the curve looks "flat", i.e. roughly linear). We can see the growth trends clearly! Even more important, the growth rate has been increasing so far, which suggests we are watching the exponential phase at the beginning of the usual logistic growth.

The other plots on this statistics page show aggregated time-series data. For example, the number of concurrent connections is also increasing, which suggests that SMC is scaling well.
Concurrent connections to SMC
On this statistics page, there are also a few "dynamic" fields in the HTML content. This is done with jinja2: the template "stats.tmpl" contains the HTML code along with "mustache"-style variables. Jinja2 renders this template with some variables and that's it.

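from datetime import datetime

import jinja2 as j2

# Load the template from the current directory.
env = j2.Environment(loader=j2.FileSystemLoader("."))
stats = env.get_template("stats.tmpl")
data = {
    'date': "%s UTC" % datetime.utcnow(),
    # `totals` is the pandas DataFrame built earlier in the script;
    # .iloc selects the last 24 rows (the .ix indexer was removed from pandas).
    'recent_data': totals.iloc[-24:].to_html(),
}
# Text mode, because render() returns a string.
with open("stats.html", "w") as output:
    output.write(stats.render(**data))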
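
To make the "mustache"-style variables concrete, here is a small self-contained sketch. The template text is made up for illustration and is not the real stats.tmpl; only the variable names date and recent_data are taken from the script above.

```python
import jinja2

# Hypothetical template text; the real stats.tmpl is not shown in this post.
TEMPLATE_TEXT = """<html><body>
<p>Last update: {{ date }}</p>
{{ recent_data }}
</body></html>"""


def render_stats(date, recent_data_html):
    """Fill the two placeholders and return the finished HTML page."""
    template = jinja2.Template(TEMPLATE_TEXT)
    return template.render(date=date, recent_data=recent_data_html)


html = render_stats("2013-11-06 12:00 UTC", "<table><tr><td>42</td></tr></table>")
```

Because autoescaping is off by default for a plain jinja2.Template, the HTML table is inserted as markup rather than as escaped text.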

The last step of the script is to actually publish the files to the webserver. That's rather straightforward. First, and only once, create ssh keys via ssh-keygen without a password and then use ssh-copy-id -i ~/.ssh/id_dsa.pub name@server to copy over your public key. Subsequent ssh connections will be established without any questions asked, because the remote server knows your identity. I'm using scp to copy the files: scp *.png stats.html name@remote-server:~/target/dir/
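
If the copy is triggered from within the Python script, one possible way to do it is sketched below. The helper names are hypothetical and the target is the same placeholder as in the scp command above. Note that subprocess does not expand "*.png" unless a shell is involved, so the sketch uses glob for that.

```python
import glob
import subprocess


def build_scp_command(files, target):
    """Return the scp argument list for copying the given files to target."""
    if not files:
        raise ValueError("no files to publish")
    return ["scp"] + sorted(files) + [target]


def publish(target, pattern="*.png", page="stats.html"):
    """Copy all plots matching the pattern plus the HTML page to the remote server."""
    files = glob.glob(pattern) + [page]
    # check=True raises CalledProcessError if scp exits with an error.
    subprocess.run(build_scp_command(files, target), check=True)
```

A call such as publish("name@remote-server:~/target/dir/") would then rely on the passwordless ssh keys set up beforehand.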

Last but not least, when will SMC hit the 100,000 user mark? In less than 6 days the 10,000 mark should be crossed, and I hope the trend continues in the upward direction.
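
For the curious, the arithmetic behind such an estimate can be sketched in a few lines. All numbers in the usage note below are made up for illustration and are not the real SMC figures; the linear version extrapolates a fitted slope, and the exponential version assumes a constant doubling time.

```python
import math


def days_until_linear(current, target, per_day):
    """Days until target is reached if growth continues at a constant rate."""
    if per_day <= 0:
        raise ValueError("per_day must be positive")
    return max(0.0, (target - current) / per_day)


def days_until_exponential(current, target, doubling_days):
    """Days until target is reached if the count doubles every doubling_days."""
    if current <= 0 or doubling_days <= 0:
        raise ValueError("current and doubling_days must be positive")
    return max(0.0, doubling_days * math.log2(target / current))
```

With an invented count of 9,500 accounts and an invented slope of 100 new accounts per day, days_until_linear(9500, 10000, 100) gives 5.0 days. If growth really is exponential, the linear estimate is too pessimistic; if the logistic curve has started to level off, both estimates are too optimistic.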