Sunday, November 29, 2009

Maintaining a Mirror Network

Here are some notes about maintaining a mirror network. I'm talking about the Sage software, which is a nice open source mathematics program. Besides the source code, there are many prebuilt binaries for various platforms. In total, for each monthly release, about 14 GB in 27 files have to be moved to 18 mirror websites on all continents. Nearby mirrors are important, because network connectivity can be weak in some areas. In Nov. 2009, the VirtualBox image for Windows alone generated about 1.5 TB of traffic, and all other binaries and the source combined probably another 1.5 TB. Pushing one release out costs roughly 14 GB × 18 mirrors ≈ 250 GB, while about 3 TB are downloaded from the mirrors, so the mirror network multiplies the outgoing data volume roughly tenfold.

To ensure that only mirrors that are online and up to date are listed on the website, a small Python script checks a time-stamp file with a content-specific checksum on each mirror. A mirror is included only if this file is retrievable and correct. The check runs every 10 minutes via Linux's cron mechanism.
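A minimal sketch of such a check, assuming each mirror publishes the time-stamp file at a fixed path and that a mirror counts as current exactly when its copy matches the master's copy. The file name zzz_timestamp.txt and the function names are hypothetical; this is not the actual script. Cron would run it with an entry like "*/10 * * * *".

```python
import http.client
import urllib.request
from typing import Dict, List, Optional

# Hypothetical name of the time-stamp file at each mirror's root.
TIMESTAMP_FILE = "zzz_timestamp.txt"


def fetch(url: str, timeout: float = 20.0) -> Optional[bytes]:
    """Return the body at url, or None if it cannot be retrieved in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except (OSError, ValueError, http.client.HTTPException):
        # DNS errors, refused connections, HTTP errors, timeouts, bad URLs.
        return None


def select_mirrors(bodies: Dict[str, Optional[bytes]], expected: bytes) -> List[str]:
    """Keep mirrors whose time-stamp file was retrieved and matches the master's copy."""
    return [name for name, body in bodies.items()
            if body is not None and body.strip() == expected.strip()]


def check_all(mirrors: Dict[str, str], master_url: str) -> List[str]:
    """Fetch every mirror's time-stamp file and return the usable mirrors.

    mirrors maps a display name to a base URL ending in "/".
    """
    expected = fetch(master_url + TIMESTAMP_FILE)
    if expected is None:
        return []  # without the reference copy nothing can be verified
    bodies = {name: fetch(base + TIMESTAMP_FILE) for name, base in mirrors.items()}
    return select_mirrors(bodies, expected)
```

Keeping the comparison in select_mirrors separate from the network code makes the selection rule easy to test without touching any mirror.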

Recently I extended this to visualize how the mirrors perform over time. This is especially interesting when a new release is mirrored out into the world. To see what this looks like, here is the graphic for the Sage 4.2.1 rollout:

Time starts at the bottom; you can see it's around 14:00 GMT on Nov. 17th. The first working mirror is "UW", which is the master. The master mirror is excluded once at least one other mirror in North America is online; as you can see, those are SFU and Boston (Harvard). You can also see that it takes more than one day to sync all mirrors. Possible reasons: the transfer itself takes time, some mirrors are probably scheduled to sync only once a day, the master rejects too many simultaneous connections, too many users download directly from the main server, or there is simply some other timeout! Why are there two breaks? Not all binaries are ready at the same time, and the timestamp covers the whole mirror, so that you never access an outdated mirror. The broken line of Yandex is also a bit odd, but that's because they have multiple servers and seem to switch between them.
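The rule for hiding the master fits in a few lines. A sketch, assuming each mirror name is mapped to a region label; the region strings and the function name are illustrative, not taken from the actual script:

```python
from typing import Dict, List

MASTER = "UW"  # the master mirror, as labelled in the graphic


def listed_mirrors(online: List[str], region: Dict[str, str]) -> List[str]:
    """Return the mirrors to show on the website.

    The master is hidden as soon as another North American mirror is online;
    the region labels are an assumption of this sketch.
    """
    other_na_online = any(
        m != MASTER and region.get(m) == "North America" for m in online
    )
    if other_na_online:
        return [m for m in online if m != MASTER]
    return list(online)
```

With only UW and Yandex online, the master stays listed; as soon as SFU comes up, UW disappears from the list.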

How does it look right now? Here is the website for the mirror log visualization. There is also a link to how it looked last week.

Beyond just retrieving a file through HTTP or FTP, you can use a metalink file. With a suitable client, it enables downloading from several mirrors simultaneously, and the client automatically selects a few of the fastest ones. It also checks consistency, and downloads are resumable. That's the ideal solution if your connection is weak or flaky, or if you need to pause a huge transfer for some time. Hint: use aria2 or DownThemAll.
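To illustrate what such a file contains, here is a small generator. It is a sketch assuming the Metalink 4 format standardized in RFC 5854, which postdates this post (files from 2009 used Metalink 3.0); the file name, size, hash and URLs passed in are placeholders.

```python
import xml.etree.ElementTree as ET
from typing import List

NS = "urn:ietf:params:xml:ns:metalink"  # Metalink 4 namespace (RFC 5854)


def make_metalink(name: str, size: int, sha256: str, urls: List[str]) -> str:
    """Build a Metalink 4 document listing several mirror URLs for one file.

    Earlier URLs get a lower priority value, which Metalink treats as preferred.
    """
    root = ET.Element("metalink", xmlns=NS)
    entry = ET.SubElement(root, "file", name=name)
    ET.SubElement(entry, "size").text = str(size)
    # The whole-file checksum lets the client verify the finished download.
    ET.SubElement(entry, "hash", type="sha-256").text = sha256
    for priority, url in enumerate(urls, start=1):
        ET.SubElement(entry, "url", priority=str(priority)).text = url
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

The resulting file can then be handed to a client such as aria2, which chooses among the listed mirrors and checks the hash.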

Ok, finally some words about hosting a mirror. Two things are important: sync often, so no time is wasted when data could be transferred, but never run more than one sync at a time ;) I've written up how I set up a mirror (the Boston one) on a MirrorNetwork wiki page.
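A minimal sketch of the "only one sync at a time" rule, assuming the sync is an rsync run started from cron: a non-blocking file lock makes a new run skip itself while the previous one is still busy. The lock path, upstream URL and target directory are hypothetical, and fcntl.flock limits this to Unix-like systems.

```python
import fcntl
import subprocess
import sys
from typing import List, Optional

LOCK_FILE = "/tmp/sage-mirror-sync.lock"  # hypothetical lock path
# Hypothetical upstream and local directory; replace with the real ones.
RSYNC_CMD = ["rsync", "-av", "--delete",
             "rsync://upstream.example.org/sage/", "/srv/mirror/sage/"]


def run_exclusive(lock_path: str, cmd: List[str]) -> Optional[int]:
    """Run cmd only if no other sync holds the lock.

    Returns the command's exit code, or None if another run is still active.
    """
    with open(lock_path, "w") as lock:
        try:
            # LOCK_NB: fail immediately instead of queueing behind the running sync.
            fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            print("previous sync still running, skipping", file=sys.stderr)
            return None
        # The lock is released automatically when the file is closed.
        return subprocess.run(cmd).returncode
```

A frequent cron job calling run_exclusive(LOCK_FILE, RSYNC_CMD) can then sync often without ever stacking up overlapping transfers.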

In case the main website is down, you can find a mirror here.

Thanks to all our mirror hosters! Without them Sage would probably never reach its users. New mirrors are always welcome, especially in South Asia!

Tuesday, August 11, 2009

Popularity of Sage

This post is about Sage download statistics.

I try to track the actual download events and derive some information from them. This post covers the time range from June 1st until Aug 12th (today). The basic numbers: Sage is downloaded most often in the USA (35%), followed by Germany (7%), France (5%), and then UK, Italy, Spain, Canada, China, Brazil and Japan (4 to 2%). On average, Sage was downloaded 150 times a day during the last semester and a bit more than 100 times a day during summer break. I hope the numbers will go up again ;)

But that's rather boring, so I tried to put the numbers in context. I focused on countries with more than one million inhabitants and scaled the download numbers by population size or by the number of internet users. To my surprise, the ranking looks totally different now!

Download Top 20 weighted by population
New Zealand
United States
Czech Republic

I was somewhat surprised to see Switzerland, Austria, New Zealand and Uruguay at the top! Concerning Uruguay, I know that they use Sage at the university.

Download Top 20 weighted by "internet users"
Puerto Rico
Czech Republic
New Zealand

For me, that's also rather interesting. Of course, countries with only a few internet users easily end up on top (Niger), but Guatemala, for example, has more than 1 million users, and so far I have never heard of anyone there using Sage. In this context, the top 3 by raw download numbers are far down the list: USA #23, Germany #21, France #19!
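The weighting can be sketched like this. It is a hypothetical helper, not the script behind the lists above, and the countries and numbers in the example are made-up placeholders, not real data:

```python
from typing import Dict, List, NamedTuple


class Country(NamedTuple):
    downloads: int
    population: int
    internet_users: int


def weighted_ranking(data: Dict[str, Country], per: str = "population",
                     min_population: int = 1_000_000) -> List[str]:
    """Rank countries by downloads per inhabitant (per="population")
    or per internet user (per="internet_users"), ignoring small countries."""
    eligible = {name: c for name, c in data.items()
                if c.population > min_population and getattr(c, per) > 0}
    return sorted(eligible,
                  key=lambda name: eligible[name].downloads / getattr(eligible[name], per),
                  reverse=True)


# Made-up example: A is big, B and D are small, C is below the 1 million cut-off.
example = {
    "A": Country(1000, 300_000_000, 200_000_000),
    "B": Country(100, 5_000_000, 3_000_000),
    "C": Country(50, 500_000, 400_000),
    "D": Country(100, 10_000_000, 1_000_000),
}
print(weighted_ranking(example))                    # ['B', 'D', 'A']
print(weighted_ranking(example, "internet_users"))  # ['D', 'B', 'A']
```

Even in this toy example, A, the leader by raw downloads, drops to the bottom once the numbers are weighted, and the two weightings order B and D differently.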

Now the important questions:
What does that say about Sage adoption? Market potential? Is Uruguay a maths country and the USA not? Does it depend on the average income, GDP, ... (Switzerland & Austria vs. Uruguay & Guatemala?!?!) ... or just statistical noise?

If you want to see the whole data: online Webpage, CSV, ODS


Here are the same weighted top-20 lists for "visits" (a session of one or more page impressions on the website or a related site such as a mirror or the wiki):

Visits weighted by population:
New Zealand
United Kingdom

Visits weighted by internet users:
New Zealand
Puerto Rico
United Kingdom

Update: whole spreadsheet, also for visits: online Webpage, CSV, ODS


Friday, July 31, 2009

ihhhehks fingers with tiny condoms

Just when I wanted to do some vintage maths, I found this:


They actually rescanned those pages; now the font is darker. The condom fingers are gone ;)

Monday, July 13, 2009

Sage Downloads - Spring 2009 Edition

This post is about Sage download statistics. It is not exact science, but good enough to measure tendencies. JavaScript on all download pages of the mirror sites tracks clicked download links by assigning them certain values. Therefore, it misses cases where somebody downloads "directly", uses wget, or has JavaScript disabled or scripts blocked, so the actual numbers are higher. That said, here are the numbers for April, May and June 2009:

Total downloads per week:

Apr 5, 2009 - Apr 11, 2009 1110
Apr 12, 2009 - Apr 18, 2009 964
Apr 19, 2009 - Apr 25, 2009 1082
Apr 26, 2009 - May 2, 2009 1197
May 3, 2009 - May 9, 2009 1005
May 10, 2009 - May 16, 2009 1001
May 17, 2009 - May 23, 2009 1160
May 24, 2009 - May 30, 2009 1169
May 31, 2009 - Jun 6, 2009 1298
Jun 7, 2009 - Jun 13, 2009 1212
Jun 14, 2009 - Jun 20, 2009 970
Jun 21, 2009 - Jun 27, 2009 1079

There is no clear trend: maybe a bit more toward the end of the semester, and falling since it ended. The day-to-day stats are rather flat, too.
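One way to put a number on "no clear trend" is a least-squares slope over the twelve weekly totals from April to June. This is an added check, not part of the original analysis:

```python
from typing import List

# Weekly totals from the table above (Apr 5 to Jun 27, 2009).
WEEKLY = [1110, 964, 1082, 1197, 1005, 1001, 1160, 1169, 1298, 1212, 970, 1079]


def ls_slope(values: List[float]) -> float:
    """Least-squares slope of values against their index 0, 1, 2, ..."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    return sxy / sxx


mean = sum(WEEKLY) / len(WEEKLY)
print(round(ls_slope(WEEKLY), 1), round(mean))  # 6.2 1104
```

That is about 6 extra downloads per week on a base of roughly 1100, far smaller than the week-to-week swings of often more than 100 downloads, so the data indeed show no meaningful trend.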

Distribution of Operating Systems:

microsoft_windows 5668 39%
linux/32bit 3109 22%
src 2101 15%
apple_osx/intel 1515 11%
linux/64bit 1290 9%
apple_osx/powerpc 364 3%
linux/atom 157 1%
apple_osx 85 1%
solaris 54 0%
src-old 10 0%
linux/itanium 4 0%


Linux/32bit downloads by file (percentages of all 3109 linux/32bit downloads):

sage-3.4-linux-Ubuntu_8.10-i686-Linux.tar.gz 559 17.98%
sage-3.4.1-linux-Ubuntu_8.10-sse2-i686-Linux.tar.gz 430 13.83%
sage-3.4.2-linux-Ubuntu_8.10-sse2-i686-Linux.tar.gz 415 13.35%
sage-4.0-linux-Ubuntu_9.04-sse2-i686-Linux.tar.gz 248 7.98%
sage-4.0.1-linux-Ubuntu_9.04-sse2-i686-Linux.tar.gz 215 6.92%
sage-3.4-Fedora_release_9-i686-Linux.tar.gz 171 5.50%
sage-4.0.2-linux-Ubuntu_9.04-i686-Linux.tar.gz 142 4.57%
sage-3.4-linux-PentiumM-ubuntu-8.04.1-i686-Linux.tar.gz 123 3.96%
sage-3.4-linux-Debian_GNU_Linux_5.0_lenny-i686-Linux.tar.gz 69 2.22%
sage-3.4.1-linux-Debian_GNU_Linux_5.0_lenny-sse2-i686-Linux.tar.gz 62 1.99%
sage-3.4.2-linux-Debian_GNU_Linux_5.0_lenny-sse2-i686-Linux.tar.gz 62 1.99%
sage-4.0.1-linux-Debian_GNU_Linux_5.0_lenny-sse2-i686-Linux.tar.gz 49 1.58%
sage-4.0-linux-Debian_GNU_Linux_5.0_lenny-sse2-i686-Linux.tar.gz 47 1.51%
sage-4.0.2-linux-Debian_GNU_Linux_5.0_lenny-i686-Linux.tar.gz 47 1.51%
sage-3.4.1-linux-openSUSE_11.1_i586-sse2-i686-Linux.tar.gz 44 1.42%
sage-3.4-linux-Mandriva_Linux_2009.0-i686-Linux.tar.gz 43 1.38%
sage-3.4-linux-openSUSE_11.1_i586-i686-Linux.tar.gz 42 1.35%
sage-3.4-linux-CentOS_release_5.2_Final-i686-Linux.tar.gz 34 1.09%
sage-4.0-linux-CentOS_release_5.2_Final-sse2-i686-Linux.tar.gz 31 1.00%
sage-3.4.2-linux-openSUSE_11.1_i586-sse2-i686-Linux.tar.gz 27 0.87%
sage-4.0-linux-openSUSE_11.1_i586-sse2-i686-Linux.tar.gz 24 0.77%
sage-4.0.1-linux-openSUSE_11.1_i586-sse2-i686-Linux.tar.gz 24 0.77%
sage-3.4-linux-Debian_GNU_Linux_4.0_etch-i686-Linux.tar.gz 22 0.71%
sage-4.0.1-linux-mandriva32bit_linux_2009-sse2-i686-Linux.tar.gz 22 0.71%
sage-3.4.1-linux-CentOS_release_5.2_Final-sse2-i686-Linux.tar.gz 21 0.68%

No surprises here: Ubuntu leads them all, followed by Fedora and Debian. The linux/64bit downloads look the same.

Country, Top 50

United States 5017 34.94%
Germany 1402 9.77%
France 693 4.83%
United Kingdom 668 4.65%
Italy 643 4.48%
Spain 487 3.39%
Canada 460 3.20%
China 413 2.88%
Japan 286 1.99%
Australia 255 1.78%
Brazil 254 1.77%
India 240 1.67%
Mexico 215 1.50%
Switzerland 192 1.34%
Netherlands 182 1.27%
Austria 168 1.17%
Portugal 135 0.94%
Belgium 134 0.93%
Greece 133 0.93%
Sweden 127 0.88%
Poland 119 0.83%
Russia 118 0.82%
Czech Republic 106 0.74%
Argentina 95 0.66%
South Korea 92 0.64%
Finland 89 0.62%
Israel 89 0.62%
New Zealand 83 0.58%
Norway 83 0.58%
Taiwan 81 0.56%
Denmark 73 0.51%
Colombia 72 0.50%
Hungary 58 0.40%
Singapore 56 0.39%
Romania 51 0.36%
Ireland 47 0.33%
Turkey 46 0.32%
Lithuania 45 0.31%
Peru 41 0.29%
Slovenia 41 0.29%
Thailand 41 0.29%
Indonesia 36 0.25%
Philippines 33 0.23%
Slovakia 32 0.22%
Croatia 31 0.22%
Panama 31 0.22%
Uruguay 31 0.22%
Chile 30 0.21%
Iceland 30 0.21%
Hong Kong 24 0.17%

It would be interesting to score them against some index, such as population, GDP and the like, and then fit the factors to understand why, for example, Brazil is next to China ...

By the way, overall visits to the website during that time span were rather constant. There were over 100,000 unique visitors, other parameters haven't changed since my last website updates, and Google is our main friend among all search engines. All major traffic sources:

google 55,654 31.47%
(direct) 39,037 22.07%
(another page there) 29,638 16.76%
(referral) 3,643 2.06%
(referral) 2,928 1.66%


Sunday, July 12, 2009

Download Website Redesign

What is great software like Sage worth if you can't get your hands on it? Nothing!

That's why good downloads are important! How fast a download is depends on your location and on that of the server. For example, a European server usually works better in Europe (although that's not always true; some European servers also work very well in the US).

Until last week there was one heavily used dedicated download link, plus the option to choose a mirror instead. That has changed: the dedicated primary download link has been replaced by an overview of all mirrors. Now everyone is "forced" to choose the download option that suits them best! The second reason is that the main server got slow, since most of the downloads happened there; for example, only a few European users downloaded from the French mirror.

In addition to the redesign, I contacted several server admins and extended the list of mirrors. Now Europe is covered by several mirrors, Russia too, and what's still missing is South America and Central/South Asia. I'm still waiting for answers from server admins over there.

Now some numbers from the last two weeks, each covering Monday to Friday.

Jun 29th - Jul 3rd:

1. 53.48%
2. 8.44%
3. 7.70%
4. 7.57%
5. 4.84%
6. 3.88%
7. 2.33%
8. 2.11%
9. 2.02%
10. 1.99%
As you can see, since * and sage.math are behind the same network, far more than half of the activity happened on the primary server. The week before, it was even above 70%!

Now the numbers after the redesign:

Jul 6th - Jul 10th:

1. 17.69%
2. 14.98%
3. 14.22%
4. 11.74%
5. 9.32%
6. 8.95%
7. 4.65%
8. 4.07%
9. 3.50%
10. 3.13%

The numbers differ massively after the redesign. Only about 20% of the activity is left on the main server; everything else has moved to the other servers. The top server is located in Boston, immediately followed by two servers in Europe: Switzerland and France. Also, the countries of the clients fit the mirror locations. So users actually understand the choice and pick accordingly ;) In my next post I'll probably write about the 4.1 release and the worldwide distribution of Sage downloads.

My next project is to find an easy way to detect whether a server is up to date or lags behind. Then I want this script to update the website automatically, so the chances of hitting an outdated or broken server become very low. I hope that's not too difficult.

Happy downloading, Harald

Sunday, July 5, 2009

Sage Mirrors Revamp

Until now, all download pages had a dedicated main download link for Sage. It pointed to the master server in Seattle. The problem is that traffic increased and the server got slower. There was always the option to use a mirror, but it was rarely used. More than 50% of all downloads still came from the master server (+ sage.math).

Now the list of mirrors has moved into the download box on all download pages. That should be a strong push toward using mirrors. Also, the non-US mirrors in bold should work well even outside Europe.

Numbers for last month, June 2009:

# | server | number of accesses | percentage
1. 13,467 47.89%
2. 4,160 14.79%
3. 3,911 13.91%
4. 1,440 5.12%
5. 1,396 4.96%
6. 1,047 3.72%
7. 633 2.25%
8. 486 1.73%
9. 435 1.55%
10. 339 1.21%

Also, notice the 15% for France. This will change, since Europe is no longer covered by only one mirror.

Harald & Minh

Friday, June 26, 2009

two new mirrors

Sage only had one foot in Europe for the last few months: the very busy French mirror. Statistics showed that it was the most often accessed mirror of all, and its numbers were increasing release by release. The upside is that there is growing interest in Sage in Europe; the downside is a possible bottleneck in distributing Sage. I asked around and, long story short, we finally have two new high-quality mirrors in Europe:
Big thanks!

May the sageification of Europe continue - world domination soon to follow ;)

If you are also able to help distribute Sage, contact the webmaster.


Saturday, May 30, 2009

sage + google wave

google wave is a new type of web 2.0 hosted communication, presented at google's i/o conference a few days ago. it is very early to tell what it really is, but it could be big. it is very open and also includes an open communication protocol that can be used for (basically) everything, and that's where i started to think about sage. but first, i'll explain what i think it does and how sage could play a role there!

a server (not necessarily google's; it's more like jabber/xmpp, where everybody can host a server and servers talk to each other if users are on different servers) hosts the "communication", called a "wave". that's a tree-like structure of data, replies, extensions, meta-info, etc. if someone modifies something or adds a reply, every other participant sees the modification in real time.

there are also artificial participants, called robots. they can do things just like a regular user. example: spellchecking! a spellchecker robot analyzes all text and, if something is odd, annotates or modifies it.

okay, how does synchronization work? basically, all operations happen on the server. i think the operations form an abelian group, so there are no "destructive" operations, only invertible ones that cannot break the state flow, and the sum of all state changes is the current state. therefore, there is only one thread of communication for all participants. if something goes wrong (e.g. a lost connection), everything is re-synced quickly.
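for comparison: google's published wave protocol documents describe synchronization via operational transformation. concurrent edits generally don't commute, so an edit that arrives after a concurrent one is first transformed against it, and all participants end up with the same text. below is a minimal sketch for the simplest case, two concurrent inserts; deletes and the real protocol are left out, and the site number used to break ties is an assumption of this sketch.

```python
from typing import NamedTuple


class Insert(NamedTuple):
    pos: int   # index in the document where text is inserted
    text: str
    site: int  # id of the participant; breaks ties at equal positions


def apply(doc: str, op: Insert) -> str:
    """Apply an insert operation to a plain-text document."""
    return doc[:op.pos] + op.text + doc[op.pos:]


def transform(op: Insert, against: Insert) -> Insert:
    """Adjust op so it can be applied after the concurrent insert `against`."""
    if against.pos < op.pos or (against.pos == op.pos and against.site < op.site):
        return op._replace(pos=op.pos + len(against.text))
    return op


doc = "sage"
a = Insert(0, "x", site=1)  # participant 1 types at the start
b = Insert(0, "y", site=2)  # participant 2 types at the same spot, concurrently
# Both orders of application converge to the same document.
print(apply(apply(doc, a), transform(b, a)))  # xysage
print(apply(apply(doc, b), transform(a, b)))  # xysage
```

the transform step is what makes both orders agree, which gives the single shared state described above.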

the interesting point is that there are also gadgets: interactive elements for all participants (an example is a chess game), and that's where i think sage could be used. basically, my idea is to implement a button that creates a cell like the ones in the sage notebook. then you enter the url of a sage server plus your credentials. now input is sent to the server (also autocompletion ...), you get the answer back, and a new cell is created. if i understand it correctly, the server is actually talking to sage, so several users could edit at the same time, and the actual cell data is saved inside this "wave".
what's missing is formatting the formulas - but there is also mathml which could do it.

i applied for early developer sandbox access; once i know more, i'll tell you.