My desktop backup solution

A good blog post by Martin Pitt inspired me to set up my own desktop backup solution. I liked the idea of not requiring the computer to be on all the time, and of having the backup pushed from the client rather than pulled from the server. However, my needs were slightly different from his, so I adapted it.

His solution uses rsnapshot locally, then pushes the resulting directories to a remote server. I didn't want to use local disk space (SSD ain't cheap), but I had a local server with 2 TB available. So in my solution, the client rsyncs to the server, then the server triggers rsnapshot locally if the rsync was successful. This is done over SSH, and the server has no rights whatsoever on the client.

Prerequisites

In the examples, the client to back up is called mycli and the server on which the backup will live is named mysrv. As prerequisites, mycli needs rsync and openssh-client installed, and mysrv needs rsnapshot and openssh-server installed. OpenSSH needs to have public-key authentication enabled.

SSH setup

On the client side, generate a specific passwordless SSH key for the backup connection:

mkdir ~/.backup
ssh-keygen -f ~/.backup/id_backup

On the server side, we'll assume you want to put backups into /srv/backup. First of all, create an rbackup user that will be used to run the backup serverside:

sudo mkdir /srv/backup
sudo adduser --home /srv/backup --no-create-home --disabled-password rbackup
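One step glossed over here: /srv/backup/.ssh has to exist before you can drop the key into it. A minimal sketch (the exact modes are my choice, not mandated anywhere; sshd merely requires that the directory not be writable by others):

```
sudo mkdir -p /srv/backup/.ssh
sudo touch /srv/backup/.ssh/authorized_keys
sudo chmod 750 /srv/backup/.ssh
sudo chmod 640 /srv/backup/.ssh/authorized_keys
```

The chgrp/chmod step below then makes those files readable by the rbackup user.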

Next, add your backup public key (the contents of mycli:~/.backup/id_backup.pub) to mysrv:/srv/backup/.ssh/authorized_keys. The trick is to prefix it (same line, one space separator) with the only command you want the rbackup user to be able to perform over that SSH connection:

command="rsync --config /srv/backup/rsyncd-mycli.conf --server --daemon ." ssh-rsa AAAAB3NzaLwm0ckRdzotb3...5Mbiw== ttx@mycli
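Optionally, you can harden that line further with standard sshd authorized_keys options, which disable everything except the forced command (pty allocation, agent, port and X11 forwarding):

```
command="rsync --config /srv/backup/rsyncd-mycli.conf --server --daemon .",no-pty,no-agent-forwarding,no-port-forwarding,no-X11-forwarding ssh-rsa AAAAB3NzaLwm0ckRdzotb3...5Mbiw== ttx@mycli
```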

Finally, you need to let rbackup read those .ssh files:

sudo chgrp -R rbackup /srv/backup/.ssh
sudo chmod -R g+rX /srv/backup/.ssh

rsync setup (server-side)

Now we need to set up the rsync configuration that will be used on those connections:

# /srv/backup/rsyncd-mycli.conf
max connections = 1
lock file = /srv/backup/mycli/rsync.lock
log file = /srv/backup/mycli/rsync.log
use chroot = false
max verbosity = 3
read only = false
write only = true

[mycli]
 path = /srv/backup/mycli/incoming
 post-xfer exec = /srv/backup/kick-rsnapshot /srv/backup/mycli/rsnapshot.conf

The post-xfer exec command is executed after successful transfers to /srv/backup/mycli/incoming. In our case, we want rsync to trigger the /srv/backup/kick-rsnapshot script:

#!/bin/bash
if [ "$RSYNC_EXIT_STATUS" == "0" ]; then
   rsnapshot -c "$1" daily
fi

Don't forget to make that one executable (chmod +x /srv/backup/kick-rsnapshot) :)

rsnapshot setup (server-side)

rsnapshot itself is configured in the /srv/backup/mycli/rsnapshot.conf file. This is where you specify how many daily and pseudo-weekly copies you want to keep (read the rsnapshot documentation to understand the interval concept). Note that rsnapshot requires tabs, not spaces, between configuration fields:

# /srv/backup/mycli/rsnapshot.conf
config_version    1.2
snapshot_root    /srv/backup/mycli
cmd_rm      /bin/rm
cmd_rsync   /usr/bin/rsync
cmd_logger  /usr/bin/logger
interval    daily    6
interval    weekly    6   
verbose     2
loglevel    3
lockfile    /srv/backup/mycli/rsnapshot.pid
rsync_long_args    --delete --numeric-ids --delete-excluded
link_dest   1
backup      /srv/backup/mycli/incoming/    ./

Now you just have to create the backup directory hierarchy with appropriate permissions:

mkdir -p /srv/backup/mycli/incoming
chown -R rbackup:rbackup /srv/backup/mycli
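Note that kick-rsnapshot only ever runs the daily interval; for the weekly copies to actually rotate, something also has to run the weekly interval periodically. One way to do that (my addition, not part of the original setup; path and schedule are arbitrary) is a cron entry on mysrv:

```
# mysrv:/etc/cron.d/rsnapshot-mycli (hypothetical file)
0 4 * * 0  rbackup  rsnapshot -c /srv/backup/mycli/rsnapshot.conf weekly
```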

The backup (client-side)

The client will rsync periodically to the server, using the following script:

#!/bin/bash
set -e
TOUCHFILE=$HOME/.backup/last_backup

# Check whether the last backup was less than a day ago
now=$(date +%s)
if [ -e "$TOUCHFILE" ]; then
   age=$((now - $(stat -c %Y "$TOUCHFILE")))
else
   unset age
fi
# 86300 is slightly under 24h, so an hourly cron run never skips a day
[ -n "$age" ] && [ "$age" -lt 86300 ] && exit 0

nice -n 10 rsync -e "ssh -i $HOME/.backup/id_backup" -avzF \
    --delete --safe-links "$HOME" rbackup@mysrv::mycli
touch "$TOUCHFILE"

That script ensures that you sync to the server at most once per day. You can run it (as your user) as often as you'd like (I suggest hourly via cron). On successful syncs, the server will trigger rsnapshot to do its magic backup rotation! Using the same model, you can easily set up multiple directories or multiple clients.
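For the hourly run, a user crontab entry on mycli could look like this (the script name and location are my assumption; adjust to wherever you saved it):

```
# mycli: add with "crontab -e" as your user
@hourly $HOME/.backup/backup.sh
```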

As with Martin's solution, you should set up various .rsync-filter files to exclude the directories and files you don't want copied to the backup server.
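The -F option in the rsync invocation above makes rsync pick up per-directory .rsync-filter files automatically. As an illustration (the patterns themselves are just examples), a ~/.rsync-filter excluding typical cache and build junk might look like this:

```
# mycli:~/.rsync-filter
- .cache/
- .thumbnails/
- *.o
- *~
```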

The drawback of this approach is that the server keeps an extra copy of your backup (in the incoming directory). But in my case, since the server has plenty of space, I can afford it. It also does not work when you are away from your backup server.

I hope you find that setup useful; it has served me well so far.

The art of release management

Last week I started a new job, working for Rackspace Hosting as the Release Manager for the Openstack project. I'm still very much working from home on open source software, so that part doesn't change. However, there are some subtle differences.

First of all, Openstack is what we call an upstream project. Most of my open source work so far involves distribution work: packaging and delivering various open source software components into a well-integrated, tested and security-maintained distribution. This is hard work, one that is never completely finished or perfect. It is also a necessary part of the open source ecosystem: without distributions, most software would not be easily available for use.

Upstream work, on the other hand, is about developing the software in the first place. It's more creative work, in a much more controlled environment. The Openstack project is the new kid on the block of cloud computing software, one that strives to become the open source standard for building cloud infrastructures everywhere. It was announced in July, so it's relatively young. There are lots of procedures and processes to put in place, an already-large developer group, and an ever-growing community of users and partners. The software itself is positioned to run in high-demand environments: the storage component is in production use at Rackspace, the compute component is in production use at NASA. Openstack is planned to fully replace the current Rackspace Cloud software next year, and a number of governments plan to use it to power their local cloud infrastructure. Those are exciting times.

What does an open source project Release Manager do? Well first, as it says on the tin, it manages the release process. Every 3 or 6 months, Openstack will release a new version of its components, and someone has to make sure that happens. That's OK, but what do I do the other 50 weeks of the year? Well, release managers also manage the release cycle. A cycle goes through four stages: Design, Development, QA and Release. It is the job of the release manager to drive and help the developer community through those stages, follow work in progress, make sure everyone knows about the steps and freezes, and grant exceptions when necessary. At the very end, he must balance the importance of a bug against the risk of regression its fix introduces: it's better to release with a known bug than with an unknown regression. He is ultimately responsible for the on-time delivery of the complete release cycle. And yes, if you condense everything to 3 or 6 months, this is a full-time job :)

My duties also include ensuring that the developers have everything they need to work at their full potential and that the project is transparent. I also have to make sure the developer community is a welcoming environment for prospective new contributors, and present the project as a technical evangelist at conferences. And if I still have free time, I may even write some code where I need to scratch an itch. All in all, it's a pretty exciting job, and I'm very happy to meet everyone this week at the Openstack design summit in ~~Orlando~~ San Antonio.

The real problem with Java in Linux distros

Java is not a first-class citizen in Linux distributions. We generally have decent coverage for Java libraries, but lots of Java software is not packaged at all, or packaged in alternate repositories. Some consider that it's because Linux distribution developers dislike Java and prefer other languages, like C or Python. The reality is slightly different.

Java is fine

There is nothing sufficiently wrong with Java that would cause it to be uniformly a second-class citizen on every distro. It is a widely-used language, especially in the corporate world. It has a vibrant open source community. On servers, it has generated very interesting stable (Tomcat) and cutting-edge (Hadoop, Cassandra...) projects. So what grudge do the distributions hold against Java?

Distributing distributions

The problem is that Java open source upstream projects do not really release code. Their main artifact is a complete binary distribution, a bundle including their compiled code and a set of third-party libraries they rely on. From the Java project's point of view, it makes sense: you pick versions of libraries that work for you, test that precise combination, and release the same bundle for all platforms. It makes the software easy to use everywhere, especially on operating systems that don't enjoy the greatness of a unified package management system.

That doesn't play well with how Linux distributions package software. We want to avoid code duplication (so that a security update in a library package benefits all software that uses it), so we package libraries separately. We keep those up to date, to benefit from bugfixes and new features. We consider libraries to be part of the platform provided by the Linux distribution.

Java upstream projects, on the other hand, consider libraries to be part of the software bundle they release. So they keep the libraries at a precise version they tested, and only update them when they really need to. Essentially, they maintain their own platform of libraries. They do, at their scale, the same work the Linux distributions do. And that's where the real problem lies.

Solutions ?

Force software to use your libraries

For simple Java software, stripping the upstream distribution and forcing it to use your platform libraries can work. But that creates friction with upstream projects (since you introduce an untested difference), and it doesn't work for more complex software: swapping libraries underneath it will just make it fail.

Package all versions of libraries

The next obvious solution is to make separate packages for every version of every library that the software uses. The problem is that there is no real convergence on "commonly-used" versions of libraries. There is no ABI protection, nor any general guidelines on versioning. You end up having to package each and every minor version of a library that the software happens to want. That doesn't scale well: it creates an explosion in the number of packages, code duplication, security update nightmares, etc. Furthermore, sometimes the Java project patches the libraries it ships with to include a specific feature it needs, so the bundled copy doesn't even match a real library version anymore.

Note: The distribution that is the closest to implementing this approach is Gentoo, through the SLOT system that lets you have several versions of the same package installed at the same time.

Bundle software with their libraries

At that point, you accept code duplication, so just shipping the precise libraries together with the software doesn't sound like that bad an idea. Unfortunately it's not that simple. Linux distributions must build everything from source code. In most cases, the upstream Java project doesn't ship the source code used to build the libraries it bundles. And what about the source code of the build dependencies of those libraries? In some corner cases, the library project is even abandoned, and its source code lost...

What can we do to fix it ?

So you could say that the biggest issue Linux distributions have with Java is not really about the language itself. It's about an ecosystem that glorifies binary bundles rather than source code. And there is no easy solution around it; that's why you can often hear Java packagers in Linux distributions explain how much they hate Java, and why only a minimal number of Java projects are packaged in distributions. Shall we abandon all hope?

The utopian solution is to aim for a reference platform of reasonably up-to-date libraries that are known to work well together, and encourage all Java upstream developers to use it. That was one of JPackage's goals, but it requires a lot more momentum to succeed. It's very difficult, especially since Java developers often use Windows or OS X.

Another plan is to build a parallel distribution mechanism for Java libraries inside your distro: a Java library wouldn't be shipped as a package anymore. But I think unified package systems are the glory of Linux distributions, so I don't really like that option.

Other issues, for reference

There are a few other issues I didn't mention in this article, in order to concentrate on the "distributing distributions" aspect. The tarball distributions don't play nicely with the FHS, forcing you to play with symlinks to try to keep both worlds happy (and generally making both unhappy). Maven encourages projects to pick precise versions of libraries and stick to them, often resulting in multiple different versions of the same library being used in a given project. Java code tends to build-depend on hundreds of obscure libraries, transforming seemingly simple packaging work into a multi-man-year effort. Finally, the same dependency inflation makes it a non-trivial engagement to contractually support all the dependencies (and build dependencies) of a given piece of software (like Canonical does for software in the Ubuntu main repository).

The 6 dimensions of Open Source

Why do people choose to participate in Open Source? It's always a mix of various reasons, so let's try to explore and classify them.

Technical

The first dimension is technical. People like open source because looking directly at the code gives them the ability to understand the behavior of their software. No documentation can match that level of precision. They also like the ability to fix it themselves when it's broken, rather than relying on usually-broken support contracts. Any non-Fortune-500 company that has tried to report a bug to Microsoft and get it fixed will probably get my point. Sometimes, they like the ability to shape and influence the future of the software, when that software uses open design mechanisms (like Ubuntu with its free and open-to-anyone Developer Summits). Finally, they may be convinced, like I am, that open source software development methods result in better code quality.

Political

Next to the technical dimension, we have a political dimension, or more precisely a techno-political one. People like Free software as a way to preserve end-user freedom, privacy and control over technology. Some powerful companies will use every trick in the book to reduce your rights and increase their revenue, so it's increasingly important that we are aware of those issues and fight back. Working on free and open source software is a way to contribute to that effort.

Philosophical

Very close to the political dimension, we are now seeing a philosophical interest in open source software. The 20th century saw the creation of a consumer class, with a new divide between those who produce and those who consume. This dissociated usage of technology is a self-destructive model, and contributing models (or participative production models) are considered to be the way to fix our societies for the future: be a producer and a consumer at the same time, and be associated with technology rather than alienated by it. Open source is an early and highly successful manifestation of that.

Economical

Back on the ground, there are strong and rational economic reasons for companies to opt to fund open source development. From most virtuous to least: we first find companies using the technology internally rather than selling it; sharing development and maintenance costs among several users of the same technology makes great sense, and produces very virtuous open source communities. Next you find companies selling services around open source software: being the main sponsor of a project gives you a unique position to leverage your know-how around software that is freely available. Then come open core approaches, from companies making a business of selling proprietary add-ons to those using open source as crippleware. Finally, at the bottom, you'll find companies using "open source" or "community" as a venture capitalist honeypot: they don't believe in it, they resist implementing what it takes to do it, but they like the money that pretending to do open source brings them.

Social

A very important dimension of open source is the social one. Many people join open source projects to belong to a cool community that allows them to prove themselves, gain mastery and climb the ladder of a meritocracy. If your community doesn't encourage and reward those drawn by this social dimension, you'll miss a huge chunk of potential contributors. Another social aspect is that doing work in the open (and in all transparency) is also great publicity for your skills, and helps with finding employment: the main reason I got hired by Canonical was my visible work on Gentoo's Security team, much more than the rest of my professional experience. Finally, the sheer ego-flattering sensation of knowing that millions of people are using your work is definitely a powerful drive.

Ethical

The last dimension is ethical: the idea of directly contributing to the sum of the world's common knowledge is appealing. Working on open source software, you just make the world a better place. For example, open source helps third-world and developing countries reduce their external debt, by encouraging the creation of local service companies rather than the purchase of licenses from US companies. That sense of purpose is what drives a lot of people (including me) to work on open source.

Did I miss anything? What drives you to participate in open source? Please let me know by leaving a comment!

Why Open Core is wrong

Open core is a business model where the base version of a piece of software is released as open source while some advanced features are kept closed source. It's been under a lot of discussion lately, so I'll just add my 2 cents...

Beyond the obvious sidestepping of free software principles, there are well-known issues with this model. In particular, it is difficult to set the right boundary between the "community edition" and the "enterprise edition", and you end up having to refuse legitimate patches that happen to implement a feature on your enterprise edition roadmap. So building a real open source community on top of the Open Core model can be quite a challenge. But the main reason why I think it's wrong is purely technical.

I am a perfectionist. I work on open source software because I truly believe that the open source development methodology ends up creating better code. Having all your code out there, up for scrutiny and criticism, makes you think twice before committing something half-baked. Allowing everyone to scratch their own itch ensures top motivation of contributors and quick advancement of new features. And I could go on and on...

Open Core denies that the open source development model creates better code. Open Core basically screams: for the basics we use open source, but for the most advanced features, the enterprise-quality ones, closed source is at least as good. You end up alienating a potential community of developers for the benefit of writing closed source code of lesser quality. You end up using open source just as a VC honeypot.

Open Core advocates say that open source software companies need some unfair advantage to monetize their efforts, and justify Open Core on that basis. I'd argue that selling expertise on an awesome piece of software is a better business model. It's true that it's a longer road to riches, but I still think it's the right one.