I was inspired by a good blogpost by Martin
Pitt to set up
my own desktop backup solution. I liked the idea of not requiring the
computer to be on all the time, and having the backup pushed from the
client rather than pulled from the server. However, my needs were
slightly different from his, so I adapted it.
His solution uses rsnapshot locally, then pushes the resulting
directories to a remote server. I didn't want to use local disk space
(SSD ain't cheap), but I had a local server with 2 TB available. So in my
solution, the client rsyncs to the server, then the server triggers
rsnapshot locally if the rsync was successful. This is done over SSH and
the server has no rights whatsoever on the client.
Prerequisites
In the examples the client to back up will be called mycli and the
server on which the backup will live is named mysrv. As a
prerequisite, mycli will need rsync and openssh-client installed. mysrv
will need rsnapshot and openssh-server installed. OpenSSH needs to have
public-key authentication enabled.
SSH setup
On the client side,
generate a specific passwordless SSH key for the backup connection:
mkdir ~/.backup
ssh-keygen -f ~/.backup/id_backup
On the server side,
we'll assume you want to put backups into /srv/backup. First of all,
create an rbackup user that will be used to run the backup serverside:
sudo mkdir /srv/backup
sudo adduser --home /srv/backup --no-create-home --disabled-password rbackup
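Since we passed --no-create-home, the /srv/backup/.ssh directory doesn't exist yet, so create it (and an empty authorized_keys file) before going further:
sudo mkdir -p /srv/backup/.ssh
sudo touch /srv/backup/.ssh/authorized_keys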
Next, add your backup public key (the contents of
mycli:~/.backup/id_backup.pub) to mysrv:/srv/backup/.ssh/authorized_keys. The
trick is to prefix it (same line, one space separator) with the only
command you want the rbackup user to perform via that SSH connection:
command="rsync --config /srv/backup/rsyncd-mycli.conf --server
--daemon ." ssh-rsa AAAAB3NzaLwm0ckRdzotb3...5Mbiw== ttx@mycli
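If you want to lock that key down even further, you can also add standard authorized_keys options on the same line to disable pty allocation and forwarding. This is an optional hardening sketch, not strictly required for the setup:
command="rsync --config /srv/backup/rsyncd-mycli.conf --server --daemon .",no-pty,no-agent-forwarding,no-port-forwarding,no-X11-forwarding ssh-rsa AAAAB3NzaLwm0ckRdzotb3...5Mbiw== ttx@mycli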
Finally, you need to let rbackup read those .ssh files:
sudo chgrp -R rbackup /srv/backup/.ssh
sudo chmod -R g+r /srv/backup/.ssh
rsync setup (server-side)
Now we need to set up the rsync configuration that will be used on those
connections:
# /srv/backup/rsyncd-mycli.conf
max connections = 1
lock file = /srv/backup/mycli/rsync.lock
log file = /srv/backup/mycli/rsync.log
use chroot = false
max verbosity = 3
read only = false
write only = true
[mycli]
path = /srv/backup/mycli/incoming
post-xfer exec = /srv/backup/kick-rsnapshot /srv/backup/mycli/rsnapshot.conf
The post-xfer exec command is run after each transfer to
/srv/backup/mycli/incoming. In our case, we want rsync to trigger the
/srv/backup/kick-rsnapshot script, which only runs rsnapshot when the
transfer was successful:
#!/bin/bash
if [ "$RSYNC_EXIT_STATUS" == "0" ]; then
    rsnapshot -c "$1" daily
fi
Don't forget to make that one executable :)
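For instance:
sudo chmod +x /srv/backup/kick-rsnapshot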
rsnapshot setup (server-side)
rsnapshot itself is configured in the /srv/backup/mycli/rsnapshot.conf
file. This is where you specify how many daily and weekly copies you
want to keep (read the rsnapshot documentation to understand the
interval concept):
# /srv/backup/mycli/rsnapshot.conf
config_version 1.2
snapshot_root /srv/backup/mycli
cmd_rm /bin/rm
cmd_rsync /usr/bin/rsync
cmd_logger /usr/bin/logger
interval daily 6
interval weekly 6
verbose 2
loglevel 3
lockfile /srv/backup/mycli/rsnapshot.pid
rsync_long_args --delete --numeric-ids --delete-excluded
link_dest 1
backup /srv/backup/mycli/incoming/ ./
Now you just have to create the backup directory hierarchy with
appropriate permissions:
sudo mkdir -p /srv/backup/mycli/incoming
sudo chown -R rbackup:rbackup /srv/backup/mycli
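Note that rsnapshot wants tabs (not spaces) between configuration fields, so it's a good idea to validate the file before relying on it, for example with rsnapshot's configtest command run as the backup user:
sudo -u rbackup rsnapshot -c /srv/backup/mycli/rsnapshot.conf configtest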
The backup (client-side)
The client will rsync periodically to the server, using the following
script:
#!/bin/bash
set -e
TOUCHFILE=$HOME/.backup/last_backup
# Check if the last backup happened more than a day ago
now=`date +%s`
if [ -e $TOUCHFILE ]; then
age=$(($now - `stat -c %Y $TOUCHFILE`))
else
unset age
fi
[ -n "$age" ] && [ $age -lt 86300 ] && exit 0
nice -n 10 rsync -e "ssh -i $HOME/.backup/id_backup" -avzF \
    --delete --safe-links $HOME rbackup@mysrv::mycli
touch $TOUCHFILE
That script ensures that you sync to the server at most once per day.
You can run it (as your user) as often as you'd like (I suggest hourly
via cron; see the example entry below). On successful syncs, the server
will trigger rsnapshot to do its magic backup rotation! Using the same
model, you can easily set up multiple directories or multiple clients.
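As an illustration, assuming the script above is saved as ~/.backup/backup.sh and made executable (the name and location are just examples), the crontab entry for your user could look like:
# run the backup script every hour; it exits immediately if a backup already happened today
0 * * * * $HOME/.backup/backup.sh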
As with Martin's solution, you should set up various .rsync-filter
files to exclude the directories and files you don't want copied to the
backup server; an example is sketched below.
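For example, a ~/.rsync-filter file along these lines is picked up automatically thanks to the -F flag used in the script above (the excluded directories are just examples, adjust to taste):
# ~/.rsync-filter -- exclusion rules read by rsync -F
- .cache/
- .local/share/Trash/
- Downloads/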
The drawback of this approach is that the server keeps an extra copy of
your backup (in the incoming directory). But in my case, since the
server has plenty of space, I can afford it. It also does not work when
you are away from your backup server.
I hope you find this setup useful; it has served me well so far.
Last week I started a new job, working for Rackspace Hosting as the
Release Manager for the Openstack project. I'm still very much working
from home on open source software, so that part doesn't change. However,
there are some subtle differences.
First of all, Openstack is what we call an upstream project. Most of my
open source work so far involves distribution work: packaging and
delivering various open source software components into a
well-integrated, tested and security-maintained distribution. This is
hard work, one that is never completely finished or perfect. It is also
a necessary part of the open source ecosystem: without distributions,
most software would not be easily available for use.
Upstream work, on the other hand, is about developing the software in
the first place. It's a more creative work, in a much more controlled
environment. The Openstack project is the new kid on the block of cloud
computing software, one that strives to become the open source standard
for building cloud infrastructures everywhere. It was announced in July,
so it's relatively young. There are lots of procedures and processes to
put in place, an already-large developer group, and an ever-growing
community of users and partners. The software itself is positioned to
run in high-demand environments: The storage component is in production
use at Rackspace, the compute component is in production use at NASA.
Openstack is planned to fully replace the current Rackspace Cloud
software next year, and a number of governments plan to use it to power
their local cloud infrastructure. Those are exciting times.
What does an open source project Release Manager do? Well first, as it
says on the tin, he manages the release process. Every 3 or 6 months,
Openstack will release a new version of its components, and someone has
to make sure that happens. That's fine, but what do I do the other 50
weeks of the year? Well, release managers also manage the release
cycle. A cycle goes through four stages: Design, Development, QA and
Release. It is the job of the release manager to drive and help the
developer community through those stages, follow work in progress,
make sure everyone knows about the steps and freezes, and grant
exceptions when necessary. At the very end, he must weigh the
importance of a bug against the risk of regression its fix introduces:
it's better to release with a known bug than with an unknown regression.
He is ultimately responsible for the delivery, on time, of the complete
release cycle. And yes, if you condense everything to 3 or 6 months,
this is a full-time job :)
My duties also include ensuring that the developers have everything they
need to work at their full potential and that the project is
transparent. I also have to make sure the developer community is a
welcoming environment for prospective new contributors, and present the
project as a technical evangelist at conferences. And if I still have
free time, I may even write some code where I need to scratch an itch.
All in all, it's a pretty exciting job, and I'm very happy to meet
everyone this week at the Openstack design summit in ~~Orlando~~ San
Antonio.
Java is not a first-class citizen in Linux distributions. We generally
have decent coverage for Java libraries, but lots of Java software is
not packaged at all, or packaged in alternate repositories. Some
consider that it's because Linux distribution developers dislike Java
and prefer other languages, like C or Python. The reality is slightly
different.
Java is fine
There is nothing sufficiently wrong with Java that would cause it to
uniformly be a second-class citizen on every distro. It is a widely-used
language, especially in the corporate world. It has a vibrant open
source community. On servers, it has generated very interesting stable
(Tomcat) and cutting-edge (Hadoop, Cassandra...) projects. So what
grudge do the distributions hold against Java?
Distributing distributions
The problem is that Java open source upstream projects do not really
release code. Their main artifact is a complete binary distribution, a
bundle including their compiled code and a set of third-party libraries
they rely on. If you take the Java project point of view, it makes
sense: you pick versions of libraries that work for you, test that
precise combination, and release the same bundle for all platforms. It
makes it easy to use everywhere, especially on operating systems that
don't enjoy the greatness of a unified package management system.
That doesn't play well with how Linux distributions package software. We
want to avoid code duplication (so that a security update in a library
package benefits all software that uses it), so we package libraries
separately. We keep those up to date, to benefit from bugfixes and new
features. We consider libraries to be part of the platform provided by
the Linux distribution.
Java upstream projects consider libraries to be part of the software
bundle they release. So they keep the libraries at the precise versions
they tested, and only update them when they really need to. Essentially,
they maintain their own platform of libraries. They do, at their
scale, the same work the Linux distributions do. And that's where the
real problem lies.
Solutions ?
Force software to use your libraries
For simple Java software, stripping the upstream distribution and
forcing it to use your platform libraries can work. But that creates
friction with upstream projects (since you introduce an untested
difference). And that doesn't work with more complex software: swapping
libraries below it will just make it fail.
Package all versions of libraries
The next obvious solution is to make separate packages for every version
of library that the software uses. The problem is that there is no real
convergence on "commonly-used" versions of libraries. There is no ABI
protection, nor general guidelines on versioning. You end up having to
package each and every minor version of a library that the software
happens to want. That doesn't scale well: it creates an explosion in the
number of packages, code duplication, security update nightmares, etc.
Furthermore, sometimes the Java project patches the libraries it ships
to include a specific feature it needs, so they don't even match
a real library version anymore.
Note: The distribution that is the closest to implementing this
approach is Gentoo, through the SLOT system that lets you have several
versions of the same package installed at the same time.
Bundle software with their libraries
At that point, you accept code duplication, so just shipping the precise
libraries together with the software doesn't sound that bad of an idea.
Unfortunately it's not that simple. Linux distributions must build
everything from source code. In most cases, the upstream Java project
doesn't ship the source code used in the libraries it bundles. And what
about the source code of the build dependencies of your libraries? In
some corner cases, the library project is even abandoned, and its source
code lost...
What can we do to fix it?
So you could say that the biggest issue Linux distributions have
with Java is not really about the language itself. It's about an
ecosystem that glorifies binary bundles rather than source code. And
there is no easy solution to it, which is why you can often hear Java
packagers in Linux distributions explain how much they hate Java. That's
also why only a minimal number of Java projects are packaged in
distributions. Shall we abandon all hope?
The utopian solution is to aim for a reference platform of reasonably
up-to-date libraries that are known to work well together, and encourage
all Java upstream developers to use it. That was one of JPackage's
goals, but it requires a lot more momentum to succeed. It's very
difficult, especially since Java developers often use Windows or OSX.
Another plan is to build a parallel distribution mechanism for Java
libraries inside your distro. A Java library wouldn't be shipped as a
package anymore. But I think unified package systems are the glory of
Linux distributions, so I don't really like that option.
Other issues, for reference
There are a few other issues I didn't mention in this article, to
concentrate on the "distributing distributions" aspect. The tarball
distributions don't play nice with the FHS, forcing you to play with
symlinks to try to keep both worlds happy (and generally making both
unhappy). Maven encourages projects to pick precise versions of
libraries and stick to them, often resulting in multiple different
versions of the same library being used in a given project. Java code
tends to build-depend on hundreds of obscure libraries, transforming
seemingly-simple packaging work into an effort measured in man-years.
Finally, the same dependency inflation issue makes it a non-trivial
commitment to contractually support all the dependencies (and build
dependencies) of a given piece of software (as Canonical does for
software in the Ubuntu main repository).
Why do people choose to participate in Open Source? It's always a mix
of various reasons, so let's try to explore and classify them.
Technical
The first dimension is technical. People like open source because
looking directly in the code gives them the ability to understand
the behavior of their software. No documentation can match that level of
precision. They also like the ability to fix it themselves when it's
broken, rather than relying on usually-broken support contracts. Any
non-Fortune-500 company that has tried to report a bug to Microsoft and
get it fixed will probably get my point. Sometimes, they like the ability to
shape and influence the future of the software, when that software
uses open design mechanisms (like Ubuntu with its free and
open-to-anyone Development Summits). Finally, they may be convinced,
like I am, that open source software development methods result in
better code quality.
Political
Next to the technical dimension, we have a political dimension, more
precisely a techno-political dimension. People like Free software as a
way to preserve end-user freedom, privacy and control over
technology. Some powerful companies will use every trick in the book to
reduce your rights and increase their revenue, so it's more and more
important that we are aware of those issues and fight back. Working on
free and open source software is a way to contribute to that effort.
Philosophical
Very close to the political dimension, we are now seeing philosophical
interest in open source software. The 20th century saw the creation of a
consumer class with a new divide between those who produce and those who
consume. This dissociated usage of technology is a self-destructive
model, and contributing models (or participative production models) are
seen as the way to fix our societies for the future. Be a
producer and a consumer at the same time and be associated with
technology rather than alienated by it. Open source is an early and
highly successful manifestation of that.
Economic
Back on the ground, there are strong and rational economic reasons for
companies to opt to fund open source development. From most virtuous to
least, we first find companies using the technology internally rather
than selling it: sharing development and maintenance costs among
several users of that same technology makes great sense, and makes very
virtuous open source communities. Next you find companies selling
services around open source software: being the main sponsor of a
project gives you a unique position to leverage your know-how around
software that is freely available. Next you find open core
approaches, from companies making a business selling proprietary add-ons
to those using open source as crippleware. Finally, at the bottom,
you'll find companies using "open source" or "community" as a venture
capitalist honeypot. They don't believe in it, they resist
implementing what it takes to do it, but they like the money that
pretending to do open source will bring them.
Social
A very important dimension of open source is the social dimension. Many
people join open source projects to belong to a cool community that
allows you to prove yourself, gain mastery and climb the ladder of a
meritocracy. If your community doesn't encourage and reward those who
join for this social dimension, you'll miss a huge chunk of potential
contributors. Another social aspect is that doing work in the open (and
in all transparency) is also great publicity for your skills and helps
you get hired. I was hired by Canonical mostly because of my visible
work on Gentoo's Security team, much more than because of the rest of
my professional experience. Finally, the sheer ego-flattering sensation
you get by knowing that millions of people are using your work is
definitely a powerful drive.
Ethical
The last dimension is ethical: the idea of directly contributing to the
sum of the world's common knowledge is appealing. Working on open source
software, you just make the world a better place. For example, open
source helps third-world and developing countries reduce their
external debt, by encouraging the creation of local service companies
rather than the purchase of licenses from US companies. That sense of
purpose is what drives a lot of people (including me) to work on
open source.
Did I miss anything? What drives you to participate in open source?
Please let me know by leaving a comment!
Open core is a business model where the base version of a piece of
software is released as open source while some advanced features are
kept closed source. It's been under a lot of
discussion
lately, so I'll just add my 2 cents...
Beyond the obvious way it works around free software principles, there
are well-known
issues
with this model. In particular, it is difficult to set the right limit
between the "community edition" and the "enterprise edition", and you
end up having to refuse legitimate patches that happen to be a feature
in your enterprise edition roadmap. So building a real open source
community on top of the Open Core model can be quite a challenge. But
the main reason why I think it's wrong is purely technical.
I am a perfectionist. I work on open source software because I truly
believe that the open source development methodology ends up creating
better code. Having all your code out there, up for scrutiny and
criticism, makes you think twice before committing something half-baked.
Allowing everyone to scratch their own itch ensures top motivation of
contributors and quick advancement of new features. And I could go on
and on...
Open Core denies that the open source development model creates better
code. Open Core basically screams: for the basics we use open source,
but for the most advanced features, the enterprise-quality ones, closed
source is at least as good. You end up alienating a potential community
of developers for the benefit of writing closed source code of lesser
quality. You end up using open source just as a VC honeypot.
Open Core advocates
say
that open source software companies need some unfair advantage to
monetize their efforts, and justify Open Core based on that. I'd argue
that selling expertise on an awesome piece of software is a better
business model. It's true it's a longer road to become rich, but I still
think it's the right one.