Grizzly, the day after

The OpenStack Grizzly release of yesterday officially closes the Grizzly development cycle. But while I try to celebrate and relax, I can't help from feeling worried and depressed on the hours following the release, as we discover bugs that we could have (should have ?) caught before release. It's a kind of postpartum depression for release managers; please consider this post as part of my therapy.

Good

We'd naturally like to release when the software is "ready", "good", or "bug-free". Reality is, with software of the complexity of OpenStack, onto which we constantly add new features, there will always be bugs. So, rather than releasing when the software is bug-free, we "release" when waiting more would not really change the quality of the result. We release when it's time.

In OpenStack, we invest a lot in automated testing, and each proposed commit goes through an extensive set of unit and integration tests. But with so many combinations of deployment options, there are still dark corners that will only be explored by users as they apply the new code to their specific use case. We encourage users to try new code before release, by publishing and making noise about milestones, release candidates... But there will always be a significant number of users who will not try new code until the point in time we call "release". So there will always be significant bugs that are discovered (and fixed) after release day.

The best point in time

What we need to do is pick the right moment to "release": when all known release-critical issues are fixed. When the benefits of waiting more are not worth the drawbacks of distracting developers from working on the next development cycle, or of abandoning the benefits of a predictable time-based common release.

That's the role of the Release Candidates that we produce in the weeks before the release day. When we fixed all known release-critical bugs, we create an RC. If we find new ones before the release day, we fix them and regenerate a new release candidate. On release day, we consider the current release candidates as "final" and publish them.

The trick, then, is to pick the right length for this feature-frozen period leading to release, one that gives enough time for each of the projects in OpenStack to reach this the first release candidate (meaning, "all known release-critical bugs fixed"), and publish this RC1 to early testers. For Grizzly, it looked like this:

grizzly RCs

This graph shows the number of release-critical bugs in various projects over time. We can see that the length of the pre-release period is about right: waiting more would not have resulted in a lot more bugs to be fixed. We basically needed to release to get more users to test and report the next bugs.

The Grizzly is still alive

The other thing we need to have is a process to continue to fix bugs after the "release". We document the most obvious regressions in the constantly-updated Release Notes. And we handle the Grizzly bugs using the stable release update process.

After release, we maintain a branch where important bugfixes are backported and from which we'll publish point releases. This stable/grizzly branch is maintained by the OpenStack stable maintenance team. If you see a bugfix that should definitely be backported, you can tag the corresponding bug in Launchpad with the grizzly-backport-potential tag to bring it to the team's attention. For more information on the stable branches, I invite you to read this wiki page.

Being pumped up again

The post-release depression usually lasts a few days, until I realize that not so many bugs were reported. The quality of the new release is actually always an order of magnitude better than the previous releases, due to 6-month worth of improvements in our amazing continuous integration system ! We actually did an incredible job, and it will only get better !

The final stage of recovery is when our fantastic community gets all together at the OpenStack Summit. 4 days to witness and celebrate our success. 4 days to recharge the motivation batteries, brainstorm and discuss what we'll do over the next 6 months. We are living awesome times. See you there.

UDS is dead, long live ODS

Back from a (almost) entirely-offline week vacation, a lot of news were waiting for me. A full book was written. OpenStack projects graduated. An Ubuntu rolling release model was considered. But what grabbed my attention was the announcement of UDS moving to a virtual event. And every 3 months. And over two days. And next week.

As someone who attended all UDSes (but one) since Prague in May 2008, as a Canonical employee then as an upstream developer, that was quite a shock. We all have fond memories and anecdotes of stuff that happened during those Ubuntu developer summits.

What those summits do

For those who never attended one, UDS (and the OpenStack Design Summits that were modeled after them) achieve a lot of goals for a community of open source developers:

  1. Celebrate recent release, motivate all your developer community for the next 6 months
  2. Brainstorm early ideas on complex topics, identify key stakeholders to include in further design discussion
  3. Present an implementation plan for a proposed feature and get feedback from the rest of the community before starting to work on it
  4. Reduce duplication of effort by getting everyone working on the same type of issues in the same room and around the same beers for a few days
  5. Meet in informal settings people you usually only interact with online, to get to know them and reduce friction that can build up after too many heated threads

This all sounds very valuable. So why did Canonical decide to suppress UDSes as we knew them, while they were arguably part of their successful community development model ?

Who killed UDS

The reason is that UDS is a very costly event, and it was becoming more and more useless. A lot of Ubuntu development happens within Canonical those days, and UDS sessions gradually shifted from being brainstorming sessions between equal community members to being a formal communication of upcoming features/plans to gather immediate feedback (point [3] above). There were not so many brainstorming design sessions anymore (point [2] above, very difficult to do in a virtual setting), with design happening more and more behind Canonical curtains. There is less need to reduce duplication of effort (point [4] above), with less non-Canonical people starting to implement new things.

Therefore it makes sense to replace it with a less-costly, purely-virtual communication exercise that still perfectly fills point [3], with the added benefits of running it more often (updating everyone else on status more often), and improving accessibility for remote participants. If you add to the mix a move to rolling releases, it almost makes perfect sense. The problem is, they also get rid of points [1] and [5]. This will result in a even less motivated developer community, with more tension between Canonical employees and non-Canonical community members.

I'm not convinced that's the right move. I for one will certainly regret them. But I think I understand the move in light of Canonical's recent strategy.

What about OpenStack Design Summits ?

Some people have been asking me if OpenStack should move to a similar model. My answer is definitely not.

When Rick Clark imported the UDS model from Ubuntu to OpenStack, it was to fulfill one of the 4 Opens we pledged: Open Design. In OpenStack Design Summits, we openly debate how features should be designed, and empower the developers in the room to make those design decisions. Point [2] above is therefore essential. In OpenStack we also have a lot of different development groups working in parallel, and making sure we don't duplicate effort is key to limit friction and make the best use of our resources. So we can't just pass on point [4]. With more than 200 different developers authoring changes every month, the OpenStack development community is way past Dunbar's number. Thread after thread, some resentment can build up over time between opposed developers. Get them to informally talk in person over a coffee or a beer, and most issues will be settled. Point [5] therefore lets us keep a healthy developer community. And finally, with about 20k changes committed per year, OpenStack developers are pretty busy. Having a week to celebrate and recharge motivation batteries every 6 months doesn't sound like a bad thing. So we'd like to keep point [1].

So for OpenStack it definitely makes sense to keep our Design Summits the way they are. Running them as a track within the OpenStack Summit allows us to fund them, since there is so much momentum around OpenStack and so many people interested in attending those. We need to keep improving the remote participation options to include developers that unfortunately cannot join us. We need to keep doing it in different locations over the world to foster local participation. But meeting in person every 6 months is an integral part of our success, and we'll keep doing it.

Next stop is in Portland, from April 15 to April 18. Join us !

OpenStack at FOSDEM '13

In 3 weeks, free and open source software developers will converge to Brussels for 2+ days of talks, discussions and beer. FOSDEM is still the largest gathering for our community in Europe, and it will be a pleasure to meet again with longtime friends. Note that FOSDEM attendance is free as in beer, and requires no registration.

OpenStack will be present with a number of talks in the Cloud devroom in the Chavanne auditorium on Sunday, February 3rd:

There will also be OpenStack mentions in various other talks during the day: Martyn Taylor should demonstrate OpenStack Horizon in conjunction with Aeolus Image Factory at 13:30, and Vangelis Koukis will present Synnefo, which provides OpenStack APIs, at 14:00.

Finally, I'll also be giving a talk, directed to Python developers, about the OpenStack job market sometimes Sunday in the Python devroom (room K.3.401): Get a Python job, work on OpenStack.

I hope you will join us in the hopefully-not-dead-frozen-this-time and beautiful Brussels !

What to expect from Grizzly-1 milestone

The first milestone of the OpenStack Grizzly development cycle is just out. What should you expect from it ? What significant new features were added ?

The first milestones in our 6-month development cycles are traditionally not very featureful. That's because we are just out of the previous release, and still working heavily on bugs (this milestone packs 399 bugfixes !). It's been only one month since we had our Design Summit, so by the time we formalize its outcome into blueprints and roadmaps, we are just getting started with feature implementation. Nevertheless, it collects a lot of new features and bugfixes that landed in our master branches since mid-September, when we froze features in preparation for the Folsom release.

Keystone is arguably where the most significant changes landed, with a tech preview of the new API version (v3), with policy and RBAC access enabled. A new ActiveDirectory/LDAP identity backend was also introduced, while the auth_token middleware is now shipped with the Python Keystone client.

In addition to fixing 185 bugs, the Nova crew removed nova-volume code entirely (code was kept in Folsom for compatibility reasons, but was marked deprecated). Virtualization drivers no longer directly access the database, as a first step towards completely isolating compute nodes from the database. Snapshots are now supported on raw and LVM disks, in addition to qcow2. On the hypervisors side, the Hyper-V driver grew ConfigDrive v2 support, while the XenServer one can now use BitTorrentas an image delivery mechanism.

The Glance client is no longer copied within Glance server (you can still find it with the Python client library), and the Glance SimpleDB driver reaches feature parity with the SQLAlchemy based one. A number of cleanups were implemented in Cinder, including in volume drivers code layout and API versioning handling.  Support for XenAPI storage manager for NFS is back, while the API grew a call to list bootable volumes and a hosts extension to allow service status querying.

The Quantum crew was also quite busy. The Ryu plugin was updated and now features tunnel support. The preparatory work to add advanced services was landed, as well as support for highly-available RabbitMQ queues. Feature parity gap with nova-network was reduced by the introduction of a Security Groups API.

Horizon saw a lot of changes under the hood, including unified configuration. It now supports Nova flavor extra specs. As a first step towards providing cloud admins with more targeted information, a system info panel was added. Oslo (formerly known as openstack-common) also saw a number of improvements. The config module (cfg) was ported to argparse. Common service management code was pushed to the Oslo incubator, as well as a generic policy engine.

That's only a fraction of what will appear in the final release of Grizzly, scheduled for April 2013. A lot of work was started in the last weeks but will only land in the next milestone. To get a glimpse of what's coming up, you can follow the Grizzly release status page !

Why OpenStack doesn't need a Linus

As comparing OpenStack with Linux becomes an increasingly popular exercise, it's only natural that people and press articles start to ask where the Linus Torvalds of OpenStack is, or who the Linus Torvalds of OpenStack should be. This assumes that technical leaders could somehow be appointed in OpenStack. This assumes that the single dictator model is somehow reproducible or even desirable. And this assumes that the current technical leadership in OpenStack is somehow lacking. I think all those three assumptions are wrong.

Like Linux, OpenStack is an Open Innovation project: an independent, common technical playground that is not owned by a single company and where contributors form a meritocracy. Assuming you can somehow appoint leaders in such a setting shows a great ignorance of how those projects actually work. Leaders in an open innovation project don't derive their authority from their title. They derive their authority from the respect that the other contributors have for them. If they lose this respect, their leadership will be disputed and you'll face the risk of a fork. Project leaders are not appointed, they are grown. Linus wasn't appointed, and he didn't decide one day that he should lead Linux. He grew as the natural leader for this community over time.

Maybe people asking for a Linus of OpenStack like the idea of a single dictator sitting at the top. But that setup is not easily reproduced. Three conditions need to be met: you have to be the founder (or first developer) of the project, your project has to grow sufficiently slowly so that you can gather the undisputed respect of incoming new contributors, and you have to keep your hands deep in technical matters over time (to retain that respect). Linus checked all those boxes. In OpenStack, there were a number of developers involved in it from the start, and the project grew really fast, so a group of leaders emerged, rather than a single undisputed figure.

I'd also argue that the "single leader" model is not really desirable. OpenStack is not a single project, it's a collection of projects. It's difficult to find a respected expert in all areas, especially as we grew by including new projects within the OpenStack collection. In addition to that, Linux as a project still struggles with its bus factor of 1 and how it would survive Linus. Organizing your technical leadership in a way that makes it easier for leadership to transition to new figures makes a stronger and more durable community.

Finally, asking for a Linus of OpenStack is somehow implying that the current technical leadership is insufficient, which is at best ignorant, at worse insulting. Linus fills two roles within Linux: the technical lead role (final decision on technical matters, the buck stops here) and the release management role (coordinating the release development cycles and producing releases). OpenStack has project technical leads ("PTLs") to fill the first role, and a (separate) release manager to fill the second. In addition to that, to solve cross-project issues, we have a Technical Committee (which happens to include the PTLs and release manager).

If you are under the impression that this multi-headed technical leadership might result in non-opiniated choices, think twice. The new governance model establishing the Technical Committee and the full authority of it over all technical matters in OpenStack is only a month old, previously the project (and its governance model) was still owned by a single company. The PTLs and Technical Committee members are highly independent and have the interests of the OpenStack project as their top priority. Most of them actually changed employers over the last year and continued to work on the project.

I think what the press and the pundits actually want is a more visible public figure, that would make stronger design choices, if possible with nice punch lines that would make good quotes. It's true that the explosive growth of the project did not leave a lot of time so far for technical leaders of OpenStack to engage with the press. It's true that the OpenStack leadership tends to use friendly words and prefer consensus where possible, which may not result in memorable quotes. But confusing that with weakness is really a mistake. Technical leadership in OpenStack is just fine the way it is, thank you for asking.