Improving Nova privilege escalation model, part 1

In this series, I'll discuss how to strengthen the privilege escalation model for OpenStack Compute (Nova). Due to the way networking, virtualization and volume management work, some Nova nodes need to be able to run some commands as root. To reduce the effects of a potential compromise (an attacker being able to run arbitrary code as the Nova user), we want to limit the commands that Nova can run as root on a given node to the strict minimum. Today we'll explain how the current model works, what its limitations are, and what groundwork was already implemented during the Diablo cycle to improve it.

Current model: sudo and sudoers

Currently, in a typical Nova deployment, the Nova services on each node run under an account with limited rights (usually called "nova"). When Nova needs to run a command as root, it prepends "sudo" to the command. The nova packages of your distribution of choice are supposed to ship a sudoers file that lists all the commands that nova is allowed to run as root without providing a password. This privilege escalation security model is well known and easy to audit.

Limitations of the current model

That said, in the context of Nova, this model is very limited. The sudoers file does not provide an efficient way to filter arguments, so you can basically pass any argument to an allowed command... and some of the commands that nova wants to use are rather open-ended. As an example, the current nova_sudoers file contains commands like chown, kill, dd or tee, which are more than enough to compromise a target system completely.

There are a couple of other limitations. The sudoers file belongs to the distribution's packaging, so it's difficult to keep it in sync with the rest of the Nova code when someone wants to add a privileged command. Last but not least, the same nova_sudoers file is used for any type of Nova node. For example, a Nova API server, which does not need to run any command as root, is still allowed to run all the commands that a compute node requires. Those other limitations could be fixed while still using sudo and sudoers files, but the first limitation would remain. Can we do better?

Substituting a wrapper for sudo

To be able to propose alternative privilege escalation security models, we first needed to be able to change all the "sudo" calls in the code and make them potentially use something else. That's what I worked on late in the Diablo timeframe: creating a run_as_root option in nova.utils.execute that uses a configurable root_helper command (by default, "sudo"), and forcing all the existing calls to go through it (rather than blindly calling "sudo" themselves).
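As a minimal sketch of that idea (not Nova's actual implementation), the helper below shows how a run_as_root flag and a configurable root helper could be combined to build the final command line. The names build_command and ROOT_HELPER are hypothetical; only run_as_root, root_helper and the "sudo" default come from the description above.

```python
import shlex

# Hypothetical module-level setting standing in for the configurable
# root_helper option; "sudo" is the default described in the text.
ROOT_HELPER = "sudo"


def build_command(*cmd, run_as_root=False, root_helper=None):
    """Return the argv list to execute, prefixed with the root helper if needed."""
    argv = [str(part) for part in cmd]
    if run_as_root:
        helper = root_helper if root_helper is not None else ROOT_HELPER
        # The helper may carry its own arguments, so split it shell-style.
        argv = shlex.split(helper) + argv
    return argv
```

With such a helper, callers never spell out "sudo" themselves: build_command("kill", "-9", 1234, run_as_root=True) yields a sudo-prefixed command by default, and switching to another helper only requires changing one setting.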

Thanks to the default root_helper, everything still behaves the same, but we now have the possibility of using something else, if we can be smarter than sudoers files: for example, calling a wrapper that does advanced filtering of the command that nova wants to run. In part 2 of this series, we'll look into a proposed alternative, a Python-based root_helper, and open the discussion on its security model.
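To give a rough idea of what "advanced filtering" could mean, here is a minimal sketch of the kind of check a wrapper might perform, assuming a hypothetical whitelist that maps each allowed executable to one regular expression per argument. The ALLOWED table, the is_allowed function, the example rules and the example path are all invented for illustration; they are not the design discussed in part 2.

```python
import re

# Hypothetical rules: executable name -> one pattern per expected argument.
ALLOWED = {
    # Only allow sending signal 9 or 15 to a single numeric PID.
    "kill": [r"-(9|15)", r"\d+"],
    # Only allow chown to the nova user on entries of an assumed instances directory.
    "chown": [r"nova", r"/var/lib/nova/instances/[\w-]+"],
}


def is_allowed(argv):
    """Return True if argv matches a whitelist rule, argument by argument."""
    if not argv:
        return False
    patterns = ALLOWED.get(argv[0])
    if patterns is None:
        # Executable is not whitelisted at all.
        return False
    args = argv[1:]
    if len(args) != len(patterns):
        # Extra or missing arguments are rejected outright.
        return False
    return all(re.fullmatch(p, a) is not None for p, a in zip(patterns, args))
```

Unlike a plain list of allowed commands, this rejects both unlisted executables (dd, tee) and listed executables called with unexpected arguments, which is exactly the gap described above.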

OpenStack Essex-1 milestone

Last week saw the delivery of the first milestone of the Essex development cycle for Keystone, Glance, Horizon and Nova. This early milestone collected about two months of post-Diablo work... but it's not as busy in new features as most would think, since a big part of those last two months was spent releasing OpenStack 2011.3 and brainstorming Essex features.

Keystone delivered their first milestone as a core project, with a few new features like support for additional credentials, service registration and using certificate-based SSL client authentication to authenticate services. It should be easier to upgrade from now on, with support for database migrations.

Glance developers were busy preparing significant changes that will land in the next milestone. Several bugfixes and a few features made it to essex-1 though, including the long-awaited SSL client connections. It also moved to UUID image identifiers.

The Nova essex-1 effort was mostly spent on bugfixing, with 129 bugs fixed. New features include a new XenAPI SM volume driver, DHCP support in the Quantum network manager, and optional deferred deletion of instances. Under the hood, the volume code was significantly cleaned up and XML templates were added to simplify serialization in extensions.

Essex-1 was also the first official OpenStack milestone for Horizon, also known as the Dashboard. New features include an instance details page, support for managing Nova volumes and a new extensible modular architecture. The rest of the effort was spent on catching up with the best of the core projects in internationalization, developer documentation, and QA (frontend testing and JS unit tests).

Now, keep your seatbelt fastened, as we are one month away from essex-2, where lots of new development work is expected to land!

Four areas for strategic contributions in OpenStack

The OpenStack Essex Design Summit just ended, and over the last three days several people have asked me to give a bit more substance to what exactly I meant by "strategic contributions" in my last article. Ensuring the long-term health of the project by investing in project-centered resources, right, but what can we do now? What actions can we take today?

Based on the very interesting Summit discussions we had, I think the strategic contributions that can be made today fall into 4 categories.

Commonality

Brian Lamar had a great session on reviving the OpenStack Common effort: identifying common functions between OpenStack projects, converging towards the same implementation, and maintaining it in a common library. The goal is twofold: present a more uniform face (logs and configuration files, for example, should follow the same syntax), and make sure that we don't waste precious development resources on useless duplicate work. This effort failed in the past because no resources were dedicated to it long-term, so it sounds like a nice and easy area in which to start contributing strategically.

Consistency

The second (and related) area is consistency. Tactical contributions have advanced the state of very specific features applying to very specific setups, at the expense of overall coherence. Vish led a good session on making the feature sets of the KVM and Xen hypervisors converge, not only in terms of functions, but also in terms of concepts. I think that analysis needs to happen more generally in OpenStack: is the resulting product coherent? How can we plug the holes in those feature matrices?

Security

Another important area that emerged from the Summit, especially with Ray Hookway's session, is work on security: strengthening the architecture (to limit the attack surface and layer defense in depth), formalizing the process around vulnerability handling and disclosure, and coordinating the necessary auditing effort. This work is just getting started, and I hope I will find time to help set it up.

Quality

Last but certainly not least, we need to invest in durable quality. Jay Pipes pushed a number of sessions where we pinpointed the need to identify the issues (QA), fix them (Bug squads) and prevent them from happening again (automated tests & continuous integration). That's by far the most complex area and the most difficult to coordinate, but the basic resource needed there is manpower, and the setup of company-neutral common workgroups that everyone can contribute to is the first step.

Whether you bet your business on OpenStack, or you're just interested in the long-term health of the open source project, give your developers time to contribute to those areas and workgroups, and we'll all be a lot better as a result.

The next step for OpenStack

Just after a release, the discovery of significant bugs always revives discussion around the need for maintenance branches or point releases. Those discussions, however, do not address the root cause of the issue; they merely attempt damage control on its consequences.

The root cause of significant bugs in a given release is not the presence or absence of maintenance branches. It's not about the choice of time-based cycles, or their length. It's about a lack of focus on testing and fixing the release deliverables. If only a few people work on that, while all the others are busy adding new features in trunk, delaying your release by one or more weeks won't change anything.

From tactical to strategic contributions

OpenStack is one of the few open source projects where development is truly shared across multiple companies. The trick is, most companies involved so far are making what I call tactical contributions: adding a feature that they care about, fixing bugs that affect them. Tactical contributions are great for expanding a project's scope, community and mindshare, but they add technical debt. The companies involved need to move to what I call strategic contributions: funding development resources that care about the end result, the release deliverables, the absence of bugs, the coherence of the features.

The obvious comparison point is the Linux kernel. The reason why it's successful, despite lots of companies only involved in tactical contributions, is that at its core it has a strong group of key developers whose primary allegiance goes to the Linux kernel itself, no matter what company they happen to work for. Those companies understood the necessity of funding strategic contributions.

Currently, especially in Nova, it's quite difficult to get merge proposals reviewed, random bugs fixed, integration tests contributed, or holes in scope covered. That's because most groups are focused on their own objectives, rather than the common project objectives. That's the mindset we need to change now, and that's the only thing that can give us better releases.

The cost of strategic contributions

The problem with strategic contributions is that they are typically more costly than tactical contributions, which have a more obvious return on investment. Agreeing to keep developers on the payroll "fixing what needs to be fixed", or giving all your developers 30% free time so that they can work on project objectives rather than only your own, is not that easy. But OpenStack has now proven that it's here to stay, and lots of companies have bet their strategy on it, so I think the time is now.

If we don't adjust, OpenStack in general (and Nova in particular) will crumble under the technical debt of tactical contributions, and everyone involved will lose. We might need to adjust governance to encourage other companies to invest long-term in project-centered resources. We'll need to set up open, multi-company workgroups (like the recently created QA team) to clearly show that it's a common effort. It won't happen in a day, but if we don't change our mindset now, no matter how we adjust the release cycle, Essex deliverables will be of the same quality as Diablo's.

Proposing sessions for the Essex Design Summit

In less than a month the OpenStack development community will gather in Boston for three days of discussions and brainstorming around the Essex development cycle.

The main part of the summit is the session tracks. The sessions are proposed by the participants and should generally be about core or incubated projects. There are three types of sessions:

  • Brainstorm sessions (55 min.) are used to discuss and come up with a solution for complex issues.
  • Rubberstamp sessions (25 min.) are used to present and review an already-designed plan. Those should generally be linked to a project blueprint.
  • Discovery sessions (25 min.) are used by experts to go into deep detail on a section of code or a feature.

You can already go to http://summit.openstack.org and see or file session proposals. The deadline for proposals is September 27, and the sooner you propose, the better your chances of being accepted. The proposals will be reviewed by the PTLs and myself and, if accepted, will get scheduled in one of the available time slots. Sessions about official Core or Incubated projects will get priority.

The other part of the summit is an unconference: we will have a whole room dedicated to 55 min. presentations that will be scheduled directly on big whiteboards at the summit itself. Any presentation on any subject vaguely related to OpenStack is acceptable! We'll also have half an hour of 5-minute lightning talks after lunch every day, also scheduled directly at the summit itself on a first-come, first-served basis.

See for reference: http://wiki.openstack.org/Summit

I hope that this mix of scheduled sessions and unconference style will allow everyone to make the most of those three days. See you there!