In this series, I'll discuss how to strengthen the privilege escalation
model for OpenStack Compute (Nova). Due to the way networking,
virtualization and volume management work, some Nova nodes need to be
able to run some commands as root. To reduce the effects of a potential
compromise (attacker being able to run arbitrary code as the Nova user),
we want to limit the commands that Nova can run as root on a given node
to the strict minimum. Today I'll explain how the current model
works, its limitations, and the groundwork already laid during
the Diablo cycle to improve on it.
Current model: sudo and sudoers
Currently, in a typical Nova deployment, the nodes run under an account
with limited rights (usually called "nova"). When Nova needs to run a
command as root, it prepends "sudo" to the command. The nova packages of
your distribution of choice are expected to ship a sudoers file that
lists all the commands that nova is allowed to run as root without
providing a password. This privilege escalation security model is
well-known and easy to audit.
Limitations of the current model
That said, in the context of Nova, this model is very limited. The
sudoers file does not let you efficiently filter arguments, so you can
basically pass any argument to an allowed command... and some of the
commands that nova wants to use are rather open-ended. For example,
the current nova_sudoers file contains commands like chown, kill,
dd or tee, which are more than enough to completely compromise a target
system.
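To make the argument-filtering problem concrete, here is a minimal Python sketch (Python because Nova itself is written in it). It models a sudoers-style allowlist whose entries name commands without arguments, which in sudo means any arguments are accepted. The ALLOWED_COMMANDS set and the sudoers_style_allows function are hypothetical illustrations, not actual sudo or Nova code, and the instance disk path is only an example.

```python
# Hypothetical illustration: a sudoers-style allowlist that only checks the
# command name, loosely modelled on entries like chown, kill, dd and tee.
# It is not real sudo or Nova code.
ALLOWED_COMMANDS = {"chown", "kill", "dd", "tee"}


def sudoers_style_allows(cmd: list[str]) -> bool:
    """Return True if the command name is allowlisted, ignoring all arguments."""
    return bool(cmd) and cmd[0] in ALLOWED_COMMANDS


# An intended use and an abusive one look exactly the same to this check.
intended = ["chown", "nova", "/var/lib/nova/instances/instance-0001/disk"]
abusive = ["chown", "nova", "/etc/shadow"]

for cmd in (intended, abusive):
    verdict = "allowed" if sudoers_style_allows(cmd) else "denied"
    print(" ".join(cmd), "->", verdict)
```

Both commands are reported as allowed: once chown is on the list, handing /etc/shadow to the nova user is just as acceptable as fixing the ownership of an instance disk.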
There are a couple of other limitations. The sudoers file belongs to the
distribution's packaging, so it's difficult to keep it in sync with the
rest of the Nova code when someone wants to add a privileged command. Last
but not least, the same nova_sudoers file is used for every type of Nova
node. A Nova API server, for example, does not need to run any command as
root, yet it is still allowed to run all the commands that a compute node
requires. Those other limitations could be fixed while
still using sudo and sudoers files, but the first limitation would
remain. Can we do better?
Substitute a wrapper for sudo
To be able to propose alternative privilege escalation security models,
we first needed to be able to change all the "sudo" calls in the code
and make them potentially use something else. That's what I worked on
late in the Diablo timeframe: creating a run_as_root option in
nova.utils.execute that uses a configurable root_helper
command (by default, "sudo"), and forcing all the existing calls to go
through it (rather than blindly calling "sudo" themselves).
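The mechanism can be sketched in a few lines of Python. This is a simplified, standalone illustration, not Nova's real nova.utils.execute (which takes more options): when run_as_root is set, the command is prefixed with the root_helper command, which defaults to "sudo". The build_command helper and the DEFAULT_ROOT_HELPER constant are hypothetical names, and splitting root_helper with shlex is an assumption that lets the helper carry its own arguments.

```python
import shlex
import subprocess

# Assumption: mirrors the default described above ("sudo"). In a real
# deployment this would come from configuration, not a module constant.
DEFAULT_ROOT_HELPER = "sudo"


def build_command(cmd, run_as_root=False, root_helper=DEFAULT_ROOT_HELPER):
    """Return the argument list to run, prefixed by root_helper if needed."""
    argv = [str(c) for c in cmd]
    if run_as_root:
        # shlex.split lets root_helper include its own arguments,
        # e.g. a hypothetical "sudo some-wrapper".
        argv = shlex.split(root_helper) + argv
    return argv


def execute(*cmd, run_as_root=False, root_helper=DEFAULT_ROOT_HELPER):
    """Run a command, optionally through root_helper, and return its stdout."""
    argv = build_command(cmd, run_as_root=run_as_root, root_helper=root_helper)
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, build_command(["kill", "-HUP", "1234"], run_as_root=True) returns ["sudo", "kill", "-HUP", "1234"]. Because every privileged call goes through the same code path, switching to another escalation mechanism only means changing root_helper; the call sites stay untouched.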
Thanks to the default root_helper, everything still behaves the same,
but now we have the possibility to use something else, if we can be
smarter than sudoers files: for example, calling a wrapper that does
advanced filtering of the commands that nova wants to run. In part 2 of
this series, we'll look into a proposed alternative Python-based
root_helper and open the discussion on its security model.
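As a rough illustration of what "advanced filtering" could mean, here is a hypothetical Python sketch (not the wrapper proposed in part 2): each argument of an allowlisted command is checked against its own pattern, and filter sets are kept per node type, which would also address the one-size-fits-all sudoers file. The FILTERS table, its rules and the is_allowed function are invented for illustration only.

```python
import re

# Hypothetical filter definitions: for each node type, the commands it may
# run as root, with one regular expression per allowed argument position.
# These rules are illustrative only and are not taken from Nova.
FILTERS = {
    "api": {},  # an API node needs no root commands at all
    "compute": {
        # [\w-] excludes "." and "/", so "../" path traversal cannot match.
        "chown": [re.compile(r"nova"),
                  re.compile(r"/var/lib/nova/instances/[\w-]+/disk")],
        "kill": [re.compile(r"-(9|HUP)"), re.compile(r"\d+")],
    },
}


def is_allowed(node_type, cmd):
    """Return True only if the command and every argument match a filter."""
    if not cmd:
        return False
    rules = FILTERS.get(node_type, {}).get(cmd[0])
    if rules is None:
        return False  # unknown node type, or command not allowlisted
    args = cmd[1:]
    if len(args) != len(rules):
        return False  # reject extra or missing arguments
    # fullmatch anchors each pattern to the whole argument string.
    return all(rule.fullmatch(arg) for rule, arg in zip(rules, args))
```

Requiring an exact argument count and full-string matches keeps the rules strict and easy to audit, at the cost of some flexibility. A real wrapper would also have to resolve the executable path and actually run the accepted command; this sketch only covers the allow/deny decision.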
Last week saw the delivery of the first milestone of the Essex
development cycle for Keystone, Glance, Horizon and Nova. This early
milestone collected about two months of post-Diablo work... but it's not
as packed with new features as one might think, since a big part of those
last two months was spent releasing OpenStack 2011.3 and brainstorming
Essex features.
Keystone delivered its first milestone as a core project, with a few
new features like support for additional credentials, service
registration, and certificate-based SSL client authentication for
services. It should also be easier to upgrade from now on, thanks to
support for database migrations.
Glance developers were busy preparing significant changes that will land
in the next milestone. Several bugfixes and a few features made it into
essex-1 though, including the long-awaited SSL client connections.
Glance also moved to UUID image identifiers.
The Nova essex-1 effort was mostly spent on bugfixing, with 129 bugs
fixed. New features include a new XenAPI SM volume driver, DHCP support
in the Quantum network manager, and optional deferred deletion of
instances. Under the hood, the volume code was significantly cleaned up,
and XML templates were added to simplify serialization in extensions.
Essex-1 was also the first official OpenStack milestone for Horizon,
also known as the Dashboard. New features include an instance details
page, support for managing Nova volumes, and a new extensible modular
architecture.
The rest of the effort was spent catching up with the best core
projects in internationalization, developer documentation, and QA
(frontend testing and JS unit tests).
Now, keep your seatbelt fastened, as we are one month away from essex-2,
where lots of new development work is expected to land!
The OpenStack Essex Design Summit just ended, and over the last three
days several people have asked me to give a bit more substance to what
I meant by "Strategic contributions" in my last article.
Ensure the long-term health of the project by investing in
project-centered resources, right. But what can we do now? What actions
can we take today?
Based on the very interesting Summit discussions we had, I think the
strategic contributions that can be made today fall into 4 categories.
Commonality
Brian Lamar had a great session on reviving the OpenStack Common effort:
identifying common functions between OpenStack projects, converging
towards the same implementation, and maintaining it in a common library.
The goal is twofold: present a more uniform face (logs and configuration
files, for example, should follow the same syntax), and make sure that
we don't waste precious development resources on duplicate work. This
effort failed in the past because no resources were dedicated to it
long-term, so it sounds like a nice and easy area in which to start
contributing strategically.
Consistency
The second (and related) area is consistency. Tactical contributions
have advanced the state of very specific features applying to very
specific setups, at the expense of overall coherence. Vish led a
good session on making the feature sets of the KVM and Xen hypervisors
converge, not only in terms of functions, but also in terms of concepts.
I think that analysis needs to happen more generally in OpenStack: is
the resulting product coherent? How can we plug the holes in those
feature matrices?
Security
Another important area that emerged from the Summit, especially with Ray
Hookway's session, is security: strengthening the architecture (to
limit the attack surface and provide defense in depth), formalizing the
process around vulnerability handling and disclosure, and coordinating
the necessary auditing effort. This work is just getting started, and I
hope I will find time to help set it up.
Quality
Last but certainly not least, we need to invest in durable quality. Jay
Pipes drove a number of sessions where we pinpointed the need to
identify issues (QA), fix them (bug squads) and prevent them from
happening again (automated tests & continuous integration). That's by
far the most complex area and the most difficult to coordinate, but the
basic resource needed there is manpower, and setting up
company-neutral common workgroups that everyone can contribute to is the
first step.
Whether you bet your business on OpenStack, or you're just interested in
the long-term health of the open source project, give your developers
time to contribute to those areas and workgroups, and we'll all be a lot
better as a result.
Just after a release, the discovery of significant bugs always revives
discussion around the need for maintenance branches or point releases.
Those discussions, however, do not address the root cause of the
issue; they merely try to limit the damage.
The root cause of significant bugs in a given release is not the
presence or absence of maintenance branches. Nor is it the choice of
time-based cycles, or their length. It's the lack of focus on testing
and fixing the release deliverables. If only a few
people work on that, while all the others are busy adding new features
in trunk, delaying your release by one or more weeks won't change
anything.
From tactical to strategic contributions
OpenStack is one of the few open source projects where development is
truly shared across multiple companies. The trick is, most companies
involved so far are doing what I call tactical contributions: adding
features that they care about, fixing bugs that affect them. Tactical
contributions are great to expand a project's scope, community and
mindshare, but they also add technical debt. Companies involved need to
move to what I call strategic contributions: funding development
resources that care about the end result, the release deliverables, the
absence of bugs, the coherence of the features.
The obvious comparison point is the Linux kernel. The reason why it's
successful, despite many companies being involved only in tactical
contributions, is that at its core it has a strong group of key
developers whose primary allegiance goes to the Linux kernel itself, no
matter what company they happen to work for. Those companies understood
the necessity of funding strategic contributions.
Currently, especially in Nova, it's quite difficult to get merge
proposals reviewed, random bugs fixed, integration tests contributed, or
holes in scope covered. That's because most groups are focused on their
own objectives, rather than the common project objectives. That's the
mindset we need to change now, and that's the only thing that can give
us better releases.
The cost of strategic contributions
The problem with strategic contributions is that they are typically more
costly than tactical contributions, which have a more obvious return on
investment. Keeping developers on the payroll to "fix what needs to be
fixed", or giving all your developers 30% free time so that they can
work on project objectives rather than only your own, is not that easy.
But OpenStack has now proven that it's here to stay, and lots of
companies have bet their strategy on it, so I think the time is now.
If we don't adjust, OpenStack in general (and Nova in particular) will
crumble under the technical debt of tactical contributions, and everyone
involved will lose. We might need to adjust governance to encourage
other companies to invest long-term in project-centered resources. We'll
need to set up open, multi-company workgroups (like the recently set up
QA team) to clearly show that it's a common effort. It won't happen in a
day, but if we don't change our mindset now, no matter how we adjust the
release cycle, Essex deliverables will be of the same quality as Diablo's.
In less than a month the OpenStack development community will gather in
Boston for three days of discussions and brainstorming around the Essex
development cycle.
The main part of the summit is the session tracks. The sessions are
proposed by the participants and should generally be about core or
incubated projects. There are three types of sessions:
- Brainstorm sessions (55 min.) are used to discuss and come up
with a solution for complex issues.
- Rubberstamp sessions (25 min.) are used to present and review an
already-designed plan. Those should generally be linked to a project
blueprint.
- Discovery sessions (25 min.) are used by experts to go into deep
detail on a section of code or a feature.
You can already go to http://summit.openstack.org and see or file
session proposals. The deadline for proposals is September 27, and the
sooner you propose, the better your chances of getting accepted. The
proposals will be reviewed by the PTLs and myself and, if accepted, will
get scheduled in one of the available time slots. Sessions about
official Core or Incubated projects will get priority.
The other part of the summit is an unconference: we will have
a whole room dedicated to 55 min. presentations that will be scheduled
directly on big whiteboards at the summit itself. Any presentation on
any subject vaguely related to OpenStack is acceptable! We'll also have
half an hour's worth of 5-minute lightning talks after lunch every day,
also scheduled directly at the summit itself on a first-come,
first-served basis.
See for reference: http://wiki.openstack.org/Summit
I hope that this mix of scheduled sessions and unconference style will
allow everyone to make the most of those three days. See you there!