Hello everybody,
I think it's again time to dive deep into infrastructure design and shake
things up a bit. I suggest that you read the entire e-mail when you have some
time; for a quick recap, here's a tl;dr:
- DebOps requires 'root' access all the time. This resulted from poor design
  choices, and I want to change it.
- Let's remove the 'owner' and 'group' task parameters where possible and let
  the remote host systems dictate ownership instead.
- Roles could be split into unprivileged and privileged ones, with the latter
  defining the system boundaries for the former.
- This could bring significant security benefits and broaden the use of DebOps,
  but requires significant changes to how the project currently works.
And now, for the main event...
I think that I made a mistake at the beginning of DebOps development, and it's
now time to rectify it.
Many Ansible roles written around that time (2013) used the 'sudo' keyword
(today's 'become') at the task level to facilitate the execution of tasks as
the 'root' account when the user logged in with their admin account. Perhaps
the influence of Ansible Galaxy was the reason: Galaxy didn't allow you to
distribute playbooks in the same consumable format as roles, so you had to
clumsily include a playbook inside the role itself or write your own.
I, on the other hand, started writing roles without the 'sudo' keyword in
tasks. Privilege escalation happened instead at the playbook level, and because
the playbooks were distributed within the project in a "sane" way, that made
sense.
The idea behind it was that users could either add the 'become' keyword to the
playbooks they wrote if they wanted to connect as an admin and switch to root,
or leave out 'become' and connect to the root account directly; for the role,
that wouldn't make a difference.
This method of gaining privileges at the playbook level would also have an
interesting consequence: it would allow the roles to run entirely unprivileged,
as long as the UNIX account used to run them had all of the needed
permissions. All the system administrator would have to do is give the UNIX
account access to the needed system groups, change the file/directory
permissions to allow that account access to them, or prepare specific
resources like databases beforehand with the correct access rights.
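To make "the account has all of the needed permissions" a bit more concrete,
here is a minimal Python sketch of a preflight check an administrator might run
as the unprivileged account before applying a role. The function name and the
example paths are hypothetical, not part of DebOps; it only uses os.access(),
which checks the real UID/GID of the running process.

```python
import os
from pathlib import Path


def missing_write_access(paths):
    """Return the paths the current account could not create or modify.

    Hypothetical preflight helper: a path that does not exist yet is checked
    against its nearest existing parent directory, because that is where it
    would be created.
    """
    missing = []
    for raw in paths:
        target = Path(raw)
        # Walk up to the first existing ancestor (stops at the filesystem root).
        while not target.exists() and target != target.parent:
            target = target.parent
        # Directories need write + search permission to create entries in them.
        needed = os.W_OK | os.X_OK if target.is_dir() else os.W_OK
        # os.access() checks the real UID/GID, i.e. the account running the check.
        if not os.access(target, needed):
            missing.append(str(raw))
    return missing
```

For example, missing_write_access(["/srv/www/netbox", "/etc/pki/realms"])
would list whichever of those (made-up) locations the account still needs to
be granted access to. It ignores ACLs and database grants, so it is a rough
first check, not a guarantee that a role will succeed.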
Unfortunately, both ideas were butchered in the process by adding static
'root' owner and group parameters to tasks that used the 'file', 'copy',
'template' and similar modules, or even by specifying ownership as different
UNIX accounts/groups, which requires 'root' access to change the relevant
filesystem attributes. This meant that the roles had to be run as 'root', and
unprivileged operation was not an option. Even today, most of the DebOps roles
inherently require 'root' access on a host, even if most of their operations
are done in an unprivileged UNIX account. For example, roles for applications
like GitLab, NetBox and others could be modified fairly easily to run entirely
unprivileged, but some of their tasks explicitly require 'root' permissions.
This also shaped how the DebOps project evolved over the years: it became a
tool for fully privileged system administrators to set up environments in a
centralized fashion. Because the privileged and unprivileged tasks are
intertwined inside the roles, there's hardly any way to let users without full
'root' privileges execute them, so the burden of maintaining the
infrastructure falls entirely on the sysadmins with full access. Currently
there's no sane way to delegate parts of the infrastructure maintenance outside
of this "domain", for example to give junior sysadmins the rights to maintain
an application without giving them 'root' access on the host, which could also
give them unwanted access to the LDAP database, Kerberos, or other services. In
short, it's an unprivileged and privileged mish-mash.
There are other examples of this design that impact DebOps today. The initial
spark for the ideas in this message came from a fork of DebOps called
"BSDOps", specifically after I read and participated in an issue thread about
'root' being the primary UNIX group used in DebOps[1]. In it, I learned that on
FreeBSD the 0 UID/GID combination is 'root:wheel' by default instead of
'root:root'. This means that the primary group of the root account is not
'root' but 'wheel'. Because DebOps uses the 'root' group explicitly in tasks,
roles that could otherwise work on FreeBSD cannot be applied there without
major changes, either to the roles or to the hosts on which they operate. If
the tasks didn't set explicit 'owner' and 'group' parameters, that wouldn't be
a problem: the files and directories would be created with the default group
chosen by the operating system, in this case 'wheel'.
[1]: https://github.com/AnotherKamila/bsdops/issues/3
At first I thought that adding another Ansible fact to the 'debops.core' set of
local facts, defining the default "root" UNIX group to use in tasks, would be
the way to fix this, but after a short while I realized that dropping the
explicit owner and group parameters altogether would bring more benefits.
Instead of using Ansible to forcibly set the desired owner and group, why not
let the operating system itself dictate the final result? The Ansible tasks
are executed in privileged mode anyway, so the 'root:root' owner/group would
be used implicitly by default. And if a task returns an error because Ansible
cannot create or modify a given resource, that's not the role's fault; it's
the fault of the system administrator for not giving the user sufficient
permissions.
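To illustrate how the operating system picks the group when none is requested,
here is a small Python sketch of the usual rules: on Linux a new file gets the
effective GID of the creating process (normally the account's primary group)
unless the parent directory has the setgid bit, in which case it inherits the
directory's group; on the BSDs a new file always inherits the parent
directory's group, which for FreeBSD system directories is 'wheel'. Both
function names are made up for this example, and Linux 'grpid' mount options
are deliberately ignored.

```python
import os
import stat
import sys

# Platforms with BSD semantics for the group of newly created files.
BSD_PLATFORMS = ("freebsd", "openbsd", "netbsd", "darwin")


def predicted_group(directory):
    """Predict the GID that a new file in `directory` receives when no explicit
    group is requested. Simplified: ignores Linux 'grpid' mount options."""
    st = os.stat(directory)
    if sys.platform.startswith(BSD_PLATFORMS):
        # BSD: new files inherit the group of the parent directory.
        return st.st_gid
    if st.st_mode & stat.S_ISGID:
        # Linux: a setgid directory passes its group down to new entries.
        return st.st_gid
    # Linux default: the effective GID of the creating process.
    return os.getegid()


def created_group(directory, name="example.conf"):
    """Create an empty file without choosing a group and report its GID."""
    path = os.path.join(directory, name)
    with open(path, "w"):
        pass
    return os.stat(path).st_gid
```

Either way, the group comes from the host rather than from the task, which is
exactly the behaviour that would let the same role produce 'root:root' on
Debian and 'root:wheel' on FreeBSD.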
Let's try a different example. At the moment, the X.509 infrastructure in
DebOps is maintained by the 'debops.pki' role, which uses 'root:root'
ownership for almost all files. Imagine instead that a 'pki' UNIX account is
created and given full access to the '/etc/pki/realms/' directory. What would
change? The system administrators could delegate access to this UNIX account,
either via 'sudo' or directly via SSH, to a team member without full 'root'
rights, who could then maintain the entire X.509 infrastructure on all
DebOps-managed hosts. From the software perspective, nothing significant would
change: services that use the X.509 certificates and require access to the
private keys usually start as 'root', open the required files and then drop
privileges, so the fact that the private keys are owned by the 'pki' UNIX
account wouldn't matter to them at all. Other services like 'rsyslogd' that
start as their own UNIX accounts gain access to the private keys via the
'ssl-cert' UNIX group, of which the 'pki' UNIX account would be a member.
And in case you didn't know, unprivileged UNIX accounts can still change the
group of files and directories they own to any group they are a member of.
Going further, the 'pki' UNIX account could be defined in the LDAP directory
to share the same UID/GID across hosts, and the '/etc/pki/realms/' directory
could be shared via NFS. This would allow instant synchronization of PKI
realms across multiple hosts like load balancers, including access to the same
private keys and X.509 certificates signed via ACME.
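The access rules in the PKI example can be modelled with the classic UNIX
owner/group/other check. Below is a simplified Python sketch: the numeric
UIDs/GIDs in the usage are made up, a private key is assumed to be owned by
'pki:ssl-cert' with mode 0640, and ACLs and capabilities are ignored apart
from 'root' bypassing the read check. The second helper shows the chgrp rule
mentioned above, using os.chown() with -1 to leave the owner unchanged.

```python
import os
import stat
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    uid: int
    gids: frozenset  # primary and supplementary group IDs


def may_read(account, owner_uid, group_gid, mode):
    """Classic UNIX read check using the owner, group and mode bits.

    Simplified: ACLs and capabilities are ignored, except that UID 0 bypasses
    the check, as 'root' normally does for reading regular files.
    """
    if account.uid == 0:
        return True
    if account.uid == owner_uid:
        # The owner class is checked first, even if group bits would allow more.
        return bool(mode & stat.S_IRUSR)
    if group_gid in account.gids:
        return bool(mode & stat.S_IRGRP)
    return bool(mode & stat.S_IROTH)


def chgrp_as_member(path, gid):
    """Change only the group of a file; non-root owners may pick any of their groups."""
    if os.geteuid() != 0 and gid not in set(os.getgroups()) | {os.getegid()}:
        raise PermissionError(f"not a member of GID {gid}")
    os.chown(path, -1, gid)  # -1 leaves the owner unchanged
```

With a key described as (PKI, SSL_CERT, 0o640), 'root' and the 'pki' account
can read it, an 'rsyslog' account in the 'ssl-cert' group can read it through
the group bits, and any other account is denied.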
So, what would that look like in the DebOps roles and playbooks? Initially,
not much would have to change: most of the modifications would be the removal
of the explicit 'owner' and 'group' parameters from tasks that use them. The
roles are executed with privileges anyway, so the final result should be
'root:root', same as before. That would be the first step towards more
unprivileged use of the playbooks and roles.
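As a rough idea of that first step, here is a Python sketch that takes an
Ansible task already parsed from YAML into a dict and drops 'owner'/'group'
parameters that merely restate 'root', while keeping parameters that name
other accounts (see the next paragraph). The helper is hypothetical and only
looks at the 'file', 'copy' and 'template' modules mentioned earlier.

```python
FILE_MODULES = ("file", "copy", "template")  # modules named in this e-mail
OWNERSHIP_KEYS = ("owner", "group")


def drop_root_ownership(task):
    """Return a copy of a parsed Ansible task without 'owner'/'group'
    parameters set to 'root'. Other accounts are kept, since roles that use
    specific unprivileged accounts still need them for now."""
    new_task = dict(task)
    for module in FILE_MODULES:
        args = task.get(module)
        if isinstance(args, dict):
            new_task[module] = {
                key: value
                for key, value in args.items()
                if not (key in OWNERSHIP_KEYS and value == "root")
            }
    return new_task
```

A 'template' task with owner: 'root', group: 'root' and mode: '0644' would
come out with only its 'src', 'dest' and 'mode' parameters, while a 'file'
task owned by a 'gitlab' account would be left unchanged.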
Of course, some of the roles use specific, unprivileged UNIX accounts and
groups, and the parameters that set them would have to stay, for now. In the
long run, I think that the best approach would be to move privileged
operations, like installation of APT packages, creation of UNIX accounts and
groups, and so on, into 'privileged/' sub-roles, similar to how the 'env/'
sub-roles are used in the playbooks to prepare environment variables for other
roles. That way both the privileged and unprivileged sides of a given role can
access the same set of default variables and operate in the same environment.
This would also mean that the playbooks would have to be split as well, into
normal, unprivileged ones and privileged ones executed first to prepare the
required environment: create UNIX accounts and grant access to services like
databases and filesystem resources. In an environment prepared this way,
unprivileged roles should then be able to run without issues, either via
'sudo' from a specific admin account or directly via SSH, depending on the
requirements of the given infrastructure policy.
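One way to start such a split would be to sort a role's existing tasks by the
modules they use. The Python sketch below does that for parsed task dicts; the
set of "privileged" modules is a hypothetical, incomplete classification
chosen for illustration ('apt', 'user', 'group', 'mysql_user' and
'postgresql_user' are real Ansible modules, but a real audit would need a much
longer list and human review).

```python
# Hypothetical classification; a real audit would need a much longer list.
PRIVILEGED_MODULES = {"apt", "user", "group", "mysql_user", "postgresql_user"}


def split_role_tasks(tasks):
    """Partition parsed Ansible tasks into (privileged, unprivileged) lists,
    based only on which module each task calls."""
    privileged, unprivileged = [], []
    for task in tasks:
        # Iterating a task dict yields its keys, one of which is the module name.
        uses_privileged = PRIVILEGED_MODULES.intersection(task)
        (privileged if uses_privileged else unprivileged).append(task)
    return privileged, unprivileged
```

Tasks that install packages or create accounts would end up as candidates for
the 'privileged/' sub-role, and the rest would stay in the main role.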
This would probably result in major changes to how the playbooks are
structured and how everything is deployed. The 'site.yml' and 'common.yml'
playbooks would have to be split into privileged and unprivileged sections,
and the roles would most likely need default variables that specify the
executing UNIX account for the 'become_user' keyword, plus a toggle for the
'become' keyword used in the playbooks. These parameters can easily be
overridden via the Ansible inventory.
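To sketch what the split playbooks might contain, the Python function below
builds the two plays for one role as plain data that could be dumped to YAML:
a privileged play that always uses 'become' to run a '<role>/privileged'
sub-role, followed by the normal play whose 'become' and 'become_user'
keywords are templated from inventory variables. The sub-role path, host group
and variable names are all hypothetical, and the toggle variable is assumed to
hold a boolean.

```python
def role_plays(role, hosts, become_user_var, become_toggle_var):
    """Build a privileged preparation play followed by an unprivileged play
    for one role, as plain data ready to be dumped to YAML."""
    return [
        {
            "name": f"Prepare environment for {role}",
            "hosts": hosts,
            "become": True,  # account and access setup always needs 'root'
            "roles": [{"role": f"{role}/privileged"}],
        },
        {
            "name": f"Manage {role}",
            "hosts": hosts,
            # Both keywords are resolved from the inventory at runtime.
            "become": "{{ " + become_toggle_var + " }}",
            "become_user": "{{ " + become_user_var + " }}",
            "roles": [{"role": role}],
        },
    ]
```

With the toggle set to false in the inventory, the second play would run
directly as whatever account Ansible connects with over SSH.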
The inventory itself was designed as a shared resource between all members of
the team, with distributed management via git repositories. The desired
infrastructure policy can be defined around who can create what changes, with
optional scripts that enforce the policy. I don't anticipate many issues with
this at the moment.
On the other hand, access to the 'secret/' directory might be a big issue to
overcome. Currently, the directory and the 'debops.secret' role are designed
with full access to all secrets in mind. One solution could be to split the
single EncFS-encrypted directory into multiple ones inside the 'secret/'
directory, with each subdirectory accessible via its own set of team members'
GPG keys.
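The access side of that idea can be expressed as a simple policy mapping. The
Python sketch below models only who could unlock which subdirectory; the
subdirectory names and GPG key identifiers (example.org addresses) are
placeholders, and how each subdirectory's EncFS key would actually be
encrypted to those GPG keys is left out.

```python
# Hypothetical policy: each 'secret/' subdirectory is encrypted separately and
# unlocked by the GPG keys listed for it (identifiers are placeholders).
SECRET_ACCESS = {
    "secret/pki": {"alice@example.org", "bob@example.org"},
    "secret/gitlab": {"carol@example.org"},
    "secret/ldap": {"alice@example.org"},
}


def recipients(subdir, policy=SECRET_ACCESS):
    """GPG recipients that the subdirectory's key material would be encrypted to."""
    return sorted(policy.get(subdir, set()))


def accessible_subdirs(key_id, policy=SECRET_ACCESS):
    """Subdirectories a team member holding `key_id` could unlock."""
    return sorted(subdir for subdir, keys in policy.items() if key_id in keys)


def unowned_subdirs(policy=SECRET_ACCESS):
    """Subdirectories nobody could decrypt: a policy error worth catching early."""
    return sorted(subdir for subdir, keys in policy.items() if not keys)
```

In this made-up layout, the team member maintaining GitLab would get only
'secret/gitlab', without ever being able to decrypt the LDAP or PKI secrets.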
What are the potential risks of this approach? The obvious one is the loss of
explicit enforcement of 'root:root' ownership on certain directories or
files. After thinking everything over, I would say that Ansible is a very bad
choice for this job. Ansible playbooks are reactive; they are executed either
manually or periodically via cron scripts. I think that if you want to make
sure that a given directory or file has a specific owner or group, a more
proactive tool like AIDE[2] or Tiger[3] might give you better results. These
tools are designed with this specific use case in mind and could even be
configured via the privileged roles to ensure that any changes applied to the
system by Ansible are accounted for.
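AIDE works by recording file attributes in a database and comparing later
scans against it. A minimal Python sketch of that idea, restricted to owner,
group and permission bits, is below. It is not AIDE or Tiger, just an
illustration of the kind of check they perform; the function names are
hypothetical.

```python
import os
import stat


def snapshot(paths):
    """Record (uid, gid, permission bits) for each path, without following symlinks."""
    result = {}
    for path in paths:
        st = os.lstat(path)
        result[path] = (st.st_uid, st.st_gid, stat.S_IMODE(st.st_mode))
    return result


def drift(baseline, paths):
    """Return {path: (baseline_attrs, current_attrs)} for paths that changed."""
    current = snapshot(paths)
    return {
        path: (baseline.get(path), current[path])
        for path in paths
        if baseline.get(path) != current[path]
    }
```

A baseline taken after the privileged roles run, checked again from cron,
would flag a private key whose mode silently changed from 0600 to 0644,
regardless of whether Ansible or something else made the change.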
[2]: http://aide.sourceforge.net
[3]: http://savannah.nongnu.org/projects/tiger
Dear DebOps Community, what do you think about my proposal? Is this a good
idea to pursue, or is it too risky or impractical to implement in environments
and infrastructure managed by DebOps? Do you have any other questions or
suggestions for improving this concept further? Also, if this were to be
implemented, do you feel that DebOps v1.0.0 should be released with the
current "scheme" in place and changed afterwards, or are these suggestions
beneficial enough to implement before the final release of v1.0.0? I'm eagerly
waiting for your comments and criticism.
Thanks for reading and have fun,
Maciej