[debops-users] RFC: Merge all DebOps git repositories into one

Sunday, 13 August 2017

Hello everyone,

TL;DR: I want to merge all the Ansible roles, playbooks, documentation, tests
and software repositories back into one DebOps git repository to improve the
development process. I want to be sure that it's done for the right reasons.

A few days ago, Daniel Vetter published a blog post about how GitHub is not
a good place to host the Linux kernel:

    http://blog.ffwll.ch/2017/08/github-why-cant-host-the-kernel.html

In short, Daniel explains how on GitHub big projects are commonly split into
multiple git repositories, each one with their own issues and pull requests.
Linux is developed using one "monotree", stored in a git repository that is
forked multiple times, and changes to specific code are then merged back into
the main Linux repository.

While reading this blog post, I thought about how DebOps is currently
maintained, what the pain points are and how the development process could be
improved. But first...

How did we end up here?
-----------------------

DebOps wasn't always using multiple git repositories in one GitHub
organization. In the beginning, the code with multiple roles and playbooks was
in a single repository:

   first iteration:  https://github.com/drybjed/ansible-aiua
   second iteration: https://github.com/drybjed/ginas/

There were two reasons at the time for a split into multiple git repositories,
each repository containing one Ansible role:

1. Testing times on Travis-CI were approaching the 20-minute time limit, since
   the idea was to test as many Ansible roles as possible. With the number of
   roles increasing, there was a need to redesign the tests so that roles could
   be tested separately - multiple git repositories allowed for that.

2. There was a demand for some of the DebOps roles to be available via Ansible
   Galaxy, which supports a model where an Ansible role is in a separate git
   repository. In the end, all of the DebOps roles are available on Galaxy, but
   I'm not sure how many of them are used that way exclusively these days.

So, the split happened and immediately it was apparent that in order for the
project to work as expected, any role included in the playbook needed to be
installed and available to Ansible. Since the 'debops-playbooks' repository,
which at this point became a "scaffolding" that joined the roles together,
included all of the DebOps roles, all of them needed to be installed by the
user.

This automatically points to the usage of the Ansible Galaxy 'requirements.yml'
file. However, the DebOps project is a combination of Ansible roles that do the
actual work, and Ansible playbooks which define what roles should be executed
on which hosts. Ansible Galaxy only supports installation of roles, not
playbooks, therefore there was a need for an automated way to download all of
the project's git repositories and put them in the correct place for Ansible to
use. And that is how the custom scripts came to be - at that point DebOps had
56 roles and that number was expected to increase, so there wasn't any better
way to handle it.

Over time, the project evolved into its current state due to input from the
users. The roles were tested using a separate test-suite repository, which
made it possible to define a consistent test environment and decoupled any
issues with tests from the git commits in the role repositories themselves.
Documentation went through a design phase where it was decided that, due to
the split nature of the git repositories, documentation for each role should
be included with the respective role. All of it was then merged using 'git
submodule' commands into one giant documentation repository and pushed to
ReadTheDocs for consumption. The design of the role dependencies changed from
using "hard dependencies" in the roles themselves to "soft dependencies" on
the playbook level, so that roles could be used separately without the need
to use the dependencies. Roles themselves became more and more self-contained,
which led to the design of standardized ways for the roles to pass data around
using role dependent variables and Ansible local facts. To ensure that the
project's code is validated, git commits from project developers are now
signed by their GPG keys, although code validation that would use this is not
yet implemented. Due to how GitHub organization controls work, a separate
'debops-contrib' organization was created for third-party Ansible roles that
are not yet part of the DebOps project, and are expected to be added at some
point.

What's the current state of the project
---------------------------------------

In hindsight, some of these decisions were good (mostly related to the role
code design and inter-role communication, at least in my opinion), and some
were bad, but unavoidable within the selected development framework (using git
submodules for documentation results in very slow performance of the 'docs'
repository). I think that the most glaring issue and the easiest to spot is the
installation or update of all the git repositories that contain DebOps
playbooks and roles. For comparison, I cloned the Ansible repository (~139 MB)
to a new directory and timed it:

    $ time git clone https://github.com/ansible/ansible
    4,90s user 1,18s system 65% cpu 9,345 total

Running the 'debops-update' script, which clones the 'debops-playbooks'
repository and based on that, clones all of the DebOps role repositories (which
together take up ~152 MB):

    $ time debops-update
    2,20s user 0,81s system 2% cpu 2:13,13 total

This is about 10 seconds for cloning 1 repository vs 2 minutes 13 seconds for
cloning 121 git repositories (updates of existing repositories are slightly
faster). And this is usually done each time to get the latest changes,
otherwise you would need to know which git repository changed and pull the
changes manually.
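
As a rough illustration of the gap, the sketch below converts the two
wall-clock totals quoted above into seconds and computes their ratio. The
helper name `to_seconds` is made up for this example; the input strings are
copied from the `time` output above.

```python
def to_seconds(wall_clock: str) -> float:
    """Convert a 'time' total such as '9,345' or '2:13,13' to seconds."""
    # The output above uses a decimal comma, so normalise it first.
    parts = wall_clock.replace(",", ".").split(":")
    seconds = 0.0
    for part in parts:
        # Each ':'-separated field is worth 60 times the one to its right.
        seconds = seconds * 60 + float(part)
    return seconds


single_clone = to_seconds("9,345")     # git clone of ansible/ansible
debops_update = to_seconds("2:13,13")  # debops-update, 121 repositories

ratio = debops_update / single_clone
print(f"debops-update took {ratio:.1f}x as long as the single clone")
```

That works out to roughly 14 times longer for a comparable amount of data,
which suggests the cost is dominated by per-repository overhead rather than
by transfer size.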

Some of the development process has to be done by a human, notably GPG
signing of each git commit or merge. Due to how GPG signing works, this cannot
be done on GitHub itself through a web browser. When a pull request is
accepted, the maintainer of a given role (in most cases, drybjed) manually
fetches the involved branch from GitHub to a local git repository of that
role, merges and signs it, and pushes the new changes to GitHub. This cannot be
changed or resolved without modifying the GPG signing process - using a bot to
automatically sign commits requires a trusted infrastructure which the project
doesn't have at the moment, and dropping the GPG signing may result in
untrusted code being introduced into the project. Since DebOps has essentially
'root' privileges in a production environment, not signing commits is not an
option, especially since the code can be easily forked and hosted by third
parties (for example, https://github.com/rpmops/). We can't really do
anything about this, but it's worth keeping in mind.
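
The exact commands used for the manual merge are not given in this message.
The following is only a sketch of one plausible sequence, built from standard
git options ('git merge -S' creates a GPG-signed merge commit). The function
name, the fork URL and the branch name are hypothetical.

```python
import shlex


def signed_merge_commands(contributor_url, branch, target="master"):
    """Return git commands for merging and signing a pull request locally.

    Assumption: this is an illustrative sequence, not the maintainer's
    documented procedure.
    """
    return [
        ["git", "checkout", target],
        # Fetch the contributor's branch; git records its tip as FETCH_HEAD.
        ["git", "fetch", contributor_url, branch],
        # -S signs the resulting merge commit with the maintainer's GPG key.
        ["git", "merge", "--no-ff", "-S", "FETCH_HEAD"],
        ["git", "push", "origin", target],
    ]


# Hypothetical fork and branch, printed rather than executed.
commands = signed_merge_commands(
    "https://github.com/example-user/ansible-ferm", "fix-rules")
for command in commands:
    print(shlex.join(command))
```

The commands are only printed here, because the signing step needs the
maintainer's private key and therefore has to run on the maintainer's machine.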

Adding a new Ansible role to the existing infrastructure is an involved process
in itself:

- create new git repository on GitHub;

- update list of known GitHub repositories on Travis-CI, enable testing of the
  new repository;

- create new test in the 'test-suite' repository;

- push the new role code to GitHub, check if tests on Travis-CI pass, fix any
  issues that arise in the test, or in the role itself;

- when the tests of the new role pass, tag it to mark a new release;

- add the documentation of the new role to the 'docs' repository (very slow due
  to use of git submodules);

- update the list of GitHub repositories in Ansible Galaxy, import the new role
  and ensure that it's correctly named;

- add the role and its playbook to the 'debops-playbooks' repository, which
  officially enables the role in the DebOps project.

After that, changes to existing DebOps roles are relatively easy to manage
- after an upstream repository is forked, new commits are pushed there and
a pull request is created on GitHub. When the changes are accepted, they are
merged and signed locally.

DebOps roles are versioned using git tags. This can be useful when you use
a specific DebOps role in your own set of Ansible roles, but is mostly
irrelevant for the 'debops-playbooks' repository which always pulls the latest
commits in the 'master' branch of the DebOps roles, ignoring the tags.
Currently role versioning matters most in the Travis-CI tests, which pull the
required roles using Ansible Galaxy, which by default points to the latest
tagged version of the role. Due to this, any changes that affect multiple roles
(for example changes in 'debops.ferm' or recent changes to 'debops.postfix')
need to be carefully coordinated, so that the main role is updated and tagged
first, and then any roles that depend on it can be properly tested on Travis.
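
The "tag the main role first" rule is a topological ordering of the role
dependency graph. A minimal sketch, assuming a made-up dependency map (the
relationships below are illustrative and not taken from the actual roles):

```python
from graphlib import TopologicalSorter

# Hypothetical map: role -> roles it relies on in its tests.
depends_on = {
    "debops.ferm": set(),
    "debops.postfix": {"debops.ferm"},
    "debops.nginx": {"debops.ferm"},
    "debops.dovecot": {"debops.postfix", "debops.ferm"},
}


def tagging_order(graph):
    """Return roles so that every role comes after the roles it depends on."""
    return list(TopologicalSorter(graph).static_order())


order = tagging_order(depends_on)
print(order)
```

Any order produced this way tags 'debops.ferm' before the roles that use it,
which is the coordination that currently has to be done by hand across
separate repositories.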

The DebOps project currently does not have any concept of a "stable release".
Individual roles are versioned, but the 'debops-playbooks' repository hasn't
been tagged in a long time. The current project architecture doesn't point to
any sane way to resolve this issue - there was an idea of creating separate git
branches, each containing the Ansible Galaxy 'requirements.yml' file with
specific versions of each role, but that was quickly dropped because tracking
everything manually would be too much work (remember that each change would
need to be GPG-signed).
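
To give an idea of what such a per-branch file would have contained, here is
a small sketch that renders a pinned 'requirements.yml' using the 'src' and
'version' keys understood by Ansible Galaxy. The function name and the
version numbers are invented for illustration.

```python
def render_requirements(pins):
    """Render a Galaxy requirements file from a {role: version} mapping."""
    lines = ["---"]
    for role, version in sorted(pins.items()):
        lines.append(f"- src: {role}")
        lines.append(f"  version: {version}")
    return "\n".join(lines) + "\n"


# Hypothetical pins; a real release would need one entry per role.
pinned = {"debops.postfix": "v0.2.0", "debops.ferm": "v0.3.0"}
print(render_requirements(pinned), end="")
```

With more than a hundred roles, every release branch would need a file like
this kept in sync by hand, which is the tracking overhead described above.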

The current architecture of the project resulted in problems with packaging it
for Debian:

    https://github.com/debops/docs/issues/132
    https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=820367
    https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=819816

The documentation proved to be unsuitable for packaging, therefore without
significant changes, I don't think that DebOps will be available in the Debian
Archive.

The 'debops-contrib' GitHub organization hasn't been properly integrated into
the project itself. The roles in this organization were meant to be moved to
the main DebOps organization after the current roles included in the project
are updated to the latest code standards. Unfortunately, this hasn't happened
as fast as I hoped, and I'm not sure when this will be picked up.

It seems to me that the current development model greatly favors the use of
DebOps roles separately (they are properly versioned and taken care of, and
easily available through Ansible Galaxy), but this hurts the usage of the
project as a whole (there are no "stable" releases, which makes the use of the
project risky in production, and any significant changes need to be carefully
coordinated).

Merging everything back together
--------------------------------

Can the different DebOps roles, playbooks, documentation and other software git
repositories be merged into one repository? With a few changes, yes. The
current Ansible code within DebOps essentially is one "repository" split into
separate git repositories - you can see it by looking at the contents of the
'~/.local/share/debops/' directory after installation. Merging the roles back
with the playbooks shouldn't be a problem, apart from saving the git commit
history.
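
The merge method is not decided in this proposal. One option that keeps the
commit history is 'git subtree add' without '--squash'; the sketch below only
generates the commands. The 'ansible-<role>' repository naming, the target
directory layout and the function name are assumptions made for this example.

```python
def subtree_import_commands(roles, base_url="https://github.com/debops",
                            prefix="ansible/roles", ref="master"):
    """Build one 'git subtree add' command per role repository.

    Assumption: each role lives in '<base_url>/ansible-<role>' and should
    end up under '<prefix>/debops.<role>' in the merged repository.
    """
    commands = []
    for role in roles:
        commands.append([
            "git", "subtree", "add",
            f"--prefix={prefix}/debops.{role}",
            f"{base_url}/ansible-{role}",
            ref,
        ])
    return commands


for command in subtree_import_commands(["ferm", "postfix"]):
    print(" ".join(command))
```

Other approaches, such as rewriting each repository into a subdirectory
before merging, would work as well; this one is shown only because it needs
no extra tooling beyond what usually ships with git.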

After the merge, some things could be moved around to improve the directory
structure. Documentation of the roles could be moved to one 'docs/' directory,
and cleaned up to remove redundant documentation like LICENSE files, etc. This
would improve documentation management and allow links between the
documentation of different roles without the need for separate link index
files. This should also resolve the Debian packaging issue.

Having one repository with roles and playbooks could enable easy creation of
stable releases based on branches. Semantic Versioning could be used to
keep a few stable branches, while development is done in the 'master' branch.
The repository could have reserved directories for custom roles or playbooks,
which would allow the users to maintain their own fork and merge any changes in
upstream with relative ease.

Adding new roles to the project would be as easy as changing the existing ones
is now, without the overhead of creating new repositories, etc. Any changes
that span multiple DebOps roles could be tracked in one pull request instead of
separate ones for each role, and could be coordinated together. The ownership
of the different roles or code could be managed by a CODEOWNERS/MAINTAINERS
file which specifies which users should review any changes.

There would be no need to maintain a separate 'debops-contrib' GitHub
organization - third party roles could be prepared and merged in forked git
repositories, or easily maintained in upstream repositories that apply changes
from the DebOps repository.

The use of Travis-CI for tests is a problem in this model. Running the whole
playbook at once takes too long, and some roles are mutually exclusive (for
example 'apache' and 'nginx'), so using just one test for the whole project is
not feasible. Travis-CI has a "build matrix" feature which allows creating
separate jobs, and this could be used to create a set of tests for different
parts of the code. However, the limit of 200 jobs and good behaviour both
suggest that this shouldn't be used to test all of the roles separately, as it
is done now. Perhaps the main repository could run just a few tests for major
roles and playbooks (ownCloud, GitLab, Netbox, i.e. any user-facing
application), which should exercise a relatively large part of the project
codebase, and testing of all the roles separately could be done on a new
infrastructure based on GitLab-CI.
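
As a sketch of how a small set of matrix jobs could be planned, the function
below greedily packs roles into jobs while keeping mutually exclusive roles
apart. The function name, the job size and the lower-case role names are
assumptions; only the 'apache'/'nginx' conflict comes from the text above.

```python
def plan_jobs(roles, conflicts, max_per_job):
    """Greedily assign roles to jobs, never mixing conflicting roles."""
    jobs = []
    for role in roles:
        for job in jobs:
            clash = any(frozenset((role, other)) in conflicts
                        for other in job)
            if len(job) < max_per_job and not clash:
                job.append(role)
                break
        else:
            # No existing job can take this role, so open a new one.
            jobs.append([role])
    return jobs


conflicts = {frozenset(("apache", "nginx"))}
roles = ["apache", "nginx", "owncloud", "gitlab", "netbox"]
jobs = plan_jobs(roles, conflicts, max_per_job=3)
for number, job in enumerate(jobs, start=1):
    print(f"job {number}: {', '.join(job)}")
```

A greedy pass like this does not guarantee the smallest possible number of
jobs, but it keeps the count far below the matrix limit for a handful of
major applications.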

Some roles still might be popular enough to warrant their availability
separately through Ansible Galaxy. This could be done automatically by
extracting the selected roles and publishing them in their own separate git
repositories, signed by a bot or a human, depending on the number of roles.
Ansible configuration allows multiple role directories to be used via the
'roles_path' configuration variable with $PATH-like syntax, therefore cloning
the main DebOps repository and adding the role directory to the 'roles_path'
configuration variable shouldn't be a big issue. Roles would still be designed
to be self-contained, so this use case will stay valid.
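
For illustration, this is roughly what the resulting 'ansible.cfg' entry
could look like, generated here with Python's configparser. Both directory
paths and the function name are hypothetical; 'roles_path' in the
'[defaults]' section is the Ansible setting mentioned above.

```python
import configparser
import io


def render_ansible_cfg(role_dirs):
    """Render an ansible.cfg with a colon-separated roles_path."""
    config = configparser.ConfigParser()
    config["defaults"] = {"roles_path": ":".join(role_dirs)}
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Hypothetical locations: a clone of the merged repository plus local roles.
cfg_text = render_ansible_cfg([
    "/home/user/src/debops/ansible/roles",
    "/etc/ansible/roles",
])
print(cfg_text, end="")
```

Someone who only wants a few roles could point 'roles_path' at the cloned
repository this way instead of installing each role from Ansible Galaxy.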

The DebOps scripts will need to be updated to support the new deployment model.
This might be a good enough reason to finally rewrite them from scratch and
update the user interface to support subcommands. In similar fashion, third
party code that was created to support DebOps could be merged in the main
repository as well - example roles, test suite, example playbooks. We can
carefully design the final directory structure to support this.

Final thoughts
--------------

This is just a proposal - I would like to hear what other DebOps users think
about it before changing anything further. I think that the proper way to do
this would be to first review and update the DebOps Policy and Guidelines
(https://github.com/debops/debops-policy/) to reflect the new state of the
project. Then work can be done to merge all existing code into one repository
in a way that preserves the commit history. After that, work can be done on
redesigning the existing code, updating documentation, etc.

Cheers,
Maciej
