[debops-users] RFC: Merge all DebOps git repositories into one

hvjunk hvjunk at gmail.com
Mon Aug 14 22:34:36 CEST 2017

Hi Maciej,

From personal experience, the best advice I can give is this:

If the components are *not* inter-dependent (unlike the kernel, where the network drivers intimately depend on
the memory allocation subsystem(s), and the PCI subsystem defines how the USB, SATA and network drivers connect into the kernel),
then there is a case for having them in separate modules/repositories.

The DebOps playbooks and roles as a whole are quite inter-dependent, so it makes sense to have them in a single repository.

The open question is perhaps the DebOps scripts/wrappers: are they independent enough of the playbooks/roles to warrant a separate repository?

In the Ansible world, the “normal” roles are quite autonomous, and those are the ones on Ansible Galaxy.

The DebOps roles are (in my opinion) too integrated and interconnected to truly be autonomous Ansible Galaxy roles; just look at how the ferm role is “configured” from the postgresql/etc. roles to define port usage.

At least, those are my humble opinions ;)


> On 13 Aug 2017, at 21:38 , Maciej Delmanowski <drybjed at drybjed.net> wrote:
> Hello everyone,
> TL;DR: I want to merge all the Ansible roles, playbooks, documentation, tests
> and software repositories back into one DebOps git repository to improve
> development process. I want to be sure that it's done for the right reasons.
> A few days ago, Daniel Vetter published a blog post about how GitHub is not
> a good place to host the Linux kernel:
>    http://blog.ffwll.ch/2017/08/github-why-cant-host-the-kernel.html
> In short, Daniel explains how on GitHub big projects are commonly split into
> multiple git repositories, each one with their own issues and pull requests.
> Linux is developed using one "monotree", stored in a git repository that is
> forked multiple times, and changes to specific code are then merged back into
> the main Linux repository.
> While reading this blog post, I thought about how DebOps is currently
> maintained, what the pain points are and how the development process could be
> improved. But first...
> How did we end up here?
> -----------------------
> DebOps wasn't always using multiple git repositories in one GitHub
> organization. In the beginning, the code with multiple roles and playbooks was
> in a single repository:
>   first iteration:  https://github.com/drybjed/ansible-aiua
>   second iteration: https://github.com/drybjed/ginas/
> There were two reasons at the time for a split into multiple git repositories,
> each repository containing one Ansible role:
> 1. Testing times on Travis-CI were approaching the 20-minute time limit, since
>   the idea was to test as many Ansible roles as possible. With the number of
>   roles increasing, there was a need to redesign the tests so that roles could
>   be tested separately - multiple git repositories allowed for that.
> 2. There was a demand for some of the DebOps roles to be available via Ansible
>   Galaxy, which supports a model where an Ansible role is in a separate git
>   repository. In the end, all of the DebOps roles are available on Galaxy, but
>   I'm not sure how many of them are used that way exclusively these days.
> So, the split happened and immediately it was apparent that in order for the
> project to work as expected, any role included in the playbook needed to be
> installed and available to Ansible. Since the 'debops-playbooks' repository,
> which at this point became the 'scaffolding' that joined the roles together,
> included all of the DebOps roles, all of them needed to be installed by the
> user.
> This naturally points to the use of the Ansible Galaxy 'requirements.yml'
> file. However, the DebOps project is a combination of Ansible roles that do the
> actual work, and Ansible playbooks which define what roles should be executed
> on which hosts. Ansible Galaxy only supports installation of roles, not
> playbooks, therefore there was a need for an automated way to download all of
> the project's git repositories and put them in the correct place for Ansible to
> use. That is how the custom scripts came to be: at that point DebOps had
> 56 roles and that number was expected to increase, so there was no better
> way to handle it.
> Over time, the project evolved into its current state based on input
> from the users. The roles were tested using a separate test-suite repository,
> which made it possible to define a consistent test environment and decoupled
> any issues with tests from the git commits in the role repositories themselves.
> Documentation went through a design phase where it was decided that, due to the
> split nature of the git repositories, documentation for each role should be
> included with the respective role. All of it was then merged using 'git submodule'
> commands into one giant documentation repository and pushed to
> ReadTheDocs for consumption. The design of the role dependencies changed from
> "hard dependencies" in the roles themselves to "soft dependencies" on the
> playbook level, so that roles could be used separately without having to use
> their dependencies. Roles themselves became more and more self-contained, which
> led to the design of standardized ways for roles to pass data around using
> role dependent variables and Ansible local facts. To ensure that the project's
> code can be validated, git commits from project developers are now signed with
> their GPG keys, although code validation that would use this is not yet
> implemented. Due to how GitHub organization controls work, a separate
> 'debops-contrib' organization was created for third-party Ansible roles that
> are not yet part of the DebOps project, but are expected to be added at some
> point.
> What's the current state of the project
> ---------------------------------------
> In hindsight, some of these decisions were good (mostly related to the role
> code design and inter-role communication, at least in my opinion), and some
> were bad, but unavoidable within the selected development framework (using git
> submodules for documentation results in very slow performance of the 'docs'
> repository). I think that the most glaring issue, and the easiest to spot, is the
> installation or update of all the git repositories that contain DebOps
> playbooks and roles. For comparison, I cloned the Ansible repository (~139 MB)
> to a new directory and timed it:
>    $ time git clone https://github.com/ansible/ansible
>    4,90s user 1,18s system 65% cpu 9,345 total
> Running the 'debops-update' script, which clones the 'debops-playbooks'
> repository and, based on that, clones all of the DebOps role repositories
> (~152 MB in total):
>    $ time debops-update
>    2,20s user 0,81s system 2% cpu 2:13,13 total
> That is ~10 s to clone 1 repository vs. 2 min 13 s to clone 121 git repositories,
> roughly 14 times longer (updates of existing repositories are slightly faster).
> And this is usually done each time to get the latest changes; otherwise you
> would need to know which git repository changed and pull the changes manually.
> Some parts of the development process must be done by a human, notably
> GPG signing of each git commit or merge. Due to how GPG signing works, this
> cannot be done on GitHub itself through a web browser. When a pull request is
> accepted, the maintainer of a given role (in most cases, drybjed) fetches the
> involved branch from GitHub manually into a local git repository of that role,
> merges and signs the changes, and pushes them to GitHub. This cannot be
> changed or resolved without modifying the GPG signing process: using a bot to
> automatically sign commits requires trusted infrastructure which the project
> doesn't have at the moment, and dropping GPG signing may result in
> untrusted code being introduced into the project. Since DebOps essentially has
> 'root' privileges in a production environment, not signing commits is not an
> option, especially since the code can be easily forked and hosted by
> third parties (for example, https://github.com/rpmops/). We can't really do
> anything about this, but it's worth keeping in mind.
> Adding a new Ansible role to the existing infrastructure is an involved process
> in itself:
> - create a new git repository on GitHub;
> - update the list of known GitHub repositories on Travis-CI, enable testing of the
>  new repository;
> - create a new test in the 'test-suite' repository;
> - push the new role code to GitHub, check if tests on Travis-CI pass, fix any
>  issues that arise in the test, or in the role itself;
> - when the new role passes successfully, tag it to mark a new release;
> - add the documentation of the new role to the 'docs' repository (very slow due
>  to the use of git submodules);
> - update the list of GitHub repositories in Ansible Galaxy, import the new role
>  and ensure that it's correctly named;
> - add the role and its playbook to the 'debops-playbooks' repository, which
>  officially enables the role in the DebOps project.
> After that, changes to existing DebOps roles are relatively easy to manage:
> after an upstream repository is forked, new commits are pushed there and
> a pull request is created on GitHub. When the changes are accepted, they
> are merged and signed locally.
> DebOps roles are versioned using git tags. This can be useful when you use
> a specific DebOps role in your own set of Ansible roles, but is mostly
> irrelevant for the 'debops-playbooks' repository which always pulls the latest
> commits in the 'master' branch of the DebOps roles, ignoring the tags.
> Currently role versioning matters most in the Travis-CI tests, which pull the
> required roles using Ansible Galaxy, which by default points to the latest
> tagged version of the role. Because of this, any changes that affect multiple roles
> (for example, changes in 'debops.ferm' or the recent changes to 'debops.postfix')
> need to be carefully coordinated, so that the main role is updated and tagged
> first, and then any roles that depend on it can be properly tested on Travis.
> The DebOps project currently does not have any concept of a "stable release".
> Individual roles are versioned, but the 'debops-playbooks' repository hasn't
> been tagged in a long time. The current project architecture doesn't point to
> any sane way to resolve this issue. There was an idea of creating separate git
> branches, each containing an Ansible Galaxy 'requirements.yml' file
> with specific versions of each role, but that was quickly dropped as being
> too much work to track everything manually (remember that each change would
> need to be GPG-signed).
> The current architecture of the project resulted in problems with packaging it
> for Debian:
>    https://github.com/debops/docs/issues/132
>    https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=820367
>    https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=819816
> The documentation proved to be unsuitable for packaging; therefore, without
> significant changes, I don't think that DebOps will be available in the Debian
> Archive.
> The 'debops-contrib' GitHub organization hasn't been properly integrated into
> the project itself. The roles in this organization were meant to be moved to
> the main DebOps organization after the current roles included in the project
> are updated to the latest code standards. Unfortunately, this hasn't happened as
> fast as I hoped, and I'm not sure when this will be picked up.
> It seems to me that the current development model greatly favors
> the use of DebOps roles separately (they are properly versioned and taken care
> of, easily available through Ansible Galaxy), but this affects the usage of the
> project as a whole (no "stable" releases which makes the use of the project
> risky in production, any significant changes need to be carefully coordinated).
> Merging everything back together
> --------------------------------
> Can the different DebOps roles, playbooks, documentation and other software git
> repositories be merged into one repository? With some changes, yes. The
> current Ansible code within DebOps essentially is one "repository" split into
> separate git repositories - you can see it by looking at contents of the
> '~/.local/share/debops/' directory after installation. Merging the roles back
> with the playbooks shouldn't be a problem, apart from preserving the git commit
> history.
> After the merge, some things could be moved around to improve the directory
> structure. Documentation of the roles could be moved to one 'docs/' directory,
> and cleaned up to remove redundant documentation like LICENSE files, etc. This
> would improve documentation management and allow links between the documentation
> of different roles without the need for separate link index files. This
> should also resolve the Debian packaging issue.
> Having one repository with roles and playbooks could enable easy creation of
> stable releases based on branches. Semantic Versioning could be used to
> keep a few stable branches, while development is done in the 'master' branch.
> The repository could have reserved directories for custom roles or playbooks,
> which would allow the users to maintain their own fork and merge any changes in
> upstream with relative ease.
> Adding new roles to the project would be as easy as changing existing ones is
> now, without the overhead of creating new repositories, etc. Any
> changes that span multiple DebOps roles could be tracked in one pull request
> instead of separate ones for each role, and could be coordinated together.
> The ownership of the different roles or code could be managed by the
> CODEOWNERS/MAINTAINERS file which specifies which users should review any
> changes.
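As an illustration only: a CODEOWNERS file in GitHub's format lists one path pattern per line followed by the owners whose review is requested, and when several patterns match a file, the last one wins. The sketch below assumes a hypothetical ansible/roles/ and docs/ layout in the merged repository; every handle except @drybjed is a placeholder and the ownership assignments are invented for the example.

```shell
#!/bin/sh
# Sketch: write a GitHub-style CODEOWNERS file into a (hypothetical) merged
# DebOps checkout. The layout and all handles except @drybjed are placeholders.
repo="${REPO_DIR:-$(mktemp -d)}"   # REPO_DIR is a hypothetical override
mkdir -p "$repo/.github"

cat > "$repo/.github/CODEOWNERS" <<'EOF'
# Fallback reviewer for anything not matched by a later rule.
*                           @drybjed

# Per-role and documentation owners; the last matching pattern wins.
/ansible/roles/ferm/        @example-ferm-maintainer
/ansible/roles/postgresql/  @example-postgresql-maintainer
/docs/                      @example-docs-maintainer
EOF

echo "Wrote $repo/.github/CODEOWNERS"
```

With a file like this, a single pull request touching both the ferm and postgresql roles would request reviews from both owners in one place, rather than in two separate repositories.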
> There would be no need to maintain a separate 'debops-contrib' GitHub
> organization - third-party roles could be prepared and merged in forked git
> repositories, or easily maintained in upstream repositories that apply changes
> from the DebOps repository.
> The use of Travis-CI for tests is a problem in this model. Running the
> whole playbook at once takes too long, and some roles are mutually exclusive
> (for example 'apache' and 'nginx'), so using just one test for the whole
> project is not feasible. Travis-CI has a "build matrix" feature which allows
> creating separate jobs, which could be used to create a set of tests for
> different parts of the code; however, the limit of 200 jobs, and good
> etiquette, suggest that this shouldn't be used to test all of the roles
> separately, as it is done now. Perhaps the main repository could run just a few
> tests for major roles and playbooks (ownCloud, GitLab, Netbox, i.e. any
> user-facing application), which should cover a relatively large part of the
> project codebase, and testing of all the roles separately could be done on
> a new infrastructure based on GitLab-CI.
> Some roles still might be popular enough to warrant their availability
> separately through Ansible Galaxy. This could be done automatically by
> extracting the selected roles and publishing them in their own separate git
> repositories, signed by a bot or a human, depending on the number of roles.
> Ansible configuration allows multiple role directories through the 'roles_path'
> setting, which uses a $PATH-like syntax; therefore, cloning the main
> DebOps repository and adding its role directory to the 'roles_path'
> setting shouldn't be a big issue. Roles would still be designed
> to be self-contained, so this use case will stay valid.
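A minimal sketch of that setup, assuming the merged repository is cloned to a hypothetical ~/src/debops directory with its roles under ansible/roles/ (both the location and the layout are placeholders). It uses the ANSIBLE_ROLES_PATH environment variable, which overrides the 'roles_path' setting; the same colon-separated value could instead go in the 'roles_path' key of the [defaults] section of ansible.cfg.

```shell
#!/bin/sh
# Sketch: let Ansible find roles both in the user's own role directory and in a
# (hypothetical) clone of the merged DebOps repository.
DEBOPS_SRC="${DEBOPS_SRC:-$HOME/src/debops}"   # placeholder clone location
DEBOPS_ROLES="$DEBOPS_SRC/ansible/roles"        # placeholder layout
LOCAL_ROLES="$HOME/.ansible/roles"              # user's own roles

# Colon-separated like $PATH; Ansible searches the entries in order, so the
# user's roles are found before the DebOps copies.
ANSIBLE_ROLES_PATH="$LOCAL_ROLES:$DEBOPS_ROLES"
export ANSIBLE_ROLES_PATH

# Show one search-path entry per line.
printf '%s\n' "$ANSIBLE_ROLES_PATH" | tr ':' '\n'
```

Listing the local directory first means a same-named role there shadows the copy from the DebOps checkout, which is one way to carry a local override without maintaining a fork.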
> The DebOps scripts will need to be updated to support the new deployment model.
> This might be a good enough reason to finally rewrite them from scratch and
> update the user interface to support subcommands. Similarly, third-party
> code that was created to support DebOps could be merged into the main
> repository as well: example roles, test suite, example playbooks. We can
> carefully design the final directory structure to support this.
> Final thoughts
> --------------
> This is just a proposal; I would like to hear other DebOps users' thoughts
> about this before changing anything further. I think that the proper way to
> do this would be to first review and update the DebOps Policy and Guidelines
> (https://github.com/debops/debops-policy/) to reflect the new state of the
> project. Then all existing code can be merged into one repository in a way
> that preserves the commit history. After that, work can be done on
> redesigning the existing code, updating documentation, etc.
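One way to merge existing repositories while preserving their commit history is git's classic "subtree merge", which needs only core commands (and git 2.9 or later for --allow-unrelated-histories): fetch the role repository, record it as a second merge parent with the 'ours' strategy, then read its tree into a subdirectory. The sketch below demonstrates this on throwaway local repositories. merge_role_repo and the roles/<name> layout are hypothetical; a real migration would loop over all role repositories and would presumably sign each merge commit (git commit -S) in line with the project's GPG policy.

```shell
#!/bin/sh
# Sketch: fold a standalone role repository into a subdirectory of a combined
# repository while keeping the role's full commit history ("subtree merge").
# merge_role_repo is a hypothetical helper, not part of the DebOps scripts.
set -e

merge_role_repo() {
    mono="$1"              # path to the combined repository
    src="$2"               # URL or path of the role repository
    prefix="$3"            # target directory, e.g. roles/ferm
    name="$4"              # temporary remote name
    branch="${5:-master}"  # branch of the role repository to merge

    git -C "$mono" remote add -f "$name" "$src"
    # Record the role's history as a second parent without changing the tree.
    git -C "$mono" merge -q -s ours --no-commit --allow-unrelated-histories \
        "$name/$branch"
    # Place the role's files under the prefix and stage them in the index.
    git -C "$mono" read-tree --prefix="$prefix/" -u "$name/$branch"
    git -C "$mono" commit -q -m "Merge $name into $prefix/ with history"
    git -C "$mono" remote remove "$name"
}

# Demo on throwaway local repositories (no network access needed).
work="$(mktemp -d)"
trap 'rm -rf "$work"' EXIT
for r in mono ferm; do
    git init -q "$work/$r"
    git -C "$work/$r" config user.name "Demo"
    git -C "$work/$r" config user.email "demo@example.org"
done

echo "combined repository" > "$work/mono/README"
git -C "$work/mono" add README
git -C "$work/mono" commit -q -m "Initial commit"

mkdir -p "$work/ferm/tasks"
echo "---" > "$work/ferm/tasks/main.yml"
git -C "$work/ferm" add tasks/main.yml
git -C "$work/ferm" commit -q -m "ferm: first commit"
echo "ferm role" > "$work/ferm/README.md"
git -C "$work/ferm" add README.md
git -C "$work/ferm" commit -q -m "ferm: add README"

# Use whatever default branch name this git installation created.
branch="$(git -C "$work/ferm" symbolic-ref --short HEAD)"
merge_role_repo "$work/mono" "$work/ferm" roles/ferm ferm "$branch"

git -C "$work/mono" log --oneline --graph
```

The role's earlier commits still store its files at the repository root, so path-limited commands such as `git log -- roles/ferm` stop at the merge commit. Rewriting each role repository first with the separate git-filter-repo tool (--to-subdirectory-filter) avoids that, at the cost of rewriting those commits, which also drops their existing GPG signatures.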
> Cheers,
> Maciej
> _______________________________________________
> debops-users mailing list
> debops-users at lists.debops.org
> https://lists.debops.org/mailman/listinfo/debops-users
