Re: [debops-users] RFC: Merge all DebOps git repositories into one + How to contribute to DebOps

Thursday, 7 September 2017

Hi

Nice!

Once reworked/frozen/approved, this may be worth adding to docs.debops.org
for newcomers, e.g. under "How to contribute to DebOps?"

Just my 2 cents...

Cheers, h.

On 07.09.2017 at 14:40, Maciej Delmanowski wrote:
...
 On Sep 07, Tomasz bla Fortuna wrote:
> - Doesn't increase complexity - unless I don't understand the current
>   workflow well enough. If it does - I don't see how. I'm trying to
>   explain below.

 I'll try to explain the current workflows I use while working on DebOps, both
 when creating new roles and when updating existing ones.

 New role
 --------

 When I create a new role from scratch, I start working on it in its new
 directory, usually with some stuff like docs/, .travis.yml, etc. copied from
 another role. I guess this could be better done with something like
 'ansible-galaxy init', but I don't want to have any files/directories that
 I don't use. During development I start making git commits when the role is
 about two-thirds ready for a release, when its general "shape" and
 functionality are clear enough. Each commit is signed.

 When a new role is ready, I go to GitHub and create a new repository for the
 role. I add a Travis-CI webhook and, at the same time, go to travis-ci.org,
 update the list of repositories and enable the new ones. In the 'test-suite'
 repository, I create a new subdirectory for the new role and create tests for
 it, roughly checking that during the test Ansible sets everything up
 correctly. I push the 'test-suite' changes to GitHub.

 After that I push the new role to GitHub. The initial push is tested by
 Travis-CI. If everything is OK I tag the role with the v0.1.0 release. If there
 are issues on Travis-CI, I solve them first before creating a new release.
 Sometimes issues are in the test environment itself (say, a wrong version of
 a package, or Ubuntu needs different role variables than the defaults). When
 the tests pass, I tag the new role.

 Currently the role releases are important for DebOps only on Travis-CI - the
 tests download the dependent roles using the ansible-galaxy command, which picks
 the latest release of a given role. If there are modifications in a dependent
 role (say, debops.nginx) that are required by other roles, the dependent role
 needs to be tagged before tests on Travis-CI can pick the new changes up.
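
 A minimal sketch of why tagging matters here, assuming releases are plain
 vMAJOR.MINOR.PATCH git tags. The 'latest_release' helper is hypothetical and
 is not how ansible-galaxy is actually implemented; it only illustrates that
 anything without a release tag is invisible to "pick the latest release".

```python
import re

# Hypothetical helper for illustration; not part of ansible-galaxy or DebOps.
# Assumption: releases are tagged as vMAJOR.MINOR.PATCH.
_TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def latest_release(tags):
    """Return the highest vMAJOR.MINOR.PATCH tag, or None if there is none."""
    versions = []
    for tag in tags:
        match = _TAG.match(tag)
        if match:
            # Compare numerically, so v0.10.0 sorts after v0.2.0.
            number = tuple(int(part) for part in match.groups())
            versions.append((number, tag))
    return max(versions)[1] if versions else None
```

 Changes that sit on the master branch without a new tag never show up in
 the result, which is the situation described above.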

 After the new role is released, I go to the Ansible Galaxy webpage, update the
 list of repositories and import the new role. After that it's available via
 the 'ansible-galaxy' command.

 In my fork of the 'debops/docs' repository, I update all of the submodules to
 the current state (this repository is maintained by a bot that updates the
 documentation from all the roles) and add a new submodule with the new role.
 I push the changes to GitHub and if there are no issues detected by Travis-CI,
 I merge them with the upstream 'debops/docs' repository. This usually takes
 some time since the build needs to check out all of the submodules.

 When the documentation is up, in my fork of the 'debops/debops-playbooks'
 repository I add the playbook for the new role, with any dependencies it uses.
 The role is added to the 'galaxy/requirements.*' files via a script; these
 files are used by the 'debops-update' command to download or update all of the
 roles in the project. I push the changes to GitHub and if there are no issues
 on Travis-CI, I merge them into the 'debops/debops-playbooks' repository. After this
 the new role will be noticed by the 'debops-update' script and installed.

 Update of an existing role
 --------------------------

 I start by forking a 'debops/ansible-*' role to my user account on GitHub and
 cloning it to my development environment. I hook up the upstream repository as
 a 'git remote' for updates. I create a new git branch and work on an update.

 When the update is ready, I push the commits in the new branch to my fork
 (origin) on GitHub and create a new pull request against the upstream
 repository. All pull requests are tested on Travis-CI. I fix any issues with
 the update in my fork. When the tests pass, I go to the cloned upstream
 repository on my workstation, fetch the new PR and merge the changes - this
 way everything is signed by my GPG key. The buttons on GitHub are not used for
 merging. Merged changes are pushed to the repository on GitHub and if ready,
 a new release is tagged. I go to the Ansible Galaxy webpage and make sure that
 the new version is imported correctly.
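
 The manual merge step can be sketched roughly as below. This is only an
 illustration under assumptions: the remote name 'origin', the target branch
 'master' and the local branch name 'pr-<number>' are placeholders, and the
 'signed_merge_commands' helper is hypothetical. It relies on GitHub exposing
 pull requests under 'pull/<number>/head' and on 'git merge -S' creating
 a GPG-signed merge commit.

```python
def signed_merge_commands(pr_number, remote="origin", target="master"):
    """Build the git commands for merging a GitHub pull request locally.

    Hypothetical helper: it only returns the argument lists and runs nothing.
    """
    branch = f"pr-{pr_number}"
    return [
        # Fetch the pull request head into a local branch.
        ["git", "fetch", remote, f"pull/{pr_number}/head:{branch}"],
        ["git", "checkout", target],
        # --no-ff forces a merge commit, -S signs it with the GPG key.
        ["git", "merge", "--no-ff", "-S", branch],
        ["git", "push", remote, target],
    ]
```

 Each list could be passed to 'subprocess.run(cmd, check=True)' inside the
 clone of the upstream repository.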

 A similar process is followed for any third-party pull requests; when the tests pass
 on Travis-CI, I fetch the changes and merge them manually so that the merge
 commit is signed. After that, 'debops-update' can pull new changes in roles
 automatically.

 The 'debops-playbooks' repository needs to be updated only if the changes
 modified the role playbook (for example, new dependent roles were added). The
 bot will update the 'debops/docs' repository automatically via git submodules.

> - It doesn't simplify things *as much* as monorepo does, but this can
>   be traded off depending on how many packages need to live on Galaxy.
> - Has some advantages over monorepo:
>   * Roles working separately without dependencies on the rest of debops
>     can be developed separately without synthetic solutions which might
>     cause problems.
>
>   * It's generally easier to combine software together than to split
>     software. From my experience (with working with bad developers
>     probably! :D) without clear boundaries the dependencies between
>     modules tend to grow - like entropy.

 I've seen somewhere a comparison to microservices. As I understand them,
 microservices are developed completely separately from each other and APIs are
 used to connect them together. DebOps could probably be considered a hybrid
 case of this. To understand this you need to know how role dependencies work
 together in DebOps.

 The DebOps roles and playbooks are designed to be read-only, that is you are
 not supposed to change them yourself, so that updates via 'git pull' can work
 as expected. To adapt the roles to your environment, you
 can use Ansible inventory variables. All of the variables that the role
 exposes are loaded from the 'defaults/main.yml' file. This puts them at the
 "bottom" of the Ansible variable merge hierarchy, which means that any
 variables defined in Ansible inventory, or Ansible playbooks as dependent role
 variables, mask and override the ones from role defaults.
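
 As a rough illustration of that masking, here is a sketch that models only
 three of Ansible's precedence levels (role defaults, inventory, dependent
 role variables set in a playbook). The 'effective_vars' helper and the
 'example__port' variable are made up for the example; real Ansible has many
 more levels.

```python
from collections import ChainMap


def effective_vars(role_defaults, inventory=None, role_params=None):
    """Return the variables a role would see, highest precedence first.

    Simplified model: dependent role variables from the playbook mask the
    inventory, which masks 'defaults/main.yml'. Values are replaced as
    a whole, not merged key by key.
    """
    return dict(ChainMap(role_params or {}, inventory or {}, role_defaults))


# Hypothetical variable name, used only for illustration.
defaults = {"example__port": 80, "example__enabled": True}
print(effective_vars(defaults, inventory={"example__port": 8080}))
```

 Nothing in the role itself is edited; the override lives entirely in the
 inventory, so a later 'git pull' of the role cannot conflict with it.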

 To further allow modifications, variables that define the configuration of
 a dependent role are not stored directly in the playbook located in
 'debops-playbooks'. Instead, they are also stored as variables in
 'defaults/main.yml' of the role that uses the dependent role. Let's take an
 example to better illustrate it:

 The 'debops.mailman' role wants, among other things, to configure some options
 in the Postfix '/etc/postfix/main.cf' configuration file. But there's also
 a 'debops.postfix' role that manages the same file. If both of them are
 executed, the changes done by the first one will be overwritten by the second
 one. To combat this, the playbook of the 'debops.mailman' role has the
 'debops.postfix' role as a dependency. This playbook entry defines the
 'postfix__dependent_maincf' variable with the contents of a variable defined
 in the 'defaults/main.yml' file of the 'debops.mailman' role. Ansible
 transparently uses this chained variable and the 'debops.postfix' role can
 configure the options specified by the 'debops.mailman' role on its behalf. The
 Postfix configuration is stored in the 'debops.mailman' default variables so
 that if the user wants to modify something, it can be done through Ansible
 inventory variables. If you don't want to use Postfix with Mailman because you
 prefer Exim, you can just remove the 'debops.postfix' entry from the
 'debops.mailman' playbook (presumably your own copy).
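
 The chaining can be sketched like this. It is a toy model, not the actual
 role code: the 'mailman__postfix__dependent_maincf' variable name and the
 option contents are assumptions made for the example; only
 'postfix__dependent_maincf' is taken from the description above.

```python
# Assumed name of the variable in the debops.mailman defaults.
MAILMAN_VAR = "mailman__postfix__dependent_maincf"


def postfix_role_params(mailman_defaults, inventory):
    """Model the playbook entry that hands Mailman's options to Postfix."""
    # Inventory variables mask the role defaults.
    mailman_vars = {**mailman_defaults, **inventory}
    # The playbook passes the (possibly overridden) value to debops.postfix.
    return {"postfix__dependent_maincf": mailman_vars[MAILMAN_VAR]}


# Placeholder option names, not real Postfix settings.
defaults = {MAILMAN_VAR: [{"name": "example_option", "value": "default"}]}
inventory = {MAILMAN_VAR: [{"name": "example_option", "value": "custom"}]}

print(postfix_role_params(defaults, {}))
print(postfix_role_params(defaults, inventory))
```

 The second call shows the point of keeping the data in the role defaults:
 the user changes what Postfix receives without touching either role or the
 playbook.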

 This makes maintenance of the "leaf" roles that are not used as dependencies
 very easy. However, as the recent Postfix update showed, the use of dependent
 role variables in other role defaults, in combination with DebOps tests on
 Travis-CI using 'ansible-galaxy' to download roles to preserve some kind of
 working order, is problematic when you want to make significant changes. When
 the Postfix role rewrite happened, I needed to update all of the roles that
 used the 'debops.postfix' role as a dependency (that is, 'debops.mailman',
 'debops.dovecot', 'debops.smstools'), wait for the 'debops.postfix' changes to
 be merged in the upstream repository, merge the changes to the other roles,
 and update the 'debops-playbooks' repository to use the new playbooks.

 To summarize, DebOps roles are developed kind of like microservices since they
 can be used separately, but when more roles are involved so that the DebOps
 project as a whole works as expected, they suddenly need to be developed
 together in a kind of lock-step fashion; otherwise they become desynchronized.
 As I understand it, this defeats the microservices methodology. In effect,
 DebOps roles are easy to work with and update separately, but larger changes
 involve more and more git repositories. This might not look that daunting from
 the user side - when all the changes are merged, one 'debops-update' run
 updates all of the involved git repositories in one run, but the development
 process becomes more and more involved since more and more roles in the
 project are using various dependencies.

>   * Can in a natural way combine built-in/core roles with "external"
>     roles while keeping both secure. This is less useful if all roles
>     must be always available during the run. I believe that for
>     packaging roles for Debian it would be better to allow some roles to be
>     removed during the "build process".

 Since more and more roles use role dependencies, the number of roles that are
 not needed by default goes down. I haven't looked, but perhaps a quarter of
 the current roles could be left uninstalled and the project would probably
 work fine without them. But I don't think that this number will increase;
 rather, the role dependencies will be used more and more. DebOps is
 very tightly integrated at this point.

>   * Monorepo/tree works for Linux certainly but they don't care if some
>     stuff is used outside of Linux.

 The integration with external roles is currently done in the DebOps project
 directory. Custom roles and playbooks can be placed there and used alongside
 main DebOps playbooks. A switch to a monorepo wouldn't affect that mechanism.

>   * I wonder if exporting Galaxy roles won't cause problems like:
>     someone wants to fix a role, but he is using Galaxy. You will get a
>     pull-request (and forks) to the exported "read-only" GIT repo. The
>     valid way would be to fix the main repo, test it and export the
>     role back.

 Yes, that would be the case with a monorepo. Another problem is that for the
 role to work correctly on Ansible Galaxy, files from the
 'ansible/roles/<role-name>/' subdirectory in the monorepo would need to be
 moved to the root of the repository. The exported roles would also have
 boilerplate README files that point to the monorepo, etc., which means more
 commits that make the exported role diverge from the monorepo. Since roles are meant to be
 read-only, the scripted flow of data from the monorepo to the separate role
 repositories would be one-way only.
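
 A minimal sketch of what such a one-way export could look like, assuming the
 'ansible/roles/<role-name>/' layout mentioned above. The 'export_role'
 helper, the 'README.md' file name and the README wording are hypothetical;
 no such script exists yet.

```python
import shutil
from pathlib import Path


def export_role(monorepo, role_name, destination):
    """Copy one role from the monorepo to the root of a separate checkout."""
    source = Path(monorepo) / "ansible" / "roles" / role_name
    if not source.is_dir():
        raise FileNotFoundError(f"no such role in the monorepo: {source}")
    destination = Path(destination)
    # The role files end up at the repository root, as Ansible Galaxy expects.
    shutil.copytree(source, destination, dirs_exist_ok=True)
    # Boilerplate README pointing back at the monorepo (assumed wording).
    (destination / "README.md").write_text(
        f"{role_name}\n\nThis role is exported from the DebOps monorepo.\n"
        "Please send changes there; this repository is read-only.\n"
    )
    return destination
```

 The data only ever flows from the monorepo outwards; nothing is read back
 from the exported repository, which is why pull requests sent to it could
 not be merged there.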

 This of course hurts the contribution process for a single role, but might
 make it easier for a number of roles that depend on each other. This will
 affect the people that use DebOps roles separately more than the users of the
 whole project, i.e. the monorepo. The former would need to learn where to change
 the correct files in the monorepo and post the pull requests to the correct
 repository on GitHub.

 The question is, which user group is larger, and which one is more interested
 in contributions to the DebOps codebase.

 This can also be tied to the number of role maintainers - according to the
 data on https://debops.org/status.html, there are two maintainers:

 - drybjed - maintains 74 roles
 - ypid - maintains 7 roles

 There are also 46 "unknown" roles which have not yet been updated to be
 recognizable by the DebOps API. The maintainer of these roles will most likely
 be drybjed, unless somebody else interested steps in, which would bring his
 number of roles to maintain to 120. That means that at the moment the majority
 of the project upkeep relies on one maintainer. This leads to burnout.

 I'm not really sure how popular DebOps is. Because the project at least in my
 view is still in a beta state, I don't want to advertise it just yet because
 lots of things will still be changed. DebOps is focused on production
 environments, and implementing changes in production is tricky - most users
 that started using DebOps in production stopped updating the roles and
 playbooks because they feared that the implementation of the changes will
 break their production environments. Presumably having a way to reliably
 freeze the entire codebase in a particular state would help with that, but
 it's not really feasible with the current mechanisms the project uses to keep
 the roles updated. Hopefully the change to monorepo would make the project
 easier to maintain in the long run, and after such a drastic change it would
 be ready to be more popularized, which would mean more potential developers.

> With two options (both easier and better then the current state - I
> believe) you can make a decision. Tell me if you don't see any
> advantages here though and I'll cease my blabbering. ;)

 No no, keep it coming. I feel that the current state of DebOps development
 needs to change, and I would like to select a good course of action. From my
 perspective going with a monorepo seems to be a good choice, but other people
 might not see the whole picture, so I'm trying to explain it as best I can. If
 more people agree that going with a monorepo is a good choice, I will be sure
 that it's a correct one.

> I'll answer some misunderstandings in detail below, but in short:
>
> You would end up with following project roles:
> - DebOps maintainer (signs, verifies, updates verified commit list in a main repo)
> - Core role developer (manages roles stored in the main repo)
> - Role developer (manages a role held in a separate repo, has contact
>   with the maintainer, is responsible for keeping the role compatible with
>   the current reviewed debops version).

 With the explanation above I hope that you realize that currently most of
 these are done by one and the same person. ;-)

> In the perfect world you would share debops maintainer + core role
> developer roles and work with a single repo only. I believe you'd also have
> to maintain some external roles (but I lack knowledge how many. And
> hence I cannot predict the feasibility of this approach).
>
> If in this split you end up managing over 10 repos (after merging core
> roles into main repo) then it doesn't solve your problem and you have
> no alternative to monorepo really.

 I believe that's the case. The number of roles grows, so even if we somehow
 select a set of core roles in one repository with some in external
 repositories, after a while the number of repositories will grow again and
 we're back to the same problem.

>> If I understand this correctly, you propose to use the current design of
>> multiple Ansible role repositories + debops-playbooks repository, and add yet
>> another git repository separate from these that tracks everything by the SHA1
>> hashes.
> I'd use existing repository - debops playbooks. Just add a signed
> information about "stable/checked" commit SHA-1 of all "slave"
> repositories - in lieu of signing commits.
>
> Certainly not a "new" repo - I'm trying to limit parts you have to
> watch too.
>
> Also - any roles considered "core" could be still migrated to this repo
> to further limit moving parts:
>
> - Required, "core" roles managed by the core team can be held in
> the same repo. Which reduces the number of things to track for the
> maintainer.
> - Still other roles:
>   * less stable,
>   * the ones which can work by themselves, used on galaxy
>   * maintained by other people with pull-requests (PR)
>   can be held separately and still be used to securely "build" a debops
>   distribution.
>
> I regret that I don't know currently how many roles classify to each
> group. That can lower the feasibility of this solution. ;)
>
> In monorepo case having to export some of the roles to external
> repositories to make them usable in Galaxy increases the moving parts a
> bit too and is a bit weird. Certainly doable though.

 Some of these were explained by me earlier, and some points could be already
 invalidated by my explanation on how the DebOps development process works. Let
 me know what you think about these after reading my explanation above.

>> This failed
>> spectacularly because it just added more stuff to track, namely what Ansible
>> role versions worked with other role versions.
> I'm at most trying to substitute things you're tracking now (commits in
> multiple repos which needs to be signed with pull-requests etc.) with
> other things (list of SHA-s of accepted commits).
>
> I'm not trying to change the way the code works now. That is
> "everything works together" is imho ok - without versions between roles
> and dependencies.

 OK, but a switch from one set of commits to another set of commits without
 significantly changing the number of signatures that need to be made by
 a human (presumably with most of the roles staying in separate repos) doesn't
 really change anything for that human. Any change in a role repository
 would require a corresponding change in the list of known SHAs and therefore
 a commit as well. In cases where, in the existing setup, a change in a role
 does not require any changes in the 'debops-playbooks' repository, the number
 of signed commits to be made would most likely increase after your
 implementation as well.
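
 To make the bookkeeping concrete, here is a small sketch of the check such
 a list would imply. The 'roles_needing_review' helper, the data layout and
 the shortened SHAs are all hypothetical; it only compares a mapping of
 reviewed commits with the current HEAD of each role repository.

```python
def roles_needing_review(pinned, heads):
    """Return roles whose current HEAD is not the reviewed (pinned) commit.

    pinned: role name -> SHA listed in the signed file
    heads:  role name -> SHA currently at the tip of the role repository
    """
    return sorted(
        role for role, sha in heads.items() if pinned.get(role) != sha
    )


# Made-up, shortened SHAs for illustration only.
pinned = {"debops.postfix": "aaa111", "debops.mailman": "bbb222"}
heads = {"debops.postfix": "aaa111", "debops.mailman": "ccc333"}
print(roles_needing_review(pinned, heads))
```

 Every role that shows up in the result needs a new entry in the list, and
 that entry is itself a signed commit in the tracking repository.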

>> I'm sorry, but I like writing Ansible roles and design what they do, not
>> micromanage software versions. Remember that the number of DebOps roles almost
>> always grows, and almost never shrinks. Managing Ansible roles separately
>> without a dedicated team will become harder and more time-consuming as the
>> project grows.
> I understand. Assuming you currently go over any new code in each repo
> and tag+sign it - I'm not proposing anything new. Just leaving out:
> - Separate signing of each repo (sign one file after a review instead)
>
> - doing pull-request to repos you don't manage.
> And make the maintainers of other repos inform you that there's
> something worth looking at by doing a Pull-request to your repo (which
> is the only one a maintainer would have to track.)
>
> This of course splits code a bit into "reviewed/stable" part and
> "not-yet-reviewed" part - but I understand that's the case currently
> too?

 This would be great if most of the DebOps roles were maintained by other
 people. If these are maintained by me, signing a file in a separate repository
 kind of misses the point.

>> How the 'verified state' is maintained?
> Currently you sign commits in repos (but then debops-update uses master
> anyway) if I understand correctly. I just suggest signing a file
> pointing to repos instead. Without adding any more versioning and
> dependencies.

 That changes only where the signed commits will go - instead of each role
 repository they would go to the global repository. I suppose that these are
 equivalent, since it amounts to creating a detached signature instead of an
 inline signature.

>> I don't think that can be automated,
>> since the necessary information about what role versions work with other role
>> versions isn't there.
> For a given commit of master debops-playbook repos there would always
> be listed commits in other repos which were reviewed and used at that
> time - pretty much exactly like with monorepo, but allowing them to
> reside in separate git repos.

 It's easier to move one git repository between hosting providers than 120+ git
 repositories. Similarly it's easier to fork one git repository on GitHub than
 fork 20 git repositories because you need to make a related change to many of
 them to keep the whole thing synchronized. I know that this can be scripted
 and there are tools like mr to handle multiple repositories efficiently, but
 wouldn't it be better if the need for these didn't arise due to how the
 project itself is structured?

>> I propose to curb that into:
>> one git repo -> hundreds of tarballs
> I don't really understand why multiple tarballs though. Single tarball
> is fine, single debian source package is fine - this one can generate
> few debian packages (debops, debops-docs at least).

 Ansible Galaxy uses GitHub as storage, so to be published each role would need
 to be in its own separate GitHub repository. This basically equates to
 "hundreds of tarballs".

>> The thing is, people change,
>> people get bored, people remove the repositories they are not interested in
>> maintaining anymore. I predict that this model would quickly lead to parts of
>> code used by 'debops' scripts (since it wouldn't be in the main DebOps
>> project anymore) suddenly missing.
> True. And you always need to make a decision then:
> - Do I have the time and resources to maintain this role myself? Do I
>   keep it in even if the code will deteriorate in time?

 Most of the DebOps roles were written by me, just a few of them were started
 by other people. Hopefully more thorough, automated testing of roles will make
 finding bitrot easier and help with maintenance.

>> And don't forget that I still need to worry about my huge set of git
>> repositories with DebOps roles that I maintain. Your solution wouldn't help
>> with that, just add more on top of it.
> With roles that don't need to go on Galaxy you could include them in
> the main repo. Those working in Galaxy need a separate repo and
> additional maintenance also in monorepo case.

 With a monorepo the export would be automated and one way only, development
 would be focused on the monorepo. I imagine that with use of the API, the scripts
 could even handle creation of new GitHub repositories and registration in
 Ansible Galaxy.

>> (snip, snip) everything back. DebOps has ~120 roles, that's ~120
>> submodules. This doesn't scale.
> How many roles could be considered core and how many should work
> separately on Galaxy? Is it few 5-20 roles for galaxy or rather most of
> 120 roles?

 You can check the download stats using this link:

https://galaxy.ansible.com/list#/roles?page=1&page_size=20&users=...

 It looks like ~22 roles have >= 1000 downloads, roughly ~30 roles have fewer
 than 100 downloads. Remember that DebOps roles weren't cherrypicked to be
 published on Galaxy and these results came over time.

 Since Ansible Galaxy is currently tied to testing on Travis-CI, all project
 roles need to be available on Galaxy. When the project switches to an internal
 testing infrastructure, that requirement will be dropped. Since the process to
 publish DebOps roles on Galaxy will be automated, I don't see a reason not to
 publish all of them.

>> Yes. I'm more focused on Ansible roles than on the 'debops' scripts, for me
>> they are "just enough" but I imagine that for other users this might be
>> different. Having proper scripts that validate that the role repositories
>> are correctly signed would be useful, but it's not enough of an itch for me
>> personally to work on it.
> I'm a Python developer so if you believe this might be useful I can
> implement it in an agreed fashion.

 That would be excellent. :-) I would wait a bit though. One, not all git
 repositories are signed, and two, I'm not sure yet how the change in
 development will affect the deployment of roles and playbooks in
 ~/.local/share/debops/.

 Cheers,
 Maciej

 _______________________________________________
 debops-users mailing list
 debops-users(a)lists.debops.org
 https://lists.debops.org/mailman/listinfo/debops-users

-- 

Andreas Bürki

abuerki(a)anidor.com
S/MIME certificate - SHA-256 fingerprint:
8A:1A:C2:93:10:4B:CE:91:2C:80:79:44:24:1D:38:CA:EE:0E:89:C9:A5:A4:A0:03:FF:5A:FB:D1:15:18:B5:45
GnuPG - GPG fingerprint:
5DA7 5F48 25BD D2D7 E488 05DF 5A99 A321 7E42 0227
