Hello everyone,
During today's online meeting via Jitsi, we talked a bit about how DebOps is
currently developed, how this might change in the future and what the possible
scenarios for the project are. Two models were discussed: one where many more
maintainers and contributors are involved in development with a much larger
scope, and a second one where the project stays tightly focused on a small set
of features with fewer contributors. I think that this warrants further
discussion on the mailing list, since many people are using DebOps in
production environments and I imagine that they wouldn't want to see surprises
there.
Stepping back a bit, the current release model and schedule are described
here[1], and if you haven't seen it yet, the development workflow which I'm
using at the moment is described in the documentation as well[2]. What is left
out (I think) is how the actual code is developed and where it comes from.
I usually include a report of the number of contributions with each new
release, and you can see the monthly progress on the GitHub insights page[3]. At
present the project is developed mostly by me and a few other people active in
any given month, and there's usually a long tail of one or two changes per
person. The changes get into the project via GitHub pull requests, and after
a review, usually by me, I pull them into my local repository and merge them
into the 'master' branch when they are accepted. At the current size of the
project it
seems that this is enough.
Is that a good governance model for DebOps? I suppose so; it seems to have
worked out well so far. In this model I get the last word about what gets in
and what doesn't, but on the other hand there's a backlog of proposed changes
that need to be reviewed and accepted, which takes time, so progress is
currently slow. DebOps is at the moment developed by volunteers in their free
time - some of that time is of course during their normal work hours if DebOps
is used at work. With more and more contributors providing patches, that will
probably have to change, otherwise the pace of development will grind to
a halt.
There are two solutions we discussed a bit during the meeting. If the current
codebase and its scope become too large, we could essentially split the
project into smaller subprojects, each one maintained by a person or a group
of people, who then try to combine everything into a whole. This is a model
used by OpenStack and can probably be used effectively when each subproject is
backed by a company and has a dedicated team behind it. On the other hand, we
had a similar situation in the past, where each role was developed in
a separate 'git' repository, which increased the maintenance burden when
changes had to be synchronized everywhere. That's why the switch to the
monorepo happened.
Another way is to adopt the model used by the Linux kernel developers, where
there are essentially "layers" of contributors who create changes and push
them to maintainers of subsystems, who after review push them to Linus to
merge into his official kernel. If you're interested, the whole process was
nicely explained by Greg Kroah-Hartman in his 2016 presentation about the
topic[4].
Both models assume that a large number of people participate in the project,
so at the moment we don't have to worry about which path to take. But I think
that the model used by the kernel developers is a better fit for a project
such as DebOps, which is developed in an open, shared space by a dedicated
community.
Use of 'git' as the version control system also gives us an advantage here
- since the project is maintained in a monorepo, we can leverage the
codeowners[5] mechanism to give interested people "ownership" of parts of the
code. In that case they would take over reviews and maintenance of selected
parts of the codebase. The CODEOWNERS file is currently present in the DebOps
monorepo[6] but is not really used for anything - I'm keeping track of all
changes, and nobody so far has volunteered to help with reviews of a specific
role or part of the codebase, at least since that mechanism was added. If you
want to be included, so that you can easily keep track of the parts of the
repository that interest you, let me know. In that case I'll hold off on
merging a given pull request until you have reviewed it.
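To illustrate how such "ownership" could map paths to reviewers, here is
a minimal Python sketch. It assumes a simplified subset of the CODEOWNERS
syntax (a catch-all '*', directory prefixes ending in '/', and shell-style
globs, with the last matching rule winning); the real matching rules are
described in [5]. The role paths and user names are hypothetical and are not
taken from the actual DebOps CODEOWNERS file.

```python
from fnmatch import fnmatchcase

# Hypothetical CODEOWNERS content; paths and user names are made up.
EXAMPLE = """\
# Default owner for everything
*                        @maintainer
/ansible/roles/nginx/    @alice
/ansible/roles/postfix/  @bob @carol
/docs/*.rst              @dave
"""


def parse_codeowners(text):
    """Return a list of (pattern, owners) rules, skipping comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pattern, *owners = line.split()
        rules.append((pattern, owners))
    return rules


def matches(pattern, path):
    """Simplified matching; NOT the full gitignore-like semantics."""
    if pattern == "*":
        return True
    anchored = pattern.lstrip("/")
    if anchored.endswith("/"):
        # A directory pattern covers everything below that directory.
        return path.startswith(anchored)
    # Note: unlike the real format, '*' here also matches '/'.
    return fnmatchcase(path, anchored)


def owners_for(rules, path):
    """The last matching rule wins, as in the real CODEOWNERS format."""
    result = []
    for pattern, owners in rules:
        if matches(pattern, path):
            result = owners
    return result


rules = parse_codeowners(EXAMPLE)
print(owners_for(rules, "ansible/roles/nginx/tasks/main.yml"))
print(owners_for(rules, "README.md"))
```

With rules like these, a pull request touching only the hypothetical nginx
role would wait for a review from '@alice', and anything not claimed by
a more specific rule would fall back to the default owner.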
Another closely related topic is the git commits themselves. At the moment,
with a relatively small number of contributions, I accept the git commits
as-is, after a close examination for any security issues. But if the number of
contributors increases ten-fold, this will become an attack vector - remember
that DebOps code is executed with root privileges on a large and distributed
infrastructure - a very interesting target indeed... In that case, the commits
will have to be cleaner to meet approval.
At the moment in our contributor workflow[7] we mandate the use of GPG
signatures in git commits and advocate for clean and concise git commit
messages[8], but that's rarely enforced. The output of
'git log -p --no-merges' leaves a lot to be desired in terms of easily
readable and clean commits, and I'm guilty of it as well - partially because
I'm trying to record changes in the official Changelog file and I don't want
to simply repeat the note from the Changelog in the commit message. So now
I wonder whether a stronger focus on cleaner git commits, in exchange for
a less verbose Changelog file, would work better in the long run. How many of
you rely on the Changelog to keep track of important changes (the upgrade
notes are of course another matter and should stay as they are)?
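As an illustration of what enforcing clean commit messages could look like,
here is a minimal Python sketch of a mechanical check. It assumes the
commonly cited conventions from [8] (short capitalized subject without
a trailing period, a blank line before the body, wrapped body lines); the
function name and the default limits of 50 and 72 characters are assumptions
for this example, not something the project currently uses.

```python
def check_commit_message(message, subject_limit=50, body_width=72):
    """Return a list of style problems found in a commit message."""
    problems = []
    lines = message.rstrip("\n").split("\n")
    subject = lines[0]

    if not subject.strip():
        return ["subject line is empty"]
    if len(subject) > subject_limit:
        problems.append(f"subject is longer than {subject_limit} characters")
    if not subject[0].isupper():
        problems.append("subject does not start with a capital letter")
    if subject.endswith("."):
        problems.append("subject ends with a period")

    # The second line, if present, must be blank to separate the body.
    if len(lines) > 1 and lines[1].strip():
        problems.append("no blank line between subject and body")

    # Body lines start at line 3 of the message.
    for number, line in enumerate(lines[2:], start=3):
        if len(line) > body_width:
            problems.append(
                f"line {number} is longer than {body_width} characters"
            )
    return problems


print(check_commit_message("fixed stuff."))
```

A check like this only covers the mechanical rules; whether a message explains
what changed and why still needs a human reviewer.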
As for refactoring badly done commits, this can be done, but rewriting
a commit invalidates its GPG signature... So maybe we should only care about
signatures on the "intake" instead, for example on the GitHub pull request
page, and when the code is pulled and being prepared for merge to the 'master'
branch, I could freely refactor and fix up the commits when necessary? The
original GPG signatures will be gone by this point, but authorship of the code
will remain. In that case we could also stop using non-ff merges, which create
a separate signed merge commit to mark acceptance of code, and the resulting
commit history would look a bit better without the need for '--no-merges'.
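To sketch what checking signatures on the "intake" could mean in practice,
the Python snippet below summarizes the output of a command such as
"git log --format='%h %G? %an' master..pr-branch" ('pr-branch' is
a hypothetical branch name). The '%G?' placeholder is git's one-letter
signature status code; treating only "G" as acceptable is an assumption made
for this example, and the sample log lines are invented.

```python
# Meaning of the codes printed by git's "%G?" format placeholder.
SIGNATURE_STATUS = {
    "G": "good signature",
    "B": "bad signature",
    "U": "good signature, unknown validity",
    "X": "good signature, expired",
    "Y": "good signature, made by an expired key",
    "R": "good signature, made by a revoked key",
    "E": "signature cannot be checked (e.g. missing key)",
    "N": "no signature",
}

# Assumption for this sketch: only fully valid signatures are accepted.
ACCEPTED = {"G"}


def commits_failing_intake(log_output, accepted=ACCEPTED):
    """Return (commit, author, reason) for commits that fail the check.

    Expects one commit per line in the form "<hash> <code> <author>".
    """
    rejected = []
    for line in log_output.splitlines():
        if not line.strip():
            continue
        commit, code, *rest = line.split(" ", 2)
        author = rest[0] if rest else ""
        if code not in accepted:
            reason = SIGNATURE_STATUS.get(code, "unknown status code")
            rejected.append((commit, author, reason))
    return rejected


# Invented sample output, not taken from the DebOps repository.
SAMPLE = "a1b2c3d G Alice Example\nd4e5f6a N Bob Example\n"
print(commits_failing_intake(SAMPLE))
```

If the list comes back empty, the commits could then be rebased or fixed up
locally before the merge, with the signature check already recorded at the
pull request stage.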
Or alternatively, should I point out all the issues in the pull request before
the pull, so that the original authors can rework the patches, rebase where
necessary and provide a set of clean commits? To be honest that seems a bit
imposing to me, but if more and more people are relying on my judgement about
what gets into the codebase, perhaps my standards need to be higher?
Let me know what you think,
Maciej
[1]: https://docs.debops.org/en/master/news/releases.html
[2]: https://docs.debops.org/en/master/developer-guide/development-model.html
[3]: https://github.com/debops/debops/pulse/monthly
[4]: https://www.youtube.com/watch?v=vyenmLqJQjs
[5]: https://docs.github.com/en/github/creating-cloning-and-archiving-reposito...
[6]: https://github.com/debops/debops/blob/master/CODEOWNERS
[7]: https://docs.debops.org/en/master/developer-guide/contribution-workflow.html
[8]: https://chris.beams.io/posts/git-commit/