On Sep 05, Tomasz bla Fortuna wrote:
>> This would probably be better answered by interested Debian Developers,
>> however I don't think that DebOps itself should strictly follow the Debian
>> release cycle.
> The upstream version doesn't need to. But it's kind of an opportunity to
> stabilize DebOps. As I understand it:
> - You release some tagged upstream version (say 1.4.0) when most things
>   work. In general it should work with the current... Debian Testing.
> - It gets included/updated in Debian Testing.
> - Debian Testing goes into the "freeze" state - and from this moment
>   onwards you won't be able to change DebOps within Debian except for
>   fixing bugs - so you have a stable branch 1.4 and can publish a new
>   1.4.1, but 1.5 won't get into Debian Stable at all.
> - Upstream can be changed, released, etc. - it's not bound by the Debian
>   cycle, but won't get into Debian until the next Debian Stable release -
>   hence you get a stable release - which has benefits.
Yes, that would be the plan. When everything is merged into one repository, it
will be easy to create a stable release branch (say, 'stable-1.4', which is
also tagged as 'v1.4.0'; any bugfixes are tagged with an incremented bugfix
number and the 'stable-1.4' branch is moved to the next tag). I'm not sure yet
how many releases will be supported; I imagine that the release that ends up
in Debian will have to be supported at least during that release's lifetime.
It could have a separate branch, say 'debian-stretch'.
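The stable-branch flow described above can be sketched with plain git; the
repository contents and version numbers here are made up for illustration,
not actual DebOps policy:

```shell
# Hypothetical sketch of a 'stable-1.4' branch with tagged bugfix releases.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.org"

echo "role code" > role.yml
git add role.yml
git commit -q -m "Release work"

# Cut the stable branch and tag the first release on it
git checkout -q -b stable-1.4
git tag -a v1.4.0 -m "DebOps v1.4.0"

# A bugfix lands on the stable branch; tag it and the branch moves along
echo "bugfix" >> role.yml
git commit -q -am "Fix a bug on stable-1.4"
git tag -a v1.4.1 -m "DebOps v1.4.1"

git describe --tags   # -> v1.4.1
```

The branch always points at the newest bugfix tag, so checking out
'stable-1.4' gives the latest supported state of that release line.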
>> recent improvements to Postfix support). Would you want to stick to the
>> DebOps roles and playbooks that are included in Debian Buster, or would
>> you rather use the more recent upstream version?
> If my stable, Debian-reviewed DebOps still correctly manages my
> Debian-reviewed stable nginx version - I'm fine with it. Only when I need
> some function that is not there will I chase newer versions.
I guess that's a fair enough point. Currently DebOps is still in a beta
state; some of the roles haven't been touched in years. When everything is up
to date, having a DebOps release that works correctly with a specific Debian
release sounds reasonable.
> Also - a newer upstream version may or may not work with the frozen Debian
> nginx version (well - it should. But I'm not so certain about webapps like
> ownCloud).
There are definitely roles in DebOps that don't work well in a Debian Stable
release cycle, like 'debops.gitlab', which is updated frequently relative to
Debian. Perhaps when the new scripts support multiple default playbooks,
a split into something akin to a 'core' set of roles that's stable and
a 'contrib' set that's separate and more of a moving target would be
a sensible idea. However, I would see that as another monorepo, not many git
repositories with a single role each.
>> Down the line I think that there might be major DebOps releases that stick
>> to Ansible versions supported by a specific Debian release - that's the
>> major dependency chain of the project. DebOps currently tries to support
>> the current stable Ansible release, and I'm not adding code that relies on
>> Ansible features that haven't been released in a stable release yet.
> Yeah, exactly that. At least if DebOps is going to "stabilize".
I hope that something like stabilization occurs. So far, only certain Ansible
roles can be considered "stable", not the entire project. I'm looking forward
to seeing how that pans out.
>> There totally can be a tool that exports the code from the monorepo to
>> a set of tarballs which are then published on some HTTP server. But
>> tarballs cannot be the "upstream" because they lack proper version
>> control. The "upstream" needs to be a proper source git repository.
> I'm beginning to understand and feel your problem a bit better. Let me
> offer you a more detailed example of how this can work with split repos
> and Debian - and a path for stabilizing DebOps (which might not be what
> you want anyway, but stick with me for a minute).
> Let's forget about tarballs for a second. From the Debian project
> perspective "upstream" is whatever you decide to publish. Most of the
> time they are still working with tarballs, but what they really want is:
> - a single source of truth
> - a version
> (relevant afaik: https://wiki.debian.org/PackagingWithGit)
> If you manage to:
> - Provide a single main versioning for the DebOps project (separate
>   from role/tools versioning), connected to the single git repo.
If I understand this correctly, you propose to keep the current design of
multiple Ansible role repositories plus the debops-playbooks repository, and
to add yet another git repository, separate from these, that tracks everything
by SHA1 hashes. I suspect that this looks good from the packager's point of
view: one "dataset" the scripts can use to build the packages. But from
upstream's (my) point of view it just adds more work on top of what I'm
already doing, even if it's scripted. Yet another separate repository that
tracks everything just adds more moving parts to the project. I want fewer
moving parts.
> - Store verified info in the main repo about all the required parts -
>   e.g. a YAML file with a list of repositories and the last verified
>   SHA-1 checksums.
This is what 'ansible-galaxy' requirements.yml files are, in essence. In the
past I had an idea to create something like stable "releases" of the
debops-playbooks repository; each release would have a Galaxy
requirements.yml file which specified the Ansible role versions that release
required. You can even see some attempts at this in the 'debops-playbooks'
repository. This failed spectacularly because it just added more stuff to
track, namely which Ansible role versions worked with other role versions.
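For reference, a Galaxy requirements.yml of that kind looks roughly like the
following; the repository URLs match the old split-repo naming, but the
pinned versions here are invented for illustration:

```yaml
# Hypothetical requirements.yml in the format understood by
# 'ansible-galaxy install -r requirements.yml'
- src: https://github.com/debops/ansible-nginx
  name: debops.nginx
  version: v0.2.0
- src: https://github.com/debops/ansible-postfix
  name: debops.postfix
  version: v0.1.5
```

Every role pin in such a file is another version pair that has to be kept
mutually compatible by hand, which is exactly the tracking burden described
above.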
I'm sorry, but I like writing Ansible roles and designing what they do, not
micromanaging software versions. Remember that the number of DebOps roles
almost always grows and almost never shrinks. Managing Ansible roles
separately without a dedicated team will become harder and more
time-consuming as the project grows.
> - Sign this YAML file with hashes - instead of signing separate repos.
Check out the https://github.com/rpmops/ organization. The team behind it
copied DebOps roles and removed the stuff they didn't need. The commits were
created by "root", whoever that is. What would happen if they just used my
name and e-mail address in their commits, since it's just a git config
option? You talk below about using 23 unsigned repositories and trusting
them. Would you trust the code from https://github.com/rpmops/ seeing my name
in the commits? Or would you look for an actual GPG signature?
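The point that commit identity is just configuration is easy to demonstrate;
the name and address below are placeholders, but any string would do:

```shell
# Git does not verify the author identity - only a GPG signature can.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

# Anyone can claim to be anyone - these values are never checked by git
git config user.name "Somebody Else"
git config user.email "somebody.else@example.org"
git commit -q --allow-empty -m "Commit claiming a borrowed identity"

git log -1 --format='%an <%ae>'   # -> Somebody Else <somebody.else@example.org>
```

Nothing in the repository distinguishes this commit from one made by the
real "Somebody Else", which is why an unsigned name in a commit proves
nothing.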
Sure, some kind of tool that can programmatically look up the commit hash in
a .yml file and compare it to the checked-out revision in a git repository
could handle this. But to me it just sounds like yet another package manager.
Haven't we made enough of these already? And when DebOps is packaged in
a Debian Stable release, who will care what commits are in some signed .yml
file? You could just look at the code in the upstream git repository.
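For illustration, the core check such a tool would perform fits in a few
lines of shell; the manifest file name and its one-entry format here are
invented, not an actual DebOps format:

```shell
# Hypothetical "pinned commit" verifier: compare each repository's HEAD
# against the hash recorded in a manifest file.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Create one example role repository and record its commit in a manifest
git init -q role-repo
git -C role-repo config user.name "Example"
git -C role-repo config user.email "example@example.org"
git -C role-repo commit -q --allow-empty -m "Initial commit"
pinned=$(git -C role-repo rev-parse HEAD)
printf 'role-repo: %s\n' "$pinned" > manifest.yml

# The verifier: for each entry, check that HEAD matches the pinned hash
while IFS=': ' read -r repo hash; do
    actual=$(git -C "$repo" rev-parse HEAD)
    if [ "$actual" = "$hash" ]; then
        echo "$repo: OK"
    else
        echo "$repo: MISMATCH ($actual)"
    fi
done < manifest.yml   # -> role-repo: OK
```

The comparison itself is trivial; the hard part, as noted above, is keeping
the manifest current and knowing which combinations of pins actually work
together.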
> - During debops-update, VERIFY the file before cloning repos. Clone to
>   the specified SHA1 only. (This is a security improvement over the
>   current approach, I believe.)
Wouldn't 'apt-get install debops' suffice? Why create yet another package
manager and a database that requires constant upkeep and maintenance...
> - And finally: provide a build script inside the main repo which:
>   - pulls verified repos up to the verified state,
How is the 'verified state' maintained? I don't think that can be automated,
since the necessary information about which role versions work with other
role versions isn't there. There is no true, automated dependency management
in Ansible right now. You need to handle version dependencies by hand.
>   - builds the docs,
>   - creates a tarball with the version in its name, which is reproducible
>     (https://reproducible-builds.org/) - a given state of the main repo
>     will always produce THE SAME tarball, even if some branches of other
>     repos change.
This can still be done in a properly tagged monorepo.
> So I got back to the tarballs. Tarballs can have version control (of
> a kind), but it requires you to periodically build the project (from
> sources you have validated) and then sign the resulting tarball. The
> connection from the repo and its SHA1 to the tarball is direct and
> reproducible.
So, with this proposition the flow would look like:
hundreds of git repos -> one git repo -> hundreds of tarballs
I propose to curb that into:
one git repo -> hundreds of tarballs
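The tarball step is mechanical in either flow: 'git archive' produces
a byte-identical tarball for a given tag every time, which is what
reproducible-builds.org asks for. The repository contents and version below
are illustrative:

```shell
# Sketch: a tagged commit always yields the same tarball via git archive.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example"
git config user.email "example@example.org"
echo "role code" > role.yml
git add role.yml
git commit -q -m "Release"
git tag v1.4.0

# Two builds from the same tag yield byte-identical archives
git archive --prefix=debops-1.4.0/ -o build1.tar v1.4.0
git archive --prefix=debops-1.4.0/ -o build2.tar v1.4.0
cmp build1.tar build2.tar && echo "reproducible"
```

Because file timestamps inside the archive come from the commit, not from
the build machine's clock, the output depends only on the tagged tree.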
> - Shifts your signing issue from signing repos to maintaining a signed
>   list of commits. So no easy solution. Procedure:
>   * Maintainers handle their repos themselves. You don't even need access.
>   * When they believe they are "ok" - they create a PR incrementing the
>     hash in the main repo. (So you observe the PR on only one repo.)
>   * You review the PR by checking the diff between versions and dropping
>     or accepting AND SIGNING the change.
So the code wouldn't even be in a DebOps repository, but in some other one
maintained by separate people, and I would only add a SHA1 hash and some
information pointing to that repository to the "database". The thing is,
people change, people get bored, people remove the repositories they are no
longer interested in maintaining. I predict that this model would quickly
lead to parts of the code used by the 'debops' scripts (since it wouldn't be
in the main DebOps project anymore) suddenly going missing.
It's not a mistake that Debian requires source packages in their archive.
And don't forget that I still need to worry about my huge set of git
repositories with DebOps roles that I maintain. Your solution wouldn't help
with that, just add more on top of it.
> - Validates the code better. In my current default debops setup I have
>   23 NOT SIGNED repositories - and I run their code:
>   for i in $(ls); do
>       pushd $i > /dev/null
>       git tag --verify $(git describe --abbrev=0 --tags 2>&1) 2>&1 \
>           | grep 'Good signature' > /dev/null || echo $i error
>       popd > /dev/null
>   done
That's because I started signing commits long after I started working on
DebOps. In fact, at the time I think there were already something like 70
roles in separate git repositories. I'm signing them when I get around to
fixing something in a given role or refreshing it. If only 23 roles are left
unsigned, that's a good sign.
> - Allows you to fix previous stable versions:
>   - Check out the role at the commit from which the Debian version was
>     created.
>   - Fix the bug and tag the fixing commit.
>   - Check out the main repo at the release commit and:
>     * update the fixed role's hash,
>     * sign it and increment the main version's bugfix number.
>   - Reproducibly build the tarball with only the required changes and
>     update the Debian package.
With a monorepo I can check out the stable release, fix the bug, and tag the
new version in one checkout. The build of the Debian package and tarballs
will happen automatically after all tests pass.
> - In semantic versioning:
>   * You keep the major number if the major numbers of all roles didn't
>     change (no incompatible API changes) - otherwise increment major and
>     zero the minor.
>   * You increment the bugfix version if the update fixes bugs without
>     adding or changing features.
>   * You increment the minor version otherwise (functionality change, not
>     a bugfix, but API-stable).
Sure, DebOps uses semantic versioning; notice that everything is tagged with
a v0.x.y version - still a beta. I want to start using vx.y.z at some point,
when everything is stabilized. A monorepo will definitely make using that
scheme easier.
> Instead of the file signing you can probably sign the repo and use
> submodules - but that's less explicit, more opaque, and doesn't survive
> creating a tarball.
I have experience with git submodules in the debops/docs repository. Let me
say this: the Ansible project at some point had 3 repositories, the main
'ansible/ansible' plus 'ansible/ansible-modules-core'
and 'ansible-modules-extras', used as submodules. Just 3. And they merged
everything back. DebOps has ~120 roles, which means ~120 submodules. This
doesn't scale.
>> So it seems that SHA1 cannot be reliably used to ensure that parent
>> commits not signed by GPG are genuine. Combine this with the fact that at
>> the moment DebOps is hosted on and downloaded from GitHub. So yeah, I'm
>> signing my commits and merges, manually. Of course this becomes
>> a bottleneck with the number of repositories, which the switch to the
>> monorepo will hopefully solve.
> And there's even a hole now, if I understood the process correctly.
> debops-update doesn't validate your signatures at all - so an attack on
> the GitHub repo might alter code running as an administrator, if I'm not
> mistaken.
Yes. I'm more focused on the Ansible roles than on the 'debops' scripts; for
me they are "just enough", but I imagine that for other users this might be
different. Having proper scripts that validate that the role repositories are
correctly signed would be useful, but it's not enough of an itch for me
personally to work on it. Just different priorities, I guess. Moving to
a monorepo will require an update to the 'debops' scripts, so we might as
well write new ones from scratch, properly this time.
>> With a monorepo layout, everyone has access to their own fork of the
>> DebOps repository and can create pull requests as they wish against other
>> developers. Each DebOps Developer can have their own fork of the project
>> and users can select which Developer they follow (compare that with the
>> Linux kernel, where developers have their own git repositories on
>> git.kernel.org and users can freely choose which developer they follow:
> With the described central main "debops" repo with the list of SHA1s,
> this is possible too. There can be testing/stable branches - pulling
> different versions of roles (and different sets of roles) - and likewise
> there can be more than one such repo. You would be able to create your
> own mix of roles, validate it as you see fit, and sign it.
> debops-update would accept the file as long as the key was trusted.
This is a solution for the "packager" side, not the maintainer's. From the
maintainer's perspective, you still end up with hundreds of git repositories
that require attention, not to mention more work when you want to update
something in multiple repositories at once.
>> Then, a set of scripts could extract the roles from the debops/debops
>> repository periodically, say on a new tagged release, and publish them on
>> Ansible Galaxy.
> Ok, so both approaches would require writing some additional code. ;)
Sure, but that's just a part of the problem, inevitable in both cases. I'm
trying to solve the other part. :-)
>> The difference between the current and the new model is that instead of
>> hundreds of source repositories combined into "one" virtual repository,
>> we have one source repository split into multiple ones. Much easier for
>> a human to handle.
> That's a bit what I was going at too. This human project lead would
> have to consider only one repo and PRs to only one repo (except for the
> occasional git diff to verify that a role maintainer is not including
> a backdoor).
Yes, if that human project lead would only manage that repo, then fine. But
someone still needs to write the Ansible roles themselves.
Cheers,
Maciej