Mistaken dichotomies about dgit
In “Could the XZ backdoor have been detected with better Git and Debian packaging practices?”, Otto contrasts “git-buildpackage managed git repositories” with “dgit managed repositories”, saying that “the dgit managed repositories cannot incorporate the upstream git history and are thus less useful for auditing the full software supply-chain in git”.
Otto does qualify this earlier with “a package … that has not had the history recorded in dgit earlier”, but the last sentence of the section is a misleading oversimplification. It’s true for repositories that have been synthesized by dgit (which indeed was the focus of that section of Otto’s article), but it’s not true in general for repositories that are managed by dgit.
I suspect this was just slightly unclear writing, so I don’t want to nitpick here, but rather to take the opportunity to try to clear up some misconceptions around dgit that I’ve often heard at conferences and seen on mailing lists.
I’m not a dgit developer, although I’m a happy user of it and I’ve tried to help out in various design discussions over the years.
dgit and git-buildpackage sit at different layers
It seems very common for people to think of git-buildpackage and dgit as alternatives, as the example I quoted at the start of this article suggests. It’s really better to think of dgit as a separate and orthogonal layer.
You can use dgit together with tools such as git-buildpackage. In that case, git-buildpackage handles the general shape of your git history, such as helping you to import new upstream versions, and dgit handles gatewaying between the archive and git. The advantages become evident when you start using tag2upload, in which case you can just use git debpush to push a tag and the tag2upload service deals with building the source package and uploading it to the archive for you. This is true regardless of how you put your package’s git history together.
(There’s currently a wrinkle around pristine-tar support, so at the moment I personally tend to use dgit push-source for new upstream versions and git debpush for new Debian revisions, since I haven’t yet convinced myself that I see no remaining value in pristine upstream tarballs.)
dgit supports complete history
If the maintainer has never used dgit, and so dgit clone synthesizes a repository based on the current contents of the Debian archive, then there’s indeed no useful history there; in that situation it doesn’t go back and import everything from the snapshot archive the way that gbp import-dscs --debsnap does.
However, if the maintainer uses dgit, then dgit’s view will include more history, and it’s absolutely possible for that to include complete upstream git history as well. Try this:
$ dgit clone man-db
canonical suite name for unstable is sid
fetching existing git history
last upload to archive: specified git info (debian)
downloading http://ftp.debian.org/debian//pool/main/m/man-db/man-db_2.13.1.orig.tar.xz...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 2060k  100 2060k    0     0  4643k      0 --:--:-- --:--:-- --:--:-- 4652k
downloading http://ftp.debian.org/debian//pool/main/m/man-db/man-db_2.13.1.orig.tar.xz.asc...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   833  100   833    0     0  16322      0 --:--:-- --:--:-- --:--:-- 16660
HEAD is now at 167835b0 releasing package man-db version 2.13.1-1
dgit ok: ready for work in man-db
$ git -C man-db log --graph --oneline | head
* 167835b0 releasing package man-db version 2.13.1-1
* f7910493 New upstream release (2.13.1)
|\
| * 3073b72e Import man-db_2.13.1.orig.tar.xz
| |\
| | * 349ce503 Release man-db 2.13.1
| | * 0d6635c1 Update Russian manual page translation
| | * cbf87caf Update Italian translation
| | * fb5c5017 Update German manual page translation
| | * dae2057b Update Brazilian Portuguese manual page translation
That package uses git-dpm, since I prefer the way it represents patches. But it works fine with git-buildpackage too:
$ dgit clone isort
canonical suite name for unstable is sid
fetching existing git history
last upload to archive: specified git info (debian)
downloading http://ftp.debian.org/debian//pool/main/i/isort/isort_7.0.0.orig.tar.gz...
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  786k  100  786k    0     0  1772k      0 --:--:-- --:--:-- --:--:-- 1774k
HEAD is now at f812aae releasing package isort version 7.0.0-1
dgit ok: ready for work in isort
$ git -C isort log --graph --oneline | head
* f812aae releasing package isort version 7.0.0-1
* efde62f Update upstream source from tag 'upstream/7.0.0'
|\
| * 9694f3d New upstream version 7.0.0
* | 9cbfe0b releasing package isort version 6.1.0-1
* | 5423ffe Mark isort and python3-isort Multi-Arch: foreign
* | 5eaf5bf Update upstream source from tag 'upstream/6.1.0'
|\|
| * edafbfc New upstream version 6.1.0
* | aedfd25 Merge branch 'debian/master' into fix992793
If you look closely you’ll see another difference here: the second only includes one commit representing the new upstream release, and doesn’t have complete upstream history. This doesn’t represent a difference between git-dpm and git-buildpackage. Both tools can operate in both ways: for example, git-dpm import-new-upstream --parent and gbp import-orig --upstream-vcs-tag do broadly similar things, and something like gbp import-dscs --debsnap --upstream-vcs-tag='%(version)s' can be used to do a bulk import provided that upstream’s tags are named consistently enough. This is not generally the default because adding complete upstream history requires extra setup: the maintainer has to add an extra git remote pointing to upstream and select the correct tag when importing a new version, and some upstreams forget to push git tags or don’t have the sort of consistency you might want.
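To make the two history shapes concrete, here is a minimal sketch that reproduces them with plain git in a throwaway directory. It does not run dgit, git-dpm or git-buildpackage, and it is not a description of what those tools do internally; the repository names, the package name foo, the tag names and the commit identity are all made up for illustration.

```shell
#!/bin/sh
# Toy illustration with plain git only; every name below is hypothetical.
set -eu

work=$(mktemp -d)
cd "$work"

# Throwaway identity so that commits work in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

# A pretend upstream project: three commits and a release tag.
git init -q upstream
for n in 1 2 3; do
    echo "change $n" >> upstream/README
    git -C upstream add README
    git -C upstream commit -q -m "upstream change $n"
done
git -C upstream tag v1.0

# Shape A: the release is recorded as a single snapshot commit.
git init -q pkg-snapshot
cp upstream/README pkg-snapshot/README
git -C pkg-snapshot add README
git -C pkg-snapshot commit -q -m "New upstream version 1.0"
snapshot_commits=$(git -C pkg-snapshot rev-list --count HEAD)

# Shape B: fetch upstream's tag and record the import on top of it,
# so that upstream's own commits stay reachable.
git init -q pkg-history
git -C pkg-history fetch -q "$work/upstream" refs/tags/v1.0:refs/tags/upstream-1.0
git -C pkg-history checkout -q -b upstream upstream-1.0
git -C pkg-history commit -q --allow-empty -m "Import foo_1.0.orig.tar.xz"
history_commits=$(git -C pkg-history rev-list --count HEAD)

echo "snapshot-style import:     $snapshot_commits commit(s)"
echo "history-preserving import: $history_commits commit(s)"
```

The second shape is only possible because the packaging repository fetched from upstream first, which is the extra setup step described above.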
The Debian Python team’s policy says that “Complete upstream Git history should be avoided in the upstream branch”, which is why the isort history above looks the way it does. I don’t love this because I think the results are less useful, but I understand why it’s there: in a moderately large team maintaining thousands of packages, getting everyone to have the right git remotes set up would be a recipe for frustrating inconsistency.
However, in packages I maintain myself, I strongly value having complete upstream history in order to make it easier to debug problems, and I think it makes things a bit more transparent to auditors too, so I’m willing to go to a little extra work to make that happen. Doing that is completely compatible with using dgit.
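As a rough sketch of the sort of check this makes possible, the following uses plain git on a toy repository. The file names, the tag v1.0 and the package name foo are invented for the example, and a real audit would involve rather more than this: it only shows that once the upstream tag is part of the packaging history, you can confirm that it is reachable and ask git which files differ between the tag and the imported tarball.

```shell
#!/bin/sh
# Toy audit check with plain git only; every name below is hypothetical.
set -eu

work=$(mktemp -d)
cd "$work"

# Throwaway identity so that commits work in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.org

git init -q pkg
cd pkg

# Pretend upstream history ending in a release tag.
echo "source" > main.c
git add main.c
git commit -q -m "Release 1.0"
git tag v1.0

# Pretend tarball import on top of the tag: the tarball ships a
# generated file that is not tracked in upstream's git repository.
echo "generated" > configure
git add configure
git commit -q -m "Import foo_1.0.orig.tar.xz"

# Check 1: is the upstream tag part of the packaging history?
if git merge-base --is-ancestor v1.0 HEAD; then
    tag_reachable=yes
else
    tag_reachable=no
fi

# Check 2: which files differ between the tag and the imported tarball?
tarball_diff=$(git diff --name-only v1.0 HEAD)

echo "upstream tag reachable: $tag_reachable"
echo "files differing from the tag: $tarball_diff"
```

With a snapshot-only import there is no upstream tag in the repository to compare against, so the same question has to be answered outside git.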