==== overview slide


Hi.  I'm here to plug dgit, which is a system for treating the Debian
archive like a git remote.

I'm going to talk for about 35 minutes and then I'll take questions.


When we work on Debian we take on different roles.  The biggest
difference is between the maintainer (or maintainers) of a package,
working on their own package, and everyone else.

I'm going to start by presenting dgit from the point of view of
everyone else: NMUers, sponsors, bug squashers, downstreams,
users, teams doing cross-archive work
like transitions and reproducible builds, and so on.  Maintainers,
please be patient - I'll get to you later.


==== manpage slide

The point of dgit is that it lets everyone treat the archive as if it
were a git repository.

You can dgit clone any package, in any suite (so, for example, sid or
experimental) and you will get a git tree which is exactly the same as
the output of dpkg-source -x.

So dgit always works the same, from the non-maintainer's point of
view, on any package: the operation is completely uniform.  You don't
need to know anything about the maintainer's version control workflow
tools or source format preferences.

You can then work on the package in git, the way you would work in git
with any other project.  In particular, you can:

 * commit locally
 * cherry-pick changes from other branches
 * git reset, git clean
 * git rebase -i to polish a more complex set of changes into
   a patch queue
 * all the other usual gitish stuff

If you have the right authority you can also dgit push, to upload.
That is, a DD can dgit push any package; a DM can dgit push the
packages that the archive thinks they can upload.

Before you push you still have to do a build.  dgit does not replace
existing ways of building source and binary packages, although it does
provide some helpful build rune wrappers (more about that later).


If you don't want to, or can't, upload to Debian, but do want to share
your work with other people, you can git push your dgit
branch anywhere suitable, so they can fetch it.  So, for example, you
could share your branch with your sponsor, who could then approve it
by running dgit push.  dgit's branches are ordinary git branches for
this purpose.

A downstream such as a derivative or partial derivative of Debian can
use the dgit branches directly as the upstream for a git-based setup,
and work entirely without source packages.

==== data flow slide

Behind the scenes, dgit works by providing a set of git repositories
in parallel to the existing archive.

Every dgit push is actually two underlying operations: the first is a
git tag and push, to the dgit git server.  The second is a
conventional package upload - with a wrinkle: the .dsc of an upload
done with dgit contains an extra field with the git commit hash that
was pushed.

Likewise, fetch and clone combine information from the archive and the
git history.  If the most recent upload was done with dgit, the commit
hash in the .dsc enables dgit fetch to find the right commit and
present it to you.  If the most recent upload was not done with dgit,
dgit imports the source package into git - and stitches it into the
existing dgit history, if there is one.
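To make the fetch side a bit more concrete, here is a minimal Python sketch of the decision dgit fetch has to make.  It is not dgit's own code, and the function names are hypothetical.  It assumes the extra .dsc field is called `Dgit` and that its first token is the pushed commit hash, and it models "stitching" simply as recording the previous dgit head (if any) as a parent of the imported commit.

```python
from typing import Optional


def parse_dsc_fields(dsc_text: str) -> dict[str, str]:
    """Parse top-level 'Field: value' lines from .dsc text (hypothetical helper).

    Continuation lines (starting with whitespace) and blank lines are skipped,
    so multi-line fields such as Files are not captured in full.
    """
    fields: dict[str, str] = {}
    for line in dsc_text.splitlines():
        if not line or line[0] in " \t" or ":" not in line:
            continue
        name, _, value = line.partition(":")
        fields[name.strip()] = value.strip()
    return fields


def plan_fetch(dsc_text: str, existing_dgit_head: Optional[str]) -> dict:
    """Decide how to present the most recent upload as a git commit (sketch)."""
    fields = parse_dsc_fields(dsc_text)
    # Assumption: the field added by dgit push is named "Dgit" and its first
    # whitespace-separated token is the pushed commit hash.
    dgit_field = fields.get("Dgit", "")
    if dgit_field:
        # Uploaded with dgit: present exactly the commit that was pushed.
        return {"action": "use-pushed-commit", "commit": dgit_field.split()[0]}
    # Not uploaded with dgit: import the source package into git, and connect
    # the import to the existing dgit history (if any) by listing it as a parent.
    parents = [existing_dgit_head] if existing_dgit_head else []
    return {
        "action": "import-source-package",
        "version": fields.get("Version", ""),
        "parents": parents,
    }
```

The key point the sketch shows is that the .dsc alone tells fetch which case it is in, so the behaviour is uniform across packages.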

You do need to treat dgit git branches a bit specially if you need
to build source packages (for upload, for example).  In this case dgit
needs the .orig tarballs for your source package.  If you are not
doing a new upstream version, dgit fetch will get the relevant .origs
for you.  If you /are/ doing a new upstream version, then presumably
you have obtained them as part of preparing your package, or you can
build them easily.


==== NMU linear history slide

As a general rule, the dgit history structure should be up to the
maintainer - at least, if they care.

If you are doing a straightforward NMU you should produce a
well-structured linear sequence of commits, as you would for any other
git upstream.  Not only does this mean that if the maintainer is using
dgit, they can hopefully easily include your changes; it also means
that if they _aren't_ using dgit, at least you have published a
history which is suitable for rebasing onto theirs, or whatever.

If the source package is `3.0 (quilt)', you shouldn't touch
debian/patches; dgit will take care of that for you.  This is the
other reason why you should provide a tidy linear patch series: if the
maintainer likes quilt and is not using dgit, your changes will be
automatically presented to them in much the same format as they
would expect from any other NMU.

An ordinary NMUer should not normally update a patch in the quilt
stack directly.  That is, an NMUer shouldn't squash their changes into an
existing patch.  This is because while it's easy for the maintainer to
squash it themselves, if they want, it's a little harder for the
maintainer to disentangle a squashed patch.  This can also result in
people having to read interdiffs, which are notoriously confusing.


==== NMU linear history on top of basic dgit history

Sadly, unless the maintainer uses dgit, the history you see in dgit
will not be the maintainer's history.

This is because maintainers' git branches often differ from the source
packages in the archive.

If you dgit clone a package and it has an X-Vcs-Git header, dgit will
set up a remote for it, so you can fetch the history and use it if you
like.  So in that sense dgit clone encompasses debcheckout.

But, in the general case, the X-Vcs-Git tree may not be immediately
usable to someone not familiar with the package.

The maintainer's repo might contain only a debian/ directory, or be a
quilty tree without patches applied.  And the tag and suite naming
conventions can vary too.  So while the maintainer's history can be
useful if you want to do archaeology, it's not in general suitable for
use by dgit.

There is also the problem that the maintainer's nominated git server
might be anywhere, so it might be down, or gone away, or compromised.


So, if the maintainer is not using dgit, dgit has to synthesise a git
history.  The history you see in dgit will then have a very basic
branch and commit structure, rather than representing the package's
actual history.


Which brings me onto the other side of this talk: dgit for
maintainers:

==== history comparison slide

For the reasons I've explained, downstream dgit users would like you
as a maintainer to use dgit push to do your uploads.  They will then
be able to see, and directly work with, your own history.

In general, the point of using a dvcs like git is to publish your
work.  The existing ways of publishing git histories for Debian
packages aren't uniformly usable for users: they require the user to
understand the maintainer's git working practices.

What dgit does is provide a way for you to publish a history which
users can rely on actually corresponding to the archive, and use
immediately without special knowledge.

But it's in your own selfish interest to upload with dgit, too:

If you use dgit, you will be able to directly merge NMUs, patches
proposed via pull-request emails, and so on, because in this case
the dgit-using contributor will have based their work on your own
history.  If you don't, on the other hand, dgit-using contributors will
be working on a stub history, and may dgit push commits based on that
stub.  You can still dgit fetch their work even if you're not using
dgit for your uploads, but at the very least you'll then have to
rebase it onto your own history.

Another advantage of using dgit for your maintainer uploads is that it
will put your own history on browse.dgit.debian.org, rather than
advertising dgit's stub history (which can also be out of date).

If you use dgit push, you get an extra check that the source package
you are uploading is exactly the same as your git HEAD.  This can
save you some manual checks of the built source package.

And, of course, as I say, doing your uploads with dgit will improve
downstream dgit users' lives.


==== data flow slide

dgit is not a replacement for existing git packaging tools; it's
intended to complement them.  So (unlike git-dpm) dgit does not define
a git history structure.

Nor does dgit define a branch structure distinguishing upstream or
downstream branches, pristine tar branches, etc.

dgit doesn't require a particular source format; it couldn't, since it
needs to work with any package.


==== data flow slide with EQUAL and FF

dgit push imposes only two requirements on your git trees, which stem
directly from dgit's objectives.

The most important requirement is that your git tree is identical to
the unpacked source package.  (Technically, in the case of a `3.0
(quilt)' package, it is what is sometimes called a `patches-applied
packaging branch without .pc directory', which means that the upstream
source files in the main package tree correspond to the actual source
code that will be used when the package is built, rather than to the
upstream versions.)
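As a rough illustration of what "identical to the unpacked source package" means, the Python sketch below compares a git working tree with the output of dpkg-source -x, file by file.  It is a hypothetical helper, not what dgit actually runs: it skips the top-level .git and .pc directories, compares regular file contents only, and ignores symlinks and permissions.

```python
import hashlib
import os
from pathlib import Path

# Metadata directories that are not part of the package contents:
# .git in the working tree, and .pc as left behind by dpkg-source -x
# for 3.0 (quilt) packages.
IGNORED_TOP_LEVEL = {".git", ".pc"}


def tree_digest(root: str) -> dict[str, str]:
    """Map each regular file's relative path to the SHA-256 of its contents."""
    root_path = Path(root)
    digests: dict[str, str] = {}
    for dirpath, dirnames, filenames in os.walk(root_path):
        rel_dir = Path(dirpath).relative_to(root_path)
        if rel_dir == Path("."):
            # Prune metadata directories, but only at the top level.
            dirnames[:] = [d for d in dirnames if d not in IGNORED_TOP_LEVEL]
        for name in filenames:
            full = Path(dirpath) / name
            if full.is_symlink():
                continue  # this sketch does not compare symlinks
            digests[(rel_dir / name).as_posix()] = hashlib.sha256(
                full.read_bytes()
            ).hexdigest()
    return digests


def tree_differences(git_tree: str, unpacked_source: str) -> list[str]:
    """Return sorted relative paths that are missing from one side or differ."""
    a, b = tree_digest(git_tree), tree_digest(unpacked_source)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

An empty result means the two trees agree on file contents.  A path such as .gitignore that is present in git but dropped from the source package would show up here, which is exactly the kind of mismatch the check in dgit push is there to catch.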

For all native packages, and for users of git-dpm and raw git, this is
already the interchange format.  These maintainers can start using
dgit right away.  Please do!

For those using git-buildpackage with `3.0 (quilt)', things are a bit
more complicated.  I'm told that gbp pq can be used to generate a
patches-applied branch, and that some users prefer to use that as the
interchange git branch, but I know this is far from universal.  I'm
talking to the git-buildpackage maintainers about gbp integration, so
watch this space.

The other requirement of dgit is simply that the dgit branches are
fast-forwarding.  So if your tools have made a rebasing branch, you
may need to make a fake merge (with git merge -s ours) before pushing.
I'm intending to provide some rather more cooked way to do this but I
haven't decided the exact shape yet.
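The fast-forward rule, and why a fake merge satisfies it, can be sketched with a toy commit graph in Python.  The graph representation and function names are assumptions made for illustration; real git answers the same question with git merge-base --is-ancestor.

```python
# Toy commit graph: each commit id maps to the list of its parent ids.
Graph = dict[str, list[str]]


def is_ancestor(graph: Graph, ancestor: str, descendant: str) -> bool:
    """True if `ancestor` is reachable from `descendant` via parent links.

    A commit counts as its own ancestor, as with git merge-base --is-ancestor.
    """
    stack, seen = [descendant], set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(graph.get(commit, []))
    return False


def is_fast_forward(graph: Graph, old_tip: str, new_tip: str) -> bool:
    """A push from old_tip to new_tip fast-forwards iff old_tip is an ancestor."""
    return is_ancestor(graph, old_tip, new_tip)


def fake_merge(graph: Graph, new_tip: str, old_tip: str, merge_id: str) -> str:
    """Model `git merge -s ours <old_tip>` run with new_tip checked out.

    The real merge commit keeps new_tip's tree unchanged (trees are not modelled
    here); what matters for fast-forwarding is that old_tip becomes a parent.
    """
    graph[merge_id] = [new_tip, old_tip]
    return merge_id


# Example: B2 is a rebased version of B, so pushing B2 over B is not a fast-forward.
history: Graph = {"A": [], "B": ["A"], "B2": ["A"]}
assert not is_fast_forward(history, "B", "B2")
merged = fake_merge(history, "B2", "B", "M")
assert is_fast_forward(history, "B", merged)
```

After the fake merge the old tip is an ancestor of the new one, so the push fast-forwards, while the tree you actually push is still that of your rebased branch.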


==== data flow slide

There are a few other things I ought to cover, since they often come
up.  They're relevant to both maintainers and non-maintainers:


Firstly, some wrinkles.

The first wrinkle is that DMs currently need to email me a signed copy
of their ssh key, in order to be able to push.  This is because the
dgit repo server uses ssh as a transport and the project doesn't,
right now, have a record of DMs' ssh keys.


The second thing that's less than ideal is that the dgit git history
does not generally include the package upload history.
git-import-dscs can produce a git branch representing the upload
history, but dgit doesn't run that itself.  It would be difficult for
dgit to do so because deciding which set of versions to include is
nontrivial and of course it would involve an awful lot of downloading.

One could push such a branch to the archive with dgit push.  But, it
seems to me that the git history structure ought to be up to the
maintainer, and if the maintainer chooses to use dgit, the
maintainer's existing git history is probably better.

So I think the real way to improve this is to persuade more
maintainers to use dgit.  Perhaps for maintainers who do not, we
should at some point consider providing centrally an archive-based
package history.


But the most obvious challenge for a maintainer who has an existing
git branch and wants to use dgit is dgit's insistence that the source
package and git tree are the same.

Some source packages contain files which are not in the maintainer's
git branch but are needed to build: most commonly, autotools
output.  Such git branches are not usable with dgit.

But nowadays most people recommend that the package build should
always rerun autotools.  If you do that, then neither your git tree
nor your source package need contain the autotools output and all is
well.

Alternatively, you can commit the autotools output to git.  Merge
conflicts etc. are easily resolved by rerunning autotools.


And a second way this can bite is that it is normally best to use one
of dgit's build operations to build for upload.  This is mainly
because most other tools remove .gitignore by default.  dgit requires
that the source package and git tree are the same, so if your git tree
has .gitignore in it, your source package should too.


Finally, there is one compelling advantage of dgit's git-based
approach.

Many packages have strangely-behaved or plain buggy clean targets.
Because dgit knows that your git tree is canonical, it can help work
around this: you can tell dgit to use git-clean instead, avoiding the
package's clean target entirely.

If you're not in the habit of forgetting to say git-add, you can set a
configuration option to have dgit always use git-clean.  Then you will
never have to fight a buggy clean target, in a strange package, ever again.


==== Future plans slide

I have a number of plans for the future, some of which I need help
with.  But I don't have time, I'm afraid, to go through them.

Instead, I'm going to open the talk up to questions now.


12 mins