[title]

(PITCH)

[archive as vcs]

The Debian archive is (amongst other things) a version control system.

  Clone (checkout) <=> apt-get source
  Commit+push      <=> upload

But the archive has no really sensible branching. Its history browsing is appalling. And it interacts pretty badly with other VCSs. In particular, we want to be using git.

So what to do? Well, we might replace it. But the archive is various other things besides an appallingly bad VCS. And a lot of our co-developers are used to it and fear and hate change.

So instead, we should have some software to help deal with the problem. What that really means is that we need a better gateway. That's what dgit is.

[manpage]

dgit is a tool which lets you treat the archive as if it were a git server.

It provides uniform operation for all packages: you can clone any package, work on it, build it, and upload it. You don't need to know the maintainer's workflow. It doesn't matter whether the maintainer uses dgit, other git tools, quilt, CVS or SCCS.

With dgit you do all direct source code management in git. As a dgit user you do not interact with the archive directly.

[start demo]

dgit is particularly useful for NMUers: you can prepare an RC bugfix, with full support from git, without needing to know anything about the package's usual VCS arrangements.

dgit also has great potential for downstreams - that is, derivatives and users who want to modify a package. Having used dgit clone or fetch, you can merge into your downstream branch. (There are some issues with this right now for non-DDs, which I'm going to discuss later.)

As a maintainer you can choose, if you like, to use the dgit git history as your primary working history. Any fast-forwarding, patches-applied git workflow works with dgit. In particular, you can have the full upstream git history in the ancestry of your dgit git history.

(PRINCIPLES OF OPERATION)

[ .dsc, dpkg-source -x, git checkout, identity ]

This is Debian so you want to know how it works.
So let me run through dgit's principles of operation.

The data model is as follows: a dgit-generated upload's .dsc contains a git commit hash. This specifies a commit whose tree is identical to the results of dpkg-source -x on the .dsc.

But the actual git history is not stored in the archive. It is obtained via the git protocol from an actual git server. (Currently this is on alioth but it's going to move.)

The only other constraint on the git commit named in the .dsc is that successive dgit uploads must have a fast-forwarding history. Specifically, each upload made with dgit must have as an ancestor the current state of that package in the archive.

I should expand a bit on the need for the git commit to be identical to the source package. dgit is (amongst other things) a way of looking at source packages, and their history, using git. That means that the git tree has to be the same as the package tree. Specifically, for example, files like configure need to be either in both git and the source package, or in neither.

[ synthetic commit example ]

Non-dgit uploads don't have a (suitable) git commit hash. But dgit clone needs to produce a suitable git commit. It does this by inventing (in a deterministic way) a commit corresponding to the state of the archive. If necessary, it also generates a synthetic merge commit to tie the invented commit into the previous dgit history.

(QUILTY WORKFLOW)

[ synthetic patch example ]

At the moment, dgit doesn't attempt to do anything clever with `3.0 (quilt)' source packages.

The synthetic git history generated from non-dgit uploads does not represent the quilt patch stack. And conversely, dgit push involves dpkg-source --commit, to make the git tree the same as what dpkg-source would extract. So dgit has to make some patches, and currently it makes a single synthetic patch whose description contains some info from git log.
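To make the data model described earlier a bit more concrete, here is a minimal Python sketch of the two checks it implies: finding the commit hash in a .dsc, and testing that a proposed upload fast-forwards from the archive's current commit. This is an illustration only, not dgit's own code. The field name `Dgit` is assumed here, the function names are hypothetical, and the history is a toy dictionary mapping each commit to its parents rather than a real git repository.

```python
import re


def dgit_commit_from_dsc(dsc_text):
    """Return the commit hash from a 'Dgit:' field in a .dsc, or None.

    Assumption: the hash is stored in a field called 'Dgit' and is a
    full 40-character hexadecimal SHA-1.
    """
    for line in dsc_text.splitlines():
        match = re.match(r"^Dgit:\s*([0-9a-f]{40})\b", line)
        if match:
            return match.group(1)
    return None


def is_ancestor(parents, ancestor, descendant):
    """True if `ancestor` is reachable from `descendant` via parent links.

    `parents` maps a commit id to a list of its parent commit ids.
    A commit counts as its own ancestor.
    """
    seen = set()
    stack = [descendant]
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, ()))
    return False


def may_upload(parents, archive_commit, proposed_commit):
    """Fast-forward rule: the archive's current commit must be an
    ancestor of the commit we want to upload."""
    return is_ancestor(parents, archive_commit, proposed_commit)


if __name__ == "__main__":
    # Toy history: a <- b <- c is the dgit branch; x forks from a;
    # m is a merge that ties x back into the history containing b.
    history = {"a": [], "b": ["a"], "c": ["b"], "x": ["a"], "m": ["x", "b"]}
    for proposed in ("c", "x", "m"):
        print(proposed, may_upload(history, "b", proposed))
```

In this toy history the archive is at `b`. Commit `x` does not descend from `b`, so it fails the rule, whereas the merge `m` passes because one of its parents is `b`; that is the role a synthetic merge commit plays. Against a real repository, the same ancestry question can be asked with `git merge-base --is-ancestor`.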
Overall this means that currently when you work on a quilty package in dgit, you don't interact with the quilt patch stack.

This is less than ideal. I intend to improve this, perhaps by having dgit use git-dpm as a bidirectional gateway between `3.0 (quilt)' and git. Exactly how to do this involves some complicated design decisions which I haven't entirely worked out yet.

The intent, though, is that there will be an option to generate a rebasing-style git branch. After a patch series has been edited with rebase, dgit push will generate a `fake merge' commit to make the resulting history fast-forwarding.

(ACCESS PROBLEMS)

[table]

Probably the biggest problem with dgit right now is that it is only usable by DDs. This is particularly galling for a tool which is especially useful for users, downstreams, mentees, and so forth.

There are two obstacles to widening the access, one relating to the archive and one to the git server.

Firstly, dgit needs to query the archive to find the current version of the package and obtain a copy of the .dsc. At the moment there is no official interface which provides this data. There is the rmadison server, but it sometimes serves stale or wrong data and there is no authentication - that is, there is no way for the client to tell that the data is accurate.

So what dgit does right now is ssh to coccia and run SQL commands against its mirror of the ftpmaster database. This is obviously mad. ftpmaster have promised me a proper query service API (via HTTP and TLS).

Secondly, dgit is currently using alioth as its git server, and the collab-maint group there for authentication. This is not ideal for a number of reasons. Alioth does a lot of other things and is therefore less secure and reliable than would be ideal. dgit's required authentication model is not a very good fit for alioth's infrastructure.

We intend to move this to a VM of its own - I've spoken to DSA (in the person of Tollef) about this.

[questions etc.]
The package itself is in testing, and you can simply install the same .deb on stable if you want. So that's what I had prepared to say. I think I have time for questions.