From 0c65844876ef96acf8a11136b7c5797e5218b3e9 Mon Sep 17 00:00:00 2001
From: Ian Jackson
Date: Wed, 28 Sep 2016 18:05:53 +0100
Subject: [PATCH] Import: Copy plan from emails

Signed-off-by: Ian Jackson
---
 README.dsc-import | 292 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 292 insertions(+)
 create mode 100644 README.dsc-import

diff --git a/README.dsc-import b/README.dsc-import
new file mode 100644
index 00000000..f5bb0bdb
--- /dev/null
+++ b/README.dsc-import
@@ -0,0 +1,292 @@
+From ijackson Mon Sep 26 15:37:19 +0100 2016
+X-VM-v5-Data: ([nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil nil]
+ [nil "Monday" "26" "September" "2016" "15:37:19" "+0100" "Ian Jackson" "ijackson@chiark.greenend.org.uk" nil nil "Intent to commit craziness - source package unpacking" "^From:" nil nil "9" nil nil nil nil nil nil nil nil nil nil]
+ nil)
+X-Mozilla-Status: 0001
+X-Mozilla-Status2: 00000000
+MIME-Version: 1.0
+Content-Type: text/plain; charset=us-ascii
+Content-Transfer-Encoding: 7bit
+Message-ID: <22505.12959.668142.478444@chiark.greenend.org.uk>
+X-Mailer: VM 8.2.0b under 24.4.1 (i586-pc-linux-gnu)
+From: Ian Jackson
+To: debian-dpkg@lists.debian.org,
+ Guido Guenther ,
+ Bernhard R. Link ,
+ vcs-pkg-discuss@lists.alioth.debian.org
+Subject: Intent to commit craziness - source package unpacking
+Date: Mon, 26 Sep 2016 15:37:19 +0100
+
+tl;dr:
+
+ * dpkg developers, please tell me whether I am making assumptions
+   that are likely to become false. Particularly, on the behaviour of
+   successive runs of dpkg-source --before-build with successively
+   longer series files.
+
+ * git-buildpackage and git-dpm developers, please point me to
+   information about what metadata to put into the commit message for
+   a git commit which represents a dpkg-source quilt patch. I would
+   like these commits to be as convenient for gbp and git-dpm users as
+   possible.
+
+
+Hi.
+
+Currently when dgit needs to import a .dsc into git, it just uses
+dpkg-source -x, and git-add. The result is a single commit where the
+package springs into existence fully formed. This is not as good as
+it could be. I would like to represent (in the git pseudohistory) the
+way that the resulting tree is constructed from the input objects.
+
+In particular, I would like to: represent the input tarballs as a
+commit each (which all get merged together as if by git merge -s
+subtree), and for quilt packages, each patch as a commit. But I want
+to avoid (as much as possible) reimplementing the package extraction
+algorithm in dpkg-source.
+
+dpkg-source does not currently provide interfaces that look like they
+are intended for what I want to do. And dgit wants to work with old
+versions of dpkg, so I don't want to block on getting such interfaces
+added (even supposing that a sane interface could be designed, which
+is doubtful).
+
+So I intend to do as follows. (Please hold your nose.)
+
+* dgit will untar each input tarball (other than the Debian tarball).
+
+  This will be done by scanning the .dsc for things whose names look
+  like (compressed) tarballs, and using the interfaces provided by
+  Dpkg::Compression to get at the tarball.
+
+  Each input tarball unpack will be done separately, and will be
+  followed by git-add and git write-tree, to obtain a git tree object
+  corresponding to the tarball contents.
+
+  That tree object will be made into a commit object with no parents.
+  (The package changelog will be searched for the earliest version
+  with the right upstream version component, and the information found
+  there used for the commit object's metadata.)
+
+* dgit will then run dpkg-source -x --skip-patches.
+
+  Again, git plumbing will be used to make this into a tree and a
+  commit. The commit will have as parents all the tarball commits previously
+  mentioned. The metadata will come from the .dsc and/or the
+  final changelog entry.
+
+* dgit will look to see if the package is `3.0 (quilt)' and if so
+  whether it has a series file. (dgit already rejects packages with
+  distro-specific series files, so we need worry only about a single
+  debian/patches/series file.)
+
+  If there is a series file, dgit will read it into memory. It will
+  then iterate over the series file, and each time:
+  - write into its playground a series file containing one
+    more non-comment non-empty line than previously
+  - run dpkg-source --before-build (which will apply that
+    additional patch)
+  - make git tree and commit objects, using the metadata from
+    the relevant patch file to make the commit (if available)
+  - each commit object has as a parent the previous commit
+    (either the previous patch's commit, or the commit resulting from
+    dpkg-source -x)
+
+  After this the series file has been completely rewritten.
+
+* dgit will then run one final invocation of dpkg-source
+  --before-build. This ought not to produce any changes, but if
+  it does, they will be represented as another commit.
+
+* As currently, there will be a final no-change-to-the-tree
+  pseudomerge commit which stitches the package into the relevant dgit
+  suite branch; ie something that looks as if it was made with git
+  merge -s ours.
+
+* As currently, dgit will take steps so that none of the git trees
+  discussed above contain a .pc directory.
+
+
+This has the following properties:
+
+* Each input tarball is represented by a different commit; in usual
+  cases these commits will be the same for every upload of the same
+  upstream version.
+
+* For `3.0 (quilt)' each patch's changes to the upstream files appear
+  as a single git commit (as is the effect of the debian tarball).
+  For `1.0' non-native, the effect of the diff is represented as a
+  commit. So eg `git blame' will show synthetic commits corresponding
+  to the correct parts of the input source package.
+
+* It is possible to `git-cherry-pick' etc. commits representing `3.0
+  (quilt)' patches. It is even possible to fish out the patch stack as
+  a git branch and rebase it elsewhere etc., since the patch stack is
+  represented as a contiguous series of commits which make only the
+  relevant upstream changes.
+
+* Every orig tarball in the source package is decompressed twice, but
+  disk space for only one extra copy of its unpacked contents is
+  needed. (The converse would be possible in principle but would be
+  very hard to arrange with the current interfaces provided by the
+  various tools.)
+
+* No back doors into the innards of dpkg-source (nor changes to
+  dpkg-dev) are required.
+
+* dgit does grow a dependency on Dpkg::Compression.
+
+* Knowledge of the source format embedded in dgit is restricted to
+  iterating over tarballs and manipulating debian/patches/series,
+  which dgit already does.
+
+* dgit now depends on dpkg-source --before-build idempotently applying
+  patches as they successively appear in debian/patches/series.
+
+* Perhaps the git commits generated by dgit to represent patches can
+  be made to round-trip nicely into tools like git-dpm and
+  git-buildpackage.
+
+  I have found the information about tags in gbp-dch(1), but that
+  doesn't seem like it's applicable.
+
+  I have also found the information about tags in gbp-pq(1). From
+  that it looks like I ought to generate "Gbp-Pq: Name" and "Gbp-Pq:
+  Topic".
+
+* The scheme I describe avoids introducing a dependency from dgit to
+  git-buildpackage. I might be able to replace the
+  successive-patch-application part with an appropriate invocation of
+  gbp-pq. Would that be better?
+
+  Bear in mind that because the output of gbp-pq import doesn't
+  contain debian/patches, I would need to rewrite its output (perhaps
+  with git-filter-branch).
+
+
+Comments welcome. Please be quick - this is very close to the top of
+my dgit todo list.
+
+
+Thanks,
+Ian.
+
+
+-- 
+Ian Jackson These opinions are my own.
+
+If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
+a private address which bypasses my fierce spamfilter.
+
+From ijackson Wed Sep 28 10:50:49 +0100 2016
+X-VM-v5-Data: ([nil nil nil nil t nil nil nil nil nil nil nil nil nil nil nil]
+ [nil "Wednesday" "28" "September" "2016" "10:50:49" "+0100" "Ian Jackson" "ijackson@chiark.greenend.org.uk" "<22507.37497.633622.843659@chiark.greenend.org.uk>" nil "Re: Intent to commit craziness - source package unpacking" "^From:" nil nil "9" nil nil nil nil nil nil nil nil nil nil]
+ nil)
+X-Mozilla-Status: 0003
+X-Mozilla-Status2: 00000000
+MIME-Version: 1.0
+Content-Type: text/plain; charset=iso-8859-1
+Content-Transfer-Encoding: quoted-printable
+Message-ID: <22507.37497.633622.843659@chiark.greenend.org.uk>
+In-Reply-To: <20160928010117.nqe2prbsbaqkbjza@gaara.hadrons.org>
+References: <22505.12959.668142.478444@chiark.greenend.org.uk>
+ <20160928010117.nqe2prbsbaqkbjza@gaara.hadrons.org>
+X-Mailer: VM 8.2.0b under 24.4.1 (i586-pc-linux-gnu)
+From: Ian Jackson
+To: Guillem Jover
+Cc: debian-dpkg@lists.debian.org,
+ Guido Guenther ,
+ "Bernhard R. Link" ,
+ vcs-pkg-discuss@lists.alioth.debian.org
+Subject: Re: Intent to commit craziness - source package unpacking
+Date: Wed, 28 Sep 2016 10:50:49 +0100
+
+Guillem Jover writes ("Re: Intent to commit craziness - source package =
+unpacking"):
+> On Mon, 2016-09-26 at 15:37:19 +0100, Ian Jackson wrote:
+> > tl;dr:
+> >=20
+> >  * dpkg developers, please tell me whether I am making assumptions
+> >    that are likely to become false. Particularly, on the behaviour=
+ of
+> >    successive runs of dpkg-source --before-build with successively
+> >    longer series files.
+>=20
+> For format =AB3.0 (quilt)=BB, that seems fine, to the point I'm fine =
+even
+> documenting this, which I can probably do for 1.18.11.
+
+Great.
+
+> For other formats, such as =AB2.0=BB, I don't think that's true, but =
+I
+> assume you don't care about that one anyway. But just mentioning
+> because this behavior is probably format-specific. For =AB2.0=BB I
+> think it could be fixed, and should not be too hard (not sure if it's=
+
+worth it though).
+
+I think the right approach is perhaps to use --skip-patches and
+--before-build only with 3.0 (quilt). That would leave 2.0 (or
+other strange or future formats) producing a correct (although
+possibly sub-optimal) import.
+
+> > dpkg-source does not currently provide interfaces that look like th=
+ey
+> > are intended for what I want to do. And dgit wants to work with ol=
+d
+> > versions of dpkg, so I don't want to block on getting such interfac=
+es
+> > added (even supposing that a sane interface could be designed, whic=
+h
+> > is doubtful).
+>=20
+> Even then I'm still interested in a description of what you'd need
+> ideally, to take into account when having a pass at cleaning up that
+> part of the interface. I think you could be interested in a cleaner
+> Dpkg::Source::* hierarchy, for the mid/long-term?
+
+For `3.0 (quilt)' explicit interfaces for applying and unapplying
+individual patches would help. But really IMO such an interface ought
+to be exposed on the command line rather than (or as well as) via a
+Perl module.
+
+Beyond that I find it hard to see what could make dgit's life easier.
+Since dgit wants to construct a commit graph representing the source
+package's innards, unless dpkg-source explicitly provides an interface
+along those lines ("please output a graph of unpacked source tree
+states and corresponding commit messages") dgit is still going to have
+to know specially about most of the source package formats.
+
+> > * dgit will untar each input tarball (other than the Debian tarball=
+).
+> >=20
+> >   This will be done by scanning the .dsc for things whose names loo=
+k
+> >   like (compressed) tarballs, and using the interfaces provided by
+> >   Dpkg::Compression to get at the tarball.
+>=20
+> Hmm, Dpkg::Source::Archive is currently private, but I might have a
+> look at making it public if that would be helpful here.
+
+I think the amount of logic I would have to replicate is minimal.
+
+> > * As currently, dgit will take steps so that none of the git trees
+> >   discussed above contain a .pc directory.
+>=20
+> As long as the directory does not disappear from the working tree,
+> that should work.
+
+Right, indeed it won't.
+
+Thanks for your comments. I feel unblocked :-).
+
+Ian.
+
+--=20
+Ian Jackson These opinions are my o=
+wn.
+
+If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
+a private address which bypasses my fierce spamfilter.
+
-- 
2.30.2
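
The per-patch loop in the plan above (write a series file with one more entry, run dpkg-source --before-build, commit the result) can be sketched as follows. This is a minimal Python illustration, not dgit's implementation, which is written in Perl. All function names are hypothetical. The sketch assumes that the playground is the top level of a git working tree and that any non-empty line not starting with `#` is a patch entry. `dpkg-source --before-build`, `git write-tree` and `git commit-tree` are real commands, but the commit message here is only a placeholder for the patch metadata the plan describes.

```python
# Hypothetical sketch of the "successively longer series file" loop;
# not dgit's code. Assumes the playground is a git working tree root.
import subprocess
from pathlib import Path
from typing import Iterator, Tuple


def is_patch_entry(line: str) -> bool:
    """Assumption: a series line names a patch if non-empty and not a comment."""
    stripped = line.strip()
    return bool(stripped) and not stripped.startswith("#")


def successive_series(series_text: str) -> Iterator[Tuple[str, str]]:
    """Yield (patch_name, truncated_series) once per patch entry.

    Each truncated series keeps the original lines up to and including
    the k-th patch entry, i.e. exactly one more patch than the step
    before. Options after the name (e.g. "-p1") stay on the line verbatim.
    """
    lines = series_text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if is_patch_entry(line):
            yield line.split()[0], "".join(lines[: i + 1])


def git(playground: Path, *args: str) -> str:
    """Run a git command in the playground and return its trimmed stdout."""
    result = subprocess.run(["git", *args], cwd=playground,
                            check=True, capture_output=True, text=True)
    return result.stdout.strip()


def commit_state(playground: Path, parent: str, message: str) -> str:
    """Snapshot the working tree, excluding .pc, as a child commit of parent."""
    git(playground, "add", "-A", "--", ".", ":(exclude).pc")
    tree = git(playground, "write-tree")
    return git(playground, "commit-tree", "-p", parent, "-m", message, tree)


def import_patches(playground: Path, series_text: str, base: str) -> str:
    """Apply patches one at a time, chaining one commit per patch.

    `base` is the commit made from `dpkg-source -x --skip-patches`.
    Relies on --before-build applying only the newly listed patch each
    time, which is the 3.0 (quilt) behaviour discussed in the thread.
    """
    series_path = playground / "debian" / "patches" / "series"
    head = base
    for name, partial in successive_series(series_text):
        series_path.write_text(partial)
        subprocess.run(["dpkg-source", "--before-build", str(playground)],
                       check=True)
        # Placeholder message; the plan would use the patch's own metadata.
        head = commit_state(playground, head, name)
    # Restore the full series file before the plan's final --before-build.
    series_path.write_text(series_text)
    return head
```

Keeping the truncation logic in a pure generator means the one piece that encodes knowledge of the series-file format can be checked without running dpkg-source or git.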