[PATCH] Add support for git option: `pack.packSizeLimit`
Sean Whitton
spwhitton at spwhitton.name
Sun Dec 29 10:07:45 GMT 2024
Hello,
On Sat 16 Dec 2023 at 04:20am -08, Cathy J. Fitzpatrick wrote:
> This patch modifies git-remote-gcrypt so that it will respect the value
> of `pack.packSizeLimit`. This is achieved by modifying the invocation of
> `git-pack-objects(1)` so that the `--stdout` argument is not supplied.
> Instead, the pack files are written to the same temporary directory that
> git-remote-gcrypt already uses for other purposes.
Unfortunately, it looks like this breaks pushing whenever the temporary
directory is on a different filesystem from the repository.
My /tmp is a tmpfs, like many people's, and so this happens:
+ [ -s /tmp/git-remote-gcrypt-bFpV24cE3jPY.2643689/objlP ]
+ GIT_ALTERNATE_OBJECT_DIRECTORIES=/tmp/git-remote-gcrypt-bFpV24cE3jPY.2643689 git pack-objects /tmp/git-remote-gcrypt-bFpV24cE3jPY.2643689/pack_raw
fatal: unable to rename temporary file to '/tmp/git-remote-gcrypt-bFpV24cE3jPY.2643689/pack_raw-edf80a39ed391e8eb1e5a9495c9c3f8c0c597de6.pack': Invalid cross-device link
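The failure is presumably because git pack-objects(1) creates its temporary pack inside the repository's object directory and then renames it to the requested base name; rename(2) cannot move a file between filesystems and fails with EXDEV ("Invalid cross-device link"). A minimal sketch for checking whether a given setup is affected, assuming POSIX `df -P` output and assuming the temporary directory is `${TMPDIR:-/tmp}` (which matches the /tmp path in the trace). The helpers `mount_point` and `same_filesystem` are hypothetical names, and comparing mount points only approximates a true device check.

```shell
#!/bin/sh
# Sketch: check whether two paths are on the same mounted filesystem.
# rename(2) cannot move a file between filesystems; it fails with EXDEV,
# reported as "Invalid cross-device link".

# Print the mount point of the filesystem containing path $1,
# taken from the sixth field of POSIX-format `df -P` output.
mount_point()
{
	df -P "$1" | awk 'NR == 2 { print $6 }'
}

# Succeed (exit status 0) if $1 and $2 appear to share a filesystem.
# Comparing mount points is an approximation; bind mounts can fool it.
same_filesystem()
{
	[ "$(mount_point "$1")" = "$(mount_point "$2")" ]
}

# Example: run inside a repository to compare its object directory
# (where git pack-objects keeps its temporary pack) with the temporary
# directory, assumed here to be ${TMPDIR:-/tmp}.
if objdir=$(git rev-parse --git-path objects 2>/dev/null)
then
	if same_filesystem "$objdir" "${TMPDIR:-/tmp}"
	then
		echo "same filesystem: the final rename should succeed"
	else
		echo "different filesystems: expect EXDEV"
	fi
fi
```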
I'm sorry that it took me so long to properly review this, and I hope
you are still interested in working on the feature.
I've applied the other parts of your patch to my public repository.
I've attached the remainder of your patch, which I have not yet applied,
with some minor copyedits from me. If you are able to work more on this,
please use it as a baseline.
Thanks again.
-- >8 --
From: "Cathy J. Fitzpatrick" <cathy at cathyjf.com>
Date: Sat, 16 Dec 2023 04:20:53 -0800
Subject: [PATCH] Add support for git option pack.packSizeLimit
The standard git-config(1) setting pack.packSizeLimit specifies the maximum
size of a pack file when repacking a repository. This setting is important
when working with a remote git hosting provider that imposes a maximum size on
files stored on the remote server. For example, GitHub currently imposes a
maximum size of 100 MiB per file stored on its servers.
Until now, git-remote-gcrypt has ignored the pack.packSizeLimit setting when
packing objects to push, because it supplies the --stdout flag to
git-pack-objects(1), and that flag implicitly causes git-pack-objects(1) to
ignore the value of pack.packSizeLimit.
This patch modifies git-remote-gcrypt so that it respects the value of
pack.packSizeLimit. It does this by invoking git-pack-objects(1) without the
--stdout argument. Instead, the pack files are written to the same temporary
directory that git-remote-gcrypt already uses for other purposes.
The code that invokes git-pack-objects(1) is also modified to handle the
possibility that more than one pack file might be produced (if the size of the
pack would exceed the value of pack.packSizeLimit). Previously,
git-remote-gcrypt could assume that git-pack-objects(1) would always produce
exactly one pack file, but with this patch, that is no longer the case if the
user has specified pack.packSizeLimit. To address this, the patch introduces a
loop in two places (where the packs are encrypted and where they are
uploaded) that iterates over each of the generated pack files instead of
assuming that there is exactly one.
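The difference between the two modes of git-pack-objects(1) can be observed directly. This is a minimal sketch, not part of the patch, assuming a POSIX shell, git, and /dev/urandom: it stores three random blobs of about 700 KiB in a throwaway repository, packs them with pack.packSizeLimit set to 1m, and compares the single --stdout stream with the separate packs written under a base name. The names `demo`, `objlist`, and `stream.pack` are illustrative only; `pack_raw` merely echoes the prefix the patch uses.

```shell
#!/bin/sh
# Sketch: run git pack-objects with and without --stdout while
# pack.packSizeLimit is set, in a throwaway repository.
set -eu

demo=$(mktemp -d)
trap 'rm -rf -- "$demo"' EXIT
git init -q "$demo/repo"
cd "$demo/repo"

# Store three ~700 KiB blobs of random (hence incompressible) data and
# record their object IDs, one per line, as pack-objects expects on stdin.
for i in 1 2 3
do
	dd if=/dev/urandom bs=1024 count=700 2>/dev/null |
		git hash-object -w --stdin >> "$demo/objlist"
done

# 1m means 1 MiB, the smallest limit git pack-objects accepts.
limit=pack.packSizeLimit=1m

# With --stdout, the configured limit is not applied: git writes a single
# pack stream, here larger than 1 MiB.
git -c "$limit" pack-objects -q --stdout < "$demo/objlist" > "$demo/stream.pack"
stream_size=$(wc -c < "$demo/stream.pack" | tr -d ' ')

# With a base name instead, git writes <base>-<hash>.pack (plus an .idx)
# for each pack and prints each <hash> on its own line. The base name is
# on the same filesystem as the repository, so git can rename its
# temporary pack file into place.
pack_ids=$(git -c "$limit" pack-objects -q "$demo/pack_raw" < "$demo/objlist")

# Loop over every pack, as the patch does, instead of assuming one.
pack_count=0
while IFS= read -r pack_hash
do
	pack_count=$((pack_count + 1))
	pack_bytes=$(wc -c < "$demo/pack_raw-$pack_hash.pack" | tr -d ' ')
	echo "pack $pack_hash: $pack_bytes bytes"
done <<EOF
$pack_ids
EOF

echo "stdout stream: $stream_size bytes; separate packs: $pack_count"
```

Feeding the loop from a here-document rather than a pipe keeps it in the current shell, so `pack_count` is still set after the loop ends.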
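The change is fully backward- and forward-compatible, because splitting only
changes how many packs a single push adds to the manifest, and manifests
already list many packs, one per ordinary push.
Indeed, this is true of the pack.packSizeLimit setting in general:
as the manual for git-config(1) observes, "the git:// protocol is unaffected"
by the value of pack.packSizeLimit.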
Although storing repositories encrypted by git-remote-gcrypt on the servers of
Git hosting services such as GitHub has a variety of drawbacks, it is a
supported use case, and it can make sense for certain kinds of repositories.
This patch makes it easier to work with these backends by handling maximum
file size restrictions imposed by the services, and, for simplicity, the
interface for this patch relies solely on a standard git-config(1) setting.
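Because the interface is just this git-config(1) variable, enabling it is a one-line configuration change. A minimal sketch follows, run against a throwaway repository so that it is self-contained; in practice one would set the variable in the repository that pushes to the encrypted remote. The value 90m is only an example, assuming that some headroom below GitHub's 100 MiB per-file limit is prudent, since each uploaded file is an encrypted pack and presumably slightly larger than the pack itself.

```shell
#!/bin/sh
# Sketch: enable pack splitting for a repository that pushes to a host
# with a per-file size cap. Uses a throwaway repository.
set -eu

repo=$(mktemp -d)
trap 'rm -rf -- "$repo"' EXIT
git init -q "$repo"

# Example value only: leaves headroom below a 100 MiB per-file cap,
# assuming encryption makes each uploaded file a little larger than its pack.
git -C "$repo" config pack.packSizeLimit 90m

# git interprets the k/m/g suffixes as powers of 1024.
limit_bytes=$(git -C "$repo" config --int pack.packSizeLimit)
echo "pack.packSizeLimit = $limit_bytes bytes"
```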
This patch also includes a new test that, when run, verifies the basic
functionality of git-remote-gcrypt.
Signed-off-by: Cathy J. Fitzpatrick <cathy at cathyjf.com>
Signed-off-by: Sean Whitton <spwhitton at spwhitton.name>
---
README.rst | 13 +++++++++++++
git-remote-gcrypt | 46 +++++++++++++++++++++++++++++++---------------
2 files changed, 44 insertions(+), 15 deletions(-)
diff --git a/README.rst b/README.rst
index 2847301..3a9fdc2 100644
--- a/README.rst
+++ b/README.rst
@@ -107,6 +107,19 @@ The following ``git-config(1)`` variables are supported:
There is a potential solution here: https://bugs.debian.org/877464#32
+``pack.packSizeLimit``
+ This is a standard git configuration variable.
+
+ In the context of git-remote-gcrypt, this variable, if set, specifies the
+ maximum size of the packfiles to be uploaded to the backend. As in
+ standard git, this value should be an integer, optionally suffixed with
+ "k", "m", or "g". If a pack would exceed the maximum size, it will be
+ split into several packfiles before being uploaded. This splitting is
+ transparent to the user and does not affect use of the repository.
+
+ This variable is useful when working with a backend that imposes a maximum
+ file size, such as GitHub.
+
Environment variables
=====================
diff --git a/git-remote-gcrypt b/git-remote-gcrypt
index 7e7240f..97684aa 100755
--- a/git-remote-gcrypt
+++ b/git-remote-gcrypt
@@ -739,7 +739,8 @@ do_push()
# The manifest is encrypted.
local r_revlist= pack_id= key_= obj_= src_= dst_= \
r_pack_delete= tmp_encrypted= tmp_objlist= tmp_manifest= \
- force_passed=
+ force_passed= tmp_pack_prefix= r_new_pack_list= \
+ new_pack_object_ids= object_id=
ensure_connected
@@ -787,6 +788,7 @@ EOF
fi
fi
+ tmp_pack_prefix="$Tempdir/pack_raw"
tmp_encrypted="$Tempdir/packP"
tmp_objlist="$Tempdir/objlP"
@@ -798,17 +800,28 @@ EOF
# Only send pack if we have any objects to send
if [ -s "$tmp_objlist" ]
then
- key_=$(genkey "$Packkey_bytes")
- pack_id=$(export GIT_ALTERNATE_OBJECT_DIRECTORIES=$Tempdir;
- pipefail git pack-objects --stdout < "$tmp_objlist" |
- pipefail ENCRYPT "$key_" |
- tee "$tmp_encrypted" | gpg_hash "$Hashtype")
-
- append_to @Packlist "pack :${Hashtype}:$pack_id $key_"
- if isnonnull "$r_pack_delete"
- then
- append_to @Keeplist "keep :${Hashtype}:$pack_id 1"
- fi
+ # This will return more than one object_id if the user's git
+ # configuration includes `pack.packSizeLimit` and the size of the
+ # packfile is greater than the specified size limit. Hence, we need
+ # to iterate through the returned objects.
+ new_pack_object_ids=$(GIT_ALTERNATE_OBJECT_DIRECTORIES=$Tempdir \
+ git pack-objects "$tmp_pack_prefix" < "$tmp_objlist")
+ while IFS= read -r object_id
+ do
+ key_=$(genkey "$Packkey_bytes")
+ pack_id=$(pipefail ENCRYPT "$key_" < "$tmp_pack_prefix-$object_id.pack" | \
+ tee "$tmp_encrypted-$object_id" | gpg_hash "$Hashtype")
+ rm -f -- "$tmp_pack_prefix-$object_id.pack"
+
+ append_to @r_new_pack_list "$pack_id:$object_id"
+ append_to @Packlist "pack :${Hashtype}:$pack_id $key_"
+ if isnonnull "$r_pack_delete"
+ then
+ append_to @Keeplist "keep :${Hashtype}:$pack_id 1"
+ fi
+ done <<EOF
+$new_pack_object_ids
+EOF
fi
# Generate manifest
@@ -824,16 +837,19 @@ repo $Repoid
$Extnlist
EOF
- # Upload pack
+ # Upload pack (or packs, if applicable)
if [ -s "$tmp_objlist" ]
then
- PUT "$URL" "$pack_id" "$tmp_encrypted"
+ xecho "$r_new_pack_list" | while IFS=':' read -r pack_id object_id
+ do
+ PUT "$URL" "$pack_id" "$tmp_encrypted-$object_id"
+ rm -f -- "$tmp_encrypted-$object_id"
+ done
fi
# Upload manifest
PUT "$URL" "$Manifestfile" "$tmp_manifest"
- rm -f "$tmp_encrypted"
rm -f "$tmp_objlist"
rm -f "$tmp_manifest"
--
Sean Whitton