From: Ian Jackson
Date: Fri, 24 Feb 2017 18:19:44 +0000 (+0000)
Subject: before new/old objs
X-Git-Url: http://www.chiark.greenend.org.uk/ucgi/~ian/git?p=git-hash-transition-plan.git;a=commitdiff_plain;h=f193ceda243e319d950ad12150d7640fb80c9137;ds=sidebyside

before new/old objs
---

diff --git a/plan.txt b/plan.txt
index dcda9c2..612a8b2 100644
--- a/plan.txt
+++ b/plan.txt
@@ -17,37 +17,205 @@
 objects which contain references by incompatibly-new or deprecatedly-old
 names. This allows for incremental deployment.

-Syntax:
+TEXTUAL SYNTAX

-The object name syntax is extended as follows: object names using sha1
-are as current. Object names starting with lowercase ASCII letters h
-or later refer to new hash functions. (`g' is reserved because of the
-way that many programs write `g'. Programs that use
-`g' should be changed to show `h' for hash function
-`h' rather than `gh'.)
+The object name textual syntax is extended as follows:

-Object names h are SHA-512 hashes. Remaining letters are
-reserved. `x' `y' `z' are reserved for private experiments; we
-declare that public releases of git will never accept such names.
+We declare that the object name syntax is henceforth
+    [A-Z]+[0-9a-z]+ | [0-9a-f]+
+and that names [A-Z].* are deprecated as ref name components.
+
+  Rationale:
+
+  Full backwards compatibility is impossible, because the hash
+  function needs to be evident in the name, so the new names
+  must be disjoint from all old SHA-1 names.
+
+  We want a short but extensible syntax. The syntax should impose
+  minimal extra requirements on existing git users. In most
+  contexts where existing git users use hashes, ASCII alphanumeric
+  object names will fit. Use of punctuation such as : or even _
+  may give trouble to existing users, who are already using
+  such things as delimiters.
+
+  In existing deployments, refnames that differ only in case are
+  generally avoided (because they are troublesome on
+  case-insensitive filesystems). And conventionally refnames are
+  lower case. So names starting with an upper case letter will be
+  disjoint from most existing ref name components.
+
+  Even though we probably want to keep using hex, it is a good
+  idea to reserve the flexibility to use a more compact encoding,
+  while not excessively widening the existing permissible
+  character set.
+
+Object names using SHA-1 are represented, in text, as at present.
+
+Object names starting with uppercase ASCII letters H or later refer to
+new hash functions. Programs that use `g' should ideally
+be changed to show `H' for hash function `H' rather than
+`gH'.
+
+  Rationale:
+
+  Object names starting with A-F might look like hex. G is
+  reserved because of the way that many programs write
+  `g'.
+
+  This gives us 19 new hash function values until we have to
+  start using two-letter hash function prefixes, or decide to
+  use A-F after all.
+
+(Truncated object names work as they do at the moment.)
+
+Initially we define and assign one new hash function (and textual
+object name encoding):
+
+  H where is the BLAKE2b hash of the object
+    (in lowercase)
+
+We also reserve the following syntax for private experiments:
+    E[A-Z]+[0-9a-z]+
+We declare that public releases of git will never accept such
+object names.

 Everywhere in the git object formats and git protocols, a new object
 name (with hash function indicator) is permitted where an old object
-name is permitted. A single object refers to all the objects it
-references by the same hash function; in general this might be a
-different hash function to the hash function by which this particular
-object was itself referenced or obtained.
+name is permitted.
+
+A single object refers to all the objects it references by the same
+hash function; in general this might be a different hash function to
+the hash function by which this particular object was itself
+referenced or obtained.

-As an exception, it is forbidden to refer to a tree object by a name
-other than the hash function it uses to name its subtrees. If this
-seems necessary, the tree object must be recursively rewritten instead
-to use the desired object name.
+As a further restriction, it is forbidden to refer to a tree object by
+a name other than the hash function it uses to name its subtrees. If
+this seems necessary, the tree object must be recursively rewritten
+instead to use the desired object name.

 In binary protocols, where a SHA-1 object name in binary form was
 previously used, a new codepoint must be allocated in a containing
 structure (eg a new typecode). Usually, the new-format binary object
 will have a new typecode and also an additional name hash indicator.
-15 of the hash indicator values correspond to the lowercase letters
-reserved above.
+
+Whenever a new hash function textual syntax is defined, corresponding
+binary format codepoint(s) are assigned. (Detailed binary format
+specification is outside the scope of this plan.)
+
+
+ORDERING
+
+Hash functions are partially ordered, from `older' to `newer'.
+
+The ordering is configurable. The default, with the two hash
+functions defined here, is the obvious ordering
+    SHA1 ([0-9a-f]*) < BLAKE2b (H*)
+
+
+CHOICE OF OBJECT NAMES
+
+Whenever objects are named, it is possible to refer to them by old or
+new names. So git must make a choice, each time: when new objects
+are created; when refs are updated; and when refs are reported over
+network protocols to other instances of git.
+
+Although strictly speaking all objects have both old names and new
+names, and there may be more than two hash functions, it is possible
+to speak, somewhat loosely, about `new objects'.
+
+A `new' object is one which refers to other objects by a `new' name
+(whatever `new' means).
+
+We call these different hashes `namings'. That is, a `naming' is a
+hash function implemented by git. The `naming IN an object' is the
+naming by which the object refers to other objects (and may not exist,
+if the object has no references); the `name OF an object' is the name
+by which the object itself is specified.
+
+
+Commits
+
+A non-origin commit is made (by default) as new as the newest of
+  (i) the naming in each of its parents
+  (ii) the specified name of each of its parents
+(Implicitly this normally means that if HEAD uses a new name, new
+commits will be generated.)
+
+The naming of an origin commit is controlled by a dropping left in
+.git by git checkout --orphan or git init.
+
+At boundaries between old and new history, a new commit will refer to
+old parents by those old parents' new names.
+
+
+Tags
+
+A new tag is made to use the newest naming, for its tagged object, of
+  (i) the name by which the tagged object was specified
+  (ii) the naming in the tagged object (if applicable)
+
+
+Trees
+
+Commits (and sometimes, tags) can refer to tree objects; that tree
+will contain the same naming as the referring object.
+
+That is, it is a bug to refer to a tree object by other than the hash
+it uses internally to refer to subtrees (and gitlinks). This will
+mean that a tree must sometimes be rewritten (ie, new object names
+recalculated recursively).
+
+  Rationale: we want to avoid new commits and tags relying on weak
+  hashes.
+
+
+Blobs
+
+Blobs do not refer to other objects so they are neither new nor old.
+
+
+Name of newly created object
+
+When git creates a new object, it reports the new object name using
+the naming in the object.
+
+For blobs and empty trees, the caller should normally specify. The
+default is the naming used for HEAD.
+
+
+Updating refs
+
+If a ref is updated with a new object, the name from its creation is
+used (see above).
+
+If a ref is updated to a specified object, the naming used in the ref
+is the newer of the specified name, or the naming in the object (if
+any).
+
+
+
+
+
+), or with a specified object name.
+
+
+
+(If there are different equally new names, one of the newest names is
+chosen according to some stable rule.)
+
+
+
+new
+
+commit. (This may mean converting the tree in hand, since trees are
+supposed to be homogeneous.)
+
+
+
+
+A `new commit' is one which refers to objects by
+
+

 Object store:
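
Illustration only (not part of the patch): the textual syntax and the
default ordering that the added text describes can be sketched in Python
roughly as follows. The names naming_of, is_private_experiment, newest
and DEFAULT_ORDER are hypothetical and exist neither in git nor in the
plan. The plan calls for a configurable partial order; this sketch
assumes a simple total order given as a list.

```python
import re

# Hypothetical helpers for illustration; none of these names come from git.
SHA1_RE = re.compile(r"[0-9a-f]+")          # old-style (SHA-1) names, possibly truncated
NEW_RE = re.compile(r"([A-Z]+)([0-9a-z]+)")  # new-style: uppercase prefix, then payload

# Assumed default ordering from the plan: SHA1 < BLAKE2b (prefix "H").
DEFAULT_ORDER = ["sha1", "H"]


def naming_of(name):
    """Return "sha1" or the uppercase hash-function prefix of an object name."""
    if SHA1_RE.fullmatch(name):
        return "sha1"
    match = NEW_RE.fullmatch(name)
    if match is None:
        raise ValueError("not an object name under the proposed syntax: %r" % name)
    return match.group(1)


def is_private_experiment(name):
    """True for names of the reserved private form E[A-Z]+[0-9a-z]+."""
    match = NEW_RE.fullmatch(name)
    if match is None:
        return False
    prefix = match.group(1)
    return prefix.startswith("E") and len(prefix) >= 2


def newest(namings, order=DEFAULT_ORDER):
    """Pick the newest naming; on ties the first one seen wins (a stable rule)."""
    return max(namings, key=order.index)
```

For example, a commit whose parents are named by "sha1" and "H" would,
under the Commits rule above, be made with newest(["sha1", "H"]), that
is "H".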