deprecatedly-old names. This allows for incremental deployment.
-Syntax:
+TEXTUAL SYNTAX
-The object name syntax is extended as follows: object names using sha1
-are as current. Object names starting with lowercase ASCII letters h
-or later refer to new hash functions. (`g' is reserved because of the
-way that many programs write `g<objectname>'. Programs that use
-`g<objectname>' should be changed to show `h<hash>' for hash function
-`h' rather than `gh<hash>'.)
+The object name textual syntax is extended as follows:
-Object names h<hex> are SHA-512 hashes. Remaining letters are
-reserved. `x' `y' `z' are reserved for private experiments; we
-declare that public releases of git will never accept such names.
+We declare that the object name syntax is henceforth
+ [A-Z]+[0-9a-z]+ | [0-9a-f]+
+and that names [A-Z].* are deprecated as ref name components.
+
+ Rationale:
+
+ Full backwards compatibility is impossible, because the hash
+ function needs to be evident in the name, so the new names
+ must be disjoint from all old SHA-1 names.
+
+ We want a short but extensible syntax. The syntax should impose
+ minimal extra requirements on existing git users. In most
+ contexts where existing git users use hashes, ASCII alphanumeric
+ object names will fit. Use of punctuation such as : or even _
+ may give trouble to existing users, who are already using
+ such things as delimiters.
+
+ In existing deployments, refnames that differ only in case are
+ generally avoided (because they are troublesome on
+ case-insensitive filesystems). And conventionally refnames are
+ lower case. So names starting with an upper case letter will be
+ disjoint from most existing ref name components.
+
+ Even though we probably want to keep using hex, it is a good
+ idea to reserve the flexibility to use a more compact encoding,
+ while not excessively widening the existing permissible
+ character set.
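A minimal Python sketch of the grammar above, for illustration only. The function names are hypothetical and not part of git. It shows two things. First, the two alternatives are disjoint, because every new name begins with an upper case letter and no SHA-1 name contains one. Second, an all-upper-case word such as HEAD is not an object name at all, because the grammar requires a [0-9a-z] tail.

```python
import re

# Textual object name grammar from the plan:
#   [A-Z]+[0-9a-z]+   names under a new hash function
#   [0-9a-f]+         SHA-1 names, exactly as today
NEW_NAME = re.compile(r"[A-Z]+[0-9a-z]+")
SHA1_NAME = re.compile(r"[0-9a-f]+")
# Ref name components starting with an upper case letter are deprecated.
DEPRECATED_COMPONENT = re.compile(r"[A-Z]")


def classify_object_name(text: str) -> str:
    """Return "sha1", "new" or "invalid" for a textual object name."""
    if SHA1_NAME.fullmatch(text):
        return "sha1"
    if NEW_NAME.fullmatch(text):
        return "new"
    return "invalid"


def is_deprecated_ref_component(component: str) -> bool:
    """True for ref name components matching [A-Z].*."""
    return DEPRECATED_COMPONENT.match(component) is not None


print(classify_object_name("deadbeef"))   # sha1
print(classify_object_name("Hdeadbeef"))  # new
print(classify_object_name("HEAD"))       # invalid: no [0-9a-z] tail
```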
+
+Object names using SHA-1 are represented, in text, as at present.
+
+Object names starting with uppercase ASCII letters H or later refer to
+new hash functions. Programs that use `g<objectname>' should ideally
+be changed to show `H<hash>' for hash function `H' rather than
+`gH<hash>'.
+
+ Rationale:
+
+ Object names starting with A-F might look like hex. G is
+ reserved because of the way that many programs write
+ `g<objectname>'.
+
+ This gives us 19 new hash function values (H to Z) before we have
+ to start using two-letter hash function prefixes, or decide to
+ use A-F after all.
+
+(Truncated object names work as they do at the moment.)
+
+Initially we define and assign one new hash function (and textual
+object name encoding):
+
+ H<hex> where <hex> is the BLAKE2b hash of the object
+ (in lowercase)
+
+We also reserve the following syntax for private experiments:
+ E[A-Z]+[0-9a-z]+
+We declare that public releases of git will never accept such
+object names.
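The prefix rules above can be sketched as a dispatch on the leading upper case letters. This is a hypothetical illustration and the category strings are invented. The `h_name` helper also makes two assumptions that the plan does not state. It assumes the BLAKE2b input uses git's existing "<type> <size>\0" object header. It also assumes BLAKE2b's default 512-bit digest.

```python
import hashlib
import re

# Either an upper case prefix plus [0-9a-z] tail, or plain lowercase hex.
NAME = re.compile(r"([A-Z]+)([0-9a-z]+)|([0-9a-f]+)")


def hash_function_of(name: str) -> str:
    """Map a textual object name to the hash function its prefix selects."""
    m = NAME.fullmatch(name)
    if m is None:
        raise ValueError(f"not an object name: {name!r}")
    if m.group(3) is not None:
        return "sha1"            # plain hex: SHA-1, as at present
    prefix = m.group(1)
    if prefix[0] == "E" and len(prefix) > 1:
        return "experimental"    # E[A-Z]+...: never accepted by public releases
    if prefix == "H":
        return "blake2b"         # the one new hash function assigned so far
    if prefix[0] < "H":
        return "reserved"        # A-F look like hex; G clashes with g<objectname>
    return "unassigned"          # I-Z, or longer prefixes, left for the future


def h_name(obj_type: str, content: bytes) -> str:
    """Hypothetical H name: "H" plus lowercase BLAKE2b hex of the object.

    ASSUMPTION: the "<type> <size>\\0" header framing and the default
    64-byte BLAKE2b digest are illustrative choices, not part of the plan.
    """
    header = f"{obj_type} {len(content)}\0".encode()
    return "H" + hashlib.blake2b(header + content).hexdigest()
```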
Everywhere in the git object formats and git protocols, a new object
name (with hash function indicator) is permitted where an old object
-name is permitted. A single object refers to all the objects it
-references by the same hash function; in general this might be a
-different hash function to the hash function by which this particular
-object was itself referenced or obtained.
+name is permitted.
+
+A single object refers to all the objects it references by the same
+hash function; in general this might be a different hash function to
+the hash function by which this particular object was itself
+referenced or obtained.
-As an exception, it is forbidden to refer to a tree object by a name
-other than the hash function it uses to name its subtrees. If this
-seems necessary, the tree object must be recursively rewritten instead
-to use the desired object name.
+As a further restriction, it is forbidden to refer to a tree object by
+a name computed with a hash function other than the one it uses to
+name its subtrees. If this seems necessary, the tree object must
+instead be recursively rewritten to use the desired hash function.
In binary protocols, where a SHA-1 object name in binary form was
previously used, a new codepoint must be allocated in a containing
structure (eg a new typecode). Usually, the new-format binary object
will have a new typecode and also an additional name hash indicator.
-15 of the hash indicator values correspond to the lowercase letters
-reserved above.
+
+Whenever a new hash function textual syntax is defined, corresponding
+binary format codepoint(s) are assigned. (Detailed binary format
+specification is outside the scope of this plan.)
+
+
+ORDERING
+
+Hash functions are partially ordered, from `older' to `newer'.
+
+The ordering is configurable. The default, with the two hash
+functions defined here, is the obvious ordering
+ SHA-1 ([0-9a-f]*) < BLAKE2b (H*)
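This is a sketch of a configurable partial order. It uses hypothetical naming labels such as "sha1" and "blake2b", and it models the configuration as a set of (older, newer) pairs. Several incomparable namings may all be newest. For that case the plan asks only for "some stable rule", and sorting the labels is an assumed choice.

```python
from typing import FrozenSet, Iterable, Tuple

Order = FrozenSet[Tuple[str, str]]

# Default ordering from the plan: SHA-1 < BLAKE2b.  Each pair is
# (older, newer); the partial order is the transitive closure of the pairs.
DEFAULT_ORDER: Order = frozenset({("sha1", "blake2b")})


def is_older(a: str, b: str, order: Order = DEFAULT_ORDER) -> bool:
    """True if naming `a` is strictly older than naming `b`."""
    stack, seen = [a], {a}
    while stack:
        current = stack.pop()
        for older, newer in order:
            if older == current and newer not in seen:
                if newer == b:
                    return True
                seen.add(newer)
                stack.append(newer)
    return False


def newest(namings: Iterable[str], order: Order = DEFAULT_ORDER) -> str:
    """Pick a newest naming; ties between incomparable maxima broken stably."""
    candidates = set(namings)
    if not candidates:
        raise ValueError("no namings to choose from")
    maximal = [n for n in candidates
               if not any(is_older(n, m, order) for m in candidates)]
    # ASSUMPTION: sorting the labels is one possible "stable rule".
    return sorted(maximal)[0]
```

With the default configuration, `newest(["sha1", "blake2b"])` is `"blake2b"`.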
+
+
+CHOICE OF OBJECT NAMES
+
+Whenever objects are named, it is possible to refer to them by old or
+new names. So git must make a choice, each time: when new objects
+are created; when refs are updated; and when refs are reported over
+network protocols to other instances of git.
+
+Although strictly speaking all objects have both old names and new
+names, and there may be more than two hash functions, it is possible
+to speak, somewhat loosely, about `new objects'.
+
+A `new' object is one which refers to other objects by a `new' name
+(whatever `new' means).
+
+We call these different hashes `namings'. That is, a `naming' is a
+hash function implemented by git. The `naming IN an object' is the
+naming by which the object refers to other objects (and may not exist,
+if the object has no references); the `name OF an object' is the name
+by which the object itself is specified.
+
+
+Commits
+
+A non-origin commit is made (by default) as new as the newest of
+ (i) the naming in each of its parents
+ (ii) the specified name of each of its parents
+(Implicitly this normally means that if HEAD uses a new name, new
+commits will be generated.)
+
+The naming of an origin commit is controlled by a dropping left in
+.git by git checkout --orphan or git init.
+
+At boundaries between old and new history, a new commit will refer to
+old parents by those old parents' new names.
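The commit rule can be sketched as follows. The sketch uses a hypothetical `ParentRef` record and a simple rank for the default SHA-1 < BLAKE2b order. A general partial order would need a maximal-element search instead of `max`. The example also shows the boundary rule: an old parent is recorded under its new name.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Default total order from the plan (SHA-1 < BLAKE2b), as a rank.
RANK = {"sha1": 0, "blake2b": 1}


@dataclass
class ParentRef:
    """Hypothetical model of a parent commit as seen when committing."""
    specified_as: str   # naming of the name the parent was specified by
    naming_in: str      # naming the parent uses for its own references
    names: Dict[str, str] = field(default_factory=dict)  # naming -> name


def new_commit_naming(parents: List[ParentRef],
                      origin_naming: str = "sha1") -> Tuple[str, List[str]]:
    """Choose a new commit's naming and the parent names it records."""
    if not parents:
        # Origin commit: the plan takes this from a dropping left by
        # `git init` / `git checkout --orphan`; here it is a parameter.
        return origin_naming, []
    candidates = [p.specified_as for p in parents]
    candidates += [p.naming_in for p in parents]
    naming = max(candidates, key=RANK.__getitem__)
    # At an old/new boundary, old parents are referred to by their new names.
    return naming, [p.names[naming] for p in parents]
```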
+
+
+Tags
+
+A new tag uses, for its tagged object, the newest naming of
+ (i) the name by which the tagged object was specified
+ (ii) the naming in the tagged object (if applicable)
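The tag rule is a newest-of choice over at most two candidates. Below is a minimal sketch with hypothetical naming labels and the default SHA-1 < BLAKE2b rank. The same choice describes updating a ref to a specified object (see "Updating refs").

```python
from typing import Optional

# Default order from the plan: SHA-1 < BLAKE2b.
RANK = {"sha1": 0, "blake2b": 1}


def tag_target_naming(specified_as: str,
                      naming_in_target: Optional[str]) -> str:
    """Naming a new tag uses to refer to its tagged object.

    `naming_in_target` is None for objects with no references (e.g. blobs).
    """
    candidates = [specified_as]
    if naming_in_target is not None:
        candidates.append(naming_in_target)
    return max(candidates, key=RANK.__getitem__)
```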
+
+
+Trees
+
+Commits (and sometimes tags) can refer to tree objects; such a tree
+must use the same naming as the referring object.
+
+That is, it is a bug to refer to a tree object by a name computed with
+any hash function other than the one it uses internally to refer to
+subtrees (and gitlinks). This will mean that a tree must sometimes be
+rewritten (ie, new object names recalculated recursively).
+
+ Rationale: we want to avoid new commits and tags relying on weak
+ hashes.
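A tree cannot simply be relabelled. The tree's bytes contain its children's names, so its name under a given naming depends on every subtree and blob being named the same way. The sketch below makes that dependency explicit. For tree entries it uses a simplified, hypothetical text serialisation rather than git's real binary tree format. Only the blob framing matches today's SHA-1 blob names. The "H" + BLAKE2b form assumes the same header framing.

```python
import hashlib
from typing import Dict, Union

Tree = Dict[str, Union[bytes, "Tree"]]   # entry name -> blob bytes or subtree

NAMINGS = {
    "sha1": lambda data: hashlib.sha1(data).hexdigest(),
    "blake2b": lambda data: "H" + hashlib.blake2b(data).hexdigest(),
}


def object_name(naming: str, obj_type: str, payload: bytes) -> str:
    # git-style "<type> <size>\0" framing (an assumption for the new naming).
    return NAMINGS[naming](f"{obj_type} {len(payload)}\0".encode() + payload)


def tree_name(tree: Tree, naming: str) -> str:
    """Name `tree` under `naming`, naming every subtree and blob the same way.

    Switching naming forces this whole recursion to be redone: the tree's
    bytes embed its children's names, so no name can be reused.
    """
    entries = []
    for entry in sorted(tree):
        child = tree[entry]
        if isinstance(child, dict):
            entries.append(f"tree {tree_name(child, naming)} {entry}\n")
        else:
            entries.append(f"blob {object_name(naming, 'blob', child)} {entry}\n")
    return object_name(naming, "tree", "".join(entries).encode())
```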
+
+
+Blobs
+
+Blobs do not refer to other objects so they are neither new nor old.
+
+
+Name of newly created object
+
+When git creates a new object, it reports the new object name using
+the naming in the object.
+
+For blobs and empty trees, which refer to no other objects, the caller
+should normally specify the naming. The default is the naming used
+for HEAD.
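Here is the reporting rule as a small sketch with hypothetical parameters. `head_naming` stands in for however the naming used for HEAD is determined. A ref records this same naming when it is updated with a newly created object.

```python
from typing import Optional


def reported_naming(naming_in_object: Optional[str],
                    caller_naming: Optional[str] = None,
                    head_naming: str = "sha1") -> str:
    """Naming git uses to report the name of an object it has just created."""
    if naming_in_object is not None:
        return naming_in_object   # objects with references: their internal naming
    if caller_naming is not None:
        return caller_naming      # blobs and empty trees: the caller's choice
    return head_naming            # default: the naming used for HEAD
```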
+
+
+Updating refs
+
+If a ref is updated with a new object, the name from its creation is
+used (see above).
+
+If a ref is updated to a specified object, the naming used in the ref
+is the newer of the specified name, or the naming in the object (if
+any).
+
+(If there are different equally new names, one of the newest names is
+chosen according to some stable rule.)
+
+(Making a new commit may mean converting the tree in hand, since
+trees are supposed to be homogeneous.)
+
Object store: