Perhaps this would handle patterns like ["a(b", "c)"] better. It might not
be correct to error out on "(", but that should be better than building the
wrong regexp pattern "a(b|c)".
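
As a rough illustration of the failure mode, here is a minimal sketch that
refuses to join patterns whose parentheses don't balance on their own. The
helper names (`has_balanced_parens`, `join_patterns`) are hypothetical, and
the check is deliberately crude: it understands backslash escapes but not
character classes such as "[(]". The actual check in this change may be
stricter.

```rust
/// Returns true if every unescaped "(" in `pattern` has a matching ")".
/// Simplification: character classes such as "[(]" are not understood.
fn has_balanced_parens(pattern: &str) -> bool {
    let mut depth = 0usize;
    let mut chars = pattern.chars();
    while let Some(c) = chars.next() {
        match c {
            // Skip the escaped character, e.g. the "(" in "\(".
            '\\' => {
                chars.next();
            }
            '(' => depth += 1,
            ')' => {
                if depth == 0 {
                    return false;
                }
                depth -= 1;
            }
            _ => {}
        }
    }
    depth == 0
}

/// Joins patterns into one alternation, rejecting any pattern that could
/// change the grouping of its neighbors once joined with "|".
fn join_patterns(patterns: &[&str]) -> Result<String, String> {
    for pattern in patterns {
        if !has_balanced_parens(pattern) {
            return Err(format!("unbalanced parenthesis in {pattern:?}"));
        }
    }
    Ok(patterns.join("|"))
}
```

With this sketch, ["a(b", "c)"] is rejected instead of silently becoming
"a(b|c)", while ["a", "b(c)"] still joins to "a|b(c)".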
Now that we've replaced `MergeHunk` by a `Conflict`, it makes sense to
convert the input `Conflict<FileId>` by mapping each term. Unlike
`Option::map()`, I made `Conflict::map()` take `self` by reference,
because it's not uncommon to want to map the same conflict multiple
times. I'm going to use that for producing a
`Conflict<Option<TreeValue>>` from a `Conflict<Tree>` and a set of
paths.
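
A minimal sketch of what a by-reference `map()` could look like. The
`removes`/`adds` layout and the `new()` constructor are assumptions made
for illustration and are not necessarily how the real `Conflict` is laid
out.

```rust
/// Simplified stand-in for the real type (assumed layout): a conflict is
/// a list of removed terms and a list of added terms.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Conflict<T> {
    removes: Vec<T>,
    adds: Vec<T>,
}

impl<T> Conflict<T> {
    pub fn new(removes: Vec<T>, adds: Vec<T>) -> Self {
        Conflict { removes, adds }
    }

    /// Maps each term by reference, so `self` stays usable afterwards.
    pub fn map<U>(&self, mut f: impl FnMut(&T) -> U) -> Conflict<U> {
        Conflict {
            removes: self.removes.iter().map(&mut f).collect(),
            adds: self.adds.iter().map(&mut f).collect(),
        }
    }
}
```

Because `map()` borrows, the same conflict can be mapped several times,
e.g. `c.map(|x| x * 2)` followed by `c.map(|x| x.to_string())`.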
Since `Conflict`s can represent the resolved state,
`Conflict<ContentHunk>` can represent the states that we use
`MergeHunk` for. `MergeHunk` does force the user to handle the
resolved case, which may be useful. I suppose one could use the same
argument for making `Conflict` an enum, i.e. if we think that
`MergeHunk`'s two variants are beneficial, then we should consider
making `Conflict` an enum with those two variants.
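
To make the trade-off concrete, here is a sketch of the enum form. The
names `MergeResult`, `from_terms`, and `describe` are hypothetical, and
the rule "no removes and exactly one add means resolved" is an assumption
made for the example.

```rust
/// Hypothetical enum form: callers must match on both cases.
#[derive(Debug, PartialEq)]
pub enum MergeResult<T> {
    Resolved(T),
    Conflicted { removes: Vec<T>, adds: Vec<T> },
}

/// Builds the enum from raw terms. Assumption: a conflict with no
/// removes and exactly one add is the resolved state.
pub fn from_terms<T>(removes: Vec<T>, mut adds: Vec<T>) -> MergeResult<T> {
    if removes.is_empty() && adds.len() == 1 {
        MergeResult::Resolved(adds.pop().unwrap())
    } else {
        MergeResult::Conflicted { removes, adds }
    }
}

/// The compiler rejects this `match` if either arm is missing, which is
/// the "forced handling" benefit discussed above.
pub fn describe(result: &MergeResult<&str>) -> String {
    match result {
        MergeResult::Resolved(content) => format!("resolved: {content}"),
        MergeResult::Conflicted { removes, adds } => {
            format!("conflict: -{} +{}", removes.len(), adds.len())
        }
    }
}
```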
It's useful to have a more readable `Debug` format for `Vec<u8>`
(`"foo"` is better than `[102, 111, 111]`). It might also make types
in function signatures and elsewhere more readable.
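
For illustration only, a newtype with a string-like `Debug` output could
look like the sketch below. `ContentBytes` is a hypothetical name, not the
type this change actually introduces.

```rust
use std::fmt;

/// Hypothetical newtype over `Vec<u8>` with a string-like `Debug` output.
#[derive(Clone, PartialEq, Eq)]
pub struct ContentBytes(pub Vec<u8>);

impl fmt::Debug for ContentBytes {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // escape_ascii() keeps printable ASCII as-is and escapes the rest.
        write!(f, "\"{}\"", self.0.escape_ascii())
    }
}
```

Formatting `ContentBytes(b"foo".to_vec())` with `{:?}` prints `"foo"`,
whereas the plain `Vec<u8>` prints `[102, 111, 111]`.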
This only parses the fields relevant to us, i.e.:
- name: the stable identifier of the submodule
- path: the path to the submodule in the current commit
- url: the remote we can clone the submodule from
The full list of .gitmodules fields can be found at
https://git-scm.com/docs/gitmodules.
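
The three fields above can be sketched as follows. This is a hand-rolled,
simplified reader for illustration; `SubmoduleConfig` and
`parse_gitmodules` are hypothetical names, and a real implementation would
more likely go through a proper git config parser that handles quoting,
escapes, and continuation lines.

```rust
/// The three fields described above.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct SubmoduleConfig {
    pub name: String,
    pub path: String,
    pub url: String,
}

/// Parses `[submodule "<name>"]` sections, keeping only `path` and `url`.
/// Simplification: no quoting, escapes, or continuation lines are handled.
pub fn parse_gitmodules(text: &str) -> Vec<SubmoduleConfig> {
    let mut submodules: Vec<SubmoduleConfig> = Vec::new();
    // True while we are inside a `[submodule "..."]` section.
    let mut in_submodule = false;
    for line in text.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with('#') || line.starts_with(';') {
            continue;
        }
        if line.starts_with('[') {
            let name = line
                .strip_prefix("[submodule \"")
                .and_then(|rest| rest.strip_suffix("\"]"));
            in_submodule = name.is_some();
            if let Some(name) = name {
                submodules.push(SubmoduleConfig {
                    name: name.to_string(),
                    ..Default::default()
                });
            }
            continue;
        }
        if !in_submodule {
            continue;
        }
        if let Some((key, value)) = line.split_once('=') {
            if let Some(current) = submodules.last_mut() {
                match key.trim() {
                    "path" => current.path = value.trim().to_string(),
                    "url" => current.url = value.trim().to_string(),
                    // Other fields (branch, update, ...) are ignored.
                    _ => {}
                }
            }
        }
    }
    submodules
}
```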
This eliminates indirect access through Vec<u8> and improves cache locality
while sorting the index entries. We can achieve a similar result by using
SmallVec<[u8; 24]> in place of Commit/ChangeId(Vec<u8>), but we would have
to determine a reasonable id length across backends. Indexing [u8; 4] performs
better, at the cost of API and implementation complexity.
For temporary Commit/ChangeId allocation in general, I think a borrowed type
like Path/PathBuf will help.
Testing with my "linux" repo, this saves ~670ms needed to initialize both
change id index and disambiguation indexes.
I'll rewrite resolve_prefix_range() to branch depending on the prefix length,
and the easiest way to do that is to pass the iterator to a continuation
function instead of returning it as an either (or boxed) type.
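
One possible shape of the continuation style, as a sketch: `SortedTable`
and `with_prefix_range` are hypothetical names, and the continuation here
takes `&mut dyn Iterator`, which is only one of several ways to accept
differently typed iterators. Each branch builds its own iterator type, so
no common return type is needed.

```rust
/// Hypothetical sorted table of (key, value) pairs.
pub struct SortedTable {
    entries: Vec<(Vec<u8>, u32)>,
}

impl SortedTable {
    pub fn new(mut entries: Vec<(Vec<u8>, u32)>) -> Self {
        entries.sort();
        SortedTable { entries }
    }

    /// Calls `f` with an iterator over the values whose keys start with
    /// `prefix`, and returns whatever `f` returns.
    pub fn with_prefix_range<R>(
        &self,
        prefix: &[u8],
        f: impl FnOnce(&mut dyn Iterator<Item = u32>) -> R,
    ) -> R {
        if prefix.is_empty() {
            // Every entry matches, so no search is needed.
            f(&mut self.entries.iter().map(|(_, v)| *v))
        } else {
            // Keys smaller than the prefix sort first; matches follow.
            let start = self
                .entries
                .partition_point(|(k, _)| k.as_slice() < prefix);
            let mut iter = self.entries[start..]
                .iter()
                .take_while(|(k, _)| k.starts_with(prefix))
                .map(|(_, v)| *v);
            f(&mut iter)
        }
    }
}
```

A caller decides what to do with the range, for example
`table.with_prefix_range(b"a", |iter| iter.collect::<Vec<u32>>())`.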
I'm going to rewrite IdIndex to store only the first few bytes of the key. A
separate table helps there.
At this point, it wouldn't make sense to convert usize to u32, but the new
index will store ([u8; 4], u32) pairs.
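
A sketch of the planned layout, under assumptions: `PrefixIndex`,
`short_key`, and the zero-padding of keys shorter than four bytes are
illustrative choices, not the actual implementation. The index holds
([u8; 4], u32) pairs, and the full keys live in a separate table that is
consulted only to confirm candidates.

```rust
/// Sorted index storing only the first four bytes of each key, plus the
/// position of the full key in a separate table.
pub struct PrefixIndex {
    entries: Vec<([u8; 4], u32)>,
}

fn short_key(key: &[u8]) -> [u8; 4] {
    // Assumption for this sketch: shorter keys are zero-padded.
    let mut short = [0u8; 4];
    let n = key.len().min(4);
    short[..n].copy_from_slice(&key[..n]);
    short
}

impl PrefixIndex {
    pub fn build(table: &[Vec<u8>]) -> Self {
        let mut entries: Vec<([u8; 4], u32)> = table
            .iter()
            .enumerate()
            .map(|(pos, key)| (short_key(key), pos as u32))
            .collect();
        entries.sort_unstable();
        PrefixIndex { entries }
    }

    /// Returns the table positions of all keys starting with `prefix`.
    pub fn lookup(&self, table: &[Vec<u8>], prefix: &[u8]) -> Vec<u32> {
        let n = prefix.len().min(4);
        // Narrow down using the short keys only.
        let start = self
            .entries
            .partition_point(|(short, _)| short[..n] < prefix[..n]);
        self.entries[start..]
            .iter()
            .take_while(|(short, _)| short[..n] == prefix[..n])
            // Confirm against the full key in the separate table.
            .filter(|(_, pos)| table[*pos as usize].starts_with(prefix))
            .map(|(_, pos)| *pos)
            .collect()
    }
}
```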
It allows us to build multiple IdIndex instances within a single loop. As the
final sorting is a heavy operation, I don't want to implement Default + Extend
for IdIndex to be compatible with Iterator::unzip().
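
A sketch of the idea, assuming a builder-style API: `IdIndexBuilder`,
`insert()`, `build()`, and `build_indexes()` are hypothetical names used
here for illustration. Entries are pushed unsorted inside one loop, and
the expensive sort happens once per index in `build()`.

```rust
/// Hypothetical sorted id index.
pub struct IdIndex {
    sorted: Vec<(Vec<u8>, u32)>,
}

impl IdIndex {
    /// Positions in key order, exposed only to inspect the sketch.
    pub fn positions(&self) -> Vec<u32> {
        self.sorted.iter().map(|(_, pos)| *pos).collect()
    }
}

/// Collects entries in insertion order; sorting is deferred to `build()`.
#[derive(Default)]
pub struct IdIndexBuilder {
    unsorted: Vec<(Vec<u8>, u32)>,
}

impl IdIndexBuilder {
    pub fn insert(&mut self, key: &[u8], pos: u32) {
        self.unsorted.push((key.to_vec(), pos));
    }

    /// Sorts once, after all entries have been inserted.
    pub fn build(mut self) -> IdIndex {
        self.unsorted.sort_unstable();
        IdIndex {
            sorted: self.unsorted,
        }
    }
}

/// Fills two builders in a single pass over (commit id, change id) pairs.
pub fn build_indexes(commits: &[(Vec<u8>, Vec<u8>)]) -> (IdIndex, IdIndex) {
    let mut commit_ids = IdIndexBuilder::default();
    let mut change_ids = IdIndexBuilder::default();
    for (pos, (commit_id, change_id)) in commits.iter().enumerate() {
        commit_ids.insert(commit_id, pos as u32);
        change_ids.insert(change_id, pos as u32);
    }
    (commit_ids.build(), change_ids.build())
}
```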
With my colocated "linux" repo, this appears to save ~50ms startup overhead.
Since the repo has lots of indirect tags, we can't eliminate tag object
loading entirely. But still, it's faster than falling back to peel_to_commit().
It was convenient that what the git backend stored in its "extras"
table is exactly a subset of the fields that local backend stores, but
it's a bit ugly and limiting. For example, it makes it possible to
populate the `author` field in the git extras, but that would have no
effect. It's better that it's not possible to do that (we store the
author field in the git commit, of course).
What made me notice this now was that I'm working on tree-level
conflicts (#1624) and I'm thinking of adding a field to the git extras
saying "this commit has a single tree, but it's still a new-style
commit", so we can know not to walk such trees to find path-level
conflicts. That's only needed for the git backend because we don't
care about compatibility for the local backend.