Git notes (at least as implemented by libgit2) quickly gets really
slow, as noted in issue #7. This patch replaces it by a custom storage
format.
I tested the performance in the git.git repo with just a few hundred
annotated commits (~450, I think) and no sharding. I listed the first
~2900 commits there using `jj log --no-graph -r ,,v1.0.0 -T 'author
"\n"' | wc -l`. That took about 882ms. After this patch, it dropped to
108ms.
I did a similar test in this repo with 12700 annotated commits and
sharding, listing all visible commits. That took 142ms before this
patch (the sharding helps a lot!) and 55ms after.
Closes#3.
Closes#7.
The new store works the same way as the `OpHeadsStore`. It keeps track
of the current head file(s) by recording their names in a
directory. When a write happens, it adds the new head and then removes
the old head. There will be generally be a single head at a time. The
only exception is when there's been concurrent operations (locally, or
remotely, in the case of a distributed file system). When there are
multiple heads files, they are automatically merged. No guarantee is
given about which value wins if the key exists in several heads; the
store is meant to be used for data that's immutable once written. As
long as different keys are written, this is a CRDT. That makes it fit
for solving both #3 and #7.
I'm trying to replace the Git backend's use of Git notes for storing
metadata (#7). This patch adds a file format that I hope can be used
for that. It's a simple generic format for storing fixed-size keys and
associated variable-size values. The keys are stored in sorted
order. Each key is followed by an offset to the value. The offset is
relative to the first value. All values are concatenated after each
other. I suppose it's a bit like Git's pack files but lacking both
delta-encoding and compression.
Each file can also have a parent pointer (just like the index files
have), so we don't have to rewrite the whole file each time. As with
the index files, the new format squashes a file into its parent if it
contains more than half the number of entries of the parent. The code
is also based on `index.rs`.
Perhaps we can alo replace the default operation storage with this
format. Maybe also the native local backend's storage. We'll need
delta-encoding and compression soon then.
I'm about to change the index format (to remove predecessor
information), which will break the format. Let's prepare for that by
having `IndexStore` reindex the repo if it fails to read the index..
I think this is just cleaner, and it gives us room to put other
store-related data in the `.jj/store/` directory. I may want to use
that place for writing the metadata we currently write in Git notes
(#7).
Diffs between certain combinations of file types were not handled by
`jj diff` (for example, a diff between a conflict and another conflict
would not show a diff). This change fixes that, and also makes added
and removed files get printed with color and line numbers, which I've
often wanted.
The user currently has to edit `.jj/git/config` (or run `git
--git-dir=.jj/git config`) to manage remotes in the underlying Git
repo. That's not very discoverable (and we may change the path some
day), so let's provide a command for it.
With this change, you can do e.g. `heads(remote_branches())`. That
should currently be the same as `public_heads()`, except that we don't
yet remove public heads when remote branches have been updated. Having
this support should be generally useful, but I may use it in the short
term specifically for depending less on the public heads, until I get
around to keeping them up to date.
I also changed the instructions to use `cargo install --git` pointing
straight to GitHub, so we don't have the naming conflict with the jj
repo created in the tutorial.
As #33 says, the default diff we have can be hard to read and it
cannot be used for use with other tools. This patch adds a `jj diff
--git` mode for showing Git's flavor of unified diffs.
We should add a config option to get these diffs by default. For
interchange with other tools, we also need a way of turnning off color
codes in output (it's currently always on, even when when not printing
to a TTY).
I noticed while working on support for unified diffs (#33) that
`Diff::for_tokenizer(..., &find_line_ranges)` would return a
`DiffHunk::Matching` for each matching line instead of a single
`DiffHunk::Matching` for all the matching lines. That's different from
what you get from `Diff::default_refinement()` and seems less
convenient to work with.
My SSH keys are password-protected, so I haven't been able to test
this patch completely, but I believe it should work. We now use
ssh-agent if `$SSH_AGENT_PID` is set, otherwise we check if
`$HOME/.ssh/id_rsa` exists and assume it's a password-less key. That's
quite hacky but I think it's good enough for now. We eventually need
to move this out of the library crate just like libgit2 has done.
Closes#25.
It's been a lot of work, but now we're finally able to remove the
`Evolution` state! `jj obslog` still works as before (it just walks
the predecessor pointers).
This rewrites the `divergent` template keyword to be based on the
number of visible commits with a given change id. That's the same as
before; it's just that it's not based on the `Evolution` object's view
of which commits are visible anymore.
This is the last thing that depended on the evolution state!
The removal of hidden heads was just there to help with the transition
away from evolution (#32). Now that we no longer depend on evolution
for removing old heads, we can remove the hack.
This rewrites the code for resolving a change id to simply walk the
entire index. That's obviously not optimal, but it's not worse than
what we did in the evolution-based resolution. This is yet another
step towards removing support for evolution (#32).
This patch teaches `DescendantRebaser` to also update heads. That's
done at the end of the rebase (when `rebase_next()` starts returning
`None`), which is a little weird. We should probably change the
interface, but this will do for now.
With this change, we should no longer need to remove hidden heads when
the transaction commits. That will remove one of the last bits of
dependence on evolution from most commands (#32).
Now that we no longer have to be careful whether we mean "all heads"
or "non-obsolete heads", there's no need to pass them as
arguments. It's still possible to get a DAG range to a hidden commit
by using `RevsetExpression::dag_range_to()`, as long as the hidden
commit is indexed.
Now that we remove hidden heads whenever a transaction commits,
`non_obsolete_heads()` should always be the same as `all_heads()`,
except during a transaction. I don't think we depend on the difference
even during a transaction. Let's simplify a bit by removing the revset
function `all_heads()` and renaming `non_obsolete_heads()` to
`heads()`. This is part of issue #32.
Branches and the working copy are currently updated to the parents of
the old commit (if they pointed to the old commit to start
with). That's not what's supposed to happen; they're supposed to be
updated to point to the new commit. This patch fixes that by manually
rebasing the immediate children of the old commit, so that
`MutRepo::create_descendant_rebaser()` will do the right thing.