This adds a `git_refs()` revset that includes all commits pointed to
by a git ref. It's not very useful yet because the graph log doesn't
use the right type of edges for non-contiguous commits.
Merging is currently done with line-level granularity, so it makes
sense to have newlines after the markers. That makes them easier to
edit out when resolving conflicts.
This lets you use the same operator as we currently have for ancestors
and descendants (`,,`) to also specify a DAG range. That's what
Mercurial uses the `::` operator for and what Git has `git log
--ancestry-path` for.
It seems clearer to let the parsed `RevsetExpression`s have only root
and head expression instead of adding the ancestors when building the
expression tree.
I really liked the idea of having the operators for parents and
ancestors (etc.) look similar, but that turned out to be problematic
when we want to add an infix operator for a DAG range (hg's `::`
revset operator and git's `--ancestry-path` flag). Let's say we chose
`:*:` as the operator. Part of the problem is how to parse `foo:*:bar`
without eagerly parsing the `foo:`. It would also be nicer to use
exactly the same operator as prefix, postfix, and infix. Since the
"parents" operator can be repeated, we can't have it be just `:` and
the "ancestors" operator be `::`. We could make the "ancestors"
operator be something like `*:*` (or anything symmetric with the `:`
symbol on the inside). However, at that point, the operator is getting
ugly and hard to type. Another option would be to use `:` for
ancestors and `::` for parents, but that is counterintuitive and get
annoying if you want to repeat it. So it seems that the best option is
to simply pick different symbols for parents/children and
ancestors/descendants/range.
This patch changes the ancestors/descendants operators to both be
`,,`. I'm not at all attached to that particular symbol. I suspect
we'll change it later.
Now that expressions may contain literal strings, we can simply have
functions accept only expressions arguments. That simplifies both the
grammar and the code.
A small drawback is that `description((foo), bar)` is now allowed and
does a search for the string "foo" (not "(foo)"). That seems
unlikely to trip up users.
Git refs with names containing e.g "-" are currently not accepted
symbol names, and I don't plan to change the grammar to accept
them. Instead, let's have the user quote symbol names containing
unusual characters. That way we can keep these symbols reserved for
revset operators.
With this patch the user can do e.g. `jj diff -r '"v2.9.0-rc2"'`.
This adds `children(<set>)` and `<set>:` for the children of the given
set, and `descendants(<set>)` and `<set>:*` for the descendants of the
given set. The children and descendants are filtered to be among
ancestors of non-obsolete commits. I haven't added a way of overriding
that yet.
This is especially important now that we leak the rule names into the
`SyntaxError` message. For example, the error message when doing `jj
diff -r :` will now mention "expected parents_op, ancestors_op, or
primary". It seems much clearer with the "_op" suffixes there. Longer
term, we should think more about how we can best surface syntax errors
from the library crate.
The tests don't need any complex set up (no repo necessary), so they
can be in the `revset` module itself. I'm sure we'll need to split up
that module later (at least separate out the parsing), but that's a
separate problem.
I don't know why I made it walk by generation number to start
with. Walking by position is better in at least two ways: 1) revsets
now depend on the walks to be by descending index position (though
they could equally well depend on the walks to be by generation number
-- it just needs to be consistent), and 2) the log output gets less
interleaved.
This commit makes the number of bytes in the graphlog output in the
git.git repo drop by ~40% due to the reduced amount of
interleaving. Also, it reduces the time of `jj bench walkrevs v1.0.0
v2.0.0` in the git.git repo by 32% (9.4ms -> 6.4ms) and `jj bench
walkrevs v2.0.0 v1.0.0` by 33% (7.7ms -> 5.1ms).
This change adds a `non_obsolete_heads(<set>)` revset, which walks up
ancestors of the input set until it gets to a non-obsolete and
non-pruned commit. That's what we do by default in `jj log`
(i.e. without `--all`). Now we can make `jj log` use revsets and teach
it a `-r` option!
This adds `parents(foo)` and `ancestors(foo)` as alternative ways of
writing `:foo` and `*:foo`.
I haven't added support for for whitespace yet; the parsing is very
strict. The error messages will also need to be improved later.
This patch adds initial support for a DSL for specifying revisions
inspired by Mercurial's "revset" language. The initial support
includes prefix operators ":" (parents) and "*:" (ancestors) with
naive parsing of the revsets. Mercurial uses postfix operator "^" for
parent 1 just like Git does. It uses prefix operator "::" for
ancestors and the same operator as postfix operator for descendants. I
did it differently because I like the idea of using the same operator
as prefix/postfix depending on desired direction, so I wanted to apply
that to parents/children as well (and for
predecessors/successors). The "*" in the "*:" operator is copied from
regular expression syntax. Let's see how it works out. This is an
experimental VCS, after all.
I've updated the CLI to use the new revset support.
The implementation feels a little messy, but you have to start
somewhere...
This actually seems to make it slightly slower, but it fixes an
important bug (we used to evolve only one topological branch per `jj
evolve` call). The slowdown seemed to be on the order of 5% when
evolving 100 commits on git.git's "what's cooking" branch.
I suspect that at least one reason that I didn't make
`MutableRepo::base_repo` by an `Arc<ReadonlyRepo>` before was that I
thought that that would mean that `start_transaction()` would need be
moved off of `ReadonlyRepo` so it can be given an
`&Arc<ReadonlyRepo>`, which would make it much less convenient to
use. It turns out that a `self` argument can actually be of type
`&Arc<ReadonlyRepo>`.
See test case for details.
Before:
test bench_diff_10k_lines_reversed ... bench: 36,249,659 ns/iter (+/- 174,455)
test bench_diff_10k_modified_lines ... bench: 37,258,890 ns/iter (+/- 803,963)
test bench_diff_10k_unchanged_lines ... bench: 4,252 ns/iter (+/- 69)
test bench_diff_1k_lines_reversed ... bench: 982,834 ns/iter (+/- 6,467)
test bench_diff_1k_modified_lines ... bench: 3,343,469 ns/iter (+/- 23,243)
test bench_diff_1k_unchanged_lines ... bench: 231 ns/iter (+/- 2)
test bench_diff_git_git_read_tree_c ... bench: 95,559 ns/iter (+/- 816)
After:
test bench_diff_10k_lines_reversed ... bench: 36,186,715 ns/iter (+/- 196,903)
test bench_diff_10k_modified_lines ... bench: 37,511,000 ns/iter (+/- 1,370,476)
test bench_diff_10k_unchanged_lines ... bench: 3,099 ns/iter (+/- 8)
test bench_diff_1k_lines_reversed ... bench: 986,010 ns/iter (+/- 11,565)
test bench_diff_1k_modified_lines ... bench: 3,370,938 ns/iter (+/- 17,041)
test bench_diff_1k_unchanged_lines ... bench: 230 ns/iter (+/- 2)
test bench_diff_git_git_read_tree_c ... bench: 102,189 ns/iter (+/- 1,052)
So this patch makes diffing even slower (but still easily fast enough
for all cases I've run into in real life). There's probably a lot that
can be done to make things faster, but the first priority is that the
diffs are correct and easy to read.
This is yet another step towards making it easy to propagate
`BrokenPipe` errors. The `jj diff` code (naturally) diffs two trees
and prints the diffs. If the printing fails, we shouldn't just crash
like we do today.
The new code is probably slower since it does more copying (the
callback got references to the `FileRepoPath` and `TreeValue`). I hope
that won't make a noticeable difference. At least `jj diff -r
334afbc76fbd --summary` didn't seem to get measurably slower.
The iterator version is easier to use and we get rid of the ugly type
parameter for the error type. I also simplified the code by using
`Peekable` iterators.
The new diff algorithm produces pretty bad diffs in some cases, such
as cc4b1e9230 in this repo (the parent of this commit). I think the
problem there is that many words are repeated over and over. Diffing
first at the line level and then refining the diff of the changed
ranges at the word level gives much better results. That's what this
patch does. After this patch, `jj diff -r cc4b1e923091` looks pretty
similar to the diff in GitHub's UI.
I hope to get around to doing the same for the merge code soon.
Impact on benchmarks:
Before:
test bench_diff_10k_lines_reversed ... bench: 42,647,532 ns/iter (+/- 765,347)
test bench_diff_10k_modified_lines ... bench: 21,407,980 ns/iter (+/- 126,366)
test bench_diff_10k_unchanged_lines ... bench: 4,235 ns/iter (+/- 16)
test bench_diff_1k_lines_reversed ... bench: 1,190,483 ns/iter (+/- 7,192)
test bench_diff_1k_modified_lines ... bench: 1,919,766 ns/iter (+/- 9,665)
test bench_diff_1k_unchanged_lines ... bench: 231 ns/iter (+/- 1)
test bench_diff_git_git_read_tree_c ... bench: 174,702 ns/iter (+/- 1,199)
After:
test bench_diff_10k_lines_reversed ... bench: 38,289,509 ns/iter (+/- 129,004)
test bench_diff_10k_modified_lines ... bench: 33,140,659 ns/iter (+/- 3,989,339)
test bench_diff_10k_unchanged_lines ... bench: 3,099 ns/iter (+/- 14)
test bench_diff_1k_lines_reversed ... bench: 973,551 ns/iter (+/- 94,895)
test bench_diff_1k_modified_lines ... bench: 3,033,818 ns/iter (+/- 29,513)
test bench_diff_1k_unchanged_lines ... bench: 230 ns/iter (+/- 1)
test bench_diff_git_git_read_tree_c ... bench: 79,100 ns/iter (+/- 963)
So most of them get slower, as expected. The last one, taken from a
real diff in the git.git repo, get faster, however (which is also what
I would have expected).
I made a quite late change in a recent patch to make the merge code to
merge based on lines instead of words. I forgot to update the tests
(and to even run them). Sorry :(
The previous patch switched over the content-merge code to use the new
histogram diff code. This patch switches over the content-diff code to
use the histogram diff code. As before, the immediate goal is to speed
it up. `jj diff -r c28ded83fc` in the git.git repo is a good example
of a diff that's extremely slow to calculate with our current
LCS-based diff. With this patch, that drops from 35 s to 0.12 s.
The diff was slightly better before. I think that's mostly because of
our different definition of a "word" in the data. We can improve that
later. The speedup we get now is easily worth the slightly worse diff.
With the histogram diff code from the previous patch, we can now start
using that for finding the "sync regions" in 3-way merge. That helps a
lot with the slow merging we had before this patch. `jj diff -r
9d540e9726` in the git.git repo drops from 22 s to 0.15 s with this
patch. (That commit is a rather arbitrary merge commit from aroun 5
years ago.)
With the new diff algorithm, the output of `jj diff -r 9d540e9726` in
git.git looks better if we find unchanged sync regions based on lines
than on words, so that's what I'm using in this patch. That's a change
compared the the LCS-based diff we used before this patch. I suspect
the reason that finding sync regions based on words works worse now is
not because of the change from LCS to histogram but because of the
change in how we define a word. My goal right now is mostly to make it
faster; I'll get back to refining the diff result later.
The current diff algorithm does a full LCS on the words of the texts,
which is really slow. Diffing the working copy when e.g.
`src/commands.py` has changes far apart takes seconds. This patch adds
an implementation inspired by JGit's Histogram diff. I say "inspired"
because I just didn't quite understand it :P In particular, I didn't
understand what it does when it finds non-unique elements. I decided
to line up the leading common elements on both sides of the merge. I
don't know if that usually gives good enough results in practice.
I'm sure this can still be optimized a lot, but this seems good enough
as a start. There is also many things to improve about the quality of
the diffs.
I just changed my `~/.gitignore` and some tests started failing
because the working copy respects the user's `~/.gitignore`. We should
probably not depend on `$HOME` in the library crate. For now, this
patch just makes sure we set it to an arbitrary directory in the tests
where it matters.
`test_commit_parallel` was failing on Mac in the GitHub CI. I suspect
the reason was that it was timing out. The test runs in about 1 s on
my Linux desktop and in about 3 s on my Mac laptop. It failed after 31
in the GitHub CI. This patch increases the timeout to 1 minute to try
to make the test pass. It would be better to set the timeout to a
higher value only in tests, but this will be good enough for now. By
the way, it has turned out that git notes (at least libgit2's
implementation of them) are too slow, so we should probably eventually
create our own storage for the extra metadata instead.
I only noticed that there was a newer version when running `cargo
install --path .`, which resulted in warnings about deprecated
functions. There's no other reason I'm aware of to upgrade now.
We can now finally use the commit index for filtering out ancestors
from the sets of heads.
I haven't timed the change from most of the recent work on
performance, but I did a measurement after this commit. I modified a
commit in the git.git repo's "what's cooking" branch (because that's
linear). Then I ran `jj evolve` so the 100 commits after it would get
evolved. That took ~700ms. `git rebase` of the same 100 commits took
~6s.
I also compared `jj op undo` of that `jj evolve` operation. With this
patch, that was sped up from ~6.8s to ~125ms.
`MutableRepo` has more information needed for taking fast-paths, and
it will have to make the same decision for doing incremental updates
of the evolution state anyway.
This patch continues the work from the previous pathc. From this
patch, we no longer calculate the evolution state just because a
transaction starts. We still unnecessarily calculate it when adding a
commit within the transaction, however. I'll fix that next.
This patch changes it so that `ReadonlyEvolution` does not lazily
calculate its state and the caller, i.e. `ReadonlyRepo`, is instead
responsible for the laziness. That will allow the caller to make
decisions based on whether the state has been
calculated. Specifically, we don't want to calculate the evolution
state in order to update it incrementally if it hasn't already been
calculated. It's better to just leave it uncalculated in that case.
As a result of moving the laziness out of `ReadonlyEvolution`, we also
don't need to the reference to `ReadonlyRepo` anymore, which
simplifies things a bunch. The next patch will continue by making the
corresponding change to `MutableEvolution`, which will let us simplify
even more.
TreeState::write_tree leaves a ".git" file in the working copy. This is
undesirable but more problematic on Windows - The second time
TreeState::write_tree would panic because Repository::init_opts will fail
with a Permission Denied error.
This seems to be a libgit2 defect. But for now let's just remove ".git"
automatically. This makes `cargo test --test smoke_test` pass on Windows.
We now only call the function from `MutableRepo::merge()`. There we
pass the result to `MutableView::set_view()`, which already enforces
the invariants.
This adds `MutableRepo::merge()`, which applies the difference between
two `ReadonRepo`s to itself. That results in much simpler code than
the current code in `merge_op_heads()`. It also lets us write `undo`
using the new function. Finally -- and this is the actual reason I did
it now -- it prepares for using the index when enforcing view
invariants.
It's sometimes useful to create a `RepoLoader` given an existing
`ReadonlyRepo`. We already do that in `ReadonlyRepo::reload()`. This
patch repurposes `ReadonlyRepo::reload()` for that.
I think the `as_` prefix of `as_repo_mut()` makes it sound like it
returns a view of the `Transaction`, but the `MutableRepo` is actually
a part of it. Also, the convention seems to be to put the `mut_` in
the name first if the function returns a name with a matching name
(like `MutableRepo` does).
This is yet another step towards making the `View` types
simpler. Perhaps we eventually won't need to wrap the types returned
from the `OpStore` at all.
This continues the work to make the `View` types be only about the
state of the current view and not about operations in general (which
has been moved out `OpStore` and qOpHeadsStore`).
This is more code, but I think it's clearer because the code for
removing the ancestors from the set of parents and from disk is now
close. I hope this will also help prepare for some further changes.
It can be useful to write an operation to the `OpStore` without also
making it visible when you load the repo. I had planned to add that
functionality at least for hooks, so the hooks can be run commands
with `jj --at-op=<operation>` and decide whether to publish the
operation. However, the immediate goal is to let us rewrite
`op_heads_store::merge_op_heads()` to use the usual `Transaction`
API. That needs to be able to just write the operation without
publishing it, since the publishing step takes a long, which
`op_heads_store::merge_op_heads()` (its caller, actually) has already
taken.
I'd like to make `ReadonlyView` and `MutableView` focused on just the
state of the view (i.e. the set of heads, git refs, etc.). The
responsibility for managing the `.jj/view/op_heads/` directory should
be moved out of it. This prepares for that.
Unlike in `Transaction::commit()`, in the `view` module, we actually
don't update the `.jj/view/op_heads/` directory until after we've
recorded the index associated with the operation, so there's no race
there.
The methods are now only called from within the type. Inlining means
that the borrow checker will let us borrow these separate fields
concurrently. We'll take advantage of that soon.
I think the `Option<>` wrapping was from the time when `MutableView`
had a reference back to the repo (and `MutableIndex` was probably
wrapped out of habit).
Most methods on `Transaction` only need the `MutableRepo`, so it makes
for that functionality to be on the latter. That will let us update
the methods to also update the index, which would otherwise have been
harder because it would require a mutable borrow of both the view and
the index. This patch makes most current methods on `Transaction` just
delegate to `MutableRepo`. We may want to remove some of these
delegating methods later.
Updating the index on disk means that reader won't have to calculate
the state. Updating it in memory means that we can take advantage of
it while resolving conflicts. We will do that soon.
This patch introduces a new `IndexStore` struct. The idea is that it
will know about the directory in which the index files are stored, the
associations with operations. It may also cache `Arc<ReadonlyIndex>`
instances so if multiple `ReadonlyIndex` instances are loaded, they
can be returned from the cache. That may be useful when merging
operations because the operations are likely to share a large parent
index file. For now, however, all the new type has is `init()`,
`load()`, and `reinit()`.
The only way to load the repo at a current operation (as with
`--at-op`) is currently to first load it at the head operation and
then call `reload()` on the repo. This patch makes it so we can load
the repo directly at the requested operation.
We'll want to be able to load the repo at a given operation without
first loading the head operation as we do today. This patch introduces
a struct for keeping the state of a half-loaded repo. In that
half-loaded state, the store and the op-store have been loaded, but
the view has not yet been loaded. That makes it possible for callers
to use the loaded op-store for looking up an operation to load the
view at.
We want to support loading the repo at a specific operation without
first loading the head operation (like we currently do). One reason
for that is of course efficiency. A possibly more important reason is
that the head operation may be conflicted, depending on how we decide
to deal with operation-level conflicts. In order to do that, it makes
sense to move the creation of the `OpStore` outside of the
`View`. That also suggests that the `.jj/view/op_store/` directory
should move to `.jj/op_store/`, so this patch also does that. That's
consistent with how `.jj/store/` is outside of `.jj/working_copy/`.
All the information needed for calculating the evolution state is now
in the index, so let's use it. This speeds up calculation of the
evolution state from 1.53s to 150ms in the git.git repo. In the Linux
repo, it was sped up from 28.9s to 3.07s. That's still unbearably slow
(and still pretty slow in the git.git repo too). We may need to keep a
persistent cache of the evolution state, but that will have to come
later; this improvement is good enough for now.
Evolution needs to have fast access to the predecessors. This change
adds that information to the commit index.
Evolution also needs fast access to the change id and the bit saying
whether a commit is pruned. We'll add those soon.
Some tests changed because they previously added commits with
predecessors that were not indexed, which is no longer allowed from
this change. (We'll probably eventually want to allow that again, so
that the user can prune predecessors they no longer care about from
the repo.)
The index is now always kept up to date and it has functionality for
finding common ancestors, so let's use it! This should make merging
commits a little faster if their common ancestor is far away (which is
rare). It's probably much more important that the index-based
algorithm is more correct. Also, it returns multiple common ancestors
in the criss-cross case, which lets us do a recursive merge like git
does. I'm leaving the recursive merge for later, though.
We currently need to read the commit objects for finding common
ancestors. That can be very slow when the common ancestor is far back
in history. This patch adds a function for finding common ancestors
using the index instead.
Unlike the current algorithm, which only returns one common ancestor,
the new index-based one correctly handles criss-cross merges.
Here are some timings for finding the common ancestors in the git.git
repo:
| Without index | With Index |
| First run | Subsequent | First run | Subsequent |
v2.30.0-rc0 v2.30.0-rc1 | 5.68 ms | 5.94 us | 40.3 us | 4.77 us |
v2.25.4 v2.26.1 | 1.75 ms | 1.42 us | 13.8 ms | 4.29 ms |
v1.0.0 v2.0.0 | 492 ms | 2.79 ms | 23.4 ms | 6.41 ms |
Finding ancestors of v2.25.4 and v2.26.1 got much slower because the
new algorithm finds all common ancestors. Therefore, it also finds
v2.24.2, v2.23.2, v2.22.3, v2.21.2, v2.20.3, v2.19.4, v2.18.3, and
v2.17.4, which it then filters out because they're all ancestors of
v2.25.3.
Also note that the result was incorrect before, because the old
algorithm would return as soon as it had found a common ancestor, even
if it's not the latest common ancestor. For example, for the common
ancestor between v1.0.0 and v2.0.0, it returned an ancestor of v1.0.0
because it happened to get there by following some side branch that
led there more quickly.
The only place we currently need to find the common ancestor is when
merging trees, which we only do when the user runs `jj merge`, as well
as when operating on existing merge commits (e.g. to diff or rebase
them). That means that this change won't be very noticeable. However,
it's something we clearly want to do sooner or later, so we might as
well get it done.
It's nice to have a non-random order for tests (we can revisit later
if it shows up in profiling). I'm changing the order to be the index
order so the future caller of `heads_pos()` (not `heads()`) will also
get consistent order.