This adds `MutableRepo::merge()`, which applies the difference between
two `ReadonlyRepo`s to itself. That results in much simpler code than
the current code in `merge_op_heads()`. It also lets us write `undo`
using the new function. Finally -- and this is the actual reason I did
it now -- it prepares for using the index when enforcing view
invariants.
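Roughly, the new method has this shape (a sketch; the exact signature and
field names here are assumptions, not necessarily the final code):

```rust
impl MutableRepo {
    /// Apply the difference between `base_repo` and `other_repo` to `self`,
    /// i.e. replay everything that changed from the former to the latter.
    pub fn merge(&mut self, base_repo: &ReadonlyRepo, other_repo: &ReadonlyRepo) {
        // Merge the views (heads, git refs, checkout, ...).
        self.view.merge(base_repo.view(), other_repo.view());
        // Enforcing view invariants (and, later, updating the index) goes here.
    }
}
```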
It's sometimes useful to create a `RepoLoader` given an existing
`ReadonlyRepo`. We already do that in `ReadonlyRepo::reload()`. This
patch extracts that code so it can be reused.
I think the `as_` prefix of `as_repo_mut()` makes it sound like it
returns a view of the `Transaction`, but the `MutableRepo` is actually
a part of it. Also, the convention seems to be to put the `mut_` in
the name first if the function returns a type with a matching name
(like `MutableRepo` does).
This is yet another step towards making the `View` types
simpler. Perhaps we eventually won't need to wrap the types returned
from the `OpStore` at all.
This continues the work to make the `View` types be only about the
state of the current view and not about operations in general (which
has been moved out to `OpStore` and `OpHeadsStore`).
This is more code, but I think it's clearer because the code for
removing the ancestors from the set of parents and from disk is now
close together. I hope this will also help prepare for some further changes.
It can be useful to write an operation to the `OpStore` without also
making it visible when you load the repo. I had planned to add that
functionality at least for hooks, so the hooks can run commands
with `jj --at-op=<operation>` and decide whether to publish the
operation. However, the immediate goal is to let us rewrite
`op_heads_store::merge_op_heads()` to use the usual `Transaction`
API. That needs to be able to just write the operation without
publishing it, since the publishing step takes a lock, which
`op_heads_store::merge_op_heads()` (its caller, actually) has already
taken.
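To make the intent concrete, here is a sketch of what a split write/publish
API could look like (the names `write()`, `UnpublishedOperation`, and
`publish()` are assumptions made for illustration):

```rust
impl Transaction {
    /// Write the operation to the OpStore without making it an op head,
    /// so loading the repo normally won't see it.
    pub fn write(self) -> UnpublishedOperation {
        todo!()
    }
}

impl UnpublishedOperation {
    /// Make the operation visible by updating .jj/view/op_heads/.
    /// This is the step that needs the lock mentioned above.
    pub fn publish(self) -> Operation {
        todo!()
    }
}
```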
I'd like to make `ReadonlyView` and `MutableView` focused on just the
state of the view (i.e. the set of heads, git refs, etc.). The
responsibility for managing the `.jj/view/op_heads/` directory should
be moved out of it. This prepares for that.
Unlike in `Transaction::commit()`, in the `view` module, we actually
don't update the `.jj/view/op_heads/` directory until after we've
recorded the index associated with the operation, so there's no race
there.
The methods are now only called from within the type. Inlining means
that the borrow checker will let us borrow these separate fields
concurrently. We'll take advantage of that soon.
I think the `Option<>` wrapping was from the time when `MutableView`
had a reference back to the repo (and `MutableIndex` was probably
wrapped out of habit).
Most methods on `Transaction` only need the `MutableRepo`, so it makes
sense for that functionality to be on the latter. That will let us update
the methods to also update the index, which would otherwise have been
harder because it would require a mutable borrow of both the view and
the index. This patch makes most current methods on `Transaction` just
delegate to `MutableRepo`. We may want to remove some of these
delegating methods later.
Updating the index on disk means that readers won't have to calculate
the state. Updating it in memory means that we can take advantage of
it while resolving conflicts. We will do that soon.
This patch introduces a new `IndexStore` struct. The idea is that it
will know about the directory in which the index files are stored, and
about the associations with operations. It may also cache `Arc<ReadonlyIndex>`
instances so if multiple `ReadonlyIndex` instances are loaded, they
can be returned from the cache. That may be useful when merging
operations because the operations are likely to share a large parent
index file. For now, however, all the new type has is `init()`,
`load()`, and `reinit()`.
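A sketch of the new type (the field and the exact signatures are
illustrative, not the final API):

```rust
pub struct IndexStore {
    // Directory holding the index files and the operation-id ->
    // index-file associations, e.g. .jj/index/.
    dir: std::path::PathBuf,
}

impl IndexStore {
    pub fn init(dir: std::path::PathBuf) -> IndexStore {
        todo!()
    }
    pub fn load(dir: std::path::PathBuf) -> IndexStore {
        todo!()
    }
    /// Wipe the stored index files and start over (e.g. for `jj debug reindex`).
    pub fn reinit(&self) {
        todo!()
    }
    // Possible later addition: a cache of Arc<ReadonlyIndex> keyed by file
    // name, so a parent index file shared by several operations is loaded
    // only once.
}
```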
The only way to load the repo at a given operation (as with
`--at-op`) is currently to first load it at the head operation and
then call `reload()` on the repo. This patch makes it so we can load
the repo directly at the requested operation.
We'll want to be able to load the repo at a given operation without
first loading the head operation as we do today. This patch introduces
a struct for keeping the state of a half-loaded repo. In that
half-loaded state, the store and the op-store have been loaded, but
the view has not yet been loaded. That makes it possible for callers
to use the loaded op-store for looking up an operation to load the
view at.
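A sketch of the half-loaded state described above (names and fields are
assumptions):

```rust
pub struct RepoLoader {
    repo_path: std::path::PathBuf,
    store: std::sync::Arc<Store>,
    op_store: std::sync::Arc<dyn OpStore>,
    // No view yet; that's what makes this "half-loaded". The caller can use
    // `op_store` to find the operation it wants before finishing the load.
}

impl RepoLoader {
    /// Finish loading at a specific operation (e.g. one given via `--at-op`).
    pub fn load_at(&self, op: &Operation) -> std::sync::Arc<ReadonlyRepo> {
        todo!()
    }
}
```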
We want to support loading the repo at a specific operation without
first loading the head operation (like we currently do). One reason
for that is of course efficiency. A possibly more important reason is
that the head operation may be conflicted, depending on how we decide
to deal with operation-level conflicts. In order to do that, it makes
sense to move the creation of the `OpStore` outside of the
`View`. That also suggests that the `.jj/view/op_store/` directory
should move to `.jj/op_store/`, so this patch also does that. That's
consistent with how `.jj/store/` is outside of `.jj/working_copy/`.
All the information needed for calculating the evolution state is now
in the index, so let's use it. This speeds up calculation of the
evolution state from 1.53s to 150ms in the git.git repo. In the Linux
repo, it was sped up from 28.9s to 3.07s. That's still unbearably slow
(and still pretty slow in the git.git repo too). We may need to keep a
persistent cache of the evolution state, but that will have to come
later; this improvement is good enough for now.
Evolution needs to have fast access to the predecessors. This change
adds that information to the commit index.
Evolution also needs fast access to the change id and the bit saying
whether a commit is pruned. We'll add those soon.
Some tests changed because they previously added commits with
predecessors that were not indexed, which is no longer allowed as of
this change. (We'll probably eventually want to allow that again, so
that the user can prune predecessors they no longer care about from
the repo.)
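For illustration, the per-commit data in the commit index described above
now carries roughly this (field names are made up; the real storage format
may differ):

```rust
struct CommitGraphEntry {
    commit_id: CommitId,
    parents: Vec<CommitId>,
    // New in this change: lets evolution find a commit's predecessors
    // quickly, without reading the commit object from the store.
    predecessors: Vec<CommitId>,
    // The change id and a "pruned" bit will be added later (see above).
}
```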
The index is now always kept up to date and it has functionality for
finding common ancestors, so let's use it! This should make merging
commits a little faster if their common ancestor is far away (which is
rare). It's probably much more important that the index-based
algorithm is more correct. Also, it returns multiple common ancestors
in the criss-cross case, which lets us do a recursive merge like git
does. I'm leaving the recursive merge for later, though.
We currently need to read the commit objects for finding common
ancestors. That can be very slow when the common ancestor is far back
in history. This patch adds a function for finding common ancestors
using the index instead.
Unlike the current algorithm, which only returns one common ancestor,
the new index-based one correctly handles criss-cross merges.
Here are some timings for finding the common ancestors in the git.git
repo:
                        |      Without index      |        With index       |
                        | First run | Subsequent  | First run | Subsequent  |
v2.30.0-rc0 v2.30.0-rc1 |   5.68 ms |     5.94 us |   40.3 us |     4.77 us |
v2.25.4 v2.26.1         |   1.75 ms |     1.42 us |   13.8 ms |     4.29 ms |
v1.0.0 v2.0.0           |    492 ms |     2.79 ms |   23.4 ms |     6.41 ms |
Finding ancestors of v2.25.4 and v2.26.1 got much slower because the
new algorithm finds all common ancestors. Therefore, it also finds
v2.24.2, v2.23.2, v2.22.3, v2.21.2, v2.20.3, v2.19.4, v2.18.3, and
v2.17.4, which it then filters out because they're all ancestors of
v2.25.3.
Also note that the result was incorrect before, because the old
algorithm would return as soon as it had found a common ancestor, even
if it's not the latest common ancestor. For example, for the common
ancestor between v1.0.0 and v2.0.0, it returned an ancestor of v1.0.0
because it happened to get there by following some side branch that
led there more quickly.
The only place we currently need to find the common ancestor is when
merging trees, which we only do when the user runs `jj merge`, as well
as when operating on existing merge commits (e.g. to diff or rebase
them). That means that this change won't be very noticeable. However,
it's something we clearly want to do sooner or later, so we might as
well get it done.
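Hypothetical usage of the new function (the method name and signature are
assumptions for illustration):

```rust
// Can return more than one commit for a criss-cross merge, which is what
// would let us do a recursive merge like Git does.
let common: Vec<CommitId> = index.common_ancestors(
    &[commit_a.id().clone()],
    &[commit_b.id().clone()],
);
```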
It's nice to have a non-random order for tests (we can revisit later
if it shows up in profiling). I'm changing the order to be the index
order so the future caller of `heads_pos()` (not `heads()`) will also
get a consistent order.
We currently write a new incremental index file every time. That means
that the stack of index files quickly gets deep, which makes it slow
to read the index. This commit makes it so that we squash the new
index segment into its parent if the parent has fewer commits. That
means we'll limit the number of files to O(log n). Write times will
also be O(log n) on average.
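The squashing rule is roughly the following (a sketch with illustrative
names):

```rust
fn save(self) -> ReadonlyIndex {
    let should_squash = match &self.parent_file {
        Some(parent) => parent.num_local_commits() < self.num_new_commits(),
        None => false,
    };
    if should_squash {
        // The parent segment has fewer commits than the new one: merge the
        // two and try again against the grandparent. Repeating this keeps
        // the number of index files at O(log n) and amortizes write time.
        self.squash_into_parent().save()
    } else {
        // Otherwise just write a new incremental file on top of the parent.
        self.write_incremental_file()
    }
}
```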
I've confused myself a few times already thinking that level 0 is the
root, so making level 0 the root is probably more intuitive. It also makes tests simpler
because the initial part of the list is unchanged when a new
transaction commits.
With this change, we start writing the incremental index to disk, so
the next reader won't have to re-read the commits and create the
index.
As of this change, we simply write a new index file for each
transaction. That will clearly mean that the stack of files gets deep
pretty quickly. For now, the user will have to do `jj debug reindex`
when things get slow. I plan to change it so instead of writing an
incremental index file every time, we first check if the new index
file would have at least as many commits as the parent file, and if it
would, we write a combined one instead. That should apply recursively,
so we'd have O(log n) index files.
The check for adding an existing commit to the index only checked if
the commit was already in the `MutableIndex`, not if it was already in
the parent `ReadonlyIndex`.
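The fix is essentially to also consult the parent file(s) (a sketch; names
are illustrative):

```rust
fn has_commit(&self, commit_id: &CommitId) -> bool {
    // Before: only the commits added to this MutableIndex were checked.
    self.new_entries.contains_key(commit_id)
        // After: also check the parent ReadonlyIndex (which in turn checks
        // its own parent, and so on).
        || self
            .parent_file
            .as_ref()
            .map_or(false, |parent| parent.has_commit(commit_id))
}
```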
I don't know why I made the walk stop at heads instead of indexed
commits before. Perhaps I did it because it's cheap to check in the
set of heads. However, it gets very expensive to walk all the way back
to the root if the parents are not in the set of heads.
With tons of groundwork done, we can now finally keep the index up to
date within a transaction! That means that we can start relying on the
index to always be valid, so we can use it e.g. for finding common
ancestors within a transaction. That should help speed up `jj evolve`
immensely on large repos.
We still don't write the updated index to disk when the transaction
closes. That will come later.
`Transaction::add_head()` currently invalidates the whole evolution
state. We've had support for incrementally updating evolution since
4619942a57. We should start taking advantage of that. Let's add a
fast-path in `Transaction::add_head()` for the common case where we
add a single commit on top of an existing head. That's cheap and simple
to check for. However, it won't cover the case of adding a child off
of a non-head. It's still a good start.
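A sketch of the fast path (simplified; names are approximate):

```rust
pub fn add_head(&mut self, head: &Commit) {
    let current_heads = self.view.heads();
    if head
        .parent_ids()
        .iter()
        .all(|parent_id| current_heads.contains(parent_id))
    {
        // Common case: a single new commit on top of existing heads.
        // Everything can be updated incrementally.
        self.index.add_commit(head);
        self.view.add_head(head.id());
        self.evolution.add_commit(head);
    } else {
        // Slow path: walk the ancestors to index them and invalidate the
        // evolution state (as before this change).
        self.add_head_slow_path(head);
    }
}
```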
I'm about to move the functions from `CompositeIndex` to a new
`Index` trait implemented by `ReadonlyIndex` and `MutableIndex`. Since
those types already implement `IndexSegment`, the names would conflict
and it would get annoying to have to disambiguate them. This commit
therefore prepares for that by adding a `segment_` prefix to the
functions in `IndexSegment`.
I want to keep the index updated within the transaction. I tried doing
that by adding a `trait Index`, implemented by `ReadonlyIndex` and
`MutableIndex`. However, `ReadonlyRepo::index` is of type
`Mutex<Option<Arc<IndexFile>>>` (because it is lazily initialized),
and we cannot get a `&dyn Index` that lives long enough to be returned
from a `Repo::index()` from that. It seems the best solution is to
instead create an `Index` enum (instead of a trait), with one readonly
and one mutable variant. This commit starts the migration to that
design by replacing the `Repo` trait by an enum. I never intended for
there to be more implementations of `Repo` than `ReadonlyRepo`
and `MutableRepo` anyway.
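For illustration, the enum-based design looks something like this (the
names `RepoRef`/`IndexRef` and the variants are assumptions):

```rust
// Instead of `&dyn Repo`, code that works on either kind of repo takes this:
pub enum RepoRef<'a> {
    Readonly(&'a ReadonlyRepo),
    Mutable(&'a MutableRepo),
}

// The same approach is planned for the index:
pub enum IndexRef<'a> {
    Readonly(&'a ReadonlyIndex),
    Mutable(&'a MutableIndex),
}
```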
I just learned that attaching a git note is not enough to keep a
commit from being GC'd. I had read `git help gc` before but it was
quite misleading (I just sent a patch to clarify it). Since the git
note is not enough, we need to create some other reference. This patch
makes it so we write refs in `refs/jj/keep/` for every commit we
create. We will probably want to remove unnecessary refs (ancestors of
commits pointed to by other refs) once we have a `jj gc` command.
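With git2, creating such a ref amounts to roughly this (a sketch, not the
actual code; the ref message is made up):

```rust
use git2::{Oid, Repository};

fn create_keep_ref(git_repo: &Repository, commit_id_hex: &str) -> Result<(), git2::Error> {
    let oid = Oid::from_str(commit_id_hex)?;
    // A ref under refs/jj/keep/ keeps the commit reachable, so `git gc`
    // won't delete it (unlike a git note, as explained above).
    git_repo.reference(
        &format!("refs/jj/keep/{}", commit_id_hex),
        oid,
        true, // force; overwriting an existing identical ref is harmless
        "commit created by jj",
    )?;
    Ok(())
}
```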
We store conflicts as blobs with JSON data and with a git note
pointing to them to prevent GC. These are stored in the git tree as
regular files. The only thing that distinguishes them is that their
filename ends with `.jjconflict`. Since they are referenced from the
tree, there's no need for the git note to prevent GC (which doesn't
work anyway, as I just learned), and we don't store any additional
data in the note either, so let's just remove it.
Windows doesn't support recording the executable bit in the file
system. Before this commit, the code for reading and writing the
executable bit wouldn't even compile on Windows. This commit at least
makes it so we preserve whatever bit has been recorded in the repo.
At least I hope that's what it does -- I don't have access to a
Windows machine right now.
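The idea, as a sketch (the function is made up for illustration):

```rust
#[cfg(unix)]
fn executable_bit(disk_metadata: &std::fs::Metadata, _recorded_in_repo: bool) -> bool {
    use std::os::unix::fs::PermissionsExt;
    // On Unix we can read the bit from the file system as before.
    disk_metadata.permissions().mode() & 0o111 != 0
}

#[cfg(windows)]
fn executable_bit(_disk_metadata: &std::fs::Metadata, recorded_in_repo: bool) -> bool {
    // Windows has no executable bit in the file system, so just preserve
    // whatever bit has been recorded in the repo.
    recorded_in_repo
}
```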
The project doesn't currently build on Windows. One reason is that
we had an `unimplemented!()` when trying to write a symlink. Let's
print a warning instead, so the project can start building on
Windows. (The next patch will fix another build problem on Windows.)
When a transaction gets dropped without being committed or explicitly
discarded, we currently raise an assertion error. I added that check
because I kept forgetting to commit transactions. However, it's quite
normal to want to drop transactions in error cases. The current
assertion means that we panic and don't report the actual error to the
user in such cases. We should probably audit the code paths where we
commit transactions and decide for each if we simply want to
discard the transaction or not. In some cases, we may want to commit
the transaction without integrating it in the operation log
(i.e. without creating a file entry in .jj/view/op_heads/). However,
we can do that later. For now, let's just make sure we don't panic
when dropping the transaction in release builds.
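Concretely, the `Drop` impl can assert only in debug builds, something like
this (the `closed` flag is a hypothetical field name):

```rust
impl Drop for Transaction {
    fn drop(&mut self) {
        // Don't turn an error path (or an existing panic) into a new panic
        // in release builds; keep the check for debug builds only.
        if !std::thread::panicking() {
            debug_assert!(self.closed, "Transaction was dropped without being closed.");
        }
    }
}
```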
I'm about to move `MutableRepo` to the `repo` module so it becomes
more important to encapsulate access. Besides, the new functions
introduced in this commit reduce some duplication.
There's still one access of `MutableRepo::evolution` in
`Transaction::new()`. I'll address that next by adding a factory
function to `MutableRepo`.
I've been confused twice that rebasing an open commit so that it results in
conflicts doesn't show the conflicts in the log output. That's because
we create a successor instead of a child commit if the commit with conflicts is open. I
guess I thought it would be expected that a child commit was not
created. Since it seems surprising in practice, let's change it and
we'll see if the new behavior is more or less surprising.
The most annoying remaining bug is that 3-way merge frequently panics
with "unhandled merge case". This commit fixes that by rewriting the
merge code. The new code is based on the algorithm used in Mercurial
(which was in turn copied from Bazaar):
1. Find "sync" regions, which are regions that are unchanged in
the base and both sides. Note their start and end positions in each
version.
2. Produce the output by taking the sync regions and inserting the
result of merging the regions between the sync regions. These
regions can either be changed on only one side, in which case we
use that version, or they can be changed on both sides, in which
case we indicate a conflict in the output.
It's both more correct and much easier to follow.
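A condensed sketch of the two steps (types and helpers here are
illustrative, not the actual code):

```rust
use std::ops::Range;

struct SyncRegion {
    base: Range<usize>,
    left: Range<usize>,
    right: Range<usize>,
}

enum MergeHunk {
    Resolved(Vec<u8>),
    Conflict { base: Vec<u8>, left: Vec<u8>, right: Vec<u8> },
}

// Assumed helper (step 1): returns the regions that are identical in all
// three inputs, ending with an empty region at the end of each input so
// the loop below also covers the tails.
fn find_sync_regions(base: &[u8], left: &[u8], right: &[u8]) -> Vec<SyncRegion> {
    todo!()
}

fn merge(base: &[u8], left: &[u8], right: &[u8]) -> Vec<MergeHunk> {
    let mut result = vec![];
    let mut prev = SyncRegion { base: 0..0, left: 0..0, right: 0..0 };
    for sync in find_sync_regions(base, left, right) {
        // Step 2: merge whatever lies between the previous sync region and
        // this one.
        let base_between = &base[prev.base.end..sync.base.start];
        let left_between = &left[prev.left.end..sync.left.start];
        let right_between = &right[prev.right.end..sync.right.start];
        if left_between == base_between {
            // Only the right side changed (or nothing did): use it.
            result.push(MergeHunk::Resolved(right_between.to_vec()));
        } else if right_between == base_between || left_between == right_between {
            // Only the left side changed, or both made the same change.
            result.push(MergeHunk::Resolved(left_between.to_vec()));
        } else {
            // Changed on both sides: record a conflict in the output.
            result.push(MergeHunk::Conflict {
                base: base_between.to_vec(),
                left: left_between.to_vec(),
                right: right_between.to_vec(),
            });
        }
        // The sync region itself is unchanged, so copy it through.
        result.push(MergeHunk::Resolved(base[sync.base.clone()].to_vec()));
        prev = sync;
    }
    result
}
```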
We want to be able to do fast `.contains()` checks on the
result, so `Iterator` was a bad type. We probably should hide the
exact type (currently `HashSet` for both readonly and mutable views),
but we can do that later. I actually thought I'd want to use
`.contains()` for indicating public-phase commits in the log output,
but of course we want to also indicate ancestors as public. This still
seems like a step (mostly) in the right direction.
Mercurial's "phase" concept is important for evolution, and it's also
useful for filtering out uninteresting commits from log
output. Commits are typically marked "public" when they are pushed to
a remote. The CLI prevents public commits from being rewritten. Public
commits cannot be obsolete (even if they have a successor, they won't
be considered obsolete like non-public commits would).
This commit just makes space for tracking the public heads in the
View.
All commits in the view are supposed to be reachable from its
heads. If a head is removed and there are git refs pointing to
ancestors of it (or to the removed head itself), we should make that
ancestor a head.
The only invariant we currently enforce is that the set of heads does
not include any ancestors of other commits in the set. I'm about to
make sure that we don't end up with dangling git refs (pointing to
commits not reachable from the heads). It will be useful to have a
single place to enforce that since we'll need to do the same thing
after updating the view as after merging views.
I think it's better to let the caller decide if the parents should be
added. One use case for removing a head is when fetching from a Git
remote where a branch has been rewritten. In that case, it's probably
the best user experience to remove the old head. With the current
semantics of `View::remove_head()`, we would need to walk up the graph
to find a commit that's an ancestor and for each commit we remove as
head, its parents get temporarily added as heads. It's much easier for
callers that want to add the parents as heads to do that.
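With the new semantics, the two cases look roughly like this (illustrative
calls, not the exact API):

```rust
// Use case 1: a git branch was rewritten on the remote; just drop the old
// head without promoting its parents.
view.remove_head(old_head.id());

// Use case 2: a caller that does want the parents to become heads adds
// them itself after removing the head.
view.remove_head(old_head.id());
for parent in old_head.parents() {
    view.add_head(parent.id());
}
```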