When an operation is missing and we recover the workspace, we create a
new working-copy commit on top of the desired working-copy commit (per
the available head operation). We then reset the working copy to an
empty tree because it shouldn't really matter much which commit we
reset to. However, when the workspace is sparse, it does matter, as
the test case from the previous patch shows. This patch fixes it by
replacing the `reset_to_empty()` method by a new `recover(&Commit)`,
which effectively resets to the empty tree and then resets to the
commit. That way, any subsequent snapshotting will result keep the
paths from that tree for paths outside the sparse patterns.
If the operation corresponding to a workspace is missing for some reason
(the specific situation in the test in this commit is that an operation
was abandoned and garbage-collected from another workspace), currently,
jj fails with a 255 error code. Teach jj a way to recover from this
situation.
When jj detects such a situation, it prints a message and stops
operation, similar to when a workspace is stale. The message tells the
user what command to run.
When that command is run, jj loads the repo at the @ operation (instead
of the operation of the workspace), creates a new commit on the @
commit with an empty tree, and then proceeds as usual - in particular,
including the auto-snapshotting of the working tree, which creates
another commit that obsoletes the newly created commit.
There are several design points I considered.
1) Whether the recovery should be automatic, or (as in this commit)
manual in that the user should be prompted to run a command. The user
might prefer to recover in another way (e.g. by simply deleting the
workspace) and this situation is (hopefully) rare enough that I think
it's better to prompt the user.
2) Which command the user should be prompted to run (and thus, which
command should be taught to perform the recovery). I chose "workspace
update-stale" because the circumstances are very similar to it: it's
symptom is that the regular jj operation is blocked somewhere at the
beginning, and "workspace update-stale" already does some special work
before the blockage (this commit adds more of such special work). But it
might be better for something more explicitly named, or even a sequence
of commands (e.g. "create a new operation that becomes @ that no
workspace points to", "low-level command that makes a workspace point to
the operation @") but I can see how this can be unnecessarily confusing
for the user.
3) How we recover. I can think of several ways:
a) Always create a commit, and allow the automatic snapshotting to
create another commit that obsoletes this commit.
b) Create a commit but somehow teach the automatic snapshotting to
replace the created commit in-place (so it has no predecessor, as viewed
in "obslog").
c) Do either a) or b), with the added improvement that if there is no
diff between the newly created commit and the former @, to behave as if
no new commit was created (@ remains as the former @).
I chose a) since it was the simplest and most easily reasoned about,
which I think is the best way to go when recovering from a rare
situation.
Our virtual file system at Google (CitC) would like to know the commit
so it can scan backwards and find the closest mainline tree based on
it. Since we always record an operation id (which resolves to a
working-copy commit) when we write the working-copy state, it doesn't
seem like a restriction to require a commit.
GitBackend::gc() will need to check if a commit is reachable from any
historical operations. This could be calculated from the view and commit
objects, but the Index will do a better job.
It seems better to have the caller pass the transaction description
when we finish the transaction than when we start it. That way we have
all the information we want to include more readily available.
This adds an initial `jj util gc` command, which simply calls `git gc`
when using the Git backend. That should already be useful in
non-colocated repos because it's not obvious how to GC (repack) such
repos. In my own jj repo, it shrunk `.jj/repo/store/` from 2.4 GiB to
780 MiB, and `jj log --ignore-working-copy` was sped up from 157 ms to
86 ms.
I haven't added any tests because the functionality depends on having
`git` binary on the PATH, which we don't yet depend on anywhere
else. I think we'll still be able to test much of the future parts of
garbage collection without a `git` binary because the interesting
parts are about manipulating the Git repo before calling `git gc` on
it.
Each instance of the enum represents a single command, so singular
`*Command` seems better. That also seems to match the examples in
clap's documentation.
GitBackend will use it to configure gix::Repository. I think UserSettings
is generally useful to pass store-specific parameters, so I've updated all
factory functions.
Since the concurrent diff algorithm is significantly slower when using
the Git backend, I think we'll have to use switch between the two
algorithms depending on backend. Even if the concurrent version always
performed as well as the sequential version, exactly how concurrent it
should be probably still depends on the backend. This commit therefore
adds a function to the `Backend` trait, so each backend can say how
much concurrency they deal well with. I then use that number for
choosing between the sequential and concurrent versions in
`MergedTree::diff_stream()`, and also to decide the number of
concurrent reads to do in the concurrent version.
This avoids https://github.com/rust-lang/futures-rs/issues/2090. I
don't think we need to worry about reading legacy conflicts
asynchronously - async is really only useful for Google's backend
right now, and we don't use the legacy format at Google. In
particular, I don't want `MergedTree::value()` to have to be async.
This add support for custom `jj` binaries to use custom working-copy
backends. It works in the same way as with the other backends, i.e. we
write a `.jj/working_copy/type` file when the working copy is
initialized, and then we let that file control which implementation to
use (see previous commit).
I included an example of a (useless) working-copy implementation. I
hope we can figure out a way to test the examples some day.
`ReadonlyRepo::init()` takes callbacks for initializing each kind of
backend. We called these things like `op_store_initializer`. I found
that confusing because it is not a `OpStoreFactory` (which is for
loading an existing backend). This patch tries to clarify that by
renaming the arguments and adding types for each kind of callback
function.
The commit backend at Google is cloud-based (and so are the other
backends); it reads and writes commits from/to a server, which stores
them in a database. That makes latency much higher than for disk-based
backends. To reduce the latency, we have a local daemon process that
caches and prefetches objects. There are still many cases where
latency is high, such as when diffing two uncached commits. We can
improve that by changing some of our (jj's) algorithms to read many
objects concurrently from the backend. In the case of tree-diffing, we
can fetch one level (depth) of the tree at a time. There are several
ways of doing that:
* Make the backend methods `async`
* Use many threads for reading from the backend
* Add backend methods for batch reading
I don't think we typically need CPU parallelism, so it's wasteful to
have hundreds of threads running in order to fetch hundreds of objects
in parallel (especially when using a synchronous backend like the Git
backend). Batching would work well for the tree-diffing case, but it's
not as composable as `async`. For example, if we wanted to fetch some
commits at the same time as we were doing a diff, it's hard to see how
to do that with batching. Using async seems like our best bet.
I didn't make the backend interface's write functions async because
writes are already async with the daemon we have at Google. That
daemon will hash the object and immediately return, and then send the
object to the server in the background. I think any cloud-based
solution will need a similar daemon process. However, we may need to
reconsider this if/when jj gets used on a server with a custom backend
that writes directly to a database (i.e. no async daemon in between).
I've tried to measure the performance impact. That's the largest
difference I've been able to measure was on `jj diff
--ignore-working-copy -s --from v5.0 --to v6.0` in the Linux repo,
which increases from 749 ms to 773 ms (3.3%). In most cases I've
tested, there's no measurable difference. I've tried diffing from the
root commit, as well as `jj --ignore-working-copy log --no-graph -r
'::v3.0 & author(torvalds)' -T 'commit_id ++ "\n"'` (to test a
commit-heavy load).