I'm going to rewrite `TreeDiffIterator` to fetch one level (depth) of
the tree at a time and concurrently. One step towards that is to
convert the iterator to a `Stream`. I'd like to do that by making the
current `Iterator` implementation call the new `Stream`
implementation. However, we can't call `futures::executor::block_on()`
on a future that itself calls `futures::executor::block_on()` (as
`Store::read_tree()` does), so the first step is to bubble up the
async-ness a bit. This patch does that by fetching both sides of the
diff concurrently. That should give close to a 2x speedup on
high-latency backends. (It doesn't help with our backend at Google,
however, because we have a daemon process that does some speculative
prefetching that usually downloads the child trees anyway.)
`futures::stream::Stream::collect()` requires a collection that
implements `Default` and `Extend`, and I would like to to be able to
collect a stream of trees.
The commit backend at Google is cloud-based (and so are the other
backends); it reads and writes commits from/to a server, which stores
them in a database. That makes latency much higher than for disk-based
backends. To reduce the latency, we have a local daemon process that
caches and prefetches objects. There are still many cases where
latency is high, such as when diffing two uncached commits. We can
improve that by changing some of our (jj's) algorithms to read many
objects concurrently from the backend. In the case of tree-diffing, we
can fetch one level (depth) of the tree at a time. There are several
ways of doing that:
* Make the backend methods `async`
* Use many threads for reading from the backend
* Add backend methods for batch reading
I don't think we typically need CPU parallelism, so it's wasteful to
have hundreds of threads running in order to fetch hundreds of objects
in parallel (especially when using a synchronous backend like the Git
backend). Batching would work well for the tree-diffing case, but it's
not as composable as `async`. For example, if we wanted to fetch some
commits at the same time as we were doing a diff, it's hard to see how
to do that with batching. Using async seems like our best bet.
I didn't make the backend interface's write functions async because
writes are already async with the daemon we have at Google. That
daemon will hash the object and immediately return, and then send the
object to the server in the background. I think any cloud-based
solution will need a similar daemon process. However, we may need to
reconsider this if/when jj gets used on a server with a custom backend
that writes directly to a database (i.e. no async daemon in between).
I've tried to measure the performance impact. That's the largest
difference I've been able to measure was on `jj diff
--ignore-working-copy -s --from v5.0 --to v6.0` in the Linux repo,
which increases from 749 ms to 773 ms (3.3%). In most cases I've
tested, there's no measurable difference. I've tried diffing from the
root commit, as well as `jj --ignore-working-copy log --no-graph -r
'::v3.0 & author(torvalds)' -T 'commit_id ++ "\n"'` (to test a
commit-heavy load).
Summary: Since 066032b6e6 was merged, the `nix flake check` build no longer
overrides the 'cargo test' profile explicitly, to save disk space. The CI seems
to be in a better spot. This will stem the tide for a while hopefully.
However, with that change in place, the `nix flake check` build was
essentially a redundant, nearly-identical copy of a normal `nix build` with no
differentiating features, except: `RUST_BACKTRACE` is set to 1.
Delete all this code, and remove it from the CI matrix, and instead just export
`RUST_BACKTRACE` on the `checkPhase` of the normal `nix build` instead, which is
functionally equivalent.
Also does some minor, no-functional-change touchups to `flake.nix` while I was
there (whitespace, etc.)
Signed-off-by: Austin Seipp <aseipp@pobox.com>
Change-Id: I87336b16e2a0b973343ecbde8ffd7b8f
Summary: Reintroduce 4acdf726 without spurious formatted changes. (This means I
don't have to go back over things with a fine comb.)
Signed-off-by: Austin Seipp <aseipp@pobox.com>
Change-Id: I18ec68722362a2a64b99a368d3f25cf5
Summary: Just a small rework of the very top-level frontmatter. Now:
- Uses `<div>` to center things a little
- Adds top-level links to the new homepage, installation guide, and tutorial
- Reworks the disclaimer and 'Introduction' section. After all, a README should
first say what the project is! I think this reads much better.
Signed-off-by: Austin Seipp <aseipp@pobox.com>
Change-Id: I2d92a21650afec0640add3741d4f20c5
This effectively undoes d8a313cdd4, which is no longer needed since
we just changed that error handling. It should make it easier to share
some of the current if/else blocks.
Before this patch, when updating to a commit that has a file that's
currently an ignored file on disk, jj would crash. After this patch,
we instead leave the conflicting files or directories on disk. We
print a helpful message about how to inspect the differences between
the intended working copy and the actual working copy, and how to
discard the unintended changes.
Closes#976.
I'm about to add handling of parent dirs that are existing ignored
files, so it's better to have it in one place. The only functional
difference should be that we now create parent directories for git
submodules. I don't think that matters.
On my Debian laptop, openssl_init() takes ~30ms to load the default CA
certificates serialized in PEM format, and the cost is added to each jj
invocation. This change saves 20s (of 50s) on my machine.
% wc -l /usr/lib/ssl/cert.pem
3517 /usr/lib/ssl/cert.pem
It's about time we make the working copy a pluggable backend like we
have for the other storage. We will use it at Google for at least two
reasons:
* To support our virtual file system. That will be a completely
separate working copy backend, which will interact with the virtual
file system to update and snapshot the working copy.
* On local disk, we need to tell our build system where to find the
paths that are not in the sparse patterns. We plan to do that by
wrapping the standard local working copy backend (the one moved in
this commit), writing a symlink that points to the mainline commit
where the "background" files can be read from.
Let's start by renaming the exising implementation to
`local_working_copy`.
I've added a boolean flag to the store to ensure that the migration never runs
more than once after the view gets "op restore"-d. I'll probably reorganize the
branches structure to support non-tracking branches later, but updating the
storage format in a single commit would be too involved.
If jj is downgraded, these "git" remote refs would be exported to the Git repo.
Users might have to remove them manually.
I'm going to migrate "refs/heads/" branches to .remote_targets["git"]. This
commit will simplify the story as we won't have to exclude "refs/remotes/git/"
refs when diffing or renaming/removing remote.
Since both has_id() and resolve_prefix() do binary search, their costs are
practically the same. I think has_id() would complete with fewer ops, but such
level of optimization wouldn't be needed here. More importantly, this ensures
that unreachable commits aren't imported by GitBackend::read_commit().
One problematic scenario is that we have commits imported by old jj, and all
of their descendant commits are created by jj. Therefore import_head_commits()
wouldn't reach the old ancestor commits.
This change might bury a real bug, but I don't have a better alternative. Maybe
we can remove this hack after a couple of jj releases, and add a debug command
that imports all reachable Git commits from all historical heads.
Closes#2343
I have used the tree-level conflict format for several weeks without
problem (after the fix in 51b5d168ae). Now - right after the 0.10.0
release - seems like a good time to enable the config by default.
I enabled the config in our default configs in the CLI crate to reduce
impact on tests (compared to changing the default in `settings.rs`).
Summary: The Nix CI has been failing recently due to (what I assume is) disk
space issues. But only the `flake check` step is failing. Right now, `nix flake
check` runs the Cargo tests with the debug profile to help get more debug info,
which is even heftier in terms of debug info than the normal 'test' profile. For
reference, a single build of 'cargo test' in a clean working copy results in a
15 gigabyte `target/` directory.
Turn off the debug profile for `nix flake check`, which should hopefully stem
the bleeding a bit. I believe the 'test' profile should still have enough
symbols for backtraces, so panics should still be useful.
Signed-off-by: Austin Seipp <aseipp@pobox.com>
Change-Id: Idde10ac15847a1ad1e6f4e48a2497eca
As we can set HEAD to an arbitrary ref by using .reference_symbolic(), we don't
have to manage a ref that can also be valid as a branch name.
Fixes#1495
Currently, there is no way provided to merely run scripts in
a fixed environment (without recording).
Short-term TODOs (done in descendant commits):
- Fix the terminal width
- Document the script
`jj | head` exits with non-zero code since `head` breaks the
pipe. Also, removed `--color=always` from that command as it
will shortly become unnecessary.
Previosly, this caused the script to stop since it's run with
`set -o pipefail`.
Also, the operation id recovery code stopped working. We
can use `jj debug operation` for this purpose now.
I think it's clearer if only the actual demos started with `demo_`,
so I renamed `demo_helpers.sh` to just `helpers.sh`.
`demo_resolve_conflict.sh` should match `resolve_conflicts.png` (with an s).
I'll add a workaround for the root parent issue #1495 there. We can pass in
the wc parent id instead of the wc_commit object, but we might want to use
wc_commit.id() to generate a unique placeholder ref name.
While debugging git issues, I often ended up creating a deadlock by adding
debug prints. It's also not obvious that git::export_refs() works even if the
git_repo() has already been locked, whereas git::import_refs() wouldn't. Let's
consolidate lock handling to the backend implementation.