Commit graph

2267 commits

Author SHA1 Message Date
Martin von Zweigbergk
61d87fe296 merged_tree: make TreeEntriesIterator not self-referential
While importing the `ouroboros` crate and the `aliasable` crate it
depends on, the "unsafe Rust reviewer" expressed some concern that
they contain a lot of unsafe code that's hard to review. We can avoid
the unsafe code altogether by making `TreeEntriesIterator` not
self-refential. Instead, we can collect the matching entries in an
individual tree up front. It does have some performance cost:


```
❯ hyperfine --warmup 3 --runs 30 \
      '/tmp/jj-before --ignore-working-copy files -r v6.0' \
      '/tmp/jj-after --ignore-working-copy files -r v6.0'
Benchmark 1: /tmp/jj-before --ignore-working-copy files -r v6.0
  Time (mean ± σ):     461.4 ms ±  14.3 ms    [User: 232.1 ms, System: 229.4 ms]
  Range (min … max):   443.4 ms … 496.3 ms    30 runs

Benchmark 2: /tmp/jj-after --ignore-working-copy files -r v6.0
  Time (mean ± σ):     482.0 ms ±  14.3 ms    [User: 257.2 ms, System: 224.9 ms]
  Range (min … max):   461.8 ms … 513.3 ms    30 runs

Summary
  '/tmp/jj-before --ignore-working-copy files -r v6.0' ran
    1.04 ± 0.04 times faster than '/tmp/jj-after --ignore-working-copy files -r v6.0'
```

I think that's acceptable.
2023-11-17 03:50:34 -08:00
Yuya Nishihara
93fbcec2f7 index: use BinaryHeap instead of BTreeSet in common_ancestors_pos()
For the same reason as the heads_pos() change. We just want to omit duplicated
items.
2023-11-16 08:27:59 +09:00
Yuya Nishihara
d4059520a9 index: cache generation numbers during common_ancestors_pos() computation
I'm not sure if this is better, but common_ancestors_pos() would have a
similar property to heads_pos().
2023-11-16 08:27:59 +09:00
Yuya Nishihara
ea4bdd718d index: use "while let" in common_ancestors_pos() 2023-11-16 08:27:59 +09:00
Yuya Nishihara
02c84a8596 index: remove stale "allow(unstable_name_collisions)"
I think this is remainder of nightly shims.
2023-11-16 08:27:59 +09:00
Yuya Nishihara
6399c392fd index: make heads_pos() deduplicate entries without building separate set
This is much faster (maybe because of better cache locality?) Another option
is to use BTreeSet, but the BinaryHeap version is slightly faster.

"bench revset" result in my linux repo:

    revsets/heads(tags())
    ---------------------
    baseline  3.28     560.6±4.01ms
    1         2.92     500.0±2.99ms
    2         1.98     339.6±1.64ms
    3 (this)  1.00     171.2±0.30ms
2023-11-16 08:27:59 +09:00
Yuya Nishihara
9832ee205d index: optimize heads_pos() to cache generation numbers during computation
Apparently, IndexEntry::generation_number() isn't cheap probably because it
involves random access to larger memory region, and the u32 value might not
be aligned. Let's instead store the generation numbers in BinaryHeap.

Also, heads_pos() becomes slightly faster by keeping the BinaryHeap entries
small, so I've removed the IndexEntry at all.

This makes the default log and disambiguation revsets fast, which evaluate
'heads(immutable_heads())'.

"bench revset" result in my linux repo:

    revsets/heads(tags())
    ---------------------
    baseline  3.28     560.6±4.01ms
    1         2.92     500.0±2.99ms
    2 (this)  1.98     339.6±1.64ms
2023-11-16 08:27:59 +09:00
Yuya Nishihara
1e933b84dd index: make IndexEntry::parents() lazy instead of collecting to Vec
All callers just iterate over the parent entries.

"bench revset" result in my linux repo:

    revsets/heads(tags())
    ---------------------
    baseline  3.28     560.6±4.01ms
    1 (this)  2.92     500.0±2.99ms
2023-11-16 08:27:59 +09:00
Yuya Nishihara
39b065f7ab git: on import_refs(), exclude uninteresting dirs such as refs/jj/keep
For loose refs, uninteresting directories can be just skipped. For packed refs,
gix will have to do binary search for each prefix to find the starting point.
Still it's better overall if the repository contains tons of refs/jj/keep refs.

With my linux repo containing ~5k loose jj refs, this saves ~40ms:

    % hyperfine --warmup 3 --runs 10 \
        "/tmp/jj-gix --ignore-working-copy git import -R ~/mirrors/linux" \
        "/tmp/jj-gix-iter --ignore-working-copy git import -R ~/mirrors/linux"
    Benchmark 1: /tmp/jj-gix --ignore-working-copy git import -R ~/mirrors/linux
      Time (mean ± σ):     151.6 ms ±  11.4 ms    [User: 38.8 ms, System: 111.6 ms]
      Range (min … max):   129.8 ms … 159.5 ms    10 runs
    Benchmark 2: /tmp/jj-gix-iter --ignore-working-copy git import -R ~/mirrors/linux
      Time (mean ± σ):     109.9 ms ±  11.6 ms    [User: 27.5 ms, System: 82.4 ms]
      Range (min … max):    89.4 ms … 117.8 ms    10 runs
2023-11-14 17:35:27 +09:00
Yuya Nishihara
044716ee40 git: migrate import_refs() to gix::Repository
Gitoxide errors are boxed since there are various error types and they tend
to exceed the clippy size limit.

Apparently, gitoxide is faster than git2:

    % hyperfine --warmup 3 --runs 10 \
        "/tmp/jj-baseline --ignore-working-copy git import -R ~/mirrors/linux" \
        "/tmp/jj-gix --ignore-working-copy git import -R ~/mirrors/linux"
    Benchmark 1: /tmp/jj-baseline --ignore-working-copy git import -R ~/mirrors/linux
      Time (mean ± σ):     205.4 ms ±  15.7 ms    [User: 59.6 ms, System: 144.6 ms]
      Range (min … max):   189.7 ms … 223.9 ms    10 runs
    Benchmark 2: /tmp/jj-gix --ignore-working-copy git import -R ~/mirrors/linux
      Time (mean ± σ):     176.2 ms ±  13.7 ms    [User: 41.2 ms, System: 134.0 ms]
      Range (min … max):   155.4 ms … 186.5 ms    10 runs
2023-11-14 17:35:27 +09:00
Yuya Nishihara
6c98dfcdcb git: have import_refs() obtain git2::Repository instance from store
This helps gitoxide migration. It's theoretically possible to import Git refs
from non-Git backend, but I don't think such API flexibility is needed.
2023-11-14 17:35:27 +09:00
Yuya Nishihara
dbb1adaf0a git: move import-related types close to import_refs() function 2023-11-14 17:35:27 +09:00
Yuya Nishihara
f991705e47 tests: add test for importing missing ancestor of HEAD
If a commit pointed to by HEAD or ref is missing, the ref is considered
invalid and excluded by import_refs(). The current test behavior appears to
depend on some in-memory cache of git2::Repository.
2023-11-14 17:35:27 +09:00
Yuya Nishihara
8e143541a5 operation: propagate OpStoreError from parents()
We need to .collect_vec() the parents iterator to temporary buffer since the
borrowed iterator can't be returned back to the dag_walk functions. Another
option is to clone op_store and parent ids to remove &self lifetime from the
iterator, but that also means a temporary Vec is created.
2023-11-14 07:16:39 +09:00
Yuya Nishihara
8ddad859e8 dag_walk: add fallible topo_order_reverse_lazy()
Unlike dfs_ok(), this function short-circuits at an Err as we use non-lazy
topo_order_forward() internally. I think that's good enough. If we implement
GC on operation log, deleted parents will be excluded (or mapped to tombstone)
by caller. An Err shouldn't mean it's GC-ed.
2023-11-14 07:16:39 +09:00
Yuya Nishihara
3d5a07e86a dag_walk: add fallible dfs(), topo_order(), heads(), and closest_common_node()
This unblocks the use of Result<T, E> in op.parents().

There are two ways to encode errors:
 a. impl IntoIterator<Item = Result<T, E>>
 b. Result<V, E> where V: FromIterator<Item = T>
I think (a) is more natural to algorithms like dfs(), which can process error
nodes transparently.

Still the caller might have to collect the source iterator to temporary Vec
to conform to the neighbors_fn signature. It's not easy for neighbors_fn to
return an iterator borrowing the input node. We already have GAT, but doesn't
have return-position impl Trait in trait yet.
2023-11-14 07:16:39 +09:00
Yuya Nishihara
e5a9a26911 dag_walk: remove unused and untested leaves() function 2023-11-14 07:16:39 +09:00
Anton Bulakh
e3a1e5b80e sign: Implement storage for digital commit signatures
Recognize signature metadata from git commit objects, implement
a basic version of that for the native backend.
Extract the signed data (a commit binary repr without the signature) to
be verified later.
2023-11-12 03:37:13 +02:00
Yuya Nishihara
b42a69db6d git_backend: configure committer (and author) of gix::Repository
Otherwise, ref updates would fail if we port git::export_refs() to gitoxide.
This change isn't strictly needed for the backend itself, but we'll reuse the
gix::Repository instance created by the backend when importing and exporting
Git refs.
2023-11-11 22:35:54 +09:00
Yuya Nishihara
ea32c0cb9e git_backend: pass UserSettings to GitBackend constructors 2023-11-11 22:35:54 +09:00
Yuya Nishihara
8a2048a0e5 repo: pass UserSettings to store factories and initializers
GitBackend will use it to configure gix::Repository. I think UserSettings
is generally useful to pass store-specific parameters, so I've updated all
factory functions.
2023-11-11 22:35:54 +09:00
Yuya Nishihara
6125fb160e op_store: embed details in operation/view not found error
This is basically a copy of BackendError::ObjectNotFound. The failed id may
be either view or operation id.
2023-11-11 22:35:40 +09:00
Yuya Nishihara
ea96513fd1 op_store: deduplicate functions that map io::Error to OpStoreError
io_to_read_error() also translates ErrorKind::NotFound.
2023-11-11 22:35:40 +09:00
Martin von Zweigbergk
120115a20d cli: pass MaterializedTreeValue into git_diff_part()
Just a little preparation for reading the materialized values
concurrently.
2023-11-10 04:54:47 -08:00
Waleed Khan
a60733f632 tree: remove unsafe with ouroboros for self-referential iterators 2023-11-09 21:50:29 -08:00
Yuya Nishihara
6ff3a4f3df repo: reimplement DirtyCell without using unsafe
While the safe implementation is a bit more complex (and probably more branchy),
I don't think the runtime overhead would matter here. Let's remove one more
unsafe for better code maintainability.
2023-11-10 07:42:45 +09:00
Martin von Zweigbergk
9b24d24612 conflicts: add another helper for materializing a tree value
We have a few places where we have a `MergedTreeValue` and need to
read the data associated with it so we can write to the working copy
or include it in a diff. Let's extract some of that shared logic to a
function so we can reuse it. I plan to use it for reading file
contents in advance while streaming a diff in `local_working_copy`
soon (and probably in `jj diff` thereafter), but I think it seems like
an improvement on its own.
2023-11-08 21:21:38 -08:00
Martin von Zweigbergk
65bd5cacba working copy: on checkout, move read from store out of write_*() functions
I'd like to read N files ahead from the backend, to avoid serializing
too many server calls on backends that are backed by a server. Moving
the reads a little earlier is a little step towards that.

The `TreeState::write_*()` functions can now be made into free/static
functions if we prefer.
2023-11-08 21:21:38 -08:00
Yuya Nishihara
084b99e1e2 index: rewrite CompositeIndex::entry_by_pos() by leveraging ancestors iterator
We no longer have "unsafe" in this function, so let's use the iterator API
instead of recursion. Apparently I haven't pushed this change before because
unsafe in .find_map() looked scary.
2023-11-08 12:09:33 +09:00
Anton Bulakh
d27351b978 misc: drop a few low-hanging unsafes
Remove a couple of unnecessary unsafes:
- The NonZeroUsize is a constant where the unwrap will optimize away
anyway and we don't have an unsafe without any good reason there :)
- The other two were simply not needed, lifetimes worked fine, maybe
Rust became better since that code was written? NLL? Anyway, they're
gone now
2023-11-08 02:16:08 +02:00
Yuya Nishihara
2ac9865ce7 revset: exclude @git branches from remote_branches()
As discussed in Discord, it's less useful if remote_branches() included
Git-tracking branches. Users wouldn't consider the backing Git repo as
a remote.

We could allow explicit 'remote_branches(remote=exact:"git")' query by changing
the default remote pattern to something like 'remote=~exact:"git"'. I don't
know which will be better overall, but we don't have support for negative
patterns anyway.
2023-11-08 07:34:30 +09:00
Yuya Nishihara
59640496aa cargo: sort dependencies list alphabetically 2023-11-07 23:46:05 +09:00
Yuya Nishihara
d1b0c4cc48 merge: relax input type of Merge::from_removes_adds() 2023-11-07 17:10:12 +09:00
Yuya Nishihara
e0c35684af merge: rename Merge::new() to Merge::from_removes_adds()
Since (removes, adds) pair is no longer the canonical representation of Merge,
the name Merge::new() seems too generic. Let's give more verbose name.
2023-11-07 17:10:12 +09:00
Yuya Nishihara
2c128f1b61 merged_tree: convert from legacy conflicts through interleaved list
This is basically the same change as the previous commit.
2023-11-07 17:10:12 +09:00
Yuya Nishihara
a734f46130 merged_tree: build unresolved Merge<Tree> from interleaved list
We no longer need to iterate removes and adds separately.
2023-11-07 17:10:12 +09:00
Yuya Nishihara
dd26b7be40 merge: add Merge constructor that accepts interleaved values
Also migrated some callers of 3-way merge, where [left, base, right] order
looks okay.
2023-11-07 17:10:12 +09:00
Yuya Nishihara
803b41c426 merge: load legacy Merge values without allocating intermediate buffers 2023-11-07 17:10:12 +09:00
Yuya Nishihara
09987c1d27 merge: micro-optimize allocation of Merge object for resolved value
It's super common that a Merge object holds a resolved value, so let's inline
up to 1 element. T of Merge<T> usually consists of a couple of pointer-sized
fields. I don't see any measurable speed up, but it's no worse than the
original.
2023-11-07 17:10:12 +09:00
Martin von Zweigbergk
1140295829 merged_tree: extract polling of tree futures into a function 2023-11-07 00:03:50 -08:00
Martin von Zweigbergk
c77417d4e4 merged_tree: drop outer loop in TreeDiffStreamImpl::poll_next()
As suggested by Yuya. I also added a comment and an assertion in the
case where return `Poll::Pending`.
2023-11-07 00:03:50 -08:00
Martin von Zweigbergk
d989d4093d merged_tree: let backend influence whether to use new diff algo
Since the concurrent diff algorithm is significantly slower when using
the Git backend, I think we'll have to use switch between the two
algorithms depending on backend. Even if the concurrent version always
performed as well as the sequential version, exactly how concurrent it
should be probably still depends on the backend. This commit therefore
adds a function to the `Backend` trait, so each backend can say how
much concurrency they deal well with. I then use that number for
choosing between the sequential and concurrent versions in
`MergedTree::diff_stream()`, and also to decide the number of
concurrent reads to do in the concurrent version.
2023-11-06 23:12:02 -08:00
Martin von Zweigbergk
f40adb84fc merged_tree: add a Stream for concurrent diff off trees
When diffing two trees, we currently start at the root and diff those
trees. Then we diff each subtree, one at a time, recursively. When
using a commit backend that uses remote storage, like our backend at
Google does, diffing the subtrees one at a time gets very slow. We
should be able to diff subtrees concurrently. That way, the number of
roundtrips to a server becomes determined by the depth of the deepest
difference instead of by the number of differing trees (times 2,
even). This patch implements such an algorithm behind a `Stream`
interface. It's not hooked in to `MergedTree::diff_stream()` yet; that
will happen in the next commit.

I timed the new implementation by updating `jj diff -s` to use the new
diff stream and then ran it on the Linux repo with `jj diff
--ignore-working-copy -s --from v5.0 --to v6.0`. That slowed down by
~20%, from ~750 ms to ~900 ms. Maybe we can get some of that
performance back but I think it'll be hard to match
`MergedTree::diff()`. We can decide later if we're okay with the
difference (after hopefully reducing the gap a bit) or if we want to
keep both implementations.

I also timed the new implementation on our cloud-based repo at
Google. As expected, it made some diffs much faster (I'm not sure if
I'm allowed to share figures).
2023-11-06 23:12:02 -08:00
Martin von Zweigbergk
9af09ec236 test_meregd_tree: test diffing with a matcher
We didn't have any tests at all for `MergedTree::diff()` with a
matcher other than `EverythingMatcher`. This patch adds a few.
2023-11-06 23:12:02 -08:00
Martin von Zweigbergk
16aa8e8f10 test_merged_tree: nest each part of test_diff_dir_file()
I'm about to add a few more checks for diffing with a matcher. I think
it will help make it readable and reduce the risk of mixing up
variables between each part of the test if we use some nested blocks.

I also removed some unnecessary `.clone()` calls while at it.
2023-11-06 23:12:02 -08:00
Martin von Zweigbergk
c9ce80a82a merged_tree: extract function for merged iterator of basenames in diff
I'm going to reuse this for stream/async diffing.
2023-11-06 23:12:02 -08:00
Martin von Zweigbergk
b72f04ba61 merged_tree: rename all_tree_conflict_names() since it's not about conflicts 2023-11-06 23:12:02 -08:00
Yuya Nishihara
3fddc31da8 merge: remove Merge::take() which is no longer used
Merge::take() is no longer a cheap function. We can add into_vec() if needed.
2023-11-07 06:52:35 +09:00
Yuya Nishihara
92dfe59ade refs: run non-trivial merge of ref targets without destructuring Merge object 2023-11-07 06:52:35 +09:00
Yuya Nishihara
93601541cb refs: use swap_remove() in non-trivial merge of ref targets
I'm going to add a Merge method that removes negative/positive terms pair, and
swap_remove() is the easiest option. The order of the conflicted ref targets
doesn't matter.
2023-11-07 06:52:35 +09:00