Commit graph

73 commits

Author SHA1 Message Date
Yuya Nishihara
4931b2ba04 merged_tree: remove redundant .clone() from TreeDiffStreamImpl::new() 2024-11-30 10:20:43 +09:00
Yuya Nishihara
c741e3db39 merged_tree: use Merge<Tree> to represent pending trees in TreeDiffStreamImpl
This seems slightly better in that MergedTree no longer represents a subtree.
2024-11-30 10:20:43 +09:00
Martin von Zweigbergk
a5690beab5 test_merged_tree: avoid a temporary lifetime extension
I think it's just clearer to assign the owned value to the variable
than to assign a reference to a temporary value.
2024-11-27 18:53:28 -08:00
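The difference between the two styles mentioned in this message can be sketched as follows. This is a minimal illustration, not code from the repository; `make_path()` is a hypothetical helper standing in for any function that returns an owned value.

```rust
// Hypothetical helper (not from the repository): returns an owned value.
fn make_path() -> String {
    String::from("dir/file")
}

fn with_extended_temporary() -> usize {
    // The temporary returned by make_path() is borrowed directly in a `let`,
    // so its lifetime is extended to the end of the enclosing block.
    let path = &make_path();
    path.len()
}

fn with_owned_value() -> usize {
    // Same behavior, but the variable owns the value outright; borrow it
    // later only where a reference is actually needed.
    let path = make_path();
    path.len()
}
```

Both functions behave identically; the second simply avoids relying on the temporary lifetime extension rule.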
Martin von Zweigbergk
409be2e1c4 store: make get_tree() functions take owned repo path
The function needs an owned value, so we might as well pass it one and
avoid a few clone calls.
2024-11-27 18:53:28 -08:00
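A minimal sketch of why an owned parameter can save a clone, assuming a function that has to store its argument. `TreeCache` and both method names are hypothetical and only mimic the shape of the change; they are not the store's real API.

```rust
use std::collections::HashMap;

// Hypothetical cache keyed by an owned path string.
#[derive(Default)]
struct TreeCache {
    trees: HashMap<String, Vec<String>>,
}

impl TreeCache {
    // Borrowed parameter: the key has to be cloned inside the function,
    // even when the caller already had an owned value to give away.
    fn get_tree_borrowed(&mut self, dir: &str) -> &Vec<String> {
        self.trees.entry(dir.to_owned()).or_default()
    }

    // Owned parameter: the caller's value is moved into the map directly.
    fn get_tree_owned(&mut self, dir: String) -> &Vec<String> {
        self.trees.entry(dir).or_default()
    }
}
```

A caller that only has a reference still needs to clone, but the clone then happens visibly at the call site.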
Martin von Zweigbergk
10c90a5099 merged_tree: propagate errors from conflict iterator
2024-11-23 13:53:04 -08:00
Samuel Tardieu
3f2ef2ee04 style: add semicolon at the end of expressions used as statements 2024-10-04 22:29:13 +02:00
Yuya Nishihara
f5187fa063 copies: determine copy/rename operation by CopiesTreeDiffStream
Not all callers need this information, but I assumed it's relatively cheap to
look up the source path in the target tree compared to diffing.

This could be represented as Regular(_)|Copied(_, _)|Renamed(_, _), but it's
a bit weird if Copied and Renamed were separate variants. Instead, I decided
to wrap copy metadata in Option.
2024-08-23 10:29:12 +09:00
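The design choice described in this message (copy metadata wrapped in an `Option` rather than separate `Copied` and `Renamed` variants) might look roughly like the sketch below. All type, field, and function names here are illustrative assumptions, and plain `String` stands in for the repository's path type.

```rust
// Illustrative names only; not the library's actual definitions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CopyOperation {
    Copy,
    Rename,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct DiffEntryPath {
    // None for the common case of an entry with no copy source.
    source: Option<(String, CopyOperation)>,
    target: String,
}

impl DiffEntryPath {
    // The source path, falling back to the target for regular entries.
    fn source(&self) -> &str {
        match &self.source {
            Some((source, _)) => source,
            None => &self.target,
        }
    }

    fn copy_operation(&self) -> Option<CopyOperation> {
        self.source.as_ref().map(|(_, op)| *op)
    }
}

// Assumed rule: if the source path still exists in the target tree the
// entry is a copy, otherwise it is a rename.
fn classify(source_exists_in_target: bool) -> CopyOperation {
    if source_exists_in_target {
        CopyOperation::Copy
    } else {
        CopyOperation::Rename
    }
}
```

With this shape, callers that do not care about copies only look at `target`, and the copy/rename distinction is a single field rather than two near-identical variants.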
Yuya Nishihara
b6060ce6dd copies: wrap source path in Option to save allocation
Most diff entries should have no copy sources.
2024-08-23 10:29:12 +09:00
Yuya Nishihara
08262eb152 copies: extract (source, target) path pair to separate type
This patch adds accessor methods as I'm going to change the underlying data
types. Since entry values are consumed separately, these methods are implemented
on CopiesTreeDiffEntryPath, not on *TreeDiffEntry.
2024-08-23 10:29:12 +09:00
Yuya Nishihara
43bf195314 merged_tree: rename diff entry field from "value" to "values"
It seems slightly better, and aligns with the local variable name in
materialized_diff_stream().
2024-08-23 10:29:12 +09:00
Matt Kulukundis
8ead72e99f formatting only: switch to Item level import granularity 2024-08-22 14:52:54 -04:00
Yuya Nishihara
352a4a0eea copies: filter rename source entries by CopiesTreeDiffStream 2024-08-22 20:17:19 +09:00
Yuya Nishihara
d85e66bbb4 copies: turn add_records() into non-stream API, block_on_stream() by caller
This is simpler, and I think it's generally better to not spawn executor in
library code.
2024-08-22 20:17:19 +09:00
Martin von Zweigbergk
3acb89e7cc merged_tree: remove TreeDiffEntry::source 2024-08-18 22:16:41 -07:00
Martin von Zweigbergk
70598498b0 merged_tree: provide separate version of diff_stream() with copy info
I plan to provide a richer version of `TreeDiffEntry` with copy info
(and to make `TreeDiffEntry` itself "poorer"). Most callers want to
know about copies/renames, but at least working copy implementations
probably don't. This patch adds separate `diff_stream()` and
`diff_stream_with_copies()` so we can provide the simpler interface
for callers that don't need copy info.
2024-08-18 22:16:41 -07:00
Martin von Zweigbergk
e670837ff6 copies: implement copy support in MergedTree::diff_stream() as adapter
The support for copy tracing is already simply added to the stream
just before yielding the item, so we can easily implement it as a
stream adapter. That ensures that we use the same logic for the
iterator- and stream-based versions. More importantly, it enables
further cleanups and a simpler interface.
2024-08-18 22:16:41 -07:00
Martin von Zweigbergk
fd9a236be5 copies: move CopyRecords to new copies module
Copy/rename handling is complicated. It seems worth having a module
for it. I'm going to add more content to it next.
2024-08-18 22:16:41 -07:00
Yuya Nishihara
f7377fbbcd merged_tree: replace MergedTreeVal<'_> by Merge<Option<&TreeValue>>
MergedTreeVal was roughly equivalent to Merge<Option<Cow<_>>. As we've dropped
support for the legacy trees, it can be simplified to Merge<Option<&_>>.
2024-08-12 23:01:46 +09:00
Matt Kulukundis
5911e5c9b2 copy-tracking: Add copy tracking as a post iteration step
- force each diff command to explicitly enable copy tracking
- enable copy tracking in diff_summary
- post-process for diff iterator
- post-process for diff stream
- update changelog
2024-08-11 17:01:45 -04:00
Matt Kulukundis
34b0f87584 copy-tracking: plumb CopyRecordMap through diff method 2024-08-11 17:01:45 -04:00
Matt Kulukundis
8e84c60157 copy-tracking: create an explicit TreeDiffEntry struct 2024-08-11 17:01:45 -04:00
Yuya Nishihara
6fc7cec4a5 merged_tree: make TreeDiffIterator accept trees as &Merge<Tree>
For the same reason as the patch for TreeEntriesIterator. It's probably
better to assume that MergedTree represents the root tree.
2024-08-08 23:05:37 +09:00
Yuya Nishihara
9378adedb7 merged_tree: hold store globally by TreeDiffIterator
Since TreeDiffDirItem is now calculated eagerly, it doesn't make sense to
keep MergedTree in it.
2024-08-08 23:05:37 +09:00
Martin von Zweigbergk
ec7725064b merged_tree: make MergedTree a struct
I considered making `MergedTree` just a newtype (1-tuple) but I went
with a struct instead because we may want to add copy information in a
separate field in the future.
2024-08-08 05:32:16 -07:00
Martin von Zweigbergk
109391f9c7 merged_tree: delete MergedTree::Legacy 2024-08-08 05:32:16 -07:00
Yuya Nishihara
24b8934b14 tests: migrate .diff() callers to .diff_stream() 2024-08-08 10:45:59 +09:00
Yuya Nishihara
63e254d052 tests: use pollster instead of futures::executor::block_on()
It doesn't matter in tests and I have no preference over these, but we tend
to use .block_on().
2024-08-08 10:45:59 +09:00
Martin von Zweigbergk
65a988e3d2 merged_tree: make tree builder attempt to resolve conflicts
As we discovered in the `jj fix` tests,
`MergedTreeBuilder::write_tree()` doesn't try to resolve conflicts,
not even trivial ones. This patch fixes that.
2024-06-08 20:29:30 +09:00
Martin von Zweigbergk
776b2d981f merged_tree: make resolve() return a MergedTree
It seems like a method on `MergedTree` should return another
`MergedTree` when reasonable. I'm not sure why I made it return a
`Merge<Tree>` instead.
2024-06-08 20:29:30 +09:00
Martin von Zweigbergk
7e6a968415 conflicts: consider the empty tree a non-legacy tree
Since we no longer depend on legacy trees being preserved when we
build new trees or merge trees, we can consider the root tree a
non-legacy tree.
2024-05-27 06:25:27 -07:00
Martin von Zweigbergk
07bb1d81b7 tree_builder: propagate errors from write_tree() 2024-05-22 06:46:38 -07:00
Martin von Zweigbergk
1970ddef15 tree: propagate errors from sub_tree()/path_value() 2024-05-22 06:46:38 -07:00
Martin von Zweigbergk
facfb71f7b test_merged_tree: reduce duplication and wrapping with helper lambdas
I'm about to make `[Merged]Tree::path_value()` return a `Result`. This
will help even more then.
2024-05-22 06:46:38 -07:00
Martin von Zweigbergk
0d1ff8a150 merged_tree: propagate errors from TreeEntriesIterator
We shouldn't panic if we fail to read a tree from the backend.
2024-05-01 06:10:08 -07:00
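One common way to avoid panicking inside an iterator is to make its item type a `Result`, so a failed backend read reaches the caller as a value. The sketch below shows that pattern under stated assumptions: `BackendError`, `read_tree()`, and `EntriesIter` are hypothetical stand-ins, and tree id 13 is arbitrarily chosen to simulate an unreadable tree.

```rust
#[derive(Debug, PartialEq)]
struct BackendError(String);

// Hypothetical backend read; id 13 simulates a tree that cannot be read.
fn read_tree(id: u32) -> Result<Vec<String>, BackendError> {
    if id == 13 {
        return Err(BackendError(format!("cannot read tree {}", id)));
    }
    Ok(vec![format!("tree{}/a", id), format!("tree{}/b", id)])
}

struct EntriesIter {
    pending: Vec<u32>,                 // tree ids still to be read (stack)
    current: std::vec::IntoIter<String>, // entries of the tree being walked
}

impl EntriesIter {
    fn new(mut ids: Vec<u32>) -> Self {
        ids.reverse(); // so that pop() visits ids in the given order
        EntriesIter { pending: ids, current: Vec::new().into_iter() }
    }
}

impl Iterator for EntriesIter {
    // Each item is a Result, so read failures are yielded, not panicked on.
    type Item = Result<String, BackendError>;

    fn next(&mut self) -> Option<Self::Item> {
        loop {
            if let Some(name) = self.current.next() {
                return Some(Ok(name));
            }
            let id = self.pending.pop()?;
            match read_tree(id) {
                Ok(entries) => self.current = entries.into_iter(),
                Err(err) => return Some(Err(err)),
            }
        }
    }
}
```

Callers can then stop at the first failure with `collect::<Result<Vec<_>, _>>()` or handle each item individually.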
Ilya Grigoriev
a88c06068e clippy: new nightly fixes
For some reason, clippy also suggested surrounding
`self.value` with parentheses. Not sure whether
that's a clippy bug.

Cc: https://github.com/rust-lang/rust-clippy/issues/12268
2024-02-10 16:06:28 -08:00
Yuya Nishihara
35f718f212 merged_tree: remove canceling terms prior to resolving file-level conflict
I think this is a variant of the problem fixed by 7fda80fc22 "tree: simplify
conflict before resolving at hunk level." We need to simplify() the conflict
before and after extracting file ids because the source conflict values may
contain trees to be cancelled out, and the file values may differ only in exec
bits. Since the legacy tree passes a simplified conflict in to this function,
I made the merged tree do the same.

Fixes #2654
2023-12-03 07:44:58 +09:00
Yuya Nishihara
4ffbf40c82 merged_tree: do not propagate conflicting empty tree value to parent
Otherwise an empty subtree would be added to the parent tree.

If the stored tree contained an empty subtree, simplify() wouldn't work
against the new "absent" subtree representation. I don't know if there's
such a code path, but I believe it's very rare to encounter the problem.

#2654
2023-12-03 07:44:58 +09:00
Yuya Nishihara
28ab9593c3 repo_path: split RepoPath into owned and borrowed types
This enables cheap str-to-RepoPath cast, which is useful when sorting and
filtering a large Vec<(String, _)> list by using matcher for example. It
will also eliminate temporary allocation by repo_path.parent().
2023-11-28 07:33:28 +09:00
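The owned/borrowed split follows the same idea as `String`/`str` or `PathBuf`/`Path` in the standard library. The sketch below is an assumption-laden illustration of how a cheap `&str`-to-borrowed-path cast can work; `PathRef` and `PathOwned` are hypothetical names, and the pointer cast relies on `#[repr(transparent)]`.

```rust
use std::ops::Deref;

// Borrowed, unsized path type: a thin wrapper around str.
#[repr(transparent)]
struct PathRef {
    value: str,
}

// Owned counterpart, backed by a single String.
#[derive(Debug, Clone, PartialEq, Eq)]
struct PathOwned {
    value: String,
}

impl PathRef {
    // Zero-cost cast: no allocation, no copying.
    fn from_internal_string(value: &str) -> &PathRef {
        // SAFETY: PathRef is #[repr(transparent)] over str, so the two
        // types have identical layout and the pointer cast is sound.
        unsafe { &*(value as *const str as *const PathRef) }
    }

    fn as_str(&self) -> &str {
        &self.value
    }

    // Returns a borrowed slice of self instead of allocating a new path.
    // The root is represented by the empty string and has no parent.
    fn parent(&self) -> Option<&PathRef> {
        if self.value.is_empty() {
            return None;
        }
        let end = self.value.rfind('/').unwrap_or(0);
        Some(PathRef::from_internal_string(&self.value[..end]))
    }

    fn to_owned_path(&self) -> PathOwned {
        PathOwned { value: self.value.to_owned() }
    }
}

// Lets every PathRef method be called on a PathOwned.
impl Deref for PathOwned {
    type Target = PathRef;

    fn deref(&self) -> &PathRef {
        PathRef::from_internal_string(&self.value)
    }
}
```

Because the cast is free, a list of `(String, _)` pairs can be filtered by path without building an owned path per element.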
Yuya Nishihara
0a1bc2ba42 repo_path: add stub RepoPathBuf type, update callers
Most RepoPath::from_internal_string() callers will be migrated to the function
that returns &RepoPath, and cloning &RepoPath won't work.
2023-11-28 07:33:28 +09:00
Yuya Nishihara
d322df0c8d matchers: make Files/PrefixMatcher constructors accept slice of borrowed paths
RepoPath will become slice type (like str), and it doesn't make sense to
require &[RepoPathBuf] here.
2023-11-28 07:33:28 +09:00
Yuya Nishihara
974a6870b3 repo_path: make RepoPath::components() return iterator
This allows us to change the backing type from Vec<String> to String.
2023-11-27 08:42:09 +09:00
Yuya Nishihara
59ef3f0023 repo_path: split RepoPathComponent into owned and borrowed types
This is a step towards introducing a borrowed RepoPath type. The current
RepoPath type is inefficient as each component String is usually short. We
could apply short-string optimization, but still each inlined component would
consume 24 bytes just for e.g. "src", and increase the chance of random memory
access. If the owned RepoPath type is backed by String, we can implement cheap
cast from &str to borrowed &RepoPath type.
2023-11-26 18:21:40 +09:00
Yuya Nishihara
f2096da2d6 repo_path: add stub type to introduce borrowed RepoPathComponent type
The current RepoPathComponent will be renamed to RepoPathComponentBuf, and
new str wrapper will be added as RepoPathComponent.
2023-11-26 18:21:40 +09:00
Yuya Nishihara
6344cd56b3 repo_path: remove RepoPathJoin trait, just implement join() on the type
I don't think we'll add join() that takes different types.
2023-11-26 07:14:47 +09:00
Yuya Nishihara
e0c35684af merge: rename Merge::new() to Merge::from_removes_adds()
Since (removes, adds) pair is no longer the canonical representation of Merge,
the name Merge::new() seems too generic. Let's give it a more verbose name.
2023-11-07 17:10:12 +09:00
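To illustrate what a constructor named `from_removes_adds()` could do once the pair is no longer the stored form, here is a minimal sketch. It assumes an interleaved layout (add, remove, add, ...) in a single `Vec`; that layout and the accessor names are assumptions made for this example, not something the message above states.

```rust
// Assumed layout: values alternate add, remove, add, ..., so there is
// always exactly one more add than removes.
#[derive(Debug, PartialEq)]
struct Merge<T> {
    values: Vec<T>,
}

impl<T> Merge<T> {
    // Builds the interleaved form from separate removes and adds.
    fn from_removes_adds(removes: Vec<T>, adds: Vec<T>) -> Self {
        assert_eq!(adds.len(), removes.len() + 1, "need one more add than removes");
        let mut values = Vec::with_capacity(removes.len() + adds.len());
        let mut adds = adds.into_iter();
        values.push(adds.next().unwrap());
        for (remove, add) in removes.into_iter().zip(adds) {
            values.push(remove);
            values.push(add);
        }
        Merge { values }
    }

    // Adds sit at even indices, removes at odd indices.
    fn adds(&self) -> impl Iterator<Item = &T> {
        self.values.iter().step_by(2)
    }

    fn removes(&self) -> impl Iterator<Item = &T> {
        self.values.iter().skip(1).step_by(2)
    }
}
```

With a single backing `Vec`, iterating over all terms needs no concatenation of two separate vectors.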
Martin von Zweigbergk
d989d4093d merged_tree: let backend influence whether to use new diff algo
Since the concurrent diff algorithm is significantly slower when using
the Git backend, I think we'll have to switch between the two
algorithms depending on backend. Even if the concurrent version always
performed as well as the sequential version, exactly how concurrent it
should be probably still depends on the backend. This commit therefore
adds a function to the `Backend` trait, so each backend can say how
much concurrency they deal well with. I then use that number for
choosing between the sequential and concurrent versions in
`MergedTree::diff_stream()`, and also to decide the number of
concurrent reads to do in the concurrent version.
2023-11-06 23:12:02 -08:00
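The selection logic described in this message can be sketched as below. The trait method name, the two example backends, and the numbers they return are all assumptions for illustration; only the idea (a per-backend concurrency hint that picks the algorithm and bounds the number of concurrent reads) comes from the message.

```rust
// Hypothetical shape of the backend hint.
trait Backend {
    // How many concurrent reads this backend handles well.
    fn concurrency(&self) -> usize;
}

struct LocalBackend;  // e.g. a local, disk-based backend
struct RemoteBackend; // e.g. a backend with remote storage

impl Backend for LocalBackend {
    fn concurrency(&self) -> usize {
        1 // assumed value: concurrency does not pay off locally
    }
}

impl Backend for RemoteBackend {
    fn concurrency(&self) -> usize {
        100 // assumed value: many reads in flight hide network latency
    }
}

#[derive(Debug, PartialEq)]
enum DiffStrategy {
    Sequential,
    Concurrent { max_concurrent_reads: usize },
}

// Picks the sequential algorithm for backends that report no useful
// concurrency, and otherwise passes the hint on as the read limit.
fn choose_strategy(backend: &dyn Backend) -> DiffStrategy {
    match backend.concurrency() {
        0 | 1 => DiffStrategy::Sequential,
        n => DiffStrategy::Concurrent { max_concurrent_reads: n },
    }
}
```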
Martin von Zweigbergk
f40adb84fc merged_tree: add a Stream for concurrent diff of trees
When diffing two trees, we currently start at the root and diff those
trees. Then we diff each subtree, one at a time, recursively. When
using a commit backend that uses remote storage, like our backend at
Google does, diffing the subtrees one at a time gets very slow. We
should be able to diff subtrees concurrently. That way, the number of
roundtrips to a server becomes determined by the depth of the deepest
difference instead of by the number of differing trees (times 2,
even). This patch implements such an algorithm behind a `Stream`
interface. It's not hooked in to `MergedTree::diff_stream()` yet; that
will happen in the next commit.

I timed the new implementation by updating `jj diff -s` to use the new
diff stream and then ran it on the Linux repo with `jj diff
--ignore-working-copy -s --from v5.0 --to v6.0`. That slowed down by
~20%, from ~750 ms to ~900 ms. Maybe we can get some of that
performance back but I think it'll be hard to match
`MergedTree::diff()`. We can decide later if we're okay with the
difference (after hopefully reducing the gap a bit) or if we want to
keep both implementations.

I also timed the new implementation on our cloud-based repo at
Google. As expected, it made some diffs much faster (I'm not sure if
I'm allowed to share figures).
2023-11-06 23:12:02 -08:00
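The roundtrip argument in this message can be checked with a small synchronous model. This is not the `Stream` implementation; it only counts roundtrips under simplifying assumptions: `DiffTree` is a hypothetical type holding just the differing subtrees, each tree costs one read (the "times 2" factor is ignored), and a whole level can be fetched in one batch.

```rust
// Hypothetical model: a node per differing tree, children are the
// differing subtrees below it.
struct DiffTree {
    children: Vec<DiffTree>,
}

fn leaf() -> DiffTree {
    DiffTree { children: Vec::new() }
}

fn dir(children: Vec<DiffTree>) -> DiffTree {
    DiffTree { children }
}

// One subtree at a time: one roundtrip per differing tree.
fn sequential_roundtrips(tree: &DiffTree) -> usize {
    1 + tree.children.iter().map(sequential_roundtrips).sum::<usize>()
}

// All subtrees of a level at once: one roundtrip per level, so the total
// equals the depth of the deepest difference.
fn concurrent_roundtrips(root: &DiffTree) -> usize {
    let mut level = vec![root];
    let mut rounds = 0;
    while !level.is_empty() {
        rounds += 1;
        level = level
            .into_iter()
            .flat_map(|tree| tree.children.iter())
            .collect();
    }
    rounds
}
```

For a root with two differing subdirectories, one of which contains two further differing subdirectories, the sequential count is 5 while the level-by-level count is 3.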
Martin von Zweigbergk
9af09ec236 test_merged_tree: test diffing with a matcher
We didn't have any tests at all for `MergedTree::diff()` with a
matcher other than `EverythingMatcher`. This patch adds a few.
2023-11-06 23:12:02 -08:00
Martin von Zweigbergk
16aa8e8f10 test_merged_tree: nest each part of test_diff_dir_file()
I'm about to add a few more checks for diffing with a matcher. I think
it will help make it readable and reduce the risk of mixing up
variables between each part of the test if we use some nested blocks.

I also removed some unnecessary `.clone()` calls while at it.
2023-11-06 23:12:02 -08:00
Yuya Nishihara
895bbce8c0 files: use borrowed Merge iterator in merge()
Since the underlying Merge data type is no longer (Vec<T>, Vec<T>), it doesn't
make sense to build removes/adds Vecs and concatenate them.
2023-11-07 06:52:35 +09:00