mirrors/jj

mirror of https://github.com/martinvonz/jj.git synced 2025-01-29 23:57:51 +00:00

Author	SHA1	Message	Date
Martin von Zweigbergk	c77417d4e4	merged_tree: drop outer loop in `TreeDiffStreamImpl::poll_next()` As suggested by Yuya. I also added a comment and an assertion in the case where we return `Poll::Pending`.	2023-11-07 00:03:50 -08:00
Martin von Zweigbergk	d989d4093d	merged_tree: let backend influence whether to use new diff algo Since the concurrent diff algorithm is significantly slower when using the Git backend, I think we'll have to switch between the two algorithms depending on backend. Even if the concurrent version always performed as well as the sequential version, exactly how concurrent it should be probably still depends on the backend. This commit therefore adds a function to the `Backend` trait, so each backend can say how much concurrency it deals well with. I then use that number for choosing between the sequential and concurrent versions in `MergedTree::diff_stream()`, and also to decide the number of concurrent reads to do in the concurrent version.	2023-11-06 23:12:02 -08:00
Martin von Zweigbergk	f40adb84fc	merged_tree: add a `Stream` for concurrent diff of trees When diffing two trees, we currently start at the root and diff those trees. Then we diff each subtree, one at a time, recursively. When using a commit backend that uses remote storage, like our backend at Google does, diffing the subtrees one at a time gets very slow. We should be able to diff subtrees concurrently. That way, the number of roundtrips to a server becomes determined by the depth of the deepest difference instead of by the number of differing trees (times 2, even). This patch implements such an algorithm behind a `Stream` interface. It's not hooked in to `MergedTree::diff_stream()` yet; that will happen in the next commit. I timed the new implementation by updating `jj diff -s` to use the new diff stream and then ran it on the Linux repo with `jj diff --ignore-working-copy -s --from v5.0 --to v6.0`. That slowed down by ~20%, from ~750 ms to ~900 ms. Maybe we can get some of that performance back but I think it'll be hard to match `MergedTree::diff()`. We can decide later if we're okay with the difference (after hopefully reducing the gap a bit) or if we want to keep both implementations. I also timed the new implementation on our cloud-based repo at Google. As expected, it made some diffs much faster (I'm not sure if I'm allowed to share figures).	2023-11-06 23:12:02 -08:00
Martin von Zweigbergk	9af09ec236	test_merged_tree: test diffing with a matcher We didn't have any tests at all for `MergedTree::diff()` with a matcher other than `EverythingMatcher`. This patch adds a few.	2023-11-06 23:12:02 -08:00
Martin von Zweigbergk	16aa8e8f10	test_merged_tree: nest each part of `test_diff_dir_file()` I'm about to add a few more checks for diffing with a matcher. I think it will help make it readable and reduce the risk of mixing up variables between each part of the test if we use some nested blocks. I also removed some unnecessary `.clone()` calls while at it.	2023-11-06 23:12:02 -08:00
Martin von Zweigbergk	c9ce80a82a	merged_tree: extract function for merged iterator of basenames in diff I'm going to reuse this for stream/async diffing.	2023-11-06 23:12:02 -08:00
Martin von Zweigbergk	b72f04ba61	merged_tree: rename `all_tree_conflict_names()` since it's not about conflicts	2023-11-06 23:12:02 -08:00
Yuya Nishihara	3fddc31da8	merge: remove Merge::take() which is no longer used Merge::take() is no longer a cheap function. We can add into_vec() if needed.	2023-11-07 06:52:35 +09:00
Yuya Nishihara	92dfe59ade	refs: run non-trivial merge of ref targets without destructuring Merge object	2023-11-07 06:52:35 +09:00
Yuya Nishihara	93601541cb	refs: use swap_remove() in non-trivial merge of ref targets I'm going to add a Merge method that removes negative/positive terms pair, and swap_remove() is the easiest option. The order of the conflicted ref targets doesn't matter.	2023-11-07 06:52:35 +09:00
Yuya Nishihara	895bbce8c0	files: use borrowed Merge iterator in merge() Since the underlying Merge data type is no longer (Vec<T>, Vec<T>), it doesn't make sense to build removes/adds Vecs and concatenate them.	2023-11-07 06:52:35 +09:00
Yuya Nishihara	f1898a31b5	merge: simply print interleaved conflict values in debug output We could apply that for the resolved case, but Resolved/Conflicted label seems more useful than just printing Merge([value]).	2023-11-06 07:21:06 +09:00
Yuya Nishihara	b07b370ed3	merge: simply generate content hash from interleaved values	2023-11-06 07:21:06 +09:00
Yuya Nishihara	46ffb2f0b2	merge: store negative/positive terms internally in an interleaved Vec Many callers use interleaved iterators, and recently-added serialization code is built on top of that, so I think it's better to store terms in that format. map() functions no longer use MergeBuilder as we know the mapped values are ordered properly. flatten() and simplify() are reimplemented to work with the interleaved values. The other changes are trivial.	2023-11-06 07:21:06 +09:00
Yuya Nishihara	287728fee7	merge: extract trivial_merge() that takes interleaved adds/removes iterator The Merge type will store interleaved terms instead of separate adds/removes vecs.	2023-11-06 07:21:06 +09:00
Yuya Nishihara	01523ba4f3	merge: rewrite bottom half of trivial_merge() for non-copyable types The input values of trivial_merge() will be changed to Iterator<Item = T> where T: Eq + Hash. It could be <Item = &'a T>, but it doesn't have to be.	2023-11-06 07:21:06 +09:00
Martin von Zweigbergk	7c923514ee	git: add config to disable abandoning of unreachable commits Some users prefer to have commits not get abandoned when importing refs. This adds a config option for that. Closes #2504.	2023-11-05 06:10:54 -08:00
Martin von Zweigbergk	7bf8906f9c	git: extract a function for abandoning unreachable commits The motivation for this is so we can easily skip calling the function if the user has opted out of the propagation of abandoned commits we usually do (#2504). However, it seems like a good piece of code to extract regardless of that feature.	2023-11-05 06:10:54 -08:00
Yuya Nishihara	d9fbf21794	merge: have Merge::adds()/removes() return iterator The Merge type will be changed to store interleaved values internally.	2023-11-05 16:43:06 +09:00
Yuya Nishihara	1c6913d618	merge: use Merge::iter() instead of adds()/removes() where order doesn't matter Merge::iter() will be a slice::Iter, and be more efficient than chaining adds and removes.	2023-11-05 16:43:06 +09:00
Yuya Nishihara	99e6ff493a	merge: fix copy-paste error in doc comment for adds()	2023-11-05 16:43:06 +09:00
Yuya Nishihara	f6d85c51cd	merge: add non-optional Merge accessor to the zeroth value We have a few callers which just need to obtain an object common among all the merge values. Let's add a non-failing accessor for that purpose.	2023-11-05 16:43:06 +09:00
Yuya Nishihara	b12c688ea0	merge: add method for indexed adds/removes access The current adds()/removes() will be changed to return iterators.	2023-11-05 16:43:06 +09:00
Martin von Zweigbergk	6a5615c933	rewrite: use `MergedTree::diff_stream()` when restoring from tree	2023-11-04 21:07:49 -07:00
Yuya Nishihara	602b44258e	workspace: add function that initializes colocated git repository One less git2 API use in CLI. The function name GitBackend::init_colocated() is a bit odd, but we need to specify the work-tree path, not the ".git" repo path. So we can't eliminate the notion of the working copy path anyway.	2023-11-05 08:48:35 +09:00
Yuya Nishihara	77e16243d6	tests: assert paths of initialized GitBackend	2023-11-05 08:48:35 +09:00
Yuya Nishihara	ce46c10c96	git_backend: extract inner function that initializes backend with open git repo	2023-11-05 08:48:35 +09:00
Yuya Nishihara	dce640aaf1	workspace: one less cloning of workspace_root in init_external_git() Just a trivial code cleanup.	2023-11-05 08:48:35 +09:00
Yuya Nishihara	c866b4a42d	workspace: fix repository path in init_internal_git() doc comment Also rephrased "Git backend" as "Git repo" since the new backend storage will be created.	2023-11-05 08:48:35 +09:00
Antoine Cezar	5973ab47b9	commands: move rebase_to_dest_parent to jj_lib::rewrite What makes rebase_to_dest_parent a good candidate for the jj_lib::rewrite module: - It is used in both obslog and interdiff, which is a sign that it may belong in a lower layer. - CommandError is returned by converting from TreeMergeError, not explicitly. - It only uses jj_lib::rewrite functions.	2023-11-03 20:48:00 +01:00
Martin von Zweigbergk	904c37d36d	working copy: use `MergedTree::diff_stream()` This will make it a little faster to update the working copy at Google once we've made `MergedTree::diff_stream()` fetch trees concurrently. (It only makes it a little faster because we still fetch files serially.)	2023-11-03 08:15:10 -07:00
Martin von Zweigbergk	72245cfac5	merged_tree: add `Stream`-based version of `diff()`, delegating for now I'm going to implement a `Stream`-based version optimized for high-latency (RPC-based) commit backends. So far, that implementation is about 20% slower in the Linux repo when running `jj diff --ignore-working-copy -s --from v5.0 --to v6.0`. I think that's almost only because the algorithm is different, not because it's async per se. This commit adds a `Stream`-based version of `MergedTree::diff()` that just wraps the regular iterator in stream. I updated `jj diff` to use it. I couldn't measure any difference on the command above in the Linux repo. I think that means we can safely use the same `Stream`-based interface regardless of backend, even if we end up needing two different implementations of the `Stream`. We would then be using the wrapped iterator from this commit for local backends, and the new implementation for remote backends. But ideally we can make the remote-friendly implementation fast enough that we don't need two implementations.	2023-11-03 08:15:10 -07:00
Martin von Zweigbergk	24b706641f	async: switch to `pollster`'s `block_on()` During the transition to using more async code, I keep running into https://github.com/rust-lang/futures-rs/issues/2090. Right now, I want to convert `MergedTree::diff()` into a `Stream`. I don't want to update all call sites at once, so instead I'm adding a `MergedTree::diff_stream()` method, which just wraps `MergedTree::diff()` in a `Stream`. However, since the iterator is synchronous, it needs to block on the async `Backend::read_tree()` calls. If we then also block on the `Stream` in the CLI, we run into the panic.	2023-11-03 08:15:10 -07:00
Martin von Zweigbergk	3a378dc234	cli: add a function for restoring part of a tree from another tree We had similar code in two places for restoring paths from one tree to another. Let's reuse it instead. I put the new function in the `rewrite` module. I'm not sure if that's right place. Maybe it belongs in `tree`?	2023-11-02 06:07:45 -07:00
Yuya Nishihara	162dcd49b4	cli: rewrite base GitIgnoreFile lookup to use gitoxide instead of libgit2 Since gix::Repository::config_snapshot() borrows the repo instance, it has to be allocated in caller's stack. That's why GitBackend::git_config() is removed.	2023-11-02 19:33:06 +09:00
Yuya Nishihara	c88e69ad6f	git_backend: replace git2::Repository with gix::Repository My gut feeling is that gitoxide aims to be more transparent than libgit2. We'll need to know more about the underlying Git data model. Random comments on gix API: * gix::Repository provides API similar to git2::Repository, but has fewer "convenient" functions. For example, we need to use .find_object() + .try_to/into_<kind>() instead of .find_<kind>(). * gix::Object, Blob, etc. own raw data as bytes. gix::object and gix::objs types provide high-level views on such data. * Tree building is pretty low-level compared to git2. * gix leverages bstr (i.e. bytes) extensively. It's probably not difficult to migrate git::import/export_refs(). It might help eliminate the startup overhead of libssl initialization. The gix-based GitBackend appears to be a bit faster, but that wouldn't practically matter. #2316	2023-11-02 19:33:06 +09:00
Yuya Nishihara	9a86b77e38	tests: force gitoxide to not load config nor use "main" as default branch AFAIK, there's no global config state for gitoxide. We can use Config::isolated() in tests, but GitBackend should load config files in a normal way. https://docs.rs/gix/0.55.2/gix/open/permissions/struct.Config.html#method.isolated https://docs.rs/gix/0.55.2/gix/init/constant.DEFAULT_BRANCH_NAME.html	2023-11-02 19:33:06 +09:00
Yuya Nishihara	f5a61dc2b7	git_backend: open just-initialized repo with canonicalized path Otherwise, the initialized repo could have a different work-dir path than the load()-ed one. libgit2 appears to do some normalization somewhere, but gix won't.	2023-11-02 19:33:06 +09:00
Yuya Nishihara	fd187d266f	git_backend: box GitBackendInit/LoadError up front These error enums will wrap gix error types, and will become big enough for clippy to complain.	2023-11-02 19:33:06 +09:00
Yuya Nishihara	b48569b104	cargo: add gitoxide (or gix) dependency I've enabled the "index" component from the "basic" feature set, which would be needed to implement colocated repo functionality. The doc suggests that a library shouldn't activate "max-performance-safe", but our crate is also an application so it would be okay to enable the feature. We'll need "parallel" anyway to make GitBackend Sync. https://docs.rs/gix/latest/gix/#feature-flags	2023-11-02 19:33:06 +09:00
Isabella Basso	749d8bb15a	git: preserve HEAD when possible Closes: #2210	2023-11-01 08:23:52 -03:00
Yuya Nishihara	1788b5014e	git_backend: remove redundant copy back of author timestamp Only the committer timestamp can be updated inside a loop.	2023-10-31 06:51:27 +09:00
Yuya Nishihara	f5aa739c70	git_backend: use .strip_suffix() instead of manual slicing	2023-10-31 06:51:27 +09:00
Yuya Nishihara	9bd84c55e0	git_backend: use file mode extensively in read_tree() Both filemode() and kind() are calculated from the same underlying data, and kind() is libgit2-specific API.	2023-10-31 06:51:27 +09:00
Yuya Nishihara	b3c9cab12d	git_backend: handle read_tree() lookup/encoding errors gracefully	2023-10-31 06:51:27 +09:00
Yuya Nishihara	847adc832f	git_backend: use lossy conversion to decode non-UTF-8 commit message If message() returned None, it doesn't mean the commit message is empty. I originally mapped it to an error, but that made import of linux repo fail. https://docs.rs/git2/latest/git2/struct.Commit.html#method.message	2023-10-31 06:51:27 +09:00
Yuya Nishihara	06c254e742	git_backend: use non-owned str::from_utf8() to decode symlink target Just for consistency with the other changes. str::Utf8Error is 2 words long, so I removed the boxing.	2023-10-31 06:51:27 +09:00
Yuya Nishihara	d1c71c05c9	git_backend: remove redundant error handling for invalid hash length The only error that could be returned by libgit2 is invalid hash length, and we check that explicitly. If we switch the backends to gitoxide, there will be panicking constructor. https://docs.rs/git2/latest/git2/struct.Oid.html#method.from_bytes	2023-10-31 06:51:27 +09:00
Ilya Grigoriev	8bc3f5fd67	cli rebase: Allow `jj rebase -r` to rebase a commit onto a descendant #1188 There are some additional test changes because children and descendants are now rebased before the commit itself.	2023-10-30 10:56:27 -07:00
Yuya Nishihara	2d3fe7eee2	rewrite: replace use of "lift"ed function application with try_collect() Also removed redundant borrow + clone.	2023-10-30 13:50:37 +09:00
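
Several of the `merge:` commits above describe storing positive and negative terms in a single interleaved Vec, with `adds()`/`removes()` returned as iterators and `trivial_merge()` taking values bounded by `Eq + Hash`. The sketch below illustrates that idea only. It is not the jj_lib code: the struct layout, the `from_interleaved()` constructor, and the simplified cancellation rule in `trivial_merge()` are assumptions made for illustration.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical, simplified stand-in for a merge type (not jj_lib's `Merge`).
/// Terms are stored interleaved: add0, remove0, add1, remove1, ..., addN.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Merge<T> {
    values: Vec<T>,
}

impl<T> Merge<T> {
    /// Assumed constructor: there must be exactly one more add than removes,
    /// so the interleaved length is always odd.
    fn from_interleaved(values: Vec<T>) -> Self {
        assert!(values.len() % 2 == 1, "need one more add than removes");
        Merge { values }
    }

    /// Positive terms sit at even indices.
    fn adds(&self) -> impl Iterator<Item = &T> + '_ {
        self.values.iter().step_by(2)
    }

    /// Negative terms sit at odd indices.
    fn removes(&self) -> impl Iterator<Item = &T> + '_ {
        self.values.iter().skip(1).step_by(2)
    }

    /// All terms in storage order; a plain slice iterator, no chaining needed.
    fn iter(&self) -> std::slice::Iter<'_, T> {
        self.values.iter()
    }
}

/// Simplified trivial merge (an assumption, not the real algorithm): count +1
/// for each add and -1 for each remove, then resolve only if exactly one
/// value is left over with a net count of 1.
fn trivial_merge<T: Eq + Hash>(merge: &Merge<T>) -> Option<&T> {
    let mut counts: HashMap<&T, i32> = HashMap::new();
    for (i, value) in merge.iter().enumerate() {
        let delta = if i % 2 == 0 { 1 } else { -1 };
        *counts.entry(value).or_insert(0) += delta;
    }
    let mut remaining = counts.into_iter().filter(|(_, n)| *n != 0);
    match (remaining.next(), remaining.next()) {
        (Some((value, 1)), None) => Some(value),
        _ => None,
    }
}
```

With this layout, a merge built from `["B", "A", "A"]` has adds `B` and `A` and a single remove `A`. The `A` terms cancel, so this simplified `trivial_merge()` resolves to `B`, while `["A", "B", "C"]` stays unresolved.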


2227 commits