This adds an initial `jj util gc` command, which simply calls `git gc`
when using the Git backend. That should already be useful in
non-colocated repos because it's not obvious how to GC (repack) such
repos. In my own jj repo, it shrank `.jj/repo/store/` from 2.4 GiB to
780 MiB, and `jj log --ignore-working-copy` was sped up from 157 ms to
86 ms.
I haven't added any tests because the functionality depends on having
a `git` binary on the PATH, which we don't yet depend on anywhere
else. I think we'll still be able to test most of the future garbage
collection functionality without a `git` binary because the
interesting parts are about manipulating the Git repo before calling
`git gc` on it.
This enables a cheap str-to-RepoPath cast, which is useful when sorting
and filtering a large Vec<(String, _)> list by a matcher, for example. It
will also eliminate the temporary allocation in repo_path.parent().
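For illustration, here is a minimal sketch of the unsized-wrapper pattern
that makes such a cast possible (the real RepoPath type and method names may
differ); it's the same trick the standard library uses for `Path`/`OsStr`:
```
#[repr(transparent)]
struct RepoPath(str);

impl RepoPath {
    fn from_internal_str(value: &str) -> &RepoPath {
        // SAFETY: RepoPath is a #[repr(transparent)] wrapper around str, so
        // the fat pointers have identical layout (same trick as Path::new).
        unsafe { &*(value as *const str as *const RepoPath) }
    }

    fn parent(&self) -> Option<&RepoPath> {
        // No temporary allocation: the parent is just a prefix of the same str.
        self.0
            .rsplit_once('/')
            .map(|(parent, _)| RepoPath::from_internal_str(parent))
    }
}
```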
Recognize signature metadata in git commit objects, and implement a
basic version of that for the native backend. Also extract the signed
data (the commit's binary representation without the signature) so it
can be verified later.
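As a rough sketch (field names are assumptions), the stored metadata might
look like this:
```
// Sketch only: signature metadata attached to a commit. The signed payload
// is kept alongside the signature so verification can happen later without
// having to reconstruct it from the backend.
struct SecureSig {
    /// The signed data: the commit's binary representation without the signature.
    data: Vec<u8>,
    /// The raw signature bytes (e.g. an ASCII-armored GPG signature).
    sig: Vec<u8>,
}
```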
Since the concurrent diff algorithm is significantly slower when using
the Git backend, I think we'll have to switch between the two
algorithms depending on backend. Even if the concurrent version always
performed as well as the sequential version, exactly how concurrent it
should be probably still depends on the backend. This commit therefore
adds a function to the `Backend` trait, so each backend can say how
much concurrency they deal well with. I then use that number for
choosing between the sequential and concurrent versions in
`MergedTree::diff_stream()`, and also to decide the number of
concurrent reads to do in the concurrent version.
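As a sketch (the method name and default are assumptions), the new trait
method might look like this:
```
pub trait Backend: Send + Sync {
    /// How many concurrent read requests this backend handles well. A local,
    /// disk-based backend gains nothing from concurrency, so 1 is a sensible
    /// default; a cloud-based backend might return something like 100.
    fn concurrency(&self) -> usize {
        1
    }

    // ...the existing read/write methods are elided here.
}
```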
During the transition to using more async code, I keep running into
https://github.com/rust-lang/futures-rs/issues/2090. Right now, I want
to convert `MergedTree::diff()` into a `Stream`. I don't want to
update all call sites at once, so instead I'm adding a
`MergedTree::diff_stream()` method, which just wraps
`MergedTree::diff()` in a `Stream`. However, since the iterator is
synchronous, it needs to block on the async `Backend::read_tree()`
calls. If we then also block on the `Stream` in the CLI, we run into
the panic.
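Conceptually the wrapper is just this (a simplified sketch, not the actual
code): `futures::stream::iter` lifts the synchronous iterator into a `Stream`.
```
use futures::stream::{self, Stream};

// Lift a synchronous iterator into a Stream. Producing each item still blocks
// on the underlying Backend::read_tree() calls, which is what leads to the
// nested-block_on panic when the caller also blocks on the Stream.
fn iter_to_stream<T>(iter: impl Iterator<Item = T>) -> impl Stream<Item = T> {
    stream::iter(iter)
}
```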
This avoids https://github.com/rust-lang/futures-rs/issues/2090. I
don't think we need to worry about reading legacy conflicts
asynchronously - async is really only useful for Google's backend
right now, and we don't use the legacy format at Google. In
particular, I don't want `MergedTree::value()` to have to be async.
It seems we'll end up using `block_on()` quite a bit, at least until
we're done transitioning to async, and the function name doesn't
conflict with anything else, so let's always import it when we need
it.
The commit backend at Google is cloud-based (and so are the other
backends); it reads and writes commits from/to a server, which stores
them in a database. That makes latency much higher than for disk-based
backends. To reduce the latency, we have a local daemon process that
caches and prefetches objects. There are still many cases where
latency is high, such as when diffing two uncached commits. We can
improve that by changing some of our (jj's) algorithms to read many
objects concurrently from the backend. In the case of tree-diffing, we
can fetch one level (depth) of the tree at a time. There are several
ways of doing that:
* Make the backend methods `async`
* Use many threads for reading from the backend
* Add backend methods for batch reading
I don't think we typically need CPU parallelism, so it's wasteful to
have hundreds of threads running in order to fetch hundreds of objects
in parallel (especially when using a synchronous backend like the Git
backend). Batching would work well for the tree-diffing case, but it's
not as composable as `async`. For example, if we wanted to fetch some
commits at the same time as we were doing a diff, it's hard to see how
to do that with batching. Using async seems like our best bet.
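For illustration, a minimal sketch of what async read methods could look like
using the `async-trait` crate (the type names here are simplified stand-ins
for jj's real types):
```
use async_trait::async_trait;

// Simplified stand-ins for the real jj types.
pub struct Tree;
pub struct TreeId(Vec<u8>);
pub type BackendResult<T> = Result<T, String>;

#[async_trait]
pub trait Backend: Send + Sync {
    // Reads become async so algorithms like tree diffing can issue many
    // requests concurrently (e.g. fetch one level of the tree at a time).
    async fn read_tree(&self, id: &TreeId) -> BackendResult<Tree>;

    // Writes stay synchronous; see the next paragraph.
    fn write_tree(&self, contents: &Tree) -> BackendResult<TreeId>;
}
```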
I didn't make the backend interface's write functions async because
writes are already async with the daemon we have at Google. That
daemon will hash the object and immediately return, and then send the
object to the server in the background. I think any cloud-based
solution will need a similar daemon process. However, we may need to
reconsider this if/when jj gets used on a server with a custom backend
that writes directly to a database (i.e. no async daemon in between).
I've tried to measure the performance impact. The largest difference
I've been able to measure was on `jj diff --ignore-working-copy -s
--from v5.0 --to v6.0` in the Linux repo, which increased from 749 ms
to 773 ms (3.3%). In most cases I've
tested, there's no measurable difference. I've tried diffing from the
root commit, as well as `jj --ignore-working-copy log --no-graph -r
'::v3.0 & author(torvalds)' -T 'commit_id ++ "\n"'` (to test a
commit-heavy load).
We currently represent the root tree id in a commit by `Merge<TreeId>`
plus a boolean `uses_tree_conflict_format`. It's better to use an enum
for that. That makes it harder to forget to check which type of tree
it is, and it makes it impossible to store a legacy tree with multiple
ids (as we could with `uses_tree_conflict_format=false`,
`root_tree=Merge::new(...)`).
Maybe more importantly, we're also going to want to pass around this
information in most places where we currently pass a single `TreeId`,
and passing two separate values would be annoying.
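As a rough sketch (the type and variant names are assumptions), the enum
could look like this:
```
// `TreeId` and `Merge<T>` stand in for the existing jj types (stubbed here
// so the sketch is self-contained).
pub struct TreeId(Vec<u8>);
pub struct Merge<T>(Vec<T>); // simplified: the real type tracks adds/removes

// Exactly one representation per commit: either a single legacy tree or a
// tree-level conflict, never a legacy tree with multiple ids.
pub enum MergedTreeId {
    /// Old-style commit: a single tree that may contain path-level conflicts.
    Legacy(TreeId),
    /// New-style commit: a tree-level conflict, one tree id per term.
    Merge(Merge<TreeId>),
}
```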
Unlike the git backend, we don't need to support path-level conflicts
for existing repos because we don't care about compatibility with
existing repos using the native backend. However, we still need to
support both formats until all code paths are able to handle
tree-level conflicts.
Since `Conflict<T>` can also represent a non-conflict state (a single
term), `Merge<T>` seems like a better name.
Thanks to @ilyagr for the suggestion in
https://github.com/martinvonz/jj/pull/1774#discussion_r1257547709
Sorry about the churn. It would have been better if I had thought of
this name before I introduced `Conflict<T>`.
Tree-level conflicts (#1624) will be stored as multiple trees
associated with a single commit. This patch adds support for that in
`backend::Commit` and in the backends.
When the Git backend writes a tree conflict, it creates a special root
tree for the commit. That tree has only the individual trees from the
conflict as subtrees. That way we prevent the trees from getting
GC'd. We also write the tree ids to the extra metadata table
(i.e. outside of the Git repo) so we don't need to load the tree
object to determine if there are conflicts.
I also added a new flag to `backend::Commit` indicating whether the
commit is a new-style commit (with support for tree-level
conflicts). That will help with the migration. We will remove it once
we no longer care about old repos. When the flag is set, we know that
a commit with a single tree cannot have conflicts. When the flag is
not set, it's an old-style commit where we have to walk the whole tree
to find conflicts.
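Roughly, the special root tree can be built with git2's tree builder so that
each conflict tree stays reachable from the commit (a sketch only; the entry
names and exact layout are made up for illustration):
```
// Sketch: write a synthetic root tree whose entries are the individual trees
// of the conflict, so `git gc` keeps them reachable.
fn write_conflict_root_tree(
    repo: &git2::Repository,
    tree_ids: &[git2::Oid],
) -> Result<git2::Oid, git2::Error> {
    let mut builder = repo.treebuilder(None)?;
    for (i, id) in tree_ids.iter().enumerate() {
        // 0o040000 is git's filemode for a tree (directory) entry.
        builder.insert(format!(".jjconflict-{i}"), *id, 0o040000)?;
    }
    builder.write()
}
```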
It was convenient that what the git backend stored in its "extras"
table was exactly a subset of the fields that the local backend
stores, but it's a bit ugly and limiting. For example, it makes it
possible to populate the `author` field in the git extras, but that
would have no effect. It's better if it's not possible to do that (we
store the author field in the git commit, of course).
What made me notice this now was that I'm working on tree-level
conflicts (#1624) and I'm thinking of adding a field to the git extras
saying "this commit has a single tree, but it's still a new-style
commit", so we know not to walk such trees to find path-level
conflicts. That's only needed for the git backend because we don't
care about compatibility for the local backend.
The internal backend at Google doesn't let you write any value you
want in the committer field. The `Store` type still caches the
value it attempted to write, which gets a little weird when the
written value is not what we tried to write. We should use the value
the backend actually wrote. However, we don't know if the backend
changed anything without reading the value back, which is often
wasteful. This commit changes the API to return the written value.
I only changed the signature of `write_commit()` for now. Maybe we
should make a similar change to `write_tree()`.
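A sketch of the signature change (types stubbed; the real signature may
differ in detail):
```
// Simplified stand-ins for the real jj types.
pub struct Commit { /* author, committer, root tree, ... */ }
pub struct CommitId(Vec<u8>);
pub type BackendResult<T> = Result<T, String>;

pub trait Backend {
    // Before (roughly): fn write_commit(&self, contents: Commit) -> BackendResult<CommitId>;
    // After: also return the commit as actually written, so `Store` can cache
    // the real value (e.g. a committer field the backend rewrote).
    fn write_commit(&self, contents: Commit) -> BackendResult<(CommitId, Commit)>;
}
```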
This has several advantages:
* Makes it possible to downcast to non-Git custom backends (might be
useful at Google, but we haven't needed it yet)
* Lets us access more specific functionality on the `GitBackend`,
making it possible to access the `git2::Repository` without
creating a copy of it.
* Removes the dependency on Git from the backend
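For the first point, the downcasting hook could look something like this (a
sketch; the method name is an assumption):
```
use std::any::Any;

pub trait Backend {
    /// Expose `&dyn Any` so callers can try to downcast to a concrete backend
    /// type when they need backend-specific functionality.
    fn as_any(&self) -> &dyn Any;
}

// A caller needing Git-specific functionality could then do something like:
//     if let Some(git_backend) = backend.as_any().downcast_ref::<GitBackend>() {
//         // use Git-specific methods on git_backend here
//     }
```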
It took a while before I realized that conflicts could be modeled as
simple algebraic expressions with positive and negative terms (they
were modeled as recursive 3-way conflicts initially). We've been
thinking of them that way for a while now, so let's make the
`ConflictPart` name match that model.
Our internal backend at Google uses a 32-byte change id, so I'd like
to make the backend able to decide the length. To start with, let's
make the backend able to decide what the root change id should
be. That's consistent with how we already let the backend decide what
the root commit id should be.
The function is currently only about the length of commit IDs, so
let's clarify that. I'm going to add another function for the length
of change IDs next. I don't know if we're going to care about lengths
of other hashes in the future. We might even be able to remove the
current restriction that all commit IDs and all change IDs have the
same length.
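As a sketch (the method names are assumptions), the split could look like:
```
pub trait Backend {
    /// Length of commit IDs in bytes.
    fn commit_id_length(&self) -> usize;

    /// Length of change IDs in bytes (to be added in the next change).
    fn change_id_length(&self) -> usize;
}
```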
I needed this in the course of debugging an error. Before this commit, the error looked like this:
```
Error: Unexpected error from backend: Object not found
```
After this commit, it looks like this:
```
Error: Unexpected error from backend: Object with CommitId 8f59646bc9bb6bb44b5624f1248f4a708f37003c not found: object not found - no match for id (8f59646bc9bb6bb44b5624f1248f4a708f37003c); class=Odb (9); code=NotFound (-3)
```
The Protobuf team at Google decided to let us use Protobufs internally
after all. That will make things a little easier for us with the
Google-internal adaptations, and the `protobuf` crate is noticeably
faster than the `thrift` crate.
This effectively rolls back commit 5b10c9aa0a. I resolved some
conflicts caused by the rename from `NormalFile` to `File`. I also
kept the changelog entry, but changed it to say that the hashing
scheme has changed (not the format); since the hashes are just used
for identity, existing repos should still work.
Let's acknowledge everyone's contributions by replacing "Google LLC"
in the copyright header by "The Jujutsu Authors". If I understand
correctly, it won't have any legal effect, but maybe it still helps
reduce concerns from contributors (though I haven't heard any
concerns).
Google employees can read about Google's policy at
go/releasing/contributions#copyright.
There are no "non-normal" files, so "normal" is not needed. We have
symlinks and conflicts, but they are not files, so I think just "file"
is unambiguous.
I left `testutils::write_normal_file()` because there it's used to
mean "not executable file" (there's also a `write_executable_file()`).
I left `working_copy::FileType::Normal` since renaming `Normal` there
to `File` would also suggest we should rename `FileType`, and I don't
know what would be a better name for that type.
This migrates the native backend from Protobuf to Thrift since
Google's Protobuf team doesn't let us import jj into Google's monorepo
if it uses a third-party Protobuf library.
Since the native backend is not supported, I didn't write any
migration code for it.
We can't remove `lib/src/protos/store.proto` yet, because it's also
used by the Git backend (only the `predecessors` and `change_id`
fields).
We currently determine if the repo uses the Git backend or the local
backend by checking for presence of a `.jj/repo/store/git_target`
file. To make it easier to add out-of-tree backends, let's instead add
a file that indicates which backend to use.
In many of these places, we don't need an owned value, so using a
reference means we don't force the caller to clone the value. I really
doubt it will have any noticeable impact on performance (I think these
are all once-per-repo paths); it's just a little simpler this way.
I had made the backends unaware of the virtual root commit because
they don't need to know about it, and we could avoid some duplicated
code by putting that in `Store` instead. However, as we saw in
b21a123bc8, the root commit being virtual has some user-visible
effects (they can't create a merge with the root and some other
commit). So I'm thinking that we may want to make the root commit an
actual commit, depending on which backend is used. Specifically, when
using the Git backend, we cannot record the root commit as an actual
parent since Git would fail when trying to look it up. Backends that
don't need compatibility can make the root commit an actual commit,
however.
This commit therefore makes the backends aware of the root commit. It
makes it remain a virtual commit in the Git backend, and makes it an
actual commit in the `LocalBackend`.
This commit breaks any existing repos using the `LocalBackend`, but
there shouldn't be any such repos other than for testing.