Commit graph

85 commits

Author SHA1 Message Date
Martin von Zweigbergk
5b10c9aa0a local_backend: switch from Protobuf to Thrift
This migrates the native backend from Protobuf to Thrift since
Google's Protobuf team doesn't let us import jj into Google's monorepo if
it uses a third-party Protobuf library.

Since the native backend is not supported, I didn't write any
migration code for it.

We can't remove `lib/src/protos/store.proto` yet, because it's also
used by the Git backend (only the `predecessors` and `change_id`
fields).
2022-11-13 21:55:41 -08:00
Martin von Zweigbergk
4ee261717e simple_op_store: replace Protobuf by Thrift
As mentioned in the previous commit, we need to remove the Protobuf
dependency in order to be allowed to import jj into Google's
repo. This commit makes `SimpleOpStore` store its data using Thrift
instead of Protobufs. It also adds automatic upgrade of existing
repos. The upgrade process took 18 s in my repo, which has 22k
operations. The upgraded storage uses practically the same amount of
space. `jj op log` (the full output) in my repo slowed down from 1.2 s
to 3.4 s. Luckily that's an uncommon operation. I couldn't measure any
difference in `jj status` (loading a single operation).
2022-11-13 11:39:33 -08:00
Martin von Zweigbergk
8acb54b26e op_store: extract current Protobuf-based implementation to separate file
In order to allow building jj inside of Google, our Protobuf team
doesn't want us to use a Google-unsupported implementation. Since
there is no supported implementation in Rust, we have to migrate off
of Protobufs. I'm starting with the operation store. This commit moves
the current implementation to a separate file so it can easily be
disabled by a Cargo feature.
2022-11-13 11:39:33 -08:00
Benjamin Saunders
2447dfeed8 simple_op_store: hash view/operation data directly
Decouples view/operation IDs from their serialized forms, which are not
necessarily stable. This is not a breaking change, because these IDs are
persisted and never recomputed or used for integrity checking.
2022-11-12 21:40:36 -08:00
Martin von Zweigbergk
3c7c4e9f5c tests: move testutils module into separate crate
The `testutils` module should ideally not be part of the library
dependencies. Since it's used by the integration tests (and the CLI
tests), we need to move it to a separate crate to achieve that.
2022-11-08 07:29:35 -08:00
Martin von Zweigbergk
6f2359c36d lib: remove ineffective enabling of map_first_last
If I'm reading this attribute correctly, it says that if the
`map_first_last` feature is enabled, then we should enable the
`map_first_last` feature, which seems like it would not have any
effect. We started getting warnings from the nightly compiler about
this line because it tries to enable a feature that's stable in that
version.
2022-11-05 06:07:27 -07:00
Martin von Zweigbergk
b654a1fe84 cleanup: remove extern crate declarations
`extern crate` is no longer needed since edition 2018.
2022-09-21 22:24:09 -07:00
Waleed Khan
9202aae8b1 build: conditionally use map_first_last feature if available 2022-02-20 22:21:14 -08:00
Waleed Khan
dd3272fe90 build: use assert_matches crate
The `assert_matches` feature is nightly-only, so use this crate as a shim.
2022-02-20 22:21:14 -08:00
Waleed Khan
261cd1a1c4 build: add shims for nightly feature map_first_last 2022-02-20 22:16:07 -08:00
Martin von Zweigbergk
9f8c6fe07d clippy: return_self_not_must_use is now disabled (pedantic) by default 2022-01-26 22:13:09 -08:00
Martin von Zweigbergk
91e471cb73 clippy: disable return_self_not_must_use
A new Clippy version added a new warning when a function that returns
`Self` doesn't have `#[must_use]`. I feel like all the cases reported
by it were false positives. Most were functions on `CommitBuilder`,
where we take `mut self` and return `Self`. I don't think I've ever
forgotten to use the result of those.
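
To illustrate the pattern the lint flagged, here is a minimal sketch. The struct below is a simplified, hypothetical stand-in and not jj's actual `CommitBuilder`; its fields and method names are assumptions made for the example.

```rust
/// Simplified, hypothetical builder; not jj's real `CommitBuilder`.
pub struct CommitBuilder {
    description: String,
    author: String,
}

impl CommitBuilder {
    pub fn new() -> Self {
        CommitBuilder {
            description: String::new(),
            author: String::new(),
        }
    }

    /// Takes `mut self` by value and returns `Self`. This is the shape
    /// that `return_self_not_must_use` reports when `#[must_use]` is absent.
    pub fn set_description(mut self, description: &str) -> Self {
        self.description = description.to_string();
        self
    }

    /// Same by-value setter shape as `set_description`.
    pub fn set_author(mut self, author: &str) -> Self {
        self.author = author.to_string();
        self
    }

    /// Consumes the builder and produces the final value.
    pub fn build(self) -> String {
        format!("{}: {}", self.author, self.description)
    }
}
```

Because each setter consumes the builder, calling `builder.set_description("x");` and then touching `builder` again is a use-after-move compile error, which may be part of why the lint's reports looked like false positives for this style of API.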
2022-01-03 21:34:39 -08:00
Martin von Zweigbergk
c0711f47cf workspace: introduce Workspace type
Having a concept of a "workspace" will be useful for adding support
for multiple workspaces (#13). You can think of the "workspace" as a
repo combined with a working copy. A workspace corresponds 1:1 with a
`.jj/` directory. It's pretty close to what other VCSs simply call a
"repo", but I've ended up using the word "repo" for what Git calls a
"bare repo".
2021-11-25 21:04:56 -08:00
Martin von Zweigbergk
e86d266e6b stacked_table: add a file format for stacked, sorted tables
I'm trying to replace the Git backend's use of Git notes for storing
metadata (#7). This patch adds a file format that I hope can be used
for that. It's a simple generic format for storing fixed-size keys and
associated variable-size values. The keys are stored in sorted
order. Each key is followed by an offset to the value. The offset is
relative to the first value. All values are concatenated after each
other. I suppose it's a bit like Git's pack files but lacking both
delta-encoding and compression.
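
The layout described above can be sketched roughly as follows. This is an illustration only, not the format added by this commit: the 4-byte key size, the leading entry count, the little-endian `u32` offsets, and the names `serialize` and `lookup` are all assumptions.

```rust
use std::cmp::Ordering;
use std::collections::BTreeMap;

/// Assumed key width for this sketch.
pub const KEY_SIZE: usize = 4;
/// Each index entry is a key followed by a little-endian u32 value offset.
const ENTRY_SIZE: usize = KEY_SIZE + 4;

fn read_u32(data: &[u8], pos: usize) -> usize {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(&data[pos..pos + 4]);
    u32::from_le_bytes(buf) as usize
}

/// Writes: entry count, then (key, offset) pairs in sorted key order,
/// then all values concatenated.
pub fn serialize(entries: &BTreeMap<[u8; KEY_SIZE], Vec<u8>>) -> Vec<u8> {
    let mut index: Vec<u8> = Vec::new();
    let mut values: Vec<u8> = Vec::new();
    for (key, value) in entries {
        index.extend_from_slice(key);
        // The offset is relative to the first value, not the file start.
        index.extend_from_slice(&(values.len() as u32).to_le_bytes());
        values.extend_from_slice(value);
    }
    let mut out = (entries.len() as u32).to_le_bytes().to_vec();
    out.extend_from_slice(&index);
    out.extend_from_slice(&values);
    out
}

/// Binary-searches the sorted keys and slices the value out of the
/// concatenated value area.
pub fn lookup<'a>(data: &'a [u8], key: &[u8; KEY_SIZE]) -> Option<&'a [u8]> {
    let count = read_u32(data, 0);
    let values_start = 4 + count * ENTRY_SIZE;
    let (mut low, mut high) = (0, count);
    while low < high {
        let mid = (low + high) / 2;
        let entry = 4 + mid * ENTRY_SIZE;
        match data[entry..entry + KEY_SIZE].cmp(&key[..]) {
            Ordering::Less => low = mid + 1,
            Ordering::Greater => high = mid,
            Ordering::Equal => {
                let start = values_start + read_u32(data, entry + KEY_SIZE);
                // A value ends where the next one starts, or at end of data.
                let end = if mid + 1 < count {
                    values_start + read_u32(data, entry + ENTRY_SIZE + KEY_SIZE)
                } else {
                    data.len()
                };
                return Some(&data[start..end]);
            }
        }
    }
    None
}
```

Fixed-size keys are what make the binary search possible without a separate key index, since entry `i` always starts at a known position.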

Each file can also have a parent pointer (just like the index files
have), so we don't have to rewrite the whole file each time. As with
the index files, the new format squashes a file into its parent if it
contains more than half the number of entries of the parent. The code
is also based on `index.rs`.

Perhaps we can also replace the default operation storage with this
format. Maybe also the native local backend's storage. We'll need
delta-encoding and compression soon then.
2021-10-20 13:19:32 -07:00
Martin von Zweigbergk
4c4e436f38 evolution: delete it now that we don't use it anymore (#32)
It's been a lot of work, but now we're finally able to remove the
`Evolution` state! `jj obslog` still works as before (it just walks
the predecessor pointers).
2021-10-06 23:28:30 -07:00
Martin von Zweigbergk
0c1ce664ea store: remove (weak) self-reference and take &Arc<Self> arguments instead
The `weak_self` stuff was from before I knew that `self` could be of
type `&Arc<Self>`.
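
A minimal sketch of the `&Arc<Self>` receiver follows. The `Store` and `Commit` types here are hypothetical stand-ins with made-up fields, not the definitions in jj.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the store type.
pub struct Store {
    name: String,
}

/// Hypothetical object that keeps its originating store alive.
pub struct Commit {
    store: Arc<Store>,
    id: u32,
}

impl Store {
    pub fn new(name: &str) -> Arc<Self> {
        Arc::new(Store {
            name: name.to_string(),
        })
    }

    /// `self: &Arc<Self>` lets the method clone the `Arc` it was called
    /// through, so the struct needs no `Weak<Self>` field pointing at itself.
    pub fn get_commit(self: &Arc<Self>, id: u32) -> Commit {
        Commit {
            store: Arc::clone(self),
            id,
        }
    }
}

impl Commit {
    pub fn describe(&self) -> String {
        format!("commit {} in store {}", self.id, self.store.name)
    }
}
```

The trade-off is that such methods can only be called on a value that is already inside an `Arc`.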
2021-09-16 23:30:30 -07:00
Martin von Zweigbergk
ce5e95fa80 store: rename Store to Backend and StoreWrapper to Store
For what's currently called `Store` in the code, I have been using
"backend" in plain text. That probably means that `Backend` is a good
name for it.
2021-09-12 12:02:10 -07:00
Martin von Zweigbergk
6b1ccd4512 view: add support for merging git ref targets
When there are two concurrent operations, we would resolve conflicting
updates of git refs quite arbitrarily before this change. This change
introduces a new `refs` module with a function for doing a 3-way merge
of ref targets. For example, if both sides moved a ref forward but by
different amounts, we pick the descendant-most target. If we can't
resolve it, we leave it as a conflict. That's fine to do for git refs
because they can be resolved by simply running `jj git refresh` to
import refs again (the underlying git repo is the source of truth).
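
The 3-way merge rule can be sketched as below. This is a simplified illustration, not the `refs` module itself: targets are modeled as `Option<CommitId>` (with `None` meaning the ref is absent), history is a plain parent map, and the names `Merged`, `is_ancestor`, and `merge_ref_target` are assumptions.

```rust
use std::collections::{HashMap, HashSet};

type CommitId = u32;

#[derive(Debug, PartialEq, Eq)]
pub enum Merged {
    Resolved(Option<CommitId>),
    Conflict {
        left: Option<CommitId>,
        right: Option<CommitId>,
    },
}

/// Returns true if `ancestor` is reachable from `descendant` (inclusive).
pub fn is_ancestor(
    parents: &HashMap<CommitId, Vec<CommitId>>,
    ancestor: CommitId,
    descendant: CommitId,
) -> bool {
    let mut stack = vec![descendant];
    let mut seen = HashSet::new();
    while let Some(id) = stack.pop() {
        if id == ancestor {
            return true;
        }
        if seen.insert(id) {
            if let Some(ps) = parents.get(&id) {
                stack.extend(ps.iter().copied());
            }
        }
    }
    false
}

pub fn merge_ref_target(
    parents: &HashMap<CommitId, Vec<CommitId>>,
    base: Option<CommitId>,
    left: Option<CommitId>,
    right: Option<CommitId>,
) -> Merged {
    // Only one side changed, or both sides agree: take the changed side.
    if left == right || right == base {
        return Merged::Resolved(left);
    }
    if left == base {
        return Merged::Resolved(right);
    }
    // Both sides moved: prefer the descendant-most target if there is one.
    if let (Some(l), Some(r)) = (left, right) {
        if is_ancestor(parents, l, r) {
            return Merged::Resolved(right);
        }
        if is_ancestor(parents, r, l) {
            return Merged::Resolved(left);
        }
    }
    Merged::Conflict { left, right }
}
```

When neither target is an ancestor of the other, the sketch keeps both sides in the `Conflict` variant so that a later re-import can settle it.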

As with the previous change, I'm doing this now mostly because
it is a good stepping stone towards branch support (issue #21). We'll
soon use the same 3-way merging for updating the local branch
definition (once we add that) when a branch changes in the git repo or
on a remote.
2021-07-24 19:01:56 -07:00
Martin von Zweigbergk
134940d2bb windows: don't fail when concurrent threads/processes fail to rename file
On Windows, it seems that you can't rename a file if the target file
is open (Stebalien/tempfile#131). I think that's the reason for our
failing tests on Windows. This patch adds a simple wrapper around
`NamedTempFile::persist()` that returns the existing file instead of
failing, if there is one.
2021-06-14 00:09:22 -07:00
Martin von Zweigbergk
fdeb499836 trees: merge into tree module 2021-06-05 14:20:07 -07:00
Martin von Zweigbergk
88f7f4732b gitignores: add own implementation and stop using libgit2's
This is to address issue #8. I haven't added the optimization to avoid
walking all the files in `target/` yet. Even so, this patch still
speeds up `jj st` in this repo, with ~13k files in `target/`, from
~320 ms to ~100 ms (-5.1dB). The time actually checking if paths match
gitignores seems to go down from 116 ms to 6 ms. I think that's mostly
because libgit2 has to look for `.gitignore` files in every parent
directory every time we ask it about a file, while the rewritten code
looks for a `.gitignore` file only when visiting a new directory.
2021-05-13 22:23:59 -07:00
Martin von Zweigbergk
33da97f0bf revsets: add iterator adapter for rendering simplified graph of set
When rendering a non-contiguous subset of the commits, we want to
still show the connections between the commits in the graph, even
though they're not directly connected. This commit introduces an
adapter for the revset iterators that also yields the edges to show in
such a simplified graph.
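
One way to compute such edges is sketched below. It is an illustration under assumptions, not the adapter from this commit: history is a plain parent map, and the `Edge` variants and `simplified_edges` name are hypothetical. Like the behavior described in this message, it does not filter out edges to ancestors that are also reachable transitively.

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

type CommitId = u32;

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Edge {
    /// The parent itself is in the set.
    Direct(CommitId),
    /// Nearest ancestor in the set, reached through elided commits.
    Indirect(CommitId),
    /// History continues, but no ancestor on this path is in the set.
    Missing,
}

fn parents_of(parents: &HashMap<CommitId, Vec<CommitId>>, id: CommitId) -> &[CommitId] {
    parents.get(&id).map(|v| v.as_slice()).unwrap_or(&[])
}

pub fn simplified_edges(
    parents: &HashMap<CommitId, Vec<CommitId>>,
    included: &HashSet<CommitId>,
    commit: CommitId,
) -> BTreeSet<Edge> {
    let mut edges = BTreeSet::new();
    for &parent in parents_of(parents, commit) {
        if included.contains(&parent) {
            edges.insert(Edge::Direct(parent));
            continue;
        }
        // Walk through elided commits until we hit commits in the set.
        let mut found = false;
        let mut seen = HashSet::new();
        let mut stack = vec![parent];
        while let Some(id) = stack.pop() {
            if !seen.insert(id) {
                continue;
            }
            if included.contains(&id) {
                edges.insert(Edge::Indirect(id));
                found = true;
            } else {
                stack.extend(parents_of(parents, id).iter().copied());
            }
        }
        if !found {
            edges.insert(Edge::Missing);
        }
    }
    edges
}
```

In this sketch, `Indirect` and `Missing` correspond roughly to the `:` and `~` markers in this commit's sample output.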

This has no measurable impact on `jj log -r ,,v2.0.0` in the git.git
repo.


The output of `jj log -r 'v1.0.0 | v2.0.0'` now looks like this:

```
o   e156455ea491 e156455ea491 gitster@pobox.com 2014-05-28 11:04:19.000 -07:00 refs/tags/v2.0.0
:\  Git 2.0
: ~
o c2f3bf071ee9 c2f3bf071ee9 junkio@cox.net 2005-12-21 00:01:00.000 -08:00 refs/tags/v1.0.0
~ GIT 1.0.0
```

Before this commit, it looked like this:

```
o e156455ea491 e156455ea491 gitster@pobox.com 2014-05-28 11:04:19.000 -07:00 refs/tags/v2.0.0
| Git 2.0
| o   c2f3bf071ee9 c2f3bf071ee9 junkio@cox.net 2005-12-21 00:01:00.000 -08:00 refs/tags/v1.0.0
| |\  GIT 1.0.0
```

The output of `jj log -r 'git_refs()'` in the git.git repo is still
completely useless (it's >350k lines and >500MB of data). I think
that's because we don't filter out edges to ancestors that we have
transitive edges to. Mercurial also doesn't filter out such edges, but
Git (with `--simplify-by-decoration`) seems to filter them out. I'll
change it soon so we filter them out.
2021-05-01 14:56:52 -07:00
Martin von Zweigbergk
a4ef42962c revsets: don't crash when given ungrammatical revset
I also snuck in some updates to the test cases.
2021-04-21 18:53:58 -07:00
Martin von Zweigbergk
2d6325b0f4 revsets: define grammar in pest 2021-04-18 21:25:58 -07:00
Martin von Zweigbergk
9e8a7e2ba6 revsets: move code for resolving symbol to commit to new module 2021-04-10 09:46:27 -07:00
Martin von Zweigbergk
1e657c5331 diff: add a histogram(-like?) diff algorithm
The current diff algorithm does a full LCS on the words of the texts,
which is really slow. Diffing the working copy when e.g.
`src/commands.rs` has changes far apart takes seconds. This patch adds
an implementation inspired by JGit's Histogram diff. I say "inspired"
because I just didn't quite understand it :P In particular, I didn't
understand what it does when it finds non-unique elements. I decided
to line up the leading common elements on both sides of the merge. I
don't know if that usually gives good enough results in practice.

I'm sure this can still be optimized a lot, but this seems good enough
as a start. There are also many things to improve about the quality of
the diffs.
2021-03-31 22:15:36 -07:00
Martin von Zweigbergk
db4e8bc458 cargo: upgrade to protobuf 2.22.1 to avoid workaround for rustfmt::skip 2021-03-18 13:06:42 -07:00
Martin von Zweigbergk
91117f36b6 cargo: work around warning in generated protobuf code with new nightly rustc 2021-03-14 22:25:43 -07:00
Martin von Zweigbergk
4bd121dab5 view: split out separate type for keeping track of op heads 2021-03-10 21:34:11 -08:00
Martin von Zweigbergk
403e86c138 index: introduce IndexStore, which owns ReadonlyIndex files
This patch introduces a new `IndexStore` struct. The idea is that it
will know about the directory in which the index files are stored and
their associations with operations. It may also cache `Arc<ReadonlyIndex>`
instances so if multiple `ReadonlyIndex` instances are loaded, they
can be returned from the cache. That may be useful when merging
operations because the operations are likely to share a large parent
index file. For now, however, all the new type has is `init()`,
`load()`, and `reinit()`.
2021-03-06 09:52:16 -08:00
Martin von Zweigbergk
bb94516175 index: add support for finding common ancestors
We currently need to read the commit objects for finding common
ancestors. That can be very slow when the common ancestor is far back
in history. This patch adds a function for finding common ancestors
using the index instead.

Unlike the current algorithm, which only returns one common ancestor,
the new index-based one correctly handles criss-cross merges.
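
To show what "correctly handles criss-cross merges" means, here is a naive sketch. It is not the index-based algorithm from this patch: it assumes commits are identified by positions where every parent has a lower position, recomputes full ancestor sets, and uses the hypothetical names `ancestors` and `common_ancestors`.

```rust
use std::collections::BTreeSet;

/// `parents[i]` lists the parent positions of commit `i`.
/// Returns all ancestors of `start`, including `start` itself.
pub fn ancestors(parents: &[Vec<usize>], start: usize) -> BTreeSet<usize> {
    let mut result = BTreeSet::new();
    let mut stack = vec![start];
    while let Some(pos) = stack.pop() {
        if result.insert(pos) {
            stack.extend(parents[pos].iter().copied());
        }
    }
    result
}

/// Returns the common ancestors of `a` and `b` that are not themselves
/// ancestors of another common ancestor.
pub fn common_ancestors(parents: &[Vec<usize>], a: usize, b: usize) -> BTreeSet<usize> {
    let ancestors_a = ancestors(parents, a);
    let ancestors_b = ancestors(parents, b);
    let common: BTreeSet<usize> = ancestors_a.intersection(&ancestors_b).copied().collect();
    let mut heads = common.clone();
    for &pos in &common {
        for &parent in &parents[pos] {
            // Anything reachable from a common ancestor's parent is older
            // than that common ancestor, so it is filtered out.
            for older in ancestors(parents, parent) {
                heads.remove(&older);
            }
        }
    }
    heads
}
```

For a criss-cross history where commits 3 and 4 both merge commits 1 and 2, this returns both 1 and 2 rather than stopping at whichever one is found first.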

Here are some timings for finding the common ancestors in the git.git
repo:

                          |      Without index     |       With index       |
                          | First run | Subsequent | First run | Subsequent |
v2.30.0-rc0 v2.30.0-rc1   |   5.68 ms |    5.94 us |   40.3 us |    4.77 us |
v2.25.4 v2.26.1           |   1.75 ms |    1.42 us |   13.8 ms |    4.29 ms |
v1.0.0 v2.0.0             |    492 ms |    2.79 ms |   23.4 ms |    6.41 ms |

Finding ancestors of v2.25.4 and v2.26.1 got much slower because the
new algorithm finds all common ancestors. Therefore, it also finds
v2.24.2, v2.23.2, v2.22.3, v2.21.2, v2.20.3, v2.19.4, v2.18.3, and
v2.17.4, which it then filters out because they're all ancestors of
v2.25.3.

Also note that the result was incorrect before, because the old
algorithm would return as soon as it had found a common ancestor, even
if it's not the latest common ancestor. For example, for the common
ancestor between v1.0.0 and v2.0.0, it returned an ancestor of v1.0.0
because it happened to get there by following some side branch that
led there more quickly.

The only place we currently need to find the common ancestor is when
merging trees, which we only do when the user runs `jj merge`, as well
as when operating on existing merge commits (e.g. to diff or rebase
them). That means that this change won't be very noticeable. However,
it's something we clearly want to do sooner or later, so we might as
well get it done.
2021-02-23 17:29:23 -08:00
Martin von Zweigbergk
e82197d981 git: extract function for pushing commit to remote branch, and test it 2020-12-28 00:53:41 -08:00
Martin von Zweigbergk
c7ee24727a protobuf: generate code at build-time
I had tried to generate the protobuf code at build time many months
ago, but decided against it because it slowed down the build too
much. I didn't realize there was the
"cargo:rerun-if-changed=<filename>" feature at the time. Given that that
exists, it seems like an obvious win to generate the source code at
build time.

I put the generated sources in `$OUT_DIR` (where [1] says they should
be), then include them in the `protos` module by using the `include!`
macro. The biggest problem with that is that I couldn't get IntelliJ
to understand it, even after enabling the experimental features
described in [2].

 [1] https://doc.rust-lang.org/cargo/reference/build-script-examples.html#code-generation

 [2] https://github.com/intellij-rust/intellij-rust/issues/1908#issuecomment-592773865
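
The build-time approach can be sketched as follows. This is a std-only illustration, not the build script from this commit: the `generate` function, the `src/protos/example.proto` path, and the generated `example.rs` file are hypothetical, and the real script would invoke the Protobuf code generator instead of writing a constant.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the code generation step of a `build.rs`.
pub fn generate(proto_path: &str, out_dir: &Path) -> io::Result<PathBuf> {
    // Ask Cargo to re-run the build script only when the input changes,
    // which avoids regenerating the code on every build.
    println!("cargo:rerun-if-changed={}", proto_path);

    // A real script would run the Protobuf compiler here; this sketch just
    // writes a small Rust source file to show where the output goes.
    let dest = out_dir.join("example.rs");
    fs::write(&dest, "pub const GENERATED_FROM: &str = \"example.proto\";\n")?;
    Ok(dest)
}

// In `build.rs`, `main` would call something like:
//     let out_dir = std::env::var("OUT_DIR").unwrap();
//     generate("src/protos/example.proto", Path::new(&out_dir)).unwrap();
//
// and the module would pull in the generated source with:
//     include!(concat!(env!("OUT_DIR"), "/example.rs"));
```

Cargo sets `OUT_DIR` for build scripts, so the generated file stays out of the source tree and is never checked in.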
2020-12-24 01:05:17 -08:00
Martin von Zweigbergk
88e7f4a30c tests: start using the maplit crate 2020-12-23 17:32:31 -08:00
Martin von Zweigbergk
6b1427cb46 import commit 0f15be02bf4012c116636913562691a0aaa7aed2 from my hg repo 2020-12-12 00:23:38 -08:00