Commit graph

1919 commits

Author SHA1 Message Date
Martin von Zweigbergk
e1c0d4fd5f cleanup: replace x[n..n+l] by x[n..][..l]
This avoids repeating the `n` in these expressions. Thanks to
@Dr-Emann for the suggestion.
2023-08-21 22:29:46 -07:00
Martin von Zweigbergk
c43a3067eb revset: pass all context arguments to parse() via an object
`revset::parse()` already has a `RevsetWorkspaceContext` argument, so
I think it makes sense to put that and the other context arguments
into a larger `RevsetParseContext` object.
2023-08-20 21:30:06 -07:00
Martin von Zweigbergk
5f3df4aaea revset: resolve "@" symbol's workspace id earlier (while parsing)
We resolve file paths into repo-relative paths while parsing the
revset expression, so I think it's consistent to also resolve which
workspace "@" refers to while parsing it. That means we won't need the
workspace context both while parsing and while resolving symbols.

In order to break things like `author("martinvonz@")` (thanks to @yuja
for catching this), I also changed the parsing of working-copy
expressions so they are not allowed to be
quoted. `author(martinvonz@)` will therefore be an error now. That
seems like a small improvement anyway, since we have recently talked
about making `root` and `[workspace]@` not parsed as other symbols.
2023-08-20 17:57:18 -07:00
Martin von Zweigbergk
f9b3211d58 revset: drop an unnecessary return keyword 2023-08-20 17:57:18 -07:00
Yuya Nishihara
b6794ca04a revset: rename literal:"" prefix to exact:""
Per discussion in #2107, I believe "exact" is preferred.

We can also change the default to exact match, but it doesn't always make
sense. Exact match would be useful for branches(), but not for description().
We could define default per predicate function, but I'm pretty sure I cannot
remember which one is which.
2023-08-19 11:33:57 +09:00
Yuya Nishihara
ebdc22a65e revset: add support for explicit substring:"..." prefix
git-branchless calls it a substring, so let's do the same.

FWIW, I copied literal:_ from Mercurial, but it's exact:_ in git-branchless.
I have no idea which one is preferred. Since this feature isn't released, we
can freely change it if exact:_ makes more sense.

https://github.com/arxanas/git-branchless/wiki/Reference:-Revsets#patterns
2023-08-19 10:32:59 +09:00
Emily Fox
3f8ac2198d commits: use empty strings instead of placeholders for missing name or email
This commit replaces the functions `UserSettings::user_name_placeholder()`` and
`UserSettings::user_email_placeholder()` with `const` `&str`s to emphasize that
the placeholder strings must not be changed to support commits without
names or email addresses made before this change.
2023-08-18 17:22:59 -05:00
Emily Fox
2c88da02b4 git: teach backend to handle empty name and email strings 2023-08-18 17:22:59 -05:00
Benjamin Saunders
4bd05e8285 tests: hack around broken lint 2023-08-17 19:29:38 -07:00
Benjamin Saunders
417035cb20 tests: validate snapshot.max-new-file-size behavior 2023-08-17 19:29:38 -07:00
Benjamin Saunders
54f1d310c4 testutils: propagate snapshot errors 2023-08-17 19:29:38 -07:00
Benjamin Saunders
6c4b8a7383 settings: support human-readable byte sizes for max-new-file-size 2023-08-17 19:29:38 -07:00
Ben Saunders
351e7feef5 working_copy: don't snapshot new files larger than 1MiB by default 2023-08-17 19:29:38 -07:00
Martin von Zweigbergk
7ad2270c05 working_copy: pass Merge, not ConflictId, to write_conflict()
This is another small step towards making this code work with
tree-level conflicts.
2023-08-16 22:59:12 -07:00
Martin von Zweigbergk
1571541214 working_copy: combine blocks for updating added/modified paths
There's a lot of duplication between the blocks of code for updating
modified and added paths. This commit combines them.
2023-08-16 22:59:12 -07:00
Martin von Zweigbergk
01a6578ada working_copy: move up special case for exec-bit-only change
This is also just to make the next change simpler.
2023-08-16 22:59:12 -07:00
Martin von Zweigbergk
8ded5ae03b working_copy: convert Diff into Options for matching
This just a little refactoring to make the next step of sharing code
between `Modified` and `Added` simpler.
2023-08-16 22:59:12 -07:00
Martin von Zweigbergk
5b8c1e013f working_copy: add a helper for getting the current tree
The code for getting the current tree object was repeated a few times
over. I'm going to soon make it return a `MergedTree` and I don't want
to repeat that code (it's more complicated than the current code).
2023-08-16 22:59:12 -07:00
Yuya Nishihara
4c3265477b revset: remove unused MineWithoutUserName error variant
user_email is always available to the revset parser.
2023-08-17 07:49:24 +09:00
Yuya Nishihara
810d4eeef2 revset: use exact match for mine() 2023-08-17 07:42:12 +09:00
Yuya Nishihara
7da7356ef7 revset: add fast path to look up branches by literal name 2023-08-17 07:42:12 +09:00
Yuya Nishihara
81f1ae38b3 revset: add literal:"string" pattern syntax
The syntax is slightly different from Mercurial. In Mercurial, a pattern must
be quoted like "<kind>:<needle>". In JJ, <kind> is a separate parsing node, and
it must not appear in a quoted string. This allows us to report unknown prefix
as an error.

There's another subtle behavior difference. In Mercurial, branch(unknown) is
an error, whereas our branches(literal:unknown) is resolved to an empty set.
I think erroring out doesn't make sense for JJ since branches() by default
performs substring matching, so its behavior is more like a filter.

The parser abuses DAG range syntax for now. It can be rewritten once we remove
the deprecated x:y range syntax.
2023-08-17 07:42:12 +09:00
Yuya Nishihara
5b3c73dfc4 revset: insert StringPattern enum to add support for other kind of matching 2023-08-17 07:42:12 +09:00
Emily Fox
9ba9ecd708 revset: add function mine() 2023-08-16 11:00:14 -05:00
Alexander Potashev
2d81cf9156 Fix build when bstr is also imported.
Add type annotation to `vec` to avoid the following build error if you
additionally import `bstr`:

```
~/jj> cargo test
   Compiling jj-lib v0.8.0 (/home/aspotashev/jj/lib)
warning: unused import: `bstr`
  --> lib/src/default_index_store.rs:30:5
   |
30 | use bstr;
   |     ^^^^
   |
   = note: `#[warn(unused_imports)]` on by default

error[E0282]: type annotations needed
   --> lib/src/default_index_store.rs:564:14
    |
564 |             .as_mut()
    |              ^^^^^^
565 |             .write_u32::<LittleEndian>(parent_overflow.len() as u32)
    |              --------- type must be known at this point
    |
help: try using a fully qualified path to specify the expected types
    |
563 |         <[u8] as AsMut<T>>::as_mut(&mut buf[parent_overflow_offset..parent_overflow_offset + 4])
    |         +++++++++++++++++++++++++++++++                                                        ~

For more information about this error, try `rustc --explain E0282`.
warning: `jj-lib` (lib) generated 1 warning
error: could not compile `jj-lib` (lib) due to previous error; 1 warning emitted
```

Reason to support `bstr` being imported: in a Bazel environment where
crates are imported with certain features enabled, jj-lib may pull in
bstr as part of the following dependency chain:
  jj-lib -> insta -> similar -> bstr.
2023-08-15 22:47:56 +02:00
Vamsi Avula
3869b7c2ac cli: refactor default_description to UserSettings 2023-08-15 21:25:50 +05:30
Martin von Zweigbergk
9138bb5517 working_copy: use MergedTree for current tree when snapshotting
We now have all the pieces in place to read the current tree as a
`MergedTree` when snapshotting the working copy. For now, it's still
always a legacy tree. We'll need to update the working copy state file
to support storing multiple trees before we can create a `MergedTree`
with multiple sides here.
2023-08-15 07:56:55 -07:00
Martin von Zweigbergk
c126e75b2b working_copy: make write_path_to_store() work with merged values
For tree-level conflicts, we're going to be getting
`Merge<Option<TreeValue>>` from the current tree and produce a new
such value if contents changes on disk. This commit gets us a little
closer to that by passing in a value of that type into
`write_path_to_store()`.

This seems to have a small but measurable performance
impact. Snapshotting the working copy in the git repo with all files
`touch`ed went from 2.36 s to 2.43 s (3%). I think that's okay,
especially since most files' mtimes rarely change, and we only pay the
price when it has.
2023-08-15 07:56:55 -07:00
Martin von Zweigbergk
1d55a404cc merged_tree: add path_value() 2023-08-15 07:56:55 -07:00
Martin von Zweigbergk
2238c87da1 merged_tree: import create_tree() in tests to reduce line wrapping 2023-08-15 07:56:55 -07:00
Martin von Zweigbergk
3f97a6da78 working_copy: avoid adding unchanged values to tree builder
If the value at a path hasn't changed, there's no need to send it over
the channel and have the receiver add it to `TreeBuilder`. I couldn't
measure any performance impact.

Now we should no longer send `TreeValue::Conflict` variants over the
tree entry channel.
2023-08-14 23:32:52 -07:00
Martin von Zweigbergk
eacdad3ebd working_copy: move writing of conflicts to receiver side of channel
When writing tree-level conflicts, we're going to be writing multiple
tree (maybe using some new `MergedTreeBuilder`), so we'll need the
full `Merge<Option<TreeValue>>` object. This gets us closer to that by
sending such objects over the channel and having the receiver write
the conflict object.

Note that we still sometimes send `TreeValue::Conflict` variants over
the channel. That only happens if they're unchanged.
2023-08-14 23:32:52 -07:00
Martin von Zweigbergk
03f00bbf30 working_copy: return Merge<Option<TreeValue>> over channel
When writing tree-level conflicts, we won't pass `TreeValue::Conflict`
over the `tree_entries` channel. Instead, we're going to pass possibly
unresolved `Merge<Option<TreeValue>>` instances. This commit prepares
for that by changing the type even though we'll only pass
`Merge::normal()` over the channel at this point.

I did this partly to see what the performance impact is. I tested that
by touching all files in the git.git repo to force the trees (and
files) to be rewritten. There was no measurable impact at all
(best-of-10 time was 2.44 s before and 2.40 s after, but I assume that
was a fluke).
2023-08-14 23:32:52 -07:00
Martin von Zweigbergk
6c5d6d7e39 working_copy: delete duplicate comment
I copied a comment that I should have just moved in 37a770e8b4.
2023-08-14 23:32:52 -07:00
Martin von Zweigbergk
4eadb06251 working_copy: propagate errors from writing conflict parts to store 2023-08-14 23:32:52 -07:00
Yuya Nishihara
6286cde543 index: import commits in chronological order
This basically means that heads in a filtered graph appear in reverse
chronological order. Before, "jj log -r 'tags()'" in linux-stable repo would
look randomly sorted once you ran "jj debug reindex" in it.

With this change, indexing is more like breadth-first search, and BFS is
known to be bad at rendering nice graph (because branches run in parallel.)
However, we have a post process to group topological branches, so we don't
have this problem. For serialization formats like Mercurial's revlog iirc,
BFS leads to bad compression ratio, but our index isn't that kind of data.

Reindexing gets slightly slower, but I think this is negligible.

  (in Git repository)
  % hyperfine --warmup 3 --runs 10 "jj debug reindex --ignore-working-copy"
  (original)
  Time (mean ± σ):      1.521 s ±  0.027 s    [User: 1.307 s, System: 0.211 s]
  Range (min … max):    1.486 s …  1.573 s    10 runs
  (new)
  Time (mean ± σ):      1.568 s ±  0.027 s    [User: 1.368 s, System: 0.197 s]
  Range (min … max):    1.531 s …  1.625 s    10 runs

Another idea is to sort heads chronologically and run DFS-based topological
sorting. It's ad-hoc, but worked surprisingly well for my local repositories.
For repositories with lots of long-running branches, this commit will provide
more predictable result than DFS-based one.
2023-08-15 15:03:45 +09:00
Yuya Nishihara
cc6e9150d5 dag_walk: add topological sort that runs Kahn's algorithm with heap queue
This is a bit more involved than DFS-based implementation, but it allows us
to sort commits chronologically without breaking topological ordering.
2023-08-15 15:03:45 +09:00
Martin von Zweigbergk
f1b817e8ca cleanup: fix warnings from nightly clippy 2023-08-14 22:11:56 -07:00
Martin von Zweigbergk
b16fd3b6b9 conflicts: combine loops for adds/removes in update_from_content()
Similar to the previous commit, now that we can `Merge::iter()`, we
can combine that with `zip()` and simplify.
2023-08-14 08:44:38 -07:00
Martin von Zweigbergk
f45b8052e1 conflicts: check earlier for edited absent part in conflict markers
With the new `Merge::iter()`, we can simplify the code a bit by
combining that with `zip`.

I'll simplify the last part of `update_from_content()` next.
2023-08-14 08:44:38 -07:00
Martin von Zweigbergk
01ac97f999 merge: implement Iterator and FromIterator
Implementing `Iterator` and `FromIterator` on `Merge<T>` provides much
more flexibility than the current `map()`, `try_map()`, etc.

`Merge::from_iter()` wouldn't have a way of failing if it's given an
unexpected (even) number of items. I would be fine with having it
panic, but we can't even usefully do that, because
e.g. `Option::from_iter()` will pass us an iterator ends early if the
input interator ends early. For example,
`Merge::resolved(None).iter().collect()` would call
`Merge::from_iter()` with an empty iterator (first item `None`). So, I
instead created a `MergeBuilder` type implementing `FromIterator`, and
let `MergeBuilder::build()` panic if there were an even number of
items.

I re-implemented some existing `Merge` methods using the new
facilities in this commit. Maybe we should remove some of the methods.
2023-08-14 08:44:38 -07:00
Martin von Zweigbergk
dffe069985 conflicts: remove redundant check of num_sides from condition
Since `Merge` always has one more "adds" than "removes", there's no
need to check both of them. I really should have noticed this in
0b3b62a777.
2023-08-14 08:44:38 -07:00
Yuya Nishihara
7ddced7f3f git: scan new commits all at once from multiple heads
The visiting order is DFS from heads sorted in lexicographical order, but
I plan to change it to chronological order.
2023-08-14 07:48:55 +09:00
Yuya Nishihara
73a4b7f5bf repo: extract add_heads() that can import commits from multiple heads
This allows us to reorder commits to be indexed in bulk.

The incremental update optimization is applied only for a single head. This
could be tried for multiple heads, but it's unlikely that every head has
a single new commit for each.
2023-08-14 07:48:55 +09:00
Yuya Nishihara
157a0e748b git: add separate step to apply HEAD@git change
I'm going to extract a step to import new commits all at once.
2023-08-14 07:48:55 +09:00
Yuya Nishihara
359c871545 git: remove redundant id.clone() from diff_refs_to_import() 2023-08-14 07:48:55 +09:00
Martin von Zweigbergk
e414f3b73c cleanup: use fs:read() instead of File::open().read_to_end() 2023-08-13 14:04:59 +00:00
Martin von Zweigbergk
0b3b62a777 conflicts: remove redundant num_removes argument from parse_conflict()
Merges always have exactly one more "adds" than "removes" these days.
2023-08-13 09:54:16 +00:00
Yuya Nishihara
72271c0d1f repo: micro-optimize add_head() to not instantiate indexed commit object 2023-08-13 18:52:17 +09:00
Yuya Nishihara
15fb8b95b0 index: rewrite topological sort by leveraging dag_walk function
This is similar to what mut_repo.add_head() does.

I'm going to adjust the visiting order so the bulk-imported history preserves
chronological order. It might be a small adjustment on the current DFS
approach, or new function based on Kahn's algorithm. Either way, it's important
that both "jj git import" and "jj debug reindex" use the same underlying
function.
2023-08-13 18:52:17 +09:00