For large repos, it's useful to be able to use shorter change id and
commit id prefixes by resolving the prefix in a limited subset of the
repo (typically the same subset that you'd want to see in your default
log output). For very large repos, like Google's internal one, the
shortest unique prefix evaluated within the whole repo is practically
useless because it's long enough that the user would want to copy and
paste it anyway.
Mercurial supports this with its `revisions.disambiguatewithin` config
(added in https://www.mercurial-scm.org/repo/hg/rev/503f936489dd). I'd
like to add the same feature to jj. Mercurial's implementation first
attempts to resolve the prefix in the whole repo and then, if the
prefix was ambiguous, resolves it in the configured subset
instead. The advantage of doing it that way is that there's no extra
cost of resolving the revset defining the subset if the prefix was not
ambiguous within the whole repo. However, there are two important
reasons to do it differently in jj:
* We support very large repos using custom backends, and it's probably
  cheaper to resolve a prefix within the subset because it can all be
  cached on the client. Resolving the prefix within the whole repo
  requires a roundtrip to the server.
* We want to be able to resolve change id prefixes, which is always
  done in *some* revset. That revset is currently `all()`, i.e. all
  visible commits. Even on local disk, it's probably cheaper to
  resolve a small revset first and then resolve the prefix within that
  than it is to build up the index of all visible change ids.
We could achieve the goal by letting each revset engine respect the
configured subset, but since the solution proposed above makes sense
also for local-disk repos, I think it's better to do it outside of the
revset engine, so all revset engines can share the code.
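To make the intended lookup order concrete, here is a minimal sketch of
subset-first prefix resolution. The names (`PrefixResolution`,
`resolve_prefix`) and the use of plain string slices for ids are assumptions
made for illustration, not the actual jj API. Falling back to the whole repo
when the subset has no match is also an assumption.

```rust
/// Outcome of resolving an id prefix against a set of ids (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum PrefixResolution {
    NoMatch,
    SingleMatch(String),
    AmbiguousMatch,
}

/// Resolves `prefix` against `ids`, which are assumed to be unique.
fn resolve_in<'a>(prefix: &str, ids: impl IntoIterator<Item = &'a str>) -> PrefixResolution {
    let mut matches = ids.into_iter().filter(|id| id.starts_with(prefix));
    match (matches.next(), matches.next()) {
        (None, _) => PrefixResolution::NoMatch,
        (Some(id), None) => PrefixResolution::SingleMatch(id.to_string()),
        (Some(_), Some(_)) => PrefixResolution::AmbiguousMatch,
    }
}

/// Tries the small (client-cached) subset first. Only if nothing in the
/// subset matches do we pay for a lookup in the whole repo (assumption).
pub fn resolve_prefix(prefix: &str, subset: &[&str], all_ids: &[&str]) -> PrefixResolution {
    match resolve_in(prefix, subset.iter().copied()) {
        PrefixResolution::NoMatch => resolve_in(prefix, all_ids.iter().copied()),
        resolution => resolution,
    }
}
```

With this ordering, a prefix that is ambiguous in the whole repo can still
resolve to a single commit as long as it is unique within the subset.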
This commit prepares for the new functionality by moving the symbol
resolution out of `Index::evaluate_revset()`.
The callers don't need to hold on to the revset expression once it's
been evaluated, and having an owned expression (well, an expression
with shared ownership) will avoid a clone in the next commit.
A mapped template is basically a combined function that takes context: &C,
extracts Vec<O>, and formats each item with Template<C>. It cannot be cleanly
turned into a function of (&C) -> Vec<Template<()>> type. So list-like methods
are implemented on Box<dyn ListTemplate<C>> instead.
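A rough sketch of why the mapped template stays a single combined function.
The trait shapes, `MapTemplate`, `Commit`, and `parents_template()` below are
simplified stand-ins invented for illustration. In particular, the per-item
formatter here takes the item as an explicit argument, which is an assumption
and not how the real templater passes it.

```rust
/// Renders a context `C` into an output buffer.
pub trait Template<C> {
    fn format(&self, context: &C, out: &mut String);
}

/// A template over a list of items, which can additionally be joined.
pub trait ListTemplate<C>: Template<C> {
    fn join(&self, context: &C, separator: &str, out: &mut String);
}

/// Combined function: extracts `Vec<O>` from the context, then formats each
/// item. The per-item formatter still needs the outer context, which is why
/// it can't be split into independent, context-free templates.
pub struct MapTemplate<C, O> {
    pub extract: Box<dyn Fn(&C) -> Vec<O>>,
    pub format_item: Box<dyn Fn(&C, &O, &mut String)>,
}

impl<C, O> ListTemplate<C> for MapTemplate<C, O> {
    fn join(&self, context: &C, separator: &str, out: &mut String) {
        for (i, item) in (self.extract)(context).iter().enumerate() {
            if i > 0 {
                out.push_str(separator);
            }
            (self.format_item)(context, item, out);
        }
    }
}

impl<C, O> Template<C> for MapTemplate<C, O> {
    /// Plain formatting is just a join with a single space.
    fn format(&self, context: &C, out: &mut String) {
        self.join(context, " ", out);
    }
}

/// Hypothetical context type used only for this example.
pub struct Commit {
    pub id: String,
    pub parents: Vec<String>,
}

/// List-like methods are reached through the boxed trait object.
pub fn parents_template() -> Box<dyn ListTemplate<Commit>> {
    Box::new(MapTemplate::<Commit, String> {
        extract: Box::new(|commit: &Commit| commit.parents.clone()),
        format_item: Box::new(|commit: &Commit, parent: &String, out: &mut String| {
            out.push_str(parent);
            out.push_str(" <- ");
            out.push_str(&commit.id);
        }),
    })
}
```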
I'm going to add a trait that provides .join() -> Box<dyn Template>.
wrap_template() should handle it transparently, but the current interface
would require excessive boxing.
This involves a little hack to insert a lambda parameter 'x' to be used at
a keyword position. If the template language were dynamically typed (and
interpreted), the .map() implementation would be simpler. I considered that,
but an interpreter version has its own warts (late error reporting, difficulty
caching static objects, etc.), and I don't think the current template engine is
complex enough to justify a rewrite from scratch.
.map() returns a template, which can't be join()-ed. This will be fixed later.
A lambda expression will be allowed only in a .map() operation. The syntax is
borrowed from Rust closures.
In Mercurial, a map operation is implemented by context substitution. For
example, 'parents % "{node}"' prints parents[i].node for each parent. There are
two major problems: 1. the top-level context cannot be referred to from the
inner map expression; 2. contexts of different types insert arbitrarily-named
keywords (e.g. a dict type inserts "{key}" and "{value}", but how would the user
know?). These issues can be avoided by using explicitly named parameters.
  parents.map(|parent| parent.commit_id ++ " " ++ commit_id)
                                                  ^^^^^^^^^ global keyword
A downside is that we can't reuse a template fragment in a map expression.
Suppose we have -T commit_summary; then -T 'parents.map(commit_summary)'
doesn't work.
# only usable as a top-level template
'commit_summary' = 'commit_id.short() ++ " " ++ description.first_line()'
Another problem is that a lambda expression might be confused with an alias
function.
# .map(f) doesn't work, but .map(g) does
'f(x)' = 'x'
'g' = '|x| x'
The `jj debug` commands are hidden from help and are described as
"Low-level commands not intended for users", but e.g. `jj debug
completion` is intended for users, and should be visible in the help
output.
By using one letter for the path type before and one letter for path
type after, we can encode much more information than just the current
'M'/'A'/'R'. In particular, we can indicate new and resolved
conflicts. The color still encodes the same information as before. The
output looks a bit weird after many years of using `hg status`. It's a
bit more similar to the `git status -s` format, with one letter for the
index and one for the working copy. Will we get used to it and find
it useful?
I'm going to add a lambda expression, and the current type-error message
wouldn't work for the lambda type. I also renamed "argument" to "expression"
as the expect_<type>() helper may be called against any expression node.
This is similar to the structure of RevsetParseError. It's unlikely we would
need to discriminate parsing errors, so let's avoid wasting time on naming
things.
In the templater, it's easier to handle an invalid format string at the parsing
stage, so I want to build the formatting items upfront. Since the formatting
items borrow the input string by reference, we need to manually convert them to
the owned variants.
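The borrowed-to-owned conversion can be sketched with a toy format parser.
`FormatItem`, `parse_format()`, and `into_owned()` below are made-up,
std-only stand-ins that only mimic the general shape of a strftime item list.
They are not chrono's API, and the three supported specifiers are an arbitrary
choice for the example.

```rust
/// One parsed piece of a strftime-like format string (toy stand-in type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum FormatItem<'a> {
    /// Literal text borrowed from the format string.
    Literal(&'a str),
    /// Literal text that owns its data.
    OwnedLiteral(Box<str>),
    Year,
    Month,
    Day,
}

/// Parses `format` once, so an invalid specifier is reported at parse time.
pub fn parse_format(format: &str) -> Result<Vec<FormatItem<'_>>, String> {
    let mut items = Vec::new();
    let mut rest = format;
    while !rest.is_empty() {
        if let Some(after_percent) = rest.strip_prefix('%') {
            let mut chars = after_percent.chars();
            let item = match chars.next() {
                Some('Y') => FormatItem::Year,
                Some('m') => FormatItem::Month,
                Some('d') => FormatItem::Day,
                other => return Err(format!("invalid format specifier: {other:?}")),
            };
            items.push(item);
            rest = chars.as_str();
        } else {
            // Everything up to the next '%' is literal text.
            let end = rest.find('%').unwrap_or(rest.len());
            items.push(FormatItem::Literal(&rest[..end]));
            rest = &rest[end..];
        }
    }
    Ok(items)
}

/// Converts borrowed literals to owned ones so the items can outlive the
/// format string they were parsed from.
pub fn into_owned(items: Vec<FormatItem<'_>>) -> Vec<FormatItem<'static>> {
    items
        .into_iter()
        .map(|item| match item {
            FormatItem::Literal(s) => FormatItem::OwnedLiteral(s.into()),
            FormatItem::OwnedLiteral(s) => FormatItem::OwnedLiteral(s),
            FormatItem::Year => FormatItem::Year,
            FormatItem::Month => FormatItem::Month,
            FormatItem::Day => FormatItem::Day,
        })
        .collect()
}
```

The owned list can then be stored in the built template and reused for every
timestamp without re-parsing the format string.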
While measuring the overhead of an interpreter version of the template engine,
I noticed the templater spends some time in chrono. I don't think this would
matter in practice, but it's easy to cache the formatting items.
% jj log -r'all()' -T'".\n"' --no-graph | wc -l
2996
% hyperfine --warmup 3 --runs 20 "jj log --ignore-working-copy -r 'all()' -Tshow --no-graph"
(original)
Time (mean ± σ): 120.0 ms ± 18.7 ms [User: 97.5 ms, System: 22.5 ms]
Range (min … max): 96.7 ms … 144.1 ms 20 runs
(new)
Time (mean ± σ): 106.2 ms ± 12.3 ms [User: 86.1 ms, System: 20.1 ms]
Range (min … max): 96.3 ms … 130.4 ms 20 runs
Regarding the template engine rewrite, I'm not yet sure that the interpreter
version is strictly better. It's simpler, but it could make caching more
difficult. So I'm not gonna replace the engine anytime soon.
We want to allow custom revset engines to define their own graph
iterator. This commit helps with that by adding a
`Revset::iter_graph()` function that returns an abstract iterator.
The current `RevsetGraphIterator` can be configured to skip or include
transitive edges. It skips them by default and we don't expose an option
for it in the CLI. I didn't bother including that functionality in the new
`iter_graph()` either. At least for now, it will be up to the
implementation whether it includes such edges (it would of course be
free to ignore the caller's request even if we added an option for it
in the API).
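As an illustration of what "an abstract iterator" could look like, here is a
small sketch. The trait signature, `GraphEdge`, the `u32` commit ids, and
`LinearRevset` are all simplified assumptions for the example, not the real
definitions.

```rust
/// An edge from a graph node to another node (hypothetical, simplified).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct GraphEdge {
    pub target: u32,
}

pub trait Revset {
    /// Yields each commit together with its outgoing edges. Whether
    /// transitive edges are included is left to the implementation.
    fn iter_graph<'a>(&'a self) -> Box<dyn Iterator<Item = (u32, Vec<GraphEdge>)> + 'a>;
}

/// Toy engine: a linear history stored newest-first.
pub struct LinearRevset {
    pub commits: Vec<u32>,
}

impl Revset for LinearRevset {
    fn iter_graph<'a>(&'a self) -> Box<dyn Iterator<Item = (u32, Vec<GraphEdge>)> + 'a> {
        Box::new(self.commits.iter().enumerate().map(move |(i, &id)| {
            // In a linear history, each commit has at most one edge: to the
            // next (older) commit in the list.
            let edges: Vec<GraphEdge> = self
                .commits
                .get(i + 1)
                .map(|&parent| GraphEdge { target: parent })
                .into_iter()
                .collect();
            (id, edges)
        }))
    }
}
```

Callers only see the boxed iterator, so a custom engine is free to compute the
edges however it likes.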
This commit adds an `evaluate_revset()` function to the `Index`
trait. It will require some further cleanup, but it already achieves
the goal of letting the index implementation decide which revset
engine to use.
We want to allow customization of the revset engine, so it can query
server indexes, for example. The current revset implementation will be
our default implementation for now. What's left in the `revset` module
after this commit is mostly parsing code.
Now that there's a single implementation of `Revset`, I think it makes
more sense for `is_empty()` to be defined there. Maybe different
revset engines have different ways of implementing it. Even if they
don't, this is trivial to re-implement in each revset engine.
As the comment above `ToPredicateFn` says, it could be a private
type. This commit makes that happen by making the private `Revset`
implementations (`DifferenceRevset` etc.) instead implement an
internal revset type called `InternalRevset`. That type is what
extends `ToPredicateFn`, so the public type doesn't have to. The new
type will not need to implement the new functions I'm about to add to
the `Revset` trait.
We don't want the public `Revset` interface to know about
`ToPredicateFn`. In order to hide it, I'm wrapping the internal type
in another type, so only the internal type can keep implementing
`ToPredicateFn`.
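The wrapping can be sketched roughly as follows. The trait and type names
follow the ones mentioned above, but the method signatures, the `u32` ids,
`EagerRevset`, `RevsetImpl`, and the `difference()` constructor are assumptions
made up for this example.

```rust
/// Internal-only capability; not part of the public interface.
trait ToPredicateFn {
    fn to_predicate_fn(&self) -> Box<dyn Fn(u32) -> bool + '_>;
}

/// Internal revset type. Only this trait extends `ToPredicateFn`.
trait InternalRevset: ToPredicateFn {
    fn iter(&self) -> Box<dyn Iterator<Item = u32> + '_>;
}

/// Public interface, with no mention of `ToPredicateFn`.
pub trait Revset {
    fn iter(&self) -> Box<dyn Iterator<Item = u32> + '_>;
    fn is_empty(&self) -> bool;
}

/// Toy leaf revset backed by a list of ids.
struct EagerRevset {
    ids: Vec<u32>,
}

impl ToPredicateFn for EagerRevset {
    fn to_predicate_fn(&self) -> Box<dyn Fn(u32) -> bool + '_> {
        Box::new(move |id: u32| self.ids.contains(&id))
    }
}

impl InternalRevset for EagerRevset {
    fn iter(&self) -> Box<dyn Iterator<Item = u32> + '_> {
        Box::new(self.ids.iter().copied())
    }
}

struct DifferenceRevset {
    set1: Box<dyn InternalRevset>,
    set2: Box<dyn InternalRevset>,
}

impl ToPredicateFn for DifferenceRevset {
    fn to_predicate_fn(&self) -> Box<dyn Fn(u32) -> bool + '_> {
        let p1 = self.set1.to_predicate_fn();
        let p2 = self.set2.to_predicate_fn();
        Box::new(move |id: u32| p1(id) && !p2(id))
    }
}

impl InternalRevset for DifferenceRevset {
    fn iter(&self) -> Box<dyn Iterator<Item = u32> + '_> {
        // Recursive calls go straight to the internal interface.
        let excluded = self.set2.to_predicate_fn();
        Box::new(self.set1.iter().filter(move |&id| !excluded(id)))
    }
}

/// Public wrapper around the internal type.
pub struct RevsetImpl {
    inner: Box<dyn InternalRevset>,
}

impl Revset for RevsetImpl {
    fn iter(&self) -> Box<dyn Iterator<Item = u32> + '_> {
        self.inner.iter()
    }

    fn is_empty(&self) -> bool {
        self.iter().next().is_none()
    }
}

/// Hypothetical entry point that builds `set1 ~ set2`.
pub fn difference(set1: Vec<u32>, set2: Vec<u32>) -> RevsetImpl {
    RevsetImpl {
        inner: Box::new(DifferenceRevset {
            set1: Box::new(EagerRevset { ids: set1 }),
            set2: Box::new(EagerRevset { ids: set2 }),
        }),
    }
}
```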
I'd like to be able to change the return type of `evaluate_revset()`
to be an internal type. Since all external callers currently call the
function via `RevsetExpression::evaluate()`, it turns out it's easy to
make it private. To benefit from an internal type, we also need to
make the recursive calls be directly to the internal function.
@joyously found `o` confusing because it's a valid change id prefix. I
don't have much preference, but `●` seems fine. The "ascii",
"ascii-large", and "legacy" graph styles still use "o".
I didn't change `@` since it seems useful to have that match the
symbol used on the CLI. I don't think we want to have users do
something like `jj co ◎-`.
I'm about to make the default (non-working-copy) node symbol be a
unicode symbol, but we only want that when using a unicode graph, so
users with a terminal that doesn't support unicode can get plain ASCII
output by setting e.g. `ui.graph.style = "ascii"`.
The command grepped for 'o ' and picked the third line. That was meant
to match the graph nodes only, but it also matched the 'jj co master'
line. Let's match only 'o' at the beginning of the line, and throw in
another space for good measure (since that's what we get from the new
default graph style from Sapling).