Performance on repositories with many commits is limited somewhat by repeatedly
stating the tablestore directory to work out what the head is. By caching the
table rather than looking it up from disk on every request, we can much more
rapidly satisfy requests.
This avoids the pathological case in #845 where jj operations take several
minutes to complete.
This patch doesn't change the normal flow of the write path: that will still
always call get_head() on the underlying TableStore, which will stat the
directory before writing out changes. It will however empty the cache when the
metadata has been written.
Fixes#845.
The implementation has some hoops to jump through because Rust does not allow
`self: &Arc<Self>` on trait methods, and two of the OpHeadsStore functions need
to return cloned selves. This is worked around by making the implementation type
itself a wrapper around Arc<>.
This is not particularly note worthy for the current implementation type where
the only data copied is a PathBuf, but for extensions it is likely to be more
critical that the lifetime management of the OpHeadsStore is properly
maintained.
Since we call `cargo_out_dir()` - which is the preferred way of using
`protobuf_codegen::Codegen` in `build.rs` - our call to `out_dir()`
has no effect.
I ran an upgraded Clippy on the codebase. All the changes seem to be
about using variables directly in format strings instead of passing
them as separate arguments.
While working on ancestor generation, I noticed Mercurial has this
substitution rule. Since it's easier to deal with Ancestors() than Range {},
'roots..heads' is first decomposed to ':heads & ~:roots'.
I failed to solve type puzzle for to_predicate_fn<'a>(&'a self) where
'repo: 'a, so struct RevWalkRevset<'repo, T> is bounded by T to consume
the lifetime parameter.
This will be a building block of 'parents(base)' revset. 'base---' will
be .filter_by_generation(3..4) for example. I think 'ancestors(base)' can
also have an optional generation parameter, but I haven't considered any
particular syntax yet.
Even though I couldn't determine if RevWalkGenerationRange has a measurable
cost compared to RevWalk, I'm not comfortable with enabling generation
tracking by default. So this patch adds a separate struct. I duplicated
Iterator::next() method as it seemed rather complicated to extract a common
iterator wrapper.
Actual filtering function and tests will be added by the next commit.
We're more likely to filter out empty commits, so this should be slightly
faster in practice.
The extra Option<> isn't needed, but it should clarify that "prefix([])"
is not "everything".
This basically transforms 's1 & (f() | s2)' to
's1.iter().filter(all && f || s2)'. Still the predicate part includes "all",
the filter function doesn't need to load commit data for every entry since
's1.iter().filter(all)' is tested first. To optimize "all" predicate out,
maybe we can add a wrapper that returns '|_: &IndexEntry| true'.
Instead of inserting AsFilter(_) node, I could add a recursive is_filter()
function. That would also work so long as the height of RevsetExpression tree
is limited. I chose node insertion just for ease of snapshot testing.
The idea behind this is to extend RevWalk to track generation (or depth from
the initial wanted items.) Basic DAG walk doesn't need such data, but a query
like 'rev---' could be translated to a RevWalk yielding nth ancestors.
The default log revset can also be expressed as 0/1-th ancestors of
'(remote_branches() | tags())..'.
Also, this appears to be faster than using boundary sets, based on the
bench extracted from test_index_commits_criss_cross().
In order to optimize a query like '(author(_) | @) & main..', we'll probably
need a predicate form of an iterable set so that the query can be evaluated
to '(main..).iter().filter(author(_) | @)'. And if a predicate function can
terminate the source iterator early (by returning true/false/false_forever),
complexity of a filtered revset is basically the same as an intersection of
iterator pair. This means we can eventually merge IntersectionRevset with
FilterRevset.
With that in mind, this patch removes the redundant 'candidates' field from
the filter node, which would otherwise appear in the predicate function as
'candidates.contains(entry)'. A filter node with candidates was somewhat
useful while rewriting the tree, but that can be dealt with a view function
like as_filter_intersection() in this patch.
This also simplify the subsequent filter transformation as we no longer need
to test if candidates == All.
Previously we only have a test for the left recursion. The added test
contains right recursion path, which should have caught the error I made
while working on the next "unary filter node" patch.
The doc comment summarizes what I'm going to implement. I'm not sure if
we'll add all of them because revset evaluation isn't the key performance
bottleneck at the moment. Anyway, I don't think any of these ideas would
logically conflict with segmented changelog adaptation unless we decide to
replace the whole revset stack with Eden/Sapling's.