I'm going to rewrite the error output to print source chain one by one.
The "{message}: {err}" pattern is wrapped in an error struct. InternalError
variant could be a { message, source } pair, but not all errors need an
additional message. This will also apply to UserError.
This prevents rust doctests from trying to compile the code blocks.
I don't think we use doctests much or at all, but I'd like
`cargo insta test --test-runner nextest --accept --workspace`
to pass.
Unfortunately, this affects the output of `jj help` as well, but I
can't think of a workaround for that.
I am using a very hacky approach, using `insta` to generate the markdown help.
This is based on a lucky coincidence that `insta` chose to use a header
format for snapshot files that MkDocs interprets as a Markdown header (and
ignores).
I considered several other approaches, but I couldn't resist the facts that:
- This doesn't require new developers to install any extra tools or run any
extra commands.
- There is no need for a new CI check.
- There is no need to compile `jj` in the "Make HTML docs" GitHub action,
which is currently very fast and cheap.
Downside: I'm not sure how well MkDocs will work on Windows, unless the
developer explicitly enables symbolic links (which IIUC is not trivial).
### Possible alternatives
My next favorite approaches (which we could switch to later) would be:
#### `xtask`
Set up a CI check and a [Cargo `xtask`] so that `cargo xtask cli-help-to-md`
essentially runs `cargo run -- util markdown-help > docs/cli-reference.md` from
the project root.
Every developer would have to know to run `cargo xtask cli-help-to-md` if
they change the help text.
Eventually, we could have `cargo xtask preflight` that runs `cargo +nightly
fmt; cargo xtask cli-help-to-md; cargo nextest run`, or `cargo insta`.
#### Only generate markdown for CLI help when building the website, don't track it in Git.
I think that having the file in the repo will be nice to preview changes to
docs, and it'll allow people to consult the file on GitHub or in their repo.
The (currently) very fast job of building the website would now require
installing Rust and building `jj`.
#### Same as the `xtask`, but use a shell script instead of an `xtask`
An `xtask` might seem like overkill, since it's Rust instead of a shell script.
However, I don't want this to be a shell script so that new contributors on
Windows can still easily run it ( since this will be necessary for the CI to
pass) without us having to support a batch file.
#### Cargo Alias
My first attempt was to set up a [cargo alias] to run this, but that doesn't
support redirection (so I had to change the `jj util` command to output to a
file) and, worse, is incapable of executing the command *in the project root*
regardless of where in the project the current directory is. Again, this seems
to be too inconvenient for a command that every new PR author would have to run
to pass CI.
Overall, this just seems a bit ugly. I did file
https://github.com/rust-lang/cargo/issues/13348, I'm not really sure that was
worthwhile, though.
**Aside:** For reference, the alias was:
```toml
# .cargo/config.toml
alias.gen-cli-reference = "run -p jj-cli -- util markdown-help docs/cli-reference.md"
```
### Non-alternatives
#### Clap's new feature
`clap` recently obtained a similarly-sounding feature in
https://github.com/clap-rs/clap/pull/5206. However, it only prints short help
for subcommands and can't be triggered by an option AFAICT, so it won't help us
too much.
[Cargo `xtask`]: https://github.com/matklad/cargo-xtask
[cargo alias]: https://doc.rust-lang.org/cargo/reference/config.html#alias
This uses the [`clap-markdown`] library. It's not very flexible, but seems to
work. Its implementation is not difficult. If needed, we could later
reimplement the part that iterates over all subcommands and have a different
output (e.g., the boring version of each help text inside its own code block,
or a different file per subcommand, or some fancy template in handlebars or
another rust-supported templating language). I don't want to do it right now,
though.
The library does turn out to have some annoying bugs, e.g.
https://github.com/ConnorGray/clap-markdown/pull/18, but I think that's not a
deal-braker. The fix seems to be 3 lines; if the fix doesn't get merged soon, we
could vendor the library maybe?
[`clap-markdown`]: https://docs.rs/clap-markdown/latest/clap_markdown/
GitBackend::gc() will need to check if a commit is reachable from any
historical operations. This could be calculated from the view and commit
objects, but the Index will do a better job.
The "::HEAD" set is usually smaller than "::git_refs". If these sets were
imported in that order, "HEAD..git_refs" commits would be indexed on top of
the "::HEAD" commits. It's not a problem, but undesirable.
I'm going to make WorkspaceCommandHelper::maybe_snapshot() snapshot the working
copy before importing refs. git::import_some_refs() can rebase the working copy
branch and therefore @ can be moved. git::import_head() doesn't, and it should
be invoked before snapshotting.
git::import_head() is inserted to some of the git:import_refs() callers where
HEAD seems to matter. I feel it's a bit odd that the HEAD ref is imported to
non-colocated repo, but "jj init --git-repo" relies on that, and I think the
existence of HEAD@git is harmless. It's merely a ref to the revision checked
out somewhere else.
This is a convenient command, for scripting things like `cd $(jj root)
&& do something`, and it seems better to allow people to find it
before they learn about workspaces.
This goes with `c`, `s`, `f`. I feel like it's a relatively
common command now that `auto-local-branch` defaults to
`false`.
I didn't add `u` for `untrack` because it seems a little
ambiguous (un-what?); there may be other `branch` commands that undo
something in the future.
The count() function in this trait is used by "jj branch" to determine
(and then report) how many commits a certain branch is ahead/behind
another branch. This is currently implemented by walking all commits
in the revset, counting how many were encountered. But this could be
improved: if the number is large, it is probably sufficient to report
"at least N" (instead of walking all the way), and this does not scale
well to jj backends that may not have all commits present locally (which
may prefer to return an estimate, rather than access the network).
Therefore, add a function that is explicitly documented to be O(1)
and that can return a range of values if the backend so chooses.
Also remove count(), as it is not immediately obvious that it is an
expensive call, and callers that are willing to pay the cost can obtain
the exact same functionality through iter().count() anyway. (In this
commit, all users of count() are migrated to iter().count() to preserve
all existing functionality; they will be migrated to count_estimate() in
a subsequent commit.)
"branch" needed to be updated due to this change. Although jj
is currently only available in English, I have attempted to keep
user-visible text from being assembled piece by piece, so that if we
later decide to translate jj into other languages, things will be easier
for translators.
A subsequent commit will teach more variants of ahead/behind messages,
so refactor the existing code to avoid code duplication and a future
combinatorial explosion.
Gerrit also needs to be able to push low-level refs into upstream remotes, just
like `jj git push` does. Doing so requires providing callbacks e.g. for various
password entry mechanisms, which was private to the `git` command module.
Pull these out to a new module `git_utils` so we can reuse it across the two
call sites.
This also moves a few other strictly Git-related functions into `git_utils`
as well, just for the sake of consistency.
Signed-off-by: Austin Seipp <aseipp@pobox.com>
* Move canonicalization of the external git repo path into the Workspace::init_git_external().
This keeps necessary code together.
* Add a new variant of WorkspaceInitError for reporting path not found errors. The user error
string is written to pass existing tests.
It seems obvious in hindsight to have a virtual root operation just
like we have a virtual root commit. It removes the same kind of
problems by making sure there's always a common ancestor (or multiple)
between any two commits.
I think the reason I didn't add a root operation from the beginning
was that there used to be a mandatory working-copy commit in the view
(this was before support for multiple workspaces).
Perhaps we should remove the "initialize repo" operation now. The only
difference between their view objects is that the "initialize repo"
operation adds the root commit as a head. We could add that to the
root operation, but then the root operation's value depends on the
commit backend.
We've had the public_heads for as long as we've had the View object,
IIRC (I didn't check), but we still don't use it for anything. I don't
have any concrete plans for using it either. Maybe our config for
immutable commits is good enough, or maybe we'll want something more
generic (like Mercurial's phases). For now, I think we should simplify
by removing it the storage for public heads.
This function respects --ignore-working-copy and --at-op arguments, but the
name snapshot() sounds like it could forcibly trigger the snapshotting. Let's
clarify the actual behavior.
import_git_refs_and_head() and snapshot_working_copy() are unchecked functions,
and there are no external callers.
When debugging behavior of badly-GCed repos, I find it's annoying that "op log"
fails because the index can't be loaded. Since "op log" doesn't need a repo, I
think it's better to display the exact op-heads state without merging.
I'm going to make "op log" not load a repo nor resolve concurrent operations,
so there will be the case where no unique "current" operation exists. Since the
"current" operation represents the repo to be loaded, I think it's better to
not consider multi-head operations as the "current" ones. That's why the
templater field isn't Vec<_> but Option<_>.
If indexing failed due to missing commit objects, the repo won't be loadable
without --ignore-working-copy (at least in colocated environment.) In that
case, we can use "op abandon" to recover, but we had to work around the failed
index loading by --ignore-working-copy. Since "op abandon" isn't a repo-level
command, it's better to bypass working-copy snapshot and import of git refs at
all.
--at-op is rejected because it's useless and we'll need extra care for "@"
expression resolution and working-copy updates.
The formatting is closer to hg than git just because it's easier to
conditionalize the whole line per keyword. Our template language doesn't
have infix logical operators.
Unlike the other default templates, all remote branches are displayed because
it's "detailed" output.
Closes#2509
If the existing index was corrupt, it would have to be rebuilt while loading a
repo (at least in colocated environment.) Then, the index will be rebuilt again.
Since new operations and views may be added concurrently by another process,
there's a risk of data corruption. The keep_newer parameter is a mitigation
for this problem. It's set to preserve files modified within the last 2 weeks,
which is the default of "git gc". Still, a concurrent process may replace an
existing view which is about to be deleted by the gc process, and the view
file would be lost.
#12
It doesn't make sense to do gc from a non-head operation because that means
either the head operation would be corrupted or the --at-op argument is
ignored.
We would like to use a non-static version string at Google. We have a
workaround using Lazy, but it seems unfortunate to have to do
that. Using dynamic strings requires Clap's `string` feature, which
increases the binary size by 140 kiB (0.6%).
I'm going to add try_from_hex(), which requires Self: Sized. Such trait bound
could be added, but I don't think we'll need abstracted ObjectId constructors
at all.
I'm going to add a prefix resolution method to OpStore, but OpStore is
unrelated to the index. I think ObjectId, HexPrefix, and PrefixResolution can
be extracted to this module.
In order to implement GC (#12), we'll need to somehow prune old operations.
Perhaps the easiest implementation is to just remove unwanted operation files
and put tombstone file instead (like git shallow.) However, the removed
operations might be referenced by another jj process running in parallel. Since
the parallel operation thinks all the historical head commits are reachable, the
removed operations would have to be resurrected (or fix up index data, etc.)
when the op heads get merged.
The idea behind this patch is to split the "op log" GC into two steps:
1. recreate operations to be retained and make the old history unreachable,
2. delete unreachable operations if the head was created e.g. 3 days ago.
The latter will be run by "jj util gc". I don't think GC can be implemented
100% safe against lock-less append-only storage, and we'll probably need some
timestamp-based mechanism to not remove objects that might be referenced by
uncommitted operation.
FWIW, another nice thing about this implementation is that the index is
automatically invalidated as the op id changes. The bad thing is that the
"undo" description would contain an old op id. It seems the performance is
pretty okay.
This removes CommandError dependency from these resolution functions. We might
want to refactor the error types again if we introduce a real "opset" evaluator.
The error message for unresolved op heads now includes "@" instead of the whole
expression.
The resolver callback usually returns wider error type, which I don't think
is a variant of OpHeadResolutionError.
To help type inference, resolver's error type is E, not E1 where E: From<E1>.
I'll probably add change id lookup methods to CompositeIndex. The Index trait
won't gain resolve_change_id_prefix(), but I also renamed its resolve_prefix()
for consistency.
This requires creating a new public API as a substitute. I took the opportunity
to also add some comments to the
`MutRepo::record_rewritten_commit`/`record_abandoned_commit` functions.
I imade the simplest possible addition to the API; it is not a very elegant
one. Eventually, the entire `record_rewritten_commit` API should probably be
refactored again.
I also added some comments explaining what these functions do.
At some point, I tried `new_commit` instead of `rewrite_commit` in the split
command. That seemed to work, but messed up the dates in a subtle way.
This commit should prevents repeats of this mistake and emphasize the
importance of the author dates being preserved.
This partially reverts 6c627fb30d "cli: default to log when no subcommand is
provided." We could reject an empty alias at all, but we would still need to
ensure that the expanded alias contained a subcommand name.
The help output is a bit odd as the <COMMAND> can be omitted, but I think
that's acceptable. If we do care about that, maybe we can override_usage().
This is basically the same as Mercurial's workaround. I don't know about Git,
but arguments order is very restricted in git, so -C path can be parsed prior
to alias expansion. In hg and jj, doing that would be messy and unreliable.
Closes#2414
It should be better to handle invalid -R path globally. The error message is
a bit worse, but I think it's still okay.
This helps to load temporary config from the cwd-relative path. If the command
processing continued with an invalid -R path, the temporary config would have
to be explicitly discarded.
Make it clearer what the command does, make the error message when the file is
not ignored less of a surprise.
Also, I think it's nice to mention `.git/info/exclude`, since the path is
not trivial to remember.
As far as I can see in the chat, there's no objection to changing the default,
and git.auto-local-branch = false is generally preferred.
docs/branches.md isn't updated as it would otherwise conflict with #2625. I
think the "Remotes" section will need a non-trivial rewrite.
#1136, #1862
This is really a simple change that does the following in a transaction:
* Set the new branch name to point to the same commit as the old branch name.
* Set the old branch name to point to no commit (hence deleting the old name).
Before it starts, it confirms that the new branch name is not already in use.
I originally thought this would be unavoidable, but I was wrong. "jj git clone"
doesn't implicitly create any local branch if git.auto-local-branch is off, and
that's fine because the detached HEAD state is normal in jj.
That being said, Git users would expect that the main/master branch exists.
Since importing the default branch is harmless, let's create and track it no
matter if git.auto-local-branch is off.
It seems better to have the caller pass the transaction description
when we finish the transaction than when we start it. That way we have
all the information we want to include more readily available.
This tries to clarify the fact that the branches must be remote and the syntax
for specifying them as globs.
Cc @yuja, https://github.com/martinvonz/jj/pull/2625#discussion_r1423379351
Here is the result (excerpt):
```
$ jj branch track --help
Start tracking given remote branches
A tracking remote branch will be imported as a local branch of the same name. Changes to it
will propagate to the existing local branch on future pulls.
Usage: jj branch track [OPTIONS] <BRANCH@REMOTE>...
Arguments:
<BRANCH@REMOTE>...
Remote branches to track
By default, the specified name matches exactly. Use `glob:` prefix to select
branches by wildcard pattern. For details, see
https://github.com/martinvonz/jj/blob/main/docs/revsets.md#string-patterns.
Examples: branch@remote, glob:main@*, glob:jjfan-*@upstream
```
default_index_store.rs is relatively big, and it contains types and impls in
arbitrary order. Let's split them into sub modules. After everything moved,
mod.rs will only contain tests.
This prints a hint about using `jj new <first conflicted commit>` and
`jj squash` to resolve conflicts. The hint is printed whenever there
are new or resolved conflicts.
I hope this hint will be useful especially for new users so they know
which commit to resolve conflicts in first. It may not be obvious that
they should start with the bottommost one. I hope the hint will also
be useful for more more experienced user by letting them just copy the
printed command without first running `jj log` to find the right
commit..
When e.g. `jj rebase` results in new conflicts, it's useful for the
user to learn about that without having to run `jj log` right
after. This patch adds reporting of new conflicts created by an
operation. It also add reporting of conflicts that were resolved or
abandoned by the operation.
There was no measurable performance impact when rebasing a single
commit in the Linux kernel repo.
Before, an absolute path would be saved in the git_target file if .git is a
symlink. That's not wrong, but seemed a bit weird. Let's consolidate the
behavior across .git file types.
Apparently, libgit2 doesn't deduce "core.bare" config from the directory name,
but gitoxide implements it correctly. So we shouldn't blindly canonicalize
the Git repository path. Fortunately, the saved git_target path isn't a fully-
canonicalized form (unless user explicitly sepcified "--git-repo ./.git"), so
we don't need a hack to remap git_target back to the symlink path.
is_colocated_git_workspace() is adjusted since the git_workdir is no longer
resolved from the fully-canonicalized repo path, at least in our code. Still we
have the ".git/.." fallback because test_init_git_colocated_symlink_gitlink()
would otherwise fail. I haven't figured out why, and the test might be actually
wrong compared to the git CLI behavior, but let's not change that for now.
Fixes#2668
A git repo created by the "repo" tool doesn't have core.base set, which means
the "bare"-ness relies on the directory name. Gitoxide appears to parse it
correctly, whereas libgit2 doesn't. That's why the symlinked .git repo is no
longer processed as a colocated repo.
#2668
This adds an initial `jj util gc` command, which simply calls `git gc`
when using the Git backend. That should already be useful in
non-colocated repos because it's not obvious how to GC (repack) such
repos. In my own jj repo, it shrunk `.jj/repo/store/` from 2.4 GiB to
780 MiB, and `jj log --ignore-working-copy` was sped up from 157 ms to
86 ms.
I haven't added any tests because the functionality depends on having
`git` binary on the PATH, which we don't yet depend on anywhere
else. I think we'll still be able to test much of the future parts of
garbage collection without a `git` binary because the interesting
parts are about manipulating the Git repo before calling `git gc` on
it.
See comments inline for details. Cc #2600.
In particular, I wanted to make sure these behaviors are not affected by #2646.
They don't seem to be.
The tests ended up weirder than expected because of
https://github.com/martinvonz/jj/issues/2600#issuecomment-1835418824. Even
though, right now, the behavior of tests is unaffected by that issue, the
*expected* behavior is different.
Branches move around a little confusigly with `abandon`. We do want to keep
them, to test their behavior, but we can show the change id to make things
clearer.
Each instance of the enum represents a single command, so singular
`*Command` seems better. That also seems to match the examples in
clap's documentation.
Note that one of the new tests panics; this is a newly discovered bug.
In Git, a commit's direct parent is allowed to also be an indirect ancestor
at the same time. `jj` currently tries to prevent this situation, but does
allow it. The correctness of `rebase -r A -d descendant_of_A` currently depends
on this jj-specific behavior; we should change that.
Cc #2600
Allowing `jj init --git` in an existing Git repo creates a second Git
store in `.jj/repo/store/git`, totally disconnected from the existing
Git store. This will only produce extremely confusing bugs for users,
since any operations they make in Git will *not* be reflected in the
jj repo.
This enables cheap str-to-RepoPath cast, which is useful when sorting and
filtering a large Vec<(String, _)> list by using matcher for example. It
will also eliminate temporary allocation by repo_path.parent().
This follows up on 3967f63 (see that commit's description for more
motivation) and e79c8b6.
In a discussion linked below, it was decided that forbidding `-r --skip-empty`
entirely is preferable to the mixed behavior introduced in 3967f63.
3967f637dc (commitcomment-133539911)
This follows up on @matts1 's #2609.
We still allow the `-r` commit to become empty. I would be more comfortable if
there was a test for that, but I haven't done that (yet?) and it seems pretty
safe. If that's a problem, I'm happy to forbid `-r --skip-empty` entirely,
since it is far less useful than `-s --skip-empty` or `-b --skip-empty`.
I think it is undesired to abandon emptied descendants. As far as descendants
of `A` are concerned, `jj rebase -r A` should be equivalent to `jj abandon A`,
and `jj abandon` does not remove emptied commits. It also doesn't seem very
useful to do that, since I think descendant commits of an abandoned (or moved
with `-r`) commit only become empty in pathological cases.
Additionally, if we did want -r to empty descendants of `A`, we'd have to add
thorough tests and possibly improve the algorithm. I want to refactor `rebase
-r` and add features to it, and having to consider cases of commits becoming
abandoned makes everything harder.
For example, if we have
```
root -> A -> B -> C
```
and `jj rebase -r A -d C` empties commit `B` (or `C`), I do not know whether
the current algorithm will work correctly. It seems possible that it would, but
that depends on the fact that empty merge commits are not abandoned for
descendants. That seems dangerous to rely on without tests.
I hope (but can't promise) that in the near future, making DescendantRebaser
return more information should help make it possible to create such
functionality in a more robust way. I am likely to attempt this as part of
implementing `-r --after`.
Adress the post-merge comments from #2486, which found a doc comment
inaccurate and to not blindly ignore `-j 0` which would've worked until now.
I've also reduced the default `jobs` size to one, as it's user-visible configuration
which determines how many processes should run.
Thanks to @necauqua the controversial `unsafe` usage was already removed.
I've omitted to change `revisions` from an Vec to a RevisonArg for the moment,
as I will keep working on the file anyway.
If the existing git repo contains local and remote branches of the same name,
one of the remote branches is probably a tracking remote branch. Let's show
a hint how to set up tracking branches. The tracking state could be derived
from .git/config, but doing that automatically might cause another issue like
#1862, which could have been mitigated by git.auto-local-branch = false.
`RevsetExpression::resolve()` is meant for programmatically created
expressions. In particular, it may not contain symbols. Let's try to
clarify that by renaming the function and documenting it.
Repeating these is a no-op. This allows:
```shell
jj new -r a -r b # Equivalent to jj new a b
jj new --before a --before b # Equivalent to jj new a b --before
```
I keep typing the latter and getting an annoying error.
Follows up on 13c93d5270.
The information I added is explained in that commit's description, but
I feel like it could reduce confusion for future readers of the code.
Since this is the error to spawn (or wait) process, command arguments aren't
important. Let's make that clear by not showing full command string.
#2614
The `scm-record` library comments say that the `file_mode` is:
> The Unix file mode of the file (before any changes), if available.
This reverts commit ffd6884 and fixes#2591 and #2548.
We need to .collect_vec() the parents iterator to temporary buffer since the
borrowed iterator can't be returned back to the dag_walk functions. Another
option is to clone op_store and parent ids to remove &self lifetime from the
iterator, but that also means a temporary Vec is created.
GitBackend will use it to configure gix::Repository. I think UserSettings
is generally useful to pass store-specific parameters, so I've updated all
factory functions.
I'll make it propagate OpStoreError, but OpStoreError is quite different
from the existing StaleWorkingCopyError. I think this error isn't actually
an "error" but a description of the working copy state.
Per discussion in https://github.com/martinvonz/jj/discussions/2555. I'm
okay with either way, but it's confusing if we had "branch create" and
"branch set" and both of these could create a new branch.
Renamed `description_template_for_commit` to
`description_template_for_describe` since it's only used in
`cmd_describe`.
Renamed `description_template_for_cmd_split` to
`description_template_for_commit` and modified to accomodate empty
`intro` argument.
Fixes#2439.
We have a few places where we have a `MergedTreeValue` and need to
read the data associated with it so we can write to the working copy
or include it in a diff. Let's extract some of that shared logic to a
function so we can reuse it. I plan to use it for reading file
contents in advance while streaming a diff in `local_working_copy`
soon (and probably in `jj diff` thereafter), but I think it seems like
an improvement on its own.
Remove a couple of unnecessary unsafes:
- The NonZeroUsize is a constant where the unwrap will optimize away
anyway and we don't have an unsafe without any good reason there :)
- The other two were simply not needed, lifetimes worked fine, maybe
Rust became better since that code was written? NLL? Anyway, they're
gone now
As discussed in Discord, it's less useful if remote_branches() included
Git-tracking branches. Users wouldn't consider the backing Git repo as
a remote.
We could allow explicit 'remote_branches(remote=exact:"git")' query by changing
the default remote pattern to something like 'remote=~exact:"git"'. I don't
know which will be better overall, but we don't have support for negative
patterns anyway.
Since the concurrent diff algorithm is significantly slower when using
the Git backend, I think we'll have to use switch between the two
algorithms depending on backend. Even if the concurrent version always
performed as well as the sequential version, exactly how concurrent it
should be probably still depends on the backend. This commit therefore
adds a function to the `Backend` trait, so each backend can say how
much concurrency they deal well with. I then use that number for
choosing between the sequential and concurrent versions in
`MergedTree::diff_stream()`, and also to decide the number of
concurrent reads to do in the concurrent version.