The `DescendantRebaser` was designed to help with rebasing in two
different use cases: 1) after regular rewriting of commits where the
change ID is preserved, and 2) after importing moved branches from
other repo (e.g. backing Git repo or remote). Many of the tests are
for the second use case, such as where a branch was moved
forward. However, I just noticed that there's a pretty common scenario
from the first use case that is not supported.
Let's say you have this history:
```
D
|
C C'
|/
B B'
|/
A
```
Here we want C' to be rebased onto B' and then D to be rebased onto
C''. However, because of the support for moving branches forward, we
would not rebase commits that were already rewritten, such as C' here
(see affected tests for details), which resulted in D getting rebased
onto C', and both B and B' remaining visible.
I think I was thinking when I designed it that it would be nice if you
could just tell `DescendantRebaser` that any descendants of a commit
should be moved forward. That may be useful, but I don't think we'll
want that for the general case of a branch moving forward. Perhaps
we'll want to make it configurable which branches it should happen
for. Either way, the way it was coded by not rebasing already
rewritten commits did not work for the case above. We may be able to
handle both cases better by considering each rewrite separately
instead of all destinations at once. For now, however, I've decided to
keep it simple, so I'm fixing the case above by sacrificing some of
the potentially useful functionality for moving branches forward.
Another fix necessary for the scenario shown above was to make sure we
always rebase C' before D. Before this patch, that depended on the
order in the index. This patch fixes that by modifying the topological
order to take rewrites into account, making D depend not only on C but
also on C'. (I suppose you could instead say that C depends on both B
and C'; I don't know if that'd make a difference.)
Despite what the documentation said, we don't clear the record of
rewritten and abandoned commits at the end. This change fixes that,
and adds a test showing that it's possible to call
`MutableRepo::rebase_descendants()` multiple times.
When there are concurrent operations that want to update the working
copy, it's useful to know which operation was the last to successfully
update the working copy. That can help use decide how to resolve a
mismatch between the repo view's record and the working copy's
record. If we detect such a difference, we can look at the working
copy's operation ID to see if it was updated by an operation before or
after we loaded the repo.
If the working copy's record says that it was updated at operation A
and we have loaded the repo at operation B (after A), we know that the
working copy is stale, so we can automatically update it (or tell the
user to run some command to update it if we think that's more
user-friendly).
Conversely, if we have loaded the repo at operation A and the working
copy's record says that it was updated at operation B, we know that
there was some concurrent operation that updated it. We can then
decide to print a warning telling the user that we skipped updating
because of the conflict. We already have logic for not updating the
working copy if the repo is loaded at an earlier operation, but maybe
we can drop that if we record the operation in the working copy (as
this patch does).
Having the checkout functionality in `LockedWorkingCopy` makes it a
little more flexible (one could imagine using it for udating working
copy files and then discarding the state changes, for example). It
also lets us reuse a few lines of code for locking. I left
`WorkingCopy::check_out()` for convenience because that's what all
current users want.
`WorkingCopy::check_out()` currently fails if the commit recorded on
disk has changed since it was last read. It fails with a "concurrent
checkout" error. That usually works well in practice, but one can
imagine cases where it's not correct. For an example where the current
behavior is wrong, consider this sequence of events:
1. Process A loads the repo and working copy.
2. Process B loads the repo at operation A. It has not loaded the
working copy yet.
3. Process A writes an operation and updates the working copy.
4. Process B loads the working copy and sees that it is checked out
to the commit process B set it to. We don't currently have any
checks that the working copy commit matches the view's checkout
(though I plan to add that).
5. Process B finishes its operation (which is now divergent with the
operation written by process A). It updates the working copy to
the checkout set in the repo view by process B. There's no data
loss here, but the behavior is surprising because we would usually
tell the user that we detected a concurrent update to the working
copy.
We should instead check that the working copy's commit on disk matches
what the previous repo view said, i.e. the view at the start of the
operation we just committed. This patch does that by having the caller
pass in the expected old commit ID.
We already have two usecases that can be modeled as updating the
`TreeState` without touching the working copy:
1. `jj untrack` can be implemented as removing paths from the tree
object and then doing a reset of the working copy state.
2. Importing Git HEAD when sharing the working copy with a Git repo.
This patch adds that functionality to `TreeState`.
I was surprised that we save the `TreeState` before
`LockedWorkingCopy::finish()`. That means that even if the caller
instead decides to discard the changes, some changes will already have
been written.
This patch changes the interface for making changes to the working
copy by replacing `write_tree()` and `untrack()` by a single
`start_mutation()` method. The two functions now live on the returned
`LockedWorkingCopy` object instead. That is more flexible because the
caller can make multiple changes while the working copy is locked. It
also helps us reduce the risk of buggy callers that read the commit ID
before taking the lock, because we can now make it accessible only on
`LockedWorkingCopy`.
The working copy object knows the currently checked out commit ID. It
is set to `None` when the object is initialized. It is also set to
`None` when an existing working copy is loaded. In that case, it's
used only to facilitate lazy loading. However, that means that
`WorkingCopy::current_commit_id()` fails if the working copy has been
initalized but no checkout has been specified. I've never run into
that case, but it's ugly that it can happen. This patch fixes it by
having `WorkingCopy::init()` take a `CommitId`.
`WorkingCopy::current_commit()` has been there from the beginning. It
has made less sense since we made the repo view keep track of the
current checkout. Let's remove it.
If you create a `dir/.gitignore` file with pattern "dir" in it, it'll
currently match the parent directory, making e.g. the `dir/.gitignore`
file itself ignored. That was quite confusing, and it doesn't match
how Git behaves. This patch fixes the bug.
I ran into a tool that produced a `.gitignore` file with CRLF line
endings. I had not considered that case when implementing support for
`.gitignore`, so we interpreted the CR as part of the string, which of
course made the files not match.
This patch fixes the bug by ignoring a single CR at EOL. That seems to
be what Git does (I didn't see any information about it in the
documentation).
A new Clippy version added a new warning when a function that returns
`Self` doesn't have `#[must_use]`. I feel like all the cases reported
by it were false positives. Most were functions on `CommitBuilder`,
where we take `mut self` and return `Self`. I don't think I've ever
forgotten to use the result of those.
It makes sense to omit either of the arguments of the `..` operator,
even though `..x` is equivalent to `:x`. `x..`, with a implied right
argument of `heads()` is more useful.
If you import Git refs, then rebase a commit pointed to by some Git
ref, and then re-import Git refs, you don't want the old commit to be
made a visible head again. That's particularly annoying when Git refs
are automatically updated by every command.
Since 94e03f5ac8, we lazily filter out non-heads from `View`'s set
of head. Accessing the `MutRepo::view()` triggers that filtering. This
patch makes `DescendantRebaser` not unnecessarily do that. That speeds
up the rebasing of 162 descendants in the git.git repo from ~3.6 s to
~330 ms. Rebasing 1272 descendants takes ~885 ms.
When you run e.g. `jj describe <some old commit>` or `jj squash -r
<some old commit>`, the descendants' tree objects will not change, so
we can avoid calculating them. This speeds up rebasing of 126 commits
in the git.git repo from ~9.8 s to ~3.6 s.
I recently (0c441d9558) made it so we don't create an operation when
nothing changed. Soon thereafter (94e03f5ac8), I broke that when I
introduced a cache-invalidation bug when I made the filtering-out of
non-heads be lazy. This patch fixes that and also adds a test to
prevent regressions.
When initializing a jj repo in the same directory as its backing git
repo, add `.jj/` to `.git/info/exclude` so it doesn't show up to `git`
commands.
This is part of #44.
It's very slow to remove non-heads from the set of heads every time we
add an head. For example, in the git.git repo, a no-op `jj git import`
takes ~15 s. This patch changes makes us just mark the set of heads
dirty when a commit has been added and then we remove non-heads when
needed. That cuts down the `jj git import` time to ~200 ms.
It's useful to know which commit is checked out in the underlying Git
repo (if there is one), so let's show that. This patch indicates that
commit with `HEAD@git` in the log output. It's probably not very
useful when the Git repo is "internal" (i.e. stored inside `.jj/`),
because then it's unlikely to change often. I therefore considered not
showing it when the Git repo is internal. However, it turned out that
`HEAD` points to a non-existent branch in the repo I use, so it won't
get imported anyway (by the function added in the previous patch). We
can always review this decision later.
This is part of #44.
This patch adds a place for tracking the current `HEAD` commit in the
underlying Git repo. It updates `git::import_refs()` to record it. We
don't use it anywhere yet.
This is part of #44.
I think these remaining implementations of `Drop` are for types that
are infrequently dropped (unlike `Transaction`), so it be fine to be
more strict about them.
If nothing changed in a transaction, it's rarely useful to commit it,
so let's avoid that. For example, if you run `jj git import` without
changing the anything in the Git repo, we now just print "Nothing
changed.".
I'm about to change `ReadonlyRepo::load()` to take the path to the
`.jj/` directory, so this patch prepares for that. It already works
because `ReadonlyRepo::load()` will search up the directory tree for
the `.jj/` entry.
`ReadonlyRepo::init_*()` currently calls `WorkingCopy::init()`. In
order to remove that dependency, this patch wraps the
`ReadonlyRepo::init_*()` functions in new `Workspace` functions. A
later patch will have those functions call `WorkspaceCopy::init()`.`
The `Repo` doesn't do anything with the `WorkingCopy` except keeping a
reference to it for its users to use. In fact, the entire lib crate
doesn't do antyhing with the `WorkingCopy`. It therefore seems simpler
to have the users of the crate manage the `WorkingCopy` instance. This
patch does that by letting `Workspace` own it. By not keeping an
instance in `Repo`, which is `Sync`, we can also drop the
`Arc<Mutex<>>` wrapping.
I left `Repo::working_copy()` for convenience for now, but now it
creates a new instance every time. It's only used in tests.
This further decoupling should help us add support for multiple
working copies (#13).
Having a concept of a "workspace" will be useful for adding support
for multiple workspaces (#13). You can think of the "workspace" as a
repo combined with a working copy. A workspace corresponds 1:1 with a
`.jj/` directory. It's pretty close to what other VCS simply call a
"repo", but I've ended up using the word "repo" for what Git calls a
"bare repo".
The recent e5dd93cbf7, whose description says "cleanup: make Vec
inside CommitId etc. non-public", made all ID types in the `backend`
module *except* for `CommitId` non-public :P This patch makes
A while ago, I replaced a call to git2-rs's `Remote::fetch()` by calls
to `Remote::download()` and `Remote::update_tips()`. The function is
documented to be a convenience for those function, but it turns out
that the pruning of deleted remote refs is a separate call
(`Remote::prune()`), so we need to call that too.
Since the working copy can now handle conflicts, we don't need to
materialize conflicts when checking out a commit.
Before this patch, we used to create a new commit on top whenever we
checked out a commit with conflicts. That new commit was intended just
for resolving the conflicts. The typical workflow was the resolve the
conflicts and then amend. To use the same workflow after this patch,
one needs to explicitly create a new commit on top with `jj new` after
checking out a commit with conflict.
I realized only recently that we can try to parse conflict markers in
files and leave them as conflicted if they haven't changed. If they
have changed and some conflict markers have been removed, we can even
update the conflict with that partial resolution.
This change teaches the working copy to write conflicts to the working
copy. It used to expect that the caller had already updated the tree
by materializing conflicts. With this change, we also start parsing
the conflict markers and leave the conflicts unresolved in the working
copy if the conflict markers remain.
There are some cases that we don't handle yet. For example, we don't
even try to set the executable bit correctly when we write
conflicts. OTOH, we didn't do that even before this change.
We still never actually write conflicts to the working copy (outside
of tests) because we currently materialize conflicts in
`MutRepo::check_out()`. I'll change that next.
I initially made the working copy materialize conflicts in its
`check_out()` method. Then I changed it later (exactly a year ago, on
Halloween of 2020, actually) so that the working copy expected
conflicts to already have been materalized, which happens in
`MutableRepo::check_out`().
I think my reasoning then was that the file system cannot represent a
conflict. While it's true that the file system itself doesn't have
information to know whether a file represents a conflict, we can
record that ourselves. We already record whether a file is executable
or not and then preserve that if we're on a file system that isn't
able to record it. It's not that different to do the same for
conflicts if we're on a file system that doesn't understand conflicts
(i.e. all file systems).
The plan is to have the working copy remember whether a file
represents a conflict. When we check if it has changed, we parse the
file, including conflict markers, and recreate the conflict from
it. We should be able to do that losslessly (and we should adjust
formats to make it possible if we find cases where it's not).
Having the working copy preserve conflict states has several
advantages:
* Because conflicts are not materialized in the working copy, you can
rebase the conflicted commit and the working copy without causing
more conflicts (that's currently a UX bug I run into every now and
then).
* If you don't change anything in the working copy, it will be
unchanged compared to its parent, which means we'll automatically
abandon it if you update away from it.
* The user can choose to resolve only some of the conflicts in a file
and squash those in, and it'll work they way you'd hope.
* It should make it easier to implement support for external merge
tools (#18) without having them treat the working copy differently.
This patch prepares for that work by adding support for parsing
materialized conflicts.
On Windows, we preserve the executable bit. I plan to also teach the
working copy to preserve conflict state. This refactoring prepares for
that by simplifying how we preserve parts of the current file state.
While working on demos, I noticed that `jj log` output in the
octocat/Hello-World repo was unstable: sometimes the first parent of
the merge was on the left and sometimes it was on the right. This
patch fixes that by sorting the edges by position in the index just
before returning them. It seems that most applications would want
stable output so I put it in the `RevsetGraphIterator` rather than
doing at the call site in the CLI. I ordered them with the reverse
index position rather than forward because it seemed to make the
graphs in the git.git repo slight nicer, with the left-most edge going
between subsequent releases.
There performance difference is within the noise level.
If you rewrite a commit that's also available on some remote, you'll
currently see both the old version and the new version in the view,
which means they're divergent. They're not logically divergent (the
rewritten version should replace the old version), so this is a UX
bug. I think it indicates that the set of current heads should be
redefined to be the *desired* heads. That's also what I had suspected
in the TODO removed by this change. I think another indication that
we should hide the old heads even if they have e.g. a remote branch
pointing to them is that we don't want them to be rebased if we
rewrite an ancestor.
So that's what I decided to do: let the view's heads be the desired
heads. The user can still define revsets for showing non-current
commits pointed to by e.g. remote branches.