2022-01-08 18:02:12 +00:00
# Concurrency

## Introduction

Concurrent editing is a key feature of DVCSs -- that's why they're called
*Distributed* Version Control Systems. A DVCS that didn't let users edit files
and create commits on separate machines at the same time wouldn't be much
of a distributed VCS.

When conflicting changes are made in different clones, a DVCS will have to deal
with that when you push or pull. For example, when using Mercurial, if the
remote has updated a bookmark called `main` (Mercurial's bookmarks are similar
to Git's branches) and you had updated the same bookmark locally but made it
point to a different target, Mercurial would add a bookmark called `main@origin`
to indicate the conflict. Git instead prevents the conflict by renaming pulled
branches to `origin/main` whether or not there was a conflict. However, most
DVCSs treat local concurrency quite differently, typically by using lock files
to prevent concurrent edits. Unlike those DVCSs, Jujutsu treats concurrent edits
the same whether they're made locally or remotely.

One problem with using lock files is that they don't work when the clone is in a
distributed file system. Most clones are of course not stored in distributed
file systems, but it is a *big* problem when they are (Mercurial repos
frequently get corrupted, for example).

Another problem with using lock files is implementation complexity. The
simplest way of using lock files is to take coarse-grained locks early: every
command that may modify the repo takes a lock at the very beginning. However,
that means that operations that wouldn't actually conflict still have to wait
for each other. The user experience can be improved by using finer-grained
locks and/or taking the locks later. The drawback of that is complexity. For
example, you need to verify that any assumptions you made before locking are
still valid after you take the lock.

To avoid depending on lock files, Jujutsu takes a different approach by
accepting that concurrent changes can always happen. It instead exposes any
conflicting changes to the user, much like other DVCSs do for conflicting
changes made remotely.

Jujutsu's lock-free concurrency means that it's possible to update copies of the
clone on different machines and then let `rsync` (or Dropbox, or NFS, etc.)
merge them. The working copy may not match what's supposed to be checked out,
but no changes to the repo will be lost (added commits, moved branches, etc.).
If conflicting changes were made, they will appear as conflicts. For example, if
a branch was moved to two different locations, it will appear in `jj log` in
both locations but with a "?" after the name, and `jj status` will also inform
the user about the conflict.

The most important piece in the lock-free design is the "operation log". That is
what allows us to detect and merge concurrent operations.
## Operation log

The operation log is similar to a commit DAG (such as in
[Git's object model](https://git-scm.com/book/en/v2/Git-Internals-Git-Objects)),
but each commit object is instead an "operation" and each tree object is instead
a "view". The view object contains the set of visible head commits, branches,
tags, and the working-copy commit in each workspace. The operation object
contains a pointer to the view object (like how commit objects point to tree
objects), pointers to parent operation(s) (like how commit objects point to
parent commit(s)), and metadata about the operation. These types are defined
in `op_store.proto`. The operation log is normally linear.
It becomes non-linear if there are concurrent operations.

When a command starts, it loads the repo at the latest operation. Because the
associated view object completely defines the repo state, the running command
will not see any changes made by other processes thereafter. When the operation
completes, it is written with the start operation as parent. The operation
cannot fail to commit (except for disk failures and such). It is left for the
next command to notice if there were concurrent operations. It will have to be
able to do that anyway since the concurrent operation could have arrived via a
distributed file system. This model -- where each operation sees a consistent
view of the repo and is guaranteed to be able to commit its changes -- greatly
simplifies the implementation of commands.

It is possible to load the repo at a particular operation with
`jj --at-operation=<operation ID> <command>`. If the command is mutational, that
will result in a fork in the operation log. That works exactly the same as if
any later operations had not existed when the command started. In other words,
running commands on a repo loaded at an earlier operation works the same way as
if the operations had been concurrent. This can be useful for simulating
concurrent operations.
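The model described in this section can be sketched as a toy in-memory
operation log. This is only an illustration under simplifying assumptions, not
Jujutsu's implementation: the `OpLog` class, its method names, the JSON
encoding, and the use of SHA-256 for IDs are all hypothetical. It shows a
command loading the view at one operation, committing a new operation with
that start operation as its parent, and two commands that start from the same
operation leaving two heads behind.

```python
import hashlib
import json


class OpLog:
    """Toy operation log (illustrative only; names are not jj's real API)."""

    def __init__(self):
        self.objects = {}   # content ID -> stored view or operation
        self.heads = set()  # IDs of operations that have no children yet
        root_view = self._put({"heads": [], "branches": {}, "wc_commits": {}})
        self.root = self._put(
            {"view": root_view, "parents": [], "description": "root"}
        )
        self.heads.add(self.root)

    def _put(self, obj):
        # Content addressing: the ID is derived from the serialized object.
        data = json.dumps(obj, sort_keys=True).encode()
        object_id = hashlib.sha256(data).hexdigest()
        self.objects[object_id] = obj
        return object_id

    def view_at(self, op_id):
        """Return the view (complete repo state) recorded by an operation."""
        return self.objects[self.objects[op_id]["view"]]

    def commit(self, start_op_id, new_view, description):
        """Record a new operation whose parent is the operation we started at.

        Nothing here checks whether other operations were committed in the
        meantime, so the commit itself cannot be rejected.
        """
        view_id = self._put(new_view)
        op_id = self._put(
            {"view": view_id, "parents": [start_op_id], "description": description}
        )
        self.heads.add(op_id)
        self.heads.discard(start_op_id)
        return op_id


log = OpLog()
op1 = log.commit(
    log.root, dict(log.view_at(log.root), branches={"main": "A"}), "create main"
)

# Two commands both load the repo at op1 and commit independently.
op2 = log.commit(op1, dict(log.view_at(op1), branches={"main": "B"}), "move main to B")
op3 = log.commit(op1, dict(log.view_at(op1), branches={"main": "C"}), "move main to C")

print(len(log.heads))  # 2: the log has forked and now has two heads
```

Committing `op3` against `op1` after `op2` already exists mirrors what running a
mutating command with `--at-operation` does: the later operation is simply not
part of the history the command sees.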
### Merging concurrent operations

If Jujutsu tries to load the repo and finds multiple heads in the operation log,
it will do a 3-way merge of the view objects based on their common ancestor
(possibly several 3-way merges if there were more than two heads). Conflicts
are recorded in the resulting view object. For example, if branch `main` was
moved from commit A to commit B in one operation and moved to commit C in a
concurrent operation, then `main` will be recorded as "moved from A to B or C".
See the `RefTarget` definition in `op_store.proto`.

Because we allow branches (etc.) to be in a conflicted state rather than just
erroring out when there are multiple heads, the user can continue to use the
repo, including performing further operations on it. Of course, some commands
will fail when using a conflicted branch. For example, `jj checkout main` when
`main` is in a conflicted state will result in an error telling you that `main`
resolved to multiple revisions.
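A minimal sketch of the 3-way merge for branch targets follows. It assumes
targets are plain commit IDs and that only two heads are being merged. The
`Conflict` type and the `merge_target`, `merge_branches`, and `resolve` helpers
are hypothetical names chosen for illustration; the real representation is the
`RefTarget` type mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conflict:
    """Hypothetical conflicted target: moved from `removed` to any of `added`."""
    removed: object
    added: tuple


def merge_target(base, left, right):
    """Three-way merge of one target; `None` means the branch is absent."""
    if left == right:
        return left   # both sides agree
    if left == base:
        return right  # only the right side changed
    if right == base:
        return left   # only the left side changed
    return Conflict(removed=base, added=(left, right))


def merge_branches(base, left, right):
    """Merge the branch maps of two views against their common ancestor."""
    merged = {}
    for name in sorted(set(base) | set(left) | set(right)):
        target = merge_target(base.get(name), left.get(name), right.get(name))
        if target is not None:
            merged[name] = target
    return merged


def resolve(branches, name):
    """Look up a branch, failing if it is in a conflicted state."""
    target = branches[name]
    if isinstance(target, Conflict):
        candidates = ", ".join(str(t) for t in target.added)
        raise ValueError(f"{name} resolved to multiple revisions: {candidates}")
    return target


merged = merge_branches(
    base={"main": "A", "dev": "X"},
    left={"main": "B", "dev": "X"},   # one operation moved main to B
    right={"main": "C", "dev": "Y"},  # a concurrent one moved main to C, dev to Y
)
print(merged["dev"])   # Y
print(merged["main"])  # Conflict(removed='A', added=('B', 'C'))
```

Storing the conflict in the merged result, instead of raising during the merge,
is what lets unrelated branches such as `dev` stay usable; only a lookup of the
conflicted name fails.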
### Storage

The operation objects and view objects are stored in content-addressed storage
just like Git commits are. That makes them safe to write without locking.

We also need a way of finding the current head of the operation log. We do that
by keeping the ID of each current head as a file in a directory. The ID is the
name of the file; it has no contents. When an operation completes, we add a file
pointing to the new operation and then remove the file pointing to the old
operation. Writing the new file is what makes the operation visible (if the old
file didn't get properly deleted, then future readers will take care of that).
This scheme ensures that transactions are atomic.
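The storage scheme can be sketched with ordinary files. This is an
illustration under assumptions, not Jujutsu's on-disk format: the directory
names, the SHA-256 hash, and the write-to-temporary-file-then-rename step are
choices made for the example, and `write_object`, `publish_head`, and
`current_heads` are hypothetical helpers.

```python
import hashlib
import os
import tempfile


def write_object(store_dir, data):
    """Store `data` under the hex digest of its contents and return that ID."""
    object_id = hashlib.sha256(data).hexdigest()
    path = os.path.join(store_dir, object_id)
    if not os.path.exists(path):
        # Assumption for this sketch: write to a temporary file and rename it,
        # so a reader never observes a partially written object. Two writers
        # storing the same bytes produce the same name and the same contents.
        fd, tmp_path = tempfile.mkstemp(dir=store_dir)
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp_path, path)
    return object_id


def publish_head(heads_dir, new_op_id, old_op_id):
    """Make `new_op_id` a head, then drop the operation it replaced."""
    # Creating the empty file is the step that makes the operation visible.
    open(os.path.join(heads_dir, new_op_id), "w").close()
    # If we crash before this point, a later reader can clean up the old file.
    try:
        os.remove(os.path.join(heads_dir, old_op_id))
    except FileNotFoundError:
        pass  # a concurrent operation already removed it


def current_heads(heads_dir):
    """The head IDs are simply the file names in the directory."""
    return sorted(os.listdir(heads_dir))


with tempfile.TemporaryDirectory() as root:
    store_dir = os.path.join(root, "operations")  # hypothetical layout
    heads_dir = os.path.join(root, "heads")       # hypothetical layout
    os.makedirs(store_dir)
    os.makedirs(heads_dir)

    old = write_object(store_dir, b"operation 1")
    open(os.path.join(heads_dir, old), "w").close()

    # Two concurrent operations, both started from `old`.
    new_a = write_object(store_dir, b"operation 2a")
    new_b = write_object(store_dir, b"operation 2b")
    publish_head(heads_dir, new_a, old)
    publish_head(heads_dir, new_b, old)

    heads = current_heads(heads_dir)

print(len(heads))  # 2: both concurrent operations are heads
```

Neither writer needs a lock: objects with different contents never share a file
name, and finding two head files afterwards is exactly the signal the next
command uses to merge the concurrent operations.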