2020-12-12 08:12:04 +00:00
|
|
|
# Jujube
|
2020-12-12 07:37:25 +00:00
|
|
|
|
|
|
|
|
2020-12-12 08:12:04 +00:00
|
|
|
## Disclaimer
|
2020-12-12 07:37:25 +00:00
|
|
|
|
2020-12-12 08:12:04 +00:00
|
|
|
This is not a Google product. It is an experimental version-control system
|
|
|
|
(VCS). It is not ready for use. It was written by me, Martin von Zweigbergk
|
|
|
|
(martinvonz@google.com). It is my personal hobby project. It does not indicate
|
|
|
|
any commitment or direction from Google.
|
2020-12-12 07:37:25 +00:00
|
|
|
|
|
|
|
|
2020-12-12 08:12:04 +00:00
|
|
|
## Introduction
|
2020-12-12 07:37:25 +00:00
|
|
|
|
2020-12-12 08:12:04 +00:00
|
|
|
I started the project mostly in order to test the viability of some UX ideas in
|
|
|
|
practice. I continue to use it for that, but my short-term goal now is to make
|
|
|
|
it useful as an alternative CLI for Git repos.
|
2020-12-12 07:37:25 +00:00
|
|
|
|
2020-12-12 08:12:04 +00:00
|
|
|
The storage design is similar to Git's in that it stores commits, trees, and
|
|
|
|
blobs. However, the blobs are actually split into three types: normal files,
|
|
|
|
symlinks (Unicode paths), and conflicts (more about that later).
|
2020-12-12 07:37:25 +00:00
|
|
|
|
2020-12-12 08:12:04 +00:00
|
|
|
The command-line tool is called `jj` for now because it's easy to type and easy
|
|
|
|
to replace (rare in English). The project is called "Jujube" (a fruit) because
|
|
|
|
that's the first word I could think of that matched "jj".
|
2020-12-12 07:37:25 +00:00
|
|
|
|
|
|
|
|
2020-12-12 08:12:04 +00:00
|
|
|
## Features
|
2020-12-12 07:37:25 +00:00
|
|
|
|
2020-12-12 08:12:04 +00:00
|
|
|
The following subsections describe the current features. The text is aimed at
|
|
|
|
readers who are already familiar with other VCSs.
|
2020-12-12 07:37:25 +00:00
|
|
|
|
2020-12-12 08:12:04 +00:00
|
|
|
### Compatible with Git
|
2020-12-12 07:37:25 +00:00
|
|
|
|
2020-12-12 08:12:04 +00:00
|
|
|
The tool currently has two backends. One is called "local store" and is very
|
|
|
|
simple and inefficient. The other backend uses a Git repo as storage. The
|
|
|
|
commits are stored as regular Git commits. Commits can be read from and written
|
|
|
|
to an existing Git repo. This makes it possible to create a Jujube repo and use
|
|
|
|
it as an alternative interface for a Git repo (it will be backed by the Git repo
|
|
|
|
just like additional Git worktrees are).
|
|
|
|
|
|
|
|
### Written as a library
|
|
|
|
|
|
|
|
The repo consists of two main parts: the lib crate and the main (CLI)
|
|
|
|
crate. Most of the code lives in the lib crate. The lib crate does not print
|
|
|
|
anything to the terminal. The separate lib crate should make it relatively
|
|
|
|
straight-forward to add a GUI.
|
|
|
|
|
|
|
|
|
|
|
|
### Operations are performed repo-first
|
|
|
|
|
|
|
|
Almost all operations are done in the repo first and then possibly reflected in
|
|
|
|
the working copy. The only exception so far is when committing the working copy,
|
|
|
|
which naturally uses the working copy as input.
|
|
|
|
|
|
|
|
This makes it faster because the working copy doesn't need to get updated. It
|
|
|
|
also means that the working copy won't see spurious changes e.g. during a rebase
|
|
|
|
operation. It makes it safe to update the working copy while some operation is
|
|
|
|
running.
|
|
|
|
|
|
|
|
### Supports Evolution
|
|
|
|
|
|
|
|
Jujube copies the Evolution feature from Mercurial. It keeps track of when a
|
|
|
|
commit gets rewritten. A commit has a list of predecessors in addition to the
|
|
|
|
usual list of parents. This lets the tool figure out where to rebase descendant
|
|
|
|
commits to when a commit has been rewritten (amended, rebased, etc.). See
|
|
|
|
https://www.mercurial-scm.org/wiki/ChangesetEvolution for more information.
|
|
|
|
|
|
|
|
### The working copy is a commit
|
|
|
|
|
|
|
|
The working copy gets automatically committed when you interact with the
|
|
|
|
tool. This simplifies both implementation and UX. It also means that the working
|
|
|
|
copy is frequently backed up.
|
|
|
|
|
|
|
|
Any changes to the working copy stays in place when you check out another
|
|
|
|
commit. That is different from Git and Mercurial, but I think it's more
|
|
|
|
intuitive for new users. To replicate the default behavior of Git/Mercurial, use
|
|
|
|
`jj rebase -r @ -d <destination>` (`@` is a name for the working copy
|
|
|
|
commit). There is no need to stash/unstash.
|
|
|
|
|
|
|
|
Commands become more consistent because the same command can operate on the repo
|
|
|
|
or another commit. For example, `jj log` includes the working copy (much like
|
|
|
|
`gitk` and other tools include a node for the working copy). `jj squash`
|
|
|
|
squashes a commit into its parent, including if it's the working copy (like `git
|
|
|
|
commit --amend`/`hg amend`).
|
|
|
|
|
|
|
|
A commit description can be added to the working copy before "commit". The same
|
|
|
|
command (`jj describe`) is used for changing the description of any commit.
|
|
|
|
|
|
|
|
### Commits can contains conflicts
|
|
|
|
|
|
|
|
When a merge conflict happens, it is recorded within the tree object as a
|
|
|
|
special conflict object (not a file object with conflict markers). Conflicts are
|
|
|
|
stored as a lists of states to add and another list of states to remove. A
|
|
|
|
regular 3-way merge adds [B,C] and removes [A] (the common ancestor). A
|
|
|
|
modify/remove conflict adds [B] and removes [A]. An add/add conflict adds
|
|
|
|
[B,C]. An octopus merge of N commits adds N states and removes N-1 states. A
|
|
|
|
non-conflict state A is equivalent to a conflict state that just adds [A]. A
|
|
|
|
"state" here can be a normal file, a symlink, or a tree. This support for
|
|
|
|
in-tree conflicts has some interesting effects on both implementation and UX.
|
|
|
|
|
|
|
|
It means that there is a consistent way of resolving conflicts: check out a
|
|
|
|
commit with conflicts in, resolve the conflicts, and amend them into the
|
|
|
|
conflicted commit. Then evolve descendant commits.
|
|
|
|
|
|
|
|
It naturally enables collaborative conflict resolution.
|
|
|
|
|
|
|
|
The in-tree conflicts means that there is no need for book-keeping in
|
|
|
|
rebase-like commands to support continue/abort operations. Instead, the rebase
|
|
|
|
can simply continue and create the desired new DAG shape.
|
|
|
|
|
|
|
|
Conflicts get simplified on rebase by removing pairs of matching states in the
|
|
|
|
"add" and "remove" lists. For example, if B is based on A and then rebased to C,
|
|
|
|
and then to D, it will be a regular 3-way merge between B, and D with C as base
|
|
|
|
(no trace of A). This means that you can keep old commits rebased to head
|
|
|
|
without resolving conflicts, and you still won't have messy recursive conflicts.
|
|
|
|
|
|
|
|
The conflict handling also results in some Darcs-/Pijul-like properties. For
|
|
|
|
example, if you rebase a commit and it results in conflicts, and you then back
|
|
|
|
out that commit, the conflict will go away. (I plan to make that work even if
|
|
|
|
there had been unrelated changes in the file, but I haven't gotten around to it
|
|
|
|
yet.)
|
|
|
|
|
|
|
|
The criss-cross merge case becomes simpler. In Git, the virtual ancestor may
|
|
|
|
have conflicts and you may get nested conflict markers in the working copy. In
|
|
|
|
Jujube, the result is a merge with multiple parts, which may even get simplified
|
|
|
|
to not be recursive.
|
|
|
|
|
|
|
|
The in-tree conflicts make it natural and easy to define the contents of a merge
|
|
|
|
commit to be the difference compared to the merged parents (the so-called "evil"
|
|
|
|
part of the merge), so that's what Jujube does. Rebasing merge commits therefore
|
|
|
|
works as you would expect (Git and Mercurial both handle rebasing of merge
|
|
|
|
commits poorly). It's even possible to change the number of parents while
|
2020-12-18 22:35:12 +00:00
|
|
|
rebasing, so if A is non-merge commit, you can make it a merge commit with `jj
|
|
|
|
rebase -r A -d B -d C`. `jj diff -r <commit>` will show you the diff compared to
|
|
|
|
the merged parents.
|
2020-12-12 08:12:04 +00:00
|
|
|
|
|
|
|
I intend for commands that present the contents of a tree (such as listing
|
|
|
|
files) to use the "add" state(s) of the conflict, but that's not yet done.
|
|
|
|
|
|
|
|
### Operations are logged
|
|
|
|
|
|
|
|
Each write operation is logged to a content-addressed storage, much like the
|
|
|
|
commit storage. The Operation object has an associated View object, much like
|
|
|
|
the Commit object has a Tree object. The view object contains all the heads
|
|
|
|
currently in the repo, as well as the checked-out commit. It will also contain
|
|
|
|
the refs if I add support for that. The operation object can have multiple
|
|
|
|
parent operations, so it forms a DAG just like the commit graph does. There is
|
|
|
|
normally only one parent operation, but there can be multiple parents if
|
|
|
|
concurrent operations happened.
|
|
|
|
|
|
|
|
I added the operation log as a solution for the problem of making concurrent
|
|
|
|
repo edit safe. When the repo is loaded, it is loaded at a particular operation,
|
|
|
|
which provides an immutable view of the repo. For a caller of the library to
|
|
|
|
start making changes, they then have to start a transaction. Once they are done
|
|
|
|
making changes to the transaction, they commit. The operation object is then
|
|
|
|
created. This step cannot fail (except if the file system runs out of space or
|
|
|
|
such). Pointers to the heads of the operation DAG are kept as files in a
|
|
|
|
directory (the filename is the operation id). When a new operation object has
|
|
|
|
been created, its operation id is added to the directory. The transaction's base
|
2020-12-25 08:53:58 +00:00
|
|
|
operation id is then removed from that directory. If concurrent operations
|
2020-12-12 08:12:04 +00:00
|
|
|
happened, there would be multiple new operation ids in the directory and only
|
|
|
|
one base operation id would have been removed. If a reader sees the repo in this
|
|
|
|
state, it will attempt to merge the views and create a new operation with
|
|
|
|
multiple parents. If there are conflicts, the user will have to resolve it (I
|
|
|
|
haven't implemented that yet).
|
|
|
|
|
|
|
|
As a nice side-effect of adding the operation log to solve the concurrent-edits
|
|
|
|
problem, we get some very useful UX features. Many UX features come from mapping
|
|
|
|
commands that work on the commit graph onto the operation graph. For example, if
|
|
|
|
you map `git revert`/`hg backout` onto the operation graph, you get an operation
|
|
|
|
that undoes a previous operation (called `jj op undo`). Note that any operation
|
2020-12-18 23:46:49 +00:00
|
|
|
can be undone, not just the latest one. If you map `git restore`/`hg revert`
|
|
|
|
onto the operation graph, you get an operation that rewinds the repo state to an
|
2020-12-12 08:12:04 +00:00
|
|
|
earlier point (called `jj op restore`).
|
|
|
|
|
|
|
|
You can also see what the repo looked like at an earlier point with `jj
|
|
|
|
--at-op=<operation id> log`. As mentioned earlier, the checkout is also part of
|
|
|
|
the view, so that command will show you where the working copy was at that
|
2020-12-18 23:46:49 +00:00
|
|
|
operation. If you do `jj op restore -o <operation id>`, it will also update the
|
2020-12-12 08:12:04 +00:00
|
|
|
working copy accordingly. This is actually how the working copy is always
|
|
|
|
updated: we first commit a transaction with a pointer to the new checkout and
|
|
|
|
then the working copy is updated to reflect that.
|
|
|
|
|
|
|
|
## Future plans
|
|
|
|
|
|
|
|
TODO
|