feat: stabilizing encoding (#219)

This PR implements a new encoding schema that is more extensible and more compact. It is also simpler, adds less to the binary size, and takes less effort to maintain. It is inspired by the [Automerge Encoding Format](https://automerge.org/automerge-binary-format-spec/).

The main motivation is extensibility. When we integrate a new CRDT algorithm, we don't want to make a breaking change to the encoding or keep multiple versions of the encoding schema in the code, since that would make our WASM binary much larger. We need a stable and extensible encoding schema for v1.0.

This PR also exposes the ops that compose the current container state. For example, you can now quickly query which operation a given character came from. The new snapshot encoding requires this behavior, so it is included in this PR.

# Encoding Schema

## Header

The header is 22 bytes long.

- (bytes 0-3) Magic Bytes: The encoding starts with `loro` as magic bytes.
- (bytes 4-19) Checksum: MD5 checksum of the encoded data, stored as a 16-byte array. It covers everything from byte 20 onward, including the Encoding Method field of the header. The `magic bytes` and `checksum` fields themselves are excluded from the calculation.
- (bytes 20-21) Encoding Method (2 bytes, big endian): Multiple encoding methods are available for a specific encoding version.
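
The header layout described above can be sketched as a small parser. This is only an illustration under stated assumptions, not the code in this PR: the names `Header`, `HeaderError`, and `parse_header` are hypothetical, and the MD5 comparison is left to the caller so the sketch needs no external crate. It returns the slice starting at byte 20, which is the region the stored checksum is described as covering.

```rust
/// Total header length in bytes, as stated in the description.
const HEADER_LEN: usize = 22;
/// The four magic bytes that start every encoded blob.
const MAGIC: &[u8; 4] = b"loro";

/// Hypothetical view of a parsed header (names are illustrative only).
#[derive(Debug, PartialEq, Eq)]
struct Header<'a> {
    /// Stored MD5 digest, taken from bytes 4..20.
    checksum: [u8; 16],
    /// Encoding method, read as a big-endian u16 from bytes 20 and 21.
    encode_method: u16,
    /// Region the checksum covers: everything from byte 20 onward.
    checksummed: &'a [u8],
    /// Payload that follows the 22-byte header.
    body: &'a [u8],
}

#[derive(Debug, PartialEq, Eq)]
enum HeaderError {
    TooShort,
    BadMagic,
}

fn parse_header(bytes: &[u8]) -> Result<Header<'_>, HeaderError> {
    if bytes.len() < HEADER_LEN {
        return Err(HeaderError::TooShort);
    }
    if !bytes.starts_with(MAGIC) {
        return Err(HeaderError::BadMagic);
    }
    let mut checksum = [0u8; 16];
    checksum.copy_from_slice(&bytes[4..20]);
    let encode_method = u16::from_be_bytes([bytes[20], bytes[21]]);
    Ok(Header {
        checksum,
        encode_method,
        checksummed: &bytes[20..],
        body: &bytes[HEADER_LEN..],
    })
}
```

A caller would compute an MD5 digest over `checksummed` with whatever MD5 implementation it uses and compare the result with `checksum` before decoding `body`.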

## Encode Mode: Updates

This mode encodes only the ops, that is, the history. Document state is not included.

Like Automerge's format, we employ columnar encoding for operations and changes.

Previously, operations were ordered by their Operation ID (OpId) before columnar encoding. However, sorting operations by container first improves compression.
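
To illustrate why container-first ordering can help, here is a toy sketch. It assumes a run-length-style column encoder, which is an assumption of this example rather than a statement about the actual encoder, and the `Op` fields and helper names are hypothetical. It counts the runs of equal adjacent values in the container column before and after sorting; fewer runs means fewer entries for such an encoder to store.

```rust
/// Hypothetical, simplified op: only the fields needed for the illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Op {
    container: u32, // index of the container the op targets
    peer: u64,      // peer that created the op
    counter: u32,   // per-peer counter; (peer, counter) plays the role of the OpId
}

/// Splits row-oriented ops into one column per field (columnar layout).
fn to_columns(ops: &[Op]) -> (Vec<u32>, Vec<u64>, Vec<u32>) {
    let containers = ops.iter().map(|op| op.container).collect();
    let peers = ops.iter().map(|op| op.peer).collect();
    let counters = ops.iter().map(|op| op.counter).collect();
    (containers, peers, counters)
}

/// Number of runs of equal adjacent values in a column.
fn run_count<T: PartialEq>(column: &[T]) -> usize {
    if column.is_empty() {
        return 0;
    }
    1 + column.windows(2).filter(|w| w[0] != w[1]).count()
}

/// Orders ops by container first, then by (peer, counter).
fn sort_by_container_first(ops: &mut [Op]) {
    ops.sort_by_key(|op| (op.container, op.peer, op.counter));
}
```

With ops that alternate between two containers in OpId order, the container column has one run per op. After `sort_by_container_first`, it collapses to one run per container.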

## Encode Mode: Snapshot

This mode simultaneously captures document state and historical data. Upon importing a snapshot into a new document, initialization occurs directly from the snapshot, bypassing the need for CRDT-based recalculations.

Unlike previous snapshot encoding methods, the binary output of snapshot mode is compatible with updates mode. This makes importing a snapshot into a non-empty document more efficient, since such a document cannot be initialized directly from the snapshot.

Additionally, where possible, we use the sequence of operations to construct state snapshots. In CRDTs, it is possible to deduce which ops make up the current container state. These ops are tagged in relation to the container, so the state can be reconstructed directly from them. This approach, pioneered by Automerge, significantly improves compression efficiency.
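
As a rough illustration of reconstructing state from ops, consider a text container whose snapshot stores only which history ops are currently visible, in document order, instead of storing the text a second time. Everything in this sketch is hypothetical (`InsertOp`, `TextStateRefs`, `rebuild_text`, `op_at`) and far simpler than the real encoding; it only shows the idea of pointing state at ops.

```rust
/// Hypothetical op identifier: (peer, counter).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct OpId {
    peer: u64,
    counter: u32,
}

/// Hypothetical single-character insert op kept in the history.
#[derive(Debug, Clone)]
struct InsertOp {
    id: OpId,
    ch: char,
}

/// Hypothetical state section of a snapshot: indexes into the history,
/// listing the ops that make up the current text in document order.
struct TextStateRefs {
    visible: Vec<usize>,
}

/// Rebuilds the text directly from the referenced ops, without replaying the CRDT.
fn rebuild_text(ops: &[InsertOp], state: &TextStateRefs) -> String {
    state.visible.iter().map(|&i| ops[i].ch).collect()
}

/// Answers "which op does the character at `pos` come from?".
fn op_at(ops: &[InsertOp], state: &TextStateRefs, pos: usize) -> Option<OpId> {
    state.visible.get(pos).map(|&i| ops[i].id)
}
```

Because the characters live only in the history ops, the state section reduces to a list of small integers, which is the kind of data a columnar encoder tends to compress well.
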
Zixuan Chen 2024-01-02 17:03:24 +08:00 committed by GitHub
parent 727b5c2518
commit bc27a47531
GPG key ID: 4AEE18F83AFDEB23
90 changed files with 7291 additions and 3511 deletions

1
.gitignore vendored

@ -7,3 +7,4 @@ dhat-heap.json
.DS_Store
node_modules/
.idea/
lcov.info

13
.vscode/settings.json vendored

@ -1,11 +1,14 @@
{
"cSpell.words": [
"arbtest",
"cids",
"clippy",
"collab",
"dhat",
"flate",
"gmax",
"heapless",
"idspan",
"insta",
"Leeeon",
"LOGSTORE",
@ -28,9 +31,13 @@
"RUST_BACKTRACE": "full",
"DEBUG": "*"
},
"rust-analyzer.cargo.features": ["test_utils"],
"rust-analyzer.cargo.features": [
"test_utils"
],
"editor.defaultFormatter": "rust-lang.rust-analyzer",
"rust-analyzer.server.extraEnv": { "RUSTUP_TOOLCHAIN": "stable" },
"rust-analyzer.server.extraEnv": {
"RUSTUP_TOOLCHAIN": "stable"
},
"editor.formatOnSave": true,
"todo-tree.general.tags": [
"BUG",
@ -46,7 +53,7 @@
"*.rs": "${capture}.excalidraw"
},
"excalidraw.theme": "dark",
"deno.enable": false ,
"deno.enable": false,
"cortex-debug.variableUseNaturalFormat": true,
"[markdown]": {
"editor.defaultFormatter": "darkriszty.markdown-table-prettify"

259
Cargo.lock generated

@ -94,6 +94,12 @@ dependencies = [
"rustc-demangle",
]
[[package]]
name = "base64"
version = "0.21.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "35636a1494ede3b646cc98f74f8e62c773a38a659ebc777a2cf26b9b74171df9"
[[package]]
name = "bench-utils"
version = "0.1.0"
@ -101,19 +107,11 @@ dependencies = [
"arbitrary",
"enum-as-inner 0.5.1",
"flate2",
"loro-common",
"rand",
"serde_json",
]
[[package]]
name = "benches"
version = "0.1.0"
dependencies = [
"bench-utils",
"loro",
"tabled 0.15.0",
]
[[package]]
name = "bit-set"
version = "0.5.3"
@ -174,30 +172,14 @@ version = "0.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "37b2a672a2cb129a2e41c10b1224bb368f9f37a2b16b612598138befd7b37eb5"
[[package]]
name = "cbindgen"
version = "0.24.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a6358dedf60f4d9b8db43ad187391afe959746101346fe51bb978126bec61dfb"
dependencies = [
"clap",
"heck",
"indexmap",
"log",
"proc-macro2 1.0.67",
"quote 1.0.29",
"serde",
"serde_json",
"syn 1.0.107",
"tempfile",
"toml",
]
[[package]]
name = "cc"
version = "1.0.78"
version = "1.0.83"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "a20104e2335ce8a659d6dd92a51a767a0c062599c73b343fd152cb401e828c3d"
checksum = "f1174fb0b6ec23863f8b971027804a42614e347eafb0a95bf0b12cdae21fc4d0"
dependencies = [
"libc",
]
[[package]]
name = "cfg-if"
@ -238,12 +220,9 @@ version = "3.2.23"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "71655c45cb9845d3270c9d6df84ebe72b4dad3c2ba3f7023ad47c144e4e473a5"
dependencies = [
"atty",
"bitflags",
"clap_lex",
"indexmap",
"strsim",
"termcolor",
"textwrap",
]
@ -273,6 +252,16 @@ dependencies = [
"termcolor",
]
[[package]]
name = "color-backtrace"
version = "0.6.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "150fd80a270c0671379f388c8204deb6a746bb4eac8a6c03fe2460b2c0127ea0"
dependencies = [
"backtrace",
"termcolor",
]
[[package]]
name = "console_error_panic_hook"
version = "0.1.7"
@ -364,7 +353,7 @@ dependencies = [
"autocfg",
"cfg-if",
"crossbeam-utils",
"memoffset 0.7.1",
"memoffset",
"scopeguard",
]
@ -387,6 +376,16 @@ dependencies = [
"syn 1.0.107",
]
[[package]]
name = "ctor"
version = "0.2.6"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "30d2b3721e861707777e3195b0158f950ae6dc4a27e4d02ff9f67e3eb3de199e"
dependencies = [
"quote 1.0.29",
"syn 2.0.41",
]
[[package]]
name = "darling"
version = "0.20.3"
@ -408,7 +407,7 @@ dependencies = [
"proc-macro2 1.0.67",
"quote 1.0.29",
"strsim",
"syn 2.0.25",
"syn 2.0.41",
]
[[package]]
@ -419,7 +418,7 @@ checksum = "836a9bbc7ad63342d6d6e7b815ccab164bc77a2d95d84bc3117a8c0d5c98e2d5"
dependencies = [
"darling_core",
"quote 1.0.29",
"syn 2.0.25",
"syn 2.0.41",
]
[[package]]
@ -440,7 +439,7 @@ checksum = "53e0efad4403bfc52dc201159c4b842a246a14b98c64b55dfd0f2d89729dfeb8"
dependencies = [
"proc-macro2 1.0.67",
"quote 1.0.29",
"syn 2.0.25",
"syn 2.0.41",
]
[[package]]
@ -486,7 +485,7 @@ dependencies = [
"heck",
"proc-macro2 1.0.67",
"quote 1.0.29",
"syn 2.0.25",
"syn 2.0.41",
]
[[package]]
@ -501,6 +500,19 @@ dependencies = [
"syn 1.0.107",
]
[[package]]
name = "examples"
version = "0.1.0"
dependencies = [
"arbitrary",
"bench-utils",
"color-backtrace 0.6.1",
"ctor 0.2.6",
"debug-log",
"loro",
"tabled 0.15.0",
]
[[package]]
name = "fastrand"
version = "1.8.0"
@ -656,12 +668,6 @@ dependencies = [
"hashbrown",
]
[[package]]
name = "indoc"
version = "1.0.8"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "da2d6f23ffea9d7e76c53eee25dfb67bcd8fde7f1198b0855350698c9f07c780"
[[package]]
name = "instant"
version = "0.1.12"
@ -710,6 +716,12 @@ version = "1.4.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e2abad23fbc42b3700f2f279844dc832adb2b2eb069b2df918f455c4e18cc646"
[[package]]
name = "leb128"
version = "0.2.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "884e2677b40cc8c339eaefcb701c32ef1fd2493d71118dc0ca4b6a736c93bd67"
[[package]]
name = "libc"
version = "0.2.147"
@ -755,19 +767,12 @@ dependencies = [
"js-sys",
"loro-rle",
"serde",
"serde_columnar",
"string_cache",
"thiserror",
"wasm-bindgen",
]
[[package]]
name = "loro-ffi"
version = "0.1.0"
dependencies = [
"cbindgen",
"loro-internal",
]
[[package]]
name = "loro-internal"
version = "0.2.2"
@ -776,10 +781,11 @@ dependencies = [
"arbitrary",
"arbtest",
"arref",
"base64",
"bench-utils",
"color-backtrace",
"color-backtrace 0.5.1",
"criterion",
"ctor",
"ctor 0.1.26",
"debug-log",
"dhat",
"enum-as-inner 0.5.1",
@ -790,11 +796,15 @@ dependencies = [
"im",
"itertools 0.11.0",
"js-sys",
"leb128",
"loro-common",
"loro-preload",
"loro-rle",
"md5",
"miniz_oxide 0.7.1",
"num",
"num-derive",
"num-traits",
"once_cell",
"postcard",
"proptest",
@ -828,8 +838,8 @@ version = "0.1.0"
dependencies = [
"append-only-bytes",
"arref",
"color-backtrace",
"ctor",
"color-backtrace 0.5.1",
"ctor 0.1.26",
"debug-log",
"enum-as-inner 0.6.0",
"fxhash",
@ -862,21 +872,18 @@ dependencies = [
"wasm-bindgen",
]
[[package]]
name = "md5"
version = "0.7.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "490cc448043f947bae3cbee9c203358d62dbee0db12107a74be5c30ccfd09771"
[[package]]
name = "memchr"
version = "2.5.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2dffe52ecf27772e601905b7522cb4ef790d2cc203488bbd0e2fe85fcb74566d"
[[package]]
name = "memoffset"
version = "0.6.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5aa361d4faea93603064a027415f07bd8e1d5c88c9fbf68bf56a285428fd79ce"
dependencies = [
"autocfg",
]
[[package]]
name = "memoffset"
version = "0.7.1"
@ -922,9 +929,9 @@ checksum = "e4a24736216ec316047a1fc4252e27dabb04218aa4a3f37c6e7ddbf1f9782b54"
[[package]]
name = "num"
version = "0.4.0"
version = "0.4.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "43db66d1170d347f9a065114077f7dccb00c1b9478c89384490a3425279a4606"
checksum = "b05180d69e3da0e530ba2a1dae5110317e49e3b7f3d41be227dc5f92e49ee7af"
dependencies = [
"num-bigint",
"num-complex",
@ -947,13 +954,24 @@ dependencies = [
[[package]]
name = "num-complex"
version = "0.4.2"
version = "0.4.4"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7ae39348c8bc5fbd7f40c727a9925f03517afd2ab27d46702108b6a7e5414c19"
checksum = "1ba157ca0885411de85d6ca030ba7e2a83a28636056c7c699b07c8b6f7383214"
dependencies = [
"num-traits",
]
[[package]]
name = "num-derive"
version = "0.3.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "876a53fff98e03a936a674b29568b0e605f06b29372c2489ff4de23f1949743d"
dependencies = [
"proc-macro2 1.0.67",
"quote 1.0.29",
"syn 1.0.107",
]
[[package]]
name = "num-integer"
version = "0.1.45"
@ -1211,74 +1229,6 @@ dependencies = [
"syn 0.15.44",
]
[[package]]
name = "pyloro"
version = "0.1.0"
dependencies = [
"loro-internal",
"pyo3",
]
[[package]]
name = "pyo3"
version = "0.17.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "268be0c73583c183f2b14052337465768c07726936a260f480f0857cb95ba543"
dependencies = [
"cfg-if",
"indoc",
"libc",
"memoffset 0.6.5",
"parking_lot",
"pyo3-build-config",
"pyo3-ffi",
"pyo3-macros",
"unindent",
]
[[package]]
name = "pyo3-build-config"
version = "0.17.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "28fcd1e73f06ec85bf3280c48c67e731d8290ad3d730f8be9dc07946923005c8"
dependencies = [
"once_cell",
"target-lexicon",
]
[[package]]
name = "pyo3-ffi"
version = "0.17.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0f6cb136e222e49115b3c51c32792886defbfb0adead26a688142b346a0b9ffc"
dependencies = [
"libc",
"pyo3-build-config",
]
[[package]]
name = "pyo3-macros"
version = "0.17.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "94144a1266e236b1c932682136dc35a9dee8d3589728f68130c7c3861ef96b28"
dependencies = [
"proc-macro2 1.0.67",
"pyo3-macros-backend",
"quote 1.0.29",
"syn 1.0.107",
]
[[package]]
name = "pyo3-macros-backend"
version = "0.17.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c8df9be978a2d2f0cdebabb03206ed73b11314701a5bfe71b0d753b81997777f"
dependencies = [
"proc-macro2 1.0.67",
"quote 1.0.29",
"syn 1.0.107",
]
[[package]]
name = "quick-error"
version = "1.2.3"
@ -1514,7 +1464,7 @@ dependencies = [
"darling",
"proc-macro2 1.0.67",
"quote 1.0.29",
"syn 2.0.25",
"syn 2.0.41",
]
[[package]]
@ -1525,7 +1475,7 @@ checksum = "389894603bd18c46fa56231694f8d827779c0951a667087194cf9de94ed24682"
dependencies = [
"proc-macro2 1.0.67",
"quote 1.0.29",
"syn 2.0.25",
"syn 2.0.41",
]
[[package]]
@ -1640,9 +1590,9 @@ dependencies = [
[[package]]
name = "syn"
version = "2.0.25"
version = "2.0.41"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "15e3fc8c0c74267e2df136e5e5fb656a464158aa57624053375eb9c8c6e25ae2"
checksum = "44c8b28c477cc3bf0e7966561e3460130e1255f7a1cf71931075f1c5e7a7e269"
dependencies = [
"proc-macro2 1.0.67",
"quote 1.0.29",
@ -1707,12 +1657,6 @@ dependencies = [
"syn 1.0.107",
]
[[package]]
name = "target-lexicon"
version = "0.12.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9410d0f6853b1d94f0e519fb95df60f29d2c1eff2d921ffdf01a4c8a3b54f12d"
[[package]]
name = "tempfile"
version = "3.3.0"
@ -1759,7 +1703,7 @@ checksum = "463fe12d7993d3b327787537ce8dd4dfa058de32fc2b195ef3cde03dc4771e8f"
dependencies = [
"proc-macro2 1.0.67",
"quote 1.0.29",
"syn 2.0.25",
"syn 2.0.41",
]
[[package]]
@ -1778,15 +1722,6 @@ dependencies = [
"serde_json",
]
[[package]]
name = "toml"
version = "0.5.10"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "1333c76748e868a4d9d1017b5ab53171dfd095f70c712fdb4653a406547f598f"
dependencies = [
"serde",
]
[[package]]
name = "typenum"
version = "1.16.0"
@ -1811,12 +1746,6 @@ version = "0.1.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "fc72304796d0818e357ead4e000d19c9c174ab23dc11093ac919054d20a6a7fc"
[[package]]
name = "unindent"
version = "0.1.11"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e1766d682d402817b5ac4490b3c3002d91dfa0d22812f341609f97b08757359c"
[[package]]
name = "version_check"
version = "0.9.4"
@ -1870,7 +1799,7 @@ dependencies = [
"once_cell",
"proc-macro2 1.0.67",
"quote 1.0.29",
"syn 2.0.25",
"syn 2.0.41",
"wasm-bindgen-shared",
]
@ -1892,7 +1821,7 @@ checksum = "e128beba882dd1eb6200e1dc92ae6c5dbaa4311aa7bb211ca035779e5efc39f8"
dependencies = [
"proc-macro2 1.0.67",
"quote 1.0.29",
"syn 2.0.25",
"syn 2.0.41",
"wasm-bindgen-backend",
"wasm-bindgen-shared",
]


@ -1,2 +1,11 @@
[workspace]
members = ["crates/*"]
members = [
"crates/loro",
"crates/examples",
"crates/bench-utils",
"crates/rle",
"crates/loro-common",
"crates/loro-internal",
"crates/loro-preload",
"crates/loro-wasm",
]


@ -7,6 +7,7 @@ edition = "2021"
[dependencies]
arbitrary = { version = "1.2.0", features = ["derive"] }
loro-common = { path = "../loro-common" }
enum-as-inner = "0.5.1"
flate2 = "1.0.25"
rand = "0.8.5"


@ -1,12 +1,14 @@
use arbitrary::Arbitrary;
#[derive(Debug, Arbitrary, PartialEq, Eq)]
use crate::ActionTrait;
#[derive(Debug, Arbitrary, PartialEq, Eq, Clone)]
pub struct Point {
pub x: i32,
pub y: i32,
}
#[derive(Debug, Arbitrary, PartialEq, Eq)]
#[derive(Debug, Arbitrary, PartialEq, Eq, Clone)]
pub enum DrawAction {
CreatePath {
points: Vec<Point>,
@ -25,3 +27,38 @@ pub enum DrawAction {
relative_to: Point,
},
}
impl DrawAction {
pub const MAX_X: i32 = 1_000_000;
pub const MAX_Y: i32 = 1_000_000;
pub const MAX_MOVE: i32 = 200;
}
impl ActionTrait for DrawAction {
fn normalize(&mut self) {
match self {
DrawAction::CreatePath { points } => {
for point in points {
point.x %= Self::MAX_X;
point.y %= Self::MAX_Y;
}
}
DrawAction::Text { pos, size, .. } => {
pos.x %= Self::MAX_X;
pos.y %= Self::MAX_Y;
size.x %= Self::MAX_X;
size.y %= Self::MAX_Y;
}
DrawAction::CreateRect { pos, size } => {
pos.x %= Self::MAX_X;
pos.y %= Self::MAX_Y;
size.x %= Self::MAX_X;
size.y %= Self::MAX_Y;
}
DrawAction::Move { relative_to, .. } => {
relative_to.x %= Self::MAX_MOVE;
relative_to.y %= Self::MAX_MOVE;
}
}
}
}


@ -0,0 +1,78 @@
use std::sync::Arc;
use arbitrary::{Arbitrary, Unstructured};
pub use loro_common::LoroValue;
use crate::ActionTrait;
#[derive(Arbitrary, Debug, PartialEq, Eq, Clone)]
pub enum JsonAction {
InsertMap {
key: String,
value: LoroValue,
},
InsertList {
#[arbitrary(with = |u: &mut Unstructured| u.int_in_range(0..=1024))]
index: usize,
value: LoroValue,
},
DeleteList {
#[arbitrary(with = |u: &mut Unstructured| u.int_in_range(0..=1024))]
index: usize,
},
InsertText {
#[arbitrary(with = |u: &mut Unstructured| u.int_in_range(0..=1024))]
index: usize,
s: String,
},
DeleteText {
#[arbitrary(with = |u: &mut Unstructured| u.int_in_range(0..=1024))]
index: usize,
#[arbitrary(with = |u: &mut Unstructured| u.int_in_range(0..=128))]
len: usize,
},
}
const MAX_LEN: usize = 1000;
impl ActionTrait for JsonAction {
fn normalize(&mut self) {
match self {
JsonAction::InsertMap { key: _, value } => {
normalize_value(value);
}
JsonAction::InsertList { index: _, value } => {
normalize_value(value);
}
JsonAction::DeleteList { index } => {
*index %= MAX_LEN;
}
JsonAction::InsertText { .. } => {}
JsonAction::DeleteText { .. } => {}
}
}
}
fn normalize_value(value: &mut LoroValue) {
match value {
LoroValue::Double(f) => {
if f.is_nan() {
*f = 0.0;
}
}
LoroValue::List(l) => {
for v in Arc::make_mut(l).iter_mut() {
normalize_value(v);
}
}
LoroValue::Map(m) => {
for (_, v) in Arc::make_mut(m).iter_mut() {
normalize_value(v);
}
}
LoroValue::Container(_) => {
*value = LoroValue::Null;
}
_ => {}
}
}


@ -1,9 +1,14 @@
pub mod draw;
pub mod json;
pub mod sheet;
use arbitrary::{Arbitrary, Unstructured};
use enum_as_inner::EnumAsInner;
use rand::{RngCore, SeedableRng};
use std::io::Read;
use std::{
io::Read,
sync::{atomic::AtomicUsize, Arc},
};
use flate2::read::GzDecoder;
use serde_json::Value;
@ -15,6 +20,10 @@ pub struct TextAction {
pub del: usize,
}
pub trait ActionTrait: Clone + std::fmt::Debug {
fn normalize(&mut self);
}
pub fn get_automerge_actions() -> Vec<TextAction> {
const RAW_DATA: &[u8; 901823] =
include_bytes!("../../loro-internal/benches/automerge-paper.json.gz");
@ -46,19 +55,34 @@ pub fn get_automerge_actions() -> Vec<TextAction> {
actions
}
#[derive(Debug, EnumAsInner, Arbitrary, PartialEq, Eq)]
#[derive(Debug, EnumAsInner, Arbitrary, PartialEq, Eq, Clone, Copy)]
pub enum SyncKind {
Fit,
Snapshot,
OnlyLastOpFromEachPeer,
}
#[derive(Debug, EnumAsInner, Arbitrary, PartialEq, Eq, Clone)]
pub enum Action<T> {
Action { peer: usize, action: T },
Sync { from: usize, to: usize },
Action {
peer: usize,
action: T,
},
Sync {
from: usize,
to: usize,
kind: SyncKind,
},
SyncAll,
}
pub fn gen_realtime_actions<'a, T: Arbitrary<'a>>(
pub fn gen_realtime_actions<'a, T: Arbitrary<'a> + ActionTrait>(
action_num: usize,
peer_num: usize,
seed: &'a [u8],
mut preprocess: impl FnMut(&mut Action<T>),
) -> Result<Vec<Action<T>>, Box<str>> {
let mut seed_offset = 0;
let mut arb = Unstructured::new(seed);
let mut ans = Vec::new();
let mut last_sync_all = 0;
@ -67,17 +91,23 @@ pub fn gen_realtime_actions<'a, T: Arbitrary<'a>>(
break;
}
let mut action: Action<T> = arb
.arbitrary()
.map_err(|e| e.to_string().into_boxed_str())?;
let mut action: Action<T> = match arb.arbitrary() {
Ok(a) => a,
Err(_) => {
seed_offset += 1;
arb = Unstructured::new(&seed[seed_offset % seed.len()..]);
arb.arbitrary().unwrap()
}
};
match &mut action {
Action::Action { peer, .. } => {
Action::Action { peer, action } => {
action.normalize();
*peer %= peer_num;
}
Action::SyncAll => {
last_sync_all = i;
}
Action::Sync { from, to } => {
Action::Sync { from, to, .. } => {
*from %= peer_num;
*to %= peer_num;
}
@ -94,13 +124,14 @@ pub fn gen_realtime_actions<'a, T: Arbitrary<'a>>(
Ok(ans)
}
pub fn gen_async_actions<'a, T: Arbitrary<'a>>(
pub fn gen_async_actions<'a, T: Arbitrary<'a> + ActionTrait>(
action_num: usize,
peer_num: usize,
seed: &'a [u8],
actions_before_sync: usize,
mut preprocess: impl FnMut(&mut Action<T>),
) -> Result<Vec<Action<T>>, Box<str>> {
let mut seed_offset = 0;
let mut arb = Unstructured::new(seed);
let mut ans = Vec::new();
let mut last_sync_all = 0;
@ -110,14 +141,16 @@ pub fn gen_async_actions<'a, T: Arbitrary<'a>>(
}
if arb.is_empty() {
return Err("not enough actions".into());
seed_offset += 1;
arb = Unstructured::new(&seed[seed_offset % seed.len()..]);
}
let mut action: Action<T> = arb
.arbitrary()
.map_err(|e| e.to_string().into_boxed_str())?;
match &mut action {
Action::Action { peer, .. } => {
Action::Action { peer, action } => {
action.normalize();
*peer %= peer_num;
}
Action::SyncAll => {
@ -127,7 +160,7 @@ pub fn gen_async_actions<'a, T: Arbitrary<'a>>(
last_sync_all = ans.len();
}
Action::Sync { from, to } => {
Action::Sync { from, to, .. } => {
*from %= peer_num;
*to %= peer_num;
}
@ -140,6 +173,99 @@ pub fn gen_async_actions<'a, T: Arbitrary<'a>>(
Ok(ans)
}
pub fn preprocess_actions<T: Clone>(
peer_num: usize,
actions: &[Action<T>],
mut should_skip: impl FnMut(&Action<T>) -> bool,
mut preprocess: impl FnMut(&mut Action<T>),
) -> Vec<Action<T>> {
let mut ans = Vec::new();
for action in actions {
let mut action = action.clone();
match &mut action {
Action::Action { peer, .. } => {
*peer %= peer_num;
}
Action::Sync { from, to, .. } => {
*from %= peer_num;
*to %= peer_num;
}
Action::SyncAll => {}
}
if should_skip(&action) {
continue;
}
let mut action: Action<_> = action.clone();
preprocess(&mut action);
ans.push(action.clone());
}
ans
}
pub fn make_actions_realtime<T: Clone>(peer_num: usize, actions: &[Action<T>]) -> Vec<Action<T>> {
let since_last_sync_all = Arc::new(AtomicUsize::new(0));
let since_last_sync_all_2 = since_last_sync_all.clone();
preprocess_actions(
peer_num,
actions,
|action| match action {
Action::SyncAll => {
since_last_sync_all.store(0, std::sync::atomic::Ordering::Relaxed);
false
}
_ => {
since_last_sync_all.fetch_add(1, std::sync::atomic::Ordering::Relaxed);
false
}
},
|action| {
if since_last_sync_all_2.load(std::sync::atomic::Ordering::Relaxed) > 10 {
*action = Action::SyncAll;
}
},
)
}
pub fn make_actions_async<T: Clone + ActionTrait>(
peer_num: usize,
actions: &[Action<T>],
sync_all_interval: usize,
) -> Vec<Action<T>> {
let since_last_sync_all = Arc::new(AtomicUsize::new(0));
let since_last_sync_all_2 = since_last_sync_all.clone();
preprocess_actions(
peer_num,
actions,
|action| match action {
Action::SyncAll => {
let last = since_last_sync_all.load(std::sync::atomic::Ordering::Relaxed);
if last < sync_all_interval {
true
} else {
since_last_sync_all.store(0, std::sync::atomic::Ordering::Relaxed);
false
}
}
_ => {
since_last_sync_all.fetch_add(1, std::sync::atomic::Ordering::Relaxed);
false
}
},
|action| {
if since_last_sync_all_2.load(std::sync::atomic::Ordering::Relaxed) > 10 {
*action = Action::SyncAll;
}
if let Action::Action { action, .. } = action {
action.normalize();
}
},
)
}
pub fn create_seed(seed: u64, size: usize) -> Vec<u8> {
let mut rng = rand::rngs::StdRng::seed_from_u64(seed);
let mut ans = vec![0; size];


@ -1,6 +1,8 @@
use arbitrary::Arbitrary;
#[derive(Debug, Arbitrary, PartialEq, Eq)]
use crate::ActionTrait;
#[derive(Debug, Clone, Arbitrary, PartialEq, Eq)]
pub enum SheetAction {
SetValue {
row: usize,
@ -15,12 +17,10 @@ pub enum SheetAction {
},
}
impl SheetAction {
pub const MAX_ROW: usize = 1_048_576;
pub const MAX_COL: usize = 16_384;
impl ActionTrait for SheetAction {
/// Excel has a limit of 1,048,576 rows and 16,384 columns per sheet.
// We need to normalize the action to fit the limit.
pub fn normalize(&mut self) {
/// We need to normalize the action to fit the limit.
fn normalize(&mut self) {
match self {
SheetAction::SetValue { row, col, .. } => {
*row %= Self::MAX_ROW;
@ -35,3 +35,8 @@ impl SheetAction {
}
}
}
impl SheetAction {
pub const MAX_ROW: usize = 1_048_576;
pub const MAX_COL: usize = 16_384;
}


@ -1,2 +0,0 @@
pub mod draw;
pub mod sheet;


@ -1,5 +1,5 @@
[package]
name = "benches"
name = "examples"
version = "0.1.0"
edition = "2021"
@ -9,3 +9,9 @@ edition = "2021"
bench-utils = { path = "../bench-utils" }
loro = { path = "../loro" }
tabled = "0.15.0"
arbitrary = { version = "1.3.0", features = ["derive"] }
debug-log = { version = "0.2", features = [] }
[dev-dependencies]
color-backtrace = { version = "0.6" }
ctor = "0.2"


@ -1,6 +1,6 @@
use std::time::Instant;
use benches::draw::{run_async_draw_workflow, run_realtime_collab_draw_workflow};
use examples::{draw::DrawActor, run_async_workflow, run_realtime_collab_workflow};
use loro::LoroDoc;
use tabled::{settings::Style, Table, Tabled};
@ -52,7 +52,7 @@ fn run_async(peer_num: usize, action_num: usize, seed: u64) -> BenchResult {
"run_async(peer_num: {}, action_num: {})",
peer_num, action_num
);
let (mut actors, start) = run_async_draw_workflow(peer_num, action_num, 200, seed);
let (mut actors, start) = run_async_workflow::<DrawActor>(peer_num, action_num, 200, seed);
actors.sync_all();
let apply_duration = start.elapsed().as_secs_f64() * 1000.;
@ -96,7 +96,7 @@ fn run_realtime_collab(peer_num: usize, action_num: usize, seed: u64) -> BenchRe
"run_realtime_collab(peer_num: {}, action_num: {})",
peer_num, action_num
);
let (mut actors, start) = run_realtime_collab_draw_workflow(peer_num, action_num, seed);
let (mut actors, start) = run_realtime_collab_workflow::<DrawActor>(peer_num, action_num, seed);
actors.sync_all();
let apply_duration = start.elapsed().as_secs_f64() * 1000.;


@ -1,4 +1,4 @@
use benches::sheet::init_sheet;
use examples::sheet::init_sheet;
use std::time::Instant;
pub fn main() {

4
crates/examples/fuzz/.gitignore vendored Normal file

@ -0,0 +1,4 @@
target
corpus
artifacts
coverage

1056
crates/examples/fuzz/Cargo.lock generated Normal file

File diff suppressed because it is too large


@ -0,0 +1,36 @@
[package]
name = "benches-fuzz"
version = "0.0.0"
publish = false
edition = "2021"
[package.metadata]
cargo-fuzz = true
[dependencies]
libfuzzer-sys = "0.4"
[dependencies.examples]
path = ".."
[dependencies.bench-utils]
path = "../../bench-utils"
# Prevent this from interfering with workspaces
[workspace]
members = ["."]
[profile.release]
debug = 1
[[bin]]
name = "draw"
path = "fuzz_targets/draw.rs"
test = false
doc = false
[[bin]]
name = "json"
path = "fuzz_targets/json.rs"
test = false
doc = false


@ -0,0 +1,8 @@
#![no_main]
use bench_utils::{draw::DrawAction, Action};
use libfuzzer_sys::fuzz_target;
fuzz_target!(|actions: Vec<Action<DrawAction>>| {
examples::draw::run_actions_fuzz_in_async_mode(5, 100, &actions)
});


@ -0,0 +1,9 @@
#![no_main]
use bench_utils::{json::JsonAction, Action};
use examples::json::fuzz;
use libfuzzer_sys::fuzz_target;
fuzz_target!(|data: Vec<Action<JsonAction>>| {
fuzz(5, &data);
});


@ -1,8 +1,10 @@
use std::{collections::HashMap, time::Instant};
use std::collections::HashMap;
use bench_utils::{create_seed, draw::DrawAction, gen_async_actions, gen_realtime_actions, Action};
use bench_utils::{draw::DrawAction, Action};
use loro::{ContainerID, ContainerType};
use crate::{run_actions_fuzz_in_async_mode, ActorTrait};
pub struct DrawActor {
pub doc: loro::LoroDoc,
paths: loro::LoroList,
@ -27,8 +29,16 @@ impl DrawActor {
id_to_obj,
}
}
}
pub fn apply_action(&mut self, action: &mut DrawAction) {
impl ActorTrait for DrawActor {
type ActionKind = DrawAction;
fn create(peer_id: u64) -> Self {
Self::new(peer_id)
}
fn apply_action(&mut self, action: &mut Self::ActionKind) {
match action {
DrawAction::CreatePath { points } => {
let path = self.paths.insert_container(0, ContainerType::Map).unwrap();
@ -55,7 +65,7 @@ impl DrawActor {
map.insert("y", p.y).unwrap();
}
let len = self.id_to_obj.len();
self.id_to_obj.insert(len, path.id());
self.id_to_obj.insert(len, path_map.id());
}
DrawAction::Text { text, pos, size } => {
let text_container = self
@ -119,82 +129,21 @@ impl DrawActor {
let pos_map = map.get("pos").unwrap().unwrap_right().into_map().unwrap();
let x = pos_map.get("x").unwrap().unwrap_left().into_i32().unwrap();
let y = pos_map.get("y").unwrap().unwrap_left().into_i32().unwrap();
pos_map.insert("x", x + relative_to.x).unwrap();
pos_map.insert("y", y + relative_to.y).unwrap();
pos_map
.insert("x", x.overflowing_add(relative_to.x).0)
.unwrap();
pos_map
.insert("y", y.overflowing_add(relative_to.y).0)
.unwrap();
}
}
}
}
pub struct DrawActors {
pub docs: Vec<DrawActor>,
}
impl DrawActors {
pub fn new(size: usize) -> Self {
let docs = (0..size).map(|i| DrawActor::new(i as u64)).collect();
Self { docs }
}
pub fn apply_action(&mut self, action: &mut Action<DrawAction>) {
match action {
Action::Action { peer, action } => {
self.docs[*peer].apply_action(action);
}
Action::Sync { from, to } => {
let vv = self.docs[*from].doc.oplog_vv();
let data = self.docs[*from].doc.export_from(&vv);
self.docs[*to].doc.import(&data).unwrap();
}
Action::SyncAll => self.sync_all(),
}
}
pub fn sync_all(&mut self) {
let (first, rest) = self.docs.split_at_mut(1);
for doc in rest.iter_mut() {
let vv = first[0].doc.oplog_vv();
first[0].doc.import(&doc.doc.export_from(&vv)).unwrap();
}
for doc in rest.iter_mut() {
let vv = doc.doc.oplog_vv();
doc.doc.import(&first[0].doc.export_from(&vv)).unwrap();
}
fn doc(&self) -> &loro::LoroDoc {
&self.doc
}
}
pub fn run_async_draw_workflow(
peer_num: usize,
action_num: usize,
actions_before_sync: usize,
seed: u64,
) -> (DrawActors, Instant) {
let seed = create_seed(seed, action_num * 32);
let mut actions =
gen_async_actions::<DrawAction>(action_num, peer_num, &seed, actions_before_sync, |_| {})
.unwrap();
let mut actors = DrawActors::new(peer_num);
let start = Instant::now();
for action in actions.iter_mut() {
actors.apply_action(action);
}
(actors, start)
}
pub fn run_realtime_collab_draw_workflow(
peer_num: usize,
action_num: usize,
seed: u64,
) -> (DrawActors, Instant) {
let seed = create_seed(seed, action_num * 32);
let mut actions =
gen_realtime_actions::<DrawAction>(action_num, peer_num, &seed, |_| {}).unwrap();
let mut actors = DrawActors::new(peer_num);
let start = Instant::now();
for action in actions.iter_mut() {
actors.apply_action(action);
}
(actors, start)
pub fn fuzz(peer_num: usize, sync_all_interval: usize, actions: &[Action<DrawAction>]) {
run_actions_fuzz_in_async_mode::<DrawActor>(peer_num, sync_all_interval, actions);
}


@ -0,0 +1,74 @@
use bench_utils::{json::JsonAction, Action};
use loro::LoroDoc;
use crate::{minify_failed_tests_in_async_mode, run_actions_fuzz_in_async_mode, ActorTrait};
pub struct JsonActor {
doc: LoroDoc,
list: loro::LoroList,
map: loro::LoroMap,
text: loro::LoroText,
}
impl ActorTrait for JsonActor {
type ActionKind = JsonAction;
fn create(peer_id: u64) -> Self {
let doc = LoroDoc::new();
doc.set_peer_id(peer_id).unwrap();
let list = doc.get_list("list");
let map = doc.get_map("map");
let text = doc.get_text("text");
Self {
doc,
list,
map,
text,
}
}
fn apply_action(&mut self, action: &mut Self::ActionKind) {
match action {
JsonAction::InsertMap { key, value } => {
self.map.insert(key, value.clone()).unwrap();
}
JsonAction::InsertList { index, value } => {
*index %= self.list.len() + 1;
self.list.insert(*index, value.clone()).unwrap();
}
JsonAction::DeleteList { index } => {
if self.list.is_empty() {
return;
}
*index %= self.list.len();
self.list.delete(*index, 1).unwrap();
}
JsonAction::InsertText { index, s } => {
*index %= self.text.len_unicode() + 1;
self.text.insert(*index, s).unwrap();
}
JsonAction::DeleteText { index, len } => {
if self.text.is_empty() {
return;
}
*index %= self.text.len_unicode();
*len %= self.text.len_unicode() - *index;
self.text.delete(*index, *len).unwrap();
}
}
}
fn doc(&self) -> &loro::LoroDoc {
&self.doc
}
}
pub fn fuzz(peer_num: usize, inputs: &[Action<JsonAction>]) {
run_actions_fuzz_in_async_mode::<JsonActor>(peer_num, 20, inputs);
}
pub fn minify(peer_num: usize, inputs: &[Action<JsonAction>]) {
minify_failed_tests_in_async_mode::<JsonActor>(peer_num, 20, inputs);
}
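
The `apply_action` implementation above wraps fuzzer-generated indices into the valid range before touching a container. The following is a minimal sketch of that wrapping in isolation. The helper names `wrap_insert_index` and `wrap_delete_range` are hypothetical and do not exist in this PR; plain `usize` lengths stand in for the container lengths.

```rust
/// Hypothetical helper: maps an arbitrary index onto a valid insert position.
/// Inserting at `len` (one past the end) is allowed, hence `len + 1`.
pub fn wrap_insert_index(index: usize, len: usize) -> usize {
    index % (len + 1)
}

/// Hypothetical helper: maps an arbitrary (index, del_len) pair onto a range
/// that stays inside a sequence of length `len`.
/// Returns `None` for an empty sequence, mirroring the early `return` above.
pub fn wrap_delete_range(index: usize, del_len: usize, len: usize) -> Option<(usize, usize)> {
    if len == 0 {
        return None;
    }
    // `index` is now strictly less than `len`, so `len - index` is at least 1.
    let index = index % len;
    // The wrapped length may be 0, which is a no-op deletion.
    let del_len = del_len % (len - index);
    Some((index, del_len))
}
```

Because the modulus is taken against the current length, the same recorded action list stays applicable even when earlier actions are removed during minimization.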

crates/examples/src/lib.rs Normal file
@ -0,0 +1,235 @@
use std::{
collections::VecDeque,
sync::{atomic::AtomicUsize, Arc, Mutex},
time::Instant,
};
use bench_utils::{
create_seed, gen_async_actions, gen_realtime_actions, make_actions_async, Action, ActionTrait,
};
pub mod draw;
pub mod json;
pub mod sheet;
pub mod test_preload {
pub use bench_utils::json::JsonAction::*;
pub use bench_utils::json::LoroValue::*;
pub use bench_utils::Action::*;
pub use bench_utils::SyncKind::*;
}
pub trait ActorTrait {
type ActionKind: ActionTrait;
fn create(peer_id: u64) -> Self;
fn apply_action(&mut self, action: &mut Self::ActionKind);
fn doc(&self) -> &loro::LoroDoc;
}
pub struct ActorGroup<T> {
pub docs: Vec<T>,
}
impl<T: ActorTrait> ActorGroup<T> {
pub fn new(size: usize) -> Self {
let docs = (0..size).map(|i| T::create(i as u64)).collect();
Self { docs }
}
pub fn apply_action(&mut self, action: &mut Action<T::ActionKind>) {
match action {
Action::Action { peer, action } => {
self.docs[*peer].apply_action(action);
}
Action::Sync { from, to, kind } => match kind {
bench_utils::SyncKind::Fit => {
let vv = self.docs[*to].doc().oplog_vv();
let data = self.docs[*from].doc().export_from(&vv);
self.docs[*to].doc().import(&data).unwrap();
}
bench_utils::SyncKind::Snapshot => {
let data = self.docs[*from].doc().export_snapshot();
self.docs[*to].doc().import(&data).unwrap();
}
bench_utils::SyncKind::OnlyLastOpFromEachPeer => {
let mut vv = self.docs[*from].doc().oplog_vv();
for cnt in vv.values_mut() {
*cnt -= 1;
}
let data = self.docs[*from].doc().export_from(&vv);
self.docs[*to].doc().import(&data).unwrap();
}
},
Action::SyncAll => self.sync_all(),
}
}
pub fn sync_all(&mut self) {
debug_log::group!("SyncAll");
let (first, rest) = self.docs.split_at_mut(1);
for doc in rest.iter_mut() {
debug_log::group!("Importing to doc0");
let vv = first[0].doc().oplog_vv();
first[0].doc().import(&doc.doc().export_from(&vv)).unwrap();
debug_log::group_end!();
}
for (i, doc) in rest.iter_mut().enumerate() {
debug_log::group!("Importing to doc{}", i + 1);
let vv = doc.doc().oplog_vv();
doc.doc().import(&first[0].doc().export_from(&vv)).unwrap();
debug_log::group_end!();
}
debug_log::group_end!();
}
pub fn check_sync(&self) {
debug_log::group!("Check sync");
let first = &self.docs[0];
let content = first.doc().get_deep_value();
for doc in self.docs.iter().skip(1) {
assert_eq!(content, doc.doc().get_deep_value());
}
debug_log::group_end!();
}
}
pub fn run_async_workflow<T: ActorTrait>(
peer_num: usize,
action_num: usize,
actions_before_sync: usize,
seed: u64,
) -> (ActorGroup<T>, Instant)
where
for<'a> T::ActionKind: arbitrary::Arbitrary<'a>,
{
let seed = create_seed(seed, action_num * 32);
let mut actions = gen_async_actions::<T::ActionKind>(
action_num,
peer_num,
&seed,
actions_before_sync,
|_| {},
)
.unwrap();
let mut actors = ActorGroup::<T>::new(peer_num);
let start = Instant::now();
for action in actions.iter_mut() {
actors.apply_action(action);
}
(actors, start)
}
pub fn run_realtime_collab_workflow<T: ActorTrait>(
peer_num: usize,
action_num: usize,
seed: u64,
) -> (ActorGroup<T>, Instant)
where
for<'a> T::ActionKind: arbitrary::Arbitrary<'a>,
{
let seed = create_seed(seed, action_num * 32);
let mut actions =
gen_realtime_actions::<T::ActionKind>(action_num, peer_num, &seed, |_| {}).unwrap();
let mut actors = ActorGroup::<T>::new(peer_num);
let start = Instant::now();
for action in actions.iter_mut() {
actors.apply_action(action);
}
(actors, start)
}
pub fn run_actions_fuzz_in_async_mode<T: ActorTrait>(
peer_num: usize,
sync_all_interval: usize,
actions: &[Action<T::ActionKind>],
) {
let mut actions = make_actions_async::<T::ActionKind>(peer_num, actions, sync_all_interval);
let mut actors = ActorGroup::<T>::new(peer_num);
for action in actions.iter_mut() {
debug_log::group!("[ApplyAction] {:?}", &action);
actors.apply_action(action);
debug_log::group_end!();
}
actors.sync_all();
actors.check_sync();
}
pub fn minify_failed_tests_in_async_mode<T: ActorTrait>(
peer_num: usize,
sync_all_interval: usize,
actions: &[Action<T::ActionKind>],
) {
let hook = std::panic::take_hook();
std::panic::set_hook(Box::new(|_info| {
// ignore panic output
// println!("{:?}", _info);
}));
let actions = make_actions_async::<T::ActionKind>(peer_num, actions, sync_all_interval);
let mut stack: VecDeque<Vec<Action<T::ActionKind>>> = VecDeque::new();
stack.push_back(actions);
let mut last_log = Instant::now();
let mut min_actions: Option<Vec<Action<T::ActionKind>>> = None;
while let Some(actions) = stack.pop_back() {
let actions = Arc::new(Mutex::new(actions));
let actions_clone = Arc::clone(&actions);
let num = Arc::new(AtomicUsize::new(0));
let num_clone = Arc::clone(&num);
let result = std::panic::catch_unwind(move || {
let mut actors = ActorGroup::<T>::new(peer_num);
for action in actions_clone.lock().unwrap().iter_mut() {
actors.apply_action(action);
num_clone.fetch_add(1, std::sync::atomic::Ordering::SeqCst);
}
actors.sync_all();
actors.check_sync();
});
if result.is_ok() {
continue;
}
let num = num.load(std::sync::atomic::Ordering::SeqCst);
let mut actions = match actions.lock() {
Ok(a) => a,
Err(a) => a.into_inner(),
};
actions.drain(num..);
if let Some(min_actions) = min_actions.as_mut() {
if actions.len() < min_actions.len() {
*min_actions = actions.clone();
}
} else {
min_actions = Some(actions.clone());
}
for i in 0..actions.len() {
let mut new_actions = actions.clone();
new_actions.remove(i);
stack.push_back(new_actions);
}
while stack.len() > 100 {
stack.pop_front();
}
if last_log.elapsed().as_secs() > 1 {
println!(
"stack size: {}. Min action size {:?}",
stack.len(),
min_actions.as_ref().map(|x| x.len())
);
last_log = Instant::now();
}
}
if let Some(minimal_failed_actions) = min_actions {
println!("Min action size {:?}", minimal_failed_actions.len());
println!("{:#?}", minimal_failed_actions);
std::panic::set_hook(hook);
run_actions_fuzz_in_async_mode::<T>(peer_num, sync_all_interval, &minimal_failed_actions);
} else {
println!("No failed tests found");
}
}
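
The search strategy of `minify_failed_tests_in_async_mode` above can be illustrated without the actor machinery. This is a minimal sketch, not the code in this PR: `minimize` is a hypothetical name, the input is a plain `Vec<u32>` instead of a list of `Action` values, and a failure is modelled by a predicate rather than by `catch_unwind`.

```rust
use std::collections::VecDeque;

/// Hypothetical helper: repeatedly drops one element at a time and remembers
/// the shortest input for which `fails` still returns true.
pub fn minimize<F>(input: Vec<u32>, fails: F) -> Option<Vec<u32>>
where
    F: Fn(&[u32]) -> bool,
{
    let mut stack: VecDeque<Vec<u32>> = VecDeque::new();
    stack.push_back(input);
    let mut best: Option<Vec<u32>> = None;

    while let Some(candidate) = stack.pop_back() {
        // Candidates that no longer reproduce the failure are discarded.
        if !fails(&candidate) {
            continue;
        }
        let is_better = best.as_ref().map_or(true, |b| candidate.len() < b.len());
        if is_better {
            best = Some(candidate.clone());
        }
        // Queue every variant with exactly one element removed.
        for i in 0..candidate.len() {
            let mut smaller = candidate.clone();
            smaller.remove(i);
            stack.push_back(smaller);
        }
        // Bound memory by dropping the oldest candidates, as the loop above does.
        while stack.len() > 100 {
            stack.pop_front();
        }
    }
    best
}
```

Popping from the back makes the search depth-first, so small failing inputs tend to be found early; the cap on the queue trades completeness for bounded memory.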

@ -1,19 +1,11 @@
use loro::{LoroDoc, LoroList, LoroMap};
pub struct Actor {
pub doc: LoroDoc,
cols: LoroList,
rows: LoroList,
}
impl Actor {}
use loro::LoroDoc;
pub fn init_sheet() -> LoroDoc {
let doc = LoroDoc::new();
doc.set_peer_id(0).unwrap();
let cols = doc.get_list("cols");
let rows = doc.get_list("rows");
for i in 0..bench_utils::sheet::SheetAction::MAX_ROW {
for _ in 0..bench_utils::sheet::SheetAction::MAX_ROW {
rows.push_container(loro::ContainerType::Map).unwrap();
}

@ -0,0 +1,492 @@
use examples::json::fuzz;
use loro::loro_value;
#[ctor::ctor]
fn init_color_backtrace() {
color_backtrace::install();
}
#[test]
fn fuzz_json() {
use examples::test_preload::*;
fuzz(
5,
&[
Action {
peer: 5280832617179597129,
action: InsertList {
index: 311,
value: Bool(true),
},
},
Action {
peer: 8174158055725953393,
action: DeleteList { index: 341 },
},
Sync {
from: 18446543177843820913,
to: 5280832620235194367,
kind: Fit,
},
Action {
peer: 8174439530700032329,
action: DeleteList { index: 341 },
},
Sync {
from: 8174439528799404056,
to: 8174439530702664049,
kind: Snapshot,
},
Action {
peer: 8174439043468105841,
action: DeleteList { index: 341 },
},
Action {
peer: 5280832617179597129,
action: InsertList {
index: 311,
value: Bool(true),
},
},
Action {
peer: 5280832617179597129,
action: InsertList {
index: 341,
value: Bool(true),
},
},
Action {
peer: 8174439139858008393,
action: DeleteList { index: 341 },
},
Sync {
from: 8174439393263710577,
to: 7586675626393291081,
kind: Fit,
},
Sync {
from: 8174439530702664049,
to: 8174439530702664049,
kind: Fit,
},
Action {
peer: 5280832685899073865,
action: InsertList {
index: 351,
value: Bool(true),
},
},
Action {
peer: 5280832789652009216,
action: InsertList {
index: 311,
value: Bool(true),
},
},
Sync {
from: 8174439358230251889,
to: 8174439530702664049,
kind: Snapshot,
},
Action {
peer: 8174439530700032329,
action: DeleteList { index: 341 },
},
Sync {
from: 5280832617178745161,
to: 5280832617179597129,
kind: Snapshot,
},
Action {
peer: 5280832616743389513,
action: InsertList {
index: 311,
value: Bool(true),
},
},
Sync {
from: 5280832617853317489,
to: 8174439530702664049,
kind: Snapshot,
},
Action {
peer: 5280832617179593801,
action: InsertList {
index: 311,
value: Bool(true),
},
},
Action {
peer: 5280876597644708169,
action: DeleteList { index: 341 },
},
Sync {
from: 8174439530702664049,
to: 5280876770117120369,
kind: OnlyLastOpFromEachPeer,
},
SyncAll,
Action {
peer: 8174439358230251849,
action: DeleteText {
index: 960,
len: 126,
},
},
Action {
peer: 18404522827202906441,
action: InsertList {
index: 311,
value: Bool(true),
},
},
Action {
peer: 8174439530700032329,
action: DeleteList { index: 341 },
},
Sync {
from: 8174439528799404056,
to: 5292135769185546609,
kind: Fit,
},
Action {
peer: 5280832617179596873,
action: InsertList {
index: 311,
value: Bool(true),
},
},
Action {
peer: 5292135596713134409,
action: InsertList {
index: 311,
value: Bool(true),
},
},
Sync {
from: 8174439526407696753,
to: 8174439530702664049,
kind: Snapshot,
},
Action {
peer: 8174439498734632959,
action: InsertList {
index: 311,
value: Bool(true),
},
},
Action {
peer: 5833687803971913,
action: InsertList {
index: 301,
value: Bool(true),
},
},
Sync {
from: 8174439530702664049,
to: 8174439530702664049,
kind: Fit,
},
SyncAll,
SyncAll,
Action {
peer: 5280832617179597129,
action: DeleteList { index: 311 },
},
Sync {
from: 8174439530702664049,
to: 8174439530702664049,
kind: Fit,
},
Sync {
from: 8163136378478610801,
to: 8174439139858008393,
kind: Snapshot,
},
Sync {
from: 8174439530702664049,
to: 18395314732082491761,
kind: Snapshot,
},
Action {
peer: 5280832617179597129,
action: InsertList {
index: 303,
value: Bool(true),
},
},
Sync {
from: 8174439530702664049,
to: 8174439530702664049,
kind: Fit,
},
Action {
peer: 8174412969951185225,
action: InsertList {
index: 351,
value: Bool(true),
},
},
Sync {
from: 8174439530702664049,
to: 8163136378699346289,
kind: Snapshot,
},
Sync {
from: 5280876770117120369,
to: 5280832617179597116,
kind: Fit,
},
Action {
peer: 5280832634359466315,
action: DeleteList { index: 341 },
},
Sync {
from: 8174439530702664049,
to: 8174439358230251849,
kind: Snapshot,
},
SyncAll,
SyncAll,
SyncAll,
SyncAll,
SyncAll,
SyncAll,
SyncAll,
SyncAll,
SyncAll,
SyncAll,
SyncAll,
SyncAll,
SyncAll,
SyncAll,
Sync {
from: 16090538600105537827,
to: 936747706152398848,
kind: OnlyLastOpFromEachPeer,
},
Action {
peer: 8174439530702653769,
action: DeleteList { index: 341 },
},
Sync {
from: 8174395377765151089,
to: 8174439530702664049,
kind: Snapshot,
},
Action {
peer: 5280832617179597129,
action: InsertList {
index: 311,
value: Bool(true),
},
},
Action {
peer: 8174439530700032329,
action: DeleteList { index: 341 },
},
Sync {
from: 5277173443156078961,
to: 5280832617179597129,
kind: Fit,
},
Action {
peer: 8174439358230513993,
action: InsertList {
index: 351,
value: Bool(true),
},
},
Sync {
from: 8174439530702664049,
to: 5280832617178745161,
kind: Fit,
},
Action {
peer: 5280832617179728201,
action: DeleteList { index: 341 },
},
Sync {
from: 18446744073709515121,
to: 18446744073709551615,
kind: OnlyLastOpFromEachPeer,
},
SyncAll,
SyncAll,
SyncAll,
SyncAll,
SyncAll,
SyncAll,
Action {
peer: 18446744073709551615,
action: DeleteText {
index: 960,
len: 126,
},
},
Sync {
from: 1412722910008930673,
to: 18380137171932733261,
kind: Snapshot,
},
Action {
peer: 0,
action: InsertMap {
key: "".into(),
value: Null,
},
},
],
)
}
#[test]
fn fuzz_json_1() {
use examples::test_preload::*;
let mut map = loro_value!({"": "test"});
for _ in 0..64 {
map = loro_value!({"": map});
}
let mut list = loro_value!([map]);
for _ in 0..64 {
list = loro_value!([list, 9]);
}
fuzz(
5,
&[Action {
peer: 35184913762633,
action: InsertMap {
key: "\0IIIIIIIIIIIIIIIIIII\0\0".into(),
value: list,
},
}],
);
}
#[test]
fn fuzz_json_2_decode_snapshot_that_activate_pending_changes() {
use examples::test_preload::*;
fuzz(
5,
&[
Action {
peer: 44971974514245632,
action: InsertText {
index: 228,
s: "0\0\0".into(),
},
},
SyncAll,
Action {
peer: 23939170762752,
action: InsertText {
index: 404,
s: "C\u{b}0\0\u{15555}".into(),
},
},
Sync {
from: 10778685752873424277,
to: 52870070483605,
kind: OnlyLastOpFromEachPeer,
},
Action {
peer: 6128427715264512,
action: InsertMap {
key: "".into(),
value: "".into(),
},
},
Action {
peer: 10778685752873447424,
action: DeleteList { index: 368 },
},
Sync {
from: 10778685752873440661,
to: 10778685752873424277,
kind: OnlyLastOpFromEachPeer,
},
Sync {
from: 10778685752873424277,
to: 18395315059780064661,
kind: OnlyLastOpFromEachPeer,
},
SyncAll,
SyncAll,
Sync {
from: 445944668984725,
to: 256,
kind: Snapshot,
},
Action {
peer: 562699868423424,
action: InsertText {
index: 228,
s: "\0\0".into(),
},
},
SyncAll,
Action {
peer: 0,
action: InsertMap {
key: "".into(),
value: Null,
},
},
],
)
}
#[test]
fn fuzz_json_3_frontiers_were_wrong_after_importing_pending_changes() {
use examples::test_preload::*;
fuzz(
5,
&[
Action {
peer: 4,
action: InsertList {
index: 0,
value: Bool(true),
},
},
Action {
peer: 0,
action: InsertText {
index: 0,
s: "\0\u{6}\u{13}\0\0\0*0".into(),
},
},
Sync {
from: 0,
to: 4,
kind: Fit,
},
Action {
peer: 4,
action: InsertList {
index: 1,
value: Bool(true),
},
},
Sync {
from: 4,
to: 1,
kind: OnlyLastOpFromEachPeer,
},
Sync {
from: 0,
to: 1,
kind: Fit,
},
Action {
peer: 1,
action: InsertList {
index: 2,
value: Bool(true),
},
},
],
)
}

@ -17,6 +17,7 @@ enum-as-inner = "0.6.0"
string_cache = "0.8.7"
arbitrary = { version = "1.3.0", features = ["derive"] }
js-sys = { version = "0.3.60", optional = true }
serde_columnar = "0.3.3"
[features]
wasm = ["wasm-bindgen", "js-sys"]

@ -1,3 +1,4 @@
use serde_columnar::ColumnarError;
use thiserror::Error;
use crate::{PeerID, TreeID, ID};
@ -12,6 +13,10 @@ pub enum LoroError {
DecodeVersionVectorError,
#[error("Decode error ({0})")]
DecodeError(Box<str>),
#[error("Checksum mismatch. The data is corrupted.")]
DecodeDataCorruptionError,
#[error("Encountered an incompatible Encoding version \"{0}\". Loro's encoding is backward compatible but not forward compatible. Please upgrade the version of Loro to support this version of the exported data.")]
IncompatibleFutureEncodingError(usize),
#[error("Js error ({0})")]
JsError(Box<str>),
#[error("Cannot get lock or the lock is poisoned")]
@ -23,9 +28,6 @@ pub enum LoroError {
// TODO: more details transaction error
#[error("Transaction error ({0})")]
TransactionError(Box<str>),
// TODO:
#[error("TempContainer cannot execute this function")]
TempContainerError,
#[error("Index out of bound. The given pos is {pos}, but the length is {len}")]
OutOfBound { pos: usize, len: usize },
#[error("Every op id should be unique. ID {id} has been used. You should use a new PeerID to edit the content. ")]
@ -36,14 +38,8 @@ pub enum LoroError {
ArgErr(Box<str>),
#[error("Auto commit has not started. The doc is readonly when detached. You should ensure autocommit is on and the doc and the state is attached.")]
AutoCommitNotStarted,
#[error("The doc is already dropped")]
DocDropError,
// #[error("the data for key `{0}` is not available")]
// Redaction(String),
// #[error("invalid header (expected {expected:?}, found {found:?})")]
// InvalidHeader { expected: String, found: String },
// #[error("unknown data store error")]
// Unknown,
#[error("Unknown Error ({0})")]
Unknown(Box<str>),
}
#[derive(Error, Debug)]
@ -78,3 +74,17 @@ pub mod wasm {
}
}
}
impl From<ColumnarError> for LoroError {
fn from(e: ColumnarError) -> Self {
match e {
ColumnarError::ColumnarDecodeError(_)
| ColumnarError::RleEncodeError(_)
| ColumnarError::RleDecodeError(_)
| ColumnarError::OverflowError => {
LoroError::DecodeError(format!("Failed to decode Columnar: {}", e).into_boxed_str())
}
e => LoroError::Unknown(e.to_string().into_boxed_str()),
}
}
}

@ -72,8 +72,11 @@ impl From<u128> for ID {
}
impl ID {
/// The ID of the null object. This should be used rarely.
pub const NONE_ID: ID = ID::new(u64::MAX, 0);
#[inline]
pub fn new(peer: PeerID, counter: Counter) -> Self {
pub const fn new(peer: PeerID, counter: Counter) -> Self {
ID { peer, counter }
}

@ -6,12 +6,15 @@ use enum_as_inner::EnumAsInner;
use serde::{Deserialize, Serialize};
mod error;
mod id;
mod macros;
mod span;
mod value;
pub use error::{LoroError, LoroResult, LoroTreeError};
#[doc(hidden)]
pub use fxhash::FxHashMap;
pub use span::*;
pub use value::LoroValue;
pub use value::{to_value, LoroValue};
/// Unique id for each peer. It's usually random
pub type PeerID = u64;
@ -93,6 +96,18 @@ impl ContainerType {
_ => unreachable!(),
}
}
pub fn try_from_u8(v: u8) -> LoroResult<Self> {
match v {
1 => Ok(ContainerType::Map),
2 => Ok(ContainerType::List),
3 => Ok(ContainerType::Text),
4 => Ok(ContainerType::Tree),
_ => Err(LoroError::DecodeError(
format!("Unknown container type {v}").into_boxed_str(),
)),
}
}
}
pub type IdSpanVector = fxhash::FxHashMap<PeerID, CounterSpan>;
@ -261,6 +276,11 @@ pub struct TreeID {
}
impl TreeID {
#[inline(always)]
pub fn new(peer: PeerID, counter: Counter) -> Self {
Self { peer, counter }
}
/// return [`DELETED_TREE_ROOT`]
pub const fn delete_root() -> Option<Self> {
DELETED_TREE_ROOT
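
The `try_from_u8` function added in this file returns an error instead of panicking when a tag byte is unknown, which matters when decoding untrusted input. A minimal sketch of the same pattern using only the standard library follows. `Kind` and `UnknownKind` are hypothetical stand-ins for `ContainerType` and `LoroError::DecodeError`; the numeric tags are copied from the diff above.

```rust
use std::fmt;

/// Hypothetical stand-in for `ContainerType`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind {
    Map,
    List,
    Text,
    Tree,
}

/// Hypothetical stand-in for a decode error carrying the offending byte.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct UnknownKind(pub u8);

impl fmt::Display for UnknownKind {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "Unknown container type {}", self.0)
    }
}

impl Kind {
    /// Fallible decoding: unknown tags become an error value, never a panic.
    pub fn try_from_u8(v: u8) -> Result<Kind, UnknownKind> {
        match v {
            1 => Ok(Kind::Map),
            2 => Ok(Kind::List),
            3 => Ok(Kind::Text),
            4 => Ok(Kind::Tree),
            other => Err(UnknownKind(other)),
        }
    }
}
```

A decoder can then propagate the error with `?` and reject the whole payload rather than aborting the process.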

@ -0,0 +1,290 @@
#[macro_export(local_inner_macros)]
macro_rules! loro_value {
// Hide distracting implementation details from the generated rustdoc.
($($json:tt)+) => {
value_internal!($($json)+)
};
}
// Rocket relies on this because they export their own `json!` with a different
// doc comment than ours, and various Rust bugs prevent them from calling our
// `json!` from their `json!` so they call `value_internal!` directly. Check with
// @SergioBenitez before making breaking changes to this macro.
//
// Changes are fine as long as `value_internal!` does not call any new helper
// macros and can still be invoked as `value_internal!($($json)+)`.
#[macro_export(local_inner_macros)]
#[doc(hidden)]
macro_rules! value_internal {
//////////////////////////////////////////////////////////////////////////
// TT muncher for parsing the inside of an array [...]. Produces a vec![...]
// of the elements.
//
// Must be invoked as: value_internal!(@array [] $($tt)*)
//////////////////////////////////////////////////////////////////////////
// Done with trailing comma.
(@array [$($elems:expr,)*]) => {
json_internal_vec![$($elems,)*]
};
// Done without trailing comma.
(@array [$($elems:expr),*]) => {
json_internal_vec![$($elems),*]
};
// Next element is `null`.
(@array [$($elems:expr,)*] null $($rest:tt)*) => {
value_internal!(@array [$($elems,)* value_internal!(null)] $($rest)*)
};
// Next element is `true`.
(@array [$($elems:expr,)*] true $($rest:tt)*) => {
value_internal!(@array [$($elems,)* value_internal!(true)] $($rest)*)
};
// Next element is `false`.
(@array [$($elems:expr,)*] false $($rest:tt)*) => {
value_internal!(@array [$($elems,)* value_internal!(false)] $($rest)*)
};
// Next element is an array.
(@array [$($elems:expr,)*] [$($array:tt)*] $($rest:tt)*) => {
value_internal!(@array [$($elems,)* value_internal!([$($array)*])] $($rest)*)
};
// Next element is a map.
(@array [$($elems:expr,)*] {$($map:tt)*} $($rest:tt)*) => {
value_internal!(@array [$($elems,)* value_internal!({$($map)*})] $($rest)*)
};
// Next element is an expression followed by comma.
(@array [$($elems:expr,)*] $next:expr, $($rest:tt)*) => {
value_internal!(@array [$($elems,)* value_internal!($next),] $($rest)*)
};
// Last element is an expression with no trailing comma.
(@array [$($elems:expr,)*] $last:expr) => {
value_internal!(@array [$($elems,)* value_internal!($last)])
};
// Comma after the most recent element.
(@array [$($elems:expr),*] , $($rest:tt)*) => {
value_internal!(@array [$($elems,)*] $($rest)*)
};
// Unexpected token after most recent element.
(@array [$($elems:expr),*] $unexpected:tt $($rest:tt)*) => {
json_unexpected!($unexpected)
};
//////////////////////////////////////////////////////////////////////////
// TT muncher for parsing the inside of an object {...}. Each entry is
// inserted into the given map variable.
//
// Must be invoked as: value_internal!(@object $map () ($($tt)*) ($($tt)*))
//
// We require two copies of the input tokens so that we can match on one
// copy and trigger errors on the other copy.
//////////////////////////////////////////////////////////////////////////
// Done.
(@object $object:ident () () ()) => {};
// Insert the current entry followed by trailing comma.
(@object $object:ident [$($key:tt)+] ($value:expr) , $($rest:tt)*) => {
let _ = $object.insert(($($key)+).into(), $value);
value_internal!(@object $object () ($($rest)*) ($($rest)*));
};
// Current entry followed by unexpected token.
(@object $object:ident [$($key:tt)+] ($value:expr) $unexpected:tt $($rest:tt)*) => {
json_unexpected!($unexpected);
};
// Insert the last entry without trailing comma.
(@object $object:ident [$($key:tt)+] ($value:expr)) => {
let _ = $object.insert(($($key)+).into(), $value);
};
// Next value is `null`.
(@object $object:ident ($($key:tt)+) (: null $($rest:tt)*) $copy:tt) => {
value_internal!(@object $object [$($key)+] (value_internal!(null)) $($rest)*);
};
// Next value is `true`.
(@object $object:ident ($($key:tt)+) (: true $($rest:tt)*) $copy:tt) => {
value_internal!(@object $object [$($key)+] (value_internal!(true)) $($rest)*);
};
// Next value is `false`.
(@object $object:ident ($($key:tt)+) (: false $($rest:tt)*) $copy:tt) => {
value_internal!(@object $object [$($key)+] (value_internal!(false)) $($rest)*);
};
// Next value is an array.
(@object $object:ident ($($key:tt)+) (: [$($array:tt)*] $($rest:tt)*) $copy:tt) => {
value_internal!(@object $object [$($key)+] (value_internal!([$($array)*])) $($rest)*);
};
// Next value is a map.
(@object $object:ident ($($key:tt)+) (: {$($map:tt)*} $($rest:tt)*) $copy:tt) => {
value_internal!(@object $object [$($key)+] (value_internal!({$($map)*})) $($rest)*);
};
// Next value is an expression followed by comma.
(@object $object:ident ($($key:tt)+) (: $value:expr , $($rest:tt)*) $copy:tt) => {
value_internal!(@object $object [$($key)+] (value_internal!($value)) , $($rest)*);
};
// Last value is an expression with no trailing comma.
(@object $object:ident ($($key:tt)+) (: $value:expr) $copy:tt) => {
value_internal!(@object $object [$($key)+] (value_internal!($value)));
};
// Missing value for last entry. Trigger a reasonable error message.
(@object $object:ident ($($key:tt)+) (:) $copy:tt) => {
// "unexpected end of macro invocation"
value_internal!();
};
// Missing colon and value for last entry. Trigger a reasonable error
// message.
(@object $object:ident ($($key:tt)+) () $copy:tt) => {
// "unexpected end of macro invocation"
value_internal!();
};
// Misplaced colon. Trigger a reasonable error message.
(@object $object:ident () (: $($rest:tt)*) ($colon:tt $($copy:tt)*)) => {
// Takes no arguments so "no rules expected the token `:`".
json_unexpected!($colon);
};
// Found a comma inside a key. Trigger a reasonable error message.
(@object $object:ident ($($key:tt)*) (, $($rest:tt)*) ($comma:tt $($copy:tt)*)) => {
// Takes no arguments so "no rules expected the token `,`".
json_unexpected!($comma);
};
// Key is fully parenthesized. This avoids clippy double_parens false
// positives because the parenthesization may be necessary here.
(@object $object:ident () (($key:expr) : $($rest:tt)*) $copy:tt) => {
value_internal!(@object $object ($key) (: $($rest)*) (: $($rest)*));
};
// Refuse to absorb colon token into key expression.
(@object $object:ident ($($key:tt)*) (: $($unexpected:tt)+) $copy:tt) => {
json_expect_expr_comma!($($unexpected)+);
};
// Munch a token into the current key.
(@object $object:ident ($($key:tt)*) ($tt:tt $($rest:tt)*) $copy:tt) => {
value_internal!(@object $object ($($key)* $tt) ($($rest)*) ($($rest)*));
};
//////////////////////////////////////////////////////////////////////////
// The main implementation.
//
// Must be invoked as: value_internal!($($json)+)
//////////////////////////////////////////////////////////////////////////
(null) => {
$crate::LoroValue::Null
};
(true) => {
$crate::LoroValue::Bool(true)
};
(false) => {
$crate::LoroValue::Bool(false)
};
([]) => {
$crate::LoroValue::List(std::sync::Arc::new(json_internal_vec![]))
};
([ $($tt:tt)+ ]) => {
$crate::LoroValue::List(std::sync::Arc::new(value_internal!(@array [] $($tt)+)))
};
({}) => {
$crate::LoroValue::Map(std::sync::Arc::new(Default::default()))
};
({ $($tt:tt)+ }) => {
({
let mut object = $crate::FxHashMap::default();
value_internal!(@object object () ($($tt)+) ($($tt)+));
$crate::LoroValue::Map(std::sync::Arc::new(object))
})
};
// Any Serialize type: numbers, strings, struct literals, variables etc.
// Must be below every other rule.
($other:expr) => {
$crate::to_value($other)
};
}
#[macro_export]
#[doc(hidden)]
macro_rules! json_unexpected {
() => {};
}
// The value_internal macro above cannot invoke vec directly because it uses
// local_inner_macros. A vec invocation there would resolve to $crate::vec.
// Instead invoke vec here outside of local_inner_macros.
#[macro_export]
#[doc(hidden)]
macro_rules! json_internal_vec {
($($content:tt)*) => {
vec![$($content)*]
};
}
#[cfg(test)]
mod test {
#[test]
fn test_value_macro() {
let v = loro_value!([1, 2, 3]);
let list = v.into_list().unwrap();
assert_eq!(&*list, &[1.into(), 2.into(), 3.into()]);
let map = loro_value!({
"hi": true,
"false": false,
"null": null,
"list": [],
"integer": 123,
"float": 123.123,
"map": {
"a": "1"
}
});
let map = map.into_map().unwrap();
assert_eq!(map.len(), 7);
assert!(*map.get("hi").unwrap().as_bool().unwrap());
assert!(!(*map.get("false").unwrap().as_bool().unwrap()));
assert!(map.get("null").unwrap().is_null());
assert_eq!(map.get("list").unwrap().as_list().unwrap().len(), 0);
assert_eq!(*map.get("integer").unwrap().as_i32().unwrap(), 123);
assert_eq!(*map.get("float").unwrap().as_double().unwrap(), 123.123);
assert_eq!(map.get("map").unwrap().as_map().unwrap().len(), 1);
assert_eq!(
&**map
.get("map")
.unwrap()
.as_map()
.unwrap()
.get("a")
.unwrap()
.as_string()
.unwrap(),
"1"
);
}
}

@ -113,6 +113,8 @@ impl CounterSpan {
}
#[inline(always)]
/// Normalized end value.
///
/// This differs from `end`: `start` may be greater than `end`, so this is the max of `start + 1` and `end`.
pub fn norm_end(&self) -> i32 {
if self.start < self.end {
@ -160,6 +162,16 @@ impl CounterSpan {
fn next_pos(&self) -> i32 {
self.end
}
fn get_intersection(&self, counter: &CounterSpan) -> Option<Self> {
let start = self.start.max(counter.start);
let end = self.end.min(counter.end);
if start < end {
Some(CounterSpan { start, end })
} else {
None
}
}
}
impl HasLength for CounterSpan {
@ -228,15 +240,16 @@ impl Mergable for CounterSpan {
/// We need this because it'll make merging deletions easier.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct IdSpan {
// TODO: rename this to peer_id
pub client_id: PeerID,
pub counter: CounterSpan,
}
impl IdSpan {
#[inline]
pub fn new(client_id: PeerID, from: Counter, to: Counter) -> Self {
pub fn new(peer: PeerID, from: Counter, to: Counter) -> Self {
Self {
client_id,
client_id: peer,
counter: CounterSpan {
start: from,
end: to,
@ -281,6 +294,18 @@ impl IdSpan {
out.insert(self.client_id, self.counter);
out
}
pub fn get_intersection(&self, other: &Self) -> Option<Self> {
if self.client_id != other.client_id {
return None;
}
let counter = self.counter.get_intersection(&other.counter)?;
Some(Self {
client_id: self.client_id,
counter,
})
}
}
impl HasLength for IdSpan {
@ -425,6 +450,12 @@ impl HasId for (PeerID, CounterSpan) {
}
}
impl From<ID> for IdSpan {
fn from(value: ID) -> Self {
Self::new(value.peer, value.counter, value.counter + 1)
}
}
#[cfg(test)]
mod test_id_span {
use rle::RleVecWithIndex;
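
The `get_intersection` methods added in this file treat spans as half-open ranges. A minimal sketch of that computation in isolation follows; `Span` is a hypothetical stand-in for `CounterSpan`, and the sketch assumes both spans are forward (`start <= end`), which the real type does not guarantee.

```rust
/// Hypothetical stand-in for `CounterSpan`: a half-open range [start, end).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span {
    pub start: i32,
    pub end: i32,
}

impl Span {
    /// Returns the overlapping part of two forward spans, if any.
    pub fn intersection(&self, other: &Span) -> Option<Span> {
        // The overlap starts at the later start and ends at the earlier end.
        let start = self.start.max(other.start);
        let end = self.end.min(other.end);
        if start < end {
            Some(Span { start, end })
        } else {
            // Touching spans such as [0, 3) and [3, 6) share no element.
            None
        }
    }
}
```

Because the ranges are half-open, two spans that merely touch produce `None` rather than an empty span.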

@ -25,6 +25,29 @@ pub enum LoroValue {
Container(ContainerID),
}
const MAX_DEPTH: usize = 128;
impl<'a> arbitrary::Arbitrary<'a> for LoroValue {
fn arbitrary(u: &mut arbitrary::Unstructured<'a>) -> arbitrary::Result<Self> {
let value = match u.int_in_range(0..=7)? {
0 => LoroValue::Null,
1 => LoroValue::Bool(u.arbitrary()?),
2 => LoroValue::Double(u.arbitrary()?),
3 => LoroValue::I32(u.arbitrary()?),
4 => LoroValue::Binary(Arc::new(u.arbitrary()?)),
5 => LoroValue::String(Arc::new(u.arbitrary()?)),
6 => LoroValue::List(Arc::new(u.arbitrary()?)),
7 => LoroValue::Map(Arc::new(u.arbitrary()?)),
_ => unreachable!(),
};
if value.get_depth() > MAX_DEPTH {
Err(arbitrary::Error::IncorrectFormat)
} else {
Ok(value)
}
}
}
impl LoroValue {
pub fn get_by_key(&self, key: &str) -> Option<&LoroValue> {
match self {
@ -39,6 +62,37 @@ impl LoroValue {
_ => None,
}
}
pub fn get_depth(&self) -> usize {
let mut max_depth = 0;
let mut value_depth_pairs = vec![(self, 0)];
while let Some((value, depth)) = value_depth_pairs.pop() {
match value {
LoroValue::List(arr) => {
for v in arr.iter() {
value_depth_pairs.push((v, depth + 1));
}
max_depth = max_depth.max(depth + 1);
}
LoroValue::Map(map) => {
for (_, v) in map.iter() {
value_depth_pairs.push((v, depth + 1));
}
max_depth = max_depth.max(depth + 1);
}
_ => {}
}
}
max_depth
}
// TODO: add checks for too deep value, and return err if users
// try to insert such value into a container
pub fn is_too_deep(&self) -> bool {
self.get_depth() > MAX_DEPTH
}
}
impl Index<&str> for LoroValue {
@ -612,3 +666,7 @@ impl<'de> serde::de::Visitor<'de> for LoroValueEnumVisitor {
}
}
}
pub fn to_value<T: Into<LoroValue>>(value: T) -> LoroValue {
value.into()
}
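
`get_depth` above walks the value with an explicit stack instead of recursion, so a deeply nested value cannot overflow the call stack while it is being measured. A minimal sketch of the same traversal follows; `Value` and `depth` are hypothetical, simplified stand-ins for `LoroValue` and `get_depth`.

```rust
/// Hypothetical, simplified stand-in for `LoroValue`.
pub enum Value {
    Leaf,
    List(Vec<Value>),
}

/// Computes the nesting depth iteratively with an explicit work stack.
/// A leaf has depth 0; every enclosing list adds 1, including an empty list.
pub fn depth(root: &Value) -> usize {
    let mut max_depth = 0;
    let mut stack = vec![(root, 0usize)];
    while let Some((value, d)) = stack.pop() {
        if let Value::List(items) = value {
            max_depth = max_depth.max(d + 1);
            for item in items {
                stack.push((item, d + 1));
            }
        }
    }
    max_depth
}
```

The work stack lives on the heap, so its size is limited by available memory rather than by the thread's stack size.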

@ -34,14 +34,19 @@ itertools = "0.11.0"
enum_dispatch = "0.3.11"
im = "15.1.0"
generic-btree = { version = "0.8.2" }
miniz_oxide = "0.7.1"
getrandom = "0.2.10"
once_cell = "1.18.0"
leb128 = "0.2.5"
num-traits = "0.2"
num-derive = "0.3"
md5 = "0.7.0"
[dev-dependencies]
miniz_oxide = "0.7.1"
serde_json = "1.0.87"
dhat = "0.3.1"
rand = { version = "0.8.5" }
base64 = "0.21.5"
proptest = "1.0.0"
proptest-derive = "0.3.0"
static_assertions = "1.1.0"

@ -0,0 +1,25 @@
# Encoding Schema
## Header
The header has 22 bytes.
- (0-4 bytes) Magic Bytes: The encoding starts with `loro` as magic bytes.
- (4-20 bytes) Checksum: MD5 checksum of the encoded data, covering everything from the 20th byte onward (the rest of the header and the body). The checksum is encoded as a 16-byte array. The `magic bytes` and `checksum` fields themselves are excluded when calculating the checksum.
- (20-22 bytes) Encoding Method (2 bytes, big endian): Multiple encoding methods are available for a specific encoding version.
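
The header layout listed above can be read with a few slice operations. The following is a minimal sketch under stated assumptions, not the parser in this PR: `parse_header`, `Header` and `HeaderError` are hypothetical names, and checksum verification is left out because MD5 is not part of the Rust standard library.

```rust
/// Magic bytes at the start of every encoded blob.
pub const MAGIC: &[u8] = b"loro";
/// 4 magic bytes + 16 checksum bytes + 2 encoding-method bytes.
pub const HEADER_LEN: usize = 22;

/// Hypothetical parsed header.
#[derive(Debug, PartialEq, Eq)]
pub struct Header {
    pub checksum: [u8; 16],
    pub encode_method: u16,
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum HeaderError {
    TooShort,
    BadMagic,
}

/// Splits `bytes` into the parsed header and the remaining body.
/// The checksum is returned but NOT verified here.
pub fn parse_header(bytes: &[u8]) -> Result<(Header, &[u8]), HeaderError> {
    if bytes.len() < HEADER_LEN {
        return Err(HeaderError::TooShort);
    }
    if !bytes.starts_with(MAGIC) {
        return Err(HeaderError::BadMagic);
    }
    let mut checksum = [0u8; 16];
    checksum.copy_from_slice(&bytes[4..20]);
    // The encoding method occupies the last two header bytes, big endian.
    let encode_method = u16::from_be_bytes([bytes[20], bytes[21]]);
    Ok((Header { checksum, encode_method }, &bytes[HEADER_LEN..]))
}
```

A real decoder would additionally hash everything from byte 20 onward and compare the result with `checksum` before trusting the body.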
## Encode Mode: Updates
In this approach, only ops, specifically their historical record, are encoded, while document states are excluded.
Like Automerge's format, we employ columnar encoding for operations and changes.
Previously, operations were ordered by their Operation ID (OpId) before columnar encoding. However, sorting operations by their containers first enhances compression potential.
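
Why sorting by container helps can be seen with a toy run-length count. This is an illustrative sketch only, assuming a column of container indices that is run-length encoded; `run_count` and `container_column` are hypothetical helpers, and the actual encoder in this PR relies on `serde_columnar`.

```rust
/// Counts how many runs a run-length encoder would emit for a column.
pub fn run_count(column: &[u32]) -> usize {
    let mut runs = 0;
    let mut prev: Option<u32> = None;
    for &v in column {
        if prev != Some(v) {
            runs += 1;
            prev = Some(v);
        }
    }
    runs
}

/// Extracts the container-index column from hypothetical
/// (container index, op counter) pairs.
pub fn container_column(ops: &[(u32, u32)]) -> Vec<u32> {
    ops.iter().map(|&(container, _)| container).collect()
}
```

For six ops that alternate between two containers, the container column has six runs in OpId order but only two runs once the ops are sorted by container, so the column compresses much better.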
## Encode Mode: Snapshot
This mode simultaneously captures document state and historical data. Upon importing a snapshot into a new document, initialization occurs directly from the snapshot, bypassing the need for CRDT-based recalculations.
Unlike previous snapshot encoding methods, the current binary output in snapshot mode is compatible with the updates mode. This enhances the efficiency of importing snapshots into non-empty documents, where initialization via snapshot is infeasible.
Additionally, when feasible, we leverage the sequence of operations to construct state snapshots. In CRDTs, deducing the specific ops constituting the current container state is feasible. These ops are tagged in relation to the container, facilitating direct state reconstruction from them. This approach, pioneered by Automerge, significantly improves compression efficiency.

@ -88,7 +88,6 @@
"https://deno.land/x/cliui@v7.0.4-deno/build/lib/index.js": "fb6030c7b12602a4fca4d81de3ddafa301ba84fd9df73c53de6f3bdda7b482d5",
"https://deno.land/x/cliui@v7.0.4-deno/build/lib/string-utils.js": "b3eb9d2e054a43a3064af17332fb1839a7dadb205c5371af4789616afb1a117f",
"https://deno.land/x/cliui@v7.0.4-deno/deno.ts": "d07bc3338661f8011e3a5fd215061d17a52107a5383c29f40ce0c1ecb8bb8cc3",
"https://deno.land/x/dirname@1.1.2/mod.ts": "4029ca6b49da58d262d65f826ba9b3a89cc0b92a94c7220d5feb7bd34e498a54",
"https://deno.land/x/dirname@1.1.2/types.ts": "c1ed1667545bc4b1d69bdb2fc26a5fa8edae3a56e3081209c16a408a322a2319",
"https://deno.land/x/escalade@v3.0.3/sync.ts": "493bc66563292c5c10c4a75a467a5933f24dad67d74b0f5a87e7b988fe97c104",
"https://deno.land/x/y18n@v5.0.0-deno/build/lib/index.d.ts": "11f40d97041eb271cc1a1c7b296c6e7a068d4843759575e7416f0d14ebf8239c",

View file

@ -42,7 +42,15 @@ fn main() {
println!(
"snapshot size {} after compression {}",
snapshot.len(),
output.len()
output.len(),
);
let updates = loro.export_from(&Default::default());
let output = miniz_oxide::deflate::compress_to_vec(&updates, 6);
println!(
"updates size {} after compression {}",
updates.len(),
output.len(),
);
// {

View file

@ -5,8 +5,8 @@ use loro_internal::{LoroDoc, LoroValue};
// static ALLOC: dhat::Alloc = dhat::Alloc;
fn main() {
with_100k_actors_then_action();
// import_with_many_actors();
// with_100k_actors_then_action();
import_with_many_actors();
}
#[allow(unused)]

View file

@ -8,7 +8,7 @@ pub fn main() {
let actions = bench_utils::get_automerge_actions();
let action_length = actions.len();
let text = loro.get_text("text");
for (_, chunks) in actions.chunks(action_length / 10).enumerate() {
for chunks in actions.chunks(action_length / 10) {
for TextAction { pos, ins, del } in chunks {
let mut txn = loro.txn().unwrap();
text.delete_with_txn(&mut txn, *pos, *del).unwrap();

View file

@ -2,12 +2,6 @@
# It is not intended for manual editing.
version = 3
[[package]]
name = "adler"
version = "1.0.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f26201604c87b1e01bd3d98f8d5d9a8fcbb815e8cedb41ffccbeb4bf593a35fe"
[[package]]
name = "append-only-bytes"
version = "0.1.12"
@ -313,6 +307,12 @@ dependencies = [
"libc",
]
[[package]]
name = "leb128"
version = "0.2.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "884e2677b40cc8c339eaefcb701c32ef1fd2493d71118dc0ca4b6a736c93bd67"
[[package]]
name = "libc"
version = "0.2.147"
@ -349,6 +349,7 @@ dependencies = [
"fxhash",
"loro-rle",
"serde",
"serde_columnar",
"string_cache",
"thiserror",
]
@ -368,11 +369,14 @@ dependencies = [
"getrandom",
"im",
"itertools",
"leb128",
"loro-common",
"loro-preload",
"loro-rle",
"miniz_oxide",
"md5",
"num",
"num-derive",
"num-traits",
"once_cell",
"postcard",
"rand",
@ -423,13 +427,10 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "3f3d053a135388e6b1df14e8af1212af5064746e9b87a06a345a7a779ee9695a"
[[package]]
name = "miniz_oxide"
version = "0.7.1"
name = "md5"
version = "0.7.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e7810e0be55b428ada41041c41f32c9f1a42817901b4ccf45fa3d4b6561e74c7"
dependencies = [
"adler",
]
checksum = "490cc448043f947bae3cbee9c203358d62dbee0db12107a74be5c30ccfd09771"
[[package]]
name = "new_debug_unreachable"
@ -471,6 +472,17 @@ dependencies = [
"num-traits",
]
[[package]]
name = "num-derive"
version = "0.3.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "876a53fff98e03a936a674b29568b0e605f06b29372c2489ff4de23f1949743d"
dependencies = [
"proc-macro2",
"quote",
"syn 1.0.105",
]
[[package]]
name = "num-integer"
version = "0.1.45"

View file

@ -4,5 +4,6 @@ use loro_internal::LoroDoc;
fuzz_target!(|data: Vec<u8>| {
let mut doc = LoroDoc::default();
doc.import(&data);
doc.import_snapshot_unchecked(&data);
doc.import_delta_updates_unchecked(&data);
});

View file

@ -1,5 +1,5 @@
import __ from "https://deno.land/x/dirname@1.1.2/mod.ts";
const { __dirname } = __(import.meta);
import * as path from "https://deno.land/std@0.105.0/path/mod.ts";
const __dirname = path.dirname(path.fromFileUrl(import.meta.url));
import { resolve } from "https://deno.land/std@0.198.0/path/mod.ts";
const validTargets = Array.from(

View file

@ -1,5 +1,5 @@
import __ from "https://deno.land/x/dirname@1.1.2/mod.ts";
const { __dirname } = __(import.meta);
import * as path from "https://deno.land/std@0.105.0/path/mod.ts";
const __dirname = path.dirname(path.fromFileUrl(import.meta.url));
import { resolve } from "https://deno.land/std@0.105.0/path/mod.ts";
export const Tasks = [

View file

@ -14,7 +14,7 @@ use crate::{
container::{
idx::ContainerIdx,
list::list_op::{InnerListOp, ListOp},
map::{InnerMapSet, MapSet},
map::MapSet,
ContainerID,
},
id::Counter,
@ -47,149 +47,14 @@ pub struct SharedArena {
}
pub struct StrAllocResult {
/// unicode start
pub start: usize,
/// unicode end
pub end: usize,
// TODO: remove this field?
pub utf16_len: usize,
}
pub(crate) struct OpConverter<'a> {
container_idx_to_id: MutexGuard<'a, Vec<ContainerID>>,
container_id_to_idx: MutexGuard<'a, FxHashMap<ContainerID, ContainerIdx>>,
container_idx_depth: MutexGuard<'a, Vec<u16>>,
str: MutexGuard<'a, StrArena>,
values: MutexGuard<'a, Vec<LoroValue>>,
root_c_idx: MutexGuard<'a, Vec<ContainerIdx>>,
parents: MutexGuard<'a, FxHashMap<ContainerIdx, Option<ContainerIdx>>>,
}
impl<'a> OpConverter<'a> {
pub fn convert_single_op(
&mut self,
id: &ContainerID,
_peer: PeerID,
counter: Counter,
_lamport: Lamport,
content: RawOpContent,
) -> Op {
let container = 'out: {
if let Some(&idx) = self.container_id_to_idx.get(id) {
break 'out idx;
}
let container_idx_to_id = &mut self.container_idx_to_id;
let idx = container_idx_to_id.len();
container_idx_to_id.push(id.clone());
let idx = ContainerIdx::from_index_and_type(idx as u32, id.container_type());
self.container_id_to_idx.insert(id.clone(), idx);
if id.is_root() {
self.root_c_idx.push(idx);
self.parents.insert(idx, None);
self.container_idx_depth.push(1);
} else {
self.container_idx_depth.push(0);
}
idx
};
match content {
crate::op::RawOpContent::Map(MapSet { key, value }) => {
let value = if let Some(value) = value {
Some(_alloc_value(&mut self.values, value) as u32)
} else {
None
};
Op {
counter,
container,
content: crate::op::InnerContent::Map(InnerMapSet { key, value }),
}
}
crate::op::RawOpContent::List(list) => match list {
ListOp::Insert { slice, pos } => match slice {
ListSlice::RawData(values) => {
let range = _alloc_values(&mut self.values, values.iter().cloned());
Op {
counter,
container,
content: crate::op::InnerContent::List(InnerListOp::Insert {
slice: SliceRange::from(range.start as u32..range.end as u32),
pos,
}),
}
}
ListSlice::RawStr {
str,
unicode_len: _,
} => {
let slice = _alloc_str(&mut self.str, &str);
Op {
counter,
container,
content: crate::op::InnerContent::List(InnerListOp::Insert {
slice: SliceRange::from(slice.start as u32..slice.end as u32),
pos,
}),
}
}
},
ListOp::Delete(span) => Op {
counter,
container,
content: InnerContent::List(InnerListOp::Delete(span)),
},
ListOp::StyleStart {
start,
end,
info,
key,
value,
} => Op {
counter,
container,
content: InnerContent::List(InnerListOp::StyleStart {
start,
end,
info,
key,
value,
}),
},
ListOp::StyleEnd => Op {
counter,
container,
content: InnerContent::List(InnerListOp::StyleEnd),
},
},
crate::op::RawOpContent::Tree(tree) => {
// we need create every meta container associated with target TreeID
let id = tree.target;
let meta_container_id = id.associated_meta_container();
if self.container_id_to_idx.get(&meta_container_id).is_none() {
let container_idx_to_id = &mut self.container_idx_to_id;
let idx = container_idx_to_id.len();
container_idx_to_id.push(meta_container_id.clone());
self.container_idx_depth.push(0);
let idx = ContainerIdx::from_index_and_type(
idx as u32,
meta_container_id.container_type(),
);
self.container_id_to_idx.insert(meta_container_id, idx);
let parent = &mut self.parents;
parent.insert(idx, Some(container));
}
Op {
container,
counter,
content: crate::op::InnerContent::Tree(tree),
}
}
}
}
}
impl SharedArena {
pub fn register_container(&self, id: &ContainerID) -> ContainerIdx {
let mut container_id_to_idx = self.inner.container_id_to_idx.lock().unwrap();
@ -244,12 +109,9 @@ impl SharedArena {
}
/// return slice and unicode index
pub fn alloc_str_with_slice(&self, str: &str) -> (BytesSlice, usize) {
pub fn alloc_str_with_slice(&self, str: &str) -> (BytesSlice, StrAllocResult) {
let mut text_lock = self.inner.str.lock().unwrap();
let start = text_lock.len_bytes();
let unicode_start = text_lock.len_unicode();
text_lock.alloc(str);
(text_lock.slice_bytes(start..), unicode_start)
_alloc_str_with_slice(&mut text_lock, str)
}
/// alloc str without extra info
@ -365,20 +227,6 @@ impl SharedArena {
(self.inner.values.lock().unwrap()[range]).to_vec()
}
#[inline(always)]
pub(crate) fn with_op_converter<R>(&self, f: impl FnOnce(&mut OpConverter) -> R) -> R {
let mut op_converter = OpConverter {
container_idx_to_id: self.inner.container_idx_to_id.lock().unwrap(),
container_id_to_idx: self.inner.container_id_to_idx.lock().unwrap(),
container_idx_depth: self.inner.depth.lock().unwrap(),
str: self.inner.str.lock().unwrap(),
values: self.inner.values.lock().unwrap(),
root_c_idx: self.inner.root_c_idx.lock().unwrap(),
parents: self.inner.parents.lock().unwrap(),
};
f(&mut op_converter)
}
pub fn convert_single_op(
&self,
container: &ContainerID,
@ -404,14 +252,11 @@ impl SharedArena {
container: ContainerIdx,
) -> Op {
match content {
crate::op::RawOpContent::Map(MapSet { key, value }) => {
let value = value.map(|value| self.alloc_value(value) as u32);
Op {
counter,
container,
content: crate::op::InnerContent::Map(InnerMapSet { key, value }),
}
}
crate::op::RawOpContent::Map(MapSet { key, value }) => Op {
counter,
container,
content: crate::op::InnerContent::Map(MapSet { key, value }),
},
crate::op::RawOpContent::List(list) => match list {
ListOp::Insert { slice, pos } => match slice {
ListSlice::RawData(values) => {
@ -426,13 +271,13 @@ impl SharedArena {
}
}
ListSlice::RawStr { str, unicode_len } => {
let (slice, start) = self.alloc_str_with_slice(&str);
let (slice, info) = self.alloc_str_with_slice(&str);
Op {
counter,
container,
content: crate::op::InnerContent::List(InnerListOp::InsertText {
slice,
unicode_start: start as u32,
unicode_start: info.start as u32,
unicode_len: unicode_len as u32,
pos: pos as u32,
}),
@ -519,6 +364,15 @@ impl SharedArena {
}
}
fn _alloc_str_with_slice(
text_lock: &mut MutexGuard<'_, StrArena>,
str: &str,
) -> (BytesSlice, StrAllocResult) {
let start = text_lock.len_bytes();
let ans = _alloc_str(text_lock, str);
(text_lock.slice_bytes(start..), ans)
}
fn _alloc_values(
values_lock: &mut MutexGuard<'_, Vec<LoroValue>>,
values: impl Iterator<Item = LoroValue>,

View file

@ -49,6 +49,7 @@ pub enum InnerListOp {
},
Delete(DeleteSpan),
/// StyleStart and StyleEnd must be paired.
/// The next op of StyleStart must be StyleEnd.
StyleStart {
start: u32,
end: u32,

View file

@ -11,13 +11,6 @@ pub struct MapSet {
pub(crate) value: Option<LoroValue>,
}
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct InnerMapSet {
pub(crate) key: InternalString,
// the key is deleted if value is None
pub(crate) value: Option<u32>,
}
impl Mergable for MapSet {}
impl Sliceable for MapSet {
fn slice(&self, from: usize, to: usize) -> Self {
@ -31,19 +24,6 @@ impl HasLength for MapSet {
}
}
impl Mergable for InnerMapSet {}
impl Sliceable for InnerMapSet {
fn slice(&self, from: usize, to: usize) -> Self {
assert!(from == 0 && to == 1);
self.clone()
}
}
impl HasLength for InnerMapSet {
fn content_len(&self) -> usize {
1
}
}
#[cfg(test)]
mod test {
use super::MapSet;

View file

@ -1,3 +1,3 @@
mod map_content;
pub(crate) use map_content::{InnerMapSet, MapSet};
pub(crate) use map_content::MapSet;

View file

@ -4,7 +4,7 @@ use generic_btree::{
rle::{HasLength, Mergeable, Sliceable},
BTree, BTreeTrait, Cursor,
};
use loro_common::LoroValue;
use loro_common::{IdSpan, LoroValue, ID};
use serde::{ser::SerializeStruct, Serialize};
use std::fmt::{Display, Formatter};
use std::{
@ -64,17 +64,18 @@ mod text_chunk {
use std::ops::Range;
use append_only_bytes::BytesSlice;
use loro_common::ID;
#[derive(Clone, Debug, PartialEq)]
pub(crate) struct TextChunk {
unicode_len: i32,
bytes: BytesSlice,
// TODO: make this field only available in wasm mode
unicode_len: i32,
utf16_len: i32,
start_op_id: ID,
}
impl TextChunk {
pub fn from_bytes(bytes: BytesSlice) -> Self {
pub fn new(bytes: BytesSlice, id: ID) -> Self {
let mut utf16_len = 0;
let mut unicode_len = 0;
for c in std::str::from_utf8(&bytes).unwrap().chars() {
@ -86,9 +87,14 @@ mod text_chunk {
unicode_len,
bytes,
utf16_len: utf16_len as i32,
start_op_id: id,
}
}
pub fn id(&self) -> ID {
self.start_op_id
}
pub fn bytes(&self) -> &BytesSlice {
&self.bytes
}
@ -139,6 +145,9 @@ mod text_chunk {
unicode_len: 0,
bytes: BytesSlice::empty(),
utf16_len: 0,
// This is a dummy value.
// It's fine because the length is 0. We never actually use this value.
start_op_id: ID::NONE_ID,
}
}
@ -186,6 +195,7 @@ mod text_chunk {
}
(true, false) => {
self.bytes.slice_(end_byte..);
self.start_op_id = self.start_op_id.inc(end_unicode_index as i32);
None
}
(false, true) => {
@ -194,7 +204,7 @@ mod text_chunk {
}
(false, false) => {
let next = self.bytes.slice_clone(end_byte..);
let next = Self::from_bytes(next);
let next = Self::new(next, self.start_op_id.inc(end_unicode_index as i32));
self.unicode_len -= next.unicode_len;
self.utf16_len -= next.utf16_len;
self.bytes.slice_(..start_byte);
@ -283,6 +293,7 @@ mod text_chunk {
unicode_len: range.len() as i32,
bytes: self.bytes.slice_clone(start..end),
utf16_len: utf16_len as i32,
start_op_id: self.start_op_id.inc(range.start as i32),
};
ans.check();
ans
@ -303,6 +314,7 @@ mod text_chunk {
unicode_len: self.unicode_len - pos as i32,
bytes: self.bytes.slice_clone(byte_offset..),
utf16_len: self.utf16_len - utf16_len as i32,
start_op_id: self.start_op_id.inc(pos as i32),
};
self.unicode_len = pos as i32;
@ -317,6 +329,7 @@ mod text_chunk {
impl generic_btree::rle::Mergeable for TextChunk {
fn can_merge(&self, rhs: &Self) -> bool {
self.bytes.can_merge(&rhs.bytes)
&& self.start_op_id.inc(self.unicode_len) == rhs.start_op_id
}
fn merge_right(&mut self, rhs: &Self) {
@ -332,6 +345,7 @@ mod text_chunk {
self.bytes = new;
self.utf16_len += left.utf16_len;
self.unicode_len += left.unicode_len;
self.start_op_id = left.start_op_id;
self.check();
}
}
@ -348,13 +362,29 @@ pub(crate) enum RichtextStateChunk {
}
impl RichtextStateChunk {
pub fn new_text(s: BytesSlice) -> Self {
Self::Text(TextChunk::from_bytes(s))
pub fn new_text(s: BytesSlice, id: ID) -> Self {
Self::Text(TextChunk::new(s, id))
}
pub fn new_style(style: Arc<StyleOp>, anchor_type: AnchorType) -> Self {
Self::Style { style, anchor_type }
}
pub(crate) fn get_id_span(&self) -> loro_common::IdSpan {
match self {
RichtextStateChunk::Text(t) => {
let id = t.id();
IdSpan::new(id.peer, id.counter, id.counter + t.unicode_len())
}
RichtextStateChunk::Style { style, anchor_type } => match anchor_type {
AnchorType::Start => style.id().into(),
AnchorType::End => {
let id = style.id();
IdSpan::new(id.peer, id.counter + 1, id.counter + 2)
}
},
}
}
}
impl DeltaValue for RichtextStateChunk {
@ -398,9 +428,9 @@ impl Serialize for RichtextStateChunk {
}
impl RichtextStateChunk {
pub fn try_from_bytes(s: BytesSlice) -> Result<Self, Utf8Error> {
pub fn try_new(s: BytesSlice, id: ID) -> Result<Self, Utf8Error> {
std::str::from_utf8(&s)?;
Ok(RichtextStateChunk::Text(TextChunk::from_bytes(s)))
Ok(RichtextStateChunk::Text(TextChunk::new(s, id)))
}
pub fn from_style(style: Arc<StyleOp>, anchor_type: AnchorType) -> Self {
@ -1172,8 +1202,8 @@ impl RichtextState {
}
/// This is used to accept changes from DiffCalculator
pub(crate) fn insert_at_entity_index(&mut self, entity_index: usize, text: BytesSlice) {
let elem = RichtextStateChunk::try_from_bytes(text).unwrap();
pub(crate) fn insert_at_entity_index(&mut self, entity_index: usize, text: BytesSlice, id: ID) {
let elem = RichtextStateChunk::try_new(text, id).unwrap();
self.style_ranges.insert(entity_index, elem.rle_len());
let leaf;
if let Some(cursor) =
@ -1736,8 +1766,8 @@ impl RichtextState {
}
#[inline(always)]
pub fn is_emtpy(&self) -> bool {
self.tree.root_cache().bytes == 0
pub fn is_empty(&self) -> bool {
self.tree.root_cache().entity_len == 0
}
#[inline(always)]
@ -1779,7 +1809,7 @@ mod test {
let state = &mut self.state;
let text = self.bytes.slice(start..);
let entity_index = state.get_entity_index_for_text_insert(pos, PosType::Unicode);
state.insert_at_entity_index(entity_index, text);
state.insert_at_entity_index(entity_index, text, ID::new(0, 0));
};
}

View file

@ -70,6 +70,8 @@ impl Tracker {
}
pub(crate) fn insert(&mut self, mut op_id: ID, mut pos: usize, mut content: RichtextChunk) {
// debug_log::debug_dbg!(&op_id, pos, content);
// debug_log::debug_dbg!(&self);
let last_id = op_id.inc(content.len() as Counter - 1);
let applied_counter_end = self.applied_vv.get(&last_id.peer).copied().unwrap_or(0);
if applied_counter_end > op_id.counter {
@ -296,6 +298,7 @@ impl Tracker {
self._checkout(from, false);
self._checkout(to, true);
// self.id_to_cursor.diagnose();
// debug_log::debug_dbg!(&self);
self.rope.get_diff()
}
}

View file

@ -355,7 +355,10 @@ impl CrdtRope {
match elem.diff() {
DiffStatus::NotChanged => {}
DiffStatus::Created => {
let rt = Some(CrdtRopeDelta::Insert(elem.content));
let rt = Some(CrdtRopeDelta::Insert {
chunk: elem.content,
id: elem.id,
});
if index > last_pos {
next = rt;
let len = index - last_pos;
@ -405,7 +408,7 @@ impl CrdtRope {
#[derive(Debug, Clone, PartialEq, Eq, Copy)]
pub(crate) enum CrdtRopeDelta {
Retain(usize),
Insert(RichtextChunk),
Insert { chunk: RichtextChunk, id: ID },
Delete(usize),
}
@ -820,7 +823,10 @@ mod test {
CrdtRopeDelta::Retain(2),
CrdtRopeDelta::Delete(6),
CrdtRopeDelta::Retain(2),
CrdtRopeDelta::Insert(RichtextChunk::new_text(10..13))
CrdtRopeDelta::Insert {
chunk: RichtextChunk::new_text(10..13),
id: ID::new(1, 0)
}
],
vec,
);
@ -841,7 +847,10 @@ mod test {
);
let vec: Vec<_> = rope.get_diff().collect();
assert_eq!(
vec![CrdtRopeDelta::Insert(RichtextChunk::new_text(2..10))],
vec![CrdtRopeDelta::Insert {
chunk: RichtextChunk::new_text(2..10),
id: ID::new(0, 2)
}],
vec,
);
}

View file

@ -135,14 +135,14 @@ impl<'a, T: DagNode> Iterator for DagIteratorVV<'a, T> {
debug_assert_eq!(id, node.id_start());
let mut vv = {
// calculate vv
let mut vv = None;
let mut vv: Option<VersionVector> = None;
for &dep_id in node.deps() {
let dep = self.dag.get(dep_id).unwrap();
let dep_vv = self.vv_map.get(&dep.id_start()).unwrap();
if vv.is_none() {
vv = Some(dep_vv.clone());
if let Some(vv) = vv.as_mut() {
vv.merge(dep_vv);
} else {
vv.as_mut().unwrap().merge(dep_vv);
vv = Some(dep_vv.clone());
}
if dep.id_start() != dep_id {
@ -150,7 +150,7 @@ impl<'a, T: DagNode> Iterator for DagIteratorVV<'a, T> {
}
}
vv.unwrap_or_else(VersionVector::new)
vv.unwrap_or_default()
};
vv.try_update_last(id);

View file

@ -28,6 +28,12 @@ pub struct MapValue {
pub lamport: (Lamport, PeerID),
}
impl MapValue {
pub fn id(&self) -> ID {
ID::new(self.lamport.1, self.counter)
}
}
#[derive(Default, Debug, Clone)]
pub struct ResolvedMapDelta {
pub updated: FxHashMap<InternalString, ResolvedMapValue>,

View file

@ -4,7 +4,7 @@ use std::{
};
use fxhash::{FxHashMap, FxHashSet};
use loro_common::{ContainerType, LoroValue, TreeID};
use loro_common::{ContainerType, LoroValue, TreeID, ID};
use serde::Serialize;
use smallvec::{smallvec, SmallVec};
@ -92,6 +92,7 @@ pub struct TreeDelta {
pub struct TreeDeltaItem {
pub target: TreeID,
pub action: TreeInternalDiff,
pub last_effective_move_op_id: ID,
}
/// The action of [`TreeDiff`]. It's the same as [`crate::container::tree::tree_op::TreeOp`], but semantic.
@ -101,9 +102,9 @@ pub enum TreeInternalDiff {
Create,
/// Recreate the node, the node has been deleted before
Restore,
/// Same as move to `None` and the node is exist
/// Same as move to `None` and the node exists
AsRoot,
/// Move the node to the parent, the node is exist
/// Move the node to the parent, the node exists
Move(TreeID),
/// First create the node and move it to the parent
CreateMove(TreeID),
@ -120,6 +121,7 @@ impl TreeDeltaItem {
target: TreeID,
parent: Option<TreeID>,
old_parent: Option<TreeID>,
op_id: ID,
is_parent_deleted: bool,
is_old_parent_deleted: bool,
) -> Self {
@ -150,7 +152,11 @@ impl TreeDeltaItem {
unreachable!()
}
};
TreeDeltaItem { target, action }
TreeDeltaItem {
target,
action,
last_effective_move_op_id: op_id,
}
}
}

View file

@ -23,7 +23,7 @@ use crate::{
delta::{Delta, MapDelta, MapValue, TreeInternalDiff},
event::InternalDiff,
id::Counter,
op::{RichOp, SliceRange},
op::{RichOp, SliceRange, SliceRanges},
span::{HasId, HasLamport},
version::Frontiers,
InternalString, VersionVector,
@ -130,7 +130,6 @@ impl DiffCalculator {
.binary_search_by(|op| op.ctr_last().cmp(&start_counter))
.unwrap_or_else(|e| e);
let mut visited = FxHashSet::default();
debug_log::debug_dbg!(&change, iter_start);
for mut op in &change.ops.vec()[iter_start..] {
// slice the op if needed
let stack_sliced_op;
@ -272,7 +271,6 @@ impl DiffCalculator {
}
}
debug_log::debug_dbg!(&new_containers);
if len == all.len() {
debug_log::debug_log!("Container might be deleted");
debug_log::debug_dbg!(&all);
@ -306,7 +304,7 @@ impl DiffCalculator {
);
}
}
debug_log::debug_dbg!(&ans);
// debug_log::debug_dbg!(&ans);
ans.into_values()
.sorted_by_key(|x| x.0)
.map(|x| x.1)
@ -379,7 +377,7 @@ impl DiffCalculatorTrait for MapDiffCalculator {
lamport: op.lamport(),
peer: op.client_id(),
counter: op.id_start().counter,
value: op.op().content.as_map().unwrap().value,
value: op.op().content.as_map().unwrap().value.clone(),
});
}
@ -387,7 +385,7 @@ impl DiffCalculatorTrait for MapDiffCalculator {
fn calculate_diff(
&mut self,
oplog: &super::oplog::OpLog,
_oplog: &super::oplog::OpLog,
from: &crate::VersionVector,
to: &crate::VersionVector,
mut on_new_container: impl FnMut(&ContainerID),
@ -411,7 +409,7 @@ impl DiffCalculatorTrait for MapDiffCalculator {
for (key, value) in changed {
let value = value
.map(|v| {
let value = v.value.and_then(|v| oplog.arena.get_value(v as usize));
let value = v.value.clone();
if let Some(LoroValue::Container(c)) = &value {
on_new_container(c);
}
@ -434,12 +432,26 @@ impl DiffCalculatorTrait for MapDiffCalculator {
}
}
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
#[derive(Debug, Clone, PartialEq, Eq)]
struct CompactMapValue {
lamport: Lamport,
peer: PeerID,
counter: Counter,
value: Option<u32>,
value: Option<LoroValue>,
}
impl Ord for CompactMapValue {
fn cmp(&self, other: &Self) -> std::cmp::Ordering {
self.lamport
.cmp(&other.lamport)
.then(self.peer.cmp(&other.peer))
}
}
impl PartialOrd for CompactMapValue {
fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
Some(self.cmp(other))
}
}
impl HasId for CompactMapValue {
@ -469,19 +481,19 @@ mod compact_register {
&self,
a: &VersionVector,
b: &VersionVector,
) -> (Option<CompactMapValue>, Option<CompactMapValue>) {
let mut max_a: Option<CompactMapValue> = None;
let mut max_b: Option<CompactMapValue> = None;
) -> (Option<&CompactMapValue>, Option<&CompactMapValue>) {
let mut max_a: Option<&CompactMapValue> = None;
let mut max_b: Option<&CompactMapValue> = None;
for v in self.tree.iter().rev() {
if b.get(&v.peer).copied().unwrap_or(0) > v.counter {
max_b = Some(*v);
max_b = Some(v);
break;
}
}
for v in self.tree.iter().rev() {
if a.get(&v.peer).copied().unwrap_or(0) > v.counter {
max_a = Some(*v);
max_a = Some(v);
break;
}
}
@ -564,7 +576,7 @@ impl DiffCalculatorTrait for ListDiffCalculator {
CrdtRopeDelta::Retain(len) => {
delta = delta.retain(len);
}
CrdtRopeDelta::Insert(value) => match value.value() {
CrdtRopeDelta::Insert { chunk: value, id } => match value.value() {
RichtextChunkValue::Text(range) => {
for i in range.clone() {
let v = oplog.arena.get_value(i as usize);
@ -572,7 +584,10 @@ impl DiffCalculatorTrait for ListDiffCalculator {
on_new_container(c);
}
}
delta = delta.insert(SliceRange(range));
delta = delta.insert(SliceRanges {
ranges: smallvec::smallvec![SliceRange(range)],
id,
});
}
RichtextChunkValue::StyleAnchor { .. } => unreachable!(),
RichtextChunkValue::Unknown(_) => unreachable!(),
@ -583,7 +598,7 @@ impl DiffCalculatorTrait for ListDiffCalculator {
}
}
InternalDiff::SeqRaw(delta)
InternalDiff::ListRaw(delta)
}
}
@ -617,12 +632,8 @@ impl DiffCalculatorTrait for RichtextDiffCalculator {
match &op.op().content {
crate::op::InnerContent::List(l) => match l {
crate::container::list::list_op::InnerListOp::Insert { slice, pos } => {
self.tracker.insert(
op.id_start(),
*pos,
RichtextChunk::new_text(slice.0.clone()),
);
crate::container::list::list_op::InnerListOp::Insert { .. } => {
unreachable!()
}
crate::container::list::list_op::InnerListOp::InsertText {
slice: _,
@ -695,14 +706,15 @@ impl DiffCalculatorTrait for RichtextDiffCalculator {
CrdtRopeDelta::Retain(len) => {
delta = delta.retain(len);
}
CrdtRopeDelta::Insert(value) => match value.value() {
CrdtRopeDelta::Insert { chunk: value, id } => match value.value() {
RichtextChunkValue::Text(text) => {
delta = delta.insert(RichtextStateChunk::Text(
// PERF: can be speedup by acquiring lock on arena
TextChunk::from_bytes(
TextChunk::new(
oplog
.arena
.slice_by_unicode(text.start as usize..text.end as usize),
id,
),
));
}

View file

@ -90,9 +90,9 @@ impl TreeDiffCache {
// When we cache local ops, we can apply these directly.
// Because importing the local op must not cause circular references, it has been checked.
pub(crate) fn add_node_uncheck(&mut self, node: MoveLamportAndID) {
pub(crate) fn add_node_from_local(&mut self, node: MoveLamportAndID) {
if !self.all_version.includes_id(node.id) {
let old_parent = self.get_parent(node.target);
let (old_parent, _id) = self.get_parent(node.target);
self.update_deleted_cache(node.target, node.parent, old_parent);
@ -147,7 +147,7 @@ impl TreeDiffCache {
let apply_ops = self.forward(to, to_max_lamport);
debug_log::debug_log!("apply ops {:?}", apply_ops);
for op in apply_ops.into_iter() {
let old_parent = self.get_parent(op.target);
let (old_parent, _id) = self.get_parent(op.target);
let is_parent_deleted =
op.parent.is_some() && self.is_deleted(op.parent.as_ref().unwrap());
let is_old_parent_deleted =
@ -159,6 +159,7 @@ impl TreeDiffCache {
op.target,
op.parent,
old_parent,
op.id,
is_parent_deleted,
is_old_parent_deleted,
);
@ -168,17 +169,18 @@ impl TreeDiffCache {
this_diff.action,
TreeInternalDiff::Restore | TreeInternalDiff::RestoreMove(_)
) {
// TODO: perf how to get children faster
// TODO: per
let mut s = vec![op.target];
while let Some(t) = s.pop() {
let children = self.get_children(t);
children.iter().for_each(|c| {
diff.push(TreeDeltaItem {
target: *c,
target: c.0,
action: TreeInternalDiff::CreateMove(t),
last_effective_move_op_id: c.1,
})
});
s.extend(children);
s.extend(children.iter().map(|x| x.0));
}
}
}
@ -200,18 +202,20 @@ impl TreeDiffCache {
self.current_version = vv.clone();
}
/// return true if it can be effected
/// return true if this apply op has effect on the tree
///
/// This method assumes that `node` has the greatest lamport value
fn apply(&mut self, mut node: MoveLamportAndID) -> bool {
let mut ans = true;
let mut effected = true;
if node.parent.is_some() && self.is_ancestor_of(node.target, node.parent.unwrap()) {
ans = false;
effected = false;
}
node.effected = ans;
let old_parent = self.get_parent(node.target);
node.effected = effected;
let (old_parent, _id) = self.get_parent(node.target);
self.update_deleted_cache(node.target, node.parent, old_parent);
self.cache.entry(node.target).or_default().insert(node);
self.current_version.set_last(node.id);
ans
effected
}
fn forward(&mut self, vv: &VersionVector, max_lamport: Lamport) -> Vec<MoveLamportAndID> {
@ -252,7 +256,7 @@ impl TreeDiffCache {
});
if op.effected {
// update deleted cache
let old_parent = self.get_parent(op.target);
let (old_parent, _id) = self.get_parent(op.target);
self.update_deleted_cache(op.target, old_parent, op.parent);
}
}
@ -273,14 +277,15 @@ impl TreeDiffCache {
}
}
for op in retreat_ops.iter_mut().sorted().rev() {
self.cache.get_mut(&op.target).unwrap().remove(op);
let btree_set = &mut self.cache.get_mut(&op.target).unwrap();
btree_set.remove(op);
self.pending.insert(*op);
self.current_version.shrink_to_exclude(IdSpan {
client_id: op.id.peer,
counter: CounterSpan::new(op.id.counter, op.id.counter + 1),
});
// calc old parent
let old_parent = self.get_parent(op.target);
let (old_parent, last_effective_move_op_id) = self.get_parent(op.target);
if op.effected {
// we need to know whether old_parent is deleted
let is_parent_deleted =
@ -291,6 +296,7 @@ impl TreeDiffCache {
op.target,
old_parent,
op.parent,
last_effective_move_op_id,
is_old_parent_deleted,
is_parent_deleted,
);
@ -305,11 +311,12 @@ impl TreeDiffCache {
let children = self.get_children(t);
children.iter().for_each(|c| {
diffs.push(TreeDeltaItem {
target: *c,
target: c.0,
action: TreeInternalDiff::CreateMove(t),
last_effective_move_op_id: c.1,
})
});
s.extend(children);
s.extend(children.iter().map(|c| c.0));
}
}
}
@ -319,15 +326,34 @@ impl TreeDiffCache {
}
/// get the parent of the first effected op
fn get_parent(&self, tree_id: TreeID) -> Option<TreeID> {
fn get_parent(&self, tree_id: TreeID) -> (Option<TreeID>, ID) {
if TreeID::is_deleted_root(Some(tree_id)) {
return None;
return (None, ID::NONE_ID);
}
let mut ans = TreeID::unexist_root();
let mut ans = (TreeID::unexist_root(), ID::NONE_ID);
if let Some(cache) = self.cache.get(&tree_id) {
for op in cache.iter().rev() {
if op.effected {
ans = op.parent;
ans = (op.parent, op.id);
break;
}
}
}
ans
}
/// get the parent of the first effected op
fn get_last_effective_move(&self, tree_id: TreeID) -> Option<&MoveLamportAndID> {
if TreeID::is_deleted_root(Some(tree_id)) {
return None;
}
let mut ans = None;
if let Some(cache) = self.cache.get(&tree_id) {
for op in cache.iter().rev() {
if op.effected {
ans = Some(op);
break;
}
}
@ -342,7 +368,7 @@ impl TreeDiffCache {
}
loop {
let parent = self.get_parent(node_id);
let (parent, _id) = self.get_parent(node_id);
match parent {
Some(parent_id) if parent_id == maybe_ancestor => return true,
Some(parent_id) if parent_id == node_id => panic!("loop detected"),
@ -358,14 +384,14 @@ impl TreeDiffCache {
pub(crate) trait TreeDeletedSetTrait {
fn deleted(&self) -> &FxHashSet<TreeID>;
fn deleted_mut(&mut self) -> &mut FxHashSet<TreeID>;
fn get_children(&self, target: TreeID) -> Vec<TreeID>;
fn get_children_recursively(&self, target: TreeID) -> Vec<TreeID> {
fn get_children(&self, target: TreeID) -> Vec<(TreeID, ID)>;
fn get_children_recursively(&self, target: TreeID) -> Vec<(TreeID, ID)> {
let mut ans = vec![];
let mut s = vec![target];
while let Some(t) = s.pop() {
let children = self.get_children(t);
ans.extend(children.clone());
s.extend(children);
s.extend(children.iter().map(|x| x.0));
}
ans
}
@ -393,7 +419,7 @@ pub(crate) trait TreeDeletedSetTrait {
self.deleted_mut().remove(&target);
}
let mut s = self.get_children(target);
while let Some(child) = s.pop() {
while let Some((child, _)) = s.pop() {
if child == target {
continue;
}
@ -416,17 +442,21 @@ impl TreeDeletedSetTrait for TreeDiffCache {
&mut self.deleted
}
fn get_children(&self, target: TreeID) -> Vec<TreeID> {
fn get_children(&self, target: TreeID) -> Vec<(TreeID, ID)> {
let mut ans = vec![];
for (tree_id, _) in self.cache.iter() {
if tree_id == &target {
continue;
}
let parent = self.get_parent(*tree_id);
if parent == Some(target) {
ans.push(*tree_id)
let Some(op) = self.get_last_effective_move(*tree_id) else {
continue;
};
if op.parent == Some(target) {
ans.push((*tree_id, op.id));
}
}
ans
}
}

View file

@ -1,133 +1,197 @@
use fxhash::FxHashMap;
use loro_common::PeerID;
use crate::{change::Change, op::RemoteOp};
pub(crate) type RemoteClientChanges<'a> = FxHashMap<PeerID, Vec<Change<RemoteOp<'a>>>>;
mod encode_enhanced;
pub(crate) mod encode_snapshot;
mod encode_updates;
use rle::HasLength;
mod encode_reordered;
use crate::op::OpWithId;
use crate::LoroDoc;
use crate::{oplog::OpLog, LoroError, VersionVector};
use loro_common::{HasCounter, IdSpan, LoroResult};
use num_derive::{FromPrimitive, ToPrimitive};
use num_traits::{FromPrimitive, ToPrimitive};
use rle::{HasLength, Sliceable};
const MAGIC_BYTES: [u8; 4] = *b"loro";
use self::encode_updates::decode_oplog_updates;
pub(crate) use encode_enhanced::{decode_oplog_v2, encode_oplog_v2};
pub(crate) use encode_updates::encode_oplog_updates;
pub(crate) const COMPRESS_RLE_THRESHOLD: usize = 20 * 1024;
// TODO: Test this threshold
#[cfg(not(test))]
pub(crate) const UPDATE_ENCODE_THRESHOLD: usize = 32;
#[cfg(test)]
pub(crate) const UPDATE_ENCODE_THRESHOLD: usize = 16;
pub(crate) const MAGIC_BYTES: [u8; 4] = [0x6c, 0x6f, 0x72, 0x6f];
pub(crate) const ENCODE_SCHEMA_VERSION: u8 = 0;
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[derive(Clone, Copy, Debug, PartialEq, Eq, FromPrimitive, ToPrimitive)]
pub(crate) enum EncodeMode {
// This is a config option, it won't be used in encoding.
Auto = 255,
Updates = 0,
Snapshot = 1,
RleUpdates = 2,
CompressedRleUpdates = 3,
Rle = 1,
Snapshot = 2,
}
impl EncodeMode {
pub fn to_byte(self) -> u8 {
match self {
EncodeMode::Auto => 255,
EncodeMode::Updates => 0,
EncodeMode::Snapshot => 1,
EncodeMode::RleUpdates => 2,
EncodeMode::CompressedRleUpdates => 3,
}
pub fn to_bytes(self) -> [u8; 2] {
let value = self.to_u16().unwrap();
value.to_be_bytes()
}
pub fn is_snapshot(self) -> bool {
matches!(self, EncodeMode::Snapshot)
}
}
impl TryFrom<u8> for EncodeMode {
impl TryFrom<[u8; 2]> for EncodeMode {
type Error = LoroError;
fn try_from(value: u8) -> Result<Self, Self::Error> {
match value {
0 => Ok(EncodeMode::Updates),
1 => Ok(EncodeMode::Snapshot),
2 => Ok(EncodeMode::RleUpdates),
3 => Ok(EncodeMode::CompressedRleUpdates),
_ => Err(LoroError::DecodeError("Unknown encode mode".into())),
}
fn try_from(value: [u8; 2]) -> Result<Self, Self::Error> {
let value = u16::from_be_bytes(value);
Self::from_u16(value).ok_or(LoroError::IncompatibleFutureEncodingError(value as usize))
}
}
/// The encoder used to encode the container states.
///
/// Each container state can be represented by a sequence of operations.
/// For example, a list state can be represented by a sequence of insert
/// operations that form its current state.
/// We ignore the delete operations.
///
/// We will use a new encoder for each container state.
/// Each container state should call encode_op multiple times until all the
/// operations constituting its current state are encoded.
pub(crate) struct StateSnapshotEncoder<'a> {
/// The `check_idspan` function is used to check if the id span is valid.
/// If the id span is invalid, the function should return an error that
/// contains the missing id span.
check_idspan: &'a dyn Fn(IdSpan) -> Result<(), IdSpan>,
/// The `encoder_by_op` function is used to encode an operation.
encoder_by_op: &'a mut dyn FnMut(OpWithId),
/// The `record_idspan` function is used to record the id span to track the
/// encoded order.
record_idspan: &'a mut dyn FnMut(IdSpan),
#[allow(unused)]
mode: EncodeMode,
}
impl StateSnapshotEncoder<'_> {
pub fn encode_op(&mut self, id_span: IdSpan, get_op: impl FnOnce() -> OpWithId) {
debug_log::debug_dbg!(id_span);
if let Err(span) = (self.check_idspan)(id_span) {
let mut op = get_op();
if span == id_span {
(self.encoder_by_op)(op);
} else {
debug_assert_eq!(span.ctr_start(), id_span.ctr_start());
op.op = op.op.slice(span.atom_len(), op.op.atom_len());
(self.encoder_by_op)(op);
}
}
(self.record_idspan)(id_span);
}
#[allow(unused)]
pub fn mode(&self) -> EncodeMode {
self.mode
}
}
pub(crate) struct StateSnapshotDecodeContext<'a> {
pub oplog: &'a OpLog,
pub ops: &'a mut dyn Iterator<Item = OpWithId>,
pub blob: &'a [u8],
pub mode: EncodeMode,
}
pub(crate) fn encode_oplog(oplog: &OpLog, vv: &VersionVector, mode: EncodeMode) -> Vec<u8> {
let version = ENCODE_SCHEMA_VERSION;
let mut ans = Vec::from(MAGIC_BYTES);
// maybe u8 is enough
ans.push(version);
let mode = match mode {
EncodeMode::Auto => {
let self_vv = oplog.vv();
let diff = self_vv.diff(vv);
let update_total_len = diff
.left
.values()
.map(|value| value.atom_len())
.sum::<usize>();
// EncodeMode::RleUpdates(vv)
if update_total_len <= UPDATE_ENCODE_THRESHOLD {
EncodeMode::Updates
} else if update_total_len <= COMPRESS_RLE_THRESHOLD {
EncodeMode::RleUpdates
} else {
EncodeMode::CompressedRleUpdates
}
}
EncodeMode::Auto => EncodeMode::Rle,
mode => mode,
};
let encoded = match &mode {
EncodeMode::Updates => encode_oplog_updates(oplog, vv),
EncodeMode::RleUpdates => encode_oplog_v2(oplog, vv),
EncodeMode::CompressedRleUpdates => {
let bytes = encode_oplog_v2(oplog, vv);
miniz_oxide::deflate::compress_to_vec(&bytes, 7)
}
let body = match &mode {
EncodeMode::Rle => encode_reordered::encode_updates(oplog, vv),
_ => unreachable!(),
};
ans.push(mode.to_byte());
ans.extend(encoded);
ans
encode_header_and_body(mode, body)
}
pub(crate) fn decode_oplog(oplog: &mut OpLog, input: &[u8]) -> Result<(), LoroError> {
if input.len() < 6 {
return Err(LoroError::DecodeError("".into()));
}
let (magic_bytes, input) = input.split_at(4);
let magic_bytes: [u8; 4] = magic_bytes.try_into().unwrap();
if magic_bytes != MAGIC_BYTES {
return Err(LoroError::DecodeError("Invalid header bytes".into()));
}
let (version, input) = input.split_at(1);
if version != [ENCODE_SCHEMA_VERSION] {
return Err(LoroError::DecodeError("Invalid version".into()));
}
let mode: EncodeMode = input[0].try_into()?;
let decoded = &input[1..];
pub(crate) fn decode_oplog(
oplog: &mut OpLog,
parsed: ParsedHeaderAndBody,
) -> Result<(), LoroError> {
let ParsedHeaderAndBody { mode, body, .. } = parsed;
match mode {
EncodeMode::Updates => decode_oplog_updates(oplog, decoded),
EncodeMode::Snapshot => unimplemented!(),
EncodeMode::RleUpdates => decode_oplog_v2(oplog, decoded),
EncodeMode::CompressedRleUpdates => miniz_oxide::inflate::decompress_to_vec(decoded)
.map_err(|_| LoroError::DecodeError("Invalid compressed data".into()))
.and_then(|bytes| decode_oplog_v2(oplog, &bytes)),
EncodeMode::Rle | EncodeMode::Snapshot => encode_reordered::decode_updates(oplog, body),
EncodeMode::Auto => unreachable!(),
}
}
pub(crate) struct ParsedHeaderAndBody<'a> {
pub checksum: [u8; 16],
pub checksum_body: &'a [u8],
pub mode: EncodeMode,
pub body: &'a [u8],
}
impl ParsedHeaderAndBody<'_> {
/// Return if the checksum is correct.
fn check_checksum(&self) -> LoroResult<()> {
if md5::compute(self.checksum_body).0 != self.checksum {
return Err(LoroError::DecodeDataCorruptionError);
}
Ok(())
}
}
const MIN_HEADER_SIZE: usize = 22;
pub(crate) fn parse_header_and_body(bytes: &[u8]) -> Result<ParsedHeaderAndBody, LoroError> {
let reader = &bytes;
if bytes.len() < MIN_HEADER_SIZE {
return Err(LoroError::DecodeError("Invalid import data".into()));
}
let (magic_bytes, reader) = reader.split_at(4);
let magic_bytes: [u8; 4] = magic_bytes.try_into().unwrap();
if magic_bytes != MAGIC_BYTES {
return Err(LoroError::DecodeError("Invalid magic bytes".into()));
}
let (checksum, reader) = reader.split_at(16);
let checksum_body = reader;
let (mode_bytes, reader) = reader.split_at(2);
let mode: EncodeMode = [mode_bytes[0], mode_bytes[1]].try_into()?;
let ans = ParsedHeaderAndBody {
mode,
checksum_body,
checksum: checksum.try_into().unwrap(),
body: reader,
};
ans.check_checksum()?;
Ok(ans)
}
fn encode_header_and_body(mode: EncodeMode, body: Vec<u8>) -> Vec<u8> {
let mut ans = Vec::new();
ans.extend(MAGIC_BYTES);
let checksum = [0; 16];
ans.extend(checksum);
ans.extend(mode.to_bytes());
ans.extend(body);
let checksum_body = &ans[20..];
let checksum = md5::compute(checksum_body).0;
ans[4..20].copy_from_slice(&checksum);
ans
}
pub(crate) fn export_snapshot(doc: &LoroDoc) -> Vec<u8> {
let body = encode_reordered::encode_snapshot(
&doc.oplog().try_lock().unwrap(),
&doc.app_state().try_lock().unwrap(),
&Default::default(),
);
encode_header_and_body(EncodeMode::Snapshot, body)
}
pub(crate) fn decode_snapshot(
doc: &LoroDoc,
mode: EncodeMode,
body: &[u8],
) -> Result<(), LoroError> {
match mode {
EncodeMode::Snapshot => encode_reordered::decode_snapshot(doc, body),
_ => unreachable!(),
}
}
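
The 22-byte header handled by `encode_header_and_body` and `parse_header_and_body` above can be sketched without the crate's types. This is an illustrative sketch under stated assumptions, not the PR's implementation: `Mode`, `encode`, `parse`, and `toy_checksum` are hypothetical names, errors are plain `String`s rather than `LoroError`, the `Auto` variant is left out, and `toy_checksum` is a trivial stand-in for the MD5 digest so the example needs no external crate.

```rust
const MAGIC_BYTES: [u8; 4] = *b"loro";
const HEADER_SIZE: usize = 22; // 4 magic + 16 checksum + 2 mode

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Rle = 1,
    Snapshot = 2,
}

impl Mode {
    /// The mode is stored as a big-endian u16.
    fn to_bytes(self) -> [u8; 2] {
        (self as u16).to_be_bytes()
    }

    fn from_bytes(bytes: [u8; 2]) -> Result<Self, String> {
        match u16::from_be_bytes(bytes) {
            1 => Ok(Mode::Rle),
            2 => Ok(Mode::Snapshot),
            other => Err(format!("unknown encode mode {other}")),
        }
    }
}

/// Stand-in for MD5 (assumption for this sketch): any 16-byte digest fits the layout.
fn toy_checksum(data: &[u8]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (i, b) in data.iter().enumerate() {
        out[i % 16] = out[i % 16].wrapping_mul(31).wrapping_add(*b);
    }
    out
}

fn encode(mode: Mode, body: &[u8]) -> Vec<u8> {
    let mut ans = Vec::with_capacity(HEADER_SIZE + body.len());
    ans.extend_from_slice(&MAGIC_BYTES);
    ans.extend_from_slice(&[0u8; 16]); // checksum placeholder, filled in below
    ans.extend_from_slice(&mode.to_bytes());
    ans.extend_from_slice(body);
    // The checksum covers everything from byte 20 on: the mode bytes and the body.
    let checksum = toy_checksum(&ans[20..]);
    ans[4..20].copy_from_slice(&checksum);
    ans
}

fn parse(bytes: &[u8]) -> Result<(Mode, &[u8]), String> {
    if bytes.len() < HEADER_SIZE {
        return Err("input shorter than the header".to_string());
    }
    let (magic, rest) = bytes.split_at(4);
    if magic != &MAGIC_BYTES[..] {
        return Err("invalid magic bytes".to_string());
    }
    let (checksum, checked) = rest.split_at(16);
    let (mode_bytes, body) = checked.split_at(2);
    let mode = Mode::from_bytes([mode_bytes[0], mode_bytes[1]])?;
    if toy_checksum(checked).as_slice() != checksum {
        return Err("checksum mismatch".to_string());
    }
    Ok((mode, body))
}
```

As in the diff, the mode is decoded before the checksum is verified, so an unrecognized mode value is reported as such rather than as data corruption.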

@ -1,735 +0,0 @@
use fxhash::{FxHashMap, FxHashSet};
use loro_common::{HasCounterSpan, HasIdSpan, HasLamportSpan, TreeID};
use rle::{HasLength, RleVec, Sliceable};
use serde_columnar::{columnar, iter_from_bytes, to_vec};
use std::{borrow::Cow, ops::Deref, sync::Arc};
use crate::{
change::{Change, Timestamp},
container::{
idx::ContainerIdx,
list::list_op::{DeleteSpan, ListOp},
map::MapSet,
richtext::TextStyleInfoFlag,
tree::tree_op::TreeOp,
ContainerID, ContainerType,
},
id::{Counter, PeerID, ID},
op::{ListSlice, RawOpContent, RemoteOp},
oplog::OpLog,
span::HasId,
version::Frontiers,
InternalString, LoroError, LoroValue, VersionVector,
};
type PeerIdx = u32;
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, serde::Serialize, serde::Deserialize)]
struct RootContainer {
name: InternalString,
type_: ContainerType,
}
#[columnar(vec, ser, de, iterable)]
#[derive(Debug, Clone)]
struct NormalContainer {
#[columnar(strategy = "DeltaRle")]
peer_idx: PeerIdx,
#[columnar(strategy = "DeltaRle")]
counter: Counter,
#[columnar(strategy = "Rle")]
type_: u8,
}
#[columnar(vec, ser, de, iterable)]
#[derive(Debug, Clone)]
struct ChangeEncoding {
#[columnar(strategy = "Rle")]
pub(super) peer_idx: PeerIdx,
#[columnar(strategy = "DeltaRle")]
pub(super) timestamp: Timestamp,
#[columnar(strategy = "DeltaRle")]
pub(super) op_len: u32,
/// The length of deps that exclude the dep on the same client
#[columnar(strategy = "Rle")]
pub(super) deps_len: u32,
/// Whether the change has a dep on the same client.
/// It can save lots of space by using this field instead of [`DepsEncoding`]
#[columnar(strategy = "BoolRle")]
pub(super) dep_on_self: bool,
}
#[columnar(vec, ser, de, iterable)]
#[derive(Debug, Clone)]
struct OpEncoding {
#[columnar(strategy = "DeltaRle")]
container: usize,
/// Key index or insert/delete pos or target tree id index
#[columnar(strategy = "DeltaRle")]
prop: usize,
/// 0: insert or the parent tree id is not none
/// 1: delete or the parent tree id is none
/// 2: text-anchor-start
/// 3: text-anchor-end
#[columnar(strategy = "Rle")]
kind: u8,
/// the length of the deletion or insertion or target tree id index
#[columnar(strategy = "Rle")]
insert_del_len: isize,
}
#[derive(PartialEq, Eq)]
enum Kind {
Insert,
Delete,
TextAnchorStart,
TextAnchorEnd,
}
impl Kind {
fn from_byte(byte: u8) -> Self {
match byte {
0 => Self::Insert,
1 => Self::Delete,
2 => Self::TextAnchorStart,
3 => Self::TextAnchorEnd,
_ => panic!("invalid kind byte"),
}
}
fn to_byte(&self) -> u8 {
match self {
Self::Insert => 0,
Self::Delete => 1,
Self::TextAnchorStart => 2,
Self::TextAnchorEnd => 3,
}
}
}
#[columnar(vec, ser, de, iterable)]
#[derive(Debug, Copy, Clone, PartialEq, Eq, Hash)]
pub(super) struct DepsEncoding {
#[columnar(strategy = "DeltaRle")]
pub(super) client_idx: PeerIdx,
#[columnar(strategy = "DeltaRle")]
pub(super) counter: Counter,
}
type TreeIDEncoding = DepsEncoding;
impl DepsEncoding {
pub(super) fn new(client_idx: PeerIdx, counter: Counter) -> Self {
Self {
client_idx,
counter,
}
}
}
#[columnar(ser, de)]
struct DocEncoding<'a> {
#[columnar(class = "vec", iter = "ChangeEncoding")]
changes: Vec<ChangeEncoding>,
#[columnar(class = "vec", iter = "OpEncoding")]
ops: Vec<OpEncoding>,
#[columnar(class = "vec", iter = "DepsEncoding")]
deps: Vec<DepsEncoding>,
#[columnar(class = "vec")]
normal_containers: Vec<NormalContainer>,
#[columnar(borrow)]
str: Cow<'a, str>,
#[columnar(borrow)]
style_info: Cow<'a, [u8]>,
style_key: Vec<usize>,
style_values: Vec<LoroValue>,
root_containers: Vec<RootContainer>,
start_counter: Vec<Counter>,
values: Vec<Option<LoroValue>>,
clients: Vec<PeerID>,
keys: Vec<InternalString>,
// the index 0 is DELETE_ROOT
tree_ids: Vec<TreeIDEncoding>,
}
pub fn encode_oplog_v2(oplog: &OpLog, vv: &VersionVector) -> Vec<u8> {
let mut peer_id_to_idx: FxHashMap<PeerID, PeerIdx> = FxHashMap::default();
let mut peers = Vec::with_capacity(oplog.changes().len());
let mut diff_changes = Vec::new();
let self_vv = oplog.vv();
let start_vv = vv.trim(&oplog.vv());
let diff = self_vv.diff(&start_vv);
let mut start_counter = Vec::new();
for span in diff.left.iter() {
let id = span.id_start();
let changes = oplog.get_change_at(id).unwrap();
let peer_id = *span.0;
let idx = peers.len() as PeerIdx;
peers.push(peer_id);
peer_id_to_idx.insert(peer_id, idx);
start_counter.push(
changes
.id
.counter
.max(start_vv.get(&peer_id).copied().unwrap_or(0)),
);
}
for (change, _) in oplog.iter_causally(start_vv.clone(), self_vv.clone()) {
let start_cnt = start_vv.get(&change.id.peer).copied().unwrap_or(0);
if change.id.counter < start_cnt {
let offset = start_cnt - change.id.counter;
diff_changes.push(Cow::Owned(change.slice(offset as usize, change.atom_len())));
} else {
diff_changes.push(Cow::Borrowed(change));
}
}
let (root_containers, container_idx2index, normal_containers) =
extract_containers(&diff_changes, oplog, &mut peer_id_to_idx, &mut peers);
for change in &diff_changes {
for deps in change.deps.iter() {
peer_id_to_idx.entry(deps.peer).or_insert_with(|| {
let idx = peers.len() as PeerIdx;
peers.push(deps.peer);
idx
});
}
}
let change_num = diff_changes.len();
let mut changes = Vec::with_capacity(change_num);
let mut ops = Vec::with_capacity(change_num);
let mut keys = Vec::new();
let mut key_to_idx = FxHashMap::default();
let mut deps = Vec::with_capacity(change_num);
let mut values = Vec::new();
// the index 0 is DELETE_ROOT
let mut tree_ids = Vec::new();
let mut tree_id_to_idx = FxHashMap::default();
let mut string: String = String::new();
let mut style_key_idx = Vec::new();
let mut style_values = Vec::new();
let mut style_info = Vec::new();
for change in &diff_changes {
let client_idx = peer_id_to_idx[&change.id.peer];
let mut dep_on_self = false;
let mut deps_len = 0;
for dep in change.deps.iter() {
if change.id.peer != dep.peer {
deps.push(DepsEncoding::new(
*peer_id_to_idx.get(&dep.peer).unwrap(),
dep.counter,
));
deps_len += 1;
} else {
dep_on_self = true;
}
}
let mut op_len = 0;
for op in change.ops.iter() {
let container = op.container;
let container_index = *container_idx2index.get(&container).unwrap();
let remote_ops = oplog.local_op_to_remote(op);
for op in remote_ops {
let content = op.content;
let (prop, kind, insert_del_len) = match content {
crate::op::RawOpContent::Tree(TreeOp { target, parent }) => {
// TODO: refactor extract register idx
let target_peer_idx =
*peer_id_to_idx.entry(target.peer).or_insert_with(|| {
let idx = peers.len() as PeerIdx;
peers.push(target.peer);
idx
});
let target_encoding = TreeIDEncoding {
client_idx: target_peer_idx,
counter: target.counter,
};
let target_idx =
*tree_id_to_idx.entry(target_encoding).or_insert_with(|| {
tree_ids.push(target_encoding);
// the index 0 is DELETE_ROOT
tree_ids.len()
});
let (is_none, parent_idx) = if let Some(parent) = parent {
if TreeID::is_deleted_root(Some(parent)) {
(Kind::Insert, 0)
} else {
let parent_peer_idx =
*peer_id_to_idx.entry(parent.peer).or_insert_with(|| {
let idx = peers.len() as PeerIdx;
peers.push(parent.peer);
idx
});
let parent_encoding = TreeIDEncoding {
client_idx: parent_peer_idx,
counter: parent.counter,
};
let parent_idx =
*tree_id_to_idx.entry(parent_encoding).or_insert_with(|| {
tree_ids.push(parent_encoding);
tree_ids.len()
});
(Kind::Insert, parent_idx)
}
} else {
(Kind::Delete, 0)
};
(target_idx, is_none, parent_idx as isize)
}
crate::op::RawOpContent::Map(MapSet { key, value }) => {
if value.is_some() {
values.push(value.clone());
}
(
*key_to_idx.entry(key.clone()).or_insert_with(|| {
keys.push(key.clone());
keys.len() - 1
}),
if value.is_some() {
Kind::Insert
} else {
Kind::Delete
},
0,
)
}
crate::op::RawOpContent::List(list) => match list {
ListOp::Insert { slice, pos } => {
let len;
match &slice {
ListSlice::RawData(v) => {
len = 0;
values.push(Some(LoroValue::List(Arc::new(v.to_vec()))));
}
ListSlice::RawStr {
str,
unicode_len: _,
} => {
len = str.len();
assert!(len > 0, "{:?}", &slice);
string.push_str(str.deref());
}
};
(pos, Kind::Insert, len as isize)
}
ListOp::Delete(span) => {
// span.len maybe negative
(span.pos as usize, Kind::Delete, span.signed_len)
}
ListOp::StyleStart {
start,
end,
key,
info,
value,
} => {
let key_idx = *key_to_idx.entry(key.clone()).or_insert_with(|| {
keys.push(key.clone());
keys.len() - 1
});
style_key_idx.push(key_idx);
style_info.push(info.to_byte());
style_values.push(value);
(
start as usize,
Kind::TextAnchorStart,
end as isize - start as isize,
)
}
ListOp::StyleEnd => (0, Kind::TextAnchorEnd, 0),
},
};
op_len += 1;
ops.push(OpEncoding {
prop,
kind: kind.to_byte(),
insert_del_len,
container: container_index,
})
}
}
changes.push(ChangeEncoding {
peer_idx: client_idx as PeerIdx,
timestamp: change.timestamp,
deps_len,
op_len,
dep_on_self,
});
}
let encoded = DocEncoding {
changes,
ops,
deps,
str: Cow::Owned(string),
clients: peers,
keys,
start_counter,
root_containers,
normal_containers,
values,
style_key: style_key_idx,
style_values,
style_info: Cow::Owned(style_info),
tree_ids,
};
to_vec(&encoded).unwrap()
}
/// Extract containers from oplog changes.
///
/// Containers are sorted by their peer_id and counter so that
/// they can be compressed by using delta encoding.
fn extract_containers(
diff_changes: &Vec<Cow<Change>>,
oplog: &OpLog,
peer_id_to_idx: &mut FxHashMap<PeerID, PeerIdx>,
peers: &mut Vec<PeerID>,
) -> (
Vec<RootContainer>,
FxHashMap<ContainerIdx, usize>,
Vec<NormalContainer>,
) {
let mut root_containers = Vec::new();
let mut container_idx2index = FxHashMap::default();
let normal_containers = {
// register containers in sorted order
let mut visited = FxHashSet::default();
let mut normal_container_idx_pairs = Vec::new();
for change in diff_changes {
for op in change.ops.iter() {
let container = op.container;
if visited.contains(&container) {
continue;
}
visited.insert(container);
let id = oplog.arena.get_container_id(container).unwrap();
match id {
ContainerID::Root {
name,
container_type,
} => {
container_idx2index.insert(container, root_containers.len());
root_containers.push(RootContainer {
name,
type_: container_type,
});
}
ContainerID::Normal {
peer,
counter,
container_type,
} => normal_container_idx_pairs.push((
NormalContainer {
peer_idx: *peer_id_to_idx.entry(peer).or_insert_with(|| {
peers.push(peer);
(peers.len() - 1) as PeerIdx
}),
counter,
type_: container_type.to_u8(),
},
container,
)),
}
}
}
normal_container_idx_pairs.sort_by(|a, b| {
if a.0.peer_idx != b.0.peer_idx {
a.0.peer_idx.cmp(&b.0.peer_idx)
} else {
a.0.counter.cmp(&b.0.counter)
}
});
let mut index = root_containers.len();
normal_container_idx_pairs
.into_iter()
.map(|(container, idx)| {
container_idx2index.insert(idx, index);
index += 1;
container
})
.collect::<Vec<_>>()
};
(root_containers, container_idx2index, normal_containers)
}
pub fn decode_oplog_v2(oplog: &mut OpLog, input: &[u8]) -> Result<(), LoroError> {
let encoded = iter_from_bytes::<DocEncoding>(input)
.map_err(|e| LoroError::DecodeError(e.to_string().into()))?;
let DocEncodingIter {
changes: change_encodings,
ops,
deps,
normal_containers,
mut start_counter,
str,
clients: peers,
keys,
root_containers,
values,
style_key,
style_values,
style_info,
tree_ids,
} = encoded;
debug_log::debug_dbg!(&start_counter);
let mut op_iter = ops;
let mut deps_iter = deps;
let mut style_key_iter = style_key.into_iter();
let mut style_value_iter = style_values.into_iter();
let mut style_info_iter = style_info.iter();
let get_container = |idx: usize| {
if idx < root_containers.len() {
let Some(container) = root_containers.get(idx) else {
return None;
};
Some(ContainerID::Root {
name: container.name.clone(),
container_type: container.type_,
})
} else {
let Some(container) = normal_containers.get(idx - root_containers.len()) else {
return None;
};
Some(ContainerID::Normal {
peer: peers[container.peer_idx as usize],
counter: container.counter,
container_type: ContainerType::from_u8(container.type_),
})
}
};
let mut value_iter = values.into_iter();
let mut str_index = 0;
let changes = change_encodings
.map(|change_encoding| {
let counter = start_counter
.get_mut(change_encoding.peer_idx as usize)
.unwrap();
let ChangeEncoding {
peer_idx,
timestamp,
op_len,
deps_len,
dep_on_self,
} = change_encoding;
let peer_id = peers[peer_idx as usize];
let mut ops = RleVec::<[RemoteOp; 1]>::new();
let mut delta = 0;
for op in op_iter.by_ref().take(op_len as usize) {
let OpEncoding {
container: container_idx,
prop,
insert_del_len,
kind,
} = op;
let Some(container_id) = get_container(container_idx) else {
return Err(LoroError::DecodeError("".into()));
};
let container_type = container_id.container_type();
let content = match container_type {
ContainerType::Tree => {
let target_encoding = tree_ids[prop - 1];
let target = TreeID {
peer: peers[target_encoding.client_idx as usize],
counter: target_encoding.counter,
};
let parent = if kind == 1 {
None
} else if insert_del_len == 0 {
TreeID::delete_root()
} else {
let parent_encoding = tree_ids[insert_del_len as usize - 1];
let parent = TreeID {
peer: peers[parent_encoding.client_idx as usize],
counter: parent_encoding.counter,
};
Some(parent)
};
RawOpContent::Tree(TreeOp { target, parent })
}
ContainerType::Map => {
let key = keys[prop].clone();
if Kind::from_byte(kind) == Kind::Delete {
RawOpContent::Map(MapSet { key, value: None })
} else {
RawOpContent::Map(MapSet {
key,
value: value_iter.next().unwrap(),
})
}
}
ContainerType::List | ContainerType::Text => {
let pos = prop;
match Kind::from_byte(kind) {
Kind::Insert => match container_type {
ContainerType::Text => {
let insert_len = insert_del_len as usize;
let s = &str[str_index..str_index + insert_len];
str_index += insert_len;
RawOpContent::List(ListOp::Insert {
slice: ListSlice::from_borrowed_str(s),
pos,
})
}
ContainerType::List => {
let value = value_iter.next().flatten().unwrap();
RawOpContent::List(ListOp::Insert {
slice: ListSlice::RawData(Cow::Owned(
match Arc::try_unwrap(value.into_list().unwrap()) {
Ok(v) => v,
Err(v) => v.deref().clone(),
},
)),
pos,
})
}
_ => unreachable!(),
},
Kind::Delete => RawOpContent::List(ListOp::Delete(DeleteSpan {
pos: pos as isize,
signed_len: insert_del_len,
})),
Kind::TextAnchorStart => RawOpContent::List(ListOp::StyleStart {
start: pos as u32,
end: insert_del_len as u32 + pos as u32,
key: keys[style_key_iter.next().unwrap()].clone(),
value: style_value_iter.next().unwrap(),
info: TextStyleInfoFlag::from_byte(
*style_info_iter.next().unwrap(),
),
}),
Kind::TextAnchorEnd => RawOpContent::List(ListOp::StyleEnd),
}
}
};
let remote_op = RemoteOp {
container: container_id,
counter: *counter + delta,
content,
};
delta += remote_op.content_len() as i32;
ops.push(remote_op);
}
let mut deps: Frontiers = (0..deps_len)
.map(|_| {
let raw = deps_iter.next().unwrap();
ID::new(peers[raw.client_idx as usize], raw.counter)
})
.collect();
if dep_on_self && *counter > 0 {
deps.push(ID::new(peer_id, *counter - 1));
}
let change = Change {
id: ID {
peer: peer_id,
counter: *counter,
},
// calc lamport after parsing all changes
lamport: 0,
has_dependents: false,
timestamp,
ops,
deps,
};
*counter += delta;
Ok(change)
})
.collect::<Result<Vec<_>, LoroError>>();
let changes = match changes {
Ok(changes) => changes,
Err(err) => return Err(err),
};
let mut pending_remote_changes = Vec::new();
debug_log::debug_dbg!(&changes);
let mut latest_ids = Vec::new();
oplog.arena.clone().with_op_converter(|converter| {
'outer: for mut change in changes {
if change.ctr_end() <= oplog.vv().get(&change.id.peer).copied().unwrap_or(0) {
// skip included changes
continue;
}
latest_ids.push(change.id_last());
// calc lamport or pending if its deps are not satisfied
for dep in change.deps.iter() {
match oplog.dag.get_lamport(dep) {
Some(lamport) => {
change.lamport = change.lamport.max(lamport + 1);
}
None => {
pending_remote_changes.push(change);
continue 'outer;
}
}
}
// convert change into inner format
let mut ops = RleVec::new();
for op in change.ops {
let lamport = change.lamport;
let content = op.content;
let op = converter.convert_single_op(
&op.container,
change.id.peer,
op.counter,
lamport,
content,
);
ops.push(op);
}
let change = Change {
ops,
id: change.id,
deps: change.deps,
lamport: change.lamport,
timestamp: change.timestamp,
has_dependents: false,
};
let Some(change) = oplog.trim_the_known_part_of_change(change) else {
continue;
};
// update dag and push the change
let mark = oplog.insert_dag_node_on_new_change(&change);
oplog.next_lamport = oplog.next_lamport.max(change.lamport_end());
oplog.latest_timestamp = oplog.latest_timestamp.max(change.timestamp);
oplog.dag.vv.extend_to_include_end_id(ID {
peer: change.id.peer,
counter: change.id.counter + change.atom_len() as Counter,
});
oplog.insert_new_change(change, mark);
}
});
let mut vv = oplog.dag.vv.clone();
oplog.try_apply_pending(latest_ids, &mut vv);
if !oplog.batch_importing {
oplog.dag.refresh_frontiers();
}
oplog.import_unknown_lamport_remote_changes(pending_remote_changes)?;
assert_eq!(str_index, str.len());
Ok(())
}

File diff suppressed because it is too large

File diff suppressed because it is too large

@ -1,170 +0,0 @@
use rle::{HasLength, RleVec};
use serde::{Deserialize, Serialize};
use smallvec::SmallVec;
use crate::{
change::{Change, Lamport, Timestamp},
container::ContainerID,
encoding::RemoteClientChanges,
id::{Counter, PeerID, ID},
op::{RawOpContent, RemoteOp},
oplog::OpLog,
version::Frontiers,
LoroError, VersionVector,
};
#[derive(Serialize, Deserialize, Debug)]
struct Updates {
changes: Vec<EncodedClientChanges>,
}
/// the continuous changes from the same client
#[derive(Serialize, Deserialize, Debug)]
struct EncodedClientChanges {
meta: FirstChangeInfo,
data: Vec<EncodedChange>,
}
#[derive(Serialize, Deserialize, Debug)]
struct FirstChangeInfo {
pub(crate) client: PeerID,
pub(crate) counter: Counter,
pub(crate) lamport: Lamport,
pub(crate) timestamp: Timestamp,
}
#[derive(Serialize, Deserialize, Debug)]
struct EncodedOp {
pub(crate) container: ContainerID,
pub(crate) content: RawOpContent<'static>,
}
#[derive(Serialize, Deserialize, Debug)]
struct EncodedChange {
pub(crate) ops: Vec<EncodedOp>,
pub(crate) deps: Vec<ID>,
pub(crate) lamport_delta: u32,
pub(crate) timestamp_delta: i64,
}
pub(crate) fn encode_oplog_updates(oplog: &OpLog, from: &VersionVector) -> Vec<u8> {
let changes = oplog.export_changes_from(from);
let mut updates = Updates {
changes: Vec::with_capacity(changes.len()),
};
for (_, changes) in changes {
let encoded = convert_changes_to_encoded(changes.into_iter());
updates.changes.push(encoded);
}
postcard::to_allocvec(&updates).unwrap()
}
pub(crate) fn decode_oplog_updates(oplog: &mut OpLog, updates: &[u8]) -> Result<(), LoroError> {
let changes = decode_updates(updates)?;
oplog.import_remote_changes(changes)?;
Ok(())
}
pub(super) fn decode_updates(input: &[u8]) -> Result<RemoteClientChanges<'static>, LoroError> {
let updates: Updates =
postcard::from_bytes(input).map_err(|e| LoroError::DecodeError(e.to_string().into()))?;
let mut changes: RemoteClientChanges = Default::default();
for encoded in updates.changes {
changes.insert(encoded.meta.client, convert_encoded_to_changes(encoded));
}
Ok(changes)
}
fn convert_changes_to_encoded<'a, I>(mut changes: I) -> EncodedClientChanges
where
I: Iterator<Item = Change<RemoteOp<'a>>>,
{
let first_change = changes.next().unwrap();
let this_client_id = first_change.id.peer;
let mut data = Vec::with_capacity(changes.size_hint().0 + 1);
let mut last_change = first_change.clone();
data.push(EncodedChange {
ops: first_change
.ops
.iter()
.map(|op| EncodedOp {
container: op.container.clone(),
content: op.content.to_static(),
})
.collect(),
deps: first_change.deps.iter().copied().collect(),
lamport_delta: 0,
timestamp_delta: 0,
});
for change in changes {
data.push(EncodedChange {
ops: change
.ops
.iter()
.map(|op| EncodedOp {
container: op.container.clone(),
content: op.content.to_static(),
})
.collect(),
deps: change.deps.iter().copied().collect(),
lamport_delta: change.lamport - last_change.lamport,
timestamp_delta: change.timestamp - last_change.timestamp,
});
last_change = change;
}
EncodedClientChanges {
meta: FirstChangeInfo {
client: this_client_id,
counter: first_change.id.counter,
lamport: first_change.lamport,
timestamp: first_change.timestamp,
},
data,
}
}
fn convert_encoded_to_changes(changes: EncodedClientChanges) -> Vec<Change<RemoteOp<'static>>> {
let mut result = Vec::with_capacity(changes.data.len());
let mut last_lamport = changes.meta.lamport;
let mut last_timestamp = changes.meta.timestamp;
let mut counter: Counter = changes.meta.counter;
for encoded in changes.data {
let start_counter = counter;
let mut deps: Frontiers = SmallVec::with_capacity(encoded.deps.len()).into();
for dep in encoded.deps {
deps.push(dep);
}
let mut ops = RleVec::with_capacity(encoded.ops.len());
for op in encoded.ops {
let len: usize = op.content.atom_len();
let content = op.content;
ops.push(RemoteOp {
counter,
container: op.container,
content,
});
counter += len as Counter;
}
let change = Change {
id: ID {
peer: changes.meta.client,
counter: start_counter,
},
lamport: last_lamport + encoded.lamport_delta,
timestamp: last_timestamp + encoded.timestamp_delta,
ops,
deps,
has_dependents: false,
};
last_lamport = change.lamport;
last_timestamp = change.timestamp;
result.push(change);
}
result
}

@ -135,7 +135,7 @@ impl DiffVariant {
#[non_exhaustive]
#[derive(Clone, Debug, EnumAsInner, Serialize)]
pub(crate) enum InternalDiff {
SeqRaw(Delta<SliceRanges>),
ListRaw(Delta<SliceRanges>),
/// This always uses entity indexes.
RichtextRaw(Delta<RichtextStateChunk>),
Map(MapDelta),
@ -177,7 +177,7 @@ impl From<Diff> for DiffVariant {
impl InternalDiff {
pub(crate) fn is_empty(&self) -> bool {
match self {
InternalDiff::SeqRaw(s) => s.is_empty(),
InternalDiff::ListRaw(s) => s.is_empty(),
InternalDiff::RichtextRaw(t) => t.is_empty(),
InternalDiff::Map(m) => m.updated.is_empty(),
InternalDiff::Tree(t) => t.is_empty(),
@ -187,8 +187,8 @@ impl InternalDiff {
pub(crate) fn compose(self, diff: InternalDiff) -> Result<Self, Self> {
// PERF: avoid clone
match (self, diff) {
(InternalDiff::SeqRaw(a), InternalDiff::SeqRaw(b)) => {
Ok(InternalDiff::SeqRaw(a.compose(b)))
(InternalDiff::ListRaw(a), InternalDiff::ListRaw(b)) => {
Ok(InternalDiff::ListRaw(a.compose(b)))
}
(InternalDiff::RichtextRaw(a), InternalDiff::RichtextRaw(b)) => {
Ok(InternalDiff::RichtextRaw(a.compose(b)))

@ -312,7 +312,7 @@ impl Actionable for Vec<LoroDoc> {
*site %= self.len() as u8;
let app_state = &mut self[*site as usize].app_state().lock().unwrap();
let text = app_state.get_text("text").unwrap();
if text.is_empty() {
if text.len_unicode() == 0 {
*len = 0;
*pos = 0;
} else {
@ -436,8 +436,8 @@ where
let f_ref: *const _ = &f;
let f_ref: usize = f_ref as usize;
#[allow(clippy::redundant_clone)]
let actions_clone = actions.clone();
let action_ref: usize = (&actions_clone) as *const _ as usize;
let mut actions_clone = actions.clone();
let action_ref: usize = (&mut actions_clone) as *mut _ as usize;
#[allow(clippy::blocks_in_if_conditions)]
if std::panic::catch_unwind(|| {
// SAFETY: test
@ -465,8 +465,8 @@ where
while let Some(candidate) = candidates.pop() {
let f_ref: *const _ = &f;
let f_ref: usize = f_ref as usize;
let actions_clone = candidate.clone();
let action_ref: usize = (&actions_clone) as *const _ as usize;
let mut actions_clone = candidate.clone();
let action_ref: usize = (&mut actions_clone) as *mut _ as usize;
#[allow(clippy::blocks_in_if_conditions)]
if std::panic::catch_unwind(|| {
// SAFETY: test
@ -1336,6 +1336,35 @@ mod test {
)
}
#[test]
fn snapshot_fuzz_test() {
test_multi_sites(
8,
&mut [
Ins {
content: 163,
pos: 0,
site: 3,
},
Ins {
content: 163,
pos: 1,
site: 3,
},
Ins {
content: 113,
pos: 2,
site: 3,
},
Ins {
content: 888,
pos: 3,
site: 3,
},
],
)
}
#[test]
fn text_fuzz_2() {
test_multi_sites(
@ -1980,25 +2009,19 @@ mod test {
&mut [
Ins {
content: 41009,
pos: 10884953820616207167,
site: 151,
pos: 0,
site: 1,
},
Mark {
pos: 150995095,
len: 7502773972505002496,
site: 0,
style_key: 0,
},
Mark {
pos: 11821702543106517760,
len: 4251403153421165732,
site: 151,
pos: 0,
len: 2,
site: 1,
style_key: 151,
},
Mark {
pos: 589824,
len: 2233786514697303298,
site: 51,
pos: 0,
len: 1,
site: 0,
style_key: 151,
},
],
@ -2287,8 +2310,79 @@ mod test {
)
}
#[test]
fn fuzz_snapshot() {
test_multi_sites(
5,
&mut [
Ins {
content: 52480,
pos: 0,
site: 1,
},
Mark {
pos: 6,
len: 1,
site: 1,
style_key: 5,
},
Ins {
content: 8224,
pos: 0,
site: 1,
},
Del {
pos: 12,
len: 1,
site: 1,
},
Ins {
content: 257,
pos: 10,
site: 1,
},
Ins {
content: 332,
pos: 11,
site: 1,
},
Del {
pos: 1,
len: 21,
site: 1,
},
Del {
pos: 0,
len: 1,
site: 1,
},
Ins {
content: 11309,
pos: 0,
site: 4,
},
],
)
}
#[test]
fn mini_r() {
minify_error(5, vec![], test_multi_sites, |_, ans| ans.to_vec())
minify_error(5, vec![], test_multi_sites, |site_num, ans| {
let mut sites = Vec::new();
for i in 0..site_num {
let loro = LoroDoc::new();
loro.set_peer_id(i as u64).unwrap();
sites.push(loro);
}
let mut applied = Vec::new();
for action in ans.iter_mut() {
sites.preprocess(action);
applied.push(action.clone());
sites.apply_action(action);
}
ans.to_vec()
})
}
}


@ -784,7 +784,7 @@ fn check_synced(sites: &mut [Actor]) {
fn check_history(actor: &mut Actor) {
assert!(!actor.history.is_empty());
for (_, (f, v)) in actor.history.iter().enumerate() {
for (f, v) in actor.history.iter() {
let f = Frontiers::from(f);
debug_log::group!(
"Checkout from {:?} to {:?}",


@ -1878,6 +1878,257 @@ mod failed_tests {
)
}
#[test]
fn history() {
test_multi_sites(
3,
&mut [
Tree {
site: 2,
container_idx: 0,
action: TreeAction::Create,
target: (2, 0),
parent: (11863787638307726561, 0),
},
Sync { from: 2, to: 0 },
Tree {
site: 2,
container_idx: 0,
action: TreeAction::Delete,
target: (2, 0),
parent: (18446537369818038270, 320017407),
},
],
)
}
#[test]
fn encoding_err() {
test_multi_sites(
5,
&mut [
Tree {
site: 255,
container_idx: 255,
action: TreeAction::Meta,
target: (13186597159363543035, 0),
parent: (34380633367117824, 913864704),
},
Tree {
site: 255,
container_idx: 255,
action: TreeAction::Move,
target: (18409103694470982982, 1174405120),
parent: (34417214799359302, 913864704),
},
Tree {
site: 255,
container_idx: 255,
action: TreeAction::Move,
target: (5063812098665360710, 1180190267),
parent: (5063812098663728710, 1480997702),
},
Tree {
site: 70,
container_idx: 70,
action: TreeAction::Move,
target: (5063812098665367110, 1179010630),
parent: (5063812098665349190, 0),
},
Tree {
site: 120,
container_idx: 58,
action: TreeAction::Meta,
target: (5063784610870067200, 1179010630),
parent: (5063812098666546747, 1179010630),
},
Tree {
site: 70,
container_idx: 70,
action: TreeAction::Create,
target: (5049942557165879296, 1179010630),
parent: (201210795607622, 0),
},
Tree {
site: 122,
container_idx: 0,
action: TreeAction::Create,
target: (281470890949240, 759580160),
parent: (280900630103622, 759580160),
},
Tree {
site: 122,
container_idx: 0,
action: TreeAction::Create,
target: (281470890949240, 759580160),
parent: (13835902485686273606, 1179010648),
},
Tree {
site: 70,
container_idx: 70,
action: TreeAction::Move,
target: (5063812098665367110, 1179010630),
parent: (5063812098665367110, 1179010630),
},
Tree {
site: 70,
container_idx: 70,
action: TreeAction::Move,
target: (18446476309249406584, 1174405120),
parent: (5063812098665360710, 1180190267),
},
Tree {
site: 70,
container_idx: 70,
action: TreeAction::Move,
target: (5063812098665367110, 1179010630),
parent: (5063989120037438976, 0),
},
Tree {
site: 120,
container_idx: 58,
action: TreeAction::Meta,
target: (5063784610870067200, 1179010630),
parent: (5063812098666546747, 1179010630),
},
Tree {
site: 70,
container_idx: 70,
action: TreeAction::Move,
target: (5208490236694644294, 1212696648),
parent: (5208492444341520456, 1212696648),
},
Tree {
site: 72,
container_idx: 72,
action: TreeAction::Move,
target: (5208492444341520456, 1212696648),
parent: (5208492444341520456, 1212696648),
},
Tree {
site: 72,
container_idx: 72,
action: TreeAction::Move,
target: (5208492444341520456, 1212696648),
parent: (5208492444341520456, 1212696648),
},
Tree {
site: 72,
container_idx: 72,
action: TreeAction::Move,
target: (5063812098665367110, 1179010630),
parent: (5053397524527072838, 70),
},
Tree {
site: 0,
container_idx: 0,
action: TreeAction::Create,
target: (0, 0),
parent: (5063812098665349120, 1179010630),
},
Tree {
site: 70,
container_idx: 70,
action: TreeAction::Move,
target: (18446476309249406584, 1174405120),
parent: (5063812098665360710, 1179010619),
},
Tree {
site: 70,
container_idx: 70,
action: TreeAction::Move,
target: (5063812098665367110, 1179010630),
parent: (5063812098665367110, 1174423110),
},
Tree {
site: 0,
container_idx: 0,
action: TreeAction::Create,
target: (0, 0),
parent: (0, 0),
},
Tree {
site: 0,
container_idx: 0,
action: TreeAction::Create,
target: (0, 0),
parent: (0, 0),
},
Tree {
site: 0,
container_idx: 0,
action: TreeAction::Create,
target: (0, 0),
parent: (4412786865808080896, 2013281677),
},
Sync { from: 12, to: 255 },
Tree {
site: 70,
container_idx: 45,
action: TreeAction::Move,
target: (5063812175974057542, 1179010630),
parent: (5063812098665367110, 1179010630),
},
Tree {
site: 72,
container_idx: 72,
action: TreeAction::Move,
target: (5208492444341520456, 1212696648),
parent: (5208492444341520456, 1212696648),
},
Tree {
site: 72,
container_idx: 72,
action: TreeAction::Move,
target: (5208492444341520456, 1212696648),
parent: (5208492444341520456, 1212696648),
},
Tree {
site: 72,
container_idx: 72,
action: TreeAction::Move,
target: (5063812098665367624, 1179010630),
parent: (5063812098665367110, 1179001158),
},
Tree {
site: 0,
container_idx: 0,
action: TreeAction::Create,
target: (0, 0),
parent: (5063812098665349120, 1179010630),
},
Tree {
site: 70,
container_idx: 70,
action: TreeAction::Move,
target: (19780516010411520, 0),
parent: (18378196371912030344, 255),
},
Tree {
site: 177,
container_idx: 185,
action: TreeAction::Create,
target: (5063812098967361606, 1179010630),
parent: (5063812098665367110, 1179010630),
},
Tree {
site: 70,
container_idx: 72,
action: TreeAction::Move,
target: (5208492444341520456, 1212696648),
parent: (5208492444341520456, 1212696648),
},
Tree {
site: 72,
container_idx: 72,
action: TreeAction::Move,
target: (4271743721848457288, 1179015238),
parent: (0, 0),
},
],
);
}
#[test]
fn to_minify() {
minify_error(5, vec![], test_multi_sites, normalize)


@ -10,6 +10,7 @@ pub mod arena;
pub mod diff_calc;
pub mod handler;
pub use event::{ContainerDiff, DiffEvent, DocDiff};
pub use fxhash::FxHashMap;
pub use handler::{ListHandler, MapHandler, TextHandler, TreeHandler};
pub use loro::LoroDoc;
pub use oplog::OpLog;
@ -17,7 +18,6 @@ pub use state::DocState;
pub mod loro;
pub mod obs;
pub mod oplog;
mod state;
pub mod txn;
pub mod change;
@ -42,6 +42,7 @@ pub mod event;
pub use error::{LoroError, LoroResult};
pub(crate) mod macros;
pub(crate) mod state;
pub(crate) mod value;
pub(crate) use change::Timestamp;
pub(crate) use id::{PeerID, ID};
@ -50,7 +51,7 @@ pub(crate) use id::{PeerID, ID};
pub(crate) type InternalString = DefaultAtom;
pub use container::ContainerType;
pub use fxhash::FxHashMap;
pub use loro_common::{loro_value, to_value};
pub use value::{ApplyDiff, LoroValue, ToJson};
pub use version::VersionVector;


@ -16,7 +16,9 @@ use crate::{
arena::SharedArena,
change::Timestamp,
container::{idx::ContainerIdx, IntoContainerId},
encoding::{EncodeMode, ENCODE_SCHEMA_VERSION, MAGIC_BYTES},
encoding::{
decode_snapshot, export_snapshot, parse_header_and_body, EncodeMode, ParsedHeaderAndBody,
},
handler::TextHandler,
handler::TreeHandler,
id::PeerID,
@ -26,7 +28,6 @@ use crate::{
use super::{
diff_calc::DiffCalculator,
encoding::encode_snapshot::{decode_app_snapshot, encode_app_snapshot},
event::InternalDocDiff,
obs::{Observer, SubID, Subscriber},
oplog::OpLog,
@ -58,7 +59,7 @@ pub struct LoroDoc {
arena: SharedArena,
observer: Arc<Observer>,
diff_calculator: Arc<Mutex<DiffCalculator>>,
// when dropping the doc, the txn will be commited
// when dropping the doc, the txn will be committed
txn: Arc<Mutex<Option<Transaction>>>,
auto_commit: AtomicBool,
detached: AtomicBool,
@ -99,15 +100,14 @@ impl LoroDoc {
pub fn from_snapshot(bytes: &[u8]) -> LoroResult<Self> {
let doc = Self::new();
let (input, mode) = parse_encode_header(bytes)?;
match mode {
EncodeMode::Snapshot => {
decode_app_snapshot(&doc, input, true)?;
Ok(doc)
}
_ => Err(LoroError::DecodeError(
let ParsedHeaderAndBody { mode, body, .. } = parse_header_and_body(bytes)?;
if mode.is_snapshot() {
decode_snapshot(&doc, mode, body)?;
Ok(doc)
} else {
Err(LoroError::DecodeError(
"Invalid encode mode".to_string().into(),
)),
))
}
}
@ -244,7 +244,7 @@ impl LoroDoc {
/// Commit the cumulative auto commit transaction.
/// This method only has effect when `auto_commit` is true.
/// If `immediate_renew` is true, a new transaction will be created after the old one is commited
/// If `immediate_renew` is true, a new transaction will be created after the old one is committed
pub fn commit_with(
&self,
origin: Option<InternalString>,
@ -368,13 +368,6 @@ impl LoroDoc {
self.import_with(bytes, Default::default())
}
#[inline]
pub fn import_without_state(&mut self, bytes: &[u8]) -> Result<(), LoroError> {
self.commit_then_stop();
self.detach();
self.import(bytes)
}
#[inline]
pub fn import_with(&self, bytes: &[u8], origin: InternalString) -> Result<(), LoroError> {
self.commit_then_stop();
@ -388,56 +381,122 @@ impl LoroDoc {
bytes: &[u8],
origin: string_cache::Atom<string_cache::EmptyStaticAtomSet>,
) -> Result<(), LoroError> {
let (input, mode) = parse_encode_header(bytes)?;
match mode {
EncodeMode::Updates | EncodeMode::RleUpdates | EncodeMode::CompressedRleUpdates => {
let parsed = parse_header_and_body(bytes)?;
match parsed.mode.is_snapshot() {
false => {
// TODO: need to throw error if state is in transaction
debug_log::group!("import to {}", self.peer_id());
let mut oplog = self.oplog.lock().unwrap();
let old_vv = oplog.vv().clone();
let old_frontiers = oplog.frontiers().clone();
oplog.decode(bytes)?;
if !self.detached.load(Acquire) {
let mut diff = DiffCalculator::default();
let diff = diff.calc_diff_internal(
&oplog,
&old_vv,
Some(&old_frontiers),
oplog.vv(),
Some(oplog.dag.get_frontiers()),
);
let mut state = self.state.lock().unwrap();
state.apply_diff(InternalDocDiff {
origin,
local: false,
diff: (diff).into(),
from_checkout: false,
new_version: Cow::Owned(oplog.frontiers().clone()),
});
}
debug_log::group!("Import updates to {}", self.peer_id());
self.update_oplog_and_apply_delta_to_state_if_needed(
|oplog| oplog.decode(parsed),
origin,
)?;
debug_log::group_end!();
}
EncodeMode::Snapshot => {
true => {
debug_log::group!("Import snapshot to {}", self.peer_id());
if self.can_reset_with_snapshot() {
decode_app_snapshot(self, input, !self.detached.load(Acquire))?;
debug_log::debug_log!("Init by snapshot");
decode_snapshot(self, parsed.mode, parsed.body)?;
} else if parsed.mode == EncodeMode::Snapshot {
debug_log::debug_log!("Import by updates");
self.update_oplog_and_apply_delta_to_state_if_needed(
|oplog| oplog.decode(parsed),
origin,
)?;
} else {
debug_log::debug_log!("Import from new doc");
let app = LoroDoc::new();
decode_app_snapshot(&app, input, false)?;
decode_snapshot(&app, parsed.mode, parsed.body)?;
let oplog = self.oplog.lock().unwrap();
// TODO: PERF: the ser and de can be optimized out
let updates = app.export_from(oplog.vv());
drop(oplog);
debug_log::group_end!();
return self.import_with(&updates, origin);
}
debug_log::group_end!();
}
EncodeMode::Auto => unreachable!(),
};
let mut state = self.state.lock().unwrap();
self.emit_events(&mut state);
Ok(())
}
pub(crate) fn update_oplog_and_apply_delta_to_state_if_needed(
&self,
f: impl FnOnce(&mut OpLog) -> Result<(), LoroError>,
origin: InternalString,
) -> Result<(), LoroError> {
let mut oplog = self.oplog.lock().unwrap();
let old_vv = oplog.vv().clone();
let old_frontiers = oplog.frontiers().clone();
f(&mut oplog)?;
if !self.detached.load(Acquire) {
let mut diff = DiffCalculator::default();
let diff = diff.calc_diff_internal(
&oplog,
&old_vv,
Some(&old_frontiers),
oplog.vv(),
Some(oplog.dag.get_frontiers()),
);
let mut state = self.state.lock().unwrap();
state.apply_diff(InternalDocDiff {
origin,
local: false,
diff: (diff).into(),
from_checkout: false,
new_version: Cow::Owned(oplog.frontiers().clone()),
});
}
Ok(())
}
/// For fuzzing tests
#[cfg(feature = "test_utils")]
pub fn import_delta_updates_unchecked(&self, body: &[u8]) -> LoroResult<()> {
self.commit_then_stop();
let mut oplog = self.oplog.lock().unwrap();
let old_vv = oplog.vv().clone();
let old_frontiers = oplog.frontiers().clone();
let ans = oplog.decode(ParsedHeaderAndBody {
checksum: [0; 16],
checksum_body: body,
mode: EncodeMode::Rle,
body,
});
if ans.is_ok() && !self.detached.load(Acquire) {
let mut diff = DiffCalculator::default();
let diff = diff.calc_diff_internal(
&oplog,
&old_vv,
Some(&old_frontiers),
oplog.vv(),
Some(oplog.dag.get_frontiers()),
);
let mut state = self.state.lock().unwrap();
state.apply_diff(InternalDocDiff {
origin: "".into(),
local: false,
diff: (diff).into(),
from_checkout: false,
new_version: Cow::Owned(oplog.frontiers().clone()),
});
}
self.renew_txn_if_auto_commit();
ans
}
/// For fuzzing tests
#[cfg(feature = "test_utils")]
pub fn import_snapshot_unchecked(&self, bytes: &[u8]) -> LoroResult<()> {
self.commit_then_stop();
let ans = decode_snapshot(self, EncodeMode::Snapshot, bytes);
self.renew_txn_if_auto_commit();
ans
}
fn emit_events(&self, state: &mut DocState) {
let events = state.take_events();
for event in events {
@ -447,14 +506,7 @@ impl LoroDoc {
pub fn export_snapshot(&self) -> Vec<u8> {
self.commit_then_stop();
debug_log::group!("export snapshot");
let version = ENCODE_SCHEMA_VERSION;
let mut ans = Vec::from(MAGIC_BYTES);
// maybe u8 is enough
ans.push(version);
ans.push((EncodeMode::Snapshot).to_byte());
ans.extend(encode_app_snapshot(self));
debug_log::group_end!();
let ans = export_snapshot(self);
self.renew_txn_if_auto_commit();
ans
}
@ -681,23 +733,6 @@ impl LoroDoc {
}
}
fn parse_encode_header(bytes: &[u8]) -> Result<(&[u8], EncodeMode), LoroError> {
if bytes.len() <= 6 {
return Err(LoroError::DecodeError("Invalid import data".into()));
}
let (magic_bytes, input) = bytes.split_at(4);
let magic_bytes: [u8; 4] = magic_bytes.try_into().unwrap();
if magic_bytes != MAGIC_BYTES {
return Err(LoroError::DecodeError("Invalid header bytes".into()));
}
let (version, input) = input.split_at(1);
if version != [ENCODE_SCHEMA_VERSION] {
return Err(LoroError::DecodeError("Invalid version".into()));
}
let mode: EncodeMode = input[0].try_into()?;
Ok((&input[1..], mode))
}
#[cfg(test)]
mod test {
use loro_common::ID;
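
The removed `parse_encode_header` read a 4-byte magic, a 1-byte version and a 1-byte mode. The PR description instead specifies a 22-byte header: 4 magic bytes (`loro`), a 16-byte checksum, and a 2-byte big-endian encode mode. The real `parse_header_and_body` is not shown in this diff, so the following is only a minimal sketch of that layout. The names `ParsedSketch` and `parse_header_sketch` are hypothetical, the `String` error stands in for `LoroError`, and MD5 verification is deliberately left out.

```rust
/// Hypothetical stand-in for `ParsedHeaderAndBody`; field names are assumptions.
#[derive(Debug, PartialEq)]
struct ParsedSketch<'a> {
    /// Bytes 4..20 of the input: the stored checksum (not verified here).
    checksum: [u8; 16],
    /// Bytes 20..22 of the input, read as a big-endian u16.
    mode: u16,
    /// Everything after the 22-byte header.
    body: &'a [u8],
}

/// Splits `bytes` into header fields and body following the layout in the
/// PR description. This sketch does NOT recompute or compare the checksum.
fn parse_header_sketch(bytes: &[u8]) -> Result<ParsedSketch<'_>, String> {
    const MAGIC: &[u8] = b"loro";
    const HEADER_LEN: usize = 22;

    if bytes.len() < HEADER_LEN {
        return Err("Invalid import data".to_string());
    }
    if !bytes.starts_with(MAGIC) {
        return Err("Invalid header bytes".to_string());
    }

    let mut checksum = [0u8; 16];
    checksum.copy_from_slice(&bytes[4..20]);
    let mode = u16::from_be_bytes([bytes[20], bytes[21]]);

    Ok(ParsedSketch {
        checksum,
        mode,
        body: &bytes[HEADER_LEN..],
    })
}
```

Returning the body as a borrowed slice mirrors how `import_with` in the diff passes `parsed.body` on to `decode_snapshot` without copying.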


@ -81,14 +81,18 @@ macro_rules! array_mut_ref {
}};
}
#[test]
fn test_macro() {
let mut arr = vec![100, 101, 102, 103];
let (a, b, _c) = array_mut_ref!(&mut arr, [1, 2, 3]);
assert_eq!(*a, 101);
assert_eq!(*b, 102);
*a = 50;
*b = 51;
assert!(arr[1] == 50);
assert!(arr[2] == 51);
#[cfg(test)]
mod test {
#[test]
fn test_macro() {
let mut arr = vec![100, 101, 102, 103];
let (a, b, _c) = array_mut_ref!(&mut arr, [1, 2, 3]);
assert_eq!(*a, 101);
assert_eq!(*b, 102);
*a = 50;
*b = 51;
assert!(arr[1] == 50);
assert!(arr[2] == 51);
}
}


@ -6,9 +6,10 @@ use crate::{
};
use crate::{delta::DeltaValue, LoroValue};
use enum_as_inner::EnumAsInner;
use loro_common::IdSpan;
use rle::{HasIndex, HasLength, Mergable, Sliceable};
use serde::{ser::SerializeSeq, Deserialize, Serialize};
use smallvec::{smallvec, SmallVec};
use smallvec::SmallVec;
use std::{borrow::Cow, ops::Range};
mod content;
@ -24,6 +25,30 @@ pub struct Op {
pub(crate) content: InnerContent,
}
#[derive(Debug, Clone)]
pub(crate) struct OpWithId {
pub peer: PeerID,
pub op: Op,
}
impl OpWithId {
pub fn id(&self) -> ID {
ID {
peer: self.peer,
counter: self.op.counter,
}
}
#[allow(unused)]
pub fn id_span(&self) -> IdSpan {
IdSpan::new(
self.peer,
self.op.counter,
self.op.counter + self.op.atom_len() as Counter,
)
}
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct RemoteOp<'a> {
pub(crate) counter: Counter,
@ -63,6 +88,7 @@ pub struct OwnedRichOp {
impl Op {
#[inline]
#[allow(unused)]
pub(crate) fn new(id: ID, content: InnerContent, container: ContainerIdx) -> Self {
Op {
counter: id.counter,
@ -103,7 +129,7 @@ impl HasLength for Op {
impl Sliceable for Op {
fn slice(&self, from: usize, to: usize) -> Self {
assert!(to > from);
assert!(to > from, "{to} should be greater than {from}");
let content: InnerContent = self.content.slice(from, to);
Op {
counter: (self.counter + from as Counter),
@ -264,6 +290,14 @@ impl<'a> RichOp<'a> {
pub fn end(&self) -> usize {
self.end
}
#[allow(unused)]
pub(crate) fn id(&self) -> ID {
ID {
peer: self.peer,
counter: self.op.counter + self.start as Counter,
}
}
}
impl OwnedRichOp {
@ -314,6 +348,11 @@ impl SliceRange {
Self(UNKNOWN_START..UNKNOWN_START + size)
}
#[inline(always)]
pub fn new(range: Range<u32>) -> Self {
Self(range)
}
#[inline(always)]
pub fn to_range(&self) -> Range<usize> {
self.0.start as usize..self.0.end as usize
@ -420,54 +459,65 @@ impl<'a> Mergable for ListSlice<'a> {
}
#[derive(Debug, Clone)]
pub struct SliceRanges(pub SmallVec<[SliceRange; 2]>);
pub struct SliceRanges {
pub ranges: SmallVec<[SliceRange; 2]>,
pub id: ID,
}
impl Serialize for SliceRanges {
fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
where
S: serde::Serializer,
{
let mut s = serializer.serialize_seq(Some(self.0.len()))?;
for item in self.0.iter() {
let mut s = serializer.serialize_seq(Some(self.ranges.len()))?;
for item in self.ranges.iter() {
s.serialize_element(item)?;
}
s.end()
}
}
impl From<SliceRange> for SliceRanges {
fn from(value: SliceRange) -> Self {
Self(smallvec![value])
}
}
impl DeltaValue for SliceRanges {
fn value_extend(&mut self, other: Self) -> Result<(), Self> {
self.0.extend(other.0);
if self.id.peer != other.id.peer {
return Err(other);
}
if self.id.counter + self.length() as Counter != other.id.counter {
return Err(other);
}
self.ranges.extend(other.ranges);
Ok(())
}
// FIXME: this seems wrong
fn take(&mut self, target_len: usize) -> Self {
let mut ret = SmallVec::new();
let mut right = Self {
ranges: Default::default(),
id: self.id.inc(target_len as i32),
};
let mut cur_len = 0;
while cur_len < target_len {
let range = self.0.pop().unwrap();
let range = self.ranges.pop().unwrap();
let range_len = range.content_len();
if cur_len + range_len <= target_len {
ret.push(range);
right.ranges.push(range);
cur_len += range_len;
} else {
let new_range = range.slice(0, target_len - cur_len);
ret.push(new_range);
self.0.push(range.slice(target_len - cur_len, range_len));
let new_range = range.slice(target_len - cur_len, range_len);
right.ranges.push(new_range);
self.ranges.push(range.slice(0, target_len - cur_len));
cur_len = target_len;
}
}
SliceRanges(ret)
std::mem::swap(self, &mut right);
right // now it's left
}
fn length(&self) -> usize {
self.0.iter().fold(0, |acc, x| acc + x.atom_len())
self.ranges.iter().fold(0, |acc, x| acc + x.atom_len())
}
}
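
The new `SliceRanges` carries an `id`, so `value_extend` can refuse to merge values that are not contiguous in ID space, and `take` has to advance the ID of the remainder. The sketch below illustrates those two invariants on a simplified type. `Ranges` and `Id` are hypothetical stand-ins, plain `Range<u32>` replaces `SliceRange`, and this `take` walks the ranges front to back, which is an assumption about the intended semantics rather than a copy of the diff's loop.

```rust
use std::ops::Range;

#[derive(Debug, Clone, Copy, PartialEq)]
struct Id {
    peer: u64,
    counter: i32,
}

/// Simplified stand-in for `SliceRanges`: ranges plus the ID of the first atom.
#[derive(Debug, Clone, PartialEq)]
struct Ranges {
    ranges: Vec<Range<u32>>,
    id: Id,
}

impl Ranges {
    /// Total number of atoms covered by all ranges.
    fn length(&self) -> usize {
        self.ranges.iter().map(|r| (r.end - r.start) as usize).sum()
    }

    /// Appends `other` only if it comes from the same peer and its counter
    /// starts exactly where `self` ends; otherwise hands `other` back.
    fn value_extend(&mut self, other: Ranges) -> Result<(), Ranges> {
        if self.id.peer != other.id.peer
            || self.id.counter + self.length() as i32 != other.id.counter
        {
            return Err(other);
        }
        self.ranges.extend(other.ranges);
        Ok(())
    }

    /// Returns the first `target_len` atoms; `self` keeps the rest and its
    /// ID counter is advanced by `target_len`.
    fn take(&mut self, target_len: usize) -> Ranges {
        assert!(target_len <= self.length());
        let mut left = Vec::new();
        let mut right = Vec::new();
        let mut remaining = target_len as u32;
        for r in self.ranges.drain(..) {
            let len = r.end - r.start;
            if remaining >= len {
                // Whole range belongs to the left part.
                remaining -= len;
                left.push(r);
            } else if remaining == 0 {
                // Left part is complete; everything else stays in `self`.
                right.push(r);
            } else {
                // Split the range that straddles the boundary.
                left.push(r.start..r.start + remaining);
                right.push(r.start + remaining..r.end);
                remaining = 0;
            }
        }
        let left_id = self.id;
        self.id = Id {
            peer: left_id.peer,
            counter: left_id.counter + target_len as i32,
        };
        self.ranges = right;
        Ranges { ranges: left, id: left_id }
    }
}
```

With these definitions, a value split by `take` can always be rejoined by `value_extend`, because the remainder's ID starts exactly where the taken part ends.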


@ -6,7 +6,7 @@ use serde::{Deserialize, Serialize};
use crate::container::{
list::list_op::{InnerListOp, ListOp},
map::{InnerMapSet, MapSet},
map::MapSet,
tree::tree_op::TreeOp,
};
@ -24,7 +24,7 @@ pub enum ContentType {
#[derive(EnumAsInner, Debug, Clone)]
pub enum InnerContent {
List(InnerListOp),
Map(InnerMapSet),
Map(MapSet),
Tree(TreeOp),
}


@ -9,16 +9,17 @@ use std::rc::Rc;
use std::sync::Mutex;
use fxhash::FxHashMap;
use loro_common::{HasCounter, HasId};
use rle::{HasLength, RleCollection, RlePush, RleVec, Sliceable};
use smallvec::SmallVec;
// use tabled::measurment::Percent;
use crate::change::{Change, Lamport, Timestamp};
use crate::container::list::list_op;
use crate::dag::DagUtils;
use crate::dag::{Dag, DagUtils};
use crate::diff_calc::tree::MoveLamportAndID;
use crate::diff_calc::TreeDiffCache;
use crate::encoding::RemoteClientChanges;
use crate::encoding::ParsedHeaderAndBody;
use crate::encoding::{decode_oplog, encode_oplog, EncodeMode};
use crate::id::{Counter, PeerID, ID};
use crate::op::{ListSlice, RawOpContent, RemoteOp};
@ -72,6 +73,8 @@ pub struct AppDagNode {
pub(crate) lamport: Lamport,
pub(crate) deps: Frontiers,
pub(crate) vv: ImVersionVector,
/// A flag indicating whether any other nodes depend on this node.
/// The calculation of frontiers is based on this value.
pub(crate) has_succ: bool,
pub(crate) len: usize,
}
@ -207,7 +210,13 @@ impl OpLog {
}
/// This is the only place to update the `OpLog.changes`
pub(crate) fn insert_new_change(&mut self, mut change: Change, _: EnsureChangeDepsAreAtTheEnd) {
pub(crate) fn insert_new_change(
&mut self,
mut change: Change,
_: EnsureChangeDepsAreAtTheEnd,
local: bool,
) {
self.update_tree_cache(&change, local);
let entry = self.changes.entry(change.id.peer).or_default();
match entry.last_mut() {
Some(last) => {
@ -217,6 +226,7 @@ impl OpLog {
"change id is not continuous"
);
let timestamp_change = change.timestamp - last.timestamp;
// TODO: make this a config
if !last.has_dependents && change.deps_on_self() && timestamp_change < 1000 {
for op in take(change.ops.vec_mut()) {
last.ops.push(op);
@ -242,7 +252,7 @@ impl OpLog {
///
/// - Return Err(LoroError::UsedOpID) when the change's id is occupied
/// - Return Err(LoroError::DecodeError) when the change's deps are missing
pub fn import_local_change(&mut self, change: Change, from_txn: bool) -> Result<(), LoroError> {
pub fn import_local_change(&mut self, change: Change, local: bool) -> Result<(), LoroError> {
let Some(change) = self.trim_the_known_part_of_change(change) else {
return Ok(());
};
@ -268,8 +278,12 @@ impl OpLog {
self.dag.frontiers.retain_non_included(&change.deps);
self.dag.frontiers.filter_peer(change.id.peer);
self.dag.frontiers.push(change.id_last());
let mark = self.insert_dag_node_on_new_change(&change);
let mark = self.update_dag_on_new_change(&change);
self.insert_new_change(change, mark, local);
Ok(())
}
fn update_tree_cache(&mut self, change: &Change, local: bool) {
// Update tree cache
let mut tree_cache = self.tree_parent_cache.lock().unwrap();
for op in change.ops().iter() {
@ -285,21 +299,18 @@ impl OpLog {
parent: tree.parent,
effected: true,
};
if from_txn {
tree_cache.add_node_uncheck(node);
if local {
tree_cache.add_node_from_local(node);
} else {
tree_cache.add_node(node);
}
}
}
drop(tree_cache);
self.insert_new_change(change, mark);
Ok(())
}
/// Every time we import a new change, it should run this function to update the dag
pub(crate) fn insert_dag_node_on_new_change(
pub(crate) fn update_dag_on_new_change(
&mut self,
change: &Change,
) -> EnsureChangeDepsAreAtTheEnd {
@ -319,7 +330,7 @@ impl OpLog {
change.lamport,
"lamport is not continuous"
);
last.len = change.id.counter as usize + len - last.cnt as usize;
last.len = (change.id.counter - last.cnt) as usize + len;
last.has_succ = false;
} else {
let vv = self.dag.frontiers_to_im_vv(&change.deps);
@ -449,41 +460,6 @@ impl OpLog {
self.dag.cmp_frontiers(other)
}
pub(crate) fn export_changes_from(&self, from: &VersionVector) -> RemoteClientChanges {
let mut changes = RemoteClientChanges::default();
for (&peer, &cnt) in self.vv().iter() {
let start_cnt = from.get(&peer).copied().unwrap_or(0);
if cnt <= start_cnt {
continue;
}
let mut temp = Vec::new();
if let Some(peer_changes) = self.changes.get(&peer) {
if let Some(result) = peer_changes.get_by_atom_index(start_cnt) {
for change in &peer_changes[result.merged_index..] {
if change.id.counter < start_cnt {
if change.id.counter + change.atom_len() as Counter <= start_cnt {
continue;
}
let sliced = change
.slice((start_cnt - change.id.counter) as usize, change.atom_len());
temp.push(self.convert_change_to_remote(&sliced));
} else {
temp.push(self.convert_change_to_remote(change));
}
}
}
}
if !temp.is_empty() {
changes.insert(peer, temp);
}
}
changes
}
pub(crate) fn get_min_lamport_at(&self, id: ID) -> Lamport {
self.get_change_at(id).map(|c| c.lamport).unwrap_or(0)
}
@ -602,7 +578,7 @@ impl OpLog {
}
},
crate::op::InnerContent::Map(map) => {
let value = map.value.and_then(|v| self.arena.get_value(v as usize));
let value = map.value.clone();
contents.push(RawOpContent::Map(crate::container::map::MapSet {
key: map.key.clone(),
value,
@ -622,35 +598,12 @@ impl OpLog {
ans
}
// Changes are expected to be sorted by counter in each value in the hashmap
// They should also be continuous (TODO: check this)
pub(crate) fn import_remote_changes(
pub(crate) fn import_unknown_lamport_pending_changes(
&mut self,
remote_changes: RemoteClientChanges,
) -> Result<(), LoroError> {
// check whether we can append the new changes
self.check_changes(&remote_changes)?;
let latest_vv = self.dag.vv.clone();
// op_converter is faster than using arena directly
let ids = self.arena.clone().with_op_converter(|converter| {
self.apply_appliable_changes_and_cache_pending(remote_changes, converter, latest_vv)
});
let mut latest_vv = self.dag.vv.clone();
self.try_apply_pending(ids, &mut latest_vv);
if !self.batch_importing {
self.dag.refresh_frontiers();
}
Ok(())
}
pub(crate) fn import_unknown_lamport_remote_changes(
&mut self,
remote_changes: Vec<Change<RemoteOp>>,
remote_changes: Vec<Change>,
) -> Result<(), LoroError> {
let latest_vv = self.dag.vv.clone();
self.arena.clone().with_op_converter(|converter| {
self.extend_pending_changes_with_unknown_lamport(remote_changes, converter, &latest_vv)
});
self.extend_pending_changes_with_unknown_lamport(remote_changes, &latest_vv);
Ok(())
}
@ -677,12 +630,12 @@ impl OpLog {
}
#[inline(always)]
pub fn export_from(&self, vv: &VersionVector) -> Vec<u8> {
pub(crate) fn export_from(&self, vv: &VersionVector) -> Vec<u8> {
encode_oplog(self, vv, EncodeMode::Auto)
}
#[inline(always)]
pub fn decode(&mut self, data: &[u8]) -> Result<(), LoroError> {
pub(crate) fn decode(&mut self, data: ParsedHeaderAndBody) -> Result<(), LoroError> {
decode_oplog(self, data)
}
@ -807,52 +760,7 @@ impl OpLog {
)
}
pub(crate) fn iter_causally(
&self,
from: VersionVector,
to: VersionVector,
) -> impl Iterator<Item = (&Change, Rc<RefCell<VersionVector>>)> {
let from_frontiers = from.to_frontiers(&self.dag);
let diff = from.diff(&to).right;
let mut iter = self.dag.iter_causal(&from_frontiers, diff);
let mut node = iter.next();
let mut cur_cnt = 0;
let vv = Rc::new(RefCell::new(VersionVector::default()));
std::iter::from_fn(move || {
if let Some(inner) = &node {
let mut inner_vv = vv.borrow_mut();
inner_vv.clear();
inner_vv.extend_to_include_vv(inner.data.vv.iter());
let peer = inner.data.peer;
let cnt = inner
.data
.cnt
.max(cur_cnt)
.max(from.get(&peer).copied().unwrap_or(0));
let end = (inner.data.cnt + inner.data.len as Counter)
.min(to.get(&peer).copied().unwrap_or(0));
let change = self
.changes
.get(&peer)
.and_then(|x| x.get_by_atom_index(cnt).map(|x| x.element))
.unwrap();
if change.ctr_end() < end {
cur_cnt = change.ctr_end();
} else {
node = iter.next();
cur_cnt = 0;
}
inner_vv.extend_to_include_end_id(change.id);
Some((change, vv.clone()))
} else {
None
}
})
}
pub(crate) fn len_changes(&self) -> usize {
pub fn len_changes(&self) -> usize {
self.changes.values().map(|x| x.len()).sum()
}
@ -880,6 +788,33 @@ impl OpLog {
total_dag_node,
}
}
#[allow(unused)]
pub(crate) fn debug_check(&self) {
for (_, changes) in self.changes().iter() {
let c = changes.last().unwrap();
let node = self.dag.get(c.id_start()).unwrap();
assert_eq!(c.id_end(), node.id_end());
}
}
pub(crate) fn iter_changes<'a>(
&'a self,
from: &VersionVector,
to: &VersionVector,
) -> impl Iterator<Item = &'a Change> + 'a {
let spans: Vec<_> = from.diff_iter(to).1.collect();
spans.into_iter().flat_map(move |span| {
let peer = span.client_id;
let cnt = span.counter.start;
let end_cnt = span.counter.end;
let peer_changes = self.changes.get(&peer).unwrap();
let index = peer_changes.search_atom_index(cnt);
peer_changes[index..]
.iter()
.take_while(move |x| x.ctr_start() < end_cnt)
})
}
}
#[derive(Debug)]
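
`iter_changes` above replaces `export_changes_from` and `iter_causally`: for every peer it finds the first change that reaches into the requested counter span and then yields changes while they start before the span's end. A rough, self-contained sketch of that lookup follows. The types are simplified stand-ins, `changes_between` is a hypothetical name, and a plain `HashMap` plays the role of `VersionVector`. As in the diff, changes are returned whole, so the first and last ones may extend beyond the requested range.

```rust
use std::collections::HashMap;

type PeerId = u64;
type Counter = i32;
/// Simplified version vector: peer -> exclusive end counter.
type VersionVector = HashMap<PeerId, Counter>;

#[derive(Debug, PartialEq)]
struct Change {
    peer: PeerId,
    start: Counter,
    len: Counter,
}

impl Change {
    fn end(&self) -> Counter {
        self.start + self.len
    }
}

/// Collects every change that overlaps the counter span between `from` and
/// `to`. Each peer's changes are assumed to be sorted and contiguous.
fn changes_between<'a>(
    changes: &'a HashMap<PeerId, Vec<Change>>,
    from: &VersionVector,
    to: &VersionVector,
) -> Vec<&'a Change> {
    // Sort peers so the output order is deterministic.
    let mut peers: Vec<PeerId> = to.keys().copied().collect();
    peers.sort_unstable();

    let mut out = Vec::new();
    for peer in peers {
        let start = from.get(&peer).copied().unwrap_or(0);
        let end = to[&peer];
        if end <= start {
            continue;
        }
        let Some(list) = changes.get(&peer) else {
            continue;
        };
        // First change whose end lies past `start`, i.e. the one containing it.
        let first = list.partition_point(|c| c.end() <= start);
        out.extend(list[first..].iter().take_while(|c| c.start < end));
    }
    out
}
```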


@ -131,11 +131,27 @@ impl AppDag {
pub fn get_lamport(&self, id: &ID) -> Option<Lamport> {
self.map.get(&id.peer).and_then(|rle| {
rle.get_by_atom_index(id.counter)
.map(|x| x.element.lamport + (id.counter - x.element.cnt) as Lamport)
rle.get_by_atom_index(id.counter).and_then(|node| {
assert!(id.counter >= node.element.cnt);
if node.element.cnt + node.element.len as Counter > id.counter {
Some(node.element.lamport + (id.counter - node.element.cnt) as Lamport)
} else {
None
}
})
})
}
pub fn get_change_lamport_from_deps(&self, deps: &[ID]) -> Option<Lamport> {
let mut lamport = 0;
for id in deps.iter() {
let x = self.get_lamport(id)?;
lamport = lamport.max(x + 1);
}
Some(lamport)
}
/// Convert a frontiers to a version vector
///
/// If the frontiers version is not found in the dag, return None
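
The `get_lamport` change above makes an out-of-range counter return `None` instead of a lamport extrapolated from the nearest node, and `get_change_lamport_from_deps` builds on it: a change's lamport is one more than the largest lamport among its dependencies, or unknown if any dependency is missing. The sketch below shows that rule with simplified, hypothetical types. `Dag` and `Node` only loosely mirror `AppDag` and `AppDagNode`, and a linear scan replaces the RLE lookup.

```rust
use std::collections::HashMap;

type PeerId = u64;
type Counter = i32;
type Lamport = u32;

#[derive(Debug, Clone, Copy)]
struct Id {
    peer: PeerId,
    counter: Counter,
}

/// A run of `len` consecutive ops from one peer, starting at counter `cnt`.
struct Node {
    cnt: Counter,
    len: usize,
    lamport: Lamport,
}

struct Dag {
    map: HashMap<PeerId, Vec<Node>>,
}

impl Dag {
    /// Lamport of the op at `id`, or `None` if no node covers that counter.
    fn get_lamport(&self, id: Id) -> Option<Lamport> {
        let nodes = self.map.get(&id.peer)?;
        nodes
            .iter()
            .find(|n| n.cnt <= id.counter && id.counter < n.cnt + n.len as Counter)
            .map(|n| n.lamport + (id.counter - n.cnt) as Lamport)
    }

    /// Lamport for a change with the given deps: max(dep lamport) + 1,
    /// 0 when there are no deps, `None` when any dep is unknown.
    fn change_lamport_from_deps(&self, deps: &[Id]) -> Option<Lamport> {
        let mut lamport = 0;
        for id in deps {
            lamport = lamport.max(self.get_lamport(*id)? + 1);
        }
        Some(lamport)
    }
}
```

Returning `None` for a missing dependency is what allows a change with an unknown lamport to stay pending until its dependencies arrive.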


@ -1,15 +1,8 @@
use std::{collections::BTreeMap, ops::Deref};
use crate::{
arena::OpConverter, change::Change, encoding::RemoteClientChanges, op::RemoteOp, OpLog,
VersionVector,
};
use crate::{change::Change, OpLog, VersionVector};
use fxhash::FxHashMap;
use itertools::Itertools;
use loro_common::{
Counter, CounterSpan, HasCounterSpan, HasIdSpan, HasLamportSpan, LoroError, PeerID, ID,
};
use rle::RleVec;
use loro_common::{Counter, CounterSpan, HasCounterSpan, HasIdSpan, HasLamportSpan, PeerID, ID};
use smallvec::SmallVec;
#[derive(Debug)]
@ -17,6 +10,8 @@ pub enum PendingChange {
// The lamport of the change decoded by `enhanced` is unknown.
// we need to calculate it when the change can be applied
Unknown(Change),
// TODO: Refactor, remove this?
#[allow(unused)]
Known(Change),
}
@ -35,54 +30,22 @@ pub(crate) struct PendingChanges {
changes: FxHashMap<PeerID, BTreeMap<Counter, SmallVec<[PendingChange; 1]>>>,
}
impl OpLog {
// calculate all `id_last`(s) whose change can be applied
pub(super) fn apply_appliable_changes_and_cache_pending(
&mut self,
remote_changes: RemoteClientChanges,
converter: &mut OpConverter,
mut latest_vv: VersionVector,
) -> Vec<ID> {
let mut ans = Vec::new();
for change in remote_changes
.into_values()
.filter(|c| !c.is_empty())
.flat_map(|c| c.into_iter())
.sorted_unstable_by_key(|c| c.lamport)
{
let local_change = to_local_op(change, converter);
let local_change = PendingChange::Known(local_change);
match remote_change_apply_state(&latest_vv, &local_change) {
ChangeApplyState::CanApplyDirectly => {
latest_vv.set_end(local_change.id_end());
ans.push(local_change.id_last());
self.apply_local_change_from_remote(local_change);
}
ChangeApplyState::Applied => {}
ChangeApplyState::AwaitingDependency(miss_dep) => self
.pending_changes
.changes
.entry(miss_dep.peer)
.or_default()
.entry(miss_dep.counter)
.or_default()
.push(local_change),
}
}
ans
impl PendingChanges {
pub fn is_empty(&self) -> bool {
self.changes.is_empty()
}
}
impl OpLog {
pub(super) fn extend_pending_changes_with_unknown_lamport(
&mut self,
remote_changes: Vec<Change<RemoteOp>>,
converter: &mut OpConverter,
remote_changes: Vec<Change>,
latest_vv: &VersionVector,
) {
for change in remote_changes {
let local_change = to_local_op(change, converter);
let local_change = PendingChange::Unknown(local_change);
let local_change = PendingChange::Unknown(change);
match remote_change_apply_state(latest_vv, &local_change) {
ChangeApplyState::AwaitingDependency(miss_dep) => self
ChangeState::AwaitingMissingDependency(miss_dep) => self
.pending_changes
.changes
.entry(miss_dep.peer)
@ -96,42 +59,20 @@ impl OpLog {
}
}
/// This struct indicates that the dag frontiers should be updated after the change is applied.
#[must_use]
pub(crate) struct ShouldUpdateDagFrontiers {
pub(crate) should_update: bool,
}
impl OpLog {
pub(super) fn check_changes(&self, changes: &RemoteClientChanges) -> Result<(), LoroError> {
for changes in changes.values() {
if changes.is_empty() {
continue;
}
// detect invalid id
let mut last_end_counter = None;
for change in changes.iter() {
if change.id.counter < 0 {
return Err(LoroError::DecodeError(
"Invalid data. Negative id counter.".into(),
));
}
if let Some(last_end_counter) = &mut last_end_counter {
if change.id.counter != *last_end_counter {
return Err(LoroError::DecodeError(
"Invalid data. Not continuous counter.".into(),
));
}
*last_end_counter = change.id_end().counter;
} else {
last_end_counter = Some(change.id_end().counter);
}
}
}
Ok(())
}
pub(crate) fn try_apply_pending(
&mut self,
mut id_stack: Vec<ID>,
latest_vv: &mut VersionVector,
) {
while let Some(id) = id_stack.pop() {
/// Try to apply pending changes.
///
/// `new_ids` are the IDs of the ops that were just applied.
pub(crate) fn try_apply_pending(&mut self, mut new_ids: Vec<ID>) -> ShouldUpdateDagFrontiers {
let mut latest_vv = self.dag.vv.clone();
let mut updated = false;
while let Some(id) = new_ids.pop() {
let Some(tree) = self.pending_changes.changes.get_mut(&id.peer) else {
continue;
};
@ -152,14 +93,15 @@ impl OpLog {
for pending_changes in pending_set {
for pending_change in pending_changes {
match remote_change_apply_state(latest_vv, &pending_change) {
ChangeApplyState::CanApplyDirectly => {
id_stack.push(pending_change.id_last());
match remote_change_apply_state(&latest_vv, &pending_change) {
ChangeState::CanApplyDirectly => {
new_ids.push(pending_change.id_last());
latest_vv.set_end(pending_change.id_end());
self.apply_local_change_from_remote(pending_change);
updated = true;
}
ChangeApplyState::Applied => {}
ChangeApplyState::AwaitingDependency(miss_dep) => self
ChangeState::Applied => {}
ChangeState::AwaitingMissingDependency(miss_dep) => self
.pending_changes
.changes
.entry(miss_dep.peer)
@ -171,6 +113,10 @@ impl OpLog {
}
}
}
ShouldUpdateDagFrontiers {
should_update: updated,
}
}
pub(super) fn apply_local_change_from_remote(&mut self, change: PendingChange) {
@ -192,59 +138,35 @@ impl OpLog {
// debug_dbg!(&change_causal_arr);
self.dag.vv.extend_to_include_last_id(change.id_last());
self.latest_timestamp = self.latest_timestamp.max(change.timestamp);
let mark = self.insert_dag_node_on_new_change(&change);
self.insert_new_change(change, mark);
let mark = self.update_dag_on_new_change(&change);
self.insert_new_change(change, mark, false);
}
}
pub(super) fn to_local_op(change: Change<RemoteOp>, converter: &mut OpConverter) -> Change {
let mut ops = RleVec::new();
for op in change.ops {
let lamport = change.lamport;
let content = op.content;
let op = converter.convert_single_op(
&op.container,
change.id.peer,
op.counter,
lamport,
content,
);
ops.push(op);
}
Change {
ops,
id: change.id,
deps: change.deps,
lamport: change.lamport,
timestamp: change.timestamp,
has_dependents: false,
}
}
enum ChangeApplyState {
enum ChangeState {
Applied,
CanApplyDirectly,
// The id of first missing dep
AwaitingDependency(ID),
AwaitingMissingDependency(ID),
}
fn remote_change_apply_state(vv: &VersionVector, change: &Change) -> ChangeApplyState {
fn remote_change_apply_state(vv: &VersionVector, change: &Change) -> ChangeState {
let peer = change.id.peer;
let CounterSpan { start, end } = change.ctr_span();
let vv_latest_ctr = vv.get(&peer).copied().unwrap_or(0);
if vv_latest_ctr < start {
return ChangeApplyState::AwaitingDependency(change.id.inc(-1));
return ChangeState::AwaitingMissingDependency(change.id.inc(-1));
}
if vv_latest_ctr >= end {
return ChangeApplyState::Applied;
return ChangeState::Applied;
}
for dep in change.deps.as_ref().iter() {
let dep_vv_latest_ctr = vv.get(&dep.peer).copied().unwrap_or(0);
if dep_vv_latest_ctr - 1 < dep.counter {
return ChangeApplyState::AwaitingDependency(*dep);
return ChangeState::AwaitingMissingDependency(*dep);
}
}
ChangeApplyState::CanApplyDirectly
ChangeState::CanApplyDirectly
}
#[cfg(test)]
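
The decision made by `remote_change_apply_state` above can be sketched in isolation. This is a minimal illustration, not the crate's code: `Id`, `Change`, and the `HashMap`-based `VersionVector` are simplified stand-ins, and `len` replaces the real `ctr_span()`. It assumes the version vector stores, per peer, the exclusive end counter of what has already been applied.

```rust
use std::collections::HashMap;

type PeerId = u64;
type Counter = i32;

/// Hypothetical stand-in for the version vector: peer -> exclusive end counter.
type VersionVector = HashMap<PeerId, Counter>;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Id {
    peer: PeerId,
    counter: Counter,
}

/// Simplified change: `id` is its first op, `len` is the number of ops it spans.
struct Change {
    id: Id,
    len: Counter,
    deps: Vec<Id>,
}

#[derive(Debug, PartialEq, Eq)]
enum ChangeState {
    Applied,
    CanApplyDirectly,
    /// The id of the first missing dependency.
    AwaitingMissingDependency(Id),
}

fn change_state(vv: &VersionVector, change: &Change) -> ChangeState {
    let start = change.id.counter;
    let end = start + change.len;
    let known = vv.get(&change.id.peer).copied().unwrap_or(0);
    // Ops from the same peer are missing before this change starts.
    if known < start {
        return ChangeState::AwaitingMissingDependency(Id {
            peer: change.id.peer,
            counter: start - 1,
        });
    }
    // Every op of this change is already covered by the version vector.
    if known >= end {
        return ChangeState::Applied;
    }
    // Each explicit dependency must already be included in the version vector.
    for dep in &change.deps {
        let dep_known = vv.get(&dep.peer).copied().unwrap_or(0);
        if dep_known <= dep.counter {
            return ChangeState::AwaitingMissingDependency(*dep);
        }
    }
    ChangeState::CanApplyDirectly
}
```

A change stored under a missing dependency can be re-checked with the same function once that dependency arrives, which is the role `try_apply_pending` plays in the diff.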


@ -6,11 +6,12 @@ use std::{
use enum_as_inner::EnumAsInner;
use enum_dispatch::enum_dispatch;
use fxhash::{FxHashMap, FxHashSet};
use loro_common::{ContainerID, LoroResult};
use loro_common::{ContainerID, LoroError, LoroResult};
use crate::{
configure::{DefaultRandom, SecureRandomGenerator},
container::{idx::ContainerIdx, ContainerIdRaw},
encoding::{StateSnapshotDecodeContext, StateSnapshotEncoder},
event::Index,
event::{Diff, InternalContainerDiff, InternalDiff},
fx_map,
@ -55,6 +56,10 @@ pub struct DocState {
#[enum_dispatch]
pub(crate) trait ContainerState: Clone {
fn container_idx(&self) -> ContainerIdx;
fn is_state_empty(&self) -> bool;
fn apply_diff_and_convert(
&mut self,
diff: InternalDiff,
@ -103,6 +108,16 @@ pub(crate) trait ContainerState: Clone {
fn get_child_containers(&self) -> Vec<ContainerID> {
Vec::new()
}
/// Encode the ops and the blob that can be used to restore the state to the current state.
///
/// State will use the provided encoder to encode the ops and export a blob.
/// The ops should be encoded into the snapshot as well as the blob.
/// The users then can use the ops and the blob to restore the state to the current state.
fn encode_snapshot(&self, encoder: StateSnapshotEncoder) -> Vec<u8>;
/// Restore the state to the state represented by the ops and the blob that were exported by `get_snapshot_ops`
fn import_from_snapshot_ops(&mut self, ctx: StateSnapshotDecodeContext);
}
#[allow(clippy::enum_variant_names)]
@ -399,6 +414,24 @@ impl DocState {
self.in_txn = true;
}
pub fn iter(&self) -> impl Iterator<Item = &State> {
self.states.values()
}
pub fn iter_mut(&mut self) -> impl Iterator<Item = &mut State> {
self.states.values_mut()
}
pub(crate) fn init_container(
&mut self,
cid: ContainerID,
decode_ctx: StateSnapshotDecodeContext,
) {
let idx = self.arena.register_container(&cid);
let state = self.states.entry(idx).or_insert_with(|| create_state(idx));
state.import_from_snapshot_ops(decode_ctx);
}
#[inline]
pub(crate) fn abort_txn(&mut self) {
for container_idx in std::mem::take(&mut self.changed_idx_in_txn) {
@ -798,6 +831,24 @@ impl DocState {
debug_log::group_end!();
Some(ans)
}
pub(crate) fn check_before_decode_snapshot(&self) -> LoroResult<()> {
if self.is_in_txn() {
return Err(LoroError::DecodeError(
"State is in txn".to_string().into_boxed_str(),
));
}
if !self.is_empty() {
return Err(LoroError::DecodeError(
"State is not empty, cannot import snapshot directly"
.to_string()
.into_boxed_str(),
));
}
Ok(())
}
}
struct SubContainerDiffPatch {
@ -896,7 +947,7 @@ pub fn create_state(idx: ContainerIdx) -> State {
ContainerType::Map => State::MapState(MapState::new(idx)),
ContainerType::List => State::ListState(ListState::new(idx)),
ContainerType::Text => State::RichtextState(RichtextState::new(idx)),
ContainerType::Tree => State::TreeState(TreeState::new()),
ContainerType::Tree => State::TreeState(TreeState::new(idx)),
}
}


@ -8,6 +8,7 @@ use crate::{
arena::SharedArena,
container::{idx::ContainerIdx, ContainerID},
delta::Delta,
encoding::{EncodeMode, StateSnapshotDecodeContext, StateSnapshotEncoder},
event::{Diff, Index, InternalDiff},
handler::ValueOrContainer,
op::{ListSlice, Op, RawOp, RawOpContent},
@ -21,7 +22,7 @@ use generic_btree::{
rle::{HasLength, Mergeable, Sliceable},
BTree, BTreeTrait, Cursor, LeafIndex, LengthFinder, UseLengthFinder,
};
use loro_common::LoroResult;
use loro_common::{IdSpan, LoroResult, ID};
#[derive(Debug)]
pub struct ListState {
@ -46,13 +47,21 @@ impl Clone for ListState {
#[derive(Debug)]
enum UndoItem {
Insert { index: usize, len: usize },
Delete { index: usize, value: LoroValue },
Insert {
index: usize,
len: usize,
},
Delete {
index: usize,
value: LoroValue,
id: ID,
},
}
#[derive(Debug, Clone)]
struct Elem {
v: LoroValue,
pub(crate) struct Elem {
pub v: LoroValue,
pub id: ID,
}
impl HasLength for Elem {
@ -171,9 +180,16 @@ impl ListState {
Some(index as usize)
}
pub fn insert(&mut self, index: usize, value: LoroValue) {
pub fn insert(&mut self, index: usize, value: LoroValue, id: ID) {
if index > self.len() {
panic!("Index {index} out of range. The length is {}", self.len());
}
if self.list.is_empty() {
let idx = self.list.push(Elem { v: value.clone() });
let idx = self.list.push(Elem {
v: value.clone(),
id,
});
if value.is_container() {
self.child_container_to_leaf
@ -182,9 +198,13 @@ impl ListState {
return;
}
let (leaf, data) = self
.list
.insert::<LengthFinder>(&index, Elem { v: value.clone() });
let (leaf, data) = self.list.insert::<LengthFinder>(
&index,
Elem {
v: value.clone(),
id,
},
);
if value.is_container() {
self.child_container_to_leaf
@ -210,7 +230,11 @@ impl ListState {
let elem = self.list.remove_leaf(leaf.unwrap().cursor).unwrap();
let value = elem.v;
if self.in_txn {
self.undo_stack.push(UndoItem::Delete { index, value });
self.undo_stack.push(UndoItem::Delete {
index,
value,
id: elem.id,
});
}
}
@ -239,6 +263,7 @@ impl ListState {
self.undo_stack.push(UndoItem::Delete {
index: start,
value: elem.v,
id: elem.id,
})
}
} else {
@ -252,9 +277,11 @@ impl ListState {
// PERF: use &[LoroValue]
// PERF: batch
pub fn insert_batch(&mut self, index: usize, values: Vec<LoroValue>) {
pub fn insert_batch(&mut self, index: usize, values: Vec<LoroValue>, start_id: ID) {
let mut id = start_id;
for (i, value) in values.into_iter().enumerate() {
self.insert(index + i, value);
self.insert(index + i, value, id);
id = id.inc(1);
}
}
@ -262,6 +289,11 @@ impl ListState {
self.list.iter().map(|x| &x.v)
}
#[allow(unused)]
pub(crate) fn iter_with_id(&self) -> impl Iterator<Item = &Elem> {
self.list.iter()
}
pub fn len(&self) -> usize {
*self.list.root_cache() as usize
}
@ -294,6 +326,14 @@ impl ListState {
}
impl ContainerState for ListState {
fn container_idx(&self) -> ContainerIdx {
self.idx
}
fn is_state_empty(&self) -> bool {
self.list.is_empty()
}
fn apply_diff_and_convert(
&mut self,
diff: InternalDiff,
@ -301,7 +341,7 @@ impl ContainerState for ListState {
txn: &Weak<Mutex<Option<Transaction>>>,
state: &Weak<Mutex<DocState>>,
) -> Diff {
let InternalDiff::SeqRaw(delta) = diff else {
let InternalDiff::ListRaw(delta) = diff else {
unreachable!()
};
let mut ans: Delta<_> = Delta::default();
@ -314,7 +354,7 @@ impl ContainerState for ListState {
}
crate::delta::DeltaItem::Insert { insert: value, .. } => {
let mut arr = Vec::new();
for slices in value.0.iter() {
for slices in value.ranges.iter() {
for i in slices.0.start..slices.0.end {
let value = arena.get_value(i as usize).unwrap();
if value.is_container() {
@ -331,7 +371,7 @@ impl ContainerState for ListState {
.collect::<Vec<_>>(),
);
let len = arr.len();
self.insert_batch(index, arr);
self.insert_batch(index, arr, value.id);
index += len;
}
crate::delta::DeltaItem::Delete { delete: len, .. } => {
@ -351,8 +391,9 @@ impl ContainerState for ListState {
_txn: &Weak<Mutex<Option<Transaction>>>,
_state: &Weak<Mutex<DocState>>,
) {
// debug_log::debug_dbg!(&diff);
match diff {
InternalDiff::SeqRaw(delta) => {
InternalDiff::ListRaw(delta) => {
let mut index = 0;
for span in delta.iter() {
match span {
@ -361,7 +402,7 @@ impl ContainerState for ListState {
}
crate::delta::DeltaItem::Insert { insert: value, .. } => {
let mut arr = Vec::new();
for slices in value.0.iter() {
for slices in value.ranges.iter() {
for i in slices.0.start..slices.0.end {
let value = arena.get_value(i as usize).unwrap();
if value.is_container() {
@ -374,7 +415,7 @@ impl ContainerState for ListState {
}
let len = arr.len();
self.insert_batch(index, arr);
self.insert_batch(index, arr, value.id);
index += len;
}
crate::delta::DeltaItem::Delete { delete: len, .. } => {
@ -402,7 +443,7 @@ impl ContainerState for ListState {
arena.set_parent(idx, Some(self.idx));
}
}
self.insert_batch(*pos, list.to_vec());
self.insert_batch(*pos, list.to_vec(), op.id);
}
std::borrow::Cow::Owned(list) => {
for value in list.iter() {
@ -412,7 +453,7 @@ impl ContainerState for ListState {
arena.set_parent(idx, Some(self.idx));
}
}
self.insert_batch(*pos, list.clone());
self.insert_batch(*pos, list.clone(), op.id);
}
},
_ => unreachable!(),
@ -424,39 +465,9 @@ impl ContainerState for ListState {
crate::container::list::list_op::ListOp::StyleEnd { .. } => unreachable!(),
},
}
debug_log::debug_dbg!(&self);
Ok(())
}
#[doc = " Start a transaction"]
#[doc = ""]
#[doc = " The transaction may be aborted later, then all the ops during this transaction need to be undone."]
fn start_txn(&mut self) {
self.in_txn = true;
}
fn abort_txn(&mut self) {
self.in_txn = false;
while let Some(op) = self.undo_stack.pop() {
match op {
UndoItem::Insert { index, len } => {
self.delete_range(index..index + len);
}
UndoItem::Delete { index, value } => self.insert(index, value),
}
}
}
fn commit_txn(&mut self) {
self.undo_stack.clear();
self.in_txn = false;
}
fn get_value(&mut self) -> LoroValue {
let ans = self.to_vec();
LoroValue::List(Arc::new(ans))
}
#[doc = " Convert a state to a diff such that applying the diff to an empty state"]
#[doc = " yields a state identical to this one."]
fn to_diff(
@ -475,6 +486,35 @@ impl ContainerState for ListState {
)
}
#[doc = " Start a transaction"]
#[doc = ""]
#[doc = " The transaction may be aborted later, then all the ops during this transaction need to be undone."]
fn start_txn(&mut self) {
self.in_txn = true;
}
fn abort_txn(&mut self) {
self.in_txn = false;
while let Some(op) = self.undo_stack.pop() {
match op {
UndoItem::Insert { index, len } => {
self.delete_range(index..index + len);
}
UndoItem::Delete { index, value, id } => self.insert(index, value, id),
}
}
}
fn commit_txn(&mut self) {
self.undo_stack.clear();
self.in_txn = false;
}
fn get_value(&mut self) -> LoroValue {
let ans = self.to_vec();
LoroValue::List(Arc::new(ans))
}
fn get_child_index(&self, id: &ContainerID) -> Option<Index> {
self.get_child_container_index(id).map(Index::Seq)
}
@ -488,6 +528,32 @@ impl ContainerState for ListState {
}
ans
}
#[doc = "Get a list of ops that can be used to restore the state to the current state"]
fn encode_snapshot(&self, mut encoder: StateSnapshotEncoder) -> Vec<u8> {
for elem in self.list.iter() {
let id_span: IdSpan = elem.id.into();
encoder.encode_op(id_span, || unimplemented!());
}
Vec::new()
}
#[doc = "Restore the state to the state represented by the ops that were exported by `get_snapshot_ops`"]
fn import_from_snapshot_ops(&mut self, ctx: StateSnapshotDecodeContext) {
assert_eq!(ctx.mode, EncodeMode::Snapshot);
let mut index = 0;
for op in ctx.ops {
let value = op.op.content.as_list().unwrap().as_insert().unwrap().0;
let list = ctx
.oplog
.arena
.get_values(value.0.start as usize..value.0.end as usize);
let len = list.len();
self.insert_batch(index, list, op.id());
index += len;
}
}
}
#[cfg(test)]
@ -503,11 +569,11 @@ mod test {
fn id(name: &str) -> ContainerID {
ContainerID::new_root(name, crate::ContainerType::List)
}
list.insert(0, LoroValue::Container(id("abc")));
list.insert(0, LoroValue::Container(id("x")));
list.insert(0, LoroValue::Container(id("abc")), ID::new(0, 0));
list.insert(0, LoroValue::Container(id("x")), ID::new(0, 0));
assert_eq!(list.get_child_container_index(&id("x")), Some(0));
assert_eq!(list.get_child_container_index(&id("abc")), Some(1));
list.insert(1, LoroValue::Bool(false));
list.insert(1, LoroValue::Bool(false), ID::new(0, 0));
assert_eq!(list.get_child_container_index(&id("x")), Some(0));
assert_eq!(list.get_child_container_index(&id("abc")), Some(2));
}
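
The ID bookkeeping that `insert_batch` gains in this file can be sketched on its own. This is a simplified illustration under stated assumptions: a plain `Vec` replaces the B-tree, `String` replaces `LoroValue`, and `ListSketch`, `Id`, and `id_at` are hypothetical names. It shows each element of a batch receiving consecutive IDs starting from `start_id`, so a later query can tell which op produced a given element.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Id {
    peer: u64,
    counter: i32,
}

impl Id {
    /// Returns the id shifted by `n` counters on the same peer.
    fn inc(self, n: i32) -> Id {
        Id { peer: self.peer, counter: self.counter + n }
    }
}

#[derive(Debug, Clone, PartialEq)]
struct Elem {
    v: String,
    id: Id,
}

/// Hypothetical list state backed by a Vec instead of a B-tree.
struct ListSketch {
    items: Vec<Elem>,
}

impl ListSketch {
    fn insert(&mut self, index: usize, v: String, id: Id) {
        assert!(index <= self.items.len(), "index out of range");
        self.items.insert(index, Elem { v, id });
    }

    /// Inserts `values` at `index`; the i-th value gets the id `start_id + i`.
    fn insert_batch(&mut self, index: usize, values: Vec<String>, start_id: Id) {
        let mut id = start_id;
        for (i, v) in values.into_iter().enumerate() {
            self.insert(index + i, v, id);
            id = id.inc(1);
        }
    }

    /// Looks up the id of the op that inserted the element at `index`.
    fn id_at(&self, index: usize) -> Option<Id> {
        self.items.get(index).map(|e| e.id)
    }
}
```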


@ -5,15 +5,18 @@ use std::{
use fxhash::FxHashMap;
use loro_common::{ContainerID, LoroResult};
use rle::HasLength;
use crate::{
arena::SharedArena,
container::{idx::ContainerIdx, map::MapSet},
delta::{MapValue, ResolvedMapDelta, ResolvedMapValue},
encoding::{EncodeMode, StateSnapshotDecodeContext, StateSnapshotEncoder},
event::{Diff, Index, InternalDiff},
handler::ValueOrContainer,
op::{Op, RawOp, RawOpContent},
txn::Transaction,
utils::delta_rle_encoded_num::DeltaRleEncodedNums,
DocState, InternalString, LoroValue,
};
@ -28,6 +31,14 @@ pub struct MapState {
}
impl ContainerState for MapState {
fn container_idx(&self) -> ContainerIdx {
self.idx
}
fn is_state_empty(&self) -> bool {
self.map.is_empty()
}
fn apply_diff_and_convert(
&mut self,
diff: InternalDiff,
@ -97,6 +108,24 @@ impl ContainerState for MapState {
}
}
#[doc = " Convert a state to a diff such that applying the diff to an empty state"]
#[doc = " yields a state identical to this one."]
fn to_diff(
&mut self,
arena: &SharedArena,
txn: &Weak<Mutex<Option<Transaction>>>,
state: &Weak<Mutex<DocState>>,
) -> Diff {
Diff::Map(ResolvedMapDelta {
updated: self
.map
.clone()
.into_iter()
.map(|(k, v)| (k, ResolvedMapValue::from_map_value(v, arena, txn, state)))
.collect::<FxHashMap<_, _>>(),
})
}
fn start_txn(&mut self) {
self.in_txn = true;
}
@ -123,24 +152,6 @@ impl ContainerState for MapState {
LoroValue::Map(Arc::new(ans))
}
#[doc = " Convert a state to a diff that when apply this diff on a empty state,"]
#[doc = " the state will be the same as this state."]
fn to_diff(
&mut self,
arena: &SharedArena,
txn: &Weak<Mutex<Option<Transaction>>>,
state: &Weak<Mutex<DocState>>,
) -> Diff {
Diff::Map(ResolvedMapDelta {
updated: self
.map
.clone()
.into_iter()
.map(|(k, v)| (k, ResolvedMapValue::from_map_value(v, arena, txn, state)))
.collect::<FxHashMap<_, _>>(),
})
}
fn get_child_index(&self, id: &ContainerID) -> Option<Index> {
for (key, value) in self.map.iter() {
if let Some(LoroValue::Container(x)) = &value.value {
@ -162,6 +173,41 @@ impl ContainerState for MapState {
}
ans
}
#[doc = " Get a list of ops that can be used to restore the state to the current state"]
fn encode_snapshot(&self, mut encoder: StateSnapshotEncoder) -> Vec<u8> {
let mut lamports = DeltaRleEncodedNums::new();
for v in self.map.values() {
lamports.push(v.lamport.0);
encoder.encode_op(v.id().into(), || unimplemented!());
}
lamports.encode()
}
#[doc = " Restore the state to the state represented by the ops that were exported by `get_snapshot_ops`"]
fn import_from_snapshot_ops(&mut self, ctx: StateSnapshotDecodeContext) {
assert_eq!(ctx.mode, EncodeMode::Snapshot);
let lamports = DeltaRleEncodedNums::decode(ctx.blob);
let mut iter = lamports.iter();
for op in ctx.ops {
debug_assert_eq!(
op.op.atom_len(),
1,
"MapState::from_snapshot_ops: op.atom_len() != 1"
);
let content = op.op.content.as_map().unwrap();
self.map.insert(
content.key.clone(),
MapValue {
counter: op.op.counter,
value: content.value.clone(),
lamport: (iter.next().unwrap(), op.peer),
},
);
}
}
}
impl MapState {
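
`MapState::encode_snapshot` stores the lamport of every entry in a `DeltaRleEncodedNums` blob. The internals of that type are not shown in this diff, so the following is only a guess at the general technique its name suggests: take the difference between consecutive values, then run-length encode the differences. The function names and the `(delta, count)` run representation are assumptions made for illustration.

```rust
/// Encodes `nums` as runs of (delta, repeat count); the first delta is taken from 0.
fn delta_rle_encode(nums: &[u32]) -> Vec<(i64, u32)> {
    let mut runs: Vec<(i64, u32)> = Vec::new();
    let mut prev: i64 = 0;
    for &n in nums {
        let delta = n as i64 - prev;
        prev = n as i64;
        // Extend the current run when the delta repeats.
        if let Some(last) = runs.last_mut() {
            if last.0 == delta {
                last.1 += 1;
                continue;
            }
        }
        runs.push((delta, 1));
    }
    runs
}

/// Reverses `delta_rle_encode` by replaying each delta `count` times.
fn delta_rle_decode(runs: &[(i64, u32)]) -> Vec<u32> {
    let mut out = Vec::new();
    let mut cur: i64 = 0;
    for &(delta, count) in runs {
        for _ in 0..count {
            cur += delta;
            out.push(cur as u32);
        }
    }
    out
}
```

Lamport values of neighbouring entries tend to be close to each other or to grow by one, which is the case where this kind of scheme compresses well.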


@ -5,8 +5,7 @@ use std::{
use fxhash::FxHashMap;
use generic_btree::rle::{HasLength, Mergeable};
use loro_common::{Counter, LoroResult, LoroValue, PeerID, ID};
use loro_preload::{CommonArena, EncodedRichtextState, TempArena, TextRanges};
use loro_common::{LoroResult, LoroValue, ID};
use crate::{
arena::SharedArena,
@ -14,16 +13,19 @@ use crate::{
idx::ContainerIdx,
richtext::{
richtext_state::{EntityRangeInfo, PosType},
AnchorType, RichtextState as InnerState, StyleOp, Styles, TextStyleInfoFlag,
AnchorType, RichtextState as InnerState, StyleOp, Styles,
},
},
container::{list::list_op, richtext::richtext_state::RichtextStateChunk},
delta::{Delta, DeltaItem, StyleMeta},
encoding::{EncodeMode, StateSnapshotDecodeContext, StateSnapshotEncoder},
event::{Diff, InternalDiff},
op::{Op, RawOp},
txn::Transaction,
utils::{bitmap::BitMap, lazy::LazyLoad, string_slice::StringSlice},
DocState, InternalString,
utils::{
delta_rle_encoded_num::DeltaRleEncodedNums, lazy::LazyLoad, string_slice::StringSlice,
},
DocState,
};
use super::ContainerState;
@ -55,11 +57,12 @@ impl RichtextState {
self.state.get_mut().to_string()
}
#[allow(unused)]
#[inline(always)]
pub(crate) fn is_empty(&self) -> bool {
match &*self.state {
LazyLoad::Src(s) => s.elements.is_empty(),
LazyLoad::Dst(d) => d.is_emtpy(),
LazyLoad::Dst(d) => d.is_empty(),
}
}
@ -138,6 +141,17 @@ impl Mergeable for UndoItem {
}
impl ContainerState for RichtextState {
fn container_idx(&self) -> ContainerIdx {
self.idx
}
fn is_state_empty(&self) -> bool {
match &*self.state {
LazyLoad::Src(s) => s.is_empty(),
LazyLoad::Dst(s) => s.is_empty(),
}
}
// TODO: refactor
fn apply_diff_and_convert(
&mut self,
@ -349,9 +363,11 @@ impl ContainerState for RichtextState {
unicode_start: _,
pos,
} => {
self.state
.get_mut()
.insert_at_entity_index(*pos as usize, slice.clone());
self.state.get_mut().insert_at_entity_index(
*pos as usize,
slice.clone(),
r_op.id,
);
if self.in_txn {
self.push_undo(UndoItem::Insert {
@ -447,6 +463,84 @@ impl ContainerState for RichtextState {
fn get_value(&mut self) -> LoroValue {
LoroValue::String(Arc::new(self.state.get_mut().to_string()))
}
#[doc = " Get a list of ops that can be used to restore the state to the current state"]
fn encode_snapshot(&self, mut encoder: StateSnapshotEncoder) -> Vec<u8> {
let iter: &mut dyn Iterator<Item = &RichtextStateChunk>;
let mut a;
let mut b;
match &*self.state {
LazyLoad::Src(s) => {
a = Some(s.elements.iter());
iter = &mut *a.as_mut().unwrap();
}
LazyLoad::Dst(s) => {
b = Some(s.iter_chunk());
iter = &mut *b.as_mut().unwrap();
}
}
debug_log::group!("encode_snapshot");
let mut lamports = DeltaRleEncodedNums::new();
for chunk in iter {
debug_log::debug_dbg!(&chunk);
match chunk {
RichtextStateChunk::Style { style, anchor_type }
if *anchor_type == AnchorType::Start =>
{
lamports.push(style.lamport);
}
_ => {}
}
let id_span = chunk.get_id_span();
encoder.encode_op(id_span, || unimplemented!());
}
debug_log::group_end!();
lamports.encode()
}
#[doc = " Restore the state to the state represented by the ops that were exported by `get_snapshot_ops`"]
fn import_from_snapshot_ops(&mut self, ctx: StateSnapshotDecodeContext) {
assert_eq!(ctx.mode, EncodeMode::Snapshot);
let lamports = DeltaRleEncodedNums::decode(ctx.blob);
let mut lamport_iter = lamports.iter();
let mut loader = RichtextStateLoader::default();
let mut id_to_style = FxHashMap::default();
for op in ctx.ops {
let id = op.id();
let chunk = match op.op.content.into_list().unwrap() {
list_op::InnerListOp::InsertText { slice, .. } => {
RichtextStateChunk::new_text(slice.clone(), id)
}
list_op::InnerListOp::StyleStart {
key, value, info, ..
} => {
let style_op = Arc::new(StyleOp {
lamport: lamport_iter.next().unwrap(),
peer: op.peer,
cnt: op.op.counter,
key,
value,
info,
});
id_to_style.insert(id, style_op.clone());
RichtextStateChunk::new_style(style_op, AnchorType::Start)
}
list_op::InnerListOp::StyleEnd => {
let style = id_to_style.remove(&id.inc(-1)).unwrap();
RichtextStateChunk::new_style(style, AnchorType::End)
}
a => unreachable!("richtext state should not have {a:?}"),
};
debug_log::debug_dbg!(&chunk);
loader.push(chunk);
}
*self.state = LazyLoad::Src(loader);
}
}
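
The way `import_from_snapshot_ops` pairs style anchors can be sketched separately. In the diff, a `StyleEnd` op finds its `StyleStart` through `id.inc(-1)`, which implies that the end anchor's ID directly follows the start anchor's ID. The sketch below relies on that and nothing else; `Op`, `Chunk`, `Style`, and `rebuild` are hypothetical simplifications of the real types, and it returns `None` where the original code would panic on `unwrap()`.

```rust
use std::collections::HashMap;
use std::sync::Arc;

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct Id {
    peer: u64,
    counter: i32,
}

#[derive(Debug, PartialEq)]
struct Style {
    key: String,
}

enum Op {
    Text(String),
    StyleStart { key: String },
    StyleEnd,
}

#[derive(Debug, PartialEq)]
enum Chunk {
    Text(String),
    Start(Arc<Style>),
    End(Arc<Style>),
}

/// Rebuilds chunks from ops given in state order.
/// Returns None if a StyleEnd has no StyleStart with the preceding id.
fn rebuild(ops: Vec<(Id, Op)>) -> Option<Vec<Chunk>> {
    let mut open: HashMap<Id, Arc<Style>> = HashMap::new();
    let mut out = Vec::new();
    for (id, op) in ops {
        let chunk = match op {
            Op::Text(s) => Chunk::Text(s),
            Op::StyleStart { key } => {
                let style = Arc::new(Style { key });
                // Remember the style so the matching end anchor can share it.
                open.insert(id, Arc::clone(&style));
                Chunk::Start(style)
            }
            Op::StyleEnd => {
                // The end anchor's id is the start anchor's id plus one.
                let start_id = Id { peer: id.peer, counter: id.counter - 1 };
                Chunk::End(open.remove(&start_id)?)
            }
        };
        out.push(chunk);
    }
    Some(out)
}
```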
impl RichtextState {
@ -553,142 +647,6 @@ impl RichtextState {
pub fn get_richtext_value(&mut self) -> LoroValue {
self.state.get_mut().get_richtext_value()
}
#[inline]
fn get_loader() -> RichtextStateLoader {
RichtextStateLoader {
elements: Default::default(),
start_anchor_pos: Default::default(),
entity_index: 0,
style_ranges: Default::default(),
}
}
#[inline]
pub(crate) fn iter_chunk(&self) -> Box<dyn Iterator<Item = &RichtextStateChunk> + '_> {
match &*self.state {
LazyLoad::Src(s) => Box::new(s.elements.iter()),
LazyLoad::Dst(s) => Box::new(s.iter_chunk()),
}
}
pub(crate) fn decode_snapshot(
&mut self,
EncodedRichtextState {
len,
text_bytes,
styles,
is_style_start,
}: EncodedRichtextState,
state_arena: &TempArena,
common: &CommonArena,
arena: &SharedArena,
) {
assert!(self.is_empty());
if text_bytes.is_empty() {
return;
}
let bit_len = is_style_start.len() * 8;
let is_style_start = BitMap::from_vec(is_style_start, bit_len);
let mut is_style_start_iter = is_style_start.iter();
let mut loader = Self::get_loader();
let mut is_text = true;
let mut text_range_iter = TextRanges::decode_iter(&text_bytes).unwrap();
let mut style_iter = styles.iter();
for &len in len.iter() {
if is_text {
for _ in 0..len {
let range = text_range_iter.next().unwrap();
let text = arena.slice_by_utf8(range.start..range.start + range.len);
loader.push(RichtextStateChunk::new_text(text));
}
} else {
for _ in 0..len {
let is_start = is_style_start_iter.next().unwrap();
let style_compact = style_iter.next().unwrap();
loader.push(RichtextStateChunk::new_style(
Arc::new(StyleOp {
lamport: style_compact.lamport,
peer: common.peer_ids[style_compact.peer_idx as usize],
cnt: style_compact.counter as Counter,
key: state_arena.keywords[style_compact.key_idx as usize].clone(),
value: style_compact.value.clone(),
info: TextStyleInfoFlag::from_byte(style_compact.style_info),
}),
if is_start {
AnchorType::Start
} else {
AnchorType::End
},
))
}
}
is_text = !is_text;
}
self.state = Box::new(LazyLoad::new(loader));
}
pub(crate) fn encode_snapshot(
&self,
record_peer: &mut impl FnMut(PeerID) -> u32,
record_key: &mut impl FnMut(&InternalString) -> usize,
) -> EncodedRichtextState {
// lengths are interleaved [text_elem_len, style_elem_len, ..]
let mut lengths = Vec::new();
let mut text_ranges: TextRanges = Default::default();
let mut styles = Vec::new();
let mut is_style_start = BitMap::new();
for chunk in self.iter_chunk() {
match chunk {
RichtextStateChunk::Text(s) => {
if lengths.len() % 2 == 0 {
lengths.push(0);
}
*lengths.last_mut().unwrap() += 1;
text_ranges.ranges.push(loro_preload::TextRange {
start: s.bytes().start(),
len: s.bytes().len(),
});
}
RichtextStateChunk::Style { style, anchor_type } => {
if lengths.is_empty() {
lengths.reserve(2);
lengths.push(0);
lengths.push(0);
}
if lengths.len() % 2 == 1 {
lengths.push(0);
}
*lengths.last_mut().unwrap() += 1;
is_style_start.push(*anchor_type == AnchorType::Start);
styles.push(loro_preload::CompactStyleOp {
peer_idx: record_peer(style.peer),
key_idx: record_key(&style.key) as u32,
counter: style.cnt as u32,
lamport: style.lamport,
style_info: style.info.to_byte(),
value: style.value.clone(),
})
}
}
}
let text_bytes = text_ranges.encode();
// eprintln!("bytes len={}", text_bytes.len());
EncodedRichtextState {
len: lengths,
text_bytes: std::borrow::Cow::Owned(text_bytes),
styles,
is_style_start: is_style_start.into_vec(),
}
}
}
#[derive(Debug, Default, Clone)]
@ -741,12 +699,17 @@ impl RichtextStateLoader {
state
}
fn is_empty(&self) -> bool {
self.elements.is_empty()
}
}
#[cfg(test)]
mod tests {
use append_only_bytes::AppendOnlyBytes;
use generic_btree::rle::Mergeable;
use loro_common::ID;
use crate::container::richtext::richtext_state::{RichtextStateChunk, TextChunk};
@ -761,15 +724,15 @@ mod tests {
let mut last = UndoItem::Delete {
index: 20,
content: RichtextStateChunk::Text(TextChunk::from_bytes(last_bytes)),
content: RichtextStateChunk::Text(TextChunk::new(last_bytes, ID::new(0, 2))),
};
let mut new = UndoItem::Delete {
index: 18,
content: RichtextStateChunk::Text(TextChunk::from_bytes(new_bytes)),
content: RichtextStateChunk::Text(TextChunk::new(new_bytes, ID::new(0, 0))),
};
let merged = UndoItem::Delete {
index: 18,
content: RichtextStateChunk::Text(TextChunk::from_bytes(bytes.to_slice())),
content: RichtextStateChunk::Text(TextChunk::new(bytes.to_slice(), ID::new(0, 0))),
};
assert!(last.can_merge(&new));
std::mem::swap(&mut last, &mut new);


@ -1,12 +1,15 @@
use fxhash::{FxHashMap, FxHashSet};
use itertools::Itertools;
use loro_common::{ContainerID, LoroError, LoroResult, LoroTreeError, LoroValue, TreeID};
use loro_common::{ContainerID, LoroError, LoroResult, LoroTreeError, LoroValue, TreeID, ID};
use rle::HasLength;
use serde::{Deserialize, Serialize};
use std::collections::{hash_map::Iter, VecDeque};
use std::collections::VecDeque;
use std::sync::{Arc, Mutex, Weak};
use crate::container::idx::ContainerIdx;
use crate::delta::{TreeDiff, TreeDiffItem, TreeExternalDiff};
use crate::diff_calc::TreeDeletedSetTrait;
use crate::encoding::{EncodeMode, StateSnapshotDecodeContext, StateSnapshotEncoder};
use crate::event::InternalDiff;
use crate::txn::Transaction;
use crate::DocState;
@ -25,26 +28,66 @@ use super::ContainerState;
/// using flat representation
#[derive(Debug, Clone)]
pub struct TreeState {
pub(crate) trees: FxHashMap<TreeID, Option<TreeID>>,
idx: ContainerIdx,
pub(crate) trees: FxHashMap<TreeID, TreeStateNode>,
pub(crate) deleted: FxHashSet<TreeID>,
in_txn: bool,
undo_items: Vec<TreeUndoItem>,
}
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub(crate) struct TreeStateNode {
pub parent: Option<TreeID>,
pub last_move_op: ID,
}
impl TreeStateNode {
pub const UNEXIST_ROOT: TreeStateNode = TreeStateNode {
parent: TreeID::unexist_root(),
last_move_op: ID::NONE_ID,
};
}
impl Ord for TreeStateNode {
fn cmp(&self, other: &Self) -> std::cmp::Ordering {
self.parent.cmp(&other.parent)
}
}
impl PartialOrd for TreeStateNode {
fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
Some(self.cmp(other))
}
}
#[derive(Debug, Clone, Copy)]
struct TreeUndoItem {
target: TreeID,
old_parent: Option<TreeID>,
old_last_move_op: ID,
}
impl TreeState {
pub fn new() -> Self {
pub fn new(idx: ContainerIdx) -> Self {
let mut trees = FxHashMap::default();
trees.insert(TreeID::delete_root().unwrap(), None);
trees.insert(TreeID::unexist_root().unwrap(), None);
trees.insert(
TreeID::delete_root().unwrap(),
TreeStateNode {
parent: None,
last_move_op: ID::NONE_ID,
},
);
trees.insert(
TreeID::unexist_root().unwrap(),
TreeStateNode {
parent: None,
last_move_op: ID::NONE_ID,
},
);
let mut deleted = FxHashSet::default();
deleted.insert(TreeID::delete_root().unwrap());
Self {
idx,
trees,
deleted,
in_txn: false,
@ -52,18 +95,25 @@ impl TreeState {
}
}
pub fn mov(&mut self, target: TreeID, parent: Option<TreeID>) -> Result<(), LoroError> {
pub fn mov(&mut self, target: TreeID, parent: Option<TreeID>, id: ID) -> Result<(), LoroError> {
let Some(parent) = parent else {
// new root node
let old_parent = self
.trees
.insert(target, None)
.unwrap_or(TreeID::unexist_root());
self.update_deleted_cache(target, None, old_parent);
.insert(
target,
TreeStateNode {
parent: None,
last_move_op: id,
},
)
.unwrap_or(TreeStateNode::UNEXIST_ROOT);
self.update_deleted_cache(target, None, old_parent.parent);
if self.in_txn {
self.undo_items.push(TreeUndoItem {
target,
old_parent: TreeID::unexist_root(),
old_last_move_op: old_parent.last_move_op,
})
}
return Ok(());
@ -77,7 +127,7 @@ impl TreeState {
if self
.trees
.get(&target)
.copied()
.map(|x| x.parent)
.unwrap_or(TreeID::unexist_root())
== Some(parent)
{
@ -86,12 +136,22 @@ impl TreeState {
// move or delete or create children node
let old_parent = self
.trees
.insert(target, Some(parent))
.unwrap_or(TreeID::unexist_root());
self.update_deleted_cache(target, Some(parent), old_parent);
.insert(
target,
TreeStateNode {
parent: Some(parent),
last_move_op: id,
},
)
.unwrap_or(TreeStateNode::UNEXIST_ROOT);
self.update_deleted_cache(target, Some(parent), old_parent.parent);
if self.in_txn {
self.undo_items.push(TreeUndoItem { target, old_parent })
self.undo_items.push(TreeUndoItem {
target,
old_parent: old_parent.parent,
old_last_move_op: old_parent.last_move_op,
})
}
Ok(())
@ -108,7 +168,7 @@ impl TreeState {
let mut node_id = node_id;
loop {
let parent = self.trees.get(node_id).unwrap();
let parent = &self.trees.get(node_id).unwrap().parent;
match parent {
Some(parent_id) if parent_id == maybe_ancestor => return true,
Some(parent_id) if parent_id == node_id => panic!("loop detected"),
@ -120,10 +180,6 @@ impl TreeState {
}
}
pub fn iter(&self) -> Iter<'_, TreeID, Option<TreeID>> {
self.trees.iter()
}
pub fn contains(&self, target: TreeID) -> bool {
if TreeID::is_deleted_root(Some(target)) {
return true;
@ -135,7 +191,7 @@ impl TreeState {
if self.is_deleted(&target) {
None
} else {
self.trees.get(&target).copied()
self.trees.get(&target).map(|x| x.parent)
}
}
@ -160,9 +216,32 @@ impl TreeState {
.max()
.unwrap_or(0)
}
fn get_is_deleted_by_query(&self, target: TreeID) -> bool {
match self.trees.get(&target) {
Some(x) => {
if x.parent.is_none() {
false
} else if x.parent == TreeID::delete_root() {
true
} else {
self.get_is_deleted_by_query(x.parent.unwrap())
}
}
None => false,
}
}
}
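
The ancestor walk in `get_is_deleted_by_query` can be sketched without the surrounding state. This is an iterative variant of the recursive function above, using hypothetical simplified types: `NodeId` is a plain integer, `DELETED_ROOT` stands in for `TreeID::delete_root()`, and the map holds only the parent link. It assumes the parent links contain no cycle.

```rust
use std::collections::HashMap;

type NodeId = u32;

/// Hypothetical sentinel standing in for `TreeID::delete_root()`.
const DELETED_ROOT: NodeId = u32::MAX;

struct Node {
    parent: Option<NodeId>,
}

/// Walks up the parent chain; a node is deleted if the chain reaches DELETED_ROOT.
/// Unknown nodes and real roots (parent == None) are treated as not deleted.
fn is_deleted(nodes: &HashMap<NodeId, Node>, target: NodeId) -> bool {
    let mut cur = target;
    loop {
        let Some(node) = nodes.get(&cur) else {
            return false;
        };
        match node.parent {
            None => return false,
            Some(p) if p == DELETED_ROOT => return true,
            Some(p) => cur = p,
        }
    }
}
```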
impl ContainerState for TreeState {
fn container_idx(&self) -> crate::container::idx::ContainerIdx {
self.idx
}
fn is_state_empty(&self) -> bool {
self.trees.is_empty()
}
fn apply_diff_and_convert(
&mut self,
diff: crate::event::InternalDiff,
@ -188,12 +267,18 @@ impl ContainerState for TreeState {
continue;
}
};
let old_parent = self
let old = self
.trees
.insert(target, parent)
.unwrap_or(TreeID::unexist_root());
if parent != old_parent {
self.update_deleted_cache(target, parent, old_parent);
.insert(
target,
TreeStateNode {
parent,
last_move_op: diff.last_effective_move_op_id,
},
)
.unwrap_or(TreeStateNode::UNEXIST_ROOT);
if parent != old.parent {
self.update_deleted_cache(target, parent, old.parent);
}
}
}
@ -216,7 +301,7 @@ impl ContainerState for TreeState {
match raw_op.content {
crate::op::RawOpContent::Tree(tree) => {
let TreeOp { target, parent, .. } = tree;
self.mov(target, parent)
self.mov(target, parent, raw_op.id)
}
_ => unreachable!(),
}
@ -260,15 +345,28 @@ impl ContainerState for TreeState {
fn abort_txn(&mut self) {
self.in_txn = false;
while let Some(op) = self.undo_items.pop() {
let TreeUndoItem { target, old_parent } = op;
let TreeUndoItem {
target,
old_parent,
old_last_move_op,
} = op;
if TreeID::is_unexist_root(old_parent) {
self.trees.remove(&target);
} else {
let parent = self
.trees
.insert(target, old_parent)
.unwrap_or(TreeID::unexist_root());
self.update_deleted_cache(target, old_parent, parent);
.insert(
target,
TreeStateNode {
parent: old_parent,
last_move_op: old_last_move_op,
},
)
.unwrap_or(TreeStateNode {
parent: TreeID::unexist_root(),
last_move_op: ID::NONE_ID,
});
self.update_deleted_cache(target, old_parent, parent.parent);
}
}
}
@ -285,11 +383,12 @@ impl ContainerState for TreeState {
let iter = self.trees.iter().sorted();
#[cfg(not(feature = "test_utils"))]
let iter = self.trees.iter();
for (target, parent) in iter {
for (target, node) in iter {
if !self.deleted.contains(target) && !TreeID::is_unexist_root(Some(*target)) {
let mut t = FxHashMap::default();
t.insert("id".to_string(), target.id().to_string().into());
let p = parent
let p = node
.parent
.map(|p| p.to_string().into())
.unwrap_or(LoroValue::Null);
t.insert("parent".to_string(), p);
@ -318,6 +417,42 @@ impl ContainerState for TreeState {
.map(|n| n.associated_meta_container())
.collect_vec()
}
#[doc = " Get a list of ops that can be used to restore the state to the current state"]
fn encode_snapshot(&self, mut encoder: StateSnapshotEncoder) -> Vec<u8> {
for node in self.trees.values() {
if node.last_move_op == ID::NONE_ID {
continue;
}
encoder.encode_op(node.last_move_op.into(), || unimplemented!());
}
Vec::new()
}
#[doc = " Restore the state to the state represented by the ops that are exported by `get_snapshot_ops`"]
fn import_from_snapshot_ops(&mut self, ctx: StateSnapshotDecodeContext) {
assert_eq!(ctx.mode, EncodeMode::Snapshot);
for op in ctx.ops {
assert_eq!(op.op.atom_len(), 1);
let content = op.op.content.as_tree().unwrap();
let target = content.target;
let parent = content.parent;
self.trees.insert(
target,
TreeStateNode {
parent,
last_move_op: op.id(),
},
);
}
for t in self.trees.keys() {
if self.get_is_deleted_by_query(*t) {
self.deleted.insert(*t);
}
}
}
}
impl TreeDeletedSetTrait for TreeState {
@ -329,12 +464,12 @@ impl TreeDeletedSetTrait for TreeState {
&mut self.deleted
}
fn get_children(&self, target: TreeID) -> Vec<TreeID> {
fn get_children(&self, target: TreeID) -> Vec<(TreeID, ID)> {
let mut ans = Vec::new();
for (t, parent) in self.trees.iter() {
if let Some(p) = parent {
if p == &target {
ans.push(*t);
if let Some(p) = parent.parent {
if p == target {
ans.push((*t, parent.last_move_op));
}
}
}
@ -366,12 +501,12 @@ pub struct TreeNode {
}
impl Forest {
pub(crate) fn from_tree_state(state: &FxHashMap<TreeID, Option<TreeID>>) -> Self {
pub(crate) fn from_tree_state(state: &FxHashMap<TreeID, TreeStateNode>) -> Self {
let mut forest = Self::default();
let mut node_to_children = FxHashMap::default();
for (id, parent) in state.iter().sorted() {
if let Some(parent) = parent {
if let Some(parent) = &parent.parent {
node_to_children
.entry(*parent)
.or_insert_with(Vec::new)
@ -381,7 +516,7 @@ impl Forest {
for root in state
.iter()
.filter(|(_, parent)| parent.is_none())
.filter(|(_, parent)| parent.parent.is_none())
.map(|(id, _)| *id)
.sorted()
{
@ -444,7 +579,7 @@ pub(crate) fn get_meta_value(nodes: &mut Vec<LoroValue>, state: &mut DocState) {
let map = Arc::make_mut(node.as_map_mut().unwrap());
let meta = map.get_mut("meta").unwrap();
let id = meta.as_container().unwrap();
*meta = state.get_container_deep_value(state.arena.id_to_idx(id).unwrap());
*meta = state.get_container_deep_value(state.arena.register_container(id));
}
}
@ -470,16 +605,22 @@ mod tests {
#[test]
fn test_tree_state() {
let mut state = TreeState::new();
state.mov(ID1, None).unwrap();
state.mov(ID2, Some(ID1)).unwrap();
let mut state = TreeState::new(ContainerIdx::from_index_and_type(
0,
loro_common::ContainerType::Tree,
));
state.mov(ID1, None, ID::NONE_ID).unwrap();
state.mov(ID2, Some(ID1), ID::NONE_ID).unwrap();
}
#[test]
fn tree_convert() {
let mut state = TreeState::new();
state.mov(ID1, None).unwrap();
state.mov(ID2, Some(ID1)).unwrap();
let mut state = TreeState::new(ContainerIdx::from_index_and_type(
0,
loro_common::ContainerType::Tree,
));
state.mov(ID1, None, ID::NONE_ID).unwrap();
state.mov(ID2, Some(ID1), ID::NONE_ID).unwrap();
let roots = Forest::from_tree_state(&state.trees);
let json = serde_json::to_string(&roots).unwrap();
assert_eq!(
@ -490,12 +631,15 @@ mod tests {
#[test]
fn delete_node() {
let mut state = TreeState::new();
state.mov(ID1, None).unwrap();
state.mov(ID2, Some(ID1)).unwrap();
state.mov(ID3, Some(ID2)).unwrap();
state.mov(ID4, Some(ID1)).unwrap();
state.mov(ID2, TreeID::delete_root()).unwrap();
let mut state = TreeState::new(ContainerIdx::from_index_and_type(
0,
loro_common::ContainerType::Tree,
));
state.mov(ID1, None, ID::NONE_ID).unwrap();
state.mov(ID2, Some(ID1), ID::NONE_ID).unwrap();
state.mov(ID3, Some(ID2), ID::NONE_ID).unwrap();
state.mov(ID4, Some(ID1), ID::NONE_ID).unwrap();
state.mov(ID2, TreeID::delete_root(), ID::NONE_ID).unwrap();
let roots = Forest::from_tree_state(&state.trees);
let json = serde_json::to_string(&roots).unwrap();
assert_eq!(

@ -1,90 +0,0 @@
#[derive(Clone, PartialEq, Eq)]
pub struct BitMap {
vec: Vec<u8>,
len: usize,
}
impl std::fmt::Debug for BitMap {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
let mut ans = String::new();
for v in self.iter() {
if v {
ans.push('1');
} else {
ans.push('0');
}
}
f.debug_struct("BitMap")
.field("len", &self.len)
.field("vec", &ans)
.finish()
}
}
impl BitMap {
pub fn new() -> Self {
Self {
vec: Vec::new(),
len: 0,
}
}
pub fn from_vec(vec: Vec<u8>, len: usize) -> Self {
Self { vec, len }
}
pub fn into_vec(self) -> Vec<u8> {
self.vec
}
#[allow(unused)]
pub fn len(&self) -> usize {
self.len
}
pub fn push(&mut self, v: bool) {
while self.len / 8 >= self.vec.len() {
self.vec.push(0);
}
if v {
self.vec[self.len / 8] |= 1 << (self.len % 8);
}
self.len += 1;
}
pub fn get(&self, index: usize) -> bool {
if index >= self.len {
panic!("index out of range");
}
(self.vec[index / 8] & (1 << (index % 8))) != 0
}
pub fn iter(&self) -> impl Iterator<Item = bool> + '_ {
self.vec
.iter()
.flat_map(|&v| (0..8).map(move |i| (v & (1 << i)) != 0))
.take(self.len)
}
}
#[cfg(test)]
mod test {
use super::BitMap;
#[test]
fn basic() {
let mut map = BitMap::new();
map.push(true);
map.push(false);
map.push(true);
map.push(true);
assert!(map.get(0));
assert!(!map.get(1));
assert!(map.get(2));
assert!(map.get(3));
dbg!(map);
}
}

@ -0,0 +1,37 @@
use serde_columnar::columnar;
#[columnar(vec, ser, de)]
#[derive(Debug, Clone)]
struct EncodedNum {
#[columnar(strategy = "DeltaRle")]
num: u32,
}
#[derive(Default)]
#[columnar(ser, de)]
pub struct DeltaRleEncodedNums {
#[columnar(class = "vec")]
nums: Vec<EncodedNum>,
}
impl DeltaRleEncodedNums {
pub fn new() -> Self {
Self::default()
}
pub fn push(&mut self, n: u32) {
self.nums.push(EncodedNum { num: n });
}
pub fn iter(&self) -> impl Iterator<Item = u32> + '_ {
self.nums.iter().map(|n| n.num)
}
pub fn encode(&self) -> Vec<u8> {
serde_columnar::to_vec(&self).unwrap()
}
pub fn decode(encoded: &[u8]) -> Self {
serde_columnar::from_bytes(encoded).unwrap()
}
}
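
Note for reviewers: the `DeltaRle` strategy above is provided by `serde_columnar`. As a rough illustration of the idea only (not of that crate's actual wire format), the following stdlib-only sketch stores the difference between consecutive numbers and then collapses repeated differences into runs. The function names `delta_rle_encode` and `delta_rle_decode` are hypothetical and do not exist in this PR.

```rust
/// Delta-encode a sequence, then run-length-encode the deltas.
/// Returns `(delta, run_length)` pairs. Hypothetical helper for illustration.
fn delta_rle_encode(nums: &[u32]) -> Vec<(i64, u32)> {
    let mut runs: Vec<(i64, u32)> = Vec::new();
    let mut prev: i64 = 0;
    for &n in nums {
        // Difference from the previous number (the first one is relative to 0).
        let delta = n as i64 - prev;
        prev = n as i64;
        // Extend the current run if the delta repeats.
        if let Some((d, count)) = runs.last_mut() {
            if *d == delta {
                *count += 1;
                continue;
            }
        }
        runs.push((delta, 1));
    }
    runs
}

/// Inverse of `delta_rle_encode`: expand the runs and accumulate the deltas.
fn delta_rle_decode(runs: &[(i64, u32)]) -> Vec<u32> {
    let mut out = Vec::new();
    let mut cur: i64 = 0;
    for &(delta, count) in runs {
        for _ in 0..count {
            cur += delta;
            out.push(cur as u32);
        }
    }
    out
}
```

With this sketch, `[1, 2, 3, 4, 10, 10, 10]` becomes the three runs `(1, 4)`, `(6, 1)`, `(0, 2)`, which is why mostly-increasing columns such as counters tend to compress well under this kind of strategy.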

@ -0,0 +1,341 @@
use itertools::Either;
use loro_common::{HasCounter, HasCounterSpan, HasId, HasIdSpan, IdSpan, ID};
use rle::HasLength;
use std::collections::BTreeMap;
/// A map that maps spans of continuous [ID]s to spans of continuous integers.
///
/// It can merge spans that are adjacent to each other.
/// The value is automatically incremented by the length of the inserted span.
#[derive(Debug)]
pub struct IdIntMap {
inner: Either<BTreeMap<ID, Value>, Vec<(IdSpan, i32)>>,
next_value: i32,
}
const MAX_VEC_LEN: usize = 16;
#[derive(Debug)]
struct Value {
len: i32,
value: i32,
}
impl IdIntMap {
pub fn new() -> Self {
Self {
inner: Either::Right(Default::default()),
next_value: 0,
}
}
pub fn insert(&mut self, id_span: IdSpan) {
if cfg!(debug_assertions) {
let target = self.get(id_span.id_start());
assert!(
target.is_none(),
"ID already exists {id_span:?} {target:?} {:#?}",
self
);
}
match &mut self.inner {
Either::Left(map) => {
let value = self.next_value;
let len = id_span.atom_len() as i32;
self.next_value += len;
let id = id_span.id_start();
match map.range_mut(..&id).last() {
Some(last)
if last.0.peer == id.peer
&& last.0.counter + last.1.len == id.counter
&& last.1.value + last.1.len == value =>
{
// merge
last.1.len += len;
}
_ => {
map.insert(id, Value { len, value });
}
}
}
Either::Right(vec) => {
if vec.len() == MAX_VEC_LEN {
// convert to map and insert
self.escalate_to_map();
self.insert(id_span);
return;
}
let value = self.next_value;
let len = id_span.atom_len() as i32;
self.next_value += len;
let pos = match vec.binary_search_by(|x| x.0.id_start().cmp(&id_span.id_start())) {
Ok(_) => unreachable!("ID already exists"),
Err(i) => i,
};
if pos > 0 {
if let Some(last) = vec.get_mut(pos - 1) {
if last.0.id_end() == id_span.id_start()
&& last.1 + last.0.atom_len() as i32 == value
{
// can merge
last.0.counter.end += len;
return;
}
}
}
vec.insert(pos, (id_span, value));
}
}
}
fn escalate_to_map(&mut self) {
let Either::Right(vec) = &mut self.inner else {
return;
};
let mut map = BTreeMap::new();
for (id_span, value) in vec.drain(..) {
map.insert(
id_span.id_start(),
Value {
len: id_span.atom_len() as i32,
value,
},
);
}
self.inner = Either::Left(map);
}
/// Return (value, length) that starts at the given ID.
pub fn get(&self, target: ID) -> Option<(i32, usize)> {
let ans = match &self.inner {
Either::Left(map) => map.range(..=&target).last().and_then(|(entry_key, value)| {
if entry_key.peer != target.peer {
None
} else if entry_key.counter + value.len > target.counter {
Some((
value.value + target.counter - entry_key.counter,
(entry_key.counter + value.len - target.counter) as usize,
))
} else {
None
}
}),
Either::Right(vec) => vec
.iter()
.rev()
.find(|(id_span, _)| id_span.contains(target))
.map(|(id_span, value)| {
(
*value + target.counter - id_span.ctr_start(),
(id_span.ctr_end() - target.counter) as usize,
)
}),
};
ans
}
/// Call `next` for each key-value pair that is in the given span.
/// It's guaranteed that the keys are in ascending order.
pub fn get_values_in_span(&self, target: IdSpan, mut next: impl FnMut(IdSpan, i32)) {
let target_peer = target.client_id;
match &self.inner {
Either::Left(map) => {
let last = map
.range(..&target.id_start())
.next_back()
.and_then(|(id, v)| {
if id.peer != target_peer {
None
} else if id.counter + v.len > target.ctr_start() {
Some((id, v))
} else {
None
}
});
let iter = map.range(&target.id_start()..);
for (entry_key, value) in last.into_iter().chain(iter) {
if entry_key.peer > target_peer {
break;
}
if entry_key.counter >= target.ctr_end() {
break;
}
assert_eq!(entry_key.peer, target_peer);
let cur_span = &IdSpan::new(
target_peer,
entry_key.counter,
entry_key.counter + value.len,
);
let next_span = cur_span.get_intersection(&target).unwrap();
(next)(
next_span,
value.value + next_span.counter.start - entry_key.counter,
);
}
}
Either::Right(vec) => {
for (id_span, value) in vec.iter() {
if id_span.client_id < target_peer {
continue;
}
if id_span.client_id > target_peer {
break;
}
if target.ctr_start() >= id_span.ctr_end() {
continue;
}
if target.ctr_end() <= id_span.counter.start {
break;
}
assert_eq!(id_span.client_id, target_peer);
let next_span = id_span.get_intersection(&target).unwrap();
(next)(
next_span,
*value + next_span.counter.start - id_span.counter.start,
);
}
}
}
}
/// If the given item overlaps with the content in the map, split it into pieces
/// such that each piece either maps to a continuous series of values or maps to none.
pub(crate) fn split<'a, T: HasIdSpan + generic_btree::rle::Sliceable + 'a>(
&'a self,
item: T,
) -> impl Iterator<Item = T> + 'a {
let len = item.rle_len();
let span = item.id_span();
// PERF: we may avoid this alloc if get_values_in_span returns an iter
let mut ans = Vec::new();
let mut ctr_start = span.ctr_start();
let mut index = 0;
let ctr_end = span.ctr_end();
self.get_values_in_span(span, |id_span: IdSpan, _| {
if id_span.counter.start == ctr_start && id_span.counter.end == ctr_end {
return;
}
if id_span.counter.start > ctr_start {
ans.push(
item.slice(
index as usize..(index + id_span.counter.start - ctr_start) as usize,
),
);
index += id_span.counter.start - ctr_start;
}
ans.push(item.slice(
index as usize..(index + id_span.counter.end - id_span.counter.start) as usize,
));
index += id_span.counter.end - id_span.counter.start;
ctr_start = id_span.ctr_end();
});
if ans.is_empty() && len > 0 {
ans.push(item);
} else if index as usize != len {
ans.push(item.slice(index as usize..len));
}
ans.into_iter()
}
}
#[cfg(test)]
mod test {
use super::*;
#[test]
fn test_basic() {
let mut map = IdIntMap::new();
map.insert(IdSpan::new(0, 0, 10));
map.insert(IdSpan::new(0, 10, 100));
map.insert(IdSpan::new(1, 0, 100));
map.insert(IdSpan::new(2, 0, 100));
map.insert(IdSpan::new(999, 0, 100));
assert!(map.inner.is_right());
assert_eq!(map.get(ID::new(0, 10)).unwrap().0, 10);
assert_eq!(map.get(ID::new(1, 10)).unwrap().0, 110);
assert_eq!(map.get(ID::new(2, 10)).unwrap().0, 210);
assert_eq!(map.get(ID::new(0, 0)).unwrap().0, 0);
assert_eq!(map.get(ID::new(1, 0)).unwrap().0, 100);
assert_eq!(map.get(ID::new(2, 0)).unwrap().0, 200);
assert_eq!(map.get(ID::new(999, 99)).unwrap().0, 399);
for i in 0..100 {
map.insert(IdSpan::new(3, i * 2, i * 2 + 1));
}
assert!(map.inner.is_left());
assert_eq!(map.get(ID::new(0, 10)).unwrap().0, 10);
assert_eq!(map.get(ID::new(1, 10)).unwrap().0, 110);
assert_eq!(map.get(ID::new(2, 10)).unwrap().0, 210);
assert_eq!(map.get(ID::new(0, 0)).unwrap().0, 0);
assert_eq!(map.get(ID::new(1, 0)).unwrap().0, 100);
assert_eq!(map.get(ID::new(2, 0)).unwrap().0, 200);
assert_eq!(map.get(ID::new(999, 99)).unwrap().0, 399);
for i in 0..100 {
assert_eq!(map.get(ID::new(3, i * 2)).unwrap().0, i + 400, "i = {i}");
}
let mut called = 0;
map.get_values_in_span(IdSpan::new(0, 3, 66), |id_span, value| {
called += 1;
assert_eq!(id_span, IdSpan::new(0, 3, 66));
assert_eq!(value, 3);
});
assert_eq!(called, 1);
let mut called = Vec::new();
map.get_values_in_span(IdSpan::new(3, 0, 10), |id_span, value| {
called.push((id_span, value));
});
assert_eq!(
called,
vec![
(IdSpan::new(3, 0, 1), 400),
(IdSpan::new(3, 2, 3), 401),
(IdSpan::new(3, 4, 5), 402),
(IdSpan::new(3, 6, 7), 403),
(IdSpan::new(3, 8, 9), 404),
]
);
}
#[test]
fn test_get_values() {
let mut map = IdIntMap::new();
map.insert(IdSpan::new(0, 3, 5));
map.insert(IdSpan::new(0, 0, 1));
map.insert(IdSpan::new(0, 2, 3));
let mut called = Vec::new();
map.get_values_in_span(IdSpan::new(0, 0, 10), |id_span, value| {
called.push((id_span, value));
});
assert_eq!(
called,
vec![
(IdSpan::new(0, 0, 1), 2),
(IdSpan::new(0, 2, 3), 3),
(IdSpan::new(0, 3, 5), 0),
]
);
}
}
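
Note for reviewers: the core behaviour of `IdIntMap` (auto-incrementing values plus merging of adjacent spans) can be seen in isolation in the simplified sketch below. It is an assumption-laden stand-in, not the code in this PR: it keeps only the `BTreeMap` variant, replaces `ID`/`IdSpan` with plain `(peer, counter)` tuples, and the type name `SpanIntMap` is hypothetical.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for `IdIntMap` (hypothetical type, for illustration).
/// Key: (peer, start counter). Value: (span length, first mapped integer).
#[derive(Default)]
struct SpanIntMap {
    map: BTreeMap<(u64, i32), (i32, i32)>,
    next_value: i32,
}

impl SpanIntMap {
    /// Insert the half-open counter span `[start, end)` of `peer`.
    /// The span is mapped to the next `end - start` unused integers.
    fn insert(&mut self, peer: u64, start: i32, end: i32) {
        let len = end - start;
        let value = self.next_value;
        self.next_value += len;
        // Look at the closest entry before the new span.
        if let Some((&(p, c), (l, v))) = self.map.range_mut(..(peer, start)).next_back() {
            // Merge only if both the ids and the mapped values are contiguous.
            if p == peer && c + *l == start && *v + *l == value {
                *l += len;
                return;
            }
        }
        self.map.insert((peer, start), (len, value));
    }

    /// Return the integer mapped to `(peer, counter)`, if any.
    fn get(&self, peer: u64, counter: i32) -> Option<i32> {
        let (&(p, c), &(l, v)) = self.map.range(..=(peer, counter)).next_back()?;
        if p == peer && counter < c + l {
            Some(v + counter - c)
        } else {
            None
        }
    }
}
```

The merge condition checks both the id side and the value side, which is why spans inserted out of id order (as in `test_get_values` above) stay as separate entries even when their ids are adjacent.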

@ -5,10 +5,6 @@ pub enum LazyLoad<Src, Dst: From<Src>> {
}
impl<Src: Default, Dst: From<Src>> LazyLoad<Src, Dst> {
pub fn new(src: Src) -> Self {
LazyLoad::Src(src)
}
pub fn new_dst(dst: Dst) -> Self {
LazyLoad::Dst(dst)
}

@ -1,4 +1,5 @@
pub(crate) mod bitmap;
pub(crate) mod delta_rle_encoded_num;
pub(crate) mod id_int_map;
pub(crate) mod lazy;
pub mod string_slice;
pub(crate) mod utf16;

@ -1,4 +1,4 @@
use loro_common::IdSpanVector;
use loro_common::{HasCounter, HasCounterSpan, IdSpanVector};
use smallvec::smallvec;
use std::{
cmp::Ordering,
@ -15,7 +15,7 @@ use crate::{
change::Lamport,
id::{Counter, ID},
oplog::AppDag,
span::{CounterSpan, HasIdSpan, IdSpan},
span::{CounterSpan, IdSpan},
LoroError, PeerID,
};
@ -134,6 +134,11 @@ impl Frontiers {
pub fn filter_peer(&mut self, peer: PeerID) {
self.retain(|id| id.peer != peer);
}
#[inline]
pub(crate) fn with_capacity(cap: usize) -> Frontiers {
Self(SmallVec::with_capacity(cap))
}
}
impl Deref for Frontiers {
@ -606,13 +611,13 @@ impl VersionVector {
false
}
pub fn intersect_span<S: HasIdSpan>(&self, target: &S) -> Option<CounterSpan> {
let id = target.id_start();
if let Some(end) = self.get(&id.peer) {
if *end > id.counter {
pub fn intersect_span(&self, target: IdSpan) -> Option<CounterSpan> {
if let Some(&end) = self.get(&target.client_id) {
if end > target.ctr_start() {
let count_end = target.ctr_end();
return Some(CounterSpan {
start: id.counter,
end: *end,
start: target.ctr_start(),
end: end.min(count_end),
});
}
}
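
Note for reviewers: the behavioural change in `intersect_span` is that the returned span is now clamped to the end of the target span, where the old code returned everything up to the version vector's end. A minimal sketch of the new logic follows, assuming a plain `HashMap<u64, i32>` in place of `VersionVector` and a `(start, end)` tuple in place of `CounterSpan`; both substitutions are for illustration only.

```rust
use std::collections::HashMap;

/// Intersect the half-open span `[start, end)` of `peer` with what the
/// version vector already contains. `vv` maps each peer to its exclusive
/// end counter (hypothetical simplified representation).
fn intersect_span(
    vv: &HashMap<u64, i32>,
    peer: u64,
    start: i32,
    end: i32,
) -> Option<(i32, i32)> {
    let &known_end = vv.get(&peer)?;
    if known_end > start {
        // Clamp to the target's end so the result never exceeds the target.
        Some((start, known_end.min(end)))
    } else {
        None
    }
}
```

For a version vector that knows counters `0..10` of peer 1, the target `[3, 6)` yields `(3, 6)` in this sketch, while without the clamp it would have been `(3, 10)`.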

@ -5,6 +5,7 @@ use serde_json::json;
#[test]
fn auto_commit() {
let mut doc_a = LoroDoc::default();
doc_a.set_peer_id(1).unwrap();
doc_a.start_auto_commit();
let text_a = doc_a.get_text("text");
text_a.insert(0, "hello").unwrap();
@ -14,6 +15,7 @@ fn auto_commit() {
let mut doc_b = LoroDoc::default();
doc_b.start_auto_commit();
doc_b.set_peer_id(2).unwrap();
let text_b = doc_b.get_text("text");
text_b.insert(0, "100").unwrap();
doc_b.import(&bytes).unwrap();

@ -132,10 +132,20 @@ impl<'a> EncodedAppState<'a> {
#[derive(Debug, Clone, Serialize, Deserialize)]
pub enum EncodedContainerState<'a> {
Map(Vec<MapEntry>),
List(Vec<usize>),
List {
elem_idx: Vec<usize>,
elem_ids: Vec<ID>,
},
#[serde(borrow)]
Richtext(Box<EncodedRichtextState<'a>>),
Tree((Vec<(usize, Option<usize>)>, Vec<usize>)),
Tree((Vec<EncodedTreeNode>, Vec<usize>)),
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct EncodedTreeNode {
pub node_idx: usize,
pub parent: Option<usize>,
pub id: ID,
}
#[derive(Debug, Default, Clone, Serialize, Deserialize)]
@ -148,6 +158,7 @@ pub struct EncodedRichtextState<'a> {
/// This is encoded [TextRanges]
#[serde(borrow)]
pub text_bytes: Cow<'a, [u8]>,
pub ids: Vec<(u32, u32)>,
/// Style anchor index in the style arena
// TODO: can be optimized
pub styles: Vec<CompactStyleOp>,
@ -174,9 +185,7 @@ pub struct TextRanges {
impl TextRanges {
#[inline]
pub fn decode_iter(bytes: &[u8]) -> LoroResult<impl Iterator<Item = TextRange> + '_> {
let iter = serde_columnar::iter_from_bytes::<TextRanges>(bytes).map_err(|e| {
LoroError::DecodeError(format!("Failed to decode TextRange: {}", e).into_boxed_str())
})?;
let iter = serde_columnar::iter_from_bytes::<TextRanges>(bytes)?;
Ok(iter.ranges)
}
@ -190,7 +199,7 @@ impl<'a> EncodedContainerState<'a> {
pub fn container_type(&self) -> loro_common::ContainerType {
match self {
EncodedContainerState::Map(_) => loro_common::ContainerType::Map,
EncodedContainerState::List(_) => loro_common::ContainerType::List,
EncodedContainerState::List { .. } => loro_common::ContainerType::List,
EncodedContainerState::Tree(_) => loro_common::ContainerType::Tree,
EncodedContainerState::Richtext { .. } => loro_common::ContainerType::Text,
}

@ -1,5 +1,35 @@
# Changelog
## 0.7.2-alpha.4
### Patch Changes
- Fix encoding value err
## 0.7.2-alpha.3
### Patch Changes
- Fix export compressed snapshot
## 0.7.2-alpha.2
### Patch Changes
- Add compressed method
## 0.7.2-alpha.1
### Patch Changes
- Fix v0 exports
## 0.7.2-alpha.0
### Patch Changes
- Add experimental encode methods
## 0.7.1
### Patch Changes

@ -15,7 +15,6 @@
"https://deno.land/std@0.105.0/path/posix.ts": "b81974c768d298f8dcd2c720229639b3803ca4a241fa9a355c762fa2bc5ef0c1",
"https://deno.land/std@0.105.0/path/separator.ts": "8fdcf289b1b76fd726a508f57d3370ca029ae6976fcde5044007f062e643ff1c",
"https://deno.land/std@0.105.0/path/win32.ts": "f4a3d4a3f2c9fe894da046d5eac48b5e789a0ebec5152b2c0985efe96a9f7ae1",
"https://deno.land/x/dirname@1.1.2/mod.ts": "4029ca6b49da58d262d65f826ba9b3a89cc0b92a94c7220d5feb7bd34e498a54",
"https://deno.land/x/dirname@1.1.2/types.ts": "c1ed1667545bc4b1d69bdb2fc26a5fa8edae3a56e3081209c16a408a322a2319",
"https://lra6z45nakk5lnu3yjchp7tftsdnwwikwr65ocha5eojfnlgu4sa.arweave.net/XEHs860CldW2m8JEd_5lnIbbWQq0fdcI4OkckrVmpyQ/_util/assert.ts": "e1f76e77c5ccb5a8e0dbbbe6cce3a56d2556c8cb5a9a8802fc9565af72462149",
"https://lra6z45nakk5lnu3yjchp7tftsdnwwikwr65ocha5eojfnlgu4sa.arweave.net/XEHs860CldW2m8JEd_5lnIbbWQq0fdcI4OkckrVmpyQ/path/_constants.ts": "aba480c4a2c098b6374fdd5951fea13ecc8aaaf8b8aa4dae1871baa50243d676",

@ -1,6 +1,6 @@
{
"name": "loro-wasm",
"version": "0.7.1",
"version": "0.7.2-alpha.4",
"description": "Loro CRDTs is a high-performance CRDT framework that makes your app state synchronized, collaborative and maintainable effortlessly.",
"keywords": [
"crdt",

@ -863,11 +863,11 @@ impl Loro {
let value: JsValue = vv.into();
let is_bytes = value.is_instance_of::<js_sys::Uint8Array>();
let vv = if is_bytes {
let bytes = js_sys::Uint8Array::try_from(value.clone()).unwrap_throw();
let bytes = js_sys::Uint8Array::from(value.clone());
let bytes = bytes.to_vec();
VersionVector::decode(&bytes)?
} else {
let map = js_sys::Map::try_from(value).unwrap_throw();
let map = js_sys::Map::from(value);
js_map_to_vv(map)?
};
@ -1074,7 +1074,7 @@ impl LoroText {
/// ```
pub fn mark(&self, range: JsRange, key: &str, value: JsValue) -> Result<(), JsError> {
let range: MarkRange = serde_wasm_bindgen::from_value(range.into())?;
let value: LoroValue = LoroValue::try_from(value)?;
let value: LoroValue = LoroValue::from(value);
let expand = range
.expand
.map(|x| {
@ -2081,7 +2081,7 @@ pub fn to_readable_version(version: &[u8]) -> Result<JsVersionVectorMap, JsValue
#[wasm_bindgen(js_name = "toEncodedVersion")]
pub fn to_encoded_version(version: JsVersionVectorMap) -> Result<Vec<u8>, JsValue> {
let map: JsValue = version.into();
let map: js_sys::Map = map.try_into().unwrap_throw();
let map: js_sys::Map = map.into();
let vv = js_map_to_vv(map)?;
let encoded = vv.encode();
Ok(encoded)
@ -2153,6 +2153,8 @@ export type TreeID = `${number}@${string}`;
interface Loro {
exportFrom(version?: Uint8Array): Uint8Array;
exportFromV0(version?: Uint8Array): Uint8Array;
exportFromCompressed(version?: Uint8Array): Uint8Array;
getContainerById(id: ContainerID): LoroText | LoroMap | LoroList;
}
/**

@ -22,6 +22,7 @@ pub use loro_internal::obs::SubID;
pub use loro_internal::obs::Subscriber;
pub use loro_internal::version::{Frontiers, VersionVector};
pub use loro_internal::DiffEvent;
pub use loro_internal::{loro_value, to_value};
pub use loro_internal::{LoroError, LoroResult, LoroValue, ToJson};
/// `LoroDoc` is the entry for the whole document.

@ -53,7 +53,6 @@
"https://deno.land/x/cliui@v7.0.4-deno/build/lib/index.js": "fb6030c7b12602a4fca4d81de3ddafa301ba84fd9df73c53de6f3bdda7b482d5",
"https://deno.land/x/cliui@v7.0.4-deno/build/lib/string-utils.js": "b3eb9d2e054a43a3064af17332fb1839a7dadb205c5371af4789616afb1a117f",
"https://deno.land/x/cliui@v7.0.4-deno/deno.ts": "d07bc3338661f8011e3a5fd215061d17a52107a5383c29f40ce0c1ecb8bb8cc3",
"https://deno.land/x/dirname@1.1.2/mod.ts": "4029ca6b49da58d262d65f826ba9b3a89cc0b92a94c7220d5feb7bd34e498a54",
"https://deno.land/x/dirname@1.1.2/types.ts": "c1ed1667545bc4b1d69bdb2fc26a5fa8edae3a56e3081209c16a408a322a2319",
"https://deno.land/x/escalade@v3.0.3/sync.ts": "493bc66563292c5c10c4a75a467a5933f24dad67d74b0f5a87e7b988fe97c104",
"https://deno.land/x/y18n@v5.0.0-deno/build/lib/index.d.ts": "11f40d97041eb271cc1a1c7b296c6e7a068d4843759575e7416f0d14ebf8239c",

@ -1,5 +1,45 @@
# Changelog
## 0.7.2-alpha.4
### Patch Changes
- Fix encoding value err
- Updated dependencies
- loro-wasm@0.7.2
## 0.7.2-alpha.3
### Patch Changes
- Fix export compressed snapshot
- Updated dependencies
- loro-wasm@0.7.2
## 0.7.2-alpha.2
### Patch Changes
- Add compressed method
- Updated dependencies
- loro-wasm@0.7.2
## 0.7.2-alpha.1
### Patch Changes
- Fix v0 exports
- Updated dependencies
- loro-wasm@0.7.2
## 0.7.2-alpha.0
### Patch Changes
- Add experimental encode methods
- Updated dependencies
- loro-wasm@0.7.2
## 0.7.1
### Patch Changes

@ -1,6 +1,6 @@
{
"name": "loro-crdt",
"version": "0.7.1",
"version": "0.7.2-alpha.4",
"description": "Loro CRDTs is a high-performance CRDT framework that makes your app state synchronized, collaborative and maintainable effortlessly.",
"keywords": [
"crdt",