From bc27a475315dc6f7e2f94eff5bcd2219fc923bc6 Mon Sep 17 00:00:00 2001
From: Zixuan Chen
Date: Tue, 2 Jan 2024 17:03:24 +0800
Subject: [PATCH] feat: stabilizing encoding (#219)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This PR implements a new encoding schema that is more extensible and more compact. It is also simpler, produces a smaller binary, and takes less effort to maintain. It is inspired by the [Automerge Encoding Format](https://automerge.org/automerge-binary-format-spec/).

The main motivation is extensibility. When we integrate a new CRDT algorithm, we don't want to make a breaking change to the encoding or keep multiple versions of the encoding schema in the code, as that would make our WASM size much larger. We need a stable and extensible encoding schema for our v1.0 release.

This PR also exposes the ops that compose the current container state. For example, you can now quickly query which operation a certain character comes from. The new snapshot encoding requires this behavior, so it is included in this PR.

# Encoding Schema

## Header

The header is 22 bytes long.

- (bytes 0-4) Magic Bytes: The encoding starts with `loro` as magic bytes.
- (bytes 4-20) Checksum: MD5 checksum of the encoded data from byte 20 onward, which covers the encoding method field and the body. The checksum is stored as a 16-byte array. The `magic bytes` and `checksum` fields themselves are excluded when calculating the checksum.
- (bytes 20-22) Encoding Method (2 bytes, big endian): Multiple encoding methods are available for a specific encoding version.

## Encode Mode: Updates

In this mode, only ops (that is, the historical record) are encoded; document states are excluded.

Like Automerge's format, we employ columnar encoding for operations and changes.

Previously, operations were ordered by their Operation ID (OpId) before columnar encoding. However, sorting operations by their container first improves compression, because ops on the same container end up adjacent in each column.
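The 22-byte header described in the Header section above can be sketched in Rust as follows. This is a minimal illustration under stated assumptions, not Loro's actual code: `encode_with_header`, `decode_header`, and `HeaderError` are hypothetical names, the body format is not modeled, and the checksum function is passed in as a parameter so the sketch compiles with only the standard library (the PR itself uses MD5 for this field).

```rust
/// Magic bytes at offsets 0..4.
const MAGIC_BYTES: [u8; 4] = *b"loro";
/// 4 (magic) + 16 (checksum) + 2 (encoding method) = 22 bytes.
const HEADER_LEN: usize = 22;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
enum HeaderError {
    TooShort,
    BadMagic,
    ChecksumMismatch,
}

/// Lays out `magic | checksum | method (u16, big endian) | body`.
/// The checksum covers everything from byte 20 onward (method + body);
/// the magic bytes and the checksum field itself are excluded.
fn encode_with_header(method: u16, body: &[u8], checksum: fn(&[u8]) -> [u8; 16]) -> Vec<u8> {
    let mut out = Vec::with_capacity(HEADER_LEN + body.len());
    out.extend_from_slice(&MAGIC_BYTES);
    out.extend_from_slice(&[0u8; 16]); // placeholder, filled in below
    out.extend_from_slice(&method.to_be_bytes());
    out.extend_from_slice(body);
    let sum = checksum(&out[20..]);
    out[4..20].copy_from_slice(&sum);
    out
}

/// Validates magic bytes and checksum, then returns `(method, body)`.
fn decode_header(data: &[u8], checksum: fn(&[u8]) -> [u8; 16]) -> Result<(u16, &[u8]), HeaderError> {
    if data.len() < HEADER_LEN {
        return Err(HeaderError::TooShort);
    }
    if data[0..4] != MAGIC_BYTES {
        return Err(HeaderError::BadMagic);
    }
    if checksum(&data[20..]) != data[4..20] {
        return Err(HeaderError::ChecksumMismatch);
    }
    let method = u16::from_be_bytes([data[20], data[21]]);
    Ok((method, &data[HEADER_LEN..]))
}
```

In a real build, the MD5 checksum could be supplied through the `md5` crate, e.g. `|d: &[u8]| md5::compute(d).0`. Checking the magic bytes and checksum before reading the method field means a truncated or corrupted blob is rejected before any decoding starts.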
## Encode Mode: Snapshot

This mode captures both the document state and the history. When a snapshot is imported into a new document, the document is initialized directly from the snapshot, with no CRDT-based recomputation.

Unlike previous snapshot encoding methods, the binary output in snapshot mode is compatible with the updates mode. This makes importing a snapshot into a non-empty document more efficient: initializing from the snapshot is not possible there, so the data can be imported as updates instead.

Additionally, when feasible, we use the sequence of operations to construct state snapshots. In CRDTs, it is possible to deduce the specific ops that make up the current container state. These ops are tagged with their container, so the state can be reconstructed directly from them. This approach, pioneered by Automerge, significantly improves compression.

---
.gitignore | 1 + .vscode/settings.json | 13 +- Cargo.lock | 259 +- Cargo.toml | 11 +- crates/bench-utils/Cargo.toml | 1 + crates/bench-utils/src/draw.rs | 41 +- crates/bench-utils/src/json.rs | 78 + crates/bench-utils/src/lib.rs | 154 +- crates/bench-utils/src/sheet.rs | 17 +- crates/benches/src/lib.rs | 2 - crates/{benches => examples}/Cargo.toml | 8 +- crates/{benches => examples}/examples/draw.rs | 6 +- .../examples/init_sheet.rs | 2 +- crates/examples/fuzz/.gitignore | 4 + crates/examples/fuzz/Cargo.lock | 1056 ++++++++ crates/examples/fuzz/Cargo.toml | 36 + crates/examples/fuzz/fuzz_targets/draw.rs | 8 + crates/examples/fuzz/fuzz_targets/json.rs | 9 + crates/{benches => examples}/src/draw.rs | 99 +- crates/examples/src/json.rs | 74 + crates/examples/src/lib.rs | 235 ++ crates/{benches => examples}/src/sheet.rs | 12 +- crates/examples/tests/failed_tests.rs | 492 ++++ crates/loro-common/Cargo.toml | 1 + crates/loro-common/src/error.rs | 32 +- crates/loro-common/src/id.rs | 5 +- crates/loro-common/src/lib.rs | 22 +- crates/loro-common/src/macros.rs | 290 ++ crates/loro-common/src/span.rs | 35 +-
crates/loro-common/src/value.rs | 58 + crates/loro-internal/Cargo.toml | 7 +- crates/loro-internal/Encoding.md | 25 + crates/loro-internal/deno.lock | 1 - crates/loro-internal/examples/encoding.rs | 10 +- crates/loro-internal/examples/many_actors.rs | 4 +- crates/loro-internal/examples/pending.rs | 2 +- crates/loro-internal/fuzz/Cargo.lock | 38 +- .../loro-internal/fuzz/fuzz_targets/import.rs | 3 +- crates/loro-internal/scripts/fuzz.ts | 4 +- crates/loro-internal/scripts/mem.ts | 4 +- crates/loro-internal/src/arena.rs | 188 +- .../src/container/list/list_op.rs | 1 + .../src/container/map/map_content.rs | 20 - crates/loro-internal/src/container/map/mod.rs | 2 +- .../src/container/richtext/richtext_state.rs | 58 +- .../src/container/richtext/tracker.rs | 3 + .../container/richtext/tracker/crdt_rope.rs | 17 +- crates/loro-internal/src/dag/iter.rs | 10 +- crates/loro-internal/src/delta/map_delta.rs | 6 + crates/loro-internal/src/delta/tree.rs | 14 +- crates/loro-internal/src/diff_calc.rs | 62 +- crates/loro-internal/src/diff_calc/tree.rs | 90 +- crates/loro-internal/src/encoding.rs | 270 +- .../src/encoding/encode_enhanced.rs | 735 ----- .../src/encoding/encode_reordered.rs | 2364 +++++++++++++++++ .../src/encoding/encode_snapshot.rs | 1139 -------- .../src/encoding/encode_updates.rs | 170 -- crates/loro-internal/src/event.rs | 8 +- crates/loro-internal/src/fuzz.rs | 134 +- .../src/fuzz/recursive_refactored.rs | 2 +- crates/loro-internal/src/fuzz/tree.rs | 251 ++ crates/loro-internal/src/lib.rs | 5 +- crates/loro-internal/src/loro.rs | 185 +- crates/loro-internal/src/macros.rs | 24 +- crates/loro-internal/src/op.rs | 90 +- crates/loro-internal/src/op/content.rs | 4 +- crates/loro-internal/src/oplog.rs | 179 +- crates/loro-internal/src/oplog/dag.rs | 20 +- .../src/oplog/pending_changes.rs | 166 +- crates/loro-internal/src/state.rs | 55 +- crates/loro-internal/src/state/list_state.rs | 174 +- crates/loro-internal/src/state/map_state.rs | 82 +- 
.../loro-internal/src/state/richtext_state.rs | 259 +- crates/loro-internal/src/state/tree_state.rs | 250 +- crates/loro-internal/src/utils/bitmap.rs | 90 - .../src/utils/delta_rle_encoded_num.rs | 37 + crates/loro-internal/src/utils/id_int_map.rs | 341 +++ crates/loro-internal/src/utils/lazy.rs | 4 - crates/loro-internal/src/utils/mod.rs | 3 +- crates/loro-internal/src/version.rs | 21 +- crates/loro-internal/tests/autocommit.rs | 2 + crates/loro-preload/src/encode.rs | 21 +- crates/loro-wasm/CHANGELOG.md | 30 + crates/loro-wasm/deno.lock | 1 - crates/loro-wasm/package.json | 2 +- crates/loro-wasm/src/lib.rs | 10 +- crates/loro/src/lib.rs | 1 + deno.lock | 1 - loro-js/CHANGELOG.md | 40 + loro-js/package.json | 2 +- 90 files changed, 7291 insertions(+), 3511 deletions(-) create mode 100644 crates/bench-utils/src/json.rs delete mode 100644 crates/benches/src/lib.rs rename crates/{benches => examples}/Cargo.toml (56%) rename crates/{benches => examples}/examples/draw.rs (93%) rename crates/{benches => examples}/examples/init_sheet.rs (92%) create mode 100644 crates/examples/fuzz/.gitignore create mode 100644 crates/examples/fuzz/Cargo.lock create mode 100644 crates/examples/fuzz/Cargo.toml create mode 100644 crates/examples/fuzz/fuzz_targets/draw.rs create mode 100644 crates/examples/fuzz/fuzz_targets/json.rs rename crates/{benches => examples}/src/draw.rs (64%) create mode 100644 crates/examples/src/json.rs create mode 100644 crates/examples/src/lib.rs rename crates/{benches => examples}/src/sheet.rs (64%) create mode 100644 crates/examples/tests/failed_tests.rs create mode 100644 crates/loro-common/src/macros.rs create mode 100644 crates/loro-internal/Encoding.md delete mode 100644 crates/loro-internal/src/encoding/encode_enhanced.rs create mode 100644 crates/loro-internal/src/encoding/encode_reordered.rs delete mode 100644 crates/loro-internal/src/encoding/encode_snapshot.rs delete mode 100644 crates/loro-internal/src/encoding/encode_updates.rs delete mode 100644 
crates/loro-internal/src/utils/bitmap.rs create mode 100644 crates/loro-internal/src/utils/delta_rle_encoded_num.rs create mode 100644 crates/loro-internal/src/utils/id_int_map.rs diff --git a/.gitignore b/.gitignore index 78f0ec30..46b0bb55 100644 --- a/.gitignore +++ b/.gitignore @@ -7,3 +7,4 @@ dhat-heap.json .DS_Store node_modules/ .idea/ +lcov.info diff --git a/.vscode/settings.json b/.vscode/settings.json index acba475f..6902037f 100644 --- a/.vscode/settings.json +++ b/.vscode/settings.json @@ -1,11 +1,14 @@ { "cSpell.words": [ "arbtest", + "cids", "clippy", + "collab", "dhat", "flate", "gmax", "heapless", + "idspan", "insta", "Leeeon", "LOGSTORE", @@ -28,9 +31,13 @@ "RUST_BACKTRACE": "full", "DEBUG": "*" }, - "rust-analyzer.cargo.features": ["test_utils"], + "rust-analyzer.cargo.features": [ + "test_utils" + ], "editor.defaultFormatter": "rust-lang.rust-analyzer", - "rust-analyzer.server.extraEnv": { "RUSTUP_TOOLCHAIN": "stable" }, + "rust-analyzer.server.extraEnv": { + "RUSTUP_TOOLCHAIN": "stable" + }, "editor.formatOnSave": true, "todo-tree.general.tags": [ "BUG", @@ -46,7 +53,7 @@ "*.rs": "${capture}.excalidraw" }, "excalidraw.theme": "dark", - "deno.enable": false , + "deno.enable": false, "cortex-debug.variableUseNaturalFormat": true, "[markdown]": { "editor.defaultFormatter": "darkriszty.markdown-table-prettify" diff --git a/Cargo.lock b/Cargo.lock index cc6380e1..1bd4028f 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -94,6 +94,12 @@ dependencies = [ "rustc-demangle", ] +[[package]] +name = "base64" +version = "0.21.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "35636a1494ede3b646cc98f74f8e62c773a38a659ebc777a2cf26b9b74171df9" + [[package]] name = "bench-utils" version = "0.1.0" @@ -101,19 +107,11 @@ dependencies = [ "arbitrary", "enum-as-inner 0.5.1", "flate2", + "loro-common", "rand", "serde_json", ] -[[package]] -name = "benches" -version = "0.1.0" -dependencies = [ - "bench-utils", - "loro", - "tabled 0.15.0", -] - 
[[package]] name = "bit-set" version = "0.5.3" @@ -174,30 +172,14 @@ version = "0.3.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "37b2a672a2cb129a2e41c10b1224bb368f9f37a2b16b612598138befd7b37eb5" -[[package]] -name = "cbindgen" -version = "0.24.3" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "a6358dedf60f4d9b8db43ad187391afe959746101346fe51bb978126bec61dfb" -dependencies = [ - "clap", - "heck", - "indexmap", - "log", - "proc-macro2 1.0.67", - "quote 1.0.29", - "serde", - "serde_json", - "syn 1.0.107", - "tempfile", - "toml", -] - [[package]] name = "cc" -version = "1.0.78" +version = "1.0.83" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "a20104e2335ce8a659d6dd92a51a767a0c062599c73b343fd152cb401e828c3d" +checksum = "f1174fb0b6ec23863f8b971027804a42614e347eafb0a95bf0b12cdae21fc4d0" +dependencies = [ + "libc", +] [[package]] name = "cfg-if" @@ -238,12 +220,9 @@ version = "3.2.23" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "71655c45cb9845d3270c9d6df84ebe72b4dad3c2ba3f7023ad47c144e4e473a5" dependencies = [ - "atty", "bitflags", "clap_lex", "indexmap", - "strsim", - "termcolor", "textwrap", ] @@ -273,6 +252,16 @@ dependencies = [ "termcolor", ] +[[package]] +name = "color-backtrace" +version = "0.6.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "150fd80a270c0671379f388c8204deb6a746bb4eac8a6c03fe2460b2c0127ea0" +dependencies = [ + "backtrace", + "termcolor", +] + [[package]] name = "console_error_panic_hook" version = "0.1.7" @@ -364,7 +353,7 @@ dependencies = [ "autocfg", "cfg-if", "crossbeam-utils", - "memoffset 0.7.1", + "memoffset", "scopeguard", ] @@ -387,6 +376,16 @@ dependencies = [ "syn 1.0.107", ] +[[package]] +name = "ctor" +version = "0.2.6" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "30d2b3721e861707777e3195b0158f950ae6dc4a27e4d02ff9f67e3eb3de199e" +dependencies 
= [ + "quote 1.0.29", + "syn 2.0.41", +] + [[package]] name = "darling" version = "0.20.3" @@ -408,7 +407,7 @@ dependencies = [ "proc-macro2 1.0.67", "quote 1.0.29", "strsim", - "syn 2.0.25", + "syn 2.0.41", ] [[package]] @@ -419,7 +418,7 @@ checksum = "836a9bbc7ad63342d6d6e7b815ccab164bc77a2d95d84bc3117a8c0d5c98e2d5" dependencies = [ "darling_core", "quote 1.0.29", - "syn 2.0.25", + "syn 2.0.41", ] [[package]] @@ -440,7 +439,7 @@ checksum = "53e0efad4403bfc52dc201159c4b842a246a14b98c64b55dfd0f2d89729dfeb8" dependencies = [ "proc-macro2 1.0.67", "quote 1.0.29", - "syn 2.0.25", + "syn 2.0.41", ] [[package]] @@ -486,7 +485,7 @@ dependencies = [ "heck", "proc-macro2 1.0.67", "quote 1.0.29", - "syn 2.0.25", + "syn 2.0.41", ] [[package]] @@ -501,6 +500,19 @@ dependencies = [ "syn 1.0.107", ] +[[package]] +name = "examples" +version = "0.1.0" +dependencies = [ + "arbitrary", + "bench-utils", + "color-backtrace 0.6.1", + "ctor 0.2.6", + "debug-log", + "loro", + "tabled 0.15.0", +] + [[package]] name = "fastrand" version = "1.8.0" @@ -656,12 +668,6 @@ dependencies = [ "hashbrown", ] -[[package]] -name = "indoc" -version = "1.0.8" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "da2d6f23ffea9d7e76c53eee25dfb67bcd8fde7f1198b0855350698c9f07c780" - [[package]] name = "instant" version = "0.1.12" @@ -710,6 +716,12 @@ version = "1.4.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "e2abad23fbc42b3700f2f279844dc832adb2b2eb069b2df918f455c4e18cc646" +[[package]] +name = "leb128" +version = "0.2.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "884e2677b40cc8c339eaefcb701c32ef1fd2493d71118dc0ca4b6a736c93bd67" + [[package]] name = "libc" version = "0.2.147" @@ -755,19 +767,12 @@ dependencies = [ "js-sys", "loro-rle", "serde", + "serde_columnar", "string_cache", "thiserror", "wasm-bindgen", ] -[[package]] -name = "loro-ffi" -version = "0.1.0" -dependencies = [ - "cbindgen", - "loro-internal", 
-] - [[package]] name = "loro-internal" version = "0.2.2" @@ -776,10 +781,11 @@ dependencies = [ "arbitrary", "arbtest", "arref", + "base64", "bench-utils", - "color-backtrace", + "color-backtrace 0.5.1", "criterion", - "ctor", + "ctor 0.1.26", "debug-log", "dhat", "enum-as-inner 0.5.1", @@ -790,11 +796,15 @@ dependencies = [ "im", "itertools 0.11.0", "js-sys", + "leb128", "loro-common", "loro-preload", "loro-rle", + "md5", "miniz_oxide 0.7.1", "num", + "num-derive", + "num-traits", "once_cell", "postcard", "proptest", @@ -828,8 +838,8 @@ version = "0.1.0" dependencies = [ "append-only-bytes", "arref", - "color-backtrace", - "ctor", + "color-backtrace 0.5.1", + "ctor 0.1.26", "debug-log", "enum-as-inner 0.6.0", "fxhash", @@ -862,21 +872,18 @@ dependencies = [ "wasm-bindgen", ] +[[package]] +name = "md5" +version = "0.7.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "490cc448043f947bae3cbee9c203358d62dbee0db12107a74be5c30ccfd09771" + [[package]] name = "memchr" version = "2.5.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "2dffe52ecf27772e601905b7522cb4ef790d2cc203488bbd0e2fe85fcb74566d" -[[package]] -name = "memoffset" -version = "0.6.5" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "5aa361d4faea93603064a027415f07bd8e1d5c88c9fbf68bf56a285428fd79ce" -dependencies = [ - "autocfg", -] - [[package]] name = "memoffset" version = "0.7.1" @@ -922,9 +929,9 @@ checksum = "e4a24736216ec316047a1fc4252e27dabb04218aa4a3f37c6e7ddbf1f9782b54" [[package]] name = "num" -version = "0.4.0" +version = "0.4.1" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "43db66d1170d347f9a065114077f7dccb00c1b9478c89384490a3425279a4606" +checksum = "b05180d69e3da0e530ba2a1dae5110317e49e3b7f3d41be227dc5f92e49ee7af" dependencies = [ "num-bigint", "num-complex", @@ -947,13 +954,24 @@ dependencies = [ [[package]] name = "num-complex" -version = "0.4.2" +version = "0.4.4" source 
= "registry+https://github.com/rust-lang/crates.io-index" -checksum = "7ae39348c8bc5fbd7f40c727a9925f03517afd2ab27d46702108b6a7e5414c19" +checksum = "1ba157ca0885411de85d6ca030ba7e2a83a28636056c7c699b07c8b6f7383214" dependencies = [ "num-traits", ] +[[package]] +name = "num-derive" +version = "0.3.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "876a53fff98e03a936a674b29568b0e605f06b29372c2489ff4de23f1949743d" +dependencies = [ + "proc-macro2 1.0.67", + "quote 1.0.29", + "syn 1.0.107", +] + [[package]] name = "num-integer" version = "0.1.45" @@ -1211,74 +1229,6 @@ dependencies = [ "syn 0.15.44", ] -[[package]] -name = "pyloro" -version = "0.1.0" -dependencies = [ - "loro-internal", - "pyo3", -] - -[[package]] -name = "pyo3" -version = "0.17.3" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "268be0c73583c183f2b14052337465768c07726936a260f480f0857cb95ba543" -dependencies = [ - "cfg-if", - "indoc", - "libc", - "memoffset 0.6.5", - "parking_lot", - "pyo3-build-config", - "pyo3-ffi", - "pyo3-macros", - "unindent", -] - -[[package]] -name = "pyo3-build-config" -version = "0.17.3" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "28fcd1e73f06ec85bf3280c48c67e731d8290ad3d730f8be9dc07946923005c8" -dependencies = [ - "once_cell", - "target-lexicon", -] - -[[package]] -name = "pyo3-ffi" -version = "0.17.3" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "0f6cb136e222e49115b3c51c32792886defbfb0adead26a688142b346a0b9ffc" -dependencies = [ - "libc", - "pyo3-build-config", -] - -[[package]] -name = "pyo3-macros" -version = "0.17.3" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "94144a1266e236b1c932682136dc35a9dee8d3589728f68130c7c3861ef96b28" -dependencies = [ - "proc-macro2 1.0.67", - "pyo3-macros-backend", - "quote 1.0.29", - "syn 1.0.107", -] - -[[package]] -name = "pyo3-macros-backend" -version = "0.17.3" -source = 
"registry+https://github.com/rust-lang/crates.io-index" -checksum = "c8df9be978a2d2f0cdebabb03206ed73b11314701a5bfe71b0d753b81997777f" -dependencies = [ - "proc-macro2 1.0.67", - "quote 1.0.29", - "syn 1.0.107", -] - [[package]] name = "quick-error" version = "1.2.3" @@ -1514,7 +1464,7 @@ dependencies = [ "darling", "proc-macro2 1.0.67", "quote 1.0.29", - "syn 2.0.25", + "syn 2.0.41", ] [[package]] @@ -1525,7 +1475,7 @@ checksum = "389894603bd18c46fa56231694f8d827779c0951a667087194cf9de94ed24682" dependencies = [ "proc-macro2 1.0.67", "quote 1.0.29", - "syn 2.0.25", + "syn 2.0.41", ] [[package]] @@ -1640,9 +1590,9 @@ dependencies = [ [[package]] name = "syn" -version = "2.0.25" +version = "2.0.41" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "15e3fc8c0c74267e2df136e5e5fb656a464158aa57624053375eb9c8c6e25ae2" +checksum = "44c8b28c477cc3bf0e7966561e3460130e1255f7a1cf71931075f1c5e7a7e269" dependencies = [ "proc-macro2 1.0.67", "quote 1.0.29", @@ -1707,12 +1657,6 @@ dependencies = [ "syn 1.0.107", ] -[[package]] -name = "target-lexicon" -version = "0.12.5" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "9410d0f6853b1d94f0e519fb95df60f29d2c1eff2d921ffdf01a4c8a3b54f12d" - [[package]] name = "tempfile" version = "3.3.0" @@ -1759,7 +1703,7 @@ checksum = "463fe12d7993d3b327787537ce8dd4dfa058de32fc2b195ef3cde03dc4771e8f" dependencies = [ "proc-macro2 1.0.67", "quote 1.0.29", - "syn 2.0.25", + "syn 2.0.41", ] [[package]] @@ -1778,15 +1722,6 @@ dependencies = [ "serde_json", ] -[[package]] -name = "toml" -version = "0.5.10" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "1333c76748e868a4d9d1017b5ab53171dfd095f70c712fdb4653a406547f598f" -dependencies = [ - "serde", -] - [[package]] name = "typenum" version = "1.16.0" @@ -1811,12 +1746,6 @@ version = "0.1.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = 
"fc72304796d0818e357ead4e000d19c9c174ab23dc11093ac919054d20a6a7fc" -[[package]] -name = "unindent" -version = "0.1.11" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "e1766d682d402817b5ac4490b3c3002d91dfa0d22812f341609f97b08757359c" - [[package]] name = "version_check" version = "0.9.4" @@ -1870,7 +1799,7 @@ dependencies = [ "once_cell", "proc-macro2 1.0.67", "quote 1.0.29", - "syn 2.0.25", + "syn 2.0.41", "wasm-bindgen-shared", ] @@ -1892,7 +1821,7 @@ checksum = "e128beba882dd1eb6200e1dc92ae6c5dbaa4311aa7bb211ca035779e5efc39f8" dependencies = [ "proc-macro2 1.0.67", "quote 1.0.29", - "syn 2.0.25", + "syn 2.0.41", "wasm-bindgen-backend", "wasm-bindgen-shared", ] diff --git a/Cargo.toml b/Cargo.toml index c66a4d73..edca2e06 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -1,2 +1,11 @@ [workspace] -members = ["crates/*"] +members = [ + "crates/loro", + "crates/examples", + "crates/bench-utils", + "crates/rle", + "crates/loro-common", + "crates/loro-internal", + "crates/loro-preload", + "crates/loro-wasm", +] diff --git a/crates/bench-utils/Cargo.toml b/crates/bench-utils/Cargo.toml index 1b22a721..5a5874da 100644 --- a/crates/bench-utils/Cargo.toml +++ b/crates/bench-utils/Cargo.toml @@ -7,6 +7,7 @@ edition = "2021" [dependencies] arbitrary = { version = "1.2.0", features = ["derive"] } +loro-common = { path = "../loro-common" } enum-as-inner = "0.5.1" flate2 = "1.0.25" rand = "0.8.5" diff --git a/crates/bench-utils/src/draw.rs b/crates/bench-utils/src/draw.rs index d1e34545..9b4b5afa 100644 --- a/crates/bench-utils/src/draw.rs +++ b/crates/bench-utils/src/draw.rs @@ -1,12 +1,14 @@ use arbitrary::Arbitrary; -#[derive(Debug, Arbitrary, PartialEq, Eq)] +use crate::ActionTrait; + +#[derive(Debug, Arbitrary, PartialEq, Eq, Clone)] pub struct Point { pub x: i32, pub y: i32, } -#[derive(Debug, Arbitrary, PartialEq, Eq)] +#[derive(Debug, Arbitrary, PartialEq, Eq, Clone)] pub enum DrawAction { CreatePath { points: Vec, @@ -25,3 +27,38 @@ pub enum 
DrawAction { relative_to: Point, }, } + +impl DrawAction { + pub const MAX_X: i32 = 1_000_000; + pub const MAX_Y: i32 = 1_000_000; + pub const MAX_MOVE: i32 = 200; +} + +impl ActionTrait for DrawAction { + fn normalize(&mut self) { + match self { + DrawAction::CreatePath { points } => { + for point in points { + point.x %= Self::MAX_X; + point.y %= Self::MAX_Y; + } + } + DrawAction::Text { pos, size, .. } => { + pos.x %= Self::MAX_X; + pos.y %= Self::MAX_Y; + size.x %= Self::MAX_X; + size.y %= Self::MAX_Y; + } + DrawAction::CreateRect { pos, size } => { + pos.x %= Self::MAX_X; + pos.y %= Self::MAX_Y; + size.x %= Self::MAX_X; + size.y %= Self::MAX_Y; + } + DrawAction::Move { relative_to, .. } => { + relative_to.x %= Self::MAX_MOVE; + relative_to.y %= Self::MAX_MOVE; + } + } + } +} diff --git a/crates/bench-utils/src/json.rs b/crates/bench-utils/src/json.rs new file mode 100644 index 00000000..0f552e99 --- /dev/null +++ b/crates/bench-utils/src/json.rs @@ -0,0 +1,78 @@ +use std::sync::Arc; + +use arbitrary::{Arbitrary, Unstructured}; +pub use loro_common::LoroValue; + +use crate::ActionTrait; + +#[derive(Arbitrary, Debug, PartialEq, Eq, Clone)] +pub enum JsonAction { + InsertMap { + key: String, + value: LoroValue, + }, + InsertList { + #[arbitrary(with = |u: &mut Unstructured| u.int_in_range(0..=1024))] + index: usize, + value: LoroValue, + }, + DeleteList { + #[arbitrary(with = |u: &mut Unstructured| u.int_in_range(0..=1024))] + index: usize, + }, + InsertText { + #[arbitrary(with = |u: &mut Unstructured| u.int_in_range(0..=1024))] + index: usize, + s: String, + }, + DeleteText { + #[arbitrary(with = |u: &mut Unstructured| u.int_in_range(0..=1024))] + index: usize, + #[arbitrary(with = |u: &mut Unstructured| u.int_in_range(0..=128))] + len: usize, + }, +} + +const MAX_LEN: usize = 1000; +impl ActionTrait for JsonAction { + fn normalize(&mut self) { + match self { + JsonAction::InsertMap { key: _, value } => { + normalize_value(value); + } + + JsonAction::InsertList 
{ index: _, value } => { + normalize_value(value); + } + JsonAction::DeleteList { index } => { + *index %= MAX_LEN; + } + JsonAction::InsertText { .. } => {} + JsonAction::DeleteText { .. } => {} + } + } +} + +fn normalize_value(value: &mut LoroValue) { + match value { + LoroValue::Double(f) => { + if f.is_nan() { + *f = 0.0; + } + } + LoroValue::List(l) => { + for v in Arc::make_mut(l).iter_mut() { + normalize_value(v); + } + } + LoroValue::Map(m) => { + for (_, v) in Arc::make_mut(m).iter_mut() { + normalize_value(v); + } + } + LoroValue::Container(_) => { + *value = LoroValue::Null; + } + _ => {} + } +} diff --git a/crates/bench-utils/src/lib.rs b/crates/bench-utils/src/lib.rs index f006e88d..49534cb2 100644 --- a/crates/bench-utils/src/lib.rs +++ b/crates/bench-utils/src/lib.rs @@ -1,9 +1,14 @@ pub mod draw; +pub mod json; pub mod sheet; + use arbitrary::{Arbitrary, Unstructured}; use enum_as_inner::EnumAsInner; use rand::{RngCore, SeedableRng}; -use std::io::Read; +use std::{ + io::Read, + sync::{atomic::AtomicUsize, Arc}, +}; use flate2::read::GzDecoder; use serde_json::Value; @@ -15,6 +20,10 @@ pub struct TextAction { pub del: usize, } +pub trait ActionTrait: Clone + std::fmt::Debug { + fn normalize(&mut self); +} + pub fn get_automerge_actions() -> Vec { const RAW_DATA: &[u8; 901823] = include_bytes!("../../loro-internal/benches/automerge-paper.json.gz"); @@ -46,19 +55,34 @@ pub fn get_automerge_actions() -> Vec { actions } -#[derive(Debug, EnumAsInner, Arbitrary, PartialEq, Eq)] +#[derive(Debug, EnumAsInner, Arbitrary, PartialEq, Eq, Clone, Copy)] +pub enum SyncKind { + Fit, + Snapshot, + OnlyLastOpFromEachPeer, +} + +#[derive(Debug, EnumAsInner, Arbitrary, PartialEq, Eq, Clone)] pub enum Action { - Action { peer: usize, action: T }, - Sync { from: usize, to: usize }, + Action { + peer: usize, + action: T, + }, + Sync { + from: usize, + to: usize, + kind: SyncKind, + }, SyncAll, } -pub fn gen_realtime_actions<'a, T: Arbitrary<'a>>( +pub fn 
gen_realtime_actions<'a, T: Arbitrary<'a> + ActionTrait>( action_num: usize, peer_num: usize, seed: &'a [u8], mut preprocess: impl FnMut(&mut Action), ) -> Result>, Box> { + let mut seed_offset = 0; let mut arb = Unstructured::new(seed); let mut ans = Vec::new(); let mut last_sync_all = 0; @@ -67,17 +91,23 @@ pub fn gen_realtime_actions<'a, T: Arbitrary<'a>>( break; } - let mut action: Action = arb - .arbitrary() - .map_err(|e| e.to_string().into_boxed_str())?; + let mut action: Action = match arb.arbitrary() { + Ok(a) => a, + Err(_) => { + seed_offset += 1; + arb = Unstructured::new(&seed[seed_offset % seed.len()..]); + arb.arbitrary().unwrap() + } + }; match &mut action { - Action::Action { peer, .. } => { + Action::Action { peer, action } => { + action.normalize(); *peer %= peer_num; } Action::SyncAll => { last_sync_all = i; } - Action::Sync { from, to } => { + Action::Sync { from, to, .. } => { *from %= peer_num; *to %= peer_num; } @@ -94,13 +124,14 @@ pub fn gen_realtime_actions<'a, T: Arbitrary<'a>>( Ok(ans) } -pub fn gen_async_actions<'a, T: Arbitrary<'a>>( +pub fn gen_async_actions<'a, T: Arbitrary<'a> + ActionTrait>( action_num: usize, peer_num: usize, seed: &'a [u8], actions_before_sync: usize, mut preprocess: impl FnMut(&mut Action), ) -> Result>, Box> { + let mut seed_offset = 0; let mut arb = Unstructured::new(seed); let mut ans = Vec::new(); let mut last_sync_all = 0; @@ -110,14 +141,16 @@ pub fn gen_async_actions<'a, T: Arbitrary<'a>>( } if arb.is_empty() { - return Err("not enough actions".into()); + seed_offset += 1; + arb = Unstructured::new(&seed[seed_offset % seed.len()..]); } let mut action: Action = arb .arbitrary() .map_err(|e| e.to_string().into_boxed_str())?; match &mut action { - Action::Action { peer, .. 
} => { + Action::Action { peer, action } => { + action.normalize(); *peer %= peer_num; } Action::SyncAll => { @@ -127,7 +160,7 @@ pub fn gen_async_actions<'a, T: Arbitrary<'a>>( last_sync_all = ans.len(); } - Action::Sync { from, to } => { + Action::Sync { from, to, .. } => { *from %= peer_num; *to %= peer_num; } @@ -140,6 +173,99 @@ pub fn gen_async_actions<'a, T: Arbitrary<'a>>( Ok(ans) } +pub fn preprocess_actions( + peer_num: usize, + actions: &[Action], + mut should_skip: impl FnMut(&Action) -> bool, + mut preprocess: impl FnMut(&mut Action), +) -> Vec> { + let mut ans = Vec::new(); + for action in actions { + let mut action = action.clone(); + match &mut action { + Action::Action { peer, .. } => { + *peer %= peer_num; + } + Action::Sync { from, to, .. } => { + *from %= peer_num; + *to %= peer_num; + } + Action::SyncAll => {} + } + + if should_skip(&action) { + continue; + } + + let mut action: Action<_> = action.clone(); + preprocess(&mut action); + ans.push(action.clone()); + } + + ans +} + +pub fn make_actions_realtime(peer_num: usize, actions: &[Action]) -> Vec> { + let since_last_sync_all = Arc::new(AtomicUsize::new(0)); + let since_last_sync_all_2 = since_last_sync_all.clone(); + preprocess_actions( + peer_num, + actions, + |action| match action { + Action::SyncAll => { + since_last_sync_all.store(0, std::sync::atomic::Ordering::Relaxed); + false + } + _ => { + since_last_sync_all.fetch_add(1, std::sync::atomic::Ordering::Relaxed); + false + } + }, + |action| { + if since_last_sync_all_2.load(std::sync::atomic::Ordering::Relaxed) > 10 { + *action = Action::SyncAll; + } + }, + ) +} + +pub fn make_actions_async( + peer_num: usize, + actions: &[Action], + sync_all_interval: usize, +) -> Vec> { + let since_last_sync_all = Arc::new(AtomicUsize::new(0)); + let since_last_sync_all_2 = since_last_sync_all.clone(); + preprocess_actions( + peer_num, + actions, + |action| match action { + Action::SyncAll => { + let last = 
since_last_sync_all.load(std::sync::atomic::Ordering::Relaxed); + if last < sync_all_interval { + true + } else { + since_last_sync_all.store(0, std::sync::atomic::Ordering::Relaxed); + false + } + } + _ => { + since_last_sync_all.fetch_add(1, std::sync::atomic::Ordering::Relaxed); + false + } + }, + |action| { + if since_last_sync_all_2.load(std::sync::atomic::Ordering::Relaxed) > 10 { + *action = Action::SyncAll; + } + + if let Action::Action { action, .. } = action { + action.normalize(); + } + }, + ) +} + pub fn create_seed(seed: u64, size: usize) -> Vec { let mut rng = rand::rngs::StdRng::seed_from_u64(seed); let mut ans = vec![0; size]; diff --git a/crates/bench-utils/src/sheet.rs b/crates/bench-utils/src/sheet.rs index e5c8810b..c2e9a1dc 100644 --- a/crates/bench-utils/src/sheet.rs +++ b/crates/bench-utils/src/sheet.rs @@ -1,6 +1,8 @@ use arbitrary::Arbitrary; -#[derive(Debug, Arbitrary, PartialEq, Eq)] +use crate::ActionTrait; + +#[derive(Debug, Clone, Arbitrary, PartialEq, Eq)] pub enum SheetAction { SetValue { row: usize, @@ -15,12 +17,10 @@ pub enum SheetAction { }, } -impl SheetAction { - pub const MAX_ROW: usize = 1_048_576; - pub const MAX_COL: usize = 16_384; +impl ActionTrait for SheetAction { /// Excel has a limit of 1,048,576 rows and 16,384 columns per sheet. - // We need to normalize the action to fit the limit. - pub fn normalize(&mut self) { + /// We need to normalize the action to fit the limit. + fn normalize(&mut self) { match self { SheetAction::SetValue { row, col, .. 
} => { *row %= Self::MAX_ROW; @@ -35,3 +35,8 @@ impl SheetAction { } } } + +impl SheetAction { + pub const MAX_ROW: usize = 1_048_576; + pub const MAX_COL: usize = 16_384; +} diff --git a/crates/benches/src/lib.rs b/crates/benches/src/lib.rs deleted file mode 100644 index 196b7f0e..00000000 --- a/crates/benches/src/lib.rs +++ /dev/null @@ -1,2 +0,0 @@ -pub mod draw; -pub mod sheet; diff --git a/crates/benches/Cargo.toml b/crates/examples/Cargo.toml similarity index 56% rename from crates/benches/Cargo.toml rename to crates/examples/Cargo.toml index 77fced6f..8a1ec485 100644 --- a/crates/benches/Cargo.toml +++ b/crates/examples/Cargo.toml @@ -1,5 +1,5 @@ [package] -name = "benches" +name = "examples" version = "0.1.0" edition = "2021" @@ -9,3 +9,9 @@ edition = "2021" bench-utils = { path = "../bench-utils" } loro = { path = "../loro" } tabled = "0.15.0" +arbitrary = { version = "1.3.0", features = ["derive"] } +debug-log = { version = "0.2", features = [] } + +[dev-dependencies] +color-backtrace = { version = "0.6" } +ctor = "0.2" diff --git a/crates/benches/examples/draw.rs b/crates/examples/examples/draw.rs similarity index 93% rename from crates/benches/examples/draw.rs rename to crates/examples/examples/draw.rs index 391fbaca..5c9a0b31 100644 --- a/crates/benches/examples/draw.rs +++ b/crates/examples/examples/draw.rs @@ -1,6 +1,6 @@ use std::time::Instant; -use benches::draw::{run_async_draw_workflow, run_realtime_collab_draw_workflow}; +use examples::{draw::DrawActor, run_async_workflow, run_realtime_collab_workflow}; use loro::LoroDoc; use tabled::{settings::Style, Table, Tabled}; @@ -52,7 +52,7 @@ fn run_async(peer_num: usize, action_num: usize, seed: u64) -> BenchResult { "run_async(peer_num: {}, action_num: {})", peer_num, action_num ); - let (mut actors, start) = run_async_draw_workflow(peer_num, action_num, 200, seed); + let (mut actors, start) = run_async_workflow::(peer_num, action_num, 200, seed); actors.sync_all(); let apply_duration = 
start.elapsed().as_secs_f64() * 1000.; @@ -96,7 +96,7 @@ fn run_realtime_collab(peer_num: usize, action_num: usize, seed: u64) -> BenchRe "run_realtime_collab(peer_num: {}, action_num: {})", peer_num, action_num ); - let (mut actors, start) = run_realtime_collab_draw_workflow(peer_num, action_num, seed); + let (mut actors, start) = run_realtime_collab_workflow::<DrawActor>(peer_num, action_num, seed); actors.sync_all(); let apply_duration = start.elapsed().as_secs_f64() * 1000.; diff --git a/crates/benches/examples/init_sheet.rs b/crates/examples/examples/init_sheet.rs similarity index 92% rename from crates/benches/examples/init_sheet.rs rename to crates/examples/examples/init_sheet.rs index 16f1a158..1770753a 100644 --- a/crates/benches/examples/init_sheet.rs +++ b/crates/examples/examples/init_sheet.rs @@ -1,4 +1,4 @@ -use benches::sheet::init_sheet; +use examples::sheet::init_sheet; use std::time::Instant; pub fn main() { diff --git a/crates/examples/fuzz/.gitignore b/crates/examples/fuzz/.gitignore new file mode 100644 index 00000000..1a45eee7 --- /dev/null +++ b/crates/examples/fuzz/.gitignore @@ -0,0 +1,4 @@ +target +corpus +artifacts +coverage diff --git a/crates/examples/fuzz/Cargo.lock b/crates/examples/fuzz/Cargo.lock new file mode 100644 index 00000000..1c8bd864 --- /dev/null +++ b/crates/examples/fuzz/Cargo.lock @@ -0,0 +1,1056 @@ +# This file is automatically @generated by Cargo. +# It is not intended for manual editing. 
+version = 3 + +[[package]] +name = "adler" +version = "1.0.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "f26201604c87b1e01bd3d98f8d5d9a8fcbb815e8cedb41ffccbeb4bf593a35fe" + +[[package]] +name = "append-only-bytes" +version = "0.1.12" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ac436601d6bdde674a0d7fb593e829ffe7b3387c351b356dd20e2d40f5bf3ee5" + +[[package]] +name = "arbitrary" +version = "1.3.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "7d5a26814d8dcb93b0e5a0ff3c6d80a8843bafb21b39e8e18a6f05471870e110" +dependencies = [ + "derive_arbitrary", +] + +[[package]] +name = "arref" +version = "0.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "2ccd462b64c3c72f1be8305905a85d85403d768e8690c9b8bd3b9009a5761679" + +[[package]] +name = "atomic-polyfill" +version = "1.0.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "8cf2bce30dfe09ef0bfaef228b9d414faaf7e563035494d7fe092dba54b300f4" +dependencies = [ + "critical-section", +] + +[[package]] +name = "autocfg" +version = "1.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d468802bab17cbc0cc575e9b053f41e72aa36bfa6b7f55e3529ffa43161b97fa" + +[[package]] +name = "bench-utils" +version = "0.1.0" +dependencies = [ + "arbitrary", + "enum-as-inner 0.5.1", + "flate2", + "loro-common", + "rand", + "serde_json", +] + +[[package]] +name = "benches-fuzz" +version = "0.0.0" +dependencies = [ + "bench-utils", + "examples", + "libfuzzer-sys", +] + +[[package]] +name = "bitflags" +version = "1.3.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "bef38d45163c2f1dde094a7dfd33ccf595c92905c8f8f4fdc18d06fb1037718a" + +[[package]] +name = "bitmaps" +version = "2.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "031043d04099746d8db04daf1fa424b2bc8bd69d92b25962dcde24da39ab64a2" 
+dependencies = [ + "typenum", +] + +[[package]] +name = "bytecount" +version = "0.6.7" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e1e5f035d16fc623ae5f74981db80a439803888314e3a555fd6f04acd51a3205" + +[[package]] +name = "byteorder" +version = "1.5.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "1fd0f2584146f6f2ef48085050886acf353beff7305ebd1ae69500e27c67f64b" + +[[package]] +name = "bytes" +version = "1.5.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "a2bd12c1caf447e69cd4528f47f94d203fd2582878ecb9e9465484c4148a8223" + +[[package]] +name = "cc" +version = "1.0.83" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "f1174fb0b6ec23863f8b971027804a42614e347eafb0a95bf0b12cdae21fc4d0" +dependencies = [ + "jobserver", + "libc", +] + +[[package]] +name = "cfg-if" +version = "1.0.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "baf1de4339761588bc0619e3cbc0120ee582ebb74b53b4efbf79117bd2da40fd" + +[[package]] +name = "cobs" +version = "0.2.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "67ba02a97a2bd10f4b59b25c7973101c79642302776489e030cd13cdab09ed15" + +[[package]] +name = "crc32fast" +version = "1.3.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b540bd8bc810d3885c6ea91e2018302f68baba2129ab3e88f32389ee9370880d" +dependencies = [ + "cfg-if", +] + +[[package]] +name = "critical-section" +version = "1.1.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "7059fff8937831a9ae6f0fe4d658ffabf58f2ca96aa9dec1c889f936f705f216" + +[[package]] +name = "darling" +version = "0.20.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "0209d94da627ab5605dcccf08bb18afa5009cfbef48d8a8b7d7bdbc79be25c5e" +dependencies = [ + "darling_core", + "darling_macro", +] + +[[package]] +name = "darling_core" +version = 
"0.20.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "177e3443818124b357d8e76f53be906d60937f0d3a90773a664fa63fa253e621" +dependencies = [ + "fnv", + "ident_case", + "proc-macro2", + "quote", + "strsim", + "syn 2.0.43", +] + +[[package]] +name = "darling_macro" +version = "0.20.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "836a9bbc7ad63342d6d6e7b815ccab164bc77a2d95d84bc3117a8c0d5c98e2d5" +dependencies = [ + "darling_core", + "quote", + "syn 2.0.43", +] + +[[package]] +name = "debug-log" +version = "0.2.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b90f9d9c0c144c4aa35a874e362392ec6aef3b9291c882484235069540b26c73" +dependencies = [ + "once_cell", +] + +[[package]] +name = "derive_arbitrary" +version = "1.3.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "67e77553c4162a157adbf834ebae5b415acbecbeafc7a74b0e886657506a7611" +dependencies = [ + "proc-macro2", + "quote", + "syn 2.0.43", +] + +[[package]] +name = "either" +version = "1.9.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "a26ae43d7bcc3b814de94796a5e736d4029efb0ee900c12e2d54c993ad1a1e07" + +[[package]] +name = "embedded-io" +version = "0.4.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ef1a6892d9eef45c8fa6b9e0086428a2cca8491aca8f787c534a3d6d0bcb3ced" + +[[package]] +name = "enum-as-inner" +version = "0.5.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "c9720bba047d567ffc8a3cba48bf19126600e249ab7f128e9233e6376976a116" +dependencies = [ + "heck", + "proc-macro2", + "quote", + "syn 1.0.109", +] + +[[package]] +name = "enum-as-inner" +version = "0.6.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "5ffccbb6966c05b32ef8fbac435df276c4ae4d3dc55a8cd0eb9745e6c12f546a" +dependencies = [ + "heck", + "proc-macro2", + "quote", + "syn 2.0.43", +] + 
+[[package]] +name = "enum_dispatch" +version = "0.3.12" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "8f33313078bb8d4d05a2733a94ac4c2d8a0df9a2b84424ebf4f33bfc224a890e" +dependencies = [ + "once_cell", + "proc-macro2", + "quote", + "syn 2.0.43", +] + +[[package]] +name = "examples" +version = "0.1.0" +dependencies = [ + "arbitrary", + "bench-utils", + "debug-log", + "loro", + "tabled", +] + +[[package]] +name = "flate2" +version = "1.0.28" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "46303f565772937ffe1d394a4fac6f411c6013172fadde9dcdb1e147a086940e" +dependencies = [ + "crc32fast", + "miniz_oxide", +] + +[[package]] +name = "fnv" +version = "1.0.7" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "3f9eec918d3f24069decb9af1554cad7c880e2da24a9afd88aca000531ab82c1" + +[[package]] +name = "fxhash" +version = "0.2.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "c31b6d751ae2c7f11320402d34e41349dd1016f8d5d45e48c4312bc8625af50c" +dependencies = [ + "byteorder", +] + +[[package]] +name = "generic-btree" +version = "0.8.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "3563af09b8ffaa7cc5541d9e167b90c429d89791e88887739680e1fcb4f69d01" +dependencies = [ + "fxhash", + "heapless", + "itertools", + "loro-thunderdome", + "proc-macro2", +] + +[[package]] +name = "getrandom" +version = "0.2.11" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "fe9006bed769170c11f845cf00c7c1e9092aeb3f268e007c3e760ac68008070f" +dependencies = [ + "cfg-if", + "libc", + "wasi", +] + +[[package]] +name = "hash32" +version = "0.2.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b0c35f58762feb77d74ebe43bdbc3210f09be9fe6742234d573bacc26ed92b67" +dependencies = [ + "byteorder", +] + +[[package]] +name = "heapless" +version = "0.7.17" +source = 
"registry+https://github.com/rust-lang/crates.io-index" +checksum = "cdc6457c0eb62c71aac4bc17216026d8410337c4126773b9c5daba343f17964f" +dependencies = [ + "atomic-polyfill", + "hash32", + "rustc_version", + "serde", + "spin", + "stable_deref_trait", +] + +[[package]] +name = "heck" +version = "0.4.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "95505c38b4572b2d910cecb0281560f54b440a19336cbbcb27bf6ce6adc6f5a8" + +[[package]] +name = "ident_case" +version = "1.0.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b9e0384b61958566e926dc50660321d12159025e767c18e043daf26b70104c39" + +[[package]] +name = "im" +version = "15.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "d0acd33ff0285af998aaf9b57342af478078f53492322fafc47450e09397e0e9" +dependencies = [ + "bitmaps", + "rand_core", + "rand_xoshiro", + "sized-chunks", + "typenum", + "version_check", +] + +[[package]] +name = "itertools" +version = "0.11.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b1c173a5686ce8bfa551b3563d0c2170bf24ca44da99c7ca4bfdab5418c3fe57" +dependencies = [ + "either", +] + +[[package]] +name = "itoa" +version = "1.0.10" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b1a46d1a171d865aa5f83f92695765caa047a9b4cbae2cbf37dbd613a793fd4c" + +[[package]] +name = "jobserver" +version = "0.1.27" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "8c37f63953c4c63420ed5fd3d6d398c719489b9f872b9fa683262f8edd363c7d" +dependencies = [ + "libc", +] + +[[package]] +name = "leb128" +version = "0.2.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "884e2677b40cc8c339eaefcb701c32ef1fd2493d71118dc0ca4b6a736c93bd67" + +[[package]] +name = "libc" +version = "0.2.151" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = 
"302d7ab3130588088d277783b1e2d2e10c9e9e4a16dd9050e6ec93fb3e7048f4" + +[[package]] +name = "libfuzzer-sys" +version = "0.4.7" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "a96cfd5557eb82f2b83fed4955246c988d331975a002961b07c81584d107e7f7" +dependencies = [ + "arbitrary", + "cc", + "once_cell", +] + +[[package]] +name = "lock_api" +version = "0.4.11" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "3c168f8615b12bc01f9c17e2eb0cc07dcae1940121185446edc3744920e8ef45" +dependencies = [ + "autocfg", + "scopeguard", +] + +[[package]] +name = "loro" +version = "0.2.2" +dependencies = [ + "either", + "enum-as-inner 0.6.0", + "loro-internal", +] + +[[package]] +name = "loro-common" +version = "0.1.0" +dependencies = [ + "arbitrary", + "enum-as-inner 0.6.0", + "fxhash", + "loro-rle", + "serde", + "serde_columnar", + "string_cache", + "thiserror", +] + +[[package]] +name = "loro-internal" +version = "0.2.2" +dependencies = [ + "append-only-bytes", + "arref", + "debug-log", + "enum-as-inner 0.5.1", + "enum_dispatch", + "fxhash", + "generic-btree", + "getrandom", + "im", + "itertools", + "leb128", + "loro-common", + "loro-preload", + "loro-rle", + "md5", + "num", + "num-derive", + "num-traits", + "once_cell", + "postcard", + "serde", + "serde_columnar", + "serde_json", + "smallvec", + "string_cache", + "thiserror", +] + +[[package]] +name = "loro-preload" +version = "0.1.0" +dependencies = [ + "bytes", + "loro-common", + "serde", + "serde_columnar", +] + +[[package]] +name = "loro-rle" +version = "0.1.0" +dependencies = [ + "append-only-bytes", + "arref", + "debug-log", + "enum-as-inner 0.6.0", + "fxhash", + "num", + "smallvec", +] + +[[package]] +name = "loro-thunderdome" +version = "0.6.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "3f3d053a135388e6b1df14e8af1212af5064746e9b87a06a345a7a779ee9695a" + +[[package]] +name = "md5" +version = "0.7.0" +source = 
"registry+https://github.com/rust-lang/crates.io-index" +checksum = "490cc448043f947bae3cbee9c203358d62dbee0db12107a74be5c30ccfd09771" + +[[package]] +name = "miniz_oxide" +version = "0.7.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e7810e0be55b428ada41041c41f32c9f1a42817901b4ccf45fa3d4b6561e74c7" +dependencies = [ + "adler", +] + +[[package]] +name = "new_debug_unreachable" +version = "1.0.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e4a24736216ec316047a1fc4252e27dabb04218aa4a3f37c6e7ddbf1f9782b54" + +[[package]] +name = "num" +version = "0.4.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b05180d69e3da0e530ba2a1dae5110317e49e3b7f3d41be227dc5f92e49ee7af" +dependencies = [ + "num-bigint", + "num-complex", + "num-integer", + "num-iter", + "num-rational", + "num-traits", +] + +[[package]] +name = "num-bigint" +version = "0.4.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "608e7659b5c3d7cba262d894801b9ec9d00de989e8a82bd4bef91d08da45cdc0" +dependencies = [ + "autocfg", + "num-integer", + "num-traits", +] + +[[package]] +name = "num-complex" +version = "0.4.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "1ba157ca0885411de85d6ca030ba7e2a83a28636056c7c699b07c8b6f7383214" +dependencies = [ + "num-traits", +] + +[[package]] +name = "num-derive" +version = "0.3.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "876a53fff98e03a936a674b29568b0e605f06b29372c2489ff4de23f1949743d" +dependencies = [ + "proc-macro2", + "quote", + "syn 1.0.109", +] + +[[package]] +name = "num-integer" +version = "0.1.45" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "225d3389fb3509a24c93f5c29eb6bde2586b98d9f016636dff58d7c6f7569cd9" +dependencies = [ + "autocfg", + "num-traits", +] + +[[package]] +name = "num-iter" +version = "0.1.43" +source = 
"registry+https://github.com/rust-lang/crates.io-index" +checksum = "7d03e6c028c5dc5cac6e2dec0efda81fc887605bb3d884578bb6d6bf7514e252" +dependencies = [ + "autocfg", + "num-integer", + "num-traits", +] + +[[package]] +name = "num-rational" +version = "0.4.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "0638a1c9d0a3c0914158145bc76cff373a75a627e6ecbfb71cbe6f453a5a19b0" +dependencies = [ + "autocfg", + "num-bigint", + "num-integer", + "num-traits", +] + +[[package]] +name = "num-traits" +version = "0.2.17" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "39e3200413f237f41ab11ad6d161bc7239c84dcb631773ccd7de3dfe4b5c267c" +dependencies = [ + "autocfg", +] + +[[package]] +name = "once_cell" +version = "1.19.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "3fdb12b2476b595f9358c5161aa467c2438859caa136dec86c26fdd2efe17b92" + +[[package]] +name = "papergrid" +version = "0.11.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9ad43c07024ef767f9160710b3a6773976194758c7919b17e63b863db0bdf7fb" +dependencies = [ + "bytecount", + "fnv", + "unicode-width", +] + +[[package]] +name = "parking_lot" +version = "0.12.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "3742b2c103b9f06bc9fff0a37ff4912935851bee6d36f3c02bcc755bcfec228f" +dependencies = [ + "lock_api", + "parking_lot_core", +] + +[[package]] +name = "parking_lot_core" +version = "0.9.9" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "4c42a9226546d68acdd9c0a280d17ce19bfe27a46bf68784e4066115788d008e" +dependencies = [ + "cfg-if", + "libc", + "redox_syscall", + "smallvec", + "windows-targets", +] + +[[package]] +name = "phf_shared" +version = "0.10.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "b6796ad771acdc0123d2a88dc428b5e38ef24456743ddb1744ed628f9815c096" +dependencies = [ + "siphasher", +] + +[[package]] 
+name = "postcard" +version = "1.0.8" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "a55c51ee6c0db07e68448e336cf8ea4131a620edefebf9893e759b2d793420f8" +dependencies = [ + "cobs", + "embedded-io", + "heapless", + "serde", +] + +[[package]] +name = "ppv-lite86" +version = "0.2.17" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "5b40af805b3121feab8a3c29f04d8ad262fa8e0561883e7653e024ae4479e6de" + +[[package]] +name = "precomputed-hash" +version = "0.1.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "925383efa346730478fb4838dbe9137d2a47675ad789c546d150a6e1dd4ab31c" + +[[package]] +name = "proc-macro-error" +version = "1.0.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "da25490ff9892aab3fcf7c36f08cfb902dd3e71ca0f9f9517bea02a73a5ce38c" +dependencies = [ + "proc-macro-error-attr", + "proc-macro2", + "quote", + "syn 1.0.109", + "version_check", +] + +[[package]] +name = "proc-macro-error-attr" +version = "1.0.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "a1be40180e52ecc98ad80b184934baf3d0d29f979574e439af5a55274b35f869" +dependencies = [ + "proc-macro2", + "quote", + "version_check", +] + +[[package]] +name = "proc-macro2" +version = "1.0.71" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "75cb1540fadbd5b8fbccc4dddad2734eba435053f725621c070711a14bb5f4b8" +dependencies = [ + "unicode-ident", +] + +[[package]] +name = "quote" +version = "1.0.33" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "5267fca4496028628a95160fc423a33e8b2e6af8a5302579e322e4b520293cae" +dependencies = [ + "proc-macro2", +] + +[[package]] +name = "rand" +version = "0.8.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "34af8d1a0e25924bc5b7c43c079c942339d8f0a8b57c39049bef581b46327404" +dependencies = [ + "libc", + "rand_chacha", + "rand_core", +] + 
+[[package]] +name = "rand_chacha" +version = "0.3.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e6c10a63a0fa32252be49d21e7709d4d4baf8d231c2dbce1eaa8141b9b127d88" +dependencies = [ + "ppv-lite86", + "rand_core", +] + +[[package]] +name = "rand_core" +version = "0.6.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ec0be4795e2f6a28069bec0b5ff3e2ac9bafc99e6a9a7dc3547996c5c816922c" +dependencies = [ + "getrandom", +] + +[[package]] +name = "rand_xoshiro" +version = "0.6.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "6f97cdb2a36ed4183de61b2f824cc45c9f1037f28afe0a322e9fff4c108b5aaa" +dependencies = [ + "rand_core", +] + +[[package]] +name = "redox_syscall" +version = "0.4.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "4722d768eff46b75989dd134e5c353f0d6296e5aaa3132e776cbdb56be7731aa" +dependencies = [ + "bitflags", +] + +[[package]] +name = "rustc_version" +version = "0.4.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "bfa0f585226d2e68097d4f95d113b15b83a82e819ab25717ec0590d9584ef366" +dependencies = [ + "semver", +] + +[[package]] +name = "ryu" +version = "1.0.16" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "f98d2aa92eebf49b69786be48e4477826b256916e84a57ff2a4f21923b48eb4c" + +[[package]] +name = "scopeguard" +version = "1.2.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "94143f37725109f92c262ed2cf5e59bce7498c01bcc1502d7b9afe439a4e9f49" + +[[package]] +name = "semver" +version = "1.0.20" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "836fa6a3e1e547f9a2c4040802ec865b5d85f4014efe00555d7090a3dcaa1090" + +[[package]] +name = "serde" +version = "1.0.193" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "25dd9975e68d0cb5aa1120c288333fc98731bd1dd12f561e468ea4728c042b89" +dependencies 
= [ + "serde_derive", +] + +[[package]] +name = "serde_columnar" +version = "0.3.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "a41a9a14c8a221abb13091da4d1075699999e6a12213283c452680a70376efd0" +dependencies = [ + "itertools", + "postcard", + "serde", + "serde_columnar_derive", + "thiserror", +] + +[[package]] +name = "serde_columnar_derive" +version = "0.3.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "a0f77bad2a9b92970e7e1f8004fac293328ac9a05f92f751ae293644d764ede4" +dependencies = [ + "darling", + "proc-macro2", + "quote", + "syn 2.0.43", +] + +[[package]] +name = "serde_derive" +version = "1.0.193" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "43576ca501357b9b071ac53cdc7da8ef0cbd9493d8df094cd821777ea6e894d3" +dependencies = [ + "proc-macro2", + "quote", + "syn 2.0.43", +] + +[[package]] +name = "serde_json" +version = "1.0.108" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "3d1c7e3eac408d115102c4c24ad393e0821bb3a5df4d506a80f85f7a742a526b" +dependencies = [ + "itoa", + "ryu", + "serde", +] + +[[package]] +name = "siphasher" +version = "0.3.11" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "38b58827f4464d87d377d175e90bf58eb00fd8716ff0a62f80356b5e61555d0d" + +[[package]] +name = "sized-chunks" +version = "0.6.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "16d69225bde7a69b235da73377861095455d298f2b970996eec25ddbb42b3d1e" +dependencies = [ + "bitmaps", + "typenum", +] + +[[package]] +name = "smallvec" +version = "1.11.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "4dccd0940a2dcdf68d092b8cbab7dc0ad8fa938bf95787e1b916b0e3d0e8e970" +dependencies = [ + "serde", +] + +[[package]] +name = "spin" +version = "0.9.8" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = 
"6980e8d7511241f8acf4aebddbb1ff938df5eebe98691418c4468d0b72a96a67" +dependencies = [ + "lock_api", +] + +[[package]] +name = "stable_deref_trait" +version = "1.2.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "a8f112729512f8e442d81f95a8a7ddf2b7c6b8a1a6f509a95864142b30cab2d3" + +[[package]] +name = "string_cache" +version = "0.8.7" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "f91138e76242f575eb1d3b38b4f1362f10d3a43f47d182a5b359af488a02293b" +dependencies = [ + "new_debug_unreachable", + "once_cell", + "parking_lot", + "phf_shared", + "precomputed-hash", + "serde", +] + +[[package]] +name = "strsim" +version = "0.10.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "73473c0e59e6d5812c5dfe2a064a6444949f089e20eec9a2e5506596494e4623" + +[[package]] +name = "syn" +version = "1.0.109" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "72b64191b275b66ffe2469e8af2c1cfe3bafa67b529ead792a6d0160888b4237" +dependencies = [ + "proc-macro2", + "quote", + "unicode-ident", +] + +[[package]] +name = "syn" +version = "2.0.43" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ee659fb5f3d355364e1f3e5bc10fb82068efbf824a1e9d1c9504244a6469ad53" +dependencies = [ + "proc-macro2", + "quote", + "unicode-ident", +] + +[[package]] +name = "tabled" +version = "0.15.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "4c998b0c8b921495196a48aabaf1901ff28be0760136e31604f7967b0792050e" +dependencies = [ + "papergrid", + "tabled_derive", + "unicode-width", +] + +[[package]] +name = "tabled_derive" +version = "0.7.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "4c138f99377e5d653a371cdad263615634cfc8467685dfe8e73e2b8e98f44b17" +dependencies = [ + "heck", + "proc-macro-error", + "proc-macro2", + "quote", + "syn 1.0.109", +] + +[[package]] +name = "thiserror" +version = "1.0.52" +source = 
"registry+https://github.com/rust-lang/crates.io-index" +checksum = "83a48fd946b02c0a526b2e9481c8e2a17755e47039164a86c4070446e3a4614d" +dependencies = [ + "thiserror-impl", +] + +[[package]] +name = "thiserror-impl" +version = "1.0.52" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e7fbe9b594d6568a6a1443250a7e67d80b74e1e96f6d1715e1e21cc1888291d3" +dependencies = [ + "proc-macro2", + "quote", + "syn 2.0.43", +] + +[[package]] +name = "typenum" +version = "1.17.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "42ff0bf0c66b8238c6f3b578df37d0b7848e55df8577b3f74f92a69acceeb825" + +[[package]] +name = "unicode-ident" +version = "1.0.12" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "3354b9ac3fae1ff6755cb6db53683adb661634f67557942dea4facebec0fee4b" + +[[package]] +name = "unicode-width" +version = "0.1.11" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e51733f11c9c4f72aa0c160008246859e340b00807569a0da0e7a1079b27ba85" + +[[package]] +name = "version_check" +version = "0.9.4" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "49874b5167b65d7193b8aba1567f5c7d93d001cafc34600cee003eda787e483f" + +[[package]] +name = "wasi" +version = "0.11.0+wasi-snapshot-preview1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9c8d87e72b64a3b4db28d11ce29237c246188f4f51057d65a7eab63b7987e423" + +[[package]] +name = "windows-targets" +version = "0.48.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9a2fa6e2155d7247be68c096456083145c183cbbbc2764150dda45a87197940c" +dependencies = [ + "windows_aarch64_gnullvm", + "windows_aarch64_msvc", + "windows_i686_gnu", + "windows_i686_msvc", + "windows_x86_64_gnu", + "windows_x86_64_gnullvm", + "windows_x86_64_msvc", +] + +[[package]] +name = "windows_aarch64_gnullvm" +version = "0.48.5" +source = 
"registry+https://github.com/rust-lang/crates.io-index" +checksum = "2b38e32f0abccf9987a4e3079dfb67dcd799fb61361e53e2882c3cbaf0d905d8" + +[[package]] +name = "windows_aarch64_msvc" +version = "0.48.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "dc35310971f3b2dbbf3f0690a219f40e2d9afcf64f9ab7cc1be722937c26b4bc" + +[[package]] +name = "windows_i686_gnu" +version = "0.48.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "a75915e7def60c94dcef72200b9a8e58e5091744960da64ec734a6c6e9b3743e" + +[[package]] +name = "windows_i686_msvc" +version = "0.48.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "8f55c233f70c4b27f66c523580f78f1004e8b5a8b659e05a4eb49d4166cca406" + +[[package]] +name = "windows_x86_64_gnu" +version = "0.48.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "53d40abd2583d23e4718fddf1ebec84dbff8381c07cae67ff7768bbf19c6718e" + +[[package]] +name = "windows_x86_64_gnullvm" +version = "0.48.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "0b7b52767868a23d5bab768e390dc5f5c55825b6d30b86c844ff2dc7414044cc" + +[[package]] +name = "windows_x86_64_msvc" +version = "0.48.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "ed94fce61571a4006852b7389a063ab983c02eb1bb37b47f8272ce92d06d9538" diff --git a/crates/examples/fuzz/Cargo.toml b/crates/examples/fuzz/Cargo.toml new file mode 100644 index 00000000..036593c8 --- /dev/null +++ b/crates/examples/fuzz/Cargo.toml @@ -0,0 +1,36 @@ +[package] +name = "benches-fuzz" +version = "0.0.0" +publish = false +edition = "2021" + +[package.metadata] +cargo-fuzz = true + +[dependencies] +libfuzzer-sys = "0.4" + +[dependencies.examples] +path = ".." 
+ +[dependencies.bench-utils] +path = "../../bench-utils" + +# Prevent this from interfering with workspaces +[workspace] +members = ["."] + +[profile.release] +debug = 1 + +[[bin]] +name = "draw" +path = "fuzz_targets/draw.rs" +test = false +doc = false + +[[bin]] +name = "json" +path = "fuzz_targets/json.rs" +test = false +doc = false diff --git a/crates/examples/fuzz/fuzz_targets/draw.rs b/crates/examples/fuzz/fuzz_targets/draw.rs new file mode 100644 index 00000000..15a414d5 --- /dev/null +++ b/crates/examples/fuzz/fuzz_targets/draw.rs @@ -0,0 +1,8 @@ +#![no_main] + +use bench_utils::{draw::DrawAction, Action}; +use libfuzzer_sys::fuzz_target; + +fuzz_target!(|actions: Vec<Action<DrawAction>>| { + examples::draw::run_actions_fuzz_in_async_mode(5, 100, &actions) +}); diff --git a/crates/examples/fuzz/fuzz_targets/json.rs b/crates/examples/fuzz/fuzz_targets/json.rs new file mode 100644 index 00000000..2b8557ca --- /dev/null +++ b/crates/examples/fuzz/fuzz_targets/json.rs @@ -0,0 +1,9 @@ +#![no_main] + +use bench_utils::{json::JsonAction, Action}; +use examples::json::fuzz; +use libfuzzer_sys::fuzz_target; + +fuzz_target!(|data: Vec<Action<JsonAction>>| { + fuzz(5, &data); +}); diff --git a/crates/benches/src/draw.rs b/crates/examples/src/draw.rs similarity index 64% rename from crates/benches/src/draw.rs rename to crates/examples/src/draw.rs index b42b47ab..f4604226 100644 --- a/crates/benches/src/draw.rs +++ b/crates/examples/src/draw.rs @@ -1,8 +1,10 @@ -use std::{collections::HashMap, time::Instant}; +use std::collections::HashMap; -use bench_utils::{create_seed, draw::DrawAction, gen_async_actions, gen_realtime_actions, Action}; +use bench_utils::{draw::DrawAction, Action}; use loro::{ContainerID, ContainerType}; + +use crate::{run_actions_fuzz_in_async_mode, ActorTrait}; + pub struct DrawActor { pub doc: loro::LoroDoc, paths: loro::LoroList, @@ -27,8 +29,16 @@ impl DrawActor { id_to_obj, } } +} - pub fn 
apply_action(&mut self, action: &mut DrawAction) { +impl ActorTrait for DrawActor { + type 
ActionKind = DrawAction; + + fn create(peer_id: u64) -> Self { + Self::new(peer_id) + } + + fn apply_action(&mut self, action: &mut Self::ActionKind) { match action { DrawAction::CreatePath { points } => { let path = self.paths.insert_container(0, ContainerType::Map).unwrap(); @@ -55,7 +65,7 @@ impl DrawActor { map.insert("y", p.y).unwrap(); } let len = self.id_to_obj.len(); - self.id_to_obj.insert(len, path.id()); + self.id_to_obj.insert(len, path_map.id()); } DrawAction::Text { text, pos, size } => { let text_container = self @@ -119,82 +129,21 @@ impl DrawActor { let pos_map = map.get("pos").unwrap().unwrap_right().into_map().unwrap(); let x = pos_map.get("x").unwrap().unwrap_left().into_i32().unwrap(); let y = pos_map.get("y").unwrap().unwrap_left().into_i32().unwrap(); - pos_map.insert("x", x + relative_to.x).unwrap(); - pos_map.insert("y", y + relative_to.y).unwrap(); + pos_map + .insert("x", x.overflowing_add(relative_to.x).0) + .unwrap(); + pos_map + .insert("y", y.overflowing_add(relative_to.y).0) + .unwrap(); } } } -} -pub struct DrawActors { - pub docs: Vec<DrawActor>, -} - -impl DrawActors { - pub fn new(size: usize) -> Self { - let docs = (0..size).map(|i| DrawActor::new(i as u64)).collect(); - Self { docs } - } - - pub fn apply_action(&mut self, action: &mut Action<DrawAction>) { - match action { - Action::Action { peer, action } => { - self.docs[*peer].apply_action(action); - } - Action::Sync { from, to } => { - let vv = self.docs[*from].doc.oplog_vv(); - let data = self.docs[*from].doc.export_from(&vv); - self.docs[*to].doc.import(&data).unwrap(); - } - Action::SyncAll => self.sync_all(), - } - } - - pub fn sync_all(&mut self) { - let (first, rest) = self.docs.split_at_mut(1); - for doc in rest.iter_mut() { - let vv = first[0].doc.oplog_vv(); - first[0].doc.import(&doc.doc.export_from(&vv)).unwrap(); - } - for doc in rest.iter_mut() { - let vv = doc.doc.oplog_vv(); - doc.doc.import(&first[0].doc.export_from(&vv)).unwrap(); - } + fn doc(&self) -> 
&loro::LoroDoc { + 
&self.doc } } -pub fn run_async_draw_workflow( - peer_num: usize, - action_num: usize, - actions_before_sync: usize, - seed: u64, -) -> (DrawActors, Instant) { - let seed = create_seed(seed, action_num * 32); - let mut actions = - gen_async_actions::(action_num, peer_num, &seed, actions_before_sync, |_| {}) - .unwrap(); - let mut actors = DrawActors::new(peer_num); - let start = Instant::now(); - for action in actions.iter_mut() { - actors.apply_action(action); - } - - (actors, start) -} - -pub fn run_realtime_collab_draw_workflow( - peer_num: usize, - action_num: usize, - seed: u64, -) -> (DrawActors, Instant) { - let seed = create_seed(seed, action_num * 32); - let mut actions = - gen_realtime_actions::(action_num, peer_num, &seed, |_| {}).unwrap(); - let mut actors = DrawActors::new(peer_num); - let start = Instant::now(); - for action in actions.iter_mut() { - actors.apply_action(action); - } - - (actors, start) +pub fn fuzz(peer_num: usize, sync_all_interval: usize, actions: &[Action]) { + run_actions_fuzz_in_async_mode::(peer_num, sync_all_interval, actions); } diff --git a/crates/examples/src/json.rs b/crates/examples/src/json.rs new file mode 100644 index 00000000..5460185e --- /dev/null +++ b/crates/examples/src/json.rs @@ -0,0 +1,74 @@ +use bench_utils::{json::JsonAction, Action}; +use loro::LoroDoc; + +use crate::{minify_failed_tests_in_async_mode, run_actions_fuzz_in_async_mode, ActorTrait}; + +pub struct JsonActor { + doc: LoroDoc, + list: loro::LoroList, + map: loro::LoroMap, + text: loro::LoroText, +} + +impl ActorTrait for JsonActor { + type ActionKind = JsonAction; + + fn create(peer_id: u64) -> Self { + let doc = LoroDoc::new(); + doc.set_peer_id(peer_id).unwrap(); + let list = doc.get_list("list"); + let map = doc.get_map("map"); + let text = doc.get_text("text"); + Self { + doc, + list, + map, + text, + } + } + + fn apply_action(&mut self, action: &mut Self::ActionKind) { + match action { + JsonAction::InsertMap { key, value } => { + 
self.map.insert(key, value.clone()).unwrap(); + } + JsonAction::InsertList { index, value } => { + *index %= self.list.len() + 1; + self.list.insert(*index, value.clone()).unwrap(); + } + JsonAction::DeleteList { index } => { + if self.list.is_empty() { + return; + } + + *index %= self.list.len(); + self.list.delete(*index, 1).unwrap(); + } + JsonAction::InsertText { index, s } => { + *index %= self.text.len_unicode() + 1; + self.text.insert(*index, s).unwrap(); + } + JsonAction::DeleteText { index, len } => { + if self.text.is_empty() { + return; + } + + *index %= self.text.len_unicode(); + *len %= self.text.len_unicode() - *index; + self.text.delete(*index, *len).unwrap(); + } + } + } + + fn doc(&self) -> &loro::LoroDoc { + &self.doc + } +} + +pub fn fuzz(peer_num: usize, inputs: &[Action]) { + run_actions_fuzz_in_async_mode::(peer_num, 20, inputs); +} + +pub fn minify(peer_num: usize, inputs: &[Action]) { + minify_failed_tests_in_async_mode::(peer_num, 20, inputs); +} diff --git a/crates/examples/src/lib.rs b/crates/examples/src/lib.rs new file mode 100644 index 00000000..4b1684da --- /dev/null +++ b/crates/examples/src/lib.rs @@ -0,0 +1,235 @@ +use std::{ + collections::VecDeque, + sync::{atomic::AtomicUsize, Arc, Mutex}, + time::Instant, +}; + +use bench_utils::{ + create_seed, gen_async_actions, gen_realtime_actions, make_actions_async, Action, ActionTrait, +}; + +pub mod draw; +pub mod json; +pub mod sheet; +pub mod test_preload { + pub use bench_utils::json::JsonAction::*; + pub use bench_utils::json::LoroValue::*; + pub use bench_utils::Action::*; + pub use bench_utils::SyncKind::*; +} + +pub trait ActorTrait { + type ActionKind: ActionTrait; + fn create(peer_id: u64) -> Self; + fn apply_action(&mut self, action: &mut Self::ActionKind); + fn doc(&self) -> &loro::LoroDoc; +} + +pub struct ActorGroup { + pub docs: Vec, +} + +impl ActorGroup { + pub fn new(size: usize) -> Self { + let docs = (0..size).map(|i| T::create(i as u64)).collect(); + Self { docs } + 
} + + pub fn apply_action(&mut self, action: &mut Action) { + match action { + Action::Action { peer, action } => { + self.docs[*peer].apply_action(action); + } + Action::Sync { from, to, kind } => match kind { + bench_utils::SyncKind::Fit => { + let vv = self.docs[*to].doc().oplog_vv(); + let data = self.docs[*from].doc().export_from(&vv); + self.docs[*to].doc().import(&data).unwrap(); + } + bench_utils::SyncKind::Snapshot => { + let data = self.docs[*from].doc().export_snapshot(); + self.docs[*to].doc().import(&data).unwrap(); + } + bench_utils::SyncKind::OnlyLastOpFromEachPeer => { + let mut vv = self.docs[*from].doc().oplog_vv(); + for cnt in vv.values_mut() { + *cnt -= 1; + } + let data = self.docs[*from].doc().export_from(&vv); + self.docs[*to].doc().import(&data).unwrap(); + } + }, + Action::SyncAll => self.sync_all(), + } + } + + pub fn sync_all(&mut self) { + debug_log::group!("SyncAll"); + let (first, rest) = self.docs.split_at_mut(1); + for doc in rest.iter_mut() { + debug_log::group!("Importing to doc0"); + let vv = first[0].doc().oplog_vv(); + first[0].doc().import(&doc.doc().export_from(&vv)).unwrap(); + debug_log::group_end!(); + } + for (i, doc) in rest.iter_mut().enumerate() { + debug_log::group!("Importing to doc{}", i + 1); + let vv = doc.doc().oplog_vv(); + doc.doc().import(&first[0].doc().export_from(&vv)).unwrap(); + debug_log::group_end!(); + } + debug_log::group_end!(); + } + + pub fn check_sync(&self) { + debug_log::group!("Check sync"); + let first = &self.docs[0]; + let content = first.doc().get_deep_value(); + for doc in self.docs.iter().skip(1) { + assert_eq!(content, doc.doc().get_deep_value()); + } + debug_log::group_end!(); + } +} + +pub fn run_async_workflow( + peer_num: usize, + action_num: usize, + actions_before_sync: usize, + seed: u64, +) -> (ActorGroup, Instant) +where + for<'a> T::ActionKind: arbitrary::Arbitrary<'a>, +{ + let seed = create_seed(seed, action_num * 32); + let mut actions = gen_async_actions::( + action_num, + 
peer_num, + &seed, + actions_before_sync, + |_| {}, + ) + .unwrap(); + let mut actors = ActorGroup::::new(peer_num); + let start = Instant::now(); + for action in actions.iter_mut() { + actors.apply_action(action); + } + + (actors, start) +} + +pub fn run_realtime_collab_workflow( + peer_num: usize, + action_num: usize, + seed: u64, +) -> (ActorGroup, Instant) +where + for<'a> T::ActionKind: arbitrary::Arbitrary<'a>, +{ + let seed = create_seed(seed, action_num * 32); + let mut actions = + gen_realtime_actions::(action_num, peer_num, &seed, |_| {}).unwrap(); + let mut actors = ActorGroup::::new(peer_num); + let start = Instant::now(); + for action in actions.iter_mut() { + actors.apply_action(action); + } + + (actors, start) +} + +pub fn run_actions_fuzz_in_async_mode( + peer_num: usize, + sync_all_interval: usize, + actions: &[Action], +) { + let mut actions = make_actions_async::(peer_num, actions, sync_all_interval); + let mut actors = ActorGroup::::new(peer_num); + for action in actions.iter_mut() { + debug_log::group!("[ApplyAction] {:?}", &action); + actors.apply_action(action); + debug_log::group_end!(); + } + actors.sync_all(); + actors.check_sync(); +} + +pub fn minify_failed_tests_in_async_mode( + peer_num: usize, + sync_all_interval: usize, + actions: &[Action], +) { + let hook = std::panic::take_hook(); + std::panic::set_hook(Box::new(|_info| { + // ignore panic output + // println!("{:?}", _info); + })); + + let actions = make_actions_async::(peer_num, actions, sync_all_interval); + let mut stack: VecDeque>> = VecDeque::new(); + stack.push_back(actions); + let mut last_log = Instant::now(); + let mut min_actions: Option>> = None; + while let Some(actions) = stack.pop_back() { + let actions = Arc::new(Mutex::new(actions)); + let actions_clone = Arc::clone(&actions); + let num = Arc::new(AtomicUsize::new(0)); + let num_clone = Arc::clone(&num); + let result = std::panic::catch_unwind(move || { + let mut actors = ActorGroup::::new(peer_num); + for action 
in actions_clone.lock().unwrap().iter_mut() { + actors.apply_action(action); + num_clone.fetch_add(1, std::sync::atomic::Ordering::SeqCst); + } + actors.sync_all(); + actors.check_sync(); + }); + + if result.is_ok() { + continue; + } + + let num = num.load(std::sync::atomic::Ordering::SeqCst); + let mut actions = match actions.lock() { + Ok(a) => a, + Err(a) => a.into_inner(), + }; + actions.drain(num..); + if let Some(min_actions) = min_actions.as_mut() { + if actions.len() < min_actions.len() { + *min_actions = actions.clone(); + } + } else { + min_actions = Some(actions.clone()); + } + + for i in 0..actions.len() { + let mut new_actions = actions.clone(); + new_actions.remove(i); + stack.push_back(new_actions); + } + + while stack.len() > 100 { + stack.pop_front(); + } + + if last_log.elapsed().as_secs() > 1 { + println!( + "stack size: {}. Min action size {:?}", + stack.len(), + min_actions.as_ref().map(|x| x.len()) + ); + last_log = Instant::now(); + } + } + + if let Some(minimal_failed_actions) = min_actions { + println!("Min action size {:?}", minimal_failed_actions.len()); + println!("{:#?}", minimal_failed_actions); + std::panic::set_hook(hook); + run_actions_fuzz_in_async_mode::(peer_num, sync_all_interval, &minimal_failed_actions); + } else { + println!("No failed tests found"); + } +} diff --git a/crates/benches/src/sheet.rs b/crates/examples/src/sheet.rs similarity index 64% rename from crates/benches/src/sheet.rs rename to crates/examples/src/sheet.rs index e3b9cd5f..f1f8ce16 100644 --- a/crates/benches/src/sheet.rs +++ b/crates/examples/src/sheet.rs @@ -1,19 +1,11 @@ -use loro::{LoroDoc, LoroList, LoroMap}; - -pub struct Actor { - pub doc: LoroDoc, - cols: LoroList, - rows: LoroList, -} - -impl Actor {} +use loro::LoroDoc; pub fn init_sheet() -> LoroDoc { let doc = LoroDoc::new(); doc.set_peer_id(0).unwrap(); let cols = doc.get_list("cols"); let rows = doc.get_list("rows"); - for i in 0..bench_utils::sheet::SheetAction::MAX_ROW { + for _ in 
0..bench_utils::sheet::SheetAction::MAX_ROW { rows.push_container(loro::ContainerType::Map).unwrap(); } diff --git a/crates/examples/tests/failed_tests.rs b/crates/examples/tests/failed_tests.rs new file mode 100644 index 00000000..37284ef1 --- /dev/null +++ b/crates/examples/tests/failed_tests.rs @@ -0,0 +1,492 @@ +use examples::json::fuzz; +use loro::loro_value; + +#[ctor::ctor] +fn init_color_backtrace() { + color_backtrace::install(); +} + +#[test] +fn fuzz_json() { + use examples::test_preload::*; + fuzz( + 5, + &[ + Action { + peer: 5280832617179597129, + action: InsertList { + index: 311, + value: Bool(true), + }, + }, + Action { + peer: 8174158055725953393, + action: DeleteList { index: 341 }, + }, + Sync { + from: 18446543177843820913, + to: 5280832620235194367, + kind: Fit, + }, + Action { + peer: 8174439530700032329, + action: DeleteList { index: 341 }, + }, + Sync { + from: 8174439528799404056, + to: 8174439530702664049, + kind: Snapshot, + }, + Action { + peer: 8174439043468105841, + action: DeleteList { index: 341 }, + }, + Action { + peer: 5280832617179597129, + action: InsertList { + index: 311, + value: Bool(true), + }, + }, + Action { + peer: 5280832617179597129, + action: InsertList { + index: 341, + value: Bool(true), + }, + }, + Action { + peer: 8174439139858008393, + action: DeleteList { index: 341 }, + }, + Sync { + from: 8174439393263710577, + to: 7586675626393291081, + kind: Fit, + }, + Sync { + from: 8174439530702664049, + to: 8174439530702664049, + kind: Fit, + }, + Action { + peer: 5280832685899073865, + action: InsertList { + index: 351, + value: Bool(true), + }, + }, + Action { + peer: 5280832789652009216, + action: InsertList { + index: 311, + value: Bool(true), + }, + }, + Sync { + from: 8174439358230251889, + to: 8174439530702664049, + kind: Snapshot, + }, + Action { + peer: 8174439530700032329, + action: DeleteList { index: 341 }, + }, + Sync { + from: 5280832617178745161, + to: 5280832617179597129, + kind: Snapshot, + }, + Action 
{ + peer: 5280832616743389513, + action: InsertList { + index: 311, + value: Bool(true), + }, + }, + Sync { + from: 5280832617853317489, + to: 8174439530702664049, + kind: Snapshot, + }, + Action { + peer: 5280832617179593801, + action: InsertList { + index: 311, + value: Bool(true), + }, + }, + Action { + peer: 5280876597644708169, + action: DeleteList { index: 341 }, + }, + Sync { + from: 8174439530702664049, + to: 5280876770117120369, + kind: OnlyLastOpFromEachPeer, + }, + SyncAll, + Action { + peer: 8174439358230251849, + action: DeleteText { + index: 960, + len: 126, + }, + }, + Action { + peer: 18404522827202906441, + action: InsertList { + index: 311, + value: Bool(true), + }, + }, + Action { + peer: 8174439530700032329, + action: DeleteList { index: 341 }, + }, + Sync { + from: 8174439528799404056, + to: 5292135769185546609, + kind: Fit, + }, + Action { + peer: 5280832617179596873, + action: InsertList { + index: 311, + value: Bool(true), + }, + }, + Action { + peer: 5292135596713134409, + action: InsertList { + index: 311, + value: Bool(true), + }, + }, + Sync { + from: 8174439526407696753, + to: 8174439530702664049, + kind: Snapshot, + }, + Action { + peer: 8174439498734632959, + action: InsertList { + index: 311, + value: Bool(true), + }, + }, + Action { + peer: 5833687803971913, + action: InsertList { + index: 301, + value: Bool(true), + }, + }, + Sync { + from: 8174439530702664049, + to: 8174439530702664049, + kind: Fit, + }, + SyncAll, + SyncAll, + Action { + peer: 5280832617179597129, + action: DeleteList { index: 311 }, + }, + Sync { + from: 8174439530702664049, + to: 8174439530702664049, + kind: Fit, + }, + Sync { + from: 8163136378478610801, + to: 8174439139858008393, + kind: Snapshot, + }, + Sync { + from: 8174439530702664049, + to: 18395314732082491761, + kind: Snapshot, + }, + Action { + peer: 5280832617179597129, + action: InsertList { + index: 303, + value: Bool(true), + }, + }, + Sync { + from: 8174439530702664049, + to: 8174439530702664049, 
+ kind: Fit, + }, + Action { + peer: 8174412969951185225, + action: InsertList { + index: 351, + value: Bool(true), + }, + }, + Sync { + from: 8174439530702664049, + to: 8163136378699346289, + kind: Snapshot, + }, + Sync { + from: 5280876770117120369, + to: 5280832617179597116, + kind: Fit, + }, + Action { + peer: 5280832634359466315, + action: DeleteList { index: 341 }, + }, + Sync { + from: 8174439530702664049, + to: 8174439358230251849, + kind: Snapshot, + }, + SyncAll, + SyncAll, + SyncAll, + SyncAll, + SyncAll, + SyncAll, + SyncAll, + SyncAll, + SyncAll, + SyncAll, + SyncAll, + SyncAll, + SyncAll, + SyncAll, + Sync { + from: 16090538600105537827, + to: 936747706152398848, + kind: OnlyLastOpFromEachPeer, + }, + Action { + peer: 8174439530702653769, + action: DeleteList { index: 341 }, + }, + Sync { + from: 8174395377765151089, + to: 8174439530702664049, + kind: Snapshot, + }, + Action { + peer: 5280832617179597129, + action: InsertList { + index: 311, + value: Bool(true), + }, + }, + Action { + peer: 8174439530700032329, + action: DeleteList { index: 341 }, + }, + Sync { + from: 5277173443156078961, + to: 5280832617179597129, + kind: Fit, + }, + Action { + peer: 8174439358230513993, + action: InsertList { + index: 351, + value: Bool(true), + }, + }, + Sync { + from: 8174439530702664049, + to: 5280832617178745161, + kind: Fit, + }, + Action { + peer: 5280832617179728201, + action: DeleteList { index: 341 }, + }, + Sync { + from: 18446744073709515121, + to: 18446744073709551615, + kind: OnlyLastOpFromEachPeer, + }, + SyncAll, + SyncAll, + SyncAll, + SyncAll, + SyncAll, + SyncAll, + Action { + peer: 18446744073709551615, + action: DeleteText { + index: 960, + len: 126, + }, + }, + Sync { + from: 1412722910008930673, + to: 18380137171932733261, + kind: Snapshot, + }, + Action { + peer: 0, + action: InsertMap { + key: "".into(), + value: Null, + }, + }, + ], + ) +} + +#[test] +fn fuzz_json_1() { + use examples::test_preload::*; + let mut map = loro_value!({"": 
"test"}); + for _ in 0..64 { + map = loro_value!({"": map}); + } + + let mut list = loro_value!([map]); + for _ in 0..64 { + list = loro_value!([list, 9]); + } + + fuzz( + 5, + &[Action { + peer: 35184913762633, + action: InsertMap { + key: "\0IIIIIIIIIIIIIIIIIII\0\0".into(), + value: list, + }, + }], + ); +} + +#[test] +fn fuzz_json_2_decode_snapshot_that_activate_pending_changes() { + use examples::test_preload::*; + fuzz( + 5, + &[ + Action { + peer: 44971974514245632, + action: InsertText { + index: 228, + s: "0\0\0".into(), + }, + }, + SyncAll, + Action { + peer: 23939170762752, + action: InsertText { + index: 404, + s: "C\u{b}0\0\u{15555}".into(), + }, + }, + Sync { + from: 10778685752873424277, + to: 52870070483605, + kind: OnlyLastOpFromEachPeer, + }, + Action { + peer: 6128427715264512, + action: InsertMap { + key: "".into(), + value: "".into(), + }, + }, + Action { + peer: 10778685752873447424, + action: DeleteList { index: 368 }, + }, + Sync { + from: 10778685752873440661, + to: 10778685752873424277, + kind: OnlyLastOpFromEachPeer, + }, + Sync { + from: 10778685752873424277, + to: 18395315059780064661, + kind: OnlyLastOpFromEachPeer, + }, + SyncAll, + SyncAll, + Sync { + from: 445944668984725, + to: 256, + kind: Snapshot, + }, + Action { + peer: 562699868423424, + action: InsertText { + index: 228, + s: "\0\0".into(), + }, + }, + SyncAll, + Action { + peer: 0, + action: InsertMap { + key: "".into(), + value: Null, + }, + }, + ], + ) +} + +#[test] +fn fuzz_json_3_frontiers_were_wrong_after_importing_pending_changes() { + use examples::test_preload::*; + fuzz( + 5, + &[ + Action { + peer: 4, + action: InsertList { + index: 0, + value: Bool(true), + }, + }, + Action { + peer: 0, + action: InsertText { + index: 0, + s: "aś\0\u{6}\u{13}\0\0\0*0".into(), + }, + }, + Sync { + from: 0, + to: 4, + kind: Fit, + }, + Action { + peer: 4, + action: InsertList { + index: 1, + value: Bool(true), + }, + }, + Sync { + from: 4, + to: 1, + kind: OnlyLastOpFromEachPeer, + 
}, + Sync { + from: 0, + to: 1, + kind: Fit, + }, + Action { + peer: 1, + action: InsertList { + index: 2, + value: Bool(true), + }, + }, + ], + ) +} diff --git a/crates/loro-common/Cargo.toml b/crates/loro-common/Cargo.toml index 3fae248c..2cbdff9d 100644 --- a/crates/loro-common/Cargo.toml +++ b/crates/loro-common/Cargo.toml @@ -17,6 +17,7 @@ enum-as-inner = "0.6.0" string_cache = "0.8.7" arbitrary = { version = "1.3.0", features = ["derive"] } js-sys = { version = "0.3.60", optional = true } +serde_columnar = "0.3.3" [features] wasm = ["wasm-bindgen", "js-sys"] diff --git a/crates/loro-common/src/error.rs b/crates/loro-common/src/error.rs index 88ad6ff6..6843739c 100644 --- a/crates/loro-common/src/error.rs +++ b/crates/loro-common/src/error.rs @@ -1,3 +1,4 @@ +use serde_columnar::ColumnarError; use thiserror::Error; use crate::{PeerID, TreeID, ID}; @@ -12,6 +13,10 @@ pub enum LoroError { DecodeVersionVectorError, #[error("Decode error ({0})")] DecodeError(Box<str>), + #[error("Checksum mismatch. The data is corrupted.")] + DecodeDataCorruptionError, + #[error("Encountered an incompatible Encoding version \"{0}\". Loro's encoding is backward compatible but not forward compatible. Please upgrade the version of Loro to support this version of the exported data.")] + IncompatibleFutureEncodingError(usize), #[error("Js error ({0})")] JsError(Box<str>), #[error("Cannot get lock or the lock is poisoned")] @@ -23,9 +28,6 @@ pub enum LoroError { // TODO: more details transaction error #[error("Transaction error ({0})")] TransactionError(Box<str>), - // TODO: - #[error("TempContainer cannot execute this function")] - TempContainerError, #[error("Index out of bound. The given pos is {pos}, but the length is {len}")] OutOfBound { pos: usize, len: usize }, #[error("Every op id should be unique. ID {id} has been used. You should use a new PeerID to edit the content. ")] @@ -36,14 +38,8 @@ pub enum LoroError { ArgErr(Box<str>), #[error("Auto commit has not started.
The doc is readonly when detached. You should ensure autocommit is on and the doc and the state is attached.")] AutoCommitNotStarted, - #[error("The doc is already dropped")] - DocDropError, - // #[error("the data for key `{0}` is not available")] - // Redaction(String), - // #[error("invalid header (expected {expected:?}, found {found:?})")] - // InvalidHeader { expected: String, found: String }, - // #[error("unknown data store error")] - // Unknown, + #[error("Unknown Error ({0})")] + Unknown(Box<str>), } #[derive(Error, Debug)] @@ -78,3 +74,17 @@ pub mod wasm { } } } + +impl From<ColumnarError> for LoroError { + fn from(e: ColumnarError) -> Self { + match e { + ColumnarError::ColumnarDecodeError(_) + | ColumnarError::RleEncodeError(_) + | ColumnarError::RleDecodeError(_) + | ColumnarError::OverflowError => { + LoroError::DecodeError(format!("Failed to decode Columnar: {}", e).into_boxed_str()) + } + e => LoroError::Unknown(e.to_string().into_boxed_str()), + } + } +} diff --git a/crates/loro-common/src/id.rs b/crates/loro-common/src/id.rs index 936bd69b..99223f2f 100644 --- a/crates/loro-common/src/id.rs +++ b/crates/loro-common/src/id.rs @@ -72,8 +72,11 @@ impl From for ID { } impl ID { + /// The ID of the null object. This should be used rarely. + pub const NONE_ID: ID = ID::new(u64::MAX, 0); + #[inline] - pub fn new(peer: PeerID, counter: Counter) -> Self { + pub const fn new(peer: PeerID, counter: Counter) -> Self { ID { peer, counter } } diff --git a/crates/loro-common/src/lib.rs b/crates/loro-common/src/lib.rs index 873aee8e..d9b7a93c 100644 --- a/crates/loro-common/src/lib.rs +++ b/crates/loro-common/src/lib.rs @@ -6,12 +6,15 @@ use enum_as_inner::EnumAsInner; use serde::{Deserialize, Serialize}; mod error; mod id; +mod macros; mod span; mod value; pub use error::{LoroError, LoroResult, LoroTreeError}; +#[doc(hidden)] +pub use fxhash::FxHashMap; pub use span::*; -pub use value::LoroValue; +pub use value::{to_value, LoroValue}; /// Unique id for each peer.
It's usually random pub type PeerID = u64; @@ -93,6 +96,18 @@ impl ContainerType { _ => unreachable!(), } } + + pub fn try_from_u8(v: u8) -> LoroResult<Self> { + match v { + 1 => Ok(ContainerType::Map), + 2 => Ok(ContainerType::List), + 3 => Ok(ContainerType::Text), + 4 => Ok(ContainerType::Tree), + _ => Err(LoroError::DecodeError( + format!("Unknown container type {v}").into_boxed_str(), + )), + } + } } pub type IdSpanVector = fxhash::FxHashMap<PeerID, CounterSpan>; @@ -261,6 +276,11 @@ pub struct TreeID { } impl TreeID { + #[inline(always)] + pub fn new(peer: PeerID, counter: Counter) -> Self { + Self { peer, counter } + } + /// return [`DELETED_TREE_ROOT`] pub const fn delete_root() -> Option { DELETED_TREE_ROOT diff --git a/crates/loro-common/src/macros.rs b/crates/loro-common/src/macros.rs new file mode 100644 index 00000000..d6d4d486 --- /dev/null +++ b/crates/loro-common/src/macros.rs @@ -0,0 +1,290 @@ +#[macro_export(local_inner_macros)] +macro_rules! loro_value { + // Hide distracting implementation details from the generated rustdoc. + ($($json:tt)+) => { + value_internal!($($json)+) + }; +} + +// Rocket relies on this because they export their own `json!` with a different +// doc comment than ours, and various Rust bugs prevent them from calling our +// `json!` from their `json!` so they call `value_internal!` directly. Check with +// @SergioBenitez before making breaking changes to this macro. +// +// Changes are fine as long as `value_internal!` does not call any new helper +// macros and can still be invoked as `value_internal!($($json)+)`. +#[macro_export(local_inner_macros)] +#[doc(hidden)] +macro_rules! value_internal { + ////////////////////////////////////////////////////////////////////////// + // TT muncher for parsing the inside of an array [...]. Produces a vec![...] + // of the elements. + // + // Must be invoked as: value_internal!(@array [] $($tt)*) + ////////////////////////////////////////////////////////////////////////// + + // Done with trailing comma.
+ (@array [$($elems:expr,)*]) => { + json_internal_vec![$($elems,)*] + }; + + // Done without trailing comma. + (@array [$($elems:expr),*]) => { + json_internal_vec![$($elems),*] + }; + + // Next element is `null`. + (@array [$($elems:expr,)*] null $($rest:tt)*) => { + value_internal!(@array [$($elems,)* value_internal!(null)] $($rest)*) + }; + + // Next element is `true`. + (@array [$($elems:expr,)*] true $($rest:tt)*) => { + value_internal!(@array [$($elems,)* value_internal!(true)] $($rest)*) + }; + + // Next element is `false`. + (@array [$($elems:expr,)*] false $($rest:tt)*) => { + value_internal!(@array [$($elems,)* value_internal!(false)] $($rest)*) + }; + + // Next element is an array. + (@array [$($elems:expr,)*] [$($array:tt)*] $($rest:tt)*) => { + value_internal!(@array [$($elems,)* value_internal!([$($array)*])] $($rest)*) + }; + + // Next element is a map. + (@array [$($elems:expr,)*] {$($map:tt)*} $($rest:tt)*) => { + value_internal!(@array [$($elems,)* value_internal!({$($map)*})] $($rest)*) + }; + + // Next element is an expression followed by comma. + (@array [$($elems:expr,)*] $next:expr, $($rest:tt)*) => { + value_internal!(@array [$($elems,)* value_internal!($next),] $($rest)*) + }; + + // Last element is an expression with no trailing comma. + (@array [$($elems:expr,)*] $last:expr) => { + value_internal!(@array [$($elems,)* value_internal!($last)]) + }; + + // Comma after the most recent element. + (@array [$($elems:expr),*] , $($rest:tt)*) => { + value_internal!(@array [$($elems,)*] $($rest)*) + }; + + // Unexpected token after most recent element. + (@array [$($elems:expr),*] $unexpected:tt $($rest:tt)*) => { + json_unexpected!($unexpected) + }; + + ////////////////////////////////////////////////////////////////////////// + // TT muncher for parsing the inside of an object {...}. Each entry is + // inserted into the given map variable. 
+ // + // Must be invoked as: value_internal!(@object $map () ($($tt)*) ($($tt)*)) + // + // We require two copies of the input tokens so that we can match on one + // copy and trigger errors on the other copy. + ////////////////////////////////////////////////////////////////////////// + + // Done. + (@object $object:ident () () ()) => {}; + + // Insert the current entry followed by trailing comma. + (@object $object:ident [$($key:tt)+] ($value:expr) , $($rest:tt)*) => { + let _ = $object.insert(($($key)+).into(), $value); + value_internal!(@object $object () ($($rest)*) ($($rest)*)); + }; + + // Current entry followed by unexpected token. + (@object $object:ident [$($key:tt)+] ($value:expr) $unexpected:tt $($rest:tt)*) => { + json_unexpected!($unexpected); + }; + + // Insert the last entry without trailing comma. + (@object $object:ident [$($key:tt)+] ($value:expr)) => { + let _ = $object.insert(($($key)+).into(), $value); + }; + + // Next value is `null`. + (@object $object:ident ($($key:tt)+) (: null $($rest:tt)*) $copy:tt) => { + value_internal!(@object $object [$($key)+] (value_internal!(null)) $($rest)*); + }; + + // Next value is `true`. + (@object $object:ident ($($key:tt)+) (: true $($rest:tt)*) $copy:tt) => { + value_internal!(@object $object [$($key)+] (value_internal!(true)) $($rest)*); + }; + + // Next value is `false`. + (@object $object:ident ($($key:tt)+) (: false $($rest:tt)*) $copy:tt) => { + value_internal!(@object $object [$($key)+] (value_internal!(false)) $($rest)*); + }; + + // Next value is an array. + (@object $object:ident ($($key:tt)+) (: [$($array:tt)*] $($rest:tt)*) $copy:tt) => { + value_internal!(@object $object [$($key)+] (value_internal!([$($array)*])) $($rest)*); + }; + + // Next value is a map. + (@object $object:ident ($($key:tt)+) (: {$($map:tt)*} $($rest:tt)*) $copy:tt) => { + value_internal!(@object $object [$($key)+] (value_internal!({$($map)*})) $($rest)*); + }; + + // Next value is an expression followed by comma. 
+ (@object $object:ident ($($key:tt)+) (: $value:expr , $($rest:tt)*) $copy:tt) => { + value_internal!(@object $object [$($key)+] (value_internal!($value)) , $($rest)*); + }; + + // Last value is an expression with no trailing comma. + (@object $object:ident ($($key:tt)+) (: $value:expr) $copy:tt) => { + value_internal!(@object $object [$($key)+] (value_internal!($value))); + }; + + // Missing value for last entry. Trigger a reasonable error message. + (@object $object:ident ($($key:tt)+) (:) $copy:tt) => { + // "unexpected end of macro invocation" + value_internal!(); + }; + + // Missing colon and value for last entry. Trigger a reasonable error + // message. + (@object $object:ident ($($key:tt)+) () $copy:tt) => { + // "unexpected end of macro invocation" + value_internal!(); + }; + + // Misplaced colon. Trigger a reasonable error message. + (@object $object:ident () (: $($rest:tt)*) ($colon:tt $($copy:tt)*)) => { + // Takes no arguments so "no rules expected the token `:`". + json_unexpected!($colon); + }; + + // Found a comma inside a key. Trigger a reasonable error message. + (@object $object:ident ($($key:tt)*) (, $($rest:tt)*) ($comma:tt $($copy:tt)*)) => { + // Takes no arguments so "no rules expected the token `,`". + json_unexpected!($comma); + }; + + // Key is fully parenthesized. This avoids clippy double_parens false + // positives because the parenthesization may be necessary here. + (@object $object:ident () (($key:expr) : $($rest:tt)*) $copy:tt) => { + value_internal!(@object $object ($key) (: $($rest)*) (: $($rest)*)); + }; + + // Refuse to absorb colon token into key expression. + (@object $object:ident ($($key:tt)*) (: $($unexpected:tt)+) $copy:tt) => { + json_expect_expr_comma!($($unexpected)+); + }; + + // Munch a token into the current key. 
+ (@object $object:ident ($($key:tt)*) ($tt:tt $($rest:tt)*) $copy:tt) => { + value_internal!(@object $object ($($key)* $tt) ($($rest)*) ($($rest)*)); + }; + + ////////////////////////////////////////////////////////////////////////// + // The main implementation. + // + // Must be invoked as: value_internal!($($json)+) + ////////////////////////////////////////////////////////////////////////// + + (null) => { + $crate::LoroValue::Null + }; + + (true) => { + $crate::LoroValue::Bool(true) + }; + + (false) => { + $crate::LoroValue::Bool(false) + }; + + ([]) => { + $crate::LoroValue::List(std::sync::Arc::new(json_internal_vec![])) + }; + + ([ $($tt:tt)+ ]) => { + $crate::LoroValue::List(std::sync::Arc::new(value_internal!(@array [] $($tt)+))) + }; + + ({}) => { + $crate::LoroValue::Map(std::sync::Arc::new(Default::default())) + }; + + ({ $($tt:tt)+ }) => { + ({ + let mut object = $crate::FxHashMap::default(); + value_internal!(@object object () ($($tt)+) ($($tt)+)); + $crate::LoroValue::Map(std::sync::Arc::new(object)) + }) + }; + + // Any Serialize type: numbers, strings, struct literals, variables etc. + // Must be below every other rule. + ($other:expr) => { + $crate::to_value($other) + }; +} + +#[macro_export] +#[doc(hidden)] +macro_rules! json_unexpected { + () => {}; +} + +// The json_internal macro above cannot invoke vec directly because it uses +// local_inner_macros. A vec invocation there would resolve to $crate::vec. +// Instead invoke vec here outside of local_inner_macros. +#[macro_export] +#[doc(hidden)] +macro_rules! 
json_internal_vec { + ($($content:tt)*) => { + vec![$($content)*] + }; +} + +#[cfg(test)] +mod test { + #[test] + fn test_value_macro() { + let v = loro_value!([1, 2, 3]); + let list = v.into_list().unwrap(); + assert_eq!(&*list, &[1.into(), 2.into(), 3.into()]); + + let map = loro_value!({ + "hi": true, + "false": false, + "null": null, + "list": [], + "integer": 123, + "float": 123.123, + "map": { + "a": "1" + } + }); + + let map = map.into_map().unwrap(); + assert_eq!(map.len(), 7); + assert!(*map.get("hi").unwrap().as_bool().unwrap()); + assert!(!(*map.get("false").unwrap().as_bool().unwrap())); + assert!(map.get("null").unwrap().is_null()); + assert_eq!(map.get("list").unwrap().as_list().unwrap().len(), 0); + assert_eq!(*map.get("integer").unwrap().as_i32().unwrap(), 123); + assert_eq!(*map.get("float").unwrap().as_double().unwrap(), 123.123); + assert_eq!(map.get("map").unwrap().as_map().unwrap().len(), 1); + assert_eq!( + &**map + .get("map") + .unwrap() + .as_map() + .unwrap() + .get("a") + .unwrap() + .as_string() + .unwrap(), + "1" + ); + } +} diff --git a/crates/loro-common/src/span.rs b/crates/loro-common/src/span.rs index f9697dd8..303661ef 100644 --- a/crates/loro-common/src/span.rs +++ b/crates/loro-common/src/span.rs @@ -113,6 +113,8 @@ impl CounterSpan { } #[inline(always)] + /// Normalized end value. + /// /// This is different from end. start may be greater than end. This is the max of start+1 and end pub fn norm_end(&self) -> i32 { if self.start < self.end { @@ -160,6 +162,16 @@ impl CounterSpan { fn next_pos(&self) -> i32 { self.end } + + fn get_intersection(&self, counter: &CounterSpan) -> Option { + let start = self.start.max(counter.start); + let end = self.end.min(counter.end); + if start < end { + Some(CounterSpan { start, end }) + } else { + None + } + } } impl HasLength for CounterSpan { @@ -228,15 +240,16 @@ impl Mergable for CounterSpan { /// We need this because it'll make merging deletions easier. 
#[derive(Clone, Copy, PartialEq, Eq, Debug)] pub struct IdSpan { + // TODO: rename this to peer_id pub client_id: PeerID, pub counter: CounterSpan, } impl IdSpan { #[inline] - pub fn new(client_id: PeerID, from: Counter, to: Counter) -> Self { + pub fn new(peer: PeerID, from: Counter, to: Counter) -> Self { Self { - client_id, + client_id: peer, counter: CounterSpan { start: from, end: to, @@ -281,6 +294,18 @@ impl IdSpan { out.insert(self.client_id, self.counter); out } + + pub fn get_intersection(&self, other: &Self) -> Option { + if self.client_id != other.client_id { + return None; + } + + let counter = self.counter.get_intersection(&other.counter)?; + Some(Self { + client_id: self.client_id, + counter, + }) + } } impl HasLength for IdSpan { @@ -425,6 +450,12 @@ impl HasId for (PeerID, CounterSpan) { } } +impl From for IdSpan { + fn from(value: ID) -> Self { + Self::new(value.peer, value.counter, value.counter + 1) + } +} + #[cfg(test)] mod test_id_span { use rle::RleVecWithIndex; diff --git a/crates/loro-common/src/value.rs b/crates/loro-common/src/value.rs index b993bb6e..5ff6c445 100644 --- a/crates/loro-common/src/value.rs +++ b/crates/loro-common/src/value.rs @@ -25,6 +25,29 @@ pub enum LoroValue { Container(ContainerID), } +const MAX_DEPTH: usize = 128; +impl<'a> arbitrary::Arbitrary<'a> for LoroValue { + fn arbitrary(u: &mut arbitrary::Unstructured<'a>) -> arbitrary::Result { + let value = match u.int_in_range(0..=7).unwrap() { + 0 => LoroValue::Null, + 1 => LoroValue::Bool(u.arbitrary()?), + 2 => LoroValue::Double(u.arbitrary()?), + 3 => LoroValue::I32(u.arbitrary()?), + 4 => LoroValue::Binary(Arc::new(u.arbitrary()?)), + 5 => LoroValue::String(Arc::new(u.arbitrary()?)), + 6 => LoroValue::List(Arc::new(u.arbitrary()?)), + 7 => LoroValue::Map(Arc::new(u.arbitrary()?)), + _ => unreachable!(), + }; + + if value.get_depth() > MAX_DEPTH { + Err(arbitrary::Error::IncorrectFormat) + } else { + Ok(value) + } + } +} + impl LoroValue { pub fn get_by_key(&self, 
key: &str) -> Option<&LoroValue> { match self { @@ -39,6 +62,37 @@ impl LoroValue { _ => None, } } + + pub fn get_depth(&self) -> usize { + let mut max_depth = 0; + let mut value_depth_pairs = vec![(self, 0)]; + while let Some((value, depth)) = value_depth_pairs.pop() { + match value { + LoroValue::List(arr) => { + for v in arr.iter() { + value_depth_pairs.push((v, depth + 1)); + } + max_depth = max_depth.max(depth + 1); + } + LoroValue::Map(map) => { + for (_, v) in map.iter() { + value_depth_pairs.push((v, depth + 1)); + } + + max_depth = max_depth.max(depth + 1); + } + _ => {} + } + } + + max_depth + } + + // TODO: add checks for too deep value, and return err if users + // try to insert such value into a container + pub fn is_too_deep(&self) -> bool { + self.get_depth() > MAX_DEPTH + } } impl Index<&str> for LoroValue { @@ -612,3 +666,7 @@ impl<'de> serde::de::Visitor<'de> for LoroValueEnumVisitor { } } } + +pub fn to_value>(value: T) -> LoroValue { + value.into() +} diff --git a/crates/loro-internal/Cargo.toml b/crates/loro-internal/Cargo.toml index 0f9a32ea..cd7203ac 100644 --- a/crates/loro-internal/Cargo.toml +++ b/crates/loro-internal/Cargo.toml @@ -34,14 +34,19 @@ itertools = "0.11.0" enum_dispatch = "0.3.11" im = "15.1.0" generic-btree = { version = "0.8.2" } -miniz_oxide = "0.7.1" getrandom = "0.2.10" once_cell = "1.18.0" +leb128 = "0.2.5" +num-traits = "0.2" +num-derive = "0.3" +md5 = "0.7.0" [dev-dependencies] +miniz_oxide = "0.7.1" serde_json = "1.0.87" dhat = "0.3.1" rand = { version = "0.8.5" } +base64 = "0.21.5" proptest = "1.0.0" proptest-derive = "0.3.0" static_assertions = "1.1.0" diff --git a/crates/loro-internal/Encoding.md b/crates/loro-internal/Encoding.md new file mode 100644 index 00000000..c2722820 --- /dev/null +++ b/crates/loro-internal/Encoding.md @@ -0,0 +1,25 @@ +# Encoding Schema + +## Header + +The header has 22 bytes. + +- (0-4 bytes) Magic Bytes: The encoding starts with `loro` as magic bytes. 
+- (4-20 bytes) Checksum: MD5 checksum of the encoded data from byte 20 onward (the encoding-method field plus the body). The checksum is stored as a 16-byte array. The `magic bytes` and `checksum` fields themselves are excluded when calculating the checksum. +- (20-22 bytes) Encoding Method (2 bytes, big endian): Multiple encoding methods are available for a specific encoding version. + +## Encode Mode: Updates + +In this mode, only ops (the document's history) are encoded; document states are excluded. + +Like Automerge's format, we employ columnar encoding for operations and changes. + +Previously, operations were sorted by their Operation ID (OpId) before columnar encoding. However, sorting operations by their container first improves compression. + +## Encode Mode: Snapshot + +This mode captures both the document state and its history. When a snapshot is imported into a new document, the document is initialized directly from the snapshot, bypassing CRDT-based recalculation. + +Unlike previous snapshot encodings, the binary output of snapshot mode is compatible with the updates mode. This makes importing a snapshot into a non-empty document efficient, even though initializing directly from the snapshot state is infeasible there. + +Additionally, when feasible, we use the ops themselves to encode state snapshots. In CRDTs, it is possible to determine exactly which ops make up the current container state. These ops are tagged per container, so the state can be reconstructed directly from them. This approach, pioneered by Automerge, significantly improves compression.
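The header layout described in `Encoding.md` above can be illustrated with a minimal Rust sketch. It assumes half-open byte ranges: magic bytes in `0..4`, the MD5 digest in `4..20`, and the big-endian method in `20..22`. It also assumes the checksum covers everything from byte 20 to the end. `parse_header`, `ParsedHeader`, and `HeaderError` are hypothetical names used for illustration, not Loro's actual decoding API. Rust's standard library has no MD5, so the sketch only exposes the checksummed slice. A caller would hash that slice (for example with `md5::compute` from the `md5` crate this PR adds) and compare the result against `checksum`.

```rust
/// Assumed fixed header size: 4 magic + 16 checksum + 2 method bytes.
const HEADER_LEN: usize = 22;
/// Magic bytes every encoding starts with.
const MAGIC: &[u8; 4] = b"loro";

/// Hypothetical view over the fields of an encoded blob's header.
#[derive(Debug, PartialEq, Eq)]
pub struct ParsedHeader<'a> {
    /// MD5 digest stored in bytes 4..20.
    pub checksum: [u8; 16],
    /// Encoding method, big-endian u16 stored in bytes 20..22.
    pub method: u16,
    /// Bytes the checksum is assumed to cover: offset 20 to the end.
    pub checksummed: &'a [u8],
    /// Payload following the 22-byte header.
    pub body: &'a [u8],
}

#[derive(Debug, PartialEq, Eq)]
pub enum HeaderError {
    /// Fewer than 22 bytes were supplied.
    TooShort,
    /// The first four bytes are not `loro`.
    BadMagic,
}

/// Splits `data` into header fields without verifying the checksum.
pub fn parse_header(data: &[u8]) -> Result<ParsedHeader<'_>, HeaderError> {
    if data.len() < HEADER_LEN {
        return Err(HeaderError::TooShort);
    }
    if data[..4] != MAGIC[..] {
        return Err(HeaderError::BadMagic);
    }
    let mut checksum = [0u8; 16];
    checksum.copy_from_slice(&data[4..20]);
    // Two-byte method field, decoded as big endian.
    let method = u16::from_be_bytes([data[20], data[21]]);
    Ok(ParsedHeader {
        checksum,
        method,
        checksummed: &data[20..],
        body: &data[HEADER_LEN..],
    })
}
```

For example, the 4 magic bytes, a 16-byte digest, the method `0x0102`, and then `payload` parse into `method == 0x0102` and a 7-byte `body`. Under the stated assumption, the `checksummed` slice spans the 2 method bytes plus the body.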
diff --git a/crates/loro-internal/deno.lock b/crates/loro-internal/deno.lock index 9aa6158e..5a95611b 100644 --- a/crates/loro-internal/deno.lock +++ b/crates/loro-internal/deno.lock @@ -88,7 +88,6 @@ "https://deno.land/x/cliui@v7.0.4-deno/build/lib/index.js": "fb6030c7b12602a4fca4d81de3ddafa301ba84fd9df73c53de6f3bdda7b482d5", "https://deno.land/x/cliui@v7.0.4-deno/build/lib/string-utils.js": "b3eb9d2e054a43a3064af17332fb1839a7dadb205c5371af4789616afb1a117f", "https://deno.land/x/cliui@v7.0.4-deno/deno.ts": "d07bc3338661f8011e3a5fd215061d17a52107a5383c29f40ce0c1ecb8bb8cc3", - "https://deno.land/x/dirname@1.1.2/mod.ts": "4029ca6b49da58d262d65f826ba9b3a89cc0b92a94c7220d5feb7bd34e498a54", "https://deno.land/x/dirname@1.1.2/types.ts": "c1ed1667545bc4b1d69bdb2fc26a5fa8edae3a56e3081209c16a408a322a2319", "https://deno.land/x/escalade@v3.0.3/sync.ts": "493bc66563292c5c10c4a75a467a5933f24dad67d74b0f5a87e7b988fe97c104", "https://deno.land/x/y18n@v5.0.0-deno/build/lib/index.d.ts": "11f40d97041eb271cc1a1c7b296c6e7a068d4843759575e7416f0d14ebf8239c", diff --git a/crates/loro-internal/examples/encoding.rs b/crates/loro-internal/examples/encoding.rs index df10c03f..bc39594f 100644 --- a/crates/loro-internal/examples/encoding.rs +++ b/crates/loro-internal/examples/encoding.rs @@ -42,7 +42,15 @@ fn main() { println!( "snapshot size {} after compression {}", snapshot.len(), - output.len() + output.len(), + ); + + let updates = loro.export_from(&Default::default()); + let output = miniz_oxide::deflate::compress_to_vec(&updates, 6); + println!( + "updates size {} after compression {}", + updates.len(), + output.len(), ); // { diff --git a/crates/loro-internal/examples/many_actors.rs b/crates/loro-internal/examples/many_actors.rs index 59dbcaff..4780ea27 100644 --- a/crates/loro-internal/examples/many_actors.rs +++ b/crates/loro-internal/examples/many_actors.rs @@ -5,8 +5,8 @@ use loro_internal::{LoroDoc, LoroValue}; // static ALLOC: dhat::Alloc = dhat::Alloc; fn main() { - 
with_100k_actors_then_action(); - // import_with_many_actors(); + // with_100k_actors_then_action(); + import_with_many_actors(); } #[allow(unused)] diff --git a/crates/loro-internal/examples/pending.rs b/crates/loro-internal/examples/pending.rs index fe920cc8..e59dc002 100644 --- a/crates/loro-internal/examples/pending.rs +++ b/crates/loro-internal/examples/pending.rs @@ -8,7 +8,7 @@ pub fn main() { let actions = bench_utils::get_automerge_actions(); let action_length = actions.len(); let text = loro.get_text("text"); - for (_, chunks) in actions.chunks(action_length / 10).enumerate() { + for chunks in actions.chunks(action_length / 10) { for TextAction { pos, ins, del } in chunks { let mut txn = loro.txn().unwrap(); text.delete_with_txn(&mut txn, *pos, *del).unwrap(); diff --git a/crates/loro-internal/fuzz/Cargo.lock b/crates/loro-internal/fuzz/Cargo.lock index 637a3ed7..2453dad5 100644 --- a/crates/loro-internal/fuzz/Cargo.lock +++ b/crates/loro-internal/fuzz/Cargo.lock @@ -2,12 +2,6 @@ # It is not intended for manual editing. 
version = 3 -[[package]] -name = "adler" -version = "1.0.2" -source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "f26201604c87b1e01bd3d98f8d5d9a8fcbb815e8cedb41ffccbeb4bf593a35fe" - [[package]] name = "append-only-bytes" version = "0.1.12" @@ -313,6 +307,12 @@ dependencies = [ "libc", ] +[[package]] +name = "leb128" +version = "0.2.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "884e2677b40cc8c339eaefcb701c32ef1fd2493d71118dc0ca4b6a736c93bd67" + [[package]] name = "libc" version = "0.2.147" @@ -349,6 +349,7 @@ dependencies = [ "fxhash", "loro-rle", "serde", + "serde_columnar", "string_cache", "thiserror", ] @@ -368,11 +369,14 @@ dependencies = [ "getrandom", "im", "itertools", + "leb128", "loro-common", "loro-preload", "loro-rle", - "miniz_oxide", + "md5", "num", + "num-derive", + "num-traits", "once_cell", "postcard", "rand", @@ -423,13 +427,10 @@ source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "3f3d053a135388e6b1df14e8af1212af5064746e9b87a06a345a7a779ee9695a" [[package]] -name = "miniz_oxide" -version = "0.7.1" +name = "md5" +version = "0.7.0" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "e7810e0be55b428ada41041c41f32c9f1a42817901b4ccf45fa3d4b6561e74c7" -dependencies = [ - "adler", -] +checksum = "490cc448043f947bae3cbee9c203358d62dbee0db12107a74be5c30ccfd09771" [[package]] name = "new_debug_unreachable" @@ -471,6 +472,17 @@ dependencies = [ "num-traits", ] +[[package]] +name = "num-derive" +version = "0.3.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "876a53fff98e03a936a674b29568b0e605f06b29372c2489ff4de23f1949743d" +dependencies = [ + "proc-macro2", + "quote", + "syn 1.0.105", +] + [[package]] name = "num-integer" version = "0.1.45" diff --git a/crates/loro-internal/fuzz/fuzz_targets/import.rs b/crates/loro-internal/fuzz/fuzz_targets/import.rs index 8700e573..6a416f87 100644 --- 
a/crates/loro-internal/fuzz/fuzz_targets/import.rs +++ b/crates/loro-internal/fuzz/fuzz_targets/import.rs @@ -4,5 +4,6 @@ use loro_internal::LoroDoc; fuzz_target!(|data: Vec| { let mut doc = LoroDoc::default(); - doc.import(&data); + doc.import_snapshot_unchecked(&data); + doc.import_delta_updates_unchecked(&data); }); diff --git a/crates/loro-internal/scripts/fuzz.ts b/crates/loro-internal/scripts/fuzz.ts index 16f72769..53536def 100644 --- a/crates/loro-internal/scripts/fuzz.ts +++ b/crates/loro-internal/scripts/fuzz.ts @@ -1,5 +1,5 @@ -import __ from "https://deno.land/x/dirname@1.1.2/mod.ts"; -const { __dirname } = __(import.meta); +import * as path from "https://deno.land/std@0.105.0/path/mod.ts"; +const __dirname = path.dirname(path.fromFileUrl(import.meta.url)); import { resolve } from "https://deno.land/std@0.198.0/path/mod.ts"; const validTargets = Array.from( diff --git a/crates/loro-internal/scripts/mem.ts b/crates/loro-internal/scripts/mem.ts index 23ea84a9..384ba2b5 100644 --- a/crates/loro-internal/scripts/mem.ts +++ b/crates/loro-internal/scripts/mem.ts @@ -1,5 +1,5 @@ -import __ from "https://deno.land/x/dirname@1.1.2/mod.ts"; -const { __dirname } = __(import.meta); +import * as path from "https://deno.land/std@0.105.0/path/mod.ts"; +const __dirname = path.dirname(path.fromFileUrl(import.meta.url)); import { resolve } from "https://deno.land/std@0.105.0/path/mod.ts"; export const Tasks = [ diff --git a/crates/loro-internal/src/arena.rs b/crates/loro-internal/src/arena.rs index f9da7681..16426bda 100644 --- a/crates/loro-internal/src/arena.rs +++ b/crates/loro-internal/src/arena.rs @@ -14,7 +14,7 @@ use crate::{ container::{ idx::ContainerIdx, list::list_op::{InnerListOp, ListOp}, - map::{InnerMapSet, MapSet}, + map::MapSet, ContainerID, }, id::Counter, @@ -47,149 +47,14 @@ pub struct SharedArena { } pub struct StrAllocResult { + /// unicode start pub start: usize, + /// unicode end pub end: usize, // TODO: remove this field? 
pub utf16_len: usize, } -pub(crate) struct OpConverter<'a> { - container_idx_to_id: MutexGuard<'a, Vec>, - container_id_to_idx: MutexGuard<'a, FxHashMap>, - container_idx_depth: MutexGuard<'a, Vec>, - str: MutexGuard<'a, StrArena>, - values: MutexGuard<'a, Vec>, - root_c_idx: MutexGuard<'a, Vec>, - parents: MutexGuard<'a, FxHashMap>>, -} - -impl<'a> OpConverter<'a> { - pub fn convert_single_op( - &mut self, - id: &ContainerID, - _peer: PeerID, - counter: Counter, - _lamport: Lamport, - content: RawOpContent, - ) -> Op { - let container = 'out: { - if let Some(&idx) = self.container_id_to_idx.get(id) { - break 'out idx; - } - - let container_idx_to_id = &mut self.container_idx_to_id; - let idx = container_idx_to_id.len(); - container_idx_to_id.push(id.clone()); - let idx = ContainerIdx::from_index_and_type(idx as u32, id.container_type()); - self.container_id_to_idx.insert(id.clone(), idx); - if id.is_root() { - self.root_c_idx.push(idx); - self.parents.insert(idx, None); - self.container_idx_depth.push(1); - } else { - self.container_idx_depth.push(0); - } - idx - }; - - match content { - crate::op::RawOpContent::Map(MapSet { key, value }) => { - let value = if let Some(value) = value { - Some(_alloc_value(&mut self.values, value) as u32) - } else { - None - }; - Op { - counter, - container, - content: crate::op::InnerContent::Map(InnerMapSet { key, value }), - } - } - crate::op::RawOpContent::List(list) => match list { - ListOp::Insert { slice, pos } => match slice { - ListSlice::RawData(values) => { - let range = _alloc_values(&mut self.values, values.iter().cloned()); - Op { - counter, - container, - content: crate::op::InnerContent::List(InnerListOp::Insert { - slice: SliceRange::from(range.start as u32..range.end as u32), - pos, - }), - } - } - ListSlice::RawStr { - str, - unicode_len: _, - } => { - let slice = _alloc_str(&mut self.str, &str); - Op { - counter, - container, - content: crate::op::InnerContent::List(InnerListOp::Insert { - slice: 
SliceRange::from(slice.start as u32..slice.end as u32), - pos, - }), - } - } - }, - ListOp::Delete(span) => Op { - counter, - container, - content: InnerContent::List(InnerListOp::Delete(span)), - }, - ListOp::StyleStart { - start, - end, - info, - key, - value, - } => Op { - counter, - container, - content: InnerContent::List(InnerListOp::StyleStart { - start, - end, - info, - key, - value, - }), - }, - ListOp::StyleEnd => Op { - counter, - container, - content: InnerContent::List(InnerListOp::StyleEnd), - }, - }, - crate::op::RawOpContent::Tree(tree) => { - // we need create every meta container associated with target TreeID - let id = tree.target; - let meta_container_id = id.associated_meta_container(); - - if self.container_id_to_idx.get(&meta_container_id).is_none() { - let container_idx_to_id = &mut self.container_idx_to_id; - let idx = container_idx_to_id.len(); - container_idx_to_id.push(meta_container_id.clone()); - self.container_idx_depth.push(0); - let idx = ContainerIdx::from_index_and_type( - idx as u32, - meta_container_id.container_type(), - ); - self.container_id_to_idx.insert(meta_container_id, idx); - let parent = &mut self.parents; - parent.insert(idx, Some(container)); - } - - Op { - container, - counter, - content: crate::op::InnerContent::Tree(tree), - } - } - } - } -} - impl SharedArena { pub fn register_container(&self, id: &ContainerID) -> ContainerIdx { let mut container_id_to_idx = self.inner.container_id_to_idx.lock().unwrap(); @@ -244,12 +109,9 @@ impl SharedArena { } /// return slice and unicode index - pub fn alloc_str_with_slice(&self, str: &str) -> (BytesSlice, usize) { + pub fn alloc_str_with_slice(&self, str: &str) -> (BytesSlice, StrAllocResult) { let mut text_lock = self.inner.str.lock().unwrap(); - let start = text_lock.len_bytes(); - let unicode_start = text_lock.len_unicode(); - text_lock.alloc(str); - (text_lock.slice_bytes(start..), unicode_start) + _alloc_str_with_slice(&mut text_lock, str) } /// alloc str without extra 
info @@ -365,20 +227,6 @@ impl SharedArena { (self.inner.values.lock().unwrap()[range]).to_vec() } - #[inline(always)] - pub(crate) fn with_op_converter(&self, f: impl FnOnce(&mut OpConverter) -> R) -> R { - let mut op_converter = OpConverter { - container_idx_to_id: self.inner.container_idx_to_id.lock().unwrap(), - container_id_to_idx: self.inner.container_id_to_idx.lock().unwrap(), - container_idx_depth: self.inner.depth.lock().unwrap(), - str: self.inner.str.lock().unwrap(), - values: self.inner.values.lock().unwrap(), - root_c_idx: self.inner.root_c_idx.lock().unwrap(), - parents: self.inner.parents.lock().unwrap(), - }; - f(&mut op_converter) - } - pub fn convert_single_op( &self, container: &ContainerID, @@ -404,14 +252,11 @@ impl SharedArena { container: ContainerIdx, ) -> Op { match content { - crate::op::RawOpContent::Map(MapSet { key, value }) => { - let value = value.map(|value| self.alloc_value(value) as u32); - Op { - counter, - container, - content: crate::op::InnerContent::Map(InnerMapSet { key, value }), - } - } + crate::op::RawOpContent::Map(MapSet { key, value }) => Op { + counter, + container, + content: crate::op::InnerContent::Map(MapSet { key, value }), + }, crate::op::RawOpContent::List(list) => match list { ListOp::Insert { slice, pos } => match slice { ListSlice::RawData(values) => { @@ -426,13 +271,13 @@ impl SharedArena { } } ListSlice::RawStr { str, unicode_len } => { - let (slice, start) = self.alloc_str_with_slice(&str); + let (slice, info) = self.alloc_str_with_slice(&str); Op { counter, container, content: crate::op::InnerContent::List(InnerListOp::InsertText { slice, - unicode_start: start as u32, + unicode_start: info.start as u32, unicode_len: unicode_len as u32, pos: pos as u32, }), @@ -519,6 +364,15 @@ impl SharedArena { } } +fn _alloc_str_with_slice( + text_lock: &mut MutexGuard<'_, StrArena>, + str: &str, +) -> (BytesSlice, StrAllocResult) { + let start = text_lock.len_bytes(); + let ans = _alloc_str(text_lock, str); + 
(text_lock.slice_bytes(start..), ans) +} + fn _alloc_values( values_lock: &mut MutexGuard<'_, Vec>, values: impl Iterator, diff --git a/crates/loro-internal/src/container/list/list_op.rs b/crates/loro-internal/src/container/list/list_op.rs index 4344594b..42204870 100644 --- a/crates/loro-internal/src/container/list/list_op.rs +++ b/crates/loro-internal/src/container/list/list_op.rs @@ -49,6 +49,7 @@ pub enum InnerListOp { }, Delete(DeleteSpan), /// StyleStart and StyleEnd must be paired. + /// The next op of StyleStart must be StyleEnd. StyleStart { start: u32, end: u32, diff --git a/crates/loro-internal/src/container/map/map_content.rs b/crates/loro-internal/src/container/map/map_content.rs index 1a829305..993718ad 100644 --- a/crates/loro-internal/src/container/map/map_content.rs +++ b/crates/loro-internal/src/container/map/map_content.rs @@ -11,13 +11,6 @@ pub struct MapSet { pub(crate) value: Option, } -#[derive(Clone, Debug, PartialEq, Eq)] -pub struct InnerMapSet { - pub(crate) key: InternalString, - // the key is deleted if value is None - pub(crate) value: Option, -} - impl Mergable for MapSet {} impl Sliceable for MapSet { fn slice(&self, from: usize, to: usize) -> Self { @@ -31,19 +24,6 @@ impl HasLength for MapSet { } } -impl Mergable for InnerMapSet {} -impl Sliceable for InnerMapSet { - fn slice(&self, from: usize, to: usize) -> Self { - assert!(from == 0 && to == 1); - self.clone() - } -} -impl HasLength for InnerMapSet { - fn content_len(&self) -> usize { - 1 - } -} - #[cfg(test)] mod test { use super::MapSet; diff --git a/crates/loro-internal/src/container/map/mod.rs b/crates/loro-internal/src/container/map/mod.rs index 6b6a0d3b..1a667865 100644 --- a/crates/loro-internal/src/container/map/mod.rs +++ b/crates/loro-internal/src/container/map/mod.rs @@ -1,3 +1,3 @@ mod map_content; -pub(crate) use map_content::{InnerMapSet, MapSet}; +pub(crate) use map_content::MapSet; diff --git a/crates/loro-internal/src/container/richtext/richtext_state.rs 
b/crates/loro-internal/src/container/richtext/richtext_state.rs index 8b3b342f..bf351d85 100644 --- a/crates/loro-internal/src/container/richtext/richtext_state.rs +++ b/crates/loro-internal/src/container/richtext/richtext_state.rs @@ -4,7 +4,7 @@ use generic_btree::{ rle::{HasLength, Mergeable, Sliceable}, BTree, BTreeTrait, Cursor, }; -use loro_common::LoroValue; +use loro_common::{IdSpan, LoroValue, ID}; use serde::{ser::SerializeStruct, Serialize}; use std::fmt::{Display, Formatter}; use std::{ @@ -64,17 +64,18 @@ mod text_chunk { use std::ops::Range; use append_only_bytes::BytesSlice; + use loro_common::ID; #[derive(Clone, Debug, PartialEq)] pub(crate) struct TextChunk { - unicode_len: i32, bytes: BytesSlice, - // TODO: make this field only available in wasm mode + unicode_len: i32, utf16_len: i32, + start_op_id: ID, } impl TextChunk { - pub fn from_bytes(bytes: BytesSlice) -> Self { + pub fn new(bytes: BytesSlice, id: ID) -> Self { let mut utf16_len = 0; let mut unicode_len = 0; for c in std::str::from_utf8(&bytes).unwrap().chars() { @@ -86,9 +87,14 @@ mod text_chunk { unicode_len, bytes, utf16_len: utf16_len as i32, + start_op_id: id, } } + pub fn id(&self) -> ID { + self.start_op_id + } + pub fn bytes(&self) -> &BytesSlice { &self.bytes } @@ -139,6 +145,9 @@ mod text_chunk { unicode_len: 0, bytes: BytesSlice::empty(), utf16_len: 0, + // This is a dummy value. + // It's fine because the length is 0. We never actually use this value. 
+ start_op_id: ID::NONE_ID, } } @@ -186,6 +195,7 @@ mod text_chunk { } (true, false) => { self.bytes.slice_(end_byte..); + self.start_op_id = self.start_op_id.inc(end_unicode_index as i32); None } (false, true) => { @@ -194,7 +204,7 @@ mod text_chunk { } (false, false) => { let next = self.bytes.slice_clone(end_byte..); - let next = Self::from_bytes(next); + let next = Self::new(next, self.start_op_id.inc(end_unicode_index as i32)); self.unicode_len -= next.unicode_len; self.utf16_len -= next.utf16_len; self.bytes.slice_(..start_byte); @@ -283,6 +293,7 @@ mod text_chunk { unicode_len: range.len() as i32, bytes: self.bytes.slice_clone(start..end), utf16_len: utf16_len as i32, + start_op_id: self.start_op_id.inc(range.start as i32), }; ans.check(); ans @@ -303,6 +314,7 @@ mod text_chunk { unicode_len: self.unicode_len - pos as i32, bytes: self.bytes.slice_clone(byte_offset..), utf16_len: self.utf16_len - utf16_len as i32, + start_op_id: self.start_op_id.inc(pos as i32), }; self.unicode_len = pos as i32; @@ -317,6 +329,7 @@ mod text_chunk { impl generic_btree::rle::Mergeable for TextChunk { fn can_merge(&self, rhs: &Self) -> bool { self.bytes.can_merge(&rhs.bytes) + && self.start_op_id.inc(self.unicode_len) == rhs.start_op_id } fn merge_right(&mut self, rhs: &Self) { @@ -332,6 +345,7 @@ mod text_chunk { self.bytes = new; self.utf16_len += left.utf16_len; self.unicode_len += left.unicode_len; + self.start_op_id = left.start_op_id; self.check(); } } @@ -348,13 +362,29 @@ pub(crate) enum RichtextStateChunk { } impl RichtextStateChunk { - pub fn new_text(s: BytesSlice) -> Self { - Self::Text(TextChunk::from_bytes(s)) + pub fn new_text(s: BytesSlice, id: ID) -> Self { + Self::Text(TextChunk::new(s, id)) } pub fn new_style(style: Arc, anchor_type: AnchorType) -> Self { Self::Style { style, anchor_type } } + + pub(crate) fn get_id_span(&self) -> loro_common::IdSpan { + match self { + RichtextStateChunk::Text(t) => { + let id = t.id(); + IdSpan::new(id.peer, id.counter, 
id.counter + t.unicode_len()) + } + RichtextStateChunk::Style { style, anchor_type } => match anchor_type { + AnchorType::Start => style.id().into(), + AnchorType::End => { + let id = style.id(); + IdSpan::new(id.peer, id.counter + 1, id.counter + 2) + } + }, + } + } } impl DeltaValue for RichtextStateChunk { @@ -398,9 +428,9 @@ impl Serialize for RichtextStateChunk { } impl RichtextStateChunk { - pub fn try_from_bytes(s: BytesSlice) -> Result { + pub fn try_new(s: BytesSlice, id: ID) -> Result { std::str::from_utf8(&s)?; - Ok(RichtextStateChunk::Text(TextChunk::from_bytes(s))) + Ok(RichtextStateChunk::Text(TextChunk::new(s, id))) } pub fn from_style(style: Arc, anchor_type: AnchorType) -> Self { @@ -1172,8 +1202,8 @@ impl RichtextState { } /// This is used to accept changes from DiffCalculator - pub(crate) fn insert_at_entity_index(&mut self, entity_index: usize, text: BytesSlice) { - let elem = RichtextStateChunk::try_from_bytes(text).unwrap(); + pub(crate) fn insert_at_entity_index(&mut self, entity_index: usize, text: BytesSlice, id: ID) { + let elem = RichtextStateChunk::try_new(text, id).unwrap(); self.style_ranges.insert(entity_index, elem.rle_len()); let leaf; if let Some(cursor) = @@ -1736,8 +1766,8 @@ impl RichtextState { } #[inline(always)] - pub fn is_emtpy(&self) -> bool { - self.tree.root_cache().bytes == 0 + pub fn is_empty(&self) -> bool { + self.tree.root_cache().entity_len == 0 } #[inline(always)] @@ -1779,7 +1809,7 @@ mod test { let state = &mut self.state; let text = self.bytes.slice(start..); let entity_index = state.get_entity_index_for_text_insert(pos, PosType::Unicode); - state.insert_at_entity_index(entity_index, text); + state.insert_at_entity_index(entity_index, text, ID::new(0, 0)); }; } diff --git a/crates/loro-internal/src/container/richtext/tracker.rs b/crates/loro-internal/src/container/richtext/tracker.rs index 14347d0e..4e831eed 100644 --- a/crates/loro-internal/src/container/richtext/tracker.rs +++ 
b/crates/loro-internal/src/container/richtext/tracker.rs @@ -70,6 +70,8 @@ impl Tracker { } pub(crate) fn insert(&mut self, mut op_id: ID, mut pos: usize, mut content: RichtextChunk) { + // debug_log::debug_dbg!(&op_id, pos, content); + // debug_log::debug_dbg!(&self); let last_id = op_id.inc(content.len() as Counter - 1); let applied_counter_end = self.applied_vv.get(&last_id.peer).copied().unwrap_or(0); if applied_counter_end > op_id.counter { @@ -296,6 +298,7 @@ impl Tracker { self._checkout(from, false); self._checkout(to, true); // self.id_to_cursor.diagnose(); + // debug_log::debug_dbg!(&self); self.rope.get_diff() } } diff --git a/crates/loro-internal/src/container/richtext/tracker/crdt_rope.rs b/crates/loro-internal/src/container/richtext/tracker/crdt_rope.rs index 2c6ecce0..c69c9403 100644 --- a/crates/loro-internal/src/container/richtext/tracker/crdt_rope.rs +++ b/crates/loro-internal/src/container/richtext/tracker/crdt_rope.rs @@ -355,7 +355,10 @@ impl CrdtRope { match elem.diff() { DiffStatus::NotChanged => {} DiffStatus::Created => { - let rt = Some(CrdtRopeDelta::Insert(elem.content)); + let rt = Some(CrdtRopeDelta::Insert { + chunk: elem.content, + id: elem.id, + }); if index > last_pos { next = rt; let len = index - last_pos; @@ -405,7 +408,7 @@ impl CrdtRope { #[derive(Debug, Clone, PartialEq, Eq, Copy)] pub(crate) enum CrdtRopeDelta { Retain(usize), - Insert(RichtextChunk), + Insert { chunk: RichtextChunk, id: ID }, Delete(usize), } @@ -820,7 +823,10 @@ mod test { CrdtRopeDelta::Retain(2), CrdtRopeDelta::Delete(6), CrdtRopeDelta::Retain(2), - CrdtRopeDelta::Insert(RichtextChunk::new_text(10..13)) + CrdtRopeDelta::Insert { + chunk: RichtextChunk::new_text(10..13), + id: ID::new(1, 0) + } ], vec, ); @@ -841,7 +847,10 @@ mod test { ); let vec: Vec<_> = rope.get_diff().collect(); assert_eq!( - vec![CrdtRopeDelta::Insert(RichtextChunk::new_text(2..10))], + vec![CrdtRopeDelta::Insert { + chunk: RichtextChunk::new_text(2..10), + id: ID::new(0, 2) + }], 
vec, ); } diff --git a/crates/loro-internal/src/dag/iter.rs b/crates/loro-internal/src/dag/iter.rs index f7b8a0f2..d2a11b93 100644 --- a/crates/loro-internal/src/dag/iter.rs +++ b/crates/loro-internal/src/dag/iter.rs @@ -135,14 +135,14 @@ impl<'a, T: DagNode> Iterator for DagIteratorVV<'a, T> { debug_assert_eq!(id, node.id_start()); let mut vv = { // calculate vv - let mut vv = None; + let mut vv: Option = None; for &dep_id in node.deps() { let dep = self.dag.get(dep_id).unwrap(); let dep_vv = self.vv_map.get(&dep.id_start()).unwrap(); - if vv.is_none() { - vv = Some(dep_vv.clone()); + if let Some(vv) = vv.as_mut() { + vv.merge(dep_vv); } else { - vv.as_mut().unwrap().merge(dep_vv); + vv = Some(dep_vv.clone()); } if dep.id_start() != dep_id { @@ -150,7 +150,7 @@ impl<'a, T: DagNode> Iterator for DagIteratorVV<'a, T> { } } - vv.unwrap_or_else(VersionVector::new) + vv.unwrap_or_default() }; vv.try_update_last(id); diff --git a/crates/loro-internal/src/delta/map_delta.rs b/crates/loro-internal/src/delta/map_delta.rs index 336f8846..065eab24 100644 --- a/crates/loro-internal/src/delta/map_delta.rs +++ b/crates/loro-internal/src/delta/map_delta.rs @@ -28,6 +28,12 @@ pub struct MapValue { pub lamport: (Lamport, PeerID), } +impl MapValue { + pub fn id(&self) -> ID { + ID::new(self.lamport.1, self.counter) + } +} + #[derive(Default, Debug, Clone)] pub struct ResolvedMapDelta { pub updated: FxHashMap, diff --git a/crates/loro-internal/src/delta/tree.rs b/crates/loro-internal/src/delta/tree.rs index 846dc981..c3c23534 100644 --- a/crates/loro-internal/src/delta/tree.rs +++ b/crates/loro-internal/src/delta/tree.rs @@ -4,7 +4,7 @@ use std::{ }; use fxhash::{FxHashMap, FxHashSet}; -use loro_common::{ContainerType, LoroValue, TreeID}; +use loro_common::{ContainerType, LoroValue, TreeID, ID}; use serde::Serialize; use smallvec::{smallvec, SmallVec}; @@ -92,6 +92,7 @@ pub struct TreeDelta { pub struct TreeDeltaItem { pub target: TreeID, pub action: TreeInternalDiff, + pub 
last_effective_move_op_id: ID, } /// The action of [`TreeDiff`]. It's the same as [`crate::container::tree::tree_op::TreeOp`], but semantic. @@ -101,9 +102,9 @@ pub enum TreeInternalDiff { Create, /// Recreate the node, the node has been deleted before Restore, - /// Same as move to `None` and the node is exist + /// Same as move to `None` and the node exists AsRoot, - /// Move the node to the parent, the node is exist + /// Move the node to the parent, the node exists Move(TreeID), /// First create the node and move it to the parent CreateMove(TreeID), @@ -120,6 +121,7 @@ impl TreeDeltaItem { target: TreeID, parent: Option, old_parent: Option, + op_id: ID, is_parent_deleted: bool, is_old_parent_deleted: bool, ) -> Self { @@ -150,7 +152,11 @@ impl TreeDeltaItem { unreachable!() } }; - TreeDeltaItem { target, action } + TreeDeltaItem { + target, + action, + last_effective_move_op_id: op_id, + } } } diff --git a/crates/loro-internal/src/diff_calc.rs b/crates/loro-internal/src/diff_calc.rs index fe8f5cc3..9811916a 100644 --- a/crates/loro-internal/src/diff_calc.rs +++ b/crates/loro-internal/src/diff_calc.rs @@ -23,7 +23,7 @@ use crate::{ delta::{Delta, MapDelta, MapValue, TreeInternalDiff}, event::InternalDiff, id::Counter, - op::{RichOp, SliceRange}, + op::{RichOp, SliceRange, SliceRanges}, span::{HasId, HasLamport}, version::Frontiers, InternalString, VersionVector, @@ -130,7 +130,6 @@ impl DiffCalculator { .binary_search_by(|op| op.ctr_last().cmp(&start_counter)) .unwrap_or_else(|e| e); let mut visited = FxHashSet::default(); - debug_log::debug_dbg!(&change, iter_start); for mut op in &change.ops.vec()[iter_start..] 
{ // slice the op if needed let stack_sliced_op; @@ -272,7 +271,6 @@ impl DiffCalculator { } } - debug_log::debug_dbg!(&new_containers); if len == all.len() { debug_log::debug_log!("Container might be deleted"); debug_log::debug_dbg!(&all); @@ -306,7 +304,7 @@ impl DiffCalculator { ); } } - debug_log::debug_dbg!(&ans); + // debug_log::debug_dbg!(&ans); ans.into_values() .sorted_by_key(|x| x.0) .map(|x| x.1) @@ -379,7 +377,7 @@ impl DiffCalculatorTrait for MapDiffCalculator { lamport: op.lamport(), peer: op.client_id(), counter: op.id_start().counter, - value: op.op().content.as_map().unwrap().value, + value: op.op().content.as_map().unwrap().value.clone(), }); } @@ -387,7 +385,7 @@ impl DiffCalculatorTrait for MapDiffCalculator { fn calculate_diff( &mut self, - oplog: &super::oplog::OpLog, + _oplog: &super::oplog::OpLog, from: &crate::VersionVector, to: &crate::VersionVector, mut on_new_container: impl FnMut(&ContainerID), @@ -411,7 +409,7 @@ impl DiffCalculatorTrait for MapDiffCalculator { for (key, value) in changed { let value = value .map(|v| { - let value = v.value.and_then(|v| oplog.arena.get_value(v as usize)); + let value = v.value.clone(); if let Some(LoroValue::Container(c)) = &value { on_new_container(c); } @@ -434,12 +432,26 @@ impl DiffCalculatorTrait for MapDiffCalculator { } } -#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)] +#[derive(Debug, Clone, PartialEq, Eq)] struct CompactMapValue { lamport: Lamport, peer: PeerID, counter: Counter, - value: Option, + value: Option, +} + +impl Ord for CompactMapValue { + fn cmp(&self, other: &Self) -> std::cmp::Ordering { + self.lamport + .cmp(&other.lamport) + .then(self.peer.cmp(&other.peer)) + } +} + +impl PartialOrd for CompactMapValue { + fn partial_cmp(&self, other: &Self) -> Option { + Some(self.cmp(other)) + } } impl HasId for CompactMapValue { @@ -469,19 +481,19 @@ mod compact_register { &self, a: &VersionVector, b: &VersionVector, - ) -> (Option, Option) { - let mut max_a: Option = None; 
- let mut max_b: Option = None; + ) -> (Option<&CompactMapValue>, Option<&CompactMapValue>) { + let mut max_a: Option<&CompactMapValue> = None; + let mut max_b: Option<&CompactMapValue> = None; for v in self.tree.iter().rev() { if b.get(&v.peer).copied().unwrap_or(0) > v.counter { - max_b = Some(*v); + max_b = Some(v); break; } } for v in self.tree.iter().rev() { if a.get(&v.peer).copied().unwrap_or(0) > v.counter { - max_a = Some(*v); + max_a = Some(v); break; } } @@ -564,7 +576,7 @@ impl DiffCalculatorTrait for ListDiffCalculator { CrdtRopeDelta::Retain(len) => { delta = delta.retain(len); } - CrdtRopeDelta::Insert(value) => match value.value() { + CrdtRopeDelta::Insert { chunk: value, id } => match value.value() { RichtextChunkValue::Text(range) => { for i in range.clone() { let v = oplog.arena.get_value(i as usize); @@ -572,7 +584,10 @@ impl DiffCalculatorTrait for ListDiffCalculator { on_new_container(c); } } - delta = delta.insert(SliceRange(range)); + delta = delta.insert(SliceRanges { + ranges: smallvec::smallvec![SliceRange(range)], + id, + }); } RichtextChunkValue::StyleAnchor { .. } => unreachable!(), RichtextChunkValue::Unknown(_) => unreachable!(), @@ -583,7 +598,7 @@ impl DiffCalculatorTrait for ListDiffCalculator { } } - InternalDiff::SeqRaw(delta) + InternalDiff::ListRaw(delta) } } @@ -617,12 +632,8 @@ impl DiffCalculatorTrait for RichtextDiffCalculator { match &op.op().content { crate::op::InnerContent::List(l) => match l { - crate::container::list::list_op::InnerListOp::Insert { slice, pos } => { - self.tracker.insert( - op.id_start(), - *pos, - RichtextChunk::new_text(slice.0.clone()), - ); + crate::container::list::list_op::InnerListOp::Insert { .. 
} => { + unreachable!() } crate::container::list::list_op::InnerListOp::InsertText { slice: _, @@ -695,14 +706,15 @@ impl DiffCalculatorTrait for RichtextDiffCalculator { CrdtRopeDelta::Retain(len) => { delta = delta.retain(len); } - CrdtRopeDelta::Insert(value) => match value.value() { + CrdtRopeDelta::Insert { chunk: value, id } => match value.value() { RichtextChunkValue::Text(text) => { delta = delta.insert(RichtextStateChunk::Text( // PERF: can be speedup by acquiring lock on arena - TextChunk::from_bytes( + TextChunk::new( oplog .arena .slice_by_unicode(text.start as usize..text.end as usize), + id, ), )); } diff --git a/crates/loro-internal/src/diff_calc/tree.rs b/crates/loro-internal/src/diff_calc/tree.rs index 9ea55139..7681dad7 100644 --- a/crates/loro-internal/src/diff_calc/tree.rs +++ b/crates/loro-internal/src/diff_calc/tree.rs @@ -90,9 +90,9 @@ impl TreeDiffCache { // When we cache local ops, we can apply these directly. // Because importing the local op must not cause circular references, it has been checked. 
- pub(crate) fn add_node_uncheck(&mut self, node: MoveLamportAndID) { + pub(crate) fn add_node_from_local(&mut self, node: MoveLamportAndID) { if !self.all_version.includes_id(node.id) { - let old_parent = self.get_parent(node.target); + let (old_parent, _id) = self.get_parent(node.target); self.update_deleted_cache(node.target, node.parent, old_parent); @@ -147,7 +147,7 @@ impl TreeDiffCache { let apply_ops = self.forward(to, to_max_lamport); debug_log::debug_log!("apply ops {:?}", apply_ops); for op in apply_ops.into_iter() { - let old_parent = self.get_parent(op.target); + let (old_parent, _id) = self.get_parent(op.target); let is_parent_deleted = op.parent.is_some() && self.is_deleted(op.parent.as_ref().unwrap()); let is_old_parent_deleted = @@ -159,6 +159,7 @@ impl TreeDiffCache { op.target, op.parent, old_parent, + op.id, is_parent_deleted, is_old_parent_deleted, ); @@ -168,17 +169,18 @@ impl TreeDiffCache { this_diff.action, TreeInternalDiff::Restore | TreeInternalDiff::RestoreMove(_) ) { - // TODO: perf how to get children faster + // TODO: per let mut s = vec![op.target]; while let Some(t) = s.pop() { let children = self.get_children(t); children.iter().for_each(|c| { diff.push(TreeDeltaItem { - target: *c, + target: c.0, action: TreeInternalDiff::CreateMove(t), + last_effective_move_op_id: c.1, }) }); - s.extend(children); + s.extend(children.iter().map(|x| x.0)); } } } @@ -200,18 +202,20 @@ impl TreeDiffCache { self.current_version = vv.clone(); } - /// return true if it can be effected + /// return true if this apply op has effect on the tree + /// + /// This method assumes that `node` has the greatest lamport value fn apply(&mut self, mut node: MoveLamportAndID) -> bool { - let mut ans = true; + let mut effected = true; if node.parent.is_some() && self.is_ancestor_of(node.target, node.parent.unwrap()) { - ans = false; + effected = false; } - node.effected = ans; - let old_parent = self.get_parent(node.target); + node.effected = effected; + let 
(old_parent, _id) = self.get_parent(node.target); self.update_deleted_cache(node.target, node.parent, old_parent); self.cache.entry(node.target).or_default().insert(node); self.current_version.set_last(node.id); - ans + effected } fn forward(&mut self, vv: &VersionVector, max_lamport: Lamport) -> Vec { @@ -252,7 +256,7 @@ impl TreeDiffCache { }); if op.effected { // update deleted cache - let old_parent = self.get_parent(op.target); + let (old_parent, _id) = self.get_parent(op.target); self.update_deleted_cache(op.target, old_parent, op.parent); } } @@ -273,14 +277,15 @@ impl TreeDiffCache { } } for op in retreat_ops.iter_mut().sorted().rev() { - self.cache.get_mut(&op.target).unwrap().remove(op); + let btree_set = &mut self.cache.get_mut(&op.target).unwrap(); + btree_set.remove(op); self.pending.insert(*op); self.current_version.shrink_to_exclude(IdSpan { client_id: op.id.peer, counter: CounterSpan::new(op.id.counter, op.id.counter + 1), }); // calc old parent - let old_parent = self.get_parent(op.target); + let (old_parent, last_effective_move_op_id) = self.get_parent(op.target); if op.effected { // we need to know whether old_parent is deleted let is_parent_deleted = @@ -291,6 +296,7 @@ impl TreeDiffCache { op.target, old_parent, op.parent, + last_effective_move_op_id, is_old_parent_deleted, is_parent_deleted, ); @@ -305,11 +311,12 @@ impl TreeDiffCache { let children = self.get_children(t); children.iter().for_each(|c| { diffs.push(TreeDeltaItem { - target: *c, + target: c.0, action: TreeInternalDiff::CreateMove(t), + last_effective_move_op_id: c.1, }) }); - s.extend(children); + s.extend(children.iter().map(|c| c.0)); } } } @@ -319,15 +326,34 @@ impl TreeDiffCache { } /// get the parent of the first effected op - fn get_parent(&self, tree_id: TreeID) -> Option { + fn get_parent(&self, tree_id: TreeID) -> (Option, ID) { if TreeID::is_deleted_root(Some(tree_id)) { - return None; + return (None, ID::NONE_ID); } - let mut ans = TreeID::unexist_root(); + let mut 
ans = (TreeID::unexist_root(), ID::NONE_ID); if let Some(cache) = self.cache.get(&tree_id) { for op in cache.iter().rev() { if op.effected { - ans = op.parent; + ans = (op.parent, op.id); + break; + } + } + } + + ans + } + + /// get the last effective move op of the target node + fn get_last_effective_move(&self, tree_id: TreeID) -> Option<&MoveLamportAndID> { + if TreeID::is_deleted_root(Some(tree_id)) { + return None; + } + + let mut ans = None; + if let Some(cache) = self.cache.get(&tree_id) { + for op in cache.iter().rev() { + if op.effected { + ans = Some(op); break; } } @@ -342,7 +368,7 @@ impl TreeDiffCache { } loop { - let parent = self.get_parent(node_id); + let (parent, _id) = self.get_parent(node_id); match parent { Some(parent_id) if parent_id == maybe_ancestor => return true, Some(parent_id) if parent_id == node_id => panic!("loop detected"), @@ -358,14 +384,14 @@ impl TreeDiffCache { pub(crate) trait TreeDeletedSetTrait { fn deleted(&self) -> &FxHashSet<TreeID>; fn deleted_mut(&mut self) -> &mut FxHashSet<TreeID>; - fn get_children(&self, target: TreeID) -> Vec<TreeID>; - fn get_children_recursively(&self, target: TreeID) -> Vec<TreeID> { + fn get_children(&self, target: TreeID) -> Vec<(TreeID, ID)>; + fn get_children_recursively(&self, target: TreeID) -> Vec<(TreeID, ID)> { let mut ans = vec![]; let mut s = vec![target]; while let Some(t) = s.pop() { let children = self.get_children(t); ans.extend(children.clone()); - s.extend(children); + s.extend(children.iter().map(|x| x.0)); } ans } @@ -393,7 +419,7 @@ pub(crate) trait TreeDeletedSetTrait { self.deleted_mut().remove(&target); } let mut s = self.get_children(target); - while let Some(child) = s.pop() { + while let Some((child, _)) = s.pop() { if child == target { continue; } @@ -416,17 +442,21 @@ impl TreeDeletedSetTrait for TreeDiffCache { &mut self.deleted } - fn get_children(&self, target: TreeID) -> Vec<TreeID> { + fn get_children(&self, target: TreeID) -> Vec<(TreeID, ID)> { let mut ans = vec![]; for (tree_id, _) in self.cache.iter() { if
tree_id == &target { continue; } - let parent = self.get_parent(*tree_id); - if parent == Some(target) { - ans.push(*tree_id) + let Some(op) = self.get_last_effective_move(*tree_id) else { + continue; + }; + + if op.parent == Some(target) { + ans.push((*tree_id, op.id)); } } + ans } } diff --git a/crates/loro-internal/src/encoding.rs b/crates/loro-internal/src/encoding.rs index a185a1a2..348ca81c 100644 --- a/crates/loro-internal/src/encoding.rs +++ b/crates/loro-internal/src/encoding.rs @@ -1,133 +1,197 @@ -use fxhash::FxHashMap; -use loro_common::PeerID; - -use crate::{change::Change, op::RemoteOp}; - -pub(crate) type RemoteClientChanges<'a> = FxHashMap>>>; - -mod encode_enhanced; -pub(crate) mod encode_snapshot; -mod encode_updates; - -use rle::HasLength; +mod encode_reordered; +use crate::op::OpWithId; +use crate::LoroDoc; use crate::{oplog::OpLog, LoroError, VersionVector}; +use loro_common::{HasCounter, IdSpan, LoroResult}; +use num_derive::{FromPrimitive, ToPrimitive}; +use num_traits::{FromPrimitive, ToPrimitive}; +use rle::{HasLength, Sliceable}; +const MAGIC_BYTES: [u8; 4] = *b"loro"; -use self::encode_updates::decode_oplog_updates; - -pub(crate) use encode_enhanced::{decode_oplog_v2, encode_oplog_v2}; -pub(crate) use encode_updates::encode_oplog_updates; - -pub(crate) const COMPRESS_RLE_THRESHOLD: usize = 20 * 1024; -// TODO: Test this threshold -#[cfg(not(test))] -pub(crate) const UPDATE_ENCODE_THRESHOLD: usize = 32; -#[cfg(test)] -pub(crate) const UPDATE_ENCODE_THRESHOLD: usize = 16; -pub(crate) const MAGIC_BYTES: [u8; 4] = [0x6c, 0x6f, 0x72, 0x6f]; -pub(crate) const ENCODE_SCHEMA_VERSION: u8 = 0; - -#[derive(Clone, Copy, Debug, PartialEq, Eq)] +#[derive(Clone, Copy, Debug, PartialEq, Eq, FromPrimitive, ToPrimitive)] pub(crate) enum EncodeMode { // This is a config option, it won't be used in encoding. 
Auto = 255, - Updates = 0, - Snapshot = 1, - RleUpdates = 2, - CompressedRleUpdates = 3, + Rle = 1, + Snapshot = 2, } impl EncodeMode { - pub fn to_byte(self) -> u8 { - match self { - EncodeMode::Auto => 255, - EncodeMode::Updates => 0, - EncodeMode::Snapshot => 1, - EncodeMode::RleUpdates => 2, - EncodeMode::CompressedRleUpdates => 3, - } + pub fn to_bytes(self) -> [u8; 2] { + let value = self.to_u16().unwrap(); + value.to_be_bytes() + } + + pub fn is_snapshot(self) -> bool { + matches!(self, EncodeMode::Snapshot) } } -impl TryFrom for EncodeMode { +impl TryFrom<[u8; 2]> for EncodeMode { type Error = LoroError; - fn try_from(value: u8) -> Result { - match value { - 0 => Ok(EncodeMode::Updates), - 1 => Ok(EncodeMode::Snapshot), - 2 => Ok(EncodeMode::RleUpdates), - 3 => Ok(EncodeMode::CompressedRleUpdates), - _ => Err(LoroError::DecodeError("Unknown encode mode".into())), - } + fn try_from(value: [u8; 2]) -> Result { + let value = u16::from_be_bytes(value); + Self::from_u16(value).ok_or(LoroError::IncompatibleFutureEncodingError(value as usize)) } } +/// The encoder used to encode the container states. +/// +/// Each container state can be represented by a sequence of operations. +/// For example, a list state can be represented by a sequence of insert +/// operations that form its current state. +/// We ignore the delete operations. +/// +/// We will use a new encoder for each container state. +/// Each container state should call encode_op multiple times until all the +/// operations constituting its current state are encoded. +pub(crate) struct StateSnapshotEncoder<'a> { + /// The `check_idspan` function is used to check if the id span is valid. + /// If the id span is invalid, the function should return an error that + /// contains the missing id span. + check_idspan: &'a dyn Fn(IdSpan) -> Result<(), IdSpan>, + /// The `encoder_by_op` function is used to encode an operation. 
+ encoder_by_op: &'a mut dyn FnMut(OpWithId), + /// The `record_idspan` function is used to record the id span to track the + /// encoded order. + record_idspan: &'a mut dyn FnMut(IdSpan), + #[allow(unused)] + mode: EncodeMode, +} + +impl StateSnapshotEncoder<'_> { + pub fn encode_op(&mut self, id_span: IdSpan, get_op: impl FnOnce() -> OpWithId) { + debug_log::debug_dbg!(id_span); + if let Err(span) = (self.check_idspan)(id_span) { + let mut op = get_op(); + if span == id_span { + (self.encoder_by_op)(op); + } else { + debug_assert_eq!(span.ctr_start(), id_span.ctr_start()); + op.op = op.op.slice(span.atom_len(), op.op.atom_len()); + (self.encoder_by_op)(op); + } + } + + (self.record_idspan)(id_span); + } + + #[allow(unused)] + pub fn mode(&self) -> EncodeMode { + self.mode + } +} + +pub(crate) struct StateSnapshotDecodeContext<'a> { + pub oplog: &'a OpLog, + pub ops: &'a mut dyn Iterator, + pub blob: &'a [u8], + pub mode: EncodeMode, +} + pub(crate) fn encode_oplog(oplog: &OpLog, vv: &VersionVector, mode: EncodeMode) -> Vec { - let version = ENCODE_SCHEMA_VERSION; - let mut ans = Vec::from(MAGIC_BYTES); - // maybe u8 is enough - ans.push(version); let mode = match mode { - EncodeMode::Auto => { - let self_vv = oplog.vv(); - let diff = self_vv.diff(vv); - let update_total_len = diff - .left - .values() - .map(|value| value.atom_len()) - .sum::(); - - // EncodeMode::RleUpdates(vv) - if update_total_len <= UPDATE_ENCODE_THRESHOLD { - EncodeMode::Updates - } else if update_total_len <= COMPRESS_RLE_THRESHOLD { - EncodeMode::RleUpdates - } else { - EncodeMode::CompressedRleUpdates - } - } + EncodeMode::Auto => EncodeMode::Rle, mode => mode, }; - let encoded = match &mode { - EncodeMode::Updates => encode_oplog_updates(oplog, vv), - EncodeMode::RleUpdates => encode_oplog_v2(oplog, vv), - EncodeMode::CompressedRleUpdates => { - let bytes = encode_oplog_v2(oplog, vv); - miniz_oxide::deflate::compress_to_vec(&bytes, 7) - } + let body = match &mode { + EncodeMode::Rle => 
encode_reordered::encode_updates(oplog, vv), _ => unreachable!(), }; - ans.push(mode.to_byte()); - ans.extend(encoded); - ans + + encode_header_and_body(mode, body) } -pub(crate) fn decode_oplog(oplog: &mut OpLog, input: &[u8]) -> Result<(), LoroError> { - if input.len() < 6 { - return Err(LoroError::DecodeError("".into())); - } - - let (magic_bytes, input) = input.split_at(4); - let magic_bytes: [u8; 4] = magic_bytes.try_into().unwrap(); - if magic_bytes != MAGIC_BYTES { - return Err(LoroError::DecodeError("Invalid header bytes".into())); - } - let (version, input) = input.split_at(1); - if version != [ENCODE_SCHEMA_VERSION] { - return Err(LoroError::DecodeError("Invalid version".into())); - } - - let mode: EncodeMode = input[0].try_into()?; - let decoded = &input[1..]; +pub(crate) fn decode_oplog( + oplog: &mut OpLog, + parsed: ParsedHeaderAndBody, +) -> Result<(), LoroError> { + let ParsedHeaderAndBody { mode, body, .. } = parsed; match mode { - EncodeMode::Updates => decode_oplog_updates(oplog, decoded), - EncodeMode::Snapshot => unimplemented!(), - EncodeMode::RleUpdates => decode_oplog_v2(oplog, decoded), - EncodeMode::CompressedRleUpdates => miniz_oxide::inflate::decompress_to_vec(decoded) - .map_err(|_| LoroError::DecodeError("Invalid compressed data".into())) - .and_then(|bytes| decode_oplog_v2(oplog, &bytes)), + EncodeMode::Rle | EncodeMode::Snapshot => encode_reordered::decode_updates(oplog, body), EncodeMode::Auto => unreachable!(), } } + +pub(crate) struct ParsedHeaderAndBody<'a> { + pub checksum: [u8; 16], + pub checksum_body: &'a [u8], + pub mode: EncodeMode, + pub body: &'a [u8], +} + +impl ParsedHeaderAndBody<'_> { + /// Return an error if the checksum does not match the checksummed body.
+ fn check_checksum(&self) -> LoroResult<()> { + if md5::compute(self.checksum_body).0 != self.checksum { + return Err(LoroError::DecodeDataCorruptionError); + } + + Ok(()) + } +} + +const MIN_HEADER_SIZE: usize = 22; +pub(crate) fn parse_header_and_body(bytes: &[u8]) -> Result<ParsedHeaderAndBody, LoroError> { + let reader = &bytes; + if bytes.len() < MIN_HEADER_SIZE { + return Err(LoroError::DecodeError("Invalid import data".into())); + } + + let (magic_bytes, reader) = reader.split_at(4); + let magic_bytes: [u8; 4] = magic_bytes.try_into().unwrap(); + if magic_bytes != MAGIC_BYTES { + return Err(LoroError::DecodeError("Invalid magic bytes".into())); + } + + let (checksum, reader) = reader.split_at(16); + let checksum_body = reader; + let (mode_bytes, reader) = reader.split_at(2); + let mode: EncodeMode = [mode_bytes[0], mode_bytes[1]].try_into()?; + + let ans = ParsedHeaderAndBody { + mode, + checksum_body, + checksum: checksum.try_into().unwrap(), + body: reader, + }; + + ans.check_checksum()?; + Ok(ans) +} + +fn encode_header_and_body(mode: EncodeMode, body: Vec<u8>) -> Vec<u8> { + let mut ans = Vec::new(); + ans.extend(MAGIC_BYTES); + let checksum = [0; 16]; + ans.extend(checksum); + ans.extend(mode.to_bytes()); + ans.extend(body); + let checksum_body = &ans[20..]; + let checksum = md5::compute(checksum_body).0; + ans[4..20].copy_from_slice(&checksum); + ans +} + +pub(crate) fn export_snapshot(doc: &LoroDoc) -> Vec<u8> { + let body = encode_reordered::encode_snapshot( + &doc.oplog().try_lock().unwrap(), + &doc.app_state().try_lock().unwrap(), + &Default::default(), + ); + encode_header_and_body(EncodeMode::Snapshot, body) +} + +pub(crate) fn decode_snapshot( + doc: &LoroDoc, + mode: EncodeMode, + body: &[u8], +) -> Result<(), LoroError> { + match mode { + EncodeMode::Snapshot => encode_reordered::decode_snapshot(doc, body), + _ => unreachable!(), + } +} diff --git a/crates/loro-internal/src/encoding/encode_enhanced.rs b/crates/loro-internal/src/encoding/encode_enhanced.rs deleted file mode 100644 index
faf09107..00000000 --- a/crates/loro-internal/src/encoding/encode_enhanced.rs +++ /dev/null @@ -1,735 +0,0 @@ -use fxhash::{FxHashMap, FxHashSet}; -use loro_common::{HasCounterSpan, HasIdSpan, HasLamportSpan, TreeID}; -use rle::{HasLength, RleVec, Sliceable}; -use serde_columnar::{columnar, iter_from_bytes, to_vec}; -use std::{borrow::Cow, ops::Deref, sync::Arc}; - -use crate::{ - change::{Change, Timestamp}, - container::{ - idx::ContainerIdx, - list::list_op::{DeleteSpan, ListOp}, - map::MapSet, - richtext::TextStyleInfoFlag, - tree::tree_op::TreeOp, - ContainerID, ContainerType, - }, - id::{Counter, PeerID, ID}, - op::{ListSlice, RawOpContent, RemoteOp}, - oplog::OpLog, - span::HasId, - version::Frontiers, - InternalString, LoroError, LoroValue, VersionVector, -}; - -type PeerIdx = u32; - -#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, serde::Serialize, serde::Deserialize)] -struct RootContainer { - name: InternalString, - type_: ContainerType, -} - -#[columnar(vec, ser, de, iterable)] -#[derive(Debug, Clone)] -struct NormalContainer { - #[columnar(strategy = "DeltaRle")] - peer_idx: PeerIdx, - #[columnar(strategy = "DeltaRle")] - counter: Counter, - #[columnar(strategy = "Rle")] - type_: u8, -} - -#[columnar(vec, ser, de, iterable)] -#[derive(Debug, Clone)] -struct ChangeEncoding { - #[columnar(strategy = "Rle")] - pub(super) peer_idx: PeerIdx, - #[columnar(strategy = "DeltaRle")] - pub(super) timestamp: Timestamp, - #[columnar(strategy = "DeltaRle")] - pub(super) op_len: u32, - /// The length of deps that exclude the dep on the same client - #[columnar(strategy = "Rle")] - pub(super) deps_len: u32, - /// Whether the change has a dep on the same client. 
- /// It can save lots of space by using this field instead of [`DepsEncoding`] - #[columnar(strategy = "BoolRle")] - pub(super) dep_on_self: bool, -} - -#[columnar(vec, ser, de, iterable)] -#[derive(Debug, Clone)] -struct OpEncoding { - #[columnar(strategy = "DeltaRle")] - container: usize, - /// Key index or insert/delete pos or target tree id index - #[columnar(strategy = "DeltaRle")] - prop: usize, - /// 0: insert or the parent tree id is not none - /// 1: delete or the parent tree id is none - /// 2: text-anchor-start - /// 3: text-anchor-end - #[columnar(strategy = "Rle")] - kind: u8, - /// the length of the deletion or insertion or target tree id index - #[columnar(strategy = "Rle")] - insert_del_len: isize, -} - -#[derive(PartialEq, Eq)] -enum Kind { - Insert, - Delete, - TextAnchorStart, - TextAnchorEnd, -} - -impl Kind { - fn from_byte(byte: u8) -> Self { - match byte { - 0 => Self::Insert, - 1 => Self::Delete, - 2 => Self::TextAnchorStart, - 3 => Self::TextAnchorEnd, - _ => panic!("invalid kind byte"), - } - } - - fn to_byte(&self) -> u8 { - match self { - Self::Insert => 0, - Self::Delete => 1, - Self::TextAnchorStart => 2, - Self::TextAnchorEnd => 3, - } - } -} - -#[columnar(vec, ser, de, iterable)] -#[derive(Debug, Copy, Clone, PartialEq, Eq, Hash)] -pub(super) struct DepsEncoding { - #[columnar(strategy = "DeltaRle")] - pub(super) client_idx: PeerIdx, - #[columnar(strategy = "DeltaRle")] - pub(super) counter: Counter, -} - -type TreeIDEncoding = DepsEncoding; - -impl DepsEncoding { - pub(super) fn new(client_idx: PeerIdx, counter: Counter) -> Self { - Self { - client_idx, - counter, - } - } -} - -#[columnar(ser, de)] -struct DocEncoding<'a> { - #[columnar(class = "vec", iter = "ChangeEncoding")] - changes: Vec, - #[columnar(class = "vec", iter = "OpEncoding")] - ops: Vec, - #[columnar(class = "vec", iter = "DepsEncoding")] - deps: Vec, - #[columnar(class = "vec")] - normal_containers: Vec, - #[columnar(borrow)] - str: Cow<'a, str>, - 
#[columnar(borrow)] - style_info: Cow<'a, [u8]>, - style_key: Vec, - style_values: Vec, - root_containers: Vec, - start_counter: Vec, - values: Vec>, - clients: Vec, - keys: Vec, - // the index 0 is DELETE_ROOT - tree_ids: Vec, -} - -pub fn encode_oplog_v2(oplog: &OpLog, vv: &VersionVector) -> Vec { - let mut peer_id_to_idx: FxHashMap = FxHashMap::default(); - let mut peers = Vec::with_capacity(oplog.changes().len()); - let mut diff_changes = Vec::new(); - let self_vv = oplog.vv(); - let start_vv = vv.trim(&oplog.vv()); - let diff = self_vv.diff(&start_vv); - - let mut start_counter = Vec::new(); - - for span in diff.left.iter() { - let id = span.id_start(); - let changes = oplog.get_change_at(id).unwrap(); - let peer_id = *span.0; - let idx = peers.len() as PeerIdx; - peers.push(peer_id); - peer_id_to_idx.insert(peer_id, idx); - start_counter.push( - changes - .id - .counter - .max(start_vv.get(&peer_id).copied().unwrap_or(0)), - ); - } - - for (change, _) in oplog.iter_causally(start_vv.clone(), self_vv.clone()) { - let start_cnt = start_vv.get(&change.id.peer).copied().unwrap_or(0); - if change.id.counter < start_cnt { - let offset = start_cnt - change.id.counter; - diff_changes.push(Cow::Owned(change.slice(offset as usize, change.atom_len()))); - } else { - diff_changes.push(Cow::Borrowed(change)); - } - } - - let (root_containers, container_idx2index, normal_containers) = - extract_containers(&diff_changes, oplog, &mut peer_id_to_idx, &mut peers); - - for change in &diff_changes { - for deps in change.deps.iter() { - peer_id_to_idx.entry(deps.peer).or_insert_with(|| { - let idx = peers.len() as PeerIdx; - peers.push(deps.peer); - idx - }); - } - } - - let change_num = diff_changes.len(); - let mut changes = Vec::with_capacity(change_num); - let mut ops = Vec::with_capacity(change_num); - let mut keys = Vec::new(); - let mut key_to_idx = FxHashMap::default(); - let mut deps = Vec::with_capacity(change_num); - let mut values = Vec::new(); - // the index 0 is 
DELETE_ROOT - let mut tree_ids = Vec::new(); - let mut tree_id_to_idx = FxHashMap::default(); - let mut string: String = String::new(); - let mut style_key_idx = Vec::new(); - let mut style_values = Vec::new(); - let mut style_info = Vec::new(); - - for change in &diff_changes { - let client_idx = peer_id_to_idx[&change.id.peer]; - let mut dep_on_self = false; - let mut deps_len = 0; - for dep in change.deps.iter() { - if change.id.peer != dep.peer { - deps.push(DepsEncoding::new( - *peer_id_to_idx.get(&dep.peer).unwrap(), - dep.counter, - )); - deps_len += 1; - } else { - dep_on_self = true; - } - } - - let mut op_len = 0; - for op in change.ops.iter() { - let container = op.container; - let container_index = *container_idx2index.get(&container).unwrap(); - let remote_ops = oplog.local_op_to_remote(op); - for op in remote_ops { - let content = op.content; - let (prop, kind, insert_del_len) = match content { - crate::op::RawOpContent::Tree(TreeOp { target, parent }) => { - // TODO: refactor extract register idx - let target_peer_idx = - *peer_id_to_idx.entry(target.peer).or_insert_with(|| { - let idx = peers.len() as PeerIdx; - peers.push(target.peer); - idx - }); - let target_encoding = TreeIDEncoding { - client_idx: target_peer_idx, - counter: target.counter, - }; - let target_idx = - *tree_id_to_idx.entry(target_encoding).or_insert_with(|| { - tree_ids.push(target_encoding); - // the index 0 is DELETE_ROOT - tree_ids.len() - }); - let (is_none, parent_idx) = if let Some(parent) = parent { - if TreeID::is_deleted_root(Some(parent)) { - (Kind::Insert, 0) - } else { - let parent_peer_idx = - *peer_id_to_idx.entry(parent.peer).or_insert_with(|| { - let idx = peers.len() as PeerIdx; - peers.push(parent.peer); - idx - }); - let parent_encoding = TreeIDEncoding { - client_idx: parent_peer_idx, - counter: parent.counter, - }; - let parent_idx = - *tree_id_to_idx.entry(parent_encoding).or_insert_with(|| { - tree_ids.push(parent_encoding); - tree_ids.len() - }); - 
(Kind::Insert, parent_idx) - } - } else { - (Kind::Delete, 0) - }; - (target_idx, is_none, parent_idx as isize) - } - crate::op::RawOpContent::Map(MapSet { key, value }) => { - if value.is_some() { - values.push(value.clone()); - } - ( - *key_to_idx.entry(key.clone()).or_insert_with(|| { - keys.push(key.clone()); - keys.len() - 1 - }), - if value.is_some() { - Kind::Insert - } else { - Kind::Delete - }, - 0, - ) - } - crate::op::RawOpContent::List(list) => match list { - ListOp::Insert { slice, pos } => { - let len; - match &slice { - ListSlice::RawData(v) => { - len = 0; - values.push(Some(LoroValue::List(Arc::new(v.to_vec())))); - } - ListSlice::RawStr { - str, - unicode_len: _, - } => { - len = str.len(); - assert!(len > 0, "{:?}", &slice); - string.push_str(str.deref()); - } - }; - (pos, Kind::Insert, len as isize) - } - ListOp::Delete(span) => { - // span.len maybe negative - (span.pos as usize, Kind::Delete, span.signed_len) - } - ListOp::StyleStart { - start, - end, - key, - info, - value, - } => { - let key_idx = *key_to_idx.entry(key.clone()).or_insert_with(|| { - keys.push(key.clone()); - keys.len() - 1 - }); - style_key_idx.push(key_idx); - style_info.push(info.to_byte()); - style_values.push(value); - ( - start as usize, - Kind::TextAnchorStart, - end as isize - start as isize, - ) - } - ListOp::StyleEnd => (0, Kind::TextAnchorEnd, 0), - }, - }; - op_len += 1; - ops.push(OpEncoding { - prop, - kind: kind.to_byte(), - insert_del_len, - container: container_index, - }) - } - } - changes.push(ChangeEncoding { - peer_idx: client_idx as PeerIdx, - timestamp: change.timestamp, - deps_len, - op_len, - dep_on_self, - }); - } - - let encoded = DocEncoding { - changes, - ops, - deps, - str: Cow::Owned(string), - clients: peers, - keys, - start_counter, - root_containers, - normal_containers, - values, - style_key: style_key_idx, - style_values, - style_info: Cow::Owned(style_info), - tree_ids, - }; - - to_vec(&encoded).unwrap() -} - -/// Extract containers from 
oplog changes. -/// -/// Containers are sorted by their peer_id and counter so that -/// they can be compressed by using delta encoding. -fn extract_containers( - diff_changes: &Vec>, - oplog: &OpLog, - peer_id_to_idx: &mut FxHashMap, - peers: &mut Vec, -) -> ( - Vec, - FxHashMap, - Vec, -) { - let mut root_containers = Vec::new(); - let mut container_idx2index = FxHashMap::default(); - let normal_containers = { - // register containers in sorted order - let mut visited = FxHashSet::default(); - let mut normal_container_idx_pairs = Vec::new(); - for change in diff_changes { - for op in change.ops.iter() { - let container = op.container; - if visited.contains(&container) { - continue; - } - - visited.insert(container); - let id = oplog.arena.get_container_id(container).unwrap(); - match id { - ContainerID::Root { - name, - container_type, - } => { - container_idx2index.insert(container, root_containers.len()); - root_containers.push(RootContainer { - name, - type_: container_type, - }); - } - ContainerID::Normal { - peer, - counter, - container_type, - } => normal_container_idx_pairs.push(( - NormalContainer { - peer_idx: *peer_id_to_idx.entry(peer).or_insert_with(|| { - peers.push(peer); - (peers.len() - 1) as PeerIdx - }), - counter, - type_: container_type.to_u8(), - }, - container, - )), - } - } - } - - normal_container_idx_pairs.sort_by(|a, b| { - if a.0.peer_idx != b.0.peer_idx { - a.0.peer_idx.cmp(&b.0.peer_idx) - } else { - a.0.counter.cmp(&b.0.counter) - } - }); - - let mut index = root_containers.len(); - normal_container_idx_pairs - .into_iter() - .map(|(container, idx)| { - container_idx2index.insert(idx, index); - index += 1; - container - }) - .collect::>() - }; - - (root_containers, container_idx2index, normal_containers) -} - -pub fn decode_oplog_v2(oplog: &mut OpLog, input: &[u8]) -> Result<(), LoroError> { - let encoded = iter_from_bytes::(input) - .map_err(|e| LoroError::DecodeError(e.to_string().into()))?; - - let DocEncodingIter { - changes: 
change_encodings, - ops, - deps, - normal_containers, - mut start_counter, - str, - clients: peers, - keys, - root_containers, - values, - style_key, - style_values, - style_info, - tree_ids, - } = encoded; - - debug_log::debug_dbg!(&start_counter); - let mut op_iter = ops; - let mut deps_iter = deps; - let mut style_key_iter = style_key.into_iter(); - let mut style_value_iter = style_values.into_iter(); - let mut style_info_iter = style_info.iter(); - let get_container = |idx: usize| { - if idx < root_containers.len() { - let Some(container) = root_containers.get(idx) else { - return None; - }; - Some(ContainerID::Root { - name: container.name.clone(), - container_type: container.type_, - }) - } else { - let Some(container) = normal_containers.get(idx - root_containers.len()) else { - return None; - }; - Some(ContainerID::Normal { - peer: peers[container.peer_idx as usize], - counter: container.counter, - container_type: ContainerType::from_u8(container.type_), - }) - } - }; - - let mut value_iter = values.into_iter(); - let mut str_index = 0; - let changes = change_encodings - .map(|change_encoding| { - let counter = start_counter - .get_mut(change_encoding.peer_idx as usize) - .unwrap(); - let ChangeEncoding { - peer_idx, - timestamp, - op_len, - deps_len, - dep_on_self, - } = change_encoding; - - let peer_id = peers[peer_idx as usize]; - let mut ops = RleVec::<[RemoteOp; 1]>::new(); - let mut delta = 0; - for op in op_iter.by_ref().take(op_len as usize) { - let OpEncoding { - container: container_idx, - prop, - insert_del_len, - kind, - } = op; - - let Some(container_id) = get_container(container_idx) else { - return Err(LoroError::DecodeError("".into())); - }; - let container_type = container_id.container_type(); - let content = match container_type { - ContainerType::Tree => { - let target_encoding = tree_ids[prop - 1]; - let target = TreeID { - peer: peers[target_encoding.client_idx as usize], - counter: target_encoding.counter, - }; - let parent = if kind 
== 1 { - None - } else if insert_del_len == 0 { - TreeID::delete_root() - } else { - let parent_encoding = tree_ids[insert_del_len as usize - 1]; - let parent = TreeID { - peer: peers[parent_encoding.client_idx as usize], - counter: parent_encoding.counter, - }; - Some(parent) - }; - RawOpContent::Tree(TreeOp { target, parent }) - } - ContainerType::Map => { - let key = keys[prop].clone(); - if Kind::from_byte(kind) == Kind::Delete { - RawOpContent::Map(MapSet { key, value: None }) - } else { - RawOpContent::Map(MapSet { - key, - value: value_iter.next().unwrap(), - }) - } - } - ContainerType::List | ContainerType::Text => { - let pos = prop; - match Kind::from_byte(kind) { - Kind::Insert => match container_type { - ContainerType::Text => { - let insert_len = insert_del_len as usize; - let s = &str[str_index..str_index + insert_len]; - str_index += insert_len; - RawOpContent::List(ListOp::Insert { - slice: ListSlice::from_borrowed_str(s), - pos, - }) - } - ContainerType::List => { - let value = value_iter.next().flatten().unwrap(); - RawOpContent::List(ListOp::Insert { - slice: ListSlice::RawData(Cow::Owned( - match Arc::try_unwrap(value.into_list().unwrap()) { - Ok(v) => v, - Err(v) => v.deref().clone(), - }, - )), - pos, - }) - } - _ => unreachable!(), - }, - Kind::Delete => RawOpContent::List(ListOp::Delete(DeleteSpan { - pos: pos as isize, - signed_len: insert_del_len, - })), - Kind::TextAnchorStart => RawOpContent::List(ListOp::StyleStart { - start: pos as u32, - end: insert_del_len as u32 + pos as u32, - key: keys[style_key_iter.next().unwrap()].clone(), - value: style_value_iter.next().unwrap(), - info: TextStyleInfoFlag::from_byte( - *style_info_iter.next().unwrap(), - ), - }), - Kind::TextAnchorEnd => RawOpContent::List(ListOp::StyleEnd), - } - } - }; - let remote_op = RemoteOp { - container: container_id, - counter: *counter + delta, - content, - }; - delta += remote_op.content_len() as i32; - ops.push(remote_op); - } - - let mut deps: Frontiers = 
(0..deps_len) - .map(|_| { - let raw = deps_iter.next().unwrap(); - ID::new(peers[raw.client_idx as usize], raw.counter) - }) - .collect(); - if dep_on_self && *counter > 0 { - deps.push(ID::new(peer_id, *counter - 1)); - } - - let change = Change { - id: ID { - peer: peer_id, - counter: *counter, - }, - // calc lamport after parsing all changes - lamport: 0, - has_dependents: false, - timestamp, - ops, - deps, - }; - - *counter += delta; - Ok(change) - }) - .collect::, LoroError>>(); - let changes = match changes { - Ok(changes) => changes, - Err(err) => return Err(err), - }; - - let mut pending_remote_changes = Vec::new(); - debug_log::debug_dbg!(&changes); - let mut latest_ids = Vec::new(); - oplog.arena.clone().with_op_converter(|converter| { - 'outer: for mut change in changes { - if change.ctr_end() <= oplog.vv().get(&change.id.peer).copied().unwrap_or(0) { - // skip included changes - continue; - } - - latest_ids.push(change.id_last()); - // calc lamport or pending if its deps are not satisfied - for dep in change.deps.iter() { - match oplog.dag.get_lamport(dep) { - Some(lamport) => { - change.lamport = change.lamport.max(lamport + 1); - } - None => { - pending_remote_changes.push(change); - continue 'outer; - } - } - } - - // convert change into inner format - let mut ops = RleVec::new(); - for op in change.ops { - let lamport = change.lamport; - let content = op.content; - let op = converter.convert_single_op( - &op.container, - change.id.peer, - op.counter, - lamport, - content, - ); - ops.push(op); - } - - let change = Change { - ops, - id: change.id, - deps: change.deps, - lamport: change.lamport, - timestamp: change.timestamp, - has_dependents: false, - }; - - let Some(change) = oplog.trim_the_known_part_of_change(change) else { - continue; - }; - // update dag and push the change - let mark = oplog.insert_dag_node_on_new_change(&change); - oplog.next_lamport = oplog.next_lamport.max(change.lamport_end()); - oplog.latest_timestamp = 
oplog.latest_timestamp.max(change.timestamp); - oplog.dag.vv.extend_to_include_end_id(ID { - peer: change.id.peer, - counter: change.id.counter + change.atom_len() as Counter, - }); - oplog.insert_new_change(change, mark); - } - }); - - let mut vv = oplog.dag.vv.clone(); - oplog.try_apply_pending(latest_ids, &mut vv); - if !oplog.batch_importing { - oplog.dag.refresh_frontiers(); - } - - oplog.import_unknown_lamport_remote_changes(pending_remote_changes)?; - assert_eq!(str_index, str.len()); - Ok(()) -} diff --git a/crates/loro-internal/src/encoding/encode_reordered.rs b/crates/loro-internal/src/encoding/encode_reordered.rs new file mode 100644 index 00000000..bd032ca7 --- /dev/null +++ b/crates/loro-internal/src/encoding/encode_reordered.rs @@ -0,0 +1,2364 @@ +use std::{borrow::Cow, cmp::Ordering, mem::take, sync::Arc}; + +use fxhash::{FxHashMap, FxHashSet}; +use itertools::Itertools; +use loro_common::{ + ContainerID, ContainerType, Counter, HasCounterSpan, HasIdSpan, HasLamportSpan, IdSpan, + InternalString, LoroError, LoroResult, PeerID, ID, +}; +use num_traits::FromPrimitive; +use rle::HasLength; +use serde_columnar::columnar; + +use crate::{ + arena::SharedArena, + change::Change, + container::{idx::ContainerIdx, list::list_op::DeleteSpan, richtext::TextStyleInfoFlag}, + encoding::{ + encode_reordered::value::{ValueKind, ValueWriter}, + StateSnapshotDecodeContext, + }, + op::{Op, OpWithId, SliceRange}, + state::ContainerState, + utils::id_int_map::IdIntMap, + version::Frontiers, + DocState, LoroDoc, OpLog, VersionVector, +}; + +use self::{ + arena::{decode_arena, encode_arena, ContainerArena, DecodedArenas}, + encode::{encode_changes, encode_ops, init_encode, TempOp, ValueRegister}, + value::ValueReader, +}; + +/// If any section of the document is longer than this, we will not decode it. +/// It will return an data corruption error instead. 
+const MAX_DECODED_SIZE: usize = 1 << 30; +/// If any collection in the document is longer than this, we will not decode it. +/// It will return an data corruption error instead. +const MAX_COLLECTION_SIZE: usize = 1 << 28; + +pub(crate) fn encode_updates(oplog: &OpLog, vv: &VersionVector) -> Vec { + let mut peer_register: ValueRegister = ValueRegister::new(); + let mut key_register: ValueRegister = ValueRegister::new(); + let (start_counters, diff_changes) = init_encode(oplog, vv, &mut peer_register); + let ExtractedContainer { + containers, + cid_idx_pairs: _, + idx_to_index: container_idx2index, + } = extract_containers_in_order( + &mut diff_changes + .iter() + .flat_map(|x| x.ops.iter()) + .map(|x| x.container), + &oplog.arena, + ); + let mut cid_register: ValueRegister = ValueRegister::from_existing(containers); + let mut dep_arena = arena::DepsArena::default(); + let mut value_writer = ValueWriter::new(); + let mut ops: Vec = Vec::new(); + let arena = &oplog.arena; + let changes = encode_changes( + &diff_changes, + &mut dep_arena, + &mut peer_register, + &mut |op| ops.push(op), + &mut key_register, + &container_idx2index, + ); + + ops.sort_by(move |a, b| { + a.container_index + .cmp(&b.container_index) + .then_with(|| a.prop_that_used_for_sort.cmp(&b.prop_that_used_for_sort)) + .then_with(|| a.peer_idx.cmp(&b.peer_idx)) + .then_with(|| a.lamport.cmp(&b.lamport)) + }); + + let encoded_ops = encode_ops( + ops, + arena, + &mut value_writer, + &mut key_register, + &mut cid_register, + &mut peer_register, + ); + + let container_arena = ContainerArena::from_containers( + cid_register.unwrap_vec(), + &mut peer_register, + &mut key_register, + ); + + let frontiers = oplog + .frontiers() + .iter() + .map(|x| (peer_register.register(&x.peer), x.counter)) + .collect(); + let doc = EncodedDoc { + ops: encoded_ops, + changes, + states: Vec::new(), + start_counters, + raw_values: Cow::Owned(value_writer.finish()), + arenas: Cow::Owned(encode_arena( + 
peer_register.unwrap_vec(), + container_arena, + key_register.unwrap_vec(), + dep_arena, + &[], + )), + frontiers, + }; + + serde_columnar::to_vec(&doc).unwrap() +} + +pub(crate) fn decode_updates(oplog: &mut OpLog, bytes: &[u8]) -> LoroResult<()> { + let iter = serde_columnar::iter_from_bytes::(bytes)?; + let DecodedArenas { + peer_ids, + containers, + keys, + deps, + state_blob_arena: _, + } = decode_arena(&iter.arenas)?; + let ops_map = extract_ops( + &iter.raw_values, + iter.ops, + &oplog.arena, + &containers, + &keys, + &peer_ids, + false, + )? + .ops_map; + + let changes = decode_changes(iter.changes, iter.start_counters, peer_ids, deps, ops_map)?; + // debug_log::debug_dbg!(&changes); + let (latest_ids, pending_changes) = import_changes_to_oplog(changes, oplog)?; + if oplog.try_apply_pending(latest_ids).should_update && !oplog.batch_importing { + oplog.dag.refresh_frontiers(); + } + + oplog.import_unknown_lamport_pending_changes(pending_changes)?; + Ok(()) +} + +fn import_changes_to_oplog( + changes: Vec, + oplog: &mut OpLog, +) -> Result<(Vec, Vec), LoroError> { + let mut pending_changes = Vec::new(); + let mut latest_ids = Vec::new(); + for mut change in changes { + if change.ctr_end() <= oplog.vv().get(&change.id.peer).copied().unwrap_or(0) { + // skip included changes + continue; + } + + latest_ids.push(change.id_last()); + // calc lamport or pending if its deps are not satisfied + match oplog.dag.get_change_lamport_from_deps(&change.deps) { + Some(lamport) => change.lamport = lamport, + None => { + pending_changes.push(change); + continue; + } + } + + let Some(change) = oplog.trim_the_known_part_of_change(change) else { + continue; + }; + // update dag and push the change + let mark = oplog.update_dag_on_new_change(&change); + oplog.next_lamport = oplog.next_lamport.max(change.lamport_end()); + oplog.latest_timestamp = oplog.latest_timestamp.max(change.timestamp); + oplog.dag.vv.extend_to_include_end_id(ID { + peer: change.id.peer, + counter: 
change.id.counter + change.atom_len() as Counter, + }); + oplog.insert_new_change(change, mark, false); + } + if !oplog.batch_importing { + oplog.dag.refresh_frontiers(); + } + + Ok((latest_ids, pending_changes)) +} + +fn decode_changes<'a>( + encoded_changes: IterableEncodedChange<'_>, + mut counters: Vec, + peer_ids: arena::PeerIdArena, + mut deps: impl Iterator + 'a, + mut ops_map: std::collections::HashMap< + u64, + Vec, + std::hash::BuildHasherDefault, + >, +) -> LoroResult> { + let mut changes = Vec::with_capacity(encoded_changes.size_hint().0); + for EncodedChange { + peer_idx, + mut len, + timestamp, + deps_len, + dep_on_self, + msg_len: _, + } in encoded_changes + { + if peer_ids.peer_ids.len() <= peer_idx || counters.len() <= peer_idx { + return Err(LoroError::DecodeDataCorruptionError); + } + + let counter = counters[peer_idx]; + counters[peer_idx] += len as Counter; + let peer = peer_ids.peer_ids[peer_idx]; + let mut change: Change = Change { + id: ID::new(peer, counter), + ops: Default::default(), + deps: Frontiers::with_capacity((deps_len + if dep_on_self { 1 } else { 0 }) as usize), + lamport: 0, + timestamp, + has_dependents: false, + }; + + if dep_on_self { + if counter <= 0 { + return Err(LoroError::DecodeDataCorruptionError); + } + + change.deps.push(ID::new(peer, counter - 1)); + } + + for _ in 0..deps_len { + let dep = deps.next().ok_or(LoroError::DecodeDataCorruptionError)?; + change + .deps + .push(ID::new(peer_ids.peer_ids[dep.peer_idx], dep.counter)); + } + + let ops = ops_map + .get_mut(&peer) + .ok_or(LoroError::DecodeDataCorruptionError)?; + while len > 0 { + let op = ops.pop().ok_or(LoroError::DecodeDataCorruptionError)?; + len -= op.atom_len(); + change.ops.push(op); + } + + changes.push(change); + } + + Ok(changes) +} + +struct ExtractedOps { + ops_map: FxHashMap>, + ops: Vec, + containers: Vec, +} + +fn extract_ops( + raw_values: &[u8], + iter: impl Iterator, + arena: &SharedArena, + containers: &ContainerArena, + keys: 
&arena::KeyArena, + peer_ids: &arena::PeerIdArena, + should_extract_ops_with_ids: bool, +) -> LoroResult { + let mut value_reader = ValueReader::new(raw_values); + let mut ops_map: FxHashMap> = FxHashMap::default(); + let containers: Vec<_> = containers + .containers + .iter() + .map(|x| x.as_container_id(&keys.keys, &peer_ids.peer_ids)) + .try_collect()?; + let mut ops = Vec::new(); + for EncodedOp { + container_index, + prop, + peer_idx, + value_type, + counter, + } in iter + { + if containers.len() <= container_index as usize + || peer_ids.peer_ids.len() <= peer_idx as usize + { + return Err(LoroError::DecodeDataCorruptionError); + } + + let cid = &containers[container_index as usize]; + let c_idx = arena.register_container(cid); + let kind = ValueKind::from_u8(value_type).expect("Unknown value type"); + let content = decode_op( + cid, + kind, + &mut value_reader, + arena, + prop, + keys, + &peer_ids.peer_ids, + &containers, + )?; + + let peer = peer_ids.peer_ids[peer_idx as usize]; + let op = Op { + counter, + container: c_idx, + content, + }; + + if should_extract_ops_with_ids { + ops.push(OpWithId { + peer, + op: op.clone(), + }); + } + + ops_map.entry(peer).or_default().push(op); + } + + for (_, ops) in ops_map.iter_mut() { + // sort op by counter in the reversed order + ops.sort_by_key(|x| -x.counter); + } + + Ok(ExtractedOps { + ops_map, + ops, + containers, + }) +} + +pub(crate) fn encode_snapshot(oplog: &OpLog, state: &DocState, vv: &VersionVector) -> Vec { + assert!(!state.is_in_txn()); + assert_eq!(oplog.frontiers(), &state.frontiers); + let mut peer_register: ValueRegister = ValueRegister::new(); + let mut key_register: ValueRegister = ValueRegister::new(); + let (start_counters, diff_changes) = init_encode(oplog, vv, &mut peer_register); + let ExtractedContainer { + containers, + cid_idx_pairs: c_pairs, + idx_to_index: container_idx2index, + } = extract_containers_in_order( + &mut state.iter().map(|x| x.container_idx()).chain( + diff_changes + 
.iter() + .flat_map(|x| x.ops.iter()) + .map(|x| x.container), + ), + &oplog.arena, + ); + let mut cid_register: ValueRegister = ValueRegister::from_existing(containers); + let mut dep_arena = arena::DepsArena::default(); + let mut value_writer = ValueWriter::new(); + let mut ops: Vec = Vec::new(); + + // This stores the required op positions of each container state. + // The states can be encoded in these positions in the next step. + // This data structure stores that mapping from op id to the required total order. + let mut map_op_to_pos = IdIntMap::new(); + let mut states = Vec::new(); + let mut state_bytes = Vec::new(); + for (_, c_idx) in c_pairs.iter() { + let container_index = *container_idx2index.get(c_idx).unwrap() as u32; + let state = match state.get_state(*c_idx) { + Some(state) if !state.is_state_empty() => state, + _ => { + states.push(EncodedStateInfo { + container_index, + op_len: 0, + state_bytes_len: 0, + }); + continue; + } + }; + + let mut op_len = 0; + let bytes = state.encode_snapshot(super::StateSnapshotEncoder { + check_idspan: &|id_span| { + if let Some(counter) = vv.intersect_span(id_span) { + Err(IdSpan { + client_id: id_span.client_id, + counter, + }) + } else { + Ok(()) + } + }, + encoder_by_op: &mut |op| { + ops.push(TempOp { + op: Cow::Owned(op.op), + peer_idx: peer_register.register(&op.peer) as u32, + peer_id: op.peer, + container_index, + prop_that_used_for_sort: -1, + // lamport value is fake, but it's only used for sorting and will not be encoded + lamport: 0, + }); + }, + record_idspan: &mut |id_span| { + op_len += id_span.atom_len(); + map_op_to_pos.insert(id_span); + }, + mode: super::EncodeMode::Snapshot, + }); + + states.push(EncodedStateInfo { + container_index, + op_len: op_len as u32, + state_bytes_len: bytes.len() as u32, + }); + state_bytes.extend(bytes); + } + + let changes = encode_changes( + &diff_changes, + &mut dep_arena, + &mut peer_register, + &mut |op| { + let mut count = 0; + let o_len = op.atom_len(); + 
ops.extend(map_op_to_pos.split(op).map(|x| { + count += x.atom_len(); + x + })); + + debug_assert_eq!(count, o_len); + }, + &mut key_register, + &container_idx2index, + ); + ops.sort_by(move |a, b| { + a.container_index.cmp(&b.container_index).then_with(|| { + match (map_op_to_pos.get(a.id()), map_op_to_pos.get(b.id())) { + (None, None) => a + .prop_that_used_for_sort + .cmp(&b.prop_that_used_for_sort) + .then_with(|| a.peer_idx.cmp(&b.peer_idx)) + .then_with(|| a.lamport.cmp(&b.lamport)), + (None, Some(_)) => Ordering::Greater, + (Some(_), None) => Ordering::Less, + (Some(a), Some(b)) => a.0.cmp(&b.0), + } + }) + }); + + let encoded_ops = encode_ops( + ops, + &oplog.arena, + &mut value_writer, + &mut key_register, + &mut cid_register, + &mut peer_register, + ); + + let container_arena = ContainerArena::from_containers( + cid_register.unwrap_vec(), + &mut peer_register, + &mut key_register, + ); + + let frontiers = oplog + .frontiers() + .iter() + .map(|x| (peer_register.register(&x.peer), x.counter)) + .collect(); + let doc = EncodedDoc { + ops: encoded_ops, + changes, + states, + start_counters, + raw_values: Cow::Owned(value_writer.finish()), + arenas: Cow::Owned(encode_arena( + peer_register.unwrap_vec(), + container_arena, + key_register.unwrap_vec(), + dep_arena, + &state_bytes, + )), + frontiers, + }; + + serde_columnar::to_vec(&doc).unwrap() +} + +pub(crate) fn decode_snapshot(doc: &LoroDoc, bytes: &[u8]) -> LoroResult<()> { + let mut state = doc.app_state().try_lock().map_err(|_| { + LoroError::DecodeError( + "decode_snapshot: failed to lock app state" + .to_string() + .into_boxed_str(), + ) + })?; + + state.check_before_decode_snapshot()?; + let mut oplog = doc.oplog().try_lock().map_err(|_| { + LoroError::DecodeError( + "decode_snapshot: failed to lock oplog" + .to_string() + .into_boxed_str(), + ) + })?; + + if !oplog.is_empty() { + unimplemented!("You can only import snapshot to a empty loro doc now"); + } + + assert!(state.frontiers.is_empty()); + 
assert!(oplog.frontiers().is_empty()); + let iter = serde_columnar::iter_from_bytes::(bytes)?; + let DecodedArenas { + peer_ids, + containers, + keys, + deps, + state_blob_arena, + } = decode_arena(&iter.arenas)?; + let frontiers: Frontiers = iter + .frontiers + .iter() + .map(|x| { + let peer = peer_ids + .peer_ids + .get(x.0) + .ok_or(LoroError::DecodeDataCorruptionError)?; + let ans: Result = Ok(ID::new(*peer, x.1)); + ans + }) + .try_collect()?; + + let ExtractedOps { + ops_map, + ops, + containers, + } = extract_ops( + &iter.raw_values, + iter.ops, + &oplog.arena, + &containers, + &keys, + &peer_ids, + true, + )?; + + decode_snapshot_states( + &mut state, + frontiers, + iter.states, + containers, + state_blob_arena, + ops, + &oplog, + ) + .unwrap(); + let changes = decode_changes(iter.changes, iter.start_counters, peer_ids, deps, ops_map)?; + let (new_ids, pending_changes) = import_changes_to_oplog(changes, &mut oplog)?; + assert!(pending_changes.is_empty()); + assert_eq!(&state.frontiers, oplog.frontiers()); + if !oplog.pending_changes.is_empty() { + drop(oplog); + drop(state); + // TODO: Fix this origin value + doc.update_oplog_and_apply_delta_to_state_if_needed( + |oplog| { + if oplog.try_apply_pending(new_ids).should_update && !oplog.batch_importing { + oplog.dag.refresh_frontiers(); + } + + Ok(()) + }, + "".into(), + )?; + } + + Ok(()) +} + +fn decode_snapshot_states( + state: &mut DocState, + frontiers: Frontiers, + encoded_state_iter: IterableEncodedStateInfo<'_>, + containers: Vec, + state_blob_arena: &[u8], + ops: Vec, + oplog: &std::sync::MutexGuard<'_, OpLog>, +) -> LoroResult<()> { + let mut state_blob_index: usize = 0; + let mut ops_index: usize = 0; + for EncodedStateInfo { + container_index, + mut op_len, + state_bytes_len, + } in encoded_state_iter + { + if op_len == 0 && state_bytes_len == 0 { + continue; + } + + if container_index >= containers.len() as u32 { + return Err(LoroError::DecodeDataCorruptionError); + } + + let container_id = 
&containers[container_index as usize]; + let idx = state.arena.register_container(container_id); + if state_blob_arena.len() < state_blob_index + state_bytes_len as usize { + return Err(LoroError::DecodeDataCorruptionError); + } + + let state_bytes = + &state_blob_arena[state_blob_index..state_blob_index + state_bytes_len as usize]; + state_blob_index += state_bytes_len as usize; + + if ops.len() < ops_index { + return Err(LoroError::DecodeDataCorruptionError); + } + + let mut next_ops = ops[ops_index..] + .iter() + .skip_while(|x| x.op.container != idx) + .take_while(|x| { + if op_len == 0 { + false + } else { + op_len -= x.op.atom_len() as u32; + ops_index += 1; + true + } + }) + .cloned(); + state.init_container( + container_id.clone(), + StateSnapshotDecodeContext { + oplog, + ops: &mut next_ops, + blob: state_bytes, + mode: crate::encoding::EncodeMode::Snapshot, + }, + ); + } + + let s = take(&mut state.states); + state.init_with_states_and_version(s, frontiers); + Ok(()) +} + +mod encode { + use fxhash::FxHashMap; + use loro_common::{ContainerID, ContainerType, HasId, PeerID, ID}; + use num_traits::ToPrimitive; + use rle::{HasLength, Sliceable}; + use std::borrow::Cow; + + use crate::{ + change::{Change, Lamport}, + container::idx::ContainerIdx, + encoding::encode_reordered::value::{EncodedTreeMove, ValueWriter}, + op::Op, + InternalString, + }; + + #[derive(Debug)] + pub(super) struct TempOp<'a> { + pub op: Cow<'a, Op>, + pub lamport: Lamport, + pub peer_idx: u32, + pub peer_id: PeerID, + pub container_index: u32, + /// Prop is fake and will be encoded in the snapshot. + /// But it will not be used when decoding, because this op is not included in the vv so it's not in the encoded changes. 
+ pub prop_that_used_for_sort: i32, + } + + impl TempOp<'_> { + pub(crate) fn id(&self) -> loro_common::ID { + loro_common::ID { + peer: self.peer_id, + counter: self.op.counter, + } + } + } + + impl HasId for TempOp<'_> { + fn id_start(&self) -> loro_common::ID { + ID::new(self.peer_id, self.op.counter) + } + } + + impl HasLength for TempOp<'_> { + #[inline(always)] + fn atom_len(&self) -> usize { + self.op.atom_len() + } + + #[inline(always)] + fn content_len(&self) -> usize { + self.op.atom_len() + } + } + impl<'a> generic_btree::rle::HasLength for TempOp<'a> { + #[inline(always)] + fn rle_len(&self) -> usize { + self.op.atom_len() + } + } + + impl<'a> generic_btree::rle::Sliceable for TempOp<'a> { + fn _slice(&self, range: std::ops::Range) -> TempOp<'a> { + Self { + op: if range.start == 0 && range.end == self.op.atom_len() { + match &self.op { + Cow::Borrowed(o) => Cow::Borrowed(o), + Cow::Owned(o) => Cow::Owned(o.clone()), + } + } else { + let op = self.op.slice(range.start, range.end); + Cow::Owned(op) + }, + lamport: self.lamport + range.start as Lamport, + peer_idx: self.peer_idx, + peer_id: self.peer_id, + container_index: self.container_index, + prop_that_used_for_sort: self.prop_that_used_for_sort, + } + } + } + + pub(super) fn encode_ops( + ops: Vec>, + arena: &crate::arena::SharedArena, + value_writer: &mut ValueWriter, + key_register: &mut ValueRegister>, + cid_register: &mut ValueRegister, + peer_register: &mut ValueRegister, + ) -> Vec { + let mut encoded_ops = Vec::with_capacity(ops.len()); + for TempOp { + op, + peer_idx, + container_index, + .. 
+ } in ops + { + let value_type = encode_op( + &op, + arena, + value_writer, + key_register, + cid_register, + peer_register, + ); + + let prop = get_op_prop(&op, key_register); + encoded_ops.push(EncodedOp { + container_index, + peer_idx, + counter: op.counter, + prop, + value_type: value_type.to_u8().unwrap(), + }); + } + encoded_ops + } + + pub(super) fn encode_changes<'a>( + diff_changes: &'a Vec>, + dep_arena: &mut super::arena::DepsArena, + peer_register: &mut ValueRegister, + push_op: &mut impl FnMut(TempOp<'a>), + key_register: &mut ValueRegister>, + container_idx2index: &FxHashMap, + ) -> Vec { + let mut changes: Vec = Vec::with_capacity(diff_changes.len()); + for change in diff_changes.iter() { + let mut dep_on_self = false; + let mut deps_len = 0; + for dep in change.deps.iter() { + if dep.peer == change.id.peer { + dep_on_self = true; + } else { + deps_len += 1; + dep_arena.push(peer_register.register(&dep.peer), dep.counter); + } + } + + let peer_idx = peer_register.register(&change.id.peer); + changes.push(EncodedChange { + dep_on_self, + deps_len, + peer_idx, + len: change.atom_len(), + timestamp: change.timestamp, + msg_len: 0, + }); + + for (i, op) in change.ops().iter().enumerate() { + let lamport = i as Lamport + change.lamport(); + push_op(TempOp { + op: Cow::Borrowed(op), + lamport, + prop_that_used_for_sort: get_sorting_prop(op, key_register), + peer_idx: peer_idx as u32, + peer_id: change.id.peer, + container_index: container_idx2index[&op.container] as u32, + }); + } + } + changes + } + + use crate::{OpLog, VersionVector}; + pub(super) use value_register::ValueRegister; + + use super::{ + value::{MarkStart, Value, ValueKind}, + EncodedChange, EncodedOp, + }; + mod value_register { + use fxhash::FxHashMap; + + pub struct ValueRegister { + map_value_to_index: FxHashMap, + vec: Vec, + } + + impl ValueRegister { + pub fn new() -> Self { + Self { + map_value_to_index: FxHashMap::default(), + vec: Vec::new(), + } + } + + pub fn from_existing(vec: 
Vec) -> Self { + let mut map = FxHashMap::with_capacity_and_hasher(vec.len(), Default::default()); + for (i, value) in vec.iter().enumerate() { + map.insert(value.clone(), i); + } + + Self { + map_value_to_index: map, + vec, + } + } + + /// Return the index of the given value. If it does not exist, + /// insert it and return the new index. + pub fn register(&mut self, key: &T) -> usize { + if let Some(index) = self.map_value_to_index.get(key) { + *index + } else { + let idx = self.vec.len(); + self.vec.push(key.clone()); + self.map_value_to_index.insert(key.clone(), idx); + idx + } + } + + pub fn contains(&self, key: &T) -> bool { + self.map_value_to_index.contains_key(key) + } + + pub fn unwrap_vec(self) -> Vec { + self.vec + } + } + } + + pub(super) fn init_encode<'a>( + oplog: &'a OpLog, + vv: &'_ VersionVector, + peer_register: &mut ValueRegister, + ) -> (Vec, Vec>) { + let self_vv = oplog.vv(); + let start_vv = vv.trim(&oplog.vv()); + let mut start_counters = Vec::new(); + + let mut diff_changes: Vec> = Vec::new(); + for change in oplog.iter_changes(&start_vv, self_vv) { + let start_cnt = start_vv.get(&change.id.peer).copied().unwrap_or(0); + if !peer_register.contains(&change.id.peer) { + peer_register.register(&change.id.peer); + start_counters.push(start_cnt); + } + if change.id.counter < start_cnt { + let offset = start_cnt - change.id.counter; + diff_changes.push(Cow::Owned(change.slice(offset as usize, change.atom_len()))); + } else { + diff_changes.push(Cow::Borrowed(change)); + } + } + + diff_changes.sort_by_key(|x| x.lamport); + (start_counters, diff_changes) + } + + fn get_op_prop(op: &Op, register_key: &mut ValueRegister) -> i32 { + match &op.content { + crate::op::InnerContent::List(list) => match list { + crate::container::list::list_op::InnerListOp::Insert { pos, .. } => *pos as i32, + crate::container::list::list_op::InnerListOp::InsertText { pos, .. 
} => *pos as i32, + crate::container::list::list_op::InnerListOp::Delete(span) => span.pos as i32, + crate::container::list::list_op::InnerListOp::StyleStart { start, .. } => { + *start as i32 + } + crate::container::list::list_op::InnerListOp::StyleEnd => 0, + }, + crate::op::InnerContent::Map(map) => { + let key = register_key.register(&map.key); + key as i32 + } + crate::op::InnerContent::Tree(..) => 0, + } + } + + fn get_sorting_prop(op: &Op, register_key: &mut ValueRegister) -> i32 { + match &op.content { + crate::op::InnerContent::List(_) => 0, + crate::op::InnerContent::Map(map) => { + let key = register_key.register(&map.key); + key as i32 + } + crate::op::InnerContent::Tree(..) => 0, + } + } + + #[inline] + fn encode_op( + op: &Op, + arena: &crate::arena::SharedArena, + value_writer: &mut ValueWriter, + register_key: &mut ValueRegister, + register_cid: &mut ValueRegister, + register_peer: &mut ValueRegister, + ) -> ValueKind { + match &op.content { + crate::op::InnerContent::List(list) => match list { + crate::container::list::list_op::InnerListOp::Insert { slice, .. } => { + assert_eq!(op.container.get_type(), ContainerType::List); + let value = arena.get_values(slice.0.start as usize..slice.0.end as usize); + value_writer.write_value_content(&value.into(), register_key, register_cid); + ValueKind::Array + } + crate::container::list::list_op::InnerListOp::InsertText { + slice, + unicode_start: _, + unicode_len: _, + .. 
+ } => { + // TODO: refactor this from_utf8 can be done internally without checking + value_writer.write( + &Value::Str(std::str::from_utf8(slice.as_bytes()).unwrap()), + register_key, + register_cid, + ); + ValueKind::Str + } + crate::container::list::list_op::InnerListOp::Delete(span) => { + value_writer.write( + &Value::DeleteSeq(span.signed_len as i32), + register_key, + register_cid, + ); + ValueKind::DeleteSeq + } + crate::container::list::list_op::InnerListOp::StyleStart { + start, + end, + key, + value, + info, + } => { + value_writer.write( + &Value::MarkStart(MarkStart { + len: end - start, + key_idx: register_key.register(key) as u32, + value: value.clone(), + info: info.to_byte(), + }), + register_key, + register_cid, + ); + ValueKind::MarkStart + } + crate::container::list::list_op::InnerListOp::StyleEnd => ValueKind::Null, + }, + crate::op::InnerContent::Map(map) => { + assert_eq!(op.container.get_type(), ContainerType::Map); + match &map.value { + Some(v) => value_writer.write_value_content(v, register_key, register_cid), + None => ValueKind::DeleteOnce, + } + } + crate::op::InnerContent::Tree(t) => { + assert_eq!(op.container.get_type(), ContainerType::Tree); + let op = EncodedTreeMove::from_tree_op(t, register_peer); + value_writer.write(&Value::TreeMove(op), register_key, register_cid); + ValueKind::TreeMove + } + } + } +} + +#[allow(clippy::too_many_arguments)] +#[inline] +fn decode_op( + cid: &ContainerID, + kind: ValueKind, + value_reader: &mut ValueReader<'_>, + arena: &crate::arena::SharedArena, + prop: i32, + keys: &arena::KeyArena, + peers: &[u64], + cids: &[ContainerID], +) -> LoroResult { + let content = match cid.container_type() { + ContainerType::Text => match kind { + ValueKind::Str => { + let s = value_reader.read_str()?; + let (slice, result) = arena.alloc_str_with_slice(s); + crate::op::InnerContent::List( + crate::container::list::list_op::InnerListOp::InsertText { + slice, + unicode_start: result.start as u32, + unicode_len: 
(result.end - result.start) as u32, + pos: prop as u32, + }, + ) + } + ValueKind::DeleteSeq => { + let len = value_reader.read_i32()?; + crate::op::InnerContent::List(crate::container::list::list_op::InnerListOp::Delete( + DeleteSpan::new(prop as isize, len as isize), + )) + } + ValueKind::MarkStart => { + let mark = value_reader.read_mark(&keys.keys, cids)?; + let key = keys + .keys + .get(mark.key_idx as usize) + .ok_or_else(|| LoroError::DecodeDataCorruptionError)? + .clone(); + crate::op::InnerContent::List( + crate::container::list::list_op::InnerListOp::StyleStart { + start: prop as u32, + end: prop as u32 + mark.len, + key, + value: mark.value, + info: TextStyleInfoFlag::from_byte(mark.info), + }, + ) + } + ValueKind::Null => crate::op::InnerContent::List( + crate::container::list::list_op::InnerListOp::StyleEnd, + ), + _ => unreachable!(), + }, + ContainerType::Map => { + let key = keys + .keys + .get(prop as usize) + .ok_or(LoroError::DecodeDataCorruptionError)? + .clone(); + match kind { + ValueKind::DeleteOnce => { + crate::op::InnerContent::Map(crate::container::map::MapSet { key, value: None }) + } + kind => { + let value = value_reader.read_value_content(kind, &keys.keys, cids)?; + crate::op::InnerContent::Map(crate::container::map::MapSet { + key, + value: Some(value), + }) + } + } + } + ContainerType::List => { + let pos = prop as usize; + match kind { + ValueKind::Array => { + let arr = + value_reader.read_value_content(ValueKind::Array, &keys.keys, cids)?; + let range = arena.alloc_values( + Arc::try_unwrap( + arr.into_list() + .map_err(|_| LoroError::DecodeDataCorruptionError)?, + ) + .unwrap() + .into_iter(), + ); + crate::op::InnerContent::List( + crate::container::list::list_op::InnerListOp::Insert { + slice: SliceRange::new(range.start as u32..range.end as u32), + pos, + }, + ) + } + ValueKind::DeleteSeq => { + let len = value_reader.read_i32()?; + crate::op::InnerContent::List( + 
crate::container::list::list_op::InnerListOp::Delete(DeleteSpan::new( + pos as isize, + len as isize, + )), + ) + } + _ => unreachable!(), + } + } + ContainerType::Tree => match kind { + ValueKind::TreeMove => { + let op = value_reader.read_tree_move()?; + crate::op::InnerContent::Tree(op.as_tree_op(peers)?) + } + _ => unreachable!(), + }, + }; + + Ok(content) +} + +type PeerIdx = usize; + +struct ExtractedContainer { + containers: Vec, + cid_idx_pairs: Vec<(ContainerID, ContainerIdx)>, + idx_to_index: FxHashMap, +} + +/// Extract containers from oplog changes. +/// +/// Containers are sorted by their peer_id and counter so that +/// they can be compressed by using delta encoding. +fn extract_containers_in_order( + c_iter: &mut dyn Iterator, + arena: &SharedArena, +) -> ExtractedContainer { + let mut containers = Vec::new(); + let mut visited = FxHashSet::default(); + for c in c_iter { + if visited.contains(&c) { + continue; + } + + visited.insert(c); + let id = arena.get_container_id(c).unwrap(); + containers.push((id, c)); + } + + containers.sort_unstable_by(|(a, _), (b, _)| { + a.is_root() + .cmp(&b.is_root()) + .then_with(|| a.container_type().cmp(&b.container_type())) + .then_with(|| match (a, b) { + (ContainerID::Root { name: a, .. }, ContainerID::Root { name: b, .. }) => a.cmp(b), + ( + ContainerID::Normal { + peer: peer_a, + counter: counter_a, + .. + }, + ContainerID::Normal { + peer: peer_b, + counter: counter_b, + .. 
+ }, + ) => peer_a.cmp(peer_b).then_with(|| counter_a.cmp(counter_b)), + _ => unreachable!(), + }) + }); + + let container_idx2index = containers + .iter() + .enumerate() + .map(|(i, (_, c))| (*c, i)) + .collect(); + + ExtractedContainer { + containers: containers.iter().map(|x| x.0.clone()).collect(), + cid_idx_pairs: containers, + idx_to_index: container_idx2index, + } +} + +#[columnar(ser, de)] +struct EncodedDoc<'a> { + #[columnar(class = "vec", iter = "EncodedOp")] + ops: Vec, + #[columnar(class = "vec", iter = "EncodedChange")] + changes: Vec, + /// Container states snapshot. + /// + /// It's empty when the encoding mode is not snapshot. + #[columnar(class = "vec", iter = "EncodedStateInfo")] + states: Vec, + /// The first counter value for each change of each peer in `changes` + start_counters: Vec, + frontiers: Vec<(PeerIdx, Counter)>, + #[columnar(borrow)] + raw_values: Cow<'a, [u8]>, + + /// A list of encoded arenas, in the following order + /// - `peer_id_arena` + /// - `container_arena` + /// - `key_arena` + /// - `deps_arena` + /// - `state_arena` + /// - `others`, left for future use + #[columnar(borrow)] + arenas: Cow<'a, [u8]>, +} + +#[columnar(vec, ser, de, iterable)] +#[derive(Debug, Clone)] +struct EncodedOp { + #[columnar(strategy = "DeltaRle")] + container_index: u32, + #[columnar(strategy = "DeltaRle")] + prop: i32, + #[columnar(strategy = "Rle")] + peer_idx: u32, + #[columnar(strategy = "DeltaRle")] + value_type: u8, + #[columnar(strategy = "DeltaRle")] + counter: i32, +} + +#[columnar(vec, ser, de, iterable)] +#[derive(Debug, Clone)] +struct EncodedChange { + #[columnar(strategy = "Rle")] + peer_idx: usize, + #[columnar(strategy = "DeltaRle")] + len: usize, + #[columnar(strategy = "DeltaRle")] + timestamp: i64, + #[columnar(strategy = "DeltaRle")] + deps_len: i32, + #[columnar(strategy = "BoolRle")] + dep_on_self: bool, + #[columnar(strategy = "DeltaRle")] + msg_len: i32, +} + +#[columnar(vec, ser, de, iterable)] +#[derive(Debug, Clone)] 
+struct EncodedStateInfo { + #[columnar(strategy = "DeltaRle")] + container_index: u32, + #[columnar(strategy = "DeltaRle")] + op_len: u32, + #[columnar(strategy = "DeltaRle")] + state_bytes_len: u32, +} + +mod value { + use std::sync::Arc; + + use fxhash::FxHashMap; + use loro_common::{ + ContainerID, Counter, InternalString, LoroError, LoroResult, LoroValue, PeerID, TreeID, + }; + use num_derive::{FromPrimitive, ToPrimitive}; + use num_traits::{FromPrimitive, ToPrimitive}; + + use crate::container::tree::tree_op::TreeOp; + + use super::{encode::ValueRegister, MAX_COLLECTION_SIZE}; + + #[allow(unused)] + #[non_exhaustive] + pub enum Value<'a> { + Null, + True, + False, + DeleteOnce, + ContainerIdx(usize), + I32(i32), + F64(f64), + Str(&'a str), + DeleteSeq(i32), + DeltaInt(i32), + Array(Vec>), + Map(FxHashMap>), + Binary(&'a [u8]), + MarkStart(MarkStart), + TreeMove(EncodedTreeMove), + Unknown { kind: u8, data: &'a [u8] }, + } + + pub struct MarkStart { + pub len: u32, + pub key_idx: u32, + pub value: LoroValue, + pub info: u8, + } + + pub struct EncodedTreeMove { + pub subject_peer_idx: usize, + pub subject_cnt: usize, + pub is_parent_null: bool, + pub parent_peer_idx: usize, + pub parent_cnt: usize, + } + + impl EncodedTreeMove { + pub fn as_tree_op(&self, peer_ids: &[u64]) -> LoroResult { + Ok(TreeOp { + target: TreeID::new( + *(peer_ids + .get(self.subject_peer_idx) + .ok_or(LoroError::DecodeDataCorruptionError)?), + self.subject_cnt as Counter, + ), + parent: if self.is_parent_null { + None + } else { + Some(TreeID::new( + *(peer_ids + .get(self.parent_peer_idx) + .ok_or(LoroError::DecodeDataCorruptionError)?), + self.parent_cnt as Counter, + )) + }, + }) + } + + pub fn from_tree_op(op: &TreeOp, register_peer_id: &mut ValueRegister) -> Self { + EncodedTreeMove { + subject_peer_idx: register_peer_id.register(&op.target.peer), + subject_cnt: op.target.counter as usize, + is_parent_null: op.parent.is_none(), + parent_peer_idx: op.parent.map_or(0, |x| 
register_peer_id.register(&x.peer)), + parent_cnt: op.parent.map_or(0, |x| x.counter as usize), + } + } + } + + #[non_exhaustive] + #[derive(Debug, FromPrimitive, ToPrimitive)] + pub enum ValueKind { + Null = 0, + True = 1, + False = 2, + DeleteOnce = 3, + I32 = 4, + ContainerIdx = 5, + F64 = 6, + Str = 7, + DeleteSeq = 8, + DeltaInt = 9, + Array = 10, + Map = 11, + MarkStart = 12, + TreeMove = 13, + Binary = 14, + Unknown = 65536, + } + + impl<'a> Value<'a> { + pub fn kind(&self) -> ValueKind { + match self { + Value::Null => ValueKind::Null, + Value::True => ValueKind::True, + Value::False => ValueKind::False, + Value::DeleteOnce => ValueKind::DeleteOnce, + Value::I32(_) => ValueKind::I32, + Value::ContainerIdx(_) => ValueKind::ContainerIdx, + Value::F64(_) => ValueKind::F64, + Value::Str(_) => ValueKind::Str, + Value::DeleteSeq(_) => ValueKind::DeleteSeq, + Value::DeltaInt(_) => ValueKind::DeltaInt, + Value::Array(_) => ValueKind::Array, + Value::Map(_) => ValueKind::Map, + Value::MarkStart { .. } => ValueKind::MarkStart, + Value::TreeMove(_) => ValueKind::TreeMove, + Value::Binary(_) => ValueKind::Binary, + Value::Unknown { .. 
} => ValueKind::Unknown, + } + } + } + + fn get_loro_value_kind(value: &LoroValue) -> ValueKind { + match value { + LoroValue::Null => ValueKind::Null, + LoroValue::Bool(true) => ValueKind::True, + LoroValue::Bool(false) => ValueKind::False, + LoroValue::I32(_) => ValueKind::I32, + LoroValue::Double(_) => ValueKind::F64, + LoroValue::String(_) => ValueKind::Str, + LoroValue::List(_) => ValueKind::Array, + LoroValue::Map(_) => ValueKind::Map, + LoroValue::Binary(_) => ValueKind::Binary, + LoroValue::Container(_) => ValueKind::ContainerIdx, + } + } + + pub struct ValueWriter { + buffer: Vec, + } + + impl ValueWriter { + pub fn new() -> Self { + ValueWriter { buffer: Vec::new() } + } + + pub fn write_value_type_and_content( + &mut self, + value: &LoroValue, + register_key: &mut ValueRegister, + register_cid: &mut ValueRegister, + ) -> ValueKind { + self.write_u8(get_loro_value_kind(value).to_u8().unwrap()); + self.write_value_content(value, register_key, register_cid) + } + + pub fn write_value_content( + &mut self, + value: &LoroValue, + register_key: &mut ValueRegister, + register_cid: &mut ValueRegister, + ) -> ValueKind { + match value { + LoroValue::Null => ValueKind::Null, + LoroValue::Bool(true) => ValueKind::True, + LoroValue::Bool(false) => ValueKind::False, + LoroValue::I32(value) => { + self.write_i32(*value); + ValueKind::I32 + } + LoroValue::Double(value) => { + self.write_f64(*value); + ValueKind::F64 + } + LoroValue::String(value) => { + self.write_str(value); + ValueKind::Str + } + LoroValue::List(value) => { + self.write_usize(value.len()); + for value in value.iter() { + self.write_value_type_and_content(value, register_key, register_cid); + } + ValueKind::Array + } + LoroValue::Map(value) => { + self.write_usize(value.len()); + for (key, value) in value.iter() { + let key_idx = register_key.register(&key.as_str().into()); + self.write_usize(key_idx); + self.write_value_type_and_content(value, register_key, register_cid); + } + ValueKind::Map + } + 
LoroValue::Binary(value) => { + self.write_binary(value); + ValueKind::Binary + } + LoroValue::Container(c) => { + let idx = register_cid.register(c); + self.write_usize(idx); + ValueKind::ContainerIdx + } + } + } + + pub fn write( + &mut self, + value: &Value, + register_key: &mut ValueRegister, + register_cid: &mut ValueRegister, + ) { + match value { + Value::Null => {} + Value::True => {} + Value::False => {} + Value::DeleteOnce => {} + Value::I32(value) => self.write_i32(*value), + Value::F64(value) => self.write_f64(*value), + Value::Str(value) => self.write_str(value), + Value::DeleteSeq(value) => self.write_i32(*value), + Value::DeltaInt(value) => self.write_i32(*value), + Value::Array(value) => self.write_array(value, register_key, register_cid), + Value::Map(value) => self.write_map(value, register_key, register_cid), + Value::MarkStart(value) => self.write_mark(value, register_key, register_cid), + Value::TreeMove(op) => self.write_tree_move(op), + Value::Binary(value) => self.write_binary(value), + Value::ContainerIdx(value) => self.write_usize(*value), + Value::Unknown { kind: _, data: _ } => unreachable!(), + } + } + + fn write_i32(&mut self, value: i32) { + leb128::write::signed(&mut self.buffer, value as i64).unwrap(); + } + + fn write_usize(&mut self, value: usize) { + leb128::write::unsigned(&mut self.buffer, value as u64).unwrap(); + } + + fn write_f64(&mut self, value: f64) { + self.buffer.extend_from_slice(&value.to_be_bytes()); + } + + fn write_str(&mut self, value: &str) { + self.write_usize(value.len()); + self.buffer.extend_from_slice(value.as_bytes()); + } + + fn write_u8(&mut self, value: u8) { + self.buffer.push(value); + } + + pub fn write_kind(&mut self, kind: ValueKind) { + self.write_u8(kind.to_u8().unwrap()); + } + + fn write_array( + &mut self, + value: &[Value], + register_key: &mut ValueRegister, + register_cid: &mut ValueRegister, + ) { + self.write_usize(value.len()); + for value in value { + self.write_kind(value.kind()); + 
self.write(value, register_key, register_cid); + } + } + + fn write_map( + &mut self, + value: &FxHashMap, + register_key: &mut ValueRegister, + register_cid: &mut ValueRegister, + ) { + self.write_usize(value.len()); + for (key, value) in value { + let key_idx = register_key.register(key); + self.write_usize(key_idx); + self.write_kind(value.kind()); + self.write(value, register_key, register_cid); + } + } + + fn write_binary(&mut self, value: &[u8]) { + self.write_usize(value.len()); + self.buffer.extend_from_slice(value); + } + + fn write_mark( + &mut self, + mark: &MarkStart, + register_key: &mut ValueRegister, + register_cid: &mut ValueRegister, + ) { + self.write_u8(mark.info); + self.write_usize(mark.len as usize); + self.write_usize(mark.key_idx as usize); + self.write_value_type_and_content(&mark.value, register_key, register_cid); + } + + fn write_tree_move(&mut self, op: &EncodedTreeMove) { + self.write_usize(op.subject_peer_idx); + self.write_usize(op.subject_cnt); + self.write_u8(op.is_parent_null as u8); + if op.is_parent_null { + return; + } + + self.write_usize(op.parent_peer_idx); + self.write_usize(op.parent_cnt); + } + + pub(crate) fn finish(self) -> Vec { + self.buffer + } + } + + pub struct ValueReader<'a> { + raw: &'a [u8], + } + + impl<'a> ValueReader<'a> { + pub fn new(raw: &'a [u8]) -> Self { + ValueReader { raw } + } + + #[allow(unused)] + pub fn read( + &mut self, + kind: u8, + keys: &[InternalString], + cids: &[ContainerID], + ) -> LoroResult> { + let Some(kind) = ValueKind::from_u8(kind) else { + return Ok(Value::Unknown { + kind, + data: self.read_binary()?, + }); + }; + + Ok(match kind { + ValueKind::Null => Value::Null, + ValueKind::True => Value::True, + ValueKind::False => Value::False, + ValueKind::DeleteOnce => Value::DeleteOnce, + ValueKind::I32 => Value::I32(self.read_i32()?), + ValueKind::F64 => Value::F64(self.read_f64()?), + ValueKind::Str => Value::Str(self.read_str()?), + ValueKind::DeleteSeq => 
Value::DeleteSeq(self.read_i32()?), + ValueKind::DeltaInt => Value::DeltaInt(self.read_i32()?), + ValueKind::Array => Value::Array(self.read_array(keys, cids)?), + ValueKind::Map => Value::Map(self.read_map(keys, cids)?), + ValueKind::Binary => Value::Binary(self.read_binary()?), + ValueKind::MarkStart => Value::MarkStart(self.read_mark(keys, cids)?), + ValueKind::TreeMove => Value::TreeMove(self.read_tree_move()?), + ValueKind::ContainerIdx => Value::ContainerIdx(self.read_usize()?), + ValueKind::Unknown => unreachable!(), + }) + } + + pub fn read_value_type_and_content( + &mut self, + keys: &[InternalString], + cids: &[ContainerID], + ) -> LoroResult<LoroValue> { + let kind = self.read_u8()?; + self.read_value_content( + ValueKind::from_u8(kind).expect("Unknown value type"), + keys, + cids, + ) + } + + pub fn read_value_content( + &mut self, + kind: ValueKind, + keys: &[InternalString], + cids: &[ContainerID], + ) -> LoroResult<LoroValue> { + Ok(match kind { + ValueKind::Null => LoroValue::Null, + ValueKind::True => LoroValue::Bool(true), + ValueKind::False => LoroValue::Bool(false), + ValueKind::I32 => LoroValue::I32(self.read_i32()?), + ValueKind::F64 => LoroValue::Double(self.read_f64()?), + ValueKind::Str => LoroValue::String(Arc::new(self.read_str()?.to_owned())), + ValueKind::DeltaInt => LoroValue::I32(self.read_i32()?), + ValueKind::Array => { + let len = self.read_usize()?; + if len > MAX_COLLECTION_SIZE { + return Err(LoroError::DecodeDataCorruptionError); + } + + let mut ans = Vec::with_capacity(len); + for _ in 0..len { + ans.push(self.recursive_read_value_type_and_content(keys, cids)?); + } + ans.into() + } + ValueKind::Map => { + let len = self.read_usize()?; + if len > MAX_COLLECTION_SIZE { + return Err(LoroError::DecodeDataCorruptionError); + } + + let mut ans = FxHashMap::with_capacity_and_hasher(len, Default::default()); + for _ in 0..len { + let key_idx = self.read_usize()?; + let key = keys + .get(key_idx) + .ok_or(LoroError::DecodeDataCorruptionError)?
+ .to_string(); + let value = self.recursive_read_value_type_and_content(keys, cids)?; + ans.insert(key, value); + } + ans.into() + } + ValueKind::Binary => LoroValue::Binary(Arc::new(self.read_binary()?.to_owned())), + ValueKind::ContainerIdx => LoroValue::Container( + cids.get(self.read_usize()?) + .ok_or(LoroError::DecodeDataCorruptionError)? + .clone(), + ), + a => unreachable!("Unexpected value kind {:?}", a), + }) + } + + /// Read a value that may be deeply nested. + /// + /// This method uses an explicit stack instead of recursive calls, because + /// recursion on deeply nested input could overflow the call stack. + fn recursive_read_value_type_and_content( + &mut self, + keys: &[InternalString], + cids: &[ContainerID], + ) -> LoroResult<LoroValue> { + #[derive(Debug)] + enum Task { + Init, + ReadList { + left: usize, + vec: Vec<LoroValue>, + + key_idx_in_parent: usize, + }, + ReadMap { + left: usize, + map: FxHashMap<String, LoroValue>, + + key_idx_in_parent: usize, + }, + } + impl Task { + fn should_read(&self) -> bool { + !matches!( + self, + Self::ReadList { left: 0, .. } | Self::ReadMap { left: 0, .. } + ) + } + + fn key_idx(&self) -> usize { + match self { + Self::ReadList { + key_idx_in_parent, .. + } => *key_idx_in_parent, + Self::ReadMap { + key_idx_in_parent, .. + } => *key_idx_in_parent, + _ => unreachable!(), + } + } + + fn into_value(self) -> LoroValue { + match self { + Self::ReadList { vec, .. } => vec.into(), + Self::ReadMap { map, .. } => map.into(), + _ => unreachable!(), + } + } + } + let mut stack = vec![Task::Init]; + while let Some(mut task) = stack.pop() { + if task.should_read() { + let key_idx = if matches!(task, Task::ReadMap { .. }) { + self.read_usize()?
+ } else { + 0 + }; + let kind = self.read_u8()?; + let kind = ValueKind::from_u8(kind).expect("Unknown value type"); + let value = match kind { + ValueKind::Null => LoroValue::Null, + ValueKind::True => LoroValue::Bool(true), + ValueKind::False => LoroValue::Bool(false), + ValueKind::I32 => LoroValue::I32(self.read_i32()?), + ValueKind::F64 => LoroValue::Double(self.read_f64()?), + ValueKind::Str => LoroValue::String(Arc::new(self.read_str()?.to_owned())), + ValueKind::DeltaInt => LoroValue::I32(self.read_i32()?), + ValueKind::Array => { + let len = self.read_usize()?; + if len > MAX_COLLECTION_SIZE { + return Err(LoroError::DecodeDataCorruptionError); + } + + let ans = Vec::with_capacity(len); + stack.push(task); + stack.push(Task::ReadList { + left: len, + vec: ans, + key_idx_in_parent: key_idx, + }); + continue; + } + ValueKind::Map => { + let len = self.read_usize()?; + if len > MAX_COLLECTION_SIZE { + return Err(LoroError::DecodeDataCorruptionError); + } + + let ans = FxHashMap::with_capacity_and_hasher(len, Default::default()); + stack.push(task); + stack.push(Task::ReadMap { + left: len, + map: ans, + key_idx_in_parent: key_idx, + }); + continue; + } + ValueKind::Binary => { + LoroValue::Binary(Arc::new(self.read_binary()?.to_owned())) + } + ValueKind::ContainerIdx => LoroValue::Container( + cids.get(self.read_usize()?) + .ok_or(LoroError::DecodeDataCorruptionError)? + .clone(), + ), + a => unreachable!("Unexpected value kind {:?}", a), + }; + + task = match task { + Task::Init => { + return Ok(value); + } + Task::ReadList { + mut left, + mut vec, + key_idx_in_parent, + } => { + left -= 1; + vec.push(value); + let task = Task::ReadList { + left, + vec, + key_idx_in_parent, + }; + if left != 0 { + stack.push(task); + continue; + } + + task + } + Task::ReadMap { + mut left, + mut map, + key_idx_in_parent, + } => { + left -= 1; + let key = keys + .get(key_idx) + .ok_or(LoroError::DecodeDataCorruptionError)? 
+ .to_string(); + map.insert(key, value); + let task = Task::ReadMap { + left, + map, + key_idx_in_parent, + }; + if left != 0 { + stack.push(task); + continue; + } + task + } + }; + } + + let key_index = task.key_idx(); + let value = task.into_value(); + if let Some(last) = stack.last_mut() { + match last { + Task::Init => { + return Ok(value); + } + Task::ReadList { left, vec, .. } => { + *left -= 1; + vec.push(value); + } + Task::ReadMap { left, map, .. } => { + *left -= 1; + let key = keys + .get(key_index) + .ok_or(LoroError::DecodeDataCorruptionError)? + .to_string(); + map.insert(key, value); + } + } + } else { + return Ok(value); + } + } + + unreachable!(); + } + + pub fn read_i32(&mut self) -> LoroResult { + leb128::read::signed(&mut self.raw) + .map(|x| x as i32) + .map_err(|_| LoroError::DecodeDataCorruptionError) + } + + fn read_f64(&mut self) -> LoroResult { + if self.raw.len() < 8 { + return Err(LoroError::DecodeDataCorruptionError); + } + + let mut bytes = [0; 8]; + bytes.copy_from_slice(&self.raw[..8]); + self.raw = &self.raw[8..]; + Ok(f64::from_be_bytes(bytes)) + } + + pub fn read_usize(&mut self) -> LoroResult { + Ok(leb128::read::unsigned(&mut self.raw) + .map_err(|_| LoroError::DecodeDataCorruptionError)? 
as usize) + } + + pub fn read_str(&mut self) -> LoroResult<&'a str> { + let len = self.read_usize()?; + if self.raw.len() < len { + return Err(LoroError::DecodeDataCorruptionError); + } + + let ans = std::str::from_utf8(&self.raw[..len]).map_err(|_| LoroError::DecodeDataCorruptionError)?; + self.raw = &self.raw[len..]; + Ok(ans) + } + + fn read_u8(&mut self) -> LoroResult<u8> { + if self.raw.is_empty() { + return Err(LoroError::DecodeDataCorruptionError); + } + + let ans = self.raw[0]; + self.raw = &self.raw[1..]; + Ok(ans) + } + + fn read_array( + &mut self, + keys: &[InternalString], + cids: &[ContainerID], + ) -> LoroResult<Vec<Value<'a>>> { + let len = self.read_usize()?; + if len > MAX_COLLECTION_SIZE { + return Err(LoroError::DecodeDataCorruptionError); + } + + let mut ans = Vec::with_capacity(len); + for _ in 0..len { + let kind = self.read_u8()?; + ans.push(self.read(kind, keys, cids)?); + } + Ok(ans) + } + + fn read_map( + &mut self, + keys: &[InternalString], + cids: &[ContainerID], + ) -> LoroResult<FxHashMap<InternalString, Value<'a>>> { + let len = self.read_usize()?; + if len > MAX_COLLECTION_SIZE { + return Err(LoroError::DecodeDataCorruptionError); + } + + let mut ans = FxHashMap::with_capacity_and_hasher(len, Default::default()); + for _ in 0..len { + let key_idx = self.read_usize()?; + let key = keys + .get(key_idx) + .ok_or(LoroError::DecodeDataCorruptionError)?
+ .clone(); + let kind = self.read_u8()?; + let value = self.read(kind, keys, cids)?; + ans.insert(key, value); + } + Ok(ans) + } + + fn read_binary(&mut self) -> LoroResult<&'a [u8]> { + let len = self.read_usize()?; + if self.raw.len() < len { + return Err(LoroError::DecodeDataCorruptionError); + } + + let ans = &self.raw[..len]; + self.raw = &self.raw[len..]; + Ok(ans) + } + + pub fn read_mark( + &mut self, + keys: &[InternalString], + cids: &[ContainerID], + ) -> LoroResult { + let info = self.read_u8()?; + let len = self.read_usize()?; + let key_idx = self.read_usize()?; + let value = self.read_value_type_and_content(keys, cids)?; + Ok(MarkStart { + len: len as u32, + key_idx: key_idx as u32, + value, + info, + }) + } + + pub fn read_tree_move(&mut self) -> LoroResult { + let subject_peer_idx = self.read_usize()?; + let subject_cnt = self.read_usize()?; + let is_parent_null = self.read_u8()? != 0; + let mut parent_peer_idx = 0; + let mut parent_cnt = 0; + if !is_parent_null { + parent_peer_idx = self.read_usize()?; + parent_cnt = self.read_usize()?; + } + + Ok(EncodedTreeMove { + subject_peer_idx, + subject_cnt, + is_parent_null, + parent_peer_idx, + parent_cnt, + }) + } + } +} + +mod arena { + use crate::InternalString; + use loro_common::{ContainerID, ContainerType, LoroError, LoroResult, PeerID}; + use serde::{Deserialize, Serialize}; + use serde_columnar::columnar; + + use super::{encode::ValueRegister, PeerIdx, MAX_DECODED_SIZE}; + + pub fn encode_arena( + peer_ids_arena: Vec, + containers: ContainerArena, + keys: Vec, + deps: DepsArena, + state_blob_arena: &[u8], + ) -> Vec { + let peer_ids = PeerIdArena { + peer_ids: peer_ids_arena, + }; + + let key_arena = KeyArena { keys }; + let encoded = EncodedArenas { + peer_id_arena: &peer_ids.encode(), + container_arena: &containers.encode(), + key_arena: &key_arena.encode(), + deps_arena: &deps.encode(), + state_blob_arena, + }; + + encoded.encode_arenas() + } + + pub struct DecodedArenas<'a> { + pub peer_ids: 
PeerIdArena, + pub containers: ContainerArena, + pub keys: KeyArena, + pub deps: Box + 'a>, + pub state_blob_arena: &'a [u8], + } + + pub fn decode_arena(bytes: &[u8]) -> LoroResult { + let arenas = EncodedArenas::decode_arenas(bytes)?; + Ok(DecodedArenas { + peer_ids: PeerIdArena::decode(arenas.peer_id_arena)?, + containers: ContainerArena::decode(arenas.container_arena)?, + keys: KeyArena::decode(arenas.key_arena)?, + deps: Box::new(DepsArena::decode_iter(arenas.deps_arena)?), + state_blob_arena: arenas.state_blob_arena, + }) + } + + struct EncodedArenas<'a> { + peer_id_arena: &'a [u8], + container_arena: &'a [u8], + key_arena: &'a [u8], + deps_arena: &'a [u8], + state_blob_arena: &'a [u8], + } + + impl EncodedArenas<'_> { + fn encode_arenas(self) -> Vec { + let mut ans = Vec::with_capacity( + self.peer_id_arena.len() + + self.container_arena.len() + + self.key_arena.len() + + self.deps_arena.len() + + 4 * 4, + ); + + write_arena(&mut ans, self.peer_id_arena); + write_arena(&mut ans, self.container_arena); + write_arena(&mut ans, self.key_arena); + write_arena(&mut ans, self.deps_arena); + write_arena(&mut ans, self.state_blob_arena); + ans + } + + fn decode_arenas(bytes: &[u8]) -> LoroResult { + let (peer_id_arena, rest) = read_arena(bytes)?; + let (container_arena, rest) = read_arena(rest)?; + let (key_arena, rest) = read_arena(rest)?; + let (deps_arena, rest) = read_arena(rest)?; + let (state_blob_arena, _) = read_arena(rest)?; + Ok(EncodedArenas { + peer_id_arena, + container_arena, + key_arena, + deps_arena, + state_blob_arena, + }) + } + } + + #[derive(Serialize, Deserialize)] + pub(super) struct PeerIdArena { + pub(super) peer_ids: Vec, + } + + impl PeerIdArena { + fn encode(&self) -> Vec { + let mut ans = Vec::with_capacity(self.peer_ids.len() * 8); + leb128::write::unsigned(&mut ans, self.peer_ids.len() as u64).unwrap(); + for &peer_id in &self.peer_ids { + ans.extend_from_slice(&peer_id.to_be_bytes()); + } + ans + } + + fn decode(peer_id_arena: &[u8]) 
-> LoroResult { + let mut reader = peer_id_arena; + let len = leb128::read::unsigned(&mut reader) + .map_err(|_| LoroError::DecodeDataCorruptionError)?; + if len > MAX_DECODED_SIZE as u64 { + return Err(LoroError::DecodeDataCorruptionError); + } + + let mut peer_ids = Vec::with_capacity(len as usize); + if reader.len() < len as usize * 8 { + return Err(LoroError::DecodeDataCorruptionError); + } + + for _ in 0..len { + let mut peer_id_bytes = [0; 8]; + peer_id_bytes.copy_from_slice(&reader[..8]); + peer_ids.push(u64::from_be_bytes(peer_id_bytes)); + reader = &reader[8..]; + } + Ok(PeerIdArena { peer_ids }) + } + } + + #[columnar(vec, ser, de, iterable)] + #[derive(Debug, Copy, Clone, PartialEq, Eq, Hash, PartialOrd, Ord)] + pub(super) struct EncodedContainer { + #[columnar(strategy = "BoolRle")] + is_root: bool, + #[columnar(strategy = "Rle")] + kind: u8, + #[columnar(strategy = "Rle")] + peer_idx: usize, + #[columnar(strategy = "DeltaRle")] + key_idx_or_counter: i32, + } + + impl EncodedContainer { + pub fn as_container_id( + &self, + key_arena: &[InternalString], + peer_arena: &[u64], + ) -> LoroResult { + if self.is_root { + Ok(ContainerID::Root { + container_type: ContainerType::try_from_u8(self.kind)?, + name: key_arena + .get(self.key_idx_or_counter as usize) + .ok_or(LoroError::DecodeDataCorruptionError)? 
+ .clone(), + }) + } else { + Ok(ContainerID::Normal { + container_type: ContainerType::try_from_u8(self.kind)?, + peer: *(peer_arena + .get(self.peer_idx) + .ok_or(LoroError::DecodeDataCorruptionError)?), + counter: self.key_idx_or_counter, + }) + } + } + } + + #[columnar(ser, de)] + #[derive(Default)] + pub(super) struct ContainerArena { + #[columnar(class = "vec", iter = "EncodedContainer")] + pub(super) containers: Vec, + } + + impl ContainerArena { + fn encode(&self) -> Vec { + serde_columnar::to_vec(&self.containers).unwrap() + } + + fn decode(bytes: &[u8]) -> LoroResult { + Ok(ContainerArena { + containers: serde_columnar::from_bytes(bytes)?, + }) + } + + pub fn from_containers( + cids: Vec, + peer_register: &mut ValueRegister, + key_reg: &mut ValueRegister, + ) -> Self { + let mut ans = Self { + containers: Vec::with_capacity(cids.len()), + }; + for cid in cids { + ans.push(cid, peer_register, key_reg); + } + + ans + } + + pub fn push( + &mut self, + id: ContainerID, + peer_register: &mut ValueRegister, + register_key: &mut ValueRegister, + ) { + let (is_root, kind, peer_idx, key_idx_or_counter) = match id { + ContainerID::Root { + container_type, + name, + } => (true, container_type, 0, register_key.register(&name) as i32), + ContainerID::Normal { + container_type, + peer, + counter, + } => ( + false, + container_type, + peer_register.register(&peer), + counter, + ), + }; + self.containers.push(EncodedContainer { + is_root, + kind: kind.to_u8(), + peer_idx, + key_idx_or_counter, + }); + } + } + + #[columnar(vec, ser, de, iterable)] + #[derive(Debug, Copy, Clone, PartialEq, Eq, Hash)] + pub struct EncodedDep { + #[columnar(strategy = "Rle")] + pub peer_idx: usize, + #[columnar(strategy = "DeltaRle")] + pub counter: i32, + } + + #[columnar(ser, de)] + #[derive(Default)] + pub(super) struct DepsArena { + #[columnar(class = "vec", iter = "EncodedDep")] + deps: Vec, + } + + impl DepsArena { + pub fn push(&mut self, peer_idx: PeerIdx, counter: i32) { + 
self.deps.push(EncodedDep { peer_idx, counter }); + } + + pub fn encode(&self) -> Vec { + serde_columnar::to_vec(&self).unwrap() + } + + pub fn decode_iter(bytes: &[u8]) -> LoroResult + '_> { + let iter = serde_columnar::iter_from_bytes::(bytes)?; + Ok(iter.deps) + } + } + + #[derive(Serialize, Deserialize, Default)] + pub(super) struct KeyArena { + pub(super) keys: Vec, + } + + impl KeyArena { + pub fn encode(&self) -> Vec { + serde_columnar::to_vec(&self).unwrap() + } + + pub fn decode(bytes: &[u8]) -> LoroResult { + Ok(serde_columnar::from_bytes(bytes)?) + } + } + + fn write_arena(buffer: &mut Vec, arena: &[u8]) { + leb128::write::unsigned(buffer, arena.len() as u64).unwrap(); + buffer.extend_from_slice(arena); + } + + /// Return (next_arena, rest) + fn read_arena(mut buffer: &[u8]) -> LoroResult<(&[u8], &[u8])> { + let reader = &mut buffer; + let len = leb128::read::unsigned(reader) + .map_err(|_| LoroError::DecodeDataCorruptionError)? as usize; + if len > MAX_DECODED_SIZE { + return Err(LoroError::DecodeDataCorruptionError); + } + + if len > reader.len() { + return Err(LoroError::DecodeDataCorruptionError); + } + + Ok((reader[..len as usize].as_ref(), &reader[len as usize..])) + } +} + +#[cfg(test)] +mod test { + use loro_common::LoroValue; + + use crate::fx_map; + + use super::*; + + fn test_loro_value_read_write(v: impl Into) { + let v = v.into(); + let mut key_reg: ValueRegister = ValueRegister::new(); + let mut cid_reg: ValueRegister = ValueRegister::new(); + let mut writer = ValueWriter::new(); + let kind = writer.write_value_content(&v, &mut key_reg, &mut cid_reg); + + let binding = writer.finish(); + let mut reader = ValueReader::new(binding.as_slice()); + let keys = &key_reg.unwrap_vec(); + let cids = &cid_reg.unwrap_vec(); + let ans = reader.read_value_content(kind, keys, cids).unwrap(); + assert_eq!(v, ans) + } + + #[test] + fn test_value_read_write() { + test_loro_value_read_write(true); + test_loro_value_read_write(false); + 
test_loro_value_read_write(123); + test_loro_value_read_write(1.23); + test_loro_value_read_write(LoroValue::Null); + test_loro_value_read_write(LoroValue::Binary(Arc::new(vec![123, 223, 255, 0, 1, 2, 3]))); + test_loro_value_read_write("sldk;ajfas;dlkfas测试"); + test_loro_value_read_write(LoroValue::Container(ContainerID::new_root( + "name", + ContainerType::Text, + ))); + test_loro_value_read_write(LoroValue::Container(ContainerID::new_normal( + ID::new(u64::MAX, 123), + ContainerType::Tree, + ))); + test_loro_value_read_write(vec![1.into(), 2.into(), 3.into()]); + test_loro_value_read_write(LoroValue::Map(Arc::new(fx_map![ + "1".into() => 123.into(), + "2".into() => "123".into(), + "3".into() => vec![true.into()].into() + ]))); + } +} diff --git a/crates/loro-internal/src/encoding/encode_snapshot.rs b/crates/loro-internal/src/encoding/encode_snapshot.rs deleted file mode 100644 index 63569397..00000000 --- a/crates/loro-internal/src/encoding/encode_snapshot.rs +++ /dev/null @@ -1,1139 +0,0 @@ -use std::{borrow::Cow, ops::Deref}; - -use fxhash::FxHashMap; -use itertools::Itertools; -use loro_common::{ContainerType, HasLamport, TreeID, ID}; -use loro_preload::{ - CommonArena, EncodedAppState, EncodedContainerState, FinalPhase, MapEntry, TempArena, -}; -use rle::{HasLength, RleVec}; -use serde::{Deserialize, Serialize}; -use serde_columnar::{columnar, to_vec}; -use smallvec::smallvec; - -use crate::{ - change::{Change, Timestamp}, - container::{ - idx::ContainerIdx, list::list_op::InnerListOp, map::InnerMapSet, - richtext::TextStyleInfoFlag, tree::tree_op::TreeOp, - }, - delta::MapValue, - id::{Counter, PeerID}, - op::{InnerContent, Op}, - state::RichtextState, - state::TreeState, - version::Frontiers, - InternalString, LoroError, LoroValue, -}; - -use crate::{ - arena::SharedArena, - loro::LoroDoc, - oplog::OpLog, - state::{DocState, ListState, MapState, State}, -}; - -pub fn encode_app_snapshot(app: &LoroDoc) -> Vec { - let state = app.app_state().lock().unwrap(); 
- let pre_encoded_state = encode_app_state(&state); - let f = encode_oplog(&app.oplog().lock().unwrap(), Some(pre_encoded_state)); - // f.diagnose_size(); - f.encode() -} - -pub fn decode_app_snapshot(app: &LoroDoc, bytes: &[u8], with_state: bool) -> Result<(), LoroError> { - assert!(app.can_reset_with_snapshot()); - let data = FinalPhase::decode(bytes)?; - if with_state { - let mut app_state = app.app_state().lock().unwrap(); - let (state_arena, common) = decode_state(&mut app_state, &data)?; - let arena = app_state.arena.clone(); - decode_oplog( - &mut app.oplog().lock().unwrap(), - &data, - Some((arena, state_arena, common)), - )?; - } else { - decode_oplog(&mut app.oplog().lock().unwrap(), &data, None)?; - } - Ok(()) -} - -pub fn decode_oplog( - oplog: &mut OpLog, - data: &FinalPhase, - arena: Option<(SharedArena, TempArena, CommonArena)>, -) -> Result<(), LoroError> { - let (arena, state_arena, common) = arena.unwrap_or_else(|| { - let arena = oplog.arena.clone(); - let state_arena = TempArena::decode_state_arena(data).unwrap(); - debug_assert!(arena.can_import_snapshot()); - arena.alloc_str_fast(&state_arena.text); - (arena, state_arena, CommonArena::decode(data).unwrap()) - }); - oplog.arena = arena.clone(); - let mut extra_arena = TempArena::decode_additional_arena(data)?; - // str and values are allocated directly on the empty arena, - // so the indices don't need to be updated - arena.alloc_str_fast(&extra_arena.text); - arena.alloc_values(state_arena.values.into_iter()); - arena.alloc_values(extra_arena.values.into_iter()); - let mut keys = state_arena.keywords; - keys.append(&mut extra_arena.keywords); - let mut tree_ids = state_arena.tree_ids; - tree_ids.append(&mut extra_arena.tree_ids); - - let oplog_data = OplogEncoded::decode_iter(data)?; - let mut style_iter = oplog_data.styles.iter(); - let mut changes = Vec::new(); - let mut dep_iter = oplog_data.deps; - let mut op_iter = oplog_data.ops; - let mut counters = FxHashMap::default(); - for change in 
oplog_data.changes { - let peer_idx = change.peer_idx as usize; - let peer_id = common.peer_ids[peer_idx]; - let timestamp = change.timestamp; - let deps_len = change.deps_len; - let dep_on_self = change.dep_on_self; - let mut ops = RleVec::new(); - let counter_mut = counters.entry(peer_idx).or_insert(0); - let start_counter = *counter_mut; - - // decode ops - for _ in 0..change.op_len { - let id = ID::new(peer_id, *counter_mut); - let encoded_op = op_iter.next().unwrap(); - let container = common.container_ids[encoded_op.container as usize].clone(); - let container_idx = arena.register_container(&container); - let op = match container.container_type() { - loro_common::ContainerType::List | loro_common::ContainerType::Text => { - let op = match container.container_type() { - ContainerType::List => encoded_op.get_list(), - ContainerType::Text => encoded_op.get_richtext(), - _ => unreachable!(), - }; - - match op { - SnapshotOp::ListInsert { - value_idx: start, - pos, - } => Op::new( - id, - InnerContent::List(InnerListOp::new_insert(start..start + 1, pos)), - container_idx, - ), - SnapshotOp::TextOrListDelete { len, pos } => Op::new( - id, - InnerContent::List(InnerListOp::new_del(pos, len)), - container_idx, - ), - SnapshotOp::Map { .. } => { - unreachable!() - } - SnapshotOp::Tree { .. 
} => unreachable!(), - SnapshotOp::RichtextStyleStart { start, end } => { - let style = style_iter.next().unwrap(); - let key = keys[style.key_idx as usize].clone(); - let info = style.info; - Op::new( - id, - InnerContent::List(InnerListOp::StyleStart { - start: start as u32, - end: end as u32, - key, - value: style.value.clone(), - info: TextStyleInfoFlag::from_byte(info), - }), - container_idx, - ) - } - SnapshotOp::RichtextStyleEnd => { - Op::new(id, InnerContent::List(InnerListOp::StyleEnd), container_idx) - } - SnapshotOp::RichtextInsert { pos, start, len } => Op::new( - id, - InnerContent::List(InnerListOp::new_insert( - start as u32..start as u32 + (len as u32), - pos, - )), - container_idx, - ), - } - } - loro_common::ContainerType::Map => { - let op = encoded_op.get_map(); - match op { - SnapshotOp::Map { - key, - value_idx_plus_one, - } => { - let value = if value_idx_plus_one == 0 { - None - } else { - Some(value_idx_plus_one - 1) - }; - Op::new( - id, - InnerContent::Map(InnerMapSet { - key: (&*keys[key]).into(), - value, - }), - container_idx, - ) - } - _ => unreachable!(), - } - } - loro_common::ContainerType::Tree => { - let op = encoded_op.get_tree(); - match op { - SnapshotOp::Tree { target, parent } => { - let target = { - let (peer, counter) = tree_ids[target - 1]; - let peer = common.peer_ids[peer as usize]; - TreeID { peer, counter } - }; - let parent = { - if parent == Some(0) { - TreeID::delete_root() - } else { - parent.map(|p| { - let (peer, counter) = tree_ids[p - 1]; - let peer = common.peer_ids[peer as usize]; - TreeID { peer, counter } - }) - } - }; - Op::new( - id, - InnerContent::Tree(TreeOp { target, parent }), - container_idx, - ) - } - _ => unreachable!(), - } - } - }; - *counter_mut += op.content_len() as Counter; - ops.push(op); - } - - // calc deps - let mut deps: smallvec::SmallVec<[ID; 1]> = smallvec![]; - if dep_on_self { - deps.push(ID::new(peer_id, start_counter - 1)); - } - - for _ in 0..deps_len { - let dep = 
dep_iter.next().unwrap(); - let peer = common.peer_ids[dep.peer_idx as usize]; - deps.push(ID::new(peer, dep.counter)); - } - - changes.push(Change { - deps: Frontiers::from(deps), - ops, - timestamp, - id: ID::new(peer_id, start_counter), - lamport: 0, // calculate lamport when importing - has_dependents: false, - }); - } - - // we assume changes are already sorted by lamport already - for mut change in changes { - let lamport = oplog.dag.frontiers_to_next_lamport(&change.deps); - change.lamport = lamport; - oplog.import_local_change(change, false)?; - } - Ok(()) -} - -pub fn decode_state<'b>( - app_state: &'_ mut DocState, - data: &'b FinalPhase, -) -> Result<(TempArena<'b>, CommonArena<'b>), LoroError> { - assert!(app_state.is_empty()); - assert!(!app_state.is_in_txn()); - let arena = app_state.arena.clone(); - let common = CommonArena::decode(data)?; - let state_arena = TempArena::decode_state_arena(data)?; - // str and values are allocated directly on the empty arena, - // so the indices don't need to be updated - debug_assert!(arena.can_import_snapshot()); - arena.alloc_str_fast(&state_arena.text); - let encoded_app_state = EncodedAppState::decode(data)?; - let mut container_states = - FxHashMap::with_capacity_and_hasher(common.container_ids.len(), Default::default()); - // this part should be moved to encode.rs in preload - for ((id, parent), state) in common - .container_ids - .iter() - .zip(encoded_app_state.parents.iter()) - .zip(encoded_app_state.states.into_iter()) - { - // We need to register new container, and cannot reuse the container idx. Because arena's containers fields may not be empty. 
- let idx = arena.register_container(id); - let parent_idx = - (*parent).map(|x| arena.register_container(&common.container_ids[x as usize])); - arena.set_parent(idx, parent_idx); - match state { - loro_preload::EncodedContainerState::Map(map_data) => { - let mut map = MapState::new(idx); - for entry in map_data.iter() { - map.insert( - InternalString::from(&*state_arena.keywords[entry.key]), - MapValue { - counter: entry.counter as Counter, - value: if entry.value == 0 { - None - } else { - Some(state_arena.values[entry.value - 1].clone()) - }, - lamport: (entry.lamport, common.peer_ids[entry.peer as usize]), - }, - ) - } - container_states.insert(idx, State::MapState(map)); - } - loro_preload::EncodedContainerState::List(list_data) => { - let mut list = ListState::new(idx); - list.insert_batch( - 0, - list_data - .iter() - .map(|&x| state_arena.values[x].clone()) - .collect_vec(), - ); - container_states.insert(idx, State::ListState(list)); - } - loro_preload::EncodedContainerState::Richtext(richtext_data) => { - let mut richtext = RichtextState::new(idx); - richtext.decode_snapshot(*richtext_data, &state_arena, &common, &arena); - container_states.insert(idx, State::RichtextState(richtext)); - } - loro_preload::EncodedContainerState::Tree((tree_data, deleted)) => { - let mut tree = TreeState::new(); - for (target, parent) in tree_data { - let (peer, counter) = state_arena.tree_ids[target - 1]; - let target_peer = common.peer_ids[peer as usize]; - let target = TreeID { - peer: target_peer, - counter, - }; - - let parent = if parent == Some(0) { - TreeID::delete_root() - } else { - parent.map(|p| { - let (peer, counter) = state_arena.tree_ids[p - 1]; - let peer = common.peer_ids[peer as usize]; - TreeID { peer, counter } - }) - }; - tree.trees.insert(target, parent); - } - - for target in deleted { - let (peer, counter) = state_arena.tree_ids[target - 1]; - let target_peer = common.peer_ids[peer as usize]; - let target = TreeID { - peer: target_peer, - counter, - 
}; - tree.deleted.insert(target); - } - - container_states.insert(idx, State::TreeState(tree)); - } - } - } - - let frontiers = Frontiers::from(&encoded_app_state.frontiers); - app_state.init_with_states_and_version(container_states, frontiers); - Ok((state_arena, common)) -} - -type ClientIdx = u32; - -#[columnar(ser, de)] -#[derive(Debug)] -struct OplogEncoded { - #[columnar(class = "vec", iter = "EncodedChange")] - pub(crate) changes: Vec, - #[columnar(class = "vec", iter = "EncodedSnapshotOp")] - ops: Vec, - #[columnar(class = "vec", iter = "DepsEncoding")] - deps: Vec, - - styles: Vec, -} - -#[derive(Debug, Clone, Serialize, Deserialize)] -struct StyleInfo { - key_idx: u32, - info: u8, - value: LoroValue, -} - -impl OplogEncoded { - fn decode_iter<'f: 'iter, 'iter>( - data: &'f FinalPhase, - ) -> Result<>::Iter, LoroError> { - serde_columnar::iter_from_bytes::(&data.oplog) - .map_err(|e| LoroError::DecodeError(e.to_string().into_boxed_str())) - } - - fn encode(&self) -> Vec { - to_vec(self).unwrap() - } -} - -#[columnar(vec, ser, de, iterable)] -#[derive(Debug, Clone)] -struct EncodedChange { - #[columnar(strategy = "Rle")] - pub(super) peer_idx: ClientIdx, - #[columnar(strategy = "DeltaRle")] - pub(super) timestamp: Timestamp, - #[columnar(strategy = "Rle")] - pub(super) op_len: u32, - /// The length of deps that exclude the dep on the same client - #[columnar(strategy = "Rle")] - pub(super) deps_len: u32, - /// Whether the change has a dep on the same client. 
- /// It can save lots of space by using this field instead of [`DepsEncoding`] - #[columnar(strategy = "BoolRle")] - pub(super) dep_on_self: bool, -} - -#[columnar(vec, ser, de, iterable)] -#[derive(Debug, Clone)] -struct EncodedSnapshotOp { - #[columnar(strategy = "Rle")] - container: u32, - /// key index or insert/delete pos - #[columnar(strategy = "DeltaRle")] - prop: usize, - /// Richtext: insert range start - /// Text: 0 - /// List: 0 - /// Map: 0 - #[columnar(strategy = "DeltaRle")] - prop2: usize, - /// Richtext: insert len | del len | end position (for style) - /// Text: insert len | del len (can be neg) - /// List: 0 | del len (can be neg) - /// Map: always 0 - #[columnar(strategy = "DeltaRle")] - len: i64, - #[columnar(strategy = "BoolRle")] - is_del: bool, - /// Richtext: 0 (text) | 1 (style_start) | 2 (style_end) - /// Text: 0 - /// List: 0 | value index - /// Map: 0 (deleted) | value index + 1 - #[columnar(strategy = "DeltaRle")] - value: isize, -} - -enum SnapshotOp { - RichtextStyleStart { - start: usize, - end: usize, - }, - RichtextStyleEnd, - RichtextInsert { - pos: usize, - start: usize, - len: usize, - }, - ListInsert { - pos: usize, - value_idx: u32, - }, - TextOrListDelete { - pos: usize, - len: isize, - }, - Map { - key: usize, - value_idx_plus_one: u32, - }, - Tree { - target: usize, - parent: Option, - }, -} - -impl EncodedSnapshotOp { - pub fn get_richtext(&self) -> SnapshotOp { - if self.is_del { - SnapshotOp::TextOrListDelete { - pos: self.prop, - len: self.len as isize, - } - } else { - match self.value { - 0 => SnapshotOp::RichtextInsert { - pos: self.prop, - start: self.prop2, - len: self.len as usize, - }, - 1 => SnapshotOp::RichtextStyleStart { - start: self.prop, - end: self.len as usize, - }, - 2 => SnapshotOp::RichtextStyleEnd, - _ => unreachable!(), - } - } - } - - pub fn get_list(&self) -> SnapshotOp { - if self.is_del { - SnapshotOp::TextOrListDelete { - pos: self.prop, - len: self.len as isize, - } - } else { - 
SnapshotOp::ListInsert { - pos: self.prop, - value_idx: self.value as u32, - } - } - } - - pub fn get_map(&self) -> SnapshotOp { - let value_idx_plus_one = if self.value < 0 { 0 } else { self.value as u32 }; - SnapshotOp::Map { - key: self.prop, - value_idx_plus_one, - } - } - - pub fn get_tree(&self) -> SnapshotOp { - let parent = if self.is_del { - Some(0) - } else if self.value == 0 { - None - } else { - Some(self.value as usize) - }; - SnapshotOp::Tree { - target: self.prop, - parent, - } - } - - pub fn from(value: SnapshotOp, container: u32) -> Self { - match value { - SnapshotOp::ListInsert { - pos, - value_idx: start, - } => Self { - container, - prop: pos, - prop2: 0, - len: 0, - is_del: false, - value: start as isize, - }, - SnapshotOp::TextOrListDelete { pos, len } => Self { - container, - prop: pos, - prop2: 0, - len: len as i64, - is_del: true, - value: 0, - }, - SnapshotOp::Map { - key, - value_idx_plus_one: value, - } => { - let value = if value == 0 { -1 } else { value as isize }; - Self { - container, - prop: key, - prop2: 0, - len: 0, - is_del: false, - value, - } - } - SnapshotOp::RichtextStyleStart { start, end } => Self { - container, - prop: start, - prop2: 0, - len: end as i64, - is_del: false, - value: 1, - }, - SnapshotOp::RichtextStyleEnd => Self { - container, - prop: 0, - prop2: 0, - len: 0, - is_del: false, - value: 2, - }, - SnapshotOp::RichtextInsert { pos, start, len } => Self { - container, - prop: pos, - prop2: start, - len: len as i64, - is_del: false, - value: 0, - }, - SnapshotOp::Tree { target, parent } => { - let is_del = parent.unwrap_or(1) == 0; - Self { - container, - prop: target, - prop2: 0, - len: 0, - is_del, - value: parent.unwrap_or(0) as isize, - } - } - } - } -} - -#[columnar(vec, ser, de, iterable)] -#[derive(Debug, Copy, Clone)] -struct DepsEncoding { - #[columnar(strategy = "Rle")] - peer_idx: ClientIdx, - #[columnar(strategy = "DeltaRle")] - counter: Counter, -} - -#[derive(Default)] -struct PreEncodedState<'a> { 
- common: CommonArena<'static>, - arena: TempArena<'static>, - key_lookup: FxHashMap, - value_lookup: FxHashMap, - peer_lookup: FxHashMap, - app_state: EncodedAppState<'a>, - tree_id_lookup: FxHashMap<(u32, i32), usize>, -} - -fn encode_app_state(app_state: &DocState) -> PreEncodedState { - assert!(!app_state.is_in_txn()); - let mut peers = Vec::new(); - let mut peer_lookup = FxHashMap::default(); - let mut tree_ids = Vec::new(); - let mut keywords = Vec::new(); - let mut values = Vec::new(); - let mut key_lookup = FxHashMap::default(); - let mut value_lookup = FxHashMap::default(); - let mut tree_id_lookup = FxHashMap::default(); - let mut encoded = EncodedAppState { - frontiers: app_state.frontiers.iter().cloned().collect(), - states: Vec::new(), - parents: app_state - .arena - .export_parents() - .into_iter() - .map(|x| x.map(|x| x.to_index())) - .collect(), - }; - - let mut record_key = |key: &InternalString| { - if let Some(idx) = key_lookup.get(key) { - return *idx; - } - - keywords.push(key.clone()); - key_lookup - .entry(key.clone()) - .or_insert_with(|| keywords.len() - 1); - keywords.len() - 1 - }; - - let mut record_value = |value: &LoroValue| { - if let Some(idx) = value_lookup.get(value) { - return *idx; - } - - let idx = values.len(); - values.push(value.clone()); - value_lookup.entry(value.clone()).or_insert_with(|| idx); - idx - }; - - let mut record_peer = |peer: PeerID| { - if let Some(idx) = peer_lookup.get(&peer) { - return *idx as u32; - } - - peers.push(peer); - peer_lookup.entry(peer).or_insert_with(|| peers.len() - 1); - peers.len() as u32 - 1 - }; - - let mut record_tree_id = |tree_id: TreeID, peer: u32| { - let tree_id = (peer, tree_id.counter); - if let Some(idx) = tree_id_lookup.get(&tree_id) { - return *idx; - } - - tree_ids.push(tree_id); - // the idx 0 is the delete root - tree_id_lookup - .entry(tree_id) - .or_insert_with(|| tree_ids.len()); - tree_ids.len() - }; - - let container_ids = app_state.arena.export_containers(); - for (i, 
id) in container_ids.iter().enumerate() { - let idx = ContainerIdx::from_index_and_type(i as u32, id.container_type()); - let Some(state) = app_state.states.get(&idx) else { - match id.container_type() { - loro_common::ContainerType::List => { - encoded.states.push(EncodedContainerState::List(Vec::new())) - } - loro_common::ContainerType::Map => { - encoded.states.push(EncodedContainerState::Map(Vec::new())) - } - loro_common::ContainerType::Tree => encoded - .states - .push(EncodedContainerState::Tree((Vec::new(), Vec::new()))), - loro_common::ContainerType::Text => encoded - .states - .push(EncodedContainerState::Richtext(Default::default())), - } - - continue; - }; - match state { - State::TreeState(tree) => { - let v = tree - .iter() - .map(|(target, parent)| { - let peer_idx = record_peer(target.peer); - let t = record_tree_id(*target, peer_idx); - let p = if TreeID::is_deleted_root(*parent) { - Some(0) - } else { - parent.map(|p| { - let peer_idx = record_peer(p.peer); - record_tree_id(p, peer_idx) - }) - }; - (t, p) - }) - .collect::>(); - let d = tree - .deleted - .iter() - .map(|target| { - let peer_idx = record_peer(target.peer); - record_tree_id(*target, peer_idx) - }) - .collect::>(); - encoded.states.push(EncodedContainerState::Tree((v, d))) - } - State::ListState(list) => { - let v = list.iter().map(&mut record_value).collect(); - encoded.states.push(EncodedContainerState::List(v)) - } - State::MapState(map) => { - let v = map - .iter() - .map(|(key, value)| { - let key = record_key(key); - MapEntry { - key, - value: if let Some(value) = &value.value { - record_value(value) + 1 - } else { - 0 - }, - peer: record_peer(value.lamport.1), - counter: value.counter as u32, - lamport: value.lamport(), - } - }) - .collect(); - encoded.states.push(EncodedContainerState::Map(v)) - } - State::RichtextState(text) => { - let result = text.encode_snapshot(&mut record_peer, &mut record_key); - encoded - .states - 
.push(EncodedContainerState::Richtext(Box::new(result))); - } - } - } - - let common = CommonArena { - peer_ids: peers.into(), - container_ids, - }; - let arena = TempArena { - values, - keywords, - text: app_state.arena.slice_by_unicode(..).deref().to_vec().into(), - tree_ids, - }; - - PreEncodedState { - common, - arena, - key_lookup, - value_lookup, - peer_lookup, - tree_id_lookup, - app_state: encoded, - } -} - -fn encode_oplog(oplog: &OpLog, state_ref: Option) -> FinalPhase<'static> { - let state_ref = state_ref.unwrap_or_default(); - let PreEncodedState { - mut common, - arena, - mut key_lookup, - mut value_lookup, - mut peer_lookup, - mut tree_id_lookup, - app_state, - } = state_ref; - if common.container_ids.is_empty() { - common.container_ids = oplog.arena.export_containers(); - } - // need to rebuild bytes from ops, because arena.text may contain garbage - let mut extra_keys = Vec::new(); - let mut extra_values = Vec::new(); - let mut extra_tree_ids = Vec::new(); - - let mut record_key = |key: &InternalString| { - if let Some(idx) = key_lookup.get(key) { - return *idx; - } - - let idx = extra_keys.len() + arena.keywords.len(); - extra_keys.push(key.clone()); - key_lookup.entry(key.clone()).or_insert_with(|| idx); - idx - }; - - let mut record_value = |value: &LoroValue| { - if let Some(idx) = value_lookup.get(value) { - return *idx; - } - - let idx = extra_values.len() + arena.values.len(); - extra_values.push(value.clone()); - value_lookup.entry(value.clone()).or_insert_with(|| idx); - idx - }; - - let Cow::Owned(mut peers) = std::mem::take(&mut common.peer_ids) else { - unreachable!() - }; - let mut record_peer = |peer: PeerID, peer_lookup: &mut FxHashMap| { - if let Some(idx) = peer_lookup.get(&peer) { - return *idx as u32; - } - - peers.push(peer); - peer_lookup.entry(peer).or_insert_with(|| peers.len() - 1); - peers.len() as u32 - 1 - }; - - let mut record_tree_id = |tree_id: TreeID, peer_lookup: &mut FxHashMap| { - let peer_idx = 
*peer_lookup.get(&tree_id.peer).unwrap() as u32; - if let Some(idx) = tree_id_lookup.get(&(peer_idx, tree_id.counter)) { - return *idx; - } - let idx = extra_tree_ids.len() + arena.tree_ids.len(); - extra_tree_ids.push((peer_idx, tree_id.counter)); - tree_id_lookup - .entry((peer_idx, tree_id.counter)) - .or_insert_with(|| idx); - idx - }; - - let mut styles = Vec::new(); - // Add all changes - let mut changes: Vec<&Change> = Vec::with_capacity(oplog.len_changes()); - for (_, peer_changes) in oplog.changes().iter() { - for change in peer_changes.iter() { - changes.push(change); - } - } - - // Sort changes by lamport. So it's in causal order - changes.sort_by_key(|x| x.lamport()); - let mut encoded_changes = Vec::with_capacity(changes.len()); - let mut encoded_ops: Vec = - Vec::with_capacity(changes.iter().map(|x| x.ops.len()).sum()); - let mut deps = Vec::with_capacity(changes.iter().map(|x| x.deps.len()).sum()); - for change in changes { - let peer_idx = record_peer(change.id.peer, &mut peer_lookup); - let op_index_start = encoded_ops.len(); - for op in change.ops.iter() { - match &op.content { - InnerContent::Tree(TreeOp { target, parent }) => { - let target_idx = record_tree_id(*target, &mut peer_lookup); - let parent_idx = if TreeID::is_deleted_root(*parent) { - Some(0) - } else { - parent.map(|p| record_tree_id(p, &mut peer_lookup)) - }; - encoded_ops.push(EncodedSnapshotOp::from( - SnapshotOp::Tree { - target: target_idx, - parent: parent_idx, - }, - op.container.to_index(), - )); - } - InnerContent::List(list) => match list { - InnerListOp::Insert { slice, pos } => match op.container.get_type() { - loro_common::ContainerType::List => { - let values = oplog - .arena - .get_values(slice.0.start as usize..slice.0.end as usize); - let mut pos = *pos; - for value in values { - let idx = record_value(&value); - encoded_ops.push(EncodedSnapshotOp::from( - SnapshotOp::ListInsert { - pos, - value_idx: idx as u32, - }, - op.container.to_index(), - )); - pos += 1; - } 
- } - loro_common::ContainerType::Text => { - encoded_ops.push(EncodedSnapshotOp::from( - SnapshotOp::RichtextInsert { - pos: *pos, - start: slice.0.start as usize, - len: slice.0.len(), - }, - op.container.to_index(), - )) - } - loro_common::ContainerType::Map => unreachable!(), - loro_common::ContainerType::Tree => unreachable!(), - }, - InnerListOp::InsertText { - slice: _, - unicode_len: len, - unicode_start: start, - pos, - } => match op.container.get_type() { - loro_common::ContainerType::Text => { - encoded_ops.push(EncodedSnapshotOp::from( - SnapshotOp::RichtextInsert { - pos: *pos as usize, - start: *start as usize, - len: *len as usize, - }, - op.container.to_index(), - )) - } - loro_common::ContainerType::Map => unreachable!(), - loro_common::ContainerType::List => unreachable!(), - loro_common::ContainerType::Tree => unreachable!(), - }, - InnerListOp::Delete(del) => { - encoded_ops.push(EncodedSnapshotOp::from( - SnapshotOp::TextOrListDelete { - pos: del.pos as usize, - len: del.signed_len, - }, - op.container.to_index(), - )); - } - InnerListOp::StyleStart { - start, - end, - key, - info, - value, - } => { - encoded_ops.push(EncodedSnapshotOp::from( - SnapshotOp::RichtextStyleStart { - start: *start as usize, - end: *end as usize, - }, - op.container.to_index(), - )); - styles.push(StyleInfo { - key_idx: record_key(key) as u32, - info: info.to_byte(), - value: value.clone(), - }) - } - InnerListOp::StyleEnd => { - encoded_ops.push(EncodedSnapshotOp::from( - SnapshotOp::RichtextStyleEnd, - op.container.to_index(), - )); - } - }, - InnerContent::Map(map) => { - let key = record_key(&map.key); - let value = map.value.and_then(|v| oplog.arena.get_value(v as usize)); - let value = if let Some(value) = value { - (record_value(&value) + 1) as u32 - } else { - 0 - }; - encoded_ops.push(EncodedSnapshotOp::from( - SnapshotOp::Map { - key, - value_idx_plus_one: value, - }, - op.container.to_index(), - )); - } - } - } - let op_len = encoded_ops.len() - 
op_index_start; - let mut dep_on_self = false; - let dep_start = deps.len(); - for dep in change.deps.iter() { - if dep.peer == change.id.peer { - dep_on_self = true; - } else { - let peer_idx = record_peer(dep.peer, &mut peer_lookup); - deps.push(DepsEncoding { - peer_idx, - counter: dep.counter, - }); - } - } - - let deps_len = deps.len() - dep_start; - encoded_changes.push(EncodedChange { - peer_idx, - timestamp: change.timestamp, - op_len: op_len as u32, - deps_len: deps_len as u32, - dep_on_self, - }) - } - - common.peer_ids = Cow::Owned(peers); - let oplog_encoded = OplogEncoded { - changes: encoded_changes, - ops: encoded_ops, - deps, - styles, - }; - - // println!("OplogEncoded:"); - // println!("changes {}", oplog_encoded.changes.len()); - // println!("ops {}", oplog_encoded.ops.len()); - // println!("deps {}", oplog_encoded.deps.len()); - // println!("\n"); - let ans = FinalPhase { - common: Cow::Owned(common.encode()), - app_state: Cow::Owned(app_state.encode()), - state_arena: Cow::Owned(arena.encode()), - oplog_extra_arena: Cow::Owned( - TempArena { - text: Cow::Borrowed(&[]), - keywords: extra_keys, - values: extra_values, - tree_ids: extra_tree_ids, - } - .encode(), - ), - oplog: Cow::Owned(oplog_encoded.encode()), - }; - - ans -} - -#[cfg(test)] -mod test { - use debug_log::debug_dbg; - - use super::*; - - #[test] - fn test_snapshot_encode() { - use std::borrow::Cow; - - FinalPhase { - common: Cow::Owned(vec![0, 1, 2, 253, 254, 255]), - app_state: Cow::Owned(vec![255]), - state_arena: Cow::Owned(vec![255]), - oplog_extra_arena: Cow::Owned(vec![255]), - oplog: Cow::Owned(vec![255]), - } - .encode(); - } - - #[test] - fn text_edit_snapshot_encode_decode() { - // test import snapshot directly - let app = LoroDoc::new(); - let mut txn = app.txn().unwrap(); - let text = txn.get_text("id"); - text.insert_with_txn(&mut txn, 0, "hello").unwrap(); - txn.commit().unwrap(); - let snapshot = app.export_snapshot(); - let app2 = LoroDoc::new(); - 
app2.import(&snapshot).unwrap(); - let actual = app2 - .app_state() - .lock() - .unwrap() - .get_text("id") - .unwrap() - .to_string_mut(); - assert_eq!("hello", &actual); - debug_dbg!(&app2.oplog().lock().unwrap()); - - // test import snapshot to a LoroApp that is already changed - let mut txn = app2.txn().unwrap(); - let text = txn.get_text("id"); - text.insert_with_txn(&mut txn, 2, " ").unwrap(); - txn.commit().unwrap(); - debug_log::group!("app2 export"); - let snapshot = app2.export_snapshot(); - debug_log::group_end!(); - debug_log::group!("import snapshot to a LoroApp that is already changed"); - app.import(&snapshot).unwrap(); - debug_log::group_end!(); - let actual = app - .app_state() - .lock() - .unwrap() - .get_text("id") - .unwrap() - .to_string_mut(); - assert_eq!("he llo", &actual); - } - - #[test] - fn tree_encode_decode() { - let a = LoroDoc::default(); - let b = LoroDoc::default(); - let tree_a = a.get_tree("tree"); - let tree_b = b.get_tree("tree"); - let id1 = a.with_txn(|txn| tree_a.create_with_txn(txn, None)).unwrap(); - let id2 = a.with_txn(|txn| tree_a.create_with_txn(txn, id1)).unwrap(); - let bytes = a.export_snapshot(); - b.import(&bytes).unwrap(); - assert_eq!(a.get_deep_value(), b.get_deep_value()); - let _id3 = b.with_txn(|txn| tree_b.create_with_txn(txn, id1)).unwrap(); - b.with_txn(|txn| tree_b.delete_with_txn(txn, id2)).unwrap(); - let bytes = b.export_snapshot(); - a.import(&bytes).unwrap(); - assert_eq!(a.get_deep_value(), b.get_deep_value()); - } -} diff --git a/crates/loro-internal/src/encoding/encode_updates.rs b/crates/loro-internal/src/encoding/encode_updates.rs deleted file mode 100644 index bde073b7..00000000 --- a/crates/loro-internal/src/encoding/encode_updates.rs +++ /dev/null @@ -1,170 +0,0 @@ -use rle::{HasLength, RleVec}; -use serde::{Deserialize, Serialize}; -use smallvec::SmallVec; - -use crate::{ - change::{Change, Lamport, Timestamp}, - container::ContainerID, - encoding::RemoteClientChanges, - id::{Counter, 
PeerID, ID}, - op::{RawOpContent, RemoteOp}, - oplog::OpLog, - version::Frontiers, - LoroError, VersionVector, -}; - -#[derive(Serialize, Deserialize, Debug)] -struct Updates { - changes: Vec, -} - -/// the continuous changes from the same client -#[derive(Serialize, Deserialize, Debug)] -struct EncodedClientChanges { - meta: FirstChangeInfo, - data: Vec, -} - -#[derive(Serialize, Deserialize, Debug)] -struct FirstChangeInfo { - pub(crate) client: PeerID, - pub(crate) counter: Counter, - pub(crate) lamport: Lamport, - pub(crate) timestamp: Timestamp, -} - -#[derive(Serialize, Deserialize, Debug)] -struct EncodedOp { - pub(crate) container: ContainerID, - pub(crate) content: RawOpContent<'static>, -} - -#[derive(Serialize, Deserialize, Debug)] -struct EncodedChange { - pub(crate) ops: Vec, - pub(crate) deps: Vec, - pub(crate) lamport_delta: u32, - pub(crate) timestamp_delta: i64, -} - -pub(crate) fn encode_oplog_updates(oplog: &OpLog, from: &VersionVector) -> Vec { - let changes = oplog.export_changes_from(from); - let mut updates = Updates { - changes: Vec::with_capacity(changes.len()), - }; - for (_, changes) in changes { - let encoded = convert_changes_to_encoded(changes.into_iter()); - updates.changes.push(encoded); - } - - postcard::to_allocvec(&updates).unwrap() -} - -pub(crate) fn decode_oplog_updates(oplog: &mut OpLog, updates: &[u8]) -> Result<(), LoroError> { - let changes = decode_updates(updates)?; - oplog.import_remote_changes(changes)?; - Ok(()) -} - -pub(super) fn decode_updates(input: &[u8]) -> Result, LoroError> { - let updates: Updates = - postcard::from_bytes(input).map_err(|e| LoroError::DecodeError(e.to_string().into()))?; - let mut changes: RemoteClientChanges = Default::default(); - for encoded in updates.changes { - changes.insert(encoded.meta.client, convert_encoded_to_changes(encoded)); - } - - Ok(changes) -} - -fn convert_changes_to_encoded<'a, I>(mut changes: I) -> EncodedClientChanges -where - I: Iterator>>, -{ - let first_change = 
changes.next().unwrap(); - let this_client_id = first_change.id.peer; - let mut data = Vec::with_capacity(changes.size_hint().0 + 1); - let mut last_change = first_change.clone(); - data.push(EncodedChange { - ops: first_change - .ops - .iter() - .map(|op| EncodedOp { - container: op.container.clone(), - content: op.content.to_static(), - }) - .collect(), - deps: first_change.deps.iter().copied().collect(), - lamport_delta: 0, - timestamp_delta: 0, - }); - for change in changes { - data.push(EncodedChange { - ops: change - .ops - .iter() - .map(|op| EncodedOp { - container: op.container.clone(), - content: op.content.to_static(), - }) - .collect(), - deps: change.deps.iter().copied().collect(), - lamport_delta: change.lamport - last_change.lamport, - timestamp_delta: change.timestamp - last_change.timestamp, - }); - last_change = change; - } - - EncodedClientChanges { - meta: FirstChangeInfo { - client: this_client_id, - counter: first_change.id.counter, - lamport: first_change.lamport, - timestamp: first_change.timestamp, - }, - data, - } -} - -fn convert_encoded_to_changes(changes: EncodedClientChanges) -> Vec>> { - let mut result = Vec::with_capacity(changes.data.len()); - let mut last_lamport = changes.meta.lamport; - let mut last_timestamp = changes.meta.timestamp; - let mut counter: Counter = changes.meta.counter; - for encoded in changes.data { - let start_counter = counter; - let mut deps: Frontiers = SmallVec::with_capacity(encoded.deps.len()).into(); - - for dep in encoded.deps { - deps.push(dep); - } - - let mut ops = RleVec::with_capacity(encoded.ops.len()); - for op in encoded.ops { - let len: usize = op.content.atom_len(); - let content = op.content; - ops.push(RemoteOp { - counter, - container: op.container, - content, - }); - counter += len as Counter; - } - let change = Change { - id: ID { - peer: changes.meta.client, - counter: start_counter, - }, - lamport: last_lamport + encoded.lamport_delta, - timestamp: last_timestamp + 
encoded.timestamp_delta, - ops, - deps, - has_dependents: false, - }; - last_lamport = change.lamport; - last_timestamp = change.timestamp; - result.push(change); - } - - result -} diff --git a/crates/loro-internal/src/event.rs b/crates/loro-internal/src/event.rs index cdd25d70..49aa1f17 100644 --- a/crates/loro-internal/src/event.rs +++ b/crates/loro-internal/src/event.rs @@ -135,7 +135,7 @@ impl DiffVariant { #[non_exhaustive] #[derive(Clone, Debug, EnumAsInner, Serialize)] pub(crate) enum InternalDiff { - SeqRaw(Delta), + ListRaw(Delta), /// This always uses entity indexes. RichtextRaw(Delta), Map(MapDelta), @@ -177,7 +177,7 @@ impl From for DiffVariant { impl InternalDiff { pub(crate) fn is_empty(&self) -> bool { match self { - InternalDiff::SeqRaw(s) => s.is_empty(), + InternalDiff::ListRaw(s) => s.is_empty(), InternalDiff::RichtextRaw(t) => t.is_empty(), InternalDiff::Map(m) => m.updated.is_empty(), InternalDiff::Tree(t) => t.is_empty(), @@ -187,8 +187,8 @@ impl InternalDiff { pub(crate) fn compose(self, diff: InternalDiff) -> Result { // PERF: avoid clone match (self, diff) { - (InternalDiff::SeqRaw(a), InternalDiff::SeqRaw(b)) => { - Ok(InternalDiff::SeqRaw(a.compose(b))) + (InternalDiff::ListRaw(a), InternalDiff::ListRaw(b)) => { + Ok(InternalDiff::ListRaw(a.compose(b))) } (InternalDiff::RichtextRaw(a), InternalDiff::RichtextRaw(b)) => { Ok(InternalDiff::RichtextRaw(a.compose(b))) diff --git a/crates/loro-internal/src/fuzz.rs b/crates/loro-internal/src/fuzz.rs index d5ff5c55..29245b69 100644 --- a/crates/loro-internal/src/fuzz.rs +++ b/crates/loro-internal/src/fuzz.rs @@ -312,7 +312,7 @@ impl Actionable for Vec { *site %= self.len() as u8; let app_state = &mut self[*site as usize].app_state().lock().unwrap(); let text = app_state.get_text("text").unwrap(); - if text.is_empty() { + if text.len_unicode() == 0 { *len = 0; *pos = 0; } else { @@ -436,8 +436,8 @@ where let f_ref: *const _ = &f; let f_ref: usize = f_ref as usize; #[allow(clippy::redundant_clone)] 
- let actions_clone = actions.clone(); - let action_ref: usize = (&actions_clone) as *const _ as usize; + let mut actions_clone = actions.clone(); + let action_ref: usize = (&mut actions_clone) as *mut _ as usize; #[allow(clippy::blocks_in_if_conditions)] if std::panic::catch_unwind(|| { // SAFETY: test @@ -465,8 +465,8 @@ where while let Some(candidate) = candidates.pop() { let f_ref: *const _ = &f; let f_ref: usize = f_ref as usize; - let actions_clone = candidate.clone(); - let action_ref: usize = (&actions_clone) as *const _ as usize; + let mut actions_clone = candidate.clone(); + let action_ref: usize = (&mut actions_clone) as *mut _ as usize; #[allow(clippy::blocks_in_if_conditions)] if std::panic::catch_unwind(|| { // SAFETY: test @@ -1336,6 +1336,35 @@ mod test { ) } + #[test] + fn snapshot_fuzz_test() { + test_multi_sites( + 8, + &mut [ + Ins { + content: 163, + pos: 0, + site: 3, + }, + Ins { + content: 163, + pos: 1, + site: 3, + }, + Ins { + content: 113, + pos: 2, + site: 3, + }, + Ins { + content: 888, + pos: 3, + site: 3, + }, + ], + ) + } + #[test] fn text_fuzz_2() { test_multi_sites( @@ -1980,25 +2009,19 @@ mod test { &mut [ Ins { content: 41009, - pos: 10884953820616207167, - site: 151, + pos: 0, + site: 1, }, Mark { - pos: 150995095, - len: 7502773972505002496, - site: 0, - style_key: 0, - }, - Mark { - pos: 11821702543106517760, - len: 4251403153421165732, - site: 151, + pos: 0, + len: 2, + site: 1, style_key: 151, }, Mark { - pos: 589824, - len: 2233786514697303298, - site: 51, + pos: 0, + len: 1, + site: 0, style_key: 151, }, ], @@ -2287,8 +2310,79 @@ mod test { ) } + #[test] + fn fuzz_snapshot() { + test_multi_sites( + 5, + &mut [ + Ins { + content: 52480, + pos: 0, + site: 1, + }, + Mark { + pos: 6, + len: 1, + site: 1, + style_key: 5, + }, + Ins { + content: 8224, + pos: 0, + site: 1, + }, + Del { + pos: 12, + len: 1, + site: 1, + }, + Ins { + content: 257, + pos: 10, + site: 1, + }, + Ins { + content: 332, + pos: 11, + site: 1, + }, + Del 
{ + pos: 1, + len: 21, + site: 1, + }, + Del { + pos: 0, + len: 1, + site: 1, + }, + Ins { + content: 11309, + pos: 0, + site: 4, + }, + ], + ) + } + #[test] fn mini_r() { - minify_error(5, vec![], test_multi_sites, |_, ans| ans.to_vec()) + minify_error(5, vec![], test_multi_sites, |site_num, ans| { + let mut sites = Vec::new(); + for i in 0..site_num { + let loro = LoroDoc::new(); + loro.set_peer_id(i as u64).unwrap(); + sites.push(loro); + } + + let mut applied = Vec::new(); + for action in ans.iter_mut() { + sites.preprocess(action); + applied.push(action.clone()); + sites.apply_action(action); + } + + ans.to_vec() + }) } } diff --git a/crates/loro-internal/src/fuzz/recursive_refactored.rs b/crates/loro-internal/src/fuzz/recursive_refactored.rs index a2b269b0..333f8e2a 100644 --- a/crates/loro-internal/src/fuzz/recursive_refactored.rs +++ b/crates/loro-internal/src/fuzz/recursive_refactored.rs @@ -784,7 +784,7 @@ fn check_synced(sites: &mut [Actor]) { fn check_history(actor: &mut Actor) { assert!(!actor.history.is_empty()); - for (_, (f, v)) in actor.history.iter().enumerate() { + for (f, v) in actor.history.iter() { let f = Frontiers::from(f); debug_log::group!( "Checkout from {:?} to {:?}", diff --git a/crates/loro-internal/src/fuzz/tree.rs b/crates/loro-internal/src/fuzz/tree.rs index ee10cd07..263f5a09 100644 --- a/crates/loro-internal/src/fuzz/tree.rs +++ b/crates/loro-internal/src/fuzz/tree.rs @@ -1878,6 +1878,257 @@ mod failed_tests { ) } + #[test] + fn history() { + test_multi_sites( + 3, + &mut [ + Tree { + site: 2, + container_idx: 0, + action: TreeAction::Create, + target: (2, 0), + parent: (11863787638307726561, 0), + }, + Sync { from: 2, to: 0 }, + Tree { + site: 2, + container_idx: 0, + action: TreeAction::Delete, + target: (2, 0), + parent: (18446537369818038270, 320017407), + }, + ], + ) + } + + #[test] + fn encoding_err() { + test_multi_sites( + 5, + &mut [ + Tree { + site: 255, + container_idx: 255, + action: TreeAction::Meta, + target: 
(13186597159363543035, 0), + parent: (34380633367117824, 913864704), + }, + Tree { + site: 255, + container_idx: 255, + action: TreeAction::Move, + target: (18409103694470982982, 1174405120), + parent: (34417214799359302, 913864704), + }, + Tree { + site: 255, + container_idx: 255, + action: TreeAction::Move, + target: (5063812098665360710, 1180190267), + parent: (5063812098663728710, 1480997702), + }, + Tree { + site: 70, + container_idx: 70, + action: TreeAction::Move, + target: (5063812098665367110, 1179010630), + parent: (5063812098665349190, 0), + }, + Tree { + site: 120, + container_idx: 58, + action: TreeAction::Meta, + target: (5063784610870067200, 1179010630), + parent: (5063812098666546747, 1179010630), + }, + Tree { + site: 70, + container_idx: 70, + action: TreeAction::Create, + target: (5049942557165879296, 1179010630), + parent: (201210795607622, 0), + }, + Tree { + site: 122, + container_idx: 0, + action: TreeAction::Create, + target: (281470890949240, 759580160), + parent: (280900630103622, 759580160), + }, + Tree { + site: 122, + container_idx: 0, + action: TreeAction::Create, + target: (281470890949240, 759580160), + parent: (13835902485686273606, 1179010648), + }, + Tree { + site: 70, + container_idx: 70, + action: TreeAction::Move, + target: (5063812098665367110, 1179010630), + parent: (5063812098665367110, 1179010630), + }, + Tree { + site: 70, + container_idx: 70, + action: TreeAction::Move, + target: (18446476309249406584, 1174405120), + parent: (5063812098665360710, 1180190267), + }, + Tree { + site: 70, + container_idx: 70, + action: TreeAction::Move, + target: (5063812098665367110, 1179010630), + parent: (5063989120037438976, 0), + }, + Tree { + site: 120, + container_idx: 58, + action: TreeAction::Meta, + target: (5063784610870067200, 1179010630), + parent: (5063812098666546747, 1179010630), + }, + Tree { + site: 70, + container_idx: 70, + action: TreeAction::Move, + target: (5208490236694644294, 1212696648), + parent: 
(5208492444341520456, 1212696648), + }, + Tree { + site: 72, + container_idx: 72, + action: TreeAction::Move, + target: (5208492444341520456, 1212696648), + parent: (5208492444341520456, 1212696648), + }, + Tree { + site: 72, + container_idx: 72, + action: TreeAction::Move, + target: (5208492444341520456, 1212696648), + parent: (5208492444341520456, 1212696648), + }, + Tree { + site: 72, + container_idx: 72, + action: TreeAction::Move, + target: (5063812098665367110, 1179010630), + parent: (5053397524527072838, 70), + }, + Tree { + site: 0, + container_idx: 0, + action: TreeAction::Create, + target: (0, 0), + parent: (5063812098665349120, 1179010630), + }, + Tree { + site: 70, + container_idx: 70, + action: TreeAction::Move, + target: (18446476309249406584, 1174405120), + parent: (5063812098665360710, 1179010619), + }, + Tree { + site: 70, + container_idx: 70, + action: TreeAction::Move, + target: (5063812098665367110, 1179010630), + parent: (5063812098665367110, 1174423110), + }, + Tree { + site: 0, + container_idx: 0, + action: TreeAction::Create, + target: (0, 0), + parent: (0, 0), + }, + Tree { + site: 0, + container_idx: 0, + action: TreeAction::Create, + target: (0, 0), + parent: (0, 0), + }, + Tree { + site: 0, + container_idx: 0, + action: TreeAction::Create, + target: (0, 0), + parent: (4412786865808080896, 2013281677), + }, + Sync { from: 12, to: 255 }, + Tree { + site: 70, + container_idx: 45, + action: TreeAction::Move, + target: (5063812175974057542, 1179010630), + parent: (5063812098665367110, 1179010630), + }, + Tree { + site: 72, + container_idx: 72, + action: TreeAction::Move, + target: (5208492444341520456, 1212696648), + parent: (5208492444341520456, 1212696648), + }, + Tree { + site: 72, + container_idx: 72, + action: TreeAction::Move, + target: (5208492444341520456, 1212696648), + parent: (5208492444341520456, 1212696648), + }, + Tree { + site: 72, + container_idx: 72, + action: TreeAction::Move, + target: (5063812098665367624, 1179010630), + 
parent: (5063812098665367110, 1179001158), + }, + Tree { + site: 0, + container_idx: 0, + action: TreeAction::Create, + target: (0, 0), + parent: (5063812098665349120, 1179010630), + }, + Tree { + site: 70, + container_idx: 70, + action: TreeAction::Move, + target: (19780516010411520, 0), + parent: (18378196371912030344, 255), + }, + Tree { + site: 177, + container_idx: 185, + action: TreeAction::Create, + target: (5063812098967361606, 1179010630), + parent: (5063812098665367110, 1179010630), + }, + Tree { + site: 70, + container_idx: 72, + action: TreeAction::Move, + target: (5208492444341520456, 1212696648), + parent: (5208492444341520456, 1212696648), + }, + Tree { + site: 72, + container_idx: 72, + action: TreeAction::Move, + target: (4271743721848457288, 1179015238), + parent: (0, 0), + }, + ], + ); + } + #[test] fn to_minify() { minify_error(5, vec![], test_multi_sites, normalize) diff --git a/crates/loro-internal/src/lib.rs b/crates/loro-internal/src/lib.rs index 5815df7f..7f737743 100644 --- a/crates/loro-internal/src/lib.rs +++ b/crates/loro-internal/src/lib.rs @@ -10,6 +10,7 @@ pub mod arena; pub mod diff_calc; pub mod handler; pub use event::{ContainerDiff, DiffEvent, DocDiff}; +pub use fxhash::FxHashMap; pub use handler::{ListHandler, MapHandler, TextHandler, TreeHandler}; pub use loro::LoroDoc; pub use oplog::OpLog; @@ -17,7 +18,6 @@ pub use state::DocState; pub mod loro; pub mod obs; pub mod oplog; -mod state; pub mod txn; pub mod change; @@ -42,6 +42,7 @@ pub mod event; pub use error::{LoroError, LoroResult}; pub(crate) mod macros; +pub(crate) mod state; pub(crate) mod value; pub(crate) use change::Timestamp; pub(crate) use id::{PeerID, ID}; @@ -50,7 +51,7 @@ pub(crate) use id::{PeerID, ID}; pub(crate) type InternalString = DefaultAtom; pub use container::ContainerType; -pub use fxhash::FxHashMap; +pub use loro_common::{loro_value, to_value}; pub use value::{ApplyDiff, LoroValue, ToJson}; pub use version::VersionVector; diff --git 
a/crates/loro-internal/src/loro.rs b/crates/loro-internal/src/loro.rs index 4fceb133..52bf5a10 100644 --- a/crates/loro-internal/src/loro.rs +++ b/crates/loro-internal/src/loro.rs @@ -16,7 +16,9 @@ use crate::{ arena::SharedArena, change::Timestamp, container::{idx::ContainerIdx, IntoContainerId}, - encoding::{EncodeMode, ENCODE_SCHEMA_VERSION, MAGIC_BYTES}, + encoding::{ + decode_snapshot, export_snapshot, parse_header_and_body, EncodeMode, ParsedHeaderAndBody, + }, handler::TextHandler, handler::TreeHandler, id::PeerID, @@ -26,7 +28,6 @@ use crate::{ use super::{ diff_calc::DiffCalculator, - encoding::encode_snapshot::{decode_app_snapshot, encode_app_snapshot}, event::InternalDocDiff, obs::{Observer, SubID, Subscriber}, oplog::OpLog, @@ -58,7 +59,7 @@ pub struct LoroDoc { arena: SharedArena, observer: Arc, diff_calculator: Arc>, - // when dropping the doc, the txn will be commited + // when dropping the doc, the txn will be committed txn: Arc>>, auto_commit: AtomicBool, detached: AtomicBool, @@ -99,15 +100,14 @@ impl LoroDoc { pub fn from_snapshot(bytes: &[u8]) -> LoroResult { let doc = Self::new(); - let (input, mode) = parse_encode_header(bytes)?; - match mode { - EncodeMode::Snapshot => { - decode_app_snapshot(&doc, input, true)?; - Ok(doc) - } - _ => Err(LoroError::DecodeError( + let ParsedHeaderAndBody { mode, body, .. } = parse_header_and_body(bytes)?; + if mode.is_snapshot() { + decode_snapshot(&doc, mode, body)?; + Ok(doc) + } else { + Err(LoroError::DecodeError( "Invalid encode mode".to_string().into(), - )), + )) } } @@ -244,7 +244,7 @@ impl LoroDoc { /// Commit the cumulative auto commit transaction. /// This method only has effect when `auto_commit` is true. 
- /// If `immediate_renew` is true, a new transaction will be created after the old one is commited + /// If `immediate_renew` is true, a new transaction will be created after the old one is committed pub fn commit_with( &self, origin: Option, @@ -368,13 +368,6 @@ impl LoroDoc { self.import_with(bytes, Default::default()) } - #[inline] - pub fn import_without_state(&mut self, bytes: &[u8]) -> Result<(), LoroError> { - self.commit_then_stop(); - self.detach(); - self.import(bytes) - } - #[inline] pub fn import_with(&self, bytes: &[u8], origin: InternalString) -> Result<(), LoroError> { self.commit_then_stop(); @@ -388,56 +381,122 @@ impl LoroDoc { bytes: &[u8], origin: string_cache::Atom, ) -> Result<(), LoroError> { - let (input, mode) = parse_encode_header(bytes)?; - match mode { - EncodeMode::Updates | EncodeMode::RleUpdates | EncodeMode::CompressedRleUpdates => { + let parsed = parse_header_and_body(bytes)?; + match parsed.mode.is_snapshot() { + false => { // TODO: need to throw error if state is in transaction - debug_log::group!("import to {}", self.peer_id()); - let mut oplog = self.oplog.lock().unwrap(); - let old_vv = oplog.vv().clone(); - let old_frontiers = oplog.frontiers().clone(); - oplog.decode(bytes)?; - if !self.detached.load(Acquire) { - let mut diff = DiffCalculator::default(); - let diff = diff.calc_diff_internal( - &oplog, - &old_vv, - Some(&old_frontiers), - oplog.vv(), - Some(oplog.dag.get_frontiers()), - ); - let mut state = self.state.lock().unwrap(); - state.apply_diff(InternalDocDiff { - origin, - local: false, - diff: (diff).into(), - from_checkout: false, - new_version: Cow::Owned(oplog.frontiers().clone()), - }); - } - + debug_log::group!("Import updates to {}", self.peer_id()); + self.update_oplog_and_apply_delta_to_state_if_needed( + |oplog| oplog.decode(parsed), + origin, + )?; debug_log::group_end!(); } - EncodeMode::Snapshot => { + true => { + debug_log::group!("Import snapshot to {}", self.peer_id()); if 
self.can_reset_with_snapshot() { - decode_app_snapshot(self, input, !self.detached.load(Acquire))?; + debug_log::debug_log!("Init by snapshot"); + decode_snapshot(self, parsed.mode, parsed.body)?; + } else if parsed.mode == EncodeMode::Snapshot { + debug_log::debug_log!("Import by updates"); + self.update_oplog_and_apply_delta_to_state_if_needed( + |oplog| oplog.decode(parsed), + origin, + )?; } else { + debug_log::debug_log!("Import from new doc"); let app = LoroDoc::new(); - decode_app_snapshot(&app, input, false)?; + decode_snapshot(&app, parsed.mode, parsed.body)?; let oplog = self.oplog.lock().unwrap(); // TODO: PERF: the ser and de can be optimized out let updates = app.export_from(oplog.vv()); drop(oplog); + debug_log::group_end!(); return self.import_with(&updates, origin); } + debug_log::group_end!(); } - EncodeMode::Auto => unreachable!(), }; + let mut state = self.state.lock().unwrap(); self.emit_events(&mut state); Ok(()) } + pub(crate) fn update_oplog_and_apply_delta_to_state_if_needed( + &self, + f: impl FnOnce(&mut OpLog) -> Result<(), LoroError>, + origin: InternalString, + ) -> Result<(), LoroError> { + let mut oplog = self.oplog.lock().unwrap(); + let old_vv = oplog.vv().clone(); + let old_frontiers = oplog.frontiers().clone(); + f(&mut oplog)?; + if !self.detached.load(Acquire) { + let mut diff = DiffCalculator::default(); + let diff = diff.calc_diff_internal( + &oplog, + &old_vv, + Some(&old_frontiers), + oplog.vv(), + Some(oplog.dag.get_frontiers()), + ); + let mut state = self.state.lock().unwrap(); + state.apply_diff(InternalDocDiff { + origin, + local: false, + diff: (diff).into(), + from_checkout: false, + new_version: Cow::Owned(oplog.frontiers().clone()), + }); + } + Ok(()) + } + + /// For fuzzing tests + #[cfg(feature = "test_utils")] + pub fn import_delta_updates_unchecked(&self, body: &[u8]) -> LoroResult<()> { + self.commit_then_stop(); + let mut oplog = self.oplog.lock().unwrap(); + let old_vv = oplog.vv().clone(); + let 
old_frontiers = oplog.frontiers().clone(); + let ans = oplog.decode(ParsedHeaderAndBody { + checksum: [0; 16], + checksum_body: body, + mode: EncodeMode::Rle, + body, + }); + if ans.is_ok() && !self.detached.load(Acquire) { + let mut diff = DiffCalculator::default(); + let diff = diff.calc_diff_internal( + &oplog, + &old_vv, + Some(&old_frontiers), + oplog.vv(), + Some(oplog.dag.get_frontiers()), + ); + let mut state = self.state.lock().unwrap(); + state.apply_diff(InternalDocDiff { + origin: "".into(), + local: false, + diff: (diff).into(), + from_checkout: false, + new_version: Cow::Owned(oplog.frontiers().clone()), + }); + } + self.renew_txn_if_auto_commit(); + ans + } + + /// For fuzzing tests + #[cfg(feature = "test_utils")] + pub fn import_snapshot_unchecked(&self, bytes: &[u8]) -> LoroResult<()> { + self.commit_then_stop(); + let ans = decode_snapshot(self, EncodeMode::Snapshot, bytes); + self.renew_txn_if_auto_commit(); + ans + } + fn emit_events(&self, state: &mut DocState) { let events = state.take_events(); for event in events { @@ -447,14 +506,7 @@ impl LoroDoc { pub fn export_snapshot(&self) -> Vec { self.commit_then_stop(); - debug_log::group!("export snapshot"); - let version = ENCODE_SCHEMA_VERSION; - let mut ans = Vec::from(MAGIC_BYTES); - // maybe u8 is enough - ans.push(version); - ans.push((EncodeMode::Snapshot).to_byte()); - ans.extend(encode_app_snapshot(self)); - debug_log::group_end!(); + let ans = export_snapshot(self); self.renew_txn_if_auto_commit(); ans } @@ -681,23 +733,6 @@ impl LoroDoc { } } -fn parse_encode_header(bytes: &[u8]) -> Result<(&[u8], EncodeMode), LoroError> { - if bytes.len() <= 6 { - return Err(LoroError::DecodeError("Invalid import data".into())); - } - let (magic_bytes, input) = bytes.split_at(4); - let magic_bytes: [u8; 4] = magic_bytes.try_into().unwrap(); - if magic_bytes != MAGIC_BYTES { - return Err(LoroError::DecodeError("Invalid header bytes".into())); - } - let (version, input) = input.split_at(1); - if version 
!= [ENCODE_SCHEMA_VERSION] { - return Err(LoroError::DecodeError("Invalid version".into())); - } - let mode: EncodeMode = input[0].try_into()?; - Ok((&input[1..], mode)) -} - #[cfg(test)] mod test { use loro_common::ID; diff --git a/crates/loro-internal/src/macros.rs b/crates/loro-internal/src/macros.rs index ff9eaedb..05ac3f5e 100644 --- a/crates/loro-internal/src/macros.rs +++ b/crates/loro-internal/src/macros.rs @@ -81,14 +81,18 @@ macro_rules! array_mut_ref { }}; } -#[test] -fn test_macro() { - let mut arr = vec![100, 101, 102, 103]; - let (a, b, _c) = array_mut_ref!(&mut arr, [1, 2, 3]); - assert_eq!(*a, 101); - assert_eq!(*b, 102); - *a = 50; - *b = 51; - assert!(arr[1] == 50); - assert!(arr[2] == 51); +#[cfg(test)] +mod test { + + #[test] + fn test_macro() { + let mut arr = vec![100, 101, 102, 103]; + let (a, b, _c) = array_mut_ref!(&mut arr, [1, 2, 3]); + assert_eq!(*a, 101); + assert_eq!(*b, 102); + *a = 50; + *b = 51; + assert!(arr[1] == 50); + assert!(arr[2] == 51); + } } diff --git a/crates/loro-internal/src/op.rs b/crates/loro-internal/src/op.rs index 5385fe8b..e768a83b 100644 --- a/crates/loro-internal/src/op.rs +++ b/crates/loro-internal/src/op.rs @@ -6,9 +6,10 @@ use crate::{ }; use crate::{delta::DeltaValue, LoroValue}; use enum_as_inner::EnumAsInner; +use loro_common::IdSpan; use rle::{HasIndex, HasLength, Mergable, Sliceable}; use serde::{ser::SerializeSeq, Deserialize, Serialize}; -use smallvec::{smallvec, SmallVec}; +use smallvec::SmallVec; use std::{borrow::Cow, ops::Range}; mod content; @@ -24,6 +25,30 @@ pub struct Op { pub(crate) content: InnerContent, } +#[derive(Debug, Clone)] +pub(crate) struct OpWithId { + pub peer: PeerID, + pub op: Op, +} + +impl OpWithId { + pub fn id(&self) -> ID { + ID { + peer: self.peer, + counter: self.op.counter, + } + } + + #[allow(unused)] + pub fn id_span(&self) -> IdSpan { + IdSpan::new( + self.peer, + self.op.counter, + self.op.counter + self.op.atom_len() as Counter, + ) + } +} + #[derive(Debug, Clone, 
Serialize, Deserialize)] pub struct RemoteOp<'a> { pub(crate) counter: Counter, @@ -63,6 +88,7 @@ pub struct OwnedRichOp { impl Op { #[inline] + #[allow(unused)] pub(crate) fn new(id: ID, content: InnerContent, container: ContainerIdx) -> Self { Op { counter: id.counter, @@ -103,7 +129,7 @@ impl HasLength for Op { impl Sliceable for Op { fn slice(&self, from: usize, to: usize) -> Self { - assert!(to > from); + assert!(to > from, "{to} should be greater than {from}"); let content: InnerContent = self.content.slice(from, to); Op { counter: (self.counter + from as Counter), @@ -264,6 +290,14 @@ impl<'a> RichOp<'a> { pub fn end(&self) -> usize { self.end } + + #[allow(unused)] + pub(crate) fn id(&self) -> ID { + ID { + peer: self.peer, + counter: self.op.counter + self.start as Counter, + } + } } impl OwnedRichOp { @@ -314,6 +348,11 @@ impl SliceRange { Self(UNKNOWN_START..UNKNOWN_START + size) } + #[inline(always)] + pub fn new(range: Range) -> Self { + Self(range) + } + #[inline(always)] pub fn to_range(&self) -> Range { self.0.start as usize..self.0.end as usize @@ -420,54 +459,65 @@ impl<'a> Mergable for ListSlice<'a> { } #[derive(Debug, Clone)] -pub struct SliceRanges(pub SmallVec<[SliceRange; 2]>); +pub struct SliceRanges { + pub ranges: SmallVec<[SliceRange; 2]>, + pub id: ID, +} impl Serialize for SliceRanges { fn serialize(&self, serializer: S) -> Result where S: serde::Serializer, { - let mut s = serializer.serialize_seq(Some(self.0.len()))?; - for item in self.0.iter() { + let mut s = serializer.serialize_seq(Some(self.ranges.len()))?; + for item in self.ranges.iter() { s.serialize_element(item)?; } s.end() } } -impl From for SliceRanges { - fn from(value: SliceRange) -> Self { - Self(smallvec![value]) - } -} - impl DeltaValue for SliceRanges { fn value_extend(&mut self, other: Self) -> Result<(), Self> { - self.0.extend(other.0); + if self.id.peer != other.id.peer { + return Err(other); + } + + if self.id.counter + self.length() as Counter != 
other.id.counter { + return Err(other); + } + + self.ranges.extend(other.ranges); Ok(()) } + // FIXME: this seems wrong fn take(&mut self, target_len: usize) -> Self { - let mut ret = SmallVec::new(); + let mut right = Self { + ranges: Default::default(), + id: self.id.inc(target_len as i32), + }; let mut cur_len = 0; while cur_len < target_len { - let range = self.0.pop().unwrap(); + let range = self.ranges.pop().unwrap(); let range_len = range.content_len(); if cur_len + range_len <= target_len { - ret.push(range); + right.ranges.push(range); cur_len += range_len; } else { - let new_range = range.slice(0, target_len - cur_len); - ret.push(new_range); - self.0.push(range.slice(target_len - cur_len, range_len)); + let new_range = range.slice(target_len - cur_len, range_len); + right.ranges.push(new_range); + self.ranges.push(range.slice(0, target_len - cur_len)); cur_len = target_len; } } - SliceRanges(ret) + + std::mem::swap(self, &mut right); + right // now it's left } fn length(&self) -> usize { - self.0.iter().fold(0, |acc, x| acc + x.atom_len()) + self.ranges.iter().fold(0, |acc, x| acc + x.atom_len()) } } diff --git a/crates/loro-internal/src/op/content.rs b/crates/loro-internal/src/op/content.rs index abb79e9f..bf8586ba 100644 --- a/crates/loro-internal/src/op/content.rs +++ b/crates/loro-internal/src/op/content.rs @@ -6,7 +6,7 @@ use serde::{Deserialize, Serialize}; use crate::container::{ list::list_op::{InnerListOp, ListOp}, - map::{InnerMapSet, MapSet}, + map::MapSet, tree::tree_op::TreeOp, }; @@ -24,7 +24,7 @@ pub enum ContentType { #[derive(EnumAsInner, Debug, Clone)] pub enum InnerContent { List(InnerListOp), - Map(InnerMapSet), + Map(MapSet), Tree(TreeOp), } diff --git a/crates/loro-internal/src/oplog.rs b/crates/loro-internal/src/oplog.rs index 7dce9dd5..b908ae8c 100644 --- a/crates/loro-internal/src/oplog.rs +++ b/crates/loro-internal/src/oplog.rs @@ -9,16 +9,17 @@ use std::rc::Rc; use std::sync::Mutex; use fxhash::FxHashMap; +use 
loro_common::{HasCounter, HasId}; use rle::{HasLength, RleCollection, RlePush, RleVec, Sliceable}; use smallvec::SmallVec; // use tabled::measurment::Percent; use crate::change::{Change, Lamport, Timestamp}; use crate::container::list::list_op; -use crate::dag::DagUtils; +use crate::dag::{Dag, DagUtils}; use crate::diff_calc::tree::MoveLamportAndID; use crate::diff_calc::TreeDiffCache; -use crate::encoding::RemoteClientChanges; +use crate::encoding::ParsedHeaderAndBody; use crate::encoding::{decode_oplog, encode_oplog, EncodeMode}; use crate::id::{Counter, PeerID, ID}; use crate::op::{ListSlice, RawOpContent, RemoteOp}; @@ -72,6 +73,8 @@ pub struct AppDagNode { pub(crate) lamport: Lamport, pub(crate) deps: Frontiers, pub(crate) vv: ImVersionVector, + /// A flag indicating whether any other nodes depend on this node. + /// The calculation of frontiers is based on this value. pub(crate) has_succ: bool, pub(crate) len: usize, } @@ -207,7 +210,13 @@ impl OpLog { } /// This is the only place to update the `OpLog.changes` - pub(crate) fn insert_new_change(&mut self, mut change: Change, _: EnsureChangeDepsAreAtTheEnd) { + pub(crate) fn insert_new_change( + &mut self, + mut change: Change, + _: EnsureChangeDepsAreAtTheEnd, + local: bool, + ) { + self.update_tree_cache(&change, local); let entry = self.changes.entry(change.id.peer).or_default(); match entry.last_mut() { Some(last) => { @@ -217,6 +226,7 @@ impl OpLog { "change id is not continuous" ); let timestamp_change = change.timestamp - last.timestamp; + // TODO: make this a config if !last.has_dependents && change.deps_on_self() && timestamp_change < 1000 { for op in take(change.ops.vec_mut()) { last.ops.push(op); @@ -242,7 +252,7 @@ impl OpLog { /// /// - Return Err(LoroError::UsedOpID) when the change's id is occupied /// - Return Err(LoroError::DecodeError) when the change's deps are missing - pub fn import_local_change(&mut self, change: Change, from_txn: bool) -> Result<(), LoroError> { + pub fn 
import_local_change(&mut self, change: Change, local: bool) -> Result<(), LoroError> { let Some(change) = self.trim_the_known_part_of_change(change) else { return Ok(()); }; @@ -268,8 +278,12 @@ impl OpLog { self.dag.frontiers.retain_non_included(&change.deps); self.dag.frontiers.filter_peer(change.id.peer); self.dag.frontiers.push(change.id_last()); - let mark = self.insert_dag_node_on_new_change(&change); + let mark = self.update_dag_on_new_change(&change); + self.insert_new_change(change, mark, local); + Ok(()) + } + fn update_tree_cache(&mut self, change: &Change, local: bool) { // Update tree cache let mut tree_cache = self.tree_parent_cache.lock().unwrap(); for op in change.ops().iter() { @@ -285,21 +299,18 @@ impl OpLog { parent: tree.parent, effected: true, }; - if from_txn { - tree_cache.add_node_uncheck(node); + if local { + tree_cache.add_node_from_local(node); } else { tree_cache.add_node(node); } } } - drop(tree_cache); - self.insert_new_change(change, mark); - Ok(()) } /// Every time we import a new change, it should run this function to update the dag - pub(crate) fn insert_dag_node_on_new_change( + pub(crate) fn update_dag_on_new_change( &mut self, change: &Change, ) -> EnsureChangeDepsAreAtTheEnd { @@ -319,7 +330,7 @@ impl OpLog { change.lamport, "lamport is not continuous" ); - last.len = change.id.counter as usize + len - last.cnt as usize; + last.len = (change.id.counter - last.cnt) as usize + len; last.has_succ = false; } else { let vv = self.dag.frontiers_to_im_vv(&change.deps); @@ -449,41 +460,6 @@ impl OpLog { self.dag.cmp_frontiers(other) } - pub(crate) fn export_changes_from(&self, from: &VersionVector) -> RemoteClientChanges { - let mut changes = RemoteClientChanges::default(); - for (&peer, &cnt) in self.vv().iter() { - let start_cnt = from.get(&peer).copied().unwrap_or(0); - if cnt <= start_cnt { - continue; - } - - let mut temp = Vec::new(); - if let Some(peer_changes) = self.changes.get(&peer) { - if let Some(result) = 
peer_changes.get_by_atom_index(start_cnt) { - for change in &peer_changes[result.merged_index..] { - if change.id.counter < start_cnt { - if change.id.counter + change.atom_len() as Counter <= start_cnt { - continue; - } - - let sliced = change - .slice((start_cnt - change.id.counter) as usize, change.atom_len()); - temp.push(self.convert_change_to_remote(&sliced)); - } else { - temp.push(self.convert_change_to_remote(change)); - } - } - } - } - - if !temp.is_empty() { - changes.insert(peer, temp); - } - } - - changes - } - pub(crate) fn get_min_lamport_at(&self, id: ID) -> Lamport { self.get_change_at(id).map(|c| c.lamport).unwrap_or(0) } @@ -602,7 +578,7 @@ impl OpLog { } }, crate::op::InnerContent::Map(map) => { - let value = map.value.and_then(|v| self.arena.get_value(v as usize)); + let value = map.value.clone(); contents.push(RawOpContent::Map(crate::container::map::MapSet { key: map.key.clone(), value, @@ -622,35 +598,12 @@ impl OpLog { ans } - // Changes are expected to be sorted by counter in each value in the hashmap - // They should also be continuous (TODO: check this) - pub(crate) fn import_remote_changes( + pub(crate) fn import_unknown_lamport_pending_changes( &mut self, - remote_changes: RemoteClientChanges, - ) -> Result<(), LoroError> { - // check whether we can append the new changes - self.check_changes(&remote_changes)?; - let latest_vv = self.dag.vv.clone(); - // op_converter is faster than using arena directly - let ids = self.arena.clone().with_op_converter(|converter| { - self.apply_appliable_changes_and_cache_pending(remote_changes, converter, latest_vv) - }); - let mut latest_vv = self.dag.vv.clone(); - self.try_apply_pending(ids, &mut latest_vv); - if !self.batch_importing { - self.dag.refresh_frontiers(); - } - Ok(()) - } - - pub(crate) fn import_unknown_lamport_remote_changes( - &mut self, - remote_changes: Vec>, + remote_changes: Vec, ) -> Result<(), LoroError> { let latest_vv = self.dag.vv.clone(); - 
self.arena.clone().with_op_converter(|converter| { - self.extend_pending_changes_with_unknown_lamport(remote_changes, converter, &latest_vv) - }); + self.extend_pending_changes_with_unknown_lamport(remote_changes, &latest_vv); Ok(()) } @@ -677,12 +630,12 @@ impl OpLog { } #[inline(always)] - pub fn export_from(&self, vv: &VersionVector) -> Vec { + pub(crate) fn export_from(&self, vv: &VersionVector) -> Vec { encode_oplog(self, vv, EncodeMode::Auto) } #[inline(always)] - pub fn decode(&mut self, data: &[u8]) -> Result<(), LoroError> { + pub(crate) fn decode(&mut self, data: ParsedHeaderAndBody) -> Result<(), LoroError> { decode_oplog(self, data) } @@ -807,52 +760,7 @@ impl OpLog { ) } - pub(crate) fn iter_causally( - &self, - from: VersionVector, - to: VersionVector, - ) -> impl Iterator>)> { - let from_frontiers = from.to_frontiers(&self.dag); - let diff = from.diff(&to).right; - let mut iter = self.dag.iter_causal(&from_frontiers, diff); - let mut node = iter.next(); - let mut cur_cnt = 0; - let vv = Rc::new(RefCell::new(VersionVector::default())); - std::iter::from_fn(move || { - if let Some(inner) = &node { - let mut inner_vv = vv.borrow_mut(); - inner_vv.clear(); - inner_vv.extend_to_include_vv(inner.data.vv.iter()); - let peer = inner.data.peer; - let cnt = inner - .data - .cnt - .max(cur_cnt) - .max(from.get(&peer).copied().unwrap_or(0)); - let end = (inner.data.cnt + inner.data.len as Counter) - .min(to.get(&peer).copied().unwrap_or(0)); - let change = self - .changes - .get(&peer) - .and_then(|x| x.get_by_atom_index(cnt).map(|x| x.element)) - .unwrap(); - - if change.ctr_end() < end { - cur_cnt = change.ctr_end(); - } else { - node = iter.next(); - cur_cnt = 0; - } - - inner_vv.extend_to_include_end_id(change.id); - Some((change, vv.clone())) - } else { - None - } - }) - } - - pub(crate) fn len_changes(&self) -> usize { + pub fn len_changes(&self) -> usize { self.changes.values().map(|x| x.len()).sum() } @@ -880,6 +788,33 @@ impl OpLog { total_dag_node, } } 
+ + #[allow(unused)] + pub(crate) fn debug_check(&self) { + for (_, changes) in self.changes().iter() { + let c = changes.last().unwrap(); + let node = self.dag.get(c.id_start()).unwrap(); + assert_eq!(c.id_end(), node.id_end()); + } + } + + pub(crate) fn iter_changes<'a>( + &'a self, + from: &VersionVector, + to: &VersionVector, + ) -> impl Iterator + 'a { + let spans: Vec<_> = from.diff_iter(to).1.collect(); + spans.into_iter().flat_map(move |span| { + let peer = span.client_id; + let cnt = span.counter.start; + let end_cnt = span.counter.end; + let peer_changes = self.changes.get(&peer).unwrap(); + let index = peer_changes.search_atom_index(cnt); + peer_changes[index..] + .iter() + .take_while(move |x| x.ctr_start() < end_cnt) + }) + } } #[derive(Debug)] diff --git a/crates/loro-internal/src/oplog/dag.rs b/crates/loro-internal/src/oplog/dag.rs index 74d7c1b5..4d7509ff 100644 --- a/crates/loro-internal/src/oplog/dag.rs +++ b/crates/loro-internal/src/oplog/dag.rs @@ -131,11 +131,27 @@ impl AppDag { pub fn get_lamport(&self, id: &ID) -> Option { self.map.get(&id.peer).and_then(|rle| { - rle.get_by_atom_index(id.counter) - .map(|x| x.element.lamport + (id.counter - x.element.cnt) as Lamport) + rle.get_by_atom_index(id.counter).and_then(|node| { + assert!(id.counter >= node.element.cnt); + if node.element.cnt + node.element.len as Counter > id.counter { + Some(node.element.lamport + (id.counter - node.element.cnt) as Lamport) + } else { + None + } + }) }) } + pub fn get_change_lamport_from_deps(&self, deps: &[ID]) -> Option { + let mut lamport = 0; + for id in deps.iter() { + let x = self.get_lamport(id)?; + lamport = lamport.max(x + 1); + } + + Some(lamport) + } + /// Convert a frontiers to a version vector /// /// If the frontiers version is not found in the dag, return None diff --git a/crates/loro-internal/src/oplog/pending_changes.rs b/crates/loro-internal/src/oplog/pending_changes.rs index 102035a6..c14a6f4b 100644 --- 
a/crates/loro-internal/src/oplog/pending_changes.rs +++ b/crates/loro-internal/src/oplog/pending_changes.rs @@ -1,15 +1,8 @@ use std::{collections::BTreeMap, ops::Deref}; -use crate::{ - arena::OpConverter, change::Change, encoding::RemoteClientChanges, op::RemoteOp, OpLog, - VersionVector, -}; +use crate::{change::Change, OpLog, VersionVector}; use fxhash::FxHashMap; -use itertools::Itertools; -use loro_common::{ - Counter, CounterSpan, HasCounterSpan, HasIdSpan, HasLamportSpan, LoroError, PeerID, ID, -}; -use rle::RleVec; +use loro_common::{Counter, CounterSpan, HasCounterSpan, HasIdSpan, HasLamportSpan, PeerID, ID}; use smallvec::SmallVec; #[derive(Debug)] @@ -17,6 +10,8 @@ pub enum PendingChange { // The lamport of the change decoded by `enhanced` is unknown. // we need calculate it when the change can be applied Unknown(Change), + // TODO: Refactor, remove this? + #[allow(unused)] Known(Change), } @@ -35,54 +30,22 @@ pub(crate) struct PendingChanges { changes: FxHashMap>>, } -impl OpLog { - // calculate all `id_last`(s) whose change can be applied - pub(super) fn apply_appliable_changes_and_cache_pending( - &mut self, - remote_changes: RemoteClientChanges, - converter: &mut OpConverter, - mut latest_vv: VersionVector, - ) -> Vec { - let mut ans = Vec::new(); - for change in remote_changes - .into_values() - .filter(|c| !c.is_empty()) - .flat_map(|c| c.into_iter()) - .sorted_unstable_by_key(|c| c.lamport) - { - let local_change = to_local_op(change, converter); - let local_change = PendingChange::Known(local_change); - match remote_change_apply_state(&latest_vv, &local_change) { - ChangeApplyState::CanApplyDirectly => { - latest_vv.set_end(local_change.id_end()); - ans.push(local_change.id_last()); - self.apply_local_change_from_remote(local_change); - } - ChangeApplyState::Applied => {} - ChangeApplyState::AwaitingDependency(miss_dep) => self - .pending_changes - .changes - .entry(miss_dep.peer) - .or_default() - .entry(miss_dep.counter) - .or_default() - 
.push(local_change), - } - } - ans +impl PendingChanges { + pub fn is_empty(&self) -> bool { + self.changes.is_empty() } +} +impl OpLog { pub(super) fn extend_pending_changes_with_unknown_lamport( &mut self, - remote_changes: Vec>, - converter: &mut OpConverter, + remote_changes: Vec, latest_vv: &VersionVector, ) { for change in remote_changes { - let local_change = to_local_op(change, converter); - let local_change = PendingChange::Unknown(local_change); + let local_change = PendingChange::Unknown(change); match remote_change_apply_state(latest_vv, &local_change) { - ChangeApplyState::AwaitingDependency(miss_dep) => self + ChangeState::AwaitingMissingDependency(miss_dep) => self .pending_changes .changes .entry(miss_dep.peer) @@ -96,42 +59,20 @@ impl OpLog { } } +/// This struct indicates that the dag frontiers should be updated after the change is applied. +#[must_use] +pub(crate) struct ShouldUpdateDagFrontiers { + pub(crate) should_update: bool, +} + impl OpLog { - pub(super) fn check_changes(&self, changes: &RemoteClientChanges) -> Result<(), LoroError> { - for changes in changes.values() { - if changes.is_empty() { - continue; - } - // detect invalid id - let mut last_end_counter = None; - for change in changes.iter() { - if change.id.counter < 0 { - return Err(LoroError::DecodeError( - "Invalid data. Negative id counter.".into(), - )); - } - if let Some(last_end_counter) = &mut last_end_counter { - if change.id.counter != *last_end_counter { - return Err(LoroError::DecodeError( - "Invalid data. Not continuous counter.".into(), - )); - } - - *last_end_counter = change.id_end().counter; - } else { - last_end_counter = Some(change.id_end().counter); - } - } - } - Ok(()) - } - - pub(crate) fn try_apply_pending( - &mut self, - mut id_stack: Vec, - latest_vv: &mut VersionVector, - ) { - while let Some(id) = id_stack.pop() { + /// Try to apply pending changes. + /// + /// `new_ids` are the ID of the op that is just applied. 
+ pub(crate) fn try_apply_pending(&mut self, mut new_ids: Vec) -> ShouldUpdateDagFrontiers { + let mut latest_vv = self.dag.vv.clone(); + let mut updated = false; + while let Some(id) = new_ids.pop() { let Some(tree) = self.pending_changes.changes.get_mut(&id.peer) else { continue; }; @@ -152,14 +93,15 @@ impl OpLog { for pending_changes in pending_set { for pending_change in pending_changes { - match remote_change_apply_state(latest_vv, &pending_change) { - ChangeApplyState::CanApplyDirectly => { - id_stack.push(pending_change.id_last()); + match remote_change_apply_state(&latest_vv, &pending_change) { + ChangeState::CanApplyDirectly => { + new_ids.push(pending_change.id_last()); latest_vv.set_end(pending_change.id_end()); self.apply_local_change_from_remote(pending_change); + updated = true; } - ChangeApplyState::Applied => {} - ChangeApplyState::AwaitingDependency(miss_dep) => self + ChangeState::Applied => {} + ChangeState::AwaitingMissingDependency(miss_dep) => self .pending_changes .changes .entry(miss_dep.peer) @@ -171,6 +113,10 @@ impl OpLog { } } } + + ShouldUpdateDagFrontiers { + should_update: updated, + } } pub(super) fn apply_local_change_from_remote(&mut self, change: PendingChange) { @@ -192,59 +138,35 @@ impl OpLog { // debug_dbg!(&change_causal_arr); self.dag.vv.extend_to_include_last_id(change.id_last()); self.latest_timestamp = self.latest_timestamp.max(change.timestamp); - let mark = self.insert_dag_node_on_new_change(&change); - self.insert_new_change(change, mark); + let mark = self.update_dag_on_new_change(&change); + self.insert_new_change(change, mark, false); } } -pub(super) fn to_local_op(change: Change, converter: &mut OpConverter) -> Change { - let mut ops = RleVec::new(); - for op in change.ops { - let lamport = change.lamport; - let content = op.content; - let op = converter.convert_single_op( - &op.container, - change.id.peer, - op.counter, - lamport, - content, - ); - ops.push(op); - } - Change { - ops, - id: change.id, - deps: 
change.deps, - lamport: change.lamport, - timestamp: change.timestamp, - has_dependents: false, - } -} - -enum ChangeApplyState { +enum ChangeState { Applied, CanApplyDirectly, // The id of first missing dep - AwaitingDependency(ID), + AwaitingMissingDependency(ID), } -fn remote_change_apply_state(vv: &VersionVector, change: &Change) -> ChangeApplyState { +fn remote_change_apply_state(vv: &VersionVector, change: &Change) -> ChangeState { let peer = change.id.peer; let CounterSpan { start, end } = change.ctr_span(); let vv_latest_ctr = vv.get(&peer).copied().unwrap_or(0); if vv_latest_ctr < start { - return ChangeApplyState::AwaitingDependency(change.id.inc(-1)); + return ChangeState::AwaitingMissingDependency(change.id.inc(-1)); } if vv_latest_ctr >= end { - return ChangeApplyState::Applied; + return ChangeState::Applied; } for dep in change.deps.as_ref().iter() { let dep_vv_latest_ctr = vv.get(&dep.peer).copied().unwrap_or(0); if dep_vv_latest_ctr - 1 < dep.counter { - return ChangeApplyState::AwaitingDependency(*dep); + return ChangeState::AwaitingMissingDependency(*dep); } } - ChangeApplyState::CanApplyDirectly + ChangeState::CanApplyDirectly } #[cfg(test)] diff --git a/crates/loro-internal/src/state.rs b/crates/loro-internal/src/state.rs index 8123d70b..b46a6f37 100644 --- a/crates/loro-internal/src/state.rs +++ b/crates/loro-internal/src/state.rs @@ -6,11 +6,12 @@ use std::{ use enum_as_inner::EnumAsInner; use enum_dispatch::enum_dispatch; use fxhash::{FxHashMap, FxHashSet}; -use loro_common::{ContainerID, LoroResult}; +use loro_common::{ContainerID, LoroError, LoroResult}; use crate::{ configure::{DefaultRandom, SecureRandomGenerator}, container::{idx::ContainerIdx, ContainerIdRaw}, + encoding::{StateSnapshotDecodeContext, StateSnapshotEncoder}, event::Index, event::{Diff, InternalContainerDiff, InternalDiff}, fx_map, @@ -55,6 +56,10 @@ pub struct DocState { #[enum_dispatch] pub(crate) trait ContainerState: Clone { + fn container_idx(&self) -> ContainerIdx; + 
+ fn is_state_empty(&self) -> bool; + fn apply_diff_and_convert( &mut self, diff: InternalDiff, @@ -103,6 +108,16 @@ pub(crate) trait ContainerState: Clone { fn get_child_containers(&self) -> Vec { Vec::new() } + + /// Encode the ops and the blob that can be used to restore the state to the current state. + /// + /// State will use the provided encoder to encode the ops and export a blob. + /// The ops should be encoded into the snapshot as well as the blob. + /// Users can then use the ops and the blob to restore the state to the current state. + fn encode_snapshot(&self, encoder: StateSnapshotEncoder) -> Vec; + + /// Restore the state to the state represented by the ops and the blob exported by `encode_snapshot`. + fn import_from_snapshot_ops(&mut self, ctx: StateSnapshotDecodeContext); } #[allow(clippy::enum_variant_names)] @@ -399,6 +414,24 @@ impl DocState { self.in_txn = true; } + pub fn iter(&self) -> impl Iterator { + self.states.values() + } + + pub fn iter_mut(&mut self) -> impl Iterator { + self.states.values_mut() + } + + pub(crate) fn init_container( + &mut self, + cid: ContainerID, + decode_ctx: StateSnapshotDecodeContext, + ) { + let idx = self.arena.register_container(&cid); + let state = self.states.entry(idx).or_insert_with(|| create_state(idx)); + state.import_from_snapshot_ops(decode_ctx); + } + #[inline] pub(crate) fn abort_txn(&mut self) { for container_idx in std::mem::take(&mut self.changed_idx_in_txn) { @@ -798,6 +831,24 @@ impl DocState { debug_log::group_end!(); Some(ans) } + + pub(crate) fn check_before_decode_snapshot(&self) -> LoroResult<()> { + if self.is_in_txn() { + return Err(LoroError::DecodeError( + "State is in txn".to_string().into_boxed_str(), + )); + } + + if !self.is_empty() { + return Err(LoroError::DecodeError( + "State is not empty, cannot import snapshot directly" + .to_string() + .into_boxed_str(), + )); + } + + Ok(()) + } } struct SubContainerDiffPatch { @@ -896,7 +947,7 @@ pub fn create_state(idx: 
ContainerIdx) -> State { ContainerType::Map => State::MapState(MapState::new(idx)), ContainerType::List => State::ListState(ListState::new(idx)), ContainerType::Text => State::RichtextState(RichtextState::new(idx)), - ContainerType::Tree => State::TreeState(TreeState::new()), + ContainerType::Tree => State::TreeState(TreeState::new(idx)), } } diff --git a/crates/loro-internal/src/state/list_state.rs b/crates/loro-internal/src/state/list_state.rs index a0639500..c57e3596 100644 --- a/crates/loro-internal/src/state/list_state.rs +++ b/crates/loro-internal/src/state/list_state.rs @@ -8,6 +8,7 @@ use crate::{ arena::SharedArena, container::{idx::ContainerIdx, ContainerID}, delta::Delta, + encoding::{EncodeMode, StateSnapshotDecodeContext, StateSnapshotEncoder}, event::{Diff, Index, InternalDiff}, handler::ValueOrContainer, op::{ListSlice, Op, RawOp, RawOpContent}, @@ -21,7 +22,7 @@ use generic_btree::{ rle::{HasLength, Mergeable, Sliceable}, BTree, BTreeTrait, Cursor, LeafIndex, LengthFinder, UseLengthFinder, }; -use loro_common::LoroResult; +use loro_common::{IdSpan, LoroResult, ID}; #[derive(Debug)] pub struct ListState { @@ -46,13 +47,21 @@ impl Clone for ListState { #[derive(Debug)] enum UndoItem { - Insert { index: usize, len: usize }, - Delete { index: usize, value: LoroValue }, + Insert { + index: usize, + len: usize, + }, + Delete { + index: usize, + value: LoroValue, + id: ID, + }, } #[derive(Debug, Clone)] -struct Elem { - v: LoroValue, +pub(crate) struct Elem { + pub v: LoroValue, + pub id: ID, } impl HasLength for Elem { @@ -171,9 +180,16 @@ impl ListState { Some(index as usize) } - pub fn insert(&mut self, index: usize, value: LoroValue) { + pub fn insert(&mut self, index: usize, value: LoroValue, id: ID) { + if index > self.len() { + panic!("Index {index} out of range. 
The length is {}", self.len()); + } + if self.list.is_empty() { - let idx = self.list.push(Elem { v: value.clone() }); + let idx = self.list.push(Elem { + v: value.clone(), + id, + }); if value.is_container() { self.child_container_to_leaf @@ -182,9 +198,13 @@ impl ListState { return; } - let (leaf, data) = self - .list - .insert::(&index, Elem { v: value.clone() }); + let (leaf, data) = self.list.insert::( + &index, + Elem { + v: value.clone(), + id, + }, + ); if value.is_container() { self.child_container_to_leaf @@ -210,7 +230,11 @@ impl ListState { let elem = self.list.remove_leaf(leaf.unwrap().cursor).unwrap(); let value = elem.v; if self.in_txn { - self.undo_stack.push(UndoItem::Delete { index, value }); + self.undo_stack.push(UndoItem::Delete { + index, + value, + id: elem.id, + }); } } @@ -239,6 +263,7 @@ impl ListState { self.undo_stack.push(UndoItem::Delete { index: start, value: elem.v, + id: elem.id, }) } } else { @@ -252,9 +277,11 @@ impl ListState { // PERF: use &[LoroValue] // PERF: batch - pub fn insert_batch(&mut self, index: usize, values: Vec) { + pub fn insert_batch(&mut self, index: usize, values: Vec, start_id: ID) { + let mut id = start_id; for (i, value) in values.into_iter().enumerate() { - self.insert(index + i, value); + self.insert(index + i, value, id); + id = id.inc(1); } } @@ -262,6 +289,11 @@ impl ListState { self.list.iter().map(|x| &x.v) } + #[allow(unused)] + pub(crate) fn iter_with_id(&self) -> impl Iterator { + self.list.iter() + } + pub fn len(&self) -> usize { *self.list.root_cache() as usize } @@ -294,6 +326,14 @@ impl ListState { } impl ContainerState for ListState { + fn container_idx(&self) -> ContainerIdx { + self.idx + } + + fn is_state_empty(&self) -> bool { + self.list.is_empty() + } + fn apply_diff_and_convert( &mut self, diff: InternalDiff, @@ -301,7 +341,7 @@ impl ContainerState for ListState { txn: &Weak>>, state: &Weak>, ) -> Diff { - let InternalDiff::SeqRaw(delta) = diff else { + let InternalDiff::ListRaw(delta) 
= diff else { unreachable!() }; let mut ans: Delta<_> = Delta::default(); @@ -314,7 +354,7 @@ impl ContainerState for ListState { } crate::delta::DeltaItem::Insert { insert: value, .. } => { let mut arr = Vec::new(); - for slices in value.0.iter() { + for slices in value.ranges.iter() { for i in slices.0.start..slices.0.end { let value = arena.get_value(i as usize).unwrap(); if value.is_container() { @@ -331,7 +371,7 @@ impl ContainerState for ListState { .collect::>(), ); let len = arr.len(); - self.insert_batch(index, arr); + self.insert_batch(index, arr, value.id); index += len; } crate::delta::DeltaItem::Delete { delete: len, .. } => { @@ -351,8 +391,9 @@ impl ContainerState for ListState { _txn: &Weak>>, _state: &Weak>, ) { + // debug_log::debug_dbg!(&diff); match diff { - InternalDiff::SeqRaw(delta) => { + InternalDiff::ListRaw(delta) => { let mut index = 0; for span in delta.iter() { match span { @@ -361,7 +402,7 @@ impl ContainerState for ListState { } crate::delta::DeltaItem::Insert { insert: value, .. } => { let mut arr = Vec::new(); - for slices in value.0.iter() { + for slices in value.ranges.iter() { for i in slices.0.start..slices.0.end { let value = arena.get_value(i as usize).unwrap(); if value.is_container() { @@ -374,7 +415,7 @@ impl ContainerState for ListState { } let len = arr.len(); - self.insert_batch(index, arr); + self.insert_batch(index, arr, value.id); index += len; } crate::delta::DeltaItem::Delete { delete: len, .. 
} => { @@ -402,7 +443,7 @@ impl ContainerState for ListState { arena.set_parent(idx, Some(self.idx)); } } - self.insert_batch(*pos, list.to_vec()); + self.insert_batch(*pos, list.to_vec(), op.id); } std::borrow::Cow::Owned(list) => { for value in list.iter() { @@ -412,7 +453,7 @@ impl ContainerState for ListState { arena.set_parent(idx, Some(self.idx)); } } - self.insert_batch(*pos, list.clone()); + self.insert_batch(*pos, list.clone(), op.id); } }, _ => unreachable!(), @@ -424,39 +465,9 @@ impl ContainerState for ListState { crate::container::list::list_op::ListOp::StyleEnd { .. } => unreachable!(), }, } - debug_log::debug_dbg!(&self); Ok(()) } - #[doc = " Start a transaction"] - #[doc = ""] - #[doc = " The transaction may be aborted later, then all the ops during this transaction need to be undone."] - fn start_txn(&mut self) { - self.in_txn = true; - } - - fn abort_txn(&mut self) { - self.in_txn = false; - while let Some(op) = self.undo_stack.pop() { - match op { - UndoItem::Insert { index, len } => { - self.delete_range(index..index + len); - } - UndoItem::Delete { index, value } => self.insert(index, value), - } - } - } - - fn commit_txn(&mut self) { - self.undo_stack.clear(); - self.in_txn = false; - } - - fn get_value(&mut self) -> LoroValue { - let ans = self.to_vec(); - LoroValue::List(Arc::new(ans)) - } - #[doc = " Convert a state to a diff that when apply this diff on a empty state,"] #[doc = " the state will be the same as this state."] fn to_diff( @@ -475,6 +486,35 @@ impl ContainerState for ListState { ) } + #[doc = " Start a transaction"] + #[doc = ""] + #[doc = " The transaction may be aborted later, then all the ops during this transaction need to be undone."] + fn start_txn(&mut self) { + self.in_txn = true; + } + + fn abort_txn(&mut self) { + self.in_txn = false; + while let Some(op) = self.undo_stack.pop() { + match op { + UndoItem::Insert { index, len } => { + self.delete_range(index..index + len); + } + UndoItem::Delete { index, value, id } => 
self.insert(index, value, id), + } + } + } + + fn commit_txn(&mut self) { + self.undo_stack.clear(); + self.in_txn = false; + } + + fn get_value(&mut self) -> LoroValue { + let ans = self.to_vec(); + LoroValue::List(Arc::new(ans)) + } + fn get_child_index(&self, id: &ContainerID) -> Option { self.get_child_container_index(id).map(Index::Seq) } @@ -488,6 +528,32 @@ impl ContainerState for ListState { } ans } + + #[doc = "Get a list of ops that can be used to restore the state to the current state"] + fn encode_snapshot(&self, mut encoder: StateSnapshotEncoder) -> Vec { + for elem in self.list.iter() { + let id_span: IdSpan = elem.id.into(); + encoder.encode_op(id_span, || unimplemented!()); + } + + Vec::new() + } + + #[doc = "Restore the state to the state represented by the ops that exported by `get_snapshot_ops`"] + fn import_from_snapshot_ops(&mut self, ctx: StateSnapshotDecodeContext) { + assert_eq!(ctx.mode, EncodeMode::Snapshot); + let mut index = 0; + for op in ctx.ops { + let value = op.op.content.as_list().unwrap().as_insert().unwrap().0; + let list = ctx + .oplog + .arena + .get_values(value.0.start as usize..value.0.end as usize); + let len = list.len(); + self.insert_batch(index, list, op.id()); + index += len; + } + } } #[cfg(test)] @@ -503,11 +569,11 @@ mod test { fn id(name: &str) -> ContainerID { ContainerID::new_root(name, crate::ContainerType::List) } - list.insert(0, LoroValue::Container(id("abc"))); - list.insert(0, LoroValue::Container(id("x"))); + list.insert(0, LoroValue::Container(id("abc")), ID::new(0, 0)); + list.insert(0, LoroValue::Container(id("x")), ID::new(0, 0)); assert_eq!(list.get_child_container_index(&id("x")), Some(0)); assert_eq!(list.get_child_container_index(&id("abc")), Some(1)); - list.insert(1, LoroValue::Bool(false)); + list.insert(1, LoroValue::Bool(false), ID::new(0, 0)); assert_eq!(list.get_child_container_index(&id("x")), Some(0)); assert_eq!(list.get_child_container_index(&id("abc")), Some(2)); } diff --git 
a/crates/loro-internal/src/state/map_state.rs b/crates/loro-internal/src/state/map_state.rs index 6f937d1f..c08ced0d 100644 --- a/crates/loro-internal/src/state/map_state.rs +++ b/crates/loro-internal/src/state/map_state.rs @@ -5,15 +5,18 @@ use std::{ use fxhash::FxHashMap; use loro_common::{ContainerID, LoroResult}; +use rle::HasLength; use crate::{ arena::SharedArena, container::{idx::ContainerIdx, map::MapSet}, delta::{MapValue, ResolvedMapDelta, ResolvedMapValue}, + encoding::{EncodeMode, StateSnapshotDecodeContext, StateSnapshotEncoder}, event::{Diff, Index, InternalDiff}, handler::ValueOrContainer, op::{Op, RawOp, RawOpContent}, txn::Transaction, + utils::delta_rle_encoded_num::DeltaRleEncodedNums, DocState, InternalString, LoroValue, }; @@ -28,6 +31,14 @@ pub struct MapState { } impl ContainerState for MapState { + fn container_idx(&self) -> ContainerIdx { + self.idx + } + + fn is_state_empty(&self) -> bool { + self.map.is_empty() + } + fn apply_diff_and_convert( &mut self, diff: InternalDiff, @@ -97,6 +108,24 @@ impl ContainerState for MapState { } } + #[doc = " Convert a state to a diff that when apply this diff on a empty state,"] + #[doc = " the state will be the same as this state."] + fn to_diff( + &mut self, + arena: &SharedArena, + txn: &Weak>>, + state: &Weak>, + ) -> Diff { + Diff::Map(ResolvedMapDelta { + updated: self + .map + .clone() + .into_iter() + .map(|(k, v)| (k, ResolvedMapValue::from_map_value(v, arena, txn, state))) + .collect::>(), + }) + } + fn start_txn(&mut self) { self.in_txn = true; } @@ -123,24 +152,6 @@ impl ContainerState for MapState { LoroValue::Map(Arc::new(ans)) } - #[doc = " Convert a state to a diff that when apply this diff on a empty state,"] - #[doc = " the state will be the same as this state."] - fn to_diff( - &mut self, - arena: &SharedArena, - txn: &Weak>>, - state: &Weak>, - ) -> Diff { - Diff::Map(ResolvedMapDelta { - updated: self - .map - .clone() - .into_iter() - .map(|(k, v)| (k, 
ResolvedMapValue::from_map_value(v, arena, txn, state))) - .collect::>(), - }) - } - fn get_child_index(&self, id: &ContainerID) -> Option { for (key, value) in self.map.iter() { if let Some(LoroValue::Container(x)) = &value.value { @@ -162,6 +173,41 @@ impl ContainerState for MapState { } ans } + + #[doc = " Get a list of ops that can be used to restore the state to the current state"] + fn encode_snapshot(&self, mut encoder: StateSnapshotEncoder) -> Vec { + let mut lamports = DeltaRleEncodedNums::new(); + for v in self.map.values() { + lamports.push(v.lamport.0); + encoder.encode_op(v.id().into(), || unimplemented!()); + } + + lamports.encode() + } + + #[doc = " Restore the state to the state represented by the ops exported by `encode_snapshot`"] + fn import_from_snapshot_ops(&mut self, ctx: StateSnapshotDecodeContext) { + assert_eq!(ctx.mode, EncodeMode::Snapshot); + let lamports = DeltaRleEncodedNums::decode(ctx.blob); + let mut iter = lamports.iter(); + for op in ctx.ops { + debug_assert_eq!( + op.op.atom_len(), + 1, + "MapState::from_snapshot_ops: op.atom_len() != 1" + ); + + let content = op.op.content.as_map().unwrap(); + self.map.insert( + content.key.clone(), + MapValue { + counter: op.op.counter, + value: content.value.clone(), + lamport: (iter.next().unwrap(), op.peer), + }, + ); + } + } } impl MapState { diff --git a/crates/loro-internal/src/state/richtext_state.rs b/crates/loro-internal/src/state/richtext_state.rs index d365fb7b..c0a059d8 100644 --- a/crates/loro-internal/src/state/richtext_state.rs +++ b/crates/loro-internal/src/state/richtext_state.rs @@ -5,8 +5,7 @@ use std::{ use fxhash::FxHashMap; use generic_btree::rle::{HasLength, Mergeable}; -use loro_common::{Counter, LoroResult, LoroValue, PeerID, ID}; -use loro_preload::{CommonArena, EncodedRichtextState, TempArena, TextRanges}; +use loro_common::{LoroResult, LoroValue, ID}; use crate::{ arena::SharedArena, @@ -14,16 +13,19 @@ use crate::{ idx::ContainerIdx, richtext::{
richtext_state::{EntityRangeInfo, PosType}, - AnchorType, RichtextState as InnerState, StyleOp, Styles, TextStyleInfoFlag, + AnchorType, RichtextState as InnerState, StyleOp, Styles, }, }, container::{list::list_op, richtext::richtext_state::RichtextStateChunk}, delta::{Delta, DeltaItem, StyleMeta}, + encoding::{EncodeMode, StateSnapshotDecodeContext, StateSnapshotEncoder}, event::{Diff, InternalDiff}, op::{Op, RawOp}, txn::Transaction, - utils::{bitmap::BitMap, lazy::LazyLoad, string_slice::StringSlice}, - DocState, InternalString, + utils::{ + delta_rle_encoded_num::DeltaRleEncodedNums, lazy::LazyLoad, string_slice::StringSlice, + }, + DocState, }; use super::ContainerState; @@ -55,11 +57,12 @@ impl RichtextState { self.state.get_mut().to_string() } + #[allow(unused)] #[inline(always)] pub(crate) fn is_empty(&self) -> bool { match &*self.state { LazyLoad::Src(s) => s.elements.is_empty(), - LazyLoad::Dst(d) => d.is_emtpy(), + LazyLoad::Dst(d) => d.is_empty(), } } @@ -138,6 +141,17 @@ impl Mergeable for UndoItem { } impl ContainerState for RichtextState { + fn container_idx(&self) -> ContainerIdx { + self.idx + } + + fn is_state_empty(&self) -> bool { + match &*self.state { + LazyLoad::Src(s) => s.is_empty(), + LazyLoad::Dst(s) => s.is_empty(), + } + } + // TODO: refactor fn apply_diff_and_convert( &mut self, @@ -349,9 +363,11 @@ impl ContainerState for RichtextState { unicode_start: _, pos, } => { - self.state - .get_mut() - .insert_at_entity_index(*pos as usize, slice.clone()); + self.state.get_mut().insert_at_entity_index( + *pos as usize, + slice.clone(), + r_op.id, + ); if self.in_txn { self.push_undo(UndoItem::Insert { @@ -447,6 +463,84 @@ impl ContainerState for RichtextState { fn get_value(&mut self) -> LoroValue { LoroValue::String(Arc::new(self.state.get_mut().to_string())) } + + #[doc = " Get a list of ops that can be used to restore the state to the current state"] + fn encode_snapshot(&self, mut encoder: StateSnapshotEncoder) -> Vec { + let iter: &mut 
dyn Iterator; + let mut a; + let mut b; + match &*self.state { + LazyLoad::Src(s) => { + a = Some(s.elements.iter()); + iter = &mut *a.as_mut().unwrap(); + } + LazyLoad::Dst(s) => { + b = Some(s.iter_chunk()); + iter = &mut *b.as_mut().unwrap(); + } + } + + debug_log::group!("encode_snapshot"); + let mut lamports = DeltaRleEncodedNums::new(); + for chunk in iter { + debug_log::debug_dbg!(&chunk); + match chunk { + RichtextStateChunk::Style { style, anchor_type } + if *anchor_type == AnchorType::Start => + { + lamports.push(style.lamport); + } + _ => {} + } + + let id_span = chunk.get_id_span(); + encoder.encode_op(id_span, || unimplemented!()); + } + + debug_log::group_end!(); + lamports.encode() + } + + #[doc = " Restore the state to the state represented by the ops that exported by `get_snapshot_ops`"] + fn import_from_snapshot_ops(&mut self, ctx: StateSnapshotDecodeContext) { + assert_eq!(ctx.mode, EncodeMode::Snapshot); + let lamports = DeltaRleEncodedNums::decode(ctx.blob); + let mut lamport_iter = lamports.iter(); + let mut loader = RichtextStateLoader::default(); + let mut id_to_style = FxHashMap::default(); + for op in ctx.ops { + let id = op.id(); + let chunk = match op.op.content.into_list().unwrap() { + list_op::InnerListOp::InsertText { slice, .. } => { + RichtextStateChunk::new_text(slice.clone(), id) + } + list_op::InnerListOp::StyleStart { + key, value, info, .. 
+ } => { + let style_op = Arc::new(StyleOp { + lamport: lamport_iter.next().unwrap(), + peer: op.peer, + cnt: op.op.counter, + key, + value, + info, + }); + id_to_style.insert(id, style_op.clone()); + RichtextStateChunk::new_style(style_op, AnchorType::Start) + } + list_op::InnerListOp::StyleEnd => { + let style = id_to_style.remove(&id.inc(-1)).unwrap(); + RichtextStateChunk::new_style(style, AnchorType::End) + } + a => unreachable!("richtext state should not have {a:?}"), + }; + + debug_log::debug_dbg!(&chunk); + loader.push(chunk); + } + + *self.state = LazyLoad::Src(loader); + } } impl RichtextState { @@ -553,142 +647,6 @@ impl RichtextState { pub fn get_richtext_value(&mut self) -> LoroValue { self.state.get_mut().get_richtext_value() } - - #[inline] - fn get_loader() -> RichtextStateLoader { - RichtextStateLoader { - elements: Default::default(), - start_anchor_pos: Default::default(), - entity_index: 0, - style_ranges: Default::default(), - } - } - - #[inline] - pub(crate) fn iter_chunk(&self) -> Box + '_> { - match &*self.state { - LazyLoad::Src(s) => Box::new(s.elements.iter()), - LazyLoad::Dst(s) => Box::new(s.iter_chunk()), - } - } - - pub(crate) fn decode_snapshot( - &mut self, - EncodedRichtextState { - len, - text_bytes, - styles, - is_style_start, - }: EncodedRichtextState, - state_arena: &TempArena, - common: &CommonArena, - arena: &SharedArena, - ) { - assert!(self.is_empty()); - if text_bytes.is_empty() { - return; - } - - let bit_len = is_style_start.len() * 8; - let is_style_start = BitMap::from_vec(is_style_start, bit_len); - let mut is_style_start_iter = is_style_start.iter(); - let mut loader = Self::get_loader(); - let mut is_text = true; - let mut text_range_iter = TextRanges::decode_iter(&text_bytes).unwrap(); - let mut style_iter = styles.iter(); - for &len in len.iter() { - if is_text { - for _ in 0..len { - let range = text_range_iter.next().unwrap(); - let text = arena.slice_by_utf8(range.start..range.start + range.len); - 
loader.push(RichtextStateChunk::new_text(text)); - } - } else { - for _ in 0..len { - let is_start = is_style_start_iter.next().unwrap(); - let style_compact = style_iter.next().unwrap(); - loader.push(RichtextStateChunk::new_style( - Arc::new(StyleOp { - lamport: style_compact.lamport, - peer: common.peer_ids[style_compact.peer_idx as usize], - cnt: style_compact.counter as Counter, - key: state_arena.keywords[style_compact.key_idx as usize].clone(), - value: style_compact.value.clone(), - info: TextStyleInfoFlag::from_byte(style_compact.style_info), - }), - if is_start { - AnchorType::Start - } else { - AnchorType::End - }, - )) - } - } - - is_text = !is_text; - } - - self.state = Box::new(LazyLoad::new(loader)); - } - - pub(crate) fn encode_snapshot( - &self, - record_peer: &mut impl FnMut(PeerID) -> u32, - record_key: &mut impl FnMut(&InternalString) -> usize, - ) -> EncodedRichtextState { - // lengths are interleaved [text_elem_len, style_elem_len, ..] - let mut lengths = Vec::new(); - let mut text_ranges: TextRanges = Default::default(); - let mut styles = Vec::new(); - let mut is_style_start = BitMap::new(); - - for chunk in self.iter_chunk() { - match chunk { - RichtextStateChunk::Text(s) => { - if lengths.len() % 2 == 0 { - lengths.push(0); - } - - *lengths.last_mut().unwrap() += 1; - text_ranges.ranges.push(loro_preload::TextRange { - start: s.bytes().start(), - len: s.bytes().len(), - }); - } - RichtextStateChunk::Style { style, anchor_type } => { - if lengths.is_empty() { - lengths.reserve(2); - lengths.push(0); - lengths.push(0); - } - - if lengths.len() % 2 == 1 { - lengths.push(0); - } - - *lengths.last_mut().unwrap() += 1; - is_style_start.push(*anchor_type == AnchorType::Start); - styles.push(loro_preload::CompactStyleOp { - peer_idx: record_peer(style.peer), - key_idx: record_key(&style.key) as u32, - counter: style.cnt as u32, - lamport: style.lamport, - style_info: style.info.to_byte(), - value: style.value.clone(), - }) - } - } - } - - let 
text_bytes = text_ranges.encode(); - // eprintln!("bytes len={}", text_bytes.len()); - EncodedRichtextState { - len: lengths, - text_bytes: std::borrow::Cow::Owned(text_bytes), - styles, - is_style_start: is_style_start.into_vec(), - } - } } #[derive(Debug, Default, Clone)] @@ -741,12 +699,17 @@ impl RichtextStateLoader { state } + + fn is_empty(&self) -> bool { + self.elements.is_empty() + } } #[cfg(test)] mod tests { use append_only_bytes::AppendOnlyBytes; use generic_btree::rle::Mergeable; + use loro_common::ID; use crate::container::richtext::richtext_state::{RichtextStateChunk, TextChunk}; @@ -761,15 +724,15 @@ mod tests { let mut last = UndoItem::Delete { index: 20, - content: RichtextStateChunk::Text(TextChunk::from_bytes(last_bytes)), + content: RichtextStateChunk::Text(TextChunk::new(last_bytes, ID::new(0, 2))), }; let mut new = UndoItem::Delete { index: 18, - content: RichtextStateChunk::Text(TextChunk::from_bytes(new_bytes)), + content: RichtextStateChunk::Text(TextChunk::new(new_bytes, ID::new(0, 0))), }; let merged = UndoItem::Delete { index: 18, - content: RichtextStateChunk::Text(TextChunk::from_bytes(bytes.to_slice())), + content: RichtextStateChunk::Text(TextChunk::new(bytes.to_slice(), ID::new(0, 0))), }; assert!(last.can_merge(&new)); std::mem::swap(&mut last, &mut new); diff --git a/crates/loro-internal/src/state/tree_state.rs b/crates/loro-internal/src/state/tree_state.rs index 47e4e33d..34aa1929 100644 --- a/crates/loro-internal/src/state/tree_state.rs +++ b/crates/loro-internal/src/state/tree_state.rs @@ -1,12 +1,15 @@ use fxhash::{FxHashMap, FxHashSet}; use itertools::Itertools; -use loro_common::{ContainerID, LoroError, LoroResult, LoroTreeError, LoroValue, TreeID}; +use loro_common::{ContainerID, LoroError, LoroResult, LoroTreeError, LoroValue, TreeID, ID}; +use rle::HasLength; use serde::{Deserialize, Serialize}; -use std::collections::{hash_map::Iter, VecDeque}; +use std::collections::VecDeque; use std::sync::{Arc, Mutex, Weak}; +use 
crate::container::idx::ContainerIdx; use crate::delta::{TreeDiff, TreeDiffItem, TreeExternalDiff}; use crate::diff_calc::TreeDeletedSetTrait; +use crate::encoding::{EncodeMode, StateSnapshotDecodeContext, StateSnapshotEncoder}; use crate::event::InternalDiff; use crate::txn::Transaction; use crate::DocState; @@ -25,26 +28,66 @@ use super::ContainerState; /// using flat representation #[derive(Debug, Clone)] pub struct TreeState { - pub(crate) trees: FxHashMap>, + idx: ContainerIdx, + pub(crate) trees: FxHashMap, pub(crate) deleted: FxHashSet, in_txn: bool, undo_items: Vec, } +#[derive(Debug, Clone, Copy, PartialEq, Eq)] +pub(crate) struct TreeStateNode { + pub parent: Option, + pub last_move_op: ID, +} + +impl TreeStateNode { + pub const UNEXIST_ROOT: TreeStateNode = TreeStateNode { + parent: TreeID::unexist_root(), + last_move_op: ID::NONE_ID, + }; +} + +impl Ord for TreeStateNode { + fn cmp(&self, other: &Self) -> std::cmp::Ordering { + self.parent.cmp(&other.parent) + } +} + +impl PartialOrd for TreeStateNode { + fn partial_cmp(&self, other: &Self) -> Option { + Some(self.cmp(other)) + } +} + #[derive(Debug, Clone, Copy)] struct TreeUndoItem { target: TreeID, old_parent: Option, + old_last_move_op: ID, } impl TreeState { - pub fn new() -> Self { + pub fn new(idx: ContainerIdx) -> Self { let mut trees = FxHashMap::default(); - trees.insert(TreeID::delete_root().unwrap(), None); - trees.insert(TreeID::unexist_root().unwrap(), None); + trees.insert( + TreeID::delete_root().unwrap(), + TreeStateNode { + parent: None, + last_move_op: ID::NONE_ID, + }, + ); + trees.insert( + TreeID::unexist_root().unwrap(), + TreeStateNode { + parent: None, + last_move_op: ID::NONE_ID, + }, + ); let mut deleted = FxHashSet::default(); deleted.insert(TreeID::delete_root().unwrap()); Self { + idx, trees, deleted, in_txn: false, @@ -52,18 +95,25 @@ impl TreeState { } } - pub fn mov(&mut self, target: TreeID, parent: Option) -> Result<(), LoroError> { + pub fn mov(&mut self, target: 
TreeID, parent: Option, id: ID) -> Result<(), LoroError> { let Some(parent) = parent else { // new root node let old_parent = self .trees - .insert(target, None) - .unwrap_or(TreeID::unexist_root()); - self.update_deleted_cache(target, None, old_parent); + .insert( + target, + TreeStateNode { + parent: None, + last_move_op: id, + }, + ) + .unwrap_or(TreeStateNode::UNEXIST_ROOT); + self.update_deleted_cache(target, None, old_parent.parent); if self.in_txn { self.undo_items.push(TreeUndoItem { target, old_parent: TreeID::unexist_root(), + old_last_move_op: old_parent.last_move_op, }) } return Ok(()); @@ -77,7 +127,7 @@ impl TreeState { if self .trees .get(&target) - .copied() + .map(|x| x.parent) .unwrap_or(TreeID::unexist_root()) == Some(parent) { @@ -86,12 +136,22 @@ impl TreeState { // move or delete or create children node let old_parent = self .trees - .insert(target, Some(parent)) - .unwrap_or(TreeID::unexist_root()); - self.update_deleted_cache(target, Some(parent), old_parent); + .insert( + target, + TreeStateNode { + parent: Some(parent), + last_move_op: id, + }, + ) + .unwrap_or(TreeStateNode::UNEXIST_ROOT); + self.update_deleted_cache(target, Some(parent), old_parent.parent); if self.in_txn { - self.undo_items.push(TreeUndoItem { target, old_parent }) + self.undo_items.push(TreeUndoItem { + target, + old_parent: old_parent.parent, + old_last_move_op: old_parent.last_move_op, + }) } Ok(()) @@ -108,7 +168,7 @@ impl TreeState { let mut node_id = node_id; loop { - let parent = self.trees.get(node_id).unwrap(); + let parent = &self.trees.get(node_id).unwrap().parent; match parent { Some(parent_id) if parent_id == maybe_ancestor => return true, Some(parent_id) if parent_id == node_id => panic!("loop detected"), @@ -120,10 +180,6 @@ impl TreeState { } } - pub fn iter(&self) -> Iter<'_, TreeID, Option> { - self.trees.iter() - } - pub fn contains(&self, target: TreeID) -> bool { if TreeID::is_deleted_root(Some(target)) { return true; @@ -135,7 +191,7 @@ impl 
TreeState { if self.is_deleted(&target) { None } else { - self.trees.get(&target).copied() + self.trees.get(&target).map(|x| x.parent) } } @@ -160,9 +216,32 @@ impl TreeState { .max() .unwrap_or(0) } + + fn get_is_deleted_by_query(&self, target: TreeID) -> bool { + match self.trees.get(&target) { + Some(x) => { + if x.parent.is_none() { + false + } else if x.parent == TreeID::delete_root() { + true + } else { + self.get_is_deleted_by_query(x.parent.unwrap()) + } + } + None => false, + } + } } impl ContainerState for TreeState { + fn container_idx(&self) -> crate::container::idx::ContainerIdx { + self.idx + } + + fn is_state_empty(&self) -> bool { + self.trees.is_empty() + } + fn apply_diff_and_convert( &mut self, diff: crate::event::InternalDiff, @@ -188,12 +267,18 @@ impl ContainerState for TreeState { continue; } }; - let old_parent = self + let old = self .trees - .insert(target, parent) - .unwrap_or(TreeID::unexist_root()); - if parent != old_parent { - self.update_deleted_cache(target, parent, old_parent); + .insert( + target, + TreeStateNode { + parent, + last_move_op: diff.last_effective_move_op_id, + }, + ) + .unwrap_or(TreeStateNode::UNEXIST_ROOT); + if parent != old.parent { + self.update_deleted_cache(target, parent, old.parent); } } } @@ -216,7 +301,7 @@ impl ContainerState for TreeState { match raw_op.content { crate::op::RawOpContent::Tree(tree) => { let TreeOp { target, parent, .. 
} = tree; - self.mov(target, parent) + self.mov(target, parent, raw_op.id) } _ => unreachable!(), } @@ -260,15 +345,28 @@ impl ContainerState for TreeState { fn abort_txn(&mut self) { self.in_txn = false; while let Some(op) = self.undo_items.pop() { - let TreeUndoItem { target, old_parent } = op; + let TreeUndoItem { + target, + old_parent, + old_last_move_op, + } = op; if TreeID::is_unexist_root(old_parent) { self.trees.remove(&target); } else { let parent = self .trees - .insert(target, old_parent) - .unwrap_or(TreeID::unexist_root()); - self.update_deleted_cache(target, old_parent, parent); + .insert( + target, + TreeStateNode { + parent: old_parent, + last_move_op: old_last_move_op, + }, + ) + .unwrap_or(TreeStateNode { + parent: TreeID::unexist_root(), + last_move_op: ID::NONE_ID, + }); + self.update_deleted_cache(target, old_parent, parent.parent); } } } @@ -285,11 +383,12 @@ impl ContainerState for TreeState { let iter = self.trees.iter().sorted(); #[cfg(not(feature = "test_utils"))] let iter = self.trees.iter(); - for (target, parent) in iter { + for (target, node) in iter { if !self.deleted.contains(target) && !TreeID::is_unexist_root(Some(*target)) { let mut t = FxHashMap::default(); t.insert("id".to_string(), target.id().to_string().into()); - let p = parent + let p = node + .parent .map(|p| p.to_string().into()) .unwrap_or(LoroValue::Null); t.insert("parent".to_string(), p); @@ -318,6 +417,42 @@ impl ContainerState for TreeState { .map(|n| n.associated_meta_container()) .collect_vec() } + + #[doc = " Get a list of ops that can be used to restore the state to the current state"] + fn encode_snapshot(&self, mut encoder: StateSnapshotEncoder) -> Vec { + for node in self.trees.values() { + if node.last_move_op == ID::NONE_ID { + continue; + } + encoder.encode_op(node.last_move_op.into(), || unimplemented!()); + } + + Vec::new() + } + + #[doc = " Restore the state to the state represented by the ops that exported by `get_snapshot_ops`"] + fn 
import_from_snapshot_ops(&mut self, ctx: StateSnapshotDecodeContext) { + assert_eq!(ctx.mode, EncodeMode::Snapshot); + for op in ctx.ops { + assert_eq!(op.op.atom_len(), 1); + let content = op.op.content.as_tree().unwrap(); + let target = content.target; + let parent = content.parent; + self.trees.insert( + target, + TreeStateNode { + parent, + last_move_op: op.id(), + }, + ); + } + + for t in self.trees.keys() { + if self.get_is_deleted_by_query(*t) { + self.deleted.insert(*t); + } + } + } } impl TreeDeletedSetTrait for TreeState { @@ -329,12 +464,12 @@ impl TreeDeletedSetTrait for TreeState { &mut self.deleted } - fn get_children(&self, target: TreeID) -> Vec { + fn get_children(&self, target: TreeID) -> Vec<(TreeID, ID)> { let mut ans = Vec::new(); for (t, parent) in self.trees.iter() { - if let Some(p) = parent { - if p == &target { - ans.push(*t); + if let Some(p) = parent.parent { + if p == target { + ans.push((*t, parent.last_move_op)); } } } @@ -366,12 +501,12 @@ pub struct TreeNode { } impl Forest { - pub(crate) fn from_tree_state(state: &FxHashMap>) -> Self { + pub(crate) fn from_tree_state(state: &FxHashMap) -> Self { let mut forest = Self::default(); let mut node_to_children = FxHashMap::default(); for (id, parent) in state.iter().sorted() { - if let Some(parent) = parent { + if let Some(parent) = &parent.parent { node_to_children .entry(*parent) .or_insert_with(Vec::new) @@ -381,7 +516,7 @@ impl Forest { for root in state .iter() - .filter(|(_, parent)| parent.is_none()) + .filter(|(_, parent)| parent.parent.is_none()) .map(|(id, _)| *id) .sorted() { @@ -444,7 +579,7 @@ pub(crate) fn get_meta_value(nodes: &mut Vec, state: &mut DocState) { let map = Arc::make_mut(node.as_map_mut().unwrap()); let meta = map.get_mut("meta").unwrap(); let id = meta.as_container().unwrap(); - *meta = state.get_container_deep_value(state.arena.id_to_idx(id).unwrap()); + *meta = state.get_container_deep_value(state.arena.register_container(id)); } } @@ -470,16 +605,22 @@ mod 
tests { #[test] fn test_tree_state() { - let mut state = TreeState::new(); - state.mov(ID1, None).unwrap(); - state.mov(ID2, Some(ID1)).unwrap(); + let mut state = TreeState::new(ContainerIdx::from_index_and_type( + 0, + loro_common::ContainerType::Tree, + )); + state.mov(ID1, None, ID::NONE_ID).unwrap(); + state.mov(ID2, Some(ID1), ID::NONE_ID).unwrap(); } #[test] fn tree_convert() { - let mut state = TreeState::new(); - state.mov(ID1, None).unwrap(); - state.mov(ID2, Some(ID1)).unwrap(); + let mut state = TreeState::new(ContainerIdx::from_index_and_type( + 0, + loro_common::ContainerType::Tree, + )); + state.mov(ID1, None, ID::NONE_ID).unwrap(); + state.mov(ID2, Some(ID1), ID::NONE_ID).unwrap(); let roots = Forest::from_tree_state(&state.trees); let json = serde_json::to_string(&roots).unwrap(); assert_eq!( @@ -490,12 +631,15 @@ mod tests { #[test] fn delete_node() { - let mut state = TreeState::new(); - state.mov(ID1, None).unwrap(); - state.mov(ID2, Some(ID1)).unwrap(); - state.mov(ID3, Some(ID2)).unwrap(); - state.mov(ID4, Some(ID1)).unwrap(); - state.mov(ID2, TreeID::delete_root()).unwrap(); + let mut state = TreeState::new(ContainerIdx::from_index_and_type( + 0, + loro_common::ContainerType::Tree, + )); + state.mov(ID1, None, ID::NONE_ID).unwrap(); + state.mov(ID2, Some(ID1), ID::NONE_ID).unwrap(); + state.mov(ID3, Some(ID2), ID::NONE_ID).unwrap(); + state.mov(ID4, Some(ID1), ID::NONE_ID).unwrap(); + state.mov(ID2, TreeID::delete_root(), ID::NONE_ID).unwrap(); let roots = Forest::from_tree_state(&state.trees); let json = serde_json::to_string(&roots).unwrap(); assert_eq!( diff --git a/crates/loro-internal/src/utils/bitmap.rs b/crates/loro-internal/src/utils/bitmap.rs deleted file mode 100644 index f76705af..00000000 --- a/crates/loro-internal/src/utils/bitmap.rs +++ /dev/null @@ -1,90 +0,0 @@ -#[derive(Clone, PartialEq, Eq)] -pub struct BitMap { - vec: Vec, - len: usize, -} - -impl std::fmt::Debug for BitMap { - fn fmt(&self, f: &mut std::fmt::Formatter<'_>) 
-> std::fmt::Result { - let mut ans = String::new(); - for v in self.iter() { - if v { - ans.push('1'); - } else { - ans.push('0'); - } - } - - f.debug_struct("BitMap") - .field("len", &self.len) - .field("vec", &ans) - .finish() - } -} - -impl BitMap { - pub fn new() -> Self { - Self { - vec: Vec::new(), - len: 0, - } - } - - pub fn from_vec(vec: Vec, len: usize) -> Self { - Self { vec, len } - } - - pub fn into_vec(self) -> Vec { - self.vec - } - - #[allow(unused)] - pub fn len(&self) -> usize { - self.len - } - - pub fn push(&mut self, v: bool) { - while self.len / 8 >= self.vec.len() { - self.vec.push(0); - } - - if v { - self.vec[self.len / 8] |= 1 << (self.len % 8); - } - self.len += 1; - } - - pub fn get(&self, index: usize) -> bool { - if index >= self.len { - panic!("index out of range"); - } - - (self.vec[index / 8] & (1 << (index % 8))) != 0 - } - - pub fn iter(&self) -> impl Iterator + '_ { - self.vec - .iter() - .flat_map(|&v| (0..8).map(move |i| (v & (1 << i)) != 0)) - .take(self.len) - } -} - -#[cfg(test)] -mod test { - use super::BitMap; - - #[test] - fn basic() { - let mut map = BitMap::new(); - map.push(true); - map.push(false); - map.push(true); - map.push(true); - assert!(map.get(0)); - assert!(!map.get(1)); - assert!(map.get(2)); - assert!(map.get(3)); - dbg!(map); - } -} diff --git a/crates/loro-internal/src/utils/delta_rle_encoded_num.rs b/crates/loro-internal/src/utils/delta_rle_encoded_num.rs new file mode 100644 index 00000000..800a69ac --- /dev/null +++ b/crates/loro-internal/src/utils/delta_rle_encoded_num.rs @@ -0,0 +1,37 @@ +use serde_columnar::columnar; + +#[columnar(vec, ser, de)] +#[derive(Debug, Clone)] +struct EncodedNum { + #[columnar(strategy = "DeltaRle")] + num: u32, +} + +#[derive(Default)] +#[columnar(ser, de)] +pub struct DeltaRleEncodedNums { + #[columnar(class = "vec")] + nums: Vec, +} + +impl DeltaRleEncodedNums { + pub fn new() -> Self { + Self::default() + } + + pub fn push(&mut self, n: u32) { + 
self.nums.push(EncodedNum { num: n }); + } + + pub fn iter(&self) -> impl Iterator + '_ { + self.nums.iter().map(|n| n.num) + } + + pub fn encode(&self) -> Vec { + serde_columnar::to_vec(&self).unwrap() + } + + pub fn decode(encoded: &[u8]) -> Self { + serde_columnar::from_bytes(encoded).unwrap() + } +} diff --git a/crates/loro-internal/src/utils/id_int_map.rs b/crates/loro-internal/src/utils/id_int_map.rs new file mode 100644 index 00000000..ff1f2783 --- /dev/null +++ b/crates/loro-internal/src/utils/id_int_map.rs @@ -0,0 +1,341 @@ +use itertools::Either; +use loro_common::{HasCounter, HasCounterSpan, HasId, HasIdSpan, IdSpan, ID}; +use rle::HasLength; +use std::collections::BTreeMap; + +/// A map that maps spans of continuous [ID]s to spans of continuous integers. +/// +/// It can merge spans that are adjacent to each other. +/// The value is automatically incremented by the length of the inserted span. +#[derive(Debug)] +pub struct IdIntMap { + inner: Either, Vec<(IdSpan, i32)>>, + next_value: i32, +} + +const MAX_VEC_LEN: usize = 16; + +#[derive(Debug)] +struct Value { + len: i32, + value: i32, +} + +impl IdIntMap { + pub fn new() -> Self { + Self { + inner: Either::Right(Default::default()), + next_value: 0, + } + } + + pub fn insert(&mut self, id_span: IdSpan) { + if cfg!(debug_assertions) { + let target = self.get(id_span.id_start()); + assert!( + target.is_none(), + "ID already exists {id_span:?} {target:?} {:#?}", + self + ); + } + + match &mut self.inner { + Either::Left(map) => { + let value = self.next_value; + let len = id_span.atom_len() as i32; + self.next_value += len; + + let id = id_span.id_start(); + match map.range_mut(..&id).last() { + Some(last) + if last.0.peer == id.peer + && last.0.counter + last.1.len == id.counter + && last.1.value + last.1.len == value => + { + // merge + last.1.len += len; + } + _ => { + map.insert(id, Value { len, value }); + } + } + } + Either::Right(vec) => { + if vec.len() == MAX_VEC_LEN { + // convert to map and 
insert + self.escalate_to_map(); + self.insert(id_span); + return; + } + + let value = self.next_value; + let len = id_span.atom_len() as i32; + self.next_value += len; + + let pos = match vec.binary_search_by(|x| x.0.id_start().cmp(&id_span.id_start())) { + Ok(_) => unreachable!("ID already exists"), + Err(i) => i, + }; + + if pos > 0 { + if let Some(last) = vec.get_mut(pos - 1) { + if last.0.id_end() == id_span.id_start() + && last.1 + last.0.atom_len() as i32 == value + { + // can merge + last.0.counter.end += len; + return; + } + } + } + + vec.insert(pos, (id_span, value)); + } + } + } + + fn escalate_to_map(&mut self) { + let Either::Right(vec) = &mut self.inner else { + return; + }; + let mut map = BTreeMap::new(); + for (id_span, value) in vec.drain(..) { + map.insert( + id_span.id_start(), + Value { + len: id_span.atom_len() as i32, + value, + }, + ); + } + + self.inner = Either::Left(map); + } + + /// Return (value, length) that starts at the given ID. + pub fn get(&self, target: ID) -> Option<(i32, usize)> { + let ans = match &self.inner { + Either::Left(map) => map.range(..=&target).last().and_then(|(entry_key, value)| { + if entry_key.peer != target.peer { + None + } else if entry_key.counter + value.len > target.counter { + Some(( + value.value + target.counter - entry_key.counter, + (entry_key.counter + value.len - target.counter) as usize, + )) + } else { + None + } + }), + Either::Right(vec) => vec + .iter() + .rev() + .find(|(id_span, _)| id_span.contains(target)) + .map(|(id_span, value)| { + ( + *value + target.counter - id_span.ctr_start(), + (id_span.ctr_end() - target.counter) as usize, + ) + }), + }; + ans + } + + /// Call `next` for each key-value pair that is in the given span. + /// It's guaranteed that the keys are in ascending order. 
+ pub fn get_values_in_span(&self, target: IdSpan, mut next: impl FnMut(IdSpan, i32)) { + let target_peer = target.client_id; + match &self.inner { + Either::Left(map) => { + let last = map + .range(..&target.id_start()) + .next_back() + .and_then(|(id, v)| { + if id.peer != target_peer { + None + } else if id.counter + v.len > target.ctr_start() { + Some((id, v)) + } else { + None + } + }); + + let iter = map.range(&target.id_start()..); + for (entry_key, value) in last.into_iter().chain(iter) { + if entry_key.peer > target_peer { + break; + } + + if entry_key.counter >= target.ctr_end() { + break; + } + + assert_eq!(entry_key.peer, target_peer); + let cur_span = &IdSpan::new( + target_peer, + entry_key.counter, + entry_key.counter + value.len, + ); + + let next_span = cur_span.get_intersection(&target).unwrap(); + (next)( + next_span, + value.value + next_span.counter.start - entry_key.counter, + ); + } + } + Either::Right(vec) => { + for (id_span, value) in vec.iter() { + if id_span.client_id < target_peer { + continue; + } + + if id_span.client_id > target_peer { + break; + } + + if target.ctr_start() >= id_span.ctr_end() { + continue; + } + + if target.ctr_end() <= id_span.counter.start { + break; + } + + assert_eq!(id_span.client_id, target_peer); + let next_span = id_span.get_intersection(&target).unwrap(); + (next)( + next_span, + *value + next_span.counter.start - id_span.counter.start, + ); + } + } + } + } + + /// If the given item has overlapped section with the content in the map, + /// split the item into pieces where each piece maps to a continuous series of values or maps to none. 
+ pub(crate) fn split<'a, T: HasIdSpan + generic_btree::rle::Sliceable + 'a>( + &'a self, + item: T, + ) -> impl Iterator + 'a { + let len = item.rle_len(); + let span = item.id_span(); + // PERF: we may avoid this alloc if get_values_in_span returns an iter + let mut ans = Vec::new(); + let mut ctr_start = span.ctr_start(); + let mut index = 0; + let ctr_end = span.ctr_end(); + self.get_values_in_span(span, |id_span: IdSpan, _| { + if id_span.counter.start == ctr_start && id_span.counter.end == ctr_end { + return; + } + + if id_span.counter.start > ctr_start { + ans.push( + item.slice( + index as usize..(index + id_span.counter.start - ctr_start) as usize, + ), + ); + index += id_span.counter.start - ctr_start; + } + + ans.push(item.slice( + index as usize..(index + id_span.counter.end - id_span.counter.start) as usize, + )); + index += id_span.counter.end - id_span.counter.start; + ctr_start = id_span.ctr_end(); + }); + + if ans.is_empty() && len > 0 { + ans.push(item); + } else if index as usize != len { + ans.push(item.slice(index as usize..len)); + } + + ans.into_iter() + } +} + +#[cfg(test)] +mod test { + use super::*; + + #[test] + fn test_basic() { + let mut map = IdIntMap::new(); + map.insert(IdSpan::new(0, 0, 10)); + map.insert(IdSpan::new(0, 10, 100)); + map.insert(IdSpan::new(1, 0, 100)); + map.insert(IdSpan::new(2, 0, 100)); + map.insert(IdSpan::new(999, 0, 100)); + assert!(map.inner.is_right()); + assert_eq!(map.get(ID::new(0, 10)).unwrap().0, 10); + assert_eq!(map.get(ID::new(1, 10)).unwrap().0, 110); + assert_eq!(map.get(ID::new(2, 10)).unwrap().0, 210); + assert_eq!(map.get(ID::new(0, 0)).unwrap().0, 0); + assert_eq!(map.get(ID::new(1, 0)).unwrap().0, 100); + assert_eq!(map.get(ID::new(2, 0)).unwrap().0, 200); + assert_eq!(map.get(ID::new(999, 99)).unwrap().0, 399); + + for i in 0..100 { + map.insert(IdSpan::new(3, i * 2, i * 2 + 1)); + } + + assert!(map.inner.is_left()); + assert_eq!(map.get(ID::new(0, 10)).unwrap().0, 10); + 
assert_eq!(map.get(ID::new(1, 10)).unwrap().0, 110); + assert_eq!(map.get(ID::new(2, 10)).unwrap().0, 210); + assert_eq!(map.get(ID::new(0, 0)).unwrap().0, 0); + assert_eq!(map.get(ID::new(1, 0)).unwrap().0, 100); + assert_eq!(map.get(ID::new(2, 0)).unwrap().0, 200); + assert_eq!(map.get(ID::new(999, 99)).unwrap().0, 399); + for i in 0..100 { + assert_eq!(map.get(ID::new(3, i * 2)).unwrap().0, i + 400, "i = {i}"); + } + + let mut called = 0; + map.get_values_in_span(IdSpan::new(0, 3, 66), |id_span, value| { + called += 1; + assert_eq!(id_span, IdSpan::new(0, 3, 66)); + assert_eq!(value, 3); + }); + assert_eq!(called, 1); + + let mut called = Vec::new(); + map.get_values_in_span(IdSpan::new(3, 0, 10), |id_span, value| { + called.push((id_span, value)); + }); + assert_eq!( + called, + vec![ + (IdSpan::new(3, 0, 1), 400), + (IdSpan::new(3, 2, 3), 401), + (IdSpan::new(3, 4, 5), 402), + (IdSpan::new(3, 6, 7), 403), + (IdSpan::new(3, 8, 9), 404), + ] + ); + } + + #[test] + fn test_get_values() { + let mut map = IdIntMap::new(); + map.insert(IdSpan::new(0, 3, 5)); + map.insert(IdSpan::new(0, 0, 1)); + map.insert(IdSpan::new(0, 2, 3)); + + let mut called = Vec::new(); + map.get_values_in_span(IdSpan::new(0, 0, 10), |id_span, value| { + called.push((id_span, value)); + }); + assert_eq!( + called, + vec![ + (IdSpan::new(0, 0, 1), 2), + (IdSpan::new(0, 2, 3), 3), + (IdSpan::new(0, 3, 5), 0), + ] + ); + } +} diff --git a/crates/loro-internal/src/utils/lazy.rs b/crates/loro-internal/src/utils/lazy.rs index 698e4c49..1772e79a 100644 --- a/crates/loro-internal/src/utils/lazy.rs +++ b/crates/loro-internal/src/utils/lazy.rs @@ -5,10 +5,6 @@ pub enum LazyLoad> { } impl> LazyLoad { - pub fn new(src: Src) -> Self { - LazyLoad::Src(src) - } - pub fn new_dst(dst: Dst) -> Self { LazyLoad::Dst(dst) } diff --git a/crates/loro-internal/src/utils/mod.rs b/crates/loro-internal/src/utils/mod.rs index 21266962..9b4d6a49 100644 --- a/crates/loro-internal/src/utils/mod.rs +++ 
b/crates/loro-internal/src/utils/mod.rs @@ -1,4 +1,5 @@ -pub(crate) mod bitmap; +pub(crate) mod delta_rle_encoded_num; +pub(crate) mod id_int_map; pub(crate) mod lazy; pub mod string_slice; pub(crate) mod utf16; diff --git a/crates/loro-internal/src/version.rs b/crates/loro-internal/src/version.rs index ec314a80..c8327361 100644 --- a/crates/loro-internal/src/version.rs +++ b/crates/loro-internal/src/version.rs @@ -1,4 +1,4 @@ -use loro_common::IdSpanVector; +use loro_common::{HasCounter, HasCounterSpan, IdSpanVector}; use smallvec::smallvec; use std::{ cmp::Ordering, @@ -15,7 +15,7 @@ use crate::{ change::Lamport, id::{Counter, ID}, oplog::AppDag, - span::{CounterSpan, HasIdSpan, IdSpan}, + span::{CounterSpan, IdSpan}, LoroError, PeerID, }; @@ -134,6 +134,11 @@ impl Frontiers { pub fn filter_peer(&mut self, peer: PeerID) { self.retain(|id| id.peer != peer); } + + #[inline] + pub(crate) fn with_capacity(cap: usize) -> Frontiers { + Self(SmallVec::with_capacity(cap)) + } } impl Deref for Frontiers { @@ -606,13 +611,13 @@ impl VersionVector { false } - pub fn intersect_span(&self, target: &S) -> Option { - let id = target.id_start(); - if let Some(end) = self.get(&id.peer) { - if *end > id.counter { + pub fn intersect_span(&self, target: IdSpan) -> Option { + if let Some(&end) = self.get(&target.client_id) { + if end > target.ctr_start() { + let count_end = target.ctr_end(); return Some(CounterSpan { - start: id.counter, - end: *end, + start: target.ctr_start(), + end: end.min(count_end), }); } } diff --git a/crates/loro-internal/tests/autocommit.rs b/crates/loro-internal/tests/autocommit.rs index a8f1b247..d9dcb766 100644 --- a/crates/loro-internal/tests/autocommit.rs +++ b/crates/loro-internal/tests/autocommit.rs @@ -5,6 +5,7 @@ use serde_json::json; #[test] fn auto_commit() { let mut doc_a = LoroDoc::default(); + doc_a.set_peer_id(1).unwrap(); doc_a.start_auto_commit(); let text_a = doc_a.get_text("text"); text_a.insert(0, "hello").unwrap(); @@ -14,6 +15,7 @@ fn 
auto_commit() { let mut doc_b = LoroDoc::default(); doc_b.start_auto_commit(); + doc_b.set_peer_id(2).unwrap(); let text_b = doc_b.get_text("text"); text_b.insert(0, "100").unwrap(); doc_b.import(&bytes).unwrap(); diff --git a/crates/loro-preload/src/encode.rs b/crates/loro-preload/src/encode.rs index 02db89bb..864dbd57 100644 --- a/crates/loro-preload/src/encode.rs +++ b/crates/loro-preload/src/encode.rs @@ -132,10 +132,20 @@ impl<'a> EncodedAppState<'a> { #[derive(Debug, Clone, Serialize, Deserialize)] pub enum EncodedContainerState<'a> { Map(Vec), - List(Vec), + List { + elem_idx: Vec, + elem_ids: Vec, + }, #[serde(borrow)] Richtext(Box>), - Tree((Vec<(usize, Option)>, Vec)), + Tree((Vec, Vec)), +} + +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct EncodedTreeNode { + pub node_idx: usize, + pub parent: Option, + pub id: ID, } #[derive(Debug, Default, Clone, Serialize, Deserialize)] @@ -148,6 +158,7 @@ pub struct EncodedRichtextState<'a> { /// This is encoded [TextRanges] #[serde(borrow)] pub text_bytes: Cow<'a, [u8]>, + pub ids: Vec<(u32, u32)>, /// Style anchor index in the style arena // TODO: can be optimized pub styles: Vec, @@ -174,9 +185,7 @@ pub struct TextRanges { impl TextRanges { #[inline] pub fn decode_iter(bytes: &[u8]) -> LoroResult + '_> { - let iter = serde_columnar::iter_from_bytes::(bytes).map_err(|e| { - LoroError::DecodeError(format!("Failed to decode TextRange: {}", e).into_boxed_str()) - })?; + let iter = serde_columnar::iter_from_bytes::(bytes)?; Ok(iter.ranges) } @@ -190,7 +199,7 @@ impl<'a> EncodedContainerState<'a> { pub fn container_type(&self) -> loro_common::ContainerType { match self { EncodedContainerState::Map(_) => loro_common::ContainerType::Map, - EncodedContainerState::List(_) => loro_common::ContainerType::List, + EncodedContainerState::List { .. } => loro_common::ContainerType::List, EncodedContainerState::Tree(_) => loro_common::ContainerType::Tree, EncodedContainerState::Richtext { .. 
} => loro_common::ContainerType::Text, } diff --git a/crates/loro-wasm/CHANGELOG.md b/crates/loro-wasm/CHANGELOG.md index 924f9d03..e7bd0825 100644 --- a/crates/loro-wasm/CHANGELOG.md +++ b/crates/loro-wasm/CHANGELOG.md @@ -1,5 +1,35 @@ # Changelog +## 0.7.2-alpha.4 + +### Patch Changes + +- Fix encoding value err + +## 0.7.2-alpha.3 + +### Patch Changes + +- Fix export compressed snapshot + +## 0.7.2-alpha.2 + +### Patch Changes + +- Add compressed method + +## 0.7.2-alpha.1 + +### Patch Changes + +- Fix v0 exports + +## 0.7.2-alpha.0 + +### Patch Changes + +- Add experimental encode methods + ## 0.7.1 ### Patch Changes diff --git a/crates/loro-wasm/deno.lock b/crates/loro-wasm/deno.lock index 50e259e4..67ad4605 100644 --- a/crates/loro-wasm/deno.lock +++ b/crates/loro-wasm/deno.lock @@ -15,7 +15,6 @@ "https://deno.land/std@0.105.0/path/posix.ts": "b81974c768d298f8dcd2c720229639b3803ca4a241fa9a355c762fa2bc5ef0c1", "https://deno.land/std@0.105.0/path/separator.ts": "8fdcf289b1b76fd726a508f57d3370ca029ae6976fcde5044007f062e643ff1c", "https://deno.land/std@0.105.0/path/win32.ts": "f4a3d4a3f2c9fe894da046d5eac48b5e789a0ebec5152b2c0985efe96a9f7ae1", - "https://deno.land/x/dirname@1.1.2/mod.ts": "4029ca6b49da58d262d65f826ba9b3a89cc0b92a94c7220d5feb7bd34e498a54", "https://deno.land/x/dirname@1.1.2/types.ts": "c1ed1667545bc4b1d69bdb2fc26a5fa8edae3a56e3081209c16a408a322a2319", "https://lra6z45nakk5lnu3yjchp7tftsdnwwikwr65ocha5eojfnlgu4sa.arweave.net/XEHs860CldW2m8JEd_5lnIbbWQq0fdcI4OkckrVmpyQ/_util/assert.ts": "e1f76e77c5ccb5a8e0dbbbe6cce3a56d2556c8cb5a9a8802fc9565af72462149", "https://lra6z45nakk5lnu3yjchp7tftsdnwwikwr65ocha5eojfnlgu4sa.arweave.net/XEHs860CldW2m8JEd_5lnIbbWQq0fdcI4OkckrVmpyQ/path/_constants.ts": "aba480c4a2c098b6374fdd5951fea13ecc8aaaf8b8aa4dae1871baa50243d676", diff --git a/crates/loro-wasm/package.json b/crates/loro-wasm/package.json index c7d2981c..786696f5 100644 --- a/crates/loro-wasm/package.json +++ b/crates/loro-wasm/package.json @@ -1,6 +1,6 @@ { 
"name": "loro-wasm", - "version": "0.7.1", + "version": "0.7.2-alpha.4", "description": "Loro CRDTs is a high-performance CRDT framework that makes your app state synchronized, collaborative and maintainable effortlessly.", "keywords": [ "crdt", diff --git a/crates/loro-wasm/src/lib.rs b/crates/loro-wasm/src/lib.rs index 1d19f141..6beb1d0c 100644 --- a/crates/loro-wasm/src/lib.rs +++ b/crates/loro-wasm/src/lib.rs @@ -863,11 +863,11 @@ impl Loro { let value: JsValue = vv.into(); let is_bytes = value.is_instance_of::(); let vv = if is_bytes { - let bytes = js_sys::Uint8Array::try_from(value.clone()).unwrap_throw(); + let bytes = js_sys::Uint8Array::from(value.clone()); let bytes = bytes.to_vec(); VersionVector::decode(&bytes)? } else { - let map = js_sys::Map::try_from(value).unwrap_throw(); + let map = js_sys::Map::from(value); js_map_to_vv(map)? }; @@ -1074,7 +1074,7 @@ impl LoroText { /// ``` pub fn mark(&self, range: JsRange, key: &str, value: JsValue) -> Result<(), JsError> { let range: MarkRange = serde_wasm_bindgen::from_value(range.into())?; - let value: LoroValue = LoroValue::try_from(value)?; + let value: LoroValue = LoroValue::from(value); let expand = range .expand .map(|x| { @@ -2081,7 +2081,7 @@ pub fn to_readable_version(version: &[u8]) -> Result Result, JsValue> { let map: JsValue = version.into(); - let map: js_sys::Map = map.try_into().unwrap_throw(); + let map: js_sys::Map = map.into(); let vv = js_map_to_vv(map)?; let encoded = vv.encode(); Ok(encoded) @@ -2153,6 +2153,8 @@ export type TreeID = `${number}@${string}`; interface Loro { exportFrom(version?: Uint8Array): Uint8Array; + exportFromV0(version?: Uint8Array): Uint8Array; + exportFromCompressed(version?: Uint8Array): Uint8Array; getContainerById(id: ContainerID): LoroText | LoroMap | LoroList; } /** diff --git a/crates/loro/src/lib.rs b/crates/loro/src/lib.rs index 2203793a..fb2d9ae5 100644 --- a/crates/loro/src/lib.rs +++ b/crates/loro/src/lib.rs @@ -22,6 +22,7 @@ pub use 
loro_internal::obs::SubID; pub use loro_internal::obs::Subscriber; pub use loro_internal::version::{Frontiers, VersionVector}; pub use loro_internal::DiffEvent; +pub use loro_internal::{loro_value, to_value}; pub use loro_internal::{LoroError, LoroResult, LoroValue, ToJson}; /// `LoroDoc` is the entry for the whole document. diff --git a/deno.lock b/deno.lock index a81388aa..4422bf51 100644 --- a/deno.lock +++ b/deno.lock @@ -53,7 +53,6 @@ "https://deno.land/x/cliui@v7.0.4-deno/build/lib/index.js": "fb6030c7b12602a4fca4d81de3ddafa301ba84fd9df73c53de6f3bdda7b482d5", "https://deno.land/x/cliui@v7.0.4-deno/build/lib/string-utils.js": "b3eb9d2e054a43a3064af17332fb1839a7dadb205c5371af4789616afb1a117f", "https://deno.land/x/cliui@v7.0.4-deno/deno.ts": "d07bc3338661f8011e3a5fd215061d17a52107a5383c29f40ce0c1ecb8bb8cc3", - "https://deno.land/x/dirname@1.1.2/mod.ts": "4029ca6b49da58d262d65f826ba9b3a89cc0b92a94c7220d5feb7bd34e498a54", "https://deno.land/x/dirname@1.1.2/types.ts": "c1ed1667545bc4b1d69bdb2fc26a5fa8edae3a56e3081209c16a408a322a2319", "https://deno.land/x/escalade@v3.0.3/sync.ts": "493bc66563292c5c10c4a75a467a5933f24dad67d74b0f5a87e7b988fe97c104", "https://deno.land/x/y18n@v5.0.0-deno/build/lib/index.d.ts": "11f40d97041eb271cc1a1c7b296c6e7a068d4843759575e7416f0d14ebf8239c", diff --git a/loro-js/CHANGELOG.md b/loro-js/CHANGELOG.md index e482f7b1..dcdad9ab 100644 --- a/loro-js/CHANGELOG.md +++ b/loro-js/CHANGELOG.md @@ -1,5 +1,45 @@ # Changelog +## 0.7.2-alpha.4 + +### Patch Changes + +- Fix encoding value err +- Updated dependencies + - loro-wasm@0.7.2 + +## 0.7.2-alpha.3 + +### Patch Changes + +- Fix export compressed snapshot +- Updated dependencies + - loro-wasm@0.7.2 + +## 0.7.2-alpha.2 + +### Patch Changes + +- Add compressed method +- Updated dependencies + - loro-wasm@0.7.2 + +## 0.7.2-alpha.1 + +### Patch Changes + +- Fix v0 exports +- Updated dependencies + - loro-wasm@0.7.2 + +## 0.7.2-alpha.0 + +### Patch Changes + +- Add experimental encode methods +- 
Updated dependencies + - loro-wasm@0.7.2 + ## 0.7.1 ### Patch Changes diff --git a/loro-js/package.json b/loro-js/package.json index bd7d6f39..8dac0bf4 100644 --- a/loro-js/package.json +++ b/loro-js/package.json @@ -1,6 +1,6 @@ { "name": "loro-crdt", - "version": "0.7.1", + "version": "0.7.2-alpha.4", "description": "Loro CRDTs is a high-performance CRDT framework that makes your app state synchronized, collaborative and maintainable effortlessly.", "keywords": [ "crdt",
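`DeltaRleEncodedNums` (added in `crates/loro-internal/src/utils/delta_rle_encoded_num.rs`) stores a `u32` column using serde_columnar's `DeltaRle` strategy. The sketch below shows the general delta-then-run-length idea using only std. It is an illustration, not serde_columnar's wire format: `delta_rle_encode` and `delta_rle_decode` are hypothetical names, and the `(delta, run_len)` pairs are kept in a `Vec` rather than serialized to bytes.

```rust
/// Hypothetical illustration of delta + run-length encoding (not serde_columnar's format).
/// Each number is first replaced by its difference from the previous one (starting from 0),
/// then equal consecutive deltas are collapsed into `(delta, run_len)` pairs.
pub fn delta_rle_encode(nums: &[u32]) -> Vec<(i64, usize)> {
    let mut runs: Vec<(i64, usize)> = Vec::new();
    let mut prev: i64 = 0;
    for &n in nums {
        // i64 so that decreasing sequences produce negative deltas without overflow.
        let delta = n as i64 - prev;
        prev = n as i64;
        if let Some(last) = runs.last_mut() {
            if last.0 == delta {
                // Same delta as the previous run: extend it.
                last.1 += 1;
                continue;
            }
        }
        runs.push((delta, 1));
    }
    runs
}

/// Inverse of `delta_rle_encode`: expand each run and accumulate the deltas.
pub fn delta_rle_decode(runs: &[(i64, usize)]) -> Vec<u32> {
    let mut out = Vec::new();
    let mut acc: i64 = 0;
    for &(delta, count) in runs {
        for _ in 0..count {
            acc += delta;
            out.push(acc as u32);
        }
    }
    out
}
```

For example, `[10, 11, 12, 13, 20, 20, 20]` has deltas `10, 1, 1, 1, 7, 0, 0` and collapses to the four runs `(10, 1), (1, 3), (7, 1), (0, 2)`. Counters and indices that grow by a constant step therefore cost roughly one run each, which is why the columnar encoding favors this strategy for ID-like columns.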
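`IdIntMap` (added in `crates/loro-internal/src/utils/id_int_map.rs`) maps spans of consecutive IDs to consecutive integers. It assigns values in insertion order and merges an inserted span into its predecessor when both the counters and the values are contiguous. The following is a simplified, std-only sketch of the `BTreeMap` branch of that structure. `SpanToIntMap` is a hypothetical name, a peer is modeled as a plain `u64`, and the patch's small-`Vec` fast path is omitted.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for the patch's `IdIntMap` (map branch only).
/// Key: (peer, start_counter). Value: (span_len, first_value).
#[derive(Default, Debug)]
pub struct SpanToIntMap {
    map: BTreeMap<(u64, i32), (i32, i32)>,
    next_value: i32,
}

impl SpanToIntMap {
    /// Assigns the next `end - start` integers to counters `start..end` of `peer`.
    pub fn insert(&mut self, peer: u64, start: i32, end: i32) {
        let len = end - start;
        let value = self.next_value;
        self.next_value += len;
        // Try to extend the entry that ends exactly where this span starts,
        // but only if its values are also contiguous with `value`.
        if let Some((&(p, s), (l, v))) = self.map.range_mut(..(peer, start)).next_back() {
            if p == peer && s + *l == start && *v + *l == value {
                *l += len;
                return;
            }
        }
        self.map.insert((peer, start), (len, value));
    }

    /// Returns `(value, remaining_len)` for counter `ctr` of `peer`, if it was inserted.
    pub fn get(&self, peer: u64, ctr: i32) -> Option<(i32, usize)> {
        let (&(p, s), &(l, v)) = self.map.range(..=(peer, ctr)).next_back()?;
        if p == peer && ctr < s + l {
            Some((v + ctr - s, (s + l - ctr) as usize))
        } else {
            None
        }
    }

    /// Number of stored (merged) entries.
    pub fn entry_count(&self) -> usize {
        self.map.len()
    }
}
```

Inserting `0@0..10` and then `0@10..100` leaves a single entry, because the second span continues both the counters and the values of the first. `get(1, 10)` after inserting `1@0..100` returns `Some((110, 90))`, which mirrors the expectation in the patch's `test_basic`.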
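In the tree changes above, `TreeStateNode` now records `last_move_op` for every node. `encode_snapshot` emits that one op per node, and `import_from_snapshot_ops` re-inserts `(parent, op id)` pairs before rebuilding the `deleted` set by walking up the parents (`get_is_deleted_by_query`). The sketch below mirrors that round trip with simplified types. `TreeSketch`, `MoveOp`, and the `DELETED_ROOT` sentinel are hypothetical, the tree is assumed to be acyclic, and ops are sorted only to make the output deterministic.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical sentinel parent standing in for the patch's `TreeID::delete_root()`.
pub const DELETED_ROOT: u32 = u32::MAX;

/// Hypothetical simplified move op: `target` is moved under `parent` (None = new root).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct MoveOp {
    pub op_id: u64,
    pub target: u32,
    pub parent: Option<u32>,
}

/// node -> (parent, id of the op that last moved the node)
#[derive(Default)]
pub struct TreeSketch {
    nodes: HashMap<u32, (Option<u32>, u64)>,
}

impl TreeSketch {
    pub fn apply(&mut self, op: MoveOp) {
        self.nodes.insert(op.target, (op.parent, op.op_id));
    }

    /// Snapshot = the last effective move op of every node (one op per node).
    pub fn snapshot_ops(&self) -> Vec<MoveOp> {
        let mut ops: Vec<MoveOp> = self
            .nodes
            .iter()
            .map(|(&target, &(parent, op_id))| MoveOp { op_id, target, parent })
            .collect();
        ops.sort_by_key(|op| op.op_id); // deterministic order only
        ops
    }

    /// Restore state directly from snapshot ops, without replaying full history.
    pub fn from_snapshot(ops: &[MoveOp]) -> Self {
        let mut tree = Self::default();
        for &op in ops {
            tree.apply(op);
        }
        tree
    }

    /// A node is deleted if walking up its ancestors reaches `DELETED_ROOT`.
    /// Assumes no cycles (the patch panics with "loop detected" in that case).
    pub fn is_deleted(&self, mut node: u32) -> bool {
        loop {
            match self.nodes.get(&node) {
                Some(&(Some(p), _)) if p == DELETED_ROOT => return true,
                Some(&(Some(p), _)) => node = p,
                _ => return false,
            }
        }
    }

    pub fn deleted_set(&self) -> HashSet<u32> {
        self.nodes.keys().copied().filter(|&n| self.is_deleted(n)).collect()
    }
}
```

Because only the latest move of each node survives into the snapshot, a node that moved many times still costs one op. Descendants of a deleted node are recovered as deleted by the ancestor walk rather than being stored explicitly.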
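The `VersionVector::intersect_span` change in `crates/loro-internal/src/version.rs` clamps the result to the queried span. Previously the returned `CounterSpan` ended at the version vector's end for that peer, which could extend past the span being asked about. The following is a minimal sketch of the corrected semantics. It models the version vector as a hypothetical `HashMap<u64, i32>` (peer to exclusive end counter) and the span as a `Range<i32>`, not the crate's real types.

```rust
use std::collections::HashMap;
use std::ops::Range;

/// Hypothetical model: `vv[peer]` is the exclusive end counter known for `peer`.
/// Returns the part of `span` (for `peer`) that the version vector already covers.
pub fn intersect_span(vv: &HashMap<u64, i32>, peer: u64, span: Range<i32>) -> Option<Range<i32>> {
    let &end = vv.get(&peer)?;
    if end > span.start {
        // Clamp to the span's own end; without `min` the result could overshoot the span.
        Some(span.start..end.min(span.end))
    } else {
        None
    }
}
```

With `vv = {1: 100}`, the span `90..120` intersects as `90..100`, while `10..20` is returned unchanged. Before the fix, the latter query would have produced `10..100`.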