Commit graph

21 commits

Author SHA1 Message Date
Leon Zhao
2df2a52b05
feat: Stable JSON representation for history (#368)
---------

Co-authored-by: Zixuan Chen <remch183@outlook.com>
2024-06-07 13:18:30 +08:00
Zixuan Chen
f99bfd8e21
Refactor rm unused code (#328)
* chore: init

* fix: fuzz config

* refactor: rm unused code
2024-04-22 21:20:00 +08:00
Zixuan Chen
bc27a47531
feat: stabilizing encoding (#219)
This PR implements a new encoding schema that is more extensible and more compact. It is also simpler, adds less to the binary size, and takes less effort to maintain. It is inspired by the [Automerge Encoding Format](https://automerge.org/automerge-binary-format-spec/).

The main motivation is extensibility. When we integrate a new CRDT algorithm, we don’t want to make a breaking change to the encoding or keep multiple versions of the encoding schema in the code, as that would make our WASM size much larger. We need a stable and extensible encoding schema for our v1.0 version.

This PR also exposes the ops that compose the current container state. For example, you can now quickly query which operation a certain character came from. This behavior is required by the new snapshot encoding, so it’s included in this PR.

# Encoding Schema

## Header

The header has 22 bytes.

- (0-4 bytes) Magic Bytes: The encoding starts with `loro` as magic bytes.
- (4-20 bytes) Checksum: MD5 checksum of the encoded data, including the part of the header starting from the 20th byte. The checksum is encoded as a 16-byte array. The `checksum` and `magic bytes` fields are excluded when calculating the checksum.
- (20-22 bytes) Encoding Method (2 bytes, big endian): Multiple encoding methods are available for a specific encoding version.
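
The layout above can be sketched as a small parser. This is a minimal illustration rather than Loro's actual decoder: the names `Header`, `HeaderError`, and `parse_header` are hypothetical, and the MD5 verification is left out because the Rust standard library does not provide MD5.

```rust
/// Hypothetical parsed view of the 22-byte header described above.
#[derive(Debug, PartialEq)]
struct Header {
    /// Bytes 4..20: MD5 checksum stored as a 16-byte array (not verified here).
    checksum: [u8; 16],
    /// Bytes 20..22: encoding method, read as a big-endian u16.
    encode_mode: u16,
}

#[derive(Debug, PartialEq)]
enum HeaderError {
    TooShort,
    BadMagic,
}

const MAGIC: &[u8; 4] = b"loro";
const HEADER_LEN: usize = 22;

/// Splits `bytes` into the parsed header and the remaining body.
/// Per the description, the checksum covers everything from byte 20 onward;
/// checking it would need an MD5 implementation from an external crate.
fn parse_header(bytes: &[u8]) -> Result<(Header, &[u8]), HeaderError> {
    if bytes.len() < HEADER_LEN {
        return Err(HeaderError::TooShort);
    }
    if !bytes.starts_with(MAGIC) {
        return Err(HeaderError::BadMagic);
    }
    let mut checksum = [0u8; 16];
    checksum.copy_from_slice(&bytes[4..20]);
    let encode_mode = u16::from_be_bytes([bytes[20], bytes[21]]);
    Ok((Header { checksum, encode_mode }, &bytes[HEADER_LEN..]))
}
```

Calling `parse_header` on a buffer that starts with `loro`, followed by 16 checksum bytes and the two bytes `0x00 0x01`, would yield `encode_mode == 1` and a body slice that begins at byte 22.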

## Encode Mode: Updates

In this approach, only ops, specifically their historical record, are encoded, while document states are excluded.

Like Automerge's format, we employ columnar encoding for operations and changes.

Previously, operations were ordered by their Operation ID (OpId) before columnar encoding. However, sorting operations by their containers first enhances compression potential.
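
To illustrate why grouping by container can help, here is a minimal sketch that run-length encodes the container column of a list of ops, before and after grouping. The `Op` struct and the helper functions are hypothetical simplifications, and run-length encoding stands in for whatever column compression the real encoder applies.

```rust
/// Hypothetical, heavily simplified op: real ops carry much more data.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Op {
    container: u32, // index of the container the op targets
    counter: u32,   // position of the op in its original order
}

/// Run-length encodes one column as (value, run length) pairs.
fn rle(column: &[u32]) -> Vec<(u32, usize)> {
    let mut runs: Vec<(u32, usize)> = Vec::new();
    for &value in column {
        if let Some(last) = runs.last_mut() {
            if last.0 == value {
                last.1 += 1;
                continue;
            }
        }
        runs.push((value, 1));
    }
    runs
}

/// Returns the ops grouped by container. The sort is stable, so the
/// original order is preserved inside each container.
fn group_by_container(ops: &[Op]) -> Vec<Op> {
    let mut grouped = ops.to_vec();
    grouped.sort_by_key(|op| op.container);
    grouped
}

/// Extracts the container column from a list of ops.
fn container_column(ops: &[Op]) -> Vec<u32> {
    ops.iter().map(|op| op.container).collect()
}
```

With six ops that alternate between two containers, the ungrouped container column needs six runs, while the grouped column collapses into two runs of three.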

## Encode Mode: Snapshot

This mode simultaneously captures document state and historical data. Upon importing a snapshot into a new document, initialization occurs directly from the snapshot, bypassing the need for CRDT-based recalculations.

Unlike previous snapshot encoding methods, the current binary output in snapshot mode is compatible with the updates mode. This enhances the efficiency of importing snapshots into non-empty documents, where initialization via snapshot is infeasible. 

Additionally, when feasible, we leverage the sequence of operations to construct state snapshots. In CRDTs, deducing the specific ops constituting the current container state is feasible. These ops are tagged in relation to the container, facilitating direct state reconstruction from them. This approach, pioneered by Automerge, significantly improves compression efficiency.
2024-01-02 17:03:24 +08:00
Zixuan Chen
2ad7202e05
Feat-rust-api (#193) 2023-11-28 16:22:43 +08:00
Zixuan Chen
116190c817
chore: refine err msg 2023-11-08 14:18:01 +08:00
Zixuan Chen
5cac1ed092
Refactor: make changes mergeable (#146)
- Allow changes to be merged when possible. This makes realtime collaboration more efficient with Loro.
- Refactor the code to make modifications of changes in oplog in one place
- Optimize the diff calculation so that it doesn't have to go back to the beginning of the change.

Note: we still keep the invariant that dependency pointers in Loro always point to the last op in a change.
2023-11-03 21:40:34 +08:00
Zixuan Chen
d942e3d7a2
Feat: Peritext-like rich text support (#123)
* feat: richtext wip

* feat: add insert to style range map wip

* feat: richtext state

* fix: fix style state inserting and style map

* fix: tiny vec merge err

* fix: comment err

* refactor: use new generic-btree & refine impl

* feat: fugue tracker

* feat: tracker

* feat: tracker

* fix: fix a few err in impl

* feat: init richtext content state

* feat: refactor arena

* feat: extract anchor_type info out of style flag

* refactor: state apply op more efficiently
we can now reuse the repr in state and op

* fix: new clippy errors

* refactor: use state chunk as delta item

* refactor: use two op to insert style start and style end

* feat: diff calc

* feat: handler

* fix: tracker checkout err

* fix: pass basic richtext handler tests

* fix: pass handler basic marking tests

* fix: pass all peritext criteria

* feat: snapshot encoding for richtext init

* refactor: replace Text with Richtext

* refactor: rm text code

* fix: richtext checkout err

* refactor: diff of text and map

* refactor: del span

* refactor: event

* fix: fuzz err

* fix: pass all tests

* fix: fuzz err

* fix: list child cache err

* chore: rm debug code

* fix: encode enhanced err

* fix: encode enhanced

* fix: fix several richtext issue

* fix: richtext anchor err

* chore: rm debug code

* fix: richtext fuzz err

* feat: speedup text snapshot decode

* perf: optimize snapshot encoding

* perf: speed up decode & insert

* fix: fugue span merge err

* perf: speedup delete & id cursor map

* fix: fugue merge err

* chore: update utils

* perf: speedup text insert / del

* fix: cursor cache

* perf: reduce conversion by introducing InsertText

* perf: speed up by refined cursor cache

* chore: update gbtree dep

* refactor(wasm): use quill delta format

* chore: fix warnings
2023-10-29 14:02:13 +08:00
leeeon233
eb8a07641f fix: decode remove unknown 2023-09-12 15:25:45 +08:00
Zixuan Chen
be63db444e
Merge branch 'main' into feat-encode-enhance 2023-09-05 17:08:41 +08:00
Zixuan Chen
ca9325f5ed
perf: refine encode size (can be better) 2023-08-29 19:43:35 +08:00
Zixuan Chen
60201989ec
fix: speed up encode 2023-08-29 17:15:41 +08:00
Zixuan Chen
728002daf7
chore: add encode example to analysis perf 2023-08-29 15:19:01 +08:00
Zixuan Chen
5b6cc28f6b
chore: add encode example to analysis perf 2023-08-28 16:16:40 +08:00
Zixuan Chen
1e736df133
Refactor: rm legacy code (#97)
* refactor: rm legacy code

* chore: rm dead code

* refactor: mv refactored files outside

* refactor: rename files & methods

* chore: rm unused deps

* fix: compact bytes err

* chore: fix ci
2023-07-31 11:49:55 +08:00
leeeon233
117155cc54 perf: remove compress 2023-03-20 13:55:20 +08:00
leeeon233
4bb3ea8b1b feat: use the same api for container and temp container 2023-03-03 17:10:55 +08:00
leeeon233
46e2c5a960 feat: text transaction 2023-03-02 10:37:50 +08:00
leeeon233
3c9818ef82 feat: impl list map text transaction 2023-02-27 20:55:52 +08:00
leeeon233
e4189785ea fix: calculate lamport by deps 2023-02-18 18:03:05 +08:00
leeeon233
7ffac80215 fix: snapshot load diff 2023-02-16 11:23:14 +08:00
Zixuan Chen
18d32384a5 refactor: move loro-core to loro-internal 2023-01-16 20:08:43 +08:00
Renamed from crates/loro-core/examples/encoding.rs