* feat: delta rope support init
* perf: use generic-btree v0.9.0
* refactor: improve readability and maintainability
* fix: fix several issues with composing
* fix: a few more issues with composing deletions
* test: rich text
* fix: cover more edge cases
* refactor: use deltarope for list event
* refactor: replace text delta with DeltaRope
* fix: list fuzz err
* fix: safety issue on insert_many
* chore: refine impl of text delta
* refactor: use Replace instead of insert+del in DeltaItem (#330)
* refactor: use Replace instead of insert+del in DeltaItem
* fix: each deltaitem should have non-zero rle_len
Updated the generic-btree dependency to version 0.10.3, refactored the DeltaItem and DeltaRope implementations in loro-delta, and refined the compose implementation.
* fix: update generic-btree to fix the update leaf issue
* chore: lockfile
* chore: clippy fix
* refactor: make composing easier to understand
* refactor: simplify the impl of composing
This PR introduces support for retrieving and querying cursors.
## Motivation
Using "index" to denote cursor positions can be unstable, as positions may shift with document edits. To reliably represent a position or range within a document, it is more effective to leverage the unique ID of each item/character in a List CRDT or Text CRDT.
## Updating Cursors
Loro optimizes state metadata by not storing the IDs of deleted elements. This approach, while efficient, complicates tracking cursor positions, since they rely on these IDs for precise locations within the document. The solution recalculates positions by replaying the relevant history. To minimize the performance impact of history replay, the system updates the cursor info to reference only IDs of elements that are still present, thereby reducing the need for future replays.
Each position carries "Side" information, indicating whether the actual cursor position is on the left of, on the right of, or directly on the target ID.
Note: In JavaScript, the offset returned when querying a Stable Position is based on the UTF-16 index.
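As an illustration (plain JavaScript string indexing, not a Loro API), the snippet below shows why a UTF-16 offset can differ from a code-point index when the text contains characters outside the Basic Multilingual Plane:

```ts
// Assumed example for illustration: "😀" occupies two UTF-16 code units,
// so a position right before "a" is reported as offset 2, not 1.
const s = "😀a";
console.log(s.length);      // 3 UTF-16 code units (2 for the emoji + 1 for "a")
console.log([...s].length); // 2 Unicode code points
```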
## Example
```ts
const loro = new Loro();
const list = loro.getList("list");
list.insert(0, "a");
const pos0 = list.getStablePos(0);
list.insert(1, "b");
{
  const ans = loro.queryStablePos(pos0!);
  expect(ans.offset).toEqual(0);
  expect(ans.side).toEqual(0);
  expect(ans.update).toBeUndefined();
}
list.insert(0, "c");
{
  const ans = loro.queryStablePos(pos0!);
  expect(ans.offset).toEqual(1);
  expect(ans.side).toEqual(0);
  expect(ans.update).toBeUndefined();
}
list.delete(1, 1);
{
  const ans = loro.queryStablePos(pos0!);
  expect(ans.offset).toEqual(1);
  expect(ans.side).toEqual(-1);
  expect(ans.update).toBeDefined();
}
```
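When `update` is defined, the originally referenced element no longer exists. A usage sketch (continuing the example above, and assuming `update` carries a refreshed stable position of the same type) is to store it in place of the old position so later queries don't need to replay history:

```ts
// Hypothetical usage sketch: keep the stored cursor fresh to avoid future replays.
let cursor = pos0!;
const ans = loro.queryStablePos(cursor);
if (ans.update !== undefined) {
  cursor = ans.update; // assumed to be a refreshed stable position
}
```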
* refactor: encoding container id
* fix: container indexing when ops are merged in encoding
* chore: add compress encode size for draw example
* fix: do not need cids in encoding
* chore: change name containerIdx to containerType in encoding
* refactor: rm txn.abort and related undo behavior
* perf: simplify richtext state when there are no styles
* perf: reduce text cost when there is no style
* chore: refine logs
* perf: remove cid in states to reduce mem overhead
* refactor: reduce mem overhead by using a more compact mapvalue
* refactor: rm the box inside richtext state
This PR implements a new encode schema that is more extensible and more compact. It is also simpler, produces a smaller binary, and requires less maintenance effort. It is inspired by the [Automerge Encoding Format](https://automerge.org/automerge-binary-format-spec/).
The main motivation is extensibility. When we integrate a new CRDT algorithm, we don't want to make a breaking change to the encoding or keep multiple versions of the encoding schema in the code, as that would make our WASM size much larger. We need a stable and extensible encoding schema for our v1.0 version.
This PR also exposes the ops that compose the current container state. For example, you can now quickly query which operation a certain character comes from. This behavior is required by the new snapshot encoding, so it's included in this PR.
# Encoding Schema
## Header
The header has 22 bytes.
- (0-4 bytes) Magic Bytes: The encoding starts with `loro` as magic bytes.
- (4-20 bytes) Checksum: MD5 checksum of the encoded data, including the header from the 20th byte onward. The checksum is stored as a 16-byte array; the `magic bytes` and `checksum` fields themselves are excluded from the checksum calculation.
- (20-22 bytes) Encoding Method (2 bytes, big endian): Multiple encoding methods are available for a specific encoding version.
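The following is a minimal sketch (not the library's actual parser; it assumes a Node environment for MD5 via `node:crypto`) of validating this header on an exported byte array:

```ts
import { createHash } from "node:crypto";

// Sketch of validating the 22-byte header described above.
function parseHeader(bytes: Uint8Array): { mode: number } {
  if (bytes.length < 22) throw new Error("too short to contain a header");

  // Bytes 0-4: magic bytes "loro".
  const magic = new TextDecoder().decode(bytes.subarray(0, 4));
  if (magic !== "loro") throw new Error("not a Loro document");

  // Bytes 4-20: MD5 checksum over everything from the 20th byte onward.
  const stored = bytes.subarray(4, 20);
  const actual = createHash("md5").update(bytes.subarray(20)).digest();
  if (!stored.every((b, i) => b === actual[i])) {
    throw new Error("checksum mismatch");
  }

  // Bytes 20-22: encoding method, 2 bytes, big endian.
  const mode = (bytes[20] << 8) | bytes[21];
  return { mode };
}
```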
## Encode Mode: Updates
In this mode, only the ops (i.e., the historical record) are encoded; document states are excluded.
As in Automerge's format, we employ columnar encoding for operations and changes.
Previously, operations were ordered by their Operation ID (OpId) before columnar encoding. However, sorting operations by their containers first enhances the compression potential.
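As a rough illustration (with hypothetical op fields, not the encoder's real internal types), the reordering amounts to grouping ops by container and then by OpId, so that adjacent values in each column are more similar and compress better:

```ts
// Hypothetical op shape for illustration only.
interface Op {
  container: number; // index of the container this op targets
  peer: string;      // peer part of the OpId
  counter: number;   // counter part of the OpId
}

// Sort by container first, then by OpId, before columnar encoding.
function orderForColumnarEncoding(ops: Op[]): Op[] {
  return [...ops].sort(
    (a, b) =>
      a.container - b.container ||
      a.peer.localeCompare(b.peer) ||
      a.counter - b.counter,
  );
}
```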
## Encode Mode: Snapshot
This mode simultaneously captures document state and historical data. Upon importing a snapshot into a new document, initialization occurs directly from the snapshot, bypassing the need for CRDT-based recalculations.
Unlike previous snapshot encoding methods, the current binary output in snapshot mode is compatible with the updates mode. This enhances the efficiency of importing snapshots into non-empty documents, where initialization via snapshot is infeasible.
Additionally, when feasible, we leverage the sequence of operations to construct state snapshots. In CRDTs, it is possible to deduce the specific ops that constitute the current container state. These ops are tagged in relation to the container, allowing the state to be reconstructed directly from them. This approach, pioneered by Automerge, significantly improves compression efficiency.
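As a rough sketch (hypothetical shapes, not the real encoding types): if the snapshot records, for each container, the ops whose content makes up the current state in document order, that state can be rebuilt by concatenating their content, without replaying CRDT history:

```ts
// Hypothetical shape: an op (or op slice) tagged as part of the current state.
interface StateOp {
  content: string; // the still-visible content contributed by this op
}

// Rebuild a text container's state directly from its tagged ops.
function rebuildTextState(opsInDocOrder: StateOp[]): string {
  return opsInDocOrder.map((op) => op.content).join("");
}
```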
* fix: imported changes were not mergeable
The small encoding size is now supported in the example.
* fix: stupid err in richtext checkout
* fix: rle oplog encode err
- support pending changes
- start counters were wrong
* refactor: make internal and leaf use same type of cache
* refactor: add cache update
* test: add normalization to arb test
* test: fuzz
* fix: internal insert bug
* fix: missing utf16
* test: fix test sub overflow
* feat: use heapless for binary heap
* refactor: refine warning
* test: reduce test time
* perf: reduce computation when finding pos
* bench: fix ignore parse time in benching
* feat: make it compile in new sig (should be merged)
* fix: type err
* fix: fix type err
* fix: cache when merge & borrow
* refactor: simplify code
* fix: cumulated tree trait bug
* fix: a few fatal bugs (still buggy)
* fix: global tree trait
* refactor: rm an unused fn
* fix: insert at cursor bug
* fix: in cursor insert cache may be invalid
strip the checker there
* chore: remove needless check
* refactor: add inline to methods
* test: remove cfg=mem for mem example
* fix: type err