mirrors/jj

mirror of https://github.com/martinvonz/jj.git synced 2025-02-07 13:00:08 +00:00

Author	SHA1	Message	Date
Martin von Zweigbergk	5b18e89a4d	diff: fix LCS when a line/word/byte has been moved later	2021-04-28 23:33:18 -07:00
Martin von Zweigbergk	102f7a0416	diff: also recurse into final region after unchanged regions See test case for details. Before: test bench_diff_10k_lines_reversed ... bench: 36,249,659 ns/iter (+/- 174,455) test bench_diff_10k_modified_lines ... bench: 37,258,890 ns/iter (+/- 803,963) test bench_diff_10k_unchanged_lines ... bench: 4,252 ns/iter (+/- 69) test bench_diff_1k_lines_reversed ... bench: 982,834 ns/iter (+/- 6,467) test bench_diff_1k_modified_lines ... bench: 3,343,469 ns/iter (+/- 23,243) test bench_diff_1k_unchanged_lines ... bench: 231 ns/iter (+/- 2) test bench_diff_git_git_read_tree_c ... bench: 95,559 ns/iter (+/- 816) After: test bench_diff_10k_lines_reversed ... bench: 36,186,715 ns/iter (+/- 196,903) test bench_diff_10k_modified_lines ... bench: 37,511,000 ns/iter (+/- 1,370,476) test bench_diff_10k_unchanged_lines ... bench: 3,099 ns/iter (+/- 8) test bench_diff_1k_lines_reversed ... bench: 986,010 ns/iter (+/- 11,565) test bench_diff_1k_modified_lines ... bench: 3,370,938 ns/iter (+/- 17,041) test bench_diff_1k_unchanged_lines ... bench: 230 ns/iter (+/- 2) test bench_diff_git_git_read_tree_c ... bench: 102,189 ns/iter (+/- 1,052) So this patch makes diffing even slower (but still easily fast enough for all cases I've run into in real life). There's probably a lot that can be done to make things faster, but the first priority is that the diffs are correct and easy to read.	2021-04-08 23:54:54 -07:00
Martin von Zweigbergk	5c10c93e64	diff: fix tests broken by the previous commit Sorry, I forgot to run the automated tests again :(	2021-04-07 11:00:04 -07:00
Martin von Zweigbergk	0dd000d236	diff: do final refinement at byte-level for non-word bytes This results in significantly more readable diffs on commits like `659393bec2` in this repo. Before: test bench_diff_10k_lines_reversed ... bench: 38,122,998 ns/iter (+/- 557,688) test bench_diff_10k_modified_lines ... bench: 32,556,563 ns/iter (+/- 548,114) test bench_diff_10k_unchanged_lines ... bench: 4,231 ns/iter (+/- 15) test bench_diff_1k_lines_reversed ... bench: 958,296 ns/iter (+/- 46,963) test bench_diff_1k_modified_lines ... bench: 3,014,723 ns/iter (+/- 15,830) test bench_diff_1k_unchanged_lines ... bench: 249 ns/iter (+/- 2) test bench_diff_git_git_read_tree_c ... bench: 78,599 ns/iter (+/- 1,079) After: test bench_diff_10k_lines_reversed ... bench: 38,289,493 ns/iter (+/- 413,712) test bench_diff_10k_modified_lines ... bench: 37,352,516 ns/iter (+/- 1,293,950) test bench_diff_10k_unchanged_lines ... bench: 4,238 ns/iter (+/- 13) test bench_diff_1k_lines_reversed ... bench: 967,253 ns/iter (+/- 8,506) test bench_diff_1k_modified_lines ... bench: 3,358,028 ns/iter (+/- 37,154) test bench_diff_1k_unchanged_lines ... bench: 233 ns/iter (+/- 1) test bench_diff_git_git_read_tree_c ... bench: 95,787 ns/iter (+/- 740) So the biggest slowdown is when there are modified lines.	2021-04-07 10:27:17 -07:00
Martin von Zweigbergk	d7395cc34a	diff: add copyright header	2021-04-06 21:26:37 -07:00
Martin von Zweigbergk	7e4e43f358	diff: first diff lines, then refine to words, producing better diffs The new diff algorithm produces pretty bad diffs in some cases, such as `cc4b1e9230` in this repo (the parent of this commit). I think the problem there is that many words are repeated over and over. Diffing first at the line level and then refining the diff of the changed ranges at the word level gives much better results. That's what this patch does. After this patch, `jj diff -r cc4b1e923091` looks pretty similar to the diff in GitHub's UI. I hope to get around to doing the same for the merge code soon. Impact on benchmarks: Before: test bench_diff_10k_lines_reversed ... bench: 42,647,532 ns/iter (+/- 765,347) test bench_diff_10k_modified_lines ... bench: 21,407,980 ns/iter (+/- 126,366) test bench_diff_10k_unchanged_lines ... bench: 4,235 ns/iter (+/- 16) test bench_diff_1k_lines_reversed ... bench: 1,190,483 ns/iter (+/- 7,192) test bench_diff_1k_modified_lines ... bench: 1,919,766 ns/iter (+/- 9,665) test bench_diff_1k_unchanged_lines ... bench: 231 ns/iter (+/- 1) test bench_diff_git_git_read_tree_c ... bench: 174,702 ns/iter (+/- 1,199) After: test bench_diff_10k_lines_reversed ... bench: 38,289,509 ns/iter (+/- 129,004) test bench_diff_10k_modified_lines ... bench: 33,140,659 ns/iter (+/- 3,989,339) test bench_diff_10k_unchanged_lines ... bench: 3,099 ns/iter (+/- 14) test bench_diff_1k_lines_reversed ... bench: 973,551 ns/iter (+/- 94,895) test bench_diff_1k_modified_lines ... bench: 3,033,818 ns/iter (+/- 29,513) test bench_diff_1k_unchanged_lines ... bench: 230 ns/iter (+/- 1) test bench_diff_git_git_read_tree_c ... bench: 79,100 ns/iter (+/- 963) So most of them get slower, as expected. The last one, taken from a real diff in the git.git repo, gets faster, however (which is also what I would have expected).	2021-04-04 21:50:31 -07:00
Martin von Zweigbergk	3c35dbace6	merge: use new diff algorithm for finding sync regions With the histogram diff code from the previous patch, we can now start using that for finding the "sync regions" in 3-way merge. That helps a lot with the slow merging we had before this patch. `jj diff -r 9d540e9726` in the git.git repo drops from 22 s to 0.15 s with this patch. (That commit is a rather arbitrary merge commit from around 5 years ago.) With the new diff algorithm, the output of `jj diff -r 9d540e9726` in git.git looks better if we find unchanged sync regions based on lines rather than on words, so that's what I'm using in this patch. That's a change compared to the LCS-based diff we used before this patch. I suspect the reason that finding sync regions based on words works worse now is not because of the change from LCS to histogram but because of the change in how we define a word. My goal right now is mostly to make it faster; I'll get back to refining the diff result later.	2021-03-31 22:16:19 -07:00
Martin von Zweigbergk	1e657c5331	diff: add a histogram(-like?) diff algorithm The current diff algorithm does a full LCS on the words of the texts, which is really slow. Diffing the working copy when e.g. `src/commands.py` has changes far apart takes seconds. This patch adds an implementation inspired by JGit's Histogram diff. I say "inspired" because I just didn't quite understand it :P In particular, I didn't understand what it does when it finds non-unique elements. I decided to line up the leading common elements on both sides of the merge. I don't know if that usually gives good enough results in practice. I'm sure this can still be optimized a lot, but this seems good enough as a start. There are also many things to improve about the quality of the diffs.	2021-03-31 22:15:36 -07:00

8 commits
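
Commit 7e4e43f358 above describes diffing first at the line level and then refining the changed ranges at the word level. The Rust sketch below illustrates that two-pass idea only; it is not jj's code. All names (`Hunk`, `diff_tokens`, `lines`, `words`, `diff_lines_then_words`) are hypothetical, the word tokenizer is a guess at what counts as a "word", and a plain quadratic LCS stands in for the histogram-like algorithm from 1e657c5331. The byte-level refinement from 0dd000d236 is not included.

```rust
/// One piece of a diff: text shared by both sides, or a replaced range.
/// (Hypothetical type for illustration; not taken from jj.)
#[derive(Debug, PartialEq)]
enum Hunk {
    Unchanged(String),
    Changed { left: String, right: String },
}

/// Appends unchanged text, merging it into a preceding unchanged hunk.
fn push_unchanged(hunks: &mut Vec<Hunk>, text: &str) {
    match hunks.last_mut() {
        Some(Hunk::Unchanged(s)) => s.push_str(text),
        _ => hunks.push(Hunk::Unchanged(text.to_string())),
    }
}

/// Appends changed text, merging it into a preceding changed hunk.
fn push_changed(hunks: &mut Vec<Hunk>, left: &str, right: &str) {
    match hunks.last_mut() {
        Some(Hunk::Changed { left: l, right: r }) => {
            l.push_str(left);
            r.push_str(right);
        }
        _ => hunks.push(Hunk::Changed {
            left: left.to_string(),
            right: right.to_string(),
        }),
    }
}

/// Plain dynamic-programming LCS diff over pre-split tokens.
/// Assumption: this stands in for the histogram-like algorithm and is
/// quadratic in time and memory.
fn diff_tokens(left: &[&str], right: &[&str]) -> Vec<Hunk> {
    let (n, m) = (left.len(), right.len());
    // lcs[i][j] = length of the LCS of left[i..] and right[j..]
    let mut lcs = vec![vec![0usize; m + 1]; n + 1];
    for i in (0..n).rev() {
        for j in (0..m).rev() {
            lcs[i][j] = if left[i] == right[j] {
                lcs[i + 1][j + 1] + 1
            } else {
                lcs[i + 1][j].max(lcs[i][j + 1])
            };
        }
    }
    let mut hunks = Vec::new();
    let (mut i, mut j) = (0, 0);
    while i < n || j < m {
        if i < n && j < m && left[i] == right[j] {
            push_unchanged(&mut hunks, left[i]);
            i += 1;
            j += 1;
        } else if j == m || (i < n && lcs[i + 1][j] >= lcs[i][j + 1]) {
            // Dropping left[i] keeps the LCS at least as long: a deletion.
            push_changed(&mut hunks, left[i], "");
            i += 1;
        } else {
            // Otherwise right[j] is an insertion.
            push_changed(&mut hunks, "", right[j]);
            j += 1;
        }
    }
    hunks
}

/// Splits text into lines, keeping each trailing '\n' with its line.
fn lines(text: &str) -> Vec<&str> {
    text.split_inclusive('\n').collect()
}

/// Splits text into runs of alphanumerics/underscores and single other chars.
fn words(text: &str) -> Vec<&str> {
    let mut tokens = Vec::new();
    let mut start: Option<usize> = None;
    for (idx, ch) in text.char_indices() {
        if ch.is_alphanumeric() || ch == '_' {
            if start.is_none() {
                start = Some(idx);
            }
        } else {
            if let Some(s) = start.take() {
                tokens.push(&text[s..idx]);
            }
            tokens.push(&text[idx..idx + ch.len_utf8()]);
        }
    }
    if let Some(s) = start {
        tokens.push(&text[s..]);
    }
    tokens
}

/// Diffs by line first, then re-diffs each changed range by word.
fn diff_lines_then_words(left: &str, right: &str) -> Vec<Hunk> {
    let mut out = Vec::new();
    for hunk in diff_tokens(&lines(left), &lines(right)) {
        match hunk {
            Hunk::Unchanged(text) => push_unchanged(&mut out, &text),
            Hunk::Changed { left, right } => {
                for refined in diff_tokens(&words(&left), &words(&right)) {
                    match refined {
                        Hunk::Unchanged(t) => push_unchanged(&mut out, &t),
                        Hunk::Changed { left, right } => push_changed(&mut out, &left, &right),
                    }
                }
            }
        }
    }
    out
}
```

For example, `diff_lines_then_words("a b c\nkeep\n", "a x c\nkeep\n")` reports the first line as changed in the line pass, and the word pass then narrows that to a single replaced word (`b` to `x`) surrounded by unchanged text. Restricting the word pass to changed line ranges is what keeps repeated words elsewhere in the file from being matched against each other, which is the problem the commit message attributes to word-only diffing.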