-
Notifications
You must be signed in to change notification settings - Fork 323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
conflicts: escape conflict markers by making them longer #4969
base: main
Are you sure you want to change the base?
Conversation
c547024
to
d62d14c
Compare
Perhaps this should be tested by correctly parsing its own source code. |
Surprisingly, the source code for conflicts.rs doesn't actually contain any lines that look like conflict markers! The only two files that have lines which can be confused for conflict markers currently are tutorial.md and conflicts.md it looks like. |
These changes make the code a bit more readable, and they will make it easier to have conflict markers of different lengths in the next commit.
d62d14c
to
6446b3d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really sure it's necessary to disable the long markers for Git-style conflicts. @ilyagr had brought this up here: #4897 (comment). In my opinion, it's actually simpler if we just allow long markers for all conflict marker styles. Git technically does support longer conflict markers as well through gitattributes, and it seems like some merge drivers could support an argument for conflict marker size (using %L
in Git). Instead of disallowing long markers for Git, we could maybe add a "$markerLength"
argument to be used by jj resolve
merge tools so that they produce conflicts that we can parse? But that would be a separate feature after this PR gets merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really sure it's necessary to disable the long markers for Git-style conflicts.
I wouldn't make it conditional, and simply let all styles follow the same rule.
If a file contains lines which look like conflict markers, then we need to make the real conflict markers longer so that the materialized conflicts can be parsed unambiguously. When parsing the conflict, we require that the conflict markers in the file are at least as long as the materialized conflict markers, and we only parse the longest conflict markers in the file. For instance, if we have a file explaining the differences between Jujutsu's conflict markers and Git's conflict markers, it could produce a conflict with long markers like this: ``` <<<<<<<<<<< Conflict 1 of 1 %%%%%%%%%%% Changes from base to side martinvonz#1 Jujutsu uses different conflict markers than Git, which just shows the -sides of a conflict without a diff. +sides of a conflict without a diff: + +<<<<<<< +left +||||||| +base +======= +right +>>>>>>> +++++++++++ Contents of side martinvonz#2 Jujutsu uses different conflict markers than Git: <<<<<<< %%%%%%% -base +left +++++++ right >>>>>>> >>>>>>>>>>> Conflict 1 of 1 ends ```
Git materializes conflicts with 7-character markers even if the file already contains conflict markers, so it doesn't seem necessary to support generating Git-style conflicts with markers longer than 7 characters. This also means that files which use "=======" as a header (e.g. some titles in Markdown) won't require long conflict markers.
6446b3d
to
9f0be3b
Compare
lib/src/conflicts.rs
Outdated
) -> io::Result<()> { | ||
let conflict_marker: BString = iter::repeat(kind.to_byte()) | ||
.take(CONFLICT_MARKER_LEN) | ||
.collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: vec![kind.to_byte(); CONFLICT_MARKER_LEN]
also works
hunks.push(Merge::resolved(BString::from(resolved_slice))); | ||
} | ||
Some(ConflictMarkerKind::ConflictEnd) => { | ||
if let Some(conflict_start_index) = conflict_start { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: you can do conflict_start.take()
here to nullify it.
/// N, then the conflict markers we add will be of length (N + increment). This | ||
/// number is chosen to make the conflict markers noticeably longer than the | ||
/// existing markers. | ||
pub const CONFLICT_MARKER_LEN_INCREMENT: usize = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: doesn't have to be pub
?
let conflict_marker: BString = iter::repeat(kind.to_byte()) | ||
.take(CONFLICT_MARKER_LEN) | ||
.collect(); | ||
debug_assert!(len >= MIN_CONFLICT_MARKER_LEN); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I would make it assert!()
, or remove the assertion. This expression is cheap. It's harmless to call this function with len < MIN_CONFLICT_MARKER_LEN
.
debug_assert!(expected_len >= MIN_CONFLICT_MARKER_LEN); | ||
|
||
parse_conflict_marker_any_len(line) | ||
.filter(|marker| marker.len == expected_len) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we need to accept marker.len >= expected_len
?
Suppose working-copy content was materialized with N = 10
, and user removed a literal "<<<<<<<<<" (N = 9) without resolving any other conflicts, the new expected_len
would drop. iirc, the working-copy content isn't updated to use the new expected_len
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what I had originally, but I think it's too permissive actually. If we just had a minimum, and didn't enforce an exact number, then this could happen:
- Conflict materialized with length 12 markers since file contains length 8 marker
- User deletes length 8 marker
- Conflict is parsed expecting length >=12 markers successfully
- User adds length 8 marker back, since they made a mistake while editing
- Conflict is parsed expecting length >=7 markers now, so both the length 8 and length 12 markers are parsed, even though the length 8 marker isn't a real marker
To solve this, I made it so that only the longest length markers in a file are parsed. So the example would look like this:
- Conflict materialized with length 12 markers since file contains length 8 marker
- User deletes length 8 marker
- Conflict is parsed expecting length 12 markers successfully
- User adds length 8 marker back, since they made a mistake while editing
- Conflict is parsed expecting length 12 markers (since 12 is the longest marker present, and 12 >= 7), so the length 8 markers are correctly ignored
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, we can't know whether the re-added marker is a literal or conflict, right? My feeling is that it's safer to assume that a parsable marker is a conflict rather than invalidating any other conflicts by single addition of marker-looking line.
If we prefer stricter behavior, we'll probably need to store the materialized marker length to TreeState
on checkout.
.max() | ||
.unwrap_or_default(); | ||
|
||
if max_existing_marker_len < MIN_CONFLICT_MARKER_LEN { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: perhaps, this should be max(max_existing_marker_len + INCREMENT, MIN_LEN)
?
If existing literal were <<<<<<
, marker of the length 7 would be hard to distinguish.
MergeResult::Conflict(hunks) => { | ||
materialize_conflict_hunks(hunks, conflict_marker_style, output) | ||
let conflict_marker_len = choose_conflict_marker_len(single_hunk); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I would move this to caller, and add a local function that materializes conflicts with the given marker length.
Most external callers aren't interested in the marker length.
// considered longer than necessary the next time the file is snapshotted. | ||
let expected_marker_len = materialized_marker_len | ||
.unwrap_or_else(|| choose_conflict_marker_len(&merge_hunk)) | ||
.max(max_content_marker_len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we trust the marker length of the new content
? Doesn't it mean any existing conflicts can be invalidated if user added long <<<<<<<<<<<<<
literal by mistake?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, they would be invalidated if the user adds a marker with length greater than or equal to the materialized marker length. I'm not sure we can work around that though, since there will always be some length of marker that's invalid for them to add.
My goal was to make it so that if a file initially had fake markers of length N when it was materialized, then the user can always add/remove as many markers as they want with length <= N without breaking parsing, even if multiple snapshot operations happen during the process. I think it makes sense that adding markers of length > N could interfere with conflict resolution, since it makes it ambiguous whether the marker is a real one or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not really sure it's necessary to disable the long markers for Git-style conflicts.
I wouldn't make it conditional, and simply let all styles follow the same rule.
Related to #3975 and #4088.
Checklist
If applicable:
CHANGELOG.md