Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

conflicts: escape conflict markers by making them longer #4969

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

scott2000
Copy link
Collaborator

@scott2000 scott2000 commented Nov 25, 2024

Related to #3975 and #4088.

Checklist

If applicable:

  • I have updated CHANGELOG.md
  • I have updated the documentation (README.md, docs/, demos/)
  • I have updated the config schema (cli/src/config-schema.json)
  • I have added tests to cover my changes

@joyously
Copy link

Perhaps this should be tested by correctly parsing its own source code.

@scott2000
Copy link
Collaborator Author

Perhaps this should be tested by correctly parsing its own source code.

Surprisingly, the source code for conflicts.rs doesn't actually contain any lines that look like conflict markers! The only two files that have lines which can be confused for conflict markers currently are tutorial.md and conflicts.md it looks like.

These changes make the code a bit more readable, and they will make it
easier to have conflict markers of different lengths in the next commit.
@scott2000 scott2000 marked this pull request as ready for review November 26, 2024 15:46
Copy link
Collaborator Author

@scott2000 scott2000 Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really sure it's necessary to disable the long markers for Git-style conflicts. @ilyagr had brought this up here: #4897 (comment). In my opinion, it's actually simpler if we just allow long markers for all conflict marker styles. Git technically does support longer conflict markers as well through gitattributes, and it seems like some merge drivers could support an argument for conflict marker size (using %L in Git). Instead of disallowing long markers for Git, we could maybe add a "$markerLength" argument to be used by jj resolve merge tools so that they produce conflicts that we can parse? But that would be a separate feature after this PR gets merged.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really sure it's necessary to disable the long markers for Git-style conflicts.

I wouldn't make it conditional, and simply let all styles follow the same rule.

If a file contains lines which look like conflict markers, then we need
to make the real conflict markers longer so that the materialized
conflicts can be parsed unambiguously.

When parsing the conflict, we require that the conflict markers in the
file are at least as long as the materialized conflict markers, and we
only parse the longest conflict markers in the file.

For instance, if we have a file explaining the differences between
Jujutsu's conflict markers and Git's conflict markers, it could produce
a conflict with long markers like this:

```
<<<<<<<<<<< Conflict 1 of 1
%%%%%%%%%%% Changes from base to side martinvonz#1
 Jujutsu uses different conflict markers than Git, which just shows the
-sides of a conflict without a diff.
+sides of a conflict without a diff:
+
+<<<<<<<
+left
+|||||||
+base
+=======
+right
+>>>>>>>
+++++++++++ Contents of side martinvonz#2
Jujutsu uses different conflict markers than Git:

<<<<<<<
%%%%%%%
-base
+left
+++++++
right
>>>>>>>
>>>>>>>>>>> Conflict 1 of 1 ends
```
Git materializes conflicts with 7-character markers even if the file
already contains conflict markers, so it doesn't seem necessary to
support generating Git-style conflicts with markers longer than 7
characters. This also means that files which use "=======" as a header
(e.g. some titles in Markdown) won't require long conflict markers.
) -> io::Result<()> {
let conflict_marker: BString = iter::repeat(kind.to_byte())
.take(CONFLICT_MARKER_LEN)
.collect();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: vec![kind.to_byte(); CONFLICT_MARKER_LEN] also works

hunks.push(Merge::resolved(BString::from(resolved_slice)));
}
Some(ConflictMarkerKind::ConflictEnd) => {
if let Some(conflict_start_index) = conflict_start {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you can do conflict_start.take() here to nullify it.

/// N, then the conflict markers we add will be of length (N + increment). This
/// number is chosen to make the conflict markers noticeably longer than the
/// existing markers.
pub const CONFLICT_MARKER_LEN_INCREMENT: usize = 4;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: doesn't have to be pub?

let conflict_marker: BString = iter::repeat(kind.to_byte())
.take(CONFLICT_MARKER_LEN)
.collect();
debug_assert!(len >= MIN_CONFLICT_MARKER_LEN);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would make it assert!(), or remove the assertion. This expression is cheap. It's harmless to call this function with len < MIN_CONFLICT_MARKER_LEN.

debug_assert!(expected_len >= MIN_CONFLICT_MARKER_LEN);

parse_conflict_marker_any_len(line)
.filter(|marker| marker.len == expected_len)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we need to accept marker.len >= expected_len?

Suppose working-copy content was materialized with N = 10, and user removed a literal "<<<<<<<<<" (N = 9) without resolving any other conflicts, the new expected_len would drop. iirc, the working-copy content isn't updated to use the new expected_len.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I had originally, but I think it's too permissive actually. If we just had a minimum, and didn't enforce an exact number, then this could happen:

  1. Conflict materialized with length 12 markers since file contains length 8 marker
  2. User deletes length 8 marker
  3. Conflict is parsed expecting length >=12 markers successfully
  4. User adds length 8 marker back, since they made a mistake while editing
  5. Conflict is parsed expecting length >=7 markers now, so both the length 8 and length 12 markers are parsed, even though the length 8 marker isn't a real marker

To solve this, I made it so that only the longest length markers in a file are parsed. So the example would look like this:

  1. Conflict materialized with length 12 markers since file contains length 8 marker
  2. User deletes length 8 marker
  3. Conflict is parsed expecting length 12 markers successfully
  4. User adds length 8 marker back, since they made a mistake while editing
  5. Conflict is parsed expecting length 12 markers (since 12 is the longest marker present, and 12 >= 7), so the length 8 markers are correctly ignored

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, we can't know whether the re-added marker is a literal or conflict, right? My feeling is that it's safer to assume that a parsable marker is a conflict rather than invalidating any other conflicts by single addition of marker-looking line.

If we prefer stricter behavior, we'll probably need to store the materialized marker length to TreeState on checkout.

.max()
.unwrap_or_default();

if max_existing_marker_len < MIN_CONFLICT_MARKER_LEN {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: perhaps, this should be max(max_existing_marker_len + INCREMENT, MIN_LEN)?

If existing literal were <<<<<<, marker of the length 7 would be hard to distinguish.

MergeResult::Conflict(hunks) => {
materialize_conflict_hunks(hunks, conflict_marker_style, output)
let conflict_marker_len = choose_conflict_marker_len(single_hunk);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would move this to caller, and add a local function that materializes conflicts with the given marker length.

Most external callers aren't interested in the marker length.

// considered longer than necessary the next time the file is snapshotted.
let expected_marker_len = materialized_marker_len
.unwrap_or_else(|| choose_conflict_marker_len(&merge_hunk))
.max(max_content_marker_len);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we trust the marker length of the new content? Doesn't it mean any existing conflicts can be invalidated if user added long <<<<<<<<<<<<< literal by mistake?

Copy link
Collaborator Author

@scott2000 scott2000 Nov 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, they would be invalidated if the user adds a marker with length greater than or equal to the materialized marker length. I'm not sure we can work around that though, since there will always be some length of marker that's invalid for them to add.

My goal was to make it so that if a file initially had fake markers of length N when it was materialized, then the user can always add/remove as many markers as they want with length <= N without breaking parsing, even if multiple snapshot operations happen during the process. I think it makes sense that adding markers of length > N could interfere with conflict resolution, since it makes it ambiguous whether the marker is a real one or not.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really sure it's necessary to disable the long markers for Git-style conflicts.

I wouldn't make it conditional, and simply let all styles follow the same rule.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants