Desktop: Resolves #11600: OneNote imported notes have broken links when there are Chinese characters in the link #11602

Open
wants to merge 8 commits into
base: dev

@pedr (Collaborator) commented Jan 7, 2025

Should be merged after #11598
Resolves #11600

Summary

Links in notes imported from OneNote would sometimes break: when the note was rendered to HTML, the link's final delimiter was not found. I could never understand why that happened, but I think this is the reason.

First, how the renderer for rich text works. It has three parts:

  • the first is the text content that is going to be displayed
  • the second is the list of styles that are going to be applied
  • the third is a list of indices marking where each style's break point should be

Example:
OneNoteConverter:  Parts: [
    "Tips from a Pro: Using Trees for Dramatic Landscape Photography",
    ".one#Tips%20from%20a%20Pro%20Using%20Trees%20for%20Dramatic%20Landscape%20Photography&sec
    "风景",
    "\u{fddf}HYPERLINK \"onenote:https://d.docs.live.net/c8d3bbab7f1acf3a/Documents/Photograph
], Indices: [
    83,
    85,
    272,
], Styles: [
    ParagraphStyling {
        charset: Some(
            Ansi,
        ),
        bold: false,
        italic: false,
        underline: false,
        strikethrough: false,
        superscript: false,
        subscript: false,
        font: None,
        font_size: None,
        font_color: None,
        highlight: None,
        next_style: None,
        style_id: None,
        paragraph_alignment: None,
        paragraph_space_before: None,
        paragraph_space_after: None,
        paragraph_line_spacing_exact: None,
        language_code: Some(
            2052,
        ),
        math_formatting: false,
        hyperlink: true,
    },
    ParagraphStyling {
        charset: Some(
            Gb2312,
        ),
        bold: false,
        italic: false,
        underline: false,
        strikethrough: false,
        superscript: false,
        subscript: false,
        font: Some(
            "Microsoft YaHei",
        ),
        font_size: None,
        font_color: None,
        highlight: None,
        next_style: None,
        style_id: None,
        paragraph_alignment: None,
        paragraph_space_before: None,
        paragraph_space_after: None,
        paragraph_line_spacing_exact: None,
        language_code: Some(
            2052,
        ),
        math_formatting: false,
        hyperlink: true,
    },
    ParagraphStyling {
        charset: Some(
            Ansi,
        ),
        bold: false,
        italic: false,
        underline: false,
        strikethrough: false,
        superscript: false,
        subscript: false,
        font: None,
        font_size: None,
        font_color: None,
        highlight: None,
        next_style: None,
        style_id: None,
        paragraph_alignment: None,
        paragraph_space_before: None,
        paragraph_space_after: None,
        paragraph_line_spacing_exact: None,
        language_code: Some(
            2052,
        ),
        math_formatting: false,
        hyperlink: true,
    },
    ParagraphStyling {
        charset: Some(
            Ansi,
        ),
        bold: false,
        italic: false,
        underline: false,
        strikethrough: false,
        superscript: false,
        subscript: false,
        font: None,
        font_size: None,
        font_color: None,
        highlight: None,
        next_style: None,
        style_id: None,
        paragraph_alignment: None,
        paragraph_space_before: None,
        paragraph_space_after: None,
        paragraph_line_spacing_exact: None,
        language_code: Some(
            1033,
        ),
        math_formatting: false,
        hyperlink: true,
    },
]
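
The relationship between the text, the indices, and the parts in the dump above can be sketched roughly as follows. This is a minimal illustration, not the converter's actual code: `split_at_indices` is a hypothetical helper, and it assumes the indices count characters from the start of the text and are listed in ascending order.

```rust
/// Hypothetical helper (not part of the converter): splits `text` into
/// parts at the given break points, so that each part can later be paired
/// with one style.
///
/// Assumption: each index is a character offset from the start of the text.
fn split_at_indices(text: &str, indices: &[usize]) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    let mut parts: Vec<String> = Vec::with_capacity(indices.len() + 1);
    let mut start = 0;
    for &index in indices {
        // Keep the range valid even if an index is out of order or too large.
        let end = index.clamp(start, chars.len());
        parts.push(chars[start..end].iter().collect());
        start = end;
    }
    // Whatever follows the last break point forms the final part.
    parts.push(chars[start..].iter().collect());
    parts
}
```

With three indices this produces four parts, which matches the four parts and four styles in the dump. For instance, `split_at_indices("ab风景cd", &[2, 4])` yields `["ab", "风景", "cd"]`.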

The existing implementation did not account for this: it assumed that the indices and styles would always match, so content that should be rendered as (ignore the line breaks, which are included for readability):

<a href="onenote:https://d.docs.live.net/c8d3bbab7f1acf3a/Documents/Photography/风景
  .one#Tips%20from%20a%20Pro%20Using%20Trees%20for%20Dramatic%20Landscape%20Photography
  &section-id={262ADDFB-A4DC-4453-A239-0024D6769962}
  &page-id={88D803A5-4F43-48D4-9B16-4C024F5787DC}&end" style=""
>
  Tips from a Pro: Using Trees for Dramatic Landscape Photography
</a>

would instead be rendered as:

<a href="onenote:https://d.docs.live.net/c8d3bbab7f1acf3a/Documents/Photography/" style"">
  风景.one#Tips%20from%20a%20Pro%20Using%20Trees%20for%20Dramatic%20Landscape%20Photography
  &section-id={262ADDFB-A4DC-4453-A239-0024D6769962}
  &page-id={88D803A5-4F43-48D4-9B16-4C024F5787DC}
  &end"Tips from a Pro: Using Trees for Dramatic Landscape Photography
</a>

It is not entirely clear to me why that happens, but the example above shows that a different charset (Gb2312 rather than Ansi) is used for the two Chinese characters (between indices 83 and 85).

I don't know why that would be necessary, but this solution should fix the issue.
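
To illustrate the failure mode, here is a simplified sketch of one way to handle a link whose href is split across several styles. It is not necessarily what this PR does: `Style`, `merge_hyperlink_runs`, and `parse_hyperlink` are hypothetical names, and the field format `HYPERLINK "href"label` is inferred from the example above.

```rust
/// Hypothetical, reduced stand-in for the converter's style type.
#[derive(Debug, Clone, PartialEq)]
struct Style {
    hyperlink: bool,
}

/// Joins consecutive parts flagged as hyperlinks into a single run, so a
/// link split across several styles is seen as one piece of text.
fn merge_hyperlink_runs(parts: &[&str], styles: &[Style]) -> Vec<(String, bool)> {
    let mut merged: Vec<(String, bool)> = Vec::new();
    for (part, style) in parts.iter().zip(styles) {
        // Extend the previous run only if both it and this part are links.
        let extend = style.hyperlink && matches!(merged.last(), Some((_, true)));
        if extend {
            if let Some(last) = merged.last_mut() {
                last.0.push_str(part);
            }
        } else {
            merged.push((part.to_string(), style.hyperlink));
        }
    }
    merged
}

/// Extracts `(href, label)` from a run shaped like `HYPERLINK "href"label`.
/// Returns `None` when the closing quote is missing, which is what happens
/// when only the first part of a split link is inspected.
fn parse_hyperlink(run: &str) -> Option<(&str, &str)> {
    let rest = run.split_once("HYPERLINK \"")?.1;
    rest.split_once('"')
}
```

Calling `parse_hyperlink` on the first part alone returns `None`, because the closing quote lives in a later part; calling it on the merged run finds both the href and the label.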

Testing

I added an automated test case, which this PR is based on.

@pedr added the bug and import labels Jan 7, 2025
Development

Successfully merging this pull request may close these issues.

OneNote imported notes links can be broken if the href is broken in more than one style