Markdown parser updates #18

BrutuZ · 2023-04-28T01:29:41Z

Standalone URLs are parsed after the Markdown link format, which has the added benefit of not capturing the link's closing parenthesis at the end of the URL.
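A minimal sketch of why the order matters. It assumes simplified copies of the Markdown-link pattern and the bare-URL pattern discussed later in this thread, with `'` inside the character class. `markdown_first` and `bare_first` are hypothetical helper names, not functions from `markdown_parser.py`.

```python
import re

# Simplified copies of the two patterns discussed in this PR (assumption).
MD_LINK = re.compile(r"\[([\w\W]+?)\]\(([\w\W]+?)\)")
# ')' is a legal URL character, so it is in the class; the lookbehind skips
# URLs that sit directly after href=" in already-generated HTML.
BARE_URL = re.compile(r"(?<!href=\")(https?://[-a-zA-Z0-9._~:/?#@!$&'()*+,;=%]+)")


def markdown_first(text: str) -> str:
    # Convert [text](url) first, then linkify any remaining bare URLs.
    text = MD_LINK.sub(r'<a href="\2">\1</a>', text)
    return BARE_URL.sub(r'<a href="\1">\1</a>', text)


def bare_first(text: str) -> str:
    # Reversed order, shown only for comparison.
    text = BARE_URL.sub(r'<a href="\1">\1</a>', text)
    return MD_LINK.sub(r'<a href="\2">\1</a>', text)


src = "[example](https://example.com)"
print(markdown_first(src))  # <a href="https://example.com">example</a>
print(bare_first(src))      # the bare-URL match swallows the closing ')' first
```

Running the Markdown pass first turns the URL inside `(...)` into an `href="..."` value, so the lookbehind skips it and the closing parenthesis never reaches the bare-URL pattern.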

funkyhippo

Thanks for the contribution!

Let me know if I missed something regarding the regex, but thanks for tightening it up rather than hitting it with the ol' \w\W.

proxy/source/markdown_parser.py

BrutuZ · 2023-04-29T08:24:30Z

It shouldn't break in pure Python; I tested it in the console to make sure it was matching as expected, and it had no issues. Maybe escape the single quote too rather than removing it? It is a valid URL character, after all, and not that unusual, since it's often used to enclose values in parameters.

funkyhippo · 2023-04-29T09:03:26Z

Break may have been the wrong word. What's your expected input?

>>> import re
>>> def _parse_links(input_str: str) -> str:
...     input_str = re.sub(
...         r"\[([\w\W]+?)\]\(([\w\W]+?)\)",
...         r'<a href="\2">\1</a>',
...         input_str,
...         flags=re.MULTILINE,
...     )
...     return re.sub(
...         r"(?<!href=\")(https?:\/\/[-a-zA-Z0-9._~:/?#@!$&()*+,;=%]+')",
...         r'<a href="\1">\1</a>',
...         input_str,
...         flags=re.MULTILINE,
...     )
...
>>> test_str = """
... https://example.com
... [example](https://example.com)
... https://example.com'
... """
>>> print(_parse_links(test_str))

https://example.com
<a href="https://example.com">example</a>
<a href="https://example.com'">https://example.com'</a>

Did you intend to only support links that were suffixed with '? It's outside the character set.

BrutuZ · 2023-04-29T17:35:40Z

> Did you intend to only support links that were suffixed with '? It's outside the character set.

That... was an oversight from converting the pattern's enclosing quotes from single to double quotes in the browser editor 😅
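For illustration, here is the bare-URL pattern from the transcript above next to an assumed fix that simply moves `'` back inside the character class (the approach the later commit "move single-quote back into the set" names). `BUGGY`, `FIXED`, and `first_url` are hypothetical names used only for this comparison.

```python
import re

# Pattern as it appears in the transcript: the "'" sits *after* the class,
# so a literal trailing quote is required for any match at all.
BUGGY = re.compile(r"(?<!href=\")(https?:\/\/[-a-zA-Z0-9._~:/?#@!$&()*+,;=%]+')")
# Assumed fix: "'" moved inside the class, so it is just another URL character.
FIXED = re.compile(r"(?<!href=\")(https?://[-a-zA-Z0-9._~:/?#@!$&'()*+,;=%]+)")


def first_url(pattern: re.Pattern, text: str):
    """Return the first URL the pattern captures, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None


for text in ["https://example.com", "https://example.com/?q='a'"]:
    print(first_url(BUGGY, text), "|", first_url(FIXED, text))
```

The buggy pattern misses plain URLs entirely and truncates a URL at its first quote (`https://example.com/?q='`), while the fixed class keeps the whole `https://example.com/?q='a'`.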

BrutuZ · 2023-05-02T02:44:52Z

Added some other minor changes; it doesn't seem like I broke anything 😅

BrutuZ · 2023-06-01T01:00:32Z

Is anything else holding up this and #20?
I have the changes from both PRs (and a few more) running on a forked instance with no issues so far.

funkyhippo · 2023-06-03T07:59:09Z

No blockers at a glance -- sorry for the slow review, I've just been incredibly busy IRL.

I'll block out some time this weekend to take a closer look and/or merge.

funkyhippo

Sorry for the slow review -- I left some comments.

proxy/source/markdown_parser.py


BrutuZ · 2023-06-05T02:08:06Z

Should have addressed all notes 🤞

funkyhippo

Sorry for the long back-and-forth, I appreciate the iteration. A set of tests would be useful so that we can capture all the edge cases and expected behaviour.
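As a starting point for such tests, a sketch that pins down the current strong-emphasis behaviour, including the trailing-space gap raised in the review. It assumes a local copy of the merged pattern from the diff in this thread and a hypothetical `strong` helper; a real test would import the parser module, whose replacement string may differ.

```python
import re
import unittest

# Local copy of the merged strong-emphasis pattern from the diff (assumption:
# a real test would import it from proxy/source/markdown_parser.py instead).
STRONG = re.compile(r"(?<!\\)\*\*(\w.*?)(?<!\\)\*\*|(?<!\\)__(\w.*?)(?<!\\)__")


def strong(line: str) -> str:
    # Only one of the two groups participates in a given match.
    return STRONG.sub(lambda m: f"<strong>{m.group(1) or m.group(2)}</strong>", line)


class StrongEmphasisTest(unittest.TestCase):
    CASES = {
        "**asdf**": "<strong>asdf</strong>",
        "** asdf**": "** asdf**",               # leading space: not emphasis
        "**asdf **": "<strong>asdf </strong>",  # trailing space: known gap
        "** asdf **": "** asdf **",
        "__A__": "<strong>A</strong>",          # single character still works
        r"\**asdf**": r"\**asdf**",             # escaped opener is left alone
    }

    def test_strong_emphasis(self):
        for source, expected in self.CASES.items():
            with self.subTest(source=source):
                self.assertEqual(strong(source), expected)
```

A case table like this makes it cheap to add each new edge case as a single line, and `subTest` reports every failing input rather than stopping at the first.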

Going to merge this as-is, thanks so much for the contribution!

funkyhippo · 2023-06-05T03:26:48Z

proxy/source/markdown_parser.py



 def _parse_strong_emphasis(input_str: str) -> str:
    return "\n".join(
        re.sub(
-            r"(?:\*\*|\_\_)([\w]+?[\w\W]+?[\w]+?)(?:\*\*|\_\_)",
+            r"(?<!\\)\*\*(\w.*?)(?<!\\)\*\*|(?<!\\)__(\w.*?)(?<!\\)__",


Nice, I'm a fan of also handling the escaped character.

I think you're handling the first character being a space, but not the last?

>>> text = """
... **asdf**
... ** asdf**
... **asdf **
... ** asdf **
... """
>>> print(_parse_strong_emphasis(text))
<strong>asdf</strong>
** asdf**
<strong>asdf </strong>
** asdf **

It's fine, though; I'm not going to block on this (especially since the original implementation was flawed, as it only allowed \w, which excludes symbols as valid characters).

BrutuZ · 2023-06-05T03:46:56Z

I couldn't figure out an elegant way to filter out trailing spaces as well. Another \w at the end might handle it, but it would also make it impossible to format single characters, like **A** letter or _1_ digit. It turns out a space with the {0} quantifier doesn't work as a negative match (it simply matches the empty string) 🙁
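One possible approach, sketched here as an assumption rather than what the PR merged: instead of requiring a trailing `\w`, widen the lookbehind before each closing marker so it rejects whitespace as well as a backslash. A one-character lookbehind constrains only the last character, so single-character content still matches. `parse_strong` is a hypothetical helper name.

```python
import re

# Hypothetical variant of the merged pattern: the lookbehind before each closing
# marker, (?<![\s\\]), rejects a preceding backslash (as before) *or* whitespace,
# so the emphasised text may not end in a space.
STRONG = re.compile(
    r"(?<!\\)\*\*(\w.*?)(?<![\s\\])\*\*"
    r"|(?<!\\)__(\w.*?)(?<![\s\\])__"
)


def parse_strong(line: str) -> str:
    # Exactly one of the two groups takes part in any given match.
    return STRONG.sub(
        lambda m: f"<strong>{m.group(1) or m.group(2)}</strong>", line
    )


for sample in ["**asdf**", "** asdf**", "**asdf **", "** asdf **", "**A** letter"]:
    print(parse_strong(sample))
```

This leaves `**asdf **` untouched while `**A** letter` still becomes `<strong>A</strong> letter`. It does not stop the lazy `.*?` from running on to a later closing marker on the same line, so `**a ** b**` becomes `<strong>a ** b</strong>`.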


Parse standalone URLs too

05333ff

funkyhippo requested changes Apr 29, 2023


proxy/source/markdown_parser.py (outdated)

BrutuZ added 5 commits April 29, 2023 14:37

move single-quote back into the set

f822b36

Tighten URL characters for CFM

6642525

Improve headers formatting

3db8483

Simplify expressions

7d671a7

Update example

9456126

BrutuZ requested a review from funkyhippo May 2, 2023 02:45

BrutuZ added 2 commits May 4, 2023 16:46

Parse URLs regardless of casing

c60dbb5

open links in new tabs

8f14423

BrutuZ changed the title ~~Parse standalone URLs too~~ Markdown parser updates May 14, 2023

funkyhippo reviewed Jun 5, 2023


proxy/source/markdown_parser.py (outdated)

proxy/source/markdown_parser.py (outdated)

proxy/source/markdown_parser.py (outdated)

proxy/source/markdown_parser.py (outdated)

BrutuZ added 3 commits June 4, 2023 22:19

Put atx-style back into docstring

d234f0b

Absorb more spaces and optional closing on headers

17f2c72

Refine bold and italics expressions by ensuring:

6d7ac3d

* Marker matches on both sides
* Opener and closer are not escaped
* First character inside is not a space

funkyhippo approved these changes Jun 5, 2023


funkyhippo merged commit 3dcabbe into subject-f:develop Jun 5, 2023

BrutuZ deleted the patch-1 branch June 6, 2023 00:20
