Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Epic: Correct escaping in HTML output #1920

Closed
8 tasks done
haxtibal opened this issue Jul 16, 2024 · 11 comments
Closed
8 tasks done

Epic: Correct escaping in HTML output #1920

haxtibal opened this issue Jul 16, 2024 · 11 comments

Comments

@haxtibal
Copy link
Contributor

haxtibal commented Jul 16, 2024


To reproduce, generate an html export of the following document

[DOCUMENT]
TITLE: Written <b>bold</b>

[SECTION]
TITLE: Some text is <invisible>

[REQUIREMENT]
UID: REQ-1
TITLE: <script>alert(1)</script>
STATEMENT: >>>
We can also execute scripts.
<<<

[/SECTION]

I assume this is a bug, not a feature...

html.escape is already used in some places. I'll have a look and try to catch more cases.

@haxtibal haxtibal changed the title Singleline strings not escaped HTML output Singleline strings not escaped in HTML output Jul 16, 2024
@stanislaw
Copy link
Collaborator

Thanks for reporting this. It is a lack of feature 😄

The html.escape was implemented for multiline fields but not for single-line. Let's implement it for single-line as well.

@stanislaw stanislaw added this to the 2024-Q3 milestone Jul 17, 2024
@haxtibal
Copy link
Contributor Author

haxtibal commented Jul 17, 2024

The html.escape was implemented for multiline fields

IIUC multiline was implemented for the UI forms in #1018. Multiline in static views (like components/requirement/index.jinja) is escaped too, but for a different reason: We pipe it through docutils, which does escaping under the hood.

I thought about when and where is the best place to do the escaping:

  • backend.sdoc (SDocNodeField.enumerate_all_fields_escaped exists right now) is central, but IMO it's wrong because it adds HTML specific stuff to a core component
  • call html.escape in Jinja2 templates where needed (lots of places)
  • pipe through |e in Jinja2 templates (same lot of places, but keeps code cleaner)
  • use Jinja2 automatic escaping and opt out where it would hurt (e.g., we must not escape pre-rendered rst markup)
  • introduce wrapper objects that can be assigned to sdoc_entity, so that getters return escaped values

I'd probably start by trying automatic escaping and see how it works out. Do you have an opinion?

@stanislaw
Copy link
Collaborator

backend.sdoc (SDocNodeField.enumerate_all_fields_escaped exists right now) is central, but IMO it's wrong because it adds HTML specific stuff to a core component

Sorry for the delays with my responses, I am running low on my free time.

I think I would prefer the enumerate_all_fields_escaped option for the time being to have it central like you said. Adding the HTML-specific stuff to a core component should be ok because currently there is no intermediate layer between SDoc classes and Jinja. I started introducing the view object classes to be that kind of interface to Jinja but in this specific example with escaping it will probably not help. We could revisit this later if a stronger need in an intermediate layer would emerge.

If we used enumerate_all_fields_escaped everywhere, could the change be contained in only a small number of places?

@haxtibal haxtibal mentioned this issue Jul 18, 2024
6 tasks
@haxtibal
Copy link
Contributor Author

backend.sdoc (SDocNodeField.enumerate_all_fields_escaped exists right now) is central, but IMO it's wrong because it adds HTML specific stuff to a core component

Sorry for the delays with my responses, I am running low on my free time.

No hurry. Same here.

I think I would prefer the enumerate_all_fields_escaped option

I did some trials with Jinja2 autoescaping already yesterday and wanted to present it first before discarding the idea. Therefore #1921. If you don't agree with the reasoning just close that MR and let's continue discussion here.

@stanislaw stanislaw changed the title Singleline strings not escaped in HTML output Epic: Correct escaping in HTML output Jul 28, 2024
@stanislaw
Copy link
Collaborator

@haxtibal

I have added a few TODOs here but let's consider them a very low priority. You/I don't have to rush into completing them all as long as your use case is working.

I would just like to have more test coverage to make sure we didn't miss any screens just in case. StrictDoc has many such low-priority tickets which all would love attention, so let's move according to what stands out most.

haxtibal added a commit to haxtibal/strictdoc that referenced this issue Aug 15, 2024
There were some occurrences of "&nbsp;" in Python statements within
Jinja2 templates. They must be marked as safe, otherwise they would be
rendered to literal &nbsp; by auto escaping.

This happened in diff screen for any text nodes, and sections with
LEVEL: None, as well as in TOCs for sections with LEVEL: None.

Relates to strictdoc-project#1920.
haxtibal added a commit to haxtibal/strictdoc that referenced this issue Aug 15, 2024
There were some occurrences of "&nbsp;" in Python statements within
Jinja2 templates. They must be marked as safe, otherwise they would be
rendered to literal &nbsp; by auto escaping.

This happened in diff screen for any text node, and sections with
LEVEL: None, as well as in TOCs for sections with LEVEL: None.

Relates to strictdoc-project#1920.
@stanislaw stanislaw modified the milestones: 2024-Q3, 2024-Q4 Oct 13, 2024
@stanislaw
Copy link
Collaborator

It turns out that the search feature has regressed due to the autoescaping: http://127.0.0.1:5111/search?q=node.contains(&quot;TBD&quot;). We may be missing an end-to-end test to catch this.

@haxtibal
Copy link
Contributor Author

It turns out that the search feature has regressed due to the autoescaping: http://127.0.0.1:5111/search?q=node.contains(&quot;TBD&quot;). We may be missing an end-to-end test to catch this.

Will have a look, hopefully tomorrow.

@haxtibal
Copy link
Contributor Author

Hm, can't reproduce... When typing node.contains("TBD") into the search form, flask receives

INFO:     127.0.0.1:46104 - "GET /search?q=node.contains%28%22TBD%22%29 HTTP/1.1" 200 OK

This looks OK and works. How did you test it?

@stanislaw
Copy link
Collaborator

stanislaw commented Oct 18, 2024

Good to know that the search screen itself is not affected.

My testing was around the links that are on the statistics screen: http://127.0.0.1:5111/project_statistics.html. The legend has escaped/broken hrefs.

haxtibal added a commit to haxtibal/strictdoc that referenced this issue Oct 18, 2024
Search URLs generated for the project statistic page were caught by
Jinja2 autoescaping, which makes them invalid.

A good place to mark them as safe is the view object: It's the immediate
layer below Jinja templates and specifically made to provide strings
included by a Jinja2 template.

Relates to strictdoc-project#1920.
haxtibal added a commit to haxtibal/strictdoc that referenced this issue Oct 19, 2024
Search URLs generated for the project statistic page were caught by
Jinja2 autoescaping, which makes them invalid.

A good place to mark them as safe is the view object: It's the immediate
layer below Jinja templates and specifically made to provide strings
included by a Jinja2 template.

Relates to strictdoc-project#1920.
@stanislaw
Copy link
Collaborator

Bildschirmfoto 2024-11-09 um 23 31 46

Noticed another regression just now. I think it is really the last one 😄

haxtibal added a commit to haxtibal/strictdoc that referenced this issue Nov 10, 2024
For HTML escaping of the diff view we have to consider two things.

1. Diff input comes from two git checkouts of the project at specific
   revisions. The revisions sdocs are considered untrusted user input,
   could contain special characters and must be escaped.
2. After analyzing with difflib we add a bit HTML to colorize the
   output. This specific HTML fragments are trusted and safe.

Relates to strictdoc-project#1920.
haxtibal added a commit to haxtibal/strictdoc that referenced this issue Nov 10, 2024
For HTML escaping of the diff view we have to consider two things.

1. Diff input comes from two git checkouts of the project at specific
   revisions. The revisions sdocs are considered untrusted user input,
   could contain special characters and must be escaped.
2. After analyzing with difflib we add a bit HTML to colorize the
   output. This specific HTML fragments are trusted and safe.

Relates to strictdoc-project#1920.
@stanislaw
Copy link
Collaborator

I think, now we can close this one 😄 Thanks @haxtibal for supporting this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants