[FEATURE]Implement vocabulary and spelling linter in this repo #599

Closed
kolchfa-aws opened this issue Oct 2, 2024 · 8 comments · Fixed by #620
Labels
enhancement New feature or request

Comments

@kolchfa-aws
Contributor

Is your feature request related to a problem?

We're currently working on generating documentation from this API spec. All changes to the text will be made in this repo. In the doc repo, we have added a Vale linter and a vocabulary specific to OpenSearch. We'd like to add a similar linter to this repo.

What solution would you like?

A linter to lint text in this repo and enforce spelling/grammar/punctuation rules.

What alternatives have you considered?

I have experimented with adding Vale itself to this repo. Vale treats YAML files as an unsupported (default) format and lints them line by line. As is, it produces many false positives. Some of those might be remedied by adding a pattern to TokenIgnores in .vale.ini, but I tried adding regexes for variables with underscores and camel case, and Vale still flagged those as errors.
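For illustration only, patterns of the kind described above might look like the following Python sketch. The regexes actually tried are not shown in this issue, so `SNAKE_CASE`, `CAMEL_CASE`, and `identifier_tokens` are hypothetical names, and the patterns are an assumption about what "variables with underscores and camel case" covers.

```python
import re

# Hypothetical patterns; the regexes actually tried are not shown in this issue.
# snake_case: lowercase/digit runs joined by at least one underscore.
SNAKE_CASE = re.compile(r"\b[a-z0-9]+(?:_[a-z0-9]+)+\b")
# camelCase: a lowercase run followed by one or more capitalized runs.
CAMEL_CASE = re.compile(r"\b[a-z]+(?:[A-Z][a-z0-9]*)+\b")


def identifier_tokens(line: str) -> list[str]:
    """Return tokens that look like code identifiers, in order of appearance."""
    found = [
        (m.start(), m.group())
        for pattern in (SNAKE_CASE, CAMEL_CASE)
        for m in pattern.finditer(line)
    ]
    return [token for _, token in sorted(found)]
```

These are the tokens a spelling rule would report as false positives unless the linter can be told to skip them.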

Do you have any additional context?

Ideally, it would be great to add a linter that is designed to work with YAML files.

@dblock
Member

dblock commented Oct 17, 2024

I attempted to add Vale in #620 and it does work, but it isn't useful yet: with all the styles from documentation-website, the examples flagged seem to be mostly false positives. Q: @kolchfa-aws, is there a more minimal setup than copying everything from documentation-website where we could begin to get value from Vale? Where should I start?

Screenshot 2024-10-17 at 1 51 42 PM

@kolchfa-aws
Contributor Author

That's what I meant: it flags all variables with underscores, etc. Normally, you can tell Vale to ignore those by adding a regex rule to TokenIgnores in .vale.ini. That works on the documentation site because Vale is meant to work with Markdown, but it has no effect on YAML files (because ignore patterns are only supported for a handful of markup languages, not YAML). See Vale docs. A minimal setup will not help here (but can be achieved by just leaving Vale.Spelling and Vale.Terms in .vale.ini and setting all rules starting with OpenSearch to NO).

@dblock
Member

dblock commented Oct 17, 2024

@kolchfa-aws I opened errata-ai/vale#911. You obviously know more about this, so please add your comments there on what Vale could/would do for our use case.

@kolchfa-aws
Contributor Author

Found a similar issue in errata-ai that I linked here.

@dblock
Member

dblock commented Oct 21, 2024

I wrote a simple preprocessor (still in #620) that turns YAML into plain text without changing the position of the text, and it gives better results.

Screenshot 2024-10-21 at 11 33 38 AM

It's a little hacky I suppose, but seems to work well. WDYT @kolchfa-aws @nhtruong @Xtansia?
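The preprocessor itself lives in #620 and is not reproduced in this thread. As a rough sketch of the idea only, the following Python blanks out everything except the text of `description:` values, so each remaining word stays on the same line and column as in the original file. The function name `blank_non_prose`, the `DESCRIPTION` pattern, and the choice to handle only single-line values are assumptions, not the actual implementation.

```python
import re

# Hypothetical: keep only the text of single-line `description:` values.
# Group 1 is the indentation plus key, group 2 is the prose to lint.
DESCRIPTION = re.compile(r"^(\s*(?:-\s+)?description:\s*)(.*)$")


def blank_non_prose(yaml_text: str) -> str:
    """Replace non-prose characters with spaces, keeping line and column positions."""
    out = []
    for line in yaml_text.split("\n"):
        match = DESCRIPTION.match(line)
        if match:
            prefix, value = match.groups()
            # Pad with spaces so the description text keeps its column.
            out.append(" " * len(prefix) + value)
        else:
            # Blank the whole line but keep its length and line number.
            out.append(" " * len(line))
    return "\n".join(out)
```

Because positions are unchanged, any line and column a linter reports against the output would point at the same place in the YAML source. Block scalars (`|`, `>`) and quoted values are not handled in this sketch.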

@kolchfa-aws
Contributor Author

This works! Does the preprocessor make Vale work only on description strings? We only want to lint those, I think. Also, some of the files can be pruned. For example, you don't need some rules from the doc repo (like any rules that start with "Heading" or "Links", "MergeConflicts", "StackedHeadings", or "TableHeadings") or the tests directory. Does this flag words with underscores? Because in descriptions, we want something like status_exception to be flagged because it must be surrounded by tick marks.
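Independent of how Vale ends up handling this, the rule being asked for can be sketched in a few lines of Python: report words with underscores that are not already wrapped in backticks. The name `unquoted_identifiers` and the `TOKEN` pattern are hypothetical and are not part of Vale or of #620.

```python
import re

# Match either a backtick-quoted span (skipped) or a bare snake_case word,
# which is captured in group 1.
TOKEN = re.compile(r"`[^`]*`|\b([a-z0-9]+(?:_[a-z0-9]+)+)\b")


def unquoted_identifiers(description: str) -> list[str]:
    """Return snake_case words that appear outside backticks."""
    return [m.group(1) for m in TOKEN.finditer(description) if m.group(1)]
```

Matching the quoted spans first means anything already inside backticks is consumed and never reported, which avoids a separate pass to strip inline code.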

@nhtruong
Collaborator

@dblock that looks great!

@dblock
Member

dblock commented Oct 21, 2024

> This works! Does the preprocessor make Vale work only on description strings? We only want to lint those, I think.

We will want to lint titles as well, but leaving this for a future PR.

> Also, some of the files can be pruned. For example, you don't need some rules from the doc repo (like any rules that start with "Heading" or "Links", "MergeConflicts", "StackedHeadings", or "TableHeadings") or the tests directory.

I got rid of tests, but I'd prefer to keep all the rules to make updates easier by copying from the project-website. Headings will be useful for titles, too.

> Does this flag words with underscores? Because in descriptions, we want something like status_exception to be flagged because it must be surrounded by tick marks.

I don't know (yet). Let's see in a future update.

Post your comments on #620; I'll iterate to green.
