Finish spec-driven-development of the document_structure page #31

Merged
merged 1 commit into from
Jan 2, 2024
4 changes: 4 additions & 0 deletions README.md
@@ -6,6 +6,10 @@ The project is in its infancy as of December 2023 and in **no way ready to use.*

You're welcome to follow along and contribute with the understanding that I may or may not drive this project to a mature (1.0) release.

## Known limitations

* Parsing UTF-16 content is not supported. (UTF-16 documents must be re-encoded to UTF-8 prior to parsing with this crate.) I have no plans to support UTF-16 content.
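
The re-encoding step mentioned above could look roughly like the following. This is a minimal sketch using only the Rust standard library, not part of this crate; the function name `utf16_with_bom_to_utf8` is hypothetical, and it assumes the input starts with a UTF-16 byte-order mark.

```rust
/// Hypothetical helper (not part of `asciidoc-parser`): decode a UTF-16
/// byte buffer that starts with a byte-order mark (BOM) into a `String`,
/// which is always UTF-8 in Rust.
///
/// Returns `None` if there is no UTF-16 BOM, if the payload has an odd
/// number of bytes, or if the payload is not valid UTF-16.
fn utf16_with_bom_to_utf8(bytes: &[u8]) -> Option<String> {
    // The BOM gives the byte order: FF FE = little-endian, FE FF = big-endian.
    let (big_endian, body) = match bytes {
        [0xFF, 0xFE, rest @ ..] => (false, rest),
        [0xFE, 0xFF, rest @ ..] => (true, rest),
        _ => return None,
    };

    // UTF-16 code units are two bytes each.
    if body.len() % 2 != 0 {
        return None;
    }

    let units: Vec<u16> = body
        .chunks_exact(2)
        .map(|pair| {
            let pair = [pair[0], pair[1]];
            if big_endian {
                u16::from_be_bytes(pair)
            } else {
                u16::from_le_bytes(pair)
            }
        })
        .collect();

    // Fails on unpaired surrogates instead of silently replacing them.
    String::from_utf16(&units).ok()
}
```

The resulting `String` can then be borrowed as `&str` and handed to the parser. Returning `None` rather than guessing keeps the caller in charge of what to do with input that is not UTF-16.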

## License

The `asciidoc-parser` crate is distributed under the terms of both the MIT license and the Apache License (Version 2.0).
6 changes: 6 additions & 0 deletions src/document/document.rs
@@ -27,6 +27,12 @@ impl<'a> Document<'a> {
///
/// Note that the document references the underlying source string and
/// necessarily has the same lifetime as the source.
///
/// **IMPORTANT:** The AsciiDoc language documentation states that UTF-16
/// encoding is allowed if a byte-order-mark (BOM) is present at the
/// start of a file. This format is not directly supported by the
/// `asciidoc-parser` crate. Any UTF-16 content must be re-encoded as
/// UTF-8 prior to parsing.
pub fn parse(source: &'a str) -> Result<Self, Error> {
// TO DO: Add option for best-guess parsing?

126 changes: 72 additions & 54 deletions src/tests/asciidoc_lang/root/document_structure.rs
@@ -390,6 +390,9 @@ mod lines {

assert_eq!(attr.value(), TAttributeValue::Value("value more value"));
}

// No test cases:

// Empty lines can also be significant.
// A single empty line separates the header from the body.
// Many blocks are also separated by an empty line, as you saw in the two
@@ -399,57 +402,72 @@ mod lines {
// Keep these points in mind as you're learning about the AsciiDoc syntax.
}

// == Blocks

// Blocks in an AsciiDoc document lay down the document structure.
// Some blocks may contain other blocks, so the document structure is inherently
// hierarchical (i.e., a tree structure). You can preview this section
// structure, for example, by enabling the automatic table of contents. Examples
// of blocks include paragraphs, sections, lists, delimited blocks, tables, and
// block macros.

// Blocks are easy to identify because they're usually offset from other blocks
// by an empty line (though not always required). Blocks always start on a new
// line, terminate at the end of a line, and are aligned to the left margin.

// Every block can have one or more lines of block metadata.
// This metadata can be in the form of block attributes, a block anchor, or a
// block title. These metadata lines must be above and directly adjacent to the
// block itself.

// Sections, non-verbatim delimited blocks, and AsciiDoc table cells may contain
// other blocks. Despite the fact that blocks form a hierarchy, even nested
// blocks start at the left margin. By requiring blocks to start at the left
// margin, it avoids the tedium of having to track and maintain levels of
// indentation and makes the content more reusable.

// == Text and inline elements

// Surrounded by the markers, delimiters, and metadata lines is the text.
// The text is the main focus of a document and the reason the AsciiDoc syntax
// gives it so much room to breathe. Text is most often found in the lines of a
// block (e.g., paragraph), the block title (e.g., section title), and in list
// items, though there are other places where it can exist.

// Text is subject to substitutions.
// Substitutions interpret markup as text formatting, replace macros with text
// or non-text elements, expand attribute references, and perform other sorts of
// text replacement.

// Normal text is subject to all substitutions, unless specified otherwise.
// Verbatim text is subject to a minimal set of substitutions to allow it to be
// displayed in the output as it appears in the source. It's also possible to
// disable all substitutions in order to pass the text through to the output
// unmodified (i.e., raw). The parsing of text ends up being a mix of inline
// elements and other forms of transformations.

// == Encodings and AsciiDoc files

// An AsciiDoc file is a text file that has the _.adoc_ file extension (e.g.,
// [.path]_document.adoc_). Most AsciiDoc processors assume the text in the file
// uses UTF-8 encoding. UTF-16 encodings are supported only if the file starts
// with a BOM.

// An AsciiDoc processor can process AsciiDoc from a string (i.e., character
// sequence). However, most of the time you'll save your AsciiDoc documents to a
// file.
mod blocks {
// No test cases:

// == Blocks

// Blocks in an AsciiDoc document lay down the document structure.
// Some blocks may contain other blocks, so the document structure is
// inherently hierarchical (i.e., a tree structure). You can preview
// this section structure, for example, by enabling the automatic table
// of contents. Examples of blocks include paragraphs, sections, lists,
// delimited blocks, tables, and block macros.

// Blocks are easy to identify because they're usually offset from other
// blocks by an empty line (though not always required). Blocks always
// start on a new line, terminate at the end of a line, and are aligned
// to the left margin.

// Every block can have one or more lines of block metadata.
// This metadata can be in the form of block attributes, a block anchor, or
// a block title. These metadata lines must be above and directly
// adjacent to the block itself.

// Sections, non-verbatim delimited blocks, and AsciiDoc table cells may
// contain other blocks. Despite the fact that blocks form a hierarchy,
// even nested blocks start at the left margin. By requiring blocks to
// start at the left margin, it avoids the tedium of having to track and
// maintain levels of indentation and makes the content more reusable.
}
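
As a rough illustration of the "blocks are usually offset by an empty line" observation quoted above, the sketch below groups contiguous non-empty lines into candidate blocks. It is not how `asciidoc-parser` works: the function name `split_on_empty_lines` is hypothetical, and the approach deliberately ignores delimited blocks, which may contain empty lines, and blocks that are not separated by an empty line.

```rust
/// Hypothetical helper: group contiguous non-empty lines of `source`.
/// Each group is a candidate block together with any metadata lines
/// (attributes, anchor, title) sitting directly above it.
fn split_on_empty_lines(source: &str) -> Vec<Vec<&str>> {
    let mut blocks = Vec::new();
    let mut current = Vec::new();

    for line in source.lines() {
        // Treat whitespace-only lines as empty (an assumption of this sketch).
        if line.trim_end().is_empty() {
            // An empty line closes the group collected so far, if any.
            if !current.is_empty() {
                blocks.push(std::mem::take(&mut current));
            }
        } else {
            current.push(line);
        }
    }

    // Flush the final group when the input does not end with an empty line.
    if !current.is_empty() {
        blocks.push(current);
    }

    blocks
}
```

Because metadata lines must be directly adjacent to their block, a title line such as `.Title` ends up in the same group as the paragraph it describes.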

mod text_and_inline_elements {
// No test cases:

// == Text and inline elements

// Surrounded by the markers, delimiters, and metadata lines is the text.
// The text is the main focus of a document and the reason the AsciiDoc
// syntax gives it so much room to breathe. Text is most often found in
// the lines of a block (e.g., paragraph), the block title (e.g.,
// section title), and in list items, though there are other places
// where it can exist.

// Text is subject to substitutions.
// Substitutions interpret markup as text formatting, replace macros with
// text or non-text elements, expand attribute references, and perform
// other sorts of text replacement.

// Normal text is subject to all substitutions, unless specified otherwise.
// Verbatim text is subject to a minimal set of substitutions to allow it to
// be displayed in the output as it appears in the source. It's also
// possible to disable all substitutions in order to pass the text
// through to the output unmodified (i.e., raw). The parsing of text
// ends up being a mix of inline elements and other forms of
// transformations.
}

mod encodings_and_asciidoc_files {
// == Encodings and AsciiDoc files

// An AsciiDoc file is a text file that has the _.adoc_ file extension
// (e.g., [.path]_document.adoc_). Most AsciiDoc processors assume the
// text in the file uses UTF-8 encoding. [.line-through]#UTF-16
// encodings are supported only if the file starts with a BOM.#
// *UNSUPPORTED: The UTF-16 encoding is not directly supported by the
// `asciidoc-parser` crate.*

// An AsciiDoc processor can process AsciiDoc from a string (i.e., character
// sequence). However, most of the time you'll save your AsciiDoc documents
// to a file.
}