Lexical Matters

Source code

Source code is a sequence of Unicode code points. It is strongly recommended to keep source code in UTF-8 encoded text files, although the compiler supports other encodings through the -encoding command line option.

Example
 # use default encoding of the Java platform (dangerous!)
 java -jar fregec.jar -encoding DEFAULT your.fr

 # use Russian encoding (not tested, for lack of keyboard)
 java -jar fregec.jar -encoding KOI8_R your.fr

 # use UTF-8 (the only sensible decision)
 java -jar fregec.jar your.fr
Note
With the default encoding, source code that contains any non-ASCII characters will likely fail to compile on other platforms, because those characters are stored in some obscure proprietary encoding that nobody else on the planet understands (Windows users, I’m looking at you!).
Note
The Eclipse plugin for Frege supports UTF-8 only.
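
Why a mismatched encoding breaks non-ASCII source text can be illustrated on the Java side. The following is a minimal sketch, not part of the compiler; the class name `EncodingDemo` and the helper `reinterpret` are made up for this illustration. It writes a line of text as UTF-8 bytes and then reads those bytes back under a different encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingDemo {
    /** Encodes text with one charset and decodes the resulting bytes with another. */
    static String reinterpret(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        String withForAll = "x = '\u2200'";   // contains one non-ASCII character
        String asciiOnly = "x = 1";

        // Written and read as UTF-8: the text survives unchanged.
        System.out.println(reinterpret(withForAll, StandardCharsets.UTF_8,
                StandardCharsets.UTF_8).equals(withForAll));          // true

        // Written as UTF-8 but read as ISO-8859-1: the non-ASCII character is garbled.
        System.out.println(reinterpret(withForAll, StandardCharsets.UTF_8,
                StandardCharsets.ISO_8859_1).equals(withForAll));     // false

        // Pure ASCII text is unaffected by this particular mismatch.
        System.out.println(reinterpret(asciiOnly, StandardCharsets.UTF_8,
                StandardCharsets.ISO_8859_1).equals(asciiOnly));      // true
    }
}
```

ASCII-only sources happen to survive many mismatches, which is why the problem tends to show up only once the first non-ASCII character is typed.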

Tokens

The lexical analyzer recognizes lexemes, or tokens, as described below. It turns a sequence of characters into a sequence of tokens.

Whitespace

Whitespace can appear in arbitrary quantity between two tokens. However, whitespace at the beginning of a line is syntactically relevant in Layout mode.

Two tokens must be separated by whitespace if the second token could otherwise be read as a continuation of the first one, because the lexer always takes the longest possible token.

Greedy Lexing
v10           -- one token
v 10          -- two tokens
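
The greedy behaviour can be imitated with a toy tokenizer. This is only a sketch in Java under a strong assumption: the token grammar below (identifiers and decimal numbers) is invented for the illustration and is far smaller than what the Frege lexer really accepts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GreedyLexDemo {
    // Toy grammar (assumption): an identifier or a run of decimal digits.
    private static final Pattern TOKEN =
            Pattern.compile("[A-Za-z_][A-Za-z0-9_']*|[0-9]+");

    /** Splits the input into tokens, always taking the longest match at each position. */
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {          // find() skips the whitespace between tokens
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("v10"));    // [v10]
        System.out.println(tokens("v 10"));   // [v, 10]
    }
}
```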

Comments

Ordinary comments can appear wherever whitespace can.

Line Comment
-- extends to the end of the line
Block Comment
{- extends {- to matching -} closing -}

Block comments do nest and can extend over multiple lines. They can be used to separate tokens.

Two tokens separated by block comment
v{-separates!-}10
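
Nesting means that a lexer cannot simply search for the first -} but has to count openers and closers. A minimal sketch of that bookkeeping in Java follows; the class and method names are hypothetical, and the sketch ignores documentation text and string literals.

```java
final class NestedCommentDemo {
    /**
     * Returns the index just past the "-}" that closes the block comment
     * starting at position start, or -1 if the comment is never closed.
     */
    static int endOfBlockComment(String s, int start) {
        if (!s.startsWith("{-", start)) {
            throw new IllegalArgumentException("no block comment at index " + start);
        }
        int depth = 0;
        int i = start;
        while (i < s.length() - 1) {
            if (s.startsWith("{-", i)) {          // a nested comment opens
                depth++;
                i += 2;
            } else if (s.startsWith("-}", i)) {   // the innermost open comment closes
                depth--;
                i += 2;
                if (depth == 0) {
                    return i;
                }
            } else {
                i++;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String nested = "{- extends {- to matching -} closing -}";
        System.out.println(endOfBlockComment(nested, 0) == nested.length());  // true

        String separated = "v{-separates!-}10";
        int end = endOfBlockComment(separated, 1);
        System.out.println(separated.substring(end));                         // 10
    }
}
```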

Documentation Text

Documentation text looks like a comment at first sight, but it is a syntactic element.

Documentation Text
  {--
    First paragraph.

    Second paragraph.
  -}
  --- this is a single paragraph of documentation

Documentation text may appear

  • before the module header, where multiple documentation texts may appear.

  • before a (non local) declaration of a function, variable, type class, data type or instance. Syntactically, documentation text is treated here like a definition. This means

    • in Layout mode, documentation text must start with the same indentation as any other definition

    • in explicit mode, documentation text and the next definition must be separated by a semicolon.

  • immediately before a constructor is introduced, that is, before the constructor name.

  • immediately after the constructor declaration

  • after a field declaration, where it can stand in place of a comma.

Example
--- This is a nice module.
--- Have fun!
module Nice where

--- Complex numbers
{--
  This is merely a specialised tuple that holds two
  double numbers.

  An infix constructor ':+:' is available.
-}
data Complex =
    {-- The constructor is also named @Complex@,
        I simply had no better idea. Sorry. -}
    Complex {
              !re :: Double        --- real part
              !im :: Double        --- imaginary part
            }

--- construct a 'Complex' number
a :+: b = Complex a b
Note
Let-bound definitions cannot have documentation, as they are not accessible from outside the module.

Multiple adjacent documentation texts are joined with a paragraph break in between. The documentation extracted from a sequence of documentation texts is attached to the documented item.

Documentation is written in a simple markup language, see Documentation Text Markup.

You should write documentation texts because

  • module documentation can be generated from your compiled module.

  • the Eclipse plugin can show the documentation of your functions on mouse hover and in code completion suggestions when your module is imported.

  • the Hoogle database for your module can make use of documentation text.

Literals

Integer Literals
17 0x11 021       -- int value 17 in decimal, hexadecimal and octal
5_483_438         -- trailing groups of 3 digits can be separated
123L 217l         -- long values (like in Java)
5467n 7654N       -- big integer literal
Floating Point Literals
1.34e-17 1.0d 0D  -- double literals
0.2F 2.0f         -- float literal

Any of the fractional part, the exponent and the suffix character (D, F, d or f) may be omitted, but not all of them, as this would result in an integer literal.
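
Most of these spellings are shared with Java, so their effect can be checked there. The sketch below is Java, not Frege, and the helper `typeOf` is made up for the illustration; it shows how a suffix selects the type of a literal. The big integer literals with suffix n or N have no Java counterpart, which is why the sketch falls back to `java.math.BigInteger`.

```java
import java.math.BigInteger;

final class NumericLiteralDemo {
    /** Names the boxed Java type that a literal ends up with. */
    static String typeOf(Object value) {
        return value.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        // Decimal, hexadecimal and octal spellings of the same int value.
        System.out.println(17 == 0x11 && 0x11 == 021);   // true
        // Underscores only group digits; they do not change the value.
        System.out.println(5_483_438 == 5483438);        // true

        System.out.println(typeOf(17));        // Integer
        System.out.println(typeOf(123L));      // Long
        System.out.println(typeOf(1.34e-17));  // Double (fraction and exponent, no suffix)
        System.out.println(typeOf(0D));        // Double (suffix only)
        System.out.println(typeOf(0.2F));      // Float

        // No Java literal corresponds to 5467n, so the value is built explicitly.
        BigInteger big = new BigInteger("5467");
        System.out.println(big);               // 5467
    }
}
```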

Boolean Literals
true false        -- like in Java
True False        -- like in Haskell

Quoted constructs

Character Literals
'a'
'\''              -- apostrophe
'\\'              -- backslash
'\u2200'          -- the character '∀'
'\n'              -- newline
'\012'            -- yet another newline

All escape sequences allowed in Java are also allowed in Frege. Character literals are 16-bit quantities, like in Java. This means that a Unicode code point above 0xffff cannot be written as a single character literal in Java or Frege.
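
Since the escapes and the 16-bit limit are inherited from Java, both can be checked there. A minimal sketch in Java follows; the helper `fitsInChar` is a hypothetical name, and U+1D572 is used merely as an arbitrary code point above 0xffff.

```java
final class CharLiteralDemo {
    /** True if the code point can be stored in a single 16-bit char. */
    static boolean fitsInChar(int codePoint) {
        return Character.charCount(codePoint) == 1;
    }

    public static void main(String[] args) {
        // The octal escape and the named escape denote the same character.
        System.out.println('\012' == '\n');           // true
        // The "for all" sign lies below 0xffff and fits into one char.
        System.out.println(fitsInChar('\u2200'));     // true
        // A code point above 0xffff needs two chars, so it cannot be a char literal.
        System.out.println(fitsInChar(0x1D572));      // false
    }
}
```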

String Literals
"like in Java"
"𝕲𝖔𝖙𝖙𝖑𝖔𝖇"

Strings can contain any Unicode characters. However, code points from the higher planes are encoded as surrogate pairs.
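
The practical consequence is that the length of a string counts 16-bit units, not code points. The following Java sketch shows this for a single code point above 0xffff; U+1D572 from the Mathematical Alphanumeric Symbols block is chosen as an example, and the helper `fromCodePoints` is made up for the illustration.

```java
final class SurrogateDemo {
    /** Builds a string from Unicode code points. */
    static String fromCodePoints(int... codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        String s = fromCodePoints(0x1D572);

        System.out.println(s.length());                            // 2
        System.out.println(s.codePointCount(0, s.length()));       // 1
        System.out.println(Character.isHighSurrogate(s.charAt(0))); // true
        System.out.println(Character.isLowSurrogate(s.charAt(1)));  // true
    }
}
```

Code that indexes into such strings character by character therefore sees the two halves of the pair separately.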

Regular Expression Literals
´^foo\\´          -- "foo" at the start followed by backslash
'(foo|bar)'       -- upright quotes ok when more than 1 char
'(?:)x'           -- trick: same as ´x´

Regular expressions can be given as literals. They are checked for validity at compile time. Backslashes need not be doubled, unlike when a regular expression is specified as a string in Java.

The first example above corresponds to the following Java code:

final public static java.util.regex.Pattern p =
    java.util.regex.Pattern.compile("^foo\\\\");

where p is some fresh name the compiler uses internally.

A quoted construct in upright quotes is interpreted as a regular expression literal when it can’t possibly be a character. This is for the convenience of those who don’t have acute accent marks on their keyboard.

Within regular expression literals, escape sequences follow the regular expression syntax. For example, \b is a word break in regular expressions, but a backspace in strings.
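
Both points, the doubled backslashes and the two meanings of \b, can be observed from the Java side. This is a sketch in Java only; the class and field names are hypothetical and merely mirror what one would have to write by hand without regular expression literals.

```java
import java.util.regex.Pattern;

final class RegexEscapeDemo {
    // "foo" at the start followed by a backslash: four backslashes in a Java string.
    static final Pattern FOO_BACKSLASH = Pattern.compile("^foo\\\\");
    // The word "foo": the regex escape for a word break must be doubled in a Java string.
    static final Pattern WORD_FOO = Pattern.compile("\\bfoo\\b");

    public static void main(String[] args) {
        System.out.println(FOO_BACKSLASH.matcher("foo\\bar").find());  // true
        System.out.println(FOO_BACKSLASH.matcher("foobar").find());    // false

        System.out.println(WORD_FOO.matcher("a foo b").find());        // true
        System.out.println(WORD_FOO.matcher("afoob").find());          // false

        // In a string, the same two characters denote backspace (code 8).
        System.out.println("\b".length() == 1 && "\b".charAt(0) == 8); // true
    }
}
```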

Separators

The following characters are separators and have certain syntactic meanings:

{ } [ ] ( ) , ;

Keywords

 abstract case class data default derive deriving do
 else false forall foreign if import in
 infix infixl infixr
 instance interface let module native newtype of
 package private protected public
 then throws true type where

 = | \
 -> .. :: <- =>
 →  …  ∷  ←  ⇒   ∀

The last line lists some Unicode symbols that have the same meaning as the two-character ASCII symbols above them. The symbol ∀ has the same meaning as forall.

The following are keywords only when the next token is the keyword native:

pure mutable

Operators

Any sequence of characters that doesn’t contain separators, quotes, apostrophes, acute/grave accent marks, letters, digits or whitespace is a lexical operator, unless it is a keyword. Operators are used to form infix expressions.

When recognizing operators, the lexer considers the longest sequence of operator characters available. Symbolic keywords are not recognized when they appear as a subsequence of an operator.

::*           -- operator
:: *          -- double colon, operator

This provides enormous symbolic freedom for user defined operators.
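
The longest-match rule for operators can be sketched as follows. This is a toy in Java, not the Frege lexer: it treats every character outside the excluded classes named above as an operator character, knows only the ASCII symbolic keywords, and simply skips everything else. All names in it are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class OperatorLexDemo {
    private static final Set<String> SYMBOLIC_KEYWORDS = new HashSet<>(
            Arrays.asList("=", "|", "\\", "->", "..", "::", "<-", "=>"));
    // Separators, quotes, apostrophe, grave and acute accent marks.
    private static final String EXCLUDED = "{}[](),;\"'`\u00B4";

    static boolean isOperatorChar(char c) {
        return !Character.isLetterOrDigit(c)
                && !Character.isWhitespace(c)
                && EXCLUDED.indexOf(c) < 0;
    }

    /** Collects maximal runs of operator characters and classifies each whole run. */
    static List<String> lex(String input) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            if (!isOperatorChar(input.charAt(i))) {
                i++;                       // not part of an operator: skipped in this toy
                continue;
            }
            int start = i;
            while (i < input.length() && isOperatorChar(input.charAt(i))) {
                i++;                       // extend the run as far as possible
            }
            String run = input.substring(start, i);
            // A keyword is recognized only if it is the entire run.
            out.add((SYMBOLIC_KEYWORDS.contains(run) ? "keyword " : "operator ") + run);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(lex("::*"));    // [operator ::*]
        System.out.println(lex(":: *"));   // [keyword ::, operator *]
    }
}
```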

In addition, a variable or data constructor can be turned into an operator by enclosing its name in grave accent marks (backticks):

f `fmap` xs         -- the fmap function used as operator

Variable Names

Variable names are used to name functions, variables, type variables and fields.

_foo _Foo foo foo' f2o__o'' f'o'o'

Variable names start with a lowercase letter or an underscore and may be followed by arbitrarily many letters, digits, apostrophes and underscores.

A sole underscore is a variable name reserved for use in pattern matching, where it indicates an unused value.

For the purpose of Frege, all letters that are not uppercase letters are counted as lowercase.

Constructor Names

Constructor names are used to name namespaces, types, type classes and data constructors. Also, the last component of a module name must lexically be a constructor name.

Such a name starts with an uppercase letter, which may be followed by an arbitrary number of letters, digits, apostrophes and underscores.
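
Taken together with the rule for variable names, the first character alone decides which kind of name one is looking at. A minimal sketch of that classification in Java follows. The class and method names are hypothetical, Java's notion of letters and uppercase letters is assumed to be close enough to Frege's, and keywords are not excluded.

```java
final class NameKindDemo {
    /** Checks that everything after the first character may appear in a name. */
    private static boolean hasValidRest(String s) {
        for (int i = 1; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!Character.isLetterOrDigit(c) && c != '\'' && c != '_') {
                return false;
            }
        }
        return true;
    }

    static boolean isConstructorName(String s) {
        return !s.isEmpty() && Character.isUpperCase(s.charAt(0)) && hasValidRest(s);
    }

    static boolean isVariableName(String s) {
        if (s.isEmpty()) {
            return false;
        }
        char first = s.charAt(0);
        // Every letter that is not uppercase counts as lowercase.
        boolean lowerStart = first == '_'
                || (Character.isLetter(first) && !Character.isUpperCase(first));
        return lowerStart && hasValidRest(s);
    }

    public static void main(String[] args) {
        System.out.println(isVariableName("_Foo"));       // true
        System.out.println(isVariableName("f2o__o''"));   // true
        System.out.println(isConstructorName("Foo"));     // true
        System.out.println(isVariableName("Foo"));        // false
    }
}
```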

Namespaces can have the same name as types or type classes. Data constructors can have the same name as namespaces, types or type classes.

A not so extreme example
  module Foo where

  data Foo = Foo

Editors for Frege should highlight or colour constructor names in such a way that they are easily distinguished.

Qualifier

A constructor name immediately followed by a dot (.). This is used to form qualified names.

Qualified Names

A data constructor, variable or operator can be qualified by a namespace, a type name or by a namespace and a type name. Namespace and type name must be given as qualifiers, that is, they must be immediately followed by a dot.

Foo . bar         -- not a qualified name
Foo.bar           -- a qualified name
Foo. bar          -- the same
Mod.Typ.###       -- fully qualified operator

Module names

A sequence of names, separated by dots. The last part must be a constructor name. Since this will be the fully qualified name of the Java class that is generated for this module, it is expected that the name follows Java customs. See also Module Names.

Native names

A fully qualified name of some existing static method, class or interface. If this contains characters that are not allowed in Frege names (like $) or words that are keywords, it can be given as a string literal.

java.lang.String.charAt
"javafx.scene.control.TabPane$TabClosingPolicy"

Layout

Like in Haskell, Frege code can be written using blocks delimited by curly braces, where subsequent definitions are separated by semicolons.

In fact, this is the language the parser understands. The so-called layout feature allows omission of those braces and semicolons, by inferring their positions based on the indentation of the program text and inserting them as needed before parsing.

Note
Syntax diagrams will always show the explicit syntax with braces and semicolons.

Informally stated, the braces and semicolons are inserted as follows.

The layout (or “offside”) rule takes effect whenever the open brace is omitted after the keyword where, let, do, or of.

When this happens, the indentation of the next lexeme (whether or not on a new line) is remembered and the omitted open brace is inserted (the whitespace preceding the lexeme may include comments). If the next lexeme is not more indented than the current indentation level, an additional closing brace is inserted.

For each subsequent line, if it contains only whitespace or is indented more, then the previous item is continued (nothing is inserted); if it is indented the same amount, then a new item begins (a semicolon is inserted); and if it is indented less, then the layout list ends (a close brace is inserted).

The layout rule matches only those open braces that it has inserted; an explicit open brace must be matched by an explicit close brace. Within these explicit open braces, no layout processing is performed for constructs outside the braces, even if a line is indented to the left of an earlier implicit open brace.

Layout Examples 1
module Foo where

bar = 1
baz = bar + x
  where
     x = y+2
     y = bar*5

becomes

module Foo where

{bar = 1
;baz = bar + x
    where
      {x = y+2
      ;y = bar*5
      }
}
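
The informal rules can be turned into a small line-based procedure. The following Java sketch is a deliberately simplified model, not the algorithm used by the compiler: it assumes indentation by spaces only, opens a block only when where, let, do or of is the last word on a line, and ignores comments, explicit braces and the special case of a block whose first lexeme is not indented further. All names are made up for the illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

final class LayoutDemo {
    private static final List<String> OPENERS = Arrays.asList("where", "let", "do", "of");

    private static int indentOf(String line) {
        int i = 0;
        while (i < line.length() && line.charAt(i) == ' ') {
            i++;
        }
        return i;
    }

    private static boolean endsWithOpener(String line) {
        String[] words = line.trim().split("\\s+");
        return OPENERS.contains(words[words.length - 1]);
    }

    /** Inserts the braces and semicolons implied by the indentation. */
    static List<String> insertLayout(List<String> lines) {
        List<String> out = new ArrayList<>();
        Deque<Integer> blocks = new ArrayDeque<>();  // indentation of each open implicit block
        boolean openPending = false;                 // an opener was seen, its block not yet started
        for (String line : lines) {
            if (line.trim().isEmpty()) {             // blank lines continue the previous item
                out.add(line);
                continue;
            }
            int indent = indentOf(line);
            String prefix = "";
            if (openPending) {                       // first lexeme after the opener
                blocks.push(indent);
                prefix = "{";
                openPending = false;
            } else {
                while (!blocks.isEmpty() && indent < blocks.peek()) {
                    blocks.pop();                    // indented less: the layout list ends
                    out.add("}");
                }
                if (!blocks.isEmpty() && indent == blocks.peek()) {
                    prefix = ";";                    // same indentation: a new item begins
                }
            }
            out.add(line.substring(0, indent) + prefix + line.substring(indent));
            openPending = endsWithOpener(line);
        }
        while (!blocks.isEmpty()) {                  // end of input closes all open blocks
            blocks.pop();
            out.add("}");
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList(
                "module Foo where", "", "bar = 1", "baz = bar + x",
                "  where", "     x = y+2", "     y = bar*5");
        insertLayout(source).forEach(System.out::println);
    }
}
```

Applied to the example above, the sketch yields the same braces and semicolons, except that each closing brace is placed on a line of its own without indentation.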

Documentation Text Markup

The following markup is supported by the documentation tool and the eclipse plugin:

*bold*  _italic_ @monospaced@ 'reference'

A reference is the (possibly qualified) name of a Frege type, function, etc. It should turn into a hyperlink when processed. The reference is resolved in the context of the module that contains the comment. This means that the reference must be a name that would be valid at the top level of the module. If name resolution fails, the text enclosed in the apostrophes is shown in red.

But sometimes one needs to reference some item from another module that is not imported. For this, the following syntax is possible:

'some.other.Package#something'

The validity of such a reference cannot be checked, of course.

Finally, if one needs a generic link, it can be written like this:

'http://projecteuler.net/index.php?section=problems&id=12 Euler problem 12'

The part before the first space character is taken as URL, the rest is the text that will be shown.

An empty line serves as paragraph break. Special paragraphs are

    #   Header 1
    ##  Header 2
    ### Header 3
    > preformatted text (i.e. code examples)
    > each line must start with ">"

    - unordered list item
    1. ordered list item
    (2) ordered list item
    [item] list item tagged with "item"

Paragraphs do not nest.