Source code is a sequence of Unicode code points. It is strongly recommended to keep source code in UTF-8 encoded text files. The compiler supports different encodings through the -encoding
command line option, though.
# use default encoding of the java platform (dangerous!) java -jar fregec.jar -encoding DEFAULT your.fr # use russian encoding (not tested, for lack of keyboard) java -jar fregec.jar -encoding KOI8_R your.fr # use UTF-8 (the only sensible decision) java -jar fregec.jar your.fr
Note
|
With default encoding, it is likely that your source code will not compile on different platforms if it contains any non-ASCII characters, because those characters are in some obscure proprietary encoding nobody else on the planet does understand. (Windows users, I’m looking at you!). |
Note
|
The eclipse plugin for Frege does support UTF-8 only. |
The lexical analyzer recognizes lexemes, or tokens, as described below. It turns a sequence of characters into a sequence of tokens.
Whitespace can appear in arbitrary quantity between two tokens. However, whitespace at the beginning of a line is syntactically relevant in Layout mode.
Two tokens must be separated by whitespace if the second token is a valid suffix of the first one.
v10 -- one token v 10 -- two tokens
Ordinary comments can appear everywhere whitespace can.
-- extends to the end of the line
{- extends {- to matching -} closing -}
Block comments do nest and can extend over multiple lines. They can be used to separate tokens.
v{-separates!-}10
Looks like a comment at first sight, but is a syntactical element.
{--
First paragraph.
Second paragraph.
-}
--- this is a single paragraph of documentation
Documentation text may appear
-
before the module header, multiple documentation texts may appear.
-
before a (non local) declaration of a function, variable, type class, data type, instance or type class. Syntactically, documentation text is treated here like a definition. This means
-
in Layout mode, documentation text must start with the same indentation as any other definition
-
in explicit mode, documentation text and the next definition must be separated with semicolon.
-
-
immediately before a constructor is introduced, that is, before the constructor name.
-
immediately after the constructor declaration
-
after a field declaration, where it can stand in place of a comma.
--- This is a nice module.
--- Have fun!
module Nice where
--- Complex numbers
{--
This is merely a specialised tuple that holds two
double numbers.
An infix constructor ':+:' is available.
-}
data Complex =
{-- The constructor is also named @Complex@,
I simply had no better idea. Sorry. -}
Complex {
!re :: Double --- real part
!im :: Double --- imaginary part
}
--- construct a 'Complex' number
a :+: b = Complex a b
Note
|
Let bound definitions can not have documentation, as they are not accessible from outside the module. |
Multiple adjacent documentation texts are joined with a paragraph break inbetween. The documentation extracted from a sequence of documentationn texts is attached to the documented item.
You write documentation with an easy Documentation Text Markup language.
You should write documentation texts because
-
a module documentation can be generated from your compiled module
-
the eclipse plugin can show the documentation of your functions on mouse hover and in code completion suggestions when your module is imported.
-
the Hoogle database for your module can make use of documentation text.
17 0x11 021 -- int value 17 in decimal, hexadecimal and octal 5_483_438 -- trailing groups of 3 digits can be separated 123L 217l -- long values (like in java) 5467n 7654N -- big integer literal
1.34e-17 1.0d 0D -- double literals 0.2F 2.0f -- float literal
Either one or more of the fractional part, the exponent or the suffix characters D
, F
, d
or f
may be omitted, but not all (this would result in an integer literal).
true false -- like in Java True False -- like in Haskell
'a' '\'' -- apostrophe '\\' -- backslash '\u2200' -- the character '∀' '\n' -- newline '\012' -- yet another newline
All escape characters/sequences allowed in Java are also allowed in Frege. Character literals are 16-bit quantities, like in Java. This means that Unicode code points above 0xffff are not characters in Java and Frege.
"like in Java" "𝕲𝖔𝖙𝖙𝖑𝖔𝖇"
Strings can contain any unicode characters. However, code points from the higher plane are encoded as a surrogate pairs.
´^foo\\´ -- "foo" at the start followed by backslash '(foo|bar)' -- upright quotes ok when more than 1 char '(?:)x' -- trick: same as ´x´
Regular expressions can be given as literals. They are checked for validity at compile time. No backslash duplications is needed, as is the case when one specifies them as string in Java.
The first example above corresponds to the following Java code:
final public static java.util.regex.Pattern p =
java.util.regex.Pattern.compile("^foo\\\\");
where p
is some fresh name the compiler uses internally.
A quoted construct in upright quotes is interpreted as regular expression literal when it can’t possibly be a character. This is for the convenience of those that don’t have acute accent marks on their keyboard.
Withing regular expression literals, escape sequences follow the regular expression syntax. For example \b
is a word break in regular expressions, but a backspace in strings.
The following characters are separators and have certain syntactic meanings
{ } [ ] ( ) , ;
abstract case class data default derive deriving do
else false forall foreign if import in
infix infixl infixr
instance interface let module native newtype of
package private protected public
then throws true type where
= | \
-> .. :: <- =>
→ … ∷ ← ⇒ ∀
The last line lists some Unicode symbols that have the same meaning as the 2-character ascii symbols above them. The ∀
has the same meaning as forall
.
The following are keywords only when the next token is the keyword native
pure mutable
Any sequence of characters that doesn’t contain separators, quotes, apostrophes, acute/grave accent marks, letters, digits or whitespace is a lexical operator, unless it is a keyword. Operators are used to form infix expressions.
When recognizing operators, the lexer considers the longest sequence of operator characters available. Symbolic keywords are not recognized when they appear as subsequence of an operator.
::* -- operator :: * -- double colon, operator
This provides enormous symbolic freedom for user defined operators.
In addition, a variable or data constructor can be turned into an operator by enclosing its name in acute accent marks:
f `fmap` xs -- the fmap function used as operator
Are used to name functions, variables, type variables and fields.
_foo _Foo foo foo' f2o__o'' f'o'o'
Variable names start with a lowercase letter or an underscore and may be followed by arbitrary many letters, digits, apostrophes and underscores.
A sole underscore is a variable name reserved for use in pattern matching, where it indicates an unused value.
For the purpose of Frege, all letters that are not uppercase letters are counted as lowercase.
Are used to name namespaces, types, type classes and data constructors. Also, the last component of a module name must lexically be a constructor name.
Such a name starts with an uppercase letter, which may be followed by an arbitrary number of letters, digits, apostrophes and underscores.
Namespaces can have the same name as types or type classes. Data constructors can have the same name as namespaces, types or type classes.
module Foo where
data Foo = Foo
Editors for Frege should highlight or colour constructor names in such a way that they are easily distinguished.
A data constructor, variable or operator can be qualified by a namespace, a type name or by a namespace and a type name. Namespace and type name must be given as qualifiers, that is, they must be immediately followed by a dot.
Foo . bar -- not a qualified name Foo.bar -- a qualified name Foo. bar -- the same Mod.Typ.### -- fully qualified operator
A sequence of names, separated by dots. The last part must be a construtor name. Since this will be the fully qualified name of the Java class that is generated for this module, it is expected that the name follows Java customs. See also Module Names.
Like in Haskell, Frege code can be written using blocks delimitted by curly braces, where subsequent definitions are separated by semicolons.
In fact, this is the language the parser understands. The so-called layout feature allows omission of those braces and semicolons, by inferring their positions based on the indentation of the program text and inserting them as needed before parsing.
Note
|
Syntax diagrams will always show the explicit syntax with braces and semicolons. |
Informally stated, the braces and semicolons are inserted as follows.
The layout (or ”offside”) rule takes effect whenever the open brace is omitted after the keyword where
, let
, do
, or of
.
When this happens, the indentation of the next lexeme (whether or not on a new line) is remembered and the omitted open brace is inserted (the whitespace preceding the lexeme may include comments). If the next lexeme is not more indented than the current indentation level, an additional closing brace is inserted.
For each subsequent line, if it contains only whitespace or is indented more, then the previous item is continued (nothing is inserted); if it is indented the same amount, then a new item begins (a semicolon is inserted); and if it is indented less, then the layout list ends (a close brace is inserted).
The layout rule matches only those open braces that it has inserted; an explicit open brace must be matched by an explicit close brace. Within these explicit open braces, no layout processing is performed for constructs outside the braces, even if a line is indented to the left of an earlier implicit open brace.
module Foo where
bar = 1
baz = bar + x
where
x = y+2
y = bar*5
becomes
module Foo where
{bar = 1
;baz = bar + x
where
{x = y+2
;y = bar*5
}
}
The following markup is supported by the documentation tool and the eclipse plugin:
*bold* _italic_ @monospaced@ 'reference'
A reference is the (possibly qualified) name of a frege type, function, etc. This should turn to a hyperlink when processed. The reference will be resolved in the context of the module that contains the comment. What this means is that reference must be a name that would be valid on the toplevel of the module. If the name resolution fails, the text enclosed in the apostrophes will be shown in red color.
But sometimes one needs to reference some item from another module that is not imported. For this, the following syntax is possible:
'some.other.Package#something'
The validity of such a reference can not be checked, of course.
Finally, if one needs a generic link, it can be written like thus:
'http://projecteuler.net/index.php?section=problems&id=12 Euler probelm 12'
The part before the first space character is taken as URL, the rest is the text that will be shown.
An empty line serves as paragraph break. Special paragraphs are
# Header 1
## Header 2
### Header 3
> preformatted text (i.e. code examples)
> each line must start with ">"
- unordered list item
1. ordered list item
(2) ordered list item
[item] list item tagged with "item"
Paragraphs do not nest.