The Librarian model contains the definitions of the core concepts that are needed to describe software libraries and control flow graphs (CFGs) that call parts of those libraries. The model consists of three layers:
- Concepts are the fundamental building block of the model. A concept describes any entity that can be found in a library or a CFG, e.g. a function, a datatype or the call to a function.
- Paradigms combine a collection of concepts that are relevant for the paradigm with an optional collection of concept instances that can always be found in the paradigm. Examples of paradigms are object-orientation, functional programming or logic programming.
- Ecosystems are a special type of paradigm. They represent a particular combination of paradigms, syntax and tooling that make up a language ecosystem. Python, for example, has concepts from the object-oriented and functional paradigms, represents those concepts via the syntax of the Python language and allows executing CFGs expressed in that syntax via the Python interpreter. All the information required to represent and execute a program in a given ecosystem is encoded in the definition of an ecosystem model.
Librarian comes with a collection of general purpose concepts that can be found across many paradigms and ecosystems. It also comes with basic paradigm definitions for object-oriented and functional programming. Lastly it also provides an ecosystem for Python.
The following sections first describe how to define such components and then describe the builtin components in detail.
The model offers three macros to define concepts, paradigms and ecosystems: `defconcept`, `defparadigm` and `defecosystem`.
They can be found in the `librarian.model.syntax` namespace.
(defconcept name [optional vector of parent concepts ...]
  :attributes { ... A datascript schema for the concept ... }
  :preprocess { ... A map from attributes to preprocessor functions ... }
  :postprocess A postprocessor function
  :spec A Clojure spec to validate concept instances)
Defines a new concept with name `name` in the current namespace.
The concept is described by a sequence of key-value pairs.
A concept description can contain the following pairs, all of which are optional:
- `:attributes`: A map describing datascript attributes, i.e. a database schema for the concept.
- `:preprocess`: A map from attributes to preprocessor functions for those attributes. Useful to mirror attribute values.
- `:postprocess`: A function that takes a datascript database and the id of an instance of this concept. The function returns a datascript transaction that should be executed as part of the transaction that adds the given concept. Useful to compute derived attributes for concept instances.
- `:spec`: A Clojure spec that should be used to check the validity of supposed instances of this concept.
In addition to the key-value pairs the concept description can be preceded by a vector of concepts that the newly defined concept should inherit from.
Example:
(require '[librarian.model.syntax :refer [defconcept]]
         '[clojure.spec.alpha :as s]
         '[librarian.helpers.spec :as hs]
         '[my.other.concepts :refer [parent-concept1 other-concept]])
(defconcept parent-concept2) ; Useless, but allowed.
(defconcept my-concept [parent-concept1 parent-concept2]
  :attributes {::x {:db/doc "A test attribute."}
               ::y {:db/valueType :db.type/ref
                    :db/doc "A reference to another concept."}}
  :postprocess (fn [db id] [:db/add id ::x 42]) ; All instances get x=42 auto-assigned.
  :spec ::my-concept)
(s/def ::my-concept (hs/entity-keys :req [::x ::y]))
(s/def ::x int?)
(s/def ::y (hs/instance? other-concept))
It is strongly recommended that all concept attributes are fully-qualified keywords (`::attribute` instead of `:attribute`) to prevent accidental collisions with other concepts.
It is also recommended to create a separate namespace for each concept (unlike the example, where `parent-concept2` and `my-concept` are defined in a single namespace).
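To make the `:postprocess` contract more concrete, the following sketch imitates it with plain Clojure data instead of a real datascript database. The names `apply-tx`, `add-with-postprocess` and `assign-x` are hypothetical and not part of Librarian, and the sketch assumes that a transaction is a sequence of `[:db/add entity attribute value]` operations.

```clojure
;; A plain map of entity id -> attribute map stands in for the database.
(defn apply-tx
  "Applies a sequence of [:db/add eid attr value] operations to the map `db`."
  [db tx]
  (reduce (fn [db [op eid attr value]]
            (case op
              :db/add (assoc-in db [eid attr] value)
              (throw (ex-info "Unsupported operation in this sketch" {:op op}))))
          db
          tx))

(defn add-with-postprocess
  "Adds `entity` under `id`, then applies the transaction that
   `postprocess` derives from the updated database and the id."
  [db id entity postprocess]
  (let [db' (assoc db id entity)]
    (apply-tx db' (postprocess db' id))))

;; A postprocessor in the spirit of the example: every instance gets x = 42.
(def assign-x
  (fn [_db id] [[:db/add id :my/x 42]]))

(add-with-postprocess {} 1 {:my/y 2} assign-x)
;; => {1 {:my/y 2, :my/x 42}}
```

The postprocessor only describes the change as data; applying it is left to whoever adds the instance, which is why derived attributes can be computed in the same step that adds the concept instance.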
(defparadigm name [optional vector of parent paradigms ...]
  :concepts { ... A map of concept aliases to concepts ... }
  :builtins [ ... A collection of concept instance descriptions ... ])
Defines a new paradigm with name `name` in the current namespace.
The name is followed by an optional vector of paradigms that should be included in the new paradigm.
Then a sequence of key-value pairs follows:
- `:concepts`: A map of unnamespaced concept alias keywords to concepts.
- `:builtins`: A collection of builtin concept instances in the defined paradigm. The predefined instances should be created via `librarian.model.syntax/instanciate`. Builtins are intended to define things like the global `Object` class in Java.
Example:
(require '[librarian.model.syntax :refer [defparadigm instanciate]]
         '[my.concepts :refer [my-concept]]
         '[my.other.concepts :refer [other-concept]]
         '[my.other.paradigms :refer [parent-paradigm]])
(defparadigm my-paradigm [parent-paradigm]
  :concepts {:my-concept my-concept
             :other-concept other-concept}
  :builtins [(instanciate my-concept
               :y (instanciate other-concept
                    :foo "bar"))])
In this example a paradigm with two concepts is created.
Instances of both concepts are defined as builtins.
See the docstring of the `instanciate` function for details on how it works.
Note: All instances described via `instanciate` will be processed via the corresponding concept's `:preprocess` and `:postprocess` functions as well as validated via `:spec` to prevent the addition of inconsistent or invalid builtins.
The same processing and validation steps are also performed by the scraper.
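The preprocessing and validation steps mentioned in the note can be pictured with a small sketch that uses only `clojure.spec.alpha`. The specs `:sketch/x` and `:sketch/instance` and the functions `preprocess` and `process-instance` are hypothetical; Librarian's actual implementation of `instanciate` may differ.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical attribute and instance specs, only for this sketch.
(s/def :sketch/x int?)
(s/def :sketch/instance (s/keys :req [:sketch/x]))

(defn preprocess
  "Applies each preprocessor function to its attribute, if the attribute is present."
  [preprocessors instance]
  (reduce-kv (fn [inst attr f]
               (if (contains? inst attr)
                 (update inst attr f)
                 inst))
             instance
             preprocessors))

(defn process-instance
  "Preprocesses `instance` and validates the result against `spec`.
   Returns the processed instance or throws if it is invalid."
  [preprocessors spec instance]
  (let [processed (preprocess preprocessors instance)]
    (if (s/valid? spec processed)
      processed
      (throw (ex-info "Invalid concept instance"
                      {:problems (s/explain-data spec processed)})))))

(process-instance {:sketch/x inc} :sketch/instance {:sketch/x 41})
;; => {:sketch/x 42}
```

An instance such as `{:sketch/x "a"}` would be rejected with an exception, which mirrors the stated goal of keeping inconsistent or invalid builtins out of a paradigm.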
The concept aliases `:my-concept` and `:other-concept` assign a shorthand name for each concept that is relevant in a particular paradigm.
Aliases simplify the specification of scraper configurations and initial generator states.
Without aliases, each concept would have to be referenced using its fully qualified name, e.g. `:my.concepts/my-concept`.
Using aliases one can write `:my-concept` instead.
Another purpose of aliases is to simplify the attribute syntax:
; Assuming my-concept has properties x, y; other-concept has property z.
; The full attribute names would be:
:my.concepts/x :my.concepts/y :my.other.concepts/z
; If my-concept extends other-concept, instances have all three attributes.
; Attribute aliases simplify referring to those attributes:
:my-concept/x :my-concept/y :my-concept/z
; Thus one does not need to know where an attribute was originally defined to refer to it.
Lastly, aliases are also useful to refer to a concept via multiple names, e.g. a `namespace` concept could be aliased to `:package` in a Java environment and to `:module` in a Python environment.
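As an illustration of the attribute aliasing described above, the following sketch resolves an aliased attribute such as `:my-concept/z` to its fully-qualified name by searching the attributes of the concept and its parents. The data layout (`concepts`, `aliases`) and the functions `all-attributes` and `resolve-attribute` are assumptions made for this sketch, not Librarian's internal representation.

```clojure
;; Hypothetical registry: fully-qualified concept name -> attributes and parents.
(def concepts
  {:my.concepts/my-concept
   {:attributes #{:my.concepts/x :my.concepts/y}
    :parents    [:my.other.concepts/other-concept]}
   :my.other.concepts/other-concept
   {:attributes #{:my.other.concepts/z}
    :parents    []}})

;; Hypothetical alias map, as it could be given via :concepts in a paradigm.
(def aliases
  {:my-concept    :my.concepts/my-concept
   :other-concept :my.other.concepts/other-concept})

(defn all-attributes
  "Returns the attributes of a concept, including inherited ones."
  [concept-id]
  (let [{:keys [attributes parents]} (get concepts concept-id)]
    (into (set attributes) (mapcat all-attributes) parents)))

(defn resolve-attribute
  "Maps an aliased attribute like :my-concept/z to its fully-qualified name,
   or nil if the alias or the attribute is unknown."
  [aliased-attr]
  (let [concept-id (get aliases (keyword (namespace aliased-attr)))
        attr-name  (name aliased-attr)]
    (first (filter #(= attr-name (name %)) (all-attributes concept-id)))))

(resolve-attribute :my-concept/x) ;; => :my.concepts/x
(resolve-attribute :my-concept/z) ;; => :my.other.concepts/z
```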
(defecosystem name [optional vector of parent paradigms or ecosystems ...]
  :concepts { ... A map of concept aliases to concepts ... }
  :builtins [ ... A collection of concept instance descriptions ... ]
  :generate A generator function
  :executor An executor factory)
Defines a new ecosystem with name `name` in the current namespace.
Similar to `defparadigm` but accepts additional key-value pair types:
- `:generate`: A function that takes a metadata map and a database containing an executable CFG and that returns a snippet of executable code for the ecosystem.
- `:executor`: Similar to the generator defined above but returns a function that executes the code snippet and returns the result of the execution instead of simply returning the code snippet string.
Ecosystems are essentially paradigms with support for some specific syntax via the `:generate` and `:executor` functions.
They typically also have a much more extensive set of `:builtins` than paradigms.
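The relationship between `:generate` and `:executor` can be sketched as follows. This is a simplified illustration, not Librarian's generator: the CFG is assumed to be a plain vector of call maps rather than a datascript database, and `generate`, `executor` and the `run-code` argument are hypothetical names.

```clojure
(require '[clojure.string :as str])

(defn generate
  "Turns a sequence of call descriptions into a Python-like code snippet.
   The metadata map is accepted but unused in this sketch."
  [_metadata calls]
  (str/join "\n"
            (map (fn [{:keys [result callable args]}]
                   (str result " = " callable "(" (str/join ", " args) ")"))
                 calls)))

(defn executor
  "Returns a function that generates the snippet and hands it to `run-code`,
   a caller-supplied function standing in for a real interpreter."
  [run-code]
  (fn [metadata calls]
    (run-code (generate metadata calls))))

(def calls
  [{:result "a" :callable "int"   :args ["\"42\""]}
   {:result "b" :callable "float" :args ["a"]}])

(generate {} calls)
;; => "a = int(\"42\")\nb = float(a)"
```

Passing `identity` as `run-code` makes the executor return the snippet unchanged, which is a convenient way to check the generated code before wiring in a real interpreter.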
The Librarian model comes with a collection of builtin components that are described next.
Concepts that are useful across multiple programming paradigms and languages are defined in the `librarian.model.concepts` namespace.
Now follows an overview of those concepts.
For each concept the list of its attributes and parent concepts is given.
Legend:
- Derived attributes will be automatically computed via a preprocessor or postprocessor and should not be manually provided.
- Indexed attributes allow a fast reverse lookup of the entities having a given attribute value. By default only the forward direction is indexed.
- Unique attributes are indexed and also guarantee that a reverse lookup will find at most one entity for any given value.
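Since concept attributes are described as a datascript schema, the three kinds of attributes in the legend roughly correspond to schema properties such as `:db/index` and `:db/unique`. The following sketch shows what such declarations could look like; the attribute names in the `sketch.*` namespaces and the helper `reverse-lookup-attributes` are hypothetical, and the exact properties Librarian uses for its builtin concepts may differ.

```clojure
(def schema
  {;; Indexed: fast reverse lookup from a name to the entities having it.
   :sketch.named/name        {:db/index true
                              :db/doc   "Name of the entity."}
   ;; Unique: indexed, and at most one entity per value.
   :sketch.namespace/id      {:db/unique :db.unique/identity
                              :db/doc    "Unique id of the namespace."}
   ;; Indexed ref with cardinality many (0..n).
   :sketch.namespace/member  {:db/valueType   :db.type/ref
                              :db/cardinality :db.cardinality/many
                              :db/index       true
                              :db/doc         "A member of the namespace."}
   ;; Neither indexed nor unique: only the forward direction is available.
   :sketch.positionable/position {:db/doc "Ordinal position of the entity."}})

(defn reverse-lookup-attributes
  "Returns the attributes of `schema` that are declared indexed or unique."
  [schema]
  (set (for [[attr props] schema
             :when (or (:db/index props) (:db/unique props))]
         attr)))

(reverse-lookup-attributes schema)
;; => #{:sketch.named/name :sketch.namespace/id :sketch.namespace/member}
```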
| Names & Positions | | | |
|---|---|---|---|
| Concept | Attribute | Type / Cardinality / Index | Description |
| `named` | | | A named entity. |
| | `name` | String, indexed | Name of the entity. |
| `namespace` extends `named` | | | A namespace. |
| | `id` | Derived from `name`, unique | Unique id of the namespace. |
| | `member` | Ref to `namespaced`, multiple (0..n), indexed | A member of the namespace. |
| `namespaced` extends `named` | | | A namespace member. |
| | `id` | Derived vector, unique | Fully-qualified name of the member: `[namespace-name member-name]` |
| `positionable` | | | Something with an ordinal position. |
| | `position` | Integer, optional (0..1) | Ordinal position of the entity. |

| Datatypes | | | |
|---|---|---|---|
| Concept | Attribute | Type / Cardinality / Index | Description |
| `datatype` | | | A datatype. |
| | `datatype` | Ref to `datatype`, multiple (0..n), indexed | A supertype of the datatype. |
| `basetype` extends `named`, `datatype` | | | A basic type (like `int` or `boolean`). |
| | `id` | Derived from `name`, unique | Unique id of the basetype. |
| `semantic-type` extends `positionable`, `datatype` | | | A datatype representing all values that have a certain semantic. Semantic types are fuzzy since their semantic is described via natural language. They can be ordered via `position` if the semantic `value`s for a `key` represent some sequence, e.g. a sequence of paragraphs. |
| | `key` | String, optional (0..1) | A context for the semantic `value`, e.g. `"description"` or `"unit"`. |
| | `value` | String | A string describing the semantics of the type. |
| `role-type` extends `datatype` | | | Role types represent the set of values that can take a certain role. The role type with id `:dataset` for example could represent all training dataset arrays. While role types describe some kind of semantic, similar to `semantic-type`, they are not fuzzy and are assumed to have a clearly defined meaning. |
| | `id` | Keyword, unique | Unique id of the role type. |
| `typed` | | | A concept with datatypes. The datatype of an entity with multiple types is the union type of those types. |
| | `datatype` | Ref to `datatype`, multiple (0..n), indexed | A datatype of the concept. |

| Callables | | | |
|---|---|---|---|
| Concept | Attribute | Type / Cardinality / Index | Description |
| `callable` extends `typed` | | | Represents something that can be called with parameters and returns results. It is typed so that semantic and role information can be attached to it. |
| | `parameter` | Ref to `parameter`, multiple (0..n), indexed | A parameter of the callable. |
| | `result` | Ref to `result`, multiple (0..n), indexed | A returned result of the callable. |
| `io-container` extends `named`, `typed`, `positionable` | | | Represents an input or output (parameter or result) of a callable. |
| | No attributes. | | |
| `parameter` extends `io-container`, `data-receiver` | | | Represents a parameter of a callable. |
| | `optional` | Boolean, optional (default: `false`) | Denotes whether this parameter is optional. |
| `result` extends `io-container`, `data-receiver` | | | Represents a returned result of a callable. |
| | No attributes. | | |

| Control Flow Graph Nodes | | | |
|---|---|---|---|
| Concept | Attribute | Type / Cardinality / Index | Description |
| `call` extends `typed` | | | Represents a call to some `callable`. |
| | `callable` | Ref to `callable` | The callable of this call. |
| | `parameter` | Ref to `call-parameter`, multiple (0..n), indexed | A parameter of this call. |
| | `result` | Ref to `call-result`, multiple (0..n), indexed | A result of this call. |
| `data-receivable` | | | Something that can be received by a `data-receiver`. |
| | No attributes. | | |
| `data-receiver` extends `data-receivable` | | | A concept that can receive a value from some `data-receivable`. A receiver either has or receives some value to which it can optionally also get some additional semantic information from the outside. |
| | `receives` | Ref to `data-receivable`, multiple (0..n), indexed | A receivable from which this receiver gets its value and thus has to be able to accept the datatype of the received value. |
| | `receives-semantic` | Ref to `data-receivable`, multiple (0..n), indexed | A receivable from which this receiver gets the `semantic-type`s of the value it holds. |
| `call-parameter` extends `typed`, `positionable`, `data-receiver` | | | Represents a parameter of a `call`. |
| | `parameter` | Ref to `parameter` | The parameter for which this `call-parameter` provides a value. |
| `call-result` extends `typed`, `positionable`, `data-receiver` | | | Represents a result of a `call`. |
| | `result` | Ref to `result` | The result that provides the value for this `call-result`. |
| `constant` extends `typed`, `datatype`, `data-receivable` | | | Represents a constant value that can be received by `call-parameter`s. The constant concept is implemented as a typed datatype, where a constant is its own instance. This was done to be able to represent enum types as disjunctions of constants (disjunctions are however not yet supported). |
| | `value` | String or integer or boolean | The value of the constant. |
| `snippet` | | | Represents a code snippet/template as a partial CFG. A snippet is a concept that points to the CFG nodes that make up its partial CFG. |
| | `value` | Ref to a CFG-node or any concept with a truthy `:placeholder` attribute | A control-flow concept that is part of the snippet. |
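To give a feeling for how the control flow graph concepts fit together, the following sketch models a tiny CFG as a map of entity ids to plain maps: a constant flows into a call to `int`, whose result flows into a call to `float`. The unqualified keys, the `:type` key, the use of callable names instead of refs, the single-valued `:receives`, and the helper `call-inputs` are all simplifications made for this sketch; in Librarian these would be namespaced attributes in a datascript database.

```clojure
(def cfg
  {1 {:type :constant       :value "42"}
   2 {:type :call-parameter :position 0 :receives 1}
   3 {:type :call           :callable "int"   :parameter [2] :result [4]}
   4 {:type :call-result    :position 0}
   5 {:type :call-parameter :position 0 :receives 4}
   6 {:type :call           :callable "float" :parameter [5] :result [7]}
   7 {:type :call-result    :position 0}})

(defn call-inputs
  "Returns the ids of the receivables that feed the parameters of the
   given call, ordered by parameter position."
  [cfg call-id]
  (->> (get-in cfg [call-id :parameter])
       (map cfg)
       (sort-by :position)
       (mapv :receives)))

(call-inputs cfg 3) ;; => [1], the constant
(call-inputs cfg 6) ;; => [4], the result of the call to int
```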
The model comes with three builtin paradigms.
A universal paradigm of concepts that are common in many paradigms.
Concept Aliases:
- `:named`: `:named`
- `:namespace`: `:namespace`
- `:namespaced`: `:namespaced`
- `:datatype`: `:datatype`
- `:basetype`: `:basetype`
- `:semantic-type`: `:semantic-type`
- `:role-type`: `:role-type`
- `:typed`: `:typed`
- `:callable`: `:callable`
- `:io-container`: `:io-container`
- `:parameter`: `:parameter`
- `:result`: `:result`
- `:call`: `:call`
- `:data-receiver`: `:data-receiver`
- `:call-parameter`: `:call-parameter`
- `:call-result`: `:call-result`
- `:constant`: `:constant`
- `:snippet`: `:snippet`
No additional concepts or builtin instances are defined.
A paradigm for functional languages.
Additional Concept Aliases:
- `:function`: `:function`

| Functional Concepts | | | |
|---|---|---|---|
| Concept | Attribute | Type / Cardinality / Index | Description |
| `function` extends `namespaced`, `callable` | | | A function. |
| | No attributes. | | |
No builtin instances are defined.
A paradigm for object-oriented languages.
Additional Concept Aliases:
- `:class`: `:class`
- `:constructor`: `:constructor`
- `:method`: `:method`

| OOP Concepts | | | |
|---|---|---|---|
| Concept | Attribute | Type / Cardinality / Index | Description |
| `class` extends `typed`, `namespaced`, `datatype` | | | A class. |
| | `constructor` | Ref to `constructor`, multiple (1..n), indexed | Constructor of the class. |
| | `method` | Ref to `method`, multiple (0..n), indexed | Method of the class. |
| `constructor` extends `callable` | | | A constructor of a class. |
| | No attributes. | | |
| `method` extends `named`, `callable` | | | A method of a class. |
| | No attributes. | | |
No builtin instances are defined.
The model provides its builtin ecosystems via the `librarian.model.core/ecosystems` map:
{:python python}
Every ecosystem has a keyword alias with which it can be referenced in scraper configuration files.
Currently only an ecosystem for Python (`:python`) is provided.
An ecosystem for Python.
Additional Concept Aliases:
- `:class`: `:python/class` (overrides `class`)
- `:constructor`: `:python/constructor` (overrides `constructor`)
- `:basetype`: `:python/basetype` (overrides `basetype`)
| Python Concepts | | | |
|---|---|---|---|
| Concept | Attribute | Type / Cardinality / Index | Description |
| `python/class` extends `class` | | | A Python class. Like `class` but can only have a single constructor and automatically recognizes methods named `__init__` as its constructor. |
| | No attributes. | | |
| `python/constructor` extends `constructor` | | | Like `constructor` but with a unique reference to its class. |
| | `class` | Derived ref to `python/class`, unique | A reference to the constructor's class. In Python this uniquely identifies a constructor. |
| `python/basetype` extends `basetype` | | | Like `basetype` but only allows the Python basetype names: `"object"`, `"int"`, `"float"`, `"complex"`, `"string"`, `"boolean"`. |
| | No attributes. | | |
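The behavior described for `python/class`, recognizing a method named `__init__` as the single constructor, could be sketched like this. The function `split-constructor` and the plain-map representation of callables are hypothetical and only illustrate the rule, not how Librarian implements it.

```clojure
(defn split-constructor
  "Splits the callables of a class into its constructor (the callable
   named \"__init__\", if any) and its remaining methods.
   Throws if more than one constructor is found."
  [callables]
  (let [{constructors true, methods false}
        (group-by #(= "__init__" (:name %)) callables)]
    (when (> (count constructors) 1)
      (throw (ex-info "A Python class can only have a single constructor."
                      {:constructors constructors})))
    {:constructor (first constructors)
     :method      (vec methods)}))

(split-constructor [{:name "__init__"} {:name "fit"} {:name "predict"}])
;; => {:constructor {:name "__init__"}, :method [{:name "fit"} {:name "predict"}]}
```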
Builtin Instances:
- `basetype` instances: `int`, `float`, `complex`, `string`, `boolean`, which all extend `object`.
- Typecasting `function`s:
  - `str(x)`: `object -> string`
  - `int(x)`: `object -> int`
  - `float(x)`: `object -> float`
Other Python builtins can be added when needed.
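For orientation, the Python builtins listed above can be written down as plain data. The following sketch is not the ecosystem's actual definition (which would use `instanciate`); the names `python-basetypes`, `valid-basetype-name?`, `typecasts` and `cast-result-type` are hypothetical.

```clojure
;; The basetype names permitted by python/basetype.
(def python-basetypes
  #{"object" "int" "float" "complex" "string" "boolean"})

(defn valid-basetype-name?
  "True if `n` is one of the permitted Python basetype names."
  [n]
  (contains? python-basetypes n))

;; The builtin typecasting functions with their parameter and result types.
(def typecasts
  {"str"   {:parameter "object" :result "string"}
   "int"   {:parameter "object" :result "int"}
   "float" {:parameter "object" :result "float"}})

(defn cast-result-type
  "Returns the result type of a builtin typecasting function, or nil."
  [function-name]
  (get-in typecasts [function-name :result]))

(valid-basetype-name? "complex") ;; => true
(valid-basetype-name? "list")    ;; => false
(cast-result-type "str")         ;; => "string"
```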