This repository has been archived by the owner on Jul 26, 2023. It is now read-only.

librarian/model

The Librarian model contains the definitions of the core concepts that are needed to describe software libraries and control flow graphs (CFGs) that call parts of those libraries. The model consists of three layers:

  1. Concepts are the fundamental building blocks of the model. A concept describes any entity that can be found in a library or a CFG, e.g. a function, a datatype or the call to a function.
  2. A paradigm combines a collection of concepts that are relevant to it with an optional collection of concept instances that are always present in it. Examples of paradigms are object-oriented, functional and logic programming.
  3. Ecosystems are a special type of paradigm. They represent a particular combination of paradigms, syntax and tooling that make up a language ecosystem. Python, for example, has concepts from the object-oriented and functional paradigms, represents those concepts via the syntax of the Python language and allows executing CFGs expressed in that syntax via the Python interpreter. All the information required to represent and execute a program in a given ecosystem is encoded in the definition of an ecosystem model.

Librarian comes with a collection of general-purpose concepts that can be found across many paradigms and ecosystems. It also comes with basic paradigm definitions for object-oriented and functional programming. Lastly, it provides an ecosystem for Python.

In the following sections we first describe how to define such components and then describe the builtin components mentioned above in detail.

1. Definition of Model Components

The model offers three macros to define concepts, paradigms and ecosystems: defconcept, defparadigm and defecosystem. They can be found in the librarian.model.syntax namespace.

1.1. Defining Concepts

```clojure
(defconcept name [optional vector of parent concepts ...]
  :attributes { ... A datascript schema for the concept ... }
  :preprocess { ... A map from attributes to preprocessor functions ... }
  :postprocess A postprocessor function
  :spec A Clojure spec to validate concept instances)
```

Defines a new concept called name in the current namespace. The concept is described by a sequence of key-value pairs. A concept description can contain the following pairs, all of which are optional:

  • :attributes: A map describing datascript attributes, i.e. a database schema for the concept.
  • :preprocess: A map from attributes to preprocessor functions for those attributes. Useful to mirror attribute values.
  • :postprocess: A function that takes a datascript database and the id of an instance of this concept. It returns datascript transaction data that is executed as part of the transaction that adds the given instance. Useful to compute derived attributes for concept instances.
  • :spec: A Clojure spec that should be used to check the validity of supposed instances of this concept.

In addition to the key-value pairs the concept description can be preceded by a vector of concepts that the newly defined concept should inherit from.
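The following plain-Clojure sketch shows what inheritance means for a concept's attributes. It makes one simplifying assumption: concepts are modelled as ordinary maps. That layout is hypothetical and is not the internal representation used by defconcept. A concept's effective attributes are its own attributes merged over those of all its ancestors. The helper effective-attributes is also hypothetical.

```clojure
;; Hypothetical stand-ins for concepts: plain maps holding a name, a vector
;; of parent concepts and a datascript-style attribute schema. This is NOT
;; how librarian.model.syntax/defconcept represents concepts internally.
(def named
  {:name       :named
   :parents    []
   :attributes {:named/name {:db/index true}}})

(def positionable
  {:name       :positionable
   :parents    []
   :attributes {:positionable/position {}}})

;; Simplified: the io-container concept described below also extends typed.
(def io-container
  {:name       :io-container
   :parents    [named positionable]
   :attributes {}})

(defn effective-attributes
  "Merges the attribute schemas of all ancestors of `concept` with its own
  attributes. Attributes defined on the concept itself win over inherited ones."
  [concept]
  (apply merge
         (concat (map effective-attributes (:parents concept))
                 [(:attributes concept)])))

(effective-attributes io-container)
;; => {:named/name {:db/index true}, :positionable/position {}}
```

Because the concept's own map is merged last, a child concept can refine an inherited attribute definition in this model.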

Example:

```clojure
(require '[librarian.model.syntax :refer [defconcept]]
         '[clojure.spec.alpha :as s]
         '[librarian.helpers.spec :as hs]
         '[my.other.concepts :refer [parent-concept1 other-concept]])

(defconcept parent-concept2) ; Useless, but allowed.

(defconcept my-concept [parent-concept1 parent-concept2]
  :attributes {::x {:db/doc "A test attribute."}
               ::y {:db/valueType :db.type/ref
                    :db/doc "A reference to another concept."}}
  ;; The postprocessor returns transaction data, i.e. a sequence of operations.
  :postprocess (fn [db id] [[:db/add id ::x 42]]) ; All instances get x=42 auto-assigned.
  :spec ::my-concept)

(s/def ::my-concept (hs/entity-keys :req [::x ::y]))
(s/def ::x int?)
(s/def ::y (hs/instance? other-concept))
```

It is strongly recommended that all concept attributes are fully-qualified keywords (::attribute instead of :attribute) to prevent accidental collisions with other concepts. It is also recommended to create a separate namespace for each concept. The example above does not follow this advice, since parent-concept2 and my-concept are defined in a single namespace.

1.2. Defining Paradigms

```clojure
(defparadigm name [optional vector of parent paradigms ...]
  :concepts { ... A map of concept aliases to concepts ... }
  :builtins [ ... A collection of concept instance descriptions ... ])
```

Defines a new paradigm called name in the current namespace. The name is followed by an optional vector of paradigms that should be included in the new paradigm. A sequence of key-value pairs follows:

  • :concepts: A map of unnamespaced concept alias keywords to concepts.
  • :builtins: A collection of builtin concept instances in the defined paradigm. The predefined instances should be created via librarian.model.syntax/instanciate. Builtins are intended to define things like the global Object class in Java.

Example:

```clojure
(require '[librarian.model.syntax :refer [defparadigm instanciate]]
         '[my.concepts :refer [my-concept]]
         '[my.other.concepts :refer [other-concept]]
         '[my.other.paradigms :refer [parent-paradigm]])

(defparadigm my-paradigm [parent-paradigm]
  :concepts {:my-concept my-concept
             :other-concept other-concept}
  :builtins [(instanciate my-concept
               :y (instanciate other-concept
                    :foo "bar"))])
```

In this example a paradigm with two concepts is created. Instances of both concepts are defined as builtins. See the docstring of the instanciate function for details on how it works.

Note: All instances described via instanciate are processed by the corresponding concept's :preprocess and :postprocess functions and validated against its :spec. This prevents the addition of inconsistent or invalid builtins. The scraper performs the same processing and validation steps.
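The following rough sketch makes these steps concrete, assuming the order preprocess, then validate, then emit transaction data together with the postprocessor's output. The note above does not specify that order. All shapes here are assumptions for illustration: a preprocessor returns a map of extra attributes, a plain predicate (valid?) stands in for the Clojure spec, and the attribute keywords and the helpers preprocess-instance and process-instance are hypothetical.

```clojure
;; Hypothetical sketch: preprocess -> validate -> tx-data (+ postprocess).
;; None of these names are librarian's actual internals.

(defn preprocess-instance
  "Calls each attribute's preprocessor on that attribute's value and merges
  the returned map of extra attributes into the instance."
  [preprocessors instance]
  (reduce-kv (fn [inst attr f]
               (if (contains? inst attr)
                 (merge inst (f (get inst attr)))
                 inst))
             instance
             preprocessors))

(defn process-instance
  "Returns tx-data for adding `instance` under temp id `id`, or throws if the
  preprocessed instance fails the concept's validity check."
  [{:keys [preprocess postprocess valid?]} db id instance]
  (let [instance (preprocess-instance preprocess instance)]
    (when-not (valid? instance)
      (throw (ex-info "Invalid concept instance" {:instance instance})))
    (into [(assoc instance :db/id id)]
          (when postprocess (postprocess db id)))))

;; A toy basetype-like concept (hypothetical attribute names): the name is
;; mirrored into a derived id, and the postprocessor makes every instance a
;; subtype of "object" via a lookup ref.
(def toy-basetype
  {:preprocess  {:basetype/name (fn [n] {:basetype/id n})}
   :postprocess (fn [_db id] [[:db/add id :datatype/datatype [:basetype/id "object"]]])
   :valid?      (fn [inst] (string? (:basetype/name inst)))})

(process-instance toy-basetype nil -1 {:basetype/name "int"})
;; => [{:basetype/name "int", :basetype/id "int", :db/id -1}
;;     [:db/add -1 :datatype/datatype [:basetype/id "object"]]]
```

If valid? fails, nothing is returned for transacting. This mirrors the stated goal of keeping invalid builtins out of the database.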

The concept aliases :my-concept and :other-concept assign a shorthand name for each concept that is relevant in a particular paradigm. Aliases simplify the specification of scraper configurations and initial generator states. Without aliases, each concept would have to be referenced using its fully qualified name, e.g. :my.concepts/my-concept. Using aliases one can write :my-concept instead. Another purpose of aliases is to simplify the attribute syntax:

```clojure
; Assume my-concept has the attributes x and y, and other-concept has z.
; The full attribute names would be:
:my.concepts/x :my.concepts/y :my.other.concepts/z
; If my-concept extends other-concept, its instances have all three attributes.
; Attribute aliases simplify referring to those attributes:
:my-concept/x :my-concept/y :my-concept/z
; Thus one does not need to know where an attribute was originally defined to refer to it.
```

Lastly, aliases are also useful to refer to a concept via multiple names, e.g. a namespace concept could be aliased to :package in a Java environment and to :module in a Python environment.
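The sketch below illustrates the alias rules in plain Clojure. It is a simplified model, not librarian's implementation: concepts are identified by hypothetical keywords, each concept lists its own and inherited fully-qualified attributes, and the hypothetical helper alias-attributes builds a lookup table from :alias/attribute-name to the fully-qualified attribute.

```clojure
;; Hypothetical data: fully-qualified attributes (own + inherited) per concept.
(def concept-attributes
  {:my.concepts/my-concept          [:my.concepts/x :my.concepts/y :my.other.concepts/z]
   :my.other.concepts/other-concept [:my.other.concepts/z]})

;; Hypothetical alias map, as it could be given via :concepts.
(def aliases
  {:my-concept    :my.concepts/my-concept
   :other-concept :my.other.concepts/other-concept})

(defn alias-attributes
  "Returns a map from alias attributes (:alias/attr-name) to the
  fully-qualified attributes they stand for."
  [aliases concept-attributes]
  (into {}
        (for [[alias concept] aliases
              attr            (get concept-attributes concept)]
          [(keyword (name alias) (name attr)) attr])))

((alias-attributes aliases concept-attributes) :my-concept/z)
;; => :my.other.concepts/z
```

In this model, giving a concept a second alias (e.g. :package and :module for the same namespace concept) simply yields a second set of alias attributes that resolve to the same fully-qualified names.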

1.3. Defining Ecosystems

```clojure
(defecosystem name [optional vector of parent paradigms or ecosystems ...]
  :concepts { ... A map of concept aliases to concepts ... }
  :builtins [ ... A collection of concept instance descriptions ... ]
  :generate A generator function
  :executor An executor factory)
```

Defines a new ecosystem called name in the current namespace. It is similar to defparadigm but accepts two additional keys:

  • :generate: A function that takes a metadata map and a database containing an executable CFG and that returns a snippet of executable code for the ecosystem.
  • :executor: An executor factory. It is similar to the :generate function above, but instead of returning the code snippet string it returns a function that executes the snippet and returns the result of the execution.

Ecosystems are essentially paradigms with support for a specific syntax via the :generate and :executor functions. They typically also have a much more extensive set of :builtins than paradigms.
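The following heavily simplified sketch illustrates the :generate and :executor contracts. A real generator receives a datascript database holding the CFG. Here a plain vector of call maps stands in for it, and each call binds its result to a variable. Both functions and the run-fn parameter are hypothetical. run-fn stands for whatever actually runs the code, e.g. something that invokes a Python interpreter.

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in for an ecosystem's :generate function. The CFG is
;; simplified to a vector of call maps in execution order (an assumption).
(defn generate-python
  [_meta cfg]
  (str/join "\n"
            (for [{:keys [result callable args]} cfg]
              (str result " = " callable "(" (str/join ", " args) ")"))))

;; Hypothetical executor factory: returns a function that generates the
;; snippet and passes it to `run-fn`, returning whatever `run-fn` returns.
(defn make-executor
  [run-fn]
  (fn [meta cfg]
    (run-fn (generate-python meta cfg))))

(def example-cfg
  [{:result "a" :callable "int" :args ["\"42\""]}
   {:result "b" :callable "str" :args ["a"]}])

(generate-python {} example-cfg)
;; => "a = int(\"42\")\nb = str(a)"
```

Separating code generation from execution lets the same generator serve both purposes: users can inspect the snippet or run it.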

2. Builtin Model Components

The Librarian model comes with a collection of builtin components that are described next.

2.1. Builtin General-Purpose Concepts

Concepts that are useful across multiple programming paradigms and languages are defined in the librarian.model.concepts namespace. The following tables give an overview of those concepts and list the attributes and parent concepts of each.

Legend:

  • Derived attributes will be automatically computed via a preprocessor or postprocessor and should not be manually provided.
  • Indexed attributes allow a fast reverse lookup of the entities having a given attribute value. By default only the forward direction is indexed.
  • Unique attributes are indexed and also guarantee that a reverse lookup will find at most one entity for any given value.
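The terms in the legend correspond to standard datascript schema options. The following sketch shows a datascript-style schema fragment for the namespace concept. The attribute keywords are illustrative assumptions, not librarian's actual attribute names, and unique-attributes is a hypothetical helper. The schema itself is plain data, so no datascript dependency is needed to read it.

```clojure
;; Hypothetical schema fragment for the `namespace` concept.
(def namespace-schema
  {;; Unique: indexed, and a reverse lookup finds at most one entity.
   :namespace/id     {:db/unique :db.unique/identity
                      :db/doc    "Derived from the name; do not set manually."}
   ;; Indexed reference with cardinality multiple (0..n).
   :namespace/member {:db/valueType   :db.type/ref
                      :db/cardinality :db.cardinality/many
                      :db/index       true}})

(defn unique-attributes
  "Returns the set of attributes whose values identify at most one entity."
  [schema]
  (set (for [[attr opts] schema
             :when (:db/unique opts)]
         attr)))

(unique-attributes namespace-schema)
;; => #{:namespace/id}
```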

Names & Positions

| Concept | Attribute | Type / Cardinality / Index | Description |
|---|---|---|---|
| named | | | A named entity. |
| | name | String, indexed | Name of the entity. |
| namespace | | extends named | A namespace. |
| | id | Derived from name, unique | Unique id of the namespace. |
| | member | Ref to namespaced, multiple (0..n), indexed | A member of the namespace. |
| namespaced | | extends named | A namespace member. |
| | id | Derived vector, unique | Fully-qualified name of the member: [namespace-name member-name] |
| positionable | | | Something with an ordinal position. |
| | position | Integer, optional (0..1) | Ordinal position of the entity. |
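The following sketch illustrates the derived id attributes from the table above. A namespace's id mirrors its name, and a member's id is the vector [namespace-name member-name], which makes it unique across namespaces. Plain maps stand in for entities, and the helper functions are hypothetical.

```clojure
;; Hypothetical helpers mirroring the derived ids described in the table.
(defn namespace-id
  "A namespace's id is derived from its name."
  [ns-entity]
  (:name ns-entity))

(defn member-id
  "A namespace member's id is the vector [namespace-name member-name]."
  [ns-entity member]
  [(:name ns-entity) (:name member)])

(defn member-index
  "Maps each member id of `ns-entity` to the member, emulating a unique lookup."
  [ns-entity]
  (into {} (for [m (:members ns-entity)]
             [(member-id ns-entity m) m])))

(def math-ns {:name "math" :members [{:name "sqrt"} {:name "floor"}]})

(get (member-index math-ns) ["math" "sqrt"])
;; => {:name "sqrt"}
```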

Datatypes

| Concept | Attribute | Type / Cardinality / Index | Description |
|---|---|---|---|
| datatype | | | A datatype. |
| | datatype | Ref to datatype, multiple (0..n), indexed | A supertype of the datatype. |
| basetype | | extends named, datatype | A basic type (like int or boolean). |
| | id | Derived from name, unique | Unique id of the basetype. |
| semantic-type | | extends positionable, datatype | A datatype representing all values that have certain semantics. Semantic types are fuzzy, since their semantics are described in natural language. They can be ordered via position if the semantic values for a key form a sequence, e.g. a sequence of paragraphs. |
| | key | String, optional (0..1) | A context for the semantic value, e.g. "description" or "unit". |
| | value | String | A string describing the semantics of the type. |
| role-type | | extends datatype | Role types represent the set of values that can take on a certain role. The role type with id :dataset, for example, could represent all training dataset arrays. Like semantic types, role types describe semantics, but they are not fuzzy and are assumed to have a clearly defined meaning. |
| | id | Keyword, unique | Unique id of the role type. |
| typed | | | A concept with datatypes. The datatype of an entity with multiple types is the union type of those types. |
| | datatype | Ref to datatype, multiple (0..n), indexed | A datatype of the concept. |

Callables

| Concept | Attribute | Type / Cardinality / Index | Description |
|---|---|---|---|
| callable | | extends typed | Represents something that can be called with parameters and returns results. It is typed so that semantic and role information can be attached to it. |
| | parameter | Ref to parameter, multiple (0..n), indexed | A parameter of the callable. |
| | result | Ref to result, multiple (0..n), indexed | A returned result of the callable. |
| io-container | | extends named, typed, positionable | Represents an input or output (parameter or result) of a callable. |
| | No attributes. | | |
| parameter | | extends io-container, data-receiver | Represents a parameter of a callable. |
| | optional | Boolean, optional (default: false) | Denotes whether this parameter is optional. |
| result | | extends io-container, data-receiver | Represents a returned result of a callable. |
| | No attributes. | | |

Control Flow Graph Nodes

| Concept | Attribute | Type / Cardinality / Index | Description |
|---|---|---|---|
| call | | extends typed | Represents a call to some callable. |
| | callable | Ref to callable | The callable of this call. |
| | parameter | Ref to call-parameter, multiple (0..n), indexed | A parameter of this call. |
| | result | Ref to call-result, multiple (0..n), indexed | A result of this call. |
| data-receivable | | | Something that can be received by a data-receiver. |
| | No attributes. | | |
| data-receiver | | extends data-receivable | A concept that can receive a value from some data-receivable. A receiver either has or receives a value, and it can optionally receive additional semantic information about that value from elsewhere. |
| | receives | Ref to data-receivable, multiple (0..n), indexed | A receivable from which this receiver gets its value. The receiver must therefore be able to accept the datatype of the received value. |
| | receives-semantic | Ref to data-receivable, multiple (0..n), indexed | A receivable from which this receiver gets the semantic types of the value it holds. |
| call-parameter | | extends typed, positionable, data-receiver | Represents a parameter of a call. |
| | parameter | Ref to parameter | The parameter for which this call-parameter provides a value. |
| call-result | | extends typed, positionable, data-receiver | Represents a result of a call. |
| | result | Ref to result | The result that provides the value for this call-result. |
| constant | | extends typed, datatype, data-receivable | Represents a constant value that can be received by call-parameters. The constant concept is implemented as a typed datatype, where a constant is its own instance. This makes it possible to represent enum types as disjunctions of constants (however, disjunctions are not yet supported). |
| | value | String or integer or boolean | The value of the constant. |
| snippet | | | Represents a code snippet/template as a partial CFG. A snippet is a concept that points to the CFG nodes that make up its partial CFG. |
| | value | Ref to a CFG node or any concept with a truthy :placeholder attribute | A control-flow concept that is part of the snippet. |
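The following simplified sketch shows how these concepts might fit together in a small CFG: int("42"), whose result feeds str(...). Entities are plain maps keyed by numeric id, and callables are given as strings instead of refs. The alias-style attribute names, the :type key and the producing-call helper are all assumptions made for illustration. producing-call follows a call-parameter's receives edge back to the call that produced its value.

```clojure
;; Hypothetical encoding of a two-call CFG: int("42") feeding str(...).
(def two-call-cfg
  {1 {:type :constant       :constant/value "42"}
   2 {:type :call           :call/callable "int" :call/parameter [3] :call/result [4]}
   3 {:type :call-parameter :position 0 :data-receiver/receives [1]}
   4 {:type :call-result    :position 0}
   5 {:type :call           :call/callable "str" :call/parameter [6] :call/result [7]}
   6 {:type :call-parameter :position 0 :data-receiver/receives [4]}
   7 {:type :call-result    :position 0}})

(defn producing-call
  "Returns the id of the call whose call-result feeds the call-parameter
  `param-id`, or nil if the value comes from elsewhere (e.g. a constant)."
  [cfg param-id]
  (let [sources (set (get-in cfg [param-id :data-receiver/receives]))]
    (some (fn [[id entity]]
            (when (and (= :call (:type entity))
                       (some sources (:call/result entity)))
              id))
          cfg)))

(producing-call two-call-cfg 6) ;; => 2
(producing-call two-call-cfg 3) ;; => nil
```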

2.2. Builtin Paradigms

The model comes with three builtin paradigms.

2.2.1. The common paradigm

A universal paradigm of concepts that are common in many paradigms.

Concept Aliases:

  • :named: named
  • :namespace: namespace
  • :namespaced: namespaced
  • :datatype: datatype
  • :basetype: basetype
  • :semantic-type: semantic-type
  • :role-type: role-type
  • :typed: typed
  • :callable: callable
  • :io-container: io-container
  • :parameter: parameter
  • :result: result
  • :call: call
  • :data-receiver: data-receiver
  • :call-parameter: call-parameter
  • :call-result: call-result
  • :constant: constant
  • :snippet: snippet

No additional concepts or builtin instances are defined.

2.2.2. The functional paradigm (extends common)

A paradigm for functional languages.

Additional Concept Aliases:

  • :function: function

Functional Concepts

| Concept | Attribute | Type / Cardinality / Index | Description |
|---|---|---|---|
| function | | extends namespaced, callable | A function. |
| | No attributes. | | |

No builtin instances are defined.

2.2.3. The oo paradigm (extends common)

A paradigm for object-oriented languages.

Additional Concept Aliases:

  • :class: class
  • :constructor: constructor
  • :method: method

OOP Concepts

| Concept | Attribute | Type / Cardinality / Index | Description |
|---|---|---|---|
| class | | extends typed, namespaced, datatype | A class. |
| | constructor | Ref to constructor, multiple (1..n), indexed | Constructor of the class. |
| | method | Ref to method, multiple (0..n), indexed | Method of the class. |
| constructor | | extends callable | A constructor of a class. |
| | No attributes. | | |
| method | | extends named, callable | A method of a class. |
| | No attributes. | | |

No builtin instances are defined.

2.3. Builtin Ecosystems

The model provides its builtin ecosystems via the librarian.model.core/ecosystems map:

```clojure
{:python python}
```

Every ecosystem has a keyword alias with which it can be referenced in scraper configuration files.

Currently only an ecosystem for Python (:python) is provided.

2.3.1. The python ecosystem (extends functional, oo)

An ecosystem for Python.

Additional Concept Aliases:

  • :class: python/class (overrides class)
  • :constructor: python/constructor (overrides constructor)
  • :basetype: python/basetype (overrides basetype)

Python Concepts

| Concept | Attribute | Type / Cardinality / Index | Description |
|---|---|---|---|
| python/class | | extends class | A Python class. Like class but can only have a single constructor and automatically recognizes methods named `__init__` as its constructor. |
| | No attributes. | | |
| python/constructor | | extends constructor | Like constructor but with a unique reference to its class. |
| | class | Derived ref to python/class, unique | A reference to the constructor's class. In Python this uniquely identifies a constructor. |
| python/basetype | | extends basetype | Like basetype but only allows the Python basetype names: "object", "int", "float", "complex", "string", "boolean". |
| | No attributes. | | |
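The following sketch illustrates the python/class constructor rule from the table. It splits a scraped class description into its single constructor (the method named `__init__`) and its remaining methods. The input shape and the split-constructor helper are hypothetical, and librarian's scraper and python/class processing may work differently.

```clojure
;; Hypothetical sketch: split a class map into its __init__ constructor and
;; its other methods, rejecting classes with more than one constructor.
(defn split-constructor
  [{:keys [methods] :as cls}]
  (let [init? #(= "__init__" (:name %))
        inits (filter init? methods)]
    (when (> (count inits) 1)
      (throw (ex-info "A Python class can only have one constructor"
                      {:class (:name cls)})))
    (assoc cls
           :constructor (first inits)
           :methods (vec (remove init? methods)))))

(split-constructor {:name    "Counter"
                    :methods [{:name "__init__"  :params ["start"]}
                              {:name "increment" :params []}]})
;; => {:name "Counter",
;;     :methods [{:name "increment", :params []}],
;;     :constructor {:name "__init__", :params ["start"]}}
```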

Builtin Instances:

  • basetype instances: int, float, complex, string and boolean, all of which extend object.
  • Typecasting functions:
    • str(x): object -> string
    • int(x): object -> int
    • float(x): object -> float

Other Python builtins can be added when needed.
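The following sketch relates the builtin basetypes to the typecasting functions. It models the supertype relation listed above and checks whether a value of one basetype can be received by a parameter of another. The supertypes and typecasts maps and the helpers ancestors-of and assignable? are illustrative assumptions, not librarian's actual type-checking logic.

```clojure
;; Hypothetical supertype relation: every builtin basetype extends "object".
(def supertypes
  {"int" #{"object"} "float" #{"object"} "complex" #{"object"}
   "string" #{"object"} "boolean" #{"object"} "object" #{}})

(defn ancestors-of
  "Returns all basetypes that `t` extends, including `t` itself."
  [t]
  (loop [todo [t] seen #{}]
    (if-let [[x & more] (seq todo)]
      (if (seen x)
        (recur more seen)
        (recur (into (vec more) (supertypes x)) (conj seen x)))
      seen)))

(defn assignable?
  "True if a value of basetype `from` can be received by a parameter of
  basetype `to`."
  [from to]
  (contains? (ancestors-of from) to))

;; The typecasting builtins above: name -> [parameter type, result type].
(def typecasts
  {"str" ["object" "string"] "int" ["object" "int"] "float" ["object" "float"]})

(assignable? "int" (first (typecasts "str"))) ;; => true
(assignable? "string" "int")                   ;; => false
```

Because all casts accept object, any builtin basetype can be passed to them in this model.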