Add streamline resource framework #1253

Merged
merged 26 commits into from
Jan 29, 2024

DavidLegg
Contributor

  • Tickets addressed: N/A
  • Review: By file
  • Merge strategy: Merge (no squash)

Description

Implements a new resource framework, based on the one currently being used by Clipper and lessons learned from an older framework developed for Clipper.

The most important changes, compared to previous resource frameworks, are:

  • Pull-based derived resources
    • PublishingResource and its related resources use a push-based model for derived resources. This required resolving derived resource values eagerly with an expensive looping task.
    • Clipper's current framework replaced that with a lazy pull-based model, which proved much more efficient. This framework adopts the same pull-based approach.
  • Abstraction strategy
    • A resource's sampling strategy is "upstream" of the resource itself, and shouldn't affect how that resource can be used; hence it should not be part of the resource type. Clipper's current model already follows this: instead of "DerivedResource", "SamplingResource", and "SemiLazilyDerivedState", it has just "DiscreteResource" and three ways of constructing one. This idea is carried forward, but taken further:
    • Instead of one resource type per dynamics type, e.g. DiscreteResource, NumericResource, etc., we have just one resource interface parameterized by dynamics type. This cleans up derived resources, since we only need Resource -> Resource derivation, not Discrete -> Discrete and Discrete -> Numeric and Numeric -> Discrete and ...
      Avoiding that combinatorial explosion makes adding new dynamics types easier. This in turn gives us better modeling fidelity, since we can define and use an unusual dynamics type if needed. For example, we define and regularly use in derivations an exact clock based on Duration.
  • Improved expiry and dynamics comparison semantics
    • Dynamics are compared by equals, not DynamicsIdentity (as in Clipper model presently), which simplifies dynamics types.
    • Expiry info is carried in a wrapper around dynamics, rather than in dynamics themselves. This again simplifies dynamics types.
  • Built-in error catching for effects and derived resources
    • Dynamics are wrapped in ErrorCatching, a type that can replace the normal value with an exception.
    • When applying effects or performing a derivation (through monad machinery, see below), exceptions are caught and given to the ErrorCatching layer. Errors cascade through derivations, so we don't propagate "fake" data when errors occur.
    • Allows simulation to continue when some resources "fail": those resources have no value (so they don't have a wrong value), while unaffected resources continue to work normally.
    • When registering a resource, errors are shunted to a separate errors resource with additional debugging information attached. Hence errors and partial sim results are visible in the UI.
  • Use of monads to lift operations
    • Operations can be written against the simplest type they logically work on, then lifted through monadic operators to act on resources. This keeps the handling for expiry, errors, and resources consistent and centralized. This is discussed in more detail here:
      streamline-framework.pdf
  • Streamlined cell and effect handling
    • On Clipper, even with the option to define custom cells or effect types, modelers preferred to just use SettableState/Register and "set" effects. This leads to unnecessary conflicts for concurrent effects. Although effects "should" commute, because (for example) they represent increment and decrement operations, they don't actually commute because they're implemented through set effects.
    • In this framework, there's one kind of resource that permits effects, CellResource. Additionally, effects are functions on the dynamics; we don't define different effect types for different cells. Finally, we define an "automatic" effect trait, which combines concurrent effects by trying both orderings to see if the effects commute in practice.
    • In practice, we observe very few concurrent effects, so the penalty to performance from using the automatic effect trait should be negligible. The benefit is that modelers can write arbitrary effects as easily as they could write the read-compute-set style:
    var v = resource.getDynamics();
    var u = someFunction(v);
    resource.set(u);
    
    becomes
    resource.emit(v -> someFunction(v));
    
  • "Reactions" implemented by efficiently trampolining with replaying tasks
    • On Clipper, we define a number of daemon tasks that "react" to some condition repeatedly throughout the simulation, performing a very small amount of work each time. For example, computing an integral every time an integrand changes, or processing a command when one is added to a queue.
    • Using a loop and threaded task is inefficient; time spent thread switching dwarfs time spent doing useful work. Clipper solved this using replaying tasks to avoid thread switching, and spawning a new task each iteration instead of looping to avoid long replays.
    • This technique is carried forward in the Reactions class, which defines several useful reaction patterns.
  • Equivalence of Conditions and discrete boolean resources
    • Since resources carry expiry information, a Condition is equivalent to a profile segment on a discrete boolean resource. To realize this equivalence and halve the amount of work for methods like lessThan, we have a function, `when`, which turns a boolean resource into the condition "that resource is true". This is especially useful in conjunction with reactions.
  • Model-level debugging utilities
    • Since stack traces and procedural debugger stepping are not very useful for debugging things like derived resources, the "Tracing" class offers a way to trace when a resource is invoked, and what it returns. Tracing stacks, so a chain of traced derived resources appear as such in the output, e.g. A -> B -> C -> result.
    • Additional context is attached to each effect, meant to label it with the activity that emitted it and the resource it was emitted on. Effects usually fail when they are applied, which can be well removed in time from the blameworthy activity. Carrying this info with the effect lets us determine its cause more quickly.
      • Note that this feature is mostly a stop-gap until topic metadata is well-supported. At that point, most of the labelling technology in this framework will likely be replaced by topic metadata.
  • Flexible unit awareness framework
    • Clipper used a specially-designed QuantityState to efficiently handle units. This framework generalizes that to a UnitAware wrapper interface, which can be wrapped around any type, provided an appropriate scaling function to do the unit conversions with. This unifies unit handling, since as a fallback, we can always take unit-naive operations, and apply them to unit-aware objects by requesting the unit-naive values in the unit implicitly expected by the unit-naive operation. This way, units are controlled at the point of use, instead of the point of definition, and different uses can transparently ask for different units.
    • Some operations, like basic arithmetic, are generalized across types to be done in a truly unit-aware way. See the unit-aware overloads of methods in PolynomialResources for examples.
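
The equivalence between conditions and discrete boolean resources described in the list above can be sketched roughly as follows. This is a minimal illustration, not the framework's code: the `Condition` interface here is a simplified stand-in, `WhenSketch` and `lessThan` are hypothetical names, resources are reduced to plain `Supplier`s, and expiry handling is left out entirely.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Supplier;

/** Simplified stand-in for a simulation condition (an assumption, not the real interface). */
interface Condition {
  /** Earliest time in [atEarliest, atLatest] at which the condition has the given polarity, if known. */
  Optional<Duration> nextSatisfied(boolean positive, Duration atEarliest, Duration atLatest);
}

class WhenSketch {
  /** Turns a boolean resource into the condition "that resource is true". */
  static Condition when(Supplier<Boolean> resource) {
    // A discrete value does not change on its own, so it is either satisfied
    // immediately or not at all until something upstream changes.
    return (positive, atEarliest, atLatest) ->
        resource.get() == positive ? Optional.of(atEarliest) : Optional.empty();
  }

  /** A derived boolean resource; wrapping it in when() yields the matching condition for free. */
  static Supplier<Boolean> lessThan(Supplier<Integer> a, Supplier<Integer> b) {
    return () -> a.get() < b.get();
  }

  public static void main(String[] args) {
    int[] level = {5};
    Condition low = when(lessThan(() -> level[0], () -> 10));
    System.out.println(low.nextSatisfied(true, Duration.ZERO, Duration.ofHours(1)));
  }
}
```

Only the resource-level comparison needs to be written; the condition is obtained by wrapping it.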

Verification

Some core classes were unit-tested as needed for debugging. Additionally, Pranav on SRL has been using a preview version of this framework for a few weeks to help flush out bugs and missing features.

Documentation

streamline-framework.pdf
The intended audience for this document is a modeler looking to use the framework. I'd like to convert this document into a set of wiki pages so they're more easily accessible, and expand it with examples and lessons learned by modelers using this or other resource frameworks.

Future work

Fleshing out standard dynamics types, effects, and resource derivation functions as more models use them and find use cases that are missing or poorly supported.

There are known efficiency improvements around task handling and providing expiry information directly to Aerie when sampling resources. These may have small knock-on effects to this framework.

Also, as mentioned briefly above, Aerie's topic metadata feature will likely supplant the labelling system in this framework in the future.

(PR copied from #1198, to get CI tests to work.)

@DavidLegg DavidLegg requested a review from a team as a code owner December 12, 2023 02:10
@mattdailis mattdailis self-requested a review December 12, 2023 22:23
@DavidLegg DavidLegg force-pushed the contrib-streamline-framework branch from ae8331b to ec262c7 Compare December 19, 2023 03:07
@DavidLegg DavidLegg force-pushed the contrib-streamline-framework branch from ec262c7 to c1006cb Compare December 19, 2023 18:54
@DavidLegg DavidLegg force-pushed the contrib-streamline-framework branch from c1006cb to 12e0ed6 Compare December 19, 2023 20:48
@mattdailis mattdailis self-assigned this Jan 4, 2024
David Legg added 7 commits January 23, 2024 17:02
Adds the core resource interfaces. A Resource is defined as a ThinResource (equivalent to a Supplier)
returning an ErrorCatching and Expiring Dynamics. This mirrors the information stored in a cell,
which has proven useful to carry with every value tracked by the model.
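
A rough sketch of the layering described in this commit, assuming simplified shapes for each wrapper. The record definitions, the `ResourceSketch` class, and the use of `java.time.Duration` are illustrative assumptions; the framework's real types differ in detail.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical, simplified stand-ins for the framework's types.

/** A value plus an optional time after which it must be recomputed. */
record Expiring<D>(D data, Optional<Duration> expiry) {}

/** Either a normal value or the exception that replaced it. */
record ErrorCatching<T>(T value, Throwable error) {
  static <T> ErrorCatching<T> success(T value) { return new ErrorCatching<>(value, null); }
  static <T> ErrorCatching<T> failure(Throwable error) { return new ErrorCatching<>(null, error); }
  boolean isSuccess() { return error == null; }
}

/** A resource is a supplier of fully wrapped dynamics, read on demand (pull-based). */
interface Resource<D> extends Supplier<ErrorCatching<Expiring<D>>> {}

class ResourceSketch {
  /** Derives a resource lazily: the source is only read when the derived resource is read. */
  static Resource<Integer> doubled(Resource<Integer> source) {
    return () -> {
      ErrorCatching<Expiring<Integer>> current = source.get();
      if (!current.isSuccess()) {
        return ErrorCatching.failure(current.error()); // errors cascade instead of producing fake data
      }
      Expiring<Integer> dynamics = current.value();
      return ErrorCatching.success(new Expiring<>(dynamics.data() * 2, dynamics.expiry()));
    };
  }

  public static void main(String[] args) {
    Resource<Integer> base = () -> ErrorCatching.success(new Expiring<>(21, Optional.empty()));
    System.out.println(doubled(base).get().value().data());
  }
}
```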
To support methods like `map` accepting functions of varying arities,
we need to define a function interface for each supported arity above 3.
Similarly, we need to define `map` and `bind` methods for every supported arity for each monad.
Since Java's type system doesn't support abstraction across type functors,
we do this by generating the methods with a Python script instead.
Adds monads for all core interfaces:
 * ExpiringMonad
 * ErrorCatchingMonad
 * ThinResourceMonad

These monads abstract the handling of expiries, errors, and stitching together multiple dynamics segments into a resource.

Also adds two monads that compose the above:
 * DynamicsMonad = ExpiringMonad + ErrorCatchingMonad
 * ResourceMonad = ThinResourceMonad + DynamicsMonad

Monad users can generally write code that operates on base dynamics objects, and use the monad methods
to lift that code to fully-wrapped dynamics or resources. This makes sure the wrapping layers are handled correctly
and consistently, and keeps downstream code more focused.
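
As an illustration of the lifting idea, the sketch below lifts a plain two-argument function through an expiry layer and an error layer, then composes the two. It assumes simplified `Expiring` and `ErrorCatching` records and hypothetical method names (`mapExpiring`, `mapErrors`, `map`); the framework's monads cover more arities and more layers.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.BiFunction;

// Hypothetical, simplified stand-ins for the framework's wrapper types.
record Expiring<D>(D data, Optional<Duration> expiry) {}

record ErrorCatching<T>(T value, Throwable error) {
  static <T> ErrorCatching<T> success(T value) { return new ErrorCatching<>(value, null); }
  static <T> ErrorCatching<T> failure(Throwable error) { return new ErrorCatching<>(null, error); }
  boolean isSuccess() { return error == null; }
}

class LiftingSketch {
  /** Expiry layer: the result expires as soon as either input does. */
  static <A, B, R> Expiring<R> mapExpiring(Expiring<A> a, Expiring<B> b, BiFunction<A, B, R> f) {
    Optional<Duration> expiry;
    if (a.expiry().isEmpty()) {
      expiry = b.expiry();
    } else if (b.expiry().isEmpty()) {
      expiry = a.expiry();
    } else {
      Duration x = a.expiry().get();
      Duration y = b.expiry().get();
      expiry = Optional.of(x.compareTo(y) <= 0 ? x : y);
    }
    return new Expiring<>(f.apply(a.data(), b.data()), expiry);
  }

  /** Error layer: input failures pass through; exceptions thrown by f become failures. */
  static <A, B, R> ErrorCatching<R> mapErrors(
      ErrorCatching<A> a, ErrorCatching<B> b, BiFunction<A, B, R> f) {
    if (!a.isSuccess()) return ErrorCatching.failure(a.error());
    if (!b.isSuccess()) return ErrorCatching.failure(b.error());
    try {
      return ErrorCatching.success(f.apply(a.value(), b.value()));
    } catch (RuntimeException e) {
      return ErrorCatching.failure(e);
    }
  }

  /** Composite: lifts a function on plain values through both layers at once. */
  static <A, B, R> ErrorCatching<Expiring<R>> map(
      ErrorCatching<Expiring<A>> a, ErrorCatching<Expiring<B>> b, BiFunction<A, B, R> f) {
    return mapErrors(a, b, (x, y) -> mapExpiring(x, y, f));
  }
}
```

The function passed to `map` never mentions expiry or errors; those concerns live only in the lifting methods.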
Adds MutableResource and supporting types for defining cells and effects.

This design emphasizes separation of concerns in two primary ways:
 * First, since every resource dynamics carries an expiry and stepping behavior,
   we don't need to define a different cell type depending on how that value is computed (like an Accumulator)
   nor by what kind of dynamics are stored (e.g. Discrete vs. Real).
 * Second, since the DynamicsEffect interface defines a fully general effect type,
   we don't need to define a different cell type depending on the supported class of effects.
   Taken together, we can define a single cell type.

This design also seeks to reduce overhead for modelers to handle effects the "right" way.
By this, we mean using semantically correct effects, rather than (ab)using Registers for everything.
 * Instead of defining a new type for effects, we use a general DynamicsEffect interface.
   We also have the DynamicsMonad.effect method, so effects can be written against the base dynamics type,
   often as a small in-line lambda.
 * To support these "black-box" effects, we use an "automatic" effect trait by default,
   which tests concurrent effects for commutativity.
   Since effects are rarely concurrent in practice, this is performant enough in most use cases.
   Furthermore, it combines with the error-handling wrapper to bubble-up useful error messages,
   as well as let independent portions of the simulation continue normally.

Taken together, the above means there's a single "default" way to build a cell,
which provides enough flexibility and performance for most use cases.
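
The commutativity test behind the "automatic" effect trait can be sketched as below. This is an assumption-laden illustration: effects are modeled as plain `UnaryOperator`s, `concurrently` is a hypothetical name, and a conflict simply throws, whereas the text above describes routing such failures through the error-handling wrapper.

```java
import java.util.Objects;
import java.util.function.UnaryOperator;

class AutoCommutingSketch {
  /** Combines two concurrent effects, failing if the order of application matters. */
  static <D> UnaryOperator<D> concurrently(UnaryOperator<D> left, UnaryOperator<D> right) {
    return dynamics -> {
      // Try both orderings and compare the results with equals.
      D leftFirst = right.apply(left.apply(dynamics));
      D rightFirst = left.apply(right.apply(dynamics));
      if (!Objects.equals(leftFirst, rightFirst)) {
        throw new IllegalStateException(
            "Concurrent effects do not commute: " + leftFirst + " vs. " + rightFirst);
      }
      return leftFirst;
    };
  }

  public static void main(String[] args) {
    UnaryOperator<Integer> increment = v -> v + 1;
    UnaryOperator<Integer> decrement = v -> v - 1;
    // Increment and decrement commute, so applying them concurrently is fine.
    System.out.println(concurrently(increment, decrement).apply(5));
  }
}
```

Two "set" effects with different values would fail this check, which is the conflict that read-compute-set style effects run into.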
Adds tools for debugging incorrect or poorly-performing simulations.

 * Context and Naming - These allow us to attach names to scopes in the code and to objects in memory.
   At strategic points, we can query this information to bubble up to the user.
   For example, we can attach the context an effect was emitted in to that effect.
   If the effect fails, we can inject that context into the error, which is more useful to debugging than a stack trace.
   Similarly, we can name resources when they're registered, and we can derive names for one object from others.
   When debugging resources, this can de-anonymise the lambda functions that make up the bulk of a resource model.
 * Tracing - Since debuggers struggle to "step" across simulation engine iterations,
   we borrow tracing techniques from functional programming. Tracing attaches print statements when a resource, task,
   or condition starts and stops. We also respect nested tracing, yielding a log that mirrors the model's structure and
   lets a programmer "unpack" derived values step-by-step.
 * Profiling - Since resource calculations often have the same or very similar stack frames,
   profilers often don't supply a useful level of detail. We provide profiling tools that distinguish calls by instance,
   to tell which resources / cells / conditions / tasks are hot spots.
 * Dependencies - Since it's sometimes useful to visualize the structure of a resource model, we track resource-level dependencies,
   and these can be queried from a debugger or debugging code.
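
A minimal sketch of nested tracing, assuming resources are plain `Supplier`s and the log is an in-memory list. `TracingSketch`, `trace`, and `LOG` are hypothetical names for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class TracingSketch {
  static final List<String> LOG = new ArrayList<>();
  private static int depth = 0;

  /** Wraps a resource so each read logs when it starts and what it returns, indented by nesting depth. */
  static <T> Supplier<T> trace(String name, Supplier<T> resource) {
    return () -> {
      LOG.add("  ".repeat(depth) + name + " start");
      depth++;
      T result;
      try {
        result = resource.get(); // nested traced resources log at depth + 1
      } finally {
        depth--;
      }
      LOG.add("  ".repeat(depth) + name + " -> " + result);
      return result;
    };
  }

  public static void main(String[] args) {
    Supplier<Integer> a = trace("A", () -> 2);
    Supplier<Integer> b = trace("B", () -> a.get() + 1);
    b.get();
    LOG.forEach(System.out::println);
  }
}
```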
Adds a registrar wrapping and adapting the standard Merlin Registrar.
This registrar unwraps the ErrorCatching and Expiring layers from resources,
with options to either throw or log errors.
David Legg added 11 commits January 23, 2024 17:02
Adds the Discrete dynamics type, as well as utilities for working with discrete Resources:
 * DiscreteEffects - utility methods for common operations on MutableResource<Discrete>, like increment/decrement, set value, etc.
 * DiscreteResources - utility methods for defining and deriving discrete resources, including integer and double-precision arithmetic,
   boolean logic, etc.

Notable / unusual features of this code include:
 * `DiscreteResources.when` - Converts a discrete boolean resource into a condition, satisfied when the resource is true.
   This realizes the equivalence between conditions and boolean resources. By deriving boolean resources, we get the equivalent condition with no additional code.
 * `DiscreteResources.discreteResource` - Declares a discrete MutableResource, with special handling for Double values.
   When effects are applied to doubles, floating-point precision mismatches can make effects that logically commute appear not to commute.
   This special constructor for MutableResources uses a toleranced equality for checking commutativity to solve this problem.
 * Monads - The `Discrete` wrapper forms a (trivial) monad, which can be composed with the other major monads.
   This means derivations on discrete resources can use the value directly, and monad methods will lift that all the way to acting on resources.
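
To illustrate why doubles need a toleranced comparison when checking commutativity, here is a small sketch. The class name, the `commute` helper, and the specific tolerance are assumptions chosen for the example, not the framework's values.

```java
import java.util.function.BiPredicate;
import java.util.function.UnaryOperator;

class TolerancedCommutativitySketch {
  /** Applies two effects in both orders and compares the results with the given equivalence. */
  static <D> boolean commute(
      D start, UnaryOperator<D> a, UnaryOperator<D> b, BiPredicate<D, D> equivalent) {
    return equivalent.test(b.apply(a.apply(start)), a.apply(b.apply(start)));
  }

  /** Relative-tolerance equality; the 1e-12 tolerance is an arbitrary choice for this sketch. */
  static boolean nearlyEqual(double x, double y) {
    return Math.abs(x - y) <= 1e-12 * Math.max(1.0, Math.max(Math.abs(x), Math.abs(y)));
  }

  public static void main(String[] args) {
    UnaryOperator<Double> addTenth = v -> v + 0.1;
    UnaryOperator<Double> addThreeTenths = v -> v + 0.3;
    // Logically these additions commute, but rounding can make the two orderings differ slightly.
    System.out.println("exact:      " + commute(0.2, addTenth, addThreeTenths, Double::equals));
    System.out.println("toleranced: "
        + commute(0.2, addTenth, addThreeTenths, TolerancedCommutativitySketch::nearlyEqual));
  }
}
```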
Adds the Linear dynamics type, the analog of Merlin's Real type.
This type is named Linear rather than Real, to distinguish it from Polynomial or other potentially real-valued dynamics types.
Adds Clock and VariableClock dynamics types, both of which use Durations to exactly represent time.
Using Duration avoids the floating-point issues that crop up when using Linear or Polynomial resources to represent time,
and removes the need for conversions to and from Duration.

Of particular note are the stopwatch-style effects and resource-level comparison functions for VariableClock.
Combining these with `DiscreteResources.when` and `Reactions` is especially useful for expressing time-based conditions and behaviors.
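
A sketch of what Duration-based clock dynamics might look like. The field names, the integer multiplier, and the `pause`/`start` methods are assumptions for illustration, and `java.time.Duration` stands in for the simulation's own duration type.

```java
import java.time.Duration;

/** Elapsed time that advances exactly in step with simulation time. */
record Clock(Duration extent) {
  Clock step(Duration t) {
    return new Clock(extent.plus(t));
  }
}

/** Stopwatch-style clock: elapsed time advances at an integer multiple of simulation time (0 = paused). */
record VariableClock(Duration extent, int multiplier) {
  VariableClock step(Duration t) {
    return new VariableClock(extent.plus(t.multipliedBy(multiplier)), multiplier);
  }

  VariableClock pause() {
    return new VariableClock(extent, 0);
  }

  VariableClock start() {
    return new VariableClock(extent, 1);
  }
}

class ClockSketch {
  public static void main(String[] args) {
    VariableClock stopwatch = new VariableClock(Duration.ZERO, 1);
    // Run for 90 seconds, pause for an hour: the elapsed time stays at exactly 90 seconds.
    System.out.println(stopwatch.step(Duration.ofSeconds(90)).pause().step(Duration.ofHours(1)).extent());
  }
}
```

Because all arithmetic is on integer-backed durations, stepping never accumulates rounding error.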
Adds polynomial resources, which are the primary non-discrete resource type.

Of particular note are the arithmetic and comparison functions for derivations to and from polynomials.
These functions incorporate higher-order coefficients correctly, including performing root-finding
to calculate when comparisons will expire.
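
The sketch below illustrates two of the ideas in this commit: stepping a polynomial forward in time, and using root-finding to decide when a comparison result expires. It is a simplified stand-in: time is a plain `double`, root-finding is limited to degree two via the quadratic formula, and `Polynomial`, `firstPositiveRoot`, and `expiryOfGreaterThan` are hypothetical names.

```java
import java.util.Arrays;
import java.util.OptionalDouble;

/** Polynomial in time since the start of the segment: c[0] + c[1]*t + c[2]*t^2 + ... */
final class Polynomial {
  final double[] c;

  Polynomial(double... coefficients) {
    this.c = coefficients.clone();
  }

  /** Evaluates the polynomial at time t using Horner's rule. */
  double valueAt(double t) {
    double result = 0;
    for (int i = c.length - 1; i >= 0; i--) result = result * t + c[i];
    return result;
  }

  /** Re-centers the polynomial t time units later, so that step(t).valueAt(s) == valueAt(t + s). */
  Polynomial step(double t) {
    double[] shifted = new double[c.length];
    for (int k = 0; k < c.length; k++) {
      double binomial = 1.0; // C(i, k), starting from i = k
      double power = 1.0;    // t^(i - k)
      for (int i = k; i < c.length; i++) {
        shifted[k] += c[i] * binomial * power;
        binomial = binomial * (i + 1) / (i + 1 - k);
        power *= t;
      }
    }
    return new Polynomial(shifted);
  }

  /** Smallest strictly positive root; this sketch only supports degree two or lower. */
  OptionalDouble firstPositiveRoot() {
    for (int i = 3; i < c.length; i++) {
      if (c[i] != 0) throw new UnsupportedOperationException("degree > 2 needs a general root-finder");
    }
    double c0 = c.length > 0 ? c[0] : 0, c1 = c.length > 1 ? c[1] : 0, c2 = c.length > 2 ? c[2] : 0;
    if (c2 == 0) {
      if (c1 == 0) return OptionalDouble.empty();
      double root = -c0 / c1;
      return root > 0 ? OptionalDouble.of(root) : OptionalDouble.empty();
    }
    double discriminant = c1 * c1 - 4 * c2 * c0;
    if (discriminant < 0) return OptionalDouble.empty();
    double sqrt = Math.sqrt(discriminant);
    double low = Math.min((-c1 - sqrt) / (2 * c2), (-c1 + sqrt) / (2 * c2));
    double high = Math.max((-c1 - sqrt) / (2 * c2), (-c1 + sqrt) / (2 * c2));
    if (low > 0) return OptionalDouble.of(low);
    return high > 0 ? OptionalDouble.of(high) : OptionalDouble.empty();
  }

  /** The boolean "p > threshold" can only change at a root of p - threshold, so it expires there. */
  static OptionalDouble expiryOfGreaterThan(Polynomial p, double threshold) {
    double[] shifted = Arrays.copyOf(p.c, Math.max(1, p.c.length));
    shifted[0] -= threshold;
    return new Polynomial(shifted).firstPositiveRoot();
  }
}
```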
Adds a solver for linear inequality constraints posed on polynomial resources,
which proceeds by boundary consistency without backtracking.

For some resource problems, specifying behavior in terms of comparisons and arithmetic can be
hard to read and error-prone. In particular, PolynomialResources.clampedIntegrate was found to be this way.
These problems are often readily formulated as linear inequality constraints over variables with
polynomial dynamics segments for values. Since polynomials form a vector space over the real numbers,
such linear inequalities are amenable to solving by boundary consistency without backtracking.*
Additionally, by specifying how to resolve under-constrained problems, a simple greedy optimizer can be defined
for linear objective functions.

See `PolynomialResources.clampedIntegrate` for an example of how this solver is used.

*Linear programming can't be used because that requires division.
Adds types and utilities for unit-awareness.

In particular, adds the `UnitAware` interface, which is generalized on the type of values to which units are attached.
This is to support adding units to both value types, like `Double`, and resource types, like `Resource<Polynomial>` or `Resource<Discrete<Double>>`.
Units are restricted to "absolute" units to simplify conversion, representation, and usage.
Units are represented by a Dimension and a floating-point scale compared to a base unit for that dimension.
Dimensions are represented exactly, as a product of rational powers of incommensurate base dimensions.

When using units on resources, we wrap UnitAware around Resource, and not vice-versa.
This applies a single unit to the entire lifetime of the resource, rather than letting the unit vary over time.
This lets us check dimensionality on resources once during initialization, then "bake in" constant conversion factors
to be used during simulation, a setup that has proven performant in Clipper's model.
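
A rough sketch of the unit-awareness design described above, under several simplifying assumptions: dimensions use integer powers rather than rational ones, a resource is reduced to a `Supplier<Double>`, and the names `Dimension`, `Unit`, `UnitAware`, and `in` are hypothetical stand-ins for the framework's types.

```java
import java.util.Map;
import java.util.function.BiFunction;
import java.util.function.Supplier;

/** A dimension as powers of base dimensions (integer powers only, to keep the sketch short). */
record Dimension(Map<String, Integer> powers) {}

/** A unit is a dimension plus a scale relative to that dimension's base unit. */
record Unit(String name, Dimension dimension, double scale) {}

/** Wraps any value type, given a function that rescales such a value by a constant factor. */
record UnitAware<T>(T value, Unit unit, BiFunction<T, Double, T> scaling) {
  /** Returns the wrapped value expressed in the requested unit, checking dimensions first. */
  T in(Unit desired) {
    if (!unit.dimension().equals(desired.dimension())) {
      throw new IllegalArgumentException("Cannot convert " + unit.name() + " to " + desired.name());
    }
    return scaling.apply(value, unit.scale() / desired.scale());
  }
}

class UnitAwareSketch {
  public static void main(String[] args) {
    Dimension length = new Dimension(Map.of("length", 1));
    Unit meter = new Unit("m", length, 1.0);
    Unit kilometer = new Unit("km", length, 1000.0);

    // Units wrapped around a plain value.
    UnitAware<Double> distance = new UnitAware<>(2.5, kilometer, (v, s) -> v * s);
    System.out.println(distance.in(meter));

    // Units wrapped around a resource: the dimension check and the conversion factor
    // are resolved once, and the returned resource only multiplies by a constant.
    UnitAware<Supplier<Double>> range =
        new UnitAware<Supplier<Double>>(() -> 2.5, kilometer, (r, s) -> () -> r.get() * s);
    Supplier<Double> rangeInMeters = range.in(meter);
    System.out.println(rangeInMeters.get());
  }
}
```

Here the caller chooses the unit at the point of use by calling `in`, rather than the resource fixing it at the point of definition.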
Unlike other dynamics types, Unstructured and Differentiable dynamics do not present enough information to globally describe their behavior.
This means they can express functions that are not otherwise exactly representable, like trigonometric functions for Differentiable,
or even calls to external libraries through Unstructured.

Unstructured resources can represent any function of time and/or other resources, but need to be approximated before being given to a Registrar or some other components.
Unstructured resources can also represent continuously-varying values of unusual types. For example, a string representing the current time down to the microsecond.
While these kinds of values are not often registered directly, they can form intermediate steps when deriving other (unstructured) resources, which can be approximated and registered.

While the Unstructured type itself forms a monad over the values it contains, it does not compose with Resource or Dynamics monads.
Instead, we revert to an applicative, which still lets us write derivations on unstructured resources as functions on their plain values.
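
The applicative style mentioned above can be sketched as follows, assuming an unstructured value is just an arbitrary function of elapsed time. The `Unstructured` interface shown here, with `valueAt`, `constant`, and `map`, is a hypothetical simplification and not the framework's definition.

```java
import java.time.Duration;
import java.util.function.BiFunction;

/** An arbitrary function of elapsed time; nothing about its shape is known globally. */
interface Unstructured<A> {
  A valueAt(Duration t);

  /** An unstructured value that never changes. */
  static <A> Unstructured<A> constant(A a) {
    return t -> a;
  }

  /** Applicative-style lift: combines two unstructured values pointwise with a plain function. */
  static <A, B, R> Unstructured<R> map(Unstructured<A> a, Unstructured<B> b, BiFunction<A, B, R> f) {
    return t -> f.apply(a.valueAt(t), b.valueAt(t));
  }
}

class UnstructuredSketch {
  public static void main(String[] args) {
    Unstructured<Double> seconds = t -> t.toMillis() / 1000.0;
    Unstructured<Double> amplitude = Unstructured.constant(2.0);
    // The derivation is written against plain doubles; map lifts it to functions of time.
    Unstructured<Double> wave = Unstructured.map(seconds, amplitude, (s, a) -> a * Math.sin(s));
    System.out.println(wave.valueAt(Duration.ofMillis(500)));
  }
}
```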
Adds utilities for constructing approximations.

These are particularly useful for Unstructured resources with Double as a value,
but some approximations generalize beyond this.

Approximation is broken into several stages:
 1. Choose an approximation type.
    The ready-made choices here are "discrete", "secant", or "Taylor" approximations, but the interface permits others to be defined later.
 2. Choose a strategy for running those approximations - choosing a divergence estimator or interval function, depending on the approximation type.
    There are ready-made strategies for using uniformly-spaced samples or attempting to bound the final error.
 3. If error-bounding is used, choose an error tolerance and error measurement.
    Again, there are several ready-made options, based on both direct methods and analytic estimates, depending on the available information.
    Further parameters may be needed to configure these error estimates.

Breaking the problem down like this allows for more focused testing, as well as component re-use.
Doing approximation on an Unstructured resource allows the value of the result to influence the sampling strategy -
rather than heuristically choosing a sampling strategy to get "good enough" data, we can measure the result to choose samples.
This can be an efficiency boon when the approximated resource is cheap, but downstream resources or tasks are expensive.
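
As an illustration of letting the measured result drive the sampling, the sketch below picks the length of the next secant segment by halving it until an error estimate falls within tolerance. The midpoint check is a heuristic estimate chosen for this example, and `SecantSketch` and `chooseInterval` are hypothetical names; the framework's estimators and parameters may differ.

```java
import java.util.function.DoubleUnaryOperator;

class SecantSketch {
  /**
   * Chooses how long the next secant (straight-line) segment starting at `start` may be.
   * Halves the candidate length until the secant misses f at the segment midpoint by at most
   * `tolerance`, or until the length reaches `minLength`.
   */
  static double chooseInterval(
      DoubleUnaryOperator f, double start, double maxLength, double minLength, double tolerance) {
    double length = maxLength;
    while (length > minLength) {
      double end = start + length;
      double mid = start + length / 2;
      // Value of the secant line through (start, f(start)) and (end, f(end)) at the midpoint.
      double secantAtMid = (f.applyAsDouble(start) + f.applyAsDouble(end)) / 2;
      if (Math.abs(f.applyAsDouble(mid) - secantAtMid) <= tolerance) break;
      length /= 2;
    }
    return Math.max(length, minLength);
  }

  public static void main(String[] args) {
    // A straight line is approximated exactly, so the longest allowed segment is chosen.
    System.out.println(chooseInterval(x -> 2 * x + 1, 0, 8, 0.001, 1e-6));
    // A curved function forces shorter segments.
    System.out.println(chooseInterval(Math::sin, 0, 8, 0.001, 1e-3));
  }
}
```

A uniform-sampling strategy would correspond to always returning a fixed length, regardless of the function's behavior.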
Adds a small, non-executable Demo class. This class shows some important features of the framework in a condensed way.
Add an executable example model used to test and demonstrate the framework.

The example model comprises three main components:
 * ErrorTestingModel
 * DataModel
 * ApproximationModel

The ErrorTestingModel demonstrates error-handling behavior for resources.
Using the `CauseError` activity, we can trigger one of several kinds of errors from the plan,
including effects which fail directly, concurrent effects which conflict, and resources which violate a constraint.
We can also test out different combinations of naming and error handling behaviors, to see what information is propagated back to the user for debugging.
Finally, we can observe that, when logging errors, portions of the model can fail while independent portions of the model continue to function normally.

The DataModel demonstrates a complicated resource modeling problem, leveraging the LinearBoundaryConsistencySolver.
This problem models a data system with bin space shared across multiple buckets, where some buckets have priority over others,
and each bucket independently sets desired write or delete rates.
The problem of allocating bin space to each bucket, allowing higher-priority buckets to overwrite lower-priority ones only if needed,
is phrased as linear inequalities on polynomial resources and given to the solver.
The `ChangeDesiredRate` activity can be used to set up different scenarios for this model.

The ApproximationModel shows three resources, a polynomial, a rational function, and a complicated trig function, approximated various ways.
The `ChangeApproximationInput` activity can be used to change the polynomials feeding this model,
and the simulation configuration contains a setting for the approximation accuracy, affecting those resources which approximate by bounding their maximum error.
In particular, the `approximation/trig/default` resource displays the most complicated resource function, comprising trig and exponentials,
defined through the Java Math library, approximated by the default secant approximation method.
Adds unit tests for a variety of classes in the streamline framework.

These unit tests were added as-needed to debug problems, and aren't meant to give comprehensive or systematic coverage.
Contributor

@bradNASA bradNASA left a comment

If we don't resolve the infinite loop problem, we should create an issue for it, #1305. I'm approving because I don't think this is a big enough reason to hold it back.

Collaborator

@mattdailis mattdailis left a comment

Thanks for all your hard work, @DavidLegg. This PR will empower mission modelers to more easily harness the power of Aerie. The terse expressiveness of your libraries, and the fact that I've had a number of people over the past few weeks ask me repeatedly, "when are you going to merge this PR?" are a testament to how much care you've put into the design and implementation of this framework.

This is the beginning, not the end; I expect us to continue to iterate on this framework. Some specific areas I'd like to see us explore are:

  • Can we avoid the singleton pattern in Resources.CLOCK?
  • Can Resource.signalling leverage expiry, rather than emitting no-op events?
  • Can we serialize errors using events instead of resources, for better visualization?

With that, I say :shipit:

@mattdailis mattdailis merged commit 7c9ae5f into develop Jan 29, 2024
6 checks passed
@mattdailis mattdailis deleted the contrib-streamline-framework branch January 29, 2024 21:31