Merge buffers, C-syntax backend builder, improved syntax extensions
From the CHANGELOG:
Added
- A new backend "cc": C based on a configurable C compiler command, defaulting to `cc`.
- Merge buffers representational abstraction (one per virtual device):
  - backends just need to support device-to-device transfers,
  - merging gets implemented in "user space".
- CUDA streaming multiprocessor parallelism via streams <-> virtual devices.
- Support for `cuda-gdb` and `compute-sanitizer` (pass the right arguments to cudajit).
- Inline declarations for (non-differentiable) tensors in the `%cd` syntax.
- A minimal wrapper `Sync_backend` creating CPU backends with a single device only, where all calls are synchronous. (It's a baseline and helps debugging.)
- In progress: proper (condition variables based) scheduler. The legacy scheduler (pipes based) kept for now as baseline and to help debugging.
- Documentation for the syntax extensions.
- `%op` syntax: when under a `~config` parameter, refine the inline declared params' labels with `config.label`.
- `%op` syntax: incorporate the input tensor's (if any) label in the resulting tensor's label.
- Comments in config files using the line prefix `~~`.
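The merge-buffer item above can be pictured with a small C sketch. This is not OCANNL's code: the names `virtual_device`, `device_to_device`, `merge_add` and `merge_all`, and the fixed size `N`, are hypothetical. It assumes each virtual device owns exactly one merge buffer, that the backend only provides a copy into that buffer, and that the merge itself (here elementwise addition, as one might use for gradient accumulation) is ordinary code run on the destination.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define N 4 /* hypothetical tensor size */

/* Hypothetical model of a virtual device: its own copy of a tensor
   plus exactly one merge buffer. */
typedef struct {
  float grad[N];
  float merge_buffer[N];
} virtual_device;

/* The only primitive the backend provides in this sketch: copy the
   source device's tensor into the destination's merge buffer. */
static void device_to_device(const virtual_device *src, virtual_device *dst) {
  memcpy(dst->merge_buffer, src->grad, sizeof dst->merge_buffer);
}

/* The merge is plain "user space" code running on the destination. */
static void merge_add(virtual_device *dst) {
  for (size_t i = 0; i < N; ++i) dst->grad[i] += dst->merge_buffer[i];
}

/* Accumulate the tensors of devices[1..count-1] into devices[0]. */
static void merge_all(virtual_device *devices, size_t count) {
  assert(devices != NULL && count > 0);
  for (size_t d = 1; d < count; ++d) {
    device_to_device(&devices[d], &devices[0]);
    merge_add(&devices[0]);
  }
}
```

Under these assumptions a backend never needs a dedicated merge operation: swapping `merge_add` for another combining step changes nothing on the transfer side.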
Changed
- Terminology in the API: Renamed almost all uses of "jit" into uses of "compile" and / or "link".
- Split the compile-to-ptx phase from the build-module and build-kernel-launcher phase.
- Migrated the CUDA backend to ppx_minidebug-based execution tracing.
- Fixes for mixed precision computations.
- Further terminology refactoring: Renamed `Low_level.compile` to `Low_level.lower`;
  - and `Low_level.compiled` to `Low_level.optimized`, making it a record.
- Further refactoring of the `Backends` API:
  - split the `device` type into virtual `device` and `physical_device`,
  - removed the direct support for `merge`, instead relying on merge buffers.
- Updated to cudajit 0.4.
- A template for C-syntax backends, refactoring CC and CUDA backends.
- Improvements to handling of tensor node labels, and to the `Tnode.debug_name` function.
- Output files generated by backends, and files generated by logging, in separate subdirectories.
- C-syntax logging: also output the pre-assignment value when logging an assignment.
- Migrated to ppx_minidebug 2.0 with the benefits it brings: no runtime passing, `Utils.settings.log_level` unified with ppx_minidebug's log levels.
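To illustrate the C-syntax logging change listed above, here is a minimal sketch of an assignment that is logged together with its pre-assignment value. The helper `logged_assign` and the message layout are assumptions made for illustration only; the text actually emitted by the C-syntax backends may look different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: performs `*target = value` and formats a log line
   that shows the new value and the value held before the assignment.
   Returns the number of characters snprintf would have written. */
static int logged_assign(char *log, size_t log_size, const char *name,
                         float *target, float value) {
  assert(log != NULL && name != NULL && target != NULL);
  float before = *target; /* capture the pre-assignment value first */
  *target = value;
  return snprintf(log, log_size, "%s = %g (was %g)", name, *target, before);
}
```

Seeing the previous value next to the new one makes it easier to spot an accumulation that started from an uninitialized or stale cell.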
Fixed
- Allow verifying that non-embedded tensor nodes of the tensor(s) associated with a linked code are already in the context passed to `link` (resp. `link_batch`), since they won't get introduced into the context. It is the responsibility of helper functions (such as those in `Train`) to ensure the check.
- Fixed both known and newly discovered shortcomings of the syntax extensions.
  - In particular, `%op` syntax: lift `~config` applications out of (tensor) functions.
- Multiple other tiny fixes.