Prepare release v0.2.0
lukstafi committed Jun 3, 2023
1 parent c2a2073 commit 60ee882
Showing 4 changed files with 41 additions and 5 deletions.
29 changes: 29 additions & 0 deletions CHANGES.md
@@ -1,3 +1,32 @@
## [0.2.0] -- 2023-06-03

### Added

- The Gccjit backend operates using "on device" copies of tensors, where the "device memory" is the stack of the C function. This is intended to improve cache locality and reduce cache contention.
- Four synchronization heuristics:
- "parallel": a slice of the tensor is copied host-to-device at the beginning and device-to-host at the end, without interference because each task has a different slice.
- "update on host": the tensor is copied host-to-device at the beginning; each write is an update that reads the old value from the host and applies the result on the host, so every write is a synchronization point.
- "replicated": the tensor is copied host-to-device at the beginning; only task 0 copies device-to-host.
- "device-only": no copying to/from host.
- On-device-only tensors that are not materialized on the OCaml side.
- A new category of axis dimensions is introduced: `Frozen`. It is analogous to the `Parallel` axis category in that a single task execution / "device call" only processes a 1D slice of the axis.
- Currently, for tensors processed in parallel, we only support processing a contiguous tensor slice (copied "to device" using `memcpy`).
- A new syntax `%nn_rs` ("postprocess results" variant of `%nn_dt`) for computations that should happen at the end of task execution / refresh step. It's meant to prepare the data to be copied back to the host.
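
The "parallel" heuristic above can be sketched as follows. This is a hypothetical illustration, not code emitted by the Gccjit backend; all names (`task_body`, `host_tensor`, the sizes) are invented. Each task copies its own contiguous slice onto the C function stack (the "device memory"), computes there, and copies the slice back; since the slices are disjoint, the copies do not interfere.

```c
#include <string.h>

#define NUM_TASKS 4
#define SLICE_LEN 256

/* Host-side storage for the whole tensor. */
float host_tensor[NUM_TASKS * SLICE_LEN];

/* Body of one task under the "parallel" heuristic. The stack array
   plays the role of "device memory" (cf. the Gccjit backend note). */
void task_body(int task_id) {
  float device_slice[SLICE_LEN];
  float *host_slice = host_tensor + task_id * SLICE_LEN;
  /* Host-to-device copy of this task's contiguous slice: */
  memcpy(device_slice, host_slice, sizeof device_slice);
  for (int i = 0; i < SLICE_LEN; ++i)
    device_slice[i] += 1.0f;  /* stand-in for the task's computation */
  /* Device-to-host copy at the end; slices are disjoint, so no races: */
  memcpy(host_slice, device_slice, sizeof device_slice);
}
```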

### Changed

- Removed backend-agnostic synchronization; it was not worth the complexity and implementation effort at this point.
- The `Rebalance` constructor is kept around, but it no longer plays any role.
- Removed `debug_virtual_nodes`; it was tricky to maintain.
- Dynamic indexing now skips over parallel axes: when there is a `Parallel` axis on the left, it is preserved in the resulting tensor (slice), and the next-right axis is indexed into instead.
- Removed the "indexing axes from the right" functionality for now (it fails as not implemented).
- Dynamic indexing can now produce virtual nodes.
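
The parallel-axis skipping rule above can be illustrated on the shape level. This is a hypothetical sketch (OCANNL's actual shape representation differs): given a shape `[Parallel 4; 3; 5]`, one dynamic-indexing step preserves the leading `Parallel` axis and indexes away the next axis to the right, producing a slice of shape `[Parallel 4; 5]`.

```c
#include <assert.h>
#include <string.h>

enum axis_kind { DIM, PARALLEL };
struct axis { enum axis_kind kind; int size; };

/* Shape of the result of one dynamic-indexing step: leading Parallel
   axes are preserved, and the next axis to the right is indexed away.
   Writes the result shape into `out` and returns its rank. */
int dyn_index_shape(const struct axis *shape, int rank, struct axis *out) {
  int i = 0;
  while (i < rank && shape[i].kind == PARALLEL) {
    out[i] = shape[i];  /* preserve the Parallel axis in the slice */
    i++;
  }
  assert(i < rank);  /* there must be an axis left to index into */
  memcpy(out + i, shape + i + 1, (size_t)(rank - i - 1) * sizeof *shape);
  return rank - 1;
}
```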

### Fixed

- Dynamic indexing fixes.


## [0.1.2] -- 2023-05-12

### Added
13 changes: 10 additions & 3 deletions README.md
@@ -30,16 +30,23 @@ Warning disclaimer: this project is still "not announced". The features describe

## Future milestones

For past milestones see [CHANGES](CHANGES.md).

* Skipping v0.2 (already released; see below).
* **v0.3-GPU**: a CUDA backend.
* **v0.3.1-tiling**: the tiling optimization.
* **v0.4-usability**: examples covering most of Andrej Karpathy's "Neural Networks Zero to Hero" series; data loading; checkpointing.
* **v0.5-documentation**: `.mli` files and maybe more documentation.
* **v0.6-scale**: distributed computation; runtime-autotuning optimization settings.
* **v1-completeness**: any not-yet-implemented features that still seem needed and impact the framework design. (E.g., at the time of v0.1.X, convolutions, reshaping, and concatenation are not easily expressible.)

### Releases

For details, see [CHANGES](CHANGES.md).

* **v0.2**: for multicore CPU, improve cache locality and reduce cache contention by treating the C function stack as the "device memory".
* **v0.1.2**: multicore computations using a thread-local "task id" index.
* **v0.1.1**: inlining scalar constants, improved inlining for virtual nodes.
* **v0.1.0**: a `Gccjit` backend, single and double precision floats, code compiled as a monolithic update step function.


## Why not just use [OWL](https://ocaml.xyz/)?

OCANNL follows different design choices than [OWL](https://ocaml.xyz/). For example:
2 changes: 1 addition & 1 deletion dune-project
@@ -4,7 +4,7 @@

(name ocannl)

(version 0.1.2)
(version 0.2.0)

(generate_opam_files true)

2 changes: 1 addition & 1 deletion ocannl.opam
@@ -1,6 +1,6 @@
# This file is generated by dune, edit dune-project instead
opam-version: "2.0"
version: "0.1.2"
version: "0.2.0"
synopsis:
"A from-scratch Deep Learning library with CUDA, operator fusion, staged compilation, backprop"
description: "A longer description"
