Commit e33ccab

Merge branch 'ershi/clear_cache' into 'main'

Add the wp.clear_kernel_cache() to the public API

See merge request omniverse/warp!450

mmacklin committed Jul 3, 2024
2 parents fa3cef4 + 2b73052 commit e33ccab
Showing 98 changed files with 136 additions and 117 deletions.
4 changes: 3 additions & 1 deletion CHANGELOG.md
@@ -11,9 +11,11 @@
compiled `(compiled)`, loaded from the cache `(cached)`, or was unable to be
loaded `(error)`.
- `wp.config.verbose = True` now also prints out a message upon the entry to a `wp.ScopedTimer`.
- - Add additional documentation and examples demonstrating `wp.copy()`, `wp.clone()`, and `array.assign()` differentiability.
+ - Add additional documentation and examples demonstrating `wp.copy()`, `wp.clone()`, and `array.assign()` differentiability
- Fix adding `__new__()` methods for all class `__del__()` methods to
anticipate when a class instance is created but not instantiated before garbage collection.
- - Add code-completion support for wp.config variables.
+ - Add `wp.clear_kernel_cache()` to the public API. This is equivalent to `wp.build.clear_kernel_cache()`.
+ - Add code-completion support for `wp.config` variables.
- Remove usage of a static task (thread) index for CPU kernels to address multithreading concerns.
- The `mask` argument to `wp.sim.eval_fk` now accepts both integer and bool arrays
1 change: 1 addition & 0 deletions docs/basics.rst
@@ -205,6 +205,7 @@ please see :ref:`Structs Reference <Structs>` for more details.

As with kernel parameters, all attributes of a struct must have valid type hints at class definition time.

+ .. _Compilation Model:

Compilation Model
-----------------
23 changes: 12 additions & 11 deletions docs/faq.rst
@@ -35,15 +35,15 @@ Does Warp support all of the Python language?
No, Warp supports a subset of Python that maps well to the GPU. Our goal
is to not have any performance cliffs so that users can expect
consistently good behavior from kernels that is close to native code.
- Examples of unsupported concepts that don’t map well to the GPU are
+ Examples of unsupported concepts that don't map well to the GPU are
dynamic types, list comprehensions, exceptions, garbage collection, etc.

When should I call ``wp.synchronize()``?
----------------------------------------

One of the common sources of confusion for new users is when calls to
``wp.synchronize()`` are necessary. The answer is “almost never”!
- Synchronization is quite expensive, and should generally be avoided
+ Synchronization is quite expensive and should generally be avoided
unless necessary. Warp naturally takes care of synchronization between
operations (e.g.: kernel launches, device memory copies).
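
As an illustrative sketch (not part of this diff; the kernel and array here
are hypothetical), host reads such as ``array.numpy()`` already provide the
necessary ordering::

    import warp as wp

    @wp.kernel
    def scale(a: wp.array(dtype=float), s: float):
        tid = wp.tid()
        a[tid] = a[tid] * s

    a = wp.ones(1024, dtype=float, device="cuda")
    wp.launch(scale, dim=1024, inputs=[a, 2.0], device="cuda")

    # No explicit wp.synchronize() is needed here: copying back to NumPy
    # blocks until the launch above has completed.
    print(a.numpy()[:4])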

@@ -83,14 +83,15 @@ and :ref:`synchronization guidance <synchronization_guidance>`.
What happens when you differentiate a function like ``wp.abs(x)``?
------------------------------------------------------------------

- Non-smooth functions such as ``y=|x|`` do not have a single unique
- gradient at ``x=0``, rather they have what is known as a
- ``subgradient``, which is formally the convex hull of directional
+ Non-smooth functions such as :math:`y=|x|` do not have a single unique
+ gradient at :math:`x=0`, rather they have what is known as a
+ *subgradient*, which is formally the convex hull of directional
derivatives at that point. The way that Warp (and most
auto-differentiation frameworks) handles these points is to pick an
arbitrary gradient from this set, e.g.: for ``wp.abs()``, it will
arbitrarily choose the gradient to be 1.0 at the origin. You can find
- the implementation for these functions in ``warp/native/builtin.h``.
+ the implementation for these functions in
+ `warp/native/builtin.h <https://github.com/NVIDIA/warp/blob/main/warp/native/builtin.h>`_.
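
A short sketch of how this convention can be observed with ``wp.Tape``
(illustrative only, not part of this diff)::

    import warp as wp

    @wp.kernel
    def abs_kernel(x: wp.array(dtype=float), y: wp.array(dtype=float)):
        tid = wp.tid()
        y[tid] = wp.abs(x[tid])

    x = wp.array([0.0], dtype=float, requires_grad=True)
    y = wp.zeros(1, dtype=float, requires_grad=True)

    tape = wp.Tape()
    with tape:
        wp.launch(abs_kernel, dim=1, inputs=[x], outputs=[y])

    tape.backward(grads={y: wp.ones(1, dtype=float)})
    print(x.grad.numpy())  # prints [1.] per the arbitrary choice described above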

Most optimizers (particularly ones that exploit stochasticity), are not
sensitive to the choice of which gradient to use from the subgradient,
@@ -107,25 +108,25 @@ conventions of PyTorch and use aliases such as ``cuda:0``, ``cuda:1``,
Should I switch to Warp over IsaacGym/PhysX?
----------------------------------------------

- Warp is not a replacement for IsaacGym, IsaacSim, or PhysX - while Warp
- does offer some physical simulation capabilities this is primarily aimed
+ Warp is not a replacement for IsaacGym, IsaacSim, or PhysX—while Warp
+ does offer some physical simulation capabilities, this is primarily aimed
at developers who need differentiable physics, rather than a fully
featured physics engine. Warp is also integrated with IsaacGym and is
great for performing auxiliary tasks such as reward and observation
computations for reinforcement learning.

Why aren't assignments to Warp arrays supported outside of kernels?
- -------------------------------------------------------------------
+ ------------------------------------------------------------------------

For best performance, reading and writing data that is living on the GPU can
only be performed inside Warp CUDA kernels. Otherwise individual element accesses
such as ``array[i] = 1.0`` in Python scope would require prohibitively slow device
synchronization and copies.

We recommend to either initialize Warp arrays from other native arrays
- (e.g.: Python list, NumPy array, ...) or by launching a kernel to set its values.
+ (Python lists, NumPy arrays, etc.) or by launching a kernel to set its values.

- For the common use case of wanting to fill an array with a given value, we
+ For the common use case of filling an array with a given value, we
also support the following forms:

- ``wp.full(8, 1.23, dtype=float)``: initializes a new array of 8 float values set
6 changes: 3 additions & 3 deletions docs/limitations.rst
@@ -40,9 +40,9 @@ This is not always possible for kernels launched with multi-dimensional grid bou
`hardware limitations <https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#features-and-technical-specifications-technical-specifications-per-compute-capability>`_
on CUDA block dimensions.

- Warp will automatically fallback to using
+ Warp will automatically fall back to using
`grid-stride loops <https://developer.nvidia.com/blog/cuda-pro-tip-write-flexible-kernels-grid-stride-loops/>`_ when
- it is not possible for a CUDA thread to process only one element from the Warp grid
+ it is not possible for a CUDA thread to process only one element from the Warp grid.
When this happens, some CUDA threads may process more than one element from the Warp grid.
Users can also set the ``max_blocks`` parameter to fine-tune the grid-striding behavior of kernels, even for kernels that are otherwise
able to process one Warp-grid element per CUDA thread.
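
For instance, an illustrative launch that caps the grid size and thereby
forces grid-stride processing (the kernel, array, and values here are
hypothetical)::

    wp.launch(my_kernel, dim=(1024, 1024, 8), inputs=[a], device="cuda", max_blocks=256)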
@@ -104,7 +104,7 @@ The behavior of the modulus operator in a Warp kernel follows that of C++11: The
Power Operator
""""""""""""""

- The power operator (``**``) in Warp kernels only works on floating-point numbers (also see :func:`wp.pow <pow>`).
+ The power operator (``**``) in Warp kernels only works on floating-point numbers (also see :func:`wp.pow() <pow>`).
In Python, the power operator can also be used on integers.
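
A minimal sketch of the floating-point requirement (hypothetical kernel, not
part of this diff)::

    import warp as wp

    @wp.kernel
    def square(xs: wp.array(dtype=float), out: wp.array(dtype=float)):
        tid = wp.tid()
        out[tid] = xs[tid] ** 2.0  # float base and exponent: supported
        # out[tid] = 2 ** 2        # an integer power would fail to compile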

Inverse Sine and Cosine
17 changes: 12 additions & 5 deletions docs/modules/runtime.rst
@@ -15,23 +15,30 @@ Kernels are launched with the :func:`wp.launch() <launch>` function on a specifi

    wp.launch(simple_kernel, dim=1024, inputs=[a, b, c], device="cuda")

- Kernels may be launched with multi-dimensional grid bounds. In this case threads are not assigned a single index,
+ Note that all the kernel inputs must live on the target device or a runtime exception will be raised.
+ Kernels may be launched with multi-dimensional grid bounds. In this case, threads are not assigned a single index,
but a coordinate in an n-dimensional grid, e.g.::

    wp.launch(complex_kernel, dim=(128, 128, 3), ...)

- Launches a 3D grid of threads with dimension 128 x 128 x 3. To retrieve the 3D index for each thread use the following syntax::
+ Launches a 3D grid of threads with dimension 128 x 128 x 3. To retrieve the 3D index for each thread, use the following syntax::

    i,j,k = wp.tid()
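
A complete sketch of such a 3D launch (the kernel and array are illustrative,
not part of this diff)::

    import warp as wp

    @wp.kernel
    def complex_kernel(volume: wp.array3d(dtype=float)):
        i, j, k = wp.tid()  # one 3D coordinate per thread
        volume[i, j, k] = float(i + j + k)

    volume = wp.zeros((128, 128, 3), dtype=float, device="cuda")
    wp.launch(complex_kernel, dim=(128, 128, 3), inputs=[volume], device="cuda")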

.. note::
-    Currently kernels launched on CPU devices will be executed in serial.
+    Currently, kernels launched on CPU devices will be executed in serial.
    Kernels launched on CUDA devices will be launched in parallel with a fixed block-size.

- .. note::
-     Note that all the kernel inputs must live on the target device, or a runtime exception will be raised.
+ In the Warp :ref:`Compilation Model`, kernels are just-in-time compiled into dynamic libraries and PTX using
+ C++/CUDA as an intermediate representation.
+ To avoid excessive runtime recompilation of kernel code, these files are stored in a cache directory
+ named with a module-dependent hash to allow for the reuse of previously compiled modules.
+ The location of the kernel cache is printed when Warp is initialized.
+ :func:`wp.clear_kernel_cache() <clear_kernel_cache>` can be used to clear the kernel cache of previously
+ generated compilation artifacts as Warp does not automatically try to keep the cache below a certain size.

.. autofunction:: launch
+ .. autofunction:: clear_kernel_cache
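
For reference, a minimal usage sketch of the new public entry point
(equivalent to the previous ``wp.build.clear_kernel_cache()``; illustrative,
not part of this diff)::

    import warp as wp

    # Remove previously generated source files and compiler artifacts; the
    # next kernel launch will recompile its module into the cache.
    wp.clear_kernel_cache()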

.. _Runtime Kernel Creation:

8 changes: 5 additions & 3 deletions warp/__init__.py
@@ -10,7 +10,7 @@
# for autocomplete on builtins
# from warp.stubs import *

- from warp.types import array, array1d, array2d, array3d, array4d, constant
+ from warp.types import array, array1d, array2d, array3d, array4d, constant, from_ptr
from warp.types import indexedarray, indexedarray1d, indexedarray2d, indexedarray3d, indexedarray4d
from warp.fabric import fabricarray, fabricarrayarray, indexedfabricarray, indexedfabricarrayarray

@@ -31,9 +31,9 @@
from warp.types import bvh_query_t, hash_grid_query_t, mesh_query_aabb_t, mesh_query_point_t, mesh_query_ray_t

# device-wide gemms
- from warp.types import matmul, adj_matmul, batched_matmul, adj_batched_matmul, from_ptr
+ from warp.types import matmul, adj_matmul, batched_matmul, adj_batched_matmul

- # deprecated
+ # discouraged, users should use wp.types.vector, wp.types.matrix
from warp.types import vector as vec
from warp.types import matrix as mat

@@ -99,6 +99,8 @@

from warp.dlpack import from_dlpack, to_dlpack

+ from warp.build import clear_kernel_cache
+
from warp.constants import *

from . import builtins
8 changes: 6 additions & 2 deletions warp/build.py
@@ -71,8 +71,12 @@ def init_kernel_cache(path=None):
    os.makedirs(warp.config.kernel_cache_dir, exist_ok=True)


- def clear_kernel_cache():
-     """Clear the kernel cache."""
+ def clear_kernel_cache() -> None:
+     """Clear the kernel cache directory of previously generated source code and compiler artifacts.
+
+     Only directories beginning with ``wp_`` will be deleted.
+     This function only clears the cache for the current Warp version.
+     """

    warp.context.init()

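For illustration, a minimal sketch of how such a cache-clearing routine could
be implemented, assuming the ``wp_``-prefixed directory layout described in the
docstring above (an assumption for illustration only; the actual function body
is truncated in this diff)::

    import os
    import shutil

    import warp


    def clear_kernel_cache_sketch() -> None:
        # Ensure Warp is initialized so the cache directory is configured.
        warp.context.init()

        cache_dir = warp.config.kernel_cache_dir
        if cache_dir is None or not os.path.isdir(cache_dir):
            return

        for entry in os.listdir(cache_dir):
            path = os.path.join(cache_dir, entry)
            # Per the docstring, only directories beginning with "wp_" are removed.
            if entry.startswith("wp_") and os.path.isdir(path):
                shutil.rmtree(path, ignore_errors=True)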
2 changes: 1 addition & 1 deletion warp/examples/benchmarks/benchmark_cloth_warp.py
@@ -7,7 +7,7 @@

import warp as wp

- wp.build.clear_kernel_cache()
+ wp.clear_kernel_cache()


@wp.kernel
2 changes: 1 addition & 1 deletion warp/examples/benchmarks/benchmark_launches.py
@@ -128,7 +128,7 @@ def ksz(s: Sz):
    tid = wp.tid()  # noqa: F841


- wp.build.clear_kernel_cache()
+ wp.clear_kernel_cache()

devices = wp.get_devices()
num_launches = 100000
6 changes: 4 additions & 2 deletions warp/stubs.py
@@ -23,7 +23,7 @@
IndexedFabricArray = Generic[DType]


- from warp.types import array, array1d, array2d, array3d, array4d, constant
+ from warp.types import array, array1d, array2d, array3d, array4d, constant, from_ptr
from warp.types import indexedarray, indexedarray1d, indexedarray2d, indexedarray3d, indexedarray4d
from warp.fabric import fabricarray, fabricarrayarray, indexedfabricarray, indexedfabricarrayarray

@@ -42,7 +42,7 @@
from warp.types import Bvh, Mesh, HashGrid, Volume, MarchingCubes
from warp.types import bvh_query_t, hash_grid_query_t, mesh_query_aabb_t, mesh_query_point_t, mesh_query_ray_t

- from warp.types import matmul, adj_matmul, batched_matmul, adj_batched_matmul, from_ptr
+ from warp.types import matmul, adj_matmul, batched_matmul, adj_batched_matmul

from warp.types import vector as vec
from warp.types import matrix as mat
@@ -108,6 +108,8 @@

from warp.dlpack import from_dlpack, to_dlpack

+ from warp.build import clear_kernel_cache
+
from warp.constants import *

from . import builtins
2 changes: 1 addition & 1 deletion warp/tests/disabled_kinematics.py
@@ -233,5 +233,5 @@ class TestKinematics(unittest.TestCase):


if __name__ == "__main__":
-    wp.build.clear_kernel_cache()
+    wp.clear_kernel_cache()
    unittest.main(verbosity=2, failfast=False)
2 changes: 1 addition & 1 deletion warp/tests/test_adam.py
@@ -151,5 +151,5 @@ class TestAdam(unittest.TestCase):


if __name__ == "__main__":
-    wp.build.clear_kernel_cache()
+    wp.clear_kernel_cache()
    unittest.main(verbosity=2)
2 changes: 1 addition & 1 deletion warp/tests/test_arithmetic.py
@@ -1084,5 +1084,5 @@ class TestArithmetic(unittest.TestCase):


if __name__ == "__main__":
-    wp.build.clear_kernel_cache()
+    wp.clear_kernel_cache()
    unittest.main(verbosity=2, failfast=False)
2 changes: 1 addition & 1 deletion warp/tests/test_array.py
@@ -2411,5 +2411,5 @@ def test_array_new_del(self):


if __name__ == "__main__":
-    wp.build.clear_kernel_cache()
+    wp.clear_kernel_cache()
    unittest.main(verbosity=2)
2 changes: 1 addition & 1 deletion warp/tests/test_array_reduce.py
@@ -144,5 +144,5 @@ class TestArrayReduce(unittest.TestCase):


if __name__ == "__main__":
-    wp.build.clear_kernel_cache()
+    wp.clear_kernel_cache()
    unittest.main(verbosity=2)
2 changes: 1 addition & 1 deletion warp/tests/test_async.py
@@ -662,5 +662,5 @@ def test_func(
# value_offset=0))

if __name__ == "__main__":
-    wp.build.clear_kernel_cache()
+    wp.clear_kernel_cache()
    unittest.main(verbosity=2)
2 changes: 1 addition & 1 deletion warp/tests/test_atomic.py
@@ -135,5 +135,5 @@ class TestAtomic(unittest.TestCase):


if __name__ == "__main__":
-    wp.build.clear_kernel_cache()
+    wp.clear_kernel_cache()
    unittest.main(verbosity=2)
2 changes: 1 addition & 1 deletion warp/tests/test_bool.py
@@ -208,5 +208,5 @@ class TestBool(unittest.TestCase):


if __name__ == "__main__":
-    wp.build.clear_kernel_cache()
+    wp.clear_kernel_cache()
    unittest.main(verbosity=2)
2 changes: 1 addition & 1 deletion warp/tests/test_builtins_resolution.py
@@ -1286,5 +1286,5 @@ def test_vec4_float_args_precision(self):


if __name__ == "__main__":
-    wp.build.clear_kernel_cache()
+    wp.clear_kernel_cache()
    unittest.main(verbosity=2)
2 changes: 1 addition & 1 deletion warp/tests/test_bvh.py
@@ -163,5 +163,5 @@ def test_bvh_new_del(self):
add_function_test(TestBvh, "test_bvh_ray", test_bvh_query_ray, devices=devices)

if __name__ == "__main__":
-    wp.build.clear_kernel_cache()
+    wp.clear_kernel_cache()
    unittest.main(verbosity=2)
2 changes: 1 addition & 1 deletion warp/tests/test_closest_point_edge_edge.py
@@ -223,5 +223,5 @@ class TestClosestPointEdgeEdgeMethods(unittest.TestCase):


if __name__ == "__main__":
-    wp.build.clear_kernel_cache()
+    wp.clear_kernel_cache()
    unittest.main(verbosity=2)
2 changes: 1 addition & 1 deletion warp/tests/test_codegen.py
@@ -558,5 +558,5 @@ class TestCodeGen(unittest.TestCase):


if __name__ == "__main__":
-    wp.build.clear_kernel_cache()
+    wp.clear_kernel_cache()
    unittest.main(verbosity=2, failfast=True)
2 changes: 1 addition & 1 deletion warp/tests/test_compile_consts.py
@@ -213,5 +213,5 @@ def test_constant_math(self):


if __name__ == "__main__":
-    wp.build.clear_kernel_cache()
+    wp.clear_kernel_cache()
    unittest.main(verbosity=2)
2 changes: 1 addition & 1 deletion warp/tests/test_conditional.py
@@ -240,5 +240,5 @@ class TestConditional(unittest.TestCase):


if __name__ == "__main__":
-    wp.build.clear_kernel_cache()
+    wp.clear_kernel_cache()
    unittest.main(verbosity=2)
2 changes: 1 addition & 1 deletion warp/tests/test_copy.py
@@ -226,5 +226,5 @@ class TestCopy(unittest.TestCase):


if __name__ == "__main__":
-    wp.build.clear_kernel_cache()
+    wp.clear_kernel_cache()
    unittest.main(verbosity=2)
2 changes: 1 addition & 1 deletion warp/tests/test_ctypes.py
@@ -626,5 +626,5 @@ class TestCTypes(unittest.TestCase):


if __name__ == "__main__":
-    wp.build.clear_kernel_cache()
+    wp.clear_kernel_cache()
    unittest.main(verbosity=2)
2 changes: 1 addition & 1 deletion warp/tests/test_dense.py
@@ -61,5 +61,5 @@ class TestDense(unittest.TestCase):


if __name__ == "__main__":
-    wp.build.clear_kernel_cache()
+    wp.clear_kernel_cache()
    unittest.main(verbosity=2)
2 changes: 1 addition & 1 deletion warp/tests/test_devices.py
@@ -85,5 +85,5 @@ def test_devices_unmap_imaginary_device(self):


if __name__ == "__main__":
-    wp.build.clear_kernel_cache()
+    wp.clear_kernel_cache()
    unittest.main(verbosity=2)
2 changes: 1 addition & 1 deletion warp/tests/test_dlpack.py
@@ -524,5 +524,5 @@ class TestDLPack(unittest.TestCase):


if __name__ == "__main__":
-    wp.build.clear_kernel_cache()
+    wp.clear_kernel_cache()
    unittest.main(verbosity=2)
2 changes: 1 addition & 1 deletion warp/tests/test_examples.py
@@ -398,6 +398,6 @@ class TestFemDiffusionExamples(unittest.TestCase):

if __name__ == "__main__":
    # force rebuild of all kernels
-    wp.build.clear_kernel_cache()
+    wp.clear_kernel_cache()

    unittest.main(verbosity=2, failfast=True)
2 changes: 1 addition & 1 deletion warp/tests/test_fabricarray.py
@@ -951,5 +951,5 @@ def test_fabricarray_new_del(self):


if __name__ == "__main__":
-    wp.build.clear_kernel_cache()
+    wp.clear_kernel_cache()
    unittest.main(verbosity=2)
2 changes: 1 addition & 1 deletion warp/tests/test_fast_math.py
@@ -56,5 +56,5 @@ def test_fast_math_cpu(self):


if __name__ == "__main__":
-    wp.build.clear_kernel_cache()
+    wp.clear_kernel_cache()
    unittest.main(verbosity=2)
2 changes: 1 addition & 1 deletion warp/tests/test_fem.py
@@ -1294,5 +1294,5 @@ class TestFemShapeFunctions(unittest.TestCase):


if __name__ == "__main__":
-    wp.build.clear_kernel_cache()
+    wp.clear_kernel_cache()
    unittest.main(verbosity=2)