Changes in version 0.4.0 Features - Added QR, LU, SVD, and symmetric eigendecomposition support on both CPU and CUDA via the FFI registration mechanism. - Added an vignette on how to register custom calls via the FFI registration mechanisms with coverage of both CUDA and CPU-specific aspects. - Added support for the bit64 package to better support long integers. - pjrt_buffer(), pjrt_scalar(), and as_array() gain a check argument (default FALSE). When TRUE, the call errors instead of silently losing information: on input if data contains NAs, on output if the materialized R vector contains a value that's indistinguishable from NA or that has wrapped through the integer container. - as_array() on a ui32 buffer now returns a bit64::integer64 instead of a base integer, so values >= 2^31 round-trip losslessly rather than wrapping to negative. Changes in version 0.3.0 Features - Added buffer_copy() function to copy buffer between devices. - New pjrt_register_custom_call() allows external packages to register C/C++ XLA FFI handlers with the PJRT plugin. Registration is deferred until the plugin loads, so handlers can be registered during .onLoad(). - pjrt_device() now returns cached PJRTDevice instances, so repeated calls for the same device yield objects with stable identity (useful for hashing and caching, e.g. in {anvil}'s JIT). Bug fixes - The configure script now uses the protoc compiler from the same installation as the linked protobuf library, preventing version mismatches when multiple protobuf versions are installed. - Compiling a program for a specific CPU device (e.g. cpu:1) now targets that device instead of silently falling back to cpu:0. - Fixed device targeting when compiling against a distributed PJRT client, where global device IDs and local hardware ordinals diverge. Error messages - Improved error message when attempting to use CUDA on unsupported OS or platform. Changes in version 0.2.0 Asynchronous API Operations such as host <-> device transfers and program execution were previously only synchronous. Now, they are asynchronous which has considerable performance benefits, especially on GPU. Specifically: - pjrt_buffer() and pjrt_execute() return immediately, but the returned buffer is not necessarily ready. To await a transfer or computation of a buffer, use await(). However, this is handled within PJRT, so this function never has to be called by a user. - as_array() is still synchronous, but there is now the asynchronous version as_array_async() but this is rarely needed. If used, it returns a PJRTArrayPromise object which can be converted to an R array/vector via value(). - To check whether a PJRTBuffer or PJRTArrayPromise is ready, use is_ready(). Features - Added dtype support for PJRTBuffers via the tengen::dtype S3 generic. "bool" is now accepted as an alias for "i1"/"pred". - Accept DataType objects in the dtype parameter of pjrt_buffer(). - Support device argument in pjrt_compile(). Bug fixes - Protect from segfaults in raw to buffer conversion. - Protect from segfault during device mismatch in pjrt_execute(). Platforms & Installation - Added support for Linux ARM (aarch64) using CPU backend. - Simplified CUDA installation via the cuda12.8 package, which now only requires compatible drivers to be installed. Miscellaneous - The printer for PJRTBuffer now uses "bool" instead of "pred" to avoid discrepancies with {anvl}. Changes in version 0.1.1 Bug fixes - Fix formatting of +-Inf/NaN for f64 Changes in version 0.1.0 - Initial release