--- title: "FAQ" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{FAQ} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ## Why is my function slow? There can be many reasons why an {anvl} program is not as fast as one might expect. See the [Efficiency](efficiency.html) vignette which explains various pitfals and levers for optimizing program runtime. ## Why does timing my function show suspiciously fast results? Many operations in {anvl} happen asynchronously under the hood (c.f. [Asynchronous execution](efficiency.html#asynchronous-execution)). In order to properly time an {anvl} program, you therefore need to ensure that: 1. All creation of buffers used in the benchmark function have finished. 2. The results of the computation are awaited. This is especially relevant on GPU, where almost all work is dispatched asynchronously and the call into the compiled function returns essentially immediately. On CPU, whether an operation runs asynchronously depends on the FLOPs of the operation: XLA may execute small operations synchronously and only dispatch larger ones to a background thread. As a result, the same benchmarking mistake may produce realistic numbers for some CPU ops and wildly optimistic ones for others -- so always `await()` the result regardless of the device. See also [jax#4218](https://github.com/jax-ml/jax/discussions/4218) for a related discussion in JAX. ```{r} library(anvl) mul_n <- function(x, n) { for (i in 1:n) { x <- x * x } return(x) } # returns immediately x <- nv_array(rnorm(1e8)) # Ensure buffer creation is finished await(x) # Bad (does not capture the whole computation): system.time(mul_n(x, 20)) # Good (also measures actual computation): system.time(await(mul_n(x, 20))) ``` ## How do I control the number of threads used by XLA? XLA (the compiler backend used by `anvl`) may use multiple CPU threads for parallelism. On shared systems such as HPC clusters, it is often necessary to restrict which cores a process can use. The recommended approach is to control thread affinity **from outside the process**, using OS-level tools. On Linux, `taskset` is the most common option: ```bash # Pin the R process to cores 0-3 taskset -c 0-3 Rscript my_script.R ``` See [jax#15866](https://github.com/jax-ml/jax/issues/15866) for more information.