Performance Notes
Matrix-backed operators add a small, fixed Python wrapper cost to every call of
apply, rapply, vapply, and rvapply. In eager NumPy this cost is
typically on the order of 0.5-1 microseconds per call, depending on the Python
version and machine. The overhead becomes noticeable when tiny operators are
applied repeatedly inside a Python loop.
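One rough way to see a per-call cost like this on your own machine is to time a wrapped call against the bare matrix product. The sketch below assumes a hypothetical minimal MatrixOperator class (not the library's actual class) whose apply does a shape check before multiplying. The real wrappers do different work, so the absolute numbers will differ.

```python
import timeit

import numpy as np


class MatrixOperator:
    """Hypothetical minimal matrix-backed operator (illustration only)."""

    def __init__(self, matrix):
        self.matrix = np.asarray(matrix)

    def apply(self, x):
        # Python-level checks like this are the kind of fixed per-call cost
        # described above; they cost the same regardless of array size.
        x = np.asarray(x)
        if x.shape != (self.matrix.shape[1],):
            raise ValueError("shape mismatch")
        return self.matrix @ x


A = np.eye(4)
x = np.ones(4)
op = MatrixOperator(A)

n = 20_000
t_raw = timeit.timeit(lambda: A @ x, number=n) / n       # bare matmul, seconds/call
t_op = timeit.timeit(lambda: op.apply(x), number=n) / n  # wrapped call, seconds/call
print(f"raw matmul: {t_raw * 1e6:.3f} us/call")
print(f"wrapped:    {t_op * 1e6:.3f} us/call")
print(f"overhead:   {(t_op - t_raw) * 1e6:.3f} us/call")
```

Repeating the measurement with a much larger A shows the same fixed overhead shrinking relative to the matrix product itself.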
For arrays above a few thousand elements, the wrapper cost is usually well under
1% of the BLAS, sparse-matrix, or backend execution time. Batched
methods amortize it further: vapply and rvapply make
one Python call for the whole batch along the leading axis instead of one call per
batch element.
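The following sketch illustrates this amortization with a hypothetical MatrixOperator whose apply maps one vector and whose vapply maps a (batch, n) array row by row in a single call. The class, its calls counter, and the xs @ matrix.T implementation are assumptions for illustration, not the library's code.

```python
import numpy as np


class MatrixOperator:
    """Hypothetical operator contrasting per-vector and batched application."""

    def __init__(self, matrix):
        self.matrix = np.asarray(matrix)
        self.calls = 0  # counts Python-level entries into the wrapper

    def apply(self, x):
        # One vector of shape (n,) per call.
        self.calls += 1
        return self.matrix @ x

    def vapply(self, xs):
        # Whole batch of shape (batch, n) in one call; row i equals apply(xs[i]).
        self.calls += 1
        return xs @ self.matrix.T


rng = np.random.default_rng(0)
op = MatrixOperator(rng.standard_normal((8, 8)))
xs = rng.standard_normal((1000, 8))

looped = np.stack([op.apply(x) for x in xs])  # 1000 wrapper calls
calls_loop = op.calls

op.calls = 0
batched = op.vapply(xs)                       # 1 wrapper call
calls_batch = op.calls

print(np.allclose(looped, batched), calls_loop, calls_batch)  # True 1000 1
```

The batched path pays the fixed wrapper cost once and hands the whole batch to a single matrix product.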
Under jax.jit, the wrapper and mode-dispatch logic runs only at trace time,
so it is not part of the compiled computation. If eager NumPy performance on very
small operands is critical, batch the calls with vapply/rvapply or move the
tight loop into a function compiled with a JIT-compatible backend.
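A minimal JAX sketch of this trace-time behavior, assuming a hypothetical MatrixOperator with an illustrative adjoint flag standing in for mode dispatch (neither is the library's API). A Python-side counter shows the wrapper body running once during tracing. Later calls with the same input shape and dtype reuse the compiled executable, and jax.vmap moves the per-vector loop inside the compiled function.

```python
import jax
import jax.numpy as jnp


class MatrixOperator:
    """Hypothetical matrix-backed operator; `adjoint` is an illustrative mode flag."""

    def __init__(self, matrix, adjoint=False):
        self.matrix = matrix
        self.adjoint = adjoint
        self.wrapper_runs = 0  # Python side effect: increments only when the wrapper body executes

    def apply(self, x):
        self.wrapper_runs += 1
        # Plain Python mode dispatch: resolved while tracing, absent from the compiled code.
        m = self.matrix.T if self.adjoint else self.matrix
        return m @ x


op = MatrixOperator(2.0 * jnp.eye(3))


@jax.jit
def apply_many(xs):
    # The loop over the batch lives inside the compiled function via jax.vmap.
    return jax.vmap(op.apply)(xs)


xs = jnp.ones((5, 3))
out1 = apply_many(xs)        # traces once: wrapper body runs here
out2 = apply_many(xs + 1.0)  # same shape and dtype: cached executable, no retrace
print(op.wrapper_runs)  # 1
```

Because wrapper_runs is a Python side effect, it serves only as a tracing diagnostic; it is not repeated on cached calls, which is the same reason the wrapper cost does not appear in jitted execution.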