Performance Notes ================= Matrix-backed operators have a small fixed Python wrapper cost around ``apply``, ``rapply``, ``vapply``, and ``rvapply``. In eager NumPy this is typically on the order of 0.5-1 microsecond per call, depending on the Python version and machine. That overhead is visible when repeatedly applying tiny operators inside a Python loop. For arrays above a few thousand elements, the wrapper cost is usually sub-percent relative to BLAS, sparse-matrix, or backend execution time. Batched methods amortize the wrapper cost further: ``vapply`` and ``rvapply`` perform one Python call for the whole leading-axis batch instead of one call per element. Under ``jax.jit``, the wrapper and mode-dispatch logic is trace-time constant and compiles away from the executed computation. If eager NumPy on very small operands is performance-critical, prefer batching with ``vapply``/``rvapply`` or moving the tight loop into a JIT-compatible backend.