3 · Functionals and gradient descent ==================================== A **functional** is a scalar-valued map :math:`F : X \to \mathbb{R}` — an objective to minimise. SpaceCore functionals do one thing that ordinary ``f(x)`` callables do not: their **gradient is an element of the space**, represented in the space’s own geometry. That single design choice is what makes ``x \leftarrow x - \eta\,\nabla F(x)`` *correct* steepest descent, whatever inner product the space carries. **You will learn to** 1. build functionals (``InnerProductFunctional``, ``LinOpQuadraticForm``); 2. write your **own** functional by subclassing ``Functional``; 3. run plain gradient descent and watch it converge on a contour plot; 4. see why the gradient must respect the space geometry. .. code:: python import numpy as np import matplotlib as mpl import matplotlib.pyplot as plt import spacecore as sc # A clean, consistent palette + style for every figure in the tutorials. BLUE, INDIGO, CYAN = "#2563eb", "#4f46e5", "#0891b2" PINK, AMBER, GREEN = "#db2777", "#d97706", "#059669" SLATE, GRID = "#334155", "#e5e9f0" mpl.rcParams.update({ "figure.figsize": (7.2, 4.2), "figure.dpi": 120, "savefig.dpi": 120, "figure.facecolor": "white", "axes.facecolor": "white", "axes.edgecolor": SLATE, "axes.linewidth": 1.0, "axes.grid": True, "axes.axisbelow": True, "grid.color": GRID, "grid.linewidth": 1.0, "axes.spines.top": False, "axes.spines.right": False, "axes.titlesize": 13, "axes.titleweight": "bold", "axes.titlecolor": SLATE, "axes.labelcolor": SLATE, "axes.labelsize": 11, "xtick.color": SLATE, "ytick.color": SLATE, "xtick.labelsize": 10, "ytick.labelsize": 10, "font.size": 11, "legend.frameon": False, "legend.fontsize": 10, "lines.linewidth": 2.4, "lines.markersize": 6, "image.cmap": "magma", }) mpl.rcParams["axes.prop_cycle"] = mpl.cycler( color=[BLUE, PINK, GREEN, AMBER, INDIGO, CYAN]) print("spacecore", sc.__version__, "| numpy", np.__version__) .. parsed-literal:: spacecore 0.4.0 | numpy 2.4.2 .. code:: python ctx = sc.Context(sc.NumpyOps(), dtype=np.float64) 1 · Built-in functionals ------------------------ The two everyday building blocks are a **linear** functional :math:`\ell_c(x) = \langle c, x\rangle` and a **quadratic** form. ``LinOpQuadraticForm(Q, linear, a)`` represents .. math:: F(x) = \tfrac{1}{2}\,\langle x, Q x\rangle_X + \ell(x) + a, \qquad \nabla F(x) = Q x + c, where the linear part is itself an ``InnerProductFunctional``. The gradient comes back as a domain element you can step along directly. .. code:: python X = sc.DenseVectorSpace((3,), ctx) Q = sc.DiagonalLinOp(ctx.asarray([2.0, 3.0, 5.0]), X, ctx) # SPD quadratic part c = ctx.asarray([1.0, 0.0, -1.0]) linear = sc.InnerProductFunctional(c, X) # ℓ(x) = F = sc.LinOpQuadraticForm(Q, linear, a=0.5) x = ctx.asarray([1.0, 2.0, 3.0]) print("F(x) :", float(F.value(x))) # 0.5 xᵀQx + cᵀx + 0.5 print("grad F(x) :", F.grad(x)) # Qx + c print("minimiser :", -np.asarray([1,0,-1]) / np.asarray([2,3,5])) # -c/diag(Q) print("grad at min :", F.grad(ctx.asarray(-np.asarray([1,0,-1]) / np.asarray([2,3,5])))) .. parsed-literal:: F(x) : 28.0 grad F(x) : [ 3. 6. 14.] minimiser : [-0.5 0. 0.2] grad at min : [0. 0. 0.] 2 · Write your own functional ----------------------------- To define a custom objective, subclass ``Functional`` and implement ``value``. If you also implement ``grad``, return a **domain element**: take the raw coordinate gradient :math:`\partial F/\partial x` and pass it through ``domain.riesz_inverse(...)``. For Euclidean geometry that is the identity, so the two agree — but writing it this way makes the functional correct under *any* geometry (we exploit that in §4). Here is a general quadratic bowl :math:`F(x) = \tfrac12 x^\top H x`. .. code:: python class QuadraticBowl(sc.Functional): """F(x) = 0.5 * xᵀ H x, with a metric-aware gradient.""" def __init__(self, H, dom, ctx=None): super().__init__(dom, ctx) self.H = self.domain.ctx.asarray(H) def value(self, x): return 0.5 * self.ops.vdot(x, self.ops.matmul(self.H, x)) def grad(self, x): coord_grad = self.ops.matmul(self.H, x) # cotangent ∂F/∂x = Hx return self.domain.riesz_inverse(coord_grad) # → gradient in the space geometry # pytree hooks (required): leaves are parameters, aux is static structure def tree_flatten(self): return (self.H,), (self.domain, self.ctx) @classmethod def tree_unflatten(cls, aux, children): dom, ctx = aux return cls(children[0], dom, ctx) H = np.array([[3.0, 0.0], [0.0, 1.0]]) X2 = sc.DenseVectorSpace((2,), ctx) bowl = QuadraticBowl(H, X2) print("value at (2,2):", float(bowl.value(ctx.asarray([2.0, 2.0])))) print("grad at (2,2):", bowl.grad(ctx.asarray([2.0, 2.0]))) .. parsed-literal:: value at (2,2): 8.0 grad at (2,2): [6. 2.] 3 · Gradient descent -------------------- With a metric-correct gradient in hand, descent is the textbook loop. We minimise an anisotropic bowl (it is :math:`12\times` steeper in :math:`x_1` than :math:`x_2`) and record the path. .. code:: python def gradient_descent(F, x0, step, n_steps): x = x0 xs, fs = [np.asarray(x).copy()], [float(F.value(x))] for _ in range(n_steps): x = x - step * F.grad(x) # F.grad returns a domain element → valid step xs.append(np.asarray(x).copy()); fs.append(float(F.value(x))) return np.array(xs), np.array(fs) H = np.array([[12.0, 0.0], [0.0, 1.0]]) bowl = QuadraticBowl(H, X2) path, vals = gradient_descent(bowl, ctx.asarray([2.7, 2.7]), step=0.16, n_steps=60) # contour field (F is just 0.5 xᵀHx, evaluated on a grid) g = np.linspace(-3, 3, 240); GX, GY = np.meshgrid(g, g) Z = 0.5 * (H[0, 0] * GX**2 + H[1, 1] * GY**2) fig, axes = plt.subplots(1, 2, figsize=(10.6, 4.4)) axes[0].contour(GX, GY, Z, levels=np.linspace(0.3, 50, 16), colors=GRID, linewidths=1) axes[0].plot(path[:, 0], path[:, 1], color=BLUE, marker="o", ms=3, lw=1.6) axes[0].scatter([0], [0], color=GREEN, s=90, marker="*", zorder=6, label="minimum") axes[0].set_aspect("equal"); axes[0].set_title("Descent path on an anisotropic bowl") axes[0].set_xlabel("$x_1$"); axes[0].set_ylabel("$x_2$"); axes[0].legend() axes[1].semilogy(vals, color=INDIGO) axes[1].set_title("Objective $F(x_k)$"); axes[1].set_xlabel("iteration") axes[1].set_ylabel("value (log)") plt.tight_layout(); plt.show() print("final point:", path[-1], " final value:", vals[-1]) .. image:: 03_functionals_files/03_functionals_8_0.png .. parsed-literal:: final point: [1.81398380e-02 7.72895765e-05] final value: 0.0019743253289275227 The path zig-zags: a single Euclidean step size cannot suit both the steep :math:`x_1` and the shallow :math:`x_2` direction at once. That is a *geometry* problem — and SpaceCore lets us fix it by changing the inner product. 4 · Gradients respect the geometry ---------------------------------- Choosing the inner product is choosing a **preconditioner**. If we give the space the weighted geometry :math:`W = \mathrm{diag}(H)`, the *same* ``QuadraticBowl`` code now produces the gradient :math:`W^{-1} H x` — because ``grad`` ends in ``domain.riesz_inverse(...)``. Steepest descent in this metric heads almost straight at the minimum. Nothing in the functional changed; only the space did. .. code:: python w = ctx.asarray(np.diag(H)) # weights = (12, 1) Xw = sc.DenseVectorSpace((2,), ctx, geometry=sc.WeightedInnerProduct(w)) bowl_e = QuadraticBowl(H, X2) # Euclidean space bowl_w = QuadraticBowl(H, Xw) # weighted space — identical class! path_e, val_e = gradient_descent(bowl_e, ctx.asarray([2.7, 2.7]), step=0.16, n_steps=60) path_w, val_w = gradient_descent(bowl_w, ctx.asarray([2.7, 2.7]), step=0.85, n_steps=60) fig, axes = plt.subplots(1, 2, figsize=(10.6, 4.4)) axes[0].contour(GX, GY, Z, levels=np.linspace(0.3, 50, 16), colors=GRID, linewidths=1) axes[0].plot(path_e[:, 0], path_e[:, 1], color=BLUE, marker="o", ms=3, lw=1.6, label="Euclidean gradient") axes[0].plot(path_w[:, 0], path_w[:, 1], color=PINK, marker="o", ms=3, lw=1.8, label="weighted (natural) gradient") axes[0].scatter([0], [0], color=GREEN, s=90, marker="*", zorder=6) axes[0].set_aspect("equal"); axes[0].set_title("Same functional, two geometries") axes[0].set_xlabel("$x_1$"); axes[0].set_ylabel("$x_2$"); axes[0].legend() axes[1].semilogy(val_e, color=BLUE, label="Euclidean") axes[1].semilogy(val_w, color=PINK, label="weighted") axes[1].set_title("Convergence"); axes[1].set_xlabel("iteration") axes[1].set_ylabel("$F(x_k)$ (log)"); axes[1].legend() plt.tight_layout(); plt.show() print("Euclidean final value:", val_e[-1]) print("weighted final value:", val_w[-1]) .. image:: 03_functionals_files/03_functionals_11_0.png .. parsed-literal:: Euclidean final value: 0.0019743253289275227 weighted final value: 6.406074302637951e-98 The pink path — steepest descent under the metric matched to the problem — converges orders of magnitude faster than the blue Euclidean path, with no change to the objective. This is the seed of **Riemannian / natural-gradient** optimisation, which :doc:`tutorial 7 <07_manifold_descent>` develops on a curved manifold. Recap ----- - A **``Functional``** is a scalar map whose ``grad(x)`` is a *domain element*, so ``x - η·grad`` is a valid step. - Build them with ``InnerProductFunctional`` / ``LinOpQuadraticForm``, or **subclass ``Functional``** and implement ``value`` (+ optionally ``grad``). - When you write ``grad`` by hand, finish with ``domain.riesz_inverse(coord_grad)`` so it is correct in any geometry. - Choosing the inner product preconditions descent — the same code converges far faster under a fitted metric. **Next:** :doc:`4 · Tree spaces <04_tree_spaces>` — structured elements and block operators.