Function Notation : Evaluation

Adam Henderson

June 1, 2024

“What is the most efficient notation?” is a recurring question when I read math, machine learning, or other technical publications. It is usually followed by a debate: work with the text’s default syntax, or map it to my own preferred syntax?

For example : Do we convert expressions in “conventional matrix notation” \(A^TBv\) into abstract index notation \(A_{ba} B_{bc} v_c\)?

Given how often the question recurs, and the redundant effort it causes, I want to collect observations on syntax to streamline this debate. I’m also inspired by deeper investigations into syntax from the likes of Dijkstra (On Notation and Adopted Notation), Knuth, and Iverson.

I’ll start with my most recent obsession: syntax for functions. A few kinds of function notation occur frequently :

Evaluating a function \(f\) at a value \(x\).
Function composition - applying function \(g\) to the output of \(f\).
Expressing “\(f\) is a function from \(X\) to \(Y\)”.
Expressing the set of all functions from \(X\) to \(Y\).
Currying

To keep this short I’ll focus further on function evaluation.

Notations for Function Evaluation :

Function evaluation, or applying a function to an input, can be expressed in various ways, each with specific advantages and context-dependent appropriateness.

Univariate

Bracketed :
- \(f(x)\) : Common notation for math & most programming languages.
  - Reading time: Unambiguous, universally familiar. Nesting reads inside-out: \(h(g(f(x)))\).
  - Ease of manipulation: Composition is well-supported by convention but reads right-to-left — evaluation order runs against writing order.
- \(f[x]\) : Used by some authors for functionals, i.e. functions whose domains are spaces of functions.
  - Reading time: Requires domain knowledge to disambiguate from array/sequence indexing. High ambiguity cost outside functional analysis.
  - Ease of manipulation: Same as \(f(x)\) once interpretation is settled, but the ambiguity tax rarely makes it worth it.
Subscript/Superscript : \(f_x\)/\(f^{x}\)
- Coordinates of a vector \(v_i\); element of a sequence \(x_n\) — these are subscript function evaluation, not conflicts with it.
- Reading time: Subscript carries a strong convention of discrete indexing — generalizing to arbitrary domains feels like fighting the notation. Superscript is genuinely ambiguous with exponentiation: \(f^x\) reads as a power before it reads as evaluation.
- Ease of manipulation: Natural and compact within its intended domain (sequences, coordinates, tensors). Chains poorly outside it.
Dijkstra period: \(f.x\)
- Reading time: Compact and left-to-right, but conflicts with decimal notation and has near-zero adoption outside Dijkstra’s own writing.
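- Ease of manipulation: Dijkstra treats the dot as left-associative, which suits curried application (\(f.x.y = (f.x).y\)), so nested application still needs parentheses: \(f.(g.x) = f(g(x))\). The adoption cost makes it impractical regardless of its merits.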
Juxtaposition : \(fx\)
- Category theory: application of functor \(Sf\); linear algebra: \(Ax\) for matrix \(A\), vector \(x\).
- Reading time: Minimal character count, maximum ambiguity — is \(fx\) one symbol or two? Only works where strong domain conventions are established.
- Ease of manipulation: Composition is just \(gfx\) — extremely compact in purely functional or linear settings. Clean at the cost of parseability.
Function Application :
- apply(f, x)
- Lisp Style : (f x)
  - Reading time: Unambiguous, handles nesting uniformly (f (g x)). Character overhead grows with depth. Unfamiliar outside Lisp/ML-family languages.
  - Ease of manipulation: Nesting is structurally explicit. Composition is regular with no special cases.
Pipe / Reverse Application : x |> f
- Reading time: Unambiguous, left-to-right data flow — matches the order of application. Low parsing overhead for chains.
- Ease of manipulation: Excellent for linear pipelines x |> f |> g |> h; breaks down when the argument is not the last parameter or when nesting is required.
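The difference in reading order between bracketed nesting, composition, and piping is easy to see in code. Below is a minimal Python sketch; `pipe` and `compose` are hypothetical helpers written for this post (Python has no built-in `|>`), and `f`, `g`, `h` are arbitrary example functions.

```python
from functools import reduce

def pipe(x, *fs):
    """Reverse application: pipe(x, f, g, h) == h(g(f(x)))."""
    return reduce(lambda acc, fn: fn(acc), fs, x)

def compose(*fs):
    """Right-to-left composition: compose(h, g, f)(x) == h(g(f(x)))."""
    return lambda x: reduce(lambda acc, fn: fn(acc), reversed(fs), x)

# Arbitrary example functions (assumptions for illustration only).
def f(x): return x + 1
def g(x): return x * 2
def h(x): return x - 3

nested = h(g(f(5)))             # bracketed: written h, g, f but evaluated f, g, h
piped = pipe(5, f, g, h)        # pipe: written in evaluation order, like x |> f |> g |> h
composed = compose(h, g, f)(5)  # composition: same right-to-left order as nesting
```

All three spellings compute the same value; only the order in which the reader meets the function names changes.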

Why So Many?

Why can’t we just pick one and standardize? The “best” notation depends on context. The key properties to balance are :

Reading Time : Driven by ambiguity of parsing, ease of parsing, reliance on context and backtracking, amount of redundancy, consistency with notation in related domains, and character count. These are listed in order of importance (to me): unambiguous parsing is table stakes, and character count is dangerous to optimize for directly.

Ease of Manipulation : Does the syntax make it easier to perform common transformations / calculations?

Multivariate

Comma Delimited :
- \(f(x,y)\), \(f_{x,y}\)
- Standard notation emphasizing the function’s domain as a product space, \(f : X_1 \times X_2 \times \cdots \to Y\)
Parentheses w/ Different Delimiter :
- Vert - Conditional Probability (single param, single variable case) : \(P(x \vert \phi)\)
- Semicolon - May be used to separate variables from fixed parameters : \(f(x;y)\)
Curry Everything : \(f(x)(y)\)
Lisp Style : (f x y)
Mixed Syntax :
- \(f_i(x)\)
- \(\rho_{\theta}(x \vert \mu)\)
Einstein / Index Notation : \(A_i^{\;j} u^i\), \(T_{ijk} u^i v^j w^k\)
- Applies to multilinear maps — tensor contraction generalizes matrix-vector products to arbitrary rank.
- Index position encodes the type of binding: up vs down distinguishes the map from its dual. \(A_i^{\;j}v^i\) applies \(A\); \(A_i^{\;j}u_j\) applies \(A^T\). The covariant/contravariant distinction that \(f(x,y)\) drops entirely is carried explicitly in the notation.
- Binding is by name, not position: \(A_i^{\;j}u^i\) and \(u^i A_i^{\;j}\) are the same expression — index labels are named arguments, so reordering is free.
- Does not extend beyond multilinear maps — nonlinear functions of tensors fall outside the summation convention.
- Natural generalization: the binding-by-name, typed-wire semantics of Einstein notation are exactly those of string diagrams in symmetric monoidal categories. String diagrams are the right generalization — but diagrammatic representations resist automatic parsing. The open question is the right textual syntax for the same semantics beyond the linear case. Existing moves: einsum-style notation, Catlab.jl (programmatic wiring diagrams in Julia), and the formal term syntax of free symmetric monoidal categories. A full treatment deserves its own post.
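To make the binding-by-name point concrete, here is a small sketch in plain Python. It is an illustration under simplifying assumptions: the tensor is stored as nested lists `A[i][j]`, index height (up vs down) is not tracked at all, and the numbers are made up.

```python
# A is stored as A[i][j]; u is indexed by i, w is indexed by j.
A = [[1, 2, 3],
     [4, 5, 6]]      # i takes 2 values, j takes 3
u = [10, 20]         # plays the role of u^i
w = [1, 0, -1]       # plays the role of w_j

i_range = range(len(A))
j_range = range(len(A[0]))

# A_i^j u^i : sum over the shared label i, leaving j free.
Au = [sum(A[i][j] * u[i] for i in i_range) for j in j_range]

# u^i A_i^j : same labels with the factors written in the other order.
uA = [sum(u[i] * A[i][j] for i in i_range) for j in j_range]

# A_i^j w_j : sum over j instead, leaving i free (the transposed map).
Aw = [sum(A[i][j] * w[j] for j in j_range) for i in i_range]
```

`Au` and `uA` agree because the contraction is determined by which label repeats, not by where the factors sit. Which map gets applied is chosen by the label that is shared, which is why `Aw` has a different shape.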

Equivalent Spaces of Functions but Different Emphasis

The different notations for multivariate functions emphasize different isomorphic spaces of functions. The set of functions \(f: X \times Y \to Z\) is equivalent to the set of functions \(f: X \to (Y \to Z)\) or \(f: Y \to (X \to Z)\). Each is tied to a family of notations for multivariate functions.

Comma delimited \(f(x,y)\) => \(f: X \times Y \to Z\)
Currying \(f(x)(y)\) => \(f: X \to (Y \to Z)\)
Parameters \(f(x ; y)\) => \(f: Y \to (X \to Z)\)

Mixed syntax behaves similarly to currying

Vector valued function \(v_i(x)\) => \(v : X \to (I \to Y)\)

The variety of syntax is valuable for emphasizing representations that are isomorphic but practically different.
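The isomorphisms between these function spaces can be written down directly. The following Python sketch uses hypothetical helpers `curry`, `uncurry`, and `flip`, with `power` as an arbitrary two-argument example function.

```python
def curry(f):
    """(X x Y -> Z)  to  (X -> (Y -> Z))."""
    return lambda x: lambda y: f(x, y)

def uncurry(g):
    """(X -> (Y -> Z))  to  (X x Y -> Z)."""
    return lambda x, y: g(x)(y)

def flip(g):
    """(X -> (Y -> Z))  to  (Y -> (X -> Z))."""
    return lambda y: lambda x: g(x)(y)

def power(x, y):
    # Arbitrary example function (an assumption for illustration).
    return x ** y

curried = curry(power)       # f(x)(y) style
square = flip(curried)(2)    # fix the parameter y = 2 first, like f(x; y)

comma_value = power(3, 2)               # f(x, y)
curried_value = curried(3)(2)           # f(x)(y)
parameter_value = square(3)             # f(x; y) with y fixed
round_trip = uncurry(curried)(3, 2)     # back to the comma form
```

Every form returns the same number. What differs is which argument is supplied first and what kind of object you hold after supplying it.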

Positional vs Named Inputs

This distinction cuts across all the notation families above — it is a property of how arguments are bound, independent of the evaluation syntax chosen.

Positional: \(f(2, 3, \text{"Fred"})\) — inputs provided in the order the function expects. Concise, but the meaning of each position must be remembered or recovered from context.

Named: \(f(x=2, y=3, \text{cat}=\text{"fred"})\) — each argument explicitly labeled. Self-documenting, at the cost of verbosity.
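Most programming languages support both binding styles, which makes the trade-off easy to demonstrate. A minimal Python sketch follows; `describe` is a made-up function used only for illustration.

```python
def describe(x, y, cat):
    # Hypothetical function: formats a name and a pair of coordinates.
    return f"{cat} at ({x}, {y})"

positional = describe(2, 3, "Fred")       # meaning depends on argument order
named = describe(cat="Fred", y=3, x=2)    # labels bind the arguments, order is free
swapped = describe(3, 2, "Fred")          # a positional swap silently changes the meaning
```

Here `positional` and `named` are the same string, while `swapped` differs without raising any error.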

Einstein notation is the clearest example of named binding in mathematical notation — index labels are argument names, and reordering is free because the contraction is determined by label matching, not position. Index height additionally encodes which slot of the dual pairing is being filled, a distinction positional notation drops entirely.

Special Cases

There are functions that occur so commonly in their associated domain that they get special compact syntax. For example, a wide variety of one- and two-argument functions use bracket syntax without a function name :
- Norms : \(\vert x\vert\), \(\vert \vert x \vert \vert\)
- Brackets : Commutator \([x,y]\), Poisson \(\{x,y\}\)
- Inner products : \((x, y)\), \(\langle x \vert y \rangle\)

Infix notation (\(x+y\), \(f \circ g\)) is especially valuable for associative binary operations where \(+(x, +(y, +(z, w)))\) is awful, but \(x + y + z + w\) is easy to read.
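The same contrast shows up in code. In this short Python sketch, `operator.add` stands in for the prefix form \(+(x, y)\), and the values are arbitrary.

```python
from functools import reduce
from operator import add

x, y, z, w = 1, 2, 3, 4

prefix_right = add(x, add(y, add(z, w)))   # +(x, +(y, +(z, w)))
prefix_left = add(add(add(x, y), z), w)    # the other grouping
infix = x + y + z + w                      # grouping left implicit
folded = reduce(add, [x, y, z, w])         # fold over a flat list
```

Because addition is associative, both prefix groupings give the same result, so the infix form can drop the parentheses without losing information.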

There is similarly a large family of unary functions that occur commonly enough to show up as little “decorations” on their arguments (\(\bar{x}\), \(x^*\), \(x^{\dagger}\)). These are most common for “involutions”, which are their own inverse, so that painful expressions like \(x^{\dagger \dagger \dagger}\) don’t occur.

My Current Preference

Adopted: a mixed use of parentheses \(f(x)\) and subscript/superscript.

Avoiding: juxtaposition, and mixing \(f(x)\) with \(g[x]\) in the same context.

The variety of notation is a feature — each choice encodes a different emphasis on the underlying function space, and the right choice depends on what structure you are trying to make visible.