Tensor Concentration Inequalities (problem 16)
Consider symmetric deterministic tensors $T_1, \ldots, T_n \in (\mathbb{R}^d)^{\otimes r}$ and let $g_1, \ldots, g_n$ be i.i.d. standard Gaussian random variables. We are interested in upper bounds for the following quantity:
\begin{equation}\label{eq:expmaxmodel} \mathbb{E} \left( \max_{||x||_p \leq 1} \left | \sum_{i=1}^n g_i \langle T_i, x^{\otimes r} \rangle \right | \right) = \mathbb{E} \left \| \sum_{i=1}^n g_i T_i \right \|_{ \mathcal{I}_p} \end{equation}
Here, $||x||_p = (\sum_{i=1}^d |x_i|^p)^{\frac1p}$ denotes the $\ell_p$ norm and $\langle T_i, x^{\otimes r} \rangle = \sum_{k_1, \ldots, k_r =1}^d (T_i)_{k_1, \ldots, k_r } x_{k_1} \cdots x_{k_r}$. Moreover, $\left \| T_i \right \|_{ \mathcal{I}_p} = \max_{||x||_p \leq 1} \left| \langle T_i, x^{\otimes r} \rangle \right|$ is the symmetric injective $\ell_p$ norm. Perhaps surprisingly, this question lies at the intersection of coding theory, dispersive partial differential equations, tensor principal component analysis, Banach space geometry and Gaussian process theory (see (Bandeira et al., 2024)). Particularly tantalizing to me is the following conjecture:
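In its roughest form (I am deliberately vague about the precise logarithmic factor and the dependence of the constant on $r$ and $p$; see (Bandeira et al., 2024) for the exact formulation), the conjecture predicts that for some constant $C_{r,p}$ depending only on $r$ and $p$ we have
\begin{equation}\label{eq:tensorAWNCK} \mathbb{E} \left \| \sum_{i=1}^n g_i T_i \right \|_{ \mathcal{I}_p} \leq C_{r,p} \sqrt{\log (d+1)} \sqrt{\sum_{i=1}^n \left \| T_i \right \|_{ \mathcal{I}_p}^2 }. \end{equation}
To make the objects above concrete, here is a minimal numerical sketch (assuming numpy; the dimensions, order, and number of tensors are arbitrary toy choices, and the random search only gives a crude lower bound on the injective norm, since computing it exactly is NP-hard in general):

```python
import numpy as np
from itertools import permutations
from math import factorial

# Arbitrary toy parameters, chosen only for illustration.
d, r, p, n = 5, 3, 2, 10
rng = np.random.default_rng(0)

def tensor_form(T, x):
    """Evaluate <T, x^(tensor r)> by contracting every mode of T with x."""
    for _ in range(T.ndim):
        T = np.tensordot(T, x, axes=([0], [0]))
    return float(T)

def crude_injective_norm(T, p, samples=20000):
    """Crude lower bound on ||T||_{I_p} via random search over the l_p sphere
    (the exact injective norm is NP-hard to compute in general)."""
    best = 0.0
    for _ in range(samples):
        x = rng.standard_normal(T.shape[0])
        x /= np.linalg.norm(x, ord=p)  # normalize onto the l_p sphere
        best = max(best, abs(tensor_form(T, x)))
    return best

def symmetrize(A):
    """Average a tensor over all permutations of its modes to make it symmetric."""
    return sum(np.transpose(A, perm) for perm in permutations(range(A.ndim))) / factorial(A.ndim)

# Fixed symmetric tensors T_i and one sample of the Gaussian sum from the left-hand side above.
Ts = [symmetrize(rng.standard_normal((d,) * r)) for _ in range(n)]
g = rng.standard_normal(n)
S = sum(gi * Ti for gi, Ti in zip(g, Ts))
print("crude estimate of ||sum_i g_i T_i||_{I_p}:", crude_injective_norm(S, p))
```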
To get a better sense of what this conjecture is saying, let us focus on the case $r = p = 2$, where we deal with symmetric matrices $M_i$ and their Euclidean operator norm. To the best of my knowledge, the first result of this type was established by Tomczak-Jaegermann in (Tomczak-Jaegermann, 1974), and a more concrete version was shown by Ahlswede and Winter in (Ahlswede & Winter, 2002), which says that for some universal constant $C>0$ we have
\begin{equation}\label{eq:AWmatrixbound} \mathbb{E} \left \| \sum_{i=1}^n g_i M_i \right \|_{ \mathcal{I}_2} \leq C \sqrt{\log (d+1)} \sqrt{\sum_{i=1}^n \left \| M_i \right \|_{ \mathcal{I}_2}^2 }. \end{equation}
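As a quick sanity check, here is a minimal Monte Carlo sketch of this bound (assuming numpy; the dimensions, the choice of matrices, and the number of trials are arbitrary toy parameters, and the universal constant $C$ is not tracked):

```python
import numpy as np

# Crude Monte Carlo illustration of the matrix bound above; all sizes are arbitrary toy choices.
rng = np.random.default_rng(1)
d, n, trials = 30, 50, 200

# Fixed symmetric matrices M_i (here symmetrized Gaussian matrices, purely for illustration).
Ms = []
for _ in range(n):
    A = rng.standard_normal((d, d))
    Ms.append((A + A.T) / 2)

# Right-hand side without the universal constant C.
rhs = np.sqrt(np.log(d + 1)) * np.sqrt(sum(np.linalg.norm(M, 2) ** 2 for M in Ms))

# Empirical average of the spectral norm of the Gaussian sum on the left-hand side.
lhs = 0.0
for _ in range(trials):
    g = rng.standard_normal(n)
    S = sum(gi * Mi for gi, Mi in zip(g, Ms))
    lhs += np.linalg.norm(S, 2)  # spectral norm = injective l_2 norm for symmetric matrices
lhs /= trials

print(f"E || sum_i g_i M_i ||                  ~ {lhs:.2f}")
print(f"sqrt(log(d+1)) * sqrt(sum ||M_i||^2)   = {rhs:.2f}")
```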
In the scalar case $d=1$, this statement is of course trivial: the term on the right-hand side is simply the standard deviation of the Gaussian variable on the left, multiplied by a constant. The interesting part is that even for much larger $d$, matrices behave similarly to scalars in the sense that they exhibit the same square-root cancellation at the level of the variance (I consider log factors, for now, to be a small price to pay). Indeed, the proofs of this inequality mimic techniques used for scalar random variables. The success of this approach rests on the fact that the spectral norm of a symmetric matrix $M$ is well-approximated by traces of its higher powers:
\begin{equation}\label{eq:matrixmomentmethod} || M || \leq \operatorname{Tr}(M^{2k})^{\frac{1}{2k}} \leq d^{\frac{1}{2k}} || M ||. \end{equation}
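Here is a minimal numerical illustration of \eqref{eq:matrixmomentmethod} (again assuming numpy; $d$ and $k$ are arbitrary toy choices):

```python
import numpy as np

# Check the sandwich ||M|| <= Tr(M^{2k})^{1/(2k)} <= d^{1/(2k)} ||M|| for one symmetric matrix.
rng = np.random.default_rng(2)
d, k = 40, 5

A = rng.standard_normal((d, d))
M = (A + A.T) / 2

eigs = np.linalg.eigvalsh(M)                              # eigenvalues of the symmetric matrix M
spec_norm = np.max(np.abs(eigs))                          # spectral norm ||M||
trace_moment = np.sum(eigs ** (2 * k)) ** (1 / (2 * k))   # Tr(M^{2k})^{1/(2k)} via eigenvalues

print(spec_norm, "<=", trace_moment, "<=", d ** (1 / (2 * k)) * spec_norm)
```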
Computing expectations of traces is much more manageable, and indeed it is possible to refine \eqref{eq:AWmatrixbound} to arrive at the celebrated noncommutative Khintchine inequality of Lust-Piquard and Pisier (Lust-Piquard, 1986), (Lust-Piquard & Pisier, 1991). Since then, numerous generalizations and refinements have appeared (Oliveira, 2010), (Tropp, 2012), (Bandeira et al., 2023), (Brailovskaya & van Handel, 2024), which have proven tremendously useful in applications.
If we return to the case of general $r,p \geq 2$, we unfortunately lose access to \eqref{eq:matrixmomentmethod}, and it is in general NP-hard to approximate injective norms, so one has to approach the problem differently. Observe that \eqref{eq:expmaxmodel} is the supremum of a Gaussian process, so its expectation is controlled by the canonical distance on $\mathbb{B}_p^d = \{ x \in \mathbb{R}^d : ||x||_p \leq 1 \}$, defined as follows: \begin{equation} D(x,y) = \left( \mathbb{E} \left | \sum_{i=1}^n g_i \langle T_i, x^{\otimes r}- y^{\otimes r}\rangle \right |^2 \right)^{\frac12} = \left( \sum_{i=1}^n \langle T_i, x^{\otimes r}- y^{\otimes r}\rangle^2 \right)^{\frac12}, \end{equation} where the second equality uses that the $g_i$ are independent standard Gaussians. If one has decent estimates for $\mathcal{N}(\mathbb{B}_p^d,D, \varepsilon)$, the smallest number of $D$-balls of radius $\varepsilon$ needed to cover $\mathbb{B}_p^d$, then one automatically gets a good estimate for the expected injective norm, since Dudley's entropy integral provides the following inequality:
\begin{equation} \frac{1}{C \log(n)} \int_{0}^{\infty} \sqrt{\log \mathcal{N}(\mathbb{B}_p^d,D, \varepsilon)} d \varepsilon \leq \mathbb{E} \left \| \sum_{i=1}^n g_i T_i \right \|_{ \mathcal{I}_p} \leq C \int_{0}^{\infty} \sqrt{\log \mathcal{N}(\mathbb{B}_p^d,D, \varepsilon)} d \varepsilon \end{equation}
Unfortunately, controlling $\mathcal{N}(\mathbb{B}_p^d,D, \varepsilon)$ still seems to be a remarkably challenging task. In the case $p=2$, Latała proved a covering number bound for this space and combined it with an intricate chaining-style argument to obtain a dimension-free bound for the injective $\ell_2$ norm of random tensors, which he used to derive (optimal!) moment estimates for Gaussian chaoses (Latała, 2006). We (independently) extended this covering number estimate to $p \geq 2$ in (Bandeira et al., 2024) and settled \eqref{eq:tensorAWNCK} in the regime $p \geq 2r$. Sadly, a volumetric barrier prevents us from proving \eqref{eq:tensorAWNCK} for $p<2r$, which includes cases that would be very interesting for further applications (see (Bandeira et al., 2024)). Even proving it in the known case $r=p=2$ using purely geometric techniques is, to the best of my knowledge, still open.
To keep the post at a reasonable length, I will refer to (Bandeira et al., 2024) for more concrete motivations for this problem and further interesting open questions.
Lastly, let me point out some interesting related work about norms of nonhomogeneous random tensors (tensors with very irregular randomness patterns) outside of the setting $r=p=2$. Quite recently, Aden-Ali (Aden-Ali, 2025) removed a logarithmic factor and the constant's dependence on $p$ from our bound in (Bandeira et al., 2024) by hammering the problem with the so-called PAC-Bayesian Lemma, which also yields a slick proof of Latała's chaos moment estimate. Boedihardjo (Boedihardjo, 2024) proved a remarkably sharp inequality for tensors with independent entries when $p=2$, which is often tight up to a constant. In the independent-entry case for matrices, people have gone beyond $p=2$ to prove stunningly precise norm estimates (Adamczak et al., 2024), (Latała & Strzelecka, 2025). The independent-entry world seems to allow a fight against logarithmic factors to get dimension-free estimates, while in the general case it seems that we are not even close to proving a crude bound of the form \eqref{eq:tensorAWNCK}. For the case $p=2$ it is also possible to adapt Rudelson's argument in (Rudelson, 1996) to get the analogous bound for rank-$1$ tensors (though, just as in the independent-entry case, it is much more straightforward to prove a bound with suboptimal logs). In (Masri & Tonge, 2005) the authors use Hardy space theory to bound the injective $\ell_2$ norm of random Hankel tensors, and in fact they obtain a two-sided estimate with the right logarithmic factor. In their paper the tensor has unit-variance Rademacher entries, but the result can be slightly generalized to allow for other coefficients. In the case $p= \infty$, people have used Kikuchi matrix techniques (Janzer & Manohar, 2024), (Basu et al., 2024) to get bounds for matching tensors, which tend to have nastier covariance structures since, unlike in the independent-entry case, the tensors $T_i$ are not sparse. All of the mentioned examples are consistent with \eqref{eq:tensorAWNCK} (and in fact many of them completely outclass it); unfortunately, it still seems to be far out of (my) reach.
References
- Bandeira, A. S., Gopi, S., Jiang, H., Lucca, K., & Rothvoss, T. (2024). A Geometric Perspective on the Injective Norm of Sums of Random Tensors. ArXiv Preprint ArXiv:2411.10633.
- Tomczak-Jaegermann, N. (1974). The moduli of smoothness and convexity and the Rademacher averages of trace classes $S_p$ ($1 \le p < \infty$). Studia Math., 50, 163–182.
- Ahlswede, R., & Winter, A. (2002). Strong converse for identification via quantum channels. IEEE Trans. Inform. Theory, 48(3), 569–579. https://doi.org/10.1109/18.985947
- Lust-Piquard, F. (1986). Inégalités de Khintchine dans $C_p$ ($1<p<\infty$). C. R. Acad. Sci. Paris Sér. I Math., 303(7), 289–292.
- Lust-Piquard, F., & Pisier, G. (1991). Noncommutative Khintchine and Paley inequalities. Ark. Mat., 29(2), 241–260. https://doi.org/10.1007/BF02384340
- Oliveira, R. I. (2010). Sums of random Hermitian matrices and an inequality by Rudelson. Electron. Commun. Probab., 15, 203–212. https://doi.org/10.1214/ECP.v15-1544
- Tropp, J. A. (2012). User-friendly tail bounds for sums of random matrices. Found. Comput. Math., 12(4), 389–434. https://doi.org/10.1007/s10208-011-9099-z
- Bandeira, A. S., Boedihardjo, M., & van Handel, R. (2023). Matrix Concentration Inequalities and Free Probability. Inventiones Mathematicae, 234, 419–487.
- Brailovskaya, T., & van Handel, R. (2024). Universality and sharp matrix concentration inequalities. Geom. Funct. Anal., 34(6), 1734–1838. https://doi.org/10.1007/s00039-024-00692-9
- Latała, R. (2006). Estimates of moments and tails of Gaussian chaoses. The Annals of Probability, 34(6), 2315–2331. https://doi.org/10.1214/009117906000000421
- Aden-Ali, I. (2025). On the Injective Norm of Sums of Random Tensors and the Moments of Gaussian Chaoses. ArXiv Preprint ArXiv:2503.10580.
- Boedihardjo, M. T. (2024). Injective norm of random tensors with independent entries. ArXiv Preprint ArXiv:2412.21193.
- Adamczak, R., Prochno, J., Strzelecka, M., & Strzelecki, M. (2024). Norms of structured random matrices. Mathematische Annalen, 388(4), 3463–3527.
- Latała, R., & Strzelecka, M. (2025). Operator $\ell_p \to \ell_q$ norms of Gaussian matrices. ArXiv Preprint ArXiv:2502.02186.
- Rudelson, M. (1996). Random vectors in the isotropic position. ArXiv Preprint Math/9608208.
- Masri, I., & Tonge, A. (2005). Norm estimates for random multilinear Hankel forms. Linear Algebra and Its Applications, 402, 255–262. https://doi.org/10.1016/j.laa.2005.01.017
- Janzer, O., & Manohar, P. (2024). A $k^{\frac{q}{q-2}}$ Lower Bound for Odd Query Locally Decodable Codes from Bipartite Kikuchi Graphs. ArXiv Preprint ArXiv:2411.14276.
- Basu, A., Hsieh, J.-T., Kothari, P. K., & Lin, A. D. (2024). Improved Lower Bounds for all Odd-Query Locally Decodable Codes. ArXiv Preprint ArXiv:2411.14361.