All posts – Tom Shlomo’s Blog

Bilinear Muon

Adam, and even Muon, optimize attention’s query and key matrices as if they were independent. Treating them as the single bilinear form they jointly define yields a family of Muon-style update rules.

Jun 14, 2026

Tom Shlomo

Teaching Rust to Python Developers

Over the last two years, I have developed a strong interest in Rust.
While my day-to-day work primarily involves Python, I have been exploring ways to leverage Rust in our…

Dec 7, 2024

Tom Shlomo

The sparse approximation algorithm no one talks about

A somewhat unusual introduction to greedy algorithms for the sparse approximation problem, along with a proposal for an obvious algorithm that seems to be overlooked.

Dec 6, 2024

Tom Shlomo

Efficient leave one out cross validation - part 2

The non-quadratic case

Mar 30, 2024

Tom Shlomo

Efficient leave one out cross validation - part 1

The derivation and implementation of a method for leave one out cross validation with negligible extra runtime compared to fitting alone.

Feb 27, 2024

Tom Shlomo

MUSIC as a sparse decomposition method

A unique introduction to the MUSIC algorithm as a general method for solving the multisnapshot sparse decomposition problem.

Jan 30, 2024

Tom Shlomo

A practical interpretation of the Pearson correlation coefficient

\(\rho=1\) means perfect positive correlation, \(\rho=-1\) means perfect negative correlation, \(\rho=0\) means no correlation. But what does \(\rho=0.72\) mean?

Jan 20, 2024

Tom Shlomo

Augmentation is Regularization

On the equivalence of training data augmentation and quadratic regularization for linear models - a very useful (but not well known) result.

Jan 15, 2024

Tom Shlomo