Hacker News — vinext + Cloudflare Workers

Can gzip be a language model? (nathan.rs)

12 points by nathan-barry 1 day ago | 3 comments

eventualcomp 13 hours ago

Reminds me of this youtube video: https://m.youtube.com/watch?v=jkdWzvMOPuo

I liked the comments explaining why this worked.

nathan-barry 1 day ago

LLMs are very good at lossless compression via arithmetic coding. But I didn't know that it was possible to go the reverse direction (do language modeling via a compressor). It's not super great quality, but I'm surprised it worked! Other compression algorithms (like PPMd) use variable n-grams under the hood, and should be much better (although less interesting due to already containing basic language models internally).
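One minimal way to run a compressor "in reverse" as a greedy next-character predictor, sketched under assumptions rather than taken from the linked post: score each candidate character by how many bytes zlib's DEFLATE (the algorithm inside gzip) needs for the context plus that character, and keep the cheapest. The intuition mirrors arithmetic coding, where an ideal code spends about -log2 p(x) bits on a symbol of probability p, so a shorter encoding hints at a more probable continuation. `compressed_len`, `predict_next` and `generate` are hypothetical helper names, and the caller supplies the candidate alphabet.

```python
import zlib


def compressed_len(text: str) -> int:
    """Size in bytes of the zlib (DEFLATE) compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def predict_next(context: str, alphabet: str) -> str:
    """Return the candidate character whose addition compresses best.

    Ties are common because the size is measured in whole bytes; min()
    keeps the earliest candidate in alphabet order when sizes are equal.
    """
    return min(alphabet, key=lambda ch: compressed_len(context + ch))


def generate(prompt: str, n_chars: int, alphabet: str) -> str:
    """Greedily extend prompt by n_chars characters, one prediction at a time."""
    text = prompt
    for _ in range(n_chars):
        text += predict_next(text, alphabet)
    return text


if __name__ == "__main__":
    # On a periodic prompt, continuing the pattern lets DEFLATE extend a
    # back-reference instead of emitting an extra literal byte.
    print(predict_next("abc" * 5 + "ab", "abc"))
    print(generate("abc" * 3, 6, "abc"))
```

On real prose, DEFLATE's byte granularity and its 3-byte minimum match length mean many candidates tie, and the fallback to alphabet order then decides the output, which by itself limits quality. Compressors that model context directly, such as the PPM family mentioned above, give finer-grained scores.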

chinallm_ai 1 day ago

This reminds me of the classic "compression = intelligence" argument. If gzip works as a language model at all, it suggests we should be paying more attention to compression ratio as a proxy for understanding, not just benchmark scores. Tangentially — the fact that we're even asking this question in 2026, when LLMs are solving PhD-level problems, says something about how poorly we understand what's happening inside these models.
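To make the "compression ratio" in this comment concrete, a common normalization is bits per byte: compressed size in bits divided by raw size in bytes. The sketch below measures it with Python's `zlib` on two made-up strings. `bits_per_byte` is a hypothetical helper, and the figure includes zlib's few bytes of header and checksum, so it is only meaningful for inputs much longer than that. A language model paired with an arithmetic coder would be scored the same way, with its code length close to the sum of -log2 p over the symbols it encodes.

```python
import random
import string
import zlib


def bits_per_byte(text: str) -> float:
    """Compressed size in bits divided by UTF-8 size in bytes (lower is better)."""
    raw = text.encode("utf-8")
    return 8 * len(zlib.compress(raw, 9)) / len(raw)


# Highly repetitive text: DEFLATE replaces most of it with back-references.
repetitive = "the cat sat on the mat. " * 50

# Same length, drawn uniformly from 62 characters (seeded for reproducibility).
rng = random.Random(0)
alphabet = string.ascii_letters + string.digits
noise = "".join(rng.choice(alphabet) for _ in range(len(repetitive)))

if __name__ == "__main__":
    print(f"repetitive: {bits_per_byte(repetitive):.2f} bits/byte")
    print(f"noise:      {bits_per_byte(noise):.2f} bits/byte")
```

The gap between the two strings reflects only the repetition DEFLATE can find within its 32 KB window; the uniform string cannot go much below log2(62), about 5.95 bits per character, no matter which compressor is used.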