Topic

ai > concepts

A collection of 4 issues

From PPO to GRPO

2025 marked a maturation of LLM tech. The breakthrough wasn't just more data; it was a fundamental shift in how we use Reinforcement Learning (RL) to train models to reason. Here is the breakdown of how we moved from the complexity of PPO to the streamlined efficiency of

Reasoning in AI (2025)

From “think step by step” to thinking as a first-class system primitive. For a long time, reasoning in language models felt like a discovery: add “let’s think step by step” to a prompt, and the model would suddenly appear more competent. Better at following logic, improved math and coding skills, and less

Reinforcement Learning and RLHF in a nutshell

A very high-level summary of RL. RL is an approach to efficiently searching the state space of a well-defined system. We can think of the system as a graph with states (vertices) and actions (edges) that lead to

Transformers - Part 1 NLP

Transformer architectures have unlocked tremendous potential in machine learning. They have become the basic building block for learning and generating all modalities: language, vision, speech. But what changed with Transformers? We had kernel methods available for decades. In short, transformers allow for efficient context-aware learning.