Sahil Thaker

From PPO to GRPO

2025 marked a maturation of LLM tech. The breakthrough wasn't just more data; it was a fundamental shift in how we use Reinforcement Learning (RL) to train models to reason. Here is the breakdown of how we moved from the complexity of PPO to the streamlined efficiency of

Reasoning in AI (2025)

From “think step by step” to thinking as a first-class system primitive. For a long time, reasoning in language models felt like a discovery. Add “let’s think step by step” to a prompt, and the model would suddenly appear more competent: better at following logic, improved math and coding skills, and less likely

What is AGI and Super-intelligence

I’ve come across technologists in two different camps: Camp A talks about AGI like a milestone: “AGI will be achieved”, or that “No way will we achieve AGI with current algorithms”. Camp B considers AGI or Superintelligence as an ill-defined term, therefore purely a marketing play. I want to

Building AI Agents

Our transition from heuristic to goal-based software is just beginning, and while we have lessons from the content era (search, ads, feed) on applying predictive ML to consumer-facing products, GenAI opens up a whole new vector for applying Machine Learning (AI) tech in products. Traditional software practices focus on deterministic

Expansive AI

There are a few narratives on the impact of AI*. I’d like to propose one more.

1. AI is a Sustaining technology
2. AI is a Disruptive technology

Sustaining Technology

Under this view, advantage lies with the incumbent. Existing businesses benefit due to their distribution, proprietary datasets, and infrastructure advantages.

Ethereum's Evolving Architecture

Ethereum infrastructure is constantly evolving. The big change in 2022 was the move from Proof-of-Work (PoW) to Proof-of-Stake (PoS) consensus. This upgrade also came with a big architectural shift: from monolith to modular. The direction of modular architecture starts decoupling responsibilities across various parts of the network, creating new roles

How long can public clouds keep growing?

Public Cloud Infrastructure adoption is expected to grow at a 20% CAGR. What’s more interesting is whether it can sustain this growth.

* 2024 revenue is forecast to reach over $300B. A 20% CAGR is stellar for an industry this large.
* Its biggest component revolves around data - Analytics, Warehousing,

Coding Assistant AI across large codebases

I was surprised to find that GitHub Copilot still doesn't support answering questions over your entire codebase through the @workspace /explain command. This feature seems to be part of their roadmap. It got me thinking about what the real challenges are behind scaling reasoning across an entire codebase given that they

Copyright and AI

How could copyright laws and the legal intellectual property framework impact the development of AI technology? Generative AI has resurfaced questions around intellectual property rights in the era of AI-based automation. Many of the GenAI products (OpenAI, Stability, and Midjourney) are already facing lawsuits around copyright infringement the precedents

Reinforcement Learning and RLHF in a nutshell

A very high-level summary of RL.

[Image: DALL-E's somewhat inaccurate representation of RL state-space]

RL is an approach to efficiently search the state-space of a well-defined system. We can think of the system as a graph with states (vertices) and actions (edges) that lead to a change in state. In its