**Show HN: Needle: We Distilled Gemini Tool Calling into a 26M Model**
Hey HN, Henry here from Cactus. We open-sourced Needle, a 26M-parameter function-calling (tool use) model. It runs at 6000 tok/s prefill and 1200 tok/s decode on consumer devices. We were always frustrated by the little effort made towards building agentic models that run on budge...
https://github.com/cactus-compute/needle
#tech #news