Notes tagged
#llm
2 posts
- 2 min read
The full turn: the model is an API call
The full turn: the model is an API call that costs kopecks. Two circuits: public on CPU, sovereign on-site. The edge lives in the layer above the model.
- 2 min read
Picking the model for the hardware I have
Picking a model for the trusted-AI build on 24 GB. The benchmark beat intuition: a MoE beat a dense model twice its size, and 'thinking' only got in the way.