$330, 16 hours on an H200, #9 of 30 on EduBench-RU. The 32B version I trained in parallel cost 3× the GPU and scored worse.