Overview

Andrej Karpathy demonstrates that the cost of training GPT-2-level models has dropped dramatically: roughly a 600X reduction over 7 years through combined hardware and software improvements. What once cost $43K and took a week can now be achieved for about $73 in roughly 3 hours.
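
As a quick sanity check, the headline numbers are self-consistent. A minimal back-of-envelope calculation in Python (the hourly node rate below is derived from the quoted figures, not stated directly in the source):

    # Back-of-envelope check of the quoted figures
    original_cost = 43_000      # USD, GPT-2 on 32 TPU v3 chips
    original_hours = 168        # one week of training
    new_cost = 73               # USD, nanochat run
    new_hours = 3.04            # single 8xH100 node

    print(new_cost / new_hours)        # ~24 USD/hr implied node rate
    print(original_cost / new_cost)    # ~589x, i.e. the ~600X cost reduction
    print(original_hours / new_hours)  # ~55x wall-clock speedup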

The Breakdown

  • The original GPT-2 training run required 32 TPU v3 chips for 168 hours at a total cost of about $43K, reaching a CORE score of 0.256525 - this run establishes the baseline performance benchmark
  • The modern nanochat implementation reaches a higher CORE score in 3.04 hours on a single 8xH100 node for ~$73 - demonstrating the ~600X cost reduction through optimized training methods
  • Training cost improvements follow a consistent trend of roughly 2.5X annual reduction over the 7-year period (see the compounding check after this list), making advanced AI training accessible to smaller organizations
  • The CORE score metric from the DCLM paper provides a standardized evaluation averaged across 22 benchmarks, including ARC and MMLU, for comparing model capabilities (a sketch of the averaging follows below)
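
The 2.5X-per-year trend and the ~600X total are consistent under compounding, which a minimal check confirms:

    # ~2.5X annual reduction compounded over 7 years recovers the ~600X total
    annual_factor = 2.5
    years = 7
    print(annual_factor ** years)      # ~610, consistent with ~600X
    print((43_000 / 73) ** (1 / 7))    # ~2.49 implied annual reduction factor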
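
For intuition on how a single CORE number summarizes many benchmarks, here is a minimal sketch of a CORE-style macro-average, assuming the DCLM convention of centering each task's accuracy by its random-guess baseline before averaging; the task names and values are purely illustrative, not real results:

    # Hypothetical per-task accuracies and random-guess baselines (illustrative only)
    tasks = {
        # name: (accuracy, random_guess_baseline)
        "arc_easy":      (0.60, 0.25),
        "arc_challenge": (0.30, 0.25),
        "hellaswag":     (0.45, 0.25),
    }

    def centered(acc, baseline):
        # Rescale so random guessing scores 0.0 and perfect accuracy scores 1.0
        return (acc - baseline) / (1.0 - baseline)

    core = sum(centered(a, b) for a, b in tasks.values()) / len(tasks)
    print(core)  # macro-average of centered accuracies across tasks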