Overview

Andrej Karpathy demonstrates that the cost of training GPT-2-level models has dropped dramatically: roughly a 600X reduction over 7 years through combined hardware and software improvements. What once cost $43K and took a week can now be achieved for about $73 in roughly 3 hours.
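
As a quick sanity check, the headline numbers are self-consistent. A minimal back-of-envelope calculation in Python (the hourly node rate below is derived from the quoted figures, not stated directly in the source):

    # Back-of-envelope check of the quoted figures
    original_cost = 43_000      # USD, GPT-2 on 32 TPU v3 chips
    original_hours = 168        # one week of training
    new_cost = 73               # USD, nanochat run
    new_hours = 3.04            # single 8xH100 node

    print(new_cost / new_hours)        # ~24 USD/hr implied node rate
    print(original_cost / new_cost)    # ~589x, i.e. the ~600X cost reduction
    print(original_hours / new_hours)  # ~55x wall-clock speedup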

The Breakdown

  • The original GPT-2 training run required 32 TPU v3 chips for 168 hours at a total cost of about $43K, reaching a CORE score of 0.256525 - this run establishes the baseline performance benchmark
  • The modern nanochat implementation reaches a higher CORE score in 3.04 hours on a single 8xH100 node for ~$73 - demonstrating the ~600X cost reduction through optimized training methods
  • Training cost improvements follow a consistent trend of roughly 2.5X annual reduction over the 7-year period (see the compounding check after this list), making advanced AI training accessible to smaller organizations
  • The CORE score metric from the DCLM paper provides a standardized evaluation averaged across 22 benchmarks, including ARC and MMLU, for comparing model capabilities (a sketch of the averaging follows below)
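
The 2.5X-per-year trend and the ~600X total are consistent under compounding, which a minimal check confirms:

    # ~2.5X annual reduction compounded over 7 years recovers the ~600X total
    annual_factor = 2.5
    years = 7
    print(annual_factor ** years)      # ~610, consistent with ~600X
    print((43_000 / 73) ** (1 / 7))    # ~2.49 implied annual reduction factor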
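
For intuition on how a single CORE number summarizes many benchmarks, here is a minimal sketch of a CORE-style macro-average, assuming the DCLM convention of centering each task's accuracy by its random-guess baseline before averaging; the task names and values are purely illustrative, not real results:

    # Hypothetical per-task accuracies and random-guess baselines (illustrative only)
    tasks = {
        # name: (accuracy, random_guess_baseline)
        "arc_easy":      (0.60, 0.25),
        "arc_challenge": (0.30, 0.25),
        "hellaswag":     (0.45, 0.25),
    }

    def centered(acc, baseline):
        # Rescale so random guessing scores 0.0 and perfect accuracy scores 1.0
        return (acc - baseline) / (1.0 - baseline)

    core = sum(centered(a, b) for a, b in tasks.values()) / len(tasks)
    print(core)  # macro-average of centered accuracies across tasks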