The user trained a GPT-2-small-sized base model on their own hardware using the Hugging Face FineWeb-series datasets, achieving performance nearly matching the original model in just over 48 hours. They used the Chinchilla heuristic (roughly 20 training tokens per model parameter) to pick the training budget, which worked out to a 3.2B-token run that could be completed in 44 hours using ...
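
As a rough illustration of the token-budget arithmetic (not the author's exact calculation), the Chinchilla rule of thumb of about 20 training tokens per model parameter can be sketched as below. The parameter count in the usage example is an assumed figure, chosen only to show how a budget on the order of 3.2B tokens could arise; the source does not state the exact count or ratio used.

```python
def chinchilla_optimal_tokens(n_params: int, tokens_per_param: float = 20.0) -> int:
    """Approximate compute-optimal token budget for a model with n_params parameters,
    using the commonly cited ~20 tokens-per-parameter Chinchilla heuristic."""
    return int(n_params * tokens_per_param)


if __name__ == "__main__":
    # Illustrative parameter count for a GPT-2-small-sized model (assumption):
    # ~160M parameters * 20 tokens/param = ~3.2B tokens, in line with the
    # 3.2B-token run mentioned above.
    n_params = 160_000_000
    budget = chinchilla_optimal_tokens(n_params)
    print(f"Token budget: {budget / 1e9:.1f}B tokens")
```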