To estimate the cost of serving AI models, we calculate the dollar cost-per-user based on GPU cluster scale, model architecture, and inference math. With a 32B model on a B200 GPU, we can serve 300 users comfortably, with a lifetime cost of $133 per user or $0.013 per user per hour.