On 67% of real-world user fact-checks, the five strongest frontier LLMs disagree, with at least one model's verdict falling two or more buckets away from another's. The disagreement is not merely a matter of calibration: the verdicts differ in substance, with some models concentrating at the True/False poles and others spreading more broadly across the middle two buckets.
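To make the disagreement criterion concrete, the following is a minimal sketch of how such a rate could be computed. It assumes a four-bucket ordinal scale; the text names only the True/False poles and two middle buckets, so the middle labels, the function names, and the toy data are hypothetical placeholders rather than the study's actual scale, pipeline, or results.

```python
from typing import Sequence

# Assumed ordinal scale. Only the "False"/"True" poles come from the text;
# the two middle labels are hypothetical placeholders.
BUCKETS = ["False", "Mostly False", "Mostly True", "True"]
RANK = {label: i for i, label in enumerate(BUCKETS)}


def verdict_spread(verdicts: Sequence[str]) -> int:
    """Largest bucket distance between any two models' verdicts on one claim."""
    ranks = [RANK[v] for v in verdicts]
    return max(ranks) - min(ranks)


def disagreement_rate(claims: Sequence[Sequence[str]], min_spread: int = 2) -> float:
    """Fraction of claims where some pair of models is >= min_spread buckets apart."""
    if not claims:
        raise ValueError("need at least one claim")
    flagged = sum(1 for verdicts in claims if verdict_spread(verdicts) >= min_spread)
    return flagged / len(claims)


def pole_share(verdicts: Sequence[str]) -> float:
    """Fraction of one model's verdicts that land on the True/False poles."""
    if not verdicts:
        raise ValueError("need at least one verdict")
    poles = {BUCKETS[0], BUCKETS[-1]}
    return sum(1 for v in verdicts if v in poles) / len(verdicts)


# Toy data (invented for illustration): each row holds five models' verdicts on one claim.
toy_claims = [
    ["True", "True", "Mostly True", "True", "True"],           # spread 1
    ["False", "Mostly True", "True", "False", "Mostly False"],  # spread 3
    ["Mostly False", "True", "Mostly True", "True", "True"],    # spread 2
]
rate = disagreement_rate(toy_claims)
```

Under these assumptions, a claim counts as a disagreement when the highest and lowest verdicts are at least two buckets apart, and `pole_share` gives a rough per-model measure of how strongly verdicts concentrate at the poles.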