Researchers evaluated three projection-sharing constraints in transformers and found that sharing the key and value projections (Q-K=V) achieves comparable or better language-modeling performance while cutting the key-value cache by 50%. The approach is complementary to head sharing: combining the two enables up to 96.9% cache reduction, which makes on-device inference more practical.
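The two reduction figures can be checked with simple arithmetic. The sketch below is illustrative only and rests on assumptions not stated in the summary: that Q-K=V means keys and values come from a single shared projection (so one tensor is cached instead of two), that head sharing reduces the number of cached heads, and that the baseline has 16 heads reduced to 1. The function name `kv_cache_reduction` and its parameters are hypothetical.

```python
def kv_cache_reduction(num_heads: int, num_kv_heads: int, share_kv: bool) -> float:
    """Return the fraction of the baseline key-value cache that is saved.

    Baseline: every one of `num_heads` heads caches a key and a value tensor.
    Assumption: sharing K and V leaves one cached tensor per head instead of two.
    Assumption: head sharing leaves `num_kv_heads` cached heads instead of `num_heads`.
    """
    baseline_tensors = 2 * num_heads
    tensors_per_head = 1 if share_kv else 2
    reduced_tensors = tensors_per_head * num_kv_heads
    return 1.0 - reduced_tensors / baseline_tensors


# K=V sharing alone halves the cache.
kv_only = kv_cache_reduction(num_heads=16, num_kv_heads=16, share_kv=True)

# K=V sharing plus a hypothetical 16-to-1 head reduction leaves 1/32 of the cache.
combined = kv_cache_reduction(num_heads=16, num_kv_heads=1, share_kv=True)

print(f"K=V only: {kv_only:.1%}")
print(f"K=V with head sharing: {combined:.1%}")
```

Under these assumptions the combined saving is 1 - 1/32 = 96.875%, which is consistent with the reported 96.9%. The actual head configuration behind that figure is not given in the summary.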