Google DeepMind's Gemini 2.0 models enable robots to perform complex tasks without task-specific training, drawing on multimodal reasoning across text, video, and audio. The Gemini Robotics model, built on Gemini 2.0, lets robots respond to unfamiliar objects, environments, and instructions without further training, which makes them more dexterous and interactive.