Moondream's Photon achieves fast inference through pipelined decoding, a technique that overlaps CPU and GPU work so that neither sits idle waiting for the other, increasing throughput by up to 35%. It relies on three mechanisms: ping-pong slots, a forward/sampling split, and zombie refcounting, which together let model steps run concurrently and efficiently.
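
To make the three mechanisms concrete, here is a minimal toy sketch in Python. It is not Photon's implementation, and it rests on assumed readings of the terms: "ping-pong slots" as two result buffers used alternately, the "forward/sampling split" as two separate device-side stages, and a "zombie" as a sequence that has already finished but is still referenced by a step in flight. Every name here (`Sequence`, `forward`, `sample`, `gpu_step`, `pipelined_decode`) is hypothetical, a single worker thread stands in for the GPU, and a countdown stands in for the model.

```python
from concurrent.futures import ThreadPoolExecutor

EOS = 0  # hypothetical end-of-sequence token id


class Sequence:
    """Toy sequence; the 'model' simply counts down from `start` to EOS."""

    def __init__(self, name, start):
        self.name = name
        self.last = start     # last token, touched only by the "GPU" thread
        self.output = []      # tokens accepted by the CPU side
        self.finished = False
        self.refs = 0         # in-flight steps that still reference this sequence
        self.discarded = 0    # tokens computed while the sequence was a zombie


def forward(batch):
    # Stage 1 (assumed split): stand-in for the model forward pass.
    return [seq.last - 1 for seq in batch]


def sample(logits):
    # Stage 2 (assumed split): stand-in for token sampling.
    return [max(value, EOS) for value in logits]


def gpu_step(slot, batch):
    # Runs on the single worker thread that plays the role of the GPU stream.
    tokens = sample(forward(batch))
    for seq, token in zip(batch, tokens):
        seq.last = token      # the next step reads this without a CPU round trip
    slot["batch"], slot["tokens"] = batch, tokens
    return slot


def pipelined_decode(sequences):
    """Decode all sequences; return their names in resource-release order."""
    slots = [{}, {}]          # ping-pong: one slot is written while the other is read
    active = list(sequences)
    released = []
    step = 0

    def launch(gpu):
        nonlocal step
        batch = list(active)  # snapshot: may include sequences about to finish
        for seq in batch:
            seq.refs += 1
        future = gpu.submit(gpu_step, slots[step % 2], batch)
        step += 1
        return future

    with ThreadPoolExecutor(max_workers=1) as gpu:
        pending = launch(gpu)
        while pending is not None:
            # Launch step n+1 before doing the CPU work for step n.
            upcoming = launch(gpu) if active else None
            slot = pending.result()
            for seq, token in zip(slot["batch"], slot["tokens"]):
                seq.refs -= 1
                if seq.finished:
                    seq.discarded += 1        # zombie output is thrown away
                elif token == EOS:
                    seq.finished = True
                    active.remove(seq)
                else:
                    seq.output.append(token)
                if seq.finished and seq.refs == 0:
                    released.append(seq.name)  # safe to free its resources now
            pending = upcoming
    return released
```

In this sketch a finished sequence is released only once its reference count reaches zero, because the step launched before the CPU noticed the end-of-sequence token still includes it. The cost of the overlap is one wasted token per finished sequence, which is discarded rather than appended to the output.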