Apple Researchers Challenge AI Reasoning Claims With Controlled Puzzle Tests
Apple researchers have found that state-of-the-art “reasoning” AI models, including OpenAI’s o3-mini, Gemini (with thinking mode enabled), Claude 3.7, and DeepSeek-R1, face complete performance collapse beyond certain complexity thresholds when tested in controllable puzzle environments. The finding raises questions about th …