  • M4 Developer Bypasses Apple AI Restrictions

    Key Takeaway

    – M4 Neural Engine is locked to inference by default; developer bypassed restrictions to unlock 15.8 TFLOPS for training
    – Custom Model Intermediate Language built from scratch, bypassing Apple’s official tools like CoreML and Metal
    – Workaround uses system RAM instead of NAND flash to maintain speed during intensive training
    – Special execute command respawns stuck processes to prevent crashes during backpropagation
    – Breakthrough proves M4 hardware is fully capable of local AI training despite Apple’s limitations


    Apple’s M4 Neural Engine Unlocked

    Apple’s M4 processors pack a massive amount of AI computing power, but the company has historically kept the hardware tightly locked down. By default, the Neural Engine inside the M4 is restricted entirely to inference. This means developers can only use it to run pre-trained AI models rather than actually training new ones from scratch.

    Hidden Performance Revealed

    However, a developer has managed to bypass these software limitations, reverse-engineering the chip to unlock 15.8 TFLOPS of hidden AI compute. The breakthrough comes from a researcher known as 0x0SojalSec, who recently shared code on GitHub detailing how they tapped into the M4’s Neural Engine. What makes this achievement particularly impressive is that it was done completely outside of Apple’s official development ecosystem.

    Custom Software Solution

    Because Apple does not grant the permission levels needed to communicate directly with the Neural Engine for these tasks, the developer had to work without standard tools like CoreML and Metal, and without falling back on the graphics processing unit. To pull this off, they built a custom Model Intermediate Language from the ground up. This software bridged the gap, allowing full backpropagation and transformer training directly on the Apple Neural Engine.
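
    To make the difference between inference and training concrete, here is a minimal sketch of backpropagation on a tiny two-weight network in plain Python. It is not the developer's code and does not touch the Neural Engine; the function names (forward, backward, train) and the network shape are assumptions chosen purely for illustration. Inference needs only the forward pass, while training also needs the backward pass that computes gradients.

```python
import math

def forward(w1, w2, x):
    h = math.tanh(w1 * x)   # hidden activation
    y = w2 * h              # network output
    return h, y

def loss(w1, w2, x, target):
    _, y = forward(w1, w2, x)
    return 0.5 * (y - target) ** 2

def backward(w1, w2, x, target):
    """Return (dL/dw1, dL/dw2) by applying the chain rule layer by layer."""
    h, y = forward(w1, w2, x)
    dy = y - target          # dL/dy
    dw2 = dy * h             # dL/dw2
    dh = dy * w2             # dL/dh
    dz = dh * (1.0 - h * h)  # tanh'(z) = 1 - tanh(z)^2
    dw1 = dz * x             # dL/dw1
    return dw1, dw2

def train(w1, w2, x, target, lr=0.1, steps=200):
    """Plain gradient descent on a single (x, target) example."""
    for _ in range(steps):
        dw1, dw2 = backward(w1, w2, x, target)
        w1 -= lr * dw1
        w2 -= lr * dw2
    return w1, w2
```

    A transformer repeats the same pattern across millions of weights, which is why hardware limited to the forward pass cannot train a model on its own.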

    Stability Workarounds

    Since the hardware is heavily restricted by design, the developer also had to use some clever workarounds to keep the system stable. For example, if a process gets stuck during the intensive training phase, the custom language uses a specific execute command to essentially respawn the process. This allows the system to refresh its current state and resume training without crashing the entire program.
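
    The respawn idea can be sketched with a small supervisor in Python. This is a stand-in technique, not the developer's actual mechanism: it assumes that a process which exceeds a time limit counts as stuck, and the helper name run_with_respawn and its parameters are hypothetical. Resuming where training left off would additionally require the child to reload its own saved state, which this sketch does not show.

```python
import subprocess
import sys

def run_with_respawn(cmd, timeout_s, max_restarts=3):
    """Run cmd; if it exceeds timeout_s, kill it and start it again.

    Returns (return_code, restarts_used). Raises RuntimeError once the
    restart budget is exhausted.
    """
    for restarts in range(max_restarts + 1):
        proc = subprocess.Popen(cmd)
        try:
            return proc.wait(timeout=timeout_s), restarts
        except subprocess.TimeoutExpired:
            proc.kill()   # assumption: a timeout means the process is stuck
            proc.wait()   # reap the child before spawning a replacement
    raise RuntimeError("process still stuck after %d restarts" % max_restarts)
```

    A call such as run_with_respawn([sys.executable, "train.py"], timeout_s=600) would restart a hypothetical train.py script up to three times before giving up.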

    Memory Optimization

    Speed was also a major factor in getting this heavy workload to run effectively. To keep training running as smoothly as possible, the developer configured the process to keep all of its working data in system RAM. By avoiding the much slower NAND flash storage, the entire operation remained incredibly fast. For anyone using an M4-equipped Mac or iPad, this workaround shows that the silicon is more than capable of handling heavy-duty AI training workloads, even if Apple officially prefers to keep those capabilities locked away.
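
    As a rough illustration of keeping training state in RAM rather than on flash, the sketch below stores the latest checkpoint in an in-memory buffer. The RamCheckpoint class is hypothetical and not taken from the developer's code; it only shows the general trade of durability for speed.

```python
import io
import pickle

class RamCheckpoint:
    """Holds the latest training state in memory instead of on disk."""

    def __init__(self):
        self._buf = io.BytesIO()

    def save(self, state):
        # Overwrite the previous checkpoint in place; nothing touches storage.
        self._buf.seek(0)
        self._buf.truncate()
        pickle.dump(state, self._buf)

    def load(self):
        # Read the most recently saved state back from the buffer.
        self._buf.seek(0)
        return pickle.load(self._buf)
```

    An in-process buffer like this disappears when the process exits, so a setup that also respawns processes would need memory that outlives them, such as a RAM-backed filesystem or shared memory.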
