Tag: local AI model

  • Discover Affordable Methods to Run DeepSeek’s 671B AI Model

    Launched on January 20, 2025, DeepSeek-R1 is a Mixture-of-Experts (MoE) model with a whopping 671B parameters, of which only 37B are active for each token. The model is designed for complex reasoning tasks and handles inputs of up to 128K tokens while generating outputs of up to 32K tokens. Its MoE structure lets it deliver strong performance while consuming far fewer resources per token than a comparable dense model.
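    The resource savings from the MoE design can be sketched with back-of-envelope arithmetic. The parameter counts come from the figures above; the one-byte-per-parameter figure assumes 8-bit (Q8) weights, as discussed later in the article:

```python
# Back-of-envelope comparison of storage vs per-token cost for an MoE model.
# Parameter counts are from the article; 1 byte/param assumes Q8 weights.

TOTAL_PARAMS = 671e9   # all experts combined
ACTIVE_PARAMS = 37e9   # parameters actually used for each token

# Memory to *store* the weights scales with total parameters...
storage_gb = TOTAL_PARAMS * 1 / 1e9          # ~671 GB at 8 bits/param

# ...but compute and memory traffic per token scale with active parameters.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"Weights at Q8: ~{storage_gb:.0f} GB")
print(f"Active per token: {active_fraction:.1%} of parameters")
```

    In other words, the full model must fit in memory, but each generated token only pays for roughly 5.5% of it, which is what makes CPU-only inference plausible.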

    Competitive Performance

    Recent independent tests indicate that R1 performs on par with OpenAI’s o1, making it a strong contender for demanding AI applications. Let’s explore what’s needed to run it locally.

    System Requirements

    To run this model, you’ll need a build centered on dual AMD EPYC CPUs and 768GB of DDR5 RAM; no pricey GPUs are required.

    After assembling the hardware, install Linux and llama.cpp to get the model up and running. It’s also important to tweak the BIOS by setting NUMA groups to 0, which interleaves memory across both sockets and roughly doubles effective RAM bandwidth. You can download the complete 700GB of DeepSeek-R1 weights from Hugging Face.
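    A quick sanity check that this build can actually hold the model: the 768GB of RAM and the 700GB download are from the text above; what the leftover headroom gets spent on (OS, llama.cpp buffers, KV cache) is my assumption:

```python
# Does the full Q8 model fit in system RAM on this build?
RAM_GB = 768          # dual-socket DDR5 capacity from the build above
WEIGHTS_GB = 700      # full DeepSeek-R1 download from Hugging Face

headroom_gb = RAM_GB - WEIGHTS_GB
print(f"Headroom after loading weights: {headroom_gb} GB")
# That remainder has to cover the OS, llama.cpp's buffers, and the
# KV cache for the context window, so it is tight but workable.
```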

    Impressive Output

    This configuration can produce 6-8 tokens per second, which is quite impressive for a fully local, high-end AI model. The absence of a GPU isn’t a mistake; it’s by design. Using Q8 quantization (for high quality) on GPUs would demand over 700GB of VRAM, which could cost upwards of $100K. Even with its substantial capabilities, the whole system operates under 400W, showcasing its efficiency.
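    The 6-8 tokens-per-second figure is consistent with CPU inference being memory-bandwidth bound: each token must stream the active weights out of RAM. A rough ceiling can be estimated as follows; the ~920 GB/s aggregate bandwidth figure for two 12-channel DDR5 EPYC sockets is my assumption, not a number from the article:

```python
# Rough bandwidth-bound upper limit on tokens/s for CPU-only inference:
# each token streams the ~37 GB of active Q8 weights from RAM.

BANDWIDTH_GBPS = 920      # assumed aggregate DDR5 bandwidth, dual-socket EPYC
ACTIVE_WEIGHTS_GB = 37    # 37B active params at ~1 byte each (Q8)

ceiling = BANDWIDTH_GBPS / ACTIVE_WEIGHTS_GB
print(f"Bandwidth-bound ceiling: ~{ceiling:.0f} tokens/s")
# Observed throughput (6-8 tok/s) sits below this: imperfect NUMA
# interleaving, attention over long contexts, and expert routing
# all eat into the available bandwidth.
```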

    For those seeking ultimate control over cutting-edge AI without cloud dependencies or limitations, this innovation is groundbreaking. It demonstrates that advanced AI can function locally in a completely open-source manner, while ensuring data privacy, reducing risks of breaches, and cutting off reliance on external platforms.
