Category: Artificial intelligence

  • Open Source AI Video Generator Pyramid Flow Now Online

    Already gaining traction in YouTube tutorial clips, Pyramid Flow is an AI video generator trained on freely available datasets totaling about 10 million videos. The project is a collaboration between AI specialists from Peking University, Kuaishou Technology, and Beijing University of Posts and Telecommunications. Pyramid Flow is itself open source, released under the MIT License. It can generate video at up to 768p, and it particularly excels at 384p. Its developers claim that it can generate a five-second video in under a minute on an A100 GPU, though the rest of the hardware setup is unspecified.

    Performance Insights

    In many situations, Pyramid Flow performs exceptionally well. With certain text prompts, however, the output can be inadequate. As with many generative AI tools, its results are somewhat unpredictable. On the positive side, Pyramid Flow requires significantly less computational power than its rivals. Furthermore, because its code is open source, anyone interested can run it locally or in the cloud without licensing concerns.

    Copyright Concerns

    While the team behind Pyramid Flow has provided a list of all datasets used for its training, it did not address the copyright issues that could arise. Some content creators argue that using openly available materials to make AI-generated videos infringes on the rights of copyright owners. Nevertheless, Pyramid Flow might be useful for refining such content without needing to engage third parties.

    Pyramid Flow (on GitHub, via Tech Xplore)

  • Tech Firms Shift from Green Energy as AI Demand Soars

    AI usage has expanded rapidly in recent years, leading tech giants like Microsoft to consider nuclear energy. This shift is driven by the rise of generative AI chatbots, such as OpenAI’s ChatGPT, and integrated AI tools like Microsoft Copilot in Windows 11. The surge in demand for data center power is so large that wind and solar energy alone cannot meet it.

    Future Power Demand

    According to McKinsey & Company, data centers’ share of total US power consumption is expected to rise from 3.7 percent to 11.7 percent by the decade’s end. Morgan Stanley also predicts that global CO2 emissions from data centers will increase from 200 million tons to 600 million tons as data centers expand.

    In Memphis, Tennessee, a data center that trains and runs xAI’s Grok 3 is seeking to raise its power supply from 50 MW to 150 MW. That much power could supply electricity to around 80,000 homes. The facility also draws 30,000 gallons of water a day from underground wells for cooling.
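
    As a rough sanity check on these figures, the arithmetic can be sketched in a few lines of Python. The sketch assumes the 80,000-home comparison refers to the requested 150 MW, which the text does not state explicitly; the variable and function names are illustrative.

```python
# Back-of-the-envelope check of the Memphis figures quoted above.
# Assumption: the 80,000-home figure refers to the requested 150 MW.

CURRENT_POWER_MW = 50
REQUESTED_POWER_MW = 150
HOMES_SUPPLIED = 80_000


def average_kw_per_home(power_mw: float, homes: int) -> float:
    """Average continuous power available per home, in kilowatts."""
    return power_mw * 1_000 / homes


kw_per_home = average_kw_per_home(REQUESTED_POWER_MW, HOMES_SUPPLIED)
growth_factor = REQUESTED_POWER_MW / CURRENT_POWER_MW

print(f"{kw_per_home:.3f} kW per home")  # 1.875 kW per home
print(f"{growth_factor:.0f}x increase")  # 3x increase
```

    Under that assumption, the comparison implies an average draw of just under 2 kW per home, and the request amounts to a threefold increase over the current supply.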

    The Energy Challenge

    The energy requirements of AI models stem from the vast number of calculations needed to answer user queries. Researchers from the University of California, Riverside, in collaboration with the Washington Post, found that generating a simple 100-word email with OpenAI’s GPT-4 requires a bottle of water for cooling and enough electricity to run 14 light bulbs for an hour.

    Constructing power plants and electrical transmission systems is a slow process. Many energy companies are already dealing with shortages of power distribution units, switchgear, and transformers, leading to delays that can exceed a year. Power generation in various regions near current data centers is either at capacity or nearing it, causing rolling blackouts in areas like California.

    Nuclear Energy as a Solution

    In response, tech firms are increasingly looking to nuclear power to satisfy their electricity needs for AI data centers. These nuclear plants can produce large quantities of energy without requiring as much land as solar and wind farms. Additionally, nuclear energy isn’t reliant on sunlight or wind conditions.

    Microsoft is not only funding the development of a new nuclear power facility but has also invested in restarting a reactor at the notorious Three Mile Island nuclear power plant, the site of a partial meltdown in 1979. That incident released radioactive gases into the atmosphere and remains the most severe nuclear accident in the US, although it was far less catastrophic than the Chernobyl and Fukushima disasters.

    Waste Disposal Concerns

    Nuclear power stations in the US produce highly dangerous radioactive waste. Regrettably, the government has yet to determine a long-term disposal solution for this waste following the cessation of funding for the Yucca Mountain nuclear waste repository during the Obama administration.

    For those looking to make a positive impact on the environment, purchasing a solar panel kit (like one available on Amazon) can help charge laptops and phones using solar energy. AI enthusiasts may also consider running LLMs on solar-powered laptops at home instead of relying on nuclear-powered data centers.

    Sources include McKinsey & Company, WSJ, Washington Post, Constellation Energy, MIT Technology Review, Time, CBS Evening News on YouTube, Nuclear Energy Institute, and The Register.


  • Humans Outperform AI, Says Apple-Funded Study

    Earlier this month, a group of six AI experts supported by Apple released a study introducing GSM-Symbolic, a new benchmark for AI that "allows for more controllable assessments, giving important insights and more dependable metrics for evaluating the reasoning abilities of models." Unfortunately, it appears that large language models (LLMs) still face significant limitations and are missing even the most fundamental reasoning skills, as shown by initial tests using GSM-Symbolic with AI systems from major companies like Meta and OpenAI.

    Issues with Current Models

    The research pointed out a major issue with current models, which is their lack of consistency when faced with similar questions. The findings indicated that minor changes in wording, which wouldn’t change the meaning for a human, often result in varied responses from AI systems. No specific model was identified as performing notably well.

    The report stated, "In particular, the effectiveness of all models drops [even] when just the numerical values in the question are modified in the GSM-Symbolic benchmark." It also found that "the weakness of mathematical reasoning in these models [shows] that their performance worsens significantly as the number of clauses in a question goes up."

    Study Details

    The 22-page study is available as a PDF file. Its final two pages include problems with irrelevant details added at the end, which wouldn’t change the answer for a human. Yet the AI systems took these details into account, leading to incorrect answers.
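
    The kind of perturbation described above can be illustrated with a small template generator. This is a minimal sketch, not the benchmark's actual code: the template text, names, number ranges, and the make_variant helper are all hypothetical. It varies only the names and numbers of a word problem and can optionally add an irrelevant clause, while the correct answer stays computable from the template.

```python
import random

# Hypothetical template; the wording, names, and ranges are illustrative
# and are not taken from the GSM-Symbolic paper.
TEMPLATE = (
    "{name} picks {a} apples on Monday and {b} apples on Tuesday. "
    "{distractor}How many apples does {name} have in total?"
)
NAMES = ["Ava", "Liam", "Noah"]
# An irrelevant detail that should not change the answer.
DISTRACTOR = "Five of the apples are slightly smaller than average. "


def make_variant(rng: random.Random, with_distractor: bool = False) -> tuple[str, int]:
    """Return one question variant and its ground-truth answer."""
    a = rng.randint(2, 50)
    b = rng.randint(2, 50)
    question = TEMPLATE.format(
        name=rng.choice(NAMES),
        a=a,
        b=b,
        distractor=DISTRACTOR if with_distractor else "",
    )
    return question, a + b


rng = random.Random(0)  # fixed seed so the variants are reproducible
for _ in range(3):
    question, answer = make_variant(rng, with_distractor=True)
    print(question, "->", answer)
```

    A model that reasons rather than pattern-matches should score the same on every variant, since only surface details change.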

    In conclusion, AI systems remain trapped in pattern recognition and still do not possess general problem-solving skills. This year saw the introduction of several LLMs, including Meta AI’s Llama 3.1, Nvidia’s Nemotron-4, Anthropic’s Claude 3, the Fugaku-LLM from Japan (the largest model ever trained solely on CPU power), and Nova by Rubik’s AI, which was launched earlier this month.

    Upcoming Publication

    Tomorrow, O’Reilly will publish the first edition of "Hands-On Large Language Models: Language Understanding and Generation" by Jay Alammar and Maarten Grootendorst. It is priced at $48.99 for the Kindle edition and $59.13 for the paperback version.

  • New AI Scam Calls Threaten Billions of Gmail Users: Experts Warn

    A surge in AI-driven scams is now aiming at Gmail users, and even experienced professionals are struggling to dodge them. These phishing schemes, which imitate Google support, are becoming increasingly clever, and it’s alarming when experts in the field raise the red flag. Sam Mitrovic, a consultant at Microsoft, recently recounted how he nearly fell prey to a very convincing scam phone call.

    The Start of a Deceptive Scheme

    It all began with what seemed like a normal notification about a Gmail account recovery. Mitrovic decided to ignore it, but about 40 minutes later, he received a call from someone claiming to be from Google support. The caller, who spoke with an American accent, inquired whether Mitrovic had logged in from Germany and asserted that someone had been accessing his account for a week. Although Mitrovic sidestepped the trap, he highlighted just how polished and believable the scam was, even replicating Google’s official phone numbers (in his case, an Australian number) to lend it more authenticity.

    Another Victim’s Close Call

    Garry Tan, a venture capitalist and the CEO of Y Combinator, also alerted others to a similar phishing scheme. In his case, the scam claimed that a family member had submitted a death certificate to recover his account. The AI-powered caller pressured Tan to confirm his identity in a manner meant to induce panic, similar to Mitrovic’s experience.

    These scams evidently exploit AI’s ability to mimic genuine conversations and imitate real Google processes. The attackers even use tools like Google Forms to make their scams look more authentic, tricking users into thinking the threat is genuine. Both Mitrovic and Tan caution that anyone, no matter how tech-savvy, could be caught off guard by these advanced tactics, especially at the wrong moment. Moreover, these scams are likely to become harder to identify as AI technology evolves.

    Google’s Response to the Threat

    To combat these dangers, Google has teamed up with the Global Anti-Scam Alliance and the DNS Research Federation to introduce the Global Signal Exchange. This initiative aims to share real-time information about scams across various sectors. Furthermore, Google’s Advanced Protection Program now includes support for passkeys, providing an additional layer of security that could determine whether you keep your account or lose it.


  • MIT’s Future You AI: Chat with Your 60-Year-Old Self for Motivation

    MIT Media Lab researchers have introduced the Future You demo web service, which lets young people converse with AI representations of their 60-year-old selves. The simulation uses the OpenAI GPT-3.5 large language model alongside StyleCLIP image aging software.

    Impact of Mental Health on Youth

    In the United States, mental health challenges are more pronounced among younger individuals than their older counterparts. Factors like mass shootings, cyberbullying, and excessive social media engagement contribute to this rise in stress. Even a simple phrase like ‘cat lady’ can ignite strong reactions, particularly among fans of Taylor Swift.

    The Cost of Therapy

    Modern mental health treatments, such as talk therapy, can be prohibitively expensive for those without health insurance. Sessions with therapists typically cost between $50 and $400 per hour, which poses a significant barrier for many young people earning low wages.

    Future Self-Continuity Theory

    The researchers build on the idea of future self-continuity, a concept explored by Hershfield in 2011. He notes that "when the future self resembles the present self, is depicted in realistic and vivid terms, and is viewed positively, individuals are more inclined to make decisions today that could be beneficial in the future."

    Study Details and Findings

    The study involved 344 English-speaking participants aged 18 to 30. They were divided into two groups: one interacting with the Future You AI for about half an hour and the other only filling out surveys or talking to a standard AI chatbot. The results showed that those who used the Future You service reported improved well-being, with decreased anxiety and boosted motivation compared to the control group.

    Limitations and Considerations

    While the initial findings suggest promising potential for AI-assisted therapy, challenges like AI bias and hallucinations must be addressed before these tools can be safely implemented. For those feeling down, a comforting teddy bear (like those available on Amazon) could be a great source of comfort. Additionally, anyone in the U.S. struggling with mental health can reach out to the 988 Lifeline for support at any time.

    MIT Media Lab, Future You: A Conversation with an AI-Generated Future Self Reduces Anxiety, Negative Emotions, and Increases Future Self-Continuity paper, MIT news release, MIT on YouTube

    AI simulation offers a view into one’s potential future self

    By facilitating conversations with an older version of oneself, Future You aims to alleviate anxiety and help young individuals make informed choices.


  • Meta Launches Movie Gen AI for Quick Video and Music Creation

    Meta has introduced Movie Gen, an advanced AI that can produce and edit videos while incorporating music and sound effects based on text prompts. This AI stands out due to its exceptional video and audio generation abilities, offering features and realism that surpass those of any other AI available.

    AI Specifications

    Movie Gen is built on a 30-billion parameter AI model that can create 16-second HD clips from text prompts. It has been pre-trained with one billion images and 100 million videos, selected from a much larger dataset to enhance quality for training purposes. On the audio side, Movie Gen Audio utilizes a 13-billion parameter model designed to generate 48 kHz sound effects and music from text prompts, having been pre-trained on one million hours of audio. The AI has been improved through human feedback along with high-quality audio and video samples.

    Realistic Video Generation

    When provided with a photo of a person and a description of that person in a specific scene, the AI can produce a realistic video featuring an animated actor in that environment. It has been trained to recognize 22 different camera motions and positions, such as wide angle, tilt up, and truck left, allowing filmmakers to control virtual camera placement and movement much as they would in actual filming. For filmmakers who prefer traditional methods, high-end mirrorless cameras like the Nikon Z6III, available on Amazon, can still be used. Notably, Movie Gen can edit videos in a way that is both precise and realistic, something other AIs currently struggle to achieve.

    Audio Integration and Limitations

    Additionally, text prompts can be used to incorporate professional-quality audio into the video clips, featuring sound effects and music scores. While the AI can generate music lasting several minutes, it is restricted to 16-second video clips due to the significant computing power required. The audio is synchronized with the scene’s beats and can produce off-screen sounds, like birds chirping in a forest, based on the scene’s context.

    Meta is actively working on implementing safeguards for Movie Gen and plans to launch the AI once it is assured of its safety.


  • Google Lens Introduces Video and Voice Search Features

    Google introduced new voice and video search features for Google Lens at its I/O 2024 event in May. Users can now long-press and ask questions by voice, which makes searching simpler and more convenient.

    Enhanced Interaction with Google Lens

    Once Lens starts capturing video, users can ask questions about what they see. For instance, when asked “Why are they swimming together?”, Lens answered using Google Gemini. Video search lets users point their phone at moving objects and ask about them, which makes Google Lens useful in more situations. To access the feature, users can join the “AI Overviews and more” experiment within Search Labs.

    Custom Gemini Model Powers Video Search

    Rajan Patel, Google’s vice president of engineering, explained how the feature operates. Google captures the video as a series of image frames and applies the computer vision techniques already used in Lens. The responses are then generated by a custom Gemini model designed to interpret multiple frames in sequence. Once the frames are processed, the model pulls relevant information from the web to formulate an answer.

    In conclusion, this development effectively utilizes existing technology, adding significant value to Google Lens.


  • OneUI 7 to Feature Useful Apple Intelligence Tool

    All the latest events showcasing high-end smartphones have focused on AI, and this trend is set to persist. Samsung’s OneUI 6.1 already includes numerous useful AI tools, and a leak hints that the next version, OneUI 7, may introduce a feature akin to Apple’s AI search in its Gallery app.

    AI Search Feature in OneUI 7’s Gallery App

    The AI capabilities will allow users to search their photo collections more effectively. Instead of scrolling through countless screenshots to find a specific image, users will be able to simply search for it. This not only streamlines the process but also improves the overall user experience.

    The information comes from the credible source ICE Universe, who also mentioned that the Gallery app in OneUI 7 will receive further enhancements. However, there were no details shared about other potential updates.

    Current AI Features in Samsung’s Galaxy Devices

    At present, some of the Galaxy AI tools include Circle to Search, various note-taking and summarization options, as well as translation and transcription capabilities that function with third-party applications like WhatsApp, among others.

    Timeline for OneUI 7 Release

    As for when OneUI 7 will be rolled out, there are no set dates as of yet. Samsung generally unveils its S series flagship models alongside a significant update. Therefore, it’s likely that the Galaxy S25 Ultra will be the first device to showcase OneUI 7, expected to launch in early 2025, potentially in January.

    In addition to AI advancements, the flagship model will feature a significant redesign. Previous rumors have indicated that Samsung is moving towards rounded corners for the S25 Ultra, which should make it feel much more comfortable in hand.

  • SenseRobot Launches AI Chess Robot with 2,900 ELO for Kids

    SenseRobot has introduced the SenseRobot Chess, an innovative AI robotic chess coach designed to assist children in improving their chess strategies. This robot offers a wide range of difficulty levels, spanning from 200 to 2,900 ELO, making it suitable for both novices and seasoned players. In its first match, it successfully outplayed Hou Yifan, the world’s top active female grandmaster, who has a standard ELO of 2,633.
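
    For context on how lopsided such a match-up is on paper, the standard Elo expected-score formula can be applied to the two ratings quoted above. This is a minimal sketch that assumes the robot's top difficulty level behaves like a real 2,900 rating on the same scale as Hou Yifan's rating, which the article does not confirm; the function and constant names are illustrative.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


# Assumption: the robot's top difficulty level is treated as a true rating.
ROBOT_MAX_RATING = 2900
HOU_YIFAN_RATING = 2633  # standard rating quoted above

robot = expected_score(ROBOT_MAX_RATING, HOU_YIFAN_RATING)
print(f"Robot expected score: {robot:.2f}")         # about 0.82
print(f"Opponent expected score: {1 - robot:.2f}")  # about 0.18
```

    Under that assumption, the higher-rated side would be expected to score roughly 0.82 points per game, so a win over a 2,633-rated player is consistent with the claimed strength.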

    Advanced Features

    The SenseRobot Chess features automatic player log-in through facial recognition, enabling it to remember player settings. It is equipped with a camera that recognizes chess pieces in 3D and has a unique three-fingered claw for moving them. This robotic coach is capable of managing over 145 endgame scenarios and offers 2,000 training exercises, along with the ability to reset pieces for new games. This design minimizes the hassle of moving pieces around when practicing solo. Additionally, it is built to be pinch-free, ensuring safety for younger users, unlike the Russian Konstantin Kosteniuk chess robot, which has been known to injure fingers.

    Learning and Playing

    While playing, the robotic coach gives verbal feedback and advice on moves, aiding children in their chess learning process as it plays against them. It includes a collection of a hundred games played by chess masters to assist children in cultivating advanced strategies and tactics. Furthermore, the robot supports remote chess gaming through Lichess, allowing players to connect globally. All games played can be recorded and shared for later review.

    The SenseRobot Chess can be purchased on JD.com for an MSRP of 4,799 yuan (~$680). Individuals who are unable to import it from China might consider alternative options, like an AI-powered talking chess board available on Amazon. For those curious about the evolution of chess computers, there are resources discussing the first computer that defeated a grandmaster, which can be found in a book on Amazon.

    SenseRobot, SenseRobot press release


  • Should Nvidia Be Concerned About Huawei’s Rising AI Chips?

    Huawei is currently testing its new AI chip, the Ascend 910C, with potential clients in China. This chip is designed to serve as a robust alternative to Nvidia’s top-tier GPUs, particularly following US restrictions that have limited Nvidia’s sales in China. Samples of the Ascend 910C have been provided to major server companies in China for testing and hardware setup.

    Upgraded Technology

    The Ascend 910C is an enhanced version of Huawei’s Ascend 910B chip, which has already been utilized in various sectors within China as a substitute for Nvidia’s A100 chip, particularly in AI training applications.

    Consequences of US Sanctions on Nvidia

    Since August 2022, US sanctions have barred Nvidia from selling its A100 and H100 GPUs to China. In response, Nvidia created modified versions, including the A800 and H800; however, these too faced additional export restrictions in 2023. Despite these challenges, Nvidia continues to be a significant player in China’s AI market, introducing new products such as the H20, L20, and L2 GPUs. The H20 chip is anticipated to generate substantial revenue in China, with expected sales reaching US$12 billion in 2024, despite previous low demand.

    Huawei’s Expanding Role in China

    The US sanctions imposed on Nvidia have opened doors for Huawei to enhance its AI infrastructure and computing capabilities in China. Eric Xu Zhijun, Huawei’s rotating chairman, highlighted that the company has established two computing divisions over the past five years to bolster the domestic AI sector. This strategic move has positioned Huawei as a formidable competitor in the AI chip industry.

    While Huawei’s AI chips, including the Ascend 910C, show significant promise, the company does encounter challenges. Huawei generally packages its AI chips with additional services, such as network and storage solutions, which might dissuade some potential clients. Moreover, many of Huawei’s AI chips currently in use are still the older 910B models.

    As the competition between Huawei and Nvidia escalates, Huawei’s ongoing advancements in AI technology may enable it to become a pivotal player in China’s AI chip market, especially as it strives for greater self-sufficiency in semiconductor manufacturing.