Category: Artificial intelligence

  • Grok Unveils Advanced Image Generation Model with Text and Face Features

    xAI has recently introduced an image generation feature to Grok, marking a significant enhancement for the platform. Currently, this feature is accessible to X users in select countries, with a worldwide launch anticipated in approximately one week.

    Advanced Image Creation

    The image generator, originally named Aurora, is now part of the Grok family. It is an autoregressive mixture-of-experts model trained on billions of examples sourced from the internet. In simple terms, it predicts the next piece of a sequence that interleaves text and image data, which xAI says lets it produce far more lifelike images than before.
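
    To make the two terms above concrete, here is a toy sketch in Python. It is not xAI's implementation, and none of its details come from the announcement: the tiny vocabulary, the random score tables, and the names `route_top_k`, `make_table`, and `generate` are all assumptions made for illustration. "Mixture of experts" means a gate picks a few experts for each step and blends their outputs; "autoregressive" means each new token is predicted from what came before and then appended to the sequence.

```python
import math
import random

VOCAB_SIZE = 8  # hypothetical tiny vocabulary standing in for text and image tokens


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(gate_scores, k=2):
    """Keep the k highest-scoring experts and renormalise their weights."""
    probs = softmax(gate_scores)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return {i: probs[i] / norm for i in chosen}


def make_table(seed, rows, cols):
    """Build a random score table; a stand-in for learned weights."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def generate(prompt, steps, experts, gate, k=2):
    """Greedy autoregressive loop: predict one token, append it, repeat.

    For simplicity this toy conditions only on the last token.
    """
    tokens = list(prompt)
    for _ in range(steps):
        last = tokens[-1]
        weights = route_top_k(gate[last], k)
        logits = [0.0] * VOCAB_SIZE
        for expert_index, weight in weights.items():
            for token, score in enumerate(experts[expert_index][last]):
                logits[token] += weight * score
        tokens.append(max(range(VOCAB_SIZE), key=lambda t: logits[t]))
    return tokens


# Four hypothetical experts, each a table mapping last token -> next-token scores.
experts = [make_table(seed, VOCAB_SIZE, VOCAB_SIZE) for seed in range(4)]
# The gate maps the last token to one score per expert.
gate = make_table(99, VOCAB_SIZE, len(experts))
print(generate([3], 5, experts, gate))
```

    Only the selected experts contribute at each step, which is the usual motivation for this design: the model can hold many parameters while using a fraction of them per token.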

    Enhanced Functionality

    However, the capabilities extend beyond just generating images from nothing. This system can also modify existing images, allowing users to adjust them or draw inspiration for new designs. According to xAI, the model excels particularly in areas where other generators tend to falter, such as accurately rendering text, logos, and human faces.

    Continuous Improvement

    This update follows the launch of Grok 1.5V in April, which gave the platform its first visual-processing capabilities. xAI plans further improvements: it is currently expanding its Colossus supercomputer in Memphis, which already has 100,000 Nvidia H100 and H200 GPUs, and intends to double that capacity soon.

    The timing of this release is noteworthy, coming just as OpenAI unveiled its video generation model, Sora. The overlap highlights the intensifying competition in generative AI among major industry players.

  • Amazon Launches New AI-Agent R&D Lab in San Francisco

    Amazon has launched a new research and development lab in San Francisco aimed at establishing "foundational" abilities for AI agents. This initiative will be headed by David Luan, who co-founded the AI startup Adept and previously served as its CEO.

    Leadership Background

    David Luan has an impressive background, having worked as the vice president of Engineering at OpenAI and spent a year in a leadership role at Google Research. He started Adept in 2022 and then transitioned to Amazon, where he now leads the Artificial General Intelligence (AGI) lab in San Francisco.

    Strategic Hiring

    In June, Amazon brought Luan on board along with his co-founders Augustus Odena, Maxwell Nye, Erich Elsen, and Kelsey Szot. The hires were part of a larger agreement under which Amazon licenses some of Adept’s technology. Adept had raised $350 million in a Series B funding round in March 2023, reaching a valuation of $1 billion.

    The AGI SF team is set to collaborate closely with Amazon’s extensive research group to develop AI agents capable of "taking actions in both digital and physical environments." Their primary goal is to create AI agents that can "carry out real-world tasks, learn from feedback provided by humans, self-correct, and understand our objectives."

  • Microsoft Launches Copilot Vision Beta for Select Pro Subscribers

    Microsoft Copilot Labs has begun beta testing Copilot Vision with a limited group of invited Copilot Pro subscribers. The feature watches what users are doing in the Microsoft Edge browser and offers help, information, and tips in real time.

    Integration with Microsoft Products

    The Copilot AI is built into the newest editions of Microsoft Windows, Edge, and Office. It responds to user prompts through text input, providing answers and support. With the addition of Copilot Vision, users no longer need to describe visual elements like objects and maps in text, as the AI can recognize everything happening within Microsoft Edge.

    Enhancing the Gaming Experience

    Gamers can benefit from the advice and insights Vision provides during gameplay, although it currently can’t control games directly. While users browse the web, the Vision AI identifies objects, assisting them in comparing items for purchases such as hotels, toys, or other goods. It can also provide specific product details, like washing instructions for clothing. For those who are unsure about what to buy or how to spend their day, they can ask the AI for recommendations, making life easier for busy individuals.

    Limited Availability and Data Management

    At the moment, Copilot Vision is restricted to a small number of websites during its beta phase, but this selection will grow in the future. The visual information and user interactions that Copilot Vision gathers during a session are erased once the session concludes, but Microsoft retains all the responses generated by the AI.

    For more information, visit Microsoft Copilot Labs, check out the Microsoft Copilot blog, or watch Microsoft Copilot on YouTube, and don’t forget to review the Microsoft Privacy Statement.

  • X Unveils New Image Generator for Limited Time

    xAI, the AI startup founded by Elon Musk, launched a new image generator called Aurora over the weekend, then took it down shortly afterward. The company announced the generator, and Musk himself said that it was in beta.

    Quick Removal

    Just a few hours after Aurora was made available, the model was pulled offline. The option to choose it in Grok’s model picker was removed. TechCrunch had the chance to try out the model and noted that it did not have any restrictions regarding public figures or celebrities.

    Creative Outputs

    Some users who were able to access the generator shared some fun images. Among these were pictures of Adam Sandler and Ray Romano on a sitcom set, Sam Altman riding a giraffe, and a boxing match between Mickey Mouse and Luigi.

    Specifications and Future Improvements

    Details about the model’s specifications are not clear, but Musk mentioned that it was an internal model in beta that would “improve very fast.” Recently, the social media platform owned by Musk made Grok free for all users, but with certain limitations.

    TechCrunch’s coverage highlights the excitement and mystery surrounding the sudden launch and removal of the Aurora image generator.

  • Google Docs Introduces AI for Easy Formatted Document Creation

    Google has rolled out an exciting new tool for Docs that utilizes the Gemini AI model to help users create formatted documents.

    Ease of Document Creation

    As posted on the company’s support pages (via Gadgets360), this feature allows you to request Docs to produce a variety of documents such as proposals, project trackers, document ideas, blog posts, press releases, campaign briefs, dinner party menus, newsletters, itineraries, and even more.

    To get started, users can simply click on "help me create" and provide a brief description of what they need. It is important to include at least one existing document by typing "@filename" for Gemini to generate content effectively.

    Availability and Limitations

    Currently, this feature is exclusively available in Google Workspace Alpha and the initial testing phase known as Google Workspace Labs. Google has indicated that they are gradually making this feature available, but it’s presently limited to desktop users and can only be used in new documents.

    However, there are certain restrictions. Google has pointed out that it cannot "incorporate web search results or content from your Workspace files," nor can it "generate cover or inline images of people." Additionally, it is restricted to "content extraction" from files and is unable to replicate the "structure or style" of those documents.

  • Try Grok AI Assistant for Free: Available to All Users Now

    X has made its Grok AI chatbot available for free to everyone. We first learned about the free version of Grok last month, when X began testing it in select countries. Now, it seems that this feature is accessible all around the globe.

    Availability and Limitations

    As noticed by X user @blankspeaker, xAI’s main chatbot is now being rolled out to all users on the platform. Previously, it was available only to X Premium and X Premium Plus subscribers. As anticipated, the free version has limits: users can ask only 10 questions every 2 hours and analyze just 3 images each day. Anyone who wants more access will need to subscribe. These limits are much stricter than those of the free versions of ChatGPT and Claude.
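
    As a rough illustration of how quotas like these can be enforced, the sketch below uses a sliding-window counter. How X actually implements the limits is not public; the sliding-window choice, the class name `SlidingWindowLimiter`, and the variable names are assumptions made for this example. Only the numbers (10 questions per 2 hours, 3 images per day) come from the report.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_events within any window of window_seconds."""

    def __init__(self, max_events, window_seconds):
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._events = deque()  # timestamps of accepted events, oldest first

    def allow(self, now):
        """Record and accept an event at time `now` (seconds) if quota remains."""
        # Drop timestamps that have aged out of the window.
        while self._events and now - self._events[0] >= self.window_seconds:
            self._events.popleft()
        if len(self._events) < self.max_events:
            self._events.append(now)
            return True
        return False


# Limits reported for the free tier; the limiter design itself is an assumption.
questions = SlidingWindowLimiter(max_events=10, window_seconds=2 * 60 * 60)
images = SlidingWindowLimiter(max_events=3, window_seconds=24 * 60 * 60)

# Eleven questions asked one second apart: the last one is refused.
print([questions.allow(t) for t in range(11)])
```

    A fixed-window counter that resets every two hours would be simpler but would behave differently near the reset boundary; the report does not say which behavior applies.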

    How to Access Grok

    Currently, Grok is only available through the X platform, though there are rumors of a standalone app being developed. To try it out, sign into your X account and look for the Grok tab on the left sidebar if you’re on desktop. If you’re using the mobile app, it can be found in the bottom navigation bar, third from the right.

    Competitors and Future Prospects

    For those who don’t know, Grok is a generative AI chatbot made by Elon Musk’s AI firm, xAI. While it competes directly with ChatGPT and Claude, it hasn’t received much attention due to its limited access. Launching a free tier is a positive move, but Grok still has a significant distance to cover before it can truly compete with others in the market.


  • Evaxion Reveals AI-Driven Cancer Vaccine Concept

    Evaxion, a frontrunner in AI-driven biotechnology, is set to participate in this year’s ESMO Immuno-Oncology Congress, where it will present its customizable cancer vaccine development system, powered by its own AI technology. The system uses AI to analyze patient data, focusing on a specific target: tumor antigens derived from endogenous retroviruses (ERVs) that are shared among many patients. In simple terms, the AI identifies antigens that induce immune responses and are common across different cancer patients, then tailors them into a vaccine suitable for various cancer types and, in some cases, for individual patients.

    Details of the Development Models

    Evaxion remains rather secretive about the specifics of its proprietary AI systems. However, the company says it has created four distinct models that work together within this product. The EDEN model identifies antigens that can trigger immune responses capable of eliminating bacterial infections. OBSERV focuses on antigens derived from ERVs, which are remnants of ancient viral infections embedded in the human genome over time. PIONEER is designed to seek out patient-specific antigens that can be targeted through immunotherapy. Lastly, RAVEN assesses the effectiveness of potential vaccine candidates. Combined, these models are in theory equipped to find the most effective immune treatment for an individual cancer patient.

    Business Strategies and Partnerships

    Evaxion refers to this system as AI-Immunology and has heavily invested in it since the company’s inception. After going public and attracting a diverse array of investors in 2021, Evaxion has secured significant partnerships with well-known pharmaceutical companies. A notable recent agreement is with American pharmaceutical giant Merck & Co., which is based on milestones and could potentially earn Evaxion over $1 billion if everything unfolds favorably. The company’s commercial portfolio currently features vaccines targeting Staph and gonorrhea, among others.

  • Google Launches PaliGemma 2 Vision-Language Models

    Google has revealed the successor to PaliGemma, the vision-language model it introduced in May 2024. The new version, PaliGemma 2, comes in a range of sizes, with parameter counts from 3 billion to 28 billion and input resolutions up to 896px.

    Advanced Performance Features

    According to the company, this model showcases “top-tier performance in recognizing chemical formulas, musical scores, spatial reasoning, and generating reports from chest X-rays.”

    Enhanced Captioning Abilities

    Additionally, it boasts long captioning functionality, offering “thorough, contextually relevant captions for images that go beyond basic object recognition to include descriptions of actions, emotions, and the overall story of the scene.”

    Accessible and Flexible Options

    The new models are designed to be a “drop-in replacement” across various sizes without the need for “significant code changes.” Pre-trained versions are available for free on platforms like Hugging Face and Kaggle for anyone interested in testing them. The models also support several frameworks, including Hugging Face Transformers, Keras, PyTorch, JAX, and Gemma.cpp.

    Google emphasizes that PaliGemma 2’s “adaptability makes it easy to fine-tune for particular tasks and datasets, allowing you to customize its functions to meet your specific requirements.”

  • Google DeepMind Genie 2: Real-Time 3D World Generator

    Google DeepMind, a research branch of Google focused on AI, has introduced Genie 2, a foundational world model capable of creating "action-controllable, playable 3D environments" for fast prototyping and training AI agents.

    Advanced Capabilities

    According to the company, Genie 2 builds on the abilities of its earlier version and can produce "a vast diversity of rich 3D worlds." It can simulate object interactions, character animation, physics, and non-player characters (NPCs) along with their animations and interactions. The model accepts both text and visual prompts as input.

    Memory and Perspective

    Genie 2 is designed to remember parts of the world that are no longer visible to the player and to render them consistently when they come back into view. This is loosely comparable to rendering optimizations used in games, such as view culling and Level of Detail (LOD), which skip objects outside the player’s Field Of View (FOV) or reduce the complexity of distant objects.
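
    For readers unfamiliar with the game-rendering ideas referenced above, here is a minimal 2D sketch of how a conventional engine might decide what to draw. It says nothing about how Genie 2 works internally; the function names, the distance thresholds, and the sample scene are all hypothetical. Objects outside the camera's field of view are skipped, and visible objects get a coarser detail level the farther away they are.

```python
import math


def in_field_of_view(camera_pos, camera_dir_deg, fov_deg, obj_pos):
    """Return True if obj_pos lies within the camera's horizontal field of view."""
    dx = obj_pos[0] - camera_pos[0]
    dy = obj_pos[1] - camera_pos[1]
    angle_to_obj = math.degrees(math.atan2(dy, dx))
    # Smallest signed difference between the two headings, in [-180, 180).
    diff = (angle_to_obj - camera_dir_deg + 180) % 360 - 180
    return abs(diff) <= fov_deg / 2


def select_lod(distance, thresholds=(10.0, 50.0)):
    """Return 0 for full detail, rising to len(thresholds) for the coarsest level."""
    for level, limit in enumerate(thresholds):
        if distance < limit:
            return level
    return len(thresholds)


def plan_render(camera_pos, camera_dir_deg, fov_deg, objects):
    """Map each visible object's name to the detail level it should use."""
    plan = {}
    for name, pos in objects.items():
        if not in_field_of_view(camera_pos, camera_dir_deg, fov_deg, pos):
            continue  # outside the view: skipped entirely
        plan[name] = select_lod(math.dist(camera_pos, pos))
    return plan


# Hypothetical scene: the camera sits at the origin, facing along +x, with a 90-degree FOV.
scene = {"tree": (5, 0), "hill": (30, 10), "tower": (100, 0), "rock": (-5, 0)}
print(plan_render((0, 0), 0, 90, scene))
```

    In this scene the rock sits behind the camera and is skipped, while the tree, hill, and tower are drawn at progressively lower detail.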

    The model can create new content in real-time and keep a stable world "for up to a minute." It also offers the ability to render environments from various viewpoints, such as first-person, third-person, or isometric perspectives.

    Realistic Effects

    Additionally, it can produce sophisticated effects, including smoke, object interactions, fluid dynamics, gravity, and advanced lighting and reflections. DeepMind claims this model can facilitate the quick prototyping of fresh concepts and ideas. Users can also create and manage AI agents with straightforward prompts.

    Numerous companies are developing foundational world models that can simulate and build representations of environments. For instance, Decart’s Oasis allows users to engage with a real-time AI-generated version of Minecraft, while AI leader Fei-Fei Li’s start-up, World Labs, also features a 3D generator.

    Google DeepMind’s contributions are setting a new standard in the realm of AI and simulated environments.

  • OpenAI Launches $200 Monthly ChatGPT Pro Plan for Users

    OpenAI, the organization behind ChatGPT, has introduced a new monthly subscription called ChatGPT Pro, priced at $200 (€189 or £157). They claim this subscription is designed for "researchers, engineers, and other individuals who utilize advanced intelligence on a daily basis to enhance their productivity and stay updated with the latest AI developments."

    Enhanced Model

    With the Pro plan, users will gain access to a more advanced version of the o1 model. This enhanced model utilizes additional computing power to "think longer" and deliver more precise answers, especially for inquiries related to "data science, programming, and legal case analysis."

    According to OpenAI’s evaluations, the o1 pro model outperforms the standard o1 and its preview in areas like mathematics, science, and programming tasks. Users subscribed to the Pro plan can select the o1 pro from the model selection tool within the chatbot interface.

    Waiting for Answers

    Generating answers with the o1 pro model takes a bit more time. To help with this, OpenAI has added a progress bar that shows users how much longer they need to wait, and an in-app notification tells them when the answer is ready. Users can also switch between chats while they wait.

    OpenAI has also revealed a grants initiative aimed at medical researchers in the United States. Initially, this program will support only ten researchers in the US, but it is expected to grow as the Pro plan expands to additional regions.