Category: Artificial intelligence

December 11, 2024

Grok Unveils Advanced Image Generation Model with Text and Face Features

xAI has recently introduced an image generation feature to Grok, marking a significant enhancement for the platform. Currently, this feature is accessible to X users in select countries, with a worldwide launch anticipated in approximately one week.

Advanced Image Creation

The image generator, originally named Aurora, is now part of the Grok family. According to xAI, it is an autoregressive mixture-of-experts network trained on billions of examples sourced from the internet. In simple terms, it is trained to predict the next token from interleaved text and image data, which xAI says enables it to produce far more lifelike images than before.
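The autoregressive idea described above can be illustrated with a toy loop. This is only a sketch: `toy_next_token` and `generate` are hypothetical names, the "model" is a simple arithmetic stand-in rather than a trained network, and the mixture-of-experts part of the architecture is not shown.

```python
def toy_next_token(sequence):
    """Stand-in for a trained network (hypothetical): mixes all prior tokens
    into one number and returns it as the next image-patch token."""
    state = 0
    for kind, value in sequence:
        # Every earlier token, text or image, influences the prediction
        state = (state * 31 + value + (1 if kind == "img" else 0)) % 256
    return ("img", state)


def generate(prompt, num_image_tokens):
    # Text tokens are represented here by their character codes
    sequence = [("txt", ord(ch)) for ch in prompt]
    for _ in range(num_image_tokens):
        # Autoregressive step: predict one token, append it, repeat
        sequence.append(toy_next_token(sequence))
    return sequence


tokens = generate("a cat", num_image_tokens=4)
print(tokens[-4:])
```

The point of the sketch is the loop structure: text and image tokens live in one sequence, and each new image token depends on everything generated so far. A real system would then decode the image tokens into pixels.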

Enhanced Functionality

The model’s capabilities extend beyond generating images from scratch. It can also take existing images as input, allowing users to edit them or use them as inspiration for new designs. According to xAI, the model excels particularly in areas where other generators tend to falter, such as accurately rendering text, logos, and human faces.

Continuous Improvement

This update follows the launch of Grok 1.5V in April, which gave the platform its first visual processing capabilities. xAI plans further improvements: it is currently expanding its Colossus supercomputer in Memphis, which already has 100,000 Nvidia H100 and H200 GPUs, and intends to double that capacity soon.

The timing of this release is noteworthy, especially since OpenAI has just unveiled its own video generation model, Sora. This development highlights the intensifying competition in generative AI among major industry players.

Tags: Aurora, Grok, xAI

December 11, 2024

Microsoft Launches Copilot Vision Beta for Select Pro Subscribers

Microsoft Copilot Labs has launched beta testing for Copilot Vision, which is limited to a small group of invited Copilot Pro subscribers. Copilot Vision observes what users are doing in the Microsoft Edge browser to offer help, information, and tips in real time.

Integration with Microsoft Products

The Copilot AI is built into the newest editions of Microsoft Windows, Edge, and Office. It responds to text prompts from the user, providing answers and support. With the addition of Copilot Vision, users no longer need to describe visual elements like objects and maps in text, as the AI can see what is on screen within Microsoft Edge.

Enhancing the Gaming Experience

Gamers can benefit from the advice and insights Vision provides during gameplay, although it currently can’t control games directly. While users browse the web, the Vision AI identifies objects, assisting them in comparing items for purchases such as hotels, toys, or other goods. It can also provide specific product details, like washing instructions for clothing. For those who are unsure about what to buy or how to spend their day, they can ask the AI for recommendations, making life easier for busy individuals.

Limited Availability and Data Management

At the moment, Copilot Vision is restricted to a small number of websites during its beta phase, but this selection will grow in the future. The visual information and user interactions that Copilot Vision gathers during a session are erased once the session concludes, but Microsoft retains all the responses generated by the AI.

For more information, visit Microsoft Copilot Labs, check out the Microsoft Copilot blog, or watch Microsoft Copilot on YouTube, and don’t forget to review the Microsoft Privacy Statement.

Tags: Microsoft Copilot Vision, Microsoft Edge, Plaud AI voice recorder

December 9, 2024

X Unveils New Image Generator for Limited Time

xAI, the AI startup founded by Elon Musk, launched a new image generator called Aurora over the weekend, then quickly took it down. The company announced the generator, and Musk himself said that it was in beta.

Quick Removal

Just a few hours after Aurora was made available, the model was pulled offline. The option to choose it in Grok’s model picker was removed. TechCrunch had the chance to try out the model and noted that it did not have any restrictions regarding public figures or celebrities.

Creative Outputs

Some users who were able to access the generator shared some fun images. Among these were pictures of Adam Sandler and Ray Romano on a sitcom set, Sam Altman riding a giraffe, and a boxing match between Mickey Mouse and Luigi.

Specifications and Future Improvements

Details about the model’s specifications are not clear, but Musk mentioned that it was an internal model in beta that would “improve very fast.” Recently, the social media platform owned by Musk made Grok free for all users, but with certain limitations.

TechCrunch’s coverage highlights the excitement and mystery surrounding the sudden launch and removal of the Aurora image generator.

Tags: Aurora, Grok, xAI

December 9, 2024

Google Docs Introduces AI for Easy Formatted Document Creation

Google has rolled out an exciting new tool for Docs that utilizes the Gemini AI model to help users create formatted documents.

Ease of Document Creation

As posted on the company’s support pages (via Gadgets360), this feature lets you ask Docs to produce a variety of documents, such as proposals, project trackers, document ideas, blog posts, press releases, campaign briefs, dinner party menus, newsletters, itineraries, and more.

To get started, users can simply click on "help me create" and provide a brief description of what they need. It is important to include at least one existing document by typing "@filename" for Gemini to generate content effectively.

Availability and Limitations

Currently, this feature is exclusively available in Google Workspace Alpha and the initial testing phase known as Google Workspace Labs. Google has indicated that they are gradually making this feature available, but it’s presently limited to desktop users and can only be used in new documents.

There are certain restrictions, however. Google has pointed out that the feature cannot "incorporate web search results or content from your Workspace files," nor can it "generate cover or inline images of people." Additionally, it is restricted to "content extraction" from files and is unable to replicate the "structure or style" of those documents.

Tags: Gemini AI, Google Docs, Google Workspace

December 9, 2024

Try Grok AI Assistant for Free: Available to All Users Now

X has made its Grok AI chatbot available for free to everyone. We first learned about the free version of Grok last month, when X began testing it in select countries. Now, it seems that this feature is accessible all around the globe.

Availability and Limitations

As noticed by X user @blankspeaker, xAI’s main chatbot is now being rolled out to all users on the platform. Previously, this chatbot was only for X Premium and X Premium Plus subscribers. As anticipated, the free version has some limitations; users can ask only 10 questions every 2 hours and analyze just 3 images each day. To get more access, people will need to subscribe. These limits are much stricter compared to the free versions of ChatGPT and Claude.
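To make the free-tier limits concrete, here is a minimal sketch of how quotas like these could be enforced. It assumes a sliding time window, which is only one possible approach; how X actually counts usage is not stated in the report, and `SlidingWindowLimit` is a hypothetical name.

```python
from collections import deque


class SlidingWindowLimit:
    """Allows at most `max_events` events within any `window_seconds` span."""

    def __init__(self, max_events, window_seconds):
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._events = deque()  # timestamps of accepted events, oldest first

    def allow(self, now):
        # Drop timestamps that have aged out of the window
        while self._events and now - self._events[0] >= self.window_seconds:
            self._events.popleft()
        if len(self._events) < self.max_events:
            self._events.append(now)
            return True
        return False


# Limits quoted in the article: 10 questions per 2 hours, 3 images per day
questions = SlidingWindowLimit(max_events=10, window_seconds=2 * 60 * 60)
images = SlidingWindowLimit(max_events=3, window_seconds=24 * 60 * 60)
```

Calling `questions.allow(now)` with a timestamp in seconds returns `True` until ten questions have been accepted within the past two hours, then `False` until the oldest one ages out.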

How to Access Grok

Currently, Grok is only available through the X platform, though there are rumors of a standalone app being developed. To try it out, sign into your X account and look for the Grok tab on the left sidebar if you’re on desktop. If you’re using the mobile app, it can be found in the bottom navigation bar, third from the right.

Competitors and Future Prospects

For those who don’t know, Grok is a generative AI chatbot made by Elon Musk’s AI firm, xAI. While it competes directly with ChatGPT and Claude, it hasn’t received much attention due to its limited access. Launching a free tier is a positive move, but Grok still has a significant distance to cover before it can truly compete with others in the market.

Tags: Elon Musk, Grok AI, xAI

December 6, 2024

Evaxion Reveals AI-Driven Cancer Vaccine Concept

Evaxion, a frontrunner in AI-driven biotechnology, will present its customizable cancer vaccine development system, powered by its own AI technology, at this year’s ESMO Immuno-Oncology Congress. The system uses AI to analyze patient data, focusing on a specific target: endogenous retrovirus (ERV) tumor antigens that are shared among many patients. In simple terms, the AI identifies antigens that induce immune responses and are common across different cancer patients, then designs a vaccine around them that is suitable for various cancer types and, in some cases, for individual patients.

Details of the Development Models

Evaxion remains rather secretive about the specifics of its proprietary AI systems. However, the company reveals that it has created four distinct models that work together within this product. The EDEN model identifies antigens that can trigger immune responses capable of eliminating bacterial infections. OBSERV complements the patient’s existing antigens, focusing on ERVs, which are remnants of ancient viral infections embedded in the human genome over time. PIONEER is designed to seek out patient-specific antigens that can be targeted through immunotherapy. Lastly, RAVEN assesses the effectiveness of potential vaccine candidates. When combined, these models are theoretically equipped to discover the most effective immune treatment for an individual cancer patient.

Business Strategies and Partnerships

Evaxion refers to this system as AI-Immunology and has heavily invested in it since the company’s inception. After going public and attracting a diverse array of investors in 2021, Evaxion has secured significant partnerships with well-known pharmaceutical companies. A notable recent agreement is with American pharmaceutical giant Merck & Co., which is based on milestones and could potentially earn Evaxion over $1 billion if everything unfolds favorably. The company’s commercial portfolio currently features vaccines targeting Staph and gonorrhea, among others.

Tags: AI-Immunology, EDEN, Evaxion

December 6, 2024

Google Launches PaliGemma 2 Vision-Language Models

Google has revealed the successor to its visual-language model PaliGemma, which was introduced in May 2024. The new version, PaliGemma 2, comes in a range of sizes, featuring parameter counts from 3 billion to 28 billion, and resolution options that go up to 896px.

Advanced Performance Features

According to the company, this model showcases “top-tier performance in recognizing chemical formulas, musical scores, spatial reasoning, and generating reports from chest X-rays.”

Enhanced Captioning Abilities

Additionally, it boasts long captioning functionality, offering “thorough, contextually relevant captions for images that go beyond basic object recognition to include descriptions of actions, emotions, and the overall story of the scene.”

Accessible and Flexible Options

The new models are designed to be a “drop-in replacement” across various sizes without the need for “significant code changes.” Pre-trained versions can be found on platforms like Hugging Face and Kaggle, available for free to anyone interested in testing them. The models also support several frameworks, including Hugging Face Transformers, Keras, PyTorch, JAX, and Gemma.cpp.

Google emphasizes that PaliGemma 2’s “adaptability makes it easy to fine-tune for particular tasks and datasets, allowing you to customize its functions to meet your specific requirements.”

Tags: google, PaliGemma 2, visual-language model

December 6, 2024

Google DeepMind Genie 2: Real-Time 3D World Generator

Google DeepMind, a research branch of Google focused on AI, has introduced Genie 2, a foundation world model capable of creating "action-controllable, playable 3D environments" for fast prototyping and training AI agents.

Advanced Capabilities

According to the company, Genie 2 enhances the abilities of its earlier version and can produce "a vast diversity of rich 3D worlds." It is capable of simulating object interactions, character animations, physics, and non-player characters (NPCs) along with their animations and interactions. The model accepts both text and visual cues as input.

Memory and Perspective

Genie 2 is designed to remember elements of the world that aren’t visible to the player and can render them when they become visible again. This is loosely comparable to rendering techniques used in games, such as culling, which skips objects outside the player’s field of view (FOV), and Level of Detail (LOD), which reduces the complexity of objects and environments as they get farther from the player.
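For readers unfamiliar with the game-rendering techniques this comparison draws on, here is a minimal 2D, top-down sketch of a field-of-view check and a distance-based LOD choice. The function names and distance thresholds are illustrative assumptions; Genie 2 is a learned model, and nothing in the announcement says it works this way internally.

```python
import math


def in_field_of_view(camera_pos, camera_dir_deg, fov_deg, obj_pos):
    """Return True if the object lies within the camera's horizontal FOV."""
    dx = obj_pos[0] - camera_pos[0]
    dy = obj_pos[1] - camera_pos[1]
    angle_to_obj = math.degrees(math.atan2(dy, dx))
    # Smallest signed difference between the two angles, in [-180, 180)
    diff = (angle_to_obj - camera_dir_deg + 180) % 360 - 180
    return abs(diff) <= fov_deg / 2


def lod_level(camera_pos, obj_pos, thresholds=(10.0, 50.0)):
    """Pick a detail level from distance: 0 is most detailed (assumed cutoffs)."""
    distance = math.dist(camera_pos, obj_pos)
    for level, limit in enumerate(thresholds):
        if distance <= limit:
            return level
    return len(thresholds)
```

A conventional engine would skip drawing anything for which `in_field_of_view` is false and pick a simpler mesh as `lod_level` rises, while still keeping the object in its scene data so it reappears when the camera turns back.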

The model can create new content in real-time and keep a stable world "for up to a minute." It also offers the ability to render environments from various viewpoints, such as first-person, third-person, or isometric perspectives.

Realistic Effects

Additionally, it can produce sophisticated effects, including smoke, object interactions, fluid dynamics, gravity, and advanced lighting and reflections. DeepMind claims this model can facilitate the quick prototyping of fresh concepts and ideas. Users can also create and manage AI agents with straightforward prompts.

Numerous companies are developing foundation world models that can simulate and build representations of environments. For instance, Decart’s Oasis allows users to engage with a real-time AI-generated version of Minecraft, while AI leader Fei-Fei Li’s start-up, World Labs, also features a 3D generator.

Google DeepMind’s contributions are setting a new standard in the realm of AI and simulated environments.

Tags: 3D world model, Genie 2, Google DeepMind

December 6, 2024

OpenAI Launches $200 Monthly ChatGPT Pro Plan for Users

OpenAI, the organization behind ChatGPT, has introduced a new monthly subscription called ChatGPT Pro, priced at $200 (€189 or £157). They claim this subscription is designed for "researchers, engineers, and other individuals who utilize advanced intelligence on a daily basis to enhance their productivity and stay updated with the latest AI developments."

Enhanced Model

With the Pro plan, users will gain access to a more advanced version of the o1 model. This enhanced model utilizes additional computing power to "think longer" and deliver more precise answers, especially for inquiries related to "data science, programming, and legal case analysis."

According to OpenAI’s evaluations, the o1 pro model outperforms the standard o1 and its preview in areas like mathematics, science, and programming tasks. Users subscribed to the Pro plan can select the o1 pro from the model selection tool within the chatbot interface.

Waiting for Answers

Generating answers with the o1 pro model will take a bit more time. To help with this, OpenAI has incorporated a progress bar to show users how much longer they need to wait, and users will receive an in-app notification when their answer is ready. They also have the option to switch between different chats while they wait.

OpenAI has also revealed a grants initiative aimed at medical researchers in the United States. Initially, this program will support only ten researchers in the US, but it is expected to grow as the Pro plan expands to additional regions.

Tags: AI advancements, ChatGPT Pro, o1 model

December 5, 2024

Arc to Dia: New AI Web Browser Plans from The Browser Company

Arc, the beloved Chromium browser known for its innovative handling of tabs and organization, is no longer the center of excitement for Josh Miller, the CEO of The Browser Company. Instead, the startup has shifted its focus to Dia, a more mainstream product that leverages AI to simplify web browsing. Set to debut in early 2025, Dia is not just an upgrade of Arc; it’s meant to be a complete replacement.

Features Unveiled

In a recent presentation cleverly disguised as a recruiting video, The Browser Company showcases three prototype demonstrations that hint at potential features of Dia’s new ‘computing environment’. One standout feature transforms the traditional insertion cursor into an AI-driven tool that can suggest text to help you “write the next line” and more. This personalized cursor can analyze the whole browser window rather than just a single text box. For instance, it can take prompts to copy Amazon links from open tabs and seamlessly add them into an email draft.

Smart Searching

In another demonstration, Josh Miller uses the address bar to locate a document by entering only the sender and the topic. Dia locates the exact Notion document and, upon request, sends it via the chosen email client. These features rely on the natural-language processing capabilities of LLMs, along with memory and autonomous actions that Miller believes should be built into the browser.

Automating Tasks

The third prototype reveals Dia’s ability to automatically add items from an email to an Amazon shopping cart – a rather complicated AI challenge. In yet another demo, the browser uses a template to send personalized emails to each team member based on a list with their specific call times. In both instances, Miller suggests that the browser can be trained to handle such repetitive tasks with ease.

Miller appears to believe that AI functionalities like these will turn the simple browser into a robust operating system through this new computing environment. Understandably, users who have been loyal to Arc are not taking this news lightly. However, if the recent updates and bug fixes are any indication, Arc might still receive some support, though new features are unlikely. With Dia on the way, The Browser Company has fully embraced AI advancements, and there’s no turning back now.

Tags: AI Browser, Arc, Dia
