Tag: Yandex

  • Yandex Launches Yambda: Open-Source Music Recommendation Dataset

    Key Takeaways

    1. Yandex has launched the open-source Yambda dataset, which captures music listener preferences and is intended to help others build a streaming audio service similar to Spotify.

    2. The dataset includes detailed recordings of 4.79 billion user interactions with 9.39 million music tracks over ten months from 28 million monthly Yandex Music users.

    3. The dataset is meant to support AI-driven playlist customization; other platforms keep their algorithms private for competitive advantage.

    4. The Yambda dataset is available for download in different sizes: 5 billion, 500 million, and 50 million events, with the largest needing at least 85 GB of storage.

    5. The dataset is stored in Apache Parquet, which makes analysis and research easier, and can be accessed on Hugging Face.


    Yandex has announced the launch of its open-source Yambda dataset, which provides insights into music listener preferences. The dataset is intended to help others build a streaming audio service akin to Spotify, with AI-driven playlist customization.

    Playlist Creation with AI

    Platforms such as Spotify, Tidal, and Qobuz use software algorithms or AI to generate playlists tailored to individual tastes. However, these companies typically keep their code and models under wraps, treating the ability to automatically curate enjoyable song selections as a trade secret that contributes to their competitive edge.

    Extensive Data Collection

    Over ten months, Yandex collected 4.79 billion user interactions with 9.39 million music tracks from its 28 million monthly Yandex Music users. The dataset captures listener feedback: what users chose to play, and what they liked and disliked. Each interaction is recorded with a timestamp.

    Dataset Availability

    The Yambda dataset is available for download in three sizes: five billion events (1 million users), five hundred million events (100,000 users), and fifty million events (10,000 users). The largest needs at least 85 GB of storage. The data is stored in Apache Parquet, a column-oriented file format that simplifies analysis and research.
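
    As a rough sanity check on the figures quoted above, the following sketch works out events per user for each size and the storage per event for the largest one. The variant labels and function names are hypothetical, the event counts are the rounded figures from the article, and 1 GB is assumed to mean 10**9 bytes; it is an estimate, not a description of the actual files.

```python
# Rounded figures quoted in the article. The keys "large", "medium", and
# "small" are hypothetical labels, not official variant names.
VARIANTS = {
    "large": {"events": 5_000_000_000, "users": 1_000_000},
    "medium": {"events": 500_000_000, "users": 100_000},
    "small": {"events": 50_000_000, "users": 10_000},
}

# Minimum storage quoted for the largest variant.
LARGEST_STORAGE_GB = 85


def events_per_user(events: int, users: int) -> float:
    """Average number of recorded events per user."""
    return events / users


def bytes_per_event(storage_gb: float, events: int) -> float:
    """Approximate on-disk bytes per event, assuming 1 GB = 10**9 bytes."""
    return storage_gb * 10**9 / events


if __name__ == "__main__":
    for name, variant in VARIANTS.items():
        ratio = events_per_user(variant["events"], variant["users"])
        print(f"{name}: about {ratio:.0f} events per user")
    size = bytes_per_event(LARGEST_STORAGE_GB, VARIANTS["large"]["events"])
    print(f"large: about {size:.0f} bytes per event")
```

    On these rounded numbers every size works out to the same average number of events per user, which suggests the smaller downloads are user subsamples rather than shorter time windows; the article does not state this explicitly.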

    Yambda is available on Hugging Face, as noted in the Yandex press release.


  • Yandex Enters Indonesia’s AI Market: Opportunities and Impact

    Founded in 1997, Yandex launched its search engine that same year. Today, Yandex Search leads the search engine market in Russia with a share exceeding 70 percent. Beyond search, Yandex offers services such as online shopping, ridesharing, streaming media, cloud computing, and web mapping, and it also develops a web browser. The company, which employs over 25,000 people, is now turning its attention to the AI market in Indonesia.

    Meeting About Expansion

    According to Reuters, Indonesia’s communications minister, Meutya Hafid, met with Alexander Popovskiy, who heads Yandex’s international search division. They discussed Yandex’s plans to “expand the search engine platform in Indonesia.” However, neither party commented on the specifics of the investment, including its size and timeline.

    AI Investments in Indonesia

    Earlier this year, Nvidia and the Indonesian telecom company PT Indosat Ooredoo Hutchinson announced plans to establish an AI center in Central Java. The disclosed investment in this venture is $200 million.

    Yandex rolled out the MatrixNet machine learning algorithm back in 2009 and has integrated it into many of its products ever since. In fact, CERN utilizes this technology to process the enormous data generated by the Large Hadron Collider.