Key Takeaways
1. The Open Web Index (OWI) will launch a pilot program next month, providing access to nearly 1 petabyte of web data, with plans to expand to 5 PB and eventually 10 PB.
2. The OWI serves as a collective digital library that third-party services can search for documents, with the aim of reducing Europe’s dependence on US-based search engines.
3. The initiative aims to improve search quality and language options, promoting a non-profit, standards-based index that complies with European data protection laws.
4. During the pilot phase, academic groups, startups, and developers can request data under research or commercial licenses, and their feedback will help shape the index.
5. The project aligns with the European Commission’s InvestAI initiative, which seeks to boost funding for AI projects, potentially enhancing European competitiveness in search and AI technologies.
The OpenWebSearch.eu group is set to open the first federated, pan-European Open Web Index (OWI) to external testers next month. The pilot program will provide access to nearly one petabyte of collected web data, a significant step towards a comprehensive index that is planned to grow to 5 PB and eventually to 10 PB of content.
A New Way to Search
The OWI is not a traditional search engine. Instead, it acts as a collective digital library in which third-party services such as search portals, large language model providers, or research teams can look up documents. The initiative is backed by a 14-member collaboration of universities, supercomputing centers, tech companies, and CERN, and aims to lessen Europe’s reliance on proprietary indexes from American companies like Google and Microsoft.
Challenging the Status Quo
Supporters of the project say that the dominance of advertising-driven platforms has hurt the quality of search results and restricted language options. By creating a non-profit, standards-based index in line with European regulations, the consortium hopes to promote services that abide by local data protection laws, offer results in various languages, and avoid aggressive advertising or biased results. Regulators in Brussels and London have often criticized the dominance of US tech giants for these very reasons.
During the pilot phase, academic groups, startups, and individual developers will be able to request the dataset under a general research license or apply for a commercial license. Community manager Ursula Gmelch refers to this launch as “a first step towards true European digital sovereignty,” noting that initial feedback will help shape the index to better meet the needs of users. The team is particularly keen on enhancing vertical and argumentative search, retrieval-augmented generation, and other AI-related applications.
Aligning with European Goals
The launch coincides with the European Commission’s InvestAI initiative, which aims to raise €200 billion for AI projects. An open Zoom meeting is planned for June 6, from 10 a.m. to noon CEST, where participants will be introduced to the platform and receive access credentials. If the pilot succeeds, it could give small and mid-sized European companies the tools they need to develop competitive search and AI technologies that operate outside the dominant US ecosystems.