You're not the only one who turns to Wikipedia for quick facts. Lately, a deluge of AI bots training on Wikipedia articles has put enormous strain on the organization's servers.
To curb the influx of "non-human traffic" scraping the site for training data, Wikipedia is taking a proactive approach: serving up its data directly to AI developers.
On Wednesday, the Wikimedia Foundation announced a partnership with Google-owned company Kaggle to release a beta dataset "featuring structured Wikipedia content in English and French." Uploaded on April 15, the company said the dataset "simplifies access to clean, pre-parsed article data that’s immediately usable for modeling, benchmarking, alignment, fine-tuning, and exploratory analysis."
According to Ars Technica, bots that scrape Wikipedia and Wikimedia Commons pages have consumed 50 percent of its bandwidth, putting a massive strain on the nonprofit's entire operation. Wikimedia hopes that serving up data to developers will dissuade them from deploying bots all over its pages.
The rise of generative AI has let loose a flood of scraping bots hungrily crawling all corners of the internet for more data. To compete against rivals, AI companies have a seemingly insatiable appetite for data. This has included copyrighted works, a contentious issue with artists. Authors, artists, and musicians are arguing in court that this training violates copyright law when it's done without credit, compensation, or consent.
That's why companies like Meta and OpenAI are currently embroiled in copyright infringement lawsuits brought by plaintiffs like the Authors Guild and The New York Times, who argue this practice is not protected by the fair use doctrine.
But the difference here is that all Wikipedia content is licensed under the Creative Commons Attribution-ShareAlike license, which means its content is free to use as long as it's properly attributed and distributed under the same license. The Wikimedia Foundation told Gizmodo that Kaggle paid for the data through the Wikimedia Enterprise, and AI companies "are still expected to respect Wikipedia’s attribution and licensing terms."
The partnership between Wikimedia and Kaggle represents a more nuanced way forward, allowing AI companies to train models on internet data that's been obtained legally and, at the very least, more ethically.