Google confirms it’s training AI using scraped web data

On Monday, Gizmodo found that the search giant has updated its privacy policy to disclose that various AI services, such as Bard and Cloud AI, can be trained on public data the company pulls from the web.

“Our privacy policy has long been transparent that Google uses publicly available information from the open web to train language models for services like Google Translate,” Google spokesperson Christa Muldoon told The Verge. “This latest update clarifies that newer services like Bard are also included. We incorporate privacy principles and safeguards into the development of our AI technologies, in line with our AI Principles.”

These are the latest changes to Google’s privacy policy. At least the company is now open about where your data is used…
Image: Google

After being updated on July 1, 2023, Google’s privacy policy now states that “Google uses the information to improve our services and to develop new products, features, and technologies that benefit our users and the public” and that the company may “use publicly available information to help train Google’s AI models and develop products and features such as Google Translate, Bard, and Cloud AI capabilities.”

You can see from the policy’s change history that the update spells out more clearly which services will be trained on the collected data. For example, the document now says that information can be used for “AI models” rather than “language models,” wording that gives Google more latitude to train and build systems beyond LLMs on your public data. And even that note is buried behind an embedded link for “publicly accessible sources” in the “Your Local Information” section of the policy, which you have to click to open the relevant passage.

The updated policy specifies that “publicly available information” is used to train Google’s AI products, but it does not say how (or whether) the company can prevent copyrighted materials from being included in that pool of data. Many publicly accessible websites have policies that prohibit data collection or web scraping for the purpose of training large language models and other AI tools. It will also be interesting to see how this approach plays out against global regulations such as GDPR, which protect people against their data being used without consent.
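Sites typically publish those scraping policies through the robots.txt convention, which crawlers honor only voluntarily. As a minimal sketch using Python’s standard-library parser (the `robots.txt` rules and the `SomeBrowser` user agent below are illustrative; `GPTBot` is one real AI crawler name, though whether any given site blocks it is an assumption here):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve to bar an AI training
# crawler while leaving the rest of the site open to other user agents.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
# parse() accepts the file's lines directly, so no network fetch is needed.
parser.parse(robots_txt.splitlines())

# The AI crawler is barred from every path; an ordinary agent is not.
print(parser.can_fetch("GPTBot", "https://example.com/articles/1"))       # False
print(parser.can_fetch("SomeBrowser", "https://example.com/articles/1"))  # True
```

Nothing technically stops a scraper from ignoring these rules, which is why the enforcement questions in the paragraph above remain open.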

The combination of these laws and growing competition in the market is pushing the makers of popular generative AI systems, such as OpenAI’s GPT-4, to be more guarded about where they get their training data and whether it includes social media posts or copyrighted works by human artists and authors.

Whether the doctrine of fair use applies to this type of application currently sits in a legal gray area. The uncertainty has sparked various lawsuits and pushed lawmakers in some countries to introduce stricter laws better equipped to control how AI companies collect and use their training data. It also raises questions about how this data is processed to ensure it doesn’t contribute to dangerous failures within AI systems, with the people tasked with sorting through huge piles of training data often subjected to long hours and harsh working conditions.

Gannett, the largest newspaper publisher in the United States, is suing Google and its parent company, Alphabet, claiming that advances in AI technology have helped the search giant maintain a monopoly on the digital advertising market. Products like Google’s AI search beta have also been called “plagiarism engines” and criticized for starving websites of traffic.

Meanwhile, Twitter and Reddit – two social platforms with a wealth of public information – have recently taken drastic measures to try to prevent other companies from freely harvesting their data. The API changes and rate limits placed on the platforms were met with backlash from their communities, as the anti-scraping measures degraded the core user experience of both Twitter and Reddit.
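Neither platform has published its enforcement internals, but the client-side effect of such limits is a familiar one: requests start returning HTTP 429 ("Too Many Requests"). A generic sketch of how a well-behaved client copes with that, using exponential backoff (the `fake_fetch` helper and the `api.example.com` endpoint are hypothetical stand-ins, not either platform’s real API):

```python
import time

def fetch_with_backoff(fetch, url, max_retries=3, base_delay=1.0):
    """Call fetch(url); on an HTTP 429 response, wait with exponential
    backoff (base_delay, 2*base_delay, 4*base_delay, ...) and retry."""
    for attempt in range(max_retries + 1):
        status, body = fetch(url)
        if status != 429:
            return status, body
        if attempt < max_retries:
            time.sleep(base_delay * (2 ** attempt))
    return status, body  # still rate-limited after all retries

# Stand-in for a real HTTP client: rate-limits the first two calls.
calls = {"n": 0}
def fake_fetch(url):
    calls["n"] += 1
    return (429, "") if calls["n"] <= 2 else (200, "ok")

result = fetch_with_backoff(fake_fetch, "https://api.example.com/posts",
                            base_delay=0.01)
print(result)  # (200, 'ok') after two rate-limited attempts
```

Aggressive scrapers, of course, simply distribute requests across many clients, which is part of why both platforms resorted to blunter account-level and sitewide limits.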
