Google introduces DataGemma LLMs that focus on accuracy

DataGemma uses statistical data from Google’s Data Commons to reduce hallucinations in AI models and thus improve the accuracy of the information generated.

Large language models (LLMs) can produce impressive results, but they sometimes generate incorrect information, a phenomenon known as “hallucination.” DataGemma, a new set of models built on Google’s Data Commons, targets this problem: by linking AI models to large amounts of real-world data, it aims to improve the factual accuracy of the information generated.

The models are available via Hugging Face. Under the hood, they are based on Gemma 2 27B, an open model that Google released this summer.

Data Commons as a source

Google’s Data Commons is a public knowledge graph with more than 240 billion data points from trusted sources such as the United Nations and the World Health Organization. This dataset includes information on health, economics, demographics, and more.

Linking Data Commons to the DataGemma models lets users query this data in natural language. According to Google, researchers and policymakers can use it to analyze, for example, trends in electricity access in African countries or the relationship between income and diabetes in the United States.
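To make the idea of statistical data points concrete, here is a minimal sketch of how observations of this kind could be stored and queried for a trend. It does not use the real Data Commons API: the `Observation` class, the place and variable names, the source label, and all values are hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One data point: a variable measured for a place in a year, with its source."""
    place: str
    variable: str
    year: int
    value: float
    source: str


# Placeholder values for illustration only, not real statistics.
OBSERVATIONS = [
    Observation("CountryA", "electricity_access_pct", 2010, 40.0, "ExampleSource"),
    Observation("CountryA", "electricity_access_pct", 2020, 55.0, "ExampleSource"),
    Observation("CountryB", "electricity_access_pct", 2020, 70.0, "ExampleSource"),
]


def series(place, variable):
    """Return (year, value) pairs for one place and variable, sorted by year."""
    return sorted(
        (o.year, o.value)
        for o in OBSERVATIONS
        if o.place == place and o.variable == variable
    )


def trend(place, variable):
    """Return last value minus first value, or None with fewer than two points."""
    points = series(place, variable)
    if len(points) < 2:
        return None
    return points[-1][1] - points[0][1]
```

Calling `trend("CountryA", "electricity_access_pct")` on the placeholder data gives the change between the earliest and latest observation. Keeping the source on every record is what would let an answer cite where a number came from.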

Two methods to combat hallucinations

DataGemma combines two methods to reduce hallucinations in AI models:

RIG (Retrieval-Interleaved Generation): This method proactively queries Data Commons for reliable statistical data while the answer is being generated, so that figures are checked before they are presented.
RAG (Retrieval-Augmented Generation): This method retrieves relevant information from Data Commons before the answer is generated and supplies it to the model as context, which enables more accurate and comprehensive answers.
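The difference between the two methods can be sketched in a few lines of Python. This is an illustrative toy, not DataGemma's actual implementation: the `STATS` dictionary stands in for Data Commons, its values are placeholders, and the `[LOOKUP("...") -> "..."]` marker format and the function names are assumptions made for this example.

```python
import re

# Hypothetical stand-in for a statistics store; values are placeholders.
STATS = {
    "electricity access in CountryA": "55%",
    "population of CountryA": "12 million",
}


def lookup(query):
    """Return the stored value for a query, or None if nothing is found."""
    return STATS.get(query)


# Assumed marker format: the model emits a query next to its own guess.
MARKER = re.compile(r'\[LOOKUP\("([^"]+)"\) -> "([^"]*)"\]')


def rig_resolve(draft):
    """RIG-style: replace each inline marker in a draft with the retrieved value.

    If the store has no answer, the model's own guess is kept.
    """
    def swap(match):
        query, guess = match.group(1), match.group(2)
        found = lookup(query)
        return found if found is not None else guess

    return MARKER.sub(swap, draft)


def rag_prompt(question, queries):
    """RAG-style: retrieve facts first, then build a prompt that includes them."""
    facts = [f"- {q}: {lookup(q)}" for q in queries if lookup(q) is not None]
    context = "\n".join(facts)
    return f"Use only these facts:\n{context}\n\nQuestion: {question}"
```

In the RIG-style function, retrieval happens after the model has drafted text and replaces individual figures in place. In the RAG-style function, retrieval happens before generation and shapes the whole prompt. Falling back to the model's guess when nothing is found is one possible design choice; a stricter variant could flag such figures as unverified instead.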

Preliminary tests show that this approach significantly improves the accuracy of AI models on numerical facts, which reduces the likelihood of hallucinations in applications such as research and decision-making. The DataGemma models are now available to researchers and developers who can use them via dedicated hardware for the RIG and RAG methods.

Source: IT Daily

Mary

