
Google introduces DataGemma LLMs that focus on accuracy

  • September 16, 2024

DataGemma uses statistical data from Google’s Data Commons to reduce hallucinations in AI models and thus improve the accuracy of the information generated.

Large language models (LLMs) can produce impressive results, but they sometimes generate incorrect information, a phenomenon known as “hallucination.” DataGemma, a new set of models built on Google’s Data Commons, is meant to address this problem: by linking AI models to large amounts of real-world data, it aims to improve the factual accuracy of the generated information.

The models are available via Hugging Face. Under the hood, they are based on Gemma 2 27B, an open model that Google released in June 2024.

Data Commons as a source

Google’s Data Commons is a public knowledge graph with more than 240 billion data points from trusted sources such as the United Nations and the World Health Organization. This dataset includes information on health, economics, demographics, and more.

By linking it to DataGemma models, users can access this data through natural language interactions, allowing researchers and policymakers to analyze, for example, trends in electricity access in African countries or relationships between income and diabetes in the United States, Google said.

Two methods to combat hallucinations

DataGemma combines two methods to reduce hallucinations in AI models:

  1. RIG (Retrieval Interleaved Generation): While generating an answer, the model proactively queries Data Commons for reliable statistical data, so that facts are verified before they are presented.
  2. RAG (Retrieval Augmented Generation): The model retrieves relevant contextual information from Data Commons before generating the answer, which enables more accurate and comprehensive answers.
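
The difference between the two methods is mainly one of timing, which a small sketch can make concrete. The Python below is an illustration only and not DataGemma's actual implementation: the in-memory FAKE_STATS table stands in for Data Commons and holds invented placeholder values, and the [[STAT: ... | ...]] marker format and all function names are assumptions made for this example.

```python
import re

# Hypothetical stand-in for Data Commons: a tiny in-memory table.
# The values are invented placeholders, not real statistics.
FAKE_STATS = {
    "population of exampleland": "1,000",
    "electricity access in exampleland": "80%",
}


def lookup(query):
    """Return the stored value for a query, or None if nothing matches."""
    return FAKE_STATS.get(query.strip().lower())


# Assumed marker format for this sketch: [[STAT: <query> | <model's own guess>]]
MARKER = re.compile(r"\[\[STAT:\s*(.*?)\s*\|\s*(.*?)\s*\]\]")


def rig_postprocess(draft):
    """RIG-style: lookups are interleaved with the generated text.

    Each marker in the draft is replaced by the retrieved value; if nothing
    is found, the model's guess is kept and flagged as unverified.
    """
    def substitute(match):
        query, guess = match.group(1), match.group(2)
        value = lookup(query)
        return value if value is not None else f"{guess} (unverified)"

    return MARKER.sub(substitute, draft)


def rag_build_prompt(question, queries):
    """RAG-style: retrieve first, then put the facts in front of the question."""
    facts = [f"- {q}: {lookup(q)}" for q in queries if lookup(q) is not None]
    context = "\n".join(facts) if facts else "- (no data found)"
    return f"Context:\n{context}\n\nQuestion: {question}"
```

In the RIG-style function the lookup happens at each marked position in a draft answer, whereas the RAG-style function gathers the data up front and hands it to the model as context before any answer is generated.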

Preliminary tests show that this approach significantly improves the accuracy of AI models on numerical facts, which reduces the likelihood of hallucinations and is useful in applications such as research and decision-making. The DataGemma models are now available to researchers and developers, who can try them via quickstart notebooks for the RIG and RAG methods.

Source: IT Daily
