The fever for chatbots, for their applications in search engines and other tools, and for artificial intelligence in general has been rising steadily for several weeks. The arrival of ChatGPT has heated up the sector in such a way that it has turned the big tech companies upside down. Of course, for some, such as Microsoft and Google, working with AI and developing chatbots and models is nothing new. In fact, many large companies in the IT sector have been working with AI for months and even years. Google, for instance, has spent years developing chatbots and language models. The results of this work are models like BERT and, most advanced of all, LaMDA.
It is also a model that not only made the machine "understand" what the user tells it; it also made it one of the most capable models when it comes to holding a logical and interesting conversation on any topic. So much so that Blake Lemoine, the Google engineer who eventually lost his job over the matter, claimed that LaMDA had feelings. But what exactly is LaMDA and how does it work? Find out below.
What is Google LaMDA and how does it work?
LaMDA stands for Language Model for Dialogue Applications. It was created to give software the ability to communicate better in natural, fluent conversation. It is based on the same Transformer architecture as BERT and GPT-3, but thanks to its training it is able to understand and distinguish nuances in questions and natural conversations of different types.
The open-ended nature of natural conversation means that a dialogue may end up on a completely different topic from the one it started on, even if it initially focuses on a single subject. This behavior confuses most conversational models and chatbots. LaMDA, however, was specially developed and trained to overcome this problem, as Google showed during its I/O event last year.
In a company demonstration at the time, LaMDA was shown engaging naturally in conversation on a randomly chosen topic. Despite a stream of questions, some of them unrelated to the main topic, the model managed to keep the thread of the conversation going.
This model was developed from Google's open-source neural network architecture, Transformer, which is used to understand natural language. Once created, it was trained to find patterns in sentences and correlations between the words used in them, and even to predict the most likely word to appear next in the conversation. LaMDA can do this because, instead of analyzing individual words, it studies datasets made up of dialogues.
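To make this idea concrete, here is a minimal sketch of next-token prediction, the core task just described. LaMDA itself is not publicly available, so GPT-2, another Transformer-based language model, stands in purely for illustration; the dialogue fragment is invented.

```python
# Minimal next-token prediction sketch. GPT-2 stands in for LaMDA,
# which is not publicly available.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# A fragment of dialogue; the model assigns a probability to every
# token in its vocabulary as a candidate for the next word.
context = "User: What is the tallest mountain on Earth?\nBot: The tallest mountain is"
inputs = tokenizer(context, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# Distribution over the next token, given everything seen so far.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode([int(token_id)])!r}  p={float(prob):.3f}")
```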
An AI conversational system is similar to chatbot software, but with some differences. For example, chatbots are trained on limited, specific datasets, so they can only hold a limited conversation based on the exact questions and data they were trained with. LaMDA, by contrast, can have open-ended conversations because it is trained on diverse datasets.
During training, LaMDA learns to detect the nuances of open-ended dialogue and adapt. Depending on the flow of the conversation, it can answer questions on many topics. It therefore enables conversations that are much closer to human interaction than anything ordinary chatbots can achieve.
LaMDA training
According to Google, a two-stage process was used to train LaMDA: pre-training and fine-tuning. In total, the model has 137 billion parameters and was trained on 1.56 trillion words. For the pre-training phase, the Google team created a dataset of 1.56 trillion words drawn from various public web documents.
This dataset was then split into sentences and tokenized into the 2.81 trillion tokens that were initially used to train the model. During this pre-training phase, the model uses scalable, general parallelization to predict the next part of the conversation, based on the previous tokens it has seen.
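As a rough illustration of why 1.56 trillion words can become 2.81 trillion tokens, the sketch below runs a sentence through GPT-2's subword tokenizer. LaMDA used its own vocabulary, so the exact ratio differs, but the splitting behavior is similar in spirit.

```python
from transformers import AutoTokenizer

# GPT-2's tokenizer stands in for LaMDA's own (non-public) vocabulary.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

sentence = "Conversational models are pretrained on enormous dialogue corpora."
tokens = tokenizer.tokenize(sentence)

# Rare or long words are split into several subword pieces, which is
# why a corpus ends up with more tokens than words.
print(len(sentence.split()), "words ->", len(tokens), "tokens")
print(tokens)
```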
Then comes the fine-tuning phase, during which LaMDA is trained to perform generation and classification tasks. Essentially, the LaMDA generator, which predicts the next part of the conversation, produces several relevant candidate responses based on the back-and-forth of the dialogue. LaMDA classifiers then predict quality and safety scores for each possible response the model could give in the conversation.
Candidate answers with a low safety score are filtered out before the answer with the highest score is chosen to continue the conversation. These scores are based on safety, sensibleness, specificity and interestingness. The aim is to provide the most relevant, highest-quality and safest answer possible.
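The following is a schematic sketch of that generate-then-rank loop. The Candidate fields, the scores and the safety threshold are invented placeholders, since LaMDA's actual components are not public; only the filter-then-pick logic mirrors the description above.

```python
from dataclasses import dataclass

SAFETY_THRESHOLD = 0.8  # assumed cutoff, for illustration only

@dataclass
class Candidate:
    text: str
    safety: float           # safety classifier score
    sensibleness: float     # does the answer make sense in context?
    specificity: float      # is it specific to this conversation?
    interestingness: float  # is it insightful or engaging?

    def quality(self) -> float:
        return self.sensibleness + self.specificity + self.interestingness

def pick_response(candidates: list[Candidate]) -> Candidate:
    # 1. Filter out candidates whose safety score is too low.
    safe = [c for c in candidates if c.safety >= SAFETY_THRESHOLD]
    # 2. Of the remaining candidates, keep the highest-quality one.
    return max(safe, key=lambda c: c.quality())

candidates = [
    Candidate("I don't know.", safety=0.99, sensibleness=0.9,
              specificity=0.1, interestingness=0.1),
    Candidate("Mount Everest, at about 8,849 m.", safety=0.97,
              sensibleness=0.9, specificity=0.9, interestingness=0.6),
]
print(pick_response(candidates).text)  # -> the specific, safe answer
```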
LaMDA’s main goals and metrics
To guide the training of the model, Google set several objectives: quality, safety and groundedness. The first is measured by the levels of sensibleness, specificity and interestingness achieved. It is used to ensure that an answer makes sense in the context in which it is given, is specific to the question asked, and provides the kind of information that makes for a better dialogue.
In terms of safety, the model must conform to the standards of responsible artificial intelligence. There is therefore a list of safety objectives used to capture and control the model's behavior, ensuring that the sentences it produces are not biased, inappropriate or erratic.
Finally, what is known as groundedness is used to measure how factual the responses are. Calculated as the percentage of responses containing claims about the external world that can be supported by authoritative sources, it is a metric that lets users of the chat system judge the validity of a response based on the reliability of the sources it uses.
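As a toy illustration, groundedness can be thought of as the share of claim-making responses that can be backed by a source; the data below is invented for the example.

```python
# Each entry records whether a response made a claim about the external
# world and whether a supporting source could be found (invented data).
responses = [
    {"makes_claim": True,  "has_source": True},
    {"makes_claim": True,  "has_source": False},
    {"makes_claim": False, "has_source": False},  # small talk, no claim
]

with_claims = [r for r in responses if r["makes_claim"]]
groundedness = sum(r["has_source"] for r in with_claims) / len(with_claims)
print(f"groundedness: {groundedness:.0%}")  # -> 50%
```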
Model evaluation
Evaluation of the trained model and its behavior in conversation is ongoing. To track progress, the responses produced by the pre-trained model and the fine-tuned model, along with the answers given by human raters, are quantified and reviewed against the quality, safety and groundedness metrics mentioned above.
The LaMDA evaluations so far have led to several conclusions: quality metrics improve with the number of parameters, safety improves with fine-tuning, and groundedness improves as the model size increases.
Possible use of LaMDA
Although the development and refinement of LaMDA is not yet finished, there are already plans to use the model in different situations and use cases: for example, to improve the customer experience in different types of businesses, or to launch chatbots that offer a conversation much closer to the ones humans have. In addition, LaMDA integration into Google Search has a good chance of becoming a reality.
On the other hand, it should be noted that LaMDA is quite likely to ultimately affect SEO. By focusing on language and conversational models, Google hints at its vision for the future of search and points to a change in the way its products are developed. This will also lead to a likely change in the search behavior of Internet users.
The LaMDA model will undoubtedly be key to understanding the questions asked by information seekers. It also underlines the need to ensure that content published on the Internet is optimized for people rather than search engines, and that it is updated regularly so that it evolves and remains relevant over time.
It is possible that in the future, instead of answering a question with a text box with a list of independent sentences, a search engine will generate natural language text offering explanations, facts and links to resources.
Major difficulties and obstacles for LaMDA
As with all AI models, LaMDA has issues and difficulties that need to be addressed. The two main ones have to do with safety and with the groundedness just discussed.
When it comes to safety, the main obstacle for LaMDA is avoiding bias. Since answers can be drawn from anywhere on the Internet, there is a real chance that the responses the model provides will amplify bias, mirroring what is shared online. Therefore, to help ensure that the model does not produce unpredictable or even harmful results, Google has open-sourced the resources used to analyze and train it.
The company thus allows different groups to participate in the creation of the datasets it uses to train the model. This helps identify any bias and minimize the sharing of misinformation or harmful information.
As for groundedness, it is not always easy to verify the reliability of the answers produced by artificial intelligence models, since they draw on resources from all over the web. To overcome this problem, the Google team working on LaMDA allows the model to query various external sources, including information retrieval systems and even a calculator, so that it can provide accurate results.
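A hedged sketch of what such a tool-dispatch loop could look like is shown below; the routing rule and both tool functions are simplified stand-ins, not LaMDA's actual toolset interface.

```python
# Simplified stand-in for routing a request to an external tool
# (a calculator or an information-retrieval system) instead of
# answering from the model's memory.
def calculator(expression: str) -> str:
    # Restricted arithmetic evaluation, safe enough for this demo.
    if not set(expression) <= set("0123456789+-*/(). "):
        raise ValueError("unsupported expression")
    return str(eval(expression))

def retrieve(query: str) -> str:
    # Placeholder for an information-retrieval system that returns a
    # snippet together with its source, so the answer can be verified.
    return f"[snippet about {query!r}] (source: example.org)"

def answer(user_input: str) -> str:
    # Naive routing: arithmetic goes to the calculator, everything
    # else goes to retrieval.
    if any(op in user_input for op in "+-*/"):
        return calculator(user_input)
    return retrieve(user_input)

print(answer("12 * 7"))                   # -> 84, via the calculator
print(answer("height of Mount Everest"))  # -> retrieved snippet + source
```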
The groundedness metric also ensures that the answers the model provides are based on known sources. These sources are shared so that users can verify the results offered, and to prevent false information from being served.
Google is clear that there are advantages and disadvantages to using open dialogue models like LaMDA, which is why it is committed to improving the model's safety and groundedness in order to offer a more reliable and unbiased experience.
In the future we can also expect LaMDA models to be trained on different kinds of data, possibly including images and video, which opens up new possibilities for conversation. We still don't know when all this will become a reality: Google has not yet given specific dates or details of integrations for LaMDA, but everything points to them being part of its future.