
ChatGPT failed accounting exams

  • April 24, 2023


Last month, OpenAI introduced its latest artificial intelligence chatbot, GPT-4. The company claims that the bot, which uses machine learning to generate natural-language text, performs exceptionally well on a variety of exams. Most notably, it scored in the 90th percentile on the bar exam, passed 13 of the 15 AP exams it took, and earned a near-perfect score on the verbal section of the GRE.

Researchers from Brigham Young University and 186 other institutions wanted to know how well OpenAI's technology would perform on accounting exams, so they put the original ChatGPT model to the test. The researchers concluded that while ChatGPT still needs improvement in accounting, it has the potential to transform how teaching and learning are done.

“When this technology first came out, everyone was worried that students would now use it to cheat,” said lead study author David Wood, a BYU accounting professor. “But there have always been ways to cheat. So instead, we’re trying to focus on what we can do with this technology that we couldn’t do before to improve the teaching process for faculty and the learning process for students. Testing it was an eye-opener.”

Since its debut in November 2022, ChatGPT has grown to 100 million users in less than two months, making it the fastest-growing technology platform to date. In response to the heated debate over how models like ChatGPT should affect education, Wood decided to recruit as many professors as possible to see how the AI would stack up against university accounting students.

The call for contributors spread rapidly on social media: 327 participants from 186 institutions in 14 countries took part, submitting 25,181 accounting exam questions. The team also recruited BYU undergraduates (including Wood’s daughter, Jessica) to feed ChatGPT an additional 2,268 questions from a textbook test bank. The questions covered accounting information systems (AIS), auditing, financial accounting, managerial accounting, and taxation, and varied in difficulty and type (true/false, multiple choice, short answer, etc.).

While ChatGPT’s performance was impressive, the students did better: they scored an overall average of 76.7%, compared with ChatGPT’s 47.4%. ChatGPT did beat the student average on 11.3% of the questions, performing especially well on AIS and auditing. But the AI bot fared worse on tax, financial, and managerial assessments, possibly because ChatGPT struggled with the mathematical processes those question types require.

In terms of question type, ChatGPT did better on true/false questions (68.7% correct) and multiple-choice questions (59.5%), but struggled with short-answer questions (scoring between 28.7% and 39.1%). Overall, ChatGPT had a hard time with higher-order questions. Sometimes it even provided authoritative-sounding written explanations for wrong answers, or answered the same question in different ways when asked again.

“It’s not perfect; you’re not going to use it for everything,” said Jessica Wood, a freshman at BYU. “Trying to learn solely with ChatGPT is foolish.”

The researchers also discovered some other exciting trends during the study, including:

  • ChatGPT doesn’t always recognize when it is doing math, and it makes nonsensical errors, such as adding two numbers in a subtraction problem or dividing numbers incorrectly.
  • ChatGPT often provides explanations for its answers even when they are incorrect. In other cases, ChatGPT’s explanation is correct, but it then selects the wrong multiple-choice answer.
  • ChatGPT sometimes makes up facts. For example, when asked for a reference, it generates a plausible-looking citation that is entirely fabricated: the cited work, and sometimes even the authors, do not exist.

Even so, the authors fully expect GPT-4 to improve markedly on the accounting questions and on the issues raised in their work. What they find most promising is how the chatbot can help improve teaching and learning, including using it to design and test assignments, or perhaps to draft parts of a project.

“It’s an opportunity to consider whether we’re teaching knowledge that adds value,” said Melissa Larson, a co-author of the study and BYU accounting professor. “This is a disruption, and we need to evaluate where we go from here. Of course, I’m still going to have TAs, but this will force us to use them in different ways.”

Source: Port Altele
