Using AI in Business

Comparison of AI Language Models: ChatGPT Leads, Other Models Catch Up

Benchmarks can barely keep up with the enormous leaps in the development of large language models (LLMs). Many people have already tested ChatGPT, Aleph Alpha and similar tools and were impressed by the linguistic quality of the results. However, the question remains as to what added value these language models offer in business use and which ones can best be integrated into corporate IT solutions. The AI experts at Lufthansa Industry Solutions (LHIND) examined the business value of various LLMs using concrete examples. The focus was on how well the models analyze content, recognize and categorize the required information, and present it in bundled form.

Norderstedt, February 2, 2024 - The topic of artificial intelligence (AI) has arrived in German companies. According to a recent survey by the digital industry association Bitkom, only 15 percent of respondents are using AI in their companies, but more than two-thirds (68 percent) consider AI to be the most important technology of the future. Generative AI, however, is used by only 2 percent of companies, and more than half of those surveyed see little benefit in it.

"LLMs can make a significant contribution to analyzing large volumes of unstructured data and gaining valuable insights. Companies can use these models, for example, to analyze customer feedback, process company-specific documents in a precise and understandable way for employees, or even develop predictive models," says Lasse Neumann, IT Consultant at LHIND. "However, selecting the right language model and adapting it to the respective requirements is crucial."

For this reason, the AI experts at LHIND regularly test the most important LLMs for their suitability in practice. The models are confronted with tasks from selected areas that come close to the requirements of various use cases from operational practice.

The LLMs tested are:

  • PaLM
  • GPT 3.5
  • GPT 4
  • Llama 2 (70B)
  • Aleph Alpha (Luminous-supreme control)
  • Falcon (180B)
  • Claude V2

The Method

The LLMs were tested on a data set relevant to German customers, using different tasks to determine how well they analyze content and recognize, categorize and bundle the required information. Unlike previous benchmarks, the tests were conducted in German. The assessment was based on ten sample content items from different knowledge areas, including current news. The tasks were grouped into categories that play a significant role in business practice:

  • Recognition of named entities
  • Summary in 3-4 sentences
  • Q&A (answerable questions)
  • Q&A (non-answerable questions)

Performance in each task was scored on a scale of 1 (incomplete, incorrect) to 5 (complete, perfect). The overall result for each LLM was determined by combining the individual scores.
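The release does not say how the individual scores were combined. As a rough illustration only, the sketch below assumes an unweighted average: ratings are averaged per task, and the four task scores are then averaged into one overall score. The task keys, the function names and the ratings are hypothetical placeholders and are not LHIND's data or method.

```python
from statistics import mean

# Task categories from the list above; the key names are illustrative.
TASKS = ("named_entities", "summary", "qa_answerable", "qa_unanswerable")


def task_score(ratings):
    """Average the per-item ratings for one task after checking the 1-5 scale."""
    for rating in ratings:
        if not 1 <= rating <= 5:
            raise ValueError(f"rating {rating} is outside the 1-5 scale")
    return mean(ratings)


def overall_score(ratings_by_task):
    """Combine the task scores with an unweighted mean (an assumption)."""
    return round(mean(task_score(ratings_by_task[task]) for task in TASKS), 2)


# Hypothetical placeholder ratings, NOT results from the LHIND test.
example_ratings = {
    "named_entities": [3, 4],
    "summary": [5, 4],
    "qa_answerable": [5, 5],
    "qa_unanswerable": [4, 4],
}

print(overall_score(example_ratings))  # 4.25
```

A weighted average, for example one that gives more weight to named entity recognition, would only require changing overall_score.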

The Results

OpenAI's GPT models performed best in all tasks, with overall scores of 4.23 (GPT 3.5) and 4.52 (GPT 4). The other models scored between 3.03 (Aleph Alpha) and 3.67 (PaLM). The models performed particularly well in the Summary and Q&A categories. GPT 3.5 showed weaknesses in named entity recognition with a score of 3.37, while PaLM (3.74) and GPT 4 (3.83) achieved the best scores of all models in this category.

"GPT's excellent results are not surprising, as the model has been continuously trained for a year on the open market with millions of users," said Soniya Prasad, IT Consultant at LHIND. "However, compared to other test results, we can see that other language models are catching up with GPT."

About Lufthansa Industry Solutions

Lufthansa Industry Solutions is a service provider for IT consulting and system integration. This Lufthansa subsidiary helps its clients with the digital transformation of their companies. Its customer base includes companies both within and outside the Lufthansa Group, as well as more than 300 companies in various lines of business. The company is based in Norderstedt and employs more than 2,500 members of staff at several branch offices in Germany, Albania, Switzerland and the USA.