Large language models for aspect-based sentiment analysis

A finetuned GPT-3.5 Turbo model achieves state-of-the-art performance in aspect-based sentiment analysis (ABSA). Zero-shot and few-shot settings with GPT-4 and GPT-3.5 reach decent performance too.

The big picture: In August, OpenAI announced fine-tuning for GPT-3.5 Turbo. Fine-tuning enables the general model to be optimized for a specific task. My colleague Paavo Huoviala and me tested the performance of a fine-tuned GPT-3.5 Turbo on the SemEval 2014 Task 4 joint aspect term extraction and polarity classification task. We found that the model achieves state-of-the-art performance. However, this comes at the price of 1000 times more model parameters and thus increased inference cost. We also tested zero-shot and few-shot settings with GPT-4 and GPT-3.5. These models reach decent performance too, without requiring training data.

Learn more: My colleague Paavo Huoviala and me recently published an article on arXiv. The related code is available on Github.

Aspect-based sentiment analysis

In contrast to regular sentiment analysis that assigns one polarity label to an entire text, aspect-based sentiment analysis (ABSA) aims to identify the polarity of individual aspects of a text. For example, in the sentence “The food was great, but the service was terrible”, ABSA would identify the aspect “food” as positive and “service” as negative.

Implications for practitioners

Fine-tuning GPT-3.5 isn’t difficult or expensive. In this case, it cost less than $30 to fine-tune on 5572 training examples.
Fine-tuned large language models (LLMs) can achieve better performance in classic NLP tasks than smaller transformer models like RoBERTa.
A fine-tuned model doesn’t seem to benefit from prompt engineering. This reduces the number of input tokens and thus inference cost.
For ad-hoc projects, acceptable performance can be reached with just a few examples. After the proof of concept, more examples can be collected with help from the model.