Brighter than Gold: Figurative Language in User Generated Comparisons

Vlad Niculae and Cristian Danescu-Niculescu-Mizil
EMNLP 2014 paper.

When you spend a large part of your life underground, you develop a very literal mind. Dwarfs have no use for metaphor and simile. Rocks are hard, the darkness is dark. Start messing around with descriptions like that and you’re in big trouble, is their motto.
—Terry Pratchett, Guards! Guards!

TL;DR

Unlike dwarfs, we humans love trouble, so we study figurative comparisons (similes) in the wild, in Amazon product reviews. We make available an annotated dataset. We detect figurativeness with high accuracy using linguistic features. This puts us in the novel position of being able to investigate how figurative language use interacts with social context: we show strong relationships between figurative language use and both review rating and helpfulness.

What are comparisons and similes?

Comparisons are phrases that express the likeness of two things. They are useful for communicating something potentially new, helping the audience picture it better and frame it better in relation to something known.

Often, comparisons are not meant to be taken literally. Figurative comparisons form an important figure of speech called simile. The difference can be seen in the following two examples, paraphrased from Amazon reviews; the first is literal and the second is figurative:

Sterling is much cheaper than gold.
Her voice makes this song shine brighter than gold.

There is no simple way to automatically tell whether a comparison is literal or figurative. The difference between the two is sometimes subtle and subjective, to the point that even humans find the tricky cases difficult and sometimes disagree about them. Using linguistic and domain-specific cues, we manage to get within 10% of human performance on our data.

Why do they matter?

People like similes, as shown by the popularity of the best ones on Goodreads! But figurative language is not at all restricted to literature and poetic language. It turns out people use it a lot when describing stuff. We find that about 30% of the comparisons in Amazon reviews are figurative.

The use of similes is strongly related to extreme opinion in terms of review ratings.

Also, the comparisons in reviews found helpful by more people tend to be more literal.

Dataset

Download the dataset (contains this readme).

This dataset contains a collection of 1400 comparisons annotated for figurativeness, together with the context in which they appeared. Most of the comparisons are extracted from Amazon.com product reviews (1260 comparisons); the rest come from the general web (140 comparisons).
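As a quick sanity check on the numbers above, here is a minimal sketch that works out how the dataset splits between its two sources. Only the counts come from the description; the names `counts` and `composition` are made up for illustration and say nothing about how the released files are organized.

```python
# Counts as stated in the dataset description; everything else here is
# a hypothetical helper, not part of the released dataset.
AMAZON_COMPARISONS = 1260
WEB_COMPARISONS = 140


def composition(counts):
    """Return each source's share of the total, as a fraction."""
    total = sum(counts.values())
    return {source: n / total for source, n in counts.items()}


counts = {"amazon_reviews": AMAZON_COMPARISONS, "general_web": WEB_COMPARISONS}
total = sum(counts.values())
shares = composition(counts)

for source, share in shares.items():
    print(f"{source}: {counts[source]} of {total} ({share:.0%})")
```

So the two sources add up to the stated 1400 comparisons, with nine in ten coming from Amazon reviews.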

Paper

Download the PDF.

BibTeX entry:

@inproceedings{niculae14brighter,
  author    = {Vlad Niculae and Cristian Danescu-Niculescu-Mizil},
  title     = {{Brighter than gold: Figurative language in user generated comparisons}},
  booktitle = {Proceedings of EMNLP},
  month     = {October},
  year      = {2014},
}

Abstract. Comparisons are common linguistic devices used to indicate the likeness of two things. Often, this likeness is not meant in the literal sense—for example, "I slept like a log" does not imply that logs actually sleep. In this paper we propose a computational study of figurative comparisons, or similes. Our starting point is a new large dataset of comparisons extracted from product reviews and annotated for figurativeness. We use this dataset to characterize figurative language in naturally occurring comparisons and reveal linguistic patterns indicative of this phenomenon. We operationalize these insights and apply them to a new task with high relevance to text understanding: distinguishing between figurative and literal comparisons. Finally, we apply this framework to explore the social context in which figurative language is produced, showing that similes are more likely to accompany opinions showing extreme sentiment, and that they are uncommon in reviews deemed helpful.