Recognizing similes

a work in progress


Victoria Ianeva

Vlad Niculae

With many tHanks to Prof. Patrick Hanks!

Current status

  • Similes have been neglected in favour of metaphor
  • Several works on simile structure, classification, understanding
  • Existing theories weren't made with NLP in mind

Objectives

  • Ask the right questions about similes
    • distinguish similes / comparisons / others
    • identify the parts of a simile
    • analyze a simile (e.g. is it clear? is it creative?)
  • Try to answer computationally
  • Invite others to answer them

Motivation

  • People with ASD have trouble with figurative language (nod to Victoria's talk)
  • Similes have a specific but flexible structure
  • Similes are not "simpler metaphors"!
    • not usually, at least: "Similes often can't be reformulated as metaphors" (Hanks, 2012)

The structure of comparisons

First note that:

  • all similes are comparisons by definition
  • similes and comparisons have the same structure

Syntactic elements (in Hanks, 2012)

  1. Topic
  2. Event or state
  3. Property
  4. Comparator
  5. Vehicle

[He $^1$] [looked $^2$] [like $^4$] [a broiled frog $^5$], [hunched $^3$] over his desk, grinning and satisfied.
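
To make the decomposition concrete, here is a minimal sketch (the Simile container is ours, purely illustrative, not from Hanks) of the example above as data:

from collections import namedtuple

# Hypothetical container for Hanks's five elements; any element may be
# absent in a given sentence (here, P surfaces as a trailing participle).
Simile = namedtuple('Simile', 'topic event property comparator vehicle')

example = Simile(topic='He', event='looked', property='hunched',
                 comparator='like', vehicle='a broiled frog')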

Other points of view

  • Fishelov (1993) identifies $T$, $P$, $C$, $V$, overlooks $E$
  • Fishelov (2007) studies simile comprehension with toy phrases (fixed human $T$, $\pm P$, variable $V$)
  • Veale (2010) mines the "as $\alpha$ as $X$" pattern to learn conventional properties (a rough sketch of the pattern below)
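
As a flavour of this kind of pattern mining, a rough, illustrative regular expression (not Veale's actual system, which works against web-scale text):

import re

# "as <property> as (a|an|the) <vehicle>"; the article is optional
AS_AS = re.compile(r"\bas\s+(\w+)\s+as\s+(?:(?:a|an|the)\s+)?(\w+)", re.IGNORECASE)

for prop, vehicle in AS_AS.findall("He was as busy as a bee and as quiet as a mouse."):
    print("%s -> %s" % (prop, vehicle))  # busy -> bee, quiet -> mouse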

What's missing

  • The focus is on questionable types of simile
  • We'd prefer corpus evidence to know what the real problems are.
  • Hanks (2012):
    • Similes are used to introduce new things
    • The vehicle is rarely something experienced
  • It's clearly not enough to handle conventional similes and salient traits!

What else is missing

  • The importance of the event or state marker $E$ is underestimated
  • We argue it is the core of the simile:
    • Syntactically, all elements orbit around it
  • A specific $E$ seems to make the simile easier to understand and gives hints for guessing $P$:

My coffee is like rain-soaked cigars.

versus

Help! My coffee tastes like rain-soaked cigars.

  • Idea: To simplify a simile, make $E$ more specific.

GLARF

  • Grammatical and Logical Argument Representation Framework
  • Basically a parse tree enricher
  • Meyers et al. (2001) at NYU; software released late 2011
  • We hope to avoid it eventually, but it's good for bootstrapping

pyglarf

  • Developed by Vlad last summer during an FBK internship (see slides)
  • Available on GitHub
  • Lets you call GLARF and play around with the output trees easily:
In [56]:
from pyglarf import GlarfWrapper, GlarfTree

text = "My coffee tastes like rain-soaked cigars."
with GlarfWrapper() as glarf:
    sent, parsed, glarfed = glarf.make_sentences(text)

print parsed[0]  # index 0, since there is only one sentence
(S1 (S (NP (PRP$ My) (NN coffee)) (VP (VBZ tastes) (PP (IN like) (NP (JJ rain-soaked) (NNS cigars)))) (. .)))
In [55]:
print glarfed[0][:512] + '...'  # And here is a part of the corresponding GLARF tree
((S
  (SBJ
   (NP (T-POS (PRP$ My 0)) (HEAD (NN coffee 1)) (PTB2-POINTER |0+1|)
    (INDEX 3)))
  (PRD
   (VP
    (HEAD
     (VG (HEAD (VBZ tastes 2)) (P-ARG0 (NP (EC-TYPE PB) (INDEX 3))) (INDEX 9)
      (BASE TASTE) (VERB-SENSE 1) (SENSE-NAME "USE ONE'S TASTEBUDS")))
    (ADV
     (PP (HEAD (IN like 3))
      (OBJ
       (NP
        (S-A-POS
         (S (I-SBJ (NP (EC-TYPE REL) (INDEX 5)))
          (PRD
           (VP (S-STEM (NX (HEAD (NN rain 4)) (INDEX 7)))
            (PUNCTUATION (HYPH - 5))
        ...

Extract relations

  • Triggered by verbs and nominalizations
  • Carry around all the information we can
In [57]:
tree = GlarfTree.glarf_parse(glarfed[0])
for rel in tree.rels():
    print rel
TASTE/2 [CATEGORY: VG, INDEX: 9, SENSE-NAME: "USE ONE'S TASTEBUDS", VERB-SENSE: 1, PARENT_CATEGORY: VP]
P-ARG0 [SBJ NP INDEX: 3]:  My/0 coffee/1 (NP+T-POS 0-1)
ADV [PP INDEX: None]: like/3 rain/4 soaked/6 cigars/7 (HEAD+IN 3-3, OBJ+NP 4-7)

SOAK/6 [CATEGORY: VG, INDEX: 10, SENSE-NAME: "ABSORB", PARENT_CATEGORY: VP, VERB-SENSE: 2, VOICE: PASSIVE]
P-ARG1 [HEAD NX INDEX: 5]:  cigars/7 (NX+HEAD 7-7)
P-ARG2 [S-STEM NX INDEX: 7]:  rain/4 (NX+HEAD 4-4)

  • And with some custom code (not shown in the slides; available if you right-click and view source), we can match all comparisons that use "like" (a rough sketch of the idea follows the output):
In [51]:
for comparison in find_comparison_nodes(tree):
    pprint(get_args(*comparison))
{'comparator': 'like*',
 'state': 'TASTE',
 'theme': 'My coffee',
 'vehicle': 'like rain soaked cigars'}
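
Our actual matcher is more involved, but as a very rough sketch of the idea, assuming only that relation objects stringify the way they print above:

# like_relations is a hypothetical simplification of find_comparison_nodes:
# keep relations that govern a "like" prepositional (ADV PP) argument.
def like_relations(tree):
    for rel in tree.rels():
        text = str(rel)
        if 'ADV [PP' in text and 'like/' in text:
            yield rel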

What now?

  • We developed this on the 20 examples of similes and comparisons from Hanks (2005)
  • Build a comparison dataset, with as much variation as possible!
  • More patterns to identify comparison nodes
  • Move away from GLARF, try to use only a dependency parser (see the sketch below)
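
To give a flavour of the GLARF-free direction, here is a hedged sketch with an off-the-shelf dependency parser (spaCy, purely as an example; none of this is implemented yet):

import spacy

nlp = spacy.load("en_core_web_sm")

def like_comparisons(text):
    # "like" attached to a verb as a preposition is the comparator C;
    # the verb is E, its subject is T, the prepositional object is V.
    for tok in nlp(text):
        if tok.lower_ == "like" and tok.dep_ == "prep" and tok.head.pos_ == "VERB":
            event = tok.head
            topic = [w for w in event.lefts if w.dep_.startswith("nsubj")]
            vehicle = [w for w in tok.rights if w.dep_ == "pobj"]
            yield topic, event, vehicle

for t, e, v in like_comparisons("My coffee tastes like rain-soaked cigars."):
    print(t, e, v)  # [coffee] tastes [cigars]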

What comes after?

  • $P$ is trickier: find it if it appears!
  • After identification comes categorization
  • Machine learning and word similarity to decide whether it's a simile
  • Corpus evidence to decide conventional vs. creative
  • Fill in the blanks!

This presentation (and this presenter) is powered by: