Diplomacy Betrayal Dataset
==========================

This dataset contains a collection of interaction sequences between allies in
online Diplomacy [1] games. A sequence consists of consecutive game seasons
during which the two players exchange messages and help each other in the game.
Half of the sequences end with betrayal, while the other half are part of
lasting friendships.

URL:     http://vene.ro/betrayal
License: Open Data Commons Attribution (ODC-By 1.0)
         Summary: http://opendatacommons.org/licenses/by/summary/
         See LICENSE.txt for more details.
Authors: Vlad Niculae
         Srijan Kumar
         Jordan Boyd-Graber
         Cristian Danescu-Niculescu-Mizil
Version: 1.0 (02/23/2016)

The dataset is further described in our paper:

    Vlad Niculae, Srijan Kumar, Jordan Boyd-Graber and Cristian
    Danescu-Niculescu-Mizil. Linguistic harbingers of betrayal: A case study
    on an online strategy game. In: Proceedings of ACL, 2015.

Description
-----------

Diplomacy [1] is a popular and engaging strategic board game that is often
played online [2, 3]. It is based heavily on communication between the players.
Due to its military domination setting, Diplomacy is a well suited environment
for studying naturally occurring betrayal and deception.

From a collection of Diplomacy game logs, we identified and extracted *ongoing,
established, and reciprocal* friendships: relationships that contain at least
two consecutive and reciprocated acts of support that span at least three
seasons in game time, with no more than five seasons passing between two acts
of friendship.

We then identified 250 *betrayals*: the subset of friendships described above
that are followed by at least two attacks. To match each betrayal, we selected
a friendship that is not followed by any offensive action, but is otherwise
nearly identical (in terms of length and relative time within the game).

The current dataset consists of these selected betrayals and friendships only.
Each relationship contains a sequence of seasons.
Within each season, we provide features extracted from the messages sent by
each player. More details about the message representation are available in
the "Data format" section of this README.

Files
-----

  * diplomacy_data.json - a JSON file containing the dataset;
  * imbalance_plot.py - a sample Python script to reproduce one of the images
    from the website (http://vene.ro/betrayal/imba.png);
  * README.txt - this readme;
  * LICENSE.txt - full text of the ODC-BY-1.0 license.

If redistributing any subset of the dataset, please include the README.txt and
LICENSE.txt files.

Data format
-----------

The dataset is a UTF-8 encoded JSON file:

    >>> import json
    >>> from io import open
    >>> with open("diplomacy_data.json", "r", encoding="utf-8") as f:
    ...     diplomacy = json.load(f)
    ...

It is structured as a list of dictionaries, one for each of the 500 sequences.

    >>> len(diplomacy)
    500

This is an example of one such entry, with the fields explained:

    >>> entry = diplomacy[0]
    >>> entry
    {
        'idx': 0,          # unique identifier of the dataset entry
        'game': 74,        # unique identifier of the game it comes from
        'betrayal': True,  # whether the friendship ended in betrayal
        'people': u'AT',   # the countries represented by the two players
                           # (in this case, Austria and Turkey)
        'seasons': ...
    }

The 'seasons' field is again a list of dictionaries, one for each game season
in the friendship sequence. In the example below, there are 8 seasons, each
identified by the game year. Decimal notation is used to denote the season
within each year. For example, 1906.0 is the spring of 1906 and 1906.5 is the
fall of 1906. Each season is also marked with the interaction between the two
players at the end of the discussion: whether each player supported the other
('support'), attacked the other ('attack'), or had no explicit military
interaction (null in the JSON file, loaded as None in Python).
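
The decimal season notation and the per-player interaction labels described
above could be decoded along the lines of the following minimal sketch. The
helper names decode_season and describe_season are hypothetical (they are not
part of the dataset or of imbalance_plot.py), and the sketch assumes that only
the two fractions documented in this README (.0 and .5) occur.

```python
def decode_season(season):
    """Split a decimal season such as 1906.5 into (year, season_name).

    Assumption: only .0 (spring) and .5 (fall) occur, as documented in
    this README; any other fraction raises ValueError.
    """
    year = int(season)
    fraction = season - year
    if fraction == 0.0:
        return year, "spring"
    if fraction == 0.5:
        return year, "fall"
    raise ValueError("unexpected season value: {!r}".format(season))


def describe_season(season_record):
    """Return a one-line summary of one element of entry['seasons'].

    A JSON null interaction is loaded as None and is printed as 'None'.
    """
    year, name = decode_season(season_record["season"])
    interaction = season_record["interaction"]
    return "{} {}: victim={}, betrayer={}".format(
        name, year, interaction["victim"], interaction["betrayer"]
    )
```

Applied to a season record with 'season': 1906.5 in which both players are
marked 'support', describe_season would be expected to return
'fall 1906: victim=support, betrayer=support'.
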

    >>> seasons = entry['seasons']
    >>> len(seasons)
    8
    >>> seasons[0]
    {
        'season': 1906.5,  # fall of the year 1906 (game time)
        'interaction': {
            'victim': u'support',   # the victim supported the betrayer
            'betrayer': u'support'  # the betrayer supported the victim
        },
        'messages': {
            'victim': ...,
            'betrayer': ...
        }
    }

The ['messages']['victim'] and ['messages']['betrayer'] fields are lists of
features of each message sent by the victim to the betrayer, and by the
betrayer to the victim, respectively:

    >>> msgs = seasons[0]['messages']['betrayer']
    >>> len(msgs)
    6
    >>> msgs[0]
    {
        "n_words": 146,        # number of words in the message
        "n_sentences": 9,      # number of sentences in the message
        "n_requests": 7,       # number of request sentences
        "politeness": 0.8320,  # politeness of the requests (from 0 to 1)
                               # (using the Stanford Politeness
                               #  Classifier available at [4])
        "sentiment": {
            "positive": 1,     # no. sentences with positive sentiment
            "neutral": 3,      # no. sentences with neutral sentiment
            "negative": 5      # no. sentences with negative sentiment
        },                     # (using Stanford Sentiment Analysis [5])
        "lexicon_words": {         # words and phrases matching several
            "disc_expansion": [    # linguistic and psycholinguistic lexicons
                "until",           # (see below for details)
                "yet",
                "instead"
            ],
            "premise": [
                "for",
                "for"
            ],
            ...
        },
        "frequent_words": [    # frequent words in the message
            "more",            # (occurring in at least 50 messages
            "let",             #  and 5 friendships overall)
            "keep",
            ...
        ]
    }

The words in each list are in random order. The order of messages within a
season is also randomized. This measure is in place to protect the privacy of
the players and of their conversations.

The lexicons used to construct the "lexicon_words" field are:

  * 'claim', 'premise': Argumentation structure markers [6]
  * 'allsubj': Subjective markers [7]
  * 'disc_*': Discourse markers from the Penn Discourse Treebank.
    [8] Includes 'disc_comparison', 'disc_expansion', 'disc_contingency',
    'disc_temporal_future' and 'disc_temporal_rest' (we manually split
    'temporal' from PDT into 'temporal_future' and 'temporal_rest' to
    capture planning).

References
----------

[1] https://en.wikipedia.org/wiki/Diplomacy_%28game%29
[2] http://www.floc.net/dpjudge/
[3] http://usak.asciiking.com/
[4] http://politeness.mpi-sws.org/
[5] http://nlp.stanford.edu/sentiment/
[6] C. Stab and I. Gurevych. Identifying Argumentative Discourse Structures
    in Persuasive Essays. In: Proceedings of EMNLP, 2014.
    https://www.ukp.tu-darmstadt.de/data/argumentation-mining/
[7] E. Riloff and J. Wiebe. Learning extraction patterns for subjective
    expressions. In: Proceedings of EMNLP, 2003.
    http://www.anthology.aclweb.org/W/W03/W03-1014.pdf
[8] https://www.seas.upenn.edu/~pdtb/
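
Example: aggregating message features
-------------------------------------

The nested layout described in the "Data format" section (entries, then
seasons, then per-player message lists) can be traversed as in the following
minimal sketch, which averages the "politeness" feature per player role,
separately for betrayals and lasting friendships. The function name
mean_politeness is hypothetical and is not part of imbalance_plot.py. Skipping
messages without a "politeness" value is an assumption made for robustness,
not something this README states about the data.

```python
from collections import defaultdict


def mean_politeness(entries):
    """Average message politeness per (betrayal, role) pair.

    `entries` is the list loaded from diplomacy_data.json. Returns a dict
    that maps (betrayal, role) to the mean of the 'politeness' feature
    over all messages sent by that role in the matching sequences.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for entry in entries:
        for season in entry["seasons"]:
            for role in ("victim", "betrayer"):
                for message in season["messages"][role]:
                    # Assumption: tolerate messages lacking this feature.
                    politeness = message.get("politeness")
                    if politeness is None:
                        continue
                    key = (entry["betrayal"], role)
                    totals[key] += politeness
                    counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}
```

With the dataset loaded as shown in the "Data format" section,
mean_politeness(diplomacy) would return a dictionary with keys such as
(True, 'betrayer') and (False, 'victim').
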