Story Datasets

| Dataset | Papers |
| --- | --- |
| FanFiction Archive | Beyond Canonical Texts: A Computational Analysis of Fanfiction; Harry Potter and the Action Prediction Challenge from Natural Language |
| Deep Dungeons and Dragons (DDD) Corpus | Deep Dungeons and Dragons: Learning Character-Action Interactions from Role-Playing Game Transcripts |
| ROCStories - 5-sentence crowdsourced stories for Story Cloze Test | A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories; LSDSem 2017 Shared Task: The Story Cloze Test |
| CaTeRS - causal and temporal relations using ROCStories | CaTeRS: Causal and Temporal Relation Scheme for Semantic Annotation of Event Structures |
| Scifi TV Plots - science fiction shows | Story Realization: Expanding Plot Events into Sentences |
| WritingPrompts - r/WritingPrompts | Hierarchical Neural Story Generation |
| LitBank - Project Gutenberg | An Annotated Dataset of Literary Entities; Literary Event Detection |
| STORIUM - gamified storytelling | STORIUM: A Dataset and Evaluation Platform for Machine-in-the-Loop Story Generation |
| ESTER - tagged events from news articles from the TempEval3 (TE3) workshop | ESTER: A Machine Reading Comprehension Dataset for Event Semantic Relation Reasoning |
| CMU Movie Summary Corpus - Wikipedia movie summaries | Learning Latent Personas of Film Characters |
| The Children’s Book Test - kids' books from Project Gutenberg | The Goldilocks Principle: Reading Children’s Books with Explicit Memory Representations |
| Cornell Movie Dialog - movie scripts and metadata | Chameleons in Imagined Conversations: A New Approach to Understanding Coordination of Linguistic Style in Dialogs |
| ScriptWriter - descriptions of movie plots from GraphMovie, which no longer exists | ScriptWriter: Narrative-Guided Script Generation |
| NarrativeQA - movie scripts from various sources and Project Gutenberg books | The NarrativeQA Reading Comprehension Challenge |
| MCTest - 150-300 word stories written by crowdworkers | MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text |
| InSentive - authored stories from BookCorpus | Inspiration through Observation: Demonstrating the Influence of Automatically Generated Text on Creative Writing |
| dScryb - human-written scene descriptions | |

Mixed Visual + Textual Datasets

| Dataset | Papers |
| --- | --- |
| BookCorpus | Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books; Skip-thought vectors |
| COIN | COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis |
| WikiHow | WikiHow: A Large Scale Text Summarization Dataset |
| VIST | Visual Storytelling |
| MovieGraphs | MovieGraphs: Towards Understanding Human-Centric Situations from Videos |

Cloze Tests

| Dataset | Papers |
| --- | --- |
| Narrative Cloze Test | Unsupervised Learning of Narrative Event Chains |
| Story Cloze Test | A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories; LSDSem 2017 Shared Task: The Story Cloze Test |
| BookTest - Cloze Test using Project Gutenberg | Embracing Data Abundance |
| Who-did-What - Cloze Test using LDC English Gigaword newswire corpus | Who did What: A Large-Scale Person-Centered Cloze Dataset |

Data Scrapers & Processors

IF Environments

| Dataset | Papers |
| --- | --- |
| LIGHT | Learning to Speak and Act in a Fantasy Text Adventure Game |
| Jericho | Modeling Worlds in Text and Interactive Fiction Games: A Colossal Adventure |
| TextWorld | TextWorld: A Learning Environment for Text-based Games |

Planning Systems

| Planner/Code | Papers |
| --- | --- |
| Glaive | Glaive: a state-space narrative planner supporting intentionality and conflict |
| StoryAssembler | StoryAssembler: An Engine for Generating Dynamic Choice-Driven Narratives |
| Belief and Intentional PDDL | Using Domain Compilation to Add Belief to Narrative Planners |
| STRIPS Planner for Python | |

Character Modeling

Knowledge Bases

| Knowledge Base | Papers |
| --- | --- |
| VerbNet | VerbNet: A Broad-Coverage, Comprehensive Verb Lexicon |
| FrameNet | FrameNet II: Extended Theory and Practice |
| WordNet | WordNet: An Electronic Lexical Database |
| ConceptNet | ConceptNet 5.5: An Open Multilingual Graph of General Knowledge |
| ATOMIC | ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning |
| GLUCOSE | GLUCOSE: GeneraLized and COntextualized Story Explanations |
| Power and Agency in modern films | Connotation Frames of Power and Agency in Modern Films |
| Social Bias Frames | Social Bias Frames: Reasoning about Social and Power Implications of Language |
| Eraser - Movie Rationales | ERASER: A Benchmark to Evaluate Rationalized NLP Models |
| The NOC List | Round Up The Usual Suspects: Knowledge-Based Metaphor Generation |
| NULEX - combines WordNet, VerbNet, and Wiktionary | NULEX: An Open-License Broad Coverage Lexicon |
| CausalBank | Guided Generation of Cause and Effect |

Other Code

| Code | Papers |
| --- | --- |
| Plot-guided Coherence Evaluation | Plot-guided Adversarial Example Construction for Evaluating Open-domain Story Generation |
| Story Gen BART | Content Planning for Neural Story Generation with Aristotelian Rescoring |
| EnGen | Neural text generation in stories using entity representations as context |
| Choose Your Own Adventure Evaluation | Choose Your Own Adventure: Paired Suggestions in Collaborative Writing for Evaluating Story Generation Models |
| Sentence Mover's Similarity | Sentence Mover’s Similarity: Automatic Evaluation for Multi-Sentence Texts |
| AI Dungeon 2 | |
| COMET - uses ATOMIC and ConceptNet | COMET: Commonsense Transformers for Automatic Knowledge Graph Construction |
| Plan-And-Write | Plan-and-Write: Towards Better Automatic Storytelling |
| ASTER (Automated Story-Telling using Event Representations) | Event Representations for Automated Story Generation with Deep Neural Nets |

Libraries & Toolkits



Notable IF Games in Research


Other Courses

Content Generation for TRPGs and IF

Other Languages for Writing Interactive Fiction