
Resources

Story Datasets

Dataset Papers Paper Code Hugging Face Link Leaderboard
FanFiction Archive - fanfiction.net Beyond Canonical Texts: A Computational Analysis of Fanfiction and Harry Potter and the Action Prediction Challenge from Natural Language http://github.com/smilli/fanfiction
Deep Dungeons and Dragons (DDD) Corpus - roleplayerguild.com Deep Dungeons and Dragons: Learning Character-Action Interactions from Role-Playing Game Transcripts
ROCStories - 5-sentence crowdsourced stories for Story Cloze Test A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories and LSDSem 2017 Shared Task: The Story Cloze Test https://competitions.codalab.org/competitions/15333
CaTeRS - Causal and temporal relations using ROC Stories CaTeRS: Causal and Temporal Relation Scheme for Semantic Annotation of Event Structures
Scifi TV Plots - science fiction shows on fandom.com Story Realization: Expanding Plot Events into Sentences https://github.com/rajammanabrolu/StoryRealization https://huggingface.co/datasets/lara-martin/Scifi_TV_Shows
WritingPrompts - r/WritingPrompts Hierarchical Neural Story Generation https://github.com/pytorch/fairseq https://huggingface.co/datasets/rewardsignal/reddit_writing_prompts
Lit Bank - Project Gutenberg An Annotated Dataset of Literary Entities and Literary Event Detection https://github.com/dbamman/litbank
STORIUM - storium.com (gamified storytelling) STORIUM: A Dataset and Evaluation Platform for Machine-in-the-Loop Story Generation https://github.com/dojoteef/storium-gpt2
ESTER - events tagged in news articles from the TempEval3 (TE3) workshop ESTER: A Machine Reading Comprehension Dataset for Event Semantic Relation Reasoning https://github.com/PlusLabNLP/ESTER https://eventqa.github.io/
CMU Movie Summary Corpus - Wikipedia movie summaries Learning Latent Personas of Film Characters
The Children’s Book Test - kids' books from Project Gutenberg The Goldilocks Principle: Reading Children’s Books with Explicit Memory Representations https://github.com/facebookarchive/bAbI-tasks
Cornell Movie Dialog - movie scripts and metadata Chameleons in Imagined Conversations: A New Approach to Understanding Coordination of Linguistic Style in Dialogs https://convokit.cornell.edu/documentation/movie.html https://huggingface.co/datasets/cornell_movie_dialog
ScriptWriter - movie plot descriptions from GraphMovie (site no longer exists) ScriptWriter: Narrative-Guided Script Generation https://github.com/DaoD/ScriptWriter
NarrativeQA - movie scripts from various sources and Project Gutenberg books The NarrativeQA Reading Comprehension Challenge https://github.com/deepmind/narrativeqa https://huggingface.co/datasets/narrativeqa https://paperswithcode.com/sota/question-answering-on-narrativeqa
MCTest - 150-300 word stories written by crowdworkers MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text https://huggingface.co/datasets/sagnikrayc/mctest https://paperswithcode.com/dataset/mctest
InSentive - authored stories from BookCorpus Inspiration through Observation: Demonstrating the Influence of Automatically Generated Text on Creative Writing https://github.com/roemmele/InSentive
dScryb - human-written scene descriptions

Mixed Visual + Textual Datasets

Dataset Papers Paper Code Hugging Face Link
BookCorpus Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books and Skip-Thought Vectors https://github.com/ryankiros/skip-thoughts https://huggingface.co/datasets/bookcorpus
COIN COIN: A Large-scale Dataset for Comprehensive Instructional Video Analysis https://github.com/coin-dataset
WikiHow WikiHow: A Large Scale Text Summarization Dataset https://github.com/mahnazkoupaee/WikiHow-Dataset https://huggingface.co/datasets/wikihow
VIST Visual Storytelling
MovieGraphs MovieGraphs: Towards Understanding Human-Centric Situations from Videos

Cloze Tests

Dataset Papers Leaderboard
Narrative Cloze Test Unsupervised Learning of Narrative Event Chains
Story Cloze Test A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories and LSDSem 2017 Shared Task: The Story Cloze Test https://competitions.codalab.org/competitions/15333
BookTest - Cloze Test using Project Gutenberg Embracing Data Abundance
Who-did-What - Cloze Test using LDC English Gigaword newswire corpus Who did What: A Large-Scale Person-Centered Cloze Dataset https://tticnlp.github.io/who_did_what/leaderBoard.html
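The Story Cloze Test is a binary choice task. Each item pairs a four-sentence story context with two candidate endings, and a system is scored by its accuracy at picking the correct one. The following is a minimal sketch of that evaluation loop under stated assumptions. `ClozeItem`, `overlap_score`, `predict`, and `accuracy` are hypothetical names. The word-overlap scorer is a toy placeholder for a real model's score and is not a baseline from the cited papers. The example story is invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ClozeItem:
    """One Story Cloze-style item (hypothetical container, not the official file format)."""
    context: List[str]   # the first four sentences of the story
    endings: List[str]   # exactly two candidate endings
    label: int           # index (0 or 1) of the correct ending


def overlap_score(context: List[str], ending: str) -> float:
    """Toy placeholder scorer: fraction of ending words that also occur in the context."""
    context_words = {w.lower().strip(".,!?") for s in context for w in s.split()}
    ending_words = [w.lower().strip(".,!?") for w in ending.split()]
    if not ending_words:
        return 0.0
    return sum(w in context_words for w in ending_words) / len(ending_words)


def predict(item: ClozeItem, score: Callable[[List[str], str], float]) -> int:
    """Choose the ending the scorer prefers; max() keeps the first ending on ties."""
    scores = [score(item.context, e) for e in item.endings]
    return max(range(len(scores)), key=lambda i: scores[i])


def accuracy(items: List[ClozeItem], score: Callable[[List[str], str], float]) -> float:
    """Share of items where the chosen ending matches the gold label."""
    correct = sum(predict(item, score) == item.label for item in items)
    return correct / len(items)


# Invented example item, for illustration only.
item = ClozeItem(
    context=["Tom bought a new bike.", "He rode it to the park.",
             "It started to rain.", "Tom got soaked."],
    endings=["Tom rode the bike home in the rain.", "Tom baked a cake for his aunt."],
    label=0,
)
print(accuracy([item], overlap_score))  # 1.0
```

Replacing `overlap_score` with a language-model likelihood or a trained classifier leaves the evaluation loop unchanged. Only the scoring function encodes the model.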

Data Scrapers & Processors

IF Environments

Dataset Papers Paper Code
LIGHT Learning to Speak and Act in a Fantasy Text Adventure Game https://github.com/facebookresearch/ParlAI
Jericho Modeling Worlds in Text and Interactive Fiction Games: A Colossal Adventure https://github.com/JerichoWorld/JerichoWorld
TextWorld TextWorld: A Learning Environment for Text-based Games https://github.com/Microsoft/TextWorld

Planning Systems

Planner/Code Papers
Glaive Glaive: a state-space narrative planner supporting intentionality and conflict
StoryAssembler StoryAssembler: An Engine for Generating Dynamic Choice-Driven Narratives
Belief and Intentional PDDL Using Domain Compilation to Add Belief to Narrative Planners
STRIPS Planner for Python
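Several of the planners above build on the classical STRIPS representation. In STRIPS, a ground action has a set of preconditions, an add list, and a delete list. A plan is a sequence of actions that turns the initial state into one that satisfies the goal. The following is a minimal breadth-first forward-search sketch under the assumption that a state is simply a frozen set of fact strings. It is not the code of the linked "STRIPS Planner for Python" or of Glaive, which additionally reasons about intentionality and conflict. The hero/dragon domain and the helper names (`make_action`, `successor`, `plan`) are hypothetical and invented for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class Action:
    """A ground STRIPS action: preconditions, add list, delete list (sets of facts)."""
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]


def make_action(name: str, pre: Iterable[str], add: Iterable[str],
                delete: Iterable[str] = ()) -> Action:
    return Action(name, frozenset(pre), frozenset(add), frozenset(delete))


def successor(state: FrozenSet[str], action: Action) -> FrozenSet[str]:
    # STRIPS update: remove the delete list, then add the add list.
    return (state - action.delete) | action.add


def plan(init: FrozenSet[str], goal: FrozenSet[str],
         actions: List[Action]) -> Optional[List[str]]:
    """Breadth-first forward search; returns a shortest plan (action names) or None."""
    frontier = deque([(init, [])])
    visited = {init}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:                      # every goal fact holds
            return steps
        for a in actions:
            if a.pre <= state:                 # action is applicable
                nxt = successor(state, a)
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, steps + [a.name]))
    return None                                # goal unreachable


# Invented toy story domain.
actions = [
    make_action("walk_village_cave", {"hero_at_village"}, {"hero_at_cave"}, {"hero_at_village"}),
    make_action("walk_cave_village", {"hero_at_cave"}, {"hero_at_village"}, {"hero_at_cave"}),
    make_action("take_sword", {"hero_at_cave", "sword_at_cave"}, {"hero_has_sword"}, {"sword_at_cave"}),
    make_action("slay_dragon", {"hero_at_village", "hero_has_sword", "dragon_at_village"},
                {"dragon_defeated"}, {"dragon_at_village"}),
]
init = frozenset({"hero_at_village", "sword_at_cave", "dragon_at_village"})
goal = frozenset({"dragon_defeated"})
print(plan(init, goal, actions))
# ['walk_village_cave', 'take_sword', 'walk_cave_village', 'slay_dragon']
```

Breadth-first search keeps the sketch short and guarantees a shortest plan. Practical narrative planners replace it with heuristic search and richer action models.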

Character Modeling

Knowledge Bases

Knowledge Base Papers Hugging Face Link
VerbNet VerbNet: A Broad-Coverage, Comprehensive Verb Lexicon
FrameNet FrameNet II: Extended Theory and Practice
WordNet WordNet: An Electronic Lexical Database
ConceptNet ConceptNet 5.5: An Open Multilingual Graph of General Knowledge https://huggingface.co/datasets/conceptnet5
ATOMIC ATOMIC: An Atlas of Machine Commonsense for If-Then Reasoning https://huggingface.co/datasets/atomic
GLUCOSE GLUCOSE: GeneraLized and COntextualized Story Explanations https://huggingface.co/datasets/glucose
Power and Agency in modern films Connotation Frames of Power and Agency in Modern Films
Social Bias Frames Social Bias Frames: Reasoning about Social and Power Implications of Language
Eraser - Movie Rationales ERASER: A Benchmark to Evaluate Rationalized NLP Models https://huggingface.co/datasets/movie_rationales
ECIpedia
The NOC List Round Up The Usual Suspects: Knowledge-Based Metaphor Generation
NULEX - combines WordNet, VerbNet, and Wiktionary NULEX: An Open-License Broad Coverage Lexicon
CausalBank Guided Generation of Cause and Effect

Other Code

Code Papers
Plot-guided Coherence Evaluation Plot-guided Adversarial Example Construction for Evaluating Open-domain Story Generation
Story Gen BART Content Planning for Neural Story Generation with Aristotelian Rescoring
EnGen Neural text generation in stories using entity representations as context
Choose Your Own Adventure Evaluation Choose Your Own Adventure: Paired Suggestions in Collaborative Writing for Evaluating Story Generation Models
Sentence Mover's Similarity Sentence Mover’s Similarity: Automatic Evaluation for Multi-Sentence Texts
AI Dungeon 2
COMET - uses ATOMIC and ConceptNet COMET: Commonsense Transformers for Automatic Knowledge Graph Construction
Plan-And-Write Plan-and-Write: Towards Better Automatic Storytelling
ASTER (Automated Story-Telling using Event Representations) Event Representations for Automated Story Generation with Deep Neural Nets

Libraries & Toolkits

Tutorials

Extras

Notable IF Games in Research

Inspiration

Other Courses

Content Generation for TRPGs and IF

Other Languages for Writing Interactive Fiction