ML in Modern Storytelling
By Team Clarifai
With over a billion uploads and over 5 million writers contributing every month, evaluating and sorting stories has become impossible for Wattpad to do manually.
This becomes particularly evident when you consider how complex stories are to begin with. A diverse set of building blocks makes up a great story, including genre, grammar, tone, dialogue, sentence structure, setting, and characters, to name a few.
This is where Wattpad relies on its proprietary “Story DNA” technology and AI. Inspired by Pandora’s Music Genome Project, Story DNA leverages machine learning to generate insights from the content of the world’s most diverse stories and their data. Story DNA helps Wattpad understand the content of a story at a much deeper level.
Wattpad has been collecting stories from the public domain and has more than a billion uploads in 50 different languages. It uses this data to train models that sort content and suggest stylistically similar content.
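To make "stylistically similar content" concrete, here is a minimal sketch of one common baseline: comparing stories by the cosine similarity of their word counts. This is not Wattpad's Story DNA method, which the article does not describe in detail. The function names and the toy library are hypothetical and invented for illustration.

```python
import math
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query: str, library: dict) -> str:
    """Return the title in `library` whose text is closest to `query`."""
    q = word_counts(query)
    return max(
        library,
        key=lambda title: cosine_similarity(q, word_counts(library[title])),
    )


# Hypothetical toy library; the texts are invented for illustration.
library = {
    "dragon_tale": "the dragon guarded the castle and the knight drew his sword",
    "office_romance": "she met him at the office and they shared a coffee",
}
closest = most_similar("a knight rode to the castle to fight the dragon", library)
print(closest)
```

A production system would likely use learned representations rather than raw word counts, but the retrieval step (embed the query, rank the library by similarity) has the same shape.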
Before the development of Story DNA, Wattpad relied on conventional “Trend Data” to find successful stories. Story DNA helps identify great stories before they collect significant readership. It can evaluate story quality, story sentiment, readership base, and social outcomes such as sharing, adding to the library, and commenting.
Scale and Addressing Bias in Training Data
Every year, Wattpad organizes the world’s largest writing competition for writers around the world. The contest, known as the “Wattys,” has been a major catalyst for story-writing innovation since its inception in 2009. There have been over 1.2 million entries to date, some of which turned into major hits such as “The Kissing Booth.” The biggest challenge for Wattpad was judging a contest that receives thousands of entries each year.
Wattpad uses its Quality Indexer, an ML algorithm that checks a story’s grammar, sentence structure, and similar features. The model was trained on 20,000 Gutenberg classics available in the public domain, along with 9 years of hand-picked Wattys stories. Each story a writer submits goes through the Quality Indexer, which scores it and helps find the stories with the best grammar and sentence structure.
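The Quality Indexer itself is a trained model whose features and weights are not public. As a rough illustration of the kind of surface signals such a scorer could look at, the sketch below uses a hand-written heuristic instead of a trained model. The feature names, the equal weighting, and the `max_avg_words` threshold are all assumptions made for this example.

```python
import re


def split_sentences(text: str) -> list:
    """Naively split on '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def quality_features(text: str) -> dict:
    """Compute a few surface features loosely tied to grammar and structure."""
    sentences = split_sentences(text)
    if not sentences:
        return {"capitalized": 0.0, "terminated": 0.0, "avg_words": 0.0}
    n = len(sentences)
    return {
        # Share of sentences that start with an uppercase letter.
        "capitalized": sum(s[0].isupper() for s in sentences) / n,
        # Share of sentences that end with terminal punctuation.
        "terminated": sum(s[-1] in ".!?" for s in sentences) / n,
        # Mean sentence length in words.
        "avg_words": sum(len(s.split()) for s in sentences) / n,
    }


def quality_score(text: str, max_avg_words: float = 30.0) -> float:
    """Combine the features into a score in [0, 1]; weights are arbitrary."""
    f = quality_features(text)
    # Treat very long average sentences as a crude run-on signal.
    length_ok = 1.0 if 0 < f["avg_words"] <= max_avg_words else 0.0
    return (f["capitalized"] + f["terminated"] + length_ok) / 3


clean = quality_score("The rain fell. She waited.")
messy = quality_score("the rain fell and she waited")
print(clean > messy)
```

In a learned version, features like these would be inputs to a model fitted on the reference corpus, so the weights would come from the training data and not from a fixed formula.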
This model has some limitations because the “classical literature” represented by the Gutenberg classics collection tends to come from a limited cultural context. This limited perspective can result in bias. For example, it has been observed over time that high fantasy stories always get a higher score on the Quality Indexer.
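One simple way to surface this kind of skew is to compare each genre's average score against the overall average. The sketch below assumes scores are already available as (genre, score) pairs. The genre labels and numbers are invented for illustration and are not Wattpad data.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by_genre(scored_stories):
    """Group (genre, score) pairs and return the mean score per genre."""
    by_genre = defaultdict(list)
    for genre, score in scored_stories:
        by_genre[genre].append(score)
    return {genre: mean(scores) for genre, scores in by_genre.items()}


def genre_gaps(scored_stories):
    """Return each genre's mean score minus the overall mean score."""
    overall = mean(score for _, score in scored_stories)
    per_genre = mean_score_by_genre(scored_stories)
    return {genre: m - overall for genre, m in per_genre.items()}


# Hypothetical scores, invented for illustration.
sample = [
    ("high_fantasy", 0.9),
    ("high_fantasy", 0.8),
    ("romance", 0.6),
    ("romance", 0.5),
]
gaps = genre_gaps(sample)
for genre, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{genre}: {gap:+.2f}")
```

A persistent positive gap for one genre does not prove bias on its own, but it flags where human reviewers may want to look more closely.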
To address these issues of bias, humans and machines at Wattpad are working together to curate content that will enable the creation of the next generation of blockbuster stories. Story DNA helps Wattpad surface new voices in every field and genre.
AI as an Essential Writing Tool
Will machine learning ever be able to write its own blockbuster story? Not yet, and there is no need to do so because millions of people are already sharing their original stories on Wattpad. However, ML algorithms certainly can help unlock the opportunities that exist.
Originally published at https://www.clarifai.com.