We're introducing, in beta, the first version of a new tool that helps creators rate their published content on AI Dungeon more accurately. There's a lot to say on the subject, so we'll put the most critical information first and then go into more detail for those who are interested. We expect there will be lots of questions and feedback as we roll out this new feature, and we're excited to hear from you on how we can make this system better.
About the New AI-Assisted Rating Check
Starting today, when you publish a scenario (in Beta), you will see a button prompting you to run an AI rating check. When you do, all of the content in your scenario, including the title, description, image, prompts, story cards, multiple-choice scenarios, and scripting, will be sent to an AI (Anthropic’s Claude 3.5 Sonnet) for processing. We have been tirelessly testing, iterating, and gathering feedback from creators on the set of instructions Claude follows to determine a rating for your content.
You can then decide whether to make adjustments to your content and recheck the rating, or set your rating and proceed to publish. One of the benefits of this approach is that the feedback you get is immediate. You don’t need to worry about your content being moderated later by our staff.
Additionally, we've tailored our instructions so that the AI also provides a helpful analysis of your scenario, which can inform the changes or edits you make to target a desired rating.
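For the technically curious, here's a rough sketch of what a rating check like this could look like using Anthropic's public Messages API. To be clear, this is an illustration, not our actual implementation: the Scenario shape, the prompt wording, and the checkRating helper are all assumptions made for the example.

```typescript
import Anthropic from "@anthropic-ai/sdk";

// Hypothetical shape of the scenario content sent for review.
interface Scenario {
  title: string;
  description: string;
  prompt: string;
  storyCards: string[];
}

const anthropic = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Illustrative helper: send the scenario plus the published rating
// instructions to Claude and return its analysis and suggested rating.
async function checkRating(scenario: Scenario, ratingInstructions: string): Promise<string> {
  const response = await anthropic.messages.create({
    model: "claude-3-5-sonnet-latest",
    max_tokens: 1024,
    system: ratingInstructions, // the rating guidelines act as the system prompt
    messages: [
      {
        role: "user",
        content: [
          "Rate the following scenario according to the instructions above.",
          `Title: ${scenario.title}`,
          `Description: ${scenario.description}`,
          `Opening prompt: ${scenario.prompt}`,
          `Story cards:\n${scenario.storyCards.join("\n")}`,
        ].join("\n\n"),
      },
    ],
  });

  // Text blocks in the response carry the model's reasoning and rating.
  const first = response.content[0];
  return first.type === "text" ? first.text : "";
}
```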
These rating checks are the most expensive AI calls players can make on AI Dungeon, so to start, you will be limited to five checks per day.
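To illustrate the limit (again, a sketch, not our actual infrastructure), a daily quota like this can be enforced with a simple per-user counter:

```typescript
// Minimal sketch of a per-user daily quota. A real implementation would
// persist counts and reset them on a schedule rather than hold them in memory.
const DAILY_LIMIT = 5;
const checksUsedToday = new Map<string, number>(); // userId -> checks used today

function tryConsumeRatingCheck(userId: string): boolean {
  const used = checksUsedToday.get(userId) ?? 0;
  if (used >= DAILY_LIMIT) return false; // limit reached; try again tomorrow
  checksUsedToday.set(userId, used + 1);
  return true;
}
```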
We’ve been working closely with our content creators as we’ve built this tool. It has been a truly collaborative effort, and our team is actively discussing it with some of our platform’s most recognizable creators. Early skepticism has generally shifted to optimism, and creators are broadly on board with the direction.
Why are we building this tool?
Moderation is an incredibly difficult and important task. We've consulted with executives from other well-known platforms who have struggled with content moderation as well; it’s not a challenge unique to AI Dungeon. In many cases, these companies have spent huge amounts of money building in-house moderation teams, often numbering in the thousands, or outsourcing the problem to specialized firms. As a small independent development team, we really don't have the resources to take that approach. And, frankly, we don’t think that would significantly improve your experience on AI Dungeon. It might even make it worse!
Instead, we are leveraging our deep expertise working with AI to help us solve this important challenge. We understand some of you may be skeptical, but as we've battle-tested this system, our confidence has grown. We think this will be a win for everyone: creators, players, and even our moderation team.
Here’s why:
Reason 1: More accurate ratings
We are finding that we can provide more accurate ratings with AI as part of the process than through human review alone.
Our moderation team doesn't catch everything. We may miss a detail in a story card, and we can’t manually review all published content. This leaves some creators feeling like only their content is being checked while similar content goes unchecked on the Discover page. With AI, all content will be checked equally, so everyone has the same experience when publishing content on AI Dungeon.
Humans are also prone to bias and may interpret our guidelines differently. We have great staff moderators, but there have been cases where our team disagreed on how content should be rated, or missed a reference and rated incorrectly. AI helps here because it’s trained on large data sets, giving it broad awareness of references, language, existing IP, and so on. With well-crafted instructions, it can rate consistently every time.
Although we expect the AI to make some mistakes, our early testing shows this will lead to more consistent moderation results overall. We will still keep a human in the loop to handle escalated cases and perform manual reviews. Each time we review content, we will update our instructions and the examples we provide to the AI, making the system more accurate over time.
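As a sketch of what that feedback loop could look like (the RatingExample type and buildSystemPrompt helper are hypothetical, not our actual code), each manually reviewed case can be folded back into the instructions as a worked example:

```typescript
// Hypothetical record of a manually reviewed case.
interface RatingExample {
  excerpt: string;        // the content that was escalated
  correctRating: string;  // the rating our team settled on
  reasoning: string;      // why that rating is correct
}

// Fold reviewed cases into the system prompt as worked examples,
// so the AI can rate similar content correctly next time.
function buildSystemPrompt(baseInstructions: string, examples: RatingExample[]): string {
  const worked = examples
    .map(e => `Content: ${e.excerpt}\nCorrect rating: ${e.correctRating}\nWhy: ${e.reasoning}`)
    .join("\n\n");
  return `${baseInstructions}\n\nWorked examples:\n\n${worked}`;
}
```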
Reason 2: Better player experience
One of the best parts of the AI Dungeon platform is our user-generated content. Creators are able to publish and share their content for you to discover and play, and this is great for everyone. Not all of our players are interested in meticulously crafting worlds and scenarios for their adventures. Some are much more interested in playing the creations of others.
Like every other platform that supports user-generated content, we need to moderate content so that when users browse our site, they find what they expect. When scenarios are rated incorrectly, it can make players uncomfortable and even cause them to quit playing. We have seen convincing evidence of this on AI Dungeon. In user tests, players have said they don't feel like the platform is for them, and we have seen posts on Reddit and Discord lamenting the state of our Discover page and the content that makes people uncomfortable.
We have also seen in player surveys and feedback that some of you are uncomfortable recommending or sharing AI Dungeon with family and friends because of the content that can be found on our platform. This isn't terribly surprising: during a recent audit, we found that nearly 50% of all content on the platform was rated incorrectly. The result is that nearly all of you are being exposed to content that does not match your content preferences.
We believe that with AI, we can rate content with 98% accuracy. This will be a dramatically better experience for all of you and make it easier to find content you are interested in without encountering content that makes you uncomfortable.
Reason 3: Better Creator Experience
We love our creators, and it pains us to see how much frustration the moderation process can cause. Our team aims for a turnaround time of less than 24 hours on work days, but as hard as we try to be responsive, that’s still a drawn-out process that feels cumbersome and frustrating.
With this new AI-assisted rating tool, creators get near-instant feedback about their content, along with recommendations on how to edit it to fit their desired rating. After making changes, they can run the check again and get immediate feedback.
Creators have already shared positive feedback about this experience. Some have pointed out the convenience of a self-service rating check with immediate feedback. Others have appreciated the detailed evidence, examples, and reasoning that the AI provides when delivering its rating, helping them better understand how their content is perceived by players.
Reason 4: More Transparent Process
We’re also optimistic this new AI tool will address feedback that our moderation process isn't as transparent or understandable as it could be. We have always had to rely, to some degree, on moderator discretion, which has led creators to wonder whether our team has hidden rules about content ratings that we haven't published. That all changes today.
We’ve published the instructions we use for moderation, so when creators have concerns about how content is moderated, they can see exactly which examples, guidelines, and instructions we send to the AI. As we've been developing this tool, we have been publicly sharing these instructions, and our creator community has provided feedback and recommendations that have helped us improve the system.
We love this! Our team has been able to have an open dialogue with our creator community and work together to improve the instruction set used by the AI.
Why does this matter to you?
Some of you may be thinking, "So what? No big deal. I am pretty happy with the content I find. What's the point of changing it?”
Even if misrated content on AI Dungeon doesn’t bother you, it is still negatively impacting your experience.
Because people are turned away or frustrated by misrated content, there are fewer players on the platform. Many of those players would have created content you might be interested in playing. As more players join the platform, the audience for our creators grows, giving them more motivation and incentive to publish more frequently and to raise the quality of the content they create.
That growth also generates revenue for Latitude, which we reinvest into new development for AI Dungeon and Heroes. For instance, it lets us hire new talent so we can move faster or, at times, do things like double AI context for all tiers. Since our goal and focus is to give you as much value as possible, additional resources simply increase the ways we can make AI Dungeon a better experience for you.
Is rating accuracy really the issue?
As we’ve been working on this project, some of you have correctly pointed out that improving our content ratings alone won’t completely solve the problem of delivering content players enjoy. This is true. Improving our rating accuracy is just the first of several important projects we plan to work on this year.
In addition to content ratings, we are already exploring more sophisticated recommendation algorithms to better personalize the content we show you. We also plan to improve search using the same technologies, making it easier to find content that fits the genre, topic, and play style you enjoy most on AI Dungeon.
Increasing the quality of our content delivery is one of the key objectives for our platform team.
Other platforms, including social networks like Facebook, YouTube, TikTok, and X, invest heavily in recommendation algorithms so that players or users can find content they are interested in. We similarly expect to make ongoing and consistent investments in this important function of our platform.
What’s next for AI Content Ratings?
The version of the AI ratings check that we're launching today is an informational tool to help creators increase the accuracy of their ratings. We will gather feedback as we launch this tool and monitor the accuracy of the ratings.
We are aiming for at least 98% accuracy on AI ratings. If we achieve that, the plan is to have the AI set ratings for all published content. We expect this would dramatically improve the accuracy of ratings across the platform.
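To make that 98% figure concrete: one straightforward way to measure it (a sketch; not necessarily how we compute it internally) is agreement with moderator-confirmed ratings over a labeled evaluation set.

```typescript
// Illustrative accuracy measurement: the share of labeled examples where
// the AI's rating matches the rating our moderators confirmed.
interface LabeledExample {
  aiRating: string;    // rating produced by the AI check
  humanRating: string; // rating confirmed by a staff moderator
}

function ratingAccuracy(examples: LabeledExample[]): number {
  if (examples.length === 0) return 0; // avoid dividing by zero
  const agree = examples.filter(e => e.aiRating === e.humanRating).length;
  return agree / examples.length; // e.g. 0.98 means 98% agreement
}
```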
In cases where the creator does not agree with the rating provided, they will be able to contact our team for a manual review. Our team will review the content and adjust the rating if needed. We’ll also determine whether an update is needed for our AI instructions to better handle similar content in the future.
Our expectation is that the tool will become more and more accurate as we iterate on the instructions. We also expect accuracy to improve over time as new models are introduced. We are currently using Claude 3.5 Sonnet for this task, and Anthropic recently released Claude 3.7 Sonnet, which we expect to begin testing broadly soon. Our early tests indicate that it will only improve the performance of our rating tool.
Wait! AI Moderation is a terrible idea! What are you doing?
Each of you has probably had a negative experience with AI moderation in the past, whether on our platform or elsewhere. We’ve used (and currently use) different AI models to perform safety checks, and we realize they have flaws and limitations. The filter we use for uploaded images is inconsistent and can be wildly wrong (in fact, that’s what gave birth to the AI Dungeon hamster meme: our image filter labeled a cute hamster as NSFW). We also have in-game safety checks to prevent the AI from generating content that glorifies the sexual exploitation of children. And we even had AI content moderation a few years ago; it didn’t work well, and some of our favorite creators left in protest over that process.
We are NOT interested in implementing a system that will let you down or create a worse experience.
One major difference with this system is the model: Claude 3.5 Sonnet (and likely 3.7 soon). Other safety systems we use are based on much smaller, underpowered AI models. Sonnet is extremely powerful, so the results are profoundly different.
We’ve shied away from using large models in the past due to the costs. But as we looked at the numbers, we realized that if these large models work, they will actually save us money AND provide you a better experience. We’d be silly not to!
As we’ve been developing this tool, we’ve gone to great lengths to get feedback from our creators. They’ve reviewed the instructions. They’ve given us feedback on the rating accuracy. They’ve pointed out where the rating explanations are helpful, or fall short. They’ve been rating their own content and making adjustments based on what they find. We’ve also been manually using this tool and rating content, and we’re seeing improvements in the rating accuracy and the state of our discover page.
We’re excited to get your feedback as we launch this new rating tool. We hope you’ll find it helpful as you publish content and set ratings that will help players find the content they are looking for.
If you have any feedback about the tool, the instructions, or anything else about this process, please contact us at [email protected]. You’re also free to DM or ping @matu on Discord, or seaside-rancher on Reddit.