Moderator – Natural Language Processing in the news

Moderator is a web service born out of the need to handle the rising volume of user-generated content on the site of «20 Minuten Online», the biggest online newspaper in Switzerland. To date, Moderator has processed over 2 million comments.

The goal is to increase efficiency and leverage user-generated content.

Hansi Voigt, 20 Minuten

Under Swiss law, newspapers are responsible for all content published on their websites. Small armies of part-time employees, mostly students, therefore comb through users' comments to make sure that no inappropriate content gets published. But as the amount of user-generated content grows exponentially, this approach hits a wall. In light of the rising moderation effort, newspapers face two challenges:

1. Increase the efficiency of the moderation process.
2. Leverage user-generated content for editorial content to justify the moderation costs.

In its initial form, Moderator was designed as a simple API that calculates an indicative value between 0% and 100% for each comment. This value, computed with natural language processing algorithms, represents the likelihood that a comment contains offensive or inappropriate content. The values make moderation more efficient by letting staff sort and group comments, and where possible the process is automated.
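As an illustration of how such a value could drive grouping and partial automation, here is a minimal Python sketch. The thresholds, queue names, and function names are assumptions made for this example; the article does not describe how Moderator's scores are actually consumed.

```python
from collections import defaultdict

# Hypothetical thresholds, not taken from the article. Scores are given
# as fractions in [0, 1], i.e. the 0%..100% value divided by 100.
AUTO_APPROVE_BELOW = 0.10
AUTO_REJECT_ABOVE = 0.90


def triage(score):
    """Map a score to a moderation queue (queue names are assumptions)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score < AUTO_APPROVE_BELOW:
        return "auto_approve"
    if score > AUTO_REJECT_ABOVE:
        return "auto_reject"
    return "manual_review"


def group_by_queue(scored_comments):
    """Group (text, score) pairs by the queue their score falls into."""
    queues = defaultdict(list)
    for text, score in scored_comments:
        queues[triage(score)].append(text)
    return dict(queues)
```

Only the middle band reaches a human in this sketch, which is one way the manual workload could shrink while clear-cut cases are handled automatically.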

After its initial setup, the service was extended to recognize comments written in Swiss German dialect, which is generally not considered a proper written form for a newspaper. Editorial rules complemented the algorithm in order to recognize patterns such as one-word comments or excessive use of emoticons and punctuation. Its use shifted from a simple yes-no machine towards a tool that helps editors and moderating staff gain more fine-grained control over user-generated content.
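Editorial rules of this kind can be expressed as simple pattern checks that run alongside the statistical score. The following Python sketch is an assumption about what such rules might look like; the limits, the emoticon pattern, and the flag names are invented for illustration and are not Moderator's actual rules.

```python
import re

# Hypothetical limits, chosen only for this example.
MAX_EMOTICONS = 3
MAX_PUNCT_RUN = 3

# Matches common ASCII emoticons such as :) ;-) =D :P
EMOTICON = re.compile(r"[:;=8][-^']?[()\[\]DPpOo/\\|*]")
# Matches a run of more than MAX_PUNCT_RUN of the characters ! ? .
PUNCT_RUN = re.compile(r"[!?.]{%d,}" % (MAX_PUNCT_RUN + 1))


def rule_flags(comment):
    """Return the names of the editorial rules that a comment triggers."""
    flags = []
    # An empty comment is also reported under this flag.
    if len(comment.split()) <= 1:
        flags.append("one_word")
    if len(EMOTICON.findall(comment)) > MAX_EMOTICONS:
        flags.append("too_many_emoticons")
    if PUNCT_RUN.search(comment):
        flags.append("excessive_punctuation")
    return flags
```

Returning named flags rather than a single yes or no keeps the reason for each decision visible, which fits the shift towards fine-grained control described above.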

The work of the moderation team was improved by results from the algorithm.

Gabriel Hase

A statistics view was developed to visualize, among other things, the differences between the algorithm’s predictions and the decisions of the moderation staff. While Moderator’s natural language processing approach didn’t do a perfect job on all occasions, the moderation staff didn’t either. Thus Moderator could also be used to improve the work of the moderation team in an important way: helping to find editorial rules and standards based on real content and on the analysis of patterns, not only in user-generated content, but also in the moderation process itself.
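The comparison behind such a view can be sketched in a few lines of Python. The record format, the 0.5 cut-off, and the result keys below are assumptions for illustration; the article does not specify which statistics Moderator computes.

```python
from collections import Counter

# Hypothetical cut-off above which the algorithm is taken to predict "reject".
THRESHOLD = 0.5


def compare(records):
    """Compare algorithm scores with moderator decisions.

    records: iterable of (score, moderator_rejected) pairs, where score is
    a fraction in [0, 1] and moderator_rejected is a bool.
    """
    counts = Counter()
    for score, rejected in records:
        predicted_reject = score >= THRESHOLD
        counts[(predicted_reject, bool(rejected))] += 1

    total = sum(counts.values())
    agreed = counts[(True, True)] + counts[(False, False)]
    return {
        "agreement": agreed / total if total else 0.0,
        # Algorithm flagged the comment, but a moderator published it.
        "flagged_but_published": counts[(True, False)],
        # Algorithm let the comment pass, but a moderator rejected it.
        "passed_but_rejected": counts[(False, True)],
    }
```

The two disagreement counts are the interesting part: reviewing those comments can reveal either weaknesses in the algorithm or inconsistencies among moderators.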

We believe that a tool for controlling user-generated content is a match-maker for a modern newspaper. As user-generated content on the web scales up, newspapers need to tackle the two challenges mentioned at the outset: moderation efficiency and leveraging their user-generated content. For this reason, Moderator is aimed not at patching antiquated processes that will ultimately hit a wall, but at helping modern newsrooms treat user-generated content as part of their editorial work.

The next big development for Moderator will be an extension that spots not only offensive and inappropriate content, but also its counterpart: expert information and opinion leaders from the community. We believe that user-generated content is a powerful resource and that newspapers need to take full advantage of it.