Why you shouldn’t say “data scientist” when it’s all about data craftsmanship

Wednesday, February 28th, 2018

Trump’s unexpected election on November 8th, 2016 highlighted the perverse results of the echo chamber phenomenon: news coming from social media matches what you already believe and flatters your opinion, leading groups of citizens to think the same way without ever being contradicted. Facebook, among others, is being blamed for algorithms that do not allow users to encounter new ideas or topics they didn’t specifically select. In other words, it is a personalized recommendation system that requires a lot of data and leaves no room for serendipity. In this context, data craftsmen have a role to play. Who are they, and why should we prefer this term to the overused “data scientist”? How is data craftsmanship the key to escaping hazardous echo chambers?

Let’s zoom out from social media and apply the same issue to cultural content: if the data scientist doesn’t take into consideration the complexity of this content or the complexity of a user’s tastes, users end up being recommended only content they are already familiar with. As a result, they lose the chance to be unsettled, surprised, or shaken: if you only recommend something similar to what users already like, you leave out the accident, the inherent risk brought by any cultural activity, and the ability to expand their horizons. Lastly, since users’ tastes are often multifaceted, is it enough to recommend what they like the most, when you could embrace their entire spectrum of interests?
In the age of big data, those called “data scientists” have to dive into tremendous amounts of data in order to devise the best personalized recommendations. But when it comes to cultural content, such as movies, the best solution is not to hunt for more and more personal data, most of which is clearly incapable of increasing the value of a personalized recommendation. Why should you know the age, location or relationships of a user fond of zombie movies, when the only question we should be asking them is: what do you like to watch?

My colleagues at Spideo, having spent years crafting innovative personalized recommendation tools, believe that the solution lies in smart data. First step: upstream metadata design, to imagine tailor-made, semantically-enhanced data suited to the content’s complexity and the users’ tastes. Second step: precise and transparent fishing for the few useful and relevant pieces of data you need in order to create a good recommendation. This approach helps increase users’ trust, as they are often wary of sharing private information with unscrupulous algorithms and their sketchy privacy policies.
Based on this in-house expert analysis of metadata and content, Spideo crafted tools to provide personalized recommendations that feel human while always offering the option of serendipity. Thanks to semantically-enhanced and weighted keywords divided into several categories (moods, themes, characters, settings, …) we can free ourselves from genres and always suggest multifaceted interests to the users.

For instance, the purpose of our discovery moodboard is not to infiltrate undecided users’ heads by asking about their feelings. We don’t ask if they are sad or happy; in short, we don’t freeze them under one label. What we do is capture and convey each movie’s mood thanks to our data craftsmanship, allowing users to identify with emotional capsules according to their current wishes. By asking “what are you in the mood for?” instead of “how do you feel?”, we can offer users cinematic experiences matching their tastes, even when they don’t have any history yet.

A movie is a complex and multifaceted object. Guessing which part of it appealed to the users is not possible, since each viewing is a unique experience. We can, however, offer content that is similar to the titles originally chosen by the users, along with lists of themes related to one or more aspects of each title. In this way we create new associations, free the users from the idea of genres and explain the “similarity”.
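
As a rough illustration of how weighted keywords can both rank similar titles and explain the match, here is a minimal sketch. It is not Spideo’s actual implementation: the titles, keywords, weights and function names are all invented for the example, and cosine similarity is simply one common way to compare weighted keyword sets.

```python
from math import sqrt

# Hypothetical metadata: each title maps "category:keyword" to a weight in [0, 1].
# Titles, keywords and weights are invented for illustration only.
CATALOG = {
    "Zombie Road": {"mood:tense": 0.9, "theme:survival": 0.8, "setting:road": 0.5},
    "Last Shelter": {"mood:tense": 0.7, "theme:survival": 0.9, "character:loner": 0.4},
    "Paris Waltz": {"mood:tender": 0.9, "theme:romance": 0.8, "setting:paris": 0.6},
}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse keyword-weight dictionaries."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def shared_keywords(a, b):
    """Keywords present in both titles, strongest joint weight first."""
    common = [(k, a[k] * b[k]) for k in a if k in b]
    return [k for k, _ in sorted(common, key=lambda kw: kw[1], reverse=True)]


def similar_titles(title, catalog):
    """Rank the other titles by similarity and attach the shared keywords."""
    source = catalog[title]
    ranked = []
    for other, keywords in catalog.items():
        if other == title:
            continue
        score = cosine_similarity(source, keywords)
        ranked.append((other, score, shared_keywords(source, keywords)))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


for other, score, reasons in similar_titles("Zombie Road", CATALOG):
    print(f"{other}: {score:.2f}, shared keywords: {reasons}")
```

The shared keywords are what make the “similarity” explainable: the recommendation can be presented as “because of survival themes and a tense mood” rather than as an opaque score or a genre label.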

And last but not least, by definition a personalized recommendation, however good it gets, will always confine users to their habits. Aware of this limitation, my colleagues at Spideo imagined tools to help users who want to get out of their comfort zone.
Spideo’s “I feel lucky” module is called “surprise”. When you increase the serendipity ratio (e.g. 50% based on the user’s profile, 50% based on serendipity) and provide automatically generated thematic lists, you help users explore new categories of movies while still suggesting content they are familiar with.

Now back to the data craftsmen and their role in keeping serendipity alive. We think calling them “data scientists” is inadequate, and favor the term “data craftsmen”.
It is not about working on tremendously big data; it’s about narrowing it down to smart data. Furthermore, thinking and creativity are the prevailing qualities in this job, where you have to create your own tools, handle them, test them and improve them: these are qualities specific to craftsmanship. Just as the baker kneads the dough, data craftsmen handle and transform data in order to make it meaningful and valuable.
Spideo works with this craftsmanship mindset, which is more human and closer to the data and the elements related to it, and leaves the idea of a hollow science behind. Our way of thinking about personalized recommendation brings us as close as possible to the very nature of the content.
We are convinced that echo chambers are not an inevitability every recommendation system has to face. By outlining data, we can craft features that allow us to spot users’ habits with precision and offer them the opportunity to break out of those habits. This way, personalized recommendation comes closer to exploration and inspires users’ thirst for discovery.

Author: Arthur Vauthier, Marketing Manager at Spideo

Spideo is a content recommendation and analytics platform that uses semantic-based discovery to deliver personalized content suggestions based on profile, theme, mood and related content.