Antje Marx | Tuesday September 4th, 2018
The following tech blog article is part of a series of detailed tech explanations of a recommender system built together by realeyz (EYZ Media GmbH) and DAI-Labor (TU Berlin). The tech blog functions as a knowledge base and exchange about technical solutions supporting transnational marketing, branding and distribution of European audiovisual works on VOD services.
Traditional recommendation strategies, such as content-based item similarity and user- or item-based collaborative filtering, rely heavily on user behavior inside the system. For example, content-based candidates are generated from the item the user is currently watching or from her/his recent watching history, and K-Nearest Neighbors collaborative filtering determines a user's neighbors solely from the items they have viewed. However, users' choices inside the system are biased towards a few popular items. This bias governs the resulting recommendations, so some niche items are routinely overlooked. In the realeyz recommender we therefore propose to capture domain-relevant events in social media such as Twitter, in order to produce recommendations that are independent of in-system user behavior and thereby overcome the biased user choices.
In the proposed approach, we define an EVENT as a named entity (e.g. an actor name or a festival name) from the relevant domain that appears dominantly in the Twitter stream during a specific period. To restrict the named entities to domain-relevant ones, we pre-fetch the set of domain entities from the realeyz metadata DB. After named entity recognition has been applied to the Twitter stream, the recognized entities are filtered against this domain-relevant entity set.
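The filtering step described above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes the NER step has already produced a list of candidate strings for a Tweet, and the function names, the dictionary layout of the domain entity set, and the sample names are all hypothetical.

```python
from typing import Dict, List, Tuple


def normalize(name: str) -> str:
    """Case-fold and collapse whitespace so that spelling variants compare equal."""
    return " ".join(name.casefold().split())


def filter_domain_entities(
    recognized: List[str],
    domain_entities: Dict[str, List[str]],
) -> List[Tuple[str, str]]:
    """Keep only recognized entities that occur in the pre-fetched domain set.

    `domain_entities` maps an entity type (e.g. "actor") to its known names.
    Returns (entity_type, canonical_name) pairs in the order they were recognized.
    """
    # Build a lookup from normalized name to (type, canonical name).
    lookup = {
        normalize(name): (entity_type, name)
        for entity_type, names in domain_entities.items()
        for name in names
    }
    matched = []
    for candidate in recognized:
        key = normalize(candidate)
        if key in lookup:
            matched.append(lookup[key])
    return matched


# Hypothetical sample data, for illustration only.
domain = {"actor": ["Example Actor"], "festival": ["Example Film Festival"]}
candidates = ["example  actor", "Some Politician", "EXAMPLE FILM FESTIVAL"]
print(filter_domain_entities(candidates, domain))
```

Exact matching on normalized names is the simplest possible choice here; a real system would probably also need to handle aliases and partial names.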
Figure 1. Flow Architecture of Event-based Approach
Figure 1 shows the detailed framework of our event-based approach. Elasticsearch provides powerful search capabilities with support for sharding and replication of the data. To make use of these capabilities, the data in the realeyz relational database is indexed into Elasticsearch. Following the flow points in the figure: 1) for each entity type (actors, directors, festivals etc.), fetch all concrete entities from Elasticsearch; 2) retrieve the Tweets of the last n days from all friends and followers of the realeyz Twitter account; 3) for each Tweet, perform named entity recognition (NER) and match all relevant entities; 4) after NER, calculate a score for each matched named entity based on its appearances in the Tweet stream; 5) treat the named entities with the highest scores as the events taking place in the Twitter stream in the given time interval, and generate a boosted multi-phrase query that uses their scores as boosting factors. This multi-phrase query is then executed in the Elasticsearch service, which returns the recommended items. For the final step, we can either combine the events of all entity types into one compound query and obtain a single overall recommendation list, or run the boosted event query separately for each entity type and return one recommendation list per type.
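Steps 4) and 5) can be sketched as follows. The article does not give the exact scoring formula, so this sketch assumes the simplest one: the score of an entity is the number of Tweets it appears in. The query is built as a plain dictionary in the Elasticsearch query DSL (a `bool` query with boosted `match_phrase` clauses); the function names and the index field names (`cast`, `festivals`) are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Entity = Tuple[str, str]  # (entity_type, name)


def score_entities(matches_per_tweet: Iterable[List[Entity]]) -> Counter:
    """Assumed scoring: count the number of Tweets each entity appears in."""
    scores: Counter = Counter()
    for matches in matches_per_tweet:
        scores.update(set(matches))  # count an entity at most once per Tweet
    return scores


def build_boosted_query(
    scores: Counter,
    field_by_type: Dict[str, str],
    top_k: int = 3,
) -> dict:
    """Build one compound query from the top_k entities, boosted by their scores."""
    should = [
        {
            "match_phrase": {
                field_by_type[entity_type]: {"query": name, "boost": float(score)}
            }
        }
        for (entity_type, name), score in scores.most_common(top_k)
    ]
    return {"query": {"bool": {"should": should, "minimum_should_match": 1}}}


# Hypothetical matched entities for three Tweets, for illustration only.
tweets = [
    [("actor", "Example Actor"), ("festival", "Example Film Festival")],
    [("actor", "Example Actor")],
    [],
]
fields = {"actor": "cast", "festival": "festivals"}  # assumed index fields
query = build_boosted_query(score_entities(tweets), fields, top_k=2)
print(query)
```

The resulting dictionary could be passed as the request body of a search against the item index. To get one recommendation list per entity type instead of a single compound list, the scores can be split by entity type before calling `build_boosted_query`.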
The event-based recommendation approach produces a dynamic list of recommendation candidates that reflects external social media opinion. Specific events, such as the death of an actor or the birthday of a director, can be detected and can alter the recommendation results. Such an approach helps the recommender overcome the bias of in-system user choices and provides dynamically changing recommendations, while the results can still be kept user-specific by carefully picking the data sources (Twitter accounts) on which the recommendation is based. Niche markets can benefit from this approach because it boosts minority items and introduces serendipity.
We'd like to briefly introduce the project team:
Andreas Lommatzsch works as a senior researcher at the Distributed Artificial Intelligence Lab (DAI-Labor) at the TU Berlin. His research focuses on distributed knowledge management and machine learning algorithms. His primary interests lie in the areas of recommendations based on data-streams and context-aware meta-recommender algorithms.
Jing Yuan is a Ph.D. student working at the Distributed Artificial Intelligence Lab (DAI-Labor) at TU Berlin. Her research interests include recommender systems, information retrieval, and machine learning algorithms.
Phani Saripalli works as a Data Engineer at EYZ Media GmbH (operator of realeyz.de) and coordinates the project on site. He specializes in building data pipelines and data wrangling. He works with Redis, AWS, Airflow, Flask, Python and Postgres to ensure that data is transformed from its raw form into something insightful.
Khalit Hartmann is a Bachelor of Computer Science (Informatik) student working at the Distributed Artificial Intelligence Lab (DAI-Labor) at TU Berlin. His current fields of research include recommender systems based on natural language processing and machine learning algorithms.