Elasticsearch: a flexible plugin for making better use of meta-data in recommenders

Friday August 24th, 2018

The following tech blog article is part of a series of detailed tech explanations of a recommender system built together by realeyz (EYZ Media GmbH) and DAI-Labor (TU Berlin). The tech blog functions as a knowledge base and exchange about technical solutions supporting transnational marketing, branding and distribution of European audiovisual works on VOD services.

Content-based recommender algorithms are an effective approach for providing good recommendations. In contrast to Collaborative Filtering approaches, content-based strategies do not suffer from the cold-start problem: the method does not require user feedback or explicit ratings. In order to ensure that content-based approaches work well, detailed meta-data describing the items are required.

Carefully curated video platforms provide high-quality meta-descriptions of their videos. This is valuable for computing content-based recommendations on a non-mainstream VOD platform like realeyz. On realeyz, the descriptive information about each video is provided to both developers and users in a standard, structured way, which is an advantage over the UGC (User Generated Content) form used by platforms like YouTube. With this benefit, a recommender system can better capture the semantic relations between videos, and even between users and video characteristics, so more reliable recommendation strategies can be created.

The meta-data for videos on the realeyz platform comprise several different aspects. The most important fields are:
Full Synopsis

To give users ranked movie recommendations based on these structured meta-data, content-based approaches are indispensable. Given the various types of meta fields in the movie data, Elasticsearch (a RESTful search framework built on Lucene) makes it flexible to define item similarity or user-item relations and then to design the recommendation strategies on top of them.

The DAI-Labor adopted Elasticsearch as the similarity calculator and relevance ranker in the content-based recommender, thereby avoiding the shortcomings of a traditional relational database. This means that the developed solution 1) can efficiently rank results during full-text search instead of solely filtering them, and 2) can define search and scoring strategies more flexibly when taking multi-field data into account. Figure 1 shows the detailed architecture of the content-based recommender design, which interacts with the realeyz App Server, the realeyz Movie Metadata Database and the AWS Elasticsearch Service.


Thanks to the flexibility of query definition in Elasticsearch, full-text-based, item-based and user-based recommendation requests can each be handled by a corresponding search strategy. Any query text can serve as the input for a full-text-based request, whether user-typed keywords or machine-generated text, and the result is the set of videos whose descriptions are similar to the query text. For an item-based request, the given item IDs are passed to the “More Like This” search function, which finds other movies with similar descriptions. For a user-based request, the videos a user has previously viewed serve as the source for the “More Like This” function, which aggregates text from multiple documents to form the query and run the search; all videos whose meta descriptions are relevant to those viewed videos become recommendation candidates.
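The three request types described above can be sketched as Elasticsearch query bodies. This is a minimal illustration and not the project's actual code: the index name `movies`, the field name `synopsis` and all function names are assumptions made for the example, and using “More Like This” with free text for the full-text case is only one possible choice. The block only builds the JSON bodies as Python dictionaries, so it runs without a cluster.

```python
from typing import Any, Dict, List, Union

INDEX = "movies"             # hypothetical index name
TEXT_FIELDS = ["synopsis"]   # hypothetical field holding the full synopsis

Like = Union[str, Dict[str, str]]


def doc_ref(item_id: str) -> Dict[str, str]:
    """Reference to an already indexed document, as accepted by `like`."""
    return {"_index": INDEX, "_id": item_id}


def more_like_this_body(like: List[Like], size: int = 10) -> Dict[str, Any]:
    """Build a search body around a `more_like_this` query."""
    return {
        "size": size,
        "query": {
            "more_like_this": {
                "fields": TEXT_FIELDS,
                "like": like,
                # Short inputs rarely repeat a term, so lower the default
                # minimum term frequency of 2 to 1.
                "min_term_freq": 1,
                "max_query_terms": 25,
            }
        },
    }


def full_text_request(text: str) -> Dict[str, Any]:
    """Full-text-based: free text (typed or generated) is the input."""
    return more_like_this_body([text])


def item_based_request(item_id: str) -> Dict[str, Any]:
    """Item-based: one indexed video is the input."""
    return more_like_this_body([doc_ref(item_id)])


def user_based_request(viewed_ids: List[str]) -> Dict[str, Any]:
    """User-based: all previously viewed videos together form the input."""
    return more_like_this_body([doc_ref(i) for i in viewed_ids])
```

Each returned dictionary could be sent as the body of a search request against the index. Because all three cases share one query type, the only thing that changes between them is what goes into `like`.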

In addition, multi-field search in Elasticsearch makes it possible to assign a self-defined importance weight to each field, so different field-weighting configurations produce different recommendation rankings. As for response time, since Elasticsearch can keep the inverted index of all the meta descriptions in memory, it can respond to recommendation requests with very low latency.
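As a rough sketch of such per-field weighting, the example below builds a `multi_match` query in which each field carries a boost written as `field^weight`. The field names (`title`, `genres`, `synopsis`) and the weight values are hypothetical and chosen only for illustration; the source does not state the configuration actually used.

```python
from typing import Any, Dict


def weighted_multi_match(
    text: str, weights: Dict[str, float], size: int = 10
) -> Dict[str, Any]:
    """Build a multi-field search body with one boost per field."""
    # "title^3" means: matches in `title` count three times as much.
    fields = [f"{name}^{weight}" for name, weight in weights.items()]
    return {
        "size": size,
        "query": {"multi_match": {"query": text, "fields": fields}},
    }


# Two hypothetical weighting configurations for the same query text.
title_heavy = weighted_multi_match(
    "road movie", {"title": 3, "genres": 2, "synopsis": 1}
)
synopsis_heavy = weighted_multi_match(
    "road movie", {"title": 1, "genres": 1, "synopsis": 4}
)
```

Both bodies search the same fields with the same text, but the differing boosts change the relevance scores and therefore the order in which videos are recommended.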

We'd like to briefly introduce the project team:

Andreas Lommatzsch works as a senior researcher at the Distributed Artificial Intelligence Lab (DAI-Labor) at the TU Berlin. His research focuses on distributed knowledge management and machine learning algorithms. His primary interests lie in the areas of recommendations based on data-streams and context-aware meta-recommender algorithms.

Jing Yuan is a Ph.D. student at the Distributed Artificial Intelligence Lab (DAI-Labor) at TU Berlin. Her research interests include recommender systems, information retrieval, and machine learning algorithms.

Phani Saripalli works as a Data Engineer at EYZ Media GmbH (operator of realeyz.de) and coordinates the project on site. He specializes in building data pipelines and data wrangling. He works with Redis, AWS, Airflow, Flask, Python and Postgres to ensure that data is transformed from its raw form into something insightful.

Khalit Hartmann is a Bachelor of Computer Science (Informatik) student at the Distributed Artificial Intelligence Lab (DAI-Labor) at TU Berlin. His current fields of research include recommender systems based on natural language processing and machine learning algorithms.