Live Coding Spring, Kafka, & Elasticsearch: Personalized Search Results on Ranking and User Profile
| Link | https://springone.io/2021/sessions/spring-kafka-elasticsearch |
| Author(s) | Erdem Günay, CTO, Layermark |
| Length | 26:42 |
| Date | 13-09-2021 |
| Language | English 🇺🇸 |
| Track | Architecture |
| Rating | ⭐⭐⭐⭐☆ |
- ✅ Informative overview of what Elasticsearch is capable of. The live demo is impossible to follow, but still impressive.
- ⛔ How result popularity was updated in real time from Kafka was not clearly explained.
- ⛔ It’s not clear why Kafka figures in the demo if an easier approach could be used.
- ⛔ The data structure should have been shown, as not everybody has experience with Elasticsearch (I guess these gaps were mostly caused by lack of time).
Elasticsearch Analyzers
By default, Elasticsearch matches whole terms exactly. To let a single typed letter find results, and to ignore accented characters (é, í, etc.), a custom analyzer is needed. It can be tried out through POST _analyze with "tokenizer": "standard" and a "char_filter" of "type": "pattern_replace" that strips anything that is not an alphanumeric character.
To get rid of capital letters and non-ASCII characters, add token filters:
"filter" : ["asciifolding", "lowercase"]
…and a custom edge n-gram filter alongside them, which generates the prefixes of each token.
After the analysis settings change, the existing index has to be deleted and recreated.
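To get a feel for what that analyzer chain produces, here is a minimal pure-Python sketch that imitates it. It is not Elasticsearch code: the whitespace split only approximates the standard tokenizer, the accent stripping only approximates asciifolding, and the `min_gram`/`max_gram` values are my assumptions, since the talk's settings are not in these notes.

```python
import re
import unicodedata


def strip_non_alphanumeric(text: str) -> str:
    """Stand-in for the pattern_replace char filter: keep letters, digits, whitespace."""
    return re.sub(r"[^\w\s]|_", "", text)


def ascii_fold(token: str) -> str:
    """Rough stand-in for asciifolding: decompose, then drop combining accents."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 10) -> list:
    """Every prefix of the token between min_gram and max_gram characters long."""
    longest = min(len(token), max_gram)
    return [token[:size] for size in range(min_gram, longest + 1)]


def analyze(text: str, with_prefixes: bool = False) -> list:
    # Splitting on whitespace approximates the "standard" tokenizer.
    tokens = strip_non_alphanumeric(text).split()
    tokens = [ascii_fold(token).lower() for token in tokens]
    if with_prefixes:
        tokens = [gram for token in tokens for gram in edge_ngrams(token)]
    return tokens


print(analyze("Selena Goméz!"))
print(analyze("Sia", with_prefixes=True))
```

With the prefixes indexed, a query of just "s" has a stored token to match against, which is what makes single-letter search possible.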
Each hit has a _score that determines its position in the returned results.
When searching by field, individual fields can be boosted. For example, artist_name^5 multiplies the score of matches on the artist name field artist_name by 5 (an exact match should score higher), while artist_name.prefix^1 leaves the sub-field that stores the generated prefix tokens unboosted.
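As an illustration, the request body for such a boosted search could be assembled like this. The notes do not say which query type the demo used, so `multi_match` is my assumption, and `build_artist_query` is a hypothetical helper name; only the two field expressions come from the talk.

```python
import json


def build_artist_query(text: str) -> dict:
    """Search body that weights exact artist-name matches above prefix matches."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                # "^5" and "^1" are the per-field boost factors from the talk.
                "fields": ["artist_name^5", "artist_name.prefix^1"],
            }
        }
    }


# The printed JSON is what would go into the body of a _search request.
print(json.dumps(build_artist_query("s"), indent=2))
```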
"fuzziness" : 1 enables to match other elements (ex. "query": "sezan" would match Selena Gomez with a low _score since there is a partial match in individual letters from search (basically allows typos).
Boosting results by popularity
If the search is based on a single letter (s), Shakira might be placed below Selena Gomez although the popularity says otherwise (I like Shakira more, though). To fix this, enable custom scoring on the search sent to POST /content/_search by providing a "script_score" inside "functions" inside "function_score" inside "query": `
"script" : {
"source" : "Math.max((!doc['ranking'].empty ? Math.log10(doc['ranking'].value) : 1), 1)",
"lang" : "painless"
}`
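The effect of that script is easier to see outside Elasticsearch. Below is a small Python mirror of the formula, applied to made-up hits: the text scores and ranking values are invented for illustration, and multiplying the two relies on function_score combining the function result with the query score by multiplication, which is its default.

```python
import math
from typing import Optional


def popularity_factor(ranking: Optional[float]) -> float:
    """Mirror of the Painless script: log10 of the ranking, never below 1."""
    # Guarding against 0 avoids Python's ValueError; the max() floor covers it anyway.
    if ranking is None or ranking <= 0:
        return 1.0
    return max(math.log10(ranking), 1.0)


def rerank(hits: list) -> list:
    """hits are (name, text_score, ranking) tuples; returns (name, final_score), best first."""
    scored = [
        (name, text_score * popularity_factor(ranking))
        for name, text_score, ranking in hits
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical numbers: Selena Gomez wins on text score alone, Shakira on popularity.
hits = [("Selena Gomez", 2.0, 1_000), ("Shakira", 1.8, 100_000)]
for name, score in rerank(hits):
    print(f"{name}: {score:.2f}")
```

The logarithm keeps very popular artists from drowning out the text match entirely, and the floor of 1 means a missing or tiny ranking never lowers a score.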
Assuming the popularity is updated programmatically (in the demo, 200 asynchronous hits sent through Kafka), the listen-event messages have to be consumed and stored in listen-event indices, which can then be queried with POST /listen-event-*/_search/. User profiles are generated as well (POST /user-profile/_search).
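The talk did not make clear how the events turn into popularity and profiles, so the following is only one plausible way to think about it, not the presenter's implementation. It counts events per artist (a global ranking) and per user and artist (a profile). The event field names `user_id` and `artist_id` are hypothetical, and a plain list stands in for the Kafka consumer.

```python
from collections import Counter, defaultdict


def summarize_listen_events(events: list) -> tuple:
    """Fold listen events into a global ranking and per-user listen counts."""
    ranking = Counter()              # artist_id -> total listens
    profiles = defaultdict(Counter)  # user_id -> artist_id -> listens
    for event in events:
        ranking[event["artist_id"]] += 1
        profiles[event["user_id"]][event["artist_id"]] += 1
    return ranking, profiles


# Stand-in for messages read from a listen-event topic.
events = [
    {"user_id": "u1", "artist_id": "shakira"},
    {"user_id": "u1", "artist_id": "shakira"},
    {"user_id": "u2", "artist_id": "selena-gomez"},
]
ranking, profiles = summarize_listen_events(events)
print(dict(ranking))
print({user: dict(counts) for user, counts in profiles.items()})
```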
Boosting by user behavior
If a particular user searches for a certain element, that element should be boosted in searches for that particular user only. This requires another scoring function, added in the same way as the popularity boost:
"script" : {
"source" : "params.boosts.get(doc[params.artistIdFieldName].value)",
"lang" : "painless",
"params" : { ..
}
}
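To show what this script does to the ordering, here is a Python imitation of the lookup. The contents of `params` were not visible in the talk, so the boost values, artist ids and the field name below are all invented. Falling back to 1.0 for artists without an entry is also my assumption: the script as shown calls get directly, which would yield nothing for a missing key.

```python
def user_boost(artist_id: str, boosts: dict, default: float = 1.0) -> float:
    """Per-user boost for an artist; unknown artists keep their score unchanged."""
    return boosts.get(artist_id, default)


def personalize(hits: list, boosts: dict) -> list:
    """hits are (artist_id, name, score) tuples; returns (name, boosted_score), best first."""
    scored = [
        (name, score * user_boost(artist_id, boosts))
        for artist_id, name, score in hits
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical params for one user who listens to Shakira a lot.
params = {"boosts": {"artist-2": 3.0}, "artistIdFieldName": "artist_id"}
hits = [("artist-1", "Selena Gomez", 2.5), ("artist-2", "Shakira", 2.0)]

print(personalize(hits, params["boosts"]))  # this user's personalized order
print(personalize(hits, {}))                # a user with no listening history
```

Because the boosts travel as script parameters, the same query can be reused for every user with only the params swapped out.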