Vespa is Better than Elasticsearch for Matching Millions of Men and Women





An integral part of the OkCupid dating site is recommending potential partners. Recommendations are based on the overlap of the many preferences that you and your potential partners have indicated. As you can imagine, there are many ways to optimize this task.



However, your preferences are not the only factor influencing whom we recommend to you as a potential partner (or to whom we recommend you). If we simply showed all users matching your criteria, without any ranking, the list would be far from optimal. For example, ignoring recent user activity means you could spend a lot of time messaging someone who no longer visits the site. In addition to the preferences you specify, we use numerous algorithms and factors to recommend the people we think you should see.



We must deliver the best possible results and an almost endless list of recommendations. In other applications, where content changes less frequently, this can be done by periodically recomputing recommendations. For example, Spotify's "Discover Weekly" feature gives you a set of recommended tracks that does not change until the next week. On OkCupid, users browse their recommendations endlessly and in real time. The recommended "content" is highly dynamic in nature (a user can change their preferences or profile data, move, deactivate at any time, and so on). Because a user's changes affect both whom we can recommend to them and to whom they can be recommended, we want to make sure that potential matches are the best available at any given moment.



In order to take advantage of the various ranking algorithms and make recommendations in real time, we need a search engine that is constantly updated with user data and provides the ability to filter and rank potential candidates.



What were the problems with the existing match search system?



OkCupid has been using its own in-house match search engine for years. We won't go into all the details, but at a high level it is a map-reduce framework over shards of user data, where each shard holds part of the relevant user data in memory and applies various filters and sorts on the fly. Search queries fan out to all shards, and the results are ultimately merged to return the top k candidates. This match-search system we wrote worked well, so why did we decide to change it now?



We knew we needed to upgrade the system to support the various recommendation projects planned for the coming years, and that our team would grow along with the number of projects. One of the biggest pain points was schema updates. For example, adding a new piece of user data (say, gender tags in preferences) required hundreds or thousands of lines of boilerplate code, and deployment demanded careful coordination to ensure that all parts of the system were rolled out in the correct order. Simply adding a new way to filter the user data set or rank results took half a day of an engineer's time, who then had to manually deploy each shard to production and watch for potential issues. More importantly, the system had become difficult to manage and scale, because shards and replicas were manually distributed across a fleet of machines with no cluster-management software.



At the beginning of 2019, the load on the match system grew, so we added another set of replicas by manually placing service instances on multiple machines. That work took many weeks of backend and devops effort. Around the same time, we began to notice performance bottlenecks in our home-grown service discovery, message queue, and so on. While these components had previously served us well, we had reached a point where we questioned their ability to scale. Meanwhile, we were tasked with moving most of our workload to the cloud. Porting the match system is a tedious task in itself, and it also drags those other subsystems along with it.



Today at OkCupid, many of these subsystems are served by more robust, cloud-friendly OSS options, and over the past two years the team has adopted various technologies with great success. We won't go into those projects here; instead we'll focus on what we did to address the issues above by moving to a more developer-friendly and scalable search engine for our recommendations: Vespa.



It's a match! Why OkCupid paired up with Vespa



Our team has historically been small. We knew from the start that replacing the search engine would be extremely difficult, so we examined the open source options that could work for us. The two main contenders were Elasticsearch and Vespa.



Elasticsearch



It is a popular technology with a large community, good documentation, and support. There are tons of features, and it is even used by Tinder. New schema fields can be added with a PUT mapping request, queries are structured REST calls, there is some support for query-time ranking and for writing custom plugins, and so on. When it comes to scaling and maintenance, you only need to define the number of shards, and the system handles replica distribution for you. Scaling beyond that, however, requires rebuilding the index with a larger shard count.
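
For illustration, a new field such as the gender-tags example mentioned earlier would be a single mapping update in Elasticsearch (a minimal sketch against a hypothetical users index):

PUT /users/_mapping
{
    "properties": {
        "genderTags": { "type": "keyword" }
    }
}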



One of the main reasons we passed on Elasticsearch was the lack of true in-memory partial updates. This is very important for our use case, because the documents we index must be updated very often through likes, messaging, and so on. These documents are very dynamic in nature, compared to content like ads or images, which are mostly static objects with constant attributes. The inefficient read-then-reindex cycle on every update was therefore a major performance concern for us.



Vespa



Vespa's source code was opened only a few years ago. The developers advertise support for storing, searching, ranking, and organizing big data in real time. Features that Vespa supports:



  • True in-memory partial updates, with no need to re-read the whole document (on the order of 40-50 thousand updates per second per node) - see the sketch below

  • A flexible ranking framework with computation at query time

  • Direct integration with machine learning models (for example, TensorFlow)

  • Queries expressed in YQL (Yahoo Query Language) over REST

  • Customizable processing in Java components
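
As an illustration of the first point, a partial update in Vespa's document JSON format touches only the fields in question, with no read-modify-write cycle. A minimal sketch, using the user document type shown later in this post:

{
    "update": "id:user:user::6888497210242094612",
    "fields": {
        "lastOnline": { "assign": 1592486978 }
    }
}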


When it comes to scaling and maintenance, you no longer think about shards: you configure the layout of your content nodes, and Vespa automatically handles sharding, replicating, and distributing the documents. In addition, data is automatically recovered and redistributed from replicas whenever you add or remove nodes. Scaling simply means updating the configuration to add nodes and letting Vespa redistribute the data in real time.



Overall, Vespa seemed the best fit for our use cases. OkCupid incorporates a lot of different information about users to help them find the best matches: in terms of filters and sorts alone, there are more than a hundred parameters! We will always be adding filters and sorts, so supporting that workflow is essential. In terms of writes and queries, Vespa is the closest analogue to our existing system: it likewise required fast in-memory partial updates and real-time processing during a match query. Vespa also has a much more flexible and simpler ranking framework. Another nice bonus is the ability to express queries in YQL, in contrast to Elasticsearch's unwieldy query structure. On the scaling and maintenance side, Vespa's automatic data-distribution capabilities were very attractive to our relatively small team. All in all, Vespa appeared to support our use cases and performance requirements better, while being easier to maintain than Elasticsearch.



Elasticsearch is the better-known engine, and we could have drawn on Tinder's experience with it, but either option would require a ton of up-front research. At the same time, Vespa serves many production systems, such as Zedge, Flickr with billions of images, and the Yahoo Gemini Ads platform with over one hundred thousand requests per second, serving ads to a billion monthly active users. That gave us confidence that it was a battle-tested, efficient, and reliable option - in fact, Vespa's origins predate Elasticsearch.



The Vespa developers also proved to be very approachable and helpful. Vespa was originally built for advertising and web content and, as far as we know, had not been used on a dating site before. Integrating the engine was difficult at first because our use case was so unusual, but the Vespa team was very responsive and quickly optimized the system to help us deal with the issues that arose.



How Vespa works and what search looks like in OkCupid







Before diving into our Vespa setup, here's a quick overview of how it works. Vespa is a collection of numerous services, but each Docker container can be configured as an admin/config host, a stateless Java container host, and/or a stateful C++ content host. An application package containing configuration, components, ML models, etc. can be deployed via the State API to the config cluster, which handles applying changes to the container and content clusters. Feed and search requests go over HTTP through the stateless Java container (which allows custom processing) before feed updates land in the content cluster or queries fan out to the content layer, where distributed query execution happens. For the most part, deploying a new application package takes only a few seconds, and Vespa applies these changes live in the container and content clusters, so you rarely have to restart anything.
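
As a rough sketch (host aliases and ids here are illustrative, not our production setup), an application package's services.xml wires these clusters together like this:

<services version="1.0">
    <!-- Admin/config cluster -->
    <admin version="2.0">
        <adminserver hostalias="admin0"/>
        <configservers>
            <configserver hostalias="admin0"/>
        </configservers>
    </admin>

    <!-- Stateless Java container cluster: handles feed and query HTTP traffic -->
    <container id="default" version="1.0">
        <search/>
        <document-api/>
        <nodes>
            <node hostalias="container0"/>
        </nodes>
    </container>

    <!-- Stateful C++ content cluster: stores and indexes the documents -->
    <content id="user" version="1.0">
        <redundancy>2</redundancy>
        <documents>
            <document type="user" mode="index"/>
        </documents>
        <nodes>
            <node hostalias="content0" distribution-key="0"/>
            <node hostalias="content1" distribution-key="1"/>
        </nodes>
    </content>
</services>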



What does search look like?



Documents in a Vespa cluster contain a variety of user-specific attributes. The schema definition defines the fields of a document type, as well as the rank profiles containing the sets of applicable ranking expressions. Suppose we have a schema definition representing a user like this:



search user {

    document user {

        field userId type long {
            indexing: summary | attribute
            attribute: fast-search
            rank: filter
        }

        field latLong type position {
            indexing: attribute
        }

        # UNIX timestamp
        field lastOnline type long {
            indexing: attribute
            attribute: fast-search
        }

        # Contains the users that this user document has liked
        # and the corresponding weights are UNIX timestamps when that like happened 
        field likedUserSet type weightedset<long> {
            indexing: attribute
            attribute: fast-search
        }
        
    }

    rank-profile myRankProfile inherits default {
        rank-properties {
            query(lastOnlineWeight): 0
            query(incomingLikeWeight): 0
        }

        function lastOnlineScore() {
            expression: query(lastOnlineWeight) * freshness(lastOnline)
        }

        function incomingLikeTimestamp() {
            expression: rawScore(likedUserSet)
        }

        function hasLikedMe() {
            expression:  if (incomingLikeTimestamp > 0, 1, 0)
        } 

        function incomingLikeScore() {
            expression: query(incomingLikeWeight) * hasLikedMe
        }

        first-phase {
            expression {
                lastOnlineScore + incomingLikeScore
            }
        }

        summary-features {
            lastOnlineScore incomingLikeScore
        }
    }
    
}


The notation indexing: attribute indicates that these fields should be stored in memory, which gives the best read and write performance for them.



Let's say we've populated the cluster with such user documents. We could then filter and rank on any of the fields above. For example, we can make a POST request to the default search endpoint http://localhost:8080/search/ to find users, other than our own user 777, within 50 miles of our location, who have been online since timestamp 1592486978, ranked by last activity, keeping the top two candidates. Let's also select summaryfeatures to see the contribution of each ranking expression in our rank profile:



{
    "yql": "select userId, summaryfeatures from user where lastOnline > 1592486978 and !(userId contains \"777\") limit 2;",
    "ranking": {
        "profile": "myRankProfile",
        "features": {
            "query(lastOnlineWeight)": "50"
        }
    },
    "pos": {
        "radius": "50mi",
        "ll": "N40o44'22;W74o0'2",
        "attribute": "latLong"
    },
    "presentation": {
        "summary": "default"
    }
}


We could get a result like this:



{
    "root": {
        "id": "toplevel",
        "relevance": 1.0,
        "fields": {
            "totalCount": 317
        },
        "coverage": {
            "coverage": 100,
            "documents": 958,
            "full": true,
            "nodes": 1,
            "results": 1,
            "resultsFull": 1
        },
        "children": [
            {
                "id": "index:user/0/bde9bd654f1d5ae17fd9abc3",
                "relevance": 48.99315843621399,
                "source": "user",
                "fields": {
                    "userId": -5800469520557156329,
                    "summaryfeatures": {
                        "rankingExpression(incomingLikeScore)": 0.0,
                        "rankingExpression(lastOnlineScore)": 48.99315843621399,
                        "vespa.summaryFeatures.cached": 0.0
                    }
                }
            },
            {
                "id": "index:user/0/e8aa37df0832905c3fa1dbbd",
                "relevance": 48.99041280864198,
                "source": "user",
                "fields": {
                    "userId": 6888497210242094612,
                    "summaryfeatures": {
                        "rankingExpression(incomingLikeScore)": 0.0,
                        "rankingExpression(lastOnlineScore)": 48.99041280864198,
                        "vespa.summaryFeatures.cached": 0.0
                    }
                }
            }
        ]
    }
}


After filtering, the matching results are ranked with the first-phase ranking expression. The returned relevance is the overall score produced by evaluating all the first-phase ranking functions of the rank-profile we specified in the query via ranking.profile, i.e. myRankProfile. In the ranking.features list we set query(lastOnlineWeight) to 50, which is then referenced by the only ranking expression we use, lastOnlineScore. That expression uses the built-in rank feature freshness, which is a number close to 1 when the timestamp in the attribute is recent, decaying toward 0 as it ages. Nothing complicated so far.



Unlike static content, candidates can themselves take actions that affect whether they should be shown to a user. For example, they might like you! We can index a weighted set field likedUserSet on each user document, containing as keys the IDs of the users they liked and as values the timestamps of when that happened. Filtering for those who liked you would then be easy (for example, adding likedUserSet contains \"777\" to the YQL), but how do we include this information during ranking? How do we boost users who have liked our user in the results?
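
Before answering that, here is a hedged sketch of how such a like might be recorded in the first place: when user 6888497210242094612 likes user 777, a partial update adds the key and its timestamp weight to that user's likedUserSet:

{
    "update": "id:user:user::6888497210242094612",
    "fields": {
        "likedUserSet": {
            "add": { "777": 1592486978 }
        }
    }
}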



In the previous results, the ranking expression incomingLikeScore was 0 for both hits. User 6888497210242094612 actually liked user 777, but at the moment that information is not available in ranking, even if we had set "query(incomingLikeWeight)": 50. We can use the rank() function in YQL (its first argument, and only the first, determines whether a document is a match, but all arguments are used to calculate rank features) and then use dotProduct in our YQL query to store and retrieve raw scores (in this case, the timestamps of when users liked us), like this:



{
    "yql": "select userId,summaryfeatures from user where !(userId contains \"777\") and rank(lastOnline > 1592486978, dotProduct(likedUserSet, {\"777\":1})) limit 2;",
    "ranking": {
        "profile": "myRankProfile",
        "features": {
            "query(lastOnlineWeight)": "50",
            "query(incomingLikeWeight)": "50"
        }
    },
    "pos": {
        "radius": "50mi",
        "ll": "N40o44'22;W74o0'2",
        "attribute": "latLong"
    },
    "presentation": {
        "summary": "default"
    }
}


{
    "root": {
        "id": "toplevel",
        "relevance": 1.0,
        "fields": {
            "totalCount": 317
        },
        "coverage": {
            "coverage": 100,
            "documents": 958,
            "full": true,
            "nodes": 1,
            "results": 1,
            "resultsFull": 1
        },
        "children": [
            {
                "id": "index:user/0/e8aa37df0832905c3fa1dbbd",
                "relevance": 98.97595807613169,
                "source": "user",
                "fields": {
                    "userId": 6888497210242094612,
                    "summaryfeatures": {
                        "rankingExpression(incomingLikeScore)": 50.0,
                        "rankingExpression(lastOnlineScore)": 48.97595807613169,
                        "vespa.summaryFeatures.cached": 0.0
                    }
                }
            },
            {
                "id": "index:user/0/bde9bd654f1d5ae17fd9abc3",
                "relevance": 48.9787037037037,
                "source": "user",
                "fields": {
                    "userId": -5800469520557156329,
                    "summaryfeatures": {
                        "rankingExpression(incomingLikeScore)": 0.0,
                        "rankingExpression(lastOnlineScore)": 48.9787037037037,
                        "vespa.summaryFeatures.cached": 0.0
                    }
                }
            }
        ]
    }
}


Now user 6888497210242094612 has risen to the top, because he liked our user, so his incomingLikeScore carries its full value. Of course, we actually have the timestamp of when he liked us, so we could use it in more complex expressions, but for now we'll keep things simple.
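
For example, a hypothetical refinement (not part of the profile above) could decay the boost with the age of the incoming like, using the built-in now rank feature, which is the current time in UNIX seconds:

    function incomingLikeScore() {
        # Hypothetical: decay the boost as the like ages over a day;
        # incomingLikeTimestamp is the raw dotProduct score, i.e. a UNIX timestamp
        expression: query(incomingLikeWeight) * hasLikedMe * exp(-(now - incomingLikeTimestamp) / 86400)
    }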



This demonstrates the mechanics of filtering and ranking results with the ranking framework. It provides a flexible way to apply expressions (which are mostly just mathematics) to matches at query time.



Setting up middleware in Java



What if we wanted to make this dotProduct expression an implicit part of every query? That's where the custom Java container layer comes in: we can write a custom Searcher component. It lets us process arbitrary parameters, rewrite the query, and process results in specific ways. Here's an example in Kotlin:



import com.yahoo.prelude.query.DotProductItem
import com.yahoo.search.Query
import com.yahoo.search.Result
import com.yahoo.search.Searcher
import com.yahoo.search.searchchain.Execution
import com.yahoo.search.searchchain.PhaseNames
import com.yahoo.yolean.chain.After

@After(PhaseNames.TRANSFORMED_QUERY)
class MatchSearcher : Searcher() {

    companion object {
        // HTTP query parameter
        val USERID_QUERY_PARAM = "userid"

        val ATTRIBUTE_FIELD_LIKED_USER_SET = "likedUserSet"
    }

    override fun search(query: Query, execution: Execution): Result {
        val userId = query.properties().getString(USERID_QUERY_PARAM)?.toLong()

        // Add the dotProduct clause
        if (userId != null) {
            val rankItem = query.model.queryTree.getRankItem()
            val likedUserSetClause = DotProductItem(ATTRIBUTE_FIELD_LIKED_USER_SET)
            likedUserSetClause.addToken(userId, 1)
            rankItem.addItem(likedUserSetClause)
        }

        // Execute the query
        query.trace("YQL after is: ${query.yqlRepresentation()}", 2)
        return execution.search(query)
    }
}


Then, in our services.xml file, we can configure this component as follows:



...       
         <search>
            <chain id="default" inherits="vespa">
                <searcher id="com.okcupid.match.MatchSearcher" bundle="match-searcher"/>
            </chain>
        </search>
        <handler id="default" bundle="match-searcher">
            <binding>http://*:8080/match</binding>
        </handler>
...


Then we just build and deploy the application package and make a request to the custom handler http://localhost:8080/match?userid=777:



{
    "yql": "select userId,summaryfeatures from user where !(userId contains \"777\") and rank(lastOnline > 1592486978) limit 2;",
    "ranking": {
        "profile": "myRankProfile",
        "features": {
            "query(lastOnlineWeight)": "50",
            "query(incomingLikeWeight)": "50"
        }
    },
    "pos": {
        "radius": "50mi",
        "ll": "N40o44'22;W74o0'2",
        "attribute": "latLong"
    },
    "presentation": {
        "summary": "default"
    }
}


We get the same results as before! Note that in the Kotlin code we added a trace to output the YQL representation after the change, so if we set tracelevel=2 in the URL parameters, the response also shows it:



...
                    {
                        "message": "YQL after is: select userId, summaryfeatures from user where ((rank(lastOnline > 1592486978, dotProduct(likedUserSet, {\"777\": 1})) AND !(userId contains \"777\") limit 2;"
                    },
...


The Java middleware container is a powerful tool for adding custom processing logic through a Searcher , or custom result rendering through a Renderer . We customize our Searcher components to handle cases like the one above and other aspects we want to make implicit in our searches. For example, one of the product concepts we support is "reciprocity": you can search for users with specific criteria (such as age range and distance), but you must in turn satisfy your candidates' search criteria. To support this in our Searcher component, we could fetch the document of the searching user and provide some of its attributes in a subsequent forked query for filtering and ranking. The ranking framework and custom middleware together provide a flexible way to support multiple use cases. We've only covered a few aspects in these examples, but there is detailed documentation available.
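
A hedged sketch of what such a reciprocity Searcher might look like; the attribute names (ageLookingForMin, ageLookingForMax) and the hard-coded age are purely illustrative, and in practice the searching user's attributes would come from a lookup:

import com.yahoo.prelude.query.AndItem
import com.yahoo.prelude.query.RangeItem
import com.yahoo.search.Query
import com.yahoo.search.Result
import com.yahoo.search.Searcher
import com.yahoo.search.searchchain.Execution

class ReciprocitySearcher : Searcher() {

    override fun search(query: Query, execution: Execution): Result {
        // Illustrative only: a real implementation would fetch the searching
        // user's document and use their actual age
        val searcherAge = 30

        // Keep only candidates whose own preferred age range includes the
        // searcher's age: ageLookingForMin <= searcherAge <= ageLookingForMax
        val reciprocal = AndItem()
        reciprocal.addItem(query.model.queryTree.root)
        reciprocal.addItem(RangeItem(0, searcherAge, "ageLookingForMin"))
        reciprocal.addItem(RangeItem(searcherAge, Int.MAX_VALUE, "ageLookingForMax"))
        query.model.queryTree.root = reciprocal

        return execution.search(query)
    }
}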



How we built a Vespa cluster and put it into production



In the spring of 2019, we began planning the new system. During this time we also contacted the Vespa team and consulted them regularly about our use cases. Our operations team evaluated and built the initial cluster setup, while the backend team began documenting, designing, and prototyping various Vespa use cases.



The first stages of prototyping



OkCupid's backend systems are written in Golang and C++. To write custom Vespa logic components, as well as achieve high feed rates with the Java Vespa HTTP feed client API , we had to get somewhat familiar with the JVM environment - we ended up using Kotlin both for our Vespa components and for our feed pipelines.



Porting the application logic and surfacing Vespa's features took several months, with consultation from the Vespa team as needed. Most of the match engine's system logic is written in C++, so we also added logic to translate our current filter and sort data model into equivalent YQL queries, which we issue to the Vespa cluster over REST. Early on, we also made sure to build a good pipeline for repopulating the cluster with the full corpus of user documents; prototyping inevitably involves many changes to determine the correct field types to use, some of which require re-feeding all documents.



Monitoring and stress testing



When standing up the Vespa search cluster, we had to make sure of two things: that it could handle the expected volume of queries and writes, and that the recommendations the system produced were at least as good as those of the existing match system.



Before the load tests, we added Prometheus metrics everywhere. Vespa-exporter provides tons of statistics, and Vespa itself exposes a small set of additional metrics . From these we built various Grafana dashboards for queries per second, latency, resource utilization of the Vespa processes, and so on. We also ran vespa-fbench to test query performance. With the help of the Vespa developers, we determined that, due to our relatively high static query cost, a grouped layout would give us higher throughput. In a flat layout, adding more nodes mostly reduces only the dynamic query cost (the portion of the query cost that depends on the number of documents indexed). A grouped layout means that each configured group of nodes holds the full set of documents, so a single group can serve a query. Because of our high static query cost, keeping the total node count the same but going from one flat group to three groups significantly increased our throughput. Finally, once we were confident in the static benchmarks, we also tested with real-time "shadow traffic" that was not served to users.
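
In services.xml terms, that change is roughly the following (a sketch; names and sizes are illustrative). The partitions="1|1|*" distribution sends one replica of every document to each group, so any single group can answer a query on its own:

<content id="user" version="1.0">
    <redundancy>3</redundancy>
    <documents>
        <document type="user" mode="index"/>
    </documents>
    <group>
        <distribution partitions="1|1|*"/>
        <group name="group0" distribution-key="0">
            <node hostalias="node0" distribution-key="0"/>
        </group>
        <group name="group1" distribution-key="1">
            <node hostalias="node1" distribution-key="1"/>
        </group>
        <group name="group2" distribution-key="2">
            <node hostalias="node2" distribution-key="2"/>
        </group>
    </group>
</content>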



Optimizing performance



Write performance was one of the biggest hurdles we faced early on. At the very beginning, we had trouble processing updates at even 1000 per second. We used weighted set fields extensively, but at first they were not efficient. Fortunately, the Vespa developers quickly helped solve these problems, as well as others related to data distribution. They later added extensive documentation on feed sizing , which we follow to some extent: use integer fields in large weighted sets when possible; allow batching by setting visibility-delay; restrict conditional updates to attribute (that is, in-memory) fields; and reduce round trips from clients by compacting and merging operations in our feed pipelines. Now the pipelines comfortably handle 3000 updates per second at steady state, and our modest cluster has processed spikes of 11K updates per second when needed.
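
For example, the batching mentioned above is a one-line content cluster setting (a sketch; the delay value is illustrative). With a visibility delay, Vespa may buffer writes for up to that many seconds before they become visible in search results, in exchange for better feed throughput:

<content id="user" version="1.0">
    <engine>
        <proton>
            <visibility-delay>1.0</visibility-delay>
        </proton>
    </engine>
    <!-- redundancy, documents and nodes as before -->
</content>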



Quality of recommendations



Once we were sure the cluster could handle the load, we had to verify that the quality of its recommendations was no worse than the existing system's. Any small deviation in the ranking implementation has a huge impact on overall recommendation quality and on the whole ecosystem. We ran Vespa as an experimental system for some test groups, while the control group continued to use the existing system. We then analyzed several business metrics, iterating and fixing issues until the Vespa group performed as well as, if not better than, the control group. Once we were confident in the Vespa results, routing match queries to the Vespa cluster was easy. We were able to switch all match-search traffic to the Vespa cluster without a hitch!



System diagram



In a simplified form, the final architecture diagram of the new system looks like this:







How Vespa works now and what's next



Let's compare the Vespa-based match system with the previous one:



  • Schema updates

    • Before: a week with hundreds of new lines of code, carefully coordinated deployment with multiple subsystems

    • Now: hours; add the field to the schema definition and deploy

  • Adding a new filter or sort

    • Before: half a day of an engineer's time, manually deploying each shard and watching for issues

    • Now: minutes. Add the filter or ranking expression, deploy, and it's live!

  • Scaling and maintenance

    • Before: weeks of manually placing shards and replicas across machines

    • Now: add nodes in the configuration, and Vespa automatically redistributes the data in real time with no downtime


Overall, the design and maintenance qualities of the Vespa cluster have helped development across all OkCupid products. At the end of January 2020 we launched our Vespa cluster into production, and it now serves all match-search recommendations. We've also added dozens of new fields, ranking expressions, and use cases to support all of this year's new features, such as Stacks . And unlike with our previous match system, we now use real-time machine learning models at query time.



What's next?



For us, one of the main advantages of Vespa is its direct support for ranking with tensors and its integration with models trained in frameworks such as TensorFlow . These are among the main capabilities we will be building on in the coming months. We already use tensors for some use cases, and we will soon look at integrating more machine learning models that we hope will better predict outcomes and matches for our users.
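
As a hedged illustration of where this is heading (the field, profile, and feature names here are hypothetical), a model-driven profile might store an embedding per user and score candidates against a query-time embedding:

field userEmbedding type tensor<float>(x[128]) {
    indexing: attribute
}

rank-profile mlRank inherits default {
    first-phase {
        # Dot product between the query-time embedding and the stored one;
        # the type of query(candidateEmbedding) must also be declared
        # (e.g. in a query profile type)
        expression: sum(query(candidateEmbedding) * attribute(userEmbedding))
    }
}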



In addition, Vespa recently announced support for nearest-neighbor search over multi-dimensional vectors that is fully real-time, concurrently searchable, and dynamically updatable. We are very interested in exploring further use cases for real-time nearest neighbor search .



OkCupid and Vespa. Go!



Many people have heard of or worked with Elasticsearch, but the community around Vespa is not nearly as large. We believe many applications built on Elasticsearch would work better on Vespa. It has worked out great for OkCupid, and we're glad we switched. The new architecture has let us evolve and build new features much faster. We're a relatively small company, so not having to worry too much about operational complexity is wonderful, and we are now much better prepared to scale our search engine. Without Vespa we certainly could not have made the progress of the past year. For more on Vespa's technical capabilities, be sure to check out the Vespa AI e-commerce guide from @jobergum .



We made the first move and liked the Vespa team; they messaged us back, and it was a match! We couldn't have done this without their help. Special thanks to @jobergum and @geirst for guidance on ranking and query handling, and to @kkraune and @vekterli for their support. The level of support and effort the Vespa team has given us has been truly amazing - from deep analysis of our use case to diagnosing performance issues and shipping immediate improvements to the Vespa engine. @vekterli even flew to our New York office and worked directly with us for a week to help integrate the engine. Many thanks to the Vespa team!



In conclusion, we've only touched on a few aspects of our Vespa usage, but none of it would have been possible without the tremendous work of our backend and operations teams over the past year. We encountered many unique challenges bridging the gap between existing systems and a more modern technology stack, but those are topics for other articles.


