Jul 26, 2023

Serverless Smart Search using Azure and Qdrant


Lukasz C.


Our recent project involved adding semantic search to a highly specialized domain: a US real estate marketplace. The task was challenging for several reasons, and we want to share the lessons we learned along the way.

The Objective

Develop an intelligent, AI-enhanced search experience for the US real estate marketplace.

Constraints and Assumptions

  1. Serverless architecture: The client already relied on serverless architecture and cloud services, so the solution had to fit that existing stack.

  2. Using a GPT model: The goal was twofold: give users a tool that complements the standard search form, and make GPT and AI part of the marketing narrative.

  3. Easy listing status updates (active, inactive, removed, etc.): Listings change constantly, so status changes must propagate quickly to avoid showing users outdated listings.

  4. Handling over 1 million records: The marketplace's coverage of the US real estate market keeps growing, so the solution had to scale without driving up costs.

  5. Strict adherence to user constraints (added after the first round of revision and testing): When the user's input includes constraints such as an address, a price range, or a property type, every suggested listing must satisfy them.
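Constraint 3 is cheap to meet if the listing status is stored as payload next to the vector rather than baked into the embedded text (an assumption on our part here). A minimal sketch, assuming Qdrant's REST API and a hypothetical collection named `listings`: a status change becomes a "set payload" request (`POST /collections/{collection_name}/points/payload`) on the affected point ids, so the listing's embedding does not need to be recomputed.

```python
import json

# Hypothetical collection name; the post does not say what the collection is called.
COLLECTION = "listings"


def status_update_request(point_ids: list[int], status: str) -> tuple[str, str, dict]:
    """Build a Qdrant 'set payload' request that changes only the status field.

    Other payload keys and the stored vector are left as they are.
    """
    path = f"/collections/{COLLECTION}/points/payload"
    body = {"payload": {"status": status}, "points": point_ids}
    return "POST", path, body


# Mark two (hypothetical) listings as inactive.
method, path, body = status_update_request([42, 43], "inactive")
print(method, path, json.dumps(body))
```

Searches can then exclude non-active listings with an ordinary payload filter on `status`.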

Architecture Design

We chose Microsoft Azure. Given our expertise in the Microsoft stack and the fact that Azure is, at the time of writing, the only major cloud provider hosting GPT models, the decision was simple ;)


Fig.1 - Serverless search architecture design.

First Approach

Our first approach relied on the similarity search provided by a vector database. We extracted the meaningful information from each real estate listing and converted it into an embedding using an LLM. We then used the same model to convert the user's input into an embedding. Finally, we ran a similarity search to find the closest matching vectors and, through them, the best-matching listings.


Fig.2 - Search leveraging only similarity.
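The first approach boils down to nearest-neighbor search over embeddings. A minimal sketch, using brute-force cosine similarity in plain Python as a stand-in for the vector database (Qdrant uses an approximate HNSW index rather than scanning every vector); the 3-dimensional vectors are toy values, not real model output.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], listings: dict[int, list[float]], k: int = 3) -> list[int]:
    """Return the ids of the k listings whose embeddings are most similar to the query."""
    ranked = sorted(listings, key=lambda lid: cosine(query, listings[lid]), reverse=True)
    return ranked[:k]


# Toy listing embeddings keyed by listing id (real embeddings have far more dimensions).
listings = {
    1: [0.9, 0.1, 0.0],
    2: [0.1, 0.9, 0.0],
    3: [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]  # toy embedding of the user's search phrase
print(top_k(query, listings, k=2))
```

Nothing in this ranking looks at price or property type, which is why the approach cannot guarantee the fifth constraint.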

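The approach was simple, and it found relevant properties even for not-so-obvious search phrases. However, it could not reliably enforce simple user-defined criteria such as price range or property type, violating our fifth constraint: similarity only ranks listings by closeness in embedding space, so nothing guarantees that, for example, a listing above the user's budget is excluded. A revised approach was needed.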

Revised Approach - Qdrant Filtering to the Rescue

To enhance our initial approach, we:

  • Identified a list of stringent conditions (like price range, property type, address, etc.) that must be met when specified by the user.

  • Extended the Sync process to store the fields needed to check these conditions in each point's payload.

  • Constructed payload indexes on these fields.

  • Modified the Search function so that the LLM transforms the user's input into a strongly typed query object (XML) containing every stringent condition the user specified.

  • Leveraged Qdrant's filtering capabilities to query based on both similarity and payload filters.
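The last two steps can be sketched in Python. This is a minimal sketch, not the production code: the XML tag names, the payload keys (`price`, `property_type`, `city`) and the `ListingQuery` type are hypothetical, and a single `city` field stands in for the address constraint. The resulting dictionary follows Qdrant's JSON filter syntax (`must`, `match`, `range`), which can be sent as the `filter` of a search request.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class ListingQuery:
    """Strongly typed query object (field names are hypothetical)."""
    text: str                          # free-text part, embedded for similarity search
    min_price: Optional[float] = None
    max_price: Optional[float] = None
    property_type: Optional[str] = None
    city: Optional[str] = None         # stands in for the full address constraint


def parse_query(xml: str) -> ListingQuery:
    """Parse the XML the LLM is prompted to return into a ListingQuery."""
    root = ET.fromstring(xml.strip())

    def text_of(tag: str) -> Optional[str]:
        node = root.find(tag)
        if node is None or node.text is None or not node.text.strip():
            return None
        return node.text.strip()

    def float_of(tag: str) -> Optional[float]:
        value = text_of(tag)
        return float(value) if value is not None else None

    return ListingQuery(
        text=text_of("text") or "",
        min_price=float_of("minPrice"),
        max_price=float_of("maxPrice"),
        property_type=text_of("propertyType"),
        city=text_of("city"),
    )


def to_qdrant_filter(q: ListingQuery) -> dict:
    """Express every stringent condition as a Qdrant filter; "must" means all must hold."""
    must: list[dict] = []
    price_range: dict = {}
    if q.min_price is not None:
        price_range["gte"] = q.min_price
    if q.max_price is not None:
        price_range["lte"] = q.max_price
    if price_range:
        must.append({"key": "price", "range": price_range})
    if q.property_type is not None:
        must.append({"key": "property_type", "match": {"value": q.property_type}})
    if q.city is not None:
        must.append({"key": "city", "match": {"value": q.city}})
    return {"must": must}


# Hypothetical LLM output for "quiet place with a big backyard in Austin under $450k".
llm_output = """
<query>
  <text>quiet place with a big backyard</text>
  <maxPrice>450000</maxPrice>
  <propertyType>single_family</propertyType>
  <city>Austin</city>
</query>
"""
print(to_qdrant_filter(parse_query(llm_output)))
```

If the LLM returns malformed XML, `ET.fromstring` raises `ET.ParseError`, so a real Search function needs a fallback. Each payload key used in such a filter is a candidate for a payload index; in Qdrant that is a `PUT /collections/{collection_name}/index` request with a `field_name` and a `field_schema` such as `"keyword"` or `"float"`.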


Fig.3 - Search leveraging both filters and similarity.
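Qdrant applies such a filter inside its index search, but the effect on the result set can be emulated in a few lines. The sketch below is a brute-force stand-in, not how Qdrant works internally: it keeps only points whose payload satisfies every `must` condition (only `match` and `range` are handled) and then ranks the survivors by cosine similarity. The listings, vectors, and prices are made-up toy values.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def condition_holds(payload: dict, cond: dict) -> bool:
    """Evaluate one Qdrant-style field condition; a missing field never matches."""
    value = payload.get(cond["key"])
    if value is None:
        return False
    if "match" in cond:
        return value == cond["match"]["value"]
    if "range" in cond:
        r = cond["range"]
        return (("gte" not in r or value >= r["gte"]) and ("gt" not in r or value > r["gt"])
                and ("lte" not in r or value <= r["lte"]) and ("lt" not in r or value < r["lt"]))
    raise ValueError(f"unsupported condition: {cond}")


def filtered_search(query: list[float], points: list[dict], flt: dict, k: int = 3) -> list[int]:
    """Drop points failing any "must" condition, then rank the rest by similarity."""
    candidates = [p for p in points
                  if all(condition_holds(p["payload"], c) for c in flt.get("must", []))]
    candidates.sort(key=lambda p: cosine(query, p["vector"]), reverse=True)
    return [p["id"] for p in candidates[:k]]


points = [
    {"id": 1, "vector": [0.9, 0.1], "payload": {"price": 600000, "property_type": "condo"}},
    {"id": 2, "vector": [0.7, 0.3], "payload": {"price": 400000, "property_type": "condo"}},
    {"id": 3, "vector": [0.8, 0.2], "payload": {"price": 350000, "property_type": "townhouse"}},
]
query = [1.0, 0.1]
flt = {"must": [{"key": "price", "range": {"lte": 450000}},
                {"key": "property_type", "match": {"value": "condo"}}]}
print(filtered_search(query, points, flt))
```

Without the filter, listing 1 ranks first even though it is over budget, which is the failure mode of the first approach; with the filter, only listing 2 can be returned.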

The Road Ahead

Our solution, while functional, can be further refined and extended. Potential enhancements include:

  • Tuning which listing data is converted into embeddings to improve search results.

  • Incorporating Qdrant's recommendation functionality, which combines similarity search, filtering, and feedback-based recommendation (likes/dislikes, expressed as positive and negative examples) in a single query.

  • Using Qdrant's geo payload type and geo filtering (such as a bounding box) to return listings within the visible map area.
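For the map idea, Qdrant supports a `geo` payload type (`{"lon": ..., "lat": ...}`) and a `geo_bounding_box` filter condition defined by its `top_left` and `bottom_right` corners. A minimal sketch, assuming a hypothetical `location` payload field and a map viewport given as north/south/east/west edges; the `inside` helper only mirrors the condition locally for illustration and ignores boxes that cross the antimeridian.

```python
def map_bounds_condition(north: float, south: float, east: float, west: float) -> dict:
    """Qdrant geo_bounding_box condition for the visible map area.

    "location" is a hypothetical payload field holding {"lon": ..., "lat": ...}.
    """
    return {
        "key": "location",
        "geo_bounding_box": {
            "top_left": {"lat": north, "lon": west},
            "bottom_right": {"lat": south, "lon": east},
        },
    }


def inside(point: dict, cond: dict) -> bool:
    """Local check mirroring the condition (no antimeridian handling)."""
    box = cond["geo_bounding_box"]
    return (box["bottom_right"]["lat"] <= point["lat"] <= box["top_left"]["lat"]
            and box["top_left"]["lon"] <= point["lon"] <= box["bottom_right"]["lon"])


# Hypothetical viewport roughly around downtown Austin, TX.
cond = map_bounds_condition(north=30.29, south=30.25, east=-97.72, west=-97.76)
print(inside({"lat": 30.27, "lon": -97.74}, cond))  # inside the box
print(inside({"lat": 30.27, "lon": -97.70}, cond))  # east of the box
```

The condition can sit in the same `must` list as the other payload conditions, so map bounds and the user's constraints apply together.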

We will keep you updated as we expand our reach!

If you need help leveraging any of these integrations, we are here to assist you.

Stay safe and happy coding!
