Created by Charis Kyriakou · July 2, 2019

Azure Search: Implementing Partial Word Search

In only 3 easy steps

Azure

Reading time: 5 minutes

One of our core missions at Cloud Maker is to simplify cloud design and deployment. But if users can’t even find the cloud components they need, then the design process is slowed significantly.

We’re tackling this problem in a number of ways, first through our ML-powered suggestion engine. Our system analyses how each engineer likes to design their cloud solutions and then, at each step of the design process, surfaces the cloud component it thinks they will need next. But sometimes this isn’t enough, and a powerful search capability is needed so that users can quickly find the component they want.

On the surface, search may seem like a simple problem to solve. But once you consider the hundreds of cloud components involved, each known by either its formal or its colloquial name, and also factor in spelling mistakes and partial searches, the problem becomes much more challenging.

First Take

To solve this problem, we turned to Azure Search, a fully managed, global-scale search service. Getting up and running was straightforward: all we needed to do was provision an Azure Search service, create an index and upload our data.

This is what our initial index definition looked like:


{
  "name": "droplets",
  "fields": [
    {
      "name": "id",
      "type": "Edm.String",
      "retrievable": true,
      "searchable": false,
      "filterable": false,
      "sortable": false,
      "facetable": false,
      "key": true
    },
    {
      "name": "fullName",
      "type": "Edm.String",
      "retrievable": true,
      "searchable": true,
      "filterable": false,
      "sortable": false,
      "facetable": false,
      "key": false
    }
  ],
  "corsOptions": {
    "allowedOrigins": ["https://app.cloudmaker.ai"]
  }
}
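One way to create an index like this from code is the REST call PUT /indexes/{name}, which creates the index or updates it if it already exists. The Python sketch below uses only the standard library and only builds the request; sending it would need a real service, e.g. via urllib.request.urlopen. The service name, admin key and api-version value (2019-05-06) are assumptions rather than details from our setup, and the index body is abbreviated.

```python
import json
import urllib.request

# Assumed API version; check which versions your service supports.
API_VERSION = "2019-05-06"


def build_create_index_request(service: str, admin_key: str, index: dict) -> urllib.request.Request:
    """Build (but do not send) a create-or-update request for an index definition."""
    url = f"https://{service}.search.windows.net/indexes/{index['name']}?api-version={API_VERSION}"
    return urllib.request.Request(
        url,
        data=json.dumps(index).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": admin_key},
        method="PUT",
    )


# Abbreviated version of the definition above; omitted attributes take their defaults.
droplets_index = {
    "name": "droplets",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True, "searchable": False},
        {"name": "fullName", "type": "Edm.String", "searchable": True},
    ],
}

# Hypothetical service name and key, for illustration only.
req = build_create_index_request("my-search-service", "<admin-key>", droplets_index)
print(req.get_method(), req.full_url)
```

Using PUT with the index name in the URL makes the call repeatable, which is convenient when the definition evolves, as it does later in this post.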

We then hooked the search service up to Cloud Maker and used full-text search with fuzzy matching to tolerate spelling mistakes. To do this, Cloud Maker sends requests to the Azure Search service instance with the “full” (Lucene) query type enabled, and appends the tilde (~) symbol to each search term, which turns on fuzzy search for that term.
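A minimal Python sketch of that query rewriting follows. The function names are hypothetical, and escaping Lucene’s special characters is an addition not described in our setup: without it, user input such as “c++” would be parsed as query operators. Here queryType is placed in the JSON body, which is where the POST search API takes its parameters.

```python
import json
import re

# Characters with special meaning in the Lucene ("full") query syntax.
_LUCENE_SPECIAL = re.compile(r'([+\-&|!(){}\[\]^"~*?:\\/])')


def escape_term(term: str) -> str:
    """Backslash-escape Lucene special characters (an assumption, not part of the original setup)."""
    return _LUCENE_SPECIAL.sub(r"\\\1", term)


def to_fuzzy_query(user_text: str) -> str:
    """Append the fuzzy operator (~) to every whitespace-separated term."""
    return " ".join(f"{escape_term(term)}~" for term in user_text.split())


def build_search_body(user_text: str) -> str:
    # queryType "full" enables the Lucene syntax, which is what makes "~" meaningful.
    return json.dumps({"search": to_fuzzy_query(user_text), "queryType": "full"})


print(to_fuzzy_query("virtul machnie"))  # virtul~ machnie~
print(build_search_body("virtul machnie"))
```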

For example, the following request returns the result the user was looking for, in this case a virtual machine component.


POST https://{{serviceInstance}}.search.windows.net/indexes/droplets/docs/search?api-version=2015-02-28
{
  "search": "virtul~ machnie~",
  "queryType": "full"
}

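This works well, but it relies on people typing whole words for the components they are searching for. We wanted to make things easier for our users and let them find components by typing only a few characters: for example, if someone types “virt”, we want to start returning accurate results straight away. Fuzzy matching doesn’t cover this case, because “virt” is three edits away from “virtual”, beyond the maximum fuzzy edit distance of 2.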
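To see why fuzzy matching handles typos but not partial words, the sketch below computes edit distances for the examples in this post. Lucene’s fuzzy search is based on Damerau-Levenshtein distance; this sketch uses the closely related optimal string alignment variant (insertions, deletions, substitutions and adjacent transpositions), which is an approximation, and MAX_FUZZY_EDITS reflects the default and maximum distance for “~”.

```python
MAX_FUZZY_EDITS = 2  # default (and maximum) edit distance for the "~" operator


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: Levenshtein plus adjacent transpositions."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(len(b) + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            # Swapping two adjacent characters counts as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


for query, term in [("virtul", "virtual"), ("machnie", "machine"), ("virt", "virtual")]:
    distance = osa_distance(query, term)
    print(f"{query} -> {term}: distance {distance}, fuzzy match: {distance <= MAX_FUZZY_EDITS}")
```

The typos are one edit away and match, while the prefix “virt” needs three insertions and falls outside the fuzzy range.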

Adding Partial Word Matching

Achieving this requires a deeper understanding of how a search engine works, and then configuring the engine to do more of the heavy lifting to return the right results to the user.

First, when the search index is built, the engine analyses the data, breaking text down into individual terms (tokens). For example, with the default analyser, the name “Virtual Machine” is broken down into the lowercase terms “virtual” and “machine”. A similar process is applied to the search text at query time.
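The analysis step can be sketched in a few lines of Python. This is a rough stand-in, not Azure’s implementation: a regular expression approximates the standard tokenizer, and lowercasing plus accent stripping mirror the “lowercase” and “asciifolding” token filters used by the custom analyzers later in this post.

```python
import re
import unicodedata


def analyze(text: str) -> list[str]:
    """Rough stand-in for a standard analyzer: split into words, lowercase, fold accents."""
    terms = []
    for token in re.findall(r"\w+", text):
        # NFKD splits accented letters into a base letter plus combining marks; drop the marks.
        decomposed = unicodedata.normalize("NFKD", token.lower())
        terms.append("".join(ch for ch in decomposed if not unicodedata.combining(ch)))
    return terms


print(analyze("Virtual Machine"))  # ['virtual', 'machine']
```

The same function runs over the document text at index time and over the user’s query at search time, which is why “Virtual” in a component name matches “virtual” typed by a user.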

Note that this only produces whole words. We want to break them down further into smaller parts so that incomplete words can be matched.

To do that, we need to change the default index-time analysis to use an edge n-gram token filter, which emits the n-grams anchored at the start of each word, i.e. its prefixes. In our example, with a minimum n-gram length of 2, “virtual machine” gives us “vi”, “vir”, “virt”, “virtu”, “virtua”, “virtual”, “ma”, “mac”, “mach”, “machi”, “machin” and “machine”.
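The prefix expansion itself is simple. The Python sketch below mimics a front-anchored edge n-gram filter with the same minGram and maxGram values as the index definition (2 and 20); the function name is hypothetical, and tokens shorter than the minimum produce no grams in this sketch.

```python
def edge_ngrams(token: str, min_gram: int = 2, max_gram: int = 20) -> list[str]:
    """Prefixes of a token from min_gram up to max_gram characters (or the whole token)."""
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]


tokens = ["virtual", "machine"]  # output of the lowercasing analysis step
grams = [gram for token in tokens for gram in edge_ngrams(token)]
print(grams)
# ['vi', 'vir', 'virt', 'virtu', 'virtua', 'virtual', 'ma', 'mac', 'mach', 'machi', 'machin', 'machine']
```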

We only need to do this at index time, not at query time. We don’t want to search for every n-gram of the query: it would make the query more complex and return odd results, since a search for “virt” would also search for “vi” and match every component with a word starting with “vi”. To achieve this, we need to use different analyzers for indexing and searching.
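The effect of using different analyzers at index and query time can be simulated with a tiny in-memory inverted index. This is a hypothetical model, not how Azure Search stores data: the component names are made-up sample data, and a document matches if any query term is found, like the default “any” search mode.

```python
import re
from collections import defaultdict


def analyze(text):
    return [t.lower() for t in re.findall(r"\w+", text)]


def edge_ngrams(token, min_gram=2, max_gram=20):
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


# Hypothetical component names, for illustration only.
DOCS = {"1": "Virtual Machine", "2": "Video Indexer", "3": "Virtual Network"}

# Index time: store every edge n-gram of every word (the partialName behaviour).
inverted = defaultdict(set)
for doc_id, name in DOCS.items():
    for token in analyze(name):
        for gram in edge_ngrams(token):
            inverted[gram].add(doc_id)


def search(query, ngram_the_query=False):
    terms = analyze(query)
    if ngram_the_query:
        terms = [g for t in terms for g in edge_ngrams(t)]
    hits = set()
    for term in terms:  # "any" semantics: a single matching term is enough
        hits |= inverted.get(term, set())
    return sorted(hits)


print(search("virt"))                        # ['1', '3']
print(search("virt", ngram_the_query=True))  # ['1', '2', '3'], because "vi" also matches "Video"
```

Analysing the query as whole words keeps the match precise, while the n-grams stored at index time are what make the partial word findable.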

We introduced a new field, partialName, that indexes these fragments.

This is the updated index definition:


{
  "name": "droplets",
  "fields": [
    {
      "name": "id",
      "type": "Edm.String",
      "retrievable": true,
      "searchable": false,
      "filterable": false,
      "sortable": false,
      "facetable": false,
      "key": true
    },
    {
      "name": "fullName",
      "type": "Edm.String",
      "retrievable": true,
      "searchable": true,
      "filterable": false,
      "sortable": false,
      "facetable": false,
      "key": false
    },
    {
      "name": "partialName",
      "type": "Edm.String",
      "retrievable": false,
      "searchable": true,
      "filterable": false,
      "sortable": false,
      "facetable": false,
      "key": false,
      "searchAnalyzer": "standardCmAnalyzer",
      "indexAnalyzer": "prefixCmAnalyzer"
    }
  ],
  "corsOptions": {
    "allowedOrigins": ["https://app.cloudmaker.ai"]
  },
  "analyzers": [
    {
      "name": "standardCmAnalyzer",
      "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
      "tokenizer": "standard_v2",
      "tokenFilters": ["lowercase", "asciifolding"]
    },
    {
      "name": "prefixCmAnalyzer",
      "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
      "tokenizer": "standard_v2",
      "tokenFilters": [ 
        "lowercase",
        "asciifolding",
        "edgeNGramCmTokenFilter"
      ]
    }
  ],
  "tokenFilters": [
    {
      "name": "edgeNGramCmTokenFilter",
      "@odata.type": "#Microsoft.Azure.Search.EdgeNGramTokenFilterV2",
      "minGram": 2,
      "maxGram": 20
    }
  ]
}

Note that when uploading documents to the index, the partialName field must be given the same value as the fullName field.
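A minimal Python sketch of building such an upload batch. The batch shape, {"value": [...]} with an "@search.action" on each document, is what the REST endpoint POST /indexes/droplets/docs/index expects; the helper name and the sample component id are hypothetical.

```python
import json


def build_upload_batch(components: list[dict]) -> dict:
    """Build an index batch in which partialName mirrors fullName for every document."""
    return {
        "value": [
            {
                "@search.action": "upload",
                "id": component["id"],
                "fullName": component["fullName"],
                # Same text as fullName; the field's indexAnalyzer generates the prefixes.
                "partialName": component["fullName"],
            }
            for component in components
        ]
    }


batch = build_upload_batch([{"id": "vm", "fullName": "Virtual Machine"}])
print(json.dumps(batch, indent=2))
```

Copying the value in one place keeps the two fields from drifting apart; the different analysis happens entirely inside the index.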

If you’d like to understand more about the above setup, it’s worth reading the Azure Search documentation on custom analyzers.

Fine Tuning Our Solution

With this setup we were getting fast prefix matching: someone could type “virt” and get items with the word “virtual” in their name. However, exact matches were not always ranked above partial ones, so the results for a complete component name could look a little odd.

To fix that, we needed our index to treat exact matches as more important than partial matches. Azure Search supports scoring profiles, so we could easily configure the index to give more weight to matches in the fullName field. We chose a 2:1 weighting for fullName matches over partialName matches.

Here’s the updated, and final index definition:


{
  "name": "droplets",
  "fields": [
    {
      "name": "id",
      "type": "Edm.String",
      "retrievable": true,
      "searchable": false,
      "filterable": false,
      "sortable": false,
      "facetable": false,
      "key": true
    },
    {
      "name": "fullName",
      "type": "Edm.String",
      "retrievable": true,
      "searchable": true,
      "filterable": false,
      "sortable": false,
      "facetable": false,
      "key": false
    },
    {
      "name": "partialName",
      "type": "Edm.String",
      "retrievable": false,
      "searchable": true,
      "filterable": false,
      "sortable": false,
      "facetable": false,
      "key": false,
      "searchAnalyzer": "standardCmAnalyzer",
      "indexAnalyzer": "prefixCmAnalyzer"
    }
  ],
  "corsOptions": {
    "allowedOrigins": ["https://app.cloudmaker.ai"]
  },
  "analyzers": [
    {
      "name": "standardCmAnalyzer",
      "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
      "tokenizer": "standard_v2",
      "tokenFilters": ["lowercase", "asciifolding"]
    },
    {
      "name": "prefixCmAnalyzer",
      "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
      "tokenizer": "standard_v2",
      "tokenFilters": [ 
        "lowercase",
        "asciifolding",
        "edgeNGramCmTokenFilter"
      ]
    }
  ],
  "tokenFilters": [
    {
      "name": "edgeNGramCmTokenFilter",
      "@odata.type": "#Microsoft.Azure.Search.EdgeNGramTokenFilterV2",
      "minGram": 2,
      "maxGram": 20
    }
  ],
  "scoringProfiles": [
    {
      "name": "exactFirst",
      "text": {
        "weights": {
          "fullName": 2,
          "partialName": 1
        }
      }
    }
  ],
  "defaultScoringProfile": "exactFirst"
}

Final Thoughts

Azure Search is a great service that lets engineers with little knowledge of search technologies add search functionality to their apps quickly. Adding partial word matching is a bit more challenging, but hopefully this post makes it easier to understand and implement.

To see our search index in action, head over to Cloud Maker and sign up for an account today.
