Getting full text search up and running in Azure

If you have apps running in Azure on Azure SQL databases, you'll notice one major feature missing: there is no full text search capability. It's one of those features you never need until you do, and the only supported Azure story thus far is to spin up an expensive SQL Server 2012 VM. That's not to say there aren't options, just nothing out of the box.

The best of those options I've found to date is Lucene.Net, a C# port of the Apache Lucene search engine library for the .NET Framework. That, combined with the AzureDirectory library for Lucene.Net, allowed me to accomplish my goal of a fast and robust search engine for the nearly 20 million books in my database. Essentially, the AzureDirectory project lets you build an index from your data and store it directly in Azure blob storage. I don't have a full grasp on all of its capabilities yet, but from my testing thus far, it's exactly what I was looking for.

First, install the AzureDirectory package from the NuGet Package Manager Console:

install-package Lucene.Net.Store.Azure

Building the index

Grab your storage account and initialize AzureDirectory. The blob container you specify is where all of the index files will be stored in your specified storage account.
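The snippet for this step isn't shown here, so what follows is only a minimal sketch of how it might look. It assumes the WindowsAzure.Storage client (the `Microsoft.WindowsAzure.Storage` namespace; older storage client versions put `CloudStorageAccount` elsewhere). The `BookIndexStorage` class, the connection string parameter, and the `bookindex` container name are placeholders of mine, not names from the original code.

```csharp
using Lucene.Net.Store.Azure;
using Microsoft.WindowsAzure.Storage;

public static class BookIndexStorage
{
    // Hypothetical helper: opens the blob-backed Lucene directory.
    // "bookindex" is a placeholder container name chosen for this sketch.
    public static AzureDirectory Open(string storageConnectionString)
    {
        // Parse a standard storage connection string (account name + key).
        CloudStorageAccount account = CloudStorageAccount.Parse(storageConnectionString);

        // The second argument names the blob container that will hold the index files.
        return new AzureDirectory(account, "bookindex");
    }
}
```

Because AzureDirectory is a regular Lucene.Net `Directory`, anything written against `Lucene.Net.Store.Directory` should work with it unchanged.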

Feed it your data. In my case, that's an IEnumerable<Book> from my Azure SQL database. I loop through each book entry, feeding in only the columns I want searchable or will need for reference later.
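A rough sketch of that loop, assuming the Lucene.Net 3.0.3 API. The `Book` class, its three properties, the field names, and the `BookIndexer` helper are all hypothetical stand-ins; the real entity and column choices will differ.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;

// Hypothetical shape of a book row; the real entity will have more columns.
public class Book
{
    public int Id { get; set; }
    public string Title { get; set; }
    public string Author { get; set; }
}

public static class BookIndexer
{
    // Works with any Lucene directory, including an AzureDirectory.
    public static IndexWriter CreateWriter(Lucene.Net.Store.Directory directory)
    {
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);
        // create: true rebuilds the index from scratch on every run.
        return new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);
    }

    public static void AddBooks(IndexWriter writer, IEnumerable<Book> books)
    {
        foreach (Book book in books)
        {
            var doc = new Document();
            // Stored but not analyzed: kept verbatim so the row can be looked up later.
            doc.Add(new Field("Id", book.Id.ToString(), Field.Store.YES, Field.Index.NOT_ANALYZED));
            // Stored and analyzed: tokenized so the text is searchable.
            // Field rejects null values, so fall back to an empty string.
            doc.Add(new Field("Title", book.Title ?? string.Empty, Field.Store.YES, Field.Index.ANALYZED));
            doc.Add(new Field("Author", book.Author ?? string.Empty, Field.Store.YES, Field.Index.ANALYZED));
            writer.AddDocument(doc);
        }
    }
}
```

Streaming the books through an `IEnumerable<Book>` rather than materializing a full list is one way to keep memory in check with a table this large.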

Optimize and clean up
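In Lucene.Net 3.0.3 terms, this step likely comes down to two calls on the index writer. The sketch below assumes that version; the `BookIndexCleanup` helper name is mine.

```csharp
using Lucene.Net.Index;

public static class BookIndexCleanup
{
    public static void Finish(IndexWriter writer)
    {
        try
        {
            // Merge the index segments; slower to write, faster to search.
            writer.Optimize();
        }
        finally
        {
            // Dispose commits pending changes and releases the write lock,
            // even if the optimize step throws.
            writer.Dispose();
        }
    }
}
```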

Searching the index

Initialize the IndexSearcher with the AzureDirectory reference from above, then initialize a MultiFieldQueryParser to define how the search phrase should be interpreted. This particular setup returns results containing one or more of my keywords, in order of relevance. Or at least that's what I've determined based on my testing and research thus far. So a search for "Cat in the hat" would first give me the Dr. Seuss book but may then also return books relevant to hats. You can, of course, specify that all search terms must be present, e.g., Cat AND Hat, but for my needs, that isn't what I'm after.
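As a sketch under the same Lucene.Net 3.0.3 assumption, the searcher and parser setup could look like this. The `Title` and `Author` field names and the `BookSearchSetup` helper are hypothetical; use whichever fields were actually indexed.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

public static class BookSearchSetup
{
    // Accepts any Lucene directory, so the AzureDirectory can be passed straight in.
    public static IndexSearcher CreateSearcher(Lucene.Net.Store.Directory directory)
    {
        // readOnly: true, since searching never modifies the index.
        return new IndexSearcher(directory, true);
    }

    public static QueryParser CreateParser()
    {
        // Use the same analyzer type as at index time so terms are tokenized identically.
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);

        // Each keyword is matched against every listed field. Terms are optional
        // by default (OR), which gives the "one or more keywords" behavior.
        return new MultiFieldQueryParser(
            Lucene.Net.Util.Version.LUCENE_30,
            new[] { "Title", "Author" },
            analyzer);
    }
}
```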

Search my index based on the search phrase and do something with the results. In my production application, I would use the for loop to generate a collection of Book repository objects using the Book ID stored on each indexed document...but for this example, I'm just dumping the results to the console.
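A console-dumping version of that loop might look roughly like the following, again assuming Lucene.Net 3.0.3 (where `ScoreDocs`, `Doc`, and `Score` are properties). The `BookSearch` helper, the `maxResults` parameter, and the `Id`, `Title`, and `Author` field names are illustrative assumptions.

```csharp
using System;
using Lucene.Net.Documents;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

public static class BookSearch
{
    // Prints the top matches and returns how many were printed.
    public static int PrintMatches(IndexSearcher searcher, QueryParser parser, string phrase, int maxResults)
    {
        // Turn the raw phrase into a Lucene query.
        Query query = parser.Parse(phrase);

        // Fetch the best-scoring hits, most relevant first.
        TopDocs hits = searcher.Search(query, maxResults);

        for (int i = 0; i < hits.ScoreDocs.Length; i++)
        {
            ScoreDoc hit = hits.ScoreDocs[i];

            // Load the stored fields for this hit.
            Document doc = searcher.Doc(hit.Doc);

            Console.WriteLine("{0}: {1} by {2} (score {3:F2})",
                doc.Get("Id"), doc.Get("Title"), doc.Get("Author"), hit.Score);
        }

        return hits.ScoreDocs.Length;
    }
}
```

Note that `Parse` throws a `ParseException` on malformed query syntax, so user-supplied phrases are worth wrapping in a try/catch.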

That's it!

Well, sort of :) To keep the index current you'll want to schedule the rebuild using whatever scheduling method you have in place for maintenance tasks like this. I use an app I wrote based on Quartz.Net, which I use to build and schedule custom jobs, but something this simple could just be dropped into a console app on a VM or into a worker role of your cloud service. Azure also has a new scheduler service in preview that will help facilitate things like this out of the box in the near future.

All said and done, it was very easy to set up. The index built quickly, even with my large number of records...though do be careful of memory usage with large queries. Once indexed, even the most complicated searches completed sub-second with very relevant results. Overall I'm happy with this solution even though it's not technically out of the box.

Credit to Leon Cullens
AzureDirectory Codeplex project

Updates

Full support for Lucene query syntax in Azure Search service