Bill Baer /bɛːr/

Index and query time merging unpacked

April 9, 2022 · Microsoft Search · Microsoft Graph connectors · Federation

The amount of information we create has grown exponentially, and it is increasingly distributed across multiple locations, making it harder to find the right information at the right time. One solution to this challenge is to merge your on-premises information with your information in Microsoft 365; however, business requirements, corporate and/or regulatory compliance, or other constraints may limit the ability to merge or store data in the cloud. In this post we’ll cover two approaches: index-time and query-time merging.

Index-time and query-time merging are two closely related approaches in information retrieval. The latter, query-time merging, more commonly referred to as federated search, is a critical first step for information retrieval and knowledge management; however, federated search alone is often insufficient for most companies’ workplace search needs. On the other hand, index-time merging may provide more than a company’s business needs require.

Both index-time and query-time merging are designed to provide a common solution: one search to rule them all. Solutions like Microsoft Search address this with coherent and ubiquitous search. The intent behind one search to rule them all is to reduce the high cognitive cost of context switching and help people find what they need, when they need it, wherever it is.

“Results suggest that interruptions lead people to change not only work rhythms but also strategies and mental states. Another possibility is that interruptions do in fact lengthen the time to perform a task”
Mark, Gloria, et al. “The Cost of Interrupted Work: More Speed and Stress.” https://www.ics.uci.edu/~gmark/chi08-mark.pdf

Context switching, in the “context” of search, is when someone needs to leave the application or service they’re working in and go to a separate location to perform their search (distributed search), usually for something related to the task they were doing before the switch. A cohesive search experience that coalesces multiple content sources solves this problem by letting people search where they’re working, returning results from multiple disparate systems and often interleaving entities to provide a holistic view of related information.

“Recently, researchers have quantified some of those effects. They found that most professionals only spend an average of one minute and fifteen seconds on a task before interruption. And after an interruption, it can take up to 25 minutes to resume the task.”
McCormack, David. “The High Cost of Multitasking (Infographic).” Fuze, Fuze, 29 June 2021, https://www.fuze.com/blog/the-high-cost-of-multitasking-infographic.

So, if both index-time and query-time merging solve the same problem of distributed search, what’s the difference?

That’s a great question…

Let’s take an analogy. Imagine you’re searching for Megan Bowen’s phone number in Seattle and in Paris, using a separate phone book for each city… The objective is the same in both places (finding Megan’s phone number), but the task of finding it is repeated across the two sources. These sources, the phone books, are disconnected indexes, each representing its own silo of data (the city). This is an example of query-time merging: the task (finding Megan’s phone number) is repeated across both sources, and the results are then combined into a single mental contact card that represents Megan.

Index-time merging, on the other hand, would be where Megan’s phone numbers from her Seattle and Paris offices are consolidated into one index (phone book), so you only need to search once for Megan’s phone number, and the results are interleaved on one page, or one mental contact card.

Probably not the best analogy… so, simplified even further: in a federated search scenario, your query is sent to each data source and the results are returned to a single location, i.e., a page, whereas in an index-time scenario, your query is sent to one location (the index), and the results are consolidated, ranked by the same relevance model, and returned in a single location.
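To make the distinction concrete, here is a minimal Python sketch of the phone book analogy. The `Doc` type, the term-count `score` function, and the sample data are all hypothetical; real search engines use far richer relevance models. The point is only where ranking happens: query-time merging ranks within each source and concatenates the lists, while index-time merging ranks everything together with one model.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass(frozen=True)
class Doc:
    source: str  # which "phone book" (data source) the entry came from
    text: str

def score(query: str, doc: Doc) -> int:
    # Toy relevance model: how often the query terms occur in the entry.
    words = doc.text.lower().split()
    return sum(words.count(term) for term in query.lower().split())

def query_time_merge(query: str, sources: dict[str, list[Doc]]) -> list[Doc]:
    # Federated: run the query against each source separately, let each source
    # rank only its own hits, then concatenate the lists onto one page.
    page: list[Doc] = []
    for docs in sources.values():
        hits = [d for d in docs if score(query, d) > 0]
        page.extend(sorted(hits, key=lambda d: score(query, d), reverse=True))
    return page

def index_time_merge(query: str, sources: dict[str, list[Doc]]) -> list[Doc]:
    # Index-time: combine every source into one index up front, then rank all
    # hits together with a single relevance model.
    index = [d for docs in sources.values() for d in docs]
    hits = [d for d in index if score(query, d) > 0]
    return sorted(hits, key=lambda d: score(query, d), reverse=True)

phone_books = {
    "Seattle": [Doc("Seattle", "Megan Bowen office phone"),
                Doc("Seattle", "Alex Wilber office phone")],
    "Paris": [Doc("Paris", "Megan Bowen Paris office phone ask for Megan Bowen")],
}
print([d.source for d in query_time_merge("megan bowen", phone_books)])  # ['Seattle', 'Paris']
print([d.source for d in index_time_merge("megan bowen", phone_books)])  # ['Paris', 'Seattle']
```

Because each phone book ranks only its own entries, the federated page lists Seattle first simply because it was queried first, even though the Paris entry scores higher under the toy model; the merged index puts the higher-scoring Paris entry on top.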

Now that we’ve had a primer on these two types of search experiences, how does Microsoft Search support index and query-time merging?

Another great question…

Microsoft Search provides two solutions for merging query results. The first, Microsoft Graph connectors, provides index-time merging by indexing one or more disparate data sources, whether Salesforce, ServiceNow, or something else, and combining their index with that of Microsoft 365. When you use Microsoft Graph connectors, you index your data source, and its results are displayed alongside your data from Microsoft 365 apps and services, such as SharePoint and OneDrive, without any visible separation. Data sources indexed through Microsoft Graph connectors can be displayed as a Results Cluster inline with Microsoft 365 results, or as a separate search vertical where you can limit the result set to information from that data source alone.
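Indexing a data source through a Microsoft Graph connector means writing each external item, with its access control list, properties, and content, into the connection. The sketch below only builds that request, assuming the shape of the Microsoft Graph v1.0 external items API (`PUT /external/connections/{id}/items/{itemId}`); it sends nothing. The connection ID `contososervicenow`, the schema properties `title` and `url`, the item data, and the group ID are hypothetical, and authentication is omitted.

```python
from __future__ import annotations
import json

GRAPH_V1 = "https://graph.microsoft.com/v1.0"

def external_item_request(connection_id: str, item_id: str, title: str, url: str,
                          text: str, group_id: str) -> tuple[str, str, dict]:
    """Return (method, url, body) for writing one item into a Graph connector.

    Assumes the connection already exists and its registered schema defines
    `title` and `url` properties (a hypothetical schema for this sketch).
    """
    request_url = f"{GRAPH_V1}/external/connections/{connection_id}/items/{item_id}"
    body = {
        # The ACL travels with the item so results can be security-trimmed;
        # here one (hypothetical) group is granted access.
        "acl": [{"type": "group", "value": group_id, "accessType": "grant"}],
        # Properties must match the connection's schema.
        "properties": {"title": title, "url": url},
        # Full-text content indexed alongside Microsoft 365 content.
        "content": {"type": "text", "value": text},
    }
    return "PUT", request_url, body

method, url, body = external_item_request(
    connection_id="contososervicenow",   # hypothetical connection ID
    item_id="INC0010001",                # hypothetical ticket number
    title="VPN outage in the Paris office",
    url="https://contoso.example.com/incident/INC0010001",
    text="Users in the Paris office cannot connect to the VPN.",
    group_id="00000000-0000-0000-0000-000000000000",
)
print(method, url)
print(json.dumps(body, indent=2))
```

Once items like this are in the connection, they live in the same index as SharePoint and OneDrive content, which is what lets them appear in a Results Cluster or be filtered into their own vertical.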

With Microsoft Graph connectors, the search experience prioritizes relevance and performance, whether you’re searching in Office.com, SharePoint, Microsoft Bing, or elsewhere.

You can view a complete list of available Microsoft Graph connectors for Microsoft Search at https://www.microsoft.com/microsoft-search/connectors.

Microsoft Search also supports query-time merging, and chances are you’re already using it… For example, when Dynamics 365 or Yammer results are returned in SharePoint or Office.com, you’re using federated search. If you search for “customer:Contoso”, the query is sent both to the Microsoft 365 index and to the Dynamics 365 and/or Yammer index. The results are then merged on the results page, either as an Answer or in a dedicated search vertical.
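A minimal sketch of that flow, assuming two hypothetical in-memory providers in place of the real Microsoft 365 index and Dynamics 365: the same query fans out to every provider, each provider returns results in its own order, and the page layout decides where they appear (the primary source in the All tab, each federated source’s top hit as an answer, and its full list in a dedicated vertical). `make_provider`, `federate`, and `build_page` are illustrative names, not Microsoft Search APIs.

```python
from __future__ import annotations
from typing import Callable, List

# A provider takes the query and returns results in its own ranked order.
Provider = Callable[[str], List[str]]

def make_provider(records: list[str]) -> Provider:
    # Hypothetical in-memory source: returns records containing the query text,
    # in the source's own order (its own notion of relevance).
    def search(query: str) -> list[str]:
        return [r for r in records if query.lower() in r.lower()]
    return search

def federate(query: str, providers: dict[str, Provider]) -> dict[str, list[str]]:
    # Query-time merging: the same query goes to every provider; nothing is
    # re-ranked across providers.
    return {name: provider(query) for name, provider in providers.items()}

def build_page(results: dict[str, list[str]], primary: str) -> dict:
    # Primary-index results fill the All tab; each other source with hits gets
    # an answer card (its top hit) plus its own vertical.
    page = {"all": results.get(primary, []), "answers": [], "verticals": {}}
    for name, items in results.items():
        if name != primary and items:
            page["answers"].append({"source": name, "top": items[0]})
            page["verticals"][name] = items
    return page

providers = {
    "Microsoft 365": make_provider(["Contoso QBR deck.pptx", "Fabrikam contract.docx",
                                    "Contoso contract.docx"]),
    "Dynamics 365": make_provider(["Contoso Ltd (account)", "Contoso renewal (opportunity)"]),
}
page = build_page(federate("Contoso", providers), primary="Microsoft 365")
print(page["all"])      # ['Contoso QBR deck.pptx', 'Contoso contract.docx']
print(page["answers"])  # [{'source': 'Dynamics 365', 'top': 'Contoso Ltd (account)'}]
```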

Federated search is a key solution when you want to expose data in Microsoft Search from systems that may be subject to strict compliance requirements, or whose data cannot leave the system’s boundaries. In other cases, customers with sensitive on-premises data they don’t want indexed in the cloud can use federation to extend search coverage to that data. With federated search, you can make information from these systems searchable across Microsoft 365 productivity apps and services without indexing its data with Microsoft Search.

In addition to federation with native data sources, federated search also provides a platform to empower you to build custom search providers for your own unique information sources. The federated search platform (with Microsoft Search):

Provides the ability to bring results from other Microsoft clouds, custom data stores on-premises, or other clouds by building and registering custom search providers that can be invoked for any input query.
Allows custom verticals for domain-specific experiences and high-confidence answer cards in the All results tab, making your enterprise-specific and custom data available in enterprise search experiences.
Provides an industry-standard development platform powered by Azure Bot Service and Adaptive UX technology, so you can build once and enable across many canvases.
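Assuming “Adaptive UX” here refers to Adaptive Cards, the part of a custom search provider that is easiest to show is turning matches from a private store into cards. The sketch below leaves out the Azure Bot Service plumbing and provider registration entirely; `handle_query`, the record fields, and the sample data are hypothetical. Only the rendered matches leave the function; the store itself is not copied into any other index.

```python
from __future__ import annotations
import json

def to_adaptive_card(title: str, snippet: str, url: str) -> dict:
    # Minimal Adaptive Card: bold title, wrapped snippet, and an open-link action.
    return {
        "$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
        "type": "AdaptiveCard",
        "version": "1.3",
        "body": [
            {"type": "TextBlock", "text": title, "weight": "Bolder", "size": "Medium", "wrap": True},
            {"type": "TextBlock", "text": snippet, "isSubtle": True, "wrap": True},
        ],
        "actions": [{"type": "Action.OpenUrl", "title": "Open", "url": url}],
    }

def handle_query(query: str, records: list[dict]) -> list[dict]:
    # Hypothetical provider logic: match against a private store at query time
    # and return rendered cards for the hits only.
    q = query.lower()
    hits = [r for r in records if q in r["title"].lower() or q in r["snippet"].lower()]
    return [to_adaptive_card(r["title"], r["snippet"], r["url"]) for r in hits]

on_prem_records = [  # hypothetical on-premises data
    {"title": "Contoso supply agreement", "snippet": "Signed 2021, renews annually.",
     "url": "https://records.contoso.example/agreements/42"},
    {"title": "Fabrikam NDA", "snippet": "Mutual NDA with Fabrikam.",
     "url": "https://records.contoso.example/agreements/43"},
]
cards = handle_query("contoso", on_prem_records)
print(len(cards))  # 1
print(json.dumps(cards[0], indent=2))
```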

Each approach has its pros and cons.

Pros and cons of query-time and index-time merging

While query-time merging provides the fastest time to deployment and often a lower cost, it comes at the expense of both performance and relevance. Because the index resides within the queried data source, the time to retrieve results can be affected by several factors beyond your control, such as data source and network performance. In addition, the results returned from the federated source, while interleaved, aren’t relevance-ranked together with the issuing system’s data; instead, relevance is determined by whatever signal collection exists (if any) on the source system.
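The performance point can be sketched with `asyncio`: a federated page has to wait for its sources, so without a limit its latency is set by the slowest one. The sources and latencies below are simulated and hypothetical, and the per-source timeout is one common mitigation, not a description of how Microsoft Search handles slow providers.

```python
from __future__ import annotations
import asyncio
import time

async def remote_source(latency_s: float, results: list[str]) -> list[str]:
    # Simulated federated source; the delay stands in for source and network
    # time that the querying system cannot control.
    await asyncio.sleep(latency_s)
    return results

async def federate_with_timeout(sources: dict[str, tuple[float, list[str]]],
                                timeout_s: float) -> dict[str, list[str]]:
    async def call(name: str, latency_s: float, results: list[str]):
        try:
            return name, await asyncio.wait_for(remote_source(latency_s, results), timeout_s)
        except asyncio.TimeoutError:
            return name, None  # too slow: left off this results page

    # All sources are queried concurrently, so the wait is bounded by the
    # slowest source or the timeout, whichever comes first.
    pairs = await asyncio.gather(*(call(n, lat, res) for n, (lat, res) in sources.items()))
    # Each source keeps its own order; there is no shared relevance model.
    return {name: res for name, res in pairs if res is not None}

sources = {  # hypothetical sources: (simulated latency in seconds, ranked results)
    "fast": (0.01, ["a1", "a2"]),
    "slow": (0.05, ["b1"]),
    "stalled": (2.0, ["c1"]),
}
start = time.perf_counter()
page = asyncio.run(federate_with_timeout(sources, timeout_s=0.2))
elapsed = time.perf_counter() - start
print(page)  # {'fast': ['a1', 'a2'], 'slow': ['b1']}
print(f"{elapsed:.2f}s")
```

Here the page returns after roughly the 0.2-second timeout with two of the three sources; the stalled source is simply missing from this page, which is the trade-off between waiting on remote systems and returning a complete result set.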

Federated search is best used where:

The data from the source system doesn’t provide enough value to justify relevance augmentation (relevance is largely left to the person conducting the search)
The data is boundary restricted by corporate or regulatory compliance

Index-time merging is best used where:

The information from the remote data source is high-value data meaningful to a cohesive search experience
You’d like to combine ranked results through similar relevance patterns
Performance should be optimized for the best user experience
Access to a rich set of APIs is desired; for example, the Microsoft Search API provides one unified search endpoint that you can use to query data in the Microsoft cloud (messages and events in Outlook mailboxes, and files on OneDrive and SharePoint) that Microsoft Search already indexes
The source system lacks an integrated search experience

Index-time merging with Microsoft Graph connectors improves overall relevance because the external data is merged into the Microsoft 365 index, where machine learning analyzes user behavior signals, collaborative patterns, and content types to better understand user intent (personalization) and context (contextualization).
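For the Microsoft Search API mentioned above, a query is a `POST` to `https://graph.microsoft.com/v1.0/search/query` whose body lists the entity types to search. The sketch below only builds request bodies, assuming the v1.0 shape (`entityTypes`, `query.queryString`, `from`, `size`, and `contentSources` for connector content); token acquisition and sending are omitted, and the connection ID is hypothetical. Because not every combination of entity types can be mixed in one request, each body here targets a single entity type.

```python
from __future__ import annotations
import json

SEARCH_QUERY_URL = "https://graph.microsoft.com/v1.0/search/query"

def search_body(query: str, entity_type: str,
                content_sources: list[str] | None = None, size: int = 25) -> dict:
    # One search request targeting a single entity type, e.g. "message",
    # "event", "driveItem", or "externalItem".
    request = {
        "entityTypes": [entity_type],
        "query": {"queryString": query},
        "from": 0,     # paging offset
        "size": size,  # page size
    }
    if content_sources:
        # Connector content is addressed by connection: /external/connections/<id>
        request["contentSources"] = content_sources
    return {"requests": [request]}

files = search_body("Contoso", "driveItem")
connector = search_body("Contoso", "externalItem",
                        ["/external/connections/contososervicenow"])  # hypothetical ID
# Each body would be POSTed to SEARCH_QUERY_URL with a bearer token (not shown).
print(json.dumps(connector, indent=2))
```

The same endpoint serves both Microsoft 365 content and connector content, which is the practical payoff of index-time merging: one API, one index, one relevance model.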

This has been a primer on index and query-time merging. Continue learning about index-time and query-time merging with the resources below.