A quick trip around taxonomy tagging
What is it?
Taxonomy tagging is one of the earlier content tagging features of SharePoint Syntex (alongside image tagging), which later became Microsoft Syntex when it transitioned to the pay-as-you-go (PAYG) billing model. It allows you to automatically tag documents in SharePoint libraries with terms configured in the term store.
Terms are stored in a managed metadata column (also known as a taxonomy column) on the item, making the documents easier to search, sort, filter, and manage.
Because documents come from diverse domains and vary in content quality, taxonomy tagging aids in discovering content across Microsoft 365.
How does it work?
Taxonomy tagging extracts keyphrases from documents (.doc, .docx, .pdf, and .pptx) to improve discovery. A keyphrase is a word or sequence of words in a document or email that forms a concept or entity.
In simple terms, think of the keyphrase as the primary source for completing a user's search query, for example when using content query to discover a label associated with a document.
Labels in the context of keyphrases can roughly be summarized as:
- Salience: A keyphrase captures the essential meaning of the page with no ambiguity.
- Extraction: The keyphrase has to appear in the document.
- Fine-Grained: The keyphrase cannot be a general topic, such as “Sports” or “Politics”.
- Correct & Succinct: The keyphrase has to form a correct English noun phrase and cannot be a clause or sentence.
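The criteria above can be approximated with a few rule-based checks. The following is a minimal sketch, not how Syntex selects keyphrases: the function name `is_candidate_keyphrase`, the `GENERAL_TOPICS` set, and the `max_words` limit are all hypothetical, and salience is left out because it cannot be judged by simple rules.

```python
import re

# Hypothetical list of topics treated as too general to be useful keyphrases.
GENERAL_TOPICS = {"sports", "politics"}


def is_candidate_keyphrase(phrase: str, document_text: str, max_words: int = 5) -> bool:
    """Rough checks for the extraction, fine-grained, and succinct criteria."""
    normalized = " ".join(phrase.lower().split())
    if not normalized:
        return False

    # Extraction: the phrase has to appear in the document (whole-word match).
    doc_normalized = " ".join(document_text.lower().split())
    pattern = r"\b" + re.escape(normalized) + r"\b"
    if not re.search(pattern, doc_normalized):
        return False

    # Fine-grained: reject overly general topics.
    if normalized in GENERAL_TOPICS:
        return False

    # Succinct: word count and punctuation are crude proxies for
    # "a noun phrase, not a clause or sentence".
    if len(normalized.split()) > max_words:
        return False
    if re.search(r"[.!?;:,]", normalized):
        return False

    return True
```

For example, given a document that mentions a "quarterly sales forecast", that phrase passes the checks, while "Sports" is rejected as too general even when the word appears in the text.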
In the case of taxonomy tagging in SharePoint Premium, extraction refers to the keyphrase label generated by the annotator and stored in a SharePoint column, that is, text that can be uniquely attributed to a document. This label is used by content query and Microsoft Search; however, it is not used by Semantic Index (more on this below).
Using content query, you can customize the search flyout menu to reason across metadata associated with one or more documents. This improves the recall process, allowing you to quickly find what matters within the context of a document library.
Content query allows you to perform specific metadata-based queries on SharePoint document libraries. Queries based on specific metadata column values are faster and more precise than searching for keywords alone to find a file in a document library. Content query is particularly useful when you have a specific piece of information you want to search for, such as when a document was last modified, a specific person associated with a file, or a specific file type.
Outside the context of the document library, Microsoft Search uses tags to assist in the formulation of user queries. For example, when a search term is entered in the search box and the keyword matches a tag associated with a document, the link to the document is automatically ranked in the search results.
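To make the idea of a metadata-based query concrete, here is a small in-memory sketch. It does not call SharePoint or any Microsoft API; the `LibraryItem` class, the `content_query` function, and the sample library are hypothetical stand-ins that only illustrate filtering on column values rather than on keywords.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class LibraryItem:
    """Hypothetical stand-in for a document and its metadata columns."""
    name: str
    file_type: str
    modified: date
    modified_by: str
    tags: set = field(default_factory=set)  # managed metadata terms


def content_query(items, file_type=None, modified_by=None, modified_since=None, tag=None):
    """Return items whose metadata matches every filter that was supplied."""
    matches = []
    for item in items:
        if file_type is not None and item.file_type.lower() != file_type.lower():
            continue
        if modified_by is not None and item.modified_by != modified_by:
            continue
        if modified_since is not None and item.modified < modified_since:
            continue
        if tag is not None and tag not in item.tags:
            continue
        matches.append(item)
    return matches


# Made-up sample data for illustration only.
library = [
    LibraryItem("Supplier agreement.docx", "docx", date(2023, 11, 2), "Megan", {"Contracts"}),
    LibraryItem("Q3 forecast.pptx", "pptx", date(2023, 10, 15), "Alex", {"Finance"}),
    LibraryItem("NDA template.pdf", "pdf", date(2023, 12, 1), "Megan", {"Contracts", "Legal"}),
]

contracts_by_megan = content_query(library, modified_by="Megan", tag="Contracts")
```

Each filter narrows the result by an exact column value, which is why this style of query tends to be more precise than a free-text keyword search over the same library.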
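As a toy illustration of a tag match influencing result order, the sketch below moves documents whose tags match the query ahead of the rest. This is an assumption-laden simplification, not the ranking logic Microsoft Search uses; `rank_by_tag_match` and the sample documents are hypothetical.

```python
def rank_by_tag_match(query, documents):
    """Order (title, tags) pairs so that documents tagged with the query come first.

    Documents with the same score keep their original relative order,
    because Python's sort is stable.
    """
    wanted = query.strip().lower()

    def score(document):
        _title, tags = document
        return 1 if wanted in {tag.lower() for tag in tags} else 0

    return sorted(documents, key=score, reverse=True)


# Made-up sample data for illustration only.
documents = [
    ("Budget overview", {"Finance"}),
    ("Supplier agreement", {"Contracts"}),
    ("NDA template", {"contracts", "Legal"}),
]

ranked_titles = [title for title, _tags in rank_by_tag_match("Contracts", documents)]
```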
Additionally, automatically extracting keyphrases that are salient to the document's meaning is an essential step in semantic document understanding, otherwise defined as the process of drawing meaning from text. Semantic understanding is core to interpreting sentences, paragraphs, or whole documents by analyzing their grammatical structure and identifying relationships between individual words in a particular context. A semantic index works on these same principles. So what does that mean for taxonomy tagging and Semantic Index for Copilot?
Semantic Index for Copilot
Semantic Index for Copilot builds upon keyword matching, personalization, and social matching capabilities within Microsoft 365 by creating vectorized indices to enable conceptual understanding, which helps determine your intent and helps you find the organizational content you need. A vector is a numerical representation of a word, image pixel, or other data point, and vectors are arranged or mapped so that similar items have numerically close values.
Today, Semantic Index for Copilot looks at the envelope, that is, the document itself, to derive context for the purposes of search. Since the taxonomy is stored outside of the envelope (in columns), it is not reasoned over today. Data saved in user-defined fields is excluded from ranking because it cannot be generalized to other users.
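The notion of vectors being close to one another can be illustrated with cosine similarity, a common way to compare vectors. This is a minimal sketch under stated assumptions: the three-dimensional vectors below are made up for illustration, real embeddings have far more dimensions, and nothing here reflects how Semantic Index for Copilot actually builds or compares its indices.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # similarity is undefined for a zero vector; treat as unrelated
    return dot / (norm_a * norm_b)


# Made-up toy vectors: related words are given nearby values.
vectors = {
    "invoice": [0.9, 0.1, 0.0],
    "receipt": [0.8, 0.2, 0.1],
    "holiday": [0.0, 0.2, 0.9],
}

related = cosine_similarity(vectors["invoice"], vectors["receipt"])
unrelated = cosine_similarity(vectors["invoice"], vectors["holiday"])
```

With these toy values, `related` comes out much higher than `unrelated`, which is the sense in which similar concepts sit in proximity to one another.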
NOTE Copilot, outside of Semantic Index, leverages column-based metadata.
Thu, 21 Dec. 2023, 23:18 UTC
Great post! Thanks!
Thu, 14 Dec. 2023, 04:11 UTC
Wed, 13 Dec. 2023, 15:52 UTC
Hi Bill, what do you mean by "NOTE Copilot, outside of Semantic Index, leverages column-based metadata."? Does this mean that M365 Chat, for example, uses column-based metadata when building a response, or something else?