Bill Baer /bɛːr/

Understanding classifiers in Microsoft Syntex and Microsoft Purview

April 13, 2023 Microsoft Syntex Purview Information Governance

Every day, we use Outlook, Teams, SharePoint, and OneDrive to exchange important and/or sensitive information which can include financial reports and data, contracts, product information, sales reports and projections, competitive analysis, patent information, customer and employee information, etc. It’s easy to see that content is the lifeblood of an organization.

BYOD (bring your own device) and remote work mean people can access their e-mail, chats, and files from just about anywhere, on any device, transforming these devices into repositories of information. Frequently it’s sensitive information that can be easily shared within and, in some cases, outside of the organization.

Data leakage and overexposure remain critical concerns and threats to an organization. According to IDC, by 2024, 30% of organizations will be forced to expand data management and privacy measures to mitigate risks of data breaches caused by ecosystem partners costing $4.6 million per breach.

IDC’s Future of Customer Experience predictions are presented in full detail in the report, IDC FutureScape: Worldwide Future of Customer Experience 2023 Predictions (IDC #US48543222). It doesn’t need to be stated, but data loss is non-negotiable; it’s not something we can buy back.

In addition to the potential for lost revenue, incidents of data leakage and overexposure can also impair an organization’s ability to compete effectively, erode customer confidence, and more. These impacts are just the tip of the iceberg when it comes to the challenges organizations face with securing and managing information.

The ever-increasing dependency on digital assets makes information management more and more challenging, especially considering government and/or industry data-handling standards and regulations.

Overexposure or loss of sensitive information and growing compliance obligations under data-handling standards and regulations demand effective information protection systems that are not only secure but also easy to apply, whether the information is an e-mail message, a document accessed inside the organization or shared outside it with business partner organizations (e.g. suppliers and partners), customers, and public administration, or any other kind of content.

Organizations of any size can benefit from an effective information protection system that combines the strengths of Microsoft Syntex and Microsoft Purview in many ways by helping to reduce:

Violations of corporate policy and best practices.
Non-compliance with government and industry regulations such as the Health Insurance Portability and Accountability Act (HIPAA)[1] / Health Information Technology for Economic and Clinical Health (HITECH) Act[2], the Gramm-Leach-Bliley Act (GLBA)[3], Sarbanes-Oxley (SOX)[4], Canada’s Personal Information Protection and Electronic Documents Act (PIPEDA)[5], the European Union (EU) General Data Protection Regulation (GDPR) (a.k.a. Regulation 2016/679)[6], which repealed the EU Data Protection Directive (a.k.a. Directive 95/46/EC), and Japan’s Personal Information Privacy Act (PIPA)[7], to name just a few.
Loss of intellectual property and proprietary information.
High-profile leaks of sensitive information.
Damage to corporate brand image and reputation.

Business runs on content—proposals, contracts, invoices, designs, plans, legal documents, images, training videos, and more. According to IDC, by 2025 there will be more than 130 billion terabytes of content.1 This content is often unstructured, locked up in siloed repositories, or worse, sitting as paper in a warehouse. The patchwork of disconnected systems and processes leaves gaps in classification and organization. This all makes it challenging to use content at scale effectively. And emerging digital work trends have changed the boundaries and frontiers of where we work—by escalating security and compliance needs as content is accessed from more places than ever. Today, organizations spend $46 billion per year storing and managing content2, and time spent looking for the right content can impact productivity by 11 to 14 percent.3

Protecting sensitive information requires a broad strategy, from enforcing security and compliance policies to organizing and automating processes. Microsoft supplies complementary solutions to address these requirements through Microsoft Purview and Microsoft Syntex classifiers.

What are the differences between a Microsoft Syntex classifier and a Microsoft Purview trainable classifier?

Microsoft Syntex is designed to help manage and organize business information and automate business processes, while classification in Microsoft 365 Compliance is designed to help enforce security and compliance policies. The two product experiences are designed for different purposes.

With Microsoft Syntex, a classifier is a type of model that you can use to automate identification and classification of a document type. For example, you might want to identify all Contract Renewal documents that are added to your document library. A Microsoft Purview trainable classifier, by contrast, is a tool you can train to recognize various types of content by giving it samples to look at. Once trained, it can identify items for the application of Office sensitivity labels, Communications compliance policies, and retention label policies.

However, the two solutions integrate well. You can associate a Microsoft Syntex classification with the retention label you want to apply. For example, you can assign a retention label to a document understanding model in Microsoft Syntex. Learn more about this scenario in this skilling video.

Comparing Microsoft Syntex and Microsoft Purview.

	Microsoft Syntex	Microsoft Purview
Mechanism	Content classification and metadata extraction. E.g., identify what is and what is not a contract; extract the dollar amount from contracts and add it as metadata for business processing.	Data classification, including documents, e-mails, and messages. E.g., identify resumes among all documents in the specified location(s); detect offensive or harassing language.
Goal	Automate content processing and business processes. Business-centric use cases. E.g., trigger Power Automate to send notifications to managers for contracts over 10K.	Meet compliance requirements and reduce risks. Compliance-centric use cases. E.g., retain tax documents for 7 years and then delete them.
Deployment scope	ML model is deployed at the container level. Users can select multiple libraries to deploy the model to. No option to apply it for all tenants. I.e., per library, per mailbox, per site collection.	ML model is deployed at the tenant level. Available to be used for all compliance solutions when applicable.
Content scope	New or modified content; can also be applied on demand to older content.	All content in the selected location(s) that is 6 months old (for retention labels).
Technology used	ACS Language Understanding / Syntex modeling, machine teaching	PMI, ML.NET > categorizing information, machine learning
# of seeding content	5 positive and 1 negative samples for document understanding; more needed for content extraction.	50 positive samples; review 200 samples; more content needed for higher accuracy, for compliance reasons.
Model testing	Optional	Required to improve stabilization
Retraining model; feedback loop	No	Yes
Targeted personas	Information workers	Compliance and data management admins
Pre-built templates	No	Yes: resume, source code, offensive language, etc.
Integration between Project Cortex and M365 Compliance	Classification and metadata extracted can be used to apply retention (and sensitivity by YE2020) labels in Microsoft 365 Compliance.	Classification usually won’t be used for productivity purposes.
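
The seeding minimums in the comparison can be captured as a small pre-flight check before training starts. The following is only a sketch: the dictionary keys and the `missing_samples` helper are hypothetical names, and because the table lists no negative-sample minimum for the Purview trainable classifier, 0 is assumed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeedingMinimum:
    """Minimum labeled samples before training, as listed in the comparison."""
    positive: int
    negative: int


# The keys are illustrative labels, not product identifiers.
# The Purview negative minimum of 0 is an assumption (the table lists none).
SEEDING_MINIMUMS = {
    "syntex_document_understanding": SeedingMinimum(positive=5, negative=1),
    "purview_trainable_classifier": SeedingMinimum(positive=50, negative=0),
}


def missing_samples(kind: str, positive: int, negative: int) -> dict:
    """Return how many more samples of each type are needed (0 if enough)."""
    minimum = SEEDING_MINIMUMS[kind]
    return {
        "positive": max(0, minimum.positive - positive),
        "negative": max(0, minimum.negative - negative),
    }
```

Calling `missing_samples("syntex_document_understanding", 3, 0)` reports that two more positive samples and one negative sample are still needed.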

When looking at Microsoft Syntex alone, there are several use case scenarios that provide benefit to customers. For example:

Content Electronics is an FCI and needs to make sure that trade documents and all sensitive financial information are appropriately protected and governed.

Recommendations: Use a trainable classifier in Microsoft 365 Compliance to identify trade documents across Exchange, SharePoint, OneDrive, and Teams; use pattern recognition in Microsoft 365 Compliance to identify sensitive financial information; protect and govern all sensitive financial information stored in Microsoft 365.
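
To make "pattern recognition" concrete, here is a minimal sketch of the general idea: a regular expression finds candidate payment card numbers and a Luhn checksum filters out false positives. This is an illustration only and not how Microsoft 365 Compliance defines its sensitive information types; the names `CARD_CANDIDATE`, `luhn_valid`, and `find_card_numbers` are hypothetical.

```python
import re

# Candidate pattern: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit, counting from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return separator-free card numbers in text that pass the checksum."""
    matches = []
    for candidate in CARD_CANDIDATE.findall(text):
        digits = re.sub(r"[ -]", "", candidate)
        if luhn_valid(digits):
            matches.append(digits)
    return matches
```

Combining a pattern with a checksum keeps random 16-digit strings, such as order numbers, from being flagged as often as a pattern alone would.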

Northwind Traders is an e-commerce company, and the procurement team needs to manage selected suppliers’ contracts of significant size in a scalable way while protecting and governing the information.

Recommendations: Create a document understanding model in the procurement team’s Content Center; identify suppliers’ contracts and extract metadata to initiate business processes. At the same time, enforce retention and deletion policies once the system identifies suppliers’ contracts via Microsoft Syntex classification.

What are the benefits of Microsoft Syntex and Microsoft Purview?

Microsoft Purview supports multiple ways of auto-classification, and Microsoft Syntex is one of them. Typically, customers who are heavy SharePoint users and rely extensively on content types might adopt Microsoft Syntex to advance their use of SharePoint. Microsoft Purview enables them to use Syntex to auto-apply retention policies.

What metadata can we leverage from Microsoft Syntex to create information governance policies?

Organizations can use all the metadata extracted by Syntex to create retention or record policies using Keyword Query Language (KQL) queries. For example, if a user extracts the dollar amount from contracts as a new metadata field called “Contract Price”, compliance admins can create an auto-apply policy that applies retention/record labels to contracts worth more than $10K.
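
As a rough sketch of what such a query might look like, the helper below builds a KQL property restriction of the form property, operator, value. It assumes the “Contract Price” column has been mapped to a searchable managed property; the name `RefinableDecimal00` in the usage note and the function `contract_price_query` are assumptions for illustration, so check which managed property your tenant actually uses.

```python
def contract_price_query(managed_property: str, threshold: int) -> str:
    """Build a KQL property restriction such as 'RefinableDecimal00>10000'."""
    # Assumption: the query targets a managed property name without spaces,
    # so a column display name like "Contract Price" is rejected here.
    if not managed_property.isidentifier():
        raise ValueError(f"unexpected managed property name: {managed_property!r}")
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    # KQL property restrictions are written as <property><operator><value>
    # with no spaces around the operator.
    return f"{managed_property}>{threshold}"
```

For example, `contract_price_query("RefinableDecimal00", 10000)` produces the string `RefinableDecimal00>10000`, which could then be pasted into the query condition of an auto-apply retention label policy.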

What licenses do I need to use Microsoft Syntex classification to apply retention policies?

Organizations need to have licenses for both Microsoft Syntex and the compliance capabilities they plan to use. For example, organizations can purchase the Content Service add-on of Microsoft Syntex and the Information Protection & Governance add-on of M365 Compliance to use Microsoft Syntex classification to enforce retention policies automatically.

Resources

Learn about trainable classifiers - Microsoft Purview (compliance) | Microsoft Learn

Train an unstructured document processing model in Microsoft Syntex

References

[1] Passed in 1996, HIPAA relates to healthcare coverage and, for example, how companies may use medical information.

[2] Enacted in 2009 to promote and expand the adoption of health information technology.

[3] Gramm-Leach-Bliley, also known as the Financial Services Modernization Act, was passed in 1999.

[4] The Sarbanes–Oxley Act of 2002, also known as the ‘Public Company Accounting Reform and Investor Protection Act’ (in the Senate) and ‘Corporate and Auditing Accountability and Responsibility Act’ (in the House), is a United States federal law that set new or enhanced standards for all U.S. public company boards, management and public accounting firms.

[5] Passed in 2000, and reviewed every 5 years, PIPEDA is a Canadian law relating to data privacy that governs how private sector organizations collect, use and disclose personal information in the course of commercial business.

[6] Adopted in 2016, the EU GDPR entered into force on 24 May 2016 and has applied since 25 May 2018.

[7] Passed in 2003, PIPA spells out duties of the national and local government for handling personal information and measures for protecting personal information. It also sets out obligations of businesses that handle personal information.
