Permission Alternatives for SharePoint
When crawling SharePoint, Glean uses both the Microsoft Graph API and the SharePoint REST API. These APIs require permissions (x.Read.All for the Graph API and FullControl for the SharePoint REST API) that might not align with your company's Standard Operating Procedure (SOP).
This document outlines alternative approaches to grant Glean the necessary permissions while addressing the possible drawbacks or limitations of each method.
Overview of Default Permissions
Both the Microsoft Graph API and the SharePoint REST API are used to fetch content from your company's SharePoint tenant. The latter is required because certain information, such as site access permissions, is not yet retrievable via the Graph API. When Microsoft makes this capability available, Glean will deprecate the use of the SharePoint REST API.
For a full list of all the API endpoints that Glean uses for SharePoint crawling (and the associated permissions required), please see M365 API Endpoints.
Both the Microsoft Graph API and the SharePoint REST API have their own permissions that are managed separately.
Graph API
Glean requires the following Graph API permissions, which must be granted as application permissions:
- Sites.Read.All
- Files.Read.All
- Files.ReadWrite.All
- GroupMember.Read.All
- User.Read.All
- Reports.Read.All
Why is Write access required for Files?
The Files.ReadWrite.All permission is crucial for Glean to create and manage webhook subscriptions for SharePoint and OneDrive content updates. Webhooks play a pivotal role in enabling Glean to immediately reflect changes, such as the creation, modification, or deletion of documents and site content. This real-time update capability is essential not only for keeping the search index current, but also for accurately maintaining the map of permission and access controls.
For example, if a user's access to specific content is revoked, Glean leverages webhooks to swiftly update its records, thereby preventing unauthorized access to sensitive data. This permission ensures Glean can uphold the integrity and security of the data it handles, aligning with stringent access controls.
Failing to provide this permission prevents Glean from leveraging webhooks, which means that any updates to SharePoint & OneDrive content (including permissions) will only be reflected on completion of an incremental crawl (every 24 hours).
Can the webhooks be manually configured?
No. For each webhook, a secret is used to verify webhook responses. Files.ReadWrite.All is required to ensure that this secret can be refreshed and rotated regularly as per Microsoft specification.
It is not feasible to perform this manually on the frequency required.
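To make the rotation requirement concrete, here is a minimal sketch of the kind of subscription payload involved. It assumes the secret described above corresponds to the clientState field of a Microsoft Graph subscription, and that subscriptions are created by POSTing such a payload to the Graph /subscriptions endpoint. The function names, the 7-day lifetime, and the drive ID are illustrative assumptions, not Glean's actual implementation.

```python
import hmac
import secrets
from datetime import datetime, timedelta, timezone


def build_subscription(drive_id, notification_url, lifetime_days=7):
    """Build a Graph subscription payload for changes under a drive's root.

    Hypothetical helper: the lifetime is an assumed renewal interval,
    chosen to stay well inside the expiry limit Graph enforces.
    """
    expires = datetime.now(timezone.utc) + timedelta(days=lifetime_days)
    return {
        "changeType": "updated",
        "notificationUrl": notification_url,
        "resource": f"/drives/{drive_id}/root",
        "expirationDateTime": expires.strftime("%Y-%m-%dT%H:%M:%SZ"),
        # Fresh random secret on every (re)creation, so renewing also rotates it.
        "clientState": secrets.token_urlsafe(32),
    }


def is_trusted(notification, expected_state):
    """Accept a notification only if it echoes the secret we registered."""
    received = notification.get("clientState", "")
    return hmac.compare_digest(received, expected_state)
```

Because every subscription expires and carries its own secret, each one has to be recreated or renewed on a schedule, which is why this is automated rather than handled by an administrator.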
For detailed information on the use of each permission, please see:
SharePoint REST API
FullControl permissions are required to fetch role assignments and access permissions for site pages and associated web components using the SharePoint REST API v1. The Microsoft Graph API only exposes access permissions for Document Library items, hence it cannot be used to obtain the information needed by Glean.
Furthermore, the SharePoint REST API endpoint responsible for returning this data returns an HTTP 403 Forbidden response when queried with any permission other than FullControl (e.g. a read-only permission). Glean does not perform any write actions to your SharePoint tenant using the SharePoint REST API. Only read actions (i.e. HTTP GET) are performed.
For more information, please see this StackOverflow post.
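As an illustration of the read-only nature of these calls, the sketch below builds a GET request for a site's role assignments. The /_api/web/roleassignments path is a standard SharePoint REST endpoint, but it is used here only as an example; the exact endpoints Glean calls are listed in the M365 API Endpoints document, and the function name and token value are hypothetical.

```python
def role_assignments_request(site_url, access_token):
    """Return (method, url, headers) for reading a site's role assignments.

    Illustrative only: no request is sent, and the access token is
    assumed to have been obtained separately for the app registration.
    """
    url = (
        site_url.rstrip("/")
        + "/_api/web/roleassignments?$expand=Member,RoleDefinitionBindings"
    )
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Accept": "application/json;odata=nometadata",
    }
    # The method is always GET: nothing is written to the tenant.
    return "GET", url, headers
```

Per the behavior described above, the same request made by an app holding only a read permission would be expected to come back as 403 Forbidden.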
Permission Alternatives
Some security teams are not comfortable providing Glean with many of the x.Read.All permissions required for the Graph API, e.g. Sites.Read.All, or the tenant-wide FullControl permission required for the SharePoint REST API.
While usage of these permissions can be constrained by limiting Glean's SharePoint crawler to an explicit set of site URLs (or usernames), there is often a requirement to enforce this restriction on the M365 side, rather than within the configuration of Glean's crawler. As such, the following alternatives can be used, depending on your SOP:
- Graph API (x.Read.All) -> Sites.Selected permission.
- SharePoint REST API (tenant-wide FullControl) -> Site-specific FullControl.
Graph API Alternative
Overview
If the requirement is to avoid providing many of the x.Read.All permissions required by Glean, the Sites.Selected permission can be used instead to authorize read access to only a specific set of sites.
Sites.Selected replaces the following permissions:
- Sites.Read.All
- Files.Read.All
- Files.ReadWrite.All
The following permissions are still required alongside Sites.Selected:
- User.Read.All
- GroupMember.Read.All
- Reports.Read.All
User.Read.All and GroupMember.Read.All are required for Glean to obtain and enforce document and site permissions.
Reports.Read.All is used to improve search result rankings, verify crawling state and progress, and to ensure that your search infrastructure is correctly scaled (SharePoint is one of the largest datasources typically connected to Glean).
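The substitution described above can be summarized as simple set arithmetic. The sketch below is only a checklist aid for comparing the two configurations; the function and variable names are hypothetical.

```python
# Default Graph API application permissions listed earlier in this document.
DEFAULT_GRAPH_PERMISSIONS = {
    "Sites.Read.All",
    "Files.Read.All",
    "Files.ReadWrite.All",
    "GroupMember.Read.All",
    "User.Read.All",
    "Reports.Read.All",
}

# Permissions that Sites.Selected takes the place of.
REPLACED_BY_SITES_SELECTED = {
    "Sites.Read.All",
    "Files.Read.All",
    "Files.ReadWrite.All",
}


def graph_permissions(use_sites_selected):
    """Return the set of Graph permissions to grant for the chosen mode."""
    if not use_sites_selected:
        return set(DEFAULT_GRAPH_PERMISSIONS)
    kept = DEFAULT_GRAPH_PERMISSIONS - REPLACED_BY_SITES_SELECTED
    return kept | {"Sites.Selected"}


print(sorted(graph_permissions(use_sites_selected=True)))
```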
Limitations
Warning
Using Sites.Selected will heavily impact the end-user experience for both Search and Glean Assistant. Glean strongly recommends against its use.
Leveraging Sites.Selected will prevent Glean from obtaining:
- Activity data
- Webhook data
Activity data is used extensively for ranking signals in search. Without activity data, you will notice a significant degradation in search quality for SharePoint results.
Webhooks are used to update the Glean index in real time as content is created, accessed, changed, or deleted. Without webhook subscriptions, changes in SharePoint will only be reflected in Glean once every 24 hours, when an incremental crawl takes place. This includes changes to site and file permissions.
Additionally, each site must be individually and manually added to the Sites.Selected permission set by your M365 administrator. This creates a high degree of friction and can hinder expansion of Glean within your organization.
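To show what this per-site step involves, the sketch below builds the Microsoft Graph request that grants an application access to a single site (a POST to the site's permissions collection). It assumes the read role is sufficient, based on the read access described above; the site ID, app ID, and function name are placeholders. Follow the linked configuration guide for the authoritative procedure.

```python
GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def site_grant_request(site_id, app_id, app_display_name):
    """Return (method, url, body) granting one app read access to one site.

    Illustrative only: nothing is sent, and the administrator's own
    credentials for calling this endpoint are out of scope here.
    """
    body = {
        "roles": ["read"],  # assumption: read is the role needed for crawling
        "grantedToIdentities": [
            {"application": {"id": app_id, "displayName": app_display_name}}
        ],
    }
    return "POST", f"{GRAPH_BASE}/sites/{site_id}/permissions", body


# One request per site: every new site means another manual grant.
for site in ["<site-id-1>", "<site-id-2>"]:
    method, url, _ = site_grant_request(site, "<glean-app-id>", "Glean")
    print(method, url)
```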
Sites.Selected creates a negative user experience
Given the above limitations and impact that this will have on your users, Glean strongly recommends that the Sites.Selected permission is NOT used. Where possible, the default set of permissions for SharePoint & OneDrive should be used alongside crawling or WAF restrictions. Use of Sites.Selected should only be considered as a last resort.
Configuration
Configure SharePoint using Sites.Selected.
SharePoint REST API Alternative
Overview
If the requirement is to avoid providing the FullControl permission tenant-wide, the permission can instead be granted at an individual site-level.
With this method, Glean can read only the sites that have been granted the FullControl permission. It holds no SharePoint REST API permissions for any other site, so it cannot read or access them.
Limitations
There are no adverse effects to applying the FullControl permission to each individual SharePoint site, other than the operational overhead it places on your SharePoint administrators whenever a new site needs to be added to Glean.
Why is the FullControl permission required?
FullControl permissions are required to fetch role assignments and access permissions for the site pages and associated web components of each site. The Graph API only exposes access permissions for Document Library items, hence it cannot be used to obtain the information needed by Glean.
The SharePoint REST API endpoint responsible for returning this data returns an HTTP 403 Forbidden response when the API is queried with any permission other than FullControl (e.g. the Read permission).
Glean does not perform any write actions to your SharePoint tenant. Only read actions (i.e. HTTP GET) are performed.
For more information, please see this StackOverflow post.
What alternatives are there to FullControl?
Unfortunately, there are no alternatives to FullControl at this time. Some of the data required by Glean can only be obtained using:
- Endpoints only present in v1 of the SharePoint REST API, and
- SharePoint API v1 endpoints that require FullControl to return data as the permission of least privilege.
If either of these change in the future, the use of FullControl will no longer be required, and Glean will deprecate its use.
For customers that have a Glean cloud-prem deployment, you can implement WAF rules to restrict the Glean SharePoint crawler to only be able to perform HTTP GET (i.e. read) requests towards the SharePoint REST API endpoints documented here.
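The intent of such a rule can be sketched as a simple predicate: any request whose path targets the SharePoint REST API is allowed only if it is a GET. This is not real WAF configuration syntax. Matching on the /_api/ path segment is a simplification, and a production rule would match the specific documented endpoints instead.

```python
from urllib.parse import urlsplit


def waf_allows(method, url):
    """Return True if the outbound crawler request passes the sketch rule."""
    path = urlsplit(url).path.lower()
    targets_rest_api = "/_api/" in path or path.endswith("/_api")
    if not targets_rest_api:
        return True  # outside the scope of this rule
    # Only read requests may reach the SharePoint REST API.
    return method.upper() == "GET"


print(waf_allows("GET", "https://contoso.sharepoint.com/sites/hr/_api/web/roleassignments"))
print(waf_allows("POST", "https://contoso.sharepoint.com/sites/hr/_api/web/lists"))
```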
More information: