
About SharePoint Connector Permissions

The Microsoft Graph API and the SharePoint REST API are both used by Glean to fetch content from your company's SharePoint instance.

The SharePoint REST API is used because certain data (access permissions for site collections) is not yet retrievable via the Graph API. When Microsoft makes this data available through the Graph API, Glean will deprecate its use of the SharePoint REST API.

More information: SharePoint & OneDrive API Endpoints.

Both the Microsoft Graph API and the SharePoint REST API have separate permissions that need to be managed independently. For each API, this document will cover:

  • Each permission that Glean requests, why it is required, and how it is used.
  • Typical objections encountered when requesting the specified permission.
  • Alternatives for permissions that are not permitted as part of your organization's Standard Operating Procedure (SOP).

Graph API Permissions

Overview

Glean requires the following permission scopes for the Microsoft Graph API when integrating with SharePoint:

  • Sites.Read.All
  • User.Read.All
  • Files.Read.All
  • Files.ReadWrite.All
  • GroupMember.Read.All
  • Reports.Read.All

All permissions must be granted as Application permissions. Delegated permissions cannot be used.
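
One way to confirm that the scopes above were granted as Application permissions is to inspect the `roles` claim of an app-only access token issued for Microsoft Graph; delegated grants appear in the `scp` claim instead. The following is a minimal diagnostic sketch, not Glean's implementation: the helper names are hypothetical, and the token is decoded without signature verification, so it should only be used for a local check.

```python
import base64
import json

# Scopes listed in this document for the Microsoft Graph API.
REQUIRED_ROLES = {
    "Sites.Read.All",
    "User.Read.All",
    "Files.Read.All",
    "Files.ReadWrite.All",
    "GroupMember.Read.All",
    "Reports.Read.All",
}


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url-encoded with the padding stripped.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def missing_application_roles(token: str) -> set:
    """Return the required scopes absent from the token's `roles` claim."""
    granted = set(decode_jwt_payload(token).get("roles", []))
    return REQUIRED_ROLES - granted
```

An empty result means every scope in the list is present as an application role; a token that only carries an `scp` claim reports all six scopes as missing.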

Permissions Explained

Each permission scope is described below by its Purpose and the Typical Objection raised against it.
Sites.Read.All

Purpose: Sites.Read.All is a fundamental requirement, as Glean needs to crawl each of your SharePoint sites to provide search capability for them.

Specifically, this permission is used to fetch site collections, sub-sites, site lists, site columns (attributes), and site metadata. Site metadata must be available before files from the associated document libraries can be crawled (see Files.Read.All below).

Typical objection: Concerns often arise regarding the potential for Glean to access sensitive information within SharePoint sites that are subject to stringent access controls.

It is crucial to understand that Glean respects the existing permissions and access controls of each piece of content it crawls. This means that search results will only display content to users who have the appropriate permissions to view it in SharePoint.

Glean's ability to map a piece of content to the users who are permitted to access it is tied to the User.Read.All permission (detailed below).

The SharePoint sites that are crawled by Glean can be restricted by Site URL in the Glean UI. For more information on setting these restrictions, refer to: M365 Crawling Restrictions.
User.Read.All

Purpose: The User.Read.All permission is essential for Glean to accurately map and enforce the permissions of every site and content piece that is indexed.

This permission enables Glean to identify and respect the access controls set for your SharePoint content, ensuring that search results are appropriately restricted. Without it, Glean would be unable to apply any access controls, potentially exposing sensitive content.

Typical objection: Concerns regarding this permission often revolve around the privacy and security of accessing user/employee information. There's also a question of necessity, particularly if Glean as a tool is only accessible to a subset of users within the organization.

Glean uses this permission to verify the access permissions of content it crawls, obtaining a list of User and Group IDs with authorized access. It is crucial for Glean to recognize the user identities linked to these IDs in SharePoint/OneDrive, aligning them with user profiles in Glean. This ensures that search results are only shown to users with the right permissions. The List users endpoint of the Graph API, which requires User.Read.All, facilitates this process.

Additionally, understanding user identities helps Glean enhance the metadata of indexed content, improving search result relevance. For instance, displaying the document owner's name alongside search results enriches the user's search experience, even if the document owner doesn't use Glean.
Files.Read.All

Purpose: The Files.Read.All permission is crucial for Glean to access and index files from SharePoint and OneDrive. It enables Glean to retrieve metadata, permissions, and content from user drives in OneDrive, and document libraries on SharePoint sites.

This permission is necessary for documents to be included in search results, ensuring a comprehensive search experience. More information on the API endpoints leveraged by this permission can be found here: Drives - M365 API Endpoints.

Typical objection: Concerns about this permission often mirror those related to Sites.Read.All, focusing on the potential for Glean to access sensitive or restricted company documents.

It is important to understand that Glean respects the existing permissions and access controls for each document it indexes. This means that only users with the appropriate permissions in the source application will see the content in search results.

This careful mapping of permissions ensures that even the most sensitive documents are only visible to authorized employees. Glean's ability to map a piece of content to the users who are permitted to access it is tied to the User.Read.All permission (detailed above).

Additionally, Glean offers options to restrict crawling to specific SharePoint sites and user drives by specifying URLs or usernames associated with the drives. It is also possible to exclude specific individual content from search results. For details on how to apply these restrictions, refer to:
M365 Crawling Restrictions.
Files.ReadWrite.All

Purpose: The Files.ReadWrite.All permission is crucial for Glean to create and manage webhook subscriptions for SharePoint and OneDrive content updates. Webhooks play a pivotal role in enabling Glean to immediately reflect changes, such as the creation, modification, or deletion of documents and site content.

This real-time update capability is essential not only for keeping the search index current, but also for accurately maintaining the map of permission and access controls. For example, if a user's access to specific content is revoked, Glean leverages webhooks to swiftly update its records, thereby preventing unauthorized access to sensitive data. This permission ensures Glean can uphold the integrity and security of the data it handles, aligning with stringent access controls.

To ensure data integrity, webhook subscriptions frequently require re-authorization (HTTP POST).

Typical objection: Concerns about this permission often stem from its capability to write back to the Graph API, potentially altering data. As per Microsoft's documentation, Files.ReadWrite.All is the least-privileged permission that allows creating and re-authorizing subscriptions to the driveItem resource, which Glean requires.

Glean is committed to adopting less permissive options should they become available. To mitigate concerns, Glean advises implementing restrictions on the data it can access by specifying Site URLs or usernames in the Glean UI.

Monitoring audit logs for both Glean and the Microsoft Graph API, with configured alerts for unexpected API usage, is also recommended.

If Glean is deployed in your organization's GCP or AWS environment, you can enhance security by implementing Web Application Firewall (WAF) rules. These rules can restrict the HTTP request methods and URLs the Glean SharePoint crawler can access, ensuring it only interacts with approved content. This step adds an extra layer of protection for your M365 environment, aligning Glean's access with your security policies.
GroupMember.Read.All

Purpose: The GroupMember.Read.All permission is essential for Glean to accurately map user access to content. It works in tandem with User.Read.All to identify which users are allowed access to each piece of content Glean indexes.

Specifically, when permissions for a site or file are associated with a Group ID, Glean uses the List group members endpoint of the Graph API to determine the user IDs within that group. This endpoint requires the GroupMember.Read.All permission, which is the least privileged permission for this operation.

Typical objection: Concerns about this permission often mirror those for User.Read.All, focusing on its scope. Given that groups assigned to specific sites or files could span the entire organization, it's crucial for Glean to understand the membership of these groups comprehensively. This understanding allows Glean to accurately enforce permissions, ensuring that only authorized users can access specific content.

The necessity of mapping group memberships across all groups is fundamental to maintaining the integrity and security of data access within the organization.
Reports.Read.All

Purpose: The Reports.Read.All permission enables Glean to access SharePoint & OneDrive usage data for sites, pages, users, and files within a specified time period. This is crucial for optimizing the search experience.

Specifically, it allows Glean to:
1. Enhance search result ranking by prioritizing frequently or recently accessed content.
2. Monitor crawler progress and efficiency.
3. Scale Glean infrastructure appropriately to manage the vast content volume in SharePoint & OneDrive.

Typical objection: Concerns about this permission often center on the potential access to sensitive activity and usage data via other accessible reports.

Glean strictly accesses data from four reporting endpoints: File Count (OneDrive), Site Count (SharePoint), User Count (SharePoint), and Page Usage (SharePoint).

These endpoints are essential for Glean's functionality and require Reports.Read.All, the least privileged permission necessary for access.
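
The way User.Read.All and GroupMember.Read.All combine to enforce access can be illustrated with a small sketch. It assumes a simplified, hypothetical data model (`ContentAcl`, a pre-built `group_members` map) and is not Glean's implementation: direct user grants are merged with the expanded membership of each granted group, and a search result is shown only if the querying user appears in that set.

```python
from dataclasses import dataclass, field


@dataclass
class ContentAcl:
    """IDs allowed to view one piece of content (hypothetical shape)."""
    user_ids: set = field(default_factory=set)
    group_ids: set = field(default_factory=set)


def resolve_allowed_users(acl, group_members):
    """Expand group IDs into user IDs and merge them with direct grants.

    `group_members` maps a group ID to the set of its member user IDs, as
    would be built from a group-membership listing.
    """
    allowed = set(acl.user_ids)
    for group_id in acl.group_ids:
        # Unknown groups contribute no members, so access fails closed.
        allowed |= group_members.get(group_id, set())
    return allowed


def visible_results(user_id, results, group_members):
    """Keep only the results whose resolved ACL contains `user_id`."""
    return [
        doc_id
        for doc_id, acl in results
        if user_id in resolve_allowed_users(acl, group_members)
    ]
```

The sketch does not handle nested groups. It also shows why membership data is needed for every group that may appear on content: a group missing from the map grants access to no one.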



Understanding Application vs. Delegated Permissions for Glean

Glean's integration with SharePoint and OneDrive necessitates the use of Application permissions, as opposed to Delegated permissions. This distinction is crucial for the functionality of the SharePoint and OneDrive Connector within Glean and aligns with Microsoft's guidelines for application development:

When should I use application-only access?

In most cases, application-only access is broader and more powerful than delegated access, so you should only use app-only access where needed. It’s usually the right choice if:

  • The application needs to run in an automated way, without user input. For example, a daily script that checks emails from certain contacts and sends automated responses.
  • The application needs to access resources belonging to multiple different users. For example, a backup or data loss prevention app might need to retrieve messages from many different chat channels, each with different participants.
  • You find yourself tempted to store credentials locally and allow the app to sign in "as" the user or admin.

Understanding application-only access (learn.microsoft.com)

Why Application Permissions?

Application permissions are required for several key reasons:

  1. Autonomous Operation: Glean operates independently of any specific user interaction. It needs to access and index data across your SharePoint and OneDrive environments systematically and continuously. This includes crawling content, permissions, and activity data for assets.

  2. Comprehensive Access: Unlike delegated permissions, which act on behalf of a user, application permissions allow Glean to access all relevant data across the environment without being tied to individual user sessions or permissions. This is essential for Glean to perform its functions effectively, ensuring that it can access and index content as needed, regardless of user activity.

  3. Efficiency and Scalability: The need to fetch data asynchronously and across the entire environment means that relying on user-based delegated permissions would severely limit Glean's ability to operate efficiently. Application permissions ensure that Glean can scale its operations to meet the demands of large and complex environments.

Limitations of Delegated Permissions

Delegated permissions, which operate on behalf of a logged-in user, cannot support the breadth of access required for Glean's operations. Specifically:

  • User Dependency: Delegated permissions restrict Glean to the permissions of an individual user, limiting content access and indexing to what that user can see during their session. This approach does not scale and delays data availability. Because updates are only processed during user sessions, a change to document access may not be reflected promptly, which risks exposing sensitive data.

  • Interactivity Requirement: Delegated permissions are designed for scenarios where an application acts with user interaction. Glean's requirement to operate independently, fetching data asset by asset without direct user involvement, is incompatible with the nature of delegated permissions.
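
For illustration, app-only access is typically obtained with the OAuth 2.0 client credentials flow against the Microsoft identity platform, which involves no signed-in user. The sketch below only builds the token request and does not send it; the function name and the example tenant, client ID, and secret values are placeholders, and nothing here describes how Glean itself stores or presents credentials.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_app_only_token_request(tenant_id, client_id, client_secret):
    """Build (but do not send) an OAuth 2.0 client-credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # `.default` requests every application permission already
        # consented for the app; no user is signed in at any point.
        "scope": "https://graph.microsoft.com/.default",
    }
    return Request(
        url,
        data=urlencode(form).encode("ascii"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Because the request carries only the application's own credentials, the resulting token reflects the Application permissions granted by an administrator, not the access of any individual user.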

Constraining Data Access

Three methods are available to constrain the data that the SharePoint crawler can access and the actions it can perform via the Graph API.

Crawling Restrictions

For SharePoint and OneDrive, Glean can constrain the SharePoint crawler in your deployment to only target specific Sites or User drives as defined by you. Conversely, specific Sites or User drives can also be excluded from crawling.

This involves providing Glean with:

  • The SharePoint Site URL(s) to explicitly include (or exclude) when crawling; and/or
  • The Azure AD/Entra ID Group ID containing the users to explicitly include (or exclude) when crawling; and/or
  • The usernames of the users to explicitly include (or exclude) when crawling.

More information: M365 Crawling Restrictions.
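
As a rough sketch of how include and exclude lists for Site URLs might be evaluated, consider the following. The matching rules are assumptions made for illustration (case-insensitive matching on whole path segments, with exclusions taking precedence); the actual behavior is defined by the Glean UI and the M365 Crawling Restrictions documentation.

```python
def normalize(url: str) -> str:
    """Lower-case the URL and drop surrounding whitespace and trailing slashes."""
    return url.strip().rstrip("/").lower()


def is_under(url: str, root: str) -> bool:
    """True if `url` is `root` itself or a path beneath it."""
    return url == root or url.startswith(root + "/")


def should_crawl(site_url, include=(), exclude=()):
    """Decide whether a site is in scope for crawling.

    Assumed semantics: an exclusion always wins; an empty include list
    means "everything not excluded".
    """
    url = normalize(site_url)
    if any(is_under(url, normalize(root)) for root in exclude):
        return False
    if not include:
        return True
    return any(is_under(url, normalize(root)) for root in include)
```

Matching on whole path segments keeps a rule for `/sites/Engineering` from also matching an unrelated site such as `/sites/EngineeringOld`.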

WAF Restrictions (Cloud-prem only)

If your deployment of Glean is cloud-prem (that is, hosted within your company's own GCP or AWS environment), you can leverage the WAF functionality of the project to permit only HTTP GET requests to the API endpoints that Glean needs for operation.

More information: M365 API Endpoints.
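
The logic of such a WAF rule can be sketched as a method-and-URL allowlist. This is a conceptual model only: real rules are written in your cloud provider's WAF syntax, and the path prefixes below are placeholders rather than the authoritative endpoint list, which is given in M365 API Endpoints.

```python
from urllib.parse import urlsplit

ALLOWED_METHODS = {"GET"}

# Placeholder prefixes for illustration; substitute the documented endpoints.
ALLOWED_PREFIXES = {
    "graph.microsoft.com": (
        "/v1.0/sites",
        "/v1.0/users",
        "/v1.0/groups",
        "/v1.0/drives",
    ),
}


def is_request_allowed(method: str, url: str) -> bool:
    """Allow only HTTPS requests with an approved method, host, and path prefix."""
    parts = urlsplit(url)
    if method.upper() not in ALLOWED_METHODS or parts.scheme != "https":
        return False
    prefixes = ALLOWED_PREFIXES.get(parts.hostname or "", ())
    return any(
        parts.path == prefix or parts.path.startswith(prefix + "/")
        for prefix in prefixes
    )
```

Any request that does not match an entry is rejected by default, so the rule set limits the crawler to the approved endpoints regardless of the permissions held by its token.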

Sites.Selected

The Sites.Selected permission can be used in place of several of the above x.Read.All permissions when crawling SharePoint content; however, this has significant trade-offs, including:

  • Severely degrading the search result quality for SharePoint.
  • Limiting Glean's ability to synchronize content updates (including permissions) to only once every 24 hours.

Glean does not recommend this approach.

More information: Sites.Selected - Permission Alternatives.

FAQ

Can the webhooks be manually configured?

No. Glean generates (and regularly rotates) a secret for each customer environment to verify webhook responses. Files.ReadWrite.All is required to ensure that this secret can be rotated (and the webhooks refreshed before expiry).

It is not feasible to perform this manually at the frequency required.
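
To make the moving parts concrete, the sketch below builds the body of a Microsoft Graph subscription for a drive and checks the secret echoed back in a change notification. It assumes that the per-environment secret described above is carried in the subscription's `clientState` property; that mapping, the function names, and the one-day default lifetime are illustrative assumptions, not a description of Glean's implementation.

```python
import hmac
from datetime import datetime, timedelta, timezone


def build_subscription(drive_id, notification_url, client_state,
                       lifetime=timedelta(days=1), now=None):
    """Build the JSON body for creating a driveItem change subscription."""
    now = now or datetime.now(timezone.utc)
    expires = now + lifetime
    return {
        "changeType": "updated",
        "notificationUrl": notification_url,
        "resource": f"/drives/{drive_id}/root",
        # Subscriptions expire and must be renewed before this time.
        "expirationDateTime": expires.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "clientState": client_state,
    }


def is_trusted_notification(notification, expected_client_state):
    """Accept a notification only if it echoes the expected secret."""
    received = notification.get("clientState") or ""
    # Constant-time comparison avoids leaking the secret through timing.
    return hmac.compare_digest(received.encode(), expected_client_state.encode())
```

Rotating the secret means updating every subscription that carries it, and each subscription also has to be renewed before its expiration time, which is why the process is automated.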


SharePoint REST API Permissions

Overview

Glean requires tenant-wide FullControl permissions to be able to fetch specific site metadata (including permissions) for each of your SharePoint sites.

There is a limitation with the SharePoint REST API in that permission and role assignment data for site collections is not returned when read-only permissions are used. As a result, Glean must leverage the FullControl permission to obtain this information. The SharePoint crawler only reads data from the SharePoint REST API: at no point is data ever written using the API.

For these operations, Glean currently leverages the SharePoint REST API v1. The SharePoint REST API v2 cannot currently be used, as it does not return key site metadata required by Glean. When Microsoft brings the v2 API to parity with v1, Glean will shift to the v2 API and deprecate the use of v1.

The following permissions are required by default by Glean for the SharePoint REST API:

<AppPermissionRequests AllowAppOnlyPolicy="true">
    <AppPermissionRequest Scope="http://sharepoint/content/tenant" Right="FullControl" />
    <AppPermissionRequest Scope="http://sharepoint/content/sitecollection" Right="FullControl" />
    <AppPermissionRequest Scope="http://sharepoint/content/sitecollection/web" Right="FullControl" />
</AppPermissionRequests>
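
Before applying a permission request like the one above, it can be useful to audit exactly which scopes and rights it asks for. The following is a small sketch using Python's standard XML parser; the function name is hypothetical and the XML is copied from this document.

```python
import xml.etree.ElementTree as ET

PERMISSION_XML = """<AppPermissionRequests AllowAppOnlyPolicy="true">
    <AppPermissionRequest Scope="http://sharepoint/content/tenant" Right="FullControl" />
    <AppPermissionRequest Scope="http://sharepoint/content/sitecollection" Right="FullControl" />
    <AppPermissionRequest Scope="http://sharepoint/content/sitecollection/web" Right="FullControl" />
</AppPermissionRequests>"""


def summarize_permission_requests(xml_text):
    """Return (app_only, [(scope, right), ...]) for a permission request XML."""
    root = ET.fromstring(xml_text)
    app_only = root.get("AllowAppOnlyPolicy", "false").lower() == "true"
    requests = [
        (element.get("Scope"), element.get("Right"))
        for element in root.iter("AppPermissionRequest")
    ]
    return app_only, requests


app_only, requests = summarize_permission_requests(PERMISSION_XML)
for scope, right in requests:
    print(f"{scope}: {right} (app-only: {app_only})")
```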

Permissions Explained

There are four (4) SharePoint REST API endpoints that Glean leverages:

Utilization of these endpoints is controlled via the following three (3) permission scopes:

Each permission scope is described below by its Purpose and the Typical Objection raised against it.
Scope="http://sharepoint/content/tenant" Right="FullControl"

Purpose: Glean uses this scope to navigate through SharePoint tenant sites, identifying which sites to crawl and gathering essential metadata about each.

Typical objection: Concerns with this scope stem from the FullControl permission and its extensive control over the SharePoint tenant. Glean employs this permission solely for reading data, not for writing or modifying it.

To mitigate concerns, implement rigorous audit log monitoring and set up alerts for any unauthorized use. FullControl is needed instead of a more restrictive read-only permission because the API endpoints required by Glean respond with an HTTP 403 Forbidden error if queried with any permission other than FullControl. This is detailed in this linked StackOverflow discussion.

While it's possible to limit this permission to site-level access and provide Glean with specific site URLs for crawling, this method is not recommended due to the increased management workload it introduces whenever new sites are added.

If Glean is deployed in your organization's GCP or AWS environment, you can enhance security by implementing Web Application Firewall (WAF) rules. These rules can restrict the HTTP request methods and URLs the Glean SharePoint crawler can access, ensuring it only interacts with approved content.
Scope="http://sharepoint/content/sitecollection" Right="FullControl"

Purpose: Glean uses this scope to access site permissions, role assignments, and list items across each site and subsite within a site collection.

Typical objection: The main concern with the FullControl permission is its extensive authority over site collections. However, Glean utilizes this permission exclusively for reading data, not for making any modifications.

FullControl is needed instead of a more restrictive read-only permission because the API endpoints required by Glean respond with an HTTP 403 Forbidden error if queried with any permission other than FullControl. This is detailed in this linked StackOverflow discussion.

While it's possible to limit this permission to site-level access and provide Glean with specific site URLs for crawling, this method is not recommended due to the increased management workload it introduces whenever new sites are added.

If Glean is deployed in your organization's GCP or AWS environment, you can enhance security by implementing Web Application Firewall (WAF) rules. These rules can restrict the HTTP request methods and URLs the Glean SharePoint crawler can access, ensuring it only interacts with approved content.
Scope="http://sharepoint/content/sitecollection/web" Right="FullControl"

Purpose: Glean utilizes this scope to access web components on site pages, such as content blocks in text boxes on classic sites, page titles, and more.

Typical objection: Concerns about this scope stem from the FullControl permission and its potential to modify page content. However, Glean strictly uses this permission for reading data, not for writing or altering content.

FullControl is needed instead of a more restrictive read-only permission because the API endpoints required by Glean respond with an HTTP 403 Forbidden error if queried with any permission other than FullControl. This is detailed in this linked StackOverflow discussion.

While it's possible to limit this permission to site-level access and provide Glean with specific site URLs for crawling, this method is not recommended due to the increased management workload it introduces whenever new sites are added.

If Glean is deployed in your organization's GCP or AWS environment, you can enhance security by implementing Web Application Firewall (WAF) rules. These rules can restrict the HTTP request methods and URLs the Glean SharePoint crawler can access, ensuring it only interacts with approved content.

Constraining Data Access

Crawling Restrictions

For SharePoint, Glean can constrain the crawler to only target specific Site URLs. You must provide Glean with a list of all Site URLs that the crawler should be restricted to.

More information: M365 Crawling Restrictions.

Site-specific FullControl

When used in conjunction with Crawling Restrictions (see above), the requirement for FullControl over the SharePoint tenant can be removed. Instead, FullControl can be granted at the individual site level to mitigate concerns about the breadth of the permission's scope.

The drawback of this method is that it has a high degree of operational overhead. For each site that you would like Glean to crawl, you must:

  • Apply the required permissions at the individual site level for each Glean-SharePoint Crawler App.
  • Notify Glean of the Site URL.

This must also be completed for any new sites that require crawling in the future.

WAF Restrictions (Cloud-prem only)

If your deployment of Glean is cloud-prem (that is, hosted within your company's own GCP or AWS environment), you can leverage the WAF functionality of the project to permit only HTTP GET requests to the API endpoints that Glean needs for operation.

More information: M365 API Endpoints.

FAQ

What alternatives are there to FullControl?

Unfortunately, there are no alternatives to FullControl at this time. FullControl permissions are required to fetch role assignments and access permissions for the site pages and associated web components of each site. The Graph API only exposes access permissions for Document Library items, hence it cannot be used to obtain the information needed by Glean.

The SharePoint REST API endpoint responsible for returning this data returns an HTTP 403 Forbidden response when the API is queried with any permission other than FullControl (for example, a read-only permission). Specifically:

{
    "odata.error": {
       "code":"-2147024891, System.UnauthorizedAccessException",
       "message": {
           "lang":"en-US",
           "value":"Access denied. You do not have permission to perform this action or access this resource."
       }
    }
}
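
A client probing whether a lower permission would suffice could recognize this specific failure as follows. This is a minimal sketch with a hypothetical helper name; it relies only on the response shape shown above, where the numeric part of `code` is the signed 32-bit form of 0x80070005 (access denied).

```python
import json

# 0x80070005 (E_ACCESSDENIED) expressed as a signed 32-bit integer.
ACCESS_DENIED_HRESULT = -2147024891


def is_access_denied(status_code: int, body: str) -> bool:
    """True if the response is the 403 access-denied error shown above."""
    if status_code != 403:
        return False
    try:
        error = json.loads(body)["odata.error"]
        code = str(error["code"])
    except (ValueError, KeyError, TypeError):
        # Not JSON, or not the expected odata.error structure.
        return False
    # `code` has the form "<hresult>, <exception type>".
    hresult = code.split(",", 1)[0].strip()
    return hresult == str(ACCESS_DENIED_HRESULT)
```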

If the SharePoint REST API v1 endpoints used by Glean are fixed to respond correctly when read permissions are used, or when Microsoft brings the Graph API in line with the data that can be pulled from the SharePoint REST API v1, Glean will deprecate the use of the FullControl permission.