Knowledge Graph

The Glean Knowledge Graph is a powerful tool that forms the backbone of Glean's enterprise search platform. It is designed to provide users with the most personalized and relevant results for their queries.

About the Glean Knowledge Graph

Glean's unique search engine operates on a real-time model of your enterprise's indexed information, called the enterprise knowledge graph.

The knowledge graph is built on three key pillars: Content, People, and Activity.

flowchart TB
    title["Knowledge Graph"]

    subgraph Activity
        creation["Creation date"]:::yellow
        history["Editing history"]:::yellow
        Comments:::yellow
        Searches:::yellow
        Clicks:::yellow
        Shares:::yellow
        Views:::yellow
    end
    subgraph People
        Identities:::salmon
        Roles:::salmon
        Teams:::salmon
        Departments:::salmon
        Groups:::salmon
        manager["Reporting line"]:::salmon
        Tenure:::salmon
    end
    subgraph Content
        Documents:::teal
        Messages:::teal
        Tickets:::teal
        Emails:::teal
        Assets:::teal
        Images:::teal
        Entities:::teal
    end

    title --> Content
    title --> People
    title --> Activity

    classDef teal stroke:#A2BA25,fill:#D0E26F,color:#333;
    classDef salmon stroke:#FF935F,fill:#FFBE9F,color:#333;
    classDef yellow stroke:#FAC748,fill:#FEE58F,color:#333;
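To make the three pillars concrete, here is a minimal Python sketch of how such a graph might be represented. It assumes, purely for illustration, that every item (a document, a person, an activity event) is a node tagged with its pillar, and that an activity event links a person to a piece of content through labeled edges. The names `Pillar`, `Node`, `KnowledgeGraph`, `link`, and `neighbors` are hypothetical and do not describe Glean's internal schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class Pillar(Enum):
    CONTENT = "content"
    PEOPLE = "people"
    ACTIVITY = "activity"


@dataclass(frozen=True)
class Node:
    node_id: str
    pillar: Pillar
    kind: str  # e.g. "Document", "Identity", "View" (illustrative labels)


@dataclass
class KnowledgeGraph:
    """Hypothetical in-memory graph; not Glean's actual data model."""

    nodes: dict[str, Node] = field(default_factory=dict)
    # Adjacency list: node_id -> set of (relation, target_id)
    edges: dict[str, set[tuple[str, str]]] = field(default_factory=dict)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node
        self.edges.setdefault(node.node_id, set())

    def link(self, src: str, relation: str, dst: str) -> None:
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added first")
        self.edges[src].add((relation, dst))

    def neighbors(self, node_id: str, relation: str) -> list[Node]:
        # Return targets of edges with the given label, sorted for stable output.
        return sorted(
            (self.nodes[dst] for rel, dst in self.edges.get(node_id, set()) if rel == relation),
            key=lambda n: n.node_id,
        )


# An activity event ("view-42") connects a person to a document.
kg = KnowledgeGraph()
kg.add_node(Node("doc-1", Pillar.CONTENT, "Document"))
kg.add_node(Node("alice", Pillar.PEOPLE, "Identity"))
kg.add_node(Node("view-42", Pillar.ACTIVITY, "View"))
kg.link("view-42", "by", "alice")
kg.link("view-42", "on", "doc-1")
print([n.node_id for n in kg.neighbors("view-42", "on")])  # ['doc-1']
```

Modeling activity as its own node type, rather than as a bare edge between a person and a document, leaves room to attach timestamps or event types later without changing the other two pillars.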

Content

Filling the knowledge graph with content begins with more than 100 easy-to-use connectors. Each connector is tailor-made for its application's unique data model and API endpoints, requires no additional professional services to use, and is fully permissions-aware, ensuring that each source's access and sharing rules are strictly followed.
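A permissions-aware connector has to carry each item's access list into the index so results can be filtered at query time. The sketch below assumes a simplified model in which every indexed document stores the set of principals (users or groups) allowed to read it, copied from the source app at crawl time. `Doc` and `visible_to` are hypothetical names, and real source permission models (inheritance, link sharing, deny rules) are considerably richer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    doc_id: str
    # Principals (user emails or group names) allowed to read this item,
    # assumed to be copied from the source app's sharing settings.
    allowed: frozenset[str]


def visible_to(user: str, user_groups: set[str], docs: list[Doc]) -> list[Doc]:
    """Keep only documents the user could open in the source app."""
    principals = {user} | user_groups
    # A document is visible if any of the user's principals is on its access list.
    return [d for d in docs if principals & d.allowed]


docs = [
    Doc("roadmap", frozenset({"eng-leads"})),
    Doc("handbook", frozenset({"everyone"})),
    Doc("salary-bands", frozenset({"hr@example.com"})),
]
print([d.doc_id for d in visible_to("bob@example.com", {"everyone"}, docs)])  # ['handbook']
```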

Through these connectors, Glean's content crawler indexes every part of a piece of content, not just the title. This includes the item's content (titles, body copy, comments, media, etc.) as well as its metadata (file creator, creation time, update history, file type, folder structure, etc.).
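Indexing metadata alongside body text is what lets a query match on, say, a file type or a creator rather than only a title. A toy inverted index that records which field each token came from illustrates the idea. The field names and the `FieldIndex` class are assumptions for this example, not Glean's index format.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


class FieldIndex:
    """Hypothetical inverted index: token -> set of (doc_id, field) postings."""

    def __init__(self) -> None:
        self.postings: dict[str, set[tuple[str, str]]] = defaultdict(set)

    def add(self, doc_id: str, fields: dict[str, str]) -> None:
        # Content fields and metadata fields are indexed the same way.
        for field_name, value in fields.items():
            for token in tokenize(value):
                self.postings[token].add((doc_id, field_name))

    def search(self, term: str) -> set[tuple[str, str]]:
        # .get avoids inserting empty entries into the defaultdict.
        return set(self.postings.get(term.lower(), set()))


index = FieldIndex()
index.add("doc-7", {
    "title": "Q3 planning",
    "body": "Budget review for the platform team",
    "comments": "Looks good, ship it",
    "creator": "alice@example.com",
    "file_type": "spreadsheet",
})
print(sorted(index.search("budget")))       # [('doc-7', 'body')]
print(sorted(index.search("spreadsheet")))  # [('doc-7', 'file_type')]
```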

flowchart LR

    Glean((Glean)):::glean
    Communication
    Docs["Documents & Wikis"]
    Storage["Cloud Storage"]
    CRMs
    Ticketing["Tasks & Ticketing"]
    Code["Code & Dev tools"]

    Glean --- Communication & Docs & Storage & CRMs & Ticketing & Code


    classDef glean stroke:#9297F4,fill:#343CED,color:#fff;

All of these elements, stored within the knowledge graph’s index, become readily searchable. Customizable weights can also be set for specific categories to influence search results. Facet search results can be tuned based on metadata fields specific to each app.
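One simple way category weights can influence ranking is to multiply each hit's base relevance score by an administrator-chosen weight for its category. The sketch assumes that multiplicative scheme along with hypothetical category names and weight values (`CATEGORY_WEIGHTS`, `rerank`); Glean's actual ranking combines many more signals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    doc_id: str
    category: str     # e.g. the source app or content type
    relevance: float  # base text-match score from retrieval (assumed)


# Hypothetical admin-configured boosts; missing categories default to 1.0.
CATEGORY_WEIGHTS = {"wiki": 1.5, "ticket": 0.8}


def rerank(hits: list[Hit], weights: dict[str, float]) -> list[Hit]:
    """Order hits by relevance multiplied by their category weight."""
    return sorted(hits, key=lambda h: h.relevance * weights.get(h.category, 1.0), reverse=True)


hits = [Hit("T-101", "ticket", 0.9), Hit("onboarding", "wiki", 0.6), Hit("notes", "doc", 0.7)]
print([h.doc_id for h in rerank(hits, CATEGORY_WEIGHTS)])  # ['onboarding', 'T-101', 'notes']
```

Without weights the ticket would rank first (0.9); boosting wikis and demoting tickets moves the onboarding page to the top.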

Permissions can also be set individually for specific items. Files can be selectively included in or excluded from Glean's crawl by specifying asset IDs, or by broader categories such as item containers (folders or drives).
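Item- and container-level crawl scoping can be thought of as a rule check run before each asset is fetched. This sketch assumes three hypothetical rule sets (included containers, excluded containers, excluded asset IDs), slash-separated container paths, and that exclusions take precedence over inclusions. The real connector configuration options may be named and prioritized differently.

```python
from dataclasses import dataclass, field


@dataclass
class CrawlRules:
    # Hypothetical rule set, not an actual Glean configuration schema.
    include_containers: set[str] = field(default_factory=set)  # empty means crawl all
    exclude_containers: set[str] = field(default_factory=set)
    exclude_asset_ids: set[str] = field(default_factory=set)

    def should_crawl(self, asset_id: str, container_path: str) -> bool:
        # Item-level exclusion wins over any container rule.
        if asset_id in self.exclude_asset_ids:
            return False
        parts = container_path.strip("/").split("/")
        # Every ancestor container, e.g. "Engineering", "Engineering/Design".
        ancestors = {"/".join(parts[: i + 1]) for i in range(len(parts))}
        if ancestors & self.exclude_containers:
            return False
        return not self.include_containers or bool(ancestors & self.include_containers)


rules = CrawlRules(
    include_containers={"Engineering"},
    exclude_containers={"Engineering/Archive"},
    exclude_asset_ids={"file-123"},
)
print(rules.should_crawl("file-9", "/Engineering/Design"))        # True
print(rules.should_crawl("file-8", "/Engineering/Archive/2019"))  # False
print(rules.should_crawl("file-123", "/Engineering/Design"))      # False
print(rules.should_crawl("file-7", "/Finance"))                   # False
```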

The content crawl strategy is also completely tunable according to your organization's preference. Adjust crawl cadence, assign blackout hours to avoid peak work hours, and shift between different crawl methodologies to ensure your knowledge graph contains the most relevant and up-to-date information throughout the day.
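Blackout hours amount to a scheduling constraint: if the next crawl would start inside the window, it is pushed to the window's end. The helper below is a minimal sketch assuming a single daily blackout window in local time (possibly crossing midnight) and a fixed cadence. `in_blackout` and `next_crawl` are hypothetical helpers, not Glean settings.

```python
from datetime import datetime, time, timedelta


def in_blackout(now: time, start: time, end: time) -> bool:
    """True if `now` falls inside [start, end); handles windows that cross midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def next_crawl(last: datetime, cadence: timedelta, start: time, end: time) -> datetime:
    """Earliest time at or after `last + cadence` that is outside the blackout window."""
    candidate = last + cadence
    if in_blackout(candidate.time(), start, end):
        # Defer to the end of the window; roll to the next day if that end already passed.
        deferred = candidate.replace(
            hour=end.hour, minute=end.minute, second=end.second, microsecond=0
        )
        if deferred <= candidate:
            deferred += timedelta(days=1)
        candidate = deferred
    return candidate


last = datetime(2024, 5, 6, 8, 30)
# Blackout during assumed peak work hours, 09:00 to 17:00.
print(next_crawl(last, timedelta(hours=2), time(9), time(17)))     # 2024-05-06 17:00:00
print(next_crawl(last, timedelta(minutes=15), time(9), time(17)))  # 2024-05-06 08:45:00
```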

People

One out of ten enterprise searches is about people. That makes sense: workers want to know exactly who they're working with, what their role is, and what they have most recently worked on. Glean supports this by aggregating data across multiple tools into a comprehensive, information-rich view of anyone in the company.

Our engine also delivers deeply personalized, permissions-aware results, and these improve as it better understands each individual's role within the organization.

Glean builds each enterprise's knowledge graph with a deep understanding of the people within it, such as their role, team, tenure, and location. From this, the system constructs a unified identity for each person across all apps, along with a holistic organizational structure that captures everything from each person's closest collaborators to the projects they have most recently worked on.
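Building a unified identity means resolving each person's separate accounts in different apps to a single profile. The sketch assumes the simplest possible join key, a normalized email address, and sets aside accounts without one for manual mapping. Production identity resolution typically draws on directory data and other attributes as well; `Account`, `Person`, and `unify` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Account:
    app: str                # e.g. "slack", "github"
    account_id: str
    email: Optional[str]    # some apps may not expose an email


@dataclass
class Person:
    email: str
    accounts: list[Account] = field(default_factory=list)


def unify(accounts: list[Account]) -> tuple[dict[str, Person], list[Account]]:
    """Group app accounts into one Person per normalized email.

    Accounts without an email are returned separately for manual mapping.
    """
    people: dict[str, Person] = {}
    unmatched: list[Account] = []
    for acct in accounts:
        if not acct.email:
            unmatched.append(acct)
            continue
        key = acct.email.strip().lower()
        people.setdefault(key, Person(key)).accounts.append(acct)
    return people, unmatched


accounts = [
    Account("slack", "U123", "Alice@Example.com"),
    Account("github", "alice-gh", "alice@example.com"),
    Account("jira", "557058", None),
]
people, unmatched = unify(accounts)
print(sorted(a.app for a in people["alice@example.com"].accounts))  # ['github', 'slack']
print([a.app for a in unmatched])  # ['jira']
```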

The underlying data model and individual sources of information used to build these profiles can be customized according to preference.

Activity

Glean collects activity data from several apps (Teams, Slack, email, plugins, Chrome extension, etc.) to index the signals needed for better search personalization and relevance. Activity is collected only from sources connected to the product. None of this activity information ever leaves your exclusive GCP project, and it is handled under strict data protection rules to ensure privacy.

The activity information is used in two ways:

  • Learning what information matters most to each user to better personalize their results; individual user data does not leak over to any other user.

  • Improving personalization for a collection of users; privacy thresholds ensure that data is only collected when the same data point is seen across multiple users.
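The two uses of activity data can be contrasted in a short sketch: a per-user signal computed only from that user's own events, and a shared signal kept only when enough distinct users exhibit it. The minimum-user threshold here is an assumed, k-anonymity-style rule with a hypothetical value of 3; the actual thresholds Glean applies are not specified in this section, and `personal_boosts` and `shared_signals` are illustrative names.

```python
from collections import defaultdict


def personal_boosts(clicks: list[tuple[str, str]], user: str) -> dict[str, int]:
    """Per-user signal: how often this user clicked each doc. Never mixes users."""
    counts: dict[str, int] = defaultdict(int)
    for clicker, doc_id in clicks:
        if clicker == user:
            counts[doc_id] += 1
    return dict(counts)


def shared_signals(clicks: list[tuple[str, str]], min_users: int) -> dict[str, int]:
    """Aggregate signal: distinct users per doc, kept only at or above the threshold."""
    users_per_doc: dict[str, set[str]] = defaultdict(set)
    for clicker, doc_id in clicks:
        users_per_doc[doc_id].add(clicker)
    return {doc: len(users) for doc, users in users_per_doc.items() if len(users) >= min_users}


clicks = [
    ("alice", "wiki/vpn"), ("bob", "wiki/vpn"), ("carol", "wiki/vpn"),
    ("alice", "drafts/perf-review"),
    ("alice", "drafts/perf-review"),
]
print(personal_boosts(clicks, "alice"))     # {'wiki/vpn': 1, 'drafts/perf-review': 2}
print(shared_signals(clicks, min_users=3))  # {'wiki/vpn': 3}
```

Alice's repeated visits to her own draft boost it only in her results; because no other user touched it, it never reaches the shared signal.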