Elastic Enterprise Search | Elastic Workplace Search
For new users, we recommend using our Elasticsearch native tools, rather than the standalone Workplace Search product. See this blog post for more information about upgrading your internal knowledge search, to make it an amazing experience for your users!
We recommend using the new Elastic SharePoint Server connector to ingest your content from SharePoint Server into regular Elasticsearch indices.
Use this Elastic Enterprise Search SharePoint Server connector package to deploy and run a SharePoint Server content source on your own infrastructure. The connector package extracts and syncs data from a Microsoft 365 SharePoint Server service or tenant. The data is indexed into a Workplace Search content source within an Elastic deployment.
ℹ️ This connector package requires a compatible Elastic subscription level. Refer to the Elastic subscriptions pages for Elastic Cloud and self-managed deployments.
Table of contents:
- Setup and basic usage
- Gather SharePoint Server details
- Gather Elastic details
- Create a Workplace Search API key
- Create a Workplace Search content source
- Choose connector infrastructure and satisfy dependencies
- Install the connector
- Configure the connector
- Test the connection
- Sync data
- Log errors and exceptions
- Schedule recurring syncs
- Troubleshooting
- Advanced usage
- Connector reference
Complete the following steps to deploy and run the connector:
- Gather SharePoint Server details
- Gather Elastic details
- Create a Workplace Search API key
- Create a Workplace Search content source
- Choose connector infrastructure and satisfy dependencies
- Install the connector
- Configure the connector
- Test the connection
- Sync data
- Log errors and exceptions
- Schedule recurring syncs
The steps above are relevant to all users. Some users may require additional features. These are covered in the following sections:
Before deploying the connector, you’ll need to gather relevant details about your SharePoint Server. If you plan to connect to multiple servers or tenants, choose one to use for the initial setup.
First, ensure your SharePoint Server is compatible with the SharePoint Server connector package.
Then, collect the information that is required to connect to SharePoint Server:
- The address of the SharePoint farm.
- The domain name of the SharePoint Server for NTLM authentication.
- The username the connector will use to log in to SharePoint Server.
- The password the connector will use to log in to SharePoint Server.
ℹ️ The username and password must be those of the admin account for the SharePoint Server.
Later, you will configure the connector with these values.
Some connector features require additional details. Review the following documentation if you plan to use these features:
First, ensure your Elastic deployment is compatible with the SharePoint Server connector package.
Next, determine the Enterprise Search base URL for your Elastic deployment.
Later, you will configure the connector with this value.
You also need a Workplace Search API key and a Workplace Search content source ID. You will create those in the following sections.
If you plan to use document-level permissions, you will also need user identity information. See Use document-level permissions (DLP) for details.
Each SharePoint Server connector authorizes its connection to Elastic using a Workplace Search API key.
Create an API key within Kibana. See Workplace Search API keys.
Each SharePoint Server connector syncs data from SharePoint Server into a Workplace Search content source.
Create a content source within Kibana:
- Navigate to Enterprise Search → Workplace Search → Sources → Add Source → SharePoint Server.
- Choose Configure SharePoint Server.
Record the ID of the new content source. This value is labeled Source Identifier within Kibana. Later, you will configure the connector with this value.
Alternatively, if you have already deployed a SharePoint Server connector, you can use the connector’s bootstrap command to create the content source. See bootstrap command.
After you’ve prepared the two services, you are ready to connect them.
Provision a Windows, macOS, or Linux server for your SharePoint Server connectors.
The infrastructure must provide the necessary runtime dependencies. See Runtime dependencies.
Clone or copy the contents of this repository to your infrastructure.
After you’ve provisioned infrastructure and copied the package, use the provided make target to install the connector:
make install_package
This command runs as the current user and installs the connector and its dependencies. Note: By default, the installed package supports Enterprise Search version 8.0 or later. To use the connector with earlier versions of Enterprise Search (before 8.0), pass the ES_VERSION_V8 argument when running the make install_package or make install_locally command:
make install_package ES_VERSION_V8=no
ℹ️ Within a Windows environment, first install make:
winget install -e --id GnuWin32.Make
Next, ensure the ees_sharepoint executable is on your PATH. For example, on macOS:
export PATH=/Users/shaybanon/Library/Python/3.8/bin:$PATH
The following table provides the installation location for each operating system:
Operating system | Installation location |
---|---|
Linux | ~/.local/bin |
macOS | /Users/<user_name>/Library/Python/3.8/bin |
Windows | \Users\<user_name>\AppData\Roaming\Python\Python38\Scripts |
You must configure the connector to provide the information necessary to communicate with each service. You can provide additional configuration to customize the connector for your needs.
Create a YAML configuration file at any pathname. Later, you will include the -c option when running commands to specify the pathname to this configuration file.
Alternatively, in Linux environments only, locate the default configuration file created during installation. The file is named sharepoint_server_connector.yml and is located within the config subdirectory where the package files were installed. See Install the connector for a listing of installation locations by operating system. When you use the default configuration file, you do not need to include the -c option when running commands.
After you’ve located or created the configuration file, populate each of the configuration settings. Refer to the settings reference. You must provide a value for all required settings.
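As a rough illustration, the following Python sketch writes a small configuration file using setting names from the settings reference. The `render_config` helper and the temporary output location are hypothetical and not part of the connector; the values are the placeholder examples used elsewhere in this document. The sketch relies on JSON strings, numbers, and lists also being valid YAML values, so it needs no third-party YAML library. It deliberately leaves out `sharepoint.password` and `workplace_search.api_key`; add those yourself.

```python
import json
import tempfile
from pathlib import Path


def render_config(settings):
    """Render flat settings as YAML, one `key: value` line per setting.

    JSON strings, numbers, and lists are valid YAML flow values, so
    json.dumps takes care of quoting without a YAML library.
    """
    return "".join(f"{key}: {json.dumps(value)}\n" for key, value in settings.items())


# Placeholder values copied from the settings reference; replace with your own.
settings = {
    "sharepoint.domain": "example.com",
    "sharepoint.username": "bill.gates",
    "sharepoint.host_url": "https://example.com:14682/",
    "sharepoint.site_collections": ["Sales", "Marketing"],
    "workplace_search.source_id": "62461219647336183fc7652d",
    "enterprise_search.host_url": "https://my-deployment.ent.europe-west1.gcp.elastic-cloud.com:9243",
    "log_level": "INFO",
    "retry_count": 3,
}

# Hypothetical location: a throwaway directory, so the sketch has no side effects.
config_path = Path(tempfile.mkdtemp()) / "config.yml"
config_path.write_text(render_config(settings))
print(config_path.read_text())
```

You would then pass the resulting pathname to the connector with the -c option.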
Use the additional settings to customize the connection and manage features such as document-level permissions. See:
After you’ve configured the connector, you can test the connection between Elastic and SharePoint Server. Use the following make target to test the connection:
make test_connectivity
After you’ve confirmed the connection between the two services, you are ready to sync data from SharePoint to Elastic.
The following table lists the available sync operations, as well as the commands to perform the operations.
Operation | Command |
---|---|
Incremental sync | incremental-sync |
Full sync | full-sync |
Deletion sync | deletion-sync |
Permission sync | permission-sync |
Begin syncing with an incremental sync. This operation begins extracting and syncing content from SharePoint Server to Elastic. If desired, customize extraction and syncing for your use case.
Review the additional sync operations to learn about the different types of syncs. Additional configuration is required to use document-level permissions.
You can use the command line interface to run sync operations on demand, but you will likely want to schedule recurring syncs.
The various sync commands write logs to standard output and standard error.
To persist logs, redirect standard output and standard error to a file. For example:
ees_sharepoint -c ~/config.yml incremental-sync >>~/incremental-sync.log 2>&1
You can use these log files to implement your own monitoring and alerting solution.
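If you prefer to drive syncs from a script rather than a shell, the same redirection can be sketched in Python with the standard `subprocess` module. The `run_and_log` helper is hypothetical, not part of the connector. The demonstration runs a harmless Python one-liner as a stand-in; in practice you would pass a command such as `["ees_sharepoint", "-c", "config.yml", "incremental-sync"]`.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_and_log(command, log_path):
    """Run `command`, appending stdout and stderr to `log_path`.

    Returns the exit code so that a caller can alert on failures.
    """
    with open(log_path, "a") as log_file:
        completed = subprocess.run(command, stdout=log_file, stderr=subprocess.STDOUT)
    return completed.returncode


# Stand-in command so the sketch runs anywhere; swap in the ees_sharepoint command.
demo_log = Path(tempfile.mkdtemp()) / "incremental-sync.log"
exit_code = run_and_log([sys.executable, "-c", "print('sync finished')"], demo_log)
print(exit_code, demo_log.read_text().strip())
```

A non-zero exit code is one simple signal a monitoring script could act on.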
Configure the log level using the log_level setting.
Use a job scheduler, such as cron, to run the various sync commands as recurring syncs.
The following is an example crontab file:
0 */2 * * * ees_sharepoint -c ~/config.yml incremental-sync >>~/incremental-sync.log 2>&1
0 0 */2 * * ees_sharepoint -c ~/config.yml full-sync >>~/full-sync.log 2>&1
0 * * * * ees_sharepoint -c ~/config.yml deletion-sync >>~/deletion-sync.log 2>&1
*/5 * * * * ees_sharepoint -c ~/config.yml permission-sync >>~/permission-sync.log 2>&1
This example redirects standard output and standard error to files, as explained here: Log errors and exceptions.
Use this example to create your own crontab file. Manually add the file to your crontab using crontab -e. Or, if your system supports cron.d, copy or symlink the file into /etc/cron.d/.
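If you manage several connectors, you may want to generate these entries rather than write them by hand. The following Python sketch rebuilds the example crontab from a table of schedules. The `crontab_line` helper and the `SCHEDULES` name are hypothetical; the schedules, config pathname, and log pathnames are the ones from the example crontab.

```python
def crontab_line(schedule, command, config="~/config.yml"):
    """Build one crontab entry that appends the command's output to its own log."""
    return f"{schedule} ees_sharepoint -c {config} {command} >>~/{command}.log 2>&1"


# Sync command -> cron schedule, copied from the example crontab.
SCHEDULES = {
    "incremental-sync": "0 */2 * * *",
    "full-sync": "0 0 */2 * *",
    "deletion-sync": "0 * * * *",
    "permission-sync": "*/5 * * * *",
}

crontab = "\n".join(crontab_line(s, c) for c, s in SCHEDULES.items()) + "\n"
print(crontab, end="")
```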
To troubleshoot an issue, first view your logged errors and exceptions.
Use the following sections to help troubleshoot further:
If you need assistance, use the Elastic community forums or Elastic support:
The following sections provide solutions for content extraction issues.
The connector uses the Tika module to parse file contents from attachments. Tika-python uses the Apache Tika REST server and starts it in the background, so you need Java 7 or later installed on your system.
At times, the Tika server fails to start, and content extraction from attachments may fail as a result. To avoid this, make sure Tika is running in the background.
Tika Server can also extract content from images by automatically calling Tesseract OCR. To enable this, make sure tesseract is on your PATH, then restart tika-server in the background (if it is already running). For example, on a Unix-like system, try:
ps aux | grep tika | grep server # find PID
kill -9 <PID>
To allow Tika to extract content from images, you need to manually install Tesseract OCR.
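Before digging further into extraction failures, it can help to confirm that the prerequisites are visible to the account that runs the connector. The following Python sketch uses the standard `shutil.which` function to report which commands cannot be found on PATH. The `missing_tools` helper is hypothetical, and the sketch only checks that the commands exist, not their versions.

```python
import shutil


def missing_tools(names):
    """Return the command names that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# java is needed by the Tika server; tesseract is needed for OCR on images.
missing = missing_tools(["java", "tesseract"])
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("java and tesseract are both on PATH")
```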
The following sections provide solutions for issues related to syncing.
Some of the SharePoint API endpoint responses have a delay of around 15 minutes. The response contains timestamps that are not in sync with the current UTC time. The problem is described in this issue.
Full sync is the only sync operation that guarantees syncing of all subsites. This limitation is due to a SharePoint issue. A SharePoint parent site is not always updated when its child subsite is created or modified.
The following sections cover additional features that are not covered by the basic usage described above.
After you’ve set up your first connection, you may want to further customize that connection or scale to multiple connections.
By default, each connection syncs all supported SharePoint data across all SharePoint site collections.
You can limit which SharePoint site collections are synced. Configure the setting sharepoint.site_collections.
You can also customize which objects are synced, and which fields are included and excluded for each object. Configure the setting objects.
Finally, you can set custom timestamps to control which objects are synced, based on their created or modified timestamps. Configure the following settings:
Complete the following steps to use document-level permissions:
- Enable document-level permissions
- Map user identities
- Sync document-level permissions data
Within your configuration, enable document-level permissions using the following setting: enable_document_permission.
Copy to your server a CSV file that provides the mapping of user identities. The file must follow this format:
- First column: SharePoint Server AD username
- Second column: Elastic username
Then, configure the location of the CSV file using the following setting: sharepoint_workplace_user_mapping.
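As an illustration of the two-column format, the following Python sketch renders a mapping with the standard `csv` module. The `render_identity_mapping` helper and the usernames are hypothetical. It assumes a comma-delimited file with no header row, which this document does not state explicitly, so check the result against your connector version.

```python
import csv
import io


def render_identity_mapping(pairs):
    """Render (SharePoint AD username, Elastic username) pairs as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(pairs)
    return buffer.getvalue()


# Hypothetical users: first column is SharePoint Server, second is Elastic.
mapping = render_identity_mapping([
    ("ad.user.one", "elastic_user_one"),
    ("ad.user.two", "elastic_user_two"),
])
print(mapping, end="")
```

Write the resulting text to the pathname you configure in sharepoint_workplace_user_mapping.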
Sync document-level permissions data from SharePoint to Elastic.
The following sync operations include permissions data:
Sync this information continually to ensure correct permissions. See Schedule recurring syncs.
The following reference sections provide technical details:
- Data extraction and syncing
- Sync operations
- Command line interface (CLI)
- Configuration settings
- Enterprise Search compatibility
- SharePoint Server compatibility
- Runtime dependencies
Each SharePoint Server connector extracts and syncs the following data from SharePoint Server:
- Site Collections
- Sites and Subsites
- Lists
- Items (List Items)
- Attachments
- Drives (Files & Folders)
The connector handles SharePoint pages composed of various web parts, extracts content from various document formats, and performs optical character recognition (OCR) to extract content from images.
You can customize extraction and syncing per connector. See Customize extraction and syncing.
The following sections describe the various operations to sync data from SharePoint Server to Elastic.
Syncs to Enterprise Search all supported SharePoint data created or modified since the previous incremental sync.
When using document-level permissions (DLP), each incremental sync will also perform a permission sync.
Perform this operation with the incremental-sync command.
Syncs to Enterprise Search all supported SharePoint data created or modified since the configured start_time. Continues until the current time or the configured end_time.
This is the only sync operation that guarantees syncing of all subsites. This limitation is due to a SharePoint issue. A SharePoint parent site is not always updated when its child subsite is created or modified.
Perform this operation with the full-sync command.
Deletes from Enterprise Search all supported SharePoint data deleted since the previous deletion sync.
Perform this operation with the deletion-sync command.
Syncs to Enterprise Search all SharePoint document permissions since the previous permission sync.
When using document-level permissions (DLP), use this operation to sync all updates to users and groups within SharePoint Server.
Perform this operation with the permission-sync command.
Each SharePoint Server connector has the following command line interface (CLI):
ees_sharepoint [-c <pathname>] <command>
The pathname of the configuration file to use for the given command.
ees_sharepoint -c ~/config.yml full-sync
Creates a Workplace Search content source with the given name. Outputs its ID.
ees_sharepoint bootstrap --name 'Accounting documents' --user 'shay.banon'
See also Create a Workplace Search content source.
To use this command, you must configure the following settings:
And you must provide on the command line any of the following arguments that are required:
- --name (required): The name of the Workplace Search content source to create.
- --user (optional): The username of the Elastic user who will own the content source. If provided, the connector will prompt for a password. If omitted, the connector will use the configured API key to create the content source.
Performs an incremental sync operation.
Performs a full sync operation.
Performs a deletion sync operation.
Performs a permission sync operation.
Configure any of the following settings for a connector:
The domain name of the SharePoint Server for NTLM authentication.
sharepoint.domain: example.com
The username of the admin account for the SharePoint Server. See Gather SharePoint Server details.
sharepoint.username: bill.gates
The password of the admin account for the SharePoint Server. See Gather SharePoint Server details.
sharepoint.password: 'L,Ct%ddUvNTE5zk;GsDk^2w)(;,!aJ|Ip!?Oi'
The address of the SharePoint farm. The port should represent the web application containing the site collections, not Central Administration.
sharepoint.host_url: https://example.com:14682/
Specifies which SharePoint site collections to sync to Enterprise Search.
sharepoint.site_collections:
- Sales
- Marketing
The Workplace Search API key. See Create a Workplace Search API key.
workplace_search.api_key: 'zvksftxrudcitxa7ris4328b'
The ID of the Workplace Search content source. See Create a Workplace Search content source.
workplace_search.source_id: '62461219647336183fc7652d'
The Enterprise Search base URL for your Elastic deployment.
enterprise_search.host_url: https://my-deployment.ent.europe-west1.gcp.elastic-cloud.com:9243
Note: When using Elastic Enterprise Search version 8.0.0 or later, you must specify the port in enterprise_search.host_url.
Whether the connector should sync document-level permissions (DLP) from SharePoint.
enable_document_permission: Yes
By default, it is set to Yes, so the connector will try to sync document-level permissions.
Specifies which SharePoint objects to sync to Enterprise Search, and for each object, which fields to include and exclude. When the include/exclude fields are empty, all fields are synced.
objects:
  sites:
    include_fields:
    exclude_fields:
  lists:
    include_fields:
    exclude_fields:
  list_items:
    include_fields:
    exclude_fields:
  drive_items:
    include_fields:
    exclude_fields:
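To make the include and exclude behavior concrete, here is a minimal Python sketch of one plausible interpretation. The `filter_fields` helper is hypothetical and is not the connector's code. This document only states that all fields are synced when both lists are empty; the rest (an include list keeps only the named fields, an exclude list drops the named fields, and excludes are applied after includes) is an assumption.

```python
def filter_fields(document, include_fields=None, exclude_fields=None):
    """Return a copy of `document` restricted by the include and exclude lists.

    Empty or missing lists leave the document unchanged (all fields synced).
    Assumption: includes are applied first, then excludes.
    """
    result = dict(document)
    if include_fields:
        result = {key: value for key, value in result.items() if key in include_fields}
    if exclude_fields:
        result = {key: value for key, value in result.items() if key not in exclude_fields}
    return result


# Hypothetical list item with three fields.
item = {"id": 1, "title": "Quarterly report", "author": "someone"}
print(filter_fields(item))
print(filter_fields(item, include_fields=["id", "title"]))
print(filter_fields(item, exclude_fields=["author"]))
```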
A UTC timestamp the connector uses to determine which objects to extract and sync from SharePoint. Determines the starting point for a full sync.
start_time: 2022-04-01T04:44:16Z
By default, it is set to 180 days before the current execution time.
A UTC timestamp the connector uses to determine which objects to extract and sync from SharePoint. Determines the stopping point for a full sync.
end_time: 2022-04-01T04:44:16Z
By default, it is set to the current execution time.
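If you want to set these values explicitly, the following Python sketch computes a window in the timestamp format shown in the examples. The `sync_window` helper is hypothetical. It assumes the default start is 180 days before the execution time, which is how this document's description of the default is read here.

```python
from datetime import datetime, timedelta, timezone

TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def sync_window(now, days_back=180):
    """Return (start_time, end_time) strings for a window ending at `now` (UTC)."""
    start = now - timedelta(days=days_back)
    return start.strftime(TIMESTAMP_FORMAT), now.strftime(TIMESTAMP_FORMAT)


# Fixed example instant, matching the end_time example in this document.
start_time, end_time = sync_window(datetime(2022, 4, 1, 4, 44, 16, tzinfo=timezone.utc))
print(f"start_time: {start_time}")
print(f"end_time: {end_time}")
```

For a live value, pass `datetime.now(timezone.utc)` instead of the fixed instant.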
The level or severity that determines the threshold for logging a message. One of the following values:
- DEBUG
- INFO (default)
- WARN
- ERROR
log_level: INFO
By default, it is set to INFO.
The number of retries to perform when there is a server error. The connector applies an exponential backoff algorithm to retries.
retry_count: 3
By default, it is set to 3.
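To show what retrying with exponential backoff looks like in general, here is a minimal Python sketch. It is not the connector's implementation: the `backoff_delays` and `call_with_retries` helpers, the 1-second base delay, and the doubling factor are all assumptions. This document only states that retries use exponential backoff and default to 3.

```python
import time


def backoff_delays(retry_count, base_seconds=1.0, factor=2.0):
    """Return the wait before each retry: base, base*factor, base*factor^2, ..."""
    return [base_seconds * factor ** attempt for attempt in range(retry_count)]


def call_with_retries(operation, retry_count=3, sleep=time.sleep):
    """Call operation(); after a failure, wait and retry up to `retry_count` times."""
    for delay in backoff_delays(retry_count):
        try:
            return operation()
        except Exception:
            sleep(delay)
    return operation()  # last attempt; an exception here propagates to the caller


print(backoff_delays(3))
```

Passing a different `sleep` function, as the signature allows, makes the helper easy to exercise without real waiting.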
The number of threads the connector will run in parallel when fetching documents from the SharePoint server. By default, the connector uses 5 threads.
sharepoint_sync_thread_count: 5
The number of threads the connector will run in parallel when indexing documents to the Enterprise Search instance. By default, the connector uses 5 threads.
enterprise_search_sync_thread_count: 5
For a Linux distribution with at least 2 GB RAM and 4 vCPUs, you can increase the thread counts if overall CPU and RAM utilization is below 60-70%.
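As a general illustration of what a thread count controls, the following Python sketch fans work out over a fixed-size pool using the standard `concurrent.futures` module. The `fetch_all` helper and the stand-in fetch function are hypothetical; how the connector schedules its own threads is not described in this document.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(fetch, items, thread_count=5):
    """Apply `fetch` to every item using at most `thread_count` threads.

    Results are returned in the same order as `items`.
    """
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        return list(pool.map(fetch, items))


# Stand-in for a network call: a real fetch would request a document by ID.
results = fetch_all(lambda document_id: f"document-{document_id}", [1, 2, 3], thread_count=2)
print(results)
```

A larger pool helps only while the work is waiting on the network; past that point, extra threads add CPU and memory pressure, which is why the utilization guidance above matters.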
The pathname of the CSV file containing the user identity mappings for document-level permissions (DLP).
sharepoint_workplace_user_mapping: 'C:/Users/banon/sharepoint_1/identity_mappings.csv'
The SharePoint Server connector package is compatible with Elastic deployments that meet the following criteria:
- Elastic Enterprise Search version greater than or equal to 7.13.0.
- An Elastic subscription that supports this feature. Refer to the Elastic subscriptions pages for Elastic Cloud and self-managed deployments.
The SharePoint Server connector package is compatible with the following versions of SharePoint Server:
- SharePoint Server 2013
- SharePoint Server 2016
- SharePoint Server 2019
Each SharePoint Server connector requires a runtime environment that satisfies the following dependencies:
- Windows, macOS, or Linux server. The connector has been tested with CentOS 7, macOS Monterey v12.0.1, and Windows 10.
- Python version 3.6 or later.
- To extract content from images: Java version 7 or later, and the tesseract command installed and added to PATH.
- To schedule recurring syncs: a job scheduler, such as cron.