This repository has been archived by the owner on Sep 25, 2022. It is now read-only.

Initiating an Import

Peter Monks edited this page Nov 2, 2015 · 13 revisions

The Bulk Import Tool is implemented as a background job inside the Alfresco server application, so its native interface is a Java API (see Developers for more information on using that API directly from your own code). This native interface is wrapped in a variety of higher-level, optional "convenience" mechanisms, including the following repository-tier Web Scripts. These Web Scripts allow imports to be initiated and monitored either manually or via scripting (e.g. using curl, wget, the excellent httpie, and similar tools).

UI Web Script

The UI Web Script is an HTTP GET Web Script located at service path /bulk/import. If you have a default local install of Alfresco, this equates to: http://localhost:8080/alfresco/s/bulk/import

This Web Script presents a simple HTML form to the user, containing (at least) the following form fields:

  • Source - the type of source to use for the import. The Bulk Import Tool only ships with a single source (called "Default"), but your installation may have others developed by 3rd parties.
  • Target space - the target space in the repository to import the content into. This field has an autocomplete function - as you start typing it will offer suggestions of spaces in the repository with matching names.
  • Replace - flag indicating whether to replace existing content in the repository, should it conflict with content being read from the source content set.
  • Dry run - flag indicating that the import should be a "dry run". In this mode the tool logs everything it's doing to alfresco.log, but doesn't actually do any of it.

Depending on the source you're using, there may be additional Source Settings as well. The "Default" source (shipped with the Bulk Import Tool and selected by default) adds one extra field to the form:

  • Source directory - the directory on the server from which to read the source content. This directory must be readable by the Alfresco server process.

Note: This Web Script is not useful for scripted approaches (it's a convenience for humans only).

Initiate Web Script

The initiate Web Script is an HTTP POST Web Script located at service path /bulk/import/initiate. It accepts the values of the form fields described above and initiates a bulk import with them. The corresponding POST parameters are:

  • sourceBeanId - (optional) the Spring bean id of the source to use. Default is "bit.fs.source" (the Spring bean id of the "Default" import source).
  • targetPath - (optional, though either this or targetNodeRef is required) the path of the target space, relative to Company Home, e.g. /Sites/mysite/documentLibrary/importFolder
  • targetNodeRef - (optional, though either this or targetPath is required) the NodeRef of the target space
  • replaceExisting - (optional) flag (boolean) indicating whether to replace existing files or not (default is "false")
  • dryRun - (optional) flag (boolean) indicating whether this is a dry run or not (default is "false")

Depending on the source you're using, there may be additional Source Settings as well. The "Default" source (shipped with the Bulk Import Tool and selected by default) adds one extra form field to the POST body:

  • sourceDirectory - (mandatory) the directory on the server from which to read the source content

Note: This web script is useful for scripted approaches - it's used to initiate an import.
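
As a rough illustration of a scripted initiation, the following Python sketch builds the POST request from the fields listed above. It uses only the standard library and makes several assumptions that are not stated on this page: that the fields are sent URL-encoded in the POST body, that the boolean flags are written as "true"/"false", and that /tmp/import-source is a hypothetical source directory on the server. The function name build_initiate_request is also just an illustrative choice.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_initiate_request(base_url, source_directory, target_path,
                           replace_existing=False, dry_run=False):
    """Build (but do not send) the POST request for /bulk/import/initiate."""
    fields = {
        "sourceDirectory": source_directory,  # mandatory for the "Default" source
        "targetPath": target_path,            # targetNodeRef could be sent instead
        "replaceExisting": str(replace_existing).lower(),
        "dryRun": str(dry_run).lower(),
    }
    # Assumption: the fields travel URL-encoded in the POST body.
    body = urlencode(fields).encode("utf-8")
    url = base_url.rstrip("/") + "/s/bulk/import/initiate"
    return Request(url, data=body, method="POST")


request = build_initiate_request(
    "http://localhost:8080/alfresco",
    source_directory="/tmp/import-source",  # hypothetical server-side path
    target_path="/Sites/mysite/documentLibrary/importFolder",
    dry_run=True,
)
```

Passing the request to urllib.request.urlopen would actually send it. Your installation will most likely also require credentials, which this sketch leaves out.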

Status Web Script

The status Web Script is an HTTP GET Web Script located at service path /bulk/import/status that returns status information about the current (or previous, if an import isn't active) bulk import.

If you have a default local install of Alfresco, this equates to: http://localhost:8080/alfresco/s/bulk/import/status

This Web Script supports two response formats:

  1. HTML - useful for human monitoring of imports (default)
  2. JSON - useful for scripted monitoring of imports

Formats are chosen in the usual way for Web Scripts, i.e. by appending the desired extension to the URL. For example: http://<alfrescohost>:<alfrescoport>/alfresco/s/bulk/import/status.json. If you have a default local install of Alfresco, this equates to: http://localhost:8080/alfresco/s/bulk/import/status.json

Note: This web script is useful for scripted approaches - it's used to monitor the status of an import.
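
For scripted monitoring, a minimal Python sketch along the following lines may be enough. It appends the .json extension as described above and decodes the response body. The helper names status_url and fetch_status are illustrative, authentication is omitted, and the opener parameter exists only so that the fetch can be exercised without a running server.

```python
import json
from urllib.request import urlopen


def status_url(base_url, fmt="json"):
    """Return the status Web Script URL with a format extension appended."""
    return f"{base_url.rstrip('/')}/s/bulk/import/status.{fmt}"


def fetch_status(base_url, opener=urlopen):
    """Fetch the JSON status document and decode it into Python objects."""
    with opener(status_url(base_url)) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling fetch_status("http://localhost:8080/alfresco") against a default local install would return the decoded status document, which a script can then poll at whatever interval suits it.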

Additional Notes about Status Information

The status Web Script does not need to be open for an import to proceed - it is an entirely optional, resumable window into the status of any (background) bulk imports that are in progress.

The Bulk Import Tool is capable of tracking a variable number of statistics, both for the repository (the so-called "target counters") and for the source system that's providing content (the so-called "source counters"). It is important to remember when developing scripted solutions that the number and names of these counters can differ from version to version of the tool, and (most especially) between different types of source. As a result, any script that uses these statistics should avoid hardcoding the number and names of the target and source counters, or, if it does hardcode them, should check that each one actually exists in a given response before using it.
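
One defensive way to read counters is sketched below in Python. Note that the JSON layout is not documented on this page: the keys "targetCounters" and "sourceCounters" and the counter name "Nodes created" are placeholders invented for the example, so check a real response from your version and source before relying on any of them.

```python
def counter_value(counters, name, default=None):
    """Look up one counter by name without assuming it is present."""
    if not isinstance(counters, dict):
        return default
    return counters.get(name, default)


# Hypothetical decoded status fragment; real key and counter names will differ.
status = {"targetCounters": {"Nodes created": 120}, "sourceCounters": {}}

# Iterate over whatever counters are present instead of hardcoding a list.
for group in ("targetCounters", "sourceCounters"):
    for name, value in status.get(group, {}).items():
        print(f"{group}: {name} = {value}")

# A hardcoded name is only safe when paired with a fallback for its absence.
created = counter_value(status.get("targetCounters"), "Nodes created", default=0)
missing = counter_value(status.get("targetCounters"), "No such counter", default=0)
```

The loop reports every counter the response happens to contain, while counter_value covers the case where a script really does need one specific counter.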

A Note on Service Paths

The Web Script service paths shown above are not the full URLs used to access these Web Scripts. As with any Alfresco Web Script, you need to prefix them with the protocol (HTTP or HTTPS), host, port, Alfresco webapp context and Web Script servlet path (/s in the examples on this page) before you have a URL you can actually use in a browser, or in a curl or wget request.

As an example, if you have the Bulk Import Tool installed into a default Alfresco instance running on your local machine, the following URL will take you to the "UI" Web Script:

http://localhost:8080/alfresco/s/bulk/import
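
The same composition can be expressed as a small Python helper, which may be handy when a script has to target several environments. The function name web_script_url and its parameter names are illustrative, and the defaults simply mirror the default local install used in the examples on this page.

```python
def web_script_url(service_path, scheme="http", host="localhost", port=8080,
                   context="alfresco", servlet="s"):
    """Join protocol, host, port, webapp context and servlet path
    in front of a Web Script service path."""
    path = service_path.lstrip("/")
    return f"{scheme}://{host}:{port}/{context}/{servlet}/{path}"


ui_url = web_script_url("/bulk/import")
```

With the defaults, ui_url is the "UI" Web Script URL shown above. Other hosts, ports or protocols can be passed as keyword arguments.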


Back to usage.