
Merge pull request #3 from qadan/json-to-xml
json to xml
adam-vessey authored Apr 18, 2017
2 parents f177bb2 + a60e81d commit aad2e99
Showing 6 changed files with 434 additions and 45 deletions.
171 changes: 127 additions & 44 deletions README.md
@@ -1,72 +1,155 @@
SUMMARY
-------
# DGI GSearch Extensions

discoverygarden's GSearch extensions
## Introduction

Currently provides a thin wrapper around the Joda date/time library, in
order to more reliably transform dates for Solr.
discoverygarden's GSearch extensions, providing extended functionality available to GSearch XSLTs that would otherwise be extremely difficult or impossible to recreate in XSLT 1.0.

REQUIREMENTS
------------
## Requirements

For build:
- Maven 2/3
* [Apache Maven](https://maven.apache.org) 2 or 3 to build.

INSTALLATION
------------
## Installation

Build the extensions with `mvn package`, and copy the created jar into GSearch's
lib directory (`$CATALINA_HOME/webapps/fedoragsearch/WEB-INF/lib`).
Build the extensions with `mvn package`, and copy the created jar into GSearch's lib directory (`$CATALINA_HOME/webapps/fedoragsearch/WEB-INF/lib`).

If you want to include Joda Time manually use gsearch_extensions-0.1.0.jar otherwise use
gsearch_extensions-0.1.0-jar-with-dependencies.jar
If you provide the dependency libraries yourself, use `gsearch_extensions-0.1.2.jar`; otherwise, use `gsearch_extensions-0.1.2-jar-with-dependencies.jar`.

CONFIGURATION
-------------
## Usage

### Namespacing

USAGE
-------------
Extensions are available through the `java` namespace of your XSLT processor. In Xalan, declare it as:

`xmlns:java="http://xml.apache.org/xalan/java"`

Extension functions can then be called through that namespace prefix.

### Functions Available

#### `ca.discoverygarden.gsearch_extensions.JodaAdapter`

##### `transformForSolr($date, $pid, $datastream)`

Attempts to parse a date and convert it to a Solr-compatible format, trying the following default formats in order and assuming UTC if no timezone is provided:

* `M/d/y`, e.g., `7/23/2013` would become `2013-07-23T00:00:00.000Z`
* `M/d/y H:m`, e.g., `7/23/2013 11:36` would become `2013-07-23T11:36:00.000Z`
* ISO date-time, as per [Joda-Time ISODateTimeFormat](http://joda-time.sourceforge.net/apidocs/org/joda/time/format/ISODateTimeFormat.html#dateTimeParser%28%29). Timezone offsets are normalized to UTC, so `2013-07-23T08:36-03:00` would become `2013-07-23T11:36:00.000Z`.

For example, in Xalan it should be possible to call the code like this:
```
<xsl:variable name="date_to_parse">08/13/2013</xsl:variable>
<xsl:variable xmlns:java="http://xml.apache.org/xalan/java"
  name="date"
  select="java:ca.discoverygarden.gsearch_extensions.JodaAdapter.transformForSolr($date_to_parse)"/>
```

For more useful logging, the PID and datastream ID can also be passed:
```
<xsl:variable xmlns:java="http://xml.apache.org/xalan/java"
  name="date"
  select="java:ca.discoverygarden.gsearch_extensions.JodaAdapter.transformForSolr($date_to_parse, $pid, $datastream)"/>
```
where `$pid` and `$datastream` are the identifiers of the object and datastream, respectively.
More parser formats can be added with `addDateParser()`. The list of formats can be reset with `resetParsers()`.

Variable|Description
--------|-----------
`$date`|A string date formatted like one of the `JodaAdapter`'s parsers.
`$pid`|(Optional) The string PID of the object being processed, for potential logging purposes.
`$datastream`|(Optional; required if `$pid` is given) The string datastream ID of the datastream being processed, for potential logging purposes.

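`transformForSolr()` itself is built on Joda-Time, which is not part of the Java standard library. As a rough, standalone illustration of the documented fallback order, here is a sketch that uses `java.time` (Java 8+) instead. The class name `SolrDateSketch` is hypothetical, and returning `null` when no format matches is an assumption, since the behaviour in that case is not documented here. It handles ISO date-times with or without an offset, which is only a subset of what Joda-Time's `dateTimeParser()` accepts.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical java.time analogue of the documented behaviour; not JodaAdapter's source.
class SolrDateSketch {
    // Solr's UTC form, e.g. 2013-07-23T11:36:00.000Z.
    private static final DateTimeFormatter SOLR =
            DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss.SSS'Z'");
    private static final DateTimeFormatter MDY = DateTimeFormatter.ofPattern("M/d/uuuu");
    private static final DateTimeFormatter MDY_HM = DateTimeFormatter.ofPattern("M/d/uuuu H:m");

    /** Tries each format in order; returns null if none matches (an assumption). */
    static String transformForSolr(String date) {
        // 1. M/d/y: a date without a time is taken as midnight UTC.
        try {
            return SOLR.format(LocalDate.parse(date, MDY).atStartOfDay());
        } catch (DateTimeParseException e) {
            // Fall through to the next format.
        }
        // 2. M/d/y H:m, interpreted as UTC.
        try {
            return SOLR.format(LocalDateTime.parse(date, MDY_HM));
        } catch (DateTimeParseException e) {
            // Fall through.
        }
        // 3. ISO date-time with an offset: shift the same instant to UTC.
        try {
            return SOLR.format(OffsetDateTime.parse(date).withOffsetSameInstant(ZoneOffset.UTC));
        } catch (DateTimeParseException e) {
            // Fall through.
        }
        // 4. ISO date-time without an offset, interpreted as UTC.
        try {
            return SOLR.format(LocalDateTime.parse(date));
        } catch (DateTimeParseException e) {
            return null;
        }
    }
}
```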
##### `addDateParser($position, $format)`

Adds a parsing format pattern to the list of patterns to attempt when running `transformForSolr()`, optionally at the provided position.

Variable|Description
--------|-----------
`$position`|(Optional) An integer position in the parser list at which to insert the format.
`$format`|A Joda-Time format pattern to add to the parser list, e.g., `yyyy-MM-dd`.

##### `resetParsers()`

Resets the list of parsers `transformForSolr()` will attempt when converting a date.

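What an omitted `$position` means and what `resetParsers()` restores are not spelled out here. One plausible model, assuming an omitted position appends the format (so it is tried last) and that a reset restores the three defaults rather than emptying the list, is an ordinary list with positional insert. `ParserListSketch` and the `"ISO"` placeholder entry are hypothetical; the sketch only models the bookkeeping, not any parsing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the parser-list bookkeeping; not JodaAdapter's source.
class ParserListSketch {
    // Assumed defaults in the documented order; "ISO" stands in for the ISO date-time parser.
    private static final List<String> DEFAULTS = Arrays.asList("M/d/y", "M/d/y H:m", "ISO");

    private final List<String> formats = new ArrayList<String>(DEFAULTS);

    /** Without a position, the format is appended and therefore tried last (assumed). */
    void addDateParser(String format) {
        formats.add(format);
    }

    /** With a position, the format is inserted there and later entries shift back by one. */
    void addDateParser(int position, String format) {
        formats.add(position, format);
    }

    /** Assumed to restore the defaults rather than clear the list entirely. */
    void resetParsers() {
        formats.clear();
        formats.addAll(DEFAULTS);
    }

    List<String> formats() {
        return formats;
    }
}
```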
#### `ca.discoverygarden.gsearch_extensions.XMLStringUtils`

The three base parsers assume that they are given dates in UTC if no timezone is provided, and are attempted in order:
- `M/d/y`
So `7/23/2013` should result in `2013-07-23T00:00:00.000Z`.
- `M/d/y H:m`
So `7/23/2013 11:36` should result in `2013-07-23T11:36:00.000Z`.
- ISO Date Time, as per http://joda-time.sourceforge.net/apidocs/org/joda/time/format/ISODateTimeFormat.html#dateTimeParser%28%29
Timezone offsets will be transformed off, so `2013-07-23T02:36-03:00` will be transformed to `2013-07-23T11:36:00.000Z`.
##### `escapeForXML($input, $replacement)`

CUSTOMIZATION
-------------
Escapes a string for inclusion in XML, for example, in cases where the contents of a plaintext datastream are being provided to Solr.

The set of replaced characters is based on the Apache Commons Lang 3 library's [escapeXml10](https://commons.apache.org/proper/commons-lang/javadocs/api-3.5/org/apache/commons/lang3/StringEscapeUtils.html#escapeXml10-java.lang.String-) list of replaced characters.

Variable|Description
--------|-----------
`$input`|The string to sanitize.
`$replacement`|(Optional) The string to use when replacing invalid characters; otherwise, invalid characters will be replaced with Unicode U+FFFD (the Unicode replacement character).

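To show the general idea, here is a minimal standalone sketch, assuming the behaviour is "escape XML's special characters and replace anything XML 1.0 cannot represent". It uses the XML 1.0 `Char` production to decide what is invalid, which may not match the Commons Lang list exactly; `XmlEscapeSketch` is a hypothetical name, not the `XMLStringUtils` implementation.

```java
// Hypothetical sketch of escaping for XML 1.0; not XMLStringUtils' source.
class XmlEscapeSketch {
    /** Escapes input, replacing invalid characters with U+FFFD. */
    static String escapeForXML(String input) {
        return escapeForXML(input, "\uFFFD");
    }

    static String escapeForXML(String input, String replacement) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            // Walk by code point so valid surrogate pairs stay intact.
            int cp = input.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:
                    if (isXml10Char(cp)) {
                        out.appendCodePoint(cp);
                    } else {
                        // Control characters and unpaired surrogates cannot appear in XML 1.0.
                        out.append(replacement);
                    }
            }
        }
        return out.toString();
    }

    /** The XML 1.0 "Char" production. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }
}
```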
#### `ca.discoverygarden.gsearch_extensions.FedoraUtils`

##### `getDatastreamDisseminationInputStream($pid, $dsId, $fedoraBase, $fedoraUser, $fedoraPass)`

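Gets the dissemination of a datastream as an `InputStream`. If the URL cannot be built or the stream cannot be opened, a warning is logged and an empty stream is returned.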

Variable|Description
--------|-----------
`$pid`|The PID of the object to get a datastream from.
`$dsId`|The ID of the datastream to get the dissemination for.
`$fedoraBase`|The base URL of Fedora, including the protocol; e.g., `http://localhost:8080/fedora`.
`$fedoraUser`|The username to log into Fedora with.
`$fedoraPass`|The password for the given username.

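Both `FedoraUtils` functions read from the URL that `getDatastreamDisseminationURL()` builds with the format string `%s/objects/%s/datastreams/%s/content`. For example, a base of `http://localhost:8080/fedora`, PID `islandora:1` and DSID `OCR` (made-up values) give `http://localhost:8080/fedora/objects/islandora:1/datastreams/OCR/content`. The sketch below reproduces that construction so you can check what a given `$fedoraBase` produces. Because the format inserts its own `/`, a base with a trailing slash yields a `//objects` path. The class name is hypothetical, and throwing `IllegalArgumentException` on a malformed URL is a choice of this sketch; `FedoraUtils` logs a warning and returns an empty stream instead.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Mirrors the format string used by FedoraUtils.getDatastreamDisseminationURL().
class DisseminationUrlSketch {
    static URL disseminationUrl(String pid, String dsId, String fedoraBase) {
        String url = String.format("%s/objects/%s/datastreams/%s/content", fedoraBase, pid, dsId);
        try {
            return new URL(url);
        } catch (MalformedURLException e) {
            // Fail loudly here, unlike FedoraUtils, which logs and carries on.
            throw new IllegalArgumentException("Bad dissemination URL: " + url, e);
        }
    }
}
```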
##### `getRawDatastreamDissemination($pid, $dsId, $fedoraBase, $fedoraUser, $fedoraPass)`

Gets the dissemination of a datastream as a string. Useful because, in most cases, GSearch will not return the text of a datastream.

Variable|Description
--------|-----------
`$pid`|The PID of the object to get a datastream from.
`$dsId`|The ID of the datastream to get the dissemination for.
`$fedoraBase`|The base URL of Fedora, including the protocol; e.g., `http://localhost:8080/fedora`.
`$fedoraUser`|The username to log into Fedora with.
`$fedoraPass`|The password for the given username.

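`getRawDatastreamDissemination()` turns the stream into a string with the `Scanner` idiom of using `\A` (start of input) as the delimiter, so the whole stream comes back as one token. A standalone sketch of that idiom follows. Decoding as UTF-8 and closing the stream afterwards are choices of this sketch, and `StreamToStringSketch` is a hypothetical name.

```java
import java.io.InputStream;
import java.util.Scanner;

class StreamToStringSketch {
    /** Reads an entire stream as UTF-8 text, closing it afterwards. */
    static String readAll(InputStream in) {
        // "\\A" matches only at the start of input, so the rest of the
        // stream is never split and is returned as a single token.
        Scanner scanner = new Scanner(in, "UTF-8").useDelimiter("\\A");
        try {
            // An empty stream has no token; return "" instead of throwing.
            return scanner.hasNext() ? scanner.next() : "";
        } finally {
            // Closing the Scanner also closes the underlying stream.
            scanner.close();
        }
    }
}
```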
#### `ca.discoverygarden.gsearch_extensions.JSONToXML`

##### `convertJSONToXML($input, $enclosing_tag)`

Converts a JSON string to an XML string.

Variable|Description
--------|-----------
`$input`|The input JSON string.
`$enclosing_tag`|(Optional) The name of the top-level element to wrap the resulting XML in, so that the output has a single root element and is valid XML. Defaults to `json`.

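Why the enclosing tag matters: a JSON object with several top-level keys maps naturally onto several sibling elements, but an XML document may have only one root element. The sketch below checks this with the JDK's DOM parser. `wrap()` is a hypothetical helper that mimics the documented wrapping and `json` default; it is not the `JSONToXML` implementation, and it does not validate the tag name.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class EnclosingTagSketch {
    /** Hypothetical helper: wraps fragments in one root element, defaulting to "json". */
    static String wrap(String xmlFragment, String enclosingTag) {
        String tag = (enclosingTag == null || enclosingTag.isEmpty()) ? "json" : enclosingTag;
        return "<" + tag + ">" + xmlFragment + "</" + tag + ">";
    }

    /** True if the string parses as a single well-formed XML document. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}
```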
##### `convertJSONToDocument($input, $enclosing_tag)`

Converts a JSON string to an XML Document object.

The resultant document can be interpreted as a Node-Set by Xalan, for example:

```
<xsl:variable name="some_json">{"something": "has content"}</xsl:variable>
<xsl:variable
  xmlns:java="http://xml.apache.org/xalan/java"
  name="some_xml"
  select="java:ca.discoverygarden.gsearch_extensions.JSONToXML.convertJSONToDocument($some_json)"/>
<!-- This will evaluate to "has content". -->
<xsl:variable name="node" select="$some_xml//something/text()"/>
```

Variable|Description
--------|-----------
`$input`|The input JSON string.
`$enclosing_tag`|(Optional) The name of the top-level element to wrap the resulting XML in, so that the output has a single root element and is valid XML. Defaults to `json`.

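Outside XSLT, the same lookup as in the Xalan example can be done on a W3C DOM `Document` with the JDK's XPath API. This sketch assumes that `{"something": "has content"}` converts to `<json><something>has content</something></json>` with the default enclosing tag, which is consistent with the Xalan example but not confirmed elsewhere. Whether `convertJSONToDocument()` returns a W3C DOM `Document` or another type (the pom.xml also adds dom4j) is not stated here, and `JsonDocumentSketch` is a hypothetical name. It parses such a string and evaluates `//something/text()`, which should give `has content`.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class JsonDocumentSketch {
    /** Parses an XML string (for example, converted JSON) into a DOM Document. */
    static Document toDocument(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    /** Evaluates an XPath expression against the document, returning its string value. */
    static String evaluate(Document doc, String expression) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }
}
```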
TROUBLESHOOTING
---------------
## Troubleshooting/Issues

Having problems or solved a problem? Contact [discoverygarden](http://support.discoverygarden.ca).

F.A.Q.
------
## Maintainers/Sponsors

Current maintainers:

CONTACT
-------
* [discoverygarden](http://www.discoverygarden.ca)

## Development

SPONSORS
--------
If you would like to contribute to this module, please check out our helpful
[Documentation for Developers](https://github.com/Islandora/islandora/wiki#wiki-documentation-for-developers)
info and the [Developers](http://islandora.ca/developers) section on Islandora.ca, and
contact [discoverygarden](http://support.discoverygarden.ca).
13 changes: 12 additions & 1 deletion pom.xml
@@ -4,7 +4,7 @@

  <groupId>ca.discoverygarden</groupId>
  <artifactId>gsearch_extensions</artifactId>
  <version>0.1.1</version>
  <version>0.1.2</version>
  <packaging>jar</packaging>


@@ -35,6 +35,17 @@
      <artifactId>commons-lang3</artifactId>
      <version>3.3.2</version>
    </dependency>
    <dependency>
      <groupId>org.json</groupId>
      <artifactId>json</artifactId>
      <version>20160810</version>
    </dependency>
    <dependency>
      <groupId>dom4j</groupId>
      <artifactId>dom4j</artifactId>
      <version>1.6.1</version>
      <scope>provided</scope>
    </dependency>
  </dependencies>

<build>
137 changes: 137 additions & 0 deletions src/main/java/ca/discoverygarden/gsearch_extensions/FedoraUtils.java
@@ -0,0 +1,137 @@
package ca.discoverygarden.gsearch_extensions;

import org.apache.log4j.Logger;

import java.io.IOException;
import java.io.InputStream;
import java.io.ByteArrayInputStream;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.Authenticator;
import java.net.PasswordAuthentication;
import java.util.Scanner;

/**
 * Utilities for interfacing with Fedora in ways that GSearch can't.
 */
public class FedoraUtils {

    protected static final Logger logger = Logger.getLogger(FedoraUtils.class);

    /**
     * Authenticator for connecting to Fedora.
     */
    static class FedoraAuthenticator extends Authenticator {

        protected String fedoraUser;
        protected String fedoraPass;

        /**
         * Constructor; sets the username and password for this authenticator.
         *
         * @param username
         *   The username to set.
         * @param password
         *   The password to set.
         */
        FedoraAuthenticator(String username, String password) {
            fedoraUser = username;
            fedoraPass = password;
        }

        /**
         * Overrides Authenticator to supply the stored Fedora credentials.
         *
         * @return PasswordAuthentication
         *   Authentication for this authenticator.
         */
        @Override
        protected PasswordAuthentication getPasswordAuthentication() {
            return new PasswordAuthentication(fedoraUser, fedoraPass.toCharArray());
        }

    }

    /**
     * Gets an InputStream for a datastream.
     *
     * @param pid
     *   The PID of the object that has the datastream.
     * @param dsId
     *   The ID of the datastream to get.
     * @param fedoraBase
     *   The base URL of Fedora, including protocol; e.g.,
     *   http://localhost:8080/fedora.
     * @param fedoraUser
     *   The Fedora username to connect with.
     * @param fedoraPass
     *   The Fedora password to connect with.
     *
     * @return InputStream
     *   An InputStream at the constructed dissemination point, or an empty
     *   stream if the URL could not be built or opened.
     */
    public static InputStream getDatastreamDisseminationInputStream(String pid, String dsId, String fedoraBase, String fedoraUser, String fedoraPass) {
        // Set the authenticator. Authenticator.setDefault() is JVM-wide, so
        // these credentials answer every authentication challenge in the JVM.
        FedoraAuthenticator auth = new FedoraAuthenticator(fedoraUser, fedoraPass);
        Authenticator.setDefault(auth);
        // Attempt to generate the URL from input. On exception, log and return
        // a stream with no content so the caller doesn't get messed up.
        try {
            URL url = getDatastreamDisseminationURL(pid, dsId, fedoraBase);
            return url.openStream();
        } catch (MalformedURLException e) {
            logger.warn(String.format("Attempt to generate URL for datastream dissemination failed: %s", e.getMessage()));
        } catch (IOException e) {
            logger.warn(String.format("Failed to open stream: %s", e.getMessage()));
        }
        return new ByteArrayInputStream(new byte[0]);
    }

    /**
     * Gets the raw text of a datastream.
     *
     * @param pid
     *   The PID of the object that has the datastream.
     * @param dsId
     *   The ID of the datastream to get.
     * @param fedoraBase
     *   The base URL of Fedora, including protocol; e.g.,
     *   http://localhost:8080/fedora.
     * @param fedoraUser
     *   The Fedora username to connect with.
     * @param fedoraPass
     *   The Fedora password to connect with.
     *
     * @return String
     *   The text of the given datastream, or an empty string if it has no
     *   content or could not be retrieved.
     */
    public static String getRawDatastreamDissemination(String pid, String dsId, String fedoraBase, String fedoraUser, String fedoraPass) {
        InputStream dsStream = getDatastreamDisseminationInputStream(pid, dsId, fedoraBase, fedoraUser, fedoraPass);
        // Scan the InputStream, delimited by the start-of-input anchor (\A), so
        // the whole stream is read as a single token. Decode as UTF-8 rather
        // than relying on the platform default charset.
        Scanner scanner = new Scanner(dsStream, "UTF-8").useDelimiter("\\A");
        String dsString;
        try {
            // If no content, return an empty string.
            dsString = scanner.hasNext() ? scanner.next() : "";
        } finally {
            // Closing the scanner also closes the underlying stream.
            scanner.close();
        }
        if (logger.isDebugEnabled()) {
            logger.debug(String.format("getRawDatastreamDissemination (pid: %s, DSID: %s): %s", pid, dsId, dsString));
        }
        return dsString;
    }

    /**
     * Turns some parameters into a datastream dissemination URL.
     *
     * @param pid
     *   The PID of the object that has the datastream.
     * @param dsId
     *   The ID of the datastream.
     * @param fedoraBase
     *   The base URL of Fedora, including protocol, without a trailing slash.
     *
     * @return URL
     *   A URL object using the given parameters.
     *
     * @throws MalformedURLException
     *   If the parameters do not form a valid URL.
     */
    protected static final URL getDatastreamDisseminationURL(String pid, String dsId, String fedoraBase) throws MalformedURLException {
        // Build the URL.
        String url = String.format("%s/objects/%s/datastreams/%s/content", fedoraBase, pid, dsId);
        if (logger.isDebugEnabled()) {
            logger.debug(String.format("Building URL for %s", url));
        }
        return new URL(url);
    }
}
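`getDatastreamDisseminationInputStream()` calls `Authenticator.setDefault()`, which installs the credentials for the whole JVM, so any code in the same servlet container that triggers an authentication challenge would receive them. If that is a concern, one alternative is to attach credentials to a single request instead. The sketch below assumes Fedora accepts HTTP Basic authentication and needs Java 8+ for `java.util.Base64`; `BasicAuthSketch` and its methods are hypothetical names, not part of this commit. The trade-off is that a fixed header only covers Basic auth, whereas an `Authenticator` can respond to whatever challenge the server sends.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical alternative to Authenticator.setDefault(): credentials scoped to one request.
class BasicAuthSketch {
    /** Builds an HTTP Basic "Authorization" header value (RFC 7617). */
    static String basicAuthHeader(String user, String pass) {
        byte[] credentials = (user + ":" + pass).getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(credentials);
    }

    /** Opens the URL with credentials attached to this connection only. */
    static InputStream open(URL url, String user, String pass) throws IOException {
        URLConnection conn = url.openConnection();
        conn.setRequestProperty("Authorization", basicAuthHeader(user, pass));
        return conn.getInputStream();
    }
}
```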