Merge pull request #3 from qadan/json-to-xml
json to xml
Showing 6 changed files with 434 additions and 45 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,72 +1,155 @@ | ||
SUMMARY | ||
------- | ||
# DGI GSearch Extensions | ||
|
||
discoverygarden's GSearch extensions | ||
## Introduction | ||
|
||
Currently provides a thin wrapper around the Joda date/time library, in | ||
order to more reliably transform dates for Solr. | ||
discoverygarden's GSearch extensions, providing extended functionality available to GSearch XSLTs that would otherwise be extremely difficult or impossible to recreate in XSLT 1.0. | ||
|
||
REQUIREMENTS | ||
------------ | ||
## Requirements | ||
|
||
For build: | ||
- Maven 2/3 | ||
* [Apache Maven](https://maven.apache.org) 2 or 3 to build. | ||
|
||
INSTALLATION | ||
------------ | ||
## Installation | ||
|
||
Build the extensions with `mvn package`, and copy the created jar into GSearch's | ||
lib directory (`$CATALINA_HOME/webapps/fedoragsearch/WEB-INF/lib`). | ||
Build the extensions with `mvn package`, and copy the created jar into GSearch's lib directory (`$CATALINA_HOME/webapps/fedoragsearch/WEB-INF/lib`). | ||
|
||
If you want to include Joda Time manually use gsearch_extensions-0.1.0.jar otherwise use | ||
gsearch_extensions-0.1.0-jar-with-dependencies.jar | ||
If providing package libraries yourself, use gsearch_extensions-0.1.2.jar; otherwise use gsearch_extensions-0.1.2-jar-with-dependencies.jar. | ||
|
||
CONFIGURATION | ||
------------- | ||
## Usage | ||
|
||
### Namespacing | ||
|
||
USAGE | ||
------------- | ||
Extensions are available through the Java namespace of your XSLT processor. In Xalan, this should be: | |
|
||
`xmlns:java="http://xml.apache.org/xalan/java"` | ||
|
||
From there, extension functions can be called using that namespace prefix. | |
|
||
### Functions Available | ||
|
||
#### `ca.discoverygarden.gsearch_extensions.JodaAdapter` | ||
|
||
##### `transformForSolr($date, $pid, $datastream)` | ||
|
||
Attempts to parse a date into a Solr-appropriate format, trying the following default formats in order and assuming UTC if no timezone is provided: | |
|
||
* `M/d/y`, e.g., `7/23/2013` would become `2013-07-23T00:00:00.000Z` | ||
* `M/d/y H:m`, e.g., `7/23/2013 11:36` would become `2013-07-23T11:36:00.000Z` | ||
* ISO Date Time, as per [Joda-Time ISODateTimeFormat](http://joda-time.sourceforge.net/apidocs/org/joda/time/format/ISODateTimeFormat.html#dateTimeParser%28%29). Timezone offsets are converted to UTC, so `2013-07-23T02:36-03:00` would be transformed to `2013-07-23T05:36:00.000Z`. | |
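
The fallback order above can be sketched in plain Java. This is a minimal illustration, not the `JodaAdapter` source: it swaps in the JDK's `java.time` for Joda-Time so that it runs without extra dependencies, the names `SolrDateSketch` and `toSolr` are hypothetical, and returning `null` when nothing matches is an assumption. Unlike the Joda ISO parser, this sketch also requires ISO input to carry an explicit offset.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

/** Illustrative only: hypothetical class, not the JodaAdapter implementation. */
final class SolrDateSketch {

    // Output shape used in the examples above, always expressed in UTC.
    private static final DateTimeFormatter SOLR =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
    // java.time equivalents of the M/d/y and M/d/y H:m defaults.
    private static final DateTimeFormatter DATE_ONLY =
        DateTimeFormatter.ofPattern("M/d/yyyy");
    private static final DateTimeFormatter DATE_AND_TIME =
        DateTimeFormatter.ofPattern("M/d/yyyy H:m");

    /** Tries each format in order; returns null (an assumption) if none match. */
    static String toSolr(String date) {
        try {
            // Date only: midnight, treated as UTC.
            return SOLR.format(LocalDate.parse(date, DATE_ONLY).atStartOfDay());
        } catch (DateTimeParseException ignored) {
            // Fall through to the next format.
        }
        try {
            // Date and time without a zone: treated as UTC.
            return SOLR.format(LocalDateTime.parse(date, DATE_AND_TIME));
        } catch (DateTimeParseException ignored) {
            // Fall through to the next format.
        }
        try {
            // ISO date-time with an offset: shifted to the same instant in UTC.
            return SOLR.format(
                OffsetDateTime.parse(date).withOffsetSameInstant(ZoneOffset.UTC));
        } catch (DateTimeParseException ignored) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(toSolr("7/23/2013 11:36"));
    }
}
```

Trying the most specific text formats first and the ISO parser last mirrors the order listed above; a different order could change which parser claims an ambiguous input.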
|
||
For example: | ||
|
||
In Xalan, it should be possible to call the code like: | ||
``` | ||
<xsl:variable name="date_to_parse">08/13/2013</xsl:variable> | ||
<xsl:variable xmlns:java="http://xml.apache.org/xalan/java" | ||
name="date" | ||
select="java:ca.discoverygarden.gsearch_extensions.JodaAdapter.transformForSolr($date_to_parse)"/> | ||
``` | ||
|
||
For better logging, the function can also be called with the PID and datastream ID: | |
``` | ||
<xsl:variable xmlns:java="http://xml.apache.org/xalan/java" | ||
name="date" | ||
select="java:ca.discoverygarden.gsearch_extensions.JodaAdapter.transformForSolr($date_to_parse, $pid, $datastream)"/> | ||
``` | ||
where `$pid` and `$datastream` are the identifiers of the object and datastream, respectively. | ||
More parser formats can be added with `addDateParser()`. The list of formats can be reset with `resetParsers()`. | ||
|
||
Variable|Description | ||
--------|----------- | ||
`$date`|A string date formatted like one of the `JodaAdapter`'s parsers. | ||
`$pid`|(Optional) The string PID of the object being processed, for potential logging purposes. | ||
`$datastream`|(Optional; required if `$pid` is given) The string datastream ID of the datastream being processed, for potential logging purposes. | ||
|
||
##### `addDateParser($position, $format)` | ||
|
||
Adds a parsing format pattern to the list of patterns to attempt when running `transformForSolr()`, optionally at the provided position. | ||
|
||
Variable|Description | ||
--------|----------- | ||
`$position`|(Optional) An integer position to place the parser format at. | ||
`$format`|A string format to add to the parser list, using Joda-Time pattern letters, e.g., `y-M-d`. | |
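
The effect of the optional position can be pictured as an ordered list of patterns. The following is a rough sketch under stated assumptions, not the extension's code: `ParserListSketch` and its methods are hypothetical, the ISO parser is left out, and resetting back to the default patterns (rather than to an empty list) is a guess at what `resetParsers()` does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: a hypothetical ordered list of date format patterns. */
final class ParserListSketch {

    // Assumed defaults, taken from the first two formats documented above.
    private static final List<String> DEFAULTS = Arrays.asList("M/d/y", "M/d/y H:m");

    private final List<String> patterns = new ArrayList<>(DEFAULTS);

    /** Appends a pattern, so it is tried after all existing ones. */
    void add(String format) {
        patterns.add(format);
    }

    /** Inserts a pattern at a position; position 0 is tried first. */
    void add(int position, String format) {
        patterns.add(position, format);
    }

    /** Restores the assumed defaults. */
    void reset() {
        patterns.clear();
        patterns.addAll(DEFAULTS);
    }

    /** Returns a copy so callers cannot change the order by accident. */
    List<String> patterns() {
        return new ArrayList<>(patterns);
    }

    public static void main(String[] args) {
        ParserListSketch parsers = new ParserListSketch();
        parsers.add(0, "d.M.y");
        System.out.println(parsers.patterns());
    }
}
```

Because patterns are attempted in list order, placing a stricter pattern earlier keeps a looser one from matching first.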
|
||
##### `resetParsers()` | ||
|
||
Resets the list of parsers `transformForSolr()` will attempt when converting a date. | ||
|
||
#### `ca.discoverygarden.gsearch_extensions.XMLStringUtils` | ||
|
||
The three base parsers assume that they are given dates in UTC if no timezone is provided, and are attempted in order: | ||
- `M/d/y` | ||
So `7/23/2013` should result in `2013-07-23T00:00:00.000Z`. | ||
- `M/d/y H:m` | ||
So `7/23/2013 11:36` should result in `2013-07-23T11:36:00.000Z`. | ||
- ISO Date Time, as per http://joda-time.sourceforge.net/apidocs/org/joda/time/format/ISODateTimeFormat.html#dateTimeParser%28%29 | ||
Timezone offsets will be transformed off, so `2013-07-23T02:36-03:00` will be transformed to `2013-07-23T05:36:00.000Z`. | |
##### `escapeForXML($input, $replacement)` | ||
|
||
CUSTOMIZATION | ||
------------- | ||
Escapes a string for inclusion in XML, for example, in cases where the contents of a plaintext datastream are being provided to Solr. | ||
|
||
The list of replaced characters is based on the Apache Commons lang3 library's [escapeXML10](https://commons.apache.org/proper/commons-lang/javadocs/api-3.5/org/apache/commons/lang3/StringEscapeUtils.html#escapeXml10-java.lang.String-) list. | |
|
||
Variable|Description | ||
--------|----------- | ||
`$input`|The string to sanitize. | ||
`$replacement`|(Optional) The string to use when replacing invalid characters; otherwise, invalid characters will be replaced with Unicode U+FFFD (the Unicode replacement character). | ||
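
To make the behaviour concrete, here is a minimal sketch of this kind of escaping in plain Java. It is not the `XMLStringUtils` source: the class and method names are hypothetical, and the exact character set handled by the real function is assumed to follow the five predefined XML entities plus the XML 1.0 valid-character ranges.

```java
/** Illustrative only: hypothetical class, not the XMLStringUtils implementation. */
final class XmlEscapeSketch {

    /** True if the code point is allowed by the XML 1.0 Char production. */
    static boolean isValidXml10(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Escapes markup characters and swaps invalid characters for the replacement. */
    static String escape(String input, String replacement) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            // Walk by code point so surrogate pairs are handled as one character.
            int cp = input.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:
                    if (isValidXml10(cp)) {
                        out.appendCodePoint(cp);
                    } else {
                        out.append(replacement);
                    }
            }
        }
        return out.toString();
    }

    /** Uses the Unicode replacement character when no replacement is given. */
    static String escape(String input) {
        return escape(input, "\uFFFD");
    }

    public static void main(String[] args) {
        System.out.println(escape("Fish & <chips>"));
    }
}
```

Replacing invalid characters instead of dropping them keeps the output the same length in characters as the input wherever a single-character replacement is used, which can make it easier to spot where data was damaged.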
|
||
#### `ca.discoverygarden.gsearch_extensions.FedoraUtils` | ||
|
||
##### `getDatastreamDisseminationInputStream($pid, $dsId, $fedoraBase, $fedoraUser, $fedoraPass)` | ||
|
||
Gets the dissemination of a datastream as an InputStream object. | ||
|
||
Variable|Description | ||
--------|----------- | ||
`$pid`|The PID of the object to get a datastream from. | ||
`$dsId`|The ID of the datastream to get the dissemination for. | ||
`$fedoraBase`|The base URL of Fedora, including the protocol; e.g., `http://localhost:8080/fedora`. | ||
`$fedoraUser`|The username to log into Fedora with. | ||
`$fedoraPass`|The password for the given username. | ||
|
||
##### `getRawDatastreamDissemination($pid, $dsId, $fedoraBase, $fedoraUser, $fedoraPass)` | ||
|
||
Gets the dissemination of a datastream as a string. Useful in cases where GSearch refuses to return the text of a datastream, i.e., most cases. | ||
|
||
Variable|Description | ||
--------|----------- | ||
`$pid`|The PID of the object to get a datastream from. | ||
`$dsId`|The ID of the datastream to get the dissemination for. | ||
`$fedoraBase`|The base URL of Fedora, including the protocol; e.g., `http://localhost:8080/fedora`. | ||
`$fedoraUser`|The username to log into Fedora with. | ||
`$fedoraPass`|The password for the given username. | ||
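
The two steps behind this function, building the dissemination URL and reading the whole stream into a string, can be sketched without a running Fedora. The URL pattern and the `\A` delimiter come from `FedoraUtils.java` in this commit; the class name `DisseminationSketch`, the sample PID `islandora:1`, the datastream ID `OCR`, and the explicit UTF-8 charset are assumptions for illustration (the committed code uses the platform default charset).

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

/** Illustrative only: hypothetical helper mirroring two steps of FedoraUtils. */
final class DisseminationSketch {

    /** Builds the content URL for a datastream from its parts. */
    static String contentUrl(String fedoraBase, String pid, String dsId) {
        return String.format("%s/objects/%s/datastreams/%s/content", fedoraBase, pid, dsId);
    }

    /** Reads an entire stream as one string; an empty stream yields "". */
    static String readAll(InputStream in) {
        // "\\A" matches only at the start of input, so the first token is
        // everything in the stream. UTF-8 is an assumption of this sketch.
        try (Scanner scanner = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
            return scanner.hasNext() ? scanner.next() : "";
        }
    }

    public static void main(String[] args) {
        System.out.println(contentUrl("http://localhost:8080/fedora", "islandora:1", "OCR"));
        InputStream sample =
            new ByteArrayInputStream("line one\nline two".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(sample));
    }
}
```

Wrapping the `Scanner` in try-with-resources closes the underlying stream once the text has been read, which is a choice made in this sketch.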
|
||
#### `ca.discoverygarden.gsearch_extensions.JSONToXML` | ||
|
||
##### `convertJSONToXML($input, $enclosing_tag)` | ||
|
||
Converts a JSON string to an XML string. | ||
|
||
Variable|Description | ||
--------|----------- | ||
`$input`|The input JSON string. | ||
`$enclosing_tag`|(Optional) The top-level element to wrap resultant XML in, to prevent invalid XML from being written. If not provided, defaults to an element called 'json'. | ||
|
||
##### `convertJSONToDocument($input, $enclosing_tag)` | ||
|
||
Converts a JSON string to an XML Document object. | ||
|
||
The resultant document can be interpreted as a Node-Set by Xalan, for example: | ||
|
||
``` | ||
<xsl:variable name="some_json">{"something": "has content"}</xsl:variable> | ||
<xsl:variable | ||
xmlns:java="http://xml.apache.org/xalan/java" | ||
name="some_xml" | ||
select="java:ca.discoverygarden.gsearch_extensions.JSONToXML.convertJSONToDocument($some_json)"/> | ||
<!-- This will evaluate to "has content". --> | ||
<xsl:variable name="node" select="$some_xml//something/text()"/> | |
``` | ||
|
||
Variable|Description | ||
--------|----------- | ||
`$input`|The input JSON string. | ||
`$enclosing_tag`|(Optional) The top-level element to wrap resultant XML in, to prevent invalid XML from being written. If not provided, defaults to an element called 'json'. | ||
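
For readers more at home in Java than XSLT, the last step of the example above (treating the result as a document and selecting a node from it) looks roughly like this with the JDK's own XML APIs. This is a sketch under assumptions: `XmlDocumentSketch` is a hypothetical class, and the XML string is a guess at what the conversion would produce for `{"something": "has content"}` with the default `json` enclosing element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Illustrative only: hypothetical helper, not the JSONToXML implementation. */
final class XmlDocumentSketch {

    /** Parses an XML string into a DOM Document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Evaluates an XPath expression against the document and returns its string value. */
    static String text(Document doc, String xpath) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(xpath, doc);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        // Assumed shape of the converted output; the real output may differ.
        Document doc = parse("<json><something>has content</something></json>");
        System.out.println(text(doc, "//something/text()"));
    }
}
```

Returning a `Document` rather than a string is what lets an XSLT processor apply XPath to the result directly, without re-parsing.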
|
||
TROUBLESHOOTING | ||
--------------- | ||
## Troubleshooting/Issues | ||
|
||
Having problems or solved a problem? Contact [discoverygarden](http://support.discoverygarden.ca). | ||
|
||
F.A.Q. | ||
------ | ||
## Maintainers/Sponsors | ||
|
||
Current maintainers: | ||
|
||
CONTACT | ||
------- | ||
* [discoverygarden](http://www.discoverygarden.ca) | ||
|
||
## Development | ||
|
||
SPONSORS | ||
-------- | ||
If you would like to contribute to this module, please check out our helpful | ||
[Documentation for Developers](https://github.com/Islandora/islandora/wiki#wiki-documentation-for-developers) | ||
info, [Developers](http://islandora.ca/developers) section on Islandora.ca and | ||
contact [discoverygarden](http://support.discoverygarden.ca). |
137 changes: 137 additions & 0 deletions
src/main/java/ca/discoverygarden/gsearch_extensions/FedoraUtils.java
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,137 @@ | ||
package ca.discoverygarden.gsearch_extensions; | ||
|
||
import org.apache.log4j.Logger; | ||
|
||
import java.io.IOException; | ||
import java.io.InputStream; | ||
import java.io.ByteArrayInputStream; | ||
import java.net.MalformedURLException; | ||
import java.net.URL; | ||
import java.net.Authenticator; | ||
import java.net.PasswordAuthentication; | ||
import java.util.Scanner; | ||
|
||
/** | ||
* Utilities for interfacing with Fedora in ways that GSearch can't. | ||
*/ | ||
public class FedoraUtils { | ||
|
||
protected static final Logger logger = Logger.getLogger(FedoraUtils.class); | ||
|
||
/** | ||
* Authenticator for connecting to Fedora. | ||
*/ | ||
static class FedoraAuthenticator extends Authenticator { | ||
|
||
protected String fedoraUser; | ||
protected String fedoraPass; | ||
|
||
/** | ||
* Constructor; sets the username and password for this authenticator. | ||
* | ||
* @param username | ||
* The username to set. | ||
* @param password | ||
* The password to set. | ||
*/ | ||
FedoraAuthenticator(String username, String password) { | ||
fedoraUser = username; | ||
fedoraPass = password; | ||
} | ||
|
||
/** | ||
* Overridden password authentication getter. | |
* | ||
* @return PasswordAuthentication | ||
* Authentication for this authenticator. | ||
*/ | ||
protected PasswordAuthentication getPasswordAuthentication() { | ||
return new PasswordAuthentication(fedoraUser, fedoraPass.toCharArray()); | ||
} | ||
|
||
} | ||
|
||
/** | ||
* Gets an InputStream for a datastream. | ||
* | ||
* @param pid | ||
* The PID of the object that has the datastream. | ||
* @param dsId | ||
* The ID of the datastream to get. | ||
* @param fedoraBase | ||
* The base URL of Fedora, including protocol; e.g., | |
* http://localhost:8080/fedora. | ||
* @param fedoraUser | ||
* The Fedora username to connect with. | ||
* @param fedoraPass | ||
* The Fedora password to connect with. | ||
* | ||
* @return InputStream | ||
* An InputStream at the constructed dissemination point. | ||
*/ | ||
public static InputStream getDatastreamDisseminationInputStream(String pid, String dsId, String fedoraBase, String fedoraUser, String fedoraPass) { | ||
// Set the authenticator. | ||
FedoraAuthenticator auth = new FedoraAuthenticator(fedoraUser, fedoraPass); | ||
Authenticator.setDefault(auth); | ||
// Attempt to generate the URL from input. | ||
try { | ||
URL url = getDatastreamDisseminationURL(pid, dsId, fedoraBase); | ||
return url.openStream(); | ||
} | ||
// On exception, log and return a stream with no content so the caller | |
// still receives a valid, readable stream. | |
catch (MalformedURLException e) { | ||
logger.warn(String.format("Attempt to generate URL for datastream dissemination failed: %s", e.getMessage())); | ||
} | ||
catch (IOException e) { | ||
logger.warn(String.format("Failed to open stream: %s", e.getMessage())); | ||
} | ||
return new ByteArrayInputStream("".getBytes()); | ||
} | ||
|
||
/** | ||
* Gets the raw text of a datastream. | ||
* | ||
* @param pid | ||
* The PID of the object that has the datastream. | ||
* @param dsId | ||
* The ID of the datastream to get. | ||
* @param fedoraBase | ||
* The base URL of Fedora, including protocol; e.g., | |
* http://localhost:8080/fedora. | ||
* @param fedoraUser | ||
* The Fedora username to connect with. | ||
* @param fedoraPass | ||
* The Fedora password to connect with. | ||
* | ||
* @return String | ||
* The text of the given datastream. | ||
*/ | ||
public static String getRawDatastreamDissemination(String pid, String dsId, String fedoraBase, String fedoraUser, String fedoraPass) { | ||
InputStream dsStream = getDatastreamDisseminationInputStream(pid, dsId, fedoraBase, fedoraUser, fedoraPass); | ||
// Scan the InputStream, delimited by the start of string marker, so the | ||
// whole string gets scanned. | ||
Scanner scanner = new Scanner(dsStream).useDelimiter("\\A"); | ||
// If no content, return an empty string. | ||
String dsString = scanner.hasNext() ? scanner.next() : ""; | ||
if (logger.isDebugEnabled()) { | ||
logger.debug(String.format("getRawDatastreamDissemination (pid: %s, DSID: %s): %s", pid, dsId, dsString)); | ||
} | ||
return dsString; | ||
} | ||
|
||
/** | ||
* Turns some parameters into a datastream dissemination URL. | ||
* | ||
* @return URL | ||
* A URL object using the given parameters. | ||
*/ | ||
protected static final URL getDatastreamDisseminationURL(String pid, String dsId, String fedoraBase) throws MalformedURLException { | ||
// Build the URL. | ||
String url = String.format("%s/objects/%s/datastreams/%s/content", fedoraBase, pid, dsId); | ||
if (logger.isDebugEnabled()) { | ||
logger.debug(String.format("Building URL for %s", url)); | ||
} | ||
return new URL(url); | ||
} | ||
} |