Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add client JSON storage API endpoint #1405

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 21 additions & 0 deletions SETUP/API.md
Original file line number Diff line number Diff line change
Expand Up @@ -66,3 +66,24 @@ Three settings in `configuration.sh` control limiting:
allowed per given window.
* `_API_RATE_LIMIT_SECONDS_IN_WINDOW` - the number of seconds within a given
window.

## Client Storage

To facilitate javascript UI clients persisting data across browsers and devices,
the API includes an optional endpoint for clients to store and fetch JSON blobs.
To enable this feature, add a string for the client to the
`_API_CLIENT_STORAGE_KEYS` configuration setting and have the client use that
string with the endpoint as the `clientid`.
Comment on lines +74 to +76
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am having a hard time understanding the purpose of the API client storage keys. If the end goal is to allow any client to store this configuration down the road (which means that a user could trivially get one of our keys), is this some temporary measure to prevent arbitrary storage from storing JSON in our DB?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have the gist of it, but it's not temporary.

Keep in mind that every user has de facto API access via PHP sessions -- that's how the Page Browser users the API and how the new proofreading UI will too. If we don't have some restriction, any logged in user could fill up our server with random JSON blobs. The clientid means that each user can have only at most len(API_CLIENT_STORAGE_KEYS) JSON blobs and those have a maximum size.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps "client" is the wrong word for this? I am using it in the "JS client that uses the APIs". Would "app" or some other word be clearer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having trouble visualizing it, too. Part of the discussion has centered around not only persisting settings across browsers and devices, but allowing different browsers on different devices to have settings specific to those devices (for example, if one device has a large monitor, the user may prefer the side-by-side version of the interface, but prefer top-to-bottom on a device with a smaller monitor).

My initial impression of "per client per user" was that multiple devices was covered because each user might be using different clients, thus, each user might have multiple entries, so I'm wondering if I'm just misunderstanding the terminology.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JS client (ie: new proofreading interface) will need to figure out which settings it should store in the local browser cache (and only available to that browser -- like window size) and which one it should store on the server (and available to any browser device -- like font family). All this endpoint does is handle saving and returning whatever the JS client sends it. We have to figure out the other parts but that's all done within the JS client.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The clientid means that each user can have only at most len(API_CLIENT_STORAGE_KEYS) JSON blobs and those have a maximum size.

I am not seeing any validation on the number of stored blobs, we only check that the client id is allow-listed but besides that the constraints on the DB would allow unbounded storage. Those are trusted clients so it doesn't seem to be a concern, just curious about this.

Perhaps "client" is the wrong word for this? I am using it in the "JS client that uses the APIs". Would "app" or some other word be clearer?

Mhh, is the idea to separate our browser users from other "app" that programmatically call our API?

You have the gist of it, but it's not temporary.

OK, as a side question since we need a logged in users to store the information, would it make sense to use the user rather than an extra key? Or are we thinking that some "apps" will backfill/mutate a lot of users' keys? I am probably missing something rather obvious here as I don't have much context on the new UI.


Some important notes about this feature:
* Client storage is one blob per user per client. Said another way: API users are
only able to store one blob per `clientid` and that blob is only for the user
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouln't this be 'one blob per userid ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's both. It's one per clientid and by the user that is authenticated. I'll change it slightly to maybe make it clearer.

authenticated with the API.
* Beyond validating these are valid JSON objects, they are treated as opaque
blobs server-side. It is up to the client to manage the object, including
the schema and the possibility that the object will not match an expected
schema.
* The `clientid` is not a secret to the browser. Nothing prevents users with
API keys (or valid PHP session keys) from using this endpoint with a valid
client ID to change the contents of this blob for their user. Clients
should treat this blob as unvalidated user input and act accordingly.
1 change: 1 addition & 0 deletions SETUP/configuration.sh
Original file line number Diff line number Diff line change
Expand Up @@ -361,6 +361,7 @@ _API_ENABLED=true
_API_RATE_LIMIT=false
_API_RATE_LIMIT_REQUESTS_PER_WINDOW=3600
_API_RATE_LIMIT_SECONDS_IN_WINDOW=3600
_API_CLIENT_STORAGE_KEYS='[]'

# XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Expand Down
12 changes: 12 additions & 0 deletions SETUP/db_schema.sql
Original file line number Diff line number Diff line change
Expand Up @@ -122,6 +122,18 @@ CREATE TABLE `job_logs` (
KEY `timestamp` (`tracetime`, `succeeded`)
);

--
-- Table structure for table `json_storage`
--

CREATE TABLE `json_storage` (
`username` varchar(25) NOT NULL,
`setting` varchar(32) NOT NULL,
`value` json NOT NULL,
`timestamp` int NOT NULL DEFAULT '0',
PRIMARY KEY (`username`,`setting`)
);

--
-- Table structure for table `news_items`
--
Expand Down
1 change: 1 addition & 0 deletions SETUP/tests/ci_configuration.sh
Original file line number Diff line number Diff line change
Expand Up @@ -47,6 +47,7 @@ _API_ENABLED=true
_API_RATE_LIMIT=false
_API_RATE_LIMIT_REQUESTS_PER_WINDOW=3600
_API_RATE_LIMIT_SECONDS_IN_WINDOW=3600
_API_CLIENT_STORAGE_KEYS='[]'

_EXTERNAL_CATALOG_LOCATOR='z3950.loc.gov:7090/Voyager'

Expand Down
59 changes: 52 additions & 7 deletions SETUP/tests/unittests/ApiTest.php
Original file line number Diff line number Diff line change
Expand Up @@ -116,7 +116,7 @@ public function test_get_invalid_round_stats(): void
$this->expectExceptionCode(103);

$path = "v1/stats/site/rounds/P4";
$query_params = "";
$query_params = [];
$router = ApiRouter::get_router();
$_SERVER["REQUEST_METHOD"] = "GET";
$router->route($path, $query_params);
Expand All @@ -128,7 +128,7 @@ public function test_get_invalid_page_data(): void

$project = $this->_create_project();
$path = "v1/projects/$project->projectid/pages/999.png/pagerounds/P1";
$query_params = "";
$query_params = [];
$router = ApiRouter::get_router();
$_SERVER["REQUEST_METHOD"] = "GET";
$router->route($path, $query_params);
Expand All @@ -142,7 +142,7 @@ public function test_get_invalid_pageround_data(): void
$this->add_page($project, "001");
// P0 is not a valid round
$path = "v1/projects/$project->projectid/pages/001.png/pagerounds/P0";
$query_params = "";
$query_params = [];
$router = ApiRouter::get_router();
$_SERVER["REQUEST_METHOD"] = "GET";
$router->route($path, $query_params);
Expand All @@ -153,7 +153,7 @@ public function test_get_valid_pageround_data(): void
$project = $this->_create_project();
$this->add_page($project, "001");
$path = "v1/projects/$project->projectid/pages/001.png/pagerounds/OCR";
$query_params = "";
$query_params = [];
$router = ApiRouter::get_router();
$_SERVER["REQUEST_METHOD"] = "GET";
$result = $router->route($path, $query_params);
Expand All @@ -168,7 +168,7 @@ public function test_create_project_unauthorised(): void
$this->expectExceptionCode(3);

$path = "v1/projects";
$query_params = "";
$query_params = [];
$router = ApiRouter::get_router();
$_SERVER["REQUEST_METHOD"] = "POST";
$router->route($path, $query_params);
Expand All @@ -181,7 +181,7 @@ public function test_create_project_no_data(): void

$pguser = $this->TEST_USERNAME_PM;
$path = "v1/projects";
$query_params = "";
$query_params = [];
$router = ApiRouter::get_router();
$_SERVER["REQUEST_METHOD"] = "POST";
$router->route($path, $query_params);
Expand Down Expand Up @@ -787,6 +787,9 @@ public function test_pickersets(): void
$this->assertEquals(["¿", "INVERTED QUESTION MARK"], $pickerset["subsets"][3]["rows"][1][1]);
}

//---------------------------------------------------------------------------
// tests for documents

public function test_available_italian_documents(): void
{
$path = "v1/documents";
Expand Down Expand Up @@ -823,11 +826,53 @@ public function test_unavailable_document(): void
$_SERVER["REQUEST_METHOD"] = "GET";
$router->route($path, ['language_code' => 'de']);
}

//---------------------------------------------------------------------------
// tests for storage

public function test_client_storage_valid(): void
{
global $pguser;
global $api_client_storage_keys;
global $request_body;

$pguser = $this->TEST_USERNAME_PM;
array_push($api_client_storage_keys, "valid");

$path = "v1/storage/clients/valid";
$query_params = [];
$request_body = json_encode(["key" => 1]);
$router = ApiRouter::get_router();

$_SERVER["REQUEST_METHOD"] = "PUT";
$response = $router->route($path, $query_params);
$this->assertEquals(json_decode($request_body), json_decode($response));

$_SERVER["REQUEST_METHOD"] = "GET";
$response = $router->route($path, $query_params);
$this->assertEquals(json_decode($request_body), json_decode($response));

$_SERVER["REQUEST_METHOD"] = "DELETE";
$response = $router->route($path, $query_params);
$this->assertEquals(null, $response);
}

public function test_client_storage_invalid(): void
{
$this->expectExceptionCode(4);

$query_params = [];

$path = "v1/storage/clients/invalid";
$_SERVER["REQUEST_METHOD"] = "GET";
$router = ApiRouter::get_router();
$router->route($path, $query_params);
}
}

// this mocks the function in index.php
/** @return string|array */
function api_get_request_body()
function api_get_request_body(bool $raw = false)
{
global $request_body;
return $request_body;
Expand Down
50 changes: 50 additions & 0 deletions SETUP/tests/unittests/StorageTest.php
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
<?php

class StorageTest extends PHPUnit\Framework\TestCase
{
//------------------------------------------------------------------------
// Basic JSON storage test

public function test_valid_json(): void
{
$storage = new JsonStorage("username");
$storage->set("setting", "{}");
$value = $storage->get("setting");
$this->assertEquals("{}", $value);
$value = $storage->delete("setting");
$this->assertEquals(null, $value);
}

public function test_invalid_json(): void
{
$this->expectException(ValueError::class);
$storage = new JsonStorage("username");
$storage->set("setting", "blearg");
}

//------------------------------------------------------------------------
// API client storage test

public function test_valid_client(): void
{
global $api_client_storage_keys;
$api_client_storage_keys = ["valid"];

$storage = new ApiClientStorage("valid", "username");
$storage->set("{}");
$value = $storage->get();
$this->assertEquals("{}", $value);
$value = $storage->delete();
$this->assertEquals(null, $value);
}

public function test_invalid_client(): void
{
global $api_client_storage_keys;
$api_client_storage_keys = [];

$this->expectException(ValueError::class);

$storage = new ApiClientStorage("invalid", "username");
}
}
23 changes: 23 additions & 0 deletions SETUP/upgrade/22/20241219_create_json_storage.php
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
<?php
$relPath = '../../../pinc/';
include_once($relPath.'base.inc');

// ------------------------------------------------------------

echo "Creating json_storage table...\n";

$sql = "
CREATE TABLE json_storage (
username varchar(25) NOT NULL,
setting varchar(32) NOT NULL,
value json NOT NULL,
timestamp int NOT NULL default 0,
PRIMARY KEY (username, setting)
)
";

mysqli_query(DPDatabase::get_connection(), $sql) or die(mysqli_error(DPDatabase::get_connection()));

// ------------------------------------------------------------

echo "\nDone!\n";
43 changes: 37 additions & 6 deletions api/ApiRouter.inc
Original file line number Diff line number Diff line change
Expand Up @@ -8,8 +8,10 @@ class ApiRouter
{
private $_url_map = [];
private $_validators = [];
private $_response;
private bool $_raw_response = false;

public function add_route($method, $url, $function)
public function add_route(string $method, string $url, string $function): void
{
// Confirm the function is defined or raise an assert exception
assert(function_exists($function), "$function not defined");
Expand All @@ -33,7 +35,7 @@ class ApiRouter
$url_map["endpoint"][$method] = $function;
}

public function route($url, $query_params)
public function route(string $url, array $query_params)
{
$url_map = &$this->_url_map;
$data = [];
Expand All @@ -55,15 +57,16 @@ class ApiRouter
throw new MethodNotAllowed();
}
$function = $url_map["endpoint"][$method];
return $function($method, $data, $query_params);
$this->_response = $function($method, $data, $query_params);
return $this->_response;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need this line now?

}

public function add_validator($label, $function)
public function add_validator(string $label, string $function): void
{
$this->_validators[$label] = $function;
}

private function get_validator($url_map)
private function get_validator(array $url_map): array
{
foreach (array_keys($url_map) as $route) {
if (startswith($route, ":")) {
Expand All @@ -73,7 +76,35 @@ class ApiRouter
throw new InvalidAPI();
}

public static function get_router()
public function request(bool $raw = false)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function could stay in index.php

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved this here to keep the encoding and decoding together. And because I had to move the encoding here this came along with it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems reasonable. I've had some further thoughts about this and deleted my other ananswered comments which had overlooked a change you made in index.php. It's difficult to see the whole picture in this github view so I dowloaded the files to look at them more carefully.

{
if ($raw) {
return file_get_contents('php://input');
} else {
$json_object = json_decode(file_get_contents('php://input'), true);
if ($json_object === null) {
throw new InvalidValue("Content was not valid JSON");
}
return $json_object;
}
}

public function response(bool $raw = false): string
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We wouldn't need this if it was incorporated as above.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I wanted to keep the encoding and decoding together and they got moved in here, breaking them out into clear functions makes it clearer what's going on IMHO.

{
if ($raw || $this->_raw_response) {
return $this->_response;
} else {
return json_encode($this->_response, JSON_PRETTY_PRINT |
JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES);
}
}

public function set_raw_response(): void
{
$this->_raw_response = true;
}

public static function get_router(): ApiRouter
{
static $router = null;
if (!$router) {
Expand Down
52 changes: 52 additions & 0 deletions api/dp-openapi.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -1338,6 +1338,58 @@ paths:
404:
$ref: '#/components/responses/NotFound'

/storage/clients/{clientid}:
get:
tags:
- storage
description: Get JSON blob stored by the client
parameters:
- name: clientid
in: path
description: Client ID
required: true
schema:
type: string
responses:
200:
description: JSON blob
content:
application/json:
schema:
type: object
put:
tags:
- storage
description: Save JSON blob for the client
parameters:
- name: clientid
in: path
description: Client ID
required: true
schema:
type: string
responses:
200:
description: JSON blob that was persisted
content:
application/json:
schema:
type: object
Comment on lines +1373 to +1377
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it make sense to return the timestamp (something the client could use for caching) rather than the JSON blob that was sent by the client?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timestamp is arguably just as useless as the JSON blob because it's always now(). I was uncertain what a good return value would be so I modeled what we do for PUTs for word lists -- and what the product at my job does for PUTs and PATCHes -- which is to return the object that was actually serialized.

Open to ideas / thoughts on what API clients should "expect" from a successful PUT / PATCH / etc beyond the HTTP status code.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timestamp is arguably just as useless as the JSON blob because it's always now().

Sure as it represents the last modified timestamp from the server's perspective.

However I do think there is a lot more value to returning this compared to the blob that was sent and is known to the client. The timestamp would allow the client to implement some caching (e.g. don't check with the server if the timestamp is fresh enough), lower the bandwidth needed (by return the JSON object on a get request if the timestamp doesn't match, useful for mobile devices) or potentially even allow for delta (this would need to switch to some event based logic). Arguably, neither of those scenarios applies here, but those are enabled by returning the timestamp 😄

Open to ideas / thoughts on what API clients should "expect" from a successful PUT / PATCH / etc beyond the HTTP status code.

I don't think we need to return anything from the API on success: the return code says if it was successful and we can return an optional error field to return the cause of errors to the client. Those 2 are what a minimal client would expect. We can always add more as we understand our needs more.

delete:
tags:
- storage
description: Delete JSON blob for the client
parameters:
- name: clientid
in: path
description: Client ID
required: true
schema:
type: string
responses:
200:
description: JSON blob was deleted

components:
securitySchemes:
ApiKeyAuth:
Expand Down
Loading