This repository has been archived by the owner on Aug 1, 2022. It is now read-only.

Evaluate feasibility of a REST API #220

Closed
garbados opened this issue Mar 12, 2020 · 8 comments · Fixed by #293
Comments

@garbados
Contributor

Pursuant to #165, this issue explores the feasibility of a REST API by way of designing one. As an example:

Endpoints

Consider a set of HTTP endpoints scoped under the prefix /api/v1 such that the path to the endpoint identities is actually /api/v1/identities. Object types referenced in this outline, such as "identities", refer to entities that are already described within the proxy as Rust structs.

  • GET identities
    • list local identities
  • GET identity/{id}
    • retrieve info about a single identity
  • GET local_branches/{...path}
    • retrieve a list of branches for a locally-stored repository
  • GET org/{id}
    • retrieve information about an organization
  • GET org/{id}/projects
    • list this org's projects
  • GET org/{id}/users
    • list registered users associated with this org
  • GET orgs
    • list orgs, scoped using query parameters
  • GET project/{id}
    • retrieve the project as of its latest commit
  • GET project/{id}/blob/{...path}
    • retrieve a blob from the latest commit of the main branch (ex: master)
  • GET project/{id}/branches
    • list branches for the project
  • GET project/{id}/commit/{sha1}
    • retrieve the project as of the given commit
  • GET project/{id}/commit/{sha1}/blob/{...path}
    • retrieve a blob as of a given commit
  • GET project/{id}/commits
    • list this project's commits, scoped using query parameters
  • GET project/{id}/tags
    • list this project's tags, scoped using query parameters
  • GET project/{id}/tree/{revision}
    • retrieve the project as of the given revspec
  • GET project/{id}/tree/{revision}/blob/{...path}
    • retrieve a blob as of the given revspec
  • GET project/{id}/contributors
    • list users (or identities?) who have contributed to this repo
  • GET projects
    • list projects, scoped using query parameters
  • GET transaction/{id}
    • get info about a specific transaction, ex: did it succeed?
  • GET transactions
    • list transactions, scoped using query parameters
  • GET user/{id}
    • get info about a user (that is, an identity with a registered handle)
  • GET users
    • list users, scoped using query parameters
  • POST identity
    • create a new identity / keypair
  • POST transaction
    • submit a transaction (ex: register user, project, org, etc)
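
As a rough illustration of how the outline above could map onto code, here is a minimal sketch of path matching for a subset of the GET endpoints. The `Route` enum, the `parse_get` function and the `PREFIX` constant are hypothetical names invented for this example; they are not part of the proxy, and a real implementation would more likely rely on a web framework's routing.

```rust
/// Subset of the routes from the outline above (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
pub enum Route {
    Identities,
    Identity { id: String },
    Projects,
    Project { id: String },
    ProjectBranches { id: String },
    ProjectCommit { id: String, sha1: String },
    ProjectBlob { id: String, path: String },
}

/// Prefix taken from the outline; assumed here, not an existing constant in the proxy.
const PREFIX: &str = "/api/v1/";

/// Map the path of a GET request to a `Route`, or `None` if nothing matches.
pub fn parse_get(path: &str) -> Option<Route> {
    let rest = path.strip_prefix(PREFIX)?;
    // Empty segments (for example from a trailing slash) are ignored.
    let segments: Vec<&str> = rest.split('/').filter(|s| !s.is_empty()).collect();
    match segments.as_slice() {
        ["identities"] => Some(Route::Identities),
        ["identity", id] => Some(Route::Identity { id: id.to_string() }),
        ["projects"] => Some(Route::Projects),
        ["project", id] => Some(Route::Project { id: id.to_string() }),
        ["project", id, "branches"] => Some(Route::ProjectBranches { id: id.to_string() }),
        ["project", id, "commit", sha1] => Some(Route::ProjectCommit {
            id: id.to_string(),
            sha1: sha1.to_string(),
        }),
        // `{...path}` captures every remaining segment; an empty path is rejected.
        ["project", id, "blob", path @ ..] if !path.is_empty() => Some(Route::ProjectBlob {
            id: id.to_string(),
            path: path.join("/"),
        }),
        _ => None,
    }
}
```

For example, `parse_get("/api/v1/project/abc/blob/src/main.rs")` yields a `ProjectBlob` route with the path `src/main.rs`, while a path without the prefix yields `None`.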

Entities

Endpoints deal with common, standardized entities, including:

  • Org
  • User
  • Identity
  • Project
  • Commit
  • Branch
  • Tag
  • Blob
  • Transaction

The nature of these entities is already contained within the proxy. For example, here is the current characterization of an Identity:

/// The user's personal identifying metadata and keys.
pub struct Identity {
    /// The librad id.
    pub id: String,
    /// Unambiguous identifier pointing at this identity.
    pub shareable_entity_identifier: String,
    /// Bundle of user provided data.
    pub metadata: Metadata,
}

/// User maintained information for an identity, which can evolve over time.
pub struct Metadata {
    /// Similar to a nickname, the user's chosen short identifier.
    pub handle: String,
    /// A longer name to display, e.g.: full name.
    pub display_name: Option<String>,
    /// Url of an image the user wants to present alongside this [`Identity`].
    pub avatar_url: Option<String>,
}

All of these object definitions can be preserved in the transition to a REST API.
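
To make the idea of reusing these definitions concrete, the following sketch renders an `Identity` as a JSON response body using only the standard library. The helper names (`json_string`, `json_opt`, `to_json`) are assumptions made for this example; in practice a serialization crate would probably do this work, and nothing here reflects how the proxy actually serializes its types.

```rust
pub struct Identity {
    pub id: String,
    pub shareable_entity_identifier: String,
    pub metadata: Metadata,
}

pub struct Metadata {
    pub handle: String,
    pub display_name: Option<String>,
    pub avatar_url: Option<String>,
}

/// Render a string as a JSON string literal (hypothetical helper).
pub fn json_string(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Optional fields become `null` when absent (hypothetical helper).
fn json_opt(value: &Option<String>) -> String {
    match value {
        Some(s) => json_string(s),
        None => "null".to_string(),
    }
}

impl Metadata {
    pub fn to_json(&self) -> String {
        format!(
            "{{\"handle\":{},\"display_name\":{},\"avatar_url\":{}}}",
            json_string(&self.handle),
            json_opt(&self.display_name),
            json_opt(&self.avatar_url)
        )
    }
}

impl Identity {
    pub fn to_json(&self) -> String {
        format!(
            "{{\"id\":{},\"shareable_entity_identifier\":{},\"metadata\":{}}}",
            json_string(&self.id),
            json_string(&self.shareable_entity_identifier),
            self.metadata.to_json()
        )
    }
}
```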

Query Parameters

Some endpoints list objects, such as listing projects or users. These lists can be scoped using common parameters, listed here:

  • mode: A special index to use before other forms of scoping. As an example, consider GET projects?mode=popular which would list projects indexed as "popular".
  • start: A key at which to begin the list. Used for pagination. For example: GET projects?start={id} would list projects with IDs that sort after the given start ID.
  • end: A key at which to end the list. Used for pagination. For example: GET projects?end={id} would list projects with IDs that sort before the given end ID.
  • descending: A boolean indicating whether to sort entries by their ID in an ascending (false, default) or descending (true) way.
  • limit: Integer indicating how many entries to return. Defaults to some reasonable number like 20, with an upper limit of 40.

These options are inspired by those used in CouchDB views.
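
The `start`, `end`, `descending` and `limit` parameters can be sketched against an ordered in-memory map. Everything below is an assumption made for illustration: the `ListQuery` struct, the `list_ids` function, the use of a `BTreeMap` as the store, and the choice to treat `start` and `end` as exclusive lower and upper bounds regardless of sort direction (CouchDB itself swaps their meaning when `descending` is set).

```rust
use std::collections::BTreeMap;
use std::ops::Bound;

/// Parsed list parameters (hypothetical type).
pub struct ListQuery {
    pub start: Option<String>,
    pub end: Option<String>,
    pub descending: bool,
    pub limit: Option<usize>,
}

const DEFAULT_LIMIT: usize = 20;
const MAX_LIMIT: usize = 40;

/// Return the IDs selected by `query` from a store keyed by ID.
pub fn list_ids(store: &BTreeMap<String, String>, query: &ListQuery) -> Vec<String> {
    // Fall back to the default and cap at the upper limit.
    let limit = query.limit.unwrap_or(DEFAULT_LIMIT).min(MAX_LIMIT);

    // `BTreeMap::range` panics on an inverted or empty exclusive range,
    // so that case is answered with an empty list up front.
    if let (Some(start), Some(end)) = (&query.start, &query.end) {
        if start >= end {
            return Vec::new();
        }
    }

    let lower = match &query.start {
        Some(s) => Bound::Excluded(s.clone()),
        None => Bound::Unbounded,
    };
    let upper = match &query.end {
        Some(e) => Bound::Excluded(e.clone()),
        None => Bound::Unbounded,
    };

    let keys = store
        .range::<String, _>((lower, upper))
        .map(|(id, _)| id.clone());
    if query.descending {
        keys.rev().take(limit).collect()
    } else {
        keys.take(limit).collect()
    }
}
```

The `mode` parameter is left out because it selects a separate index, which is exactly the part this sketch cannot provide.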

But this is rudimentary. Queries like finding projects registered by a given user or set of users would require constructing and maintaining indexes, which is beyond the scope of the API itself.

Indexing

GraphQL uses its own indexing engine, and moving away from it would mean we would need to choose or create our own. I highly recommend not creating an indexer and instead using a mature database with one of its own, such as PostgreSQL or CouchDB, because databases are very complicated and our time is limited.

If we used PostgreSQL, the REST API would essentially be an interface to highly controlled SQL queries, in order to prevent malformed or unintentional destructive database interactions. This is a relatively labor-intensive approach as our evolving data model would require frequent migrations, and any changes to the API required by the frontend would necessitate additional labor on the backend to write the appropriate query and return the appropriate information. This architecture -- of using a REST API as a mediator between client software and raw SQL -- is very common and best practices for it are well-established.

If we used CouchDB, which exposes its own REST API, the client could interact with CouchDB directly while the proxy simply monitors librad and the registry for relevant changes, which it pushes to the database which subsequently indexes them. This saves us the trouble of writing our own REST API and allows us to focus that labor on writing "views" AKA indexes, which the frontend can perform queries against. Queries are non-destructive so there is no danger of a malformed query destroying information. Furthermore, CouchDB implements advanced REST API features including etags which would otherwise be a considerable undertaking to develop ourselves.

NOTE: I mentioned an architecture using CouchDB during a meeting on March 11 that was very off-the-cuff and did not reflect a thoughtful architecture. It has no bearing on my suggested use of CouchDB here.

@garbados
Contributor Author

Further remarks on building with CouchDB

Consider this architecture:

  • The Proxy: A Rust service that monitors librad and the registry for relevant changes, such as new commits to owned repositories and newly registered projects. Requests to CouchDB pass through the Proxy so that it can retrieve information from the network in response to queries about data it hasn't already downloaded yet, such that -- for example -- a query about a heretofore unknown project causes the Proxy to query the registry about this project and, if it is found, to add it to the CouchDB node. Additionally the Proxy would manage any necessary cleanup of data, such as by deleting information about projects that have been deleted, and the processing of user-initiated transactions such as the registration of projects.
  • The CouchDB: A single CouchDB node which stores entities such as users, projects, commits, etc., as documents in a database. The node maintains many indexes over this data to support a broad array of queries.
  • The App: A JavaScript application that queries the CouchDB node directly for information over HTTP. CouchDB has advanced tooling for JavaScript, see PouchDB and Nano.

In this architecture, the user is always working with locally stored data. Should they lose connectivity, service is not degraded -- the CouchDB node simply stops receiving updates until connectivity is re-established, at which point updates resume. Likewise pending transactions can be saved in this way, as documents within the database which the Proxy only processes once connectivity is restored.
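
The handling of pending transactions could look roughly like the sketch below. It uses an in-memory queue in place of CouchDB documents, and the `Transaction` variants, the `PendingQueue` type and the `submit` callback are all hypothetical names made up for this example.

```rust
use std::collections::VecDeque;

/// Example transaction kinds (hypothetical; the real set lives in the registry).
#[derive(Debug, Clone, PartialEq)]
pub enum Transaction {
    RegisterProject { name: String },
    RegisterUser { handle: String },
}

/// Transactions saved while offline, in submission order.
#[derive(Default)]
pub struct PendingQueue {
    pending: VecDeque<Transaction>,
}

impl PendingQueue {
    pub fn enqueue(&mut self, tx: Transaction) {
        self.pending.push_back(tx);
    }

    pub fn len(&self) -> usize {
        self.pending.len()
    }

    /// Submit queued transactions in order once connectivity is back.
    /// Stops at the first transaction that `submit` rejects, so it stays
    /// queued for the next attempt. Returns how many were submitted.
    pub fn flush<F>(&mut self, online: bool, mut submit: F) -> usize
    where
        F: FnMut(&Transaction) -> bool,
    {
        if !online {
            return 0;
        }
        let mut sent = 0;
        loop {
            let accepted = match self.pending.front() {
                Some(tx) => submit(tx),
                None => break,
            };
            if !accepted {
                break;
            }
            self.pending.pop_front();
            sent += 1;
        }
        sent
    }
}
```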

Relay proactively updates components as they change in the datastore. Comparable behavior is possible with CouchDB's _changes endpoint and filters.

This is a preliminary description of an architecture for discussion. I am happy to expand the description in response to inquiry, and to walk through appropriate tooling and indexing practices.

@cloudhead
Contributor

Great work, thanks! I think the idea of using CouchDB is really interesting, because indeed it saves us from having to maintain an HTTP API, as well as having to worry about persistence.

In general, it will make sense to put things in some kind of database to handle persistence and querying, but also to get synchronization functionality. Eg. in the future it would be nice to be able to sync a mobile client with your desktop without having to process the whole thing from scratch.

The advantage of couch over something sql-based is as you said, that we get the API for free - which is huge. What I would worry about is:

  • Operations: is this going to complicate running the app?
  • Overhead: how much extra memory/CPU does couch use compared to eg. something very lightweight like SQLite?
  • Runtime: given that it's written in erlang, is it more complicated to install for the user?
  • Mobile: how well does it work on mobile, if we want to run the proxy on eg. iOS?
  • Limitations (eg. disk space): since couchdb is 100% index based, the more queries we want, the more indices we need. These take up a bunch of disk space, compared to being able to do ad-hoc queries in a SQL database. Or am I wrong?

Overall though I still like the idea a lot.

@cloudhead
Contributor

More generally about REST vs. GraphQL: REST is a lot easier to script, eg. with curl + jq or whatever, which I think is a bonus. If we only had GraphQL, developers may want to add a REST API on top of it anyway in the long term.

@xla
Contributor

xla commented Mar 13, 2020

Thanks for the thorough and insightful write-up @garbados - this is a good start for further explorations.

I'd like to keep the conversation focused on the actual REST API part as a feasible alternative to the current GraphQL implementation. While there is a lot of interesting potential in the database discussion, it also comes with a significant amount of operational and mental overhead. Also, at this stage we don't need an indexing solution, as we currently don't maintain one either. GraphQL itself doesn't maintain any indexes; it purely wires the DSL to resolvers, which provide the data as per implementation (memory, postgres, network requests, etc.). For now we should avoid another materialisation layer, which can introduce subtle issues with data consistency and availability. We should look into the needs and concerns brought up here wrt mobile clients in isolation as follow-up(s).

What would be good as an actionable outcome is to define a package which shows an implementation path that we can look at side-by-side. It should fulfill the following requirements:

  • N endpoints for a single entity
  • composition of these endpoints in a warp server (or alternative if deemed more feasible to construct REST APIs in Rust)
  • a similar separation of concerns we currently have with our domain objects/logic and the wrapping of it in the graphql module
  • integration tests for the endpoints to assert correct responses symmetrical to the tests/graph_ ones
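
One possible shape for that separation of concerns, sketched without any web framework: a domain layer that knows nothing about HTTP, and thin handler functions that only translate domain results into status codes. The simplified `Identity`, the `IdentityStore`, the `Response` type, the ID format and the status codes (200, 201, 404) are all assumptions for illustration; a warp-based version would wrap the same domain functions in filters instead.

```rust
use std::collections::HashMap;

// --- Domain layer: no knowledge of HTTP ---

/// Simplified stand-in for the proxy's identity type (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub struct Identity {
    pub id: String,
    pub handle: String,
}

#[derive(Default)]
pub struct IdentityStore {
    identities: HashMap<String, Identity>,
}

impl IdentityStore {
    /// Create and store an identity; the ID scheme is made up for this sketch.
    pub fn create(&mut self, handle: &str) -> Identity {
        let id = format!("identity-{}", self.identities.len() + 1);
        let identity = Identity { id: id.clone(), handle: handle.to_string() };
        self.identities.insert(id, identity.clone());
        identity
    }

    pub fn get(&self, id: &str) -> Option<&Identity> {
        self.identities.get(id)
    }
}

// --- HTTP layer: maps domain results onto responses ---

#[derive(Debug, PartialEq)]
pub struct Response<T> {
    pub status: u16,
    pub body: Option<T>,
}

/// Handler for `GET identity/{id}`.
pub fn get_identity(store: &IdentityStore, id: &str) -> Response<Identity> {
    match store.get(id) {
        Some(identity) => Response { status: 200, body: Some(identity.clone()) },
        None => Response { status: 404, body: None },
    }
}

/// Handler for `POST identity`.
pub fn post_identity(store: &mut IdentityStore, handle: &str) -> Response<Identity> {
    Response { status: 201, body: Some(store.create(handle)) }
}
```

Because the handlers are plain functions, integration-style tests can call them directly and assert on status and body, much like the existing tests/graph_ ones assert on query results.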

Outlining this package and following up with an implementation should be possible in a day.

@cloudhead
Contributor

Though I agree that we should start by looking at REST more in the abstract (so let's start with that), I also don't really see how the proxy could work without a backing database. It would require querying the registry for all projects and users stored in radicle-link on startup..

@xla
Contributor

xla commented Mar 13, 2020

> Though I agree that we should start by looking at REST more in the abstract (so let's start with that), I also don't really see how the proxy could work without a backing database. It would require querying the registry for all projects and users stored in radicle-link on startup..

radicle-link operates on local data already; unless we see this becoming a bottleneck, there is not much we would gain from storing the same data in another place for now. While it's reasonable to assume that the UX will benefit from local caching/indexing, it's an orthogonal concern to this issue and, as I said, will be captured in a follow-up. How and what we want/can cache is a non-trivial question which needs proper investigation.

@garbados
Contributor Author

garbados commented Mar 14, 2020

Radicle REST API

Introduction

This REST API is intended for consumption by the radicle-upstream user application, and thus presumes a one-user, one-device environment. Authentication has not been considered in this design as it is presumed that the only software able to access the API will be the application itself.

In the future, we can add authentication capabilities to support modders that want to take advantage of the API to develop applications that interact with the Radicle network. For now that use case is not under consideration.

Entities and Endpoints

The server will respond with JSON response bodies. Query parameters pass options on GET and DELETE requests, while POST requests, such as those that create and register organizations or projects, pass options in a body with the content type application/x-www-form-urlencoded.
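
For reference, decoding an application/x-www-form-urlencoded body takes only a few lines. The sketch below uses the standard library alone; `percent_decode` and `parse_form` are hypothetical helpers written for this example, and a real server would more likely lean on its framework's form extraction.

```rust
use std::collections::HashMap;

/// Decode `+` as a space and `%XX` as a byte; `None` on malformed input.
fn percent_decode(input: &str) -> Option<String> {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b'+' => {
                out.push(b' ');
                i += 1;
            }
            b'%' => {
                // Exactly two hex digits must follow the percent sign.
                let hex = bytes.get(i + 1..i + 3)?;
                if !hex.iter().all(|b| b.is_ascii_hexdigit()) {
                    return None;
                }
                let hex = std::str::from_utf8(hex).ok()?;
                out.push(u8::from_str_radix(hex, 16).ok()?);
                i += 3;
            }
            b => {
                out.push(b);
                i += 1;
            }
        }
    }
    String::from_utf8(out).ok()
}

/// Parse `key=value&key=value` pairs. A later duplicate key overwrites an
/// earlier one, which is a simplification chosen for this sketch.
pub fn parse_form(body: &str) -> Option<HashMap<String, String>> {
    let mut fields = HashMap::new();
    for pair in body.split('&').filter(|p| !p.is_empty()) {
        let mut parts = pair.splitn(2, '=');
        let key = percent_decode(parts.next()?)?;
        let value = percent_decode(parts.next().unwrap_or(""))?;
        fields.insert(key, value);
    }
    Some(fields)
}
```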

The non-URL parameters of endpoints will be enumerated after a demo implementation.

Org

An organization, or Org, is an identity with associated users and associated projects. Users can create organizations and add other users to them, or create projects under the organization.

Endpoints:

  • GET /orgs
    • Retrieve a list of organizations.
  • POST /orgs
    • Create an organization. Returns an OrgId; does NOT register the organization.
  • GET /org/{OrgId}
    • Retrieve information about an organization.
  • POST /org/{OrgId}/register
    • Register an organization.
  • POST /org/{OrgId}/unregister
    • Unregister an organization.
  • DELETE /org/{OrgId}
    • Deletes an org, forgetting about it. Cannot delete a registered org without unregistering.
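
The create, register, unregister and delete rules above amount to a small state machine, which the following sketch makes explicit. The `OrgState` and `LifecycleError` enums and the method names are hypothetical; the same rules would apply to users and projects.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum OrgState {
    /// Created locally, not (or no longer) registered.
    Created,
    Registered,
}

#[derive(Debug, PartialEq)]
pub enum LifecycleError {
    AlreadyRegistered,
    NotRegistered,
    /// Deletion was requested while the org is still registered.
    StillRegistered,
}

pub struct Org {
    pub id: String,
    pub state: OrgState,
}

impl Org {
    /// Corresponds to `POST /orgs`: the new org starts out unregistered.
    pub fn new(id: &str) -> Self {
        Org { id: id.to_string(), state: OrgState::Created }
    }

    pub fn register(&mut self) -> Result<(), LifecycleError> {
        match self.state {
            OrgState::Created => {
                self.state = OrgState::Registered;
                Ok(())
            }
            OrgState::Registered => Err(LifecycleError::AlreadyRegistered),
        }
    }

    pub fn unregister(&mut self) -> Result<(), LifecycleError> {
        match self.state {
            OrgState::Registered => {
                self.state = OrgState::Created;
                Ok(())
            }
            OrgState::Created => Err(LifecycleError::NotRegistered),
        }
    }

    /// Check the deletion rule: a registered org must be unregistered first.
    pub fn ensure_deletable(&self) -> Result<(), LifecycleError> {
        match self.state {
            OrgState::Created => Ok(()),
            OrgState::Registered => Err(LifecycleError::StillRegistered),
        }
    }
}
```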

User

A user is an identity with associated projects and associated organizations. A user can create projects associated with themselves, or with any organizations that have granted them sufficient permissions.

Endpoints:

  • GET /users
    • Retrieve a list of users.
  • POST /users
    • Create a user. Returns an Identity, including a librad ID or UserId.
  • GET /user/{UserId}
    • Get information about a user.
  • POST /user/{UserId}/register
    • Register a user.
  • POST /user/{UserId}/unregister
    • Unregister a user.
  • DELETE /user/{UserId}
    • Deletes the user, forgetting about it. Cannot delete a registered user without unregistering.

Project

A project is a code repository with associated collaboration artifacts.

Endpoints:

  • GET /projects
    • Retrieve a list of projects.
  • POST /projects
    • Create a project. Returns a Project, including a librad::project::ProjectId.
  • GET /project/{ProjectId}
    • Get information about a project.
  • POST /project/{ProjectId}/register
    • Register a project.
  • POST /project/{ProjectId}/unregister
    • Unregister a project.
  • DELETE /project/{ProjectId}
    • Deletes the project, forgetting about it. Cannot delete a registered project without unregistering.

Commit

A commit is a specific revision of a code repository, and thus must be referred to with the ProjectId of its associated project.

  • GET /project/{ProjectId}/commits
    • List the commit history of the primary (ex: master) branch.
  • GET /project/{ProjectId}/commit/{hash}
    • Returns the code repository as of the commit indicated by hash, which is a cryptographic identifier for the commit.
  • GET /project/{ProjectId}/commit/{hash}/blob/{...path}
    • Returns information about the file or directory at the given path under the given commit.

Branch

A branch is a revision tree within a code repository, and thus must be referred to with the ProjectId of its associated project.

  • GET /project/{ProjectId}/trees
    • List the revision trees (branches and tags) associated with a project.
  • GET /project/{ProjectId}/branches
    • List the branches associated with a project.
  • GET /project/{ProjectId}/tags
    • List the tags associated with a project.
  • GET /project/{ProjectId}/tree/{revspec}
    • Return the code repository as of the latest commit on the indicated revision tree. Specific commits can be reached through the /project/.../commit/... endpoints.

Considerations

This outline does not include code collaboration artifacts as those are not currently among the entities that the proxy supports. As I learn more about how we plan to represent these artifacts, I can incorporate them into the design.

@garbados
Contributor Author

Ok, now about that demo implementation...

@xla xla added this to the Research milestone Mar 18, 2020
@xla xla linked a pull request Apr 20, 2020 that will close this issue