---
output:
  html_document: default
author: "Paul Oldham"
title: "Introduction"
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
This Handbook focuses on methods and techniques for monitoring the use of genetic resources and traditional knowledge under the [Nagoya Protocol](https://www.cbd.int/abs/about/) of the [Convention on Biological Diversity](https://www.cbd.int/). The aim of the Handbook is to assist governments with identifying practical tools and methods for implementing monitoring under the Nagoya Protocol as a precondition for building trust in international relationships involving exchanges of genetic resources and associated traditional knowledge.
The Handbook is in a very early stage of development. Drawing on the publication model in data science, draft articles will be published as they become available and will then be corrected, revised and expanded. Comments are welcome and can be made using GitHub issues at the project repository [here](https://github.com/poldham/abs). This is an open source project and all data is available in the [GitHub project repository](https://github.com/poldham/abs). Please see the [Contributing](https://github.com/poldham/abs/contributing.html) page for more information.
The Handbook explores software tools and methods for tracking and monitoring genetic resources and traditional knowledge. The Handbook is intended to support the development and implementation of an [online permit and monitoring system](http://abspermits.net/index.html) in Kenya and other partner countries. The Handbook is being developed with support from the [ABS Initiative](http://www.abs-initiative.info/) and contributions from partner countries interested in developing monitoring capacity.
Article 17 of the Nagoya Protocol calls upon countries to develop cost-effective tools to enhance transparency about the utilisation of genetic resources and associated traditional knowledge.
In practice this will involve linking access and benefit-sharing permits under Article 6 of the Nagoya Protocol with monitoring under Article 17 in the following areas:
1. Publications in the scientific literature
2. Patent publications (as a reflection of R&D investments)
3. Digital sequence information (DNA and amino acid sequences)
4. Product information.
This Handbook provides examples of software and database tools that governments can use to monitor research and patent activity under the Nagoya Protocol. The main focus is on the growing availability of free software tools, notably web services or Application Programming Interfaces (APIs), that provide free access to scientific, patent and sequence information.
The Handbook focuses on the strengths and limitations of data sources for monitoring in the development of scientific and patent landscapes for access and benefit-sharing. Landscapes for scientific research and patent activity seek to identify:
a) Trends in activity over time
b) The biological organisms involved
c) The organisations involved
d) The researchers involved
e) Sectors, technology areas and markets involving research, and/or research and development, utilizing genetic resources and associated traditional knowledge
f) Evidence of revenue generated by products or licensing
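
As a minimal sketch of how items (a) and (c) might be computed once publication or patent records are in hand, the following R snippet counts records by year and by organisation. The data frame `pubs` and its columns (`year`, `organisation`) are invented for illustration and do not come from any real data source.

```r
# Hypothetical records; all values are invented for illustration
pubs <- data.frame(
  year = c(2010, 2010, 2011, 2012, 2012, 2012),
  organisation = c("Org A", "Org B", "Org A", "Org A", "Org C", "Org B"),
  stringsAsFactors = FALSE
)

# (a) Trends in activity over time: number of records per year
trend <- as.data.frame(table(year = pubs$year), responseName = "records")

# (c) The organisations involved: records per organisation, most active first
orgs <- sort(table(pubs$organisation), decreasing = TRUE)

print(trend)
print(orgs)
```

The same counting approach could be applied to organisms, researchers or technology areas, provided those fields are present and cleaned in the underlying data.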
In practice, governments seeking to monitor access and benefit-sharing agreements under the Nagoya Protocol are likely to use a range of commercial and free software and database tools. At present there is a limited understanding of the strengths and limitations of these tools for monitoring under the Nagoya Protocol.
This Handbook seeks to advance understanding of the strengths and limitations of these tools using data about Kenya as the main example. The Handbook forms part of a wider family of projects involving the Bahamas, Kenya and India to facilitate monitoring of access and benefit-sharing involving genetic resources and traditional knowledge. It is expected that the Handbook will be useful to other projects and countries as a starting point to support practical work on monitoring under the Nagoya Protocol.
At a later stage the Handbook will demonstrate the use of commercial tools, such as Web of Science and Thomson Innovation. However, as a starting point the Handbook will pursue cost-effectiveness by concentrating, as far as possible, on the potential of free tools. The landscape for free software tools and databases has changed dramatically in recent years. Specifically:
- A growing number of countries are promoting open access policies for the results of scientific research.
- Databases are increasingly providing web services (Application Programming Interfaces or APIs) that give free access to their contents, or to metadata about scientific and related subjects.
- A range of software tools and programming languages are focusing on access to data as a precondition for the rise of data science.
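
To make the idea of a web service concrete, the sketch below assembles a request URL for the CrossRef REST API, one of the free services mentioned in this Handbook. The helper name `crossref_query_url` is hypothetical and the search term is an arbitrary example. The snippet only builds the URL and does not contact the service.

```r
# Hypothetical helper: build a CrossRef works query URL (no network access here)
crossref_query_url <- function(term, rows = 5L) {
  paste0(
    "https://api.crossref.org/works",
    "?query=", utils::URLencode(term, reserved = TRUE),
    "&rows=", rows
  )
}

url <- crossref_query_url("Kenya biodiversity")
print(url)

# With an internet connection and the jsonlite package installed, the
# response could then be read with, for example:
# res <- jsonlite::fromJSON(url)
```

Keeping URL construction separate from retrieval makes it easier to log and reproduce exactly which queries were sent to a service.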
An important issue when considering the use of free software tools such as APIs is that they will generally require engagement with programming languages such as JavaScript, Ruby, Python or R (among others) and will typically be linked with database software. Governments seeking to implement monitoring for access and benefit-sharing should anticipate from the start that this is a technical exercise that will require investment in technical expertise, either in-house or through contracted services. That is, the main costs in implementing monitoring for access and benefit-sharing will be in human knowledge and technical capacity.
This Handbook demonstrates the use of free tools through the statistical programming language R. R was selected for this task for several reasons:
1. Scientific and patent landscaping involves the generation of statistical information that, over the longer term, can be linked with modelling and forecasting.
2. R is supported by over 10,000 packages for different tasks, including mapping, and an ever-increasing number of packages for free and easy access to web services and data services.
3. R is supported by easy-to-use and powerful user interfaces, notably [RStudio](https://www.rstudio.com/), a huge ecosystem of users and extensive online training materials for different tasks.
4. R is widely used in biochemistry and genomics research through a specialised set of `Bioconductor` packages.
5. The R user community is associated with a drive towards `reproducible research`, whereby sufficient information is provided that other researchers can reproduce the results. Reproducible methods are important in quality control for monitoring in access and benefit-sharing, for disseminating methods for use by the wider ABS community and for improving on methods over time.
While this Handbook will use R, it is important to note that the choices made by governments on monitoring tools will in part depend on the available expertise. For example, Python is a very popular general programming language and most of the tasks that can be performed in R can readily be performed in Python. If there is much greater expertise in the use of Python or another language, it may make sense to base monitoring tools on that language with due regard to the promotion of transparent and reproducible methods.
As we will see below, it is possible to achieve a great deal using free tools. However, the Handbook also exposes some significant limitations with existing tools for access and benefit-sharing. These can be briefly summarised as follows.
1. International scientific publications are distributed across a myriad of paywalled (pay-per-view) silos. Free tools such as CrossRef and PubMed provide access to limited metadata about publications that can be used in ABS monitoring. In addition, commercial tools such as Thomson Reuters Web of Science provide access to a greater range of metadata. However, at present restricted access to data is a significant obstacle to monitoring, as we will see in the case of CrossRef.
2. Access to patent data using free patent databases is typically limited to metadata (front pages), and the ability to search the full text (where genetic resources are typically referenced) is presently limited. This situation is changing through full text search facilities in the Lens free patent database and WIPO Patentscope. In addition, the USPTO provides bulk downloads of its entire patent collection. The situation is therefore improving, but data access in the form required for effective monitoring remains restricted.
3. Issues around data access relate to the ability to engage in monitoring at scale. Taxonomic information about genetic resources recorded in international databases such as the [Global Biodiversity Information Facility](http://www.gbif.org/) is typically on the order of tens of thousands of species. This presents challenges in terms of searching millions of documents and, where access to the data is possible, requires advanced text mining methods. We are not there yet, but the situation is improving.
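
The scale problem in point 3 can be illustrated with a deliberately small sketch in which a list of species names is searched for in a set of document texts. The species names are arbitrary examples and the document texts are invented. Exact string matching stands in here for the more advanced text mining methods that real monitoring would need.

```r
# Example inputs; real species lists run to tens of thousands of names
species <- c("Aloe vera", "Prunus africana", "Acacia senegal")
docs <- c(
  doc1 = "Extracts of Prunus africana bark were analysed.",
  doc2 = "A survey of soil samples from coastal sites.",
  doc3 = "Compounds from Aloe vera and Acacia senegal were compared."
)

# Logical matrix: one row per document, one column per species name
hits <- vapply(
  species,
  function(name) grepl(name, docs, fixed = TRUE),
  logical(length(docs))
)
rownames(hits) <- names(docs)

# Species names found in each document, as a named list
found <- lapply(names(docs), function(d) species[hits[d, ]])
names(found) <- names(docs)
print(found)
```

Each species name requires one pass over every document, so the work grows with the number of names multiplied by the number of documents. This is why exact matching of this kind becomes costly when tens of thousands of names meet millions of texts.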
In light of these challenges in terms of data access and the ability to operate at scale, an approach to monitoring that starts with achievable goals and expands in scope as data access improves is likely to be the most appropriate.
The Handbook is presently divided into four broad sections.
Section 1 Biodiversity Informatics. This section focuses on accessing taxonomic data from the Global Biodiversity Information Facility in order to generate species lists for use in monitoring and the use of coordinate data for mapping.
Section 2 Scientific Literature. This section explores methods for obtaining the scientific literature and then combines data on researchers who have held research permits with the scientific literature.
This section also explores the strengths and weaknesses of using the ORCID researcher identifier system to retrieve scientific publications linked to researchers who have held permits, and the role of identifiers in automating monitoring.
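
A minimal sketch of the linking step described for Section 2, assuming that both the permit records and the publication records carry an ORCID identifier. The data frames, column names and identifier values are all invented for illustration and are not real ORCID records.

```r
# Hypothetical permit holders; names and identifiers are made up
permits <- data.frame(
  permit_id = c("P-001", "P-002", "P-003"),
  researcher = c("Researcher A", "Researcher B", "Researcher C"),
  orcid = c("0000-0000-0000-0001", "0000-0000-0000-0002", NA),
  stringsAsFactors = FALSE
)

# Hypothetical publication records retrieved from a literature source
publications <- data.frame(
  title = c("Paper one", "Paper two", "Paper three"),
  orcid = c("0000-0000-0000-0001", "0000-0000-0000-0001", "0000-0000-0000-0009"),
  stringsAsFactors = FALSE
)

# Inner join on the shared identifier: publications by permit holders
linked <- merge(permits[!is.na(permits$orcid), ], publications, by = "orcid")

# Permit holders without an identifier cannot be linked automatically
unlinked <- permits[is.na(permits$orcid), "researcher"]

print(linked)
print(unlinked)
```

Rows with a missing identifier are filtered out before the join because `merge()` would otherwise treat missing values as matching one another. Researchers without an identifier would need to be matched by name, which is considerably less reliable.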
Section 3 Patents. This section will examine the issues involved in monitoring genetic resources and associated traditional knowledge in the patent system. The section will build on existing work and training provided as part of the [WIPO Manual on Open Source Patent Analytics](https://wipo-analytics.github.io/).
Section 4 Data Visualization. Data visualization is a key element in communication and engagement. This section will explore a range of data visualization techniques such as the use of interactive maps and network visualization.
The Handbook will focus on practical methods and tools that countries can use to meet their monitoring needs under the Nagoya Protocol. Much of the work in the Handbook will involve experimentation but is based on the principle of reproducible research. That is, the articles will provide a reader interested in adopting or adapting a particular approach or tool with the code and the data needed to do so.