diff --git a/README.md b/README.md index 5af31e10..f2628421 100644 --- a/README.md +++ b/README.md @@ -31,17 +31,6 @@ Important features are PK-DB is available at https://pk-db.com -## License -[[^]](https://github.com/matthiaskoenig/pkdb#pk-db---a-pharmacokinetics-database) -PK-DB code and documentation is licensed as -* Source Code: [LGPLv3](http://opensource.org/licenses/LGPL-3.0) -* Documentation: [CC BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/) - -## Funding -[[^]](https://github.com/matthiaskoenig/pkdb#pk-db---a-pharmacokinetics-database) -Jan Grzegorzewski and Matthias König are supported by the Federal Ministry of Education and Research (BMBF, Germany) -within the research network Systems Medicine of the Liver ([LiSyM](http://www.lisym.org/), grant number 031L0054). - ## How to cite [[^]](https://github.com/matthiaskoenig/pkdb#pk-db---a-pharmacokinetics-database) If you use PK-DB data or the web interface cite @@ -50,8 +39,24 @@ If you use PK-DB data or the web interface cite > Jan Grzegorzewski, Janosch Brandhorst, Dimitra Eleftheriadou, Kathleen Green, Matthias König > bioRxiv 760884; doi: https://doi.org/10.1101/760884 +> Grzegorzewski J, Brandhorst J, Green K, Eleftheriadou D, Duport Y, Barthorscht F, Köller A, Ke DYJ, De Angelis S, König M. +> *PK-DB: pharmacokinetics database for individualized and stratified computational modeling*. +> Nucleic Acids Res. 2020 Nov 5:gkaa990. doi: [10.1093/nar/gkaa990](https://doi.org/10.1093/nar/gkaa990). Epub ahead of print. PMID: [33151297](https://pubmed.ncbi.nlm.nih.gov/33151297/). + If you use PK-DB code cite in addition [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.1406979.svg)](https://doi.org/10.5281/zenodo.1406979) +## License +[[^]](https://github.com/matthiaskoenig/pkdb#pk-db---a-pharmacokinetics-database) +PK-DB code and documentation is licensed as +* Source Code: [LGPLv3](http://opensource.org/licenses/LGPL-3.0) +* Documentation: [CC BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/) + +## Funding +[[^]](https://github.com/matthiaskoenig/pkdb#pk-db---a-pharmacokinetics-database) +Jan Grzegorzewski and Matthias König are supported by the Federal Ministry of Education and Research (BMBF, Germany) +within the research network Systems Medicine of the Liver ([LiSyM](http://www.lisym.org/), grant number 031L0054). + + © 2017-2020 Jan Grzegorzewski & Matthias König; https://livermetabolism.com. diff --git a/backend/README.md b/backend/README.md index 058507db..23bf695a 100644 --- a/backend/README.md +++ b/backend/README.md @@ -1,43 +1 @@ -# PKDB Backend (`django`) - -- [ ] documentation page with queries and searches - -## PKData Query -The event cycle of PKData is: -1. Query studies, interventions, groups, individuals, and outputs by adding - the respective word as a prefix following two underscores to the url filter - (e.g. ...api/v1/pkdata/?studies__sid=PKDB00008 is equivalent to ...api/v1/studies/?sid=PKDB00008). - The search/filter is performed on the indexed database. For more details on how to construct the query by patterns in the - url check "https://django-elasticsearch-dsl-drf.readthedocs.io/en/latest/". - -2. All tables are updated to get rid of redundant entries. This results in a concise set of entries -in all tables (e.g. a filter on the study table for a specific sid reduces the entries of the other tables -only to interventions, groups, individuals, and outputs which are part of the study). - -3. paginated studies, interventions, groups, individuals, and outputs are returned. 
Getting the next page for one of the tables -works equivalently to the filters (e.g. getting the second studies page while searching for the interventions containing caffeine. ...api/v1/pkdata/?interventions__substance=caffeine&studies__page=2). - - -## PKDData -documentation - -### Queries - -Query for single study: -``` -http://localhost:8000/api/v1/pkdata/?studies__sid=PKDB00008 -``` -Query for multiple studies based on sids: -``` -http://localhost:8000/api/v1/pkdata/?studies__sid__in=PKDB00008__PKDB00001 -``` -Query for interventions substance: -``` -http://localhost:8000/api/v1/pkdata/?interventions__substance=codeine -``` -Query for interventions and outputs simultaneously: -``` -http://localhost:8000/api/v1/pkdata/?interventions__substance=codeine&outputs__measurement_type=clearance -``` - -© 2017-2020 Jan Grzegorzewski & Matthias König. +# PKDB \ No newline at end of file diff --git a/backend/download_extra/README.md b/backend/download_extra/README.md index aa93c46c..48c58947 100644 --- a/backend/download_extra/README.md +++ b/backend/download_extra/README.md @@ -11,9 +11,6 @@ and * [How to cite](https://github.com/matthiaskoenig/pkdb#how-to-cite) * [License](https://github.com/matthiaskoenig/pkdb#license) * [Funding](https://github.com/matthiaskoenig/pkdb#funding) -* [Installation](https://github.com/matthiaskoenig/pkdb#installation) -* [REST API](https://github.com/matthiaskoenig/pkdb#rest-api) -* [Docker interaction](https://github.com/matthiaskoenig/pkdb#docker-interaction) ## Overview [[^]](https://github.com/matthiaskoenig/pkdb#pk-db---a-pharmacokinetics-database) @@ -34,155 +31,28 @@ Important features are PK-DB is available at https://pk-db.com -## License -[[^]](https://github.com/matthiaskoenig/pkdb#pk-db---a-pharmacokinetics-database) -PK-DB code and documentation is licensed as -* Source Code: [LGPLv3](http://opensource.org/licenses/LGPL-3.0) -* Documentation: [CC BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/) - -## Funding -[[^]](https://github.com/matthiaskoenig/pkdb#pk-db---a-pharmacokinetics-database) -Jan Grzegorzewski and Matthias König are supported by the Federal Ministry of Education and Research (BMBF, Germany) -within the research network Systems Medicine of the Liver ([LiSyM](http://www.lisym.org/), grant number 031L0054). - ## How to cite [[^]](https://github.com/matthiaskoenig/pkdb#pk-db---a-pharmacokinetics-database) If you use PK-DB data or the web interface cite -> *PK-DB: PharmacoKinetics DataBase for Individualized and Stratified Computational Modeling* -> Jan Grzegorzewski, Janosch Brandhorst, Dimitra Eleftheriadou, Kathleen Green, Matthias König -> bioRxiv 760884; doi: https://doi.org/10.1101/760884 +> Grzegorzewski J, Brandhorst J, Green K, Eleftheriadou D, Duport Y, Barthorscht F, Köller A, Ke DYJ, De Angelis S, König M. +> *PK-DB: pharmacokinetics database for individualized and stratified computational modeling*. +> Nucleic Acids Res. 2020 Nov 5:gkaa990. doi: [10.1093/nar/gkaa990](https://doi.org/10.1093/nar/gkaa990). Epub ahead of print. PMID: [33151297](https://pubmed.ncbi.nlm.nih.gov/33151297/). If you use PK-DB code cite in addition [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.1406979.svg)](https://doi.org/10.5281/zenodo.1406979) -## Installation -[[^]](https://github.com/matthiaskoenig/pkdb#pk-db---a-pharmacokinetics-database) -PK-DB is deployed via `docker` and `docker-compose`. 
- -### Requirements -To setup the development server -the following minimal requirements must be fulfilled -- `docker` -- `docker-compose` -- `Python3.6` - -For elasticsearch the following system settings are required -``` -sudo sysctl -w vm.max_map_count=262144 -``` -To set `vm.max_map_count` persistently change the value in -``` -/etc/sysctl.conf -``` -### Start development server -To start the local development server -```bash -# clone or pull the latest code -git clone https://github.com/matthiaskoenig/pkdb.git -cd pkdb -git pull - -# set environment variables -set -a && source .env.local - -# create/rebuild all docker containers -./docker-purge.sh -``` -This setups a clean database and clean volumes and starts the containers for `pkdb_backend`, `pkdb_frontend`, `elasticsearch` and `postgres`. -You can check that all the containers are running via -```bash -docker container ls -``` -which lists the current containers -``` -CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES -bc7f9204468f pkdb_backend "bash -c '/usr/local…" 27 hours ago Up 18 hours 0.0.0.0:8000->8000/tcp pkdb_backend_1 -17b8d243e956 pkdb_frontend "/bin/sh -c 'npm run…" 27 hours ago Up 18 hours 0.0.0.0:8080->8080/tcp pkdb_frontend_1 -7730c6fe2210 elasticsearch:6.8.1 "/usr/local/bin/dock…" 27 hours ago Up 18 hours 9300/tcp, 0.0.0.0:9123->9200/tcp pkdb_elasticsearch_1 -e880fbb0f349 postgres:11.4 "docker-entrypoint.s…" 27 hours ago Up 18 hours 0.0.0.0:5433->5432/tcp pkdb_postgres_1 -``` -The locally running develop version of PK-DB can now be accessed via the web browser from -- frontend: http://localhost:8080 -- backend: http://localhost:8000 - -### Fill database -Due to copyright, licensing and privacy issues this repository does not contain any data. -All data is managed via a separate private repository at https://github.com/matthiaskoenig/pkdb_data. -This also includes the curation scripts and curation workflows. - -If you are interested in curating data or contributing data please contact us at https://livermetabolism.com. - - -## REST API +## License [[^]](https://github.com/matthiaskoenig/pkdb#pk-db---a-pharmacokinetics-database) -PKDB provides a REST API which allows simple interaction with the database and easy access of data. -An overview over the REST endpoints is provided at [`http://localhost:8000/api/v1/`](http://localhost:8000/api/v1/). 
- -### Query examples -The REST API supports elastisearch queries, with syntax examples -available [here](https://django-elasticsearch-dsl-drf.readthedocs.io/en/latest/basic_usage_examples.html) -* http://localhost:8000/api/v1/comments_elastic/?user_lastname=K%C3%B6nig -* http://localhost:8000/api/v1/characteristica_elastic/?group_pk=5&final=true -* http://localhost:8000/api/v1/characteristica_elastic/?search=group_name:female&final=true -* http://localhost:8000/api/v1/substances_elastic/?search:name=cod -* http://localhost:8000/api/v1/substances_elastic/?search=cod -* http://localhost:8000/api/v1/substances_elastic/?ids=1__2__3 -* http://localhost:8000/api/v1/substances_elastic/?ids=1__2__3&ordering=-name -* http://localhost:8000/api/v1/substances_elastic/?name=caffeine&name=acetaminophen - -### Suggestion example -In addition suggestion queries are possible -* http://localhost:8000/api/v1/substances_elastic/suggest/?search:name=cod +PK-DB code and documentation is licensed as +* Source Code: [LGPLv3](http://opensource.org/licenses/LGPL-3.0) +* Documentation: [CC BY-SA 4.0](http://creativecommons.org/licenses/by-sa/4.0/) -## Docker interaction +## Funding [[^]](https://github.com/matthiaskoenig/pkdb#pk-db---a-pharmacokinetics-database) -In the following typical examples to interact with the PK-DB docker containers are provided. - -### Check running containers -To check the running containers use -```bash -watch docker container ls -``` - -### Interactive container mode -```bash -./docker-interactive.sh -``` - -### Container logs -To get access to individual container logs use `docker container logs `. For instance to check the -django backend logs use -```bash -docker container logs pkdb_backend_1 -``` - -### Run command in container -To run commands inside the docker container use -```bash -docker-compose run --rm backend [command] -``` -or to run migrations -```bash -docker-compose run --rm backend python manage.py makemigrations -``` - -### Authentication data -The following examples show how to dump and restore the authentication data. - -Dump authentication data -```bash -docker-compose -f $PKDB_DOCKER_COMPOSE_YAML run --rm backend ./manage.py dumpdata auth --indent 2 > ./backend/pkdb_app/fixtures/auth.json -docker-compose -f $PKDB_DOCKER_COMPOSE_YAML run --rm backend ./manage.py dumpdata users --indent 2 > ./backend/pkdb_app/fixtures/users.json -docker-compose -f $PKDB_DOCKER_COMPOSE_YAML run --rm backend ./manage.py dumpdata rest_email_auth --indent 2 > ./backend/pkdb_app/fixtures/rest_email_auth.json -``` +Jan Grzegorzewski and Matthias König are supported by the Federal Ministry of Education and Research (BMBF, Germany) +within the research network Systems Medicine of the Liver ([LiSyM](http://www.lisym.org/), grant number 031L0054). -Restore authentication data -```bash -docker-compose -f $PKDB_DOCKER_COMPOSE_YAML run --rm backend ./manage.py loaddata auth pkdb_app/fixtures/auth.json -docker-compose -f $PKDB_DOCKER_COMPOSE_YAML run --rm backend ./manage.py loaddata users pkdb_app/fixtures/users.json -docker-compose -f $PKDB_DOCKER_COMPOSE_YAML run --rm backend ./manage.py loaddata rest_email_auth pkdb_app/fixtures/rest_email_auth.json -``` © 2017-2020 Jan Grzegorzewski & Matthias König; https://livermetabolism.com. diff --git a/backend/pkdb_app/_version.py b/backend/pkdb_app/_version.py index ec4dbea7..9f65c118 100644 --- a/backend/pkdb_app/_version.py +++ b/backend/pkdb_app/_version.py @@ -1,4 +1,4 @@ """ Definition of version string. 
""" -__version__ = "0.9.3" +__version__ = "0.9.4" diff --git a/backend/pkdb_app/behaviours.py b/backend/pkdb_app/behaviours.py index 70b3c92b..87df4d51 100644 --- a/backend/pkdb_app/behaviours.py +++ b/backend/pkdb_app/behaviours.py @@ -54,7 +54,6 @@ def study_sid(self): def map_field(fields): return [f"{field}_map" for field in fields] -VALUE_FIELDS_SAME_SCALE = ["value", "mean", "median", "min", "max"] VALUE_FIELDS_SAME_SCALE = ["value", "mean", "median", "min", "max"] VALUE_FIELDS_NO_UNIT = VALUE_FIELDS_SAME_SCALE + ["sd", "se", "cv"] VALUE_FIELDS = VALUE_FIELDS_NO_UNIT + ["unit"] diff --git a/backend/pkdb_app/data/documents.py b/backend/pkdb_app/data/documents.py index bf980045..409d8fb4 100644 --- a/backend/pkdb_app/data/documents.py +++ b/backend/pkdb_app/data/documents.py @@ -1,6 +1,6 @@ from django_elasticsearch_dsl import Document, fields, ObjectField from django_elasticsearch_dsl.registries import registry -from pkdb_app.data.models import Dimension, SubSet +from pkdb_app.data.models import Dimension, SubSet, Data from ..documents import string_field, elastic_settings, info_node, study_field @@ -98,6 +98,40 @@ class SubSetDocument(Document): study = study_field study_sid = string_field('study_sid') study_name = string_field('study_name') + access = string_field('access') + allowed_users = fields.ObjectField( + attr="allowed_users", + + properties={ + 'username': string_field("username") + }, + multi=True + ) + def get_queryset(self): + """Not mandatory but to improve performance we can select related in one sql request""" + return super(SubSetDocument, self).get_queryset().prefetch_related("data_points__outputs") + class Django: + model = SubSet + # Ignore auto updating of Elasticsearch when a model is saved/deleted + ignore_signals = True + # Don't perform an index refresh after every update + auto_refresh = False + + class Index: + name = 'subset' + settings = elastic_settings + settings['number_of_shards'] = 5 + settings['number_of_replicas'] = 1 + settings['max_result_window'] = 100000 + + +''' +@registry.register_document +class TimeCourseDocument(Document): + study_sid = string_field('study_sid') + study_name = string_field('study_name') + outputs_pk = fields.ListField('timecourse') + # for permissions access = string_field('access') allowed_users = fields.ObjectField( @@ -108,6 +142,11 @@ class SubSetDocument(Document): multi=True ) + def get_queryset(self): + """Not mandatory but to improve performance we can select related in one sql request""" + return super(TimeCourseDocument, self).get_queryset().filter(data__data_type=Data.DataTypes.Timecourse) # .prefetch_related("interventions"). 
+
+
     class Django:
         model = SubSet
         # Ignore auto updating of Elasticsearch when a model is saved/deleted
@@ -120,4 +159,6 @@ class Index:
         settings = elastic_settings
         settings['number_of_shards'] = 5
         settings['number_of_replicas'] = 1
-        settings['max_result_window'] = 100000
\ No newline at end of file
+        settings['max_result_window'] = 100000
+
+'''
\ No newline at end of file
diff --git a/backend/pkdb_app/data/models.py b/backend/pkdb_app/data/models.py
index 837003e6..676d9038 100644
--- a/backend/pkdb_app/data/models.py
+++ b/backend/pkdb_app/data/models.py
@@ -1,6 +1,9 @@
 import itertools
+from collections.abc import Iterable
+
 from django.core.exceptions import ObjectDoesNotExist, MultipleObjectsReturned
 from django.db import models
+from django.utils.functional import cached_property
 from pkdb_app.behaviours import Accessible
 from pkdb_app.interventions.models import Intervention
 from pkdb_app.utils import CHAR_MAX_LENGTH
@@ -35,40 +38,40 @@ class DataTypes(models.TextChoices):
     dataset = models.ForeignKey(DataSet, related_name="data", on_delete=models.CASCADE, null=True)
 
 
-class SubSet(Accessible):
-    name = models.CharField(max_length=CHAR_MAX_LENGTH)
-    data = models.ForeignKey(Data, related_name="subsets", on_delete=models.CASCADE)
-    study = models.ForeignKey('studies.Study', on_delete=models.CASCADE, related_name="subsets")
-
-    def get_single_dosing(self) -> Intervention:
-        """Returns a single intervention of type dosing if existing.
-        If multiple dosing interventions exist, no dosing is returned!.
-        """
-        try:
-            dosing_measurement_type = Intervention.objects.filter(id__in=self.interventions).get(
-                normed=True, measurement_type__info_node__name="dosing"
-            )
-            return dosing_measurement_type
-
-        except (ObjectDoesNotExist, MultipleObjectsReturned):
-            return None
+class Timecourseable(models.Model):
+    class Meta:
+        abstract = True
+
+    def output_pk(self):
+        return self.data_points.values_list("outputs__pk")
 
-    @property
-    def array(self):
-        [point.values_list("output") for point in self.data_points]
-        return self.data.data_type
+    @cached_property
+    def timecourse(self):
+        """Merge the data points of this subset into a single timecourse dict."""
 
-    @property
-    def data_type(self):
-        return self.data.data_type
+        tc = self.merge_values(
+            self.data_points.prefetch_related('outputs').values(*self._timecourse_extra().values()),
+            sort_values=["outputs__interventions__pk", "outputs__time"]
+        )
+        self.reformat_timecourse(tc, self._timecourse_extra())
+        self.validate_timecourse(tc)
+        return tc
 
-    @property
-    def outputs(self):
-        return self.data_points.values_list('outputs', flat=True)
+    def reformat_timecourse(self, timecourse: dict, mapping: dict):
+        """Rename the timecourse keys according to mapping; wrap single
+        intervention pks into a tuple."""
+        for new_key, old_key in mapping.items():
+            timecourse[new_key] = timecourse.pop(old_key)
+            if new_key in ["intervention_pk", "interventions"]:
+                if isinstance(timecourse[new_key], int):
+                    timecourse[new_key] = (timecourse[new_key],)
 
-    @property
-    def interventions(self):
-        return self.data_points.values_list('outputs__interventions', flat=True)
+    @cached_property
+    def timecourse_representation(self):
+        """Timecourse in its representation format (timecourse data only)."""
+        if self.data.data_type == Data.DataTypes.Timecourse:
+            timecourse = self.merge_values(
+                self.data_points.values(*self.keys_timecourse_representation().values()), )
+            self.reformat_timecourse(timecourse, self.keys_timecourse_representation())
+            return timecourse
 
     def timecourse_extra_no_intervention(self):
         return {
@@ -93,14 +96,15 @@ def timecourse_extra_no_intervention(self):
             'time_unit': 'outputs__time_unit',
             'unit': 'outputs__unit',
         } 
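+
+    # The mappings returned above and below translate timecourse field names
+    # into Django ORM lookup paths on the related Output objects.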
+ def keys_timecourse_representation(self): return { - "study_sid":"outputs__study__sid", + "study_sid": "outputs__study__sid", "study_name": "outputs__study__name", "outputs_pk": "outputs__pk", "subset_pk": "subset_id", "subset_name": "subset__name", - "interventions": "outputs__interventions__pk", + "intervention_pk": "outputs__interventions__pk", "group_pk": "outputs__group_id", "individual_pk": "outputs__individual_id", "normed": 'outputs__normed', @@ -113,9 +117,9 @@ def keys_timecourse_representation(self): "output_type": 'outputs__output_type', "time": 'outputs__time', 'time_unit': 'outputs__time_unit', - "measurement_type": "outputs__measurement_type__info_node__sid", - "measurement__label": "outputs__measurement_type__info_node__label", - "choice": "outputs__choice__info_node__sid", + "measurement_type": "outputs__measurement_type__info_node__sid", + "measurement_type__label": "outputs__measurement_type__info_node__label", + "choice": "outputs__choice__info_node__sid", "choice_label": "outputs__choice__info_node__label", "substance": "outputs__substance__info_node__sid", "substance_label": "outputs__substance__info_node__label", @@ -144,10 +148,10 @@ def _timecourse_extra(self): @staticmethod def none_tuple(values): - if all(pd.isna(v) for v in values): - return (None,) - else: - return tuple(values) + if isinstance(values, Iterable): + if all(pd.isna(v) for v in values): + return (None,) + return tuple(values) @staticmethod def to_list(tdf): @@ -166,10 +170,11 @@ def _tuple_or_value(values): return tuple(values) @staticmethod - def merge_values(values=None ,df=None, groupby=("outputs__pk",), sort_values=["outputs__interventions__pk","outputs__time"]): + def merge_values(values=None, df=None, groupby=("outputs__pk",), + sort_values=["outputs__interventions__pk", "outputs__time"]): if values: - df =pd.DataFrame(values) + df = pd.DataFrame(values) if sort_values: df = df.sort_values(sort_values) merged_dict = df.groupby(list(groupby), as_index=False).apply(SubSet.to_list).to_dict("list") @@ -177,7 +182,6 @@ def merge_values(values=None ,df=None, groupby=("outputs__pk",), sort_values=["o for key, values in merged_dict.items(): if key not in ['outputs__time', 'outputs__value', 'outputs__mean', 'outputs__median', 'outputs__cv', 'outputs__sd' 'outputs__se']: - merged_dict[key] = SubSet.tuple_or_value(values) if all(v is None for v in values): @@ -213,30 +217,41 @@ def validate_timecourse(self, timecourse): raise ValueError(f"Subset used for timecourse is not unique on '{key}'. Values are '{name}'. 
" f"Check uniqueness of labels for timecourses.") - def timecourse(self): - """ FIXME: Documentation """ - tc = self.merge_values( - self.data_points.prefetch_related('outputs').values(*self._timecourse_extra().values()), - sort_values=["outputs__interventions__pk", "outputs__time"] - ) - self.reformat_timecourse(tc, self._timecourse_extra()) - self.validate_timecourse(tc) - return tc - def reformat_timecourse(self, timecourse, mapping): - """ FIXME: Documentation & type hinting """ - for new_key, old_key in mapping.items(): - timecourse[new_key] = timecourse.pop(old_key) - if new_key == "interventions": - if isinstance(timecourse[new_key], int): - timecourse[new_key] = (timecourse[new_key],) +class SubSet(Accessible, Timecourseable): + name = models.CharField(max_length=CHAR_MAX_LENGTH) + data = models.ForeignKey(Data, related_name="subsets", on_delete=models.CASCADE) + study = models.ForeignKey('studies.Study', on_delete=models.CASCADE, related_name="subsets") - def timecourse_representation(self): - """ FIXME: Documentation """ - timecourse = self.merge_values( - self.data_points.values(*self.keys_timecourse_representation().values()),) - self.reformat_timecourse(timecourse, self.keys_timecourse_representation()) - return timecourse + def get_single_dosing(self) -> Intervention: + """Returns a single intervention of type dosing if existing. + If multiple dosing interventions exist, no dosing is returned!. + """ + try: + dosing_measurement_type = Intervention.objects.filter(id__in=self.interventions).get( + normed=True, measurement_type__info_node__name="dosing" + ) + return dosing_measurement_type + + except (ObjectDoesNotExist, MultipleObjectsReturned): + return None + + @property + def array(self): + [point.values_list("output") for point in self.data_points] + return self.data.data_type + + @property + def data_type(self): + return self.data.data_type + + @property + def outputs(self): + return self.data_points.values_list('outputs', flat=True) + + @property + def interventions(self): + return self.data_points.values_list('outputs__interventions', flat=True) def keys_scatter_representation(self): """ FIXME: Documentation """ @@ -245,6 +260,7 @@ def keys_scatter_representation(self): "data_point": "pk" } + @cached_property def scatter_representation(self): scatter_x = self.merge_values(self.data_points.filter(dimensions__dimension=0).values(*self.keys_scatter_representation().values()), sort_values=None) self.reformat_timecourse(scatter_x, self.keys_scatter_representation()) diff --git a/backend/pkdb_app/data/serializers.py b/backend/pkdb_app/data/serializers.py index 8f27e5df..345ad3a7 100644 --- a/backend/pkdb_app/data/serializers.py +++ b/backend/pkdb_app/data/serializers.py @@ -1,11 +1,16 @@ +import json import traceback +from django.utils.functional import cached_property +from django_elasticsearch_dsl_drf.serializers import DocumentSerializer +from pkdb_app.behaviours import MEASUREMENTTYPE_FIELDS from pkdb_app.comments.serializers import DescriptionSerializer, CommentSerializer, CommentElasticSerializer, \ DescriptionElasticSerializer +from pkdb_app.data.documents import SubSetDocument from pkdb_app.data.models import DataSet, Data, SubSet, Dimension, DataPoint from pkdb_app.outputs.models import Output from pkdb_app.outputs.pk_calculation import pkoutputs_from_timecourse -from pkdb_app.outputs.serializers import OUTPUT_FOREIGN_KEYS +from pkdb_app.outputs.serializers import OUTPUT_FOREIGN_KEYS, OUTPUT_FIELDS from pkdb_app.serializers import WrongKeyValidationSerializer, 
ExSerializer, StudySmallElasticSerializer
 from pkdb_app.subjects.models import DataFile
 from pkdb_app.utils import _create, create_multiple_bulk, create_multiple_bulk_normalized, list_of_pk
@@ -13,6 +18,8 @@
 import pandas as pd
 import numpy as np
 
+from functools import lru_cache
+
 
 class DimensionSerializer(WrongKeyValidationSerializer):
     output = serializers.CharField(write_only=True, allow_null=False, allow_blank=False)
@@ -40,7 +47,7 @@ class SubSetSerializer(ExSerializer):
 
     class Meta:
         model = SubSet
-        fields = ['name', "descriptions", "comments", "dimensions", "shared"]
+        fields = ['name', "descriptions", "comments", "dimensions", "shared"]
 
     def to_internal_value(self, data):
         self.validate_wrong_keys(data)
@@ -90,7 +97,7 @@ def calculate_pks_from_timecourses(self, subset):
 
         outputs_dj = create_multiple_bulk(subset, "subset", outputs, Output)
 
-        for intervention, output in zip(interventions,outputs_dj):
+        for intervention, output in zip(interventions, outputs_dj):
             output.interventions.add(*intervention)
 
         if outputs_dj:
@@ -100,7 +107,7 @@ def calculate_pks_from_timecourses(self, subset):
         subset.save()
 
     @staticmethod
-    def _add_id_to_foreign_keys(value:str):
+    def _add_id_to_foreign_keys(value: str):
         if value in OUTPUT_FOREIGN_KEYS:
             return value + "_id"
         else:
@@ -128,19 +135,20 @@ def create_scatter(self, dimensions, shared, subset_instance):
         data_set = outputs_pd[outputs_pd['label'].isin(dimensions)]
         if len(data_set) == 0:
             raise serializers.ValidationError(
-                {"data_set":{"data":[{"subsets":{"dimensions":f"Outputs with label <{dimensions}> do not exist."}}]}})
+                {"data_set": {
+                    "data": [{"subsets": {"dimensions": f"Outputs with label <{dimensions}> do not exist."}}]}})
 
         data_set["dimension"] = None
-        data_set.loc[data_set['label'] == dimensions[0],'dimension'] = 0
-        data_set.loc[data_set['label'] == dimensions[1],'dimension'] = 1
+        data_set.loc[data_set['label'] == dimensions[0], 'dimension'] = 0
+        data_set.loc[data_set['label'] == dimensions[1], 'dimension'] = 1
 
         shared_reformated = []
         for shared_field in shared:
             shared_field_reformated = self._add_id_to_foreign_keys(shared_field)
             if shared_field_reformated not in data_set:
-                p_options = [self._remove_id_to_foreign_keys(c) for c in data_set.columns]
-                raise serializers.ValidationError(f"Shared_field <{shared_field}> not in outputs fields. Options are <{p_options}>")
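+                # p_options: the selectable output field names, with the '_id'
+                # suffix stripped from foreign keys.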
+                p_options = [self._remove_id_to_foreign_keys(c) for c in data_set.columns]
+                raise serializers.ValidationError(
+                    f"Shared_field <{shared_field}> not in outputs fields. Options are <{p_options}>")
             shared_reformated.append(shared_field_reformated)
 
         if len(data_set.groupby(shared_reformated)) == 0:
@@ -203,7 +211,7 @@ def create_timecourse(self, subset_instance, dimensions):
 
         for output in subset_outputs.iterator():
             data_point_instance = DataPoint.objects.create(subset=subset_instance)
-            dimension = Dimension(dimension=0, study=study, output=output,data_point=data_point_instance)
+            dimension = Dimension(dimension=0, study=study, output=output, data_point=data_point_instance)
             dimensions.append(dimension)
 
         Dimension.objects.bulk_create(dimensions)
@@ -211,7 +219,6 @@
 
 class DataSerializer(ExSerializer):
-
     comments = CommentSerializer(
         many=True, read_only=False, required=False, allow_null=True
     )
@@ -249,13 +256,12 @@ def create(self, validated_data):
 
         for subset in poped_data["subsets"]:
             subset_instance, poped_data = _create(model_serializer=SubSetSerializer(context=self.context),
-                                                  validated_data={**subset, "data": data_instance},
-                                                  create_multiple_keys=['comments', 'descriptions'])
+                                                   validated_data={**subset, "data": data_instance},
+                                                   create_multiple_keys=['comments', 'descriptions'])
         return data_instance
 
 
 class DataSetSerializer(ExSerializer):
-
     data = DataSerializer(many=True, read_only=False, required=False, allow_null=True)
     comments = CommentSerializer(
         many=True, read_only=False, required=False, allow_null=True
@@ -305,21 +311,29 @@ def to_internal_value(self, data):
                 )
                 data_single['subsets'] = temp_subsets
             data_container.extend(self.entries_from_file(data_single))
+        self.validate_no_timecourses(data_container)
 
         autogenerate_timecourses = self.autogenerate_timecourses()
         if autogenerate_timecourses:
-            data_container.append(self.autogenerate_timecourses())
+            data_container.append(autogenerate_timecourses)
 
         data['data'] = data_container
 
         return super().to_internal_value(data)
 
+    def validate_no_timecourses(self, data):
+        for data_single in data:
+            if data_single.get("data_type") == Data.DataTypes.Timecourse:
+                raise serializers.ValidationError("Timecourses are not allowed to be defined explicitly in a dataset. 
" + "Timecourses are created automatically by adding label " + "and output_type='timecourse' to the respective outputs.") + + def autogenerate_timecourses(self): - #Study = apps.get_model('studies', 'Study') + # Study = apps.get_model('studies', 'Study') study_sid = self.context["request"].path.split("/")[-2] outputs = Output.objects.filter(study__sid=study_sid, normed=True, output_type=Output.OutputTypes.Timecourse) - timecourse_labels = outputs.values_list("label",flat=True).distinct() + timecourse_labels = outputs.values_list("label", flat=True).distinct() if len(timecourse_labels) > 0: - auto_generated_data = { "name": "AutoGenerate", "data_type": "timecourse", @@ -328,18 +342,15 @@ def autogenerate_timecourses(self): } return auto_generated_data - def validate(self, attrs): self._validate_unique_names(attrs["data"]) return super().validate(attrs) - - def create(self, validated_data): dataset_instance, poped_data = _create(model_manager=self.Meta.model.objects, - validated_data=validated_data, - create_multiple_keys=['comments', 'descriptions'], - pop=['data']) + validated_data=validated_data, + create_multiple_keys=['comments', 'descriptions'], + pop=['data']) data_instance_container = [] for data_single in poped_data['data']: data_single["dataset"] = dataset_instance @@ -359,23 +370,196 @@ def create(self, validated_data): # Read Serializer ################################ +class TimecourseSerializer(serializers.Serializer): + """ Timecourse Serializer""" + study_sid = serializers.CharField() + study_name = serializers.CharField() + output_pk = serializers.SerializerMethodField() + subset_pk = serializers.IntegerField(source="pk") + subset_name = serializers.CharField(source="name") + + intervention_pk = serializers.SerializerMethodField() + group_pk = serializers.SerializerMethodField() + individual_pk = serializers.SerializerMethodField() + normed = serializers.SerializerMethodField() + + tissue = serializers.SerializerMethodField() + tissue_label = serializers.SerializerMethodField() + + method = serializers.SerializerMethodField() + method_label = serializers.SerializerMethodField() + + label = serializers.SerializerMethodField() + + time = serializers.SerializerMethodField() + time_unit = serializers.SerializerMethodField() + + measurement_type = serializers.SerializerMethodField() + measurement_type_label = serializers.SerializerMethodField() + choice = serializers.SerializerMethodField() + choice_label = serializers.SerializerMethodField() + + substance = serializers.SerializerMethodField() + substance_label = serializers.SerializerMethodField() + + value = serializers.SerializerMethodField() + mean = serializers.SerializerMethodField() + median = serializers.SerializerMethodField() + min = serializers.SerializerMethodField() + max = serializers.SerializerMethodField() + sd = serializers.SerializerMethodField() + se = serializers.SerializerMethodField() + cv = serializers.SerializerMethodField() + unit = serializers.SerializerMethodField() + + # @cached_property + # def json_object(self): + # return json.dumps(self.instance.to_dict()) + + @lru_cache(maxsize=128) + def _get_general(self, obj): + """ This function reshapes and reformats the outputs to a Pandas DataFrame. 
""" + + obj = [v["point"][0] for v in json.loads(obj)["array"]] + result = pd.DataFrame(obj) + return result.where(result.notnull(), None) + + def _get_field(self, obj, field): + result = self._get_general(json.dumps(obj.to_dict())) + if result[field].isnull().all(): + return None + return list(result[field].values) + + def get_output_pk(self, obj): + result = self._get_general(json.dumps(obj.to_dict())) + return self._get_field(obj, "pk") + + def get_intervention_pk(self, obj): + result = self._get_general(json.dumps(obj.to_dict())) + return [i["pk"] for i in result["interventions"].iloc[0]] + + def get_group_pk(self, obj): + result = self._get_general(json.dumps(obj.to_dict())) + if result["group"][0]: + return result["group"][0]["pk"] + + def get_individual_pk(self, obj): + result = self._get_general(json.dumps(obj.to_dict())) + if result["individual"][0]: + return result["individual"][0]["pk"] + + def get_normed(self, obj): + result = self._get_general(json.dumps(obj.to_dict())) + return result["normed"][0] + + def get_tissue(self, obj): + result = self._get_general(json.dumps(obj.to_dict())) + if result["tissue"][0]: + return result["tissue"][0]["sid"] + + def get_tissue_label(self, obj): + result = self._get_general(json.dumps(obj.to_dict())) + if result["tissue"][0]: + return result["tissue"][0]["label"] + + def get_method(self, obj): + result = self._get_general(json.dumps(obj.to_dict())) + if result["method"][0]: + return result["method"][0]["sid"] + + def get_method_label(self, obj): + result = self._get_general(json.dumps(obj.to_dict())) + if result["method"][0]: + return result["method"][0]["label"] + + def get_label(self, obj): + result = self._get_general(json.dumps(obj.to_dict())) + return result["label"][0] + + def get_time(self, obj): + return self._get_field(obj, "time") + + def get_time_unit(self, obj): + result = self._get_general(json.dumps(obj.to_dict())) + return result["time_unit"][0] + + def get_measurement_type(self, obj): + result = self._get_general(json.dumps(obj.to_dict())) + return result["measurement_type"][0]["sid"] + + def get_measurement_type_label(self, obj): + result = self._get_general(json.dumps(obj.to_dict())) + return result["measurement_type"][0]["label"] + + def get_choice(self, obj): + result = self._get_general(json.dumps(obj.to_dict())) + if result["choice"][0]: + return result["choice"][0]["sid"] + + def get_choice_label(self, obj): + result = self._get_general(json.dumps(obj.to_dict())) + if result["choice"][0]: + return result["choice"][0]["label"] + + def get_substance(self, obj): + result = self._get_general(json.dumps(obj.to_dict())) + if result["substance"][0]: + return result["substance"][0]["sid"] + + def get_substance_label(self, obj): + result = self._get_general(json.dumps(obj.to_dict())) + if result["substance"][0]: + return result["substance"][0]["label"] + + def get_value(self, obj): + return self._get_field(obj, "value") -class SubSetElasticSerializer(serializers.ModelSerializer): + def get_mean(self, obj): + return self._get_field(obj, "mean") + def get_median(self, obj): + return self._get_field(obj, "median") + + def get_min(self, obj): + return self._get_field(obj, "min") + + def get_max(self, obj): + return self._get_field(obj, "max") + + def get_sd(self, obj): + return self._get_field(obj, "sd") + + def get_se(self, obj): + return self._get_field(obj, "se") + + def get_cv(self, obj): + return self._get_field(obj, "cv") + + def get_unit(self, obj): + result = self._get_general(json.dumps(obj.to_dict())) + return 
result["unit"][0] + + class Meta: + fields = ["study_sid", "study_name", "output_pk", "intervention_pk", "group_pk", "individual_pk", "normed", + "calculated"] + OUTPUT_FIELDS + MEASUREMENTTYPE_FIELDS + + +class SubSetElasticSerializer(DocumentSerializer): study = StudySmallElasticSerializer(read_only=True) name = serializers.CharField() data_type = serializers.CharField() array = serializers.SerializerMethodField() class Meta: - model = SubSet - fields = ["pk", "study", + document = SubSetDocument + fields = ["pk", + "study", "name", "data_type", - "array"] + "array", + "timecourse"] - def get_array(self,object): - #return [[SmallOutputSerializer(point.point,many=True, read_only=True).data] for point in object["array"]] + def get_array(self, object): return [point["point"] for point in object.to_dict()["array"]] @@ -392,6 +576,7 @@ class Meta: def get_subsets(self, obj): return list_of_pk("subsets", obj) + class DataAnalysisSerializer(serializers.ModelSerializer): class Meta: model = Dimension diff --git a/backend/pkdb_app/data/views.py b/backend/pkdb_app/data/views.py index 838eaa76..8f1b3437 100644 --- a/backend/pkdb_app/data/views.py +++ b/backend/pkdb_app/data/views.py @@ -6,7 +6,7 @@ ) from pkdb_app.documents import AccessView from pkdb_app.data.documents import DataAnalysisDocument, SubSetDocument -from pkdb_app.data.serializers import DataAnalysisSerializer, SubSetElasticSerializer +from pkdb_app.data.serializers import DataAnalysisSerializer, SubSetElasticSerializer, TimecourseSerializer from pkdb_app.pagination import CustomPagination @@ -84,3 +84,45 @@ class SubSetViewSet(AccessView): "name": "name.raw", "data_type": "data_type.raw" } +class TimecourseViewSet(AccessView): + """ Endpoint to query timecourses + + The timecourses endpoints gives access to timecourses. 
+ """ + document = SubSetDocument + serializer_class = TimecourseSerializer + pagination_class = CustomPagination + lookup_field = "id" + filter_backends = [FilteringFilterBackend, IdsFilterBackend, MultiMatchSearchFilterBackend] + search_fields = ( + "name", + "data_type", + "study.sid", + "study.name", + "array.data_points.point.outputs.group.name", + "array.data_points.point.outputs.individual.name", + "array.data_points.point.outputs.interventions.name", + "array.data_points.point.outputs.measurement_type.label", + "array.data_points.point.outputs.choice.label", + "array.data_points.point.outputs.substance.label", + "array.data_points.point.outputs.tissue.label", + ) + multi_match_search_fields = {field: {"boost": 1} for field in search_fields} + multi_match_options = {'operator': 'and'} + filter_fields = { + 'study_sid': {'field': 'study_sid.raw', + 'lookups': [ + LOOKUP_QUERY_IN, + LOOKUP_QUERY_EXCLUDE, + + ], + }, + 'study_name': {'field': 'study_name.raw', + 'lookups': [ + LOOKUP_QUERY_IN, + LOOKUP_QUERY_EXCLUDE, + + ], + }, + + } diff --git a/backend/pkdb_app/documents.py b/backend/pkdb_app/documents.py index 31717974..a3daed5c 100644 --- a/backend/pkdb_app/documents.py +++ b/backend/pkdb_app/documents.py @@ -17,7 +17,7 @@ 'number_of_shards': 1, 'number_of_replicas': 1, 'max_ngram_diff': 15, - 'max_terms_count':65536*4, + 'max_terms_count': 65536*4, } edge_ngram_filter = token_filter( diff --git a/backend/pkdb_app/interventions/documents.py b/backend/pkdb_app/interventions/documents.py index 047ea2fe..f8fbd117 100644 --- a/backend/pkdb_app/interventions/documents.py +++ b/backend/pkdb_app/interventions/documents.py @@ -10,7 +10,7 @@ # ------------------------------------ @registry.register_document class InterventionDocument(Document): - pk = fields.IntegerField() + pk = fields.IntegerField("pk") measurement_type = info_node("i_measurement_type") form = info_node("i_form") route = info_node("i_route") diff --git a/backend/pkdb_app/interventions/serializers.py b/backend/pkdb_app/interventions/serializers.py index 0d8fb6b5..ecd889df 100644 --- a/backend/pkdb_app/interventions/serializers.py +++ b/backend/pkdb_app/interventions/serializers.py @@ -3,14 +3,13 @@ """ import itertools -from django.apps import apps from rest_framework import serializers from pkdb_app import utils from pkdb_app.behaviours import VALUE_FIELDS_NO_UNIT, \ MEASUREMENTTYPE_FIELDS, map_field, EX_MEASUREMENTTYPE_FIELDS from pkdb_app.info_nodes.models import InfoNode -from pkdb_app.info_nodes.serializers import MeasurementTypeableSerializer, EXMeasurementTypeableSerializer +from pkdb_app.info_nodes.serializers import MeasurementTypeableSerializer from pkdb_app.subjects.serializers import EXTERN_FILE_FIELDS from ..comments.serializers import DescriptionSerializer, CommentSerializer, DescriptionElasticSerializer, \ CommentElasticSerializer @@ -297,36 +296,108 @@ class Meta: fields = ["pk", "normed"] + INTERVENTION_FIELDS + ["study"] + MEASUREMENTTYPE_FIELDS -class InterventionElasticSerializerAnalysis(serializers.ModelSerializer): +class InterventionElasticSerializerAnalysis(serializers.Serializer): + study_sid = serializers.CharField() + study_name = serializers.CharField() intervention_pk = serializers.IntegerField(source="pk") - substance = serializers.CharField(source="substance_name", allow_null=True) - measurement_type = serializers.CharField(source="measurement_type_name",) - route = serializers.CharField(source="route_name",) - application = serializers.CharField(source="application_name",) - form = 
serializers.CharField(source="form_name",) - choice = serializers.CharField(source="choice_name") - value = serializers.FloatField(allow_null=True) - mean = serializers.FloatField(allow_null=True) - median = serializers.FloatField(allow_null=True) - min = serializers.FloatField(allow_null=True) - max = serializers.FloatField(allow_null=True) - sd = serializers.FloatField(allow_null=True) - se = serializers.FloatField(allow_null=True) - cv = serializers.FloatField(allow_null=True) + raw_pk = serializers.IntegerField() + normed = serializers.BooleanField() + + name = serializers.CharField() + route = serializers.SerializerMethodField() + route_label = serializers.SerializerMethodField() + + form = serializers.SerializerMethodField() + form_label = serializers.SerializerMethodField() + + application = serializers.SerializerMethodField() + application_label = serializers.SerializerMethodField() + + time = serializers.FloatField() + time_end = serializers.FloatField() + time_unit = serializers.CharField() + measurement_type = serializers.SerializerMethodField() + measurement_type_label = serializers.SerializerMethodField() + + choice = serializers.SerializerMethodField() + choice_label =serializers.SerializerMethodField() + + substance = serializers.SerializerMethodField() + substance_label = serializers.SerializerMethodField() + + value = serializers.FloatField() + mean = serializers.FloatField() + median = serializers.FloatField() + min = serializers.FloatField() + max = serializers.FloatField() + sd = serializers.FloatField() + se = serializers.FloatField() + cv = serializers.FloatField() + unit = serializers.CharField() + + def get_choice(self, obj): + if obj.choice: + return obj.choice.sid + + def get_choice_label(self, obj): + if obj.choice: + return obj.choice.label + + def get_route(self, obj): + if obj.route: + return obj.route.sid + + def get_route_label(self, obj): + if obj.route: + return obj.route.label + + def get_form(self, obj): + if obj.form: + return obj.form.sid + + def get_form_label(self, obj): + if obj.form: + return obj.form.label + + def get_application(self, obj): + if obj.application: + return obj.application.sid + + def get_application_label(self, obj): + if obj.application: + return obj.application.label + + def get_measurement_type(self, obj): + if obj.measurement_type: + return obj.measurement_type.sid + + def get_measurement_type_label(self, obj): + if obj.measurement_type: + return obj.measurement_type.label + + def get_substance(self, obj): + if obj.substance: + return obj.substance.sid + + def get_substance_label(self, obj): + if obj.substance: + return obj.substance.label class Meta: - model = Intervention fields = ["study_sid", "study_name", "intervention_pk", "raw_pk", "normed"] + INTERVENTION_FIELDS + MEASUREMENTTYPE_FIELDS + """ def to_representation(self, instance): - rep = super().to_representation(instance) - for field in VALUE_FIELDS_NO_UNIT + ["time"]: - try: - rep[field] = '{:.2e}'.format(rep[field]) - except (ValueError, TypeError): - pass - return rep + rep = super().to_representation(instance) + for field in VALUE_FIELDS_NO_UNIT + ["time"]: + try: + rep[field] = '{:.2e}'.format(rep[field]) + except (ValueError, TypeError): + pass + return rep + """ + diff --git a/backend/pkdb_app/outputs/documents.py b/backend/pkdb_app/outputs/documents.py index 529f98c9..d1dc4711 100644 --- a/backend/pkdb_app/outputs/documents.py +++ b/backend/pkdb_app/outputs/documents.py @@ -75,7 +75,7 @@ class Index: settings = elastic_settings settings['number_of_shards'] = 5 
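+        # max_result_window (below) bounds Elasticsearch deep pagination; it is
+        # raised so that large output tables remain fully retrievable.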
settings['number_of_replicas'] = 1 - settings['max_result_window'] = 100000 + settings['max_result_window'] = 500000 @registry.register_document @@ -90,15 +90,25 @@ class OutputInterventionDocument(Document): label = string_field('label') output_type = string_field('output_type') measurement_type = string_field("measurement_type") + measurement_type_label = string_field("measurement_type_label") + substance = string_field("substance") + substance_label = string_field("substance_label") + normed = fields.BooleanField() calculated = fields.BooleanField() method = string_field('method') + method_label = string_field('method_label') + tissue = string_field('tissue') + tissue_label = string_field('tissue_label') + time = fields.FloatField('time') time_unit = string_field('time_unit') unit = string_field('unit') choice = string_field('choice') + choice_label = string_field('choice_label') + # output fields value = fields.FloatField('value') @@ -132,7 +142,7 @@ class Index: settings = elastic_settings settings['number_of_shards'] = 5 settings['number_of_replicas'] = 1 - settings['max_result_window'] = 100000 + settings['max_result_window'] = 500000 def get_queryset(self): diff --git a/backend/pkdb_app/outputs/models.py b/backend/pkdb_app/outputs/models.py index 0f409ade..0dd85740 100644 --- a/backend/pkdb_app/outputs/models.py +++ b/backend/pkdb_app/outputs/models.py @@ -278,22 +278,40 @@ def time_unit(self): @property def tissue(self): if self.output.tissue: - return self.output.tissue.info_node.name + return self.output.tissue.info_node.sid + @property + def tissue_label(self): + if self.output.tissue: + return self.output.tissue.info_node.label @property def method(self): if self.output.method: - return self.output.method.info_node.name + return self.output.method.info_node.sid + + @property + def method_label(self): + if self.output.method: + return self.output.method.info_node.label @property def measurement_type(self): - return self.output.measurement_type.info_node.name + return self.output.measurement_type.info_node.sid + + @property + def measurement_type_label(self): + return self.output.measurement_type.info_node.label @property def choice(self): if self.output.choice: - return self.output.choice.info_node.name + return self.output.choice.info_node.sid + + @property + def choice_label(self): + if self.output.choice: + return self.output.choice.info_node.label @property def label(self): @@ -302,7 +320,12 @@ def label(self): @property def substance(self): if self.output.substance: - return self.output.substance.info_node.name + return self.output.substance.info_node.sid + + @property + def substance_label(self): + if self.output.substance: + return self.output.substance.info_node.label @property def normed(self): diff --git a/backend/pkdb_app/outputs/pk_calculation.py b/backend/pkdb_app/outputs/pk_calculation.py index aea19396..f52109ba 100644 --- a/backend/pkdb_app/outputs/pk_calculation.py +++ b/backend/pkdb_app/outputs/pk_calculation.py @@ -31,7 +31,7 @@ def pkoutputs_from_timecourse(subset:Subset) -> List[Dict]: """ outputs = [] dosing = subset.get_single_dosing() - timecourse = subset.timecourse() + timecourse = subset.timecourse # dosing information must exist if not dosing: return outputs diff --git a/backend/pkdb_app/outputs/serializers.py b/backend/pkdb_app/outputs/serializers.py index 613efb80..f1c0dbdc 100644 --- a/backend/pkdb_app/outputs/serializers.py +++ b/backend/pkdb_app/outputs/serializers.py @@ -14,13 +14,12 @@ from .models import ( Output, OutputSet, - OutputEx, - 
OutputIntervention) + OutputEx) from ..comments.serializers import DescriptionSerializer, CommentSerializer, DescriptionElasticSerializer, \ CommentElasticSerializer from ..interventions.models import Intervention from ..serializers import ( - ExSerializer, StudySmallElasticSerializer, SidNameLabelSerializer, SidNameSerializer) + ExSerializer, StudySmallElasticSerializer, SidNameLabelSerializer) from ..subjects.models import Group, DataFile, Individual from ..subjects.serializers import ( EXTERN_FILE_FIELDS, GroupSmallElasticSerializer, IndividualSmallElasticSerializer) @@ -315,12 +314,52 @@ def get_outputs(self, obj): return list_of_pk("outputs", obj) -class OutputInterventionSerializer(serializers.ModelSerializer): +class OutputInterventionSerializer(serializers.Serializer): + study_sid = serializers.CharField() + study_name = serializers.CharField() + output_pk = serializers.IntegerField() + intervention_pk = serializers.IntegerField() + group_pk = serializers.IntegerField() + individual_pk = serializers.IntegerField() + normed = serializers.BooleanField() + calculated = serializers.BooleanField() + + tissue = serializers.CharField() + tissue_label = serializers.CharField() + + method = serializers.CharField() + method_label = serializers.CharField() + + label = serializers.CharField() + output_type = serializers.CharField() + + time = serializers.FloatField() + time_unit = serializers.CharField() + + measurement_type = serializers.CharField() + measurement_type_label = serializers.CharField() + + choice = serializers.CharField() + choice_label = serializers.CharField() + + substance = serializers.CharField() + substance_label = serializers.CharField() + + + value = serializers.FloatField() + mean = serializers.FloatField() + median = serializers.FloatField() + min = serializers.FloatField() + max = serializers.FloatField() + sd = serializers.FloatField() + se = serializers.FloatField() + cv = serializers.FloatField() + unit = serializers.CharField() + + class Meta: - model = OutputIntervention fields = ["study_sid", "study_name", "output_pk", "intervention_pk", "group_pk", "individual_pk", "normed", "calculated"] + OUTPUT_FIELDS + MEASUREMENTTYPE_FIELDS - read_only_fields = fields class SmallOutputSerializer(serializers.ModelSerializer): diff --git a/backend/pkdb_app/serializers.py b/backend/pkdb_app/serializers.py index 358daee8..4a758a01 100644 --- a/backend/pkdb_app/serializers.py +++ b/backend/pkdb_app/serializers.py @@ -762,7 +762,6 @@ def validate_dict(dic): "detail": dic} ) - class StudySmallElasticSerializer(serializers.ModelSerializer): class Meta: model = Study diff --git a/backend/pkdb_app/studies/documents.py b/backend/pkdb_app/studies/documents.py index 8b17546f..003a9412 100644 --- a/backend/pkdb_app/studies/documents.py +++ b/backend/pkdb_app/studies/documents.py @@ -196,7 +196,6 @@ class StudyDocument(Document): ) dataset = common_setfields("subsets") - class Django: model = Study # Ignore auto updating of Elasticsearch when a model is saved/deleted diff --git a/backend/pkdb_app/studies/serializers.py b/backend/pkdb_app/studies/serializers.py index 88c176de..c990abdd 100644 --- a/backend/pkdb_app/studies/serializers.py +++ b/backend/pkdb_app/studies/serializers.py @@ -8,7 +8,6 @@ from pkdb_app.data.serializers import DataSetSerializer, DataSetElasticSmallSerializer from rest_framework import serializers - from pkdb_app import utils from pkdb_app.outputs.models import OutputSet from pkdb_app.outputs.serializers import OutputSetSerializer, OutputSetElasticSmallSerializer 
@@ -19,7 +18,8 @@ DescriptionElasticSerializer
 from ..interventions.models import DataFile, InterventionSet
 from ..interventions.serializers import InterventionSetSerializer, InterventionSetElasticSmallSerializer
-from ..serializers import WrongKeyValidationSerializer, SidSerializer, StudySmallElasticSerializer, SidNameLabelSerializer
+from ..serializers import WrongKeyValidationSerializer, SidSerializer, StudySmallElasticSerializer, \
+    SidNameLabelSerializer
 from ..subjects.models import GroupSet, IndividualSet
 from ..subjects.serializers import GroupSetSerializer, IndividualSetSerializer, DataFileElasticSerializer, \
     GroupSetElasticSmallSerializer, IndividualSetElasticSmallSerializer
@@ -193,11 +193,11 @@ def to_internal_value(self, data):
             data["creator"] = self.get_or_val_error(User, username=creator)

         # curators to internal
-        if hasattr(data,"curators"):
-            if len(data.get("curators",[])) == 0:
-                raise serializers.ValidationError(
-                    {"curators": "At least One curator is required"}
-                )
+        if hasattr(data, "curators"):
+            if len(data.get("curators", [])) == 0:
+                raise serializers.ValidationError(
+                    {"curators": "At least one curator is required"}
+                )
         else:
             ratings = []
             for curator_and_rating in data.get("curators", []):
@@ -313,6 +313,7 @@ def related_sets():
                 ("dataset", DataSet),
             ]
         )
+
     def related_serializer(self):
         return OrderedDict(
             [
@@ -339,9 +340,10 @@ def pop_relations(self, validated_data):
             "collaborators": User,
             "files": DataFile,
         }
-        related_foreignkeys_dict = OrderedDict([(name, validated_data.pop(name, None)) for name in related_foreignkeys.keys()])
+        related_foreignkeys_dict = OrderedDict(
+            [(name, validated_data.pop(name, None)) for name in related_foreignkeys.keys()])
         related_many2many_dict = OrderedDict([(name, validated_data.pop(name)) for name in related_many2many.keys() if
-                                              name in validated_data])
+                                              name in validated_data])
         related = OrderedDict(list(related_foreignkeys_dict.items()) + list(related_many2many_dict.items()))
         return related
@@ -361,16 +363,13 @@ def create_relations(self, study, related):
                 if getattr(study, name):
                     getattr(study, name).delete()
-
                 this_serializer = serializer(context=context)
                 instance = this_serializer.create(validated_data={**related[name]})
-
                 setattr(study, name, instance)
                 study.save()
-
         if "curators" in related:
             if related["curators"]:
@@ -403,7 +402,6 @@ def create_relations(self, study, related):
             study.save()
-
         return study

     def validate(self, attrs):
@@ -414,7 +412,7 @@ def validate(self, attrs):
         else:
             if attrs.get("date", None) is not None:
                 _validate_not_allowed_key(attrs, "date", extra_message="For a study without a '^PKDB\d+$' identifier "
-                                                                       "the date must not be set in the study.json.")
+                                          "the date must not be set in the study.json.")

         if "curators" in attrs and "creator" in attrs:
             if attrs["creator"] not in [curator["user"] for curator in attrs["curators"]]:
@@ -516,8 +514,8 @@ class StudyElasticSerializer(serializers.ModelSerializer):

     name = serializers.CharField(help_text="Name of the study. The convention is to deduce the name from the "
                                            "refererence with the following pattern "
-                                           "'[Author][PublicationYear][A-Z(optional)]'."
-    )
-    licence = serializers.CharField(help_text="Licence",)
+                                           "'[Author][PublicationYear][A-Z(optional)]'.")
+    licence = serializers.CharField(help_text="Licence")
     access = serializers.CharField()

     curators = CuratorRatingElasticSerializer(many=True, )
@@ -526,7 +524,7 @@
     substances = SidNameLabelSerializer(many=True, )

-    files = serializers.SerializerMethodField()  # DataFileElasticSerializer(many=True, )
+    files = serializers.SerializerMethodField()

     comments = CommentElasticSerializer(many=True, )
     descriptions = DescriptionElasticSerializer(many=True, )
@@ -594,21 +592,38 @@ def get_files(self, obj):
         else:
             return []

-class StudyAnalysisSerializer(serializers.ModelSerializer):
+
+class StudyAnalysisSerializer(serializers.Serializer):
     sid = serializers.CharField()
-    name= serializers.CharField()
+    name = serializers.CharField()
     licence = serializers.CharField()
     access = serializers.CharField()
+    date = serializers.DateField()
+
+    creator = serializers.SerializerMethodField()
+    curators = serializers.SerializerMethodField()
     substances = serializers.SerializerMethodField()
+
     reference_pmid = serializers.SerializerMethodField()
     reference_title = serializers.SerializerMethodField()
-    creator = serializers.SerializerMethodField()
-    curators = serializers.SerializerMethodField()
+    reference_date = serializers.DateField()

+    def get_substances(self, obj):
+        return [s["label"] for s in obj.substances]

-    class Meta:
-        model = Study
+    def get_reference_pmid(self, obj):
+        return obj.reference["pmid"]

+    def get_reference_title(self, obj):
+        return obj.reference["title"]
+
+    def get_creator(self, obj):
+        return obj.creator["username"]
+
+    def get_curators(self, obj):
+        return [s["username"] for s in obj.curators]
+
+    class Meta:
         fields = [
             "sid",
             "name",
@@ -624,18 +639,3 @@ class Meta:
         ]

         read_only_fields = fields
-
-    def get_substances(self, obj):
-        return [s["label"] for s in obj.substances]
-
-    def get_reference_pmid(self, obj):
-        return obj.reference["pmid"]
-
-    def get_reference_title(self, obj):
-        return obj.reference["title"]
-
-    def get_creator(self, obj):
-        return obj.creator["username"]
-
-    def get_curators(self, obj):
-        return [s["username"] for s in obj.curators]
\ No newline at end of file
diff --git a/backend/pkdb_app/studies/views.py b/backend/pkdb_app/studies/views.py
index d0e3391b..b0bf088f 100644
--- a/backend/pkdb_app/studies/views.py
+++ b/backend/pkdb_app/studies/views.py
@@ -1,4 +1,3 @@
-
 import tempfile
 import uuid
 import zipfile
@@ -17,7 +16,7 @@
 from django.http import JsonResponse, HttpResponse
 from django.utils.decorators import method_decorator
 from django.views.decorators.csrf import csrf_exempt
-from django_elasticsearch_dsl_drf.constants import LOOKUP_QUERY_IN
+from django_elasticsearch_dsl_drf.constants import LOOKUP_QUERY_IN, LOOKUP_QUERY_EXCLUDE
 from django_elasticsearch_dsl_drf.filter_backends import FilteringFilterBackend, \
     OrderingFilterBackend, IdsFilterBackend, MultiMatchSearchFilterBackend, CompoundSearchFilterBackend
 from django_elasticsearch_dsl_drf.viewsets import BaseDocumentViewSet, DocumentViewSet
@@ -27,10 +26,11 @@
 from elasticsearch_dsl.query import Q
 from pkdb_app.data.documents import DataAnalysisDocument, SubSetDocument
-from pkdb_app.data.models import SubSet, Data, DataPoint
-from pkdb_app.data.views import SubSetViewSet, DataAnalysisViewSet
-from pkdb_app.documents import AccessView, UUID_PARAM
-from pkdb_app.interventions.serializers import InterventionElasticSerializerAnalysis
+from pkdb_app.data.models import SubSet, Data
+from pkdb_app.data.serializers import TimecourseSerializer
+from pkdb_app.data.views import SubSetViewSet
+from pkdb_app.documents import UUID_PARAM
+from pkdb_app.interventions.serializers import InterventionElasticSerializerAnalysis
 from pkdb_app.outputs.serializers import OutputInterventionSerializer
 from pkdb_app.subjects.serializers import GroupCharacteristicaSerializer, IndividualCharacteristicaSerializer
 from rest_framework.generics import get_object_or_404
@@ -224,6 +224,7 @@ def related_elastic_dict(study):
         docs_dict[ReferenceDocument] = study.reference
     return docs_dict

+
 @method_decorator(name='list', decorator=swagger_auto_schema(manual_parameters=[UUID_PARAM]))
 class ElasticStudyViewSet(BaseDocumentViewSet, APIView):
     """ Endpoint to query studies
@@ -336,6 +337,27 @@ def get_queryset(self):
         return qs

+class StudyAnalysisViewSet(ElasticStudyViewSet):
+    swagger_schema = None
+    serializer_class = StudyAnalysisSerializer
+    filter_fields = {
+        'study_sid': {'field': 'sid.raw',
+                      'lookups': [
+                          LOOKUP_QUERY_IN,
+                          LOOKUP_QUERY_EXCLUDE,
+
+                      ],
+                      },
+        'study_name': {'field': 'name.raw',
+                       'lookups': [
+                           LOOKUP_QUERY_IN,
+                           LOOKUP_QUERY_EXCLUDE,
+
+                       ],
+                       },
+    }
+
+
 class ElasticReferenceViewSet(BaseDocumentViewSet):
     """Read/query/search references. """
     swagger_schema = None
@@ -345,7 +367,8 @@ class ElasticReferenceViewSet(BaseDocumentViewSet):
     pagination_class = CustomPagination
     permission_classes = (IsAdminOrCreatorOrCurator,)
     serializer_class = ReferenceElasticSerializer
-    filter_backends = [FilteringFilterBackend, IdsFilterBackend, OrderingFilterBackend, CompoundSearchFilterBackend, MultiMatchSearchFilterBackend]
+    filter_backends = [FilteringFilterBackend, IdsFilterBackend, OrderingFilterBackend, CompoundSearchFilterBackend,
+                       MultiMatchSearchFilterBackend]
     search_fields = (
         'sid',
         'pmid',
@@ -377,6 +400,7 @@
 class PKData(object):
     """ PKData represents a consistent set of pharmacokinetic data.
     """
+
     def __init__(self,
                  request,
                  concise: bool = True,
@@ -393,7 +417,6 @@ def __init__(self,

         self.request = request

-
         time_init = time.time()

         self.outputs = Output.objects.filter(normed=True).select_related("study__sid").prefetch_related(
@@ -402,7 +425,6 @@ def __init__(self,
             queryset=Intervention.objects.only('id'))).only(
             'group_id', 'individual_id', "id", "interventions__id", "subset__id", "output_type")

-
         # --- Elastic ---
         if studies_query:
             self.studies_query = studies_query
@@ -411,7 +433,7 @@ def __init__(self,
             self.outputs = self.outputs.filter(study_id__in=studies_pks)

         else:
-            studies_pks = StudyViewSet.filter_on_permissions(request.user,Study.objects).values_list("id", flat=True)
+            studies_pks = StudyViewSet.filter_on_permissions(request.user, Study.objects).values_list("id", flat=True)
             self.outputs = self.outputs.filter(study_id__in=Subquery(studies_pks))
         self.studies = Study.objects.filter(id__in=studies_pks)

@@ -426,11 +448,10 @@ def __init__(self,
         time_elastic_individuals = time.time()
         if concise:
             self.outputs = self.outputs.filter(
-                    DQ(group_id__in=groups_pks) | DQ(individual_id__in=individuals_pks))
+                DQ(group_id__in=groups_pks) | DQ(individual_id__in=individuals_pks))
         else:
-            self.studies = self.studies.filter(DQ(groups__id__in=groups_pks) | DQ(individuals__id__in=individuals_pks))
-
-
+            self.studies = self.studies.filter(
+                DQ(groups__id__in=groups_pks) | DQ(individuals__id__in=individuals_pks))

         if interventions_query:
             self.interventions_query = {"normed": "true", **interventions_query}
@@ -451,7 +472,6 @@ def __init__(self,

             self.studies = self.studies.filter(outputs__id__in=outputs_pks)

-
         time_elastic = time.time()

         time_loop_start = time.time()
@@ -464,7 +484,8 @@ def __init__(self,
         timecourses = set()
         scatters = set()

-        for output in self.outputs.values("study_id","group_id", "individual_id", "id", "interventions__id", "subset__id", "output_type"):
+        for output in self.outputs.values("study_id", "group_id", "individual_id", "id", "interventions__id",
+                                          "subset__id", "output_type"):
             studies.add(output["study_id"])
             if output["group_id"]:
                 groups.add(output["group_id"])
@@ -489,7 +510,6 @@ def __init__(self,
                 "outputs": list(outputs),
                 "timecourses": list(timecourses),
                 "scatters": list(scatters),
-
             }

         else:
@@ -507,8 +527,10 @@ def __init__(self,
                 "individuals": list(self.individuals.values_list("pk", flat=True)),
                 "interventions": list(self.interventions.values_list("pk", flat=True)),
                 "outputs": list(self.outputs.values_list("pk", flat=True)),
-                "timecourses": list(self.subset.filter(data__data_type=Data.DataTypes.Timecourse).values_list("pk", flat=True)),
-                "scatters": list(self.subset.filter(data__data_type=Data.DataTypes.Scatter).values_list("pk", flat=True)),
+                "timecourses": list(
+                    self.subset.filter(data__data_type=Data.DataTypes.Timecourse).values_list("pk", flat=True)),
+                "scatters": list(
+                    self.subset.filter(data__data_type=Data.DataTypes.Scatter).values_list("pk", flat=True)),
             }

         time_loop_end = time.time()
@@ -521,7 +543,7 @@ def __init__(self,
             print("init:", time_init - time_start)
             print("elastic:", time_elastic - time_init)
             print("django:", time_django - time_elastic)
-            print("Loop:", time_loop_end- time_loop_start)
+            print("Loop:", time_loop_end - time_loop_start)
             print("-" * 80)

@@ -539,7 +561,7 @@ def individual_pks(self):
         return self._pks(view_class=IndividualViewSet, query_dict=self.individuals_query)

     def output_pks(self):
-        return self._pks(view_class=ElasticOutputViewSet, query_dict=self.outputs_query,scan_size=20000)
+        return self._pks(view_class=ElasticOutputViewSet, query_dict=self.outputs_query, scan_size=20000)

     def subset_pks(self):
         return self._pks(view_class=SubSetViewSet, query_dict=self.subsets_query)
@@ -547,7 +569,7 @@ def subset_pks(self):
     def study_pks(self):
         return self._pks(view_class=ElasticStudyViewSet, query_dict=self.studies_query, pk_field="pk")

-    def set_request_get(self, query_dict:Dict):
+    def set_request_get(self, query_dict: Dict):
         """

         :param query_dict:
@@ -558,7 +580,7 @@ def set_request_get(self, query_dict:Dict):
             get[k] = v
         self.request._request.GET = get

-    def _pks(self, view_class: DocumentViewSet, query_dict: Dict, pk_field: str="pk", scan_size=10000):
+    def _pks(self, view_class: DocumentViewSet, query_dict: Dict, pk_field: str = "pk", scan_size=10000):
         """ query elastic search for pks.
         """
@@ -569,11 +591,17 @@ def _pks(self, view_class: DocumentViewSet, query_dict: Dict, pk_field: str="pk"
         response = queryset.source([pk_field]).params(size=scan_size).scan()
         return [instance[pk_field] for instance in response]

-    def data_by_query_dict(self,query_dict, viewset, serializer):
+    def data_by_query_dict(self, query_dict, viewset, serializer, boost):
         view = viewset(request=self.request)
         queryset = view.filter_queryset(view.get_queryset())
-        queryset = queryset.filter("terms",**query_dict).source(serializer.Meta.fields)
-        return [hit.to_dict() for hit in queryset.params(size=10000).scan()]
+        if boost:
+            queryset = queryset.filter("terms", **query_dict).source(serializer.Meta.fields)
+            return [hit.to_dict() for hit in queryset.params(size=5000).scan()]
+
+        else:
+            queryset = queryset.filter("terms", **query_dict)
+
+            return serializer(queryset.params(size=5000).scan(), many=True).data


 class ResponseSerializer(serializers.Serializer):
@@ -582,13 +610,15 @@ class ResponseSerializer(serializers.Serializer):
         required=True,
         allow_null=False,
         help_text="The resulting queries can be accessed by adding this uuid as "
-                  "an argument to the endpoints: /studies/, /groups/, /individuals/, /outputs/, /timecourses/, /subsets/."
+                  "an argument to the endpoints: /studies/, /groups/, /individuals/, /outputs/, /timecourses/, /subsets/."
     )
     studies = serializers.IntegerField(required=True, allow_null=False, help_text="Number of resulting studies.")
     groups = serializers.IntegerField(required=True, allow_null=False, help_text="Number of resulting groups.")
-    individuals = serializers.IntegerField(required=True, allow_null=False, help_text="Number of resulting individuals.")
+    individuals = serializers.IntegerField(required=True, allow_null=False,
+                                           help_text="Number of resulting individuals.")
     outputs = serializers.IntegerField(required=True, allow_null=False, help_text="Number of resulting outputs.")
-    timecourses = serializers.IntegerField(required=True, allow_null=False, help_text="Number of resulting timecourses.")
+    timecourses = serializers.IntegerField(required=True, allow_null=False,
+                                           help_text="Number of resulting timecourses.")
     scatters = serializers.IntegerField(required=True, allow_null=False, help_text="Number of resulting scatters.")

@@ -678,7 +708,6 @@ def _get_param(self, key, request):
         }
     )

-
     def get(self, request, *args, **kw):
         time_start_request = time.time()

@@ -711,57 +740,59 @@ def get(self, request, *args, **kw):

         if request.GET.get("download") == "true":

+            def serialize_scatters(ids):
+                scatter_subsets = SubSet.objects.filter(id__in=ids)
+                return [t.scatter_representation for t in scatter_subsets]
-
-            def serialize_scatter(ids):
-                scatter_subsets = SubSet.objects.filter(id__in=ids).prefetch_related('data_points')
-                return [t.scatter_representation() for t in scatter_subsets]
-
-            Sheet = namedtuple("Sheet", ["sheet_name", "query_dict", "viewset", "serializer", "function"])
+            Sheet = namedtuple("Sheet",
+                               ["sheet_name", "query_dict", "viewset", "serializer", "function", "boost_performance", ])

             table_content = {
-                "studies": Sheet("Studies", {"pk": pkdata.ids["studies"]}, ElasticStudyViewSet, StudyAnalysisSerializer, None),
-                "groups": Sheet("Groups", {"group_pk": pkdata.ids["groups"]}, GroupCharacteristicaViewSet, GroupCharacteristicaSerializer, None),
-                "individuals": Sheet("Individuals", {"individual_pk": pkdata.ids["individuals"]}, IndividualCharacteristicaViewSet,IndividualCharacteristicaSerializer, None),
-                "interventions": Sheet("Interventions", {"pk": pkdata.ids["interventions"]} ,ElasticInterventionAnalysisViewSet, InterventionElasticSerializerAnalysis, None),
-                "outputs": Sheet("Outputs", {"output_pk": pkdata.ids["outputs"]}, OutputInterventionViewSet, OutputInterventionSerializer, None),
-                #"timecourses": Sheet("Timecourses", {"subset_pk": pkdata.ids["timecourses"]}, None, None, serialize_timecourses),
-                "scatters": Sheet("Scatter", {"subset_pk": pkdata.ids["scatters"]}, None, None, serialize_scatter),
+                "studies": Sheet("Studies", {"pk": pkdata.ids["studies"]}, ElasticStudyViewSet, StudyAnalysisSerializer,
+                                 None, False),
+                "groups": Sheet("Groups", {"group_pk": pkdata.ids["groups"]}, GroupCharacteristicaViewSet,
+                                GroupCharacteristicaSerializer, None, True, ),
+                "individuals": Sheet("Individuals", {"individual_pk": pkdata.ids["individuals"]},
+                                     IndividualCharacteristicaViewSet, IndividualCharacteristicaSerializer, None, True),
+                "interventions": Sheet("Interventions", {"pk": pkdata.ids["interventions"]},
+                                       ElasticInterventionAnalysisViewSet, InterventionElasticSerializerAnalysis, None,
+                                       False),
+                "outputs": Sheet("Outputs", {"output_pk": pkdata.ids["outputs"]}, OutputInterventionViewSet,
+                                 OutputInterventionSerializer, None, True),
+                "timecourses": Sheet("Timecourses", {"pk": pkdata.ids["timecourses"]}, SubSetViewSet,
+                                     TimecourseSerializer, None, False),
+                "scatters": Sheet("Scatter", {"subset_pk": pkdata.ids["scatters"]}, None, None, serialize_scatters,
+                                  None),
             }
-
+            # Create archive
             with tempfile.SpooledTemporaryFile() as tmp:
                 with zipfile.ZipFile(tmp, 'w', zipfile.ZIP_DEFLATED) as archive:
                     download_times = {}
                     for key, sheet in table_content.items():
-                        download_time_start = time.time()
+                        download_time_start = time.time()

                         string_buffer = StringIO()
                         if sheet.function:
                             df = pd.DataFrame(sheet.function(sheet.query_dict["subset_pk"]))
+                            df.to_csv(string_buffer)
+                            archive.writestr(f'{key}.csv', string_buffer.getvalue())
+                            download_times[key] = time.time() - download_time_start

                         else:
-                            data = pkdata.data_by_query_dict(sheet.query_dict,sheet.viewset,sheet.serializer)
-                            df = pd.DataFrame(data)
-                            def sorted_tuple(v):
-                                return sorted(tuple(v))
-
-                            if key=="outputs":
-
-                                timecourse_df = df[df["output_type"] == Output.OutputTypes.Timecourse]
-                                timecourse_df = pd.pivot_table(data=timecourse_df,index=["output_pk"], aggfunc=sorted_tuple).apply(SubSet.to_list)
-                                timecourse_df = pd.pivot_table(data=timecourse_df,index=["label","study_name"], aggfunc=tuple).apply(SubSet.to_list)
-                                timecourse_df.to_csv(string_buffer)
-                                archive.writestr(f'timecourse.csv', string_buffer.getvalue())
-
-                            df.to_csv(string_buffer)
-                            archive.writestr(f'{key}.csv', string_buffer.getvalue())
-                            download_times[key] = time.time()-download_time_start
+                            df = pd.DataFrame(
+                                pkdata.data_by_query_dict(sheet.query_dict, sheet.viewset, sheet.serializer,
+                                                          sheet.boost_performance))
+                            if len(df) > 0:
+                                # restrict and order the columns to the serializer fields
+                                df = df[sheet.serializer.Meta.fields]
+                            df.to_csv(string_buffer)
+                            archive.writestr(f'{key}.csv', string_buffer.getvalue())
+                            download_times[key] = time.time() - download_time_start
+
                     archive.write('download_extra/README.md', 'README.md')
                     archive.write('download_extra/TERMS_OF_USE.md', 'TERMS_OF_USE.md')
-
-
                 tmp.seek(0)
                 resp = HttpResponse(tmp.read(), content_type='application/x-zip-compressed')
                 resp['Content-Disposition'] = "attachment; filename=%s" % "pkdata.zip"
diff --git a/backend/pkdb_app/subjects/documents.py b/backend/pkdb_app/subjects/documents.py
index dca8aa30..1070942d 100644
--- a/backend/pkdb_app/subjects/documents.py
+++ b/backend/pkdb_app/subjects/documents.py
@@ -76,6 +76,6 @@ class Django:

     class Index:
         name = 'individuals'
-        settings = elastic_settings
+        settings = {**elastic_settings, 'max_result_window': 100000}

 # ------------------------------------
@@ -117,6 +117,6 @@ class Django:

     class Index:
         name = 'groups'
-        settings = elastic_settings
+        settings = {**elastic_settings, 'max_result_window': 100000}

     def get_queryset(self):
         """Not mandatory but to improve performance we can select related in one sql request"""
@@ -196,7 +196,7 @@ class Django:

     class Index:
         name = "group_characteristica"
-        settings = {**elastic_settings, 'max_result_window': 50000}
+        settings = {**elastic_settings, 'max_result_window': 100000}

     def get_queryset(self):
         """Not mandatory but to improve performance we can select related in one sql request"""
@@ -275,7 +275,7 @@ class Django:

     class Index:
         name = "individual_characteristica"
-        settings = {**elastic_settings, 'max_result_window': 50000}
+        settings = {**elastic_settings, 'max_result_window': 100000}

     def get_queryset(self):
         """Not mandatory but to improve performance we can select related in one sql request"""
diff --git a/backend/pkdb_app/subjects/serializers.py b/backend/pkdb_app/subjects/serializers.py
index 96d2d9ed..657bb3db 100644
--- a/backend/pkdb_app/subjects/serializers.py
+++ b/backend/pkdb_app/subjects/serializers.py
@@ -15,8 +15,7 @@
     DataFile,
     Individual,
     CharacteristicaEx,
-    GroupEx,
-    GroupCharacteristica,
-    IndividualCharacteristica)
+    GroupEx)
 from ..comments.serializers import DescriptionSerializer, CommentSerializer, DescriptionElasticSerializer, \
     CommentElasticSerializer
 from ..serializers import WrongKeyValidationSerializer, ExSerializer, ReadSerializer
@@ -728,15 +727,59 @@ class Meta:
     )


-class GroupCharacteristicaSerializer(serializers.ModelSerializer):
+class GroupCharacteristicaSerializer(serializers.Serializer):
+    study_sid = serializers.CharField()
+    study_name = serializers.CharField()
+    group_pk = serializers.IntegerField()
+    group_name = serializers.CharField()
+    group_count = serializers.IntegerField()
+    group_parent_pk = serializers.IntegerField()
+    characteristica_pk = serializers.IntegerField()
+    count = serializers.IntegerField()
+
+    measurement_type = serializers.CharField()
+    choice = serializers.CharField()
+    substance = serializers.CharField()
+
+    value = serializers.FloatField()
+    mean = serializers.FloatField()
+    median = serializers.FloatField()
+    min = serializers.FloatField()
+    max = serializers.FloatField()
+    sd = serializers.FloatField()
+    se = serializers.FloatField()
+    cv = serializers.FloatField()
+    unit = serializers.CharField()
+
     class Meta:
-        model = GroupCharacteristica
         fields = ['study_sid', 'study_name', 'group_pk', 'group_name', 'group_count', 'group_parent_pk',
                   'characteristica_pk', 'count'] + MEASUREMENTTYPE_FIELDS


-class IndividualCharacteristicaSerializer(serializers.ModelSerializer):
+class IndividualCharacteristicaSerializer(serializers.Serializer):
+
+    study_sid = serializers.CharField()
+    study_name = serializers.CharField()
+    individual_pk = serializers.IntegerField()
+    individual_name = serializers.CharField()
+    individual_group_pk = serializers.IntegerField()
+    characteristica_pk = serializers.IntegerField()
+    count = serializers.IntegerField()
+
+    measurement_type = serializers.CharField()
+    choice = serializers.CharField()
+    substance = serializers.CharField()
+
+    value = serializers.FloatField()
+    mean = serializers.FloatField()
+    median = serializers.FloatField()
+    min = serializers.FloatField()
+    max = serializers.FloatField()
+    sd = serializers.FloatField()
+    se = serializers.FloatField()
+    cv = serializers.FloatField()
+    unit = serializers.CharField()
+
     class Meta:
-        model = IndividualCharacteristica
         fields = ['study_sid', 'study_name', 'individual_pk', 'individual_name', 'individual_group_pk',
                   'characteristica_pk', 'count'] + MEASUREMENTTYPE_FIELDS
diff --git a/backend/pkdb_app/urls.py b/backend/pkdb_app/urls.py
index 824c14e9..e4a3abf7 100755
--- a/backend/pkdb_app/urls.py
+++ b/backend/pkdb_app/urls.py
@@ -4,7 +4,7 @@
 from django.conf.urls import url
 from django.urls import path, include
 from drf_yasg.views import get_schema_view
-from pkdb_app.data.views import DataAnalysisViewSet, SubSetViewSet
+from pkdb_app.data.views import DataAnalysisViewSet, SubSetViewSet, TimecourseViewSet
 from rest_framework.authtoken.views import obtain_auth_token
 from rest_framework.routers import DefaultRouter

@@ -31,7 +31,7 @@
     StudyViewSet,
     ElasticReferenceViewSet,
     ElasticStudyViewSet,
-    update_index_study, PKDataView,
+    update_index_study, PKDataView, StudyAnalysisViewSet,
 )
 from .subjects.views import (
     DataFileViewSet,
@@ -83,11 +83,13 @@
 router.register('_info_nodes', InfoNodeViewSet, basename="_info_nodes")

 # django
-router.register("flat/interventions", ElasticInterventionAnalysisViewSet, basename="interventions_analysis")
-router.register("flat/groups", GroupCharacteristicaViewSet, basename="groups_analysis")
-router.register("flat/individuals", IndividualCharacteristicaViewSet, basename="individuals_analysis")
-router.register("flat/output", OutputInterventionViewSet, basename="output_analysis")
-router.register("flat/data", DataAnalysisViewSet, basename="data_analysis")
+router.register("pkdata/studies", StudyAnalysisViewSet, basename="studies_analysis")
+router.register("pkdata/interventions", ElasticInterventionAnalysisViewSet, basename="interventions_analysis")
+router.register("pkdata/groups", GroupCharacteristicaViewSet, basename="groups_analysis")
+router.register("pkdata/individuals", IndividualCharacteristicaViewSet, basename="individuals_analysis")
+router.register("pkdata/outputs", OutputInterventionViewSet, basename="output_analysis")
+router.register("pkdata/data", DataAnalysisViewSet, basename="data_analysis")
+router.register("pkdata/timecourses", TimecourseViewSet, basename="timecourse_analysis")
diff --git a/backend/requirements.txt b/backend/requirements.txt
index 3d71973e..71c0caaa 100644
--- a/backend/requirements.txt
+++ b/backend/requirements.txt
@@ -5,7 +5,7 @@
 json-logging>=1.2.6
 psycopg2-binary>=2.8.5

 # django
-Django == 3.1.1
+Django == 3.1.3
 django-model-utils>=4.0.0
 django-extra-fields>=3.0.0
 django-storages>=1.9.1
diff --git a/backend/setup.py b/backend/setup.py
index 6607d443..2f235df3 100644
--- a/backend/setup.py
+++ b/backend/setup.py
@@ -1,5 +1,4 @@
 #!/usr/bin/env python
-# -*- encoding: utf-8 -*-
 """
 pkdb_app pip package
 """
@@ -43,7 +42,7 @@ def read(*names, **kwargs):
     raise RuntimeError("Unable to find version string")

 # description from markdown
-long_description = read('README.md')
+long_description = read('download_extra/README.md')
 setup_kwargs['long_description'] = long_description

 setup(
diff --git a/elastic-rebuild-index.sh b/elastic-rebuild-index.sh
index 25257f50..0d41e2f9 100755
--- a/elastic-rebuild-index.sh
+++ b/elastic-rebuild-index.sh
@@ -4,3 +4,5 @@
 # -----------------------------------------------------------------------------
 : "${PKDB_DOCKER_COMPOSE_YAML:?The 'PKDB_*' environment variables must be exported.}"
 docker-compose -f $PKDB_DOCKER_COMPOSE_YAML run --rm backend ./manage.py search_index --rebuild -f
+# rebuild a single index
+# docker-compose -f $PKDB_DOCKER_COMPOSE_YAML run --rm backend ./manage.py search_index --rebuild -f --models [e.g. interventions]
\ No newline at end of file
diff --git a/frontend/package.json b/frontend/package.json
index 43bdf54f..792b7f0c 100644
--- a/frontend/package.json
+++ b/frontend/package.json
@@ -13,9 +13,6 @@
     "acorn": "^7.4.0",
     "axios": "^0.19.2",
     "base-64": "^0.1.0",
-    "color-normalize": "1.5.0",
-    "color-rgba": "2.1.1",
-    "color-parse": "1.3.8",
     "vega": "^5.16.1",
     "vega-embed": "^6.12.2",
     "vega-lite": "^4.16.7",
diff --git a/frontend/src/components/Home.vue b/frontend/src/components/Home.vue
index 334678ff..b6259c74 100644
--- a/frontend/src/components/Home.vue
+++ b/frontend/src/components/Home.vue
@@ -15,6 +15,26 @@
           pharmacokinetics data enriched with the required meta-information for
           computational modeling and data integration.

+          fas fa-file-alt
+          PK-DB: pharmacokinetics database for individualized and stratified computational modeling
+          Grzegorzewski J, Brandhorst J, Green K, Eleftheriadou D, Duport Y, Barthorscht F, Köller A, Ke DYJ, De Angelis S, König M.
+          Nucleic Acids Res. 2020 Nov 5:gkaa990. doi: 10.1093/nar/gkaa990. Epub ahead of print. PMID: 33151297

Data

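The `boost` argument added to `PKData.data_by_query_dict` (in `backend/pkdb_app/studies/views.py` above) chooses between raw Elasticsearch hits and full DRF serialization. A condensed sketch of the two paths; the function name and the `search`/`serializer_class` wiring are illustrative, not the actual PK-DB code:

```python
# Sketch of the boost/no-boost split in the download path. Assumes `search`
# is an elasticsearch_dsl Search bound to the right index and
# `serializer_class` is a DRF serializer with Meta.fields.
from typing import Dict, List
from elasticsearch_dsl import Search


def rows_for_csv(search: Search, terms: Dict, serializer_class, boost: bool) -> List[dict]:
    qs = search.filter("terms", **terms)
    if boost:
        # Fast path: fetch only the stored columns straight from the index,
        # skipping DRF serialization entirely.
        qs = qs.source(list(serializer_class.Meta.fields))
        return [hit.to_dict() for hit in qs.params(size=5000).scan()]
    # Slow path: run every hit through the serializer so that
    # SerializerMethodFields and other transformations are applied.
    return serializer_class(qs.params(size=5000).scan(), many=True).data
```

The trade-off: `boost=True` is only safe when the index already stores every column in its final form, which is why the sheets whose serializers compute fields (studies, interventions, timecourses) are registered with `boost_performance=False` in the `table_content` above.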
diff --git a/frontend/src/components/detail/SubjectDetail.vue b/frontend/src/components/detail/SubjectDetail.vue
index ff3f5958..1f379540 100644
--- a/frontend/src/components/detail/SubjectDetail.vue
+++ b/frontend/src/components/detail/SubjectDetail.vue
@@ -4,7 +4,7 @@

       {{ faIcon(subject_type) }} {{ subject.name }}
-
+

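The `max_result_window` increases in `backend/pkdb_app/subjects/documents.py` above matter because Elasticsearch validates `from`+`size` pagination against this index setting, while the scroll-based `scan()` used for the pk lookups and the download is not capped. A minimal sketch, assuming a local Elasticsearch on the default port and the index name used above:

```python
# Minimal illustration of the max_result_window cap (local dev assumption;
# PK-DB's actual hosts come from its settings, not from this snippet).
from elasticsearch_dsl import Search, connections

connections.create_connection(hosts=["localhost:9200"])

s = Search(index="group_characteristica")

# from+size pagination is checked against index.max_result_window
# (raised to 100000 above); a page reaching past the cap is rejected
# by Elasticsearch with an error.
first_page = s[0:50].execute()

# scan() streams every hit through the scroll API and is not subject
# to the window, which is why the pk lookups above use it.
n_hits = sum(1 for _ in s.params(size=5000).scan())
```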
diff --git a/frontend/src/components/home/About.vue b/frontend/src/components/home/About.vue
index 52b95a5d..676b4a8b 100644
--- a/frontend/src/components/home/About.vue
+++ b/frontend/src/components/home/About.vue
@@ -34,9 +34,10 @@

How to cite

- PK-DB: PharmacoKinetics DataBase for Individualized and Stratified Computational Modeling
- Jan Grzegorzewski, Janosch Brandhorst, Dimitra Eleftheriadou, Kathleen Green, Matthias König
+ PK-DB: pharmacokinetics database for individualized and stratified computational modeling
+ Grzegorzewski J, Brandhorst J, Green K, Eleftheriadou D, Duport Y, Barthorscht F, Köller A, Ke DYJ, De Angelis S, König M.
       bioRxiv 760884; doi: https://doi.org/10.1101/760884
+      Nucleic Acids Res. 2020 Nov 5:gkaa990. doi: 10.1093/nar/gkaa990. Epub ahead of print. PMID: 33151297

Licensing

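The `flat/*` routes are renamed to `pkdata/*` and gain `studies` and `timecourses` endpoints (see `backend/pkdb_app/urls.py` above). A minimal query sketch, assuming a local backend at `http://localhost:8000` and the `api/v1` prefix used by the existing endpoints:

```python
# Query sketch for the renamed flat endpoints (host and prefix are
# assumptions for a local dev setup; adjust to your deployment).
import requests

BASE = "http://localhost:8000/api/v1"

# StudyAnalysisViewSet exposes __in / __exclude lookups on study_sid and
# study_name; multiple values are separated by double underscores.
resp = requests.get(
    f"{BASE}/pkdata/studies/",
    params={"study_sid__in": "PKDB00008__PKDB00001"},
)
resp.raise_for_status()
print(resp.json())

# Timecourses are now exposed directly via the new TimecourseViewSet route.
resp = requests.get(f"{BASE}/pkdata/timecourses/")
print(resp.status_code)
```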
diff --git a/release-notes/0.9.2.md b/release-notes/0.9.3.md
similarity index 96%
rename from release-notes/0.9.2.md
rename to release-notes/0.9.3.md
index b8183385..3af07648 100644
--- a/release-notes/0.9.2.md
+++ b/release-notes/0.9.3.md
@@ -1,4 +1,4 @@
-# Release notes for pkdb 0.9.2
+# Release notes for pkdb 0.9.3

 ## New features
 ### frontend
diff --git a/release-notes/0.9.4.md b/release-notes/0.9.4.md
new file mode 100644
index 00000000..64ec16be
--- /dev/null
+++ b/release-notes/0.9.4.md
@@ -0,0 +1,12 @@
+# Release notes for pkdb 0.9.4
+
+## New features
+- updated publication information (Nucleic Acids Research citation)
+- flat analysis endpoints moved from `flat/*` to `pkdata/*`, with new studies and timecourses routes
+
+## Fixes
+- multiple fixes in serializers
+- bugfixes and speedup in the data download
+- bugfix for the groups and individuals JSON button (#660)
+- security bugfix: update to Django 3.1.3 (#665)
+
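The download branch above streams a zip archive with one CSV per sheet plus `README.md` and `TERMS_OF_USE.md`. A sketch of consuming it; the endpoint path and the `download=true` switch are read off `PKDataView` and are an assumption, not a documented contract:

```python
# Fetch and unpack pkdata.zip (URL assumed for a local dev server).
import io
import zipfile

import pandas as pd
import requests

resp = requests.get(
    "http://localhost:8000/api/v1/pkdata/",
    params={"download": "true"},
)
resp.raise_for_status()

with zipfile.ZipFile(io.BytesIO(resp.content)) as archive:
    # one CSV per sheet key, plus README.md and TERMS_OF_USE.md
    frames = {
        name: pd.read_csv(archive.open(name))
        for name in archive.namelist()
        if name.endswith(".csv")
    }

print(sorted(frames))  # studies.csv, groups.csv, individuals.csv, ...
print(frames["outputs.csv"].head())
```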