Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added a metadata table - one-to-many relationship enhanced #83

Merged

Conversation

waterflow80
Copy link
Collaborator

Description

closes #13
This PR has almost the same logic as the one-to-many relationship approach (PR #82), however with a different approach.

Changes

Java

We'll be using a new Java model using the @ElementCollection, @CollectionTable in the SeqcolLevelOneEntity:

...
@ElementCollection(fetch = FetchType.LAZY)
@CollectionTable(name = "seqcol_md", joinColumns =
@JoinColumn(name = "digest", nullable = false, updatable = false))
private Set<SeqColMetadataEntity> metadata;
...

I think this will reduce the complexity of using @One-to-many relationship and managing different entities.
See more in this article.

Data model

The new seqcol_md table will remain almost the same, but without a primary key , only digest as a foreign key. This will allow for multiple entries with same digest (and eventually the other attributes):

Screenshot from 2024-04-20 22-34-53

(Timestamp is still not working for some reason, we can debug it later).

Endpoints

We added a new endpoint for the metadata:
/eva/webservices/seqcol/collection/{digest}/metadata, which will return the following result.

[
    {
        "sourceIdentifier": "GCA_000146045.2",
        "sourceUrl": null,
        "namingConvention": "UCSC",
        "timestamp": null
    }
]

I would like to hear your feedback about this approach, and about the different techniques, logic, and data structures used.

Copy link
Member

@tcezard tcezard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with the Database design which looks good.
I offered some possible improvement that could be done here or in a subsequent PR.

import org.springframework.stereotype.Repository;

import uk.ac.ebi.eva.evaseqcol.entities.SeqColLevelOneEntity;
import uk.ac.ebi.eva.evaseqcol.entities.SeqColMetadataEntity;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not being used in this class

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -208,4 +215,28 @@ public List<SeqColExtendedDataEntity<List<Integer>>> constructIntegerListExtData
return integerListExtendedDataEntities;
}

public List<SeqColMetadataEntity> metadataObjectArrayListToMetadataList(List<Object[]> metadataArray) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this method need to be public ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not necessarily, it can be private in our case. However I'm not sure if we might need to used it outside of this class in the future.

@@ -83,6 +83,13 @@ public ResponseEntity<?> getSeqColByDigestAndLevel(
return new ResponseEntity<>(HttpStatus.NOT_FOUND);
}

@GetMapping(value = "/collection/{digest}/metadata")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the rational for making the metadata available for a given digest and we could keep it for now.
However I think the metadata should injected in the SeqCol level2 to and be added with their own properties.

{
  "sequences": [...],
  "names": : [...],
  "length": : [...],
  "naming_convention": GENBANK,
  "source_id": "GCA_000001",
  "source_url": "https://..."
  "ingested_on": 2024-04-21T12:00:00
}

This could be added to SeqColService.getSeqColByDigestAndLevel

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could also be added on request via a path variable - e.g. /collection/{digest}?level=2&metadata=true - in case anyone wants the "plain" level 2 object?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could also be added on request via a path variable - e.g. /collection/{digest}?level=2&metadata=true - in case anyone wants the "plain" level 2 object?

I think that's a very good idea, it preserves the endpoint specified/required by the spec, and extend it with other optional parameters.

Copy link
Contributor

@apriltuesday apriltuesday left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using ElementCollection seems reasonable here - if I understand correctly, we can query based on metadata fields (i.e. "give me all the seqcols ingested after X date"), but metadata rows can't be created without a corresponding seqcol and will be automatically deleted if we remove its corresponding seqcol (both of which make sense). And the code is simplified which is always good.

The new seqcol_md table will remain almost the same, but without a primary key , only digest as a foreign key.

Is there still a requirement that the combination of all fields is unique? Based on this paragraph. I think this is what we want, just wanted to clarify.

@@ -83,6 +83,13 @@ public ResponseEntity<?> getSeqColByDigestAndLevel(
return new ResponseEntity<>(HttpStatus.NOT_FOUND);
}

@GetMapping(value = "/collection/{digest}/metadata")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could also be added on request via a path variable - e.g. /collection/{digest}?level=2&metadata=true - in case anyone wants the "plain" level 2 object?


public List<SeqColMetadataEntity> getAllMetadata() {
List<Object[]> metadataArrayList = repository.findAllMetadata();
return metadataObjectArrayListToMetadataList(metadataArrayList);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth using something like TupleTransformer in the repository to do this? I'm not sure how much difference it makes in practice compared to your conversion method.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've a made a little enhancement in the transformation of the fetched md objects using streams (commit), however using TupleTransformer might be a little bit tricky and require some changes in the design.

This is an example I found online that uses the TupleTransformer:

List<PostRecord> postRecords = entityManager.createQuery("""
    select
        p.id,
        p.title,
        p.createdOn,
        p.createdBy,
        p.updatedOn,
        p.updatedBy
    from Post p
    order by p.id
    """)
.unwrap(org.hibernate.query.Query.class)
.setTupleTransformer(
    (tuple, aliases) -> {
        int i =0;
        return new PostRecord(
            longValue(tuple[i++]),
            stringValue(tuple[i++]),
            new AuditRecord(
                localDateTimeValue(tuple[i++]),
                stringValue(tuple[i++]),
                localDateTimeValue(tuple[i++]),
                stringValue(tuple[i++])
            )
        );
    }
)
.getResultList();

Since we're using an interface repository, implementing this logic might require adding an additional repository class that will execute the custom query.

Not sure if that's gonna make the design more complex or not...

Idk.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, your implementation is pretty neat so let's keep it like that 👍

@waterflow80 waterflow80 merged commit e9548b9 into EBIvariation:main May 22, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Adding a 'database' for assembly accessions to map saved seqCol objects
3 participants