Match works to DMPs #17

Merged: @briri merged 2 commits into main from feature/dmp-works-matching on Nov 12, 2024

@jdddog (Collaborator) commented Sep 19, 2024

COKI Google BigQuery work for assisting the DMPTool with matching DMPs to research outputs in Crossref, DataCite and OpenAlex.

This PR consists of:

  • Initial Apache Airflow code (Python)
  • Google BigQuery SQL queries (SQL templated with Jinja2)
  • Use of Google's latest text embedding models, text-embedding-004 and text-multilingual-embedding-002
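
As a rough illustration of how text embeddings can support this kind of matching, here is a minimal sketch that ranks candidate works against a DMP by cosine similarity. This is an assumption for illustration only: the PR description does not say which similarity measure the queries use, the function names `cosine_similarity` and `best_matches` are hypothetical, and the vectors are toy values rather than real model output.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_matches(dmp_vec, works, top_k=3):
    """Return up to top_k (work_id, score) pairs, highest similarity first.

    `works` is a dict mapping a hypothetical work identifier to its embedding.
    """
    scored = [(work_id, cosine_similarity(dmp_vec, vec)) for work_id, vec in works.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy example: two-dimensional vectors standing in for real embeddings.
dmp_embedding = [1.0, 0.0]
work_embeddings = {"work-a": [0.9, 0.1], "work-b": [0.0, 1.0]}
print(best_matches(dmp_embedding, work_embeddings, top_k=1))
```

In the actual pipeline this comparison would presumably happen inside BigQuery rather than in Python; the sketch only shows the idea of scoring DMP and work embeddings against each other.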

Supporting documentation can be found here (will be moved into the wiki as we get closer to finalizing the process):

Addresses tickets:

@briri (Collaborator) left a comment

looks good @jdddog and thanks for the walkthrough yesterday

@jdddog (Collaborator, Author) commented Oct 7, 2024

> looks good @jdddog and thanks for the walkthrough yesterday

Thanks Brian, you're welcome.

I've pushed the Apache Airflow workflow here. I still need to integrate it with the DMPTool APIs, and to run each query in its own task in the workflow so that a single failed query doesn't force everything to be re-run.
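
To make the reasoning behind per-query tasks concrete, here is a minimal plain-Python sketch of the idea. It is not Airflow code and not the workflow in this PR: `run_tasks` and the task names are hypothetical, and the `completed` set is a stand-in for the per-task state a scheduler keeps. The point is only that when each query is its own unit with recorded state, a re-run can skip the work that already succeeded.

```python
def run_tasks(tasks, completed):
    """Run (name, callable) pairs in order, skipping names already in `completed`.

    `completed` is mutated in place. If a task raises, the exception propagates
    and earlier successes stay recorded, so a later call resumes at the failure.
    """
    for name, fn in tasks:
        if name in completed:
            continue  # already succeeded on a previous run
        fn()
        completed.add(name)
    return completed


# Hypothetical usage: the second query fails once, then succeeds on re-run.
log = []
flaky = {"should_fail": True}


def normalise_works():
    log.append("normalise_works")


def match_dmps():
    if flaky["should_fail"]:
        raise RuntimeError("query failed")
    log.append("match_dmps")


pipeline = [("normalise_works", normalise_works), ("match_dmps", match_dmps)]
done = set()
try:
    run_tasks(pipeline, done)
except RuntimeError:
    pass  # first run stops at the failing query

flaky["should_fail"] = False
run_tasks(pipeline, done)  # only the failed query runs again
print(log)
```

With a single monolithic step, the same failure would mean repeating the first query as well, which is the cost the individual tasks are meant to avoid.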

@briri marked this pull request as ready for review on November 12, 2024 18:29
@briri merged commit a3d1a92 into main on Nov 12, 2024
@briri deleted the feature/dmp-works-matching branch on November 12, 2024 18:30