create archive function #34

Open
aalbino2 opened this issue Feb 15, 2024 · 6 comments · May be fixed by #35

@aalbino2
Collaborator

aalbino2 commented Feb 15, 2024

I prepared the create_archive function. Let's discuss this after the IKZ workshop @hampusnasstrom:

def get_reference(upload_id, entry_id):
    return f'../uploads/{upload_id}/archive/{entry_id}#data'


def get_entry_id_from_file_name(filename, upload_id):
    from nomad.utils import hash
    return hash(upload_id, filename)


def create_archive(
    entry_dict, context, filename, file_type, logger, *, bypass_check: bool = False
):
    import yaml
    import json
    from nomad.datamodel.context import ClientContext
    if isinstance(context, ClientContext):
        return None
    if context.raw_path_exists(filename):
        with context.raw_file(filename, "r") as file:
            existing_dict = yaml.safe_load(file)
    if context.raw_path_exists(filename) and existing_dict != entry_dict:
        logger.error(
            f"{filename} archive file already exists. "
            f"You are trying to overwrite it with a different content. "
            f"To do so, remove the existing archive and click reprocess again."
        )
    if not context.raw_path_exists(filename) or existing_dict == entry_dict or bypass_check:
        with context.raw_file(filename, "w") as newfile:
            if file_type == "json":
                json.dump(entry_dict, newfile)
            elif file_type == "yaml":
                yaml.dump(entry_dict, newfile)
        context.upload.process_updated_raw_file(filename, allow_modify=True)

    return get_reference(
        context.upload_id,
        get_entry_id_from_file_name(filename, context.upload_id)
    )
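
The write-or-refuse behaviour of the function above can be looked at in isolation from NOMAD. The following is only a sketch: plan_write and its parameter names are hypothetical and not part of the plugin. It mirrors the two independent if checks of the version posted above, with file_exists standing in for context.raw_path_exists(filename) and same_content for existing_dict == entry_dict.

```python
def plan_write(file_exists: bool, same_content: bool, bypass_check: bool = False):
    """Mirror the two independent `if` checks of create_archive (hypothetical helper).

    Returns a tuple (write_file, log_error).
    """
    # An error is logged whenever an existing file differs from the new content.
    log_error = file_exists and not same_content
    # The file is (re)written if it is new, unchanged, or the check is bypassed.
    write_file = (not file_exists) or same_content or bypass_check
    return write_file, log_error


cases = [
    (False, False, False),  # no file yet
    (True, True, False),    # file exists with identical content
    (True, False, False),   # file exists with different content
    (True, False, True),    # different content, check bypassed
]
for case in cases:
    print(case, "->", plan_write(*case))
```

Read this way, a differing file combined with bypass_check=True is both rewritten and reported as an error, because the two checks do not exclude each other.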
@aalbino2
Collaborator Author

I think Micha could possibly take this as well. Note that, more generally, entry_dict is an EntryArchive such as the following:

entry_dict = EntryArchive(
    data=experiment_data,
    # m_context=archive.m_context,
    metadata=EntryMetadata(upload_id=archive.m_context.upload_id),
)

@hampusnasstrom
Collaborator

Why do you pass a dict and not an EntryData section to the function? Also, what is the * argument for?

@aalbino2
Collaborator Author

aalbino2 commented Feb 16, 2024

entry_dict is actually a misleading variable name because, as you can see above, it is an EntryArchive! The EntryData section is nested inside it; in the present example it is the experiment_data variable.

The * means that bypass_check must be specified as a keyword argument when calling the function. For example, this would be a valid call:

create_archive(
    experiment_archive.m_to_dict(),
    archive.m_context,
    experiment_filename,
    filetype,
    logger,
    bypass_check=True,
)
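
The effect of the bare * can be checked without NOMAD. This is a minimal sketch using a stand-in function, create_archive_stub, which is hypothetical and only copies the shape of the signature; the file name and the other arguments are placeholders.

```python
def create_archive_stub(
    entry_dict, context, filename, file_type, logger, *, bypass_check=False
):
    """Stand-in with the same signature shape as create_archive (hypothetical)."""
    return bypass_check


# Passing bypass_check by keyword is accepted.
accepted = create_archive_stub(
    {}, None, "experiment.archive.yaml", "yaml", None, bypass_check=True
)

# Passing it as a sixth positional argument raises TypeError,
# because everything after the bare * is keyword-only.
try:
    create_archive_stub({}, None, "experiment.archive.yaml", "yaml", None, True)
    rejected = False
except TypeError:
    rejected = True

print(accepted, rejected)
```

Making the flag keyword-only keeps call sites readable: a bare True at the end of a long argument list would not say what it switches on.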

@aalbino2
Collaborator Author

bypass_check was just an idea from @theodore to patch something a few weeks ago. We can leave it out of our official plugin.

@aalbino2 aalbino2 self-assigned this Feb 21, 2024
@aalbino2
Collaborator Author

Ask @hampusnasstrom about the if isinstance(context, ClientContext) check failing in local tests.

@aalbino2 aalbino2 linked a pull request Feb 23, 2024 that will close this issue
@aalbino2
Collaborator Author

aalbino2 commented Jun 25, 2024

Most up-to-date version (from imem-nomad-plugin.utils):

def create_archive(
    entry_dict, context, filename, file_type, logger, *, overwrite: bool = False
):
    # dict_nan_equal and get_hash_ref are not shown here; they are expected
    # to be available in the namespace of the utils module.
    import json
    import yaml
    from nomad.datamodel.context import ClientContext

    file_exists = context.raw_path_exists(filename)
    dicts_are_equal = None
    # Return early when called with a ClientContext.
    if isinstance(context, ClientContext):
        return None
    # Compare the new content with what is already stored in the raw file.
    if file_exists:
        with context.raw_file(filename, "r") as file:
            existing_dict = yaml.safe_load(file)
            dicts_are_equal = dict_nan_equal(existing_dict, entry_dict)
    # Write when the file is new, unchanged, or overwriting is requested.
    if not file_exists or overwrite or dicts_are_equal:
        with context.raw_file(filename, "w") as newfile:
            if file_type == "json":
                json.dump(entry_dict, newfile)
            elif file_type == "yaml":
                yaml.dump(entry_dict, newfile)
        context.upload.process_updated_raw_file(filename, allow_modify=True)
    # Otherwise the file exists with different content: refuse and report.
    elif file_exists and not overwrite and not dicts_are_equal:
        logger.error(
            f"{filename} archive file already exists. "
            f"You are trying to overwrite it with a different content. "
            f"To do so, remove the existing archive and click reprocess again."
        )
    return get_hash_ref(context.upload_id, filename)
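
This version compares the stored and the new content through dict_nan_equal instead of ==. The helper itself is not shown in this thread. One plausible motivation is that NaN never compares equal to itself, so two otherwise identical dicts that contain NaN values are reported as different by a plain comparison. The following is only a sketch of how a NaN-aware comparison could look; nan_aware_equal is a hypothetical name and not the plugin's implementation, and the sample data is made up.

```python
import math


def nan_aware_equal(a, b):
    """Compare nested dicts/lists, treating NaN as equal to NaN (hypothetical helper)."""
    if isinstance(a, dict) and isinstance(b, dict):
        # Same keys, and every value pair compares equal recursively.
        return a.keys() == b.keys() and all(nan_aware_equal(a[k], b[k]) for k in a)
    if isinstance(a, list) and isinstance(b, list):
        # Same length, and every element pair compares equal recursively.
        return len(a) == len(b) and all(nan_aware_equal(x, y) for x, y in zip(a, b))
    if isinstance(a, float) and isinstance(b, float) and math.isnan(a) and math.isnan(b):
        # Two NaN values count as equal here, unlike with ==.
        return True
    return a == b


old = {"sample": {"thickness": float("nan"), "steps": [1, 2.5]}}
new = {"sample": {"thickness": float("nan"), "steps": [1, 2.5]}}

print("plain ==        :", old == new)
print("nan_aware_equal :", nan_aware_equal(old, new))
```

With a comparison of this kind, reprocessing an unchanged entry that contains NaN values would take the write branch rather than the error branch.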
