
Reimplement ATLAS_Z0_7TEV_49FB_HIMASS #2178

Merged: 20 commits, Dec 6, 2024

Conversation

@ecole41 (Collaborator) commented Oct 17, 2024

This pull request introduces a new filtering module for the ATLAS Z0 7 TeV high mass dataset, along with several supporting utility functions and updates to various data files. The most important changes include the addition of filtering functions, utility functions for data extraction, and updates to metadata and raw data files.

Old vs New Data Comparison

https://vp.nnpdf.science/8EHbLXgpTTOQWrbIiQMXhg==/

New Filtering Module:

  • nnpdf_data/nnpdf_data/commondata/ATLAS_Z0_7TEV_49FB/filter.py : Added functions to filter and write central values, kinematics, and systematics to YAML files.

Utility Functions

  • nnpdf_data/nnpdf_data/commondata/ATLAS_Z0_7TEV_49FB/filter_utils.py : Added helper functions to extract uncertainties, kinematics, and data values from raw data files.

Metadata and Data Files

  • nnpdf_data/nnpdf_data/commondata/ATLAS_Z0_7TEV_49FB/metadata.yaml : Updated metadata with new URLs and table references.

  • nnpdf_data/nnpdf_data/commondata/ATLAS_Z0_7TEV_49FB/rawdata/ATLAS-49fb-Zhighmass.csv : Added raw data file containing mass ranges and systematic uncertainties.

  • nnpdf_data/nnpdf_data/commondata/ATLAS_Z0_7TEV_49FB/uncertainties.yaml : Added YAML file defining statistical and systematic uncertainties.

Compatibility Checks

Covariance Matrix check:

from validphys.api import API
import numpy as np
import os

def check_theory_exists(theory_id):
    theory_path = f"/Users/ellacole/miniconda3/envs/nnpdf_dev/share/NNPDF/theories/theory_{theory_id}"
    return os.path.exists(theory_path)

theory_id = 708 


if check_theory_exists(theory_id):
    inp1 = {"dataset_input": {"dataset": "ATLAS_Z0_7TEV_49FB_HIMASS"}, "theoryid": theory_id, "use_cuts": "internal"}
    inp2 = {"dataset_input": {"dataset": "ATLAS_Z0_7TEV_49FB_HIMASS", "variant": "legacy"}, "theoryid": theory_id, "use_cuts": "internal"}
    covmat1 = API.covmat_from_systematics(**inp1)
    covmat2 = API.covmat_from_systematics(**inp2)
    
    result = np.isclose(covmat1, covmat2)
    print(result)
else:
    print(f"Theory {theory_id} not found. ")
    
Out:
[[ True  True  True  True  True]
 [ True  True  True  True  True]
 [ True  True  True  True  True]
 [ True  True  True  True  True]
 [ True  True  True  True  True]]

t0 Matrix Check:

from validphys.api import API
import numpy as np
 
inp1 = {"dataset_input": {"dataset": "ATLASZHIGHMASS49FB"}, "theoryid": 40_000_000, "use_cuts": "internal", "t0pdfset": "NNPDF40_nnlo_as_01180", "use_t0": True}
inp2 = {"dataset_input": {"dataset": "ATLAS_Z0_7TEV_49FB_HIMASS", "variant": "legacy"}, "theoryid": 40_000_000, "use_cuts": "internal", "t0pdfset": "NNPDF40_nnlo_as_01180", "use_t0": True}
 
covmat1 = API.covmat_from_systematics(**inp1)
covmat2 = API.covmat_from_systematics(**inp2)
 
t0_covmat1 = API.t0_covmat_from_systematics(**inp1)
t0_covmat2 = API.t0_covmat_from_systematics(**inp2)
 
result = np.all(np.isclose(covmat1, covmat2))
result_2 = np.all(np.isclose(t0_covmat1, t0_covmat2))
print('covmat', result)
print('t0_covmat', result_2)

Out: covmat True
t0_covmat True

@ecole41 (Collaborator) commented Oct 30, 2024

(Reposted the PR description and covariance matrix check above.)

@scarlehoff (Member) left a comment
Thanks for this! And great that the results are perfectly compatible!
Since the results don't change, you can set the data_central key to your file (and remove the data_legacy_HIMASS.yml file).
And the same for the uncertainties: you can set data_uncertainties: <your file>.

The other changes required are to set the kinematic variables to the ones we want to use in plots and so on (right now they are called k1, k2, k3 and I'm not even sure what they are: rapidity, scale, and something else). This part is tricky (maybe @comane can help you) because one of the variables may now be Q^2 while what we want to plot is Q, or vice versa.

Related to this, select one of the processes from https://github.com/NNPDF/nnpdf/blob/master/validphys2/src/validphys/process_options.py
and set process_type to that process (I guess DY_Z_Y). This will tell you which variables you can use (e.g., if it is indeed DY_Z_Y, they are defined as

accepted_variables=(_Vars.y, _Vars.eta, _Vars.m_W2, _Vars.m_Z2, _Vars.sqrts),

).

This way you can create the kinematic coverage plot plot_xq2 for this dataset.

You also need to set kinematics_override: identity (the current value of that key has to do with some legacy code that will be removed soon-ish).

By the way, make sure to add the float prettifier to your filter file to avoid the x.000000001 numbers; see #2185 (comment)
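A minimal sketch of what such a float prettifier could look like, assuming the goal is simply to round away floating-point noise before values are dumped to YAML. The function name and chosen precision here are illustrative, not the actual nnpdf helper referenced in #2185:

```python
# Hypothetical helper: round floats to a fixed number of significant
# figures so values like 0.5000000001 are written as 0.5.
def prettify_float(value, sig_figs=8):
    """Round `value` to `sig_figs` significant figures."""
    if value == 0.0:
        return 0.0
    return float(f"{value:.{sig_figs}g}")

central_values = [0.5000000001, 1.2299999998, 42.0]
pretty = [prettify_float(v) for v in central_values]
print(pretty)  # [0.5, 1.23, 42.0]
```

Applying this to every value right before writing the YAML files keeps the committed data files stable and easy to diff.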


A Member left an inline comment:
Maybe some of the utilities in this file can either use utilities in https://github.com/NNPDF/nnpdf/tree/master/nnpdf_data/nnpdf_data/filter_utils (or, if not, be added there)

@scarlehoff (Member) left a comment

I'm listing here some of the missing items.

  • Change the kinematic variables from k_i to their actual names
  • The kinematics_override should be removed or set to the identity
  • The process_type must be changed to one of the processes in
    accepted_variables=(_Vars.y, _Vars.eta, _Vars.m_W2, _Vars.m_Z2, _Vars.sqrts),
  • Version comment: bump the version and remove the "Port of old commondata"
  • The data_uncertainties entry should point to the new file. If the files are equal you can even remove the old one (and make the legacy variant point to the new one)
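Taken together, the checklist items above touch a handful of keys in metadata.yaml. A hedged sketch of what the relevant fragment might look like after the changes (key names follow the review comments; the values are illustrative, not taken from the actual dataset):

```yaml
# Illustrative fragment only, not the real metadata.yaml
process_type: DY_Z_Y            # one of the processes defined in process_options.py
kinematics_override: identity   # replaces the legacy override
data_uncertainties:
  - uncertainties.yaml          # new file; the legacy variant can point here too
kinematic_coverage: [y, m_Z2, sqrts]   # named variables instead of k1, k2, k3
```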

@scarlehoff (Member) commented

Hi @ecole41, here, besides the comment still unresolved, you need to either rebase on top of master or perhaps merge from master (resolving the conflicts).

This will also bring in the test that @Radonirinaunimi added for the commondata filters.

@scarlehoff scarlehoff changed the title [WIP] Reimplement ATLAS_Z0_7TEV_49FB_HIMASS Reimplement ATLAS_Z0_7TEV_49FB_HIMASS Dec 5, 2024
@ecole41 (Collaborator) commented Dec 5, 2024

Hi @ecole41, here, besides the comment still unresolved, you need to either rebase on top of master or perhaps merge from master (resolving the conflicts).

This will also bring in the test that @Radonirinaunimi added for the commondata filters.

Ok, I will try merging

@ecole41 (Collaborator) commented Dec 5, 2024

Hi @ecole41, here, besides the comment still unresolved, you need to either rebase on top of master or perhaps merge from master (resolving the conflicts).
This will also bring in the test that @Radonirinaunimi added for the commondata filters.

Ok, I will try merging

I have merged using git merge origin/master and sorted out the conflicts. Is there anything else that needs doing for this branch?

@scarlehoff (Member) left a comment

With the comment below the tests should pass (hopefully). Once the tests pass I think this can be merged.

Inline comment on validphys2/src/validphys/process_options.py (outdated, resolved)
@scarlehoff (Member) left a comment

I've left just a final thing for the LaTeX in the plot; other than that I think this is ready to go. Thanks!

@scarlehoff scarlehoff added the Done PRs that are done but waiting on something else to merge/approve label Dec 5, 2024
@scarlehoff scarlehoff merged commit 1140342 into master Dec 6, 2024
6 checks passed
@scarlehoff scarlehoff deleted the reimplement_ATLAS_Z0_7TEV_49FB_HIMASS branch December 6, 2024 10:17
@Radonirinaunimi (Member) commented

Just to mention that the change of signs (as long as they are consistent, i.e. every + becomes - and vice versa) is completely expected, and we've known about this even in the old commondata format. The reason is indeed the covmat decomposition, as @scarlehoff explained, which is highly dependent on the scipy/numpy versions. So the bot, when re-generating the data, makes sure that everything is consistent.
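The sign ambiguity described above is easy to demonstrate: a covariance matrix factorised as cov = A Aᵀ determines the columns of A only up to an overall sign per column, so two decompositions with opposite signs are equally valid. A small illustrative sketch (not the actual nnpdf decomposition code):

```python
# Sketch: flipping the sign of a whole column of the decomposition
# matrix leaves the reconstructed covariance matrix unchanged.
import numpy as np

cov = np.array([[4.0, 2.0],
                [2.0, 3.0]])

# One possible decomposition cov = A @ A.T via eigendecomposition
w, v = np.linalg.eigh(cov)
A = v @ np.diag(np.sqrt(w))

# Flip the sign of one column (what different scipy/numpy versions may do)
A_flipped = A.copy()
A_flipped[:, 0] *= -1

assert np.allclose(A @ A.T, cov)
assert np.allclose(A_flipped @ A_flipped.T, cov)
print("both decompositions reproduce the same covariance matrix")
```

This is why the sign flips in the generated systematics files are harmless as long as they are applied consistently within each column.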

Labels: ATLAS_DY_DATA, data toolchain, Done (PRs that are done but waiting on something else to merge/approve)

4 participants