Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check if correlation matrix is consistent #675

Merged
merged 1 commit into from
Dec 6, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
57 changes: 54 additions & 3 deletions src/semeio/fmudesign/create_design.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,37 @@
from semeio.fmudesign.iman_conover import ImanConover


def is_consistent_correlation_matrix(matrix):
"""
Check if a matrix is a consistent correlation matrix.

A correlation matrix is consistent if it has:
1. All diagonal elements equal to 1
2. Is positive semidefinite (all eigenvalues ≥ 0)

Args:
matrix: numpy array representing the correlation matrix

Returns:
bool: True if matrix is a consistent correlation matrix, False otherwise
"""
# Check if diagonal elements are 1
if not np.allclose(np.diagonal(matrix), 1):
return False

# Check positive semidefiniteness using eigenvalues
try:
eigenvals = np.linalg.eigvals(matrix)
# Matrix is positive semidefinite if all eigenvalues are non-negative
# Using small tolerance to account for numerical errors
if not np.all(eigenvals > -1e-8):
return False
except np.linalg.LinAlgError:
return False
Copy link
Contributor

@tommyod tommyod Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the others checks raise, but this returns False. raise here too?

actually, the other conditions here would raise, and the matrix would not be fixed. there seems to be some inconsistencies here. if we can fix it we should, but we should print warnings. if there is no way to fix it, then we should raise on the outside.

we can also consider:

  • print on "small errors"
  • raise on "big errors" (diagonal is 2.0 => raise, diagonal is 1.000001 => print and fix)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have instead opted to just return True or False. I think it's cleaner. What do you think?


return True


def nearest_correlation_matrix(matrix, *, weights=None, eps=1e-6, verbose=False):
"""Returns the correlation matrix nearest to `matrix`, weighted elementwise
by `weights`.
Expand Down Expand Up @@ -822,9 +853,29 @@ def generate(self, realnums, parameters, seedvalues, corrdict, rng):
# Make correlation matrix symmetric by copying lower triangular part
correlations = np.triu(correlations.T, k=1) + np.tril(correlations)

correlations = nearest_correlation_matrix(
correlations, weights=None, eps=1e-6, verbose=False
)
if not is_consistent_correlation_matrix(correlations):
print("\nWarning: Correlation matrix is not consistent")
print("Requirements:")
print(" - Ones on the diagonal")
print(" - Positive semi-definite matrix")
print("\nInput correlation matrix:")
with np.printoptions(
precision=2,
suppress=True,
formatter={"float": lambda x: f"{x:.2f}"},
):
print(correlations)
correlations = nearest_correlation_matrix(
correlations, weights=None, eps=1e-6, verbose=False
)
print("\nAdjusted to nearest consistent correlation matrix:")
with np.printoptions(
precision=2,
suppress=True,
formatter={"float": lambda x: f"{x:.2f}"},
):
print(correlations)
print()

sampler = qmc.LatinHypercube(
d=len(multivariate_parameters), seed=rng
Expand Down
Binary file modified tests/fmudesign/data/config/design_input_onebyone.xlsx
Binary file not shown.
38 changes: 37 additions & 1 deletion tests/fmudesign/test_designmatrix.py
Original file line number Diff line number Diff line change
Expand Up @@ -68,8 +68,44 @@ def test_endpoint(tmpdir, monkeypatch):
shutil.copy(str(designfile), ".")
shutil.copy(Path(designfile).parent / dependency, ".")

subprocess.run(["fmudesign", str(designfile)], check=True)
result = subprocess.run(
["fmudesign", str(designfile)], check=True, capture_output=True, text=True
)

expected_output = """Added sensitivity : seed
Added sensitivity : faults
Added sensitivity : velmodel
Added sensitivity : contacts
Added sensitivity : multz
Added sensitivity : sens6

Warning: Correlation matrix is not consistent
Requirements:
- Ones on the diagonal
- Positive semi-definite matrix

Input correlation matrix:
[[1.00 0.90 0.00 0.00]
[0.90 1.00 0.90 0.00]
[0.00 0.90 1.00 0.00]
[0.00 0.00 0.00 1.00]]

Adjusted to nearest consistent correlation matrix:
[[1.00 0.74 0.11 0.00]
[0.74 1.00 0.74 0.00]
[0.11 0.74 1.00 0.00]
[0.00 0.00 0.00 1.00]]

Added sensitivity : sens7
Added sensitivity : sens8
Provided number of background values (11) is smaller than number of realisations for sensitivity ('sens7', 'p10_p90') and parameter PARAM13. Will be filled with default values.
Provided number of background values (11) is smaller than number of realisations for sensitivity ('sens7', 'p10_p90') and parameter PARAM14. Will be filled with default values.
Provided number of background values (11) is smaller than number of realisations for sensitivity ('sens7', 'p10_p90') and parameter PARAM15. Will be filled with default values.
Provided number of background values (11) is smaller than number of realisations for sensitivity ('sens7', 'p10_p90') and parameter PARAM16. Will be filled with default values.
A total of 91 realizations were generated
Designmatrix written to generateddesignmatrix.xlsx"""

assert result.stdout.split() == expected_output.split()
assert Path("generateddesignmatrix.xlsx").exists # Default output file
valid_designmatrix(pd.read_excel("generateddesignmatrix.xlsx", engine="openpyxl"))

Expand Down
Loading