[Fixing Test-Cases] Pandas/Numpy use different stdev estimators (mean_std_column) #124

ian-coccimiglio · 2024-09-09T03:29:42Z

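pandas and NumPy default to different standard deviation estimators: Series.std() uses ddof=1 (the sample estimator, dividing by n - 1), while np.std uses ddof=0 (the population estimator, dividing by n). This explains why this test gets such a low score. Related to #76, and one of the non-improving tests in #118.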

I think we should either specify the desired estimator in the prompt (unbiased, ddof=1, or biased, ddof=0) or make the test flexible enough to accept either answer. Most models use the built-in pandas std, but we only accept the NumPy estimator.

This is our reference test function:

def mean_std_column(dataframe, column:str):
    """
    Computes the mean average and standard deviation of a specified column 
    in a given dataframe and returns these two values.
    """
    import numpy as np
    data = dataframe[column]
    return np.mean(data), np.std(data)

And this is what most models provide:

import pandas as pd

def mean_std_column(dataframe, column:str):
    """
    Computes the mean average and standard deviation of a specified column 
    in a given dataframe and returns these two values.
    """
    mean_value = dataframe[column].mean()
    std_value = dataframe[column].std()
    return (mean_value, std_value)
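
To make the discrepancy concrete, here is a minimal sketch on made-up data (the values are purely illustrative and not taken from the benchmark). It assumes only the documented defaults: ddof=0 for np.std and ddof=1 for Series.std.

```python
import numpy as np
import pandas as pd

# Illustrative data only; not from the actual test case.
series = pd.Series([1, 2, 3, 4, 5])
values = series.to_numpy()

population_std = np.std(values)   # NumPy default: ddof=0, divides by n
sample_std = series.std()         # pandas default: ddof=1, divides by n - 1

print(population_std)  # sqrt(2.0), about 1.4142
print(sample_std)      # sqrt(2.5), about 1.5811

# Passing ddof explicitly makes the two libraries agree.
numpy_as_sample = np.std(values, ddof=1)
pandas_as_population = series.std(ddof=0)
```

On small columns the two estimators differ noticeably, so an exact-match comparison against the NumPy value rejects an otherwise correct pandas-based answer.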

haesleinhuepf · 2024-09-09T06:25:28Z

> make the test flexible enough to accept either answer (the majority of the models use the builtin pandas stdev, but we only accept the numpy estimator).

I vote for flexibility. Both solutions should be detected as correct.
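
One way such a flexible check could look is sketched below. This is only an illustration under assumptions: the helper name std_matches_either_estimator and the tolerance are hypothetical, and this is not necessarily how the actual test is written.

```python
import math

import numpy as np
import pandas as pd


def std_matches_either_estimator(result_std, data, rel_tol=1e-9):
    """Hypothetical helper: True if result_std equals the ddof=0 or ddof=1 estimate."""
    values = np.asarray(data, dtype=float)
    candidates = (np.std(values, ddof=0), np.std(values, ddof=1))
    return any(math.isclose(result_std, c, rel_tol=rel_tol) for c in candidates)


# Illustrative data only.
df = pd.DataFrame({"a": [1, 2, 3, 4, 5]})

accepts_pandas = std_matches_either_estimator(df["a"].std(), df["a"])
accepts_numpy = std_matches_either_estimator(np.std(df["a"].to_numpy()), df["a"])
rejects_wrong = not std_matches_either_estimator(0.0, df["a"])
```

Comparing against both candidates keeps the test strict about wrong values while no longer penalizing the choice of estimator.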

ian-coccimiglio · 2024-09-09T06:50:52Z

Done! I put in a PR for this one.

ian-coccimiglio mentioned this issue Sep 9, 2024

Adjust mean_std_column to accept both pandas and numpy stdev #126

Merged

9 tasks
