Update aggregation to match ancestry output #79
Will need to change `read_pgs` in `ancestry_analysis`, as it reads in the old version of the aggregated scores.
pgscatalog_utils/pgscatalog_utils/ancestry/read.py
Lines 74 to 87 in c672be7
def read_pgs(loc_aggscore, onlySUM: bool):
    """
    Function to read the output of aggregate_scores
    :param loc_aggscore: path to aggregated scores output
    :param onlySUM: whether to return only _SUM columns (e.g. not _AVG)
    :return:
    """
    logger.debug('Reading aggregated score data: {}'.format(loc_aggscore))
    df = pd.read_csv(loc_aggscore, sep='\t', index_col=['sampleset', 'IID'], converters={"IID": str}, header=0)
    if onlySUM:
        df = df[[x for x in df.columns if x.endswith('_SUM')]]
        rn = [x.rstrip('_SUM') for x in df.columns]
        df.columns = rn
    return df
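One detail worth noting in the snippet above while `read_pgs` is being reworked: `str.rstrip('_SUM')` removes any trailing run of the characters `_`, `S`, `U`, `M`, not the literal suffix. Accessions such as `PGS000001` are unaffected because they end in a digit, but a custom score name ending in one of those characters would be truncated. A minimal sketch of a suffix-safe alternative follows; `strip_suffix` and the example column names are hypothetical and not part of `pgscatalog_utils`.

```python
def strip_suffix(name: str, suffix: str = "_SUM") -> str:
    """Remove `suffix` from the end of `name` only if it is literally present."""
    if suffix and name.endswith(suffix):
        return name[: -len(suffix)]
    return name


# Hypothetical column names: one catalog accession, one custom score.
columns = ["PGS000001_SUM", "custom_PRS_SUM"]

# rstrip treats its argument as a set of characters to strip.
naive = [c.rstrip("_SUM") for c in columns]   # ['PGS000001', 'custom_PR']

# Slicing off the exact suffix keeps the custom name intact.
safe = [strip_suffix(c) for c in columns]     # ['PGS000001', 'custom_PRS']

print(naive)
print(safe)
```

On Python 3.9 and later, `str.removesuffix('_SUM')` does the same thing as the helper.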
Signed-off-by: smlmbrt <[email protected]>
I think my last commit solves it
* update vulnerable dependencies
* Update aggregation to match ancestry output (#79)
* match ancestry aggregation output
* bump version
* fix column name (accession -> PGS)
* fix column name
* add aggregate tests
* fix not respecting outdir
* read new version of pgs
* drop onlySUM parameter
* Make sure it only reads SUM and provides the correct column names back
  Signed-off-by: smlmbrt <[email protected]>
* drop deprecated parameter
---------
Signed-off-by: smlmbrt <[email protected]>
Co-authored-by: smlmbrt <[email protected]>
---------
Signed-off-by: smlmbrt <[email protected]>
Co-authored-by: smlmbrt <[email protected]>
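The "read new version of pgs" and "only reads SUM" commits could look roughly like the sketch below. It assumes the new aggregated file is long format, one row per sample and score, with columns `sampleset`, `IID`, `PGS` and `SUM`. That layout is an assumption: the thread only confirms the `accession -> PGS` rename and that `SUM` is the value read. The function name `read_pgs_long` and the example data are hypothetical, not the code that was merged.

```python
import io

import pandas as pd


def read_pgs_long(loc_aggscore):
    """Read a long-format aggregated score table and return one SUM column per PGS.

    Assumed input columns: sampleset, IID, PGS, SUM (other columns are ignored).
    """
    df = pd.read_csv(
        loc_aggscore,
        sep="\t",
        dtype={"sampleset": str, "IID": str, "PGS": str},  # keep IDs such as "001" as text
    )
    # One row per (sampleset, IID), one column per score, values taken from SUM.
    wide = df.pivot(index=["sampleset", "IID"], columns="PGS", values="SUM")
    wide.columns.name = None
    return wide


# Hypothetical example data standing in for the aggregated scores file.
example = (
    "sampleset\tIID\tPGS\tSUM\tDENOM\tAVG\n"
    "test\t001\tPGS000001\t0.5\t10\t0.05\n"
    "test\t001\tPGS000002\t1.5\t20\t0.075\n"
    "test\t002\tPGS000001\t0.7\t10\t0.07\n"
    "test\t002\tPGS000002\t1.1\t20\t0.055\n"
)
wide = read_pgs_long(io.StringIO(example))
print(wide)
```

With this layout the column names come straight from the `PGS` column, so no suffix handling or `onlySUM` flag is needed.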
Doing this in the Quarto report with `data.table` is causing problems with building the Dockerfile. Also, this fixes an issue where the DENOM column is missing when multiple custom scoring files are aggregated.