Cross-validation and model selection #5

gully · 2017-06-13T17:58:15Z

We'll need:

A function that computes the cross-validation score for subsamples
A function that iterates the cross-validation for varying model complexity
A function that ties this all together for multiple sources.
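
A rough sketch of how these three functions could fit together, using only NumPy. Everything here is an assumption for illustration: the names `cv_score`, `scan_complexity`, and `run_all_sources` are hypothetical, the model is a plain polynomial, and the score is mean squared error on held-out k-fold subsamples. The real version would swap in whatever model and score we settle on.

```python
import numpy as np


def cv_score(t, y, degree, n_folds=5, seed=0):
    """Mean squared error on held-out folds for a polynomial of a given degree.

    Hypothetical helper: k-fold splitting and MSE are assumptions, not a
    decision recorded in this issue.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(t)), n_folds)
    errors = []
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        # Fit on the training subsample only
        A_train = np.vander(t[train], degree + 1)
        coeffs, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        # Score on the held-out subsample
        A_test = np.vander(t[test], degree + 1)
        errors.append(np.mean((y[test] - A_test @ coeffs) ** 2))
    return float(np.mean(errors))


def scan_complexity(t, y, max_degree=6, **kwargs):
    """Compute the cross-validation score for each model complexity."""
    scores = {d: cv_score(t, y, d, **kwargs) for d in range(max_degree + 1)}
    best = min(scores, key=scores.get)
    return best, scores


def run_all_sources(sources, **kwargs):
    """Apply the complexity scan to a dict of {name: (t, y)} light curves."""
    return {name: scan_complexity(t, y, **kwargs)
            for name, (t, y) in sources.items()}
```

Calling `run_all_sources({"source_A": (t, y)})` would return, for each source, the preferred polynomial degree and the full table of scores.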

gully · 2017-06-13T18:28:07Z

Our ultimate goal is a dictionary of dictionaries, with one inner dictionary per source that contains entries for:

Top 5 periods as determined from multiterm Lomb-Scargle
Lomb-Scargle scores of those top 5 periods
Linear regression coefficients for the underlying polynomial, with its length set by cross-validation
Linear regression coefficients for sines and cosines for each of the 5 periods
Number of sines and cosines (up to five) selected by cross-validation
Linear regression coefficients for sines and cosines for the cross-validated subset of terms (non-orthogonal!)

This dictionary is a dimensionality-reduced representation of the data in the interval.
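
For concreteness, one possible layout of that nested dictionary is sketched below. The key names (`top_periods`, `ls_scores`, and so on), the helper `empty_source_entry`, and the source labels are all placeholders invented for illustration, and the values are left empty or NaN until the real fits fill them in.

```python
import numpy as np


def empty_source_entry(n_periods=5):
    """Placeholder entry for one source; all key names are hypothetical."""
    return {
        "top_periods": np.full(n_periods, np.nan),       # from multiterm Lomb-Scargle
        "ls_scores": np.full(n_periods, np.nan),         # score of each top period
        "poly_coeffs": np.array([]),                     # length set by cross-validation
        "sincos_coeffs_all": np.full((n_periods, 2), np.nan),  # (sin, cos) per period
        "n_periods_cv": 0,                               # number of periods kept by CV
        "sincos_coeffs_cv": np.empty((0, 2)),            # refit on the CV subset only
    }


# Outer dictionary keyed by source name (names here are made up)
results = {name: empty_source_entry() for name in ("source_A", "source_B")}
```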

Once we have this, we can do lots of fun things, starting with going back to inspect the residual spectrum. What is the actual noise distribution? How many outliers (cosmic rays, flares) are there, and where are they? We could then go back and redo everything with a refined noise model, masked cosmic rays, and maybe non-linear regression methods.
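
As a starting point for the outlier question, here is a minimal sketch of flagging residuals with a robust scatter estimate. The function name `flag_outliers`, the use of the median absolute deviation, and the 5-sigma threshold are assumptions for illustration, not something we have agreed on.

```python
import numpy as np


def flag_outliers(residuals, n_sigma=5.0):
    """Boolean mask of points deviating by more than n_sigma robust sigmas.

    Hypothetical helper: uses the median absolute deviation (MAD), scaled by
    1.4826 so that it matches the standard deviation for Gaussian noise.
    """
    med = np.median(residuals)
    mad = np.median(np.abs(residuals - med))
    sigma = 1.4826 * mad
    return np.abs(residuals - med) > n_sigma * sigma
```

The resulting mask gives both the number of outliers (`mask.sum()`) and where they are (`np.flatnonzero(mask)`), and could be used to mask points before refitting.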

gully · 2017-06-13T18:34:38Z

Note that it's a little awkward that we're bungling together multiterm Lomb-Scargle and the top N periods. Strictly speaking, those top N periods arise from the assumption of an underlying Fourier series, so we should actually have N_top_periods x N_Fourier_terms = 5 x 4 = 20 (times 2 = 40 for sines and cosines!) linearly regressed coefficients in our model. However, that's not the right thing to do, since many of the top N periods are actually aliases of the main period, by design. So what we're doing is a weird approximation to strict Fourier methods. Our strategy has the drawback of being non-orthogonal, but has the (potential, unproven) benefit of picking up real physics that has multiple periods (e.g. differential rotation? multiple stars? weird physics?). Let's try it anyway...
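
To make the coefficient counting concrete, here is a sketch of the approximate model described above: a polynomial plus one sine and one cosine column per period, fit by ordinary linear least squares. The function name `design_matrix` and its arguments are hypothetical. With 5 periods this gives 2 x 5 = 10 periodic coefficients, rather than the 40 of the full Fourier version.

```python
import numpy as np


def design_matrix(t, periods, poly_degree=0):
    """Columns: polynomial terms t**0 .. t**poly_degree, then sin/cos per period.

    Hypothetical helper. The sin/cos columns for different periods are in
    general not orthogonal to each other on a finite, unevenly sampled baseline.
    """
    cols = [t**p for p in range(poly_degree + 1)]
    for period in periods:
        omega = 2.0 * np.pi / period
        cols.append(np.sin(omega * t))
        cols.append(np.cos(omega * t))
    return np.column_stack(cols)


def fit_linear(t, y, periods, poly_degree=0):
    """Least-squares coefficients for the polynomial + sin/cos model."""
    A = design_matrix(t, periods, poly_degree)
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs
```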
