PROJECT DESCRIPTION:
Based on Graham et al. 2011, who claimed to have found a different characteristic
timescale for quasar variability using the damped random walk model.
BIG PICTURE:
"Stripe 82 sampling was not introducing significant bias in best-fit tau
estimates after Chelsea's analysis summarized in fig. 5 in
http://www.astro.washington.edu/users/ivezic/Publications/macleod2010.pdf
The smallest input tau she used was 100 days in the observer's frame,
and the Graham et al. 54-day timescale in the rest frame corresponds to
152 days in the observer's frame. Therefore, Stripe 82 sampling ought to be
able to uncover evidence for such short timescales. It would be good to repeat
this analysis with even shorter input tau scales (we have to run into a problem
with Stripe 82 sampling at some short scale)."
PAPERS:
The core papers underlying the use of the DRW for describing quasar variability,
in particular focusing on the SDSS data:
Ivezic et al. 2014 (Optical variability of quasars: a damped random walk)
Ivezic et al. 2014 (Optical selection of quasars: SDSS and LSST)
MacLeod et al. 2012 (A description of quasar variability measured using
repeated SDSS and POSS imaging)
MacLeod et al. 2011 (Quasar selection based on photometric variability)
MacLeod et al. 2010 (Modeling the time variability of SDSS Stripe 82 quasars as
a damped random walk)
TASKS:
------------- STANDARD STARS SDSS AND CRTS -------------------------------
1) Plot the SDSS standard stars (which are not variable).
I loaded the stripe82calibStars_v2.6.dat data from
http://www.astro.washington.edu/users/ivezic/sdss/catalogs/stripe82.html
using the stars1.py code, which skips the comment lines (#), drops the
first column (which only holds a keyword), and saves the result as
my_array1.npy, which is much easier to work with (a sketch of this step
follows below). Each ugriz entry is an average over many epochs, which is
why statistical characteristics such as the mean, median, or rms scatter
are available for each star.
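A minimal sketch of this loading step, assuming numpy and the catalog file
named as above (stars1.py itself may differ):

    import numpy as np

    # skip '#' comment lines and drop the leading keyword column,
    # keeping the 36 numeric columns listed below (usecols 1..36)
    arr = np.loadtxt("stripe82calibStars_v2.6.dat", comments="#",
                     usecols=range(1, 37))
    np.save("my_array1.npy", arr)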
The columns in my_array1.npy (the first column of the SDSS dataset removed),
with the first five values shown for the leading columns (e.g. arr[0:5,0]):
0 RA     [ 308.5002136 308.5000916 308.5000916 308.5018921 308.5046082]
1 Dec    [-1.227721  -1.2402642 -1.2171559 -1.1764922 -1.174189 ]
2 RArms  [ 0.032 0.015 0.009 0.005 0.028]
3 Decrms [ 0.032 0.015 0.024 0.024 0.038]
4 Ntot   [ 4. 4. 4. 4. 4.]
5 Ar     [ 0.587 0.596 0.579 0.559 0.556]
6 u Nobs
7 u mmed
8 u mmu
9 u msig
10 u mrms
11 u mchi2
12 g Nobs
13 g mmed
14 g mmu
15 g msig
16 g mrms
17 g mchi2
18 r Nobs
19 r mmed
20 r mmu
21 r msig
22 r mrms
23 r mchi2
24 i Nobs
25 i mmed
26 i mmu
27 i msig
28 i mrms
29 i mchi2
30 z Nobs
31 z mmed
32 z mmu
33 z msig
34 z mrms
35 z mchi2
Comments (from the SDSS website):
RA Dec RArms Decrms: the mean position and its rms per coordinate; J2000,
decimal degrees for RA and Dec, and arcsec for the rms. NB: standard errors
can be computed as rms/sqrt(Ntot)
Ntot: the total number of epochs
Ar: the Schlegel, Finkbeiner & Davis (1998) ISM extinction value in the r band;
extinction in other bands can be computed as [Rv=3.1]: Am = Cm*Ar, with
Cm=(1.873, 1.377, 0.758, 0.537) for m=(ugiz)
and then in each band (ugriz, in this order): (Nobs mmed mmu msig mrms mchi2),
which are:
the total number of observations in this band
the median magnitude
the mean magnitude
the standard error of the mean (1.25x larger for the median)
the root-mean-square scatter
chi2 per degree of freedom (computed using the mean magnitude)
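For example, for the first star in the listing above (Ar = 0.587), the u-band
extinction is Au = 1.873 * 0.587 ≈ 1.10 mag and the z-band value is
Az = 0.537 * 0.587 ≈ 0.32 mag.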
--------------------------
I plot using the code plot1.py, which loads my_array1.npy, consisting
of 36 columns and 1006849 rows of data (check with arr.shape). I plot the rms
scatter vs. magnitude for the u, g, r, i, z bands separately, and for all bands
together (a sketch follows below).
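A minimal sketch of such a plot, using the column indices from the listing
above (the plotting details are assumptions, not necessarily what plot1.py
does):

    import numpy as np
    import matplotlib.pyplot as plt

    arr = np.load("my_array1.npy")       # shape (1006849, 36)
    bands = "ugriz"
    fig, axes = plt.subplots(1, 5, figsize=(20, 4), sharey=True)
    for b, ax in enumerate(axes):
        mmed = arr[:, 7 + 6 * b]         # median magnitude in band b
        mrms = arr[:, 10 + 6 * b]        # rms scatter in band b
        ax.plot(mmed, mrms, ",", alpha=0.3)
        ax.set_xlabel(bands[b] + " (median mag)")
    axes[0].set_ylabel("rms scatter (mag)")
    fig.savefig("rms_vs_mag.png")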
2) Match the SDSS S82 standard stars to Catalina CRTS datapoints, to show that
stars that are non-variable in SDSS can have much larger scatter in CRTS due
to poorer photometry.
- I extracted lists of 100 objects each, with id, RA, Dec, and the desired
search radius (following the specifications of the Catalina online form)
http://nesssi.cacr.caltech.edu/cgi-bin/getmulticonedb_release2.cgi
- I need to
1) upload my file,
2) click 'Submit',
3) read the content of the returned page,
4) select the four digits after the 'box' string (which is a human test),
5) enter them in the relevant form,
6) click 'submit', and
7) save the resulting HTML data as txt.
- I found tools such as
'mechanize' wwwsearch.sourceforge.net/mechanize/
'requests' http://docs.python-requests.org/en/latest/
and 'Beautiful Soup' http://www.crummy.com/software/BeautifulSoup/bs4/doc/
but I don't know yet how to do the task with any of them. Requests seems the
most user-friendly (a rough sketch follows below).
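A rough, untested sketch of steps 1-7 with requests and Beautiful Soup; the
form field names ('upload', 'box') are guesses and must be read off the actual
page source:

    import requests
    from bs4 import BeautifulSoup

    URL = "http://nesssi.cacr.caltech.edu/cgi-bin/getmulticonedb_release2.cgi"

    with requests.Session() as s:
        # steps 1-2: upload the object list and submit the first form
        with open("objects_batch1.txt", "rb") as f:
            page1 = s.post(URL, files={"upload": f})
        # steps 3-4: pull the four digits that follow the 'box' string
        text = BeautifulSoup(page1.text, "html.parser").get_text()
        after = text[text.find("box"):]
        digits = "".join(c for c in after if c.isdigit())[:4]
        # steps 5-6: enter the digits in the form and submit again
        page2 = s.post(URL, data={"box": digits})
        # step 7: save the returned HTML as txt
        with open("crts_batch1.txt", "w") as out:
            out.write(page2.text)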
- Comment from Jake VDP : "Unfortunately I don't have much experience with doing
this sort of http request automation. I'd probably just search around Google
and stack overflow to see if there are clues anywhere...
I wonder if it might be easier to get in touch with some of the data maintainers
and try to get a tarball of relevant results directly from them? They might
actually prefer that to having you buzz their interface with automated requests"
- search in progress... (was this not used in mining GitHub for Andy's
Statistics class?)
------------------- QSO FROM SDSS AND CRTS -------------------------
3) Further steps:
- download the CRTS light curves for the quasars from the Stripe 82
sample, available from
http://www.astro.washington.edu/users/ivezic/cmacleod/qso_dr7/Southern.html
and determine tau and SFinf for them using *CRTS* data; we will
show that the CRTS-based values are very different from the SDSS-based
values (because the CRTS data are of lower quality)
- model the light curves using Chelsea's Fortran code:
---------------------- LIGHTCURVES FORTRAN ---------------------------------
Fortran code (qso.f) that fits a group of light curves as a DRW; it compiles
with
g77 qso.f -ffixed-line-length-none -o qsoexe
The reference for this code is Kozlowski et al. (2010). The light curves need
the format: time, magnitude, error (no header lines).
But first, the code needs a couple of input files (see the attached examples
for a list of two light curves):
1. process.dat: the first (header) line is the number of light curves being
fitted. Then one row for each light curve, where the first column is the
number of light curve data points; then columns for ra, dec, redshift, and
abs mag, which don't matter for the fits (just use the values in the example
file); then the light curve name.
2. a file called 'input': just the number 1, since you are only fitting one
band. (A hypothetical example of both files follows below.)
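A hypothetical example of these two files for a list of two light curves (all
values illustrative only):

    process.dat:
        2
        57  310.123  -0.456  1.234  -25.0  lc_qso_001.dat
        64  311.789   0.321  0.987  -24.1  lc_qso_002.dat

    input:
        1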
Currently, the Fortran code will only accept magnitudes with values between
-30 and 30, and data points spaced at least 0.5 days (or whatever time units
you use) apart. You can change these settings on lines 101 and 104 of qso.f.
The code evaluates the goodness of fit of the DRW via the parameters chi^2_DRW
and the likelihood ('Plike'). The following table (see its footnotes) explains
how to make the cuts in fit quality:
http://www.astro.washington.edu/users/cmacleod/qso/S82var_format_m
or you can also just see the paper.
The code outputs a number of fort.* files:
general statistics      --> fort.39
PRH method              --> fort.40
forecasting method      --> fort.41
Monte Carlo PRH         --> fort.50
Monte Carlo forecasting --> fort.51
bad epochs list         --> fort.80
The ones of interest are fort.39 (containing some of the fit quality
parameters) and fort.40 (containing the DRW parameters, tau etc.).
fort.40 columns (rows in the same order as in process.dat):
----
edge       : edge flag -- 1 on edge of grid, 2 too close to the edge to
             properly centroid the peak
mu         : average magnitude (best estimate from the PR method)
maxlike    : max likelihood
minchi     : min chi^2
minchired  : min chi^2/dof
sigma      : best sigma (log)
sig_err_lo : 1-sigma lower limit on sigma (log) (Bayesian)
sig_err_hi : 1-sigma upper limit on sigma (log) (Bayesian)
tau        : best tau (log)
tau_err_lo : 1-sigma lower limit on tau (log) (Bayesian)
tau_err_hi : 1-sigma upper limit on tau (log) (Bayesian)
tau_med    : median tau (log) (Bayesian)
sig_med    : median sigma (log) (Bayesian)
and the columns in fort.39:
----
index     : index
z_pr      : redshift
distmod   : distance modulus
mi_pr     : M_i
muinit    : initial estimate of the mean magnitude
chiconst  : chi^2 for fitting the light curve as a constant
chilinear : chi^2 for fitting it as a linear trend
Pnoise    : likelihood of the noise solution
Pinf      : likelihood of tau -> infinity
Plike     : PR likelihood
chinoise  : chi^2 of the noise solution
chiinf    : chi^2 of tau -> infinity
chi_pr    : chi^2 of the PR result
signoise  : log(sigma) of the noise solution
siginf    : log(sigma) of the infinite-tau solution
npts      : number of points in the light curve
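A minimal sketch of reading the quantities of interest back into Python,
assuming whitespace-separated columns in the order listed above:

    import numpy as np

    fort40 = np.loadtxt("fort.40")
    log_tau    = fort40[:, 8]     # best tau (log)
    log_tau_lo = fort40[:, 9]     # 1-sigma lower limit on tau (log)
    log_tau_hi = fort40[:, 10]    # 1-sigma upper limit on tau (log)

    fort39 = np.loadtxt("fort.39")
    plike = fort39[:, 9]          # Plike, used for the fit-quality cuts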
So, to produce Fig 5, I
1) generated light curves using the method in Kelly et al. 2009 (see their
appendix, and the sketch below) for a certain range in tau and SFinf, adopting
a certain cadence and errors;
2) fit these simulated LCs as a DRW, using the attached code, and made cuts in
light curve quality (excluding the blue and red dots in Fig 5) before comparing
the input and output parameters.
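A hedged sketch of step 1, using the exact DRW (Ornstein-Uhlenbeck) recursion
from the appendix of Kelly et al. 2009; the asymptotic variance is taken as
SF_inf^2/2, but conventions for SF_inf vary between papers:

    import numpy as np

    def simulate_drw(times, tau, sf_inf, mean_mag, rng=None):
        # exact OU-process update between irregularly spaced epochs
        rng = rng or np.random.default_rng()
        var_inf = 0.5 * sf_inf ** 2          # asymptotic variance
        mag = np.empty(len(times))
        mag[0] = mean_mag + rng.normal(0.0, np.sqrt(var_inf))
        for i in range(1, len(times)):
            decay = np.exp(-(times[i] - times[i - 1]) / tau)
            # conditional mean and variance after the time step
            mu_i = mean_mag + decay * (mag[i - 1] - mean_mag)
            var_i = var_inf * (1.0 - decay ** 2)
            mag[i] = rng.normal(mu_i, np.sqrt(var_i))
        return mag

    # e.g. tau = 152 days (observer's frame), SF_inf = 0.2 mag; add
    # photometric errors to 'mag' before fitting with qso.f:
    # t = np.sort(np.random.uniform(0.0, 3000.0, 60))
    # m = simulate_drw(t, tau=152.0, sf_inf=0.2, mean_mag=19.5)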
---------------------- LIGHTCURVES: JAVELIN ---------------------------------
Modeling the light curves using JAVELIN, the Zu et al. tool written in Python,
to model using:
- SDSS only
- LINEAR only
- SDSS and LINEAR
https://bitbucket.org/nye17/javelin
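A hedged sketch of fitting a single band with JAVELIN's continuum DRW model;
the interface (get_data, Cont_Model, do_mcmc) follows the package
documentation, but check it against the installed version:

    from javelin.zylc import get_data
    from javelin.lcmodel import Cont_Model

    # light curve file: time, magnitude, error (one epoch per line)
    data = get_data(["qso_sdss.dat"], names=["SDSS"])
    cont = Cont_Model(data)
    # MCMC over log(sigma) and log(tau); repeat for the LINEAR-only and
    # the combined SDSS+LINEAR light curves and compare the best-fit tau
    cont.do_mcmc(nwalkers=100, nburn=100, nchain=200, fchain="chain_sdss.dat")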