-
Notifications
You must be signed in to change notification settings - Fork 0
Tables code
The tables
folder contains the code needed to use the FCSQ CSVs to create the published tables.
This is currently a work in progress. The Index page is complete, Table 16 is almost finished and Table 11 only needs some improvements to the formatting. The other tables have not yet been started
- There are two main aspects to the code; arranging the data in the correct format, and adding the correct formatting around the data
- There is a template which must contain all of the tabs that are required in the output. This is imported from the S3 bucket together with the CSVs.
- The code then writes the data and formatting into the template to produce the published tables as an output.
- The whole process can be run using the
run_tables.R
file. - Each table had a script, and there is an additional data script for any tables with a dropdown option. There is also a script for the index tab.
- There is also a
functions.R
script whilst the repository is being created, but any functions created can be moved tofcsrap
later. - There is a
dropdown.R
script to add in the drop down options into the tables.
The run_tables.R
file is all that needs to be run to produce the tables (although still a work in progress at the moment).
The start of the file loads all of the packages which are needed, currently:
- dplyr
- mojrap
- openxlsx
- s3tools
- glue
The next section needs to be updated each quarter and contains the varables needed to produce the tables;
-
pub_year
andpub_quarter
determine the file the CSVs are imported from and the dates included in the tables -
pub_date
andnext_pub_date
are just used to add this information to the bottom of the index tab -
path_to_project
states the folder where the scripts for the processing are stored -
csv_folder
states the location of the CSVs that should be updated. This currently changes automatically whenpub_year
andpub_quarter
are updated.
The import
section currently loads the template Excel file. See Template.
The environment is now set up to load the custom functions in functions.R
and run the scripts for each tab of the worksheet. See Data and Formatting for more information.
The final step is to export the completed worksheet. The code currently just writes this to the local environment, but this can be updated to export the completed Excel file to the S3 bucket once the code is more developed.
The template for the tables is stored in alpha-family-data/Tables/template.xlsx
. It needs to contain all of the tabs needed in the final table; index, tables and source data for tables with lookups. Currently (April 2021) it contains the index tab, table 11, data for table 11 and table 16.
The code is used to add in any part of the tables which is updated each quarter eg. the date of publication and the data. Anything which always stays the same can be included in the template eg. table names and headers.
The input CSVs need to be processed to match the output format needed for the published tables. There are two formats that are used depending on whether or not there is a drop down menu on the tables.
For both types, the years and quarters need to be added in by the code. This is not included in the template because this would require the new quarter to be added manually every time.
Each script loads the CSVs needed for the individual tab. In future, it may be better to load all the CSVs together. Table 1 should be created towards the end as it contains data covering all case types which is included in other separate tables.
The data is added to the worksheet using openxlsx
, or functions developed using this package.
Where the tables do not contain a drop down option, the data needs to be grouped up to create the relevant columns for both the annual and quarterly data. See table16.R
as an example.
Where the table contains a drop down option, the data is stored in a different tab in the spreadsheet. It is in a CSV format with a lookup on the left hand side that can be used with VLOOKUP. There should be minmimal processing needed to get from the CSV to this format. Some columns in the CSV may need to be ignored, and any missing values need adding as zeros. See data11.R
as an example.
In the table itself, the formulae need to be written that will pull through the data. The years and quarters will also need adding. See table11.R
as an example.
The code doesn't simply paste the data into the worksheet, it can also add formatting including grouping quarterly data by year and adding footnotes.
write_formatted_table()
is stored in the functions.R
script and is based off a function added to mojrap
by Francesca Bryden. The change made is very minor. This function uses openxlsx
and enables the annual and quarterly data together with the footnotes to be added all at once, together with formatting. This is used for the tables which don't have a dropdown, and very little further work is currently needed.
In general, a format can be defined and then reused across all of the tabs. The main instructions that then need to be provided to openxlsx
functions are the sheet to update and the rows and columns which should be updated.
More work is needed on the tables with dropdowns, to ensure the notes are written to the correct width. Currently, they only merge as wide as the year/quarter columns they are pasted with.
If these are included in the template, then the formatting is lost. Instead, each drop down that is needed needs to be added to the code as three parts.
- Write a list of the dropdown options that should appear. This needs to be stored somewhere out of the way on the tab.
- Format the list so that it appears with white text.
- In the cell for the dropdown, add data validation based on the list.
The code for these is stored in the dropdowns.R
script.