---
layout: page
title: "Part 3: Which Data to Open?"
---
The City has two goals when considering which data to release: 1) release strategically important, proven-valuable datasets to garner ROI; and 2) create a comprehensive and sustainable release process for every department. In the early stages of the City’s open data program, a balance must be struck between the two.
In 2015, the City will be pursuing a two-phase roll-out of the open data program: 1) in Q1 and Q2, collaboration between the Mayor’s Office and the departments on implementing the Open Data Roadmap; and 2) in Q3 and Q4, department-led creation of an Open Data Inventory and Publication Calendar.
These dual objectives should result in near-term impact and long-term sustainability.
The City of Los Angeles benefits from the experience of other cities that have experimented with open data in recent years, both opening various datasets and engaging their communities to gauge the response.
The Mayor’s Open Data team has analyzed the open data ecosystem along the following lines:
- Which datasets have other cities opened, and how popular are they?
  - Particular focus was placed on New York and Chicago, which have both active communities and long-standing programs.
- What apps and innovations built on open data exist in other cities?
- In what areas do data standards exist, indicating a broader international network of collaborating government agencies?
Based on this analysis, the Mayor’s Office has identified roughly 140 datasets (about 2-5 per department) that are considered High Value. This list gives us insight as we explore which datasets are next for publication. The Mayor’s Office will work with the department Open Data Coordinators for feedback on this list to finalize the Open Data Roadmap, and will seek to release all data identified within it by the end of Q2.
The Mayor’s Executive Directive requires that the City release all of its data, barring security or privacy concerns. This demands a rigorous and thorough assessment of the data within each department. To that end, each department is asked to create a Data Inventory, as described below.
Start with a source system inventory: to identify high-quality datasets for release, you must first understand what you have. Your data may be housed in a variety of places, from databases to shared drives to private folders. Creating a source system inventory, followed by a comprehensive data inventory, will allow you to determine how to release your data strategically.
Note: ITA maintains a list of currently supported applications, which is available in the Open Data Catalog for approved staff. This list provides a good starting point, but it should be vetted against the Data Coordinators’ and department leadership’s sense of the broader, current landscape.
- What information systems does your department use?
- What databases does your department use?
- What applications capture information or are used in your business processes?
- Are some data resources kept in spreadsheets (on shared or individual drives)?
- What information is your department already publishing, and where did that information come from?
Once your source systems are identified, you can begin cataloging the possible datasets within each system. See Appendix A for a list of fields that should be completed for each dataset identified.
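As a rough sketch of what a single inventory entry might capture, the snippet below defines a minimal record in Python. The field names here are illustrative placeholders only, not the actual Appendix A fields; use your department's official inventory template for the real list.

```python
from dataclasses import dataclass

# Minimal sketch of a data inventory record. The field names below are
# illustrative placeholders -- use the fields listed in Appendix A for
# the official inventory.
@dataclass
class InventoryEntry:
    dataset_name: str       # e.g. "Building Permits Issued"
    source_system: str      # database, application, or shared drive
    owning_division: str    # unit responsible for maintaining the data
    update_frequency: str   # e.g. "daily", "weekly", "monthly"
    contains_pii: bool      # flags records that need scrubbing or review
    description: str = ""   # short plain-language summary

# Example entry for a hypothetical dataset
example = InventoryEntry(
    dataset_name="Building Permits Issued",
    source_system="Permit Tracking Database",
    owning_division="Permitting",
    update_frequency="weekly",
    contains_pii=False,
    description="Permits issued by type, location, and date.",
)
print(example.dataset_name, "-", example.source_system)
```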
Once you have determined what exists in your source systems and completed your data inventory, your department should decide which items in the inventory to prioritize for publishing. To do this, begin by grouping your inventory into three categories: passive, active, and strategic. A high-quality data publishing practice contains a consistent mix of all three categories, published at regular intervals as detailed below:
|  | Passive Publishing | Active Publishing | Strategic Publishing |
|---|---|---|---|
| Data Location | Publicly available in digital or print format | Unpublished but shared internally upon request | Unpublished but shared internally upon request |
| Data Security Level (High, Medium, Low) | Dataset contains no Personally Identifiable Information (PII) or sensitive content | Data may need to be scrubbed for PII or sensitive information | Data may need to be scrubbed for PII or sensitive information |
| Data Cleaning | Dataset can be generated in less than one day | Requires legal and executive review prior to publishing | Requires legal and executive review prior to publishing |
| Level of Use | Data is frequently accessed in other formats | Dataset is frequently requested internally and through Public Records Act or FOIA requests | Dataset supports the strategic plan or other policy-related documents |
| Publishing Frequency | Published at consistent, frequent intervals, such as twice per month | Published at consistent, intermediate intervals, such as once per month or twice per quarter | Published at consistent, infrequent intervals, such as once per quarter or twice per year |
| Communications Effort | Publishing is accompanied by social media outreach | Publishing is accompanied by a press release and social media outreach | Publishing is accompanied by a robust communications strategy, event, or publication |
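To make the categories concrete, here is a minimal sketch of how a department might tag each inventory entry as passive, active, or strategic using the criteria in the table above. The attribute names and decision rules are illustrative assumptions, not part of the City's official process.

```python
# Rough classification sketch based on the criteria in the table above.
# The keys used here ("already_public", "contains_pii", "needs_legal_review",
# "supports_strategic_plan") are illustrative assumptions, not official fields.
def categorize(entry: dict) -> str:
    if entry["already_public"] and not entry["contains_pii"]:
        return "passive"    # already public, no PII, quick to prepare
    if entry["needs_legal_review"] and entry["supports_strategic_plan"]:
        return "strategic"  # policy-related data requiring careful review
    return "active"         # frequently requested, may need scrubbing

# Example: a hypothetical dataset tied to the strategic plan
print(categorize({
    "already_public": False,
    "contains_pii": True,
    "needs_legal_review": True,
    "supports_strategic_plan": True,
}))  # -> strategic
```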
Finally, weigh the criteria captured in the inventory (potentially scoring each on a 1-5 scale) and develop a ranking. Once the datasets are ranked, you can begin building out your publishing calendar for the year. Publishing at a consistent rate allows the open data team to budget time for data preparation and proper coordination with the communications or public affairs team, and ensures higher quality data. Here is a sample publishing calendar:
![calendar]({{ site.baseurl }}/public/img/calendar.png)
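As one way to carry out the scoring and ranking step described above, the sketch below averages 1-5 scores for a few hypothetical datasets and spreads the highest-ranked ones across a simple monthly schedule. The criteria names and the one-release-per-month pacing are assumptions; each department should substitute its own rubric and cadence.

```python
# Minimal sketch: rank datasets by their average 1-5 score and assign
# release months. The scoring criteria and monthly pacing are illustrative.
inventory = [
    {"name": "Building Permits",  "scores": {"demand": 5, "readiness": 4, "impact": 4}},
    {"name": "Street Closures",   "scores": {"demand": 4, "readiness": 5, "impact": 3}},
    {"name": "Budget Line Items", "scores": {"demand": 3, "readiness": 2, "impact": 5}},
]

def average_score(entry):
    scores = entry["scores"].values()
    return sum(scores) / len(scores)

# Highest-scoring datasets are published first.
ranked = sorted(inventory, key=average_score, reverse=True)

months = ["January", "February", "March", "April", "May", "June"]
for month, entry in zip(months, ranked):
    print(f"{month}: {entry['name']} (score {average_score(entry):.1f})")
```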