Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

exporting data to powerBI, ESRI, etc #350

Open
hunterowens opened this issue Oct 1, 2020 · 8 comments
Open

exporting data to powerBI, ESRI, etc #350

hunterowens opened this issue Oct 1, 2020 · 8 comments
Labels
civis civis data science platform work for-civis Issues for the Civis team to address

Comments

@hunterowens
Copy link
Contributor

Currently, the ITA team and Planning PMU team have been working on getting a project done regarding planning entitlements.

This project consists of 3-4 source datasets, some static (Census, etc) and a couple dynamic (PCTS - Planning Case Tracking System).

We have a number of cleaning scripts (see src) and a custom package for subsetting this data (see laplan). We'd like to expose this to end users in the planning performance management section (Power BI, Excel, GDrive) and the GIS unit (Arc) to start.

Here are the options I see, but curious if Civis has worked with this before.

  1. Save Static Files to S3, SFTP, or Similar
  2. Expose Database Connection
  3. Expose Python API

Ref

@hunterowens hunterowens added civis civis data science platform work for-civis Issues for the Civis team to address labels Oct 1, 2020
@hunterowens
Copy link
Contributor Author

cc @tiffanychu90 @trantom

@hunterowens
Copy link
Contributor Author

@tiffanychu90
Copy link
Contributor

Specifically, this notebook is one where City Planning would probably want as a PowerBI report.

Cell 4 is where we're at a bit of an impasse:

  1. It loads several processed/cleaned tables. --> We're more familiar with how to schedule these processed/cleaned tables in Civis.
  2. It joins these tables and aggregates. --> We need help with this step, because we're relying on Python functions to do it, and the resulting table is what is backing the interactive map.
  3. It puts the results on a map. Users can then filter by year or categories of interest. --> PowerBI report

@ian-r-rose
Copy link
Contributor

Took a look at python/R integration into powerbi today. In theory you can load a pandas/dplyr dataframe and hand it off to a dashboard element. In practice, it really doesn't seem like it works well at all. It does a really bad job of finding a local environment, has ugly shims that break scripts, and I was more-or-less unable to get anything to work.

There are connectors for web URLs and Redshift -- those may ultimately be better options

@trantom
Copy link

trantom commented Oct 5, 2020

I would say using a default connector would significantly reduce the overhead by PMU to access this data and others who use PowerBI. The other default connectors can also support some of the DBMS that Planning has as well I think it is a matter of getting a current environment description and agreement on who may be supporting it after the transfer through working with David Terukina and Mony's teams on the systems side. Our GIS side has data on SQL Server and the spatial data for the web gis on an ArcGIS data store.

@ian-r-rose The Python integration for me seems more for ad-hoc reports and less for live or dynamically updated ones. I think Python visualizations should be kept in their native environment since trying to work in Python with PowerBI is additional setup overhead we want to abstract away. Also, there are data input limits and no cross-filtering which is a key feature of PowerBI reports. Microsoft spells out the limitations here.

@hunterowens
Copy link
Contributor Author

Pipeline test from Civis -> Google Sheets -> PowerBi worked

@ian-r-rose
Copy link
Contributor

Pipeline test from Civis -> Google Sheets -> PowerBi worked

Very nice

@trantom
Copy link

trantom commented Oct 21, 2020

I was also able to successfully run the civis-df-to-drive-test.py locally instead of on platform to the Scratch schema on the postgres cluster and then on Civis export and overwrite a Google sheet. However the main caveat is the limit 5,000,000 cells.

100,000 records x 22 columns was okay
200,000 records x 22 columns gives an error even though it's not 5,000,000 cells yet.

Export output log:

Failed to sync some datasets. The following errors were encountered:
Too many failures uploading to Google Drive, please rerun or check job settings, the most recent failure error was: Error appending to worksheet named: Sheet1: badRequest: This action would increase the number of cells in the workbook above the limit of 5000000 cells.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
civis civis data science platform work for-civis Issues for the Civis team to address
Projects
None yet
Development

No branches or pull requests

4 participants