How to populate installations with "real world" sample data #5235
Comments
@matthew-a-dunlap @mheppler @pdurbin @scolapasta Now that we live in a post #4990 world, I'd be interested to hear y'all's thoughts (and the thoughts of any other test data enthusiasts out there) on what the next step here is. Is there enough that we should start working on this or do we need more discussion? I'll bring it to backlog grooming tomorrow to remind us to discuss. |
@pdurbin is the local expert on scripting test data. He and I have discussed our hopes and dreams for this feature, and I think we're on the same page. He mentioned there are plans to include dataset scripts in the “dataverse-ansible” repo. Here is more about the Ansible configuration management tool from the Choose Your Own Installation Adventure pg of our Installation Guide.
@donsizemore has contributed most of the Ansible installer, and I believe had plans for also building on this "test data" script solution. |
Hey @donsizemore - we tried to estimate this today and I heard you may be working on this already. Can you leave a comment or some links here? Thank you! |
IQSS/dataverse-ansible#37 is the issue and https://github.com/IQSS/dataverse-ansible/tree/37_sample_data is the branch. As I reported at http://irclog.iq.harvard.edu/dataverse/2019-01-29#i_85802 I tried (as of IQSS/dataverse-ansible@945e1c1 ) and got this error:
I tried to set expectations during sprint planning today that we're going to want to iterate on this. We very much appreciate @donsizemore working on this and we're happy to retest when he's ready. Once we get something working, we'll probably have feedback on how the data looks, etc. |
@donsizemore as I mentioned in IRC, I'm out next week so I hope you don't mind if I assigned this issue to you. It sounds like you're making great progress. Thanks so much for working on this! At standup I gave @djbrooke a heads up that you'll ping him when you're ready for someone to try your "37_sample_data" branch in dataverse-ansible (pro tip: be sure to tweak your local copy of ec2-create-instance.sh to check out the right branch!). Oh, I'll attach here the main.yml file I've been using: main.yml.txt Call it with this:
Huh, I just noticed a typo at 3bdf6c9 (-b instead of -g) so I'll make a pull request to fix it. |
I believe I have the plumbing for this working in dataverse-ansible. I don't have that much sample data, but the idea is this: if the ansible sampledata group_var is set to true, check for existing sampledata in a given location, or use your own. As-is, ansible first checks a dataverses/ subdirectory for json files, then creates users from another subdirectory, then finally checks for *.sh in a datasets/ subdirectory. The shell script(s) would assign permissions, create datasets and upload files. You're welcome to kick the tires (or me) via my 37_sample_data branch. |
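For readers who want to see the ordering spelled out, here is a minimal sketch in Python. It is not the Ansible role itself; it only lists what would be processed and in which order. The function name, the step labels, the `users` directory name, and the assumption that users are stored as JSON files are all hypothetical, since the comment above only says "another subdirectory".

```python
from pathlib import Path


def _files(directory, pattern):
    """Return sorted matches for pattern, or an empty list if the directory is missing."""
    return sorted(directory.glob(pattern)) if directory.is_dir() else []


def plan_sample_data(root, sampledata=True, users_dir="users"):
    """Return (step, path) pairs in the order described in the comment above.

    Assumptions (not from the thread): `users_dir` is a placeholder name and
    user definitions are taken to be *.json files. The step labels are
    illustrative only and do not correspond to real Ansible task names.
    """
    if not sampledata:
        # Mirrors the sampledata group_var being false: do nothing.
        return []
    root = Path(root)
    plan = []
    # 1. Dataverse definitions: JSON files under dataverses/
    plan += [("create_dataverse", p) for p in _files(root / "dataverses", "*.json")]
    # 2. Users, from a separate subdirectory (name assumed)
    plan += [("create_user", p) for p in _files(root / users_dir, "*.json")]
    # 3. Shell scripts under datasets/ that assign permissions,
    #    create datasets and upload files
    plan += [("run_script", p) for p in _files(root / "datasets", "*.sh")]
    return plan
```

Calling `plan_sample_data("/path/to/sampledata")` returns the dataverse files first, then users, then dataset scripts, so a dataset script can rely on its parent dataverse and users already existing.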
Thanks @donsizemore! This is really important for Dataverse. I'll move this over to code review. |
Actually, I went ahead and spun it up at http://ec2-3-80-234-144.compute-1.amazonaws.com so a volunteer or two can say if we're done with the content. Here's a screenshot: |
Moving back to Team dev. What's left here:
|
@TaniaSchlatter @mheppler @pdurbin thanks for discussing this earlier today. I made the changes for the first four bullets in PR IQSS/dataverse-sample-data#1. I decided to keep Eleni's datasets in, but I did adjust the keywords from test and test2 to something more real-world. |
OK, PR mentioned above has been merged. Thanks @pdurbin for the review! @pdurbin can you spin up a branch for @TaniaSchlatter to take a look at the most recently-added files? |
hey @djbrooke regarding the last bullet above, what would help there? there's dataverse-ansible documentation for sample-data in its README.md, but some of this is fed through the EC2 script which Phil just had me move over to the dataverse-ansible repo. I could beef up README.md in some prescribed way, or would you want a developers' workflow, with examples for EC2, Vagrant, or a local install? there isn't currently a dataverse-ansible Dockerfile, but there could be. I'm personally leaning toward a dev guide / workflow? |
Hey @donsizemore thanks for checking in and for the offer of help! In regards to what would be the most helpful here, I'll defer to @mheppler. He tried to get this running when @pdurbin was out and was not successful. More documentation in general is always welcomed, but Mike may have some specific trouble spots that he ran into that could be targeted for additional docs. Thanks again! |
@mheppler @djbrooke and I are planning to meet at 3 to go through the README at https://github.com/IQSS/dataverse-sample-data . I can try to improve the README if it's confusing. 😄 We can use Dataverse running on laptops or the instance I spun up for this issue: http://ec2-3-80-234-144.compute-1.amazonaws.com . We can also try the |
Thanks @pdurbin and @mheppler for the helpful meeting about this. I was able to successfully set up sample data on the instance mentioned above. I need to remove the data files for larger states (IL) and replace them with data files for smaller states (WY). After updating this, I'll destroy what's currently there and re-create it with the data for the smaller states. |
OK, I was able to successfully destroy and re-create all sample data from the command line. If these instructions work for lowly me, they should work for anyone. :) Moving to code review. Passing to @TaniaSchlatter for review of the data on http://ec2-3-80-234-144.compute-1.amazonaws.com. We can make further adjustments to the test data that's available, but I'm pretty happy with the code where it is. |
We could also review what I wrote for http://guides.dataverse.org/en/4.15.1/developers/tips.html#sample-data and add to it if we want. |
Comments from review (+ looking at the April 25 list above):
Regarding the text in the guide: recommend adding a use case about using the test data on a server that is not the main installation. |
Thanks @TaniaSchlatter, I'll checklist-ize this and work through the ones that I can. I'll work with the team on others and also discuss some with you.
Issues for Danny to update
Issues to discuss
Issues for @djbrooke to add
|
I'm going to close this issue. @pdurbin announced the sample data repo's existence on the Google group and it's in a state where people in the community can start using it: https://groups.google.com/forum/#!topic/dataverse-community/u-Yv0U3v4Bo We created individual issues in the dataverse-ansible and sample data repos for the follow-on tasks to make this useful to more groups (the design team, specifically):
We'll take these through our usual development process. Thanks everyone for the hard work on this issue, this is great to have!! |
With Dataverse we need better means to populate test data. Current methods, such as our existing scripts and dumping a database, work but they are flawed. This has gained greater importance with our ability to automate deployment (via #4990), as these newly spun-up environments come with no data. The easier and more open we make this, the easier it will be for people to demo and develop on Dataverse.
Specific needs:
Possible solutions:
scripts/deploy/phoenix.dataverse.org/post
automation for populating more normal data