This repository has been archived by the owner on Jan 3, 2018. It is now read-only.

revisions to the open science lesson plan #851

Open: wants to merge 3 commits into master

@cboettig

These revisions follow the proposal discussed in #712.

Given the potentially more subjective nature of this particular lesson, I found some of the revisions challenging to implement. I have tried to focus the lesson on providing students with the information they would need if they chose to make data or software openly available, rather than trying to convince them of what they should do. Please let me know if this leaves any part inadequately motivated, or if there is anywhere I have been too prescriptive on a subjective topic.

Even though I've avoided the cultural/philosophical discussion, the lesson is still not particularly hands-on. I'm curious whether people feel that students would be better served by a tutorial in which they actually deposit data or share code on a particular repository, license and all. The downside of that approach is that it would disproportionately emphasize one particular repository over the others (which is why I have avoided a specific example).

I have tried to keep the lesson to the length of the original, but since I haven't taught that version, I'm not sure how long it takes. I imagine instructors might use this lesson rather flexibly, skimming through the essentials very quickly if running behind schedule. I've tried to be concise and rely on external links for elaboration, but some parts could nevertheless be tightened up.

I've marked a few points with inline HTML comments where specific feedback would be helpful.

@gdevenyi
Contributor

You have a bunch of comments embedded in your changes. Could you please remove those comments and describe them in the PR. Show us what you want to do, so we can comment on the content directly.

@cboettig
Author

@gdevenyi Yes, as I stated above, I have marked a few lines in my edits with HTML comments where specific feedback is needed. These were:

  • L18 I commented out the opening quote and explained my reason for doing so, and likewise explained my reason for removing the opening story/vignette.
  • L43 Should I have some disclaimer explaining the focus of the lesson is more on the how than the why?
  • L154 I briefly mention language-specific archives for distributing software, such as CRAN for R. I'm aware Python has several package management systems, but I'm not sure which, if any, would be appropriate to mention here as what researchers use to distribute Python software.
  • L168 Mention gitlab?
  • L186 Should I introduce the open data section more explicitly (e.g. again focus is on how, more than why), or just jump in?
  • L242 Is the section following this too long? Should it simply be removed?

I used HTML comments because I've found this keeps things more in context (line numbers change during further edits; the line references above refer to the original) and doesn't require a separate window. My apologies if that is not the preferred style.

anyone who distributes a modified version of the code (or anything
that includes GPL'd code) must make *their* code freely available as
well. Lacking this clause, code under permissive licenses can be more
immediately used in commercial software.

The implications of this choice are not going to be immediately apparent without some additional material. Perhaps point to some additional resources on the debate, e.g., John Hunter's 'BSD pitch': http://nipy.sourceforge.net/nipy/devel/faq/johns_bsd_pitch.html

@arokem

arokem commented Nov 11, 2014

Some institutions have local resources, and it might be worth tailoring the lesson accordingly, possibly even doing a short practical exercise in data sharing with those local resources. For example, if I were teaching this lesson at Stanford, I would point out the SDR (sdr.stanford.edu), and maybe even show students (in collaboration with the SDR folks) how to deposit their data there (e.g., what metadata they need for the deposit).

@cboettig
Author

@arokem thanks for the suggestions, this all sounds great! Any thoughts on the other queries I mentioned in this thread before I start making changes?

Yup, I'd imagine some instructors might want to customize this to be a bit more hands-on and to walk through examples of data repositories relevant to their specific audience. I figured that would be a better approach anyway than shoehorning a walkthrough of a very generic repository into the lesson here. Also, given that the original lesson I modified didn't have any tutorial element, I figured it would be better to leave that out. Happy to add something if that's the consensus, but I'd find it hard to do so without making the lesson take considerably longer to cover than the original.

@strasser

@cboettig This is such a great lesson! I'm excited to point people to it. A few comments:

Re. licensing data:

Just because data aren't subject to copyright doesn't mean they aren't subject to intellectual property claims of some kind. Most institutions claim ownership of datasets. This means that (in the case of the UC) you aren't ALLOWED to use a CC0 waiver on data you produce, because technically you don't have any rights (copyright or otherwise) to waive: the data belong to the Regents of the UC.

I would probably fix this by adding some language about the complicated questions of data governance: although people should err on the side of permissive licensing (e.g., CC BY 4.0 or CC0), they should be aware that their institution may have ownership clauses they don't know about. Of course, some institutions have a bit of a "don't ask, don't tell" policy when it comes to data licenses these days. It's basically the wild west out there.

Re. the scientific data repositories section:

I would argue that the word "scientific" isn't useful since (for example) figshare doesn't have any requirements on the content being scientific. Similarly institutional repositories take all kinds of data.

Institutional repositories deserve a mention here, too, since they might take on a larger role in this space due to new funder requirements and projects like SHARE. (@arokem - Stanford librarians would be so happy you mentioned the SDR!)

This is a potentially nitpicky point, but the division into "with pubs" and "without pubs" for data continues to encourage people to think about traditional scholarly incentives (papers). What if you divide into disciplinary versus general instead?

Pointing people to a publisher (Nature's Scientific Data) shouldn't be the only place to look for repository help. I would also suggest searching Databib or re3data.

I would suggest changing the URL for DOI lookup to dx.doi.org.

@cboettig
Author

Carly, thanks, these are all awesome suggestions. I'd love to lean on your expertise a bit more, if I can ask for some more suggestions:

Are there any resources we can point people to for making sense of the data ownership issue? (A similar problem arises with software, where the trickiest part isn't which license to choose but whether the choice is up to you at all. I tried to point people to check with their institution, though that feels a bit like a cop-out...)

Yeah, good point on 'scientific'; I'm usually better about avoiding that. (I guess there's no risk of 'data repository' being confused with non-academic data hosting such as Dropbox, right?)

Also, very good point about the division into 'pubs' and 'non-pubs'; subject/content-specific versus general is probably a better division. Though since there are so many ways to skin the cat, maybe it's better to just highlight the various axes of differentiation without trying to draw specific groups: general/specific, post-publication only/agnostic to publication, and perhaps fee-charging/free? There are always other things to consider too (available metadata, recognition identifier, indexing), but that's clearly a rabbit hole we're not ready to go down here.

Yeah, I didn't mean to indicate that publishers should be the definitive source; I was just trying to be concise. I was originally going to point to the Databib list on the DataCite website, but to be honest, the embedded Google Doc is far less accessible than the more discursive one, and it doesn't seem to include all the DataCite DOI repositories (Zenodo?).

Is dx.doi.org still preferred over doi.org, then? I didn't realize; will fix!

@strasser

Are there any resources we can point people to trying to make more sense of the data ownership issue?

Alas, it's about the same as for software. In theory people should check with their institutions first and foremost, but in practice I encourage people to go ahead and slap as liberal a license as possible on their data, and follow the "ask forgiveness, not permission" adage.

yeah, good point on 'scientific', I'm usually better about avoiding that.
(I guess there's no risk about 'data repository' being confused with
non-academic data hosting / dropbox etc, right?)

I would hope that there's no risk of confusion there, but maybe using the word "preservation" somewhere would help distinguish.

Also very good point about the division of 'pubs' & 'non-pubs';
subject/content specific and general is probably a better division. Though
since there's so many ways to skin the cat, maybe it's better to just
highlight the various axes of differentiation without trying to draw
specific groups? general/specific, post-publication only / agnostic to
publication, and perhaps fee-charging / free? Guess there's always other
things to consider too (available metadata, recognition identifier,
indexing, clearly that's a rabbit hole we're not ready to go down here
though).

I think it's a great idea to just highlight the axes. Certainly the pub/non-pub dichotomy is something that gets discussed, so no reason to leave it out. Maybe just include those other axes in your discussion. Axes I think are important:

  • general / specific
  • post-publication / agnostic
  • institution / non-institution (e.g., commercially owned)
  • free / not free
  • discipline-specific / agnostic
  • open to anyone / not

Yeah, didn't mean to indicate that publishers should be the definitive
source, just trying to be concise. Was originally going to point to the
databib list on DataCite website, but to be honest the embedded google doc
is far less accessible than the more discoursive doesn't seem to have all
DataCite DOI repos (zenodo?)

Agreed, I think the DataCite/Databib list isn't quite up to snuff yet. Maybe keep the reference to Nature's Scientific Data, but also mention re3data and perhaps checking for institutional repositories?

Is dx.doi.org still the preferred over doi.org then? Didn't realize -- will fix!

dx.doi.org has the advantage of a nice search box where you can plop your DOI and get immediate access to the object. doi.org is more of an informational website.

@gvwilson self-assigned this Jul 26, 2016
@gvwilson removed their assignment Apr 26, 2017