Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Patch 1 #54

Open
wants to merge 4 commits into
base: gh-pages
Choose a base branch
from
Open

Patch 1 #54

wants to merge 4 commits into from

Conversation

bpnx
Copy link

@bpnx bpnx commented Sep 13, 2017

Hello, I think that 3 examples should be changed: after a "uniq -c" command a "-n" flag on sort command is very useful.
About sed command: I think that removing spaces or normalizing them is a very common need and the example using sed could be appreciated, in my opinion

@drjwbaker
Copy link

FYI: I have a bigger PR coming on this (probably next week)

Copy link

@weaverbel weaverbel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why sed is being proposed instead of tr - what is the rationale? The sed command as written looks intimidating to a new user, whereas the tr command looks easily parseable. I won't merge this until I know why we would make this change.

@bpnx
Copy link
Author

bpnx commented Sep 22, 2017

Hello, sed is used because that tr command generates blank lines in case of repeated spaces, and normally it's not good; in a single command sed substitutes spaces and tabs, repeated and mixed a gives a clean ouput. Since basic regular expressions are covered on episode 02 and sed is used in this episode, I think that it's useful for people to see that example of sed usage. I don't think it's intimidating, no more than:
grep -Eo '\d{4}-\d{4}' 2014-01_JA.tsv
(taken fron episode 02)

@weaverbel
Copy link

OK, I suppose this makes sense. I can merge if other people agree. @drjwbaker ?

@drjwbaker
Copy link

Agreed. Looks a great change.

@danmichaelo
Copy link
Contributor

The command looks intimidating, but agree that it's not worse than other stuff in this episode. It's also more realistic, and perhaps it's good to stress regexps rather than introducing yet another command (tr, which I at least never use in practice myself). So +1

Perhaps change "repeated space" to "repeated whitespace"?

Also, is it necessary/correct to escape + as \+ here?

@bpnx
Copy link
Author

bpnx commented Sep 25, 2017

Hello, I'm changing "repeated space" to "repeated whitespace", thank you.
Escaping + is necessary, otherwise sed treats it as a simple character (I'm sure someone knows why :-) )

changed "repeated space" into "repeated whitespace"
@danmichaelo
Copy link
Contributor

danmichaelo commented Sep 25, 2017

Ouch, \s doesn't work on BSD/Mac: https://superuser.com/a/112837/56154

Even if we use the [[:space:]] group instead, it still fails to insert newlines: https://stackoverflow.com/questions/10748453/replace-comma-with-newline-in-sed

Test:

$ echo "Hello world" | sed 's/[[:space:]]/\n/g'
Hellonworld

Perhaps we need to stick with tr anyways?

@bpnx
Copy link
Author

bpnx commented Sep 26, 2017

Hello, it's because of different sed versions, the one I've used in example is GNU sed, installed on
every linux distribution: it supports extended regular expressions and has GNU extensions (often supported) . Unfortunately I don't have a Mac to test; have you tried with "-E" or "-r" options (if available) ? Or you could install gnu-sed package and it should work.
However, to be more portable, the sed part could be rewritten into:
sed 's/[[:space:]]+/ /g' | tr ' ' '\n' | sed '/^$/d'
First it changes (repeated) whitespaces and tabs into a single space, then tr changes them into newlines and then the final sed removes all blank lines (for rows with spaces/tabs at the beginning or at the end)
Or
tr ' \t' '\n\n' |sed '/^$/d'
simple, it should work anywhere, even with older sed versions, but students don't see the use of sed to remove repeated spaces and tabs.

@bpnx
Copy link
Author

bpnx commented Sep 26, 2017

Forgot to say: thank you for your analysis!

@danmichaelo
Copy link
Contributor

I'm sorry, it's really annoying when you find a nice example command, just to learn that it doesn't work cross platform.. Anyways, -E is available, but didn't help. Unless we add GNU sed to our install requirements, we probably have to live with this.

Can confirm that this works:

sed -E 's/[[:space:]]+/ /g' | tr ' ' '\n' | sed '/^$/d'

(Had to include -E in the first sed)

The last example also, not sure which one is the best to teach..

@bpnx
Copy link
Author

bpnx commented Sep 26, 2017

I forgot to set the "-E" flag, sorry.
About the last example: in my environment also
tr [:space:] '\n'
works (note the single \n), I think in BSD/Mac should work too
Please let me know what example you think it's better to teach, so I change the PR
Thank you

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants