Using the bash shell
====================
> **note**
>
> Some of the material in this tutorial was adapted from Chris
> Paciorek's [2014 Statistics 243 lecture notes on
> Bash](https://github.com/berkeley-stat243/stat243-fall-2014/blob/master/units/unit2-bash.pdf)
> and his [2014 Statistics 243 lecture notes on using
> R](https://github.com/berkeley-stat243/stat243-fall-2014/blob/master/units/unit4-usingR.pdf).
>
> Before reading this, you will want to be familiar with the material in
> the "Basics of UNIX" tutorial and screencast here:
> <http://statistics.berkeley.edu/computing/training/tutorials>
1) The Interactive Shell
---------------------
The shell is an interactive computer programming environment. More
specifically, it is a read-evaluate-print loop (REPL) environment. R and
Python also provide REPL environments. A REPL reads a single
*expression* or input, parses and *evaluates* it, *prints* the results,
and then *loops*.
> **note**
>
> I will use a `$` prompt for bash, a `>` prompt for R, a `>>>` for
> Python, and a `In [1]:` prompt for IPython. By convention, a regular
> user's prompt in bash is `$`, while the root (or administrative)
> user's prompt is `#`. However, it is common practice to never log on
> as the root user. If you need to run a command with root privileges,
> you should use the `sudo` command (see the *Getting started* section
> below for more details).
When you are working in a terminal window (i.e., a window with the
command line interface), you're interacting with a shell. There are
multiple shells (e.g., *sh*, *bash*, *csh*, *tcsh*, *zsh*, *fish*). I'll
assume you are using *bash*, as this is the default for the SCF machines,
Savio, and most Linux distributions (it was also the default on Mac OS X
until macOS Catalina switched to *zsh*). However, the basic ideas are
applicable to any Unix shell.
The shell is an amazingly powerful programming environment. From it you
can interactively monitor and control almost any aspect of the OS and
more importantly you can automate it. As you will see, **bash** has a
very extensive set of capabilities intended to make both interactive as
well as automated control simple, effective, and customizable.
> **note**
>
> It can be difficult to distinguish what is shell-specific and what is
> just part of UNIX. Some of the material here is not bash-specific but
> general to UNIX.
>
> Reference: Newham and Rosenblatt, Learning the bash Shell, 2nd ed.
### 1.1) Getting started
I assume you already have access to a basic bash shell on a computer
with network access (e.g., the Terminal on a Mac, the Ubuntu subsystem on Windows, or a Linux machine). You should also have ssh installed. SSH provides an
encrypted mechanism to connect to a remote Unix-based (i.e., Linux or Mac) terminal. To learn more
about using ssh to connect to the SCF machines and general tips about
using ssh on various operating systems, see:
<http://statistics.berkeley.edu/computing/ssh>
To ssh to another machine, you need to know its (host)name. For example,
to ssh to `arwen.berkeley.edu`, one of the SCF machines, you would:
$ ssh arwen.berkeley.edu
Password:
At this point you have to type your password. Alternatively, you can set
up ssh so that you can use it without typing your password. To learn how
to set this up, see: <http://statistics.berkeley.edu/computing/ssh-keys>
If you have a different username on SCF machines, you will need to
specify it as well. For example, to specify the username `jarrod`, you
would:
$ ssh jarrod@arwen.berkeley.edu
If you want to view graphical applications on your local computer that
are running on the remote computer you need to use the `-X` option:
$ ssh -X jarrod@arwen.berkeley.edu
Alternatively, if you want to copy a file (`file1.txt`) from your local
computer to `arwen.berkeley.edu`, you can use the `scp` command,
which securely copies files between machines:
$ scp file1.txt jarrod@arwen.berkeley.edu:.
The above command will copy `file1.txt` from my current working
directory on my local machine to `jarrod`'s home directory on
`arwen.berkeley.edu`. The `.` following the `:` indicates that I want
to copy the file to my home directory on the remote machine. I could
also replace `.` with any relative path from my home directory on the
remote machine or I could use an absolute path.
To copy a file (`file2.txt`) from `arwen.berkeley.edu` to my local
machine:
$ scp jarrod@arwen.berkeley.edu:file2.txt .
I can even copy a file (`file3.txt`) owned by one user (`jarrod`) on one
remote machine `arwen.berkeley.edu` to the account of another user
(`jmillman`) on another remote machine `scf-ug02.berkeley.edu`:
$ scp jarrod@arwen.berkeley.edu:file3.txt jmillman@scf-ug02.berkeley.edu:.
If instead of copying a single file, I wanted to copy an entire
directory (`src`) from one machine to another, I would use the `-r`
option:
$ scp -r src jarrod@arwen.berkeley.edu:.
Regardless of whether you are working on a local computer or a remote
one, it is occasionally useful to operate as a different user. For
instance, you may need root (or administrative) access to change file
permissions or install software. (Note this will only be possible
on machines that you own or have special privileges on; the Ubuntu
subsystem for Windows is one way to have a virtual Linux machine
for which you have root access.)
For example on an Ubuntu Linux machine (including the Ubuntu subsystem for Windows),
here's how you can act as the 'root' user to update or add software
on machines where you have administrative access:
To upgrade all the software on the machine:
$ sudo apt-get upgrade
To install the text editor vim on the machine:
$ sudo apt-get install vim
> **tip**
>
> Most bash commands have electronic manual pages, which are accessible
> directly from the commandline. You will be more efficient and
> effective if you become accustomed to using these `man` pages. To view
> the `man` page for the command `sudo`, for instance, you would type:
>
> $ man sudo
### 1.2) Variables
Much of how bash behaves can be customized through the use of variables,
which consist of names that have values assigned to them. To access the
value currently assigned to a variable, you can prepend the name with
the dollar sign (\$). To print the value you can use the `echo` command.
1. What is my default shell?
`$ echo $SHELL`
2. To change to bash on a one-time basis:
`$ bash`
3. To make it your default:
`$ chsh -s /bin/bash`
In the last example, `/bin/bash` should be whatever the path to the bash
shell is, which you can figure out using `which bash`.
To declare a variable, just assign a value to its name. For
example, if you want to make a new variable with the name `counter` with
the value `1`:
$ counter=1
Since bash uses spaces to parse the expression you give it as input, it
is important to note the lack of spaces around the equal sign. Try
typing the command with and without spaces and note what happens.
You can also enclose the variable name in curly brackets, which comes in
handy when you're embedding a variable within a line of code to make
sure the shell knows where the variable name ends:
$ base=/home/jarrod/
$ echo $basesrc
$ echo ${base}src
Make sure you understand the difference in behavior in the last two
lines.
There are also special shell variables called environment variables that
help to control the shell's behavior. These are generally named in all
caps. Type `printenv` to see them. You can create your own environment
variable as follows:
$ export base=/home/jarrod/
The `export` command ensures that other shells created by the current
shell (for example, to run a program) will inherit the variable. Without
the export command, any shell variables that are set will only be
modified within the current shell. More generally, if you want a
variable to always be accessible, you should include the definition of
the variable with an `export` command in your `.bashrc` file.
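The effect of `export` can be checked by asking a child shell what it sees. The following is a minimal sketch; the variable name `demo_course` and its value are made up for illustration, and `${demo_course:-unset}` simply expands to the text `unset` when the variable has no value:

```bash
# Start from a clean slate in case the (hypothetical) name is already in use.
unset demo_course

# A plain shell variable is visible in the current shell only.
demo_course=stat243
without_export=$(bash -c 'echo "${demo_course:-unset}"')
echo "child shell without export: $without_export"

# After export, child processes inherit the variable.
export demo_course
with_export=$(bash -c 'echo "${demo_course:-unset}"')
echo "child shell with export: $with_export"
```

The first `echo` should report `unset` and the second `stat243`, since only the exported variable is passed on to the child shell.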
You can control the appearance of the bash prompt using the `PS1`
variable:
$ echo $PS1
To modify it so that it puts the username, hostname, and current working
directory in the prompt:
$ export PS1='[\u@\h \W]\$ '
[user1@local1 ~]$
### 1.3) Commands
While each command has its own syntax, there are some rules usually
followed. Generally, a command line consists of 4 things: a command,
command options, arguments, and line acceptance. Consider the following
example:
$ ls -l file.txt
In the above example, `ls` is the command, `-l` is a command option
specifying to use the long format, `file.txt` is the argument, and the
line acceptance is indicated by hitting the `Enter` key at the end of
the line.
After you type a command at the bash prompt and indicate line acceptance
with the `Enter` key, bash parses the command and then attempts to
execute the command. To determine what to do, bash first checks whether
the command is a shell function (we will discuss functions below). If
not, it checks to see whether it is a builtin. Finally, if the command
is not a shell function nor a builtin, bash uses the `PATH` variable.
The `PATH` variable is a list of directories:
$ echo $PATH
/home/jarrod/usr/bin:/usr/local/bin:/bin:/usr/bin:
For example, consider the following command:
$ grep pdf file.txt
We will discuss `grep` later. For now, let's ignore what `grep` actually
does and focus on what bash would do when you press enter after typing
the above command. First bash checks whether `grep` is a shell function or
a builtin. Once it determines that `grep` is neither a shell function
nor a builtin, it will look for an executable file named `grep` first in
`/home/jarrod/usr/bin`, then in `/usr/local/bin`, and so on until it
finds a match or runs out of places to look. You can use `which` to find
out where bash would find it:
$ which grep
/bin/grep
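To see which of these categories a name falls into, bash provides the `type` builtin. Here is a small sketch; the function name `hello` is hypothetical, and the result for `grep` assumes it has not been aliased or redefined in your shell:

```bash
# Define a throwaway shell function so there is one example of each kind.
hello () { echo "hello from a function"; }

kind_function=$(type -t hello)  # a shell function
kind_builtin=$(type -t cd)      # a bash builtin
kind_file=$(type -t grep)       # an executable file found via PATH

echo "hello: $kind_function"
echo "cd:    $kind_builtin"
echo "grep:  $kind_file"

# type -a lists every match for a name, in the order bash would consider them.
type -a echo
```

Unlike `which`, which only searches the directories in `PATH`, `type` also reports functions and builtins.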
**Exercise**
Consider the following examples using the `ls` command:
$ ls --all -l
$ ls -a -l
$ ls -al
Use `man ls` to see what the command options do. Is there any difference
in what the three versions of the command invocation above return as the
result? What happens if you add a filename to the end of the command?
#### 1.3.1) Tab completion
When working in the shell, it is often unnecessary to type out an entire
command or file name, because of a feature known as tab completion. When
you are entering a command or filename in the shell, you can, at any
time, hit the tab key, and the shell will try to figure out how to
complete the name of the command or filename you are typing. If there is
only one command in the search path that matches what you have typed so
far and you're using tab completion with the first token of a line, then
the shell will complete the name and place the cursor one space past the
completed name. If there are multiple
commands that match the partial name, the shell will display as much as
it can. In this case, hitting tab twice will display a list of choices,
and redisplay the partial command line for further editing. Similar
behavior with regard to filenames occurs when tab completion is used on
anything other than the first token of a command.
> **note**
> Note that R does tab completion for objects (including functions) and
> filenames. While the default Python shell does not perform tab
> completion, the IPython shell does.
#### 1.3.2) Command History and Editing
By using the up and down arrows, you can scroll through commands that
you have entered previously. So if you want to rerun the same command,
or fix a typo in a command you entered, just scroll up to it and hit
enter to run it or edit the line and then hit enter.
To list the history of the commands you entered, use the `history`
command:
$ history
1 echo $PS1
2 PS1=$
3 bash
4 export PS1=$
5 bash
6 echo $PATH
7 which echo
8 ls --all -l
9 ls -a -l
10 ls -al
11 ls -al manual.xml
The behavior of the `history` command is controlled by shell
variables:
$ echo $HISTFILE
$ echo $HISTSIZE
You can also rerun previous commands as follows:
$ !-n
$ !gi
The first example runs the nth previous command and the second one runs
the last command that started with 'gi'.
**Table. Command History Expansion**
<table>
<thead>
<tr class="header">
<th align="left">Designator</th>
<th align="left">Description</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="left"><code>!!</code></td>
<td align="left">Last command</td>
</tr>
<tr class="even">
<td align="left"><code>!n</code></td>
<td align="left">Command numbered <em>n</em> in the history</td>
</tr>
<tr class="odd">
<td align="left"><code>!-n</code></td>
<td align="left">Command <em>n</em> previous</td>
</tr>
<tr class="even">
<td align="left"><code>!string</code></td>
<td align="left">Last command starting with <em>string</em></td>
</tr>
<tr class="odd">
<td align="left"><code>!?string</code></td>
<td align="left">Last command containing <em>string</em></td>
</tr>
<tr class="even">
<td align="left"><code>^string1^string2</code></td>
<td align="left">Execute the previous command with <em>string2</em> substituted for <em>string1</em></td>
</tr>
</tbody>
</table>
If you're not sure what command you're going to recall, you can append
`:p` at the end of the text you type to do the recall, and the result
will be printed, but not executed. For example:
$ !gi:p
You can then use the up arrow key to bring back that statement for
editing or execution.
You can also search for commands by doing `Ctrl-r` and typing a string
of characters to search for in the search history. You can hit return to
submit, `Ctrl-c` to get out, or `ESC` to put the result on the regular
command line for editing.
#### 1.3.3) Command Substitution
You may occasionally need to substitute the results of a command for use
by another command. For example, if you wanted to use the directory
listing returned by `ls` as the argument to another command, you would
type `$(ls)` in the location you want the result of `ls` to appear.
An older notation for command substitution is to use backticks (e.g.,
`` `ls` `` versus `$(ls)`). It is generally preferable to use the new
notation, since there are many annoyances with the backtick notation.
For example, backslashes (`\`) inside of backticks behave in a
non-intuitive way, nested quoting is more cumbersome inside backticks,
nested substitution is more difficult inside of backticks, and it is
easy to visually mistake backticks for a single quote.
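As a rough illustration of why the `$( )` form is easier to work with, the sketch below nests one substitution inside another using both notations. The file path is hypothetical and does not need to exist, since `dirname` and `basename` only manipulate the text of the path:

```bash
# Capture a command's output in a variable.
this_year=$(date +%Y)
echo "The year is $this_year"

# Substitutions nest without extra escaping: dirname runs first,
# and its output becomes the argument to basename.
path=/home/jarrod/src/analysis.R
parent=$(basename "$(dirname "$path")")
echo "Parent directory: $parent"

# The backtick equivalent needs backslashes around the inner pair.
parent_old=`basename \`dirname $path\``
echo "Parent directory (backticks): $parent_old"
```

Both versions should report `src`, but the backtick form becomes hard to read as soon as a second level of nesting is involved.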
**Exercise**
Try the following commands:
$ ls -l tr
$ which tr
$ ls -l which tr
$ ls -l $(which tr)
Make sure you understand why each command behaves as it does.
### 1.4) Shortcuts
#### 1.4.1) Aliases -- command shortcuts
Aliases allow you to use an abbreviation for a command, to create new
functionality or to ensure that certain options are always used when you
call an existing command. For example, I'm lazy and would rather type
`q` instead of `exit` to terminate a shell window. You could create the
alias as follows:
$ alias q=exit
As another example, suppose you find the `-F` option of `ls` (which
displays `/` after directories, `*` after executable files and `@` after
links) to be very useful. The command:
$ alias ls="ls -F"
will ensure that the `-F` option will be used whenever you use `ls`. If
you need to use the unaliased version of something for which you've
created an alias, precede the name with a backslash (`\`). For example,
to use the normal version of `ls` after you've created the alias
described above:
$ \ls
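To check what an alias currently stands for, or to remove it for the rest of the session, you can use `alias name` and `unalias name`. A brief sketch, reusing `q` as the alias name:

```bash
alias q=exit

# Print the current definition of a single alias.
q_definition=$(alias q)
echo "$q_definition"

# Remove the alias; afterwards `alias q` fails because q is no longer defined.
unalias q
if alias q > /dev/null 2>&1; then
    q_still_defined=yes
else
    q_still_defined=no
fi
echo "q still defined? $q_still_defined"
```

Running `alias` with no arguments lists every alias defined in the current shell, which is a quick way to see what your `.bashrc` has set up.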
The real power of aliases is only achieved when they are automatically
set up whenever you log in to the computer or open a new shell window.
To achieve that goal with aliases (or any other bash shell commands),
simply insert the commands in the file `.bashrc` in your home directory.
For example, here is an excerpt from my `.bashrc`:
# .bashrc
# Source global definitions
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi
# User specific aliases and functions
pushdp () {
pushd "$(python -c "import os.path as _, ${1}; \
print _.dirname(_.realpath(${1}.__file__[:-1]))"
)"
}
export EDITOR=vim
source /usr/share/git-core/contrib/completion/git-prompt.sh
export PS1='[\u@\h \W$(__git_ps1 " (%s)")]\$ '
# history settings
export HISTCONTROL=ignoredups # no duplicate entries
shopt -s histappend # append, don't overwrite
# R settings
export R_LIBS=$HOME/usr/lib64/R/library
alias R="/usr/bin/R --quiet --no-save"
# Set path
mybin=$HOME/usr/bin
export PATH=$mybin:$HOME/.local/bin:$HOME/usr/local/bin:$PATH:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HOME/usr/local/lib
# Additional aliases
alias grep='grep --color=auto'
alias hgrep='history | grep'
alias l.='ls -d .* --color=auto'
alias ll='ls -l --color=auto'
alias ls='ls --color=auto'
alias more=less
alias vi=vim
alias which='(alias; declare -f) | /usr/bin/which --tty-only \
--read-alias --read-functions --show-tilde --show-dot'
**Exercise**
Look over the content of the example `.bashrc` and make sure you
understand what each line does. For instance, use `man grep` to see what
the option `--color=auto` does. Use `man which` to figure out what the
various options passed to it do.
#### 1.4.2) Keyboard shortcuts
Note that you can use emacs-like control sequences (`Ctrl-a`, `Ctrl-e`,
`Ctrl-k`) to navigate and delete characters.
**Table. Keyboard Shortcuts**
<table>
<thead>
<tr class="header">
<th align="left">Key Strokes</th>
<th align="left">Descriptions</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="left"><code>Ctrl-a</code></td>
<td align="left">Beginning of line</td>
</tr>
<tr class="even">
<td align="left"><code>Ctrl-e</code></td>
<td align="left">End of line</td>
</tr>
<tr class="odd">
<td align="left"><code>Ctrl-k</code></td>
<td align="left">Delete line from cursor forward</td>
</tr>
<tr class="even">
<td align="left"><code>Ctrl-d</code></td>
<td align="left">EOF; exit</td>
</tr>
<tr class="odd">
<td align="left"><code>Ctrl-c</code></td>
<td align="left">Interrupt current command</td>
</tr>
<tr class="even">
<td align="left"><code>Ctrl-z</code></td>
<td align="left">Suspend current command</td>
</tr>
<tr class="odd">
<td align="left"><code>Ctrl-l</code></td>
<td align="left">Clear screen</td>
</tr>
</tbody>
</table>
2) Basic File Management
---------------------
In Unix, almost "everything is a file." This means that a very wide
variety of input and output resources (e.g., documents, directories,
keyboards, harddrives, network devices) are streams of bytes available
through the filesystem interface. This means that the basic file
management tools are extremely powerful in Unix. Not only can you use
these tools to work with files, but you can also use them to monitor and
control many aspects of your computer.
### 2.1) Files
A file typically has these attributes:
- Name.
- Type.
- Location.
- Size.
- Protection.
- Time, date, and user identification.
![Schematic of file attributes.](file.png)
Listing file attributes with `ls`:
$ ls -l
Getting more information with `stat`:
$ stat manual.xml
Finding out what type of file you have:
$ file manual.xml
> **tip**
>
> The `file` command relies on many sources of information to determine
> what a file contains. The easiest part to explain is *magic*.
> Specifically, the `file` command examines the content of the file and
> compares it with information found in the `/usr/share/magic/`
> directory.
Changing file attributes with `chmod`:
$ chmod g+w manual.xml
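To see what a `chmod` call actually changed, you can compare the permission string that `ls -l` prints before and after. The sketch below works on a scratch file created with `mktemp` (an assumption made so that no real file is modified); `${before:0:10}` is bash syntax for the first 10 characters of the variable:

```bash
# Work on a scratch file so nothing real is modified.
scratch=$(mktemp)

chmod 640 "$scratch"      # owner: read/write, group: read, others: none
before=$(ls -l "$scratch")
echo "${before:0:10}"     # first 10 characters: file type and permission bits

chmod g+w "$scratch"      # add write permission for the group
after=$(ls -l "$scratch")
echo "${after:0:10}"

rm "$scratch"
```

The permission string should change from `-rw-r-----` to `-rw-rw----`, with only the group's write bit differing.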
For more detailed information, please see the "Basics of UNIX" tutorial
and screencast here:
<http://statistics.berkeley.edu/computing/training/tutorials>
### 2.2) Navigation
Efficient navigation of the filesystem from the shell is an essential
aspect of mastering Unix. Use `pwd` to print your current working
directory. If you just enter `cd` at a prompt, your current working
directory will change to your home directory. You can also refer to your
home directory using the tilde `~`. For example, if you wanted to change
your current directory to the subdirectory `src` in your home directory
from any other current directory, you could type:
$ cd ~/src
Also if you want to return to the previous directory, you could type:
$ cd -
You can use the `pushd`, `popd`, and `dirs` commands if you would like to keep
a stack of previous working directories rather than just the last one.
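A short sketch of how the directory stack behaves is given below. The directory names `src` and `data` are created under a temporary location purely for illustration:

```bash
start=$PWD
work=$(mktemp -d)               # scratch area with two subdirectories
mkdir "$work/src" "$work/data"

pushd "$work/src" > /dev/null   # stack is now: src, start
pushd "$work/data" > /dev/null  # stack is now: data, src, start
dirs -v                         # list the stack, one numbered entry per line

popd > /dev/null                # drop data; the shell moves back to src
after_first_popd=$(basename "$PWD")
popd > /dev/null                # drop src; the shell moves back to start
after_second_popd=$PWD

rm -r "$work"
```

The redirections to `/dev/null` only hide the stack listing that `pushd` and `popd` print after every call; at an interactive prompt you would normally leave them off.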
### 2.3) Filename Globbing
Shell file globbing will expand certain special characters (called
wildcards) to match patterns of filenames, before passing those
filenames on to a program. Note that the programs themselves don't know
anything about wildcards; it is the shell that does the expansion, so
that programs don't see the wildcards. The following table shows some of
the special characters that the shell uses for expansion.
**Table. Filename wildcards**
<table>
<thead>
<tr class="header">
<th align="left">Wildcard</th>
<th align="left">Function</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="left"><code>*</code></td>
<td align="left">Match zero or more characters.</td>
</tr>
<tr class="even">
<td align="left"><code>?</code></td>
<td align="left">Match exactly one character.</td>
</tr>
<tr class="odd">
<td align="left"><code>[characters]</code></td>
<td align="left">Match any single character from among <em>characters</em> listed between brackets.</td>
</tr>
<tr class="odd">
<td align="left"><code>[!characters]</code></td>
<td align="left">Match any single character other than <em>characters</em> listed between brackets.</td>
</tr>
<tr class="odd">
<td align="left"><code>[a-z]</code></td>
<td align="left">Match any single character from among the range of characters listed between brackets.</td>
</tr>
<tr class="odd">
<td align="left"><code>[!a-z]</code></td>
<td align="left">Match any single character not in the range of characters listed between brackets.</td>
</tr>
<tr class="odd">
<td align="left"><code>{frag1,frag2,...}</code></td>
<td align="left">Brace expansion: create strings frag1, frag2, etc.</td>
</tr>
</tbody>
</table>
List all files ending with a digit:
$ ls *[0-9]
Make a copy of `filename` as `filename.old`:
$ cp filename{,.old}
Remove all files beginning with *a* or *z*:
$ rm [az]*
List all the R code files with a variety of suffixes:
$ ls *.{r,q,R}
The `echo` command can be used to verify that a wildcard expansion will
do what you think it will:
$ echo cp filename{,.old}
cp filename filename.old
If you want to suppress the special meaning of a wildcard in a shell
command, precede it with a backslash (`\`). Note that this is a general
rule of thumb in many similar situations when a character has a special
meaning but you just want to treat it as a character.
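The patterns above can be tried safely in a scratch directory. In this sketch the file names are invented for the example, and `echo` is used so that nothing is copied or removed:

```bash
work=$(mktemp -d)
cd "$work"
touch apple.R zebra.q notes.txt run1 run2

ends_in_digit=$(echo *[0-9])   # names ending in a digit
starts_a_or_z=$(echo [az]*)    # names beginning with a or z
literal_star=$(echo \*)        # backslash: no expansion, just an asterisk

echo "$ends_in_digit"
echo "$starts_a_or_z"
echo "$literal_star"

cd - > /dev/null               # return to the previous directory
rm -r "$work"
```

The three `echo` lines should print `run1 run2`, then `apple.R zebra.q`, then a single `*`.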
To read more about standard globbing patterns, see the man page:
$ man 7 glob
**Exercise**
Brace expansion is quite useful and more flexible than I've indicated.
Above we saw how to use brace expansion with a comma-separated
list of items inside the curly braces (e.g., `{r,q,R}`), but braces can
also be used with a sequence specification. A sequence is indicated with
a start and end item separated by two periods (`..`). Try typing the
following examples at the command line and try to figure out how they
work:
$ echo {1..15}
$ echo {a{1..3},b{1..5},c{c..e}}
$ echo {{d..a},{a..d}}
$ echo {{d..b},a,{b..d}}
$ echo {1..5..2}
$ echo {z..a..-2}
### 2.4) Quoting
Finally, a note about using single vs. double quotes in shell code. In
general, variables inside double quotes will be evaluated, but variables
inside single quotes will not be:
$ echo "$HOME"
/home/jarrod
$ echo '$HOME'
$HOME
**Table. Quotes**
<table>
<thead>
<tr class="header">
<th align="left">Types of Quoting</th>
<th align="left">Description</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="left"><code>' '</code></td>
<td align="left">hard quote - no substitution allowed</td>
</tr>
<tr class="even">
<td align="left"><code>" "</code></td>
<td align="left">soft quote - allow substitution</td>
</tr>
</tbody>
</table>
This can be useful, for example, when you have a directory with a space
in its name (of course, it is better to avoid spaces in file and
directory names). Since bash uses spaces to parse the elements of the
command line, you might try escaping the spaces with a backslash:
$ ls $HOME/with\ space
file1.txt
However that can be a pain and may not work in all circumstances. A cleaner
approach is to use soft (or double) quotes:
$ ls "$HOME/with space"
file1.txt
If you used hard quotes, you will get this error:
$ ls '$HOME/with space'
ls: cannot access $HOME/with space: No such file or directory
What if you have double quotes in your file or directory name (again, it
is better to avoid using double quotes in file and directory names)? In
this case, you will need to escape the quote:
$ ls "$HOME/\"with\"quote"
So we'll generally use double quotes. We can always work with a literal
double quote by escaping it as seen above.
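One way to see what quoting does is to count how many arguments a command actually receives. The sketch below uses a small helper function, `count_args`, which is hypothetical and defined only for this example:

```bash
# Helper that reports how many arguments it received.
count_args () { echo $#; }

dir="with space"

unquoted=$(count_args $dir)   # split on the space into two words
soft=$(count_args "$dir")     # one word, with the variable expanded
hard=$(count_args '$dir')     # one word, the literal text $dir

echo "unquoted: $unquoted, double quotes: $soft, single quotes: $hard"
echo "$dir" '$dir'
```

The unquoted variable arrives as two arguments, while both quoted forms arrive as one. The final line prints `with space $dir`, showing that only the double-quoted form is expanded.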
### 2.5) Basic utilities
Since files are such an essential aspect of Unix and working from the
shell is the primary way to work with Unix, there are a large number of
useful commands and tools to view and manipulate files.
- cat -- concatenate files and print on the standard output
- cp -- copy files and directories
- cut -- remove sections from each line of files
- diff -- find differences between two files
- grep -- print lines matching a pattern
- head -- output the first part of files
- find -- search for files in a directory hierarchy
- less -- opposite of more (and better than more)
- more -- file perusal filter for crt viewing
- mv -- move (rename) files
- nl -- number lines of files
- paste -- merge lines of files
- rm -- remove files or directories
- rmdir -- remove empty directories
- sort -- sort lines of text files
- split -- split a file into pieces
- tac -- concatenate and print files in reverse
- tail -- output the last part of files
- touch -- change file timestamps
- tr -- translate or delete characters
- uniq -- remove duplicate lines from a sorted file
- wc -- print the number of bytes, words, and lines in files
- wget and curl -- non-interactive network downloaders
As we've already seen, the general syntax for a Unix program is:
$ command -options argument1 argument2 ...
For example:
$ grep -i graphics file.txt
looks for the literal string `graphics` (argument 1) in `file.txt`
(argument 2) with the option `-i`, which says to ignore the case of the
letters. While:
$ less file.txt
simply pages through a text file (you can navigate up and down) so you
can get a feel for what's in it. To exit `less` type `q`.
To find files by name, modification time, and type:
$ find . -name '*.txt' # find files named *.txt
$ find . -mtime -2 # find files modified less than 2 days ago
$ find . -type l # find links
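Several `find` criteria can be given at once, in which case a file must satisfy all of them. The following sketch builds a small scratch directory tree (all names are invented for the example) and searches it:

```bash
work=$(mktemp -d)
mkdir "$work/sub"
touch "$work/a.txt" "$work/sub/b.txt" "$work/sub/c.csv"
ln -s "$work/a.txt" "$work/link-to-a"

# Criteria are combined with an implicit AND: regular files named *.txt.
txt_files=$(find "$work" -type f -name '*.txt')
echo "$txt_files"

# Symbolic links only.
links=$(find "$work" -type l)
echo "$links"

rm -r "$work"
```

Note that `find` descends into `sub` automatically, and that the pattern is quoted so that `find`, rather than the shell, interprets the `*`.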
Unix programs often take options that are identified with a minus
followed by a letter, followed by the value for that option (adding a
space before the value is generally fine). Options may also involve two
dashes, e.g., `R --no-save`. A standard two dash option for many
commands is `--help`. For example, try:
$ tail --help
Here are a couple of examples of using the `tail` command:
$ wget https://raw.githubusercontent.com/berkeley-scf/tutorial-using-bash/master/cpds.csv
$ tail -n 10 cpds.csv # last 10 lines of cpds.csv
$ tail -f cpds.csv # shows end of file, continually refreshing
The first line downloads the data from GitHub. The two main tools
for downloading network accessible data from the commandline are `wget`
and `curl`. I tend to use `wget` as my commandline downloading tool as
it is more convenient, but on a Mac, only `curl` is generally available.
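If you would like to experiment with `tail` without downloading anything, a generated file works just as well. This is a minimal sketch using a made-up 20-line file; `tail -n +19` (with a plus sign) starts output at line 19 instead of counting back from the end:

```bash
# Build a 20-line scratch file: "line 1" through "line 20".
scratch=$(mktemp)
for i in {1..20}; do
    echo "line $i"
done > "$scratch"

last_three=$(tail -n 3 "$scratch")       # the final three lines
from_line_19=$(tail -n +19 "$scratch")   # everything from line 19 onward
echo "$last_three"
echo "$from_line_19"

rm "$scratch"
```

The first `echo` prints lines 18 through 20 and the second prints lines 19 and 20.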
A few more tidbits about `grep` (we will see more examples of `grep` in
the section on regular expressions, but it is so useful that it is worth
seeing many times):
$ grep ^2001 cpds.csv # returns lines that start with '2001'
$ grep 0$ cpds.csv # returns lines that end with '0'
$ grep 19.0 cpds.csv # returns lines with '19' separated from '0' by a single character
$ grep 19.*0 cpds.csv # now separated by any number of characters
$ grep -o 19.0 cpds.csv # returns only the content matching the pattern from the relevant lines
Note that the first argument to `grep` is the pattern you are looking for.
The syntax is different from that used for wildcards in file names,
because the pattern is a regular expression. We won’t go into detail
here, but we will discuss this in the section below on regular
expressions.
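To see how these patterns differ, it can help to run them on a file small enough to check by eye. The sketch below uses four made-up rows as a stand-in for `cpds.csv` (the values are assumptions, not real data) and counts matching lines with `grep -c`.

```shell
# Hypothetical sample rows standing in for cpds.csv.
f=$(mktemp)
printf '%s\n' '2001,Canada,19.0' '2001,France,1950' \
              '1999,Canada,300' '2001,Japan,7.5' > "$f"

starts=$(grep -c '^2001' "$f")   # lines starting with 2001
ends=$(grep -c '0$' "$f")        # lines ending in 0
one=$(grep -c '19.0' "$f")       # 19, exactly one character, then 0
any=$(grep -c '19.*0' "$f")      # 19, any run of characters, then 0
only=$(grep -o '19.0' "$f")      # just the matched text

echo "$starts $ends $one $any"   # prints: 3 3 2 3
echo "$only"                     # prints 19.0 and then 1950

rm -f "$f"
```

The third row (`1999,Canada,300`) is what separates `19.0` from `19.*0`: it contains `19` and a later `0`, but with more than one character between them.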
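It is sometimes helpful to put the pattern inside double quotes, e.g.,
if you want spaces in your pattern, or if it contains characters such as
`*` that the shell might otherwise expand as a file name wildcard: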
$ grep "George .* Bush" cpds.csv
More generally in Unix, enclosing a string in quotes is often useful to
indicate that it is a single argument/value.
If you want to explicitly look for one of the special characters used in
creating patterns (such as the period (`.`)) or interpreted by the shell
(such as the double quote (`"`)), you can "escape" them by preceding
them with a backslash. For example, to look for `"Canada"`, including
the quotes, or for the literal string `19.0`:
$ grep "\"Canada\"" cpds.csv
$ grep "19\.0" cpds.csv
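The effect of escaping is easiest to see side by side. The following sketch uses two made-up rows (assumptions for illustration only). It also shows two alternatives that the text above does not cover: `grep -F`, which treats the whole pattern as literal text, and single quotes, inside which a double quote needs no backslash.

```shell
# Hypothetical rows; the second has 1950, which 19.0 also matches.
f=$(mktemp)
printf '%s\n' '2001,"Canada",19.0' '2001,"France",1950' > "$f"

loose=$(grep -c '19.0' "$f")         # . matches any character: both lines
strict=$(grep -c '19\.0' "$f")       # \. matches a literal period: first line only
fixed=$(grep -c -F '19.0' "$f")      # -F: the pattern is literal text
quoted=$(grep -c '"Canada"' "$f")    # single quotes keep the " characters as-is

echo "$loose $strict $fixed $quoted" # prints: 2 1 1 1

rm -f "$f"
```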
If you have a big data file and need to subset it by line (e.g., with
`grep`) or by field (e.g., with `cut`), you can often do so very quickly
from the Unix command line, without first reading the whole file into
R, SAS, Python, etc.
Much of the power of these utilities comes in piping between them (see
the next section) and using wildcards (see the section on Globbing) to
operate on groups of files. The utilities can also be used in shell
scripts to do more complicated things.
We will look at several examples of how to use these utilities below,
but first let's discuss streams and redirection.
**Exercise**
You've already seen some of the above commands. Follow the links above
and while you are reading the abbreviated man pages consider how you
might use these commands.
### 2.6) Streams, Pipes, and Redirects
Unix programs that involve input and/or output often operate by reading
input from a stream known as standard input (*stdin*), and writing their
results to a stream known as standard output (*stdout*). In addition, a
third stream known as standard error (*stderr*) receives error messages
and other information that's not part of the program's results. In the
usual interactive session, standard output and standard error default to
your screen, and standard input defaults to your keyboard.
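The separation between the two output streams can be made visible with a short sketch. Here `report` is a hypothetical function that writes one line to each stream; the redirection syntax it relies on is summarized in the table that follows.

```shell
# report is a hypothetical function that writes one line to each stream.
report() {
    echo "result: 42"                  # goes to stdout
    echo "warning: check input" >&2    # goes to stderr
}

# Command substitution captures stdout only.
out_only=$(report 2> /dev/null)        # discard stderr, keep stdout

# Point stderr at the captured stream first, then discard stdout.
err_only=$(report 2>&1 > /dev/null)

echo "stdout carried: $out_only"
echo "stderr carried: $err_only"
```

Run interactively without any redirection, `report` would show both lines on the screen, which is why the two streams are easy to confuse.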
You can change the place from which programs read and write through
redirection. The shell provides this service, not the individual
programs, so redirection will work for all programs. The following table
shows some examples of redirection.
**Table. Common Redirection Operators**
<table>
<thead>
<tr class="header">
<th align="left">Redirection Syntax</th>
<th align="left">Function</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td align="left"><code>$ cmd > file</code></td>
<td align="left">Send <em>stdout</em> to <em>file</em></td>
</tr>
<tr class="even">
<td align="left"><code>$ cmd 1> file</code></td>
<td align="left">Same as above</td>
</tr>
<tr class="odd">
<td align="left"><code>$ cmd 2> file</code></td>
<td align="left">Send <em>stderr</em> to <em>file</em></td>
</tr>
<tr class="even">
<td align="left"><code>$ cmd > file 2>&1</code></td>
<td align="left">Send both <em>stdout</em> and <em>stderr</em> to <em>file</em></td>
</tr>
<tr class="odd">
<td align="left"><code>$ cmd < file</code></td>
<td align="left">Receive <em>stdin</em> from <em>file</em></td>
</tr>
<tr class="even">
<td align="left"><code>$ cmd >> file</code></td>
<td align="left">Append <em>stdout</em> to <em>file</em></td>
</tr>
<tr class="odd">
<td align="left"><code>$ cmd 1>> file</code></td>
<td align="left">Same as above</td>
</tr>
<tr class="even">
<td align="left"><code>$ cmd 2>> file</code></td>
<td align="left">Append <em>stderr</em> to <em>file</em></td>
</tr>
<tr class="odd">
<td align="left"><code>$ cmd >> file 2>&1</code></td>
<td align="left">Append both <em>stdout</em> and <em>stderr</em> to <em>file</em></td>
</tr>
<tr class="even">
<td align="left"><code>$ cmd1 | cmd2</code></td>
<td align="left">Pipe <em>stdout</em> from <em>cmd1</em> to <em>cmd2</em></td>
</tr>
<tr class="odd">
<td align="left"><code>$ cmd1 2>&1 | cmd2</code></td>
<td align="left">Pipe <em>stdout</em> and <em>stderr</em> from <em>cmd1</em> to <em>cmd2</em></td>
</tr>
<tr class="even">
<td align="left"><code>$ cmd1 | tee file1 | cmd2</code></td>
<td align="left">Pipe <em>stdout</em> from <em>cmd1</em> to <em>cmd2</em> while simultaneously writing it to <em>file1</em> using <em>tee</em></td>
</tr>
</tbody>
</table>
Note that `cmd` may include options and arguments as seen in the
previous section.
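Several rows of the table can be tried out in a scratch directory. In this sketch, `emit` is a hypothetical stand-in for `cmd` that writes one line to *stdout* and one to *stderr*; the file names are likewise made up.

```shell
work=$(mktemp -d)

# emit is a hypothetical stand-in for cmd: one line per output stream.
emit() { echo "data"; echo "oops" >&2; }

emit > "$work/out.txt" 2> "$work/err.txt"   # separate files for the two streams
emit > "$work/both.txt" 2>&1                # both streams into one file
emit >> "$work/out.txt" 2> /dev/null        # append: out.txt now holds two lines
lines=$(wc -l < "$work/out.txt")            # wc reads the file via stdin

# tee writes a copy to a file and passes the same data down the pipe.
upper=$(emit 2> /dev/null | tee "$work/copy.txt" | tr 'a-z' 'A-Z')

echo "lines in out.txt: $lines"
echo "$upper"                               # prints DATA
```

The order in `> file 2>&1` matters: *stdout* is pointed at the file first, and *stderr* is then sent to wherever *stdout* currently points.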
#### 2.6.1) Standard Redirection
Operations where output from one command is used as input to another
command (via the `|` operator) are known as pipes; they are made
especially useful by the convention that many Unix commands will accept
their input through the standard input stream when no file name is
provided to them.
A simple pipe to `wc` to count the number of words in a string:
$ echo "hey there" | wc -w
2
Translating lowercase to UPPERCASE with `tr`:
$ echo 'user1' | tr 'a-z' 'A-Z'
USER1
Here's an example of finding out how many unique entries there are in
the 2nd column of a data file whose fields are separated by commas:
$ cut -d',' -f2 cpds.csv | sort | uniq | wc
$ cut -d',' -f2 cpds.csv | sort | uniq > countries.txt
Above we use the `cut` utility to extract the second field (`-f2`) or
column of the file `cpds.csv` where the fields (or columns) are split or
delimited by a comma (`-d','`). The standard output of the `cut` command
is then piped (via `|`) to the standard input of the `sort` command.
Then the output of `sort` is sent to the input of `uniq` to remove
duplicate entries in the sorted list provided by `sort`. Rather than
using `sort | uniq`, you could also use `sort -u`. Finally, the first of
the two commands prints a summary (the number of lines, words, and
bytes) using `wc`, while the second saves the sorted list, with
duplicates removed, in the file `countries.txt`.
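The role of `sort` in this pipeline is easy to miss, so here is a brief sketch on five made-up rows (a stand-in for `cpds.csv`, not real data). It checks that `sort | uniq` and `sort -u` agree, and shows what happens if the `sort` step is left out.

```shell
# Made-up rows standing in for cpds.csv (year,country).
f=$(mktemp)
printf '%s\n' '2001,Canada' '2001,France' '2002,Canada' \
              '2002,France' '2002,Japan' > "$f"

via_uniq=$(cut -d',' -f2 "$f" | sort | uniq)
via_sort_u=$(cut -d',' -f2 "$f" | sort -u)

# uniq alone only removes repeats that are next to each other.
unsorted=$(cut -d',' -f2 "$f" | uniq | wc -l | tr -d ' ')

echo "$via_uniq"    # Canada, France, Japan on separate lines
echo "$unsorted"    # prints 5: no two adjacent lines are equal

rm -f "$f"
```

If you also want to know how often each value occurs, `sort | uniq -c` prefixes every distinct line with its count.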