
UnixStats

pluteski edited this page Apr 5, 2017 · 1 revision

Handy Unix commands for tallying results

Unix provides myriad commands useful for tallying transcripts or keeping tabs on how much work remains. The find command is just one of them, but it is the one I found most useful.

Listing transcripts

The following examples are for IBM transcripts (files named hypotheses.*), but they are easily adapted to Google transcripts or any other file name or extension by changing the -name pattern.

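Filters out empty and near-empty files (10 bytes or fewer) and sorts by size, largest first. The -ls option makes find print a long listing whose seventh column is the file size, which is the column sort -k7 keys on.

$ find "/Volumes/Samsung USB/AudioJournals_TEST/" -name "hypotheses.*" -size +10c -ls | sort -r -n -k7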

Filters out empty and near-empty files (10 bytes or fewer) and sorts by size, smallest first.

$ find "/Volumes/Samsung USB/AudioJournals_TEST/" -name "hypotheses.*" -size +10c -ls | sort -n -k7

Finds only small files (fewer than 10 bytes) and sorts them by size, smallest first.

$ find "/Volumes/Samsung USB/AudioJournals_TEST/" -name "hypotheses.*" -size -10c -ls | sort -n -k7
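Sorting on column 7 depends on the exact layout of find's output. A minimal sketch of an alternative, assuming a POSIX shell (sh, bash or zsh): let wc -c print the byte count as the first column and sort on that instead. This swaps in wc -c for the column-based sort used above; the function name list_by_size is hypothetical, not part of any tool.

```shell
# list_by_size DIR SIZE_TEST [PATTERN]   (hypothetical helper)
#   DIR        directory tree to search
#   SIZE_TEST  a find -size argument, e.g. +10c (more than 10 bytes)
#              or -10c (fewer than 10 bytes)
#   PATTERN    file name pattern; defaults to 'hypotheses.*' (IBM transcripts)
# Prints one "<bytes> <path>" line per matching regular file, smallest first.
list_by_size() {
  dir=$1
  size_test=$2
  pattern=${3:-'hypotheses.*'}
  # wc -c puts the byte count in the first column, so the sort key does not
  # depend on find's column layout or on spaces in the path.
  # grep drops the "total" summary line wc prints when given several files.
  find "$dir" -type f -name "$pattern" -size "$size_test" -exec wc -c {} + |
    grep -v ' total$' |
    sort -n
}
```

For example, list_by_size "/Volumes/Samsung USB/AudioJournals_TEST/" +10c lists the non-empty transcripts smallest first; pipe the result through sort -r -n to put the largest first.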

Counts number of IBM transcripts

$ find "/Volumes/Samsung USB/AudioJournals_TEST/" -name "hypotheses.*" | wc -l

Counts number of IBM transcripts larger than 10 bytes (find -size measures bytes, not characters)

$ find "/Volumes/Samsung USB/AudioJournals/ibm_stt" -name hypotheses.txt -size +10c | wc -l
10

Counts number of IBM transcripts smaller than 10 bytes

$ find "/Volumes/Samsung USB/AudioJournals/ibm_stt" -name hypotheses.txt -size -10c | wc -l
0
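The counting commands can be bundled so that one call reports all three tallies. This is a minimal sketch, assuming a POSIX shell; the function name tally_transcripts and its argument order are hypothetical. Note that, exactly as with find -size, a file of precisely MIN_BYTES bytes is counted in neither the "larger" nor the "smaller" bucket.

```shell
# tally_transcripts DIR [PATTERN] [MIN_BYTES]   (hypothetical helper)
# Prints the total number of matching files under DIR, how many are larger
# than MIN_BYTES (default 10) and how many are smaller than MIN_BYTES.
tally_transcripts() {
  dir=$1
  pattern=${2:-'hypotheses.*'}
  min=${3:-10}
  # Wrapping the count in $(( )) strips the leading spaces BSD/macOS wc -l adds.
  total=$(( $(find "$dir" -type f -name "$pattern" | wc -l) ))
  large=$(( $(find "$dir" -type f -name "$pattern" -size +"$min"c | wc -l) ))
  small=$(( $(find "$dir" -type f -name "$pattern" -size -"$min"c | wc -l) ))
  printf 'total: %d\n' "$total"
  printf 'larger than %d bytes: %d\n' "$min" "$large"
  printf 'smaller than %d bytes: %d\n' "$min" "$small"
}
```

For example, tally_transcripts "/Volumes/Samsung USB/AudioJournals/ibm_stt" hypotheses.txt reproduces the two counts above along with the total.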