UnixStats

Handy Unix commands for tallying results

Unix provides many commands useful for tallying transcripts or keeping tabs on how much work remains. The find command is just one of them, but it is the one I found most useful.

Listing transcripts

The following examples are for IBM transcripts but are easily modified to handle Google transcripts or any other text file name or extension.

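Filters out files of 10 bytes or fewer and sorts by size, largest first. The -ls option makes find print a long listing whose seventh field is the size in bytes, which is the field that sort -k7 reads.

$ find "/Volumes/Samsung USB/AudioJournals_TEST/" -name "hypotheses.*" -size +10c -ls | sort -r -n -k7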

Filters out files of 10 bytes or fewer and sorts by size, smallest first.

$ find "/Volumes/Samsung USB/AudioJournals_TEST/" -name "hypotheses.*" -size +10c -ls | sort -n -k7

Finds only small files of fewer than 10 bytes, sorted by size, smallest first.

$ find "/Volumes/Samsung USB/AudioJournals_TEST/" -name "hypotheses.*" -size -10c -ls | sort -n -k7
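A variation that does not rely on column positions is to print each file's byte count next to its path and sort on that number. The following is a minimal sketch under stated assumptions, not part of the original workflow: the function name list_by_size and its optional second argument (a name pattern, defaulting to "hypotheses.*") are hypothetical.

```shell
# Hypothetical helper: list matching files as "<bytes><TAB><path>", smallest first.
# $1 is the directory to search; $2 is an optional name pattern (assumed default below).
list_by_size() {
    find "$1" -type f -name "${2:-hypotheses.*}" -exec sh -c '
        for f; do
            # wc -c reads the byte count; tr strips the padding some wc versions add
            printf "%s\t%s\n" "$(wc -c < "$f" | tr -d " ")" "$f"
        done' sh {} + | sort -n
}
```

For example, list_by_size "/Volumes/Samsung USB/AudioJournals_TEST/" lists the transcripts smallest first; piping the result through sort -r -n reverses the order.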

Counts number of IBM transcripts

$ find "/Volumes/Samsung USB/AudioJournals_TEST/" -name "hypotheses.*" | wc -l

Counts number of IBM transcripts larger than 10 bytes

$ find "/Volumes/Samsung USB/AudioJournals/ibm_stt" -name hypotheses.txt -size +10c | wc -l
10

Counts number of IBM transcripts smaller than 10 bytes

$ find "/Volumes/Samsung USB/AudioJournals/ibm_stt" -name hypotheses.txt -size -10c | wc -l
0
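The three counts can also be gathered in one call. The sketch below is an illustration only: the function name tally_transcripts, its optional pattern argument, and the output format are assumptions, and the 10-byte threshold simply mirrors the commands in this section.

```shell
# Hypothetical helper: print total, over-10-byte, and under-10-byte counts.
# $1 is the directory to search; $2 is an optional name pattern (assumed default below).
tally_transcripts() {
    dir=$1
    pattern=${2:-hypotheses.*}
    # tr -d ' ' strips the leading padding that some wc implementations emit
    total=$(find "$dir" -type f -name "$pattern" | wc -l | tr -d ' ')
    large=$(find "$dir" -type f -name "$pattern" -size +10c | wc -l | tr -d ' ')
    small=$(find "$dir" -type f -name "$pattern" -size -10c | wc -l | tr -d ' ')
    echo "total=$total over_10_bytes=$large under_10_bytes=$small"
}
```

A file of exactly 10 bytes matches neither -size +10c nor -size -10c, so the two partial counts may add up to less than the total.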
