“Being able to interact flexibly, swiftly, and efficiently with the underlying data and software systems is an indispensable skill.” ― Diomidis Spinellis

These are my personal notes on the edX course Unix Tools: Data, Software and Production Engineering, by Prof. Diomidis Spinellis. I attended this course from March to June 2020. It was my first MOOC experience. I learned a lot in this course, and it made me take online education very seriously, as it provides an excellent way of learning from top courses, given by top universities and taught by top experts in the world.

These notes are not organized in any specific manner. So, they are actually a bunch of very disordered Unix command line hacks and tricks :grin:.

Display elapsed time from the start to the end of a process

start=$(date +%s) 
sleep 5
end=$(date +%s)
echo "The process lasted $(expr $end - $start) seconds." 
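
The same pattern can be wrapped in a small reusable function. The following is a minimal sketch, assuming bash; the name timeit is hypothetical and not part of the course material. It uses shell arithmetic instead of expr and reports the elapsed time on standard error, so the output of the timed command stays clean.

```bash
# timeit: hypothetical helper that runs a command and reports how long it took.
timeit() {
    local start end status
    start=$(date +%s)          # Seconds since the epoch before the command
    "$@"                       # Run the command passed as arguments
    status=$?                  # Remember the command's exit status
    end=$(date +%s)            # Seconds since the epoch after the command
    echo "The process lasted $((end - start)) seconds." >&2
    return $status             # Propagate the exit status to the caller
}

timeit sleep 1
```

Because the exit status is propagated, timeit can still be combined with && and ||.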

Combine commands

cat -n afile >result && rm afile # rm will be executed only if cat succeeds

touch afile
copy afile bfile || echo Copy failed # copy is not a standard Unix command, so it fails and the echo is executed

Loops

For

A function that takes a time in an input time zone and converts it to a desired time zone. To use it, run showtime Europe/Stockholm America/Havana 11:45

showtime() 
{
   local TZIN=$1
   local TZOUT=$2
   local TIME=$3
   echo -n "$TZIN $TZOUT $TIME"
   TZ=$TZOUT date --date="TZ=\"$TZIN\" $TIME"
}

Use showtime for different time zones.

for i in Europe/London Europe/Paris Europe/Madrid; do
    showtime America/Los_Angeles $i 11:45
done

TZONES='America/Los_Angeles America/New_York Europe/London'
for tzin in $TZONES; do
    for tzout in $TZONES; do
      showtime $tzin $tzout 11:45
    done
done

While

Computes the average number of characters per line across the files in a directory.

ls |
while read name ; do # For every entry
    if [ -f "$name" -a -r "$name" ] ; then # If it is a regular, readable file
      echo -n "$name "
      expr $(wc -c <"$name") / $(wc -l <"$name") # Display average characters per line
    fi
done |
head
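
A variant of this loop can avoid parsing the output of ls and guard against empty files. This is a minimal sketch, assuming bash; the function name avg_chars is hypothetical. It iterates with a wildcard, so file names containing spaces are handled, and it skips files with no lines to avoid a division by zero.

```bash
# avg_chars: hypothetical helper that prints "name average" for each regular,
# readable file in a directory (default: the current directory).
avg_chars() {
    local dir=${1:-.} f chars lines
    for f in "$dir"/*; do
        [ -f "$f" ] && [ -r "$f" ] || continue   # Only regular, readable files
        chars=$(wc -c <"$f")                     # Number of bytes
        lines=$(wc -l <"$f")                     # Number of newline characters
        [ "$lines" -gt 0 ] || continue           # Avoid division by zero
        echo "${f##*/} $((chars / lines))"       # File name and integer average
    done
}

avg_chars . | head
```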

Conditionals

If

Create two files, and check whether sourcefile is newer than destfile.

touch sourcefile
touch destfile

# Refresh destfile only if sourcefile is newer
if [ sourcefile -nt destfile ] ; then
    cp sourcefile destfile
    echo Refreshed destfile
fi

If-else

if [ sourcefile -nt destfile ] ; then
   cp sourcefile destfile
   echo Refreshed destfile
else
   echo destfile is up to date
fi
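
The -nt test can be packaged as a function that takes the two file names as arguments. This is only a sketch, assuming bash; the name refresh is hypothetical. Note that in bash, -nt is also true when the first file exists and the second does not.

```bash
# refresh: hypothetical helper that copies src over dst only when src is newer.
refresh() {
    local src=$1 dst=$2
    if [ "$src" -nt "$dst" ]; then   # True if src was modified more recently
        cp "$src" "$dst"
        echo "Refreshed $dst"
    else
        echo "$dst is up to date"
    fi
}
```

For example, refresh sourcefile destfile reproduces the if-else block above.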

xargs

xargs runs a command repeatedly, passing the items read from its standard input as arguments to that command. The following pipeline counts the total number of lines in all files under the current directory.

find . -type f | # Output the name of all files
xargs cat | # Combining them by applying cat
wc -l # Count number of lines
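
File names containing spaces or newlines break the plain pipeline, because xargs splits its input on whitespace. A common workaround, sketched here under the assumption that find supports -print0 and xargs supports -0 (true for GNU and BSD versions), separates names with NUL bytes. The function name count_lines is hypothetical.

```bash
# count_lines: hypothetical helper that counts the lines of all files under a directory.
count_lines() {
    find "${1:-.}" -type f -print0 |   # Output NUL-terminated file names
    xargs -0 cat |                     # Concatenate them, splitting names on NUL only
    wc -l                              # Count the number of lines
}

count_lines .
```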

case

Runs specific commands based on pattern matching.

case $(uname) in Linux
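
A complete case statement might look like the following sketch. The branches and messages are illustrative assumptions, not taken from the course; the function name describe_os is hypothetical. Each pattern ends with ), each branch ends with ;;, and * acts as the default.

```bash
# describe_os: hypothetical helper that classifies a system name
# (default: the output of uname).
describe_os() {
    case "${1:-$(uname)}" in
    Linux)   echo "Running on Linux" ;;
    Darwin)  echo "Running on macOS" ;;
    *BSD)    echo "Running on a BSD" ;;    # Matches FreeBSD, OpenBSD, NetBSD
    *)       echo "Unknown system" ;;      # Default branch
    esac
}

describe_os
```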

Data processing flow

git clone gitrepo; cd gitrepo
git log --pretty=format:%aD master | # Fetch commit dates
cut -d, -f1 | # Select the weekday
sort |      # Bring all weekdays together
uniq -c |   # Count all weekday occurrences
sort -rn     # Order by descending popularity
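
The select, sort, count, and rank steps can be tried without a repository by feeding the filter a few sample dates. This is a minimal sketch; the function name weekday_counts and the sample dates are made up for illustration, and the final awk step is an addition that only normalizes the padding produced by uniq -c.

```bash
# weekday_counts: hypothetical filter that tallies the weekday of each date on standard input.
weekday_counts() {
    cut -d, -f1 |            # Select the weekday (text before the first comma)
    sort |                   # Bring identical weekdays together
    uniq -c |                # Count occurrences of each weekday
    sort -rn |               # Order by descending count
    awk '{print $1, $2}'     # Normalize the padding added by uniq -c
}

# Sample input in the same format as git log --pretty=format:%aD
printf '%s\n' \
    'Mon, 6 Apr 2020 10:00:00 +0200' \
    'Tue, 7 Apr 2020 11:30:00 +0200' \
    'Mon, 13 Apr 2020 09:15:00 +0200' |
weekday_counts
# Prints:
# 2 Mon
# 1 Tue
```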

Append a timestamp to a log file

echo $(date): operation failed >>log-file

Fetching data

From the web

We can invoke a web service and pipe its response to jq, which pretty-prints the JSON result.

curl -s "http://api.currencylayer.com/live?access_key=$API_KEY&source=USD&currencies=EUR" | jq .

From a MySQL database

echo "SELECT COUNT(*) FROM projects" | # SQL query
mysql -ughtorrent -p ghtorrent # MySQL client and database

echo 'select url from projects limit 3' | # Obtain URL of first three projects
mysql -ughtorrent -p ghtorrent | # Invoke MySQL client
while read url ; do
    curl -s "$url" | # Fetch project's details
    jq -r '{owner: .owner.login, name: .name, pushed: .pushed_at}' # Print owner, project, and last push time
done

Archives

List the contents of an archive file on the web without writing it to disk.

curl -Ls https://github.com/castor-software/depclean/archive/1.0.0.tar.gz | # Download tar file
tar -tzvf - | # -t lists the contents, -z handles gzip compression, -v is verbose, -f - reads the archive from standard input (curl's output)
head -10 # List first 10 entries

Extract the archive to disk

curl -Ls https://github.com/castor-software/depclean/archive/1.0.0.tar.gz | # Download tar file
tar -xzf - 

Create and compress archives

tar -czf share.tar.gz /usr/share

Using a sub-shell to pipe various commands

mkdir dict2
(
    cd /usr/share/dict
    tar -cf - .
) | (
    cd dict2
    tar -xf -
)
ls dict2
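
The two sub-shells let each tar run in its own working directory without changing the directory of the calling shell. The idea generalizes to a small function, sketched below; the name copy_tree is hypothetical.

```bash
# copy_tree: hypothetical helper that copies the contents of one directory
# into another through a pipe between two tar processes.
copy_tree() {
    mkdir -p "$2" &&
    (cd "$1" && tar -cf - .) |   # Sub-shell 1: archive the source to standard output
    (cd "$2" && tar -xf -)       # Sub-shell 2: extract from standard input
}
```

For example, copy_tree /usr/share/dict dict2 reproduces the commands above.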

Compressing a recursive listing of a directory

ls -lR /home/joe >ls.out ; gzip ls.out

Version Control

Who has performed the most commits?

git log --pretty=format:%ae | # List each commit's author email
sort | # Bring emails together
uniq -c | # Count occurrences
sort -rn | # Order by number
head

Which file has the largest number of changes?

find . -type f -print |                                                         
    while read f ; do # For each file
    echo -n "$f " # Prints its name on a single line
    git log --follow --oneline $f | wc -l # Count the number of changes
    done |
    sort -k 2nr | # Sort by the second field in reverse numerical order
    head

What are the changes made to a file?

git blame --line-porcelain src/main/java/spoon/Launcher.java | # obtain line metadata
head -15

Which author has contributed the most to a file?

git blame --line-porcelain src/main/java/spoon/Launcher.java |
grep "^author " | #Show each line's author
sort |  # Order by author
uniq -c | # Count author instances
sort -rn | # Order by count
head

What is the average date of all lines in a file?

date +%s # Show the current date as seconds since the epoch
date -d @1585990273 # Convert epoch seconds to a date

date -d @$(
   git blame --line-porcelain src/main/java/spoon/Launcher.java |
   awk '/^author-time / {sum += $2; count++} # Maintain sum and count of commit times
   END {printf("%d", sum / count)}'
)
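
The averaging step can be checked in isolation on a few hand-written lines in the porcelain format. This is a minimal sketch; the function name average_author_time and the sample values are hypothetical. The awk program is wrapped in single quotes so that the shell does not expand $2 before awk sees it.

```bash
# average_author_time: hypothetical filter that averages the author-time
# fields found on standard input.
average_author_time() {
    awk '/^author-time / {sum += $2; count++}           # Maintain sum and count
         END {if (count) printf("%d\n", sum / count)}   # Print the integer average
    '
}

printf '%s\n' 'author Alice' 'author-time 100' 'author Bob' 'author-time 200' |
average_author_time
# Prints: 150
```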

What is the evolution of the file size?

file=src/main/java/spoon/Launcher.java # File to examine
git log --pretty=format:%H -3 $file # Show SHA of the last three commits
git log --pretty=format:%H $file | # Obtain commits' SHA
while read sha ; do # For each SHA
    git show $sha:$file | # Show the file as it was at that commit
    wc -l # Count its lines
done |
head -15 # First 15 entries

System administration

Unix stores administrative data in /etc (the name comes from “et cetera”; “extreme technical context” is a tongue-in-cheek expansion)

Generators

for i in $(seq 50) ; do
    echo -n "." # Display 50 dots
done

Regular expressions

grep

cd /usr/share/dict
grep baba words # All lines (words) containing baba
grep "^baba" words # All lines (words) starting with baba
grep "baba$" words # All lines (words) ending with baba
grep a.a.a.a words # Words with four a's, each separated by any single character

grep "^t.y$" words # Three-letter words starting with t, ending with y
grep "^....$" words | wc -l # Number of four-letter words
grep "^k.*d.*k$" words # Words starting with k, ending with k, and with a d in between
grep "^a*b*c*d*e*f*g*h*i*j*k*l*m*n*o*p*q*r*s*t*u*v*w*x*y*z*$" words | wc -l # Number of words whose letters are in alphabetical order
grep "[0-9]" words # Lines containing a digit
grep "^k.[bdgkpt].k$" words # Words with a hard consonant between ks
grep "^[A-Z]" words | wc -l # Number of proper nouns
grep "[^A-Za-z]" words # Lines with non-alphabetic characters
find ~/Downloads | grep "[[:space:]]" # List files with space characters

egrep (or grep -E)

grep -E "s{2}" words  # Words with two consecutive s characters
grep -E "[^aeiouy]{7}" words # Words with seven consecutive non-vowel characters
grep -E "^.{,15}$" words | wc -l # Words with a length up to 15
grep -E "^.{15,}$" words | wc -l # Words with at least 15 characters
grep -E "^(.).*\1$" words | head # Words beginning and ending with the same character (the character in parentheses is referenced with \1)
grep -E "^(.)(.)((.)\4)?\2\1$" words # Find four- or six-letter palindromes
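
The palindrome pattern can be tried on a short inline list instead of the whole dictionary. This is a sketch, assuming a grep that accepts back-references with -E (GNU and BSD grep do, although POSIX does not require it); the function name palindromes and the sample words are made up. The optional group ((.)\4)? adds a doubled middle letter, so the pattern matches palindromes of four or six letters.

```bash
# palindromes: hypothetical filter that keeps four- or six-letter palindromes.
palindromes() {
    grep -E '^(.)(.)((.)\4)?\2\1$'
}

printf '%s\n' noon abba level redder cat | palindromes
# Prints:
# noon
# abba
# redder
```

The five-letter palindrome level is not matched, because the pattern has no place for a single middle character.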

Alternative matches.

grep -E "^(aba|ono).*(ly|ne)$" words # Words with alternative start/end parts
grep -l vfs *.c # List C files containing vfs

Matches in files (grep -F)

grep -rl read_iter . | head -5 # Recursively list the files that contain the string read_iter

grep -F ... *.c | head

Other tools

cut

cut -d: -f 1 /etc/passwd | head -4 # Output the first field

awk

awk '/bash/' /etc/passwd # Output lines containing "bash"
awk -F: '$3 > 1000' /etc/passwd # Lines where field 3 > 1000
awk -F: '{print $1}' /etc/passwd | head -5 # Output field 1
awk '!/^#/ {print $1}' /etc/services | head # Print the first field of lines that are not comments
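
awk programs are normally wrapped in single quotes, so that the shell passes $1, $3, and ! to awk untouched instead of expanding them itself. The sketch below combines a condition and an action on inline sample data; the function name high_uid_users and the sample records are hypothetical.

```bash
# Sample records in /etc/passwd style (name:password:uid:gid)
sample='root:x:0:0
alice:x:1001:1001
bob:x:1002:1002'

# high_uid_users: hypothetical filter that prints the names of users
# whose numeric id (field 3) is greater than 1000.
high_uid_users() {
    awk -F: '$3 > 1000 {print $1}'
}

echo "$sample" | high_uid_users
# Prints:
# alice
# bob
```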

ack

ack - grep-like text finder

ack --ruby foo # Find occurrences of foo in Ruby files
ack abc -l # List files containing abc

Processing

Sorting

sort -k 2 dates | head -5 # Sort by second and subsequent fields (space separated)
sort -k 4r dates | head # Sort by the fourth field in reverse order
sort -k 3M -k 2n dates | head  # Sort the third field (month) chronologically, then the second field (day of month) numerically

sort -t : -k 4n /etc/passwd | head # Sort by numeric group-id

sort -u /etc/passwd | head # Sort, keeping only unique lines
sort dates | sort -C && echo "Sorted" # sort -C silently checks whether the input is sorted

The comm command selects or rejects lines common to two files. Both files must be sorted.

comm linux.bin freebsd.bin
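
Without options, comm prints three columns: lines only in the first file, lines only in the second, and lines in both. The options -1, -2, and -3 suppress the corresponding column. The following sketch shows the two most common combinations on made-up sample files; the function names are hypothetical.

```bash
# common_lines: lines present in both sorted files (suppress columns 1 and 2).
common_lines() { comm -12 "$1" "$2"; }

# only_in_first: lines present only in the first sorted file (suppress columns 2 and 3).
only_in_first() { comm -23 "$1" "$2"; }

dir=$(mktemp -d)                      # Scratch directory for the sample files
printf '%s\n' cat ls sh >"$dir/a"     # Sorted sample list a
printf '%s\n' ls ps sh >"$dir/b"      # Sorted sample list b
common_lines "$dir/a" "$dir/b"        # Prints ls and sh
only_in_first "$dir/a" "$dir/b"       # Prints cat
rm -rf "$dir"
```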

sed

substitutions

Create JSON from list of items.

cat >tojson.sed <<\EOF
#!/bin/sed -f
# Insert [ at the beginning
1i\
[
# Convert lines into strings
s/.*/  "&"/
# Add a comma after every line except the last
$!s/$/,/
# Append ] at the end
$a\
]
EOF
chmod +x tojson.sed

ls /usr | ./tojson.sed

awk

Summarize the size of the files in a directory.

ls -l > contents.txt
awk '
NF >= 9 { size += $5; n++ }  # Sum size and number of files, skipping the "total" line
END {                # Print summary
print "Total size " size
print "Number of files " n
print "Average file size " size/n
}
' contents.txt

Tally file sizes by extension

ls -l > contents.txt
awk '
{
sub(".*/", "", $9)   # Remove path
if (!sub(".*\\.", "", $9))    # Keep only extension
next     # Skip files without extension
size[$9] += $5    # Tally size of extension
}
END {
for (i in size)
print i, size[i]
}' contents.txt |
sort -k 2nr |
head

diff

diff -u a b

patch

patch john.c <mary.patch # Patch John's copy with Mary's patch

Testing and expressions

test

test -d / && echo Directory # Test if root is a directory
test -f / && echo File # Test if root is a file
test hi = there && echo Same # Test if strings are equal
test hi != hi && echo Different # Test if strings are different
test -z "" && echo Empty # Test if string is empty
test -n "a string" && echo Non-empty # Test if string is non empty
test 32 -eq 42 && echo Equal # Test integers are equal
test 32 -lt 50 && echo Less than # Test if integer less than other
test . -nt / && echo . is newer than / # Test if a file is newer than other
test -w / && echo Writable # Test if a file is writable

expr

expr 1 + 2
expr 2 \* 10
expr 12 \% 5
expr 10 \< 50
expr 5 = 12 # Test of equality
expr John \> Mary # Compare strings
expr \( 1 + 20 \) \* 2
expr length "To be or not to be" # String length
expr substr "To be or not to be" 4 2 # Substring of length 2 starting at position 4
expr "" \| b # Short-circuit OR (first part is empty)

i=0
while [ $i -lt 10 ]; do
    echo $i
    i=$((i + 1))
done

Dealing with characters

echo 'This is a test' | tr ' ' - # Replace space with -
echo 'This is a test' | tr a-z A-Z # Convert lowercase to uppercase

Encryption & Decryption

openssl enc -e -aes-256-cbc -pbkdf2 <pride-and-prejudice.txt >real-secret
freq real-secret
openssl enc -d -aes-256-cbc -pbkdf2 <real-secret | head

Dealing with files

tac

Print and concatenate files in reverse (last line first).

ls -l | tac

rev

tail /usr/share/dict/words | rev # Reverse characters

paste

paste - /usr/share/dict/words

shuf

shuf -n 5 /usr/share/dict/words # Output five random words
shuf -n 1 -e heads tails # Throw a single coin

split

split -l 10000 -d /usr/share/dict/words # Split the dictionary

rs

head /etc/passwd | rs -c: -C: -T # Transposes the output

Graphs

echo 'digraph talks {
        bob [gender="male"];
        eliza [gender="female"];
        fred [gender="male"];
        john [gender="male"];
        mary [gender="female"];
        steve [gender="male"];
        sue [gender="female"];
        mark [gender="male"];

        john -> mary;
        john -> bob;
        mary -> sue;
        sue -> bob;
        sue -> mary;
        fred -> bob;
        eliza -> steve;
}' > talk.dot

Count nodes

gvpr 'N {clone($O, $)}' talk.dot # Clone each node to the output graph
gvpr

Images & sound

Convert, scale, and rotate images with the Netpbm tools:

tifftopnm image.tiff | pnmtopng >image.png # Convert from TIFF to PNG

for i in *.tiff ; do # For each TIFF file
    pngname=$(basename $i .tiff).png # Compute name of PNG file
    tifftopnm $i | # Convert image to PNM
    pnmtopng >$pngname # Convert and write to PNG file
done

tifftopnm image.tiff | # Convert TIFF to PNM
pamscale -width=1024 | # Scale to 1024 pixels
pnmtopng >image.png # Write the result in PNG format

jpegtopnm plate.jpeg |
pamflip -r90 | # Rotate the image by 90 degrees
pamscale -width=1024 | # Scale to 1024 pixels
pnmtojpeg >rplate.jpeg # Write the result in JPEG format

sound

play -q sox-orig.wav

sox sox-orig.wav sox-orig.mp3 # Convert between file formats
sox sox-orig.wav sox-low.wav pitch -600 # Lower pitch by 600 cents
play -q sox-low.wav

sox sox-orig.wav sox-fast.wav tempo 1.5 # Increase tempo by 50%
play -q sox-fast.wav

sox sox-orig.wav sox-chorus.wav chorus 0.5 0.9 50 0.4 0.25 2 -t \
60 0.32 0.4 2.3 -t 40 0.3 0.3 1.3 -s # Apply chorus effect
play -q sox-chorus.wav

wget -q -O persephone.mp3 \
http://ccmixter.org/content/hansatom/hansatom_-_Persephone.mp3 # By Hans Atom (CC BY 2.5)
sox persephone.mp3 persephone-trimmed.mp3 fade 0 0:06 1 # Trim to 6s with 1s fade-out
play -q persephone-trimmed.mp3

sox --combine mix -v 0.2 persephone-trimmed.mp3 sox-orig.wav \
sox-persephone.mp3 # Mix the two audio files
play -q sox-persephone.mp3

Good practices

Output error

echo Error >&2 # Send output to standard error

Clean up temporary files when script execution finishes

cat >tmpdir.sh <<\EOF
#!/bin/sh
TMPDIR="${TMP:-/tmp}/$$" # Create temporary directory name
trap 'rm -rf "$TMPDIR"' 0 # Remove it when exiting
trap 'exit 2' 1 2 15 # Exit when the program is interrupted
mkdir "$TMPDIR" # Create the directory
# Do some work in $TMPDIR
EOF
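
A variation on the same idea lets mktemp pick a unique directory name instead of deriving one from the process id. This is a sketch, assuming bash and a system that provides mktemp -d (common on Linux, macOS, and the BSDs, but not part of the course script above); the function name with_tmpdir is hypothetical.

```bash
# with_tmpdir: hypothetical helper that runs a command with a fresh temporary
# directory appended to its arguments and removes the directory afterwards.
with_tmpdir() {
    (                                      # Sub-shell, so the trap stays local
        tmpdir=$(mktemp -d) || exit 1      # Create a uniquely named directory
        trap 'rm -rf "$tmpdir"' EXIT       # Remove it when the sub-shell exits
        "$@" "$tmpdir"                     # Run the command on the directory
    )
}

with_tmpdir ls -ld    # Lists the temporary directory, which is then removed
```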

Prefer redirection to pipes

command <file # A redirection is all that's needed

Test command, not its exit code

if ! command ; then # A simple negation will do
   echo Error >&2
fi

grep can recurse directories

grep -r pattern . # Modern recursive search

Prefer wildcards to ls

for i in * ; do # can be replaced by a wildcard
   . . .
done

Replace awk with cut

cut -d : -f 1,7 # More efficient way to print fields 1 and 7

Replace sed with expr

echo $LANG
en_US.UTF-8
expr "$LANG" : '.*\.\(.*\)' # More efficient way to isolate encoding
UTF-8

References