Posts Tagged ‘Sysadvent’

Deduplication of old filesystems

Sunday, December 18th, 2016

Modern filesystems, and even storage systems, might have built-in deduplication, but common filesystems still do not. So checking for redundant data and deduplicating where possible might save disk space.

Once upon a time, there was a system where we had a 6 TB spool of binary files on a production ext4 filesystem, and the volume was running out of disk space. The owner of the data thought it likely that there were duplicates among the vast amount of files, and wanted to check. We checked using fdupes, and yes, there were a lot of duplicates.
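If fdupes is not at hand, a rough check can be sketched with checksums. This is only an illustration, not what we ran on that system: the function name find_duplicates is made up for the example, and it assumes GNU coreutils and findutils (sha256sum, xargs -r, and uniq with -w and --all-repeated).

```shell
#!/bin/bash
# Hypothetical helper (not from the original post): list groups of files
# with identical content below a directory, based on SHA-256 checksums.
# Assumes GNU tools: sha256sum, xargs -r, uniq -w / --all-repeated.
find_duplicates() {
    local dir=$1
    find "$dir" -type f -print0 \
        | xargs -0 -r sha256sum \
        | sort \
        | uniq -w 64 --all-repeated=separate   # compare only the 64 hex digits
}

# Example call (the path is a placeholder):
#   find_duplicates /path/to/spool
```

Each group of lines in the output shares one checksum, so the files in a group have the same content. On a multi-terabyte spool, expect reading every file to take a long time.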

Read the rest of the post at Redpill Linpro’s sysadvent blog

Bash: Random numbers for fun and profit

Tuesday, December 13th, 2016

Bash has many things that just work automagically. Did you know it has a built-in pseudorandom number generator? Let’s play some games! Read the rest of the post here!
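As a small taste, here is a minimal sketch of the built-in RANDOM variable. The variable names die, a and b are chosen for this example only.

```shell
#!/bin/bash
# RANDOM is a bash built-in variable: each read yields an integer
# between 0 and 32767.

# Roll a six-sided die. The modulo introduces a very slight bias,
# which is fine for games but not for anything security related.
die=$(( RANDOM % 6 + 1 ))
echo "You rolled a $die"

# Assigning to RANDOM seeds the generator, so the same seed gives
# the same next value within the same shell.
RANDOM=42
a=$RANDOM
RANDOM=42
b=$RANDOM
echo "Same seed, same number: $a and $b"
```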

Bash process substitution

Saturday, December 12th, 2015

Also posted on Redpill Linpro’s sysadvent blog

In bash, we often use redirects (that is, < and >) to send output from a command to a file, or input from a file to a command. But sometimes a command takes two or more files as input. Then our ordinary scheme does not work anymore.

Let’s say you want to diff(1) the output of two commands. For example, compare the contents of two directories. You may run the two commands, and redirect the output to files, then diff the files, and finally remove the files. Awkward.

 $ ls dir1 | sort > file1
 $ ls dir2 | sort > file2
 $ diff -u file1 file2
 $ rm file1 file2

Since diff can take stdin as one input via the special filename ‘-’, we might cut down to one file, but this is still awkward.

 $ ls dir1 | sort > file1
 $ ls dir2 | sort | diff -u file1 -
 $ rm file1

Bash has (of course) a better solution: process substitution, that is, treating the output (or input) of commands as files. Enter the process substitution operators:

 >(command list) # A file name that feeds the command list's input
 <(command list) # A file name that delivers the command list's output
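To see what "as files" means here, the following sketch prints what the shell actually hands to a command. The helper show_arg and the variable names are made up for this example, and the exact path differs between systems (on Linux it is typically something like /dev/fd/63).

```shell
#!/bin/bash
# Hypothetical helper: print the first argument unchanged, to reveal
# what bash substituted for the operator.
show_arg() { printf '%s\n' "$1"; }

# Both operators expand to a path that the command can open like a file.
input_path=$(show_arg <(true))
output_path=$(show_arg >(cat > /dev/null))
echo "<(...) expanded to $input_path"
echo ">(...) expanded to $output_path"

# Any command that reads a named file can read from it, here cat(1).
greeting=$(cat <(echo hello))
echo "cat read: $greeting"
```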

Now, let us solve our diff challenge with a simple oneliner:

 $ diff -u <( ls dir1 | sort ) <( ls dir2 | sort )

Neat, isn’t it? I use this all the time!
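The same trick works for any command that wants two files. As a sketch, here is comm(1) used to list the names found in both directories. The function name common_entries is made up for this example, and like the diff example it assumes file names without newlines.

```shell
#!/bin/bash
# Hypothetical helper: print the names present in both directories.
# comm(1) needs two sorted inputs; -12 hides the lines that are
# unique to the first or the second input.
common_entries() {
    comm -12 <( ls "$1" | sort ) <( ls "$2" | sort )
}

# Example call:
#   common_entries dir1 dir2
```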

Bonus: Avoid subshell scripting

The following bash loop hides a pitfall that is often missed, leading to subtle bugs that are hard to spot. A while loop at the end of a pipe runs in a subshell, so changes made to global variables inside the loop are lost when the loop ends.

 #!/bin/bash
 global=0

 echo "Outside loop, global=$global"

 for n in 1 2 3; do echo $n; done | \
 while read i; do
     global=$i
     echo "Inside loop: global=$global"
 done
 
 echo "Outside loop, global=$global again :-("

Using process substitution, we avoid this elegantly:

 #!/bin/bash
 global=0
 
 echo "Outside loop, global=$global"
 
 while read i; do
     global=$i
     echo "Inside loop: global=$global"
 done < <( for n in 1 2 3; do echo $n; done )
 
 echo "Outside loop, global=$global still :-)"
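The difference is easy to see with a slightly more practical case, summing numbers in a loop. This is a minimal sketch assuming bash with default options (lastpipe not enabled); both function names are made up for the example.

```shell
#!/bin/bash
# Two hypothetical helpers that both try to sum the numbers 1, 2 and 3.

sum_with_pipe() {
    local total=0 n
    printf '%s\n' 1 2 3 | while read -r n; do
        total=$(( total + n ))   # only changes the subshell's copy
    done
    echo "$total"
}

sum_with_process_substitution() {
    local total=0 n
    while read -r n; do
        total=$(( total + n ))   # runs in the current shell
    done < <( printf '%s\n' 1 2 3 )
    echo "$total"
}

sum_with_pipe                    # prints 0
sum_with_process_substitution    # prints 6
```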