Unzip and rename with sed:



Many times, when I downloaded zip files with a book's source code, I faced the same set of tasks:

1. Unzip the complete archive into a working directory.

2. Unzip the individual chapter files and rename them.

To automate these tasks, I decided to write a simple Linux/Unix bash script.

In this particular case, I downloaded the zip file (574841_codefiles.zip) with the source code of the book "Professional C++".

After unzipping it into a test directory, I got the following:

geo@fermat:/home/work/cpp/profc++/test$ l
total 716
-rw-rw-rw- 1 geo geo   9859 Nov 10  2004 574841_ch10.zip
-rw-rw-rw- 1 geo geo  45024 Nov 10  2004 574841_ch11.zip
-rw-rw-rw- 1 geo geo  14590 Nov 10  2004 574841_ch12.zip
-rw-rw-rw- 1 geo geo   3028 Nov 10  2004 574841_ch13.zip
-rw-rw-rw- 1 geo geo   7403 Nov 10  2004 574841_ch14.zip
-rw-rw-rw- 1 geo geo  21951 Nov 10  2004 574841_ch15.zip
-rw-rw-rw- 1 geo geo  16590 Nov 10  2004 574841_ch16.zip
-rw-rw-rw- 1 geo geo  42543 Nov 10  2004 574841_ch17.zip
-rw-rw-rw- 1 geo geo   3964 Nov 10  2004 574841_ch18.zip
-rw-rw-rw- 1 geo geo   4463 Nov 10  2004 574841_ch19.zip
-rw-rw-rw- 1 geo geo   9835 Nov 10  2004 574841_ch1.zip
-rw-rw-rw- 1 geo geo  16440 Nov 10  2004 574841_ch20.zip
-rw-rw-rw- 1 geo geo  33351 Nov 10  2004 574841_ch21.zip
-rw-rw-rw- 1 geo geo  17703 Nov 10  2004 574841_ch22.zip
-rw-rw-rw- 1 geo geo  14763 Nov 10  2004 574841_ch23.zip
-rw-rw-rw- 1 geo geo   9189 Nov 10  2004 574841_ch24.zip
-rw-rw-rw- 1 geo geo   6258 Nov 10  2004 574841_ch25.zip
-rw-rw-rw- 1 geo geo   7304 Nov 10  2004 574841_ch26.zip
-rw-rw-rw- 1 geo geo    558 Nov 10  2004 574841_ch7.zip
-rw-rw-rw- 1 geo geo  20532 Nov 10  2004 574841_ch8.zip
-rw-rw-rw- 1 geo geo  30760 Nov 10  2004 574841_ch9.zip
-rw-r--r-- 1 geo geo 339934 Jul 22 11:25 574841_codefiles.zip
-rw-rw-rw- 1 geo geo   3219 Nov 10  2004 README
geo@fermat:/home/work/cpp/profc++/test$ 

So my task here is to write a bash script that removes the prefix (574841_) and the suffix (.zip) from each file name and then extracts each archive into its own chapter directory. Let's do it.

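I start with the bash script header and a for loop whose variable f (for "file") goes through all the zip files in the directory; each name will be transformed inside the do/done loop:

#!/bin/bash
for f in *.zip
do
    :   # loop body goes here; ":" is a no-op, because bash rejects an empty do/done
done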
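If the directory contains no .zip files, bash leaves the unmatched pattern as is, and the loop above runs once with f set to the literal string *.zip. A minimal sketch of a guard, assuming bash (shopt is a bash builtin): with nullglob enabled, an unmatched pattern expands to nothing. The helper name list_zips is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print each .zip file in a directory (default: .),
# one per line. The body runs in a subshell "( ... )", so enabling
# nullglob here does not change the option in the calling shell.
list_zips() (
    shopt -s nullglob          # unmatched globs expand to nothing
    for f in "${1:-.}"/*.zip
    do
        echo "$f"
    done
)
```

In an empty directory list_zips prints nothing; without nullglob the same loop would print the pattern itself.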

Next, inside the loop, I print the current file name with echo, then strip the unwanted prefix (574841_) and suffix (.zip) with sed's s (substitute) command. The command runs inside backticks, and its output is assigned to the DIR variable, which is used in the extraction step. Finally, I print the value of DIR with echo:

echo "Initial value of File = $f"
# remove the 574841_ prefix and the .zip suffix from the file name
# and assign the result to DIR, which is used for extracting files
# (^ and $ anchor the patterns; \. matches a literal dot)
DIR=`echo "$f" | sed -e 's/^574841_//' -e 's/\.zip$//'`
echo "New directory name: $DIR"

Running this partial script as a test produced the following output (only the last lines are shown):

Initial value of File = 574841_ch24.zip
New directory name: ch24
Initial value of File = 574841_ch25.zip
New directory name: ch25
Initial value of File = 574841_ch26.zip
New directory name: ch26
Initial value of File = 574841_ch7.zip
New directory name: ch7
Initial value of File = 574841_ch8.zip
New directory name: ch8
Initial value of File = 574841_ch9.zip
New directory name: ch9
Initial value of File = 574841_codefiles.zip
New directory name: codefiles

This shows that the prefix and suffix were removed from each file name and the result was assigned to DIR, using sed's substitute command inside backticks (command substitution).

Notice that there are no spaces around the = sign between DIR and the backticks! With spaces (DIR = `...`), bash would try to run a command named DIR.
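The same prefix and suffix stripping can also be done without starting any sed processes, using bash's own parameter expansion: ${var#pattern} removes a matching prefix and ${var%pattern} removes a matching suffix. A minimal sketch; the function name strip_name is hypothetical, and the 574841_ prefix is hard-coded just as in the script.

```shell
#!/bin/bash
# Hypothetical helper: turn "574841_ch7.zip" into "ch7".
# ${name#574841_} drops the prefix only if the name starts with it;
# ${name%.zip} drops the suffix only if the name ends with it.
# In these patterns "." is a literal dot (in a sed regex an
# unescaped "." would match any character).
strip_name() {
    local name=$1
    name=${name#574841_}
    name=${name%.zip}
    echo "$name"
}

# Usage inside the loop (again, no spaces around "="):
#   DIR=$(strip_name "$f")
```

A name without the prefix or suffix, such as README, passes through unchanged.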

With an unzip command added to the loop (it appears in the final script below), let's look inside the directories the archives were extracted into:

drwxr-xr-x 23 geo geo 4096 Jul 26 10:37 test
geo@fermat:/home/work/cpp/profc++$ l ch14
total 4
drwxr-xr-x 15 geo geo 4096 Jul 21 19:22 574841_ch14
geo@fermat:/home/work/cpp/profc++$

We are not done yet: each of our DIR directories contains a 574841_ch* subdirectory, which needs to be renamed as well!

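This time I use find with -exec and the rename utility: find locates every directory whose name matches 574841_*, and -exec runs rename on each of them. The pattern is quoted so that the shell does not expand it against the 574841_*.zip files in the current directory, and -depth makes find process a directory's contents before the directory itself, so renaming a directory does not break the traversal. The rename used here is the Perl-based version shipped with Debian/Ubuntu, which takes a Perl substitution expression:

find . -depth -type d -name '574841_*' -exec rename 's/574841_//' '{}' \;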
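If the Perl rename is not installed (some distributions ship the util-linux rename, which takes different arguments), the same renaming can be done with find, mv and bash alone. A minimal sketch; the function name strip_dir_prefix is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: below directory $1 (default: .), rename every
# directory whose name starts with "574841_" by dropping that prefix,
# e.g. ch14/574841_ch14 -> ch14/ch14.
# -depth lists a directory's contents before the directory itself,
# so a renamed directory is never descended into afterwards.
strip_dir_prefix() {
    find "${1:-.}" -depth -type d -name '574841_*' -exec bash -c '
        for d in "$@"; do
            parent=${d%/*}                      # directory containing d
            base=${d##*/}                       # last path component
            mv -- "$d" "$parent/${base#574841_}"
        done
    ' bash {} +
}
```

Run as strip_dir_prefix in the test directory, it performs the same renaming as the find/rename command.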

Now if I check the results, I will see the following:

geo@fermat:/home/work/cpp/profc++/test$ l ch26
total 4
drwxr-xr-x 7 geo geo 4096 Jul 22 12:10 ch26
geo@fermat:/home/work/cpp/profc++/test$ l ch26/ch26
total 20
drwxr-xr-x 2 geo geo 4096 Oct 31  2004 CarFactory
drwxr-xr-x 2 geo geo 4096 Oct 31  2004 Decorator
drwxr-xr-x 2 geo geo 4096 Oct 31  2004 ParsedXMLElement
drwxr-xr-x 2 geo geo 4096 Oct 31  2004 SingletonLogger
drwxr-xr-x 2 geo geo 4096 Oct 31  2004 StaticLogger
geo@fermat:/home/work/cpp/profc++/test$ 

Finally, the now unneeded .zip files have to be removed as a last step, outside the do/done loop, and the script is ready for a test run:

#!/bin/bash

for f in *.zip
do
    echo "Initial value of File = $f"
    # remove the 574841_ prefix and the .zip suffix from the file name
    # and assign the result to DIR, which is used for extracting files
    DIR=`echo "$f" | sed -e 's/^574841_//' -e 's/\.zip$//'`
    echo "New directory name: $DIR"
    # unzip the archive into the new directory
    unzip "$f" -d "$DIR"
done
# remove all original .zip files
rm -f *.zip
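Because the script ends with rm -f *.zip, a mistake costs the downloaded archives. One way to rehearse it first, sketched here as an assumption rather than as part of the original script, is a hypothetical DRY_RUN switch that prints each command instead of running it. This sketch uses bash parameter expansion instead of sed for the renaming; the names run and unzip_chapters are hypothetical.

```shell
#!/bin/bash
# Hypothetical wrapper: when DRY_RUN=1, print the command instead of
# executing it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"          # show what would run
    else
        "$@"               # really run it
    fi
}

# Hypothetical version of the unzip-and-rename loop built on run().
# The subshell body keeps nullglob local to this function.
unzip_chapters() (
    shopt -s nullglob
    for f in *.zip
    do
        DIR=${f#574841_}   # drop the prefix
        DIR=${DIR%.zip}    # drop the suffix
        run unzip "$f" -d "$DIR"
    done
    run rm -f -- *.zip
)
```

DRY_RUN=1 unzip_chapters lists the unzip and rm commands without touching any files; plain unzip_chapters performs them.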

Inside the test directory, I now see the following:

geo@fermat:/home/work/cpp/profc++/test$ l
total 96
drwxr-xr-x 3 geo geo 4096 Jul 22 12:10 ch1
drwxr-xr-x 3 geo geo 4096 Jul 22 12:10 ch10
drwxr-xr-x 3 geo geo 4096 Jul 22 12:10 ch11
drwxr-xr-x 3 geo geo 4096 Jul 22 12:10 ch12
drwxr-xr-x 3 geo geo 4096 Jul 22 12:10 ch13
drwxr-xr-x 3 geo geo 4096 Jul 22 12:10 ch14
drwxr-xr-x 3 geo geo 4096 Jul 22 12:10 ch15
drwxr-xr-x 3 geo geo 4096 Jul 22 12:10 ch16
drwxr-xr-x 3 geo geo 4096 Jul 22 12:10 ch17
drwxr-xr-x 3 geo geo 4096 Jul 22 12:10 ch18
drwxr-xr-x 3 geo geo 4096 Jul 22 12:10 ch19
drwxr-xr-x 3 geo geo 4096 Jul 22 12:10 ch20
drwxr-xr-x 3 geo geo 4096 Jul 22 12:10 ch21
drwxr-xr-x 3 geo geo 4096 Jul 22 12:10 ch22
drwxr-xr-x 3 geo geo 4096 Jul 22 12:10 ch23
drwxr-xr-x 3 geo geo 4096 Jul 22 12:10 ch24
drwxr-xr-x 3 geo geo 4096 Jul 22 12:10 ch25
drwxr-xr-x 3 geo geo 4096 Jul 22 12:10 ch26
drwxr-xr-x 3 geo geo 4096 Jul 22 12:10 ch7
drwxr-xr-x 3 geo geo 4096 Jul 22 12:10 ch8
drwxr-xr-x 3 geo geo 4096 Jul 22 12:10 ch9
drwxr-xr-x 2 geo geo 4096 Jul 22 12:10 codefiles
-rw-rw-rw- 1 geo geo 3219 Nov 10  2004 README
-rw-r--r-- 1 geo geo  365 Jul 22 12:09 unzip-rename.txt
geo@fermat:/home/work/cpp/profc++/test$ 

Now I am ready to remove all the old ch* directories in my production directory and move the new ones in from the test directory:

$ rm -rf ch*; mv test/ch* .
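The command above removes every ch* entry in the production directory before anything has been moved. A more cautious sketch, assuming the same layout (a test subdirectory holding the new ch* directories), replaces one chapter at a time and deletes an old directory only when its replacement exists. The function name replace_chapters is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: for each directory test/chN, replace ./chN with it.
# An old ./chN is removed only when a matching test/chN is present;
# chapters with no replacement are left alone.
replace_chapters() (
    shopt -s nullglob              # no test/ch* dirs -> loop does nothing
    for new in test/ch*/           # trailing "/" matches directories only
    do
        name=$(basename "$new")    # e.g. "ch7"
        rm -rf -- "./$name"        # drop the old copy, if any
        mv -- "test/$name" "./$name"
    done
)
```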

The final result is here:

geo@fermat:/home/work/cpp/profc++$ l
total 96
-rw-r--r-- 1 geo geo 2389 Jul 26 10:59 bash-scripts.txt
drwxr-xr-x 3 geo geo 4096 Jul 26 11:00 ch1
drwxr-xr-x 3 geo geo 4096 Jul 26 11:00 ch10
drwxr-xr-x 3 geo geo 4096 Jul 26 11:00 ch11
drwxr-xr-x 3 geo geo 4096 Jul 26 11:00 ch12
drwxr-xr-x 3 geo geo 4096 Jul 26 11:00 ch13
drwxr-xr-x 3 geo geo 4096 Jul 26 11:00 ch14
drwxr-xr-x 3 geo geo 4096 Jul 26 11:00 ch15
drwxr-xr-x 3 geo geo 4096 Jul 26 11:00 ch16
drwxr-xr-x 3 geo geo 4096 Jul 26 11:00 ch17
drwxr-xr-x 3 geo geo 4096 Jul 26 11:00 ch18
drwxr-xr-x 3 geo geo 4096 Jul 26 11:00 ch19
drwxr-xr-x 3 geo geo 4096 Jul 26 11:00 ch20
drwxr-xr-x 3 geo geo 4096 Jul 26 11:00 ch21
drwxr-xr-x 3 geo geo 4096 Jul 26 11:00 ch22
drwxr-xr-x 3 geo geo 4096 Jul 26 11:00 ch23
drwxr-xr-x 3 geo geo 4096 Jul 26 11:00 ch24
drwxr-xr-x 3 geo geo 4096 Jul 26 11:00 ch25
drwxr-xr-x 3 geo geo 4096 Jul 26 11:00 ch26
drwxr-xr-x 3 geo geo 4096 Jul 26 11:00 ch7
drwxr-xr-x 3 geo geo 4096 Jul 26 11:00 ch8
drwxr-xr-x 3 geo geo 4096 Jul 26 11:00 ch9
-rw-rw-rw- 1 geo geo 3219 Nov 10  2004 README
drwxr-xr-x 2 geo geo 4096 Jul 26 13:28 test
geo@fermat:/home/work/cpp/profc++$

Stop services with awk:

Sometimes you need to stop the processes related to a program (e.g. Evolution mail), either because you need the memory for a performance-critical task or simply because you do not need that program right now. Here's how to stop the Evolution processes using grep, awk and backticks:

geo@fermat:~$ ps axf|wc
    144     932    8162
geo@fermat:~$ ps axf|grep [e]vol
 2104 ?        S      0:00          \_ /usr/lib/evolution/2.30/evolution-alarm-notify
 2121 ?        S      0:00 /usr/lib/evolution/e-calendar-factory
 2146 ?        S      0:00 /usr/lib/evolution/e-addressbook-factory
geo@fermat:~$ ps axf|grep [e]vol|kill -9 `awk '{print $1}'`
geo@fermat:~$ ps axf|grep [e]vol
geo@fermat:~$ ps axf|wc
    141     916    7943
geo@fermat:~$ 

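Essentially, awk takes column 1 (the PID) from the output of grep [e]vol, and the backticks hand those PIDs to kill as arguments. This works because the command substitution runs inside the last stage of the pipeline and inherits its standard input, so awk reads grep's output (kill itself ignores stdin). The brackets in [e]vol keep grep from matching its own process: the regex matches "evol", but grep's own command line contains "[e]vol". And that is it! All Evolution-related processes are gone, as the second grep shows; the line count from ps axf|wc also drops from 144 to 141, one line per killed process.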
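A more conventional arrangement runs the whole pipeline inside the command substitution and lets awk do the matching as well. A minimal sketch; the helper name pids_matching is hypothetical. The bracket trick still matters here, because awk's own command line contains the pattern text, and [e]vol does not match the string "[e]vol".

```shell
#!/bin/bash
# Hypothetical helper: read "ps ax"-style lines on stdin and print the
# PID (first column) of every line matching the awk regex in $1.
pids_matching() {
    awk -v pat="$1" '$0 ~ pat { print $1 }'
}

# Usage (quoting '[e]vol' keeps the shell from globbing it):
#   kill $(ps ax | pids_matching '[e]vol')
```

If nothing matches, kill is called without PIDs and prints a usage message. The pkill -f evol command from the procps package is a ready-made alternative that excludes itself from the match.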

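The -9 option sends SIGKILL, which a process cannot catch or ignore. I use it because it reliably terminates stubborn processes, for example the Java processes left running when working with the JBoss application server. Since SIGKILL gives a process no chance to clean up, a plain kill (SIGTERM) is the gentler first attempt.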
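A sketch of that escalation, assuming bash and its kill builtin: send SIGTERM first, give the processes a few seconds, and only then send SIGKILL to anything still alive. The function stop_pids and the STOP_TIMEOUT variable are hypothetical. One caveat: kill -0 only tests whether a PID exists, and a terminated child that the calling shell has not yet reaped (a zombie) still counts, so for the caller's own children the function simply waits out the timeout.

```shell
#!/bin/bash
# Hypothetical helper: ask processes to exit with SIGTERM, wait up to
# STOP_TIMEOUT seconds (default 3), then SIGKILL whatever is still alive.
stop_pids() {
    [ "$#" -gt 0 ] || return 0
    kill -TERM "$@" 2>/dev/null || true     # polite request first
    local i pid alive
    for (( i = 0; i < ${STOP_TIMEOUT:-3}; i++ ))
    do
        alive=0
        for pid in "$@"
        do
            # kill -0 sends no signal; it only tests whether the PID exists
            kill -0 "$pid" 2>/dev/null && alive=1
        done
        [ "$alive" -eq 0 ] && return 0      # everything has exited
        sleep 1
    done
    kill -KILL "$@" 2>/dev/null || true     # force the survivors
}

# Usage with the PIDs from the session above:
#   stop_pids 2104 2121 2146
```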

You can download the file with these scripts from here: bash-scripts.txt.