Google SRE Question on Mass Changing File Extensions

I recently applied for an SRE (Site Reliability Engineer) position at Google, and one of the questions was about mass-renaming all the files in a directory from html to htm and vice versa. I don’t remember what crappy answer I gave, but here is what I worked out afterwards. (I don’t script all the time, so my solution is a little rough.)
First, create a few files in, say, your /tmp directory with which you can goof around.

#cd /tmp
#touch 1.htm 2.htm 3.htm 4.htm

(1) List the files to make sure that they are all there:

#ls
1.htm 2.htm 3.htm 4.htm

(2) Now, we’ll use awk to print out “mv ” followed by the first field of each line (in this case, each line is one file name) two times with a space in between, the second time adding an “l” to the end of the output.

#ls | awk '{print "mv "$1" "$1"l"}'
mv 1.htm 1.html
mv 2.htm 2.html
mv 3.htm 3.html
mv 4.htm 4.html

(3) Once that looks good, we’ll tack on a pipe and the ‘sh’ command so that the shell executes the generated commands.

#ls | awk '{print "mv "$1" "$1"l"}' | sh

Et voilà! All those files are renamed.
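One caveat with the ls | awk | sh approach: it builds a mv command for every name that ls prints, not just the .htm files, and it trips over names containing spaces because awk's $1 is only the first whitespace-separated field. Here is a minimal sketch of a plain shell loop that avoids both problems. The function name htm_to_html and the scratch directory are made up for illustration; they are not part of the interview answer.

```shell
#!/bin/sh
# Hypothetical helper: rename every *.htm file in the given directory to *.html.
htm_to_html() {
    dir=$1
    for f in "$dir"/*.htm; do
        # If nothing matches, the pattern stays unexpanded; skip it.
        [ -e "$f" ] || continue
        # Quoting keeps names with spaces intact; "--" guards against
        # names that start with a dash.
        mv -- "$f" "${f}l"
    done
}

# Demo in a scratch directory (mktemp -d is common but not strictly POSIX).
demo=$(mktemp -d)
touch "$demo/1.htm" "$demo/2.htm" "$demo/my page.htm" "$demo/notes.txt"
htm_to_html "$demo"
ls "$demo"
```

Only the names matching *.htm are touched, so a stray notes.txt is left alone.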

Now, say that they want the opposite? This gets a little tricky just using awk because we’re deleting a character as opposed to simply adding one, so we’ll add a unique garbage string to the end of the second file name, then use sed to search and replace that garbage string.

(1) List all the files in the directory.

#ls
1.html 2.html 3.html 4.html

(2) Now, we do something very similar to our previous step using awk, except this time we append a trash string to the end in order to uniquely identify it later when we do a search and replace.


#ls | awk '{print "mv "$1" "$1".ZZZ"}'
mv 1.html 1.html.ZZZ
mv 2.html 2.html.ZZZ
mv 3.html 3.html.ZZZ
mv 4.html 4.html.ZZZ

(3) Perfect, now we can replace the ‘html.ZZZ’ ending with our desired ‘htm’ one instead.

#ls | awk '{print "mv "$1" "$1".ZZZ"}' | sed 's_html\.ZZZ_htm_'
mv 1.html 1.htm
mv 2.html 2.htm
mv 3.html 3.htm
mv 4.html 4.htm

(4) Like before, once the output looks good, we’ll tack on “ | sh” to the end so that the shell runs the generated commands.

#ls | awk '{print "mv "$1" "$1".ZZZ"}' | sed 's_html\.ZZZ_htm_' | sh
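The character can also be dropped inside awk itself with its sub() function, which skips the garbage string and the sed step entirely. A rough sketch follows; the function name html_to_htm_cmds is made up for this example. It only prints the mv commands, so you can still eyeball them before piping to sh, and it shares the pipeline's weakness with file names that contain spaces.

```shell
#!/bin/sh
# Hypothetical helper: read file names on stdin and print one mv command
# for each name ending in .html, with the new name ending in .htm.
html_to_htm_cmds() {
    awk '/\.html$/ {
        new = $0
        sub(/\.html$/, ".htm", new)   # rewrite only the trailing extension
        print "mv " $0 " " new
    }'
}

# Dry run: names that do not end in .html produce no command.
printf '%s\n' 1.html 2.html notes.txt | html_to_htm_cmds
```

In the directory from the walkthrough this would be used as ls | html_to_htm_cmds, with | sh added once the output looks right.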

Update: Sr. Unix Pimp, Jarvis Talley, just wrote me with a more elegant solution:

mkdir /tmp/htm_files
grep htm * | grep -v html | cut -d : -f 1 | xargs -t -i cp {} /tmp/htm_files

and

for i in `cat /tmp/htm_files`
do
cat $i | sed 's/htm/html/g'
done

