Unix Command Line Tools Tips

Xah Lee, 2007-03

This page shows some advanced unix command line tool tips.

Note: common unix shell utilities such as “find”, “xargs”, “ps”, “diff”, “basename” etc. are assumed here to be the GNU versions, which are the versions used on Linux (as opposed to the OSX, BSD, Solaris, or other unix versions, which differ slightly in options and features).

Copying a Directory to Server

How to copy a local directory to a remote machine in one shot?

For one-way copying (or updating), use “rsync”. The remote machine must have rsync installed. Example:

rsync -z -av --rsh="ssh -l mary" ~/web/ mary@example.org:~/

This will copy the contents of the local dir “~/web/” to the remote dir “~/” on the machine with domain name “example.org”, using login “mary” through the ssh protocol. The “-z” turns on compression. The “-a” is archive mode: it recurses (i.e. uploads the whole dir) and makes each file's metadata (owner, permissions, timestamp, etc.) the same as the local file's when possible. The “-v” is verbose mode, which makes rsync print which files are being updated. (rsync does not upload files that are already on the destination and identical.)

For example, here's what i use to sync/upload my website on my local machine to my server.

rsync -z -av --exclude="*~" --exclude=".DS_Store" --exclude=".bash_history" --exclude="*/_curves_robert_yates/*.png" --exclude="logs/*" --exclude="xlogs/*" --delete --rsh="ssh -l u40651121" ~/web/ u40651121@s168753656.onlinehome.us:~/

I used this command daily. The “--exclude” tells it to disregard any files matching that pattern (i.e. if a file matches, rsync neither uploads it nor deletes it on the remote server). The “--delete” removes files on the remote side that no longer exist locally.
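Because “--delete” removes files on the destination, it can help to preview a run first. The following is a minimal sketch, not the command used above: two temporary local directories stand in for the local and remote sides, all file names are made up, and “--dry-run” (an rsync option not mentioned above) makes rsync list what it would do without changing anything.

```shell
# Hypothetical stand-ins for the local site and the remote home dir.
src=$(mktemp -d)
dst=$(mktemp -d)
touch "$src/index.html" "$src/notes.txt~"   # one real file, one editor backup
touch "$dst/old.html" "$dst/draft~"         # a stale file and an excluded file

# Preview only: list what would be copied or deleted, change nothing.
rsync -av --dry-run --delete --exclude="*~" "$src/" "$dst/"

# Real run with the same options.
rsync -a --delete --exclude="*~" "$src/" "$dst/"

# index.html was copied and old.html was deleted;
# notes.txt~ was not copied and draft~ was left alone.
ls -A "$dst"
```

If the preview lists a deletion you did not expect, adjust the “--exclude” patterns before doing the real run.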

See rsync↗.

You can create a bash alias for the long command, in the same way as “alias l="ls -al"”, or use bash's history search: press “Ctrl+r”, then type rsync.

Synchronize Directories on 2 Machines

How to 2-way sync local dir and remote machine?

Use “unison”. Both machines must have unison installed. The “rsync” tool does just a one-way sync (overwriting any changes on the remote machine), while “unison” asks you, for each changed file (or missing file/dir), which direction the update should go.

Here's a sample command i use to update a server i work on.

unison -servercmd /sw/bin/unison /Users/xah/uci-server/vmm ssh://xahlee@virtualmathmuseum.org//Library/WebServer/Documents/vmm

This server contains work done by other people, so i can't just update it one-way with rsync.

The “-servercmd /sw/bin/unison” specifies the path of the unison command on the server. (needed when it is not in the default search path on remote machine's user account) The “/Users/xah/uci-server/vmm” is the local dir. The “ssh://xahlee@virtualmathmuseum.org//Library/WebServer/Documents/vmm” specifies the remote dir, remote machine's domain name, login account, and the protocol to use.

See Unison (file synchronizer)↗.

Downloading an Entire Website

How to download an entire website for offline reading?

Use “wget”. Example: “wget --wait=9 --recursive --level=2 http://example.org/” will download all files from example.org, following links up to 2 levels deep, with 9 seconds between each fetch (so you don't spam the server). Some sites check the user agent, so you might add this option: “--user-agent='Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)'”.

See wget↗.

How to download just one single file from a website?

Use “curl”. Example: “curl -O http://example.org/somedir/largeMovie.mov” will download largeMovie.mov to your current dir.

Curl can also be used to download a series of files with a pattern in their names. For example, “curl -O "http://example.org/somedir/girl[01-20].jpg"” will download all files in somedir named girl01.jpg, girl02.jpg, etc., up to girl20.jpg. If you use girl[1-20].jpg, then it'll be girl1.jpg, girl2.jpg, etc. (The quotes keep the shell from treating the brackets as a file name pattern.)

Other useful options are “--referer http://example.org/” and “--user-agent "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)"”. These can be used in case the site blocks requests by referer or browser.

Note: curl cannot be used to download an entire website recursively like wget can.

See cURL↗.

Comparing Files and Directories

How to tell if 2 binary files have identical content?

“cmp ~/myfile1 ~/myfile2”. This is particularly useful for binary files.

How to show the differences between 2 text files?

“diff ~/myfile1 ~/myfile2”.

Some useful options are: “-i --ignore-case”, “-E --ignore-tab-expansion”, “-b --ignore-space-change”, “-w --ignore-all-space”, “-B --ignore-blank-lines”, “--strip-trailing-cr”.

How to test if 2 directories have identical content? (same subdirs and all files in any subdir)

“diff -r --brief ~/mydir1 ~/mydir2”. The “-r” means recursive (all subdirs), and the “--brief” means report only whether files differ (as opposed to how they differ) or exist in only one of the dirs.
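In scripts, the exit status of “cmp” and “diff” is often more convenient than their printed output. Here is a small sketch of that idea, using two temporary directory trees with made-up contents (the directory and file names are assumptions for illustration):

```shell
# Two temporary directory trees with made-up contents.
d1=$(mktemp -d)
d2=$(mktemp -d)
mkdir "$d1/sub" "$d2/sub"
printf 'same\n' > "$d1/a.txt"
printf 'same\n' > "$d2/a.txt"
printf 'one\n'  > "$d1/sub/b.txt"
printf 'two\n'  > "$d2/sub/b.txt"

# cmp -s prints nothing; its exit status says whether the files match.
if cmp -s "$d1/a.txt" "$d2/a.txt"; then
  echo "a.txt: identical"
fi

# diff exits with 0 when nothing differs and 1 when something does.
if diff -r --brief "$d1" "$d2" > /dev/null; then
  echo "directories: identical"
else
  echo "directories: differ"
fi
```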

Text Processing

How to show only lines that contain a text pattern?

Use grep. Example: “grep 'html HTTP' myFile” will print only lines containing the text “html HTTP”. “grep 'html HTTP' *html” will apply grep to all files in the current dir whose names end in “html”. “grep -r --include='*html' 'html HTTP' .” will apply grep to all such files in the current dir and its subdirs.

Use “-H” to include the file name in the result, use “-h” to not print the file name. Use “-v” to print lines NOT containing the text. Use “-E” for extended regular expressions (similar to Perl's) or use “-P” for Perl's regex syntax. Use “-i” to ignore case.
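A quick way to see what the file name and inversion options do is to try them on throwaway files. This is only a sketch; the two log files and their contents are made up for the demonstration.

```shell
# Two small made-up files.
dir=$(mktemp -d)
printf 'GET /a.html HTTP\nGET /b.png HTTP\n' > "$dir/one.log"
printf 'get /c.HTML http\n' > "$dir/two.log"

# -H prefixes each match with its file name, even for a single file.
grep -H 'html HTTP' "$dir/one.log"

# -h suppresses the file name even when several files are searched;
# -i makes the match ignore case, so the line in two.log matches too.
grep -h -i 'html HTTP' "$dir/one.log" "$dir/two.log"

# -v prints the lines that do NOT match.
grep -v 'html HTTP' "$dir/one.log"
```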

Examples:

The last example will show only lines containing “html HTTP” in my apache web log “myFile”, then show only the 12th and 7th columns (which are the referral url and the requested file), then show only lines that contain the text “livejournal” or “blogspot” (“-i” for ignore case and “-E” for extended regex pattern), then sort them, then show only unique lines with the number of occurrences prepended, then sort that by the numbers.
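A pipeline matching that description might look like the following sketch. It is a reconstruction from the description, not necessarily the original command. The function name “top_referrers” and the sample log lines are made up, and the lines are laid out so that column 7 is the requested file and column 12 is the referrer, as the text says; your own log format may put the referrer in a different column.

```shell
# Made-up log lines: field 7 is the requested file, field 12 the referrer.
log=$(mktemp)
cat > "$log" <<'EOF'
1.1.1.1 - - [01/Oct/2007:10:00:00 -0700] "GET /a.html HTTP/1.1" 200 512 - "http://x.livejournal.com/" "UA"
2.2.2.2 - - [01/Oct/2007:10:00:01 -0700] "GET /a.html HTTP/1.1" 200 512 - "http://x.livejournal.com/" "UA"
3.3.3.3 - - [01/Oct/2007:10:00:02 -0700] "GET /b.html HTTP/1.1" 200 512 - "http://y.BlogSpot.com/" "UA"
4.4.4.4 - - [01/Oct/2007:10:00:03 -0700] "GET /c.png HTTP/1.1" 200 512 - "http://x.livejournal.com/" "UA"
5.5.5.5 - - [01/Oct/2007:10:00:04 -0700] "GET /d.html HTTP/1.1" 200 512 - "http://example.com/" "UA"
EOF

# Hypothetical helper: count referrer/file pairs coming from two blog hosts.
top_referrers() {
  grep 'html HTTP' "$1" |                     # html page requests only
    awk '{print $12, $7}' |                   # referrer, requested file
    grep -i -E 'livejournal|blogspot' |       # keep the two blog hosts
    sort | uniq -c |                          # count identical lines
    sort -n                                   # least frequent first
}

top_referrers "$log"
```

With this sample, the blogspot line is counted once and the livejournal line twice, so the livejournal line comes last.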

How to show only certain columns in a text file?

“awk '{print $12 , $7}' myFile” will print the 12th and 7th columns. (Columns are separated by whitespace by default.) For a delimiter other than whitespace, for example the straight double quote, use “awk -F\" '{print $12 , $7}' myFile”.

An alternative is the “cut” utility, but it accepts only a single character as delimiter, not a regex. So if your columns are separated by a varying number of spaces, “cut” cannot handle them directly.
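The difference shows up as soon as two columns are separated by more than one space. A small sketch, with a made-up input line; the “tr -s” workaround is an addition for illustration, not something the text above prescribes:

```shell
line='alpha   beta gamma'   # three spaces between the first two columns

# awk treats any run of blanks as one separator, so column 2 is "beta".
printf '%s\n' "$line" | awk '{print $2}'

# cut sees an empty field between each pair of adjacent spaces,
# so field 2 here is empty.
printf '%s\n' "$line" | cut -d ' ' -f 2

# Squeezing repeated spaces first makes cut usable again.
printf '%s\n' "$line" | tr -s ' ' | cut -d ' ' -f 2
```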

How to sort lines in a file?

To sort lines use “sort myfile”. To sort by considering text as numbers, use “sort -n myfile”. To reverse order, use “-r”. To sort by comparing a particular column, for example the 2nd column, use “sort -k 2 myfile”.

Here's a more complex example: “sort -k 2,2r -k 3,3nr myFile”. This will sort by the 2nd column first, in reverse order; if there is a tie, it sorts by the 3rd column as numbers, also in reverse order.
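To see the two keys at work, here is a sketch on a few made-up lines (name, group, score). Column 2 decides the order first, and column 3 breaks ties:

```shell
# Made-up data: name, group, score.
data=$(mktemp)
printf '%s\n' 'a x 1' 'b y 10' 'c y 2' 'd x 3' > "$data"

# Column 2 in reverse order; ties broken by column 3, numerically, descending.
sort -k 2,2r -k 3,3nr "$data"
# prints:
#   b y 10
#   c y 2
#   d x 3
#   a x 1
```

Without the “n”, the third column would be compared as text, and “2” would sort after “10”.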

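Note: sort is not stable by default. When the specified keys of two lines are equal, it falls back to comparing the entire lines. For example, if your text file is:

b y
b x

and you use “sort -k 1,1 myFile”, it will re-order your lines, even though their first fields are the same. To keep lines with equal keys in their original order, use “-s”.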
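The effect is easy to check on the two-line sample. In this sketch the key is limited to the first field with “-k 1,1”, so that both lines tie on the key:

```shell
# Both lines have the same first field, so the key comparison is a tie.
printf 'b y\nb x\n' | sort -k 1,1      # tie broken by whole line: "b x", then "b y"
printf 'b y\nb x\n' | sort -s -k 1,1   # stable: input order kept, "b y", then "b x"
```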

How to show only unique lines in a file?

“sort ~/myfile | uniq”. To prepend each line with a count of its repetitions, use “sort ~/myfile | uniq -c”.
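The “sort” step is needed because “uniq” only collapses duplicate lines that are adjacent. A short sketch with made-up input:

```shell
# The two "b" lines are not adjacent, so uniq alone removes nothing.
printf 'b\na\nb\n' | uniq

# After sorting, the duplicates are adjacent and collapse into one line.
printf 'b\na\nb\n' | sort | uniq

# With -c, each line is prefixed by its count (1 for a, 2 for b).
printf 'b\na\nb\n' | sort | uniq -c
```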

How to sum up the 2nd column in a file?

“awk '{sum += $2} END {print sum}' myfile”.
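Here is the same command run on a few made-up lines, as a sketch. The second variant, with “sum + 0”, is an addition: it makes awk print 0 instead of an empty line when the input has no lines.

```shell
# Sum the second column of three made-up lines: 3 + 4.5 + 10.
printf '%s\n' 'apples 3' 'pears 4.5' 'plums 10' |
  awk '{sum += $2} END {print sum}'
# prints 17.5

# On empty input, "sum + 0" forces a numeric result, so this prints 0.
printf '' | awk '{sum += $2} END {print sum + 0}'
```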

How to show only first few lines of a huge file?

“head ~/myfile”. If you want to see the first n lines, for example 100, use “head -n 100 ~/myfile”. If you want to see the bottom of a file, use “tail”.

For complex text processing, you need a full language. See: Perl and Python Tutorial, Emacs Lisp Tutorial.

Simple File Management

How to list only files whose name matches a text pattern?

“find ~/myDir -name "*.html"” will show just files with “.html” suffix.

How to list only files larger than n bytes?

“find ~/myDir -size +900000c” will list files in “~/myDir” larger than 900000 bytes (about 0.9 megabytes). The “c” suffix means the size is given in bytes.

To list files smaller than a given size, use a minus sign “-” instead of the plus. To list files of exactly a given size, don't use the plus or minus.
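The three forms of “-size” can be tried on a scratch directory. A minimal sketch, with made-up file names and sizes:

```shell
dir=$(mktemp -d)
printf '%2000s' '' > "$dir/big.txt"    # 2000 bytes (all spaces)
printf '%10s'   '' > "$dir/small.txt"  # 10 bytes

find "$dir" -type f -size +1000c   # larger than 1000 bytes: big.txt
find "$dir" -type f -size -1000c   # smaller than 1000 bytes: small.txt
find "$dir" -type f -size 10c      # exactly 10 bytes: small.txt
```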

How to delete all files whose name matches a text pattern?

“find ~/myDir -name "*~" -exec rm {} \;” will delete all files whose name ends with “~”.
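Since the deletion cannot be undone, one cautious habit is to run the same “find” with “-print” first and delete only after checking the list. A sketch on a scratch directory with made-up file names:

```shell
dir=$(mktemp -d)
touch "$dir/notes.txt" "$dir/notes.txt~" "$dir/draft~"

# Step 1: list the matches only, delete nothing.
find "$dir" -name "*~" -print

# Step 2: same test, now deleting each match.
find "$dir" -name "*~" -exec rm {} \;

ls "$dir"   # only notes.txt remains
```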

Using “find” and “xargs”

How to use “find” on file names that may contain spaces or dashes?

“find . -print0 | xargs -0 -I {} echo "{}"”.

The “-print0” tells “find” to print the file names separated by a null char (as opposed to the newline char used by “-print”). The “-0” tells xargs to parse its input using the null char as separator and to take any special char in a file name literally.

The “-I {}” tells “xargs” to run the command once per file name, replacing each “{}” in the command with that file name. Because xargs passes the whole name as part of a single argument, “echo” (or another program) will see it as one argument instead of several. (Older GNU versions used “-i” for this; it is deprecated in favor of “-I”.)

Here's an example that uses “find”, “xargs”, “basename”, and ImageMagick's “convert” to convert “bmp” image files to “png”: “find . -name "*.bmp" -print0 | xargs -0 -I {} basename -z -s ".bmp" "{}" | xargs -0 -I {} convert "{}.bmp" "{}.png"”. The “-z” makes “basename” end each name with a null char, so that the second “xargs -0” can parse it. Because “basename” strips the directory part, this works only for files in the current dir.
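To check that awkward file names survive the pipe intact, here is a sketch on a scratch directory. The file names are made up, and the command uses the “-I {}” replace-string option of xargs to run “echo” once per file name:

```shell
dir=$(mktemp -d)
touch "$dir/my file.txt" "$dir/-dash.txt"

# Each name arrives as exactly one argument, spaces and all,
# so this prints one "got: ..." line per file.
find "$dir" -type f -print0 | xargs -0 -I {} echo "got: {}" | sort
```

With plain “find | xargs” and no null separators, “my file.txt” would be split at the space into two arguments.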

Man Page

How to get a text output of a man page?

“man ls | col -b”. The “col -b” strips the control chars from the formatted man page, leaving plain text.

How to read a non-compressed man page without the “man” command?

nroff -man n43921.man | col -b

This is convenient when you need to read a man-page file once without adding the dir to your $MANPATH.

How to read a compressed man page without the “man” command?

cat n43921.1 | compress -cd - | nroff -man | col -b

How to read an unformatted man page?

A possible solution: “nroff -man ftpshut.8”.

The “man” command is essentially “nroff -e -man file_name | more -s”.


Page created: 2007-10.
© 2007 by Xah Lee.