Performing a Join from the Command Line

The examples in this section use the following files:

$ cat one
9999 first line file one.
aaaa second line file one.
cccc third line file one.

$ cat two
aaaa FIRST line file two.
bbbb SECOND line file two.
cccc THIRD line file two.

The first example shows the simplest use of join. By default, join matches the files named one and two on the first field of each line. Both files are sorted on the join field. The join fields match on two pairs of lines, and join displays those lines.

$ join one two
aaaa second line file one. FIRST line file two.
cccc third line file one. THIRD line file two.
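When an input file is not already sorted on the join field, one common approach is to sort it first. The following is a minimal sketch rather than a definitive recipe: the file name one.unsorted and the scratch directory are hypothetical, and LC_ALL=C is set on the assumption that sort and join should both use plain byte ordering.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the sample data, with the first file deliberately out of order.
printf '%s\n' \
    'cccc third line file one.' \
    '9999 first line file one.' \
    'aaaa second line file one.' > one.unsorted
printf '%s\n' \
    'aaaa FIRST line file two.' \
    'bbbb SECOND line file two.' \
    'cccc THIRD line file two.' > two

# Sort on the join field only (field 1, ignoring leading blanks).
LC_ALL=C sort -k 1b,1 one.unsorted > one

# Join the sorted files on their first fields.
joined=$(LC_ALL=C join one two)
printf '%s\n' "$joined"
```

Running sort and join under the same locale matters because join expects the collating order that sort produced.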

You can use the --check-order option to verify that both files are properly sorted. In the following example, sort with the -r option sorts one in reverse alphabetical order and sends the output through a pipe to join. The - argument tells join to read standard input, which comes from the pipe, in place of the first file. Because that input is not in sorted order, join displays an error message.

$ sort -r one | join --check-order - two
join: file 1 is not in sorted order
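On the assumption that --check-order may not be available in every implementation of join, the -c option of sort offers another way to test a file before joining it. The sketch below uses a hypothetical file named one.reversed and records the result of each test in a shell variable.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate file one and a reverse-sorted copy of it.
printf '%s\n' \
    '9999 first line file one.' \
    'aaaa second line file one.' \
    'cccc third line file one.' > one
LC_ALL=C sort -r one > one.reversed

# sort -c only checks the order: it exits with status 0 when the input
# is sorted on the given key and with a nonzero status otherwise.
if LC_ALL=C sort -c -k 1b,1 one 2>/dev/null; then
    one_status=sorted
else
    one_status=unsorted
fi

if LC_ALL=C sort -c -k 1b,1 one.reversed 2>/dev/null; then
    reversed_status=sorted
else
    reversed_status=unsorted
fi

echo "one: $one_status"
echo "one.reversed: $reversed_status"
```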

Next, the -a option with an argument of 1 causes join to display, in addition to its normal output, lines from the first file (one) that do not have a matching join field.

$ join -a 1 one two
9999 first line file one.
aaaa second line file one. FIRST line file two.
cccc third line file one. THIRD line file two.
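The -a option can be given once for each file. As a sketch, assuming the same sample files and byte ordering (LC_ALL=C), the following command displays the joined lines together with the unmatched lines from both files.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the sample files from this section.
printf '%s\n' \
    '9999 first line file one.' \
    'aaaa second line file one.' \
    'cccc third line file one.' > one
printf '%s\n' \
    'aaaa FIRST line file two.' \
    'bbbb SECOND line file two.' \
    'cccc THIRD line file two.' > two

# -a 1 adds unmatched lines from the first file,
# -a 2 adds unmatched lines from the second file.
all_lines=$(LC_ALL=C join -a 1 -a 2 one two)
printf '%s\n' "$all_lines"
```

With these files the output should hold four lines: the 9999 line from one, the aaaa pair, the bbbb line from two, and the cccc pair.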

Use -v in place of -a to suppress the lines that join normally displays (those that have a matching join field), leaving only the unmatched lines from the specified file.

$ join -v 1 one two
9999 first line file one.
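The same option works for the second file, and it can be repeated. The following sketch, again assuming the sample files and LC_ALL=C, collects the unmatched lines from two alone and then from both files.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the sample files from this section.
printf '%s\n' \
    '9999 first line file one.' \
    'aaaa second line file one.' \
    'cccc third line file one.' > one
printf '%s\n' \
    'aaaa FIRST line file two.' \
    'bbbb SECOND line file two.' \
    'cccc THIRD line file two.' > two

# Only the lines from the second file that have no match in the first.
only_two=$(LC_ALL=C join -v 2 one two)

# Unmatched lines from both files; matched pairs are suppressed.
unmatched=$(LC_ALL=C join -v 1 -v 2 one two)

printf 'Unmatched in two:\n%s\n' "$only_two"
printf 'Unmatched in either file:\n%s\n' "$unmatched"
```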

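The final example uses onea as the first file and specifies the third field of the first file (-1 3) as the match field. The second file (two) uses the default (first) field for matching. In the output, join displays the join field first, followed by the remaining fields from each line.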

$ cat onea
first line aaaa file one.
second line 1111 file one.
third line cccc file one.

$ join -1 3 onea two
aaaa first line file one. FIRST line file two.
cccc third line file one. THIRD line file two.
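
In the C locale, 1111 sorts before aaaa, so onea as shown is not strictly ordered on its third field. Whether join complains about this may depend on the implementation and version. A cautious sketch, assuming byte ordering (LC_ALL=C), sorts onea on its third field and pipes the result to join, which reads it through the - argument.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the sample files from this section.
printf '%s\n' \
    'first line aaaa file one.' \
    'second line 1111 file one.' \
    'third line cccc file one.' > onea
printf '%s\n' \
    'aaaa FIRST line file two.' \
    'bbbb SECOND line file two.' \
    'cccc THIRD line file two.' > two

# Sort onea on field 3 (the join field), then join it with two.
# -1 3 selects field 3 of the first input; - stands for standard input.
by_third=$(LC_ALL=C sort -k 3b,3 onea | LC_ALL=C join -1 3 - two)
printf '%s\n' "$by_third"
```

The output should match the two lines shown above, because the line whose third field is 1111 has no partner in two.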