Saturday, May 15, 2010

Join two files based on a column

empfile
12345,Mary J,HR
34512,J Smith,Admin
34700,A Ryan,Admin
34900,B Wilson,HR
59000,C Diaz,HR

mgrfile
12345,HR
34700,Admin

So if we need to get name of the managers for each department then we would need to join the mgrfile with empfile on employee number.

$join -t , -o '2.2 2.3' mgrfile empfile
Mary J,HR
A Ryan,Admin

Option -t is to specify the field separator in the files.
Option -o is to list the fields that we need in the output. "2.3" means 3rd field of 2nd file.

By default join command joins on the first field of the files. So if we needed to get the manager for each employee, we would join the two files on department type.

To do this correctly both the files should be sorted** on the field being used for join.

$join -t, -13 -22 -o '1.2 1.3 2.1' empfilesorted mgrfilesorted
J Smith,Admin,34700
A Ryan,Admin,34700
Mary J,HR,12345
B Wilson,HR,12345
C Diaz,HR,12345

We have two new entries on the cmd above : -13 and -22. These can also be written as -1 3 and -2 2. These indicates the fields we are joining on.
-1 3 means 3rd field of 1st file

**See here on how to sort on a field.

No comments:

Post a Comment