Wednesday, February 11, 2015

How to count words in a delimited file?

I was asked how one would count words in a delimited file, and it took me by surprise, since my go-to Unix command "wc" only works with tab- or space-delimited text.

From the wc manual page: "A word is defined as a string of characters delimited by white space characters."

This is where sed comes to the rescue: replace every comma with a space, and wc can count the words as usual.

    ruch:coding ruchi$ cat poem.csv
    twinkle,twinkle,little star,how,I, wonder what,you, are.
    ruch:coding ruchi$ cat poem.csv | wc -w
    5
    ruch:coding ruchi$ sed 's/,/ /g' poem.csv | wc -w
    10
    ruch:coding ruchi$

Can you guess why wc returns 5 instead of 3 from poem.csv?
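
The same trick should carry over to other delimiters. Here is a minimal sketch, not a definitive tool: count_words is a hypothetical helper name of my own, it assumes a single-character delimiter such as , ; | or :, and it swaps sed for tr, which does the same one-character replacement.

```shell
#!/bin/sh
# count_words DELIM FILE
# Hypothetical helper (name and argument order are assumptions for illustration).
# Assumes DELIM is a single ordinary character such as , ; | or :
count_words() {
    delim=$1
    file=$2
    # tr maps every delimiter character to a space,
    # so wc -w can split the text on whitespace as usual.
    # The final tr -d ' ' strips the padding some wc versions print around the number.
    tr "$delim" ' ' < "$file" | wc -w | tr -d ' '
}
```

Running count_words , poem.csv on the file above should print 10, the same count as the sed pipeline. Multi-character delimiters would need sed (or awk) instead, because tr only translates single characters.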



