Monday, August 30, 2010

SERIES : awk#4 Simple Calculations

This is the same employee file that we used in the previous post.
$cat empf
100,Smith James,80000,101 Ave N,San Francisco,CA
200,Lloyd Beth,80000,2010 University St,San Francisco,CA
300,Doe John,90000,1005 Royal Dr,Hartford,CT
400,Day Matt,85000,100 W 4th Ave,Seattle,WA
700,Bell Amy,60000,201 Winner CT,Atlanta,GA
800,Hawn Ruby,75000,120 Red Carpet Ln,Los Angeles,CA
While creating the script, it's required to keep "{" on the same line as BEGIN and END. The script errors out otherwise.
NR is the inbuilt awk variable representing "Number of Records"
#!/usr/bin/awk
BEGIN{
FS=","
}
{
# sal $3 , name is $2
saltotal=saltotal+$3
}
END{
print "Number of Employees :", NR
print "Salary Total :", saltotal
print "Average Salary : ", saltotal/NR
}
Here's the output.
$awk -f avgsal.awk empf
Number of Employees : 6
Salary Total : 470000
Average Salary : 78333.3
Optionally, we can pass the file separator on the command line instead of having a BEGIN block in the script.

Note how awk automatically provides a float data type.

SERIES : awk#3 Match only a specific field

Here's the input file with employee data.

$cat empf
100,Smith James,80000,101 Ave N,San Francisco,CA
200,Lloyd Beth,80000,2010 University St,San Francisco,CA
300,Doe John,90000,1005 Royal Dr,Hartford,CT
400,Day Matt,85000,100 W 4th Ave,Seattle,WA
700,Bell Amy,60000,201 Winner CT,Atlanta,GA
800,Hawn Ruby,75000,120 Red Carpet Ln,Los Angeles,CA
Let's say we want only Connecticut CT employees. We could look for "CT" in the file.

$awk '/CT/ {print $0}' empf
300,Doe John,90000,1005 Royal Dr,Hartford,CT
700,Bell Amy,60000,201 Winner CT,Atlanta,GA
To check for a pattern strictly against a field, we can use tilde operator "~"

$awk -F, '$6 ~ /CT/ {print $0}' empf
300,Doe John,90000,1005 Royal Dr,Hartford,CT
On a similar note, to exclude all the CA employees.

awk -F, '$6 !~ /CA/ {print $0}' empf
300,Doe John,90000,1005 Royal Dr,Hartford,CT
400,Day Matt,85000,100 W 4th Ave,Seattle,WA
700,Bell Amy,60000,201 Winner CT,Atlanta,GA

Friday, August 27, 2010

Create Dictionary from a nested List

def crdic(a):
b={}
printOP=0
for i in a:
if len(i) != 2:
print 'ERROR - Inner list has more/less than two elements'
printOP=1
break
else:
b[i[0]]=i[1]
if printOP != 1:
print 'Got list : ', a
print 'Created dictionary : ', b

a=[['name','ruchi'],['nickname','r007']]
a1=[['name','ruchi','penguin'],['nickname','r007']]

crdic(a)
crdic(a1)



>>> 
Got list : [['name', 'ruchi'], ['nickname', 'r007']]
Created dictionary : {'nickname': 'r007', 'name': 'ruchi'}
ERROR - Inner list has more/less than two elements
>>>

Thursday, August 26, 2010

SERIES : awk#2 Simple Patterns

cat monkey
Monkey Mink
100 Tree Blvd
Banana County
Monkeyville, MY
Zip 12001
555133 Area 255

Get lines that have a digit anywhere.

awk '/[0-9]+/ {print "Has a digit. : ", $0}' monkey
Has a digit. : 100 Tree Blvd
Has a digit. : Zip 12001
Has a digit. : 555133 Area 255

Get lines that begin with a digit

awk '/^[0-9]+/ {print "Begins with digit. : ", $0}' monkey
Begins with digit. : 100 Tree Blvd
Begins with digit. : 555133 Area 255

Has characters somewhere

awk '/[aA-zZ]+/ {print "Has a word. : ", $0}' monkey
Has a word. : Monkey Mink
Has a word. : 100 Tree Blvd
Has a word. : Banana County
Has a word. : Monkeyville, MY
Has a word. : Zip 12001
Has a word. : 555133 Area 255

Has only letters

awk '/^[aA-zZ]+$/ {print "Has only letters. : ", $0}' monkey
<-- no output -->

Because space is not counted as letters.

Monday, August 23, 2010

SERIES : awk#1 Begin and End

Awk comes with inbuilt loop. It performs the given operations for each line in the input file provided they are not qualified by "BEGIN" or "END".
cat notxt
<-- empty file -->

cat sometxt
monkey goes to market

awk '{print "Hello World!"}' notxt
<--no ouput-->

awk 'BEGIN {print "Hello World!"} {print} END {print "Good bye!"}' notxt
Hello World!
Good bye!

awk 'BEGIN {print "Hello World!"} {print} END {print "Good bye!"}' sometxt
Hello World!
monkey goes to market
Good bye!

ex#1.
Since notxt is empty; awk doesn't iterate and no output is printed.

ex#2.
although notxt is empty; BEGIN and END statements are still executed and output produced for those commands.

Thursday, August 19, 2010

Macro to help remove duplicate rows in excel spreadsheet

Sub duplicate_flg()
' Use to remove duplicates
' Check for duplicates based on columns colx and coly. Customize below.
' flag them in column colflg
' Rowcounts based on column - colx
' ******************************************************
' NEEDs a sorted sheet and assumes a header
' Runs on the active sheet in the active workbook
' ******************************************************

colx = 1
coly = 2
colflg = 3
Cells(1, colflg) = "Is Duplicate?"

lastrowcnt = Cells(Cells.Rows.Count, colx).End(xlUp).Row
'lastrowcnt = 7
ActiveWorkbook.Activate
Set ws = ActiveWorkbook.ActiveSheet

' header assumed. starts from row 2
For i = 2 To lastrowcnt
If ws.Cells(i, colx) = ws.Cells(i + 1, colx) And _
ws.Cells(i, coly) = ws.Cells(i + 1, coly) Then
ws.Cells(i + 1, colflg) = "Y"
End If
Next i

End Sub

Thursday, August 12, 2010

Paste in vi without annoying auto indent

vi annoys the hell out of me with its auto-indenting during clipboard paste. So thanks to this post I've a solution now.

In command mode "set paste" and after pasting the text, turn it back off by "set nopaste".

what a breath of fresh air ;)