Thursday, 27 May 2021

AWK Commands

 

Awk is abbreviated from the names of the developers – Aho, Weinberger, and Kernighan. 

WHAT CAN WE DO WITH AWK ? 

1. AWK Operations: 
(a) Scans a file line by line 
(b) Splits each input line into fields 
(c) Compares input line/fields to pattern 
(d) Performs action(s) on matched lines 

2. Useful For: 
(a) Transform data files 
(b) Produce formatted reports 

3. Programming Constructs: 
(a) Format output lines 
(b) Arithmetic and string operations 
(c) Conditionals and loops 

Syntax: 

The general syntax of awk is:

# awk 'script' filename

 

awk options 'selection _criteria {action }' input-file > output-file

 

$cat > employee.txt 
 
ajay manager account 45000
sunil clerk account 25000
varun manager sales 50000
amit manager account 47000
tarun peon sales 15000
deepak clerk sales 23000
sunil peon sales 13000
satvik director purchase 80000 

 

$ awk '{print}' employee.txt

Output: 
 

ajay manager account 45000
sunil clerk account 25000
varun manager sales 50000
amit manager account 47000
tarun peon sales 15000
deepak clerk sales 23000
sunil peon sales 13000
satvik director purchase 80000 

 

2. Print the lines which matches with the given pattern. 

 

$ awk '/manager/ {print}' employee.txt 

Output: 
 

ajay manager account 45000
varun manager sales 50000
amit manager account 47000 

3. Splitting a Line Into Fields : For each record i.e line, the awk command splits the record delimited by whitespace character by default and stores it in the $n variables. If the line has 4 words, it will be stored in $1, $2, $3 and $4 respectively. Also, $0 represents the whole line. 

 

$ awk '{print $1,$4}' employee.txt 

Output: 
 

ajay 45000
sunil 25000
varun 50000
amit 47000
tarun 15000
deepak 23000
sunil 13000
satvik 80000 

Built In Variables In Awk

Awk’s built-in variables include the field variables—$1, $2, $3, and so on ($0 is the entire line) — that break a line of text into individual words or pieces called fields. 

NR: NR command keeps a current count of the number of input records. Remember that records are usually lines. Awk command performs the pattern/action statements once for each record in a file. 

NF: NF command keeps a count of the number of fields within the current input record. 

FS: FS command contains the field separator character which is used to divide fields on the input line. The default is “white space”, meaning space and tab characters. FS can be reassigned to another character (typically in BEGIN) to change the field separator. 

RS: RS command stores the current record separator character. Since, by default, an input line is the input record, the default record separator character is a newline. 

OFS: OFS command stores the output field separator, which separates the fields when Awk prints them. The default is a blank space. Whenever print has several parameters separated with commas, it will print the value of OFS in between each parameter. 

ORS: ORS command stores the output record separator, which separates the output lines when Awk prints them. The default is a newline character. print automatically outputs the contents of ORS at the end of whatever it is given to print. 

Examples: 

Use of NR built-in variables (Display Line Number) 

 

$ awk '{print NR,$0}' employee.txt 

Output: 
 

1 ajay manager account 45000
2 sunil clerk account 25000
3 varun manager sales 50000
4 amit manager account 47000
5 tarun peon sales 15000
6 deepak clerk sales 23000
7 sunil peon sales 13000
8 satvik director purchase 80000 

In the above example, the awk command with NR prints all the lines along with the line number. 

Use of NF built-in variables (Display Last Field) 

 

$ awk '{print $1,$NF}' employee.txt 

Output: 
 

ajay 45000
sunil 25000
varun 50000
amit 47000
tarun 15000
deepak 23000
sunil 13000
satvik 80000 

Another use of NR built-in variables (Display Line From 3 to 6) 

 

$ awk 'NR==3, NR==6 {print NR,$0}' employee.txt 

Output: 
 

3 varun manager sales 50000
4 amit manager account 47000
5 tarun peon sales 15000
6 deepak clerk sales 23000 

Printing lines with more than 10 characters: 
 

$ awk 'length($0) > 10' geeksforgeeks.txt
 

7) To find/check for any string in any specific column: 
 

$ awk '{ if($3 == "B6") print $0;}' geeksforgeeks.txt
 

8) To print the squares of first numbers from 1 to n say 6: 
 

$ awk 'BEGIN { for(i=1;i<=6;i++) print "square of", i, "is",i*i; }'

No comments: