AWK Command in Linux
AWK Introduction:
AWK is a powerful text-processing tool and scripting language used in Unix and Linux systems. It’s named after its creators: Aho, Weinberger, and Kernighan.
AWK is particularly useful for:
- Processing text files
- Extracting and manipulating data
- Generating formatted reports
Basic AWK Structure:
An AWK command typically follows this structure:
awk 'pattern { action }' input_file
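- pattern: Optional. A condition that selects which records the action runs on. If omitted, the action runs on every record.
- action: The operation to perform when the pattern matches. If omitted, AWK prints the matching record.
- input_file: The file to process. If omitted, AWK reads standard input.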
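Here is a minimal sketch of how the optional parts behave. The two sample records are piped in, so AWK reads standard input instead of a named input_file:

```shell
# Pattern only: the default action prints each matching record.
printf 'John Doe 50000\nJane Smith 55000\n' | awk '$3 > 50000'

# Action only: the action runs for every record.
printf 'John Doe 50000\nJane Smith 55000\n' | awk '{ print $2 }'

# Pattern and action: the action runs only for records that match.
printf 'John Doe 50000\nJane Smith 55000\n' | awk '$3 > 50000 { print $1 }'
```

The three commands print "Jane Smith 55000", then "Doe" and "Smith", then "Jane".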
Key Concepts:
1. Records and Fields:
- AWK processes input line by line, with each line called a “record”.
- Each record is automatically split into “fields” based on whitespace (by default).
- Fields are accessed using $1, $2, $3, etc. $0 represents the entire line.
2. Built-in Variables:
- NR: Current record number
- NF: Number of fields in the current record
- FS: Field separator (default is whitespace)
- OFS: Output field separator (default is a single space)
3. Patterns:
- Can be regular expressions, comparisons, or special patterns like BEGIN and END.
4. Actions:
- Enclosed in curly braces {}
- Can include print statements, calculations, control structures, etc.
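The sketch below uses two records shaped like the sample file. It prints NR, NF, the first field, and the last field ($NF). It also shows when OFS takes effect: AWK uses it when print receives comma-separated arguments, or when assigning to a field forces AWK to rebuild $0. A plain print of an unmodified record keeps the original spacing.

```shell
# NR, NF, $1 and $NF joined with a custom OFS set in BEGIN (before any input is read).
printf 'John Doe 50000\nJane Smith 55000\n' |
  awk 'BEGIN { OFS = " | " } { print NR, NF, $1, $NF }'

# print alone keeps the record as read; after $1 = $1 the record is rebuilt using OFS.
printf 'John Doe 50000\n' | awk 'BEGIN { OFS = "-" } { print; $1 = $1; print }'
```

The first command prints "1 | 3 | John | 50000" and "2 | 3 | Jane | 55000". The second prints "John Doe 50000" and then "John-Doe-50000".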
Example:
Let’s say we have a file named “employees.txt” with this content (the terminal sessions below use a copy with the same content named emp.txt):
John Doe 50000
Jane Smith 55000
Bob Johnson 48000
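One way to create this file from the shell is shown below, so the commands that follow can be tried directly (any text editor works just as well). The here-document delimiter is quoted, so the shell leaves the content untouched:

```shell
# Write the three sample records to employees.txt in the current directory.
cat > employees.txt <<'EOF'
John Doe 50000
Jane Smith 55000
Bob Johnson 48000
EOF

# Quick check: in the END block, NR holds the total number of records read.
awk 'END { print NR " records" }' employees.txt
```

For this file, the check prints "3 records".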
Patterns in AWK:
1. Regular Expressions:
awk '/John/' employees.txt # Prints lines containing "John"
2. Relational Expressions:
awk '$3 > 50000' employees.txt # Prints lines where the 3rd field is greater than 50000
3. Special Patterns:
- BEGIN: Executed before processing any input
- END: Executed after processing all input
awk 'BEGIN {print "Employee List:"} {print $0} END {print "End of list."}' employees.txt
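A bare /regex/ pattern is tested against the whole record, so /John/ above also matches “Bob Johnson”. The sketch below narrows a match to one field with the ~ operator. It then combines a regular expression with a comparison using &&. The counter n is a name chosen for this example. Writing n + 0 makes the END block print 0, rather than an empty string, when nothing matches.

```shell
# Recreate the sample file so this snippet runs on its own.
printf 'John Doe 50000\nJane Smith 55000\nBob Johnson 48000\n' > employees.txt

# Whole-line match: prints both "John Doe" and "Bob Johnson".
awk '/John/' employees.txt

# Anchored match on field 1 only: prints just "John Doe 50000".
awk '$1 ~ /^John$/' employees.txt

# Regex AND comparison; count the matches and report once in END.
awk '$1 ~ /^J/ && $3 >= 50000 { n++ } END { print n + 0, "employees match" }' employees.txt
```

The last command prints "2 employees match" (John Doe and Jane Smith).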
Actions in AWK:
1. Print Statement:
awk '{print $1, $3}' employees.txt # Prints first and third fields
2. Formatted Print:
awk '{printf "Name: %-15s Salary: $%d\n", $1 " " $2, $3}' employees.txt
In the runs below, %-15s and %-20s left-justify the name in a 15- or 20-character column, while %20s right-justifies it in 20 characters:
root@MMLITPAWANY /tmp# awk '{printf "Name: %-15s Salary: $%d\n", $1 " " $2, $3}' emp.txt
Name: John Doe        Salary: $50000
Name: Jane Smith      Salary: $55000
Name: Bob Johnson     Salary: $48000
root@MMLITPAWANY /tmp# awk '{printf "Name: %-20s Salary: $%d\n", $1 " " $2, $3}' emp.txt
Name: John Doe             Salary: $50000
Name: Jane Smith           Salary: $55000
Name: Bob Johnson          Salary: $48000
root@MMLITPAWANY /tmp# awk '{printf "Name: %20s Salary: $%d\n", $1 " " $2, $3}' emp.txt
Name:             John Doe Salary: $50000
Name:           Jane Smith Salary: $55000
Name:          Bob Johnson Salary: $48000
3. Conditional Statements:
awk '{if ($3 > 50000) print $1 " " $2 " is highly paid"}' employees.txt
root@MMLITPAWANY /tmp# awk '{if ($3 > 50000) print $1 " " $2 " is highly paid"}' emp.txt
Jane Smith is highly paid
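The sketch below puts these pieces together into a minimal report. BEGIN prints a header, an if/else chooses a label, and printf aligns the columns. The variable band and the labels “high” and “standard” are illustrative names, not AWK keywords.

```shell
# Recreate the sample file so this snippet runs on its own.
printf 'John Doe 50000\nJane Smith 55000\nBob Johnson 48000\n' > employees.txt

awk '
BEGIN {
    # Header row, printed once before any input is read.
    printf "%-12s %8s  %s\n", "Name", "Salary", "Band"
}
{
    # if/else picks a label for each record.
    if ($3 > 50000)
        band = "high"
    else
        band = "standard"
    # Left-justify the name in 12 characters, right-justify the salary in 8.
    printf "%-12s %8d  %s\n", $1 " " $2, $3, band
}' employees.txt
```

For the sample file, only Jane Smith is labelled high.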
Advanced AWK Concepts:
1. Custom Field Separator:
awk -F: '{print $1}' /etc/passwd # Uses colon as field separator
root@MMLITPAWANY /tmp# awk -F: '{print $1}' /etc/passwd
root
bin
daemon
adm
lp
sync
shutdown
halt
mail
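The same idea works for other delimiters. The sketch below uses inline sample lines, not the real /etc/passwd. It first prints the first and last fields. It then sets FS and OFS in a BEGIN block, which has the same effect on input as -F. Note that a plain comma separator does not understand quoted CSV fields such as "Doe, Jr.".

```shell
# A passwd-style sample line: print the user name and the last field (the login shell).
printf 'root:x:0:0:root:/root:/bin/bash\n' | awk -F: '{ print $1, $NF }'

# FS set in BEGIN instead of -F; OFS makes print join the fields with a tab.
printf 'John,Doe,50000\n' | awk 'BEGIN { FS = ","; OFS = "\t" } { print $2, $3 }'
```

The first command prints "root /bin/bash". The second prints "Doe" and "50000" separated by a tab.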
2. Array Usage:
awk '{count[$1]++} END {for (name in count) print name, count[name]}' employees.txt
root@MMLITPAWANY /tmp# awk '{count[$1]++} END {for (name in count) print name, count[name]}' emp.txt
Bob 1
John 1
Jane 1
root@MMLITPAWANY /tmp# awk '{count[$2]++} END {for (name in count) print name, count[name]}' emp.txt
Johnson 1
Smith 1
Doe 1
root@MMLITPAWANY /tmp# awk '{count[$3]++} END {for (name in count) print name, count[name]}' emp.txt
48000 1
50000 1
55000 1
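Arrays can also accumulate values, not just counts. The sketch below groups employees by the first letter of the first name (an arbitrary key chosen for illustration) and sums the salaries in each group. As the outputs above show, for (key in array) does not visit keys in input order. AWK does not guarantee any particular order, so the result is piped through sort.

```shell
# Recreate the sample file so this snippet runs on its own.
printf 'John Doe 50000\nJane Smith 55000\nBob Johnson 48000\n' > employees.txt

awk '{
    # Use the first letter of the first name as the group key.
    key = substr($1, 1, 1)
    count[key]++          # number of employees per key
    total[key] += $3      # salary sum per key
}
END {
    for (key in count)
        print key, count[key], total[key]
}' employees.txt | sort
```

For the sample file, this prints "B 1 48000" and "J 2 105000".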
3. Built-in Functions:
- length(), substr(), tolower(), toupper(), etc.
awk '{print tolower($1), length($0)}' employees.txt
root@MMLITPAWANY /tmp# awk '{print tolower($1), length($0)}' emp.txt
john 14
jane 16
bob 17
root@MMLITPAWANY /tmp# awk '{print toupper($0), length($0)}' emp.txt
JOHN DOE 50000 14
JANE SMITH 55000 16
BOB JOHNSON 48000 17
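The sketch below shows two more built-ins in action. substr(string, start, length) counts positions from 1. split(string, array, separator) fills an array and returns the number of pieces. The date string is made-up sample input.

```shell
# Recreate the sample file so this snippet runs on its own.
printf 'John Doe 50000\nJane Smith 55000\nBob Johnson 48000\n' > employees.txt

# First initial plus last name, followed by the length of the last name.
awk '{ print substr($1, 1, 1) ". " $2, length($2) }' employees.txt

# split() breaks the string on "-" into part[1], part[2], part[3] and returns 3.
echo "2024-01-15" | awk '{ n = split($0, part, "-"); print n, part[1], part[3] }'
```

The first command prints "J. Doe 3", "J. Smith 5", and "B. Johnson 7". The second prints "3 2024 15".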
4. User-Defined Functions:
awk '
function capitalize(string) {
    # Uppercase the first character and append the rest of the string unchanged.
    return toupper(substr(string, 1, 1)) substr(string, 2)
}
{print capitalize($1), capitalize($2)}
' employees.txt
root@MMLITPAWANY /tmp# awk '
function capitalize(string) {
return toupper(substr(string, 1, 1)) substr(string, 2)
}
{print capitalize($1), capitalize($2)}
' emp.txt
John Doe
Jane Smith
Bob Johnson
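AWK has no keyword for declaring local variables. A common convention, sketched below with a hypothetical max() helper, is to list extra parameters after some spacing and never pass them. They then behave as locals inside the function.

```shell
# Recreate the sample file so this snippet runs on its own.
printf 'John Doe 50000\nJane Smith 55000\nBob Johnson 48000\n' > employees.txt

awk '
# Hypothetical helper: returns the larger of two numbers.
# "larger" is never passed by callers, so it acts as a local variable.
function max(a, b,    larger) {
    larger = (a > b) ? a : b
    return larger
}
# Track the highest salary seen so far.
{ highest = (NR == 1) ? $3 : max(highest, $3) }
END { print "Highest salary:", highest }
' employees.txt
```

For the sample file, this prints "Highest salary: 55000".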
5. Multi-line AWK Scripts:
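You can write more complex AWK scripts in a separate file and run them with awk -f.
Save the following as process_employees.awk:
#!/usr/bin/awk -f
# Print a header once, before any input is read.
BEGIN { print "Processing file:" }
{
    # Echo each record with its number, and add the third field (salary) to the total.
    print "Line " NR ": " $0
    total += $3
}
END {
    print "Total salary: " (total + 0)
    # Skip the average for empty input to avoid dividing by zero.
    if (NR > 0)
        print "Average salary: " total / NR
}
Then run it with: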
awk -f process_employees.awk employees.txt
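Because the script begins with a #!/usr/bin/awk -f line, it can also be made executable and run directly. The sketch below uses a cut-down version of the script. It assumes awk is installed at /usr/bin/awk; the actual path varies between systems, and command -v awk shows it.

```shell
# Recreate the sample file so this snippet runs on its own.
printf 'John Doe 50000\nJane Smith 55000\nBob Johnson 48000\n' > employees.txt

# Cut-down script; the quoted EOF stops the shell from expanding $3.
cat > process_employees.awk <<'EOF'
#!/usr/bin/awk -f
{ total += $3 }
END { print "Total salary: " total }
EOF

# Mark the file executable; the kernel then uses the shebang line to start awk -f.
chmod +x process_employees.awk
./process_employees.awk employees.txt
```

For the sample file, this prints "Total salary: 153000", the same as awk -f process_employees.awk employees.txt.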