عجفت الغور

awk

Tags: computers

Structure

  • Everything in awk is a pattern -> action

    Pattern1 { ACTIONS; }
    
    Pattern2 { ACTIONS; }
    
  • Each input line (record) is tested against every pattern in order, one at a time; the action runs for each pattern that matches
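
The rule-by-rule evaluation can be sketched as below. This is a minimal illustration, assuming a POSIX awk on the PATH; the sample input and the rule1/rule2 labels are made up for the example.

```shell
# Sample input (hypothetical): two lines, only one of which contains "an".
out=$(printf 'apple\nbanana\n' | awk '
  /an/ { print "rule1: " $0 }   # runs only for lines matching /an/
       { print "rule2: " $0 }   # no pattern: runs for every line
')
printf '%s\n' "$out"
```

Every line reaches both rules, so a line that matches the first rule is still seen by the second unless an action calls next.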

Data Types

  • awk only has strings and numbers
  • strings are converted to numbers (and numbers to strings) implicitly, depending on the context they are used in
  • both can be assigned to variables in ACTIONS with the = operator
  • variables need no declaration and can be introduced anywhere; an uninitialized variable acts as the empty string "" in string context and as 0 in numeric context
  • awk has unidimensional associative arrays
    • var[key] = value
  • multidimensional arrays can be simulated with var[i, j], which joins the subscripts into a single string key using SUBSEP, but this is fairly limited
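
A small sketch of these data-type rules, assuming a POSIX awk; the input lines and the names total and grid are hypothetical.

```shell
# Sample input (hypothetical): a name and a count per line.
out=$(printf 'alice 3\nbob 4\nalice 5\n' | awk '
  { total[$1] += $2 }          # a missing array element counts as 0 in arithmetic
  END {
    print "alice", total["alice"]
    print "bob", total["bob"]
    n = "10" + 5               # the string "10" is converted to the number 10
    print n
    grid[1, 2] = "x"           # stored under one key: 1 SUBSEP 2
    if ((1, 2) in grid) print "grid has (1,2)"
  }
')
printf '%s\n' "$out"
```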

Patterns

Regular and Boolean Expressions

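  • regexes are POSIX extended regular expressions (ERE), not PCRE (gawk adds some extensions, but is not PCRE either)
  • a regex pattern only tests for a match, it cannot capture groups (gawk's match() function accepts an optional array argument that receives them)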
  • boolean expressions have && || and !
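  • note that == compares loosely (somewhat like js): numerically when both operands are numbers or numeric-looking input such as fields, otherwise as strings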
  • booleans can be used alongside regex
/admin/ || debug == 1 { ACTION }   # awk has no true/false keywords; use 1 and 0
  • to match a specific string or field against a regex, use ~ or !~ (for example $2 ~ /GET/)
  • omitting the pattern, as in { ACTIONS }, runs the action on every line
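
The matching operators and the two kinds of comparison can be sketched as follows, assuming a POSIX awk; the input is made up. Appending "" to a field is one common way to force a string comparison.

```shell
# Sample input (hypothetical): a number and a role per line.
out=$(printf '10 admin\n9 guest\n' | awk '
  $2 ~ /^adm/      { print $2, "matches" }
  $2 !~ /^adm/     { print $2, "does not match" }
  $1 > 9           { print $1, "is numerically greater than 9" }
  ($1 "") < "9"    { print $1, "sorts before 9 as a string" }
')
printf '%s\n' "$out"
```

The field 10 is greater than 9 as a number, yet sorts before "9" as a string because the comparison then goes character by character.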

Special Patterns

  • BEGIN runs once, before any input has been read; use it to initialize variables and other state
  • END runs once, after all input has been read; use it for final output and cleanup
  • Fields: see below
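
A minimal sketch of BEGIN and END around a per-line rule, assuming a POSIX awk; the numbers and the variable name sum are made up.

```shell
# Sample input (hypothetical): one number per line.
out=$(printf '3\n4\n5\n' | awk '
  BEGIN { sum = 0; print "start" }          # before the first line is read
        { sum += $1 }                       # once per input line
  END   { print "sum=" sum, "lines=" NR }   # after the last line
')
printf '%s\n' "$out"
```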

Fields

# According to the following line
#
# $1         $2    $3
# 00:34:23   GET   /foo/bar.html
# \_____________  _____________/
#               $0

# Hack attempt?
/admin\.html$/ && $2 == "DELETE" {   # \. matches a literal dot
  print "Hacker Alert!";
}
  • By default, fields are separated by runs of whitespace
  • Assigning to a field modifies the line: $0 is rebuilt with the fields joined by OFS
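
Field assignment can be sketched like this, assuming a POSIX awk and reusing the sample log line from above; the "|" separator is chosen only to make the rebuild visible.

```shell
out=$(printf '00:34:23 GET /foo/bar.html\n' | awk '
  BEGIN { OFS = "|" }       # separator used when $0 is rebuilt
  {
    $2 = "POST"             # assigning a field rebuilds $0
    print $0
    print NF                # still three fields
  }
')
printf '%s\n' "$out"
```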

Actions

Many possible actions, most useful ones:

{ print $0; }  # prints $0. In this case, equivalent to 'print' alone
{ exit; }      # stops reading input and ends the program (the END rule still runs)
{ next; }      # skips to the next line of input
{ a=$1; b=$0 } # variable assignment
{ c[$1] = $2 } # variable assignment (array)

{ if (BOOLEAN) { ACTION }
  else if (BOOLEAN) { ACTION }
  else { ACTION }
}
{ for (i=1; i<x; i++) { ACTION } }
{ for (item in c) { ACTION } }

ALL VARIABLES ARE GLOBAL, except function parameters, which are local to the function
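
A short sketch combining next, exit, and a for loop, assuming a POSIX awk; the input lines are made up.

```shell
# Sample input (hypothetical): a comment, a data line, a stop marker, a trailing line.
out=$(printf '# comment\na b c\nstop\nnever seen\n' | awk '
  /^#/         { next }    # skip comment lines entirely
  $1 == "stop" { exit }    # stop reading input; END still runs
  {
    # print the fields of the line in reverse order
    for (i = NF; i >= 1; i--) printf "%s%s", $i, (i > 1 ? " " : "\n")
  }
  END { print "done" }
')
printf '%s\n' "$out"
```

The line after the stop marker is never processed, while the END rule still prints its message.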

Functions

Functions can be called with typical syntax { somecall($2) }

Built-in functions: https://www.gnu.org/software/gawk/manual/html_node/Built_002din.html#Built_002din

User defined functions:

# scalar arguments are passed by value; arrays are passed by reference
function name(parameter-list) {
  ACTIONS; # same actions as usual
}

# return is a valid keyword
function add1(val) {
  return val+1;
}
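
Argument passing and local variables can be sketched as follows, assuming a POSIX awk. The functions fill and bump are hypothetical; declaring extra parameters (here i) is the usual idiom for getting local variables.

```shell
out=$(awk '
  # arr is an array (shared with the caller); i is an extra parameter used as a local
  function fill(arr, count,    i) {
    for (i = 1; i <= count; i++) arr[i] = i * i
  }
  # x is a scalar, so the caller never sees this change
  function bump(x) { x = x + 1; return x }
  BEGIN {
    i = 99; v = 1
    split("", squares)     # portable way to create an empty array
    fill(squares, 3)
    print squares[3]       # filled in by the function
    print i                # the global i is untouched
    print bump(v), v       # v itself is unchanged
  }
')
printf '%s\n' "$out"
```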

Special Variables

BEGIN { # Can be modified by the user
  FS = ",";   # Field Separator
  RS = "\n";  # Record Separator (lines)
  OFS = " ";  # Output Field Separator
  ORS = "\n"; # Output Record Separator (lines)
}
{ # Maintained by awk itself; usually only read (assigning to them is possible but rarely needed)
  NF          # Number of Fields in the current Record (line)
  NR          # Number of Records seen so far
  ARGV / ARGC # Script Arguments
}
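
A minimal sketch of these variables in use, assuming a POSIX awk; the comma-separated input is made up.

```shell
# Sample input (hypothetical): comma-separated fields, a different count per line.
out=$(printf 'a,b,c\nd,e\n' | awk '
  BEGIN { FS = ","; OFS = "-" }   # split on commas, join output with dashes
  { print NR, NF, $1, $NF }       # line number, field count, first and last field
')
printf '%s\n' "$out"
```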