...
- <pattern> (a regular expression, or regex) can contain special match metacharacters and modifiers. The ones below are Perl metacharacters, which are supported by most languages and by many command-line tools (e.g. grep -P)
- ^ – matches beginning of line
- $ – matches end of line
- . – (period) matches any single character
- * – modifier, place after an expression to match 0 or more occurrences
- + – modifier, place after an expression to match 1 or more occurrences
- ? – modifier, place after an expression to match 0 or 1 occurrences
- \s – matches any whitespace character (\S any non-whitespace)
- \d – matches digits 0-9
- \w – matches any word character: A-Z, a-z, 0-9 and _ (underscore)
- \t – matches Tab
- \n – matches Linefeed; \r matches Carriage return
- [xyz123] – matches any single character (including special characters) among those listed between the brackets [ ]
- this is called a character class.
- use [^xyz123] to match any single character not listed in the class
- (Xyz|Abc) – matches either Xyz or Abc; any text or expressions inside the parentheses, separated by | characters, are treated as alternatives
- note that parentheses ( ) may also be used to capture matched sub-expressions for later use
- in Perl, where a pattern is delimited by //, modifiers appear after the pattern:
- i - perform case-insensitive text matching
- g - perform the specified substitution globally, not just on the 1st match
Regular expression modules are available in nearly every programming language (Perl, Python, Java, PHP, awk, even R).
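As a minimal sketch of how the metacharacters and modifiers listed above carry over to another language, the following uses Python's standard re module. The sample line and all variable names are made up for illustration. In this sketch, Perl's i modifier corresponds to the re.IGNORECASE flag, and the g modifier corresponds to the default behavior of re.sub, which replaces every match unless count is given.

```python
import re

# Hypothetical input line: a name, a Tab, then space-separated fields
line = "Sample_01\tchr2  1500 reads"

# ^ anchors at the start of the line; \w+ is one or more word characters
sample = re.match(r"^\w+", line).group(0)

# \s+ matches a run of whitespace (Tab or spaces), so it works as a field separator
fields = re.split(r"\s+", line)

# Parentheses capture sub-expressions; \d+ is one or more digits
m = re.search(r"(chr)(\d+)", line)
chrom_number = m.group(2)

# (a|b) alternation, anchored to the end of the line with $
kind = re.search(r"(reads|bases)$", line).group(1)

# Character class: any single character among those listed
vowels = re.findall(r"[aeiou]", "reads")

# ? makes the preceding expression optional (0 or 1 occurrences)
both_spellings = [bool(re.search(r"colou?r", s)) for s in ("color", "colour")]

# Equivalent of Perl's i modifier: case-insensitive matching
found = re.search(r"sample", line, flags=re.IGNORECASE) is not None

# Equivalent of Perl's g modifier: re.sub replaces all matches by default;
# count=1 limits the substitution to the 1st match
all_masked = re.sub(r"\d", "N", line)
first_masked = re.sub(r"\d", "N", line, count=1)
```

Raw strings (the r"..." prefix) keep Python from interpreting backslashes before the regex engine sees them, which is why \s, \d and \w can be written exactly as in the list above.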
...
- here are some good ones:
- a good general one: https://www.regular-expressions.info/
- Ryan's tutorials on Regular Expressions: http://ryanstutorials.net/regular-expressions-tutorial/
- RegexOne: http://regexone.com
- and a Perl regex tutorial: http://perldoc.perl.org/perlretut.html
- Perl regular expressions are the "gold standard" used in most other languages
...