regular expressions
More info: https://www.regular-expressions.info
There are generally two major flavors: POSIX and Perl. The most widely used seems to be Perl's, often called PCRE.
PCRE stands for Perl Compatible Regular Expressions.
PCRE's syntax is much more powerful and flexible than either of the POSIX regular expression flavors and than that of many other regular-expression libraries.
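Vim's flavor is close to PCRE once you switch on \v ("very magic" mode); in the default "magic" mode you need backslashes before + ? | ( ) {. Beyond that the differences are mostly in obscure features, except: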
Vim and Perl handle newline characters inside a string a bit differently:
In Perl, ^ and $ only match at the very beginning and end of the text by default, but you can set the m flag, which lets them also match at embedded newlines. Vim searches line by line, so ^ and $ always match at the start and end of every line.
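To see the Perl behaviour in action, here is a small sketch using Python's `re` module as a stand-in for Perl (an assumption: Python's `re.MULTILINE` plays the role of Perl's m flag). Without the flag, ^ only matches at the start of the whole string; with it, ^ matches after every newline too.

```python
import re

text = "first line\nsecond line\nthird line"

# Default behaviour: ^ matches only at the very start of the string.
default_hits = re.findall(r"^\w+", text)

# re.MULTILINE (Perl's m flag) lets ^ also match right after each embedded newline.
multiline_hits = re.findall(r"^\w+", text, flags=re.MULTILINE)

print(default_hits)    # ['first']
print(multiline_hits)  # ['first', 'second', 'third']
```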
POSIX
BRE
meta-characters : . [ ] ^ $ * plus the backslashed \{ \} \( \)
There is a BRE (Basic Regular Expression) syntax and an ERE (Extended) flavour. BRE is the oldest one you will encounter, and it requires you to put a backslash before the metacharacters { } ( ) to make them special: \{ \} \( \). BRE is still the default for grep, but if you run 'man grep' you will immediately see the -P (--perl-regexp) option as well.
ERE
meta-characters : . [ ] ^ $ * ? + | ( ) and the intervals {n} {n,m} {n,}
ERE is an extension that lets you drop those backslashes; in ERE a backslash does the opposite and makes the character literal, so \( matches a real parenthesis!
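The flip between BRE and ERE can be made concrete with a tiny translator. This is a hypothetical helper (`bre_to_ere` is not a standard function), a minimal sketch assuming the BRE only uses . [ ] ^ $ * and the backslashed braces and parentheses; it ignores bracket-expression subtleties and GNU extensions. The ERE it produces is also accepted by Python's `re`, which is used here to check it.

```python
import re

# Characters whose special/literal meaning is swapped between BRE and ERE
# (BRE needs \{ \} \( \), ERE uses them bare).
FLIPPED = set("{}()")


def bre_to_ere(pattern: str) -> str:
    """Translate a simple POSIX BRE into ERE syntax (hypothetical helper).

    Sketch only: assumes the BRE uses . [ ] ^ $ * and the backslashed
    braces/parentheses; bracket expressions and GNU extensions are not
    treated specially.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            # BRE "\(" means "group" -> ERE "("; other escapes are kept unchanged.
            out.append(nxt if nxt in FLIPPED else ch + nxt)
            i += 2
        elif ch in FLIPPED:
            # A bare "(" is literal in BRE, so ERE needs it escaped.
            out.append("\\" + ch)
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


ere = bre_to_ere(r"\(ab\)\{2\}")
print(ere)                              # (ab){2}
print(bool(re.fullmatch(ere, "abab")))  # True
print(bre_to_ere("f(x)"))               # f\(x\)
```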
"Extended" is relative to the original UNIX grep, which only had bracket expressions, dot, caret, dollar and star: [] . ^ $ *
Most modern regex flavours are extensions of ERE, and the most widely used of these extensions is PCRE.
wildcards in bash
Note that bash wildcards are a little different from regular expressions. The dot (.) is a very common character in filenames, extensions and directory names, so in a glob it is just a literal dot, and the glob * corresponds to .* in a regexp. Also the negated set [^] is written [!] in POSIX globs (bash accepts both).
These patterns are also called glob patterns; the name comes from "global".
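| wildcard | meaning |
|---|---|
| * | any string, including the empty one (regex .*) |
| ? | exactly one character (regex .) |
| [abc] | one character from the set |
| [!abc] | one character not in the set (regex [^abc]) |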
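One way to check the glob-to-regex correspondence is Python's `fnmatch.translate`, which turns a shell-style glob into a regex string. This is an assumption-laden sketch: `fnmatch` follows Python's own glob rules, which are close to but not identical to bash's (its docs note that "/" and leading dots are not special). The exact regex text it returns varies between Python versions, so only matching behaviour is compared against a hand-written equivalent.

```python
import fnmatch
import re

glob = "*.tx[!t]"

# Regex generated from the glob (exact text differs across Python versions).
translated = fnmatch.translate(glob)

# The same pattern written by hand: * -> .*, literal dot -> \., [!t] -> [^t]
hand_written = r".*\.tx[^t]"

names = ["notes.txt", "notes.txs", "notestxs"]
results = {
    name: (bool(re.fullmatch(translated, name)), bool(re.fullmatch(hand_written, name)))
    for name in names
}
print(results)
# {'notes.txt': (False, False), 'notes.txs': (True, True), 'notestxs': (False, False)}
```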
PCRE or perl
If you look at https://regex101.com/ you will see that PCRE is the default flavor, and that the other flavors offered there use largely the same syntax, differing mostly in delimiters and a few features.
| char | meaning |
|---|---|
| a | literal character to match |
| . | any character (except newline, by default) |
| * | 0 or more repetitions of previous text |
| + | 1 or more repetitions of previous text |
| ? | 0 or 1 repetitions of previous text |
| [] | set of matching chars: [a-z] [abc] |
| [^] | set of non-matching chars: [^a] |
| ^ | start of line anchor |
| $ | end of line anchor |
| () | capture group: <([^>]+)>.*</\1> |
| \n | back reference to capture group n (\1, \2, ...) |
| (?=...) | lookahead, one kind of lookaround |
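A quick way to get a feel for the rows of the table is to try each one on a tiny input. This sketch uses Python's `re.search` (which looks for a match anywhere in the text) as a stand-in for a PCRE engine; the pattern/text pairs are illustrative choices, not from the original notes.

```python
import re

# (pattern, text) pairs illustrating rows of the table.
examples = [
    (r"ab*c", "ac"),         # *   : zero b's is allowed
    (r"ab+c", "ac"),         # +   : needs at least one b
    (r"colou?r", "color"),   # ?   : the u is optional
    (r"[a-c]x", "bx"),       # []  : one char from the set
    (r"[^a]x", "ax"),        # [^] : one char that is not a
    (r"^start", "a start"),  # ^   : anchored to the beginning
    (r"end$", "the end"),    # $   : anchored to the end
]

results = [bool(re.search(pattern, text)) for pattern, text in examples]
for (pattern, text), hit in zip(examples, results):
    print(f"{pattern:10} {text!r:10} {hit}")
```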
The capture group example matches an html element from its opening tag to the matching closing tag. The tag name is captured by ([^>]+) and reused with the back reference \1 in the closing tag </\1>. Note that this only works for tags without attributes, because [^>]+ captures the attributes as well.
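Here is the tag pattern from the table run through Python's `re` (assumed here as a PCRE stand-in; it supports the same \1 back reference). It shows a matching pair, a mismatched closing tag, and the attribute limitation mentioned above.

```python
import re

# The table's capture-group pattern: group 1 is the tag name, \1 reuses it.
TAG = re.compile(r"<([^>]+)>.*</\1>")

samples = ["<b>bold</b>", "<b>bold</i>", '<a href="x">link</a>']
tag_names = []
for html in samples:
    m = TAG.search(html)
    tag_names.append(m.group(1) if m else None)
    print(html, "->", tag_names[-1])
# <b>bold</b> -> b
# <b>bold</i> -> None  (closing tag differs)
# <a href="x">link</a> -> None  (group 1 also swallowed the attribute)
```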
A lookaround checks that something matches at that position without including it in the result, so you can find text by its context: for instance b(?=ody) matches a letter b only when it is followed by ody, as in body. The (?=...) syntax is not available in vim, which has its own form: /b\(ody\)\@=
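Python's `re` accepts the Perl (?=...) syntax, so it can illustrate the point (this is a sketch using Python, not vim). The match positions show that only the b in front of "ody" is found, and the match itself is just that single letter.

```python
import re

text = "body bob embody"

# b(?=ody): a "b" counts only if "ody" follows, but "ody" is not part of the match.
matches = [(m.start(), m.group()) for m in re.finditer(r"b(?=ody)", text)]
print(matches)  # [(0, 'b'), (11, 'b')]
```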
Capture groups and back references are also great in search-and-replace actions (like :s in vim). Note also that PCRE lets you name capture groups with (?<name>...), but since that does not work in vim I just won't use it.
Vim also limits you to 9 capture groups, referenced as \1 to \9 (\0 is the whole match).
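A search-and-replace with back references, sketched with Python's `re.sub` (an assumption standing in for an editor's substitute command). In vim the first substitution would be written roughly as :s/\(\w\+\) \(\w\+\)/\2 \1/ ; the sample strings are illustrative only.

```python
import re

# \1 and \2 in the replacement string refer back to the two capture groups.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")
print(swapped)  # world hello

# Reordering a date with three groups.
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", "due 2024-01-31")
print(reordered)  # due 31/01/2024
```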