Lightgrep Cheat Sheet
c             the character c*
\a            U+0007 (BEL) bell
\e            U+001B (ESC) escape
\f            U+000C (FF) form feed
\n            U+000A (NL) newline
\r            U+000D (CR) carriage return
\t            U+0009 (TAB) horizontal tab
\ooo          U+ooo, 1–3 octal digits o, ≤ 0377
\xhh          U+00hh, 2 hexadecimal digits h
\x{hhhhhh}    U+hhhhhh, 1–6 hex digits h
\zhh          the byte 0xhh (not the character!)†
\N{name}      the character called name
\N{U+hhhhhh}  same as \x{hhhhhh}
\c            the character c‡
* except U+0000 (NUL) and metacharacters
† Lightgrep extension; not part of PCRE.
‡ except any of: adefnprstwDPSW1234567890
1 Single Characters
. any character
\d [0-9] (= ASCII digits)
\D [^0-9]
\s [\t\n\f\r ] (= ASCII whitespace)
\S [^\t\n\f\r ]
\w [0-9A-Za-z_] (= ASCII words)
\W [^0-9A-Za-z_]
\p{property} any character having property
\P{property} any character lacking property
2 Named Character Classes
[stu]      any character in stu
[^stu]     any character not in stu
where stu is...
c          a character
a-b        a character range, inclusive
\zhh       a byte
\zhh-\zhh  a byte range, inclusive
[S]        a character class
ST         S ∪ T (union)
S&&T       S ∩ T (intersection)
S--T       S − T (difference)
S~~T       S △ T (symmetric difference, XOR)
3 Character Classes
(S) makes any pattern S atomic
4 Grouping
ST matches S, then matches T
S|T matches S or T, preferring S
5 Concatenation & Alternation
Repeats S...
Greedy
S* 0 or more times (= S{0,})
S+ 1 or more times (= S{1,})
S? 0 or 1 time (= S{0,1})
S{n,} n or more times
S{n,m} n–m times, inclusive
Reluctant
S*? 0 or more times (= S{0,}?)
S+? 1 or more times (= S{1,}?)
S?? 0 or 1 time (= S{0,1}?)
S{n,}? n or more times
S{n,m}? n–m times, inclusive
6 Repetition
Any Assigned
Alphabetic White_Space
Uppercase Lowercase
ASCII Noncharacter_Code_Point
Name=name Default_Ignorable_Code_Point
General_Category=category
L,  Letter                  P,  Punctuation
Lu, Uppercase Letter        Pc, Connector Punctuation
Ll, Lowercase Letter        Pd, Dash Punctuation
Lt, Titlecase Letter        Ps, Open Punctuation
Lm, Modifier Letter         Pe, Close Punctuation
Lo, Other Letter            Pi, Initial Punctuation
M,  Mark                    Pf, Final Punctuation
Mn, Non-Spacing Mark        Po, Other Punctuation
Me, Enclosing Mark          Z,  Separator
N,  Number                  Zs, Space Separator
Nd, Decimal Digit Number    Zl, Line Separator
Nl, Letter Number           Zp, Paragraph Separator
No, Other Number            C,  Other
S,  Symbol                  Cc, Control
Sm, Math Symbol             Cf, Format
Sc, Currency Symbol         Cs, Surrogate
Sk, Modifier Symbol         Co, Private Use
So, Other Symbol            Cn, Not Assigned
Script=script
Common Latin Greek Cyrillic Armenian Hebrew Arabic
Syriac Thaana Devanagari Bengali Gurmukhi Gujarati
Oriya Tamil Telugu Kannada Malayalam Sinhala
Thai Lao Tibetan Myanmar Georgian Hangul
Ethiopic Cherokee Ogham Runic Khmer Mongolian
Hiragana Katakana Bopomofo Han Yi Old_Italic
Gothic Inherited Tagalog Hanunoo Buhid Tagbanwa
Limbu Tai_Le Linear_B Ugaritic Shavian Osmanya
Cypriot Buginese Coptic New_Tai_Lue Glagolitic
Tifinagh Syloti_Nagri Old_Persian Kharoshthi Balinese
Cuneiform Phoenician Phags_Pa Nko Sundanese
Lepcha ... See the Unicode Standard for more.
7 Selected Unicode Properties
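The two-letter General_Category codes in this table are the same codes that Python's standard `unicodedata` module reports, so one quick way to see which `\p{...}` category a character belongs to is to look it up there. This is a sketch that uses Python as a stand-in, not Lightgrep itself; the helper name `general_category` is made up for illustration, and the Unicode version behind the lookup depends on the Python build.

```python
import unicodedata


def general_category(ch: str) -> str:
    """Return the two-letter Unicode General_Category code of one character."""
    return unicodedata.category(ch)


# A few characters that appear elsewhere on this sheet.
samples = [
    "A",                                # uppercase letter
    "a",                                # lowercase letter
    "7",                                # decimal digit
    "_",                                # connector punctuation
    "\N{EURO SIGN}",                    # currency symbol
    "\N{NO-BREAK SPACE}",               # space separator
    "\N{CYRILLIC CAPITAL LETTER YA}",   # uppercase letter, non-Latin script
]

for ch in samples:
    print(f"U+{ord(ch):04X}  {general_category(ch)}  {unicodedata.name(ch)}")
```

A character reported as `Lu` here would be expected to match `\p{Lu}` and, since `Lu` is a subcategory of `L`, `\p{L}` as well.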
c the character c (except metacharacters)
\xhh U+00hh, 2 hexadecimal digits h
\whhhh U+hhhh, 4 hexadecimal digits h
\c the character c
. any character
# [0-9] (= ASCII digits)
[a-b] any character in the range a–b
[S] any character in S
[^S] any character not in S
(S) grouping
S* repeat S 0 or more times (max 255)
S+ repeat S 1 or more times (max 255)
S? repeat S 0 or 1 time
S{n,m} repeat S n–m times (max 255)
ST matches S, then matches T
S|T matches S or T
8 EnCase GREP Syntax
\whhhh → \xhhhh
#      → \d
S*     → S{0,255}
S+     → S{1,255}
S* and S+ are limited to 255 repetitions by EnCase; Lightgrep preserves this in imported patterns.
\w is limited to BMP characters (< U+10000) only.
9 Importing from EnCase into Lightgrep
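The four import rules can be read as a small token-by-token rewrite. The sketch below is not Lightgrep's importer; `encase_to_lightgrep` is a hypothetical helper written only to illustrate the mapping. It rests on three assumptions: the code point is emitted in the braced form `\x{hhhh}` (section 1 defines unbraced `\x` as taking exactly two hex digits), other backslash escapes are copied through unchanged, and `#`, `*` and `+` inside a bracketed class are treated as literals.

```python
import re

HEX4 = re.compile(r"[0-9A-Fa-f]{4}")


def encase_to_lightgrep(pattern: str) -> str:
    """Rewrite an EnCase GREP pattern using the four import rules (sketch)."""
    out = []
    i = 0
    in_class = False  # True while between '[' and the closing ']'
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # \whhhh becomes a braced code point; any other escape is copied.
            if pattern[i + 1] == "w" and HEX4.fullmatch(pattern, i + 2, i + 6):
                out.append("\\x{" + pattern[i + 2:i + 6] + "}")
                i += 6
            else:
                out.append(pattern[i:i + 2])
                i += 2
            continue
        if in_class:
            # Assumption: operators are literal inside a class.
            if ch == "]":
                in_class = False
            out.append(ch)
        elif ch == "[":
            in_class = True
            out.append(ch)
        elif ch == "#":
            out.append("\\d")
        elif ch == "*":
            out.append("{0,255}")   # EnCase caps repetition at 255
        elif ch == "+":
            out.append("{1,255}")
        else:
            out.append(ch)
        i += 1
    return "".join(out)


print(encase_to_lightgrep(r"\w0041#+"))
print(encase_to_lightgrep("a*b"))
```

Making the 255 cap explicit in the output is what keeps an imported pattern matching the same inputs it matched in EnCase, rather than silently becoming unbounded.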
Some people, when confronted with a problem, think “I know,
I’ll use regular expressions.” Now they have two problems.
—JWZ in alt.religion.emacs, 12 August 1997
Lightgrep Search for EnCase®
Fast Search for Forensics
www.lightgrep.com
Notes & Examples
Characters:
.*?\x00 (= null-terminated string)
\z50\z4B\z03\z04 (= ZIP signature)
\N{EURO SIGN}, \N{NO-BREAK SPACE}
\x{042F} (= CYRILLIC CAPITAL LETTER YA)
\+12\.5% (= escaping metacharacters)
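Three of these character examples can be tried in Python's `re` module, which happens to accept the same notation for them. This is only a stand-in to show the behaviour: Python's `re` is not Lightgrep, it does not understand `\z` or `\x{...}`, and `\N{name}` inside a pattern needs Python 3.8 or later.

```python
import re

# Reluctant .*? stops at the first NUL, so each NUL-terminated string
# is reported as its own match.
data = b"alpha\x00beta\x00"
c_strings = re.findall(rb".*?\x00", data)

# Escaped metacharacters match themselves.
literal = re.search(r"\+12\.5%", "a markup of +12.5% was applied")

# A character referred to by its Unicode name.
euro = re.search(r"\N{EURO SIGN}\d+", "price: \u20ac42")

print(c_strings)
print(literal.group())
print(euro.group())
```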
Grouping: Operators bind tightly. Use (aa)+, not aa+, to match pairs of a’s.
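The difference between the two patterns shows up on an odd number of a's. The check below uses Python's `re` module as a stand-in, on the assumption that repetition binds the same way there as in Lightgrep.

```python
import re

pairs = re.compile(r"(aa)+")   # one or more copies of the pair "aa"
loose = re.compile(r"aa+")     # one "a", then one or more further "a"

for text in ["aa", "aaa", "aaaa"]:
    print(
        text,
        "pairs:", pairs.fullmatch(text) is not None,
        "loose:", loose.fullmatch(text) is not None,
    )
```

Only the grouped form rejects `aaa`, because `+` in `aa+` applies to the final `a` alone.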
Ordered alternation: a|ab matches a twice in aab. Left alternatives are preferred to right.
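Python's `re` module also prefers the left alternative, so it can serve as a quick illustration of why the order of alternatives matters (a stand-in only, not Lightgrep):

```python
import re

# The left alternative "a" always succeeds first, so "ab" is never tried.
left_short = re.findall(r"a|ab", "aab")

# Putting the longer alternative first lets it win where it applies.
left_long = re.findall(r"ab|a", "aab")

print(left_short)
print(left_long)
```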
Repetition: Greedy operators match as much as possible. Reluctant operators match as little as possible. a+a matches all of aaaa; a+?a matches the first aa, then the second aa. .+ will (uselessly) match the entire input. Prefer reluctant operators when possible.
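The `aaaa` example can be reproduced with Python's `re` module, used here as a stand-in on the assumption that greedy and reluctant operators behave the same way for these simple patterns. The `re.DOTALL` flag is a Python detail needed so that `.` also matches newlines.

```python
import re

greedy = re.findall(r"a+a", "aaaa")      # one match covering everything
reluctant = re.findall(r"a+?a", "aaaa")  # two short matches

# A greedy .+ swallows the whole input.
text = "first line\nsecond line"
everything = re.search(r".+", text, re.DOTALL).group()

print(greedy)
print(reluctant)
print(everything == text)
```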
Character classes:
[abc]                = a, b, or c
[^a]                 = anything but a
[A-Z]                = A to Z
[A\-Z]               = A, Z, or hyphen (!)
[A-Zaeiou]           = capitals or lowercase vowels
[.+*?\]]             = ., +, *, ?, or ]
[Q\z00-\z7F]         = Q or 7-bit bytes
[[abcd][bce]]        = a, b, c, d, or e
[[abcd]&&[bce]]      = b or c
[[abcd]--[bce]]      = a or d
[[abcd]~~[bce]]      = a, d, or e
[\p{Greek}\d]        = Greek or digits
[^\p{Greek}7]        = neither Greek nor 7
[\p{Greek}&&\p{Ll}]  = lowercase Greek
Operators need not be escaped inside character classes.
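The class operators behave like ordinary set operations on the characters involved. Python's `re` module has no `&&`, `--` or `~~`, so this sketch models the four `[abcd]`/`[bce]` examples with Python sets instead; `to_class` is a hypothetical helper that turns a result back into a plain bracket class. The last two lines check the escaped-hyphen and unescaped-operator examples, which Python's `re` does share.

```python
import re

a = set("abcd")
b = set("bce")

union = a | b           # [[abcd][bce]]
intersection = a & b    # [[abcd]&&[bce]]
difference = a - b      # [[abcd]--[bce]]
symmetric = a ^ b       # [[abcd]~~[bce]]


def to_class(chars):
    """Hypothetical helper: build a plain bracket class from a set of characters."""
    return "[" + "".join(re.escape(c) for c in sorted(chars)) + "]"


for name, chars in [
    ("union", union),
    ("intersection", intersection),
    ("difference", difference),
    ("symmetric difference", symmetric),
]:
    print(f"{name}: {to_class(chars)}")

# An escaped hyphen is a literal, not a range; operators are literal in a class.
hyphen_only = [c for c in "A-BZ" if re.fullmatch(r"[A\-Z]", c)]
operators = [c for c in ".+*?]x" if re.fullmatch(r"[.+*?\]]", c)]
print(hyphen_only, operators)
```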
Email addresses: [a-z\d!#$%&'*+/=?^_`{|}~-][a-z\d!#$%&'*+/=?^_`{|}~.-]{0,63}@[a-z\d.-]{1,253}\.[a-z\d-]{2,22}
Hostnames: ([a-z\d]([a-z\d_-]{0,61}[a-z\d])?\.){2,5}[a-z\d][a-z\d-]{1,22}
N. American phone numbers: \(?\d{3}[ ).-]{0,2}\d{3}[ .-]?\d{4}\D
Visa, MasterCard: \d{4}([ -]?\d{4}){3}
American Express: 3[47]\d{2}[ -]?\d{6}[ -]?\d{5}
Diners Club: 3[08]\d{2}[ -]?\d{6}[ -]?\d{4}
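The phone and card patterns use only syntax that Python's `re` module shares, so they can be tried there unchanged. This is a sketch with made-up sample text (the card numbers are the usual publicly documented test numbers), and it says nothing about how Lightgrep itself reports hits. Note that the phone pattern ends in `\D`, so the match includes one trailing non-digit.

```python
import re

PHONE = re.compile(r"\(?\d{3}[ ).-]{0,2}\d{3}[ .-]?\d{4}\D")
VISA_MC = re.compile(r"\d{4}([ -]?\d{4}){3}")
AMEX = re.compile(r"3[47]\d{2}[ -]?\d{6}[ -]?\d{5}")

text = (
    "Call (555) 123-4567, then charge 4111 1111 1111 1111 "
    "or the backup card 3782 822463 10005."
)

# .group() returns the whole match; findall would return only the
# contents of the capturing group in VISA_MC.
phone_hit = PHONE.search(text).group()
visa_hit = VISA_MC.search(text).group()
amex_hit = AMEX.search(text).group()

print(repr(phone_hit))
print(repr(visa_hit))
print(repr(amex_hit))
```

Patterns like these find candidates by shape only; a separate check would be needed to decide whether a hit is a real number.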
EMF header: \z01\z00\z00\z00.{36}\z20EMF
JPEG: \zFF\zD8\zFF[\zC4\zDB\zE0-\zEF\zFE] Footer: \zFF\zD9
GIF: GIF8[79] Footer: \z00\z3B
BMP: BM.{4}\z00\z00\z00\z00.{4}\z28
PNG: \z89\z50\z4E\z47 Footer: \z49\z45\z4E\z44\zAE\z42\z60\z82
ZIP: \z50\z4B\z03\z04 Footer: \z50\z4B\z05\z06
RAR: \z52\z61\z72\z21\z1a\z07\z00...[\z00-\z7F] Footer: \z88\zC4\z3D\z7B\z00\z40\z07\z00
GZIP: \z1F\z8B\z08
MS Office 97–03: \zD0\zCF\z11\zE0\zA1\zB1\z1A\zE1
LNK: \z4c\z00\z00\z00\z01\z14\z02\z00
PDF: \z25\z50\z44\z46\z2D\z31 Footer: \z25\z45\z4F\z46
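To try these signatures outside Lightgrep, each `\zhh` byte can be rewritten as `\xhh` and the result compiled as a bytes pattern in Python's `re` module. This is a rough sketch under that assumption: `compile_signature` is a hypothetical helper, it handles only patterns whose other characters are plain ASCII, and it would mis-handle a `\z` preceded by an escaped backslash. `re.DOTALL` is set so that `.` matches any byte, including 0x0A.

```python
import re


def compile_signature(lightgrep_pattern: str) -> "re.Pattern[bytes]":
    """Translate \\zhh byte escapes to \\xhh and compile as a bytes regex (sketch)."""
    translated = re.sub(r"\\z([0-9A-Fa-f]{2})", r"\\x\1", lightgrep_pattern)
    return re.compile(translated.encode("ascii"), re.DOTALL)


JPEG_HEADER = compile_signature(r"\zFF\zD8\zFF[\zC4\zDB\zE0-\zEF\zFE]")
ZIP_HEADER = compile_signature(r"\z50\z4B\z03\z04")
EMF_HEADER = compile_signature(r"\z01\z00\z00\z00.{36}\z20EMF")

# Made-up buffer: 4 junk bytes, a JPEG header, 4 filler bytes, a ZIP header.
blob = b"junk" + b"\xFF\xD8\xFF\xE0" + b"JFIF" + b"PK\x03\x04" + b"rest"

jpeg_offsets = [m.start() for m in JPEG_HEADER.finditer(blob)]
zip_offsets = [m.start() for m in ZIP_HEADER.finditer(blob)]
emf_found = EMF_HEADER.search(b"\x01\x00\x00\x00" + bytes(36) + b" EMF") is not None

print(jpeg_offsets, zip_offsets, emf_found)
```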