
global lpeg

This type definition is based on the HTML documentation of the LPeg library. An alternative HTML rendering of the documentation can be found at http://stevedonovan.github.io/lua-stdlibs/modules/lpeg.html.

LPeg is a new pattern-matching library for Lua, based on Parsing Expression Grammars (PEGs). This text is a reference manual for the library. For a more formal treatment of LPeg, as well as some discussion about its implementation, see A Text Pattern-Matching Tool based on Parsing Expression Grammars. (You may also be interested in my talk about LPeg given at the III Lua Workshop.)

Following the Snobol tradition, LPeg defines patterns as first-class objects. That is, patterns are regular Lua values (represented by userdata). The library offers several functions to create and compose patterns. With the use of metamethods, several of these functions are provided as infix or prefix operators. On the one hand, the result is usually much more verbose than the typical encoding of patterns using so-called regular expressions (which typically are not regular expressions in the formal sense). On the other hand, first-class patterns allow much better documentation (as it is easy to comment the code, to break complex definitions into smaller parts, etc.) and are extensible, as we can define new functions to create and compose patterns.



methods


lpeg.match


function lpeg.match(
  pattern: (Pattern|string|integer|boolean|table|function),
  subject: string,
  init: integer?,
  ...: any
) ->  any ...

Match the given pattern against the subject string.

If the match succeeds, returns the index in the subject of the first character after the match, or the captured values (if the pattern captured any value).

An optional numeric argument init makes the match start at that position in the subject string. As usual in Lua libraries, a negative value counts from the end.

Unlike typical pattern-matching functions, match works only in anchored mode; that is, it tries to match the pattern with a prefix of the given subject string (at position init), not with an arbitrary substring of the subject. So, if we want to find a pattern anywhere in a string, we must either write a loop in Lua or write a pattern that matches anywhere. This second approach is easy and quite efficient.

Example:

local pattern = lpeg.R('az') ^ 1 * -1
assert(pattern:match('hello') == 6)
assert(lpeg.match(pattern, 'hello') == 6)
assert(pattern:match('1 hello') == nil)
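
The effect of init and of anchored matching can be sketched as follows. This is a minimal illustration, not part of the original manual: the names word and search are hypothetical, and the first line assumes that lpeg is either already a global (as in this type definition) or available as a standalone module.

```lua
-- Assumption: lpeg is a global, or can be loaded as a module.
local lpeg = lpeg or require('lpeg')

local word = lpeg.R('az') ^ 1

-- Anchored: the match must begin at position 1, where '1' is not a letter.
assert(word:match('1 hello') == nil)

-- With init = 3 the prefix 'hello' matches; the result is the next position.
assert(word:match('1 hello', 3) == 8)

-- A negative init counts from the end: -3 starts at the 'llo' suffix.
assert(word:match('1 hello', -3) == 8)

-- A pattern that matches anywhere: try word, else skip one character and retry.
local search = lpeg.P({ word + 1 * lpeg.V(1) })
assert(search:match('1 hello') == 8)
```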

lpeg.type


function lpeg.type(value) ->  "pattern"?

Return the string "pattern" if the given value is a pattern, otherwise nil.
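
A short sketch of the return values, assuming lpeg is either a global or loadable as a module:

```lua
-- Assumption: lpeg is a global, or can be loaded as a module.
local lpeg = lpeg or require('lpeg')

-- Patterns and captures are both reported as "pattern".
assert(lpeg.type(lpeg.P('a')) == 'pattern')
assert(lpeg.type(lpeg.C(1)) == 'pattern')

-- Any other value yields nil, including values that lpeg.P could convert.
assert(lpeg.type('a') == nil)
assert(lpeg.type(42) == nil)
```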

lpeg.setmaxstack


function lpeg.setmaxstack(max: integer)

Set a limit for the size of the backtrack stack used by LPeg to track calls and choices.

The default limit is 400. Most well-written patterns need few backtrack levels, so you seldom need to change this limit; before changing it, try to rewrite your pattern to avoid the need for extra space. Nevertheless, a few useful patterns may overflow. With recursive grammars, subjects with deep recursion may also need larger limits.
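
The following sketch shows a deeply nested subject overflowing a small limit and succeeding once the limit is raised. The limits 100 and 5000 and the nesting depth of 1000 are arbitrary values chosen for illustration; the grammar is the balanced-parentheses pattern from the lpeg.V example.

```lua
-- Assumption: lpeg is a global, or can be loaded as a module.
local lpeg = lpeg or require('lpeg')

-- Balanced parentheses, as in the lpeg.V example.
local balanced = lpeg.P({ '(' * ((1 - lpeg.S('()')) + lpeg.V(1)) ^ 0 * ')' })
local deep = string.rep('(', 1000) .. string.rep(')', 1000)

-- With a deliberately small limit, the nested rule calls overflow the
-- backtrack stack and lpeg.match raises an error.
lpeg.setmaxstack(100)
local ok = pcall(lpeg.match, balanced, deep)
assert(not ok)

-- A larger limit leaves room for every nesting level.
lpeg.setmaxstack(5000)
assert(balanced:match(deep) == 2001)
```

The limit applies to all subsequent matches, so a program that raises it usually does so once, near the place where the grammar is defined.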

lpeg.P


function lpeg.P(value: (Pattern|string|integer|boolean|table|function)) ->  Pattern {
    match = function,
}

Convert the given value into a proper pattern.

The following rules are applied:

  • If the argument is a pattern, it is returned unmodified.

  • If the argument is a string, it is translated to a pattern that matches the string literally.

  • If the argument is a non-negative number n, the result is a pattern that matches exactly n characters.

  • If the argument is a negative number -n, the result is a pattern that succeeds only if the input string has fewer than n characters left: lpeg.P(-n) is equivalent to -lpeg.P(n) (see the unary minus operation).

  • If the argument is a boolean, the result is a pattern that always succeeds or always fails (according to the boolean value), without consuming any input.

  • If the argument is a table, it is interpreted as a grammar (see Grammars).

  • If the argument is a function, returns a pattern equivalent to a match-time capture over the empty string.
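
The conversion rules above can be sketched as follows (grammars are covered separately). The subjects and the local names are illustrative only.

```lua
-- Assumption: lpeg is a global, or can be loaded as a module.
local lpeg = lpeg or require('lpeg')
local P = lpeg.P

-- A pattern is returned unmodified.
local p = P('x')
assert(rawequal(P(p), p))

assert(P('ab'):match('abc') == 3)     -- literal string
assert(P(2):match('abc') == 3)        -- exactly 2 characters
assert(P(4):match('abc') == nil)      -- only 3 characters are available
assert(P(-1):match('') == 1)          -- fewer than 1 character left: end of input
assert(P(-1):match('a') == nil)
assert(P(true):match('abc') == 1)     -- always succeeds, consumes nothing
assert(P(false):match('abc') == nil)  -- always fails

-- A function acts as a match-time capture over the empty string:
-- it receives the subject and the current position and may move forward.
local skip_one = P(function(subject, pos) return pos + 1 end)
assert(skip_one:match('abc') == 2)
```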

lpeg.B


function lpeg.B(pattern: (Pattern|string|integer|boolean|table)) ->  Pattern {
    match = function,
}

Return a pattern that matches only if the input string at the current position is preceded by the given pattern.

The given pattern must match only strings with some fixed length, and it cannot contain captures.

Like the and predicate, this pattern never consumes any input, independently of success or failure.
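
A minimal sketch of a look-behind, assuming lpeg is a global or a loadable module; the name b_after_a is hypothetical.

```lua
-- Assumption: lpeg is a global, or can be loaded as a module.
local lpeg = lpeg or require('lpeg')

-- 'b' only counts when the previous character is 'a'.
local b_after_a = lpeg.B('a') * 'b'

-- init = 2 starts the match at the second character.
assert(b_after_a:match('ab', 2) == 3)
assert(b_after_a:match('cb', 2) == nil)

-- At position 1 nothing precedes the current position, so B fails.
assert(b_after_a:match('b') == nil)

-- B consumes no input: after 'a' is matched, B('a') looks back at it
-- and 'b' is still matched at position 2.
assert((lpeg.P('a') * b_after_a):match('ab') == 3)
```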

lpeg.R


function lpeg.R(...: string) ->  Pattern {
    match = function,
}

Return a pattern that matches any single character belonging to one of the given ranges.

Each range is a string xy of length 2, representing all characters with code between the codes of x and y (both inclusive).

As an example, the pattern lpeg.R('09') matches any digit, and lpeg.R('az', 'AZ') matches any ASCII letter.

Example:

local pattern = lpeg.R('az') ^ 1 * -1
assert(pattern:match('hello') == 6)

lpeg.S


function lpeg.S(string: string) ->  Pattern {
    match = function,
}

Return a pattern that matches any single character that appears in the given string. (The S stands for Set.)

As an example, the pattern lpeg.S('+-*/') matches any arithmetic operator.

Note that, if s is a character (that is, a string of length 1), then lpeg.P(s) is equivalent to lpeg.S(s) which is equivalent to lpeg.R(s..s). Note also that both lpeg.S('') and lpeg.R() are patterns that always fail.
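
These properties can be checked with a short sketch; the name op is illustrative.

```lua
-- Assumption: lpeg is a global, or can be loaded as a module.
local lpeg = lpeg or require('lpeg')

local op = lpeg.S('+-*/')
assert(op:match('+1') == 2)
assert(op:match('1+') == nil)

-- For a single character, P, S, and R accept the same subjects.
for _, p in ipairs({ lpeg.P('x'), lpeg.S('x'), lpeg.R('xx') }) do
  assert(p:match('x') == 2)
  assert(p:match('y') == nil)
end

-- An empty set and an empty list of ranges always fail.
assert(lpeg.S(''):match('a') == nil)
assert(lpeg.R():match('a') == nil)
```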

lpeg.V


function lpeg.V(v: (boolean|string|number|function|table|thread|userdata|lightuserdata)) ->  Pattern {
    match = function,
}

Create a non-terminal (a variable) for a grammar.

The created non-terminal refers to the rule indexed by v in the enclosing grammar.

Example:

local b = lpeg.P({'(' * ((1 - lpeg.S '()') + lpeg.V(1)) ^ 0 * ')'})
assert(b:match('((string))') == 11)
assert(b:match('(') == nil)

lpeg.locale


function lpeg.locale(tab: table?) ->  Locale {
    alnum = userdata,
    alpha = userdata,
    cntrl = userdata,
    digit = userdata,
    graph = userdata,
    lower = userdata,
    print = userdata,
    punct = userdata,
    space = userdata,
    upper = userdata,
    xdigit = userdata,
}

Return a table with patterns for matching some character classes according to the current locale.

The table has fields named alnum, alpha, cntrl, digit, graph, lower, print, punct, space, upper, and xdigit, each one containing a corresponding pattern. Each pattern matches any single character that belongs to its class.

If called with an argument table, then it creates those fields inside the given table and returns that table.

Example:

lpeg.locale(lpeg)
local space = lpeg.space ^ 0
local name = lpeg.C(lpeg.alpha ^ 1) * space
local sep = lpeg.S(',;') * space
local pair = lpeg.Cg(name * '=' * space * name) * sep ^ -1
local list = lpeg.Cf(lpeg.Ct('') * pair ^ 0, rawset)
local t = list:match('a=b, c = hi; next = pi')
assert(t.a == 'b')
assert(t.c == 'hi')
assert(t.next == 'pi')

local locale = lpeg.locale()
assert(type(locale.digit) == 'userdata')

lpeg.C


function lpeg.C(patt: (Pattern|string|integer|boolean|table|function)) ->  Capture

Create a simple capture.

It captures the substring of the subject that matches patt. The captured value is a string. If patt has other captures, their values are returned after this one.

Example:

local function split (s, sep)
  sep = lpeg.P(sep)
  local elem = lpeg.C((1 - sep) ^ 0)
  local p = elem * (sep * elem) ^ 0
  return lpeg.match(p, s)
end

local a, b, c = split('a,b,c', ',')
assert(a == 'a')
assert(b == 'b')
assert(c == 'c')

lpeg.Carg


function lpeg.Carg(n: integer) ->  Capture

Create an argument capture.

This pattern matches the empty string and produces the value given as the nth extra argument given in the call to lpeg.match.
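
As a sketch, an extra argument can parameterize a capture without rebuilding the pattern. The names digits and scaled are hypothetical; note that extra arguments follow init, so init must be given explicitly.

```lua
-- Assumption: lpeg is a global, or can be loaded as a module.
local lpeg = lpeg or require('lpeg')

local digits = lpeg.C(lpeg.R('09') ^ 1)

-- The function capture receives the digits and the first extra argument.
local scaled = (digits * lpeg.Carg(1)) / function(text, factor)
  return tonumber(text) * factor
end

-- match(subject, init, extra1, ...): the third argument is extra argument 1.
assert(scaled:match('21', 1, 2) == 42)
assert(scaled:match('21', 1, 10) == 210)
```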

lpeg.Cb


function lpeg.Cb(name: any) ->  Capture

Create a back capture.

This pattern matches the empty string and produces the values produced by the most recent group capture named name (where name can be any Lua value).

Most recent means the last complete outermost group capture with the given name. A complete capture means that the entire pattern corresponding to the capture has matched. An outermost capture means that the capture is not inside another complete capture.

In the same way that LPeg does not specify when it evaluates captures, it does not specify whether it reuses values previously produced by the group or re-evaluates them.
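
The sketch below shows a back capture retrieving the values of a named group. The second half is adapted from the long-string example in the LPeg manual; the names used here are otherwise illustrative.

```lua
-- Assumption: lpeg is a global, or can be loaded as a module.
local lpeg = lpeg or require('lpeg')

-- A named group produces no values where it is defined;
-- Cb('first') retrieves them later in the match.
local p = lpeg.Cg(lpeg.C(1), 'first') * lpeg.P(1) ^ 0 * lpeg.Cb('first')
assert(p:match('hello') == 'h')

-- Lua-style long strings: the closing bracket must carry the same
-- number of '=' signs as the opening one.
local equals = lpeg.P('=') ^ 0
local open = '[' * lpeg.Cg(equals, 'init') * '['
local close = ']' * lpeg.C(equals) * ']'
local closeeq = lpeg.Cmt(close * lpeg.Cb('init'), function(s, i, a, b)
  return a == b
end)
-- "/ 1" keeps only the first value (the string body).
local long_string = open * lpeg.C((lpeg.P(1) - closeeq) ^ 0) * close / 1

assert(long_string:match('[==[a]=]b]==]') == 'a]=]b')
```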

lpeg.Cc


function lpeg.Cc(...: any) ->  Capture

Create a constant capture.

This pattern matches the empty string and produces all given values as its captured values.
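
A common use, sketched here with hypothetical names, is to attach a fixed value to each alternative of a choice:

```lua
-- Assumption: lpeg is a global, or can be loaded as a module.
local lpeg = lpeg or require('lpeg')

-- Each alternative yields a Lua boolean instead of the matched text.
local answer = lpeg.P('yes') * lpeg.Cc(true) + lpeg.P('no') * lpeg.Cc(false)
assert(answer:match('yes') == true)
assert(answer:match('no') == false)
assert(answer:match('maybe') == nil)

-- Several values can be produced at once.
local a, b = lpeg.Cc('x', 'y'):match('')
assert(a == 'x' and b == 'y')
```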

lpeg.Cf


function lpeg.Cf(
  patt: (Pattern|string|integer|boolean|table|function),
  func: fun(acc, newvalue) -> any
) ->  Capture

Create a fold capture.

If patt produces a list of captures C1 C2 ... Cn, this capture will produce the value func(...func(func(C1, C2), C3)...,Cn), that is, it will fold (or accumulate, or reduce) the captures from patt using function func.

This capture assumes that patt should produce at least one capture with at least one value (of any type), which becomes the initial value of an accumulator. (If you need a specific initial value, you may prefix a constant capture to patt.) For each subsequent capture, LPeg calls func with this accumulator as the first argument and all values produced by the capture as extra arguments; the first result from this call becomes the new value for the accumulator. The final value of the accumulator becomes the captured value.

Example:

local number = lpeg.R('09') ^ 1 / tonumber
local list = number * (',' * number) ^ 0
local function add(acc, newvalue) return acc + newvalue end
local sum = lpeg.Cf(list, add)
assert(sum:match('10,30,43') == 83)

lpeg.Cg


function lpeg.Cg(
  patt: (Pattern|string|integer|boolean|table|function),
  name: string?
) ->  Capture

Create a group capture.

It groups all values returned by patt into a single capture. The group may be anonymous (if no name is given) or named with the given name (which can be any non-nil Lua value).
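
The difference between anonymous and named groups can be sketched as follows; the names word, key, and value are illustrative.

```lua
-- Assumption: lpeg is a global, or can be loaded as a module.
local lpeg = lpeg or require('lpeg')

local word = lpeg.C(lpeg.R('az') ^ 1)

-- Anonymous group: the values of the nested captures are returned as usual.
local k, v = lpeg.Cg(word * '=' * word):match('a=b')
assert(k == 'a' and v == 'b')

-- Named group on its own: no values are produced, so match returns
-- the position after the match.
assert(lpeg.Cg(word, 'key'):match('abc') == 4)

-- Named groups become visible through lpeg.Ct (as fields) or lpeg.Cb.
local pair = lpeg.Ct(lpeg.Cg(word, 'key') * '=' * lpeg.Cg(word, 'value'))
local t = pair:match('a=b')
assert(t.key == 'a' and t.value == 'b')
```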

lpeg.Cp


function lpeg.Cp() ->  Capture

Create a position capture.

It matches the empty string and captures the position in the subject where the match occurs. The captured value is a number.

Example:

local I = lpeg.Cp()
local function anywhere(p) return lpeg.P({I * p * I + 1 * lpeg.V(1)}) end

local match_start, match_end = anywhere('world'):match('hello world!')
assert(match_start == 7)
assert(match_end == 12)

lpeg.Cs


function lpeg.Cs(patt: (Pattern|string|integer|boolean|table|function)) ->  Capture

Create a substitution capture.

It captures the substring of the subject that matches patt, with substitutions. For any capture inside patt with a value, the substring that matched the capture is replaced by the capture value (which should be a string). The final captured value is the string resulting from all replacements.

Example:

local function gsub (s, patt, repl)
  patt = lpeg.P(patt)
  patt = lpeg.Cs((patt / repl + 1) ^ 0)
  return lpeg.match(patt, s)
end
assert(gsub('Hello, xxx!', 'xxx', 'World') == 'Hello, World!')

lpeg.Ct


function lpeg.Ct(patt: (Pattern|string|integer|boolean|table|function)) ->  Capture

Create a table capture.

This capture returns a table that holds all values from all anonymous captures made by patt in successive integer keys, starting at 1. Moreover, for each named capture group created by patt, the first value of the group is put into the table with the group name as its key. The captured value is only the table.
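
A sketch that mixes positional values and a named field in one table; num, list, and the field name kind are hypothetical.

```lua
-- Assumption: lpeg is a global, or can be loaded as a module.
local lpeg = lpeg or require('lpeg')

-- Each run of digits is converted to a number by the function capture.
local num = lpeg.R('09') ^ 1 / tonumber

-- The named group stores a constant under the key 'kind';
-- the numbers go into integer keys 1, 2, 3, ...
local list = lpeg.Ct(lpeg.Cg(lpeg.Cc('numbers'), 'kind') * num * (',' * num) ^ 0)

local t = list:match('1,2,3')
assert(#t == 3)
assert(t[1] == 1 and t[2] == 2 and t[3] == 3)
assert(t.kind == 'numbers')
```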

lpeg.Cmt


function lpeg.Cmt(
  patt: (Pattern|string|integer|boolean|table|function),
  fn: fun(s: string, i: integer, ...: any) -> ((boolean|integer),unknown)
) ->  Capture

Create a match-time capture.

Unlike all other captures, this one is evaluated immediately when a match occurs (even if it is part of a larger pattern that fails later). It forces the immediate evaluation of all its nested captures and then calls fn.

The given function gets as arguments the entire subject, the current position (after the match of patt), plus any capture values produced by patt.

The first value returned by fn defines how the match happens. If the call returns a number, the match succeeds and the returned number becomes the new current position. (Assuming a subject s and current position i, the returned number must be in the range [i, len(s) + 1].) If the call returns true, the match succeeds without consuming any input. (So, to return true is equivalent to return i.) If the call returns false, nil, or no value, the match fails.

Any extra values returned by the function become the values produced by the capture.
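
The return conventions can be sketched with two small patterns. Both are illustrations with hypothetical names: byte accepts a decimal number only if it fits in one byte, and counted reads a length prefix and then consumes that many characters.

```lua
-- Assumption: lpeg is a global, or can be loaded as a module.
local lpeg = lpeg or require('lpeg')

local digits = lpeg.C(lpeg.R('09') ^ 1)

-- Returning true keeps the position reached by patt; the extra return
-- value becomes the captured value. Returning false makes the match fail.
local byte = lpeg.Cmt(digits, function(subject, pos, text)
  local n = tonumber(text)
  if n <= 255 then
    return true, n
  end
  return false
end)
assert(byte:match('200') == 200)
assert(byte:match('256') == nil)

-- Returning a number moves the current position: here the function
-- consumes as many characters as the length prefix announces.
local counted = lpeg.Cmt(digits * ':', function(subject, pos, len)
  local last = pos + tonumber(len) - 1
  if last > #subject then
    return false
  end
  return last + 1, subject:sub(pos, last)
end)
assert(counted:match('3:abcdef') == 'abc')
assert(counted:match('9:abc') == nil)
```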

fields


lpeg.version


lpeg.version : string

A string (not a function) with the running version of LPeg.

Note: In earlier versions of LPeg this field was a function.
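
Code that has to run against both variants can check the type of the field first. A minimal sketch, assuming lpeg is a global or a loadable module; the local name version is hypothetical.

```lua
-- Assumption: lpeg is a global, or can be loaded as a module.
local lpeg = lpeg or require('lpeg')

-- Accept both the string field and the older function form.
local version = lpeg.version
if type(version) == 'function' then
  version = version()
end
assert(type(version) == 'string')
```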