Parent

Dhaka::LexerSpecification

Abstract base class for lexer specifications.

Use this to specify the transformations that will be performed when the lexer recognizes a given pattern. Actions are listed in descending order of priority. For example in the following lexer specification:

class LexerSpec < Dhaka::LexerSpecification
  for_pattern 'zz' do
    "recognized two zs"
  end

  for_pattern '\w(\w|\d)*' do
    "recognized word token #{current_lexeme.value}"
  end

  for_pattern '(\d)+(\.\d+)?' do
    "recognized number #{current_lexeme.value}"
  end

  for_pattern ' +' do
    #ignores whitespace
  end

  for_pattern "\n+" do
    "recognized newline"
  end
end

the pattern 'zz' takes precedence over the pattern immediately below it, so the lexer will announce that it has recognized two 'z's instead of a word token.

The patterns are not Ruby regular expressions - a lot of operators featured in Ruby's regular expression engine are not yet supported. See dhaka.rubyforge.org/regex_grammar.html for the current syntax. Patterns may be specified using Ruby regular expression literals as well as string literals.

There are a few things to keep in mind with regard to the regular expression implementation:

Public Class Methods

for_pattern(pattern, &blk) click to toggle source

Associates blk as the action to be performed when a lexer recognizes pattern. When Lexer#lex is invoked, it creates a LexerRun object that provides the context for blk to be evaluated in. Methods available in this block are LexerRun#current_lexeme and LexerRun#create_token.

# File lib/dhaka/lexer/specification.rb, line 51
def for_pattern(pattern, &blk)
  source = case pattern
             when String : pattern
             when Regexp : pattern.source
           end
  items[source] = LexerSpecificationItem.new(source, priority, blk)
  self.priority += 1
end
for_symbol(symbol, &blk) click to toggle source

Use this to automatically handle escaping for regular expression metacharacters. For example,

for_symbol('+') { ... }

translates to:

for_pattern('\+') { ... }
# File lib/dhaka/lexer/specification.rb, line 64
def for_symbol(symbol, &blk)
  if LexerSupport::OPERATOR_CHARACTERS.include?(symbol)
    for_pattern("\\#{symbol}", &blk)
  else
    for_pattern(symbol, &blk)
  end
end

[Validate]

Generated with the Darkfish Rdoc Generator 2.