deadwood-3.0.02-1.mga1.i586.rpm

This is a description of the mararc parser that the Deadwood project 
uses.  The parser is a rewrite of ParseMaraRc.c.

We use a finite state machine to parse a mararc file.  Here are the 
various classes of characters that we need to test for:

A alpha: The letters A through Z, a through z, and the _ character 

B alphanum: The letters A-Z, a-z, the _ character, and the numbers 0-9

Y alphastart: The letters A through Z, and a through z

[ leftbrace: The [ character

Q quote: The " character

] rightbrace: The ] character

D dname: The letters A-Z, a-z, the - character, the numbers 0-9, the 
       '.' character, and the '_' character.

S dname_start: The letters A-Z, a-z, and the numbers 0-9

. dot: The '.' character

+ plus: The + symbol

= equals: The = symbol

N number: The numbers 0-9

H hash: The # symbol

I in_string: Any printable ASCII character except for the #, ", and 
           newline

R carriage return: The \r non-printable ASCII character

T newline: The \n non-printable ASCII character

W whitespace: The space (' ') or tab character

X any: Any printable ASCII character and the tab character

{ left curly brace: The { character

} right curly brace: The } character

So, we have 13 character classes with letter shortcuts.  Nine of them
match more than one character; the other four (Q, H, R, and T) each
stand for a single character.  The " and # characters get letter
shortcuts for convenience (": Because otherwise we need to have ugly
\" sequences in the quoted state machine definition; #: So we can
potentially add comments to state machine definitions).
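
As an illustration of the classes listed above, here is a minimal sketch
of a class test in C.  The function name class_match and its signature
are assumptions made for this example, not names taken from the Deadwood
source.  Any pattern character that is not one of the 13 letters is
treated as a literal ([, ], ., +, =, { and }).  The X class here follows
the list above (printable ASCII plus tab) and leaves out hi-bit
characters.

```c
/* Hypothetical helper: 1 if c is between lo and hi inclusive */
static int in_range(int c, int lo, int hi) {
    return c >= lo && c <= hi;
}

/* Hypothetical helper: return 1 if character c belongs to the class
 * named by cls, 0 otherwise.  Non-class patterns match literally. */
int class_match(char cls, int c) {
    int letter = in_range(c, 'A', 'Z') || in_range(c, 'a', 'z');
    int digit = in_range(c, '0', '9');
    int print = in_range(c, 32, 126);      /* printable ASCII */

    switch (cls) {
    case 'A': return letter || c == '_';
    case 'B': return letter || digit || c == '_';
    case 'Y': return letter;
    case 'D': return letter || digit || c == '-' || c == '.' || c == '_';
    case 'S': return letter || digit;
    case 'N': return digit;
    case 'Q': return c == '"';
    case 'H': return c == '#';
    case 'I': return print && c != '#' && c != '"';
    case 'R': return c == '\r';
    case 'T': return c == '\n';
    case 'W': return c == ' ' || c == '\t';
    case 'X': return print || c == '\t';
    default:  return c == cls;             /* literal such as [ or = */
    }
}
```

With this sketch, class_match('A', '_') is 1 while class_match('Y', '_')
is 0, which is why a mararc parameter may contain an underscore but may
not start with one.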

We also have eight actions:

1: Add the character we are looking at to variable 1
2: Add the character we are looking at to variable 2
3: Add the character we are looking at to variable 3
4: Add the character we are looking at to variable 4
5: Add the character we are looking at to variable 5
6: Add the character we are looking at to variable 6
7: Add the character we are looking at to variable 7
;: End the processing of the current line successfully

The variables are:

1: The mararc parameter
2: The dictionary index
3: The mararc string value
4: The mararc numeric value
5: If this is set, then we append instead of assigning
6: If this is set, then we initialize the specified dictionary variable
7: The filename to read and parse as a dwood2rc file

Should there not be a new state specified for a given character class
in a given state, we halt processing with an error.

We also have 51 states, represented by lower case letters.  The initial 
state is state 'a'.  If the first letter of the state is 'x', the state
representation uses two lower-case letters (such as 'xa' or 'xp'). 

Instructions for the state machine are as follows:

<state name>: <character class><action (optional)><new state>

Again, here are the character classes using letters:

A: A-Za-z_ 		B: A-Za-z_0-9		D: -A-Za-z0-9._
H: #			I: pASCII except # and "
N: 0-9			Q: "			S: A-Za-z0-9
T: \n			W: [ \t]		X: pASCII, hi-bit, and \t
Y: A-Za-z		R: \r

And here is the specified state machine for mararc processing.  This
state machine is run for each line in the mararc file

Start of line:                       a: Hb Y1c Wa Rxp T;
In comment:                          b: Xb Rxp T;
Reading mararc parameter:            c: B1c Wd =e [f +g (y
Whitespace after mararc parameter:   d: Wd =e [f +g
Equal sign:                          e: We N4h Qi {6w
Leftbrace:                           f: Wf Qn
Plus sign:                           g: =5e
Numeric mararc parameter:            h: N4h Wk Hb Rxp T;
Quote beginning mararc parameter:    i: I3m
End of line:                         k: Wk Hb Rxp T;
In mararc parameter:                 m: I3m Qk
Quote beginning dictionary index:    n: .2o S2p
Dot as dictionary index:             o: Qq
Dictionary index:                    p: D2p Qq 
Quote at end of dictionary index:    q: Wq ]r
Right brace ending dictionary index: r: Wr =s +t
Equal sign before dictionary value:  s: Ws Qu 
Plus sign before dictionary value:   t: =5s
Quote beginning dictionary value:    u: I3v
In dictionary value:                 v: I3v Qk
At left curly brace:                 w: }k
At carriage return:                  xp: T;
At left paren:                       y: Qz
In filename for execfile:            z: I7z Qxa
Quote after execfile filename:       xa: )k
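
To show how the table above drives parsing, here is a minimal sketch in C
that interprets the rule strings directly, one input character at a time.
It is not the Deadwood implementation: the names parse_line, class_match,
find_rules, VAR_COUNT and VAR_MAX are assumptions made for this example,
as is the choice that the first matching rule in a state wins.

```c
#include <string.h>

#define VAR_COUNT 7    /* variables 1 through 7 */
#define VAR_MAX   128  /* assumed buffer size, not from the source */

struct state_rules { const char *state; const char *rules; };

/* The state machine above, one entry per state */
static const struct state_rules machine[] = {
    {"a", "Hb Y1c Wa Rxp T;"},   {"b", "Xb Rxp T;"},
    {"c", "B1c Wd =e [f +g (y"}, {"d", "Wd =e [f +g"},
    {"e", "We N4h Qi {6w"},      {"f", "Wf Qn"},
    {"g", "=5e"},                {"h", "N4h Wk Hb Rxp T;"},
    {"i", "I3m"},                {"k", "Wk Hb Rxp T;"},
    {"m", "I3m Qk"},             {"n", ".2o S2p"},
    {"o", "Qq"},                 {"p", "D2p Qq"},
    {"q", "Wq ]r"},              {"r", "Wr =s +t"},
    {"s", "Ws Qu"},              {"t", "=5s"},
    {"u", "I3v"},                {"v", "I3v Qk"},
    {"w", "}k"},                 {"xp", "T;"},
    {"y", "Qz"},                 {"z", "I7z Qxa"},
    {"xa", ")k"},
};

static int in_range(int c, int lo, int hi) { return c >= lo && c <= hi; }

/* 1 if c is in class cls; non-class patterns match literally */
static int class_match(char cls, int c) {
    int letter = in_range(c, 'A', 'Z') || in_range(c, 'a', 'z');
    int digit = in_range(c, '0', '9');
    int print = in_range(c, 32, 126);
    switch (cls) {
    case 'A': return letter || c == '_';
    case 'B': return letter || digit || c == '_';
    case 'Y': return letter;
    case 'D': return letter || digit || c == '-' || c == '.' || c == '_';
    case 'S': return letter || digit;
    case 'N': return digit;
    case 'Q': return c == '"';
    case 'H': return c == '#';
    case 'I': return print && c != '#' && c != '"';
    case 'R': return c == '\r';
    case 'T': return c == '\n';
    case 'W': return c == ' ' || c == '\t';
    case 'X': return print || c == '\t';
    default:  return c == cls;
    }
}

static const char *find_rules(const char *state) {
    size_t i;
    for (i = 0; i < sizeof machine / sizeof machine[0]; i++)
        if (strcmp(machine[i].state, state) == 0)
            return machine[i].rules;
    return NULL;
}

/* Run the machine over one line.  Returns 0 when a ';' action ends the
 * line, or -1 when no rule matches (or the line ends too early). */
int parse_line(const char *line, char vars[VAR_COUNT][VAR_MAX]) {
    char state[3] = "a";
    memset(vars, 0, VAR_COUNT * VAR_MAX);
    for (; *line != '\0'; line++) {
        const char *r = find_rules(state);
        int matched = 0;
        while (r != NULL && *r != '\0' && !matched) {
            char pattern = *r++;
            char action = 0;
            char next[3] = "";
            if (in_range(*r, '1', '7') || *r == ';') action = *r++;
            if (action != ';') {             /* read the new state name */
                next[0] = *r++;
                if (next[0] == 'x') next[1] = *r++;
            }
            while (*r == ' ') r++;           /* skip to the next rule */
            if (!class_match(pattern, *line)) continue;
            matched = 1;
            if (action == ';') return 0;
            if (action != 0) {               /* append to variable N */
                char *v = vars[action - '1'];
                size_t n = strlen(v);
                if (n + 1 >= VAR_MAX) return -1;
                v[n] = *line;
            }
            strcpy(state, next);
        }
        if (!matched) return -1;
    }
    return -1;
}
```

Feeding this sketch the line bind_address = "127.0.0.1" followed by a
newline walks through states a, c, d, e, i, m and k, and leaves
"bind_address" in variable 1 (vars[0]) and "127.0.0.1" in variable 3
(vars[2]).  A line that begins with = fails at once, because state a has
no rule for that character.
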

Once a line is processed, we then look at the value of variable 1 (the 
mararc parameter):

Should 1 be a known normal mararc parameter we support, store the value
of variable 3 in the parameter indexed by variable 1.

Note that, to make life easier for the initial version of Deadwood, we 
will only support a dictionary index of ".".  This is temporary, so we 
can more quickly get a very basic forwarding DNS server written.

---

When tokenizing the state machine, the state is converted from a lower
case letter to a number between 0 (for 'a') and 51 (for 'xz').  Each
<class><action><new state> is stored as three 8-bit numbers:

* The pattern, which is the literal character.  E.g. pattern 'A' is
  tokenized as the number 65 ('A' in ASCII)

* Action, which is a number from 0 to 10.  0 indicates "no action"; 1-9
  indicate actions #1-9.  Action #10 means "terminate reading line with 
  success".

* New state: This is the converted state number ('a' becomes 0; 'z' becomes
  25; 'xz' becomes 51; note that 'x' [23] isn't used)

A given state can only have seven different patterns (this can be expanded
by changing DWM_MAX_PATTERNS in DwMararc.h).
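
The tokenizing rules above can be sketched in C as follows.  This is an
illustration only: struct token, state_number and tokenize_rule are
hypothetical names, and storing 0 as the new state for the terminating
action is an assumption, since the text does not say what is stored
there.

```c
#include <stdint.h>

/* Hypothetical: one tokenized <class><action><new state> item */
struct token {
    uint8_t pattern;    /* literal character, e.g. 'A' is 65 */
    uint8_t action;     /* 0 = none, 1-9 = actions, 10 = end of line */
    uint8_t new_state;  /* 'a' = 0 ... 'z' = 25, 'xa' = 26 ... 'xz' = 51 */
};

/* Convert a state name such as "c" or "xp" to its number,
 * or return -1 if the name is not valid. */
int state_number(const char *name) {
    if (name[0] == 'x') {
        if (name[1] < 'a' || name[1] > 'z') return -1;
        return 26 + (name[1] - 'a');
    }
    if (name[0] < 'a' || name[0] > 'z') return -1;
    return name[0] - 'a';
}

/* Tokenize one item such as "Y1c", "Qxa" or "T;".
 * Returns 0 on success, -1 on a malformed item. */
int tokenize_rule(const char *rule, struct token *out) {
    const char *p = rule;
    int st;

    if (*p == '\0') return -1;
    out->pattern = (uint8_t)*p++;
    out->action = 0;
    out->new_state = 0;           /* assumed placeholder for ';' */

    if (*p >= '1' && *p <= '9') {
        out->action = (uint8_t)(*p++ - '0');
    } else if (*p == ';') {
        out->action = 10;         /* terminate line with success */
        return 0;
    }
    st = state_number(p);
    if (st < 0) return -1;
    out->new_state = (uint8_t)st;
    return 0;
}
```

For example, "Y1c" becomes pattern 89 ('Y'), action 1, new state 2, and
"Qxa" becomes pattern 81 ('Q'), action 0, new state 26.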