YACC(1P)           POSIX Programmer's Manual           YACC(1P)


PROLOG
       This  manual page is part of the POSIX Programmer's Man-
       ual.  The Linux implementation  of  this  interface  may
       differ  (consult the corresponding Linux manual page for
       details of Linux behavior), or the interface may not  be
       implemented on Linux.

NAME
       yacc - yet another compiler compiler (DEVELOPMENT)

SYNOPSIS
       yacc [-dltv][-b file_prefix][-p sym_prefix] grammar

DESCRIPTION
       The  yacc utility shall read a description of a context-
       free grammar in grammar and write C  source  code,  con-
       forming  to  the  ISO C  standard,  to  a code file, and
       optionally header information into a header file, in the
       current  directory.  The  C code shall define a function
       and related routines and macros for  an  automaton  that
       executes a parsing algorithm meeting the requirements in
       Algorithms.

       The form and meaning of the grammar are described in the
       EXTENDED DESCRIPTION section.

       The C source code and header file shall be produced in a
       form suitable as input for the C compiler (see c99).

OPTIONS
       The yacc utility shall conform to the  Base  Definitions
       volume  of  IEEE Std 1003.1-2001,  Section 12.2, Utility
       Syntax Guidelines.

       The following options shall be supported:

       -b  file_prefix
              Use file_prefix instead of y as  the  prefix  for
              all  output filenames. The code file y.tab.c, the
              header file y.tab.h (created when  -d  is  speci-
              fied), and the description file y.output (created
              when  -v  is  specified),  shall  be  changed  to
              file_prefix.tab.c,  file_prefix.tab.h,  and
              file_prefix.output, respectively.

       -d     Write the header file; by default only  the  code
              file is written. The #define statements associate
              the token codes assigned by yacc with  the  user-
              declared  token  names.  This allows source files
              other than y.tab.c to access the token codes.

       -l     Produce a code file that  does  not  contain  any
              #line constructs.  If this option is not present,
              it is unspecified whether the code file or header
              file  contains #line directives. This should only
              be used after  the  grammar  and  the  associated
              actions are fully debugged.

       -p  sym_prefix

              Use  sym_prefix  instead  of yy as the prefix for
              all external names produced by  yacc.  The  names
              affected  shall  include the functions yyparse(),
              yylex(), and yyerror(), and the variables yylval,
              yychar,  and  yydebug.  (In the remainder of this
              section, the six  symbols  cited  are  referenced
              using  their  default  names only as a notational
              convenience.) Local names may also be affected by
              the  -p  option; however, the -p option shall not
              affect #define symbols generated by yacc.

       -t     Modify conditional compilation directives to per-
              mit  compilation  of  debugging  code in the code
              file. Runtime debugging statements  shall  always
              be  contained  in  the  code file, but by default
              conditional compilation directives prevent  their
              compilation.

       -v     Write  a  file  containing  a  description of the
              parser and a report  of  conflicts  generated  by
              ambiguities in the grammar.


OPERANDS
       The following operand is required:

       grammar
              A  pathname  of  a  file containing instructions,
              hereafter called grammar, for which a  parser  is
              to  be  created.  The  format  for the grammar is
              described in the EXTENDED DESCRIPTION section.


STDIN
       Not used.

INPUT FILES
       The file grammar shall be a text file formatted as spec-
       ified in the EXTENDED DESCRIPTION section.

ENVIRONMENT VARIABLES
       The  following  environment  variables  shall affect the
       execution of yacc:

       LANG   Provide a default value for the internationaliza-
              tion  variables  that are unset or null. (See the
              Base Definitions volume of  IEEE Std 1003.1-2001,
              Section  8.2,  Internationalization Variables for
              the precedence of internationalization  variables
              used  to  determine  the  values  of locale cate-
              gories.)

       LC_ALL If set to a non-empty string value, override  the
              values  of  all  the  other  internationalization
              variables.

       LC_CTYPE
              Determine the locale for  the  interpretation  of
              sequences  of  bytes  of  text data as characters
              (for example, single-byte as  opposed  to  multi-
              byte characters in arguments and input files).

       LC_MESSAGES
              Determine  the  locale  that  should  be  used to
              affect the format and contents of diagnostic mes-
              sages written to standard error.

       NLSPATH
              Determine  the  location  of message catalogs for
              the processing of LC_MESSAGES.


       The LANG and LC_* variables affect the execution of  the
       yacc  utility  as stated. The main() function defined in
       Yacc Library shall call:


              setlocale(LC_ALL, "")

       and thus the program generated by  yacc  shall  also  be
       affected  by the contents of these variables at runtime.

ASYNCHRONOUS EVENTS
       Default.

STDOUT
       Not used.

STDERR
       If shift/reduce or reduce/reduce conflicts are  detected
       in grammar, yacc shall write a report of those conflicts
       to the standard error in an unspecified format.

       Standard error shall also be used  for  diagnostic  mes-
       sages.

OUTPUT FILES
       The code file, the header file, and the description file
       shall be text files. All are described in the  following
       sections.

   Code File
       This  file  shall  contain  the  C  source  code for the
       yyparse() function. It shall contain code for the  vari-
       ous  semantic  actions with macro substitution performed
       on them as described in the  EXTENDED  DESCRIPTION  sec-
       tion. It also shall contain a copy of the #define state-
       ments in the header file. If  a  %union  declaration  is
       used, the declaration for YYSTYPE shall also be included
       in this file.

   Header File
       The header file shall contain  #define  statements  that
       associate  the  token numbers with the token names. This
       allows source files other than the code file  to  access
       the  token  codes.  If a %union declaration is used, the
       declaration for YYSTYPE and  an  extern  YYSTYPE  yylval
       declaration shall also be included in this file.
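
       As an illustration of how a source file other  than  the
       code  file  might use these definitions, the sketch below
       hand-writes a small lexical analyzer.  The  token  names
       NUMBER  and  PLUS, their codes, the union member ival, and
       the helper set_input() are assumptions made for this exam-
       ple only. In a real build the first few declarations would
       come from #include "y.tab.h" instead of being typed in.

```c
#include <ctype.h>

/* Stand-in for the generated header (assumed contents; the real
 * codes are whatever yacc assigns or the grammar requests). */
#define NUMBER 257
#define PLUS   258
typedef union { int ival; } YYSTYPE;

/* The header only declares yylval as extern; it is defined here
 * so that the sketch is self-contained. */
YYSTYPE yylval;

static const char *input = "";   /* hypothetical input buffer */

/* Hypothetical helper: point the lexer at a string. */
void set_input(const char *s) { input = s; }

/* Return the next token code, or 0 at end of input. */
int yylex(void)
{
    while (*input == ' ' || *input == '\t')
        input++;
    if (*input == '\0')
        return 0;
    if (isdigit((unsigned char)*input)) {
        int v = 0;
        while (isdigit((unsigned char)*input))
            v = v * 10 + (*input++ - '0');
        yylval.ival = v;          /* semantic value of the token */
        return NUMBER;
    }
    if (*input == '+') {
        input++;
        return PLUS;
    }
    /* Any other character is returned as its own code. */
    return (unsigned char)*input++;
}
```

       Because both files see the same token codes, the  parser
       and  the  lexical analyzer agree on what NUMBER and PLUS
       mean without either one hard-coding the numbers.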

   Description File
       The  description  file shall be a text file containing a
       description of the state machine  corresponding  to  the
       parser, using an unspecified format. Limits for internal
       tables  (see  Limits)  shall  also  be  reported,  in  an
       implementation-defined manner. (Some implementations may
       use dynamic allocation techniques and have  no  specific
       limit values to report.)

EXTENDED DESCRIPTION
       The  yacc  command  accepts  a  language that is used to
       define a grammar for a target language to be  parsed  by
       the  tables  and  code  generated  by yacc. The language
       accepted by yacc as a grammar for the target language is
       described below using the yacc input language itself.

       The  input  grammar  includes rules describing the input
       structure of the target language and code to be  invoked
       when  these  rules are recognized to provide the associ-
       ated semantic action. The  code  to  be  executed  shall
       appear  as bodies of text that are intended to be C-lan-
       guage code. The C-language inclusions  are  presumed  to
       form  a correct function when processed by yacc into its
       output files. The code included in  this  way  shall  be
       executed  during the recognition of the target language.

       Given a grammar, the yacc utility  generates  the  files
       described in the OUTPUT FILES section. The code file can
       be compiled and linked using c99. If the declaration and
       programs  sections  of  the grammar file did not include
       definitions of main(), yylex(), and yyerror(), the  com-
       piled  output  requires linking with externally supplied
       versions of those functions. Default versions of  main()
       and  yyerror()  are supplied in the yacc library and can
       be linked in by using the -l y operand to c99.  The yacc
       library  interfaces  need  not  support  interfaces with
       other than the default yy symbol prefix. The application
       provides the lexical analyzer function, yylex(); the lex
       utility is specifically designed to generate such a rou-
       tine.

   Input Language
       The  application  shall  ensure that every specification
       file consists of three sections in order:  declarations,
       grammar rules, and programs, separated by double percent
       signs ( "%%" ). The declarations and  programs  sections
       can be empty. If the latter is empty, the preceding "%%"
       mark separating it from the rules section can  be  omit-
       ted.
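
       The  three-section layout can be sketched as follows. The
       token name WORD and the rule name  list  are  placeholders
       chosen  for  this  example, and the file is meant to show
       the layout only, not a useful parser.

```yacc
%{
/* Declarations section: C text copied into the code file. */
#include <stdio.h>
int yylex(void);
int yyerror(const char *s);
%}
%token WORD

%%

/* Rules section. */
list : /* empty */
     | list WORD   { printf("word\n"); }
     ;

%%

/* Programs section. If nothing is placed here, the second %%
   above can be left out as well. */
```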

       The  input  is free form text following the structure of
       the grammar defined below.

   Lexical Structure of the Grammar
       The <blank>s,  <newline>s,  and  <form-feed>s  shall  be
       ignored,  except  that the application shall ensure that
       they do not appear in names or multi-character  reserved
       symbols.  Comments shall be enclosed in "/* ... */", and
       can appear wherever a name is valid.

       Names are of arbitrary length, made up of letters, peri-
       ods  ( '.'  ), underscores ( '_' ), and non-initial dig-
       its. Uppercase and lowercase letters are distinct.  Con-
       forming applications shall not use names beginning in yy
       or YY since the yacc parser uses such names. Many of the
       names  appear in the final output of yacc, and thus they
       should be chosen to conform with  any  additional  rules
       created by the C compiler to be used. In particular they
       appear in #define statements.

       A literal shall consist of a single  character  enclosed
       in single-quote characters ( ' ). All of the escape sequences
       supported for character constants by the ISO C  standard
       shall be supported by yacc.

       The  relationship with the lexical analyzer is discussed
       in detail below.

       The application shall ensure that the NUL  character  is
       not used in grammar rules or literals.

   Declarations Section
       The  declarations  section is used to define the symbols
       used to define the target language and  their  relation-
       ship  with  each other. In particular, much of the addi-
       tional information required to  resolve  ambiguities  in
       the context-free grammar for the target language is pro-
       vided here.

       Usually yacc assigns the relationship between  the  sym-
       bolic  names  it  generates and their underlying numeric
       value. The declarations section  makes  it  possible  to
       control the assignment of these values.

       It is also possible to keep semantic information associ-
       ated with the tokens currently on the parse stack  in  a
       user-defined  C-language  union,  if  the members of the
       union are associated with the various names in the gram-
       mar. The declarations section provides for this as well.

       The first group of declarators below all take a list  of
       names  as  arguments.   That list can optionally be pre-
       ceded by the name of a C  union  member  (called  a  tag
       below)  appearing  within '<' and '>'. (As an exception
       to the typographical conventions of  the  rest  of  this
       volume  of IEEE Std 1003.1-2001, in this case <tag> does
       not represent a  metavariable,  but  the  literal  angle
       bracket characters surrounding a symbol.) The use of tag
       specifies that the tokens named on this line shall be of
       the  same  C type as the union member referenced by tag.
       This is discussed in more detail below.

       For lists used to define tokens, the first appearance of
       a  given token can be followed by a positive integer (as
       a string of decimal digits). If this is done, the under-
       lying value assigned to it for lexical purposes shall be
       taken to be that number.

       The following declares name to be a token:


              %token [<tag>] name [number][name [number]]...

       If tag is present, the C type for  all  tokens  on  this
       line shall be declared to be the type referenced by tag.
       If a positive integer,  number,  follows  a  name,  that
       value shall be assigned to the token.
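
       A short sketch of the three  variations  follows.  All  of
       the  token names, the union members ival and sval, and the
       numbers 300 and 301 are invented for the example.

```yacc
%union {
    int   ival;    /* hypothetical member for numeric tokens */
    char *sval;    /* hypothetical member for string tokens  */
}

%token <ival> NUMBER          /* value has the type of member ival */
%token <sval> NAME STRING     /* one tag applies to the whole line */
%token IF 300 ELSE 301        /* explicit token numbers            */
```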

       The  following  declares name to be a token, and assigns
       precedence to it:


              %left [<tag>] name [number][name [number]]...
              %right [<tag>] name [number][name [number]]...

       One or more lines, each beginning with one of these sym-
       bols, can appear in this section. All tokens on the same
       line have the same precedence level  and  associativity;
       the lines are in order of increasing precedence or bind-
       ing strength. %left denotes that the operators  on  that
       line  are left associative, and %right similarly denotes
       right associative operators. If tag is present, it shall
       declare a C type for names as described for %token.

       The following declares name to be a token, and indicates
       that this cannot be used associatively:


              %nonassoc [<tag>] name [number][name [number]]...

       If the parser encounters associative use of  this  token
       it reports an error. If tag is present, it shall declare
       a C type for names as described for %token.
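
       As a sketch, the declarations below set  up  four  prece-
       dence  levels for a conventional expression grammar. The
       choice of operators is an  assumption  for  illustration;
       only the ordering rule (later lines bind more tightly) is
       taken from the text above.

```yacc
%right    '='          /* lowest precedence; a = b = c groups right */
%nonassoc '<' '>'      /* a < b < c is reported as an error         */
%left     '+' '-'
%left     '*' '/'      /* highest precedence                        */
```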

       The following declares each name to be a non-terminal whose
       value has the type of the union member tag; the tag field
       at the beginning is therefore required:


              %type <tag> name...

       Because it deals with non-terminals  only,  assigning  a
       token  number  or using a literal is also prohibited. If
       this construct  is  present,  yacc  shall  perform  type
       checking;  if  this  construct is not present, the parse
       stack shall hold only the int type.

       Every name used in grammar  not  defined  by  a  %token,
       %left,  %right,  or  %nonassoc declaration is assumed to
       represent a non-terminal symbol. The yacc utility  shall
       report  an  error  for any non-terminal symbol that does
       not appear on the left side  of  at  least  one  grammar
       rule.

       Once  the type, precedence, or token number of a name is
       specified, it shall not be changed. If the first  decla-
       ration  of  a token does not assign a token number, yacc
       shall assign a token number.  Once  this  assignment  is
       made,  the token number shall not be changed by explicit
       assignment.

       The following declarators do  not  follow  the  previous
       pattern.

       The  following  declares the non-terminal name to be the
       start symbol, which represents the largest, most general
       structure described by the grammar rules:


              %start name

       By  default, it is the left-hand side of the first gram-
       mar rule; this default can be overridden with this  dec-
       laration.

       The  following  declares  the  yacc  value stack to be a
       union of the various types of values desired:


              %union { body of union (in C) }

       By default, the values returned by actions  (see  below)
       and  the lexical analyzer shall be of type int. The yacc
       utility keeps track of types, and it shall insert corre-
       sponding  union  member names in order to perform strict
       type checking of the resulting parser.

       Alternatively, given that at least one  <tag>  construct
       is  used,  the  union  can  be declared in a header file
       (which shall be included in the declarations section  by
       using  a  #include  construct  within  %{ and %}), and a
       typedef used to define the symbol YYSTYPE  to  represent
       this  union. The effect of %union is to provide the dec-
       laration of YYSTYPE directly from the yacc input.
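
       The following sketch shows %union together with tags  on
       a  token  and  on non-terminals. The member names dval and
       tree, the type struct node, and  the  symbol  names  are
       hypothetical.

```yacc
%{
struct node;                 /* hypothetical application type */
%}

%union {
    double       dval;
    struct node *tree;
}

%token <dval> NUMBER         /* yylex() stores into yylval.dval  */
%type  <tree> expr term      /* $$ in these rules is member tree */
```

       With these declarations, a reference such  as  $1  to  a
       NUMBER  is  expanded  to select the dval member, so a mis-
       matched assignment can be caught by the C compiler.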

       C-language declarations and definitions  can  appear  in
       the  declarations  section,  enclosed  by  the following
       marks:


              %{ ... %}

       These statements shall be copied into the code file, and
       have  global scope within it so that they can be used in
       the rules and program sections.

       The application shall ensure that the declarations  sec-
       tion is terminated by the token %%.

   Grammar Rules in yacc
       The rules section defines the context-free grammar to be
       accepted by the function yacc generates, and  associates
       with  those  rules  C-language  actions  and  additional
       precedence information.  The grammar is described below,
       and a formal definition follows.

       The  rules  section  is comprised of one or more grammar
       rules. A grammar rule has the form:


              A : BODY ;

       The symbol A represents a non-terminal  name,  and  BODY
       represents  a  sequence of zero or more names, literals,
       and semantic  actions  that  can  then  be  followed  by
       optional  precedence  rules. Only the names and literals
       participate in the formation of the grammar; the  seman-
       tic actions and precedence rules are used in other ways.
       The colon and the semicolon  are  yacc  punctuation.  If
       there are several successive grammar rules with the same
       left-hand side, the vertical bar  '|'  can  be  used  to
       avoid  rewriting  the  left-hand  side; in this case the
       semicolon appears only after the  last  rule.  The  BODY
       part  can  be  empty (or empty of names and literals) to
       indicate that the non-terminal symbol matches the  empty
       string.
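
       For  example, under the assumption that NAME and NUMBER are
       tokens declared elsewhere, the fragment below uses both the
       vertical bar and an empty body:

```yacc
items : /* empty: matches the empty string */
      | items item
      ;

item  : NAME '=' NUMBER ';'
      | NAME ';'
      ;
```

       Each alternative is a distinct rule; the first  group  is
       equivalent  to  writing "items : ;" and "items : items item
       ;" separately.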

       The  yacc  utility assigns a unique number to each rule.
       Rules using  the  vertical  bar  notation  are  distinct
       rules.  The  number  assigned to the rule appears in the
       description file.

       The elements comprising a BODY are:

       name, literal
              These form the rules  of  the  grammar:  name  is
              either  a token or a non-terminal; literal stands
              for itself (less the lexically required quotation
              marks).

       semantic action

              With  each  grammar  rule, the user can associate
              actions to be performed each  time  the  rule  is
              recognized  in  the input process. (Note that the
              word "action" can also refer to  the  actions  of
              the parser: shift, reduce, and so on.)

       These  actions can return values and can obtain the val-
       ues returned by previous actions. These values are  kept
       in  objects  of  type  YYSTYPE  (see %union). The result
       value of the action shall be kept  on  the  parse  stack
       with  the  left-hand side of the rule, to be accessed by
       other reductions as part of their right-hand  side.   By
       using the <tag> information provided in the declarations
       section, the code generated by yacc can be strictly type
       checked  and contain arbitrary information. In addition,
       the lexical analyzer can provide the same kinds of  val-
       ues for tokens, if desired.

       An action is an arbitrary C statement and as such can do
       input or output, call subprograms,  and  alter  external
       variables.  An  action  is  one  or  more  C  statements
       enclosed in curly braces '{' and '}'.

       Certain pseudo-variables can  be  used  in  the  action.
       These  are  macros  for  access to data structures known
       internally to yacc.

       $$
              The value of the action can be set  by  assigning
              it  to  $$.  If  type checking is enabled and the
              type of the value to be assigned cannot be deter-
              mined, a diagnostic message may be generated.

       $number
              This  refers  to the value returned by the compo-
              nent specified by the token number in  the  right
              side  of a rule, reading from left to right; num-
              ber can be zero or negative. If number is zero or
              negative,  it  refers to the data associated with
              the name on  the  parser's  stack  preceding  the
              leftmost  symbol  of  the current rule. (That is,
              "$0" refers to the name immediately preceding the
              leftmost  name in the current rule to be found on
              the parser's stack and "$-1" refers to the symbol
              to its left.) If number refers to an element past
              the current point in the rule, or beyond the bot-
              tom  of  the  stack,  the result is undefined. If
              type checking is enabled  and  the  type  of  the
              value  to  be  assigned  cannot  be determined, a
              diagnostic message may be generated.

       $<tag>number

              These correspond  exactly  to  the  corresponding
              symbols  without the tag inclusion, but allow for
              strict type checking (and preclude unwanted  type
              conversions).  The  effect  is  that the macro is
              expanded to use tag to select an element from the
              YYSTYPE  union (using dataname.tag). This is par-
              ticularly useful if number is not positive.

       $<tag>$
              This imposes on the reference  the  type  of  the
              union member referenced by tag. This construction
              is applicable when a reference to a left  context
              value  occurs  in  the grammar, and provides yacc
              with a means for selecting a type.
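
       The fragment below is a sketch of how these pseudo-vari-
       ables  are  typically  written.  It assumes that prece-
       dence for '+' is declared elsewhere, that tval is a  union
       member,  and  that declare() is an application function;
       all three are hypothetical.

```yacc
expr     : expr '+' expr      { $$ = $1 + $3; }
         | '(' expr ')'       { $$ = $2; }
         ;

decl     : type namelist ';'
         ;

/* In both rules below, $0 is the value of the symbol that sits
   on the stack just before namelist, which is type here. The
   tag is needed because yacc cannot deduce the type of $0. */
namelist : NAME               { declare($<tval>0, $1); }
         | namelist ',' NAME  { declare($<tval>0, $3); }
         ;
```

       References such as $0 are only safe when namelist  always
       appears in the same left context.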


       Actions can occur anywhere in a rule (not  just  at  the
       end); an action can access values returned by actions to
       its left, and in  turn  the  value  it  returns  can  be
       accessed  by  actions to its right.  An action appearing
       in the middle of a rule shall be equivalent to replacing
       the  action with a new non-terminal symbol and adding an
       empty rule with that non-terminal symbol  on  the  left-
       hand  side.  The semantic action associated with the new
       rule shall be equivalent to the original action. The use
       of  actions  within rules might introduce conflicts that
       would not otherwise exist.
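
       The equivalence can be pictured with the two  alternative
       spellings  below, which are not meant to appear in the same
       file. The names WHILE, body, begin_loop(), and  mid_1  are
       made up for the example.

```yacc
/* Written with an action in the middle of the rule: */
stmt  : WHILE { begin_loop(); } '(' expr ')' body
      ;

/* Equivalent form with an explicit empty non-terminal: */
stmt  : WHILE mid_1 '(' expr ')' body
      ;
mid_1 : /* empty */ { begin_loop(); }
      ;
```

       In  both  spellings  expr is the fourth component of the
       rule, because the embedded action occupies  a  position  of
       its own.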

       By default, the value of a rule shall be  the  value  of
       the  first  element in it. If the first element does not
       have a type (particularly in the case of a literal)  and
       type  checking  is  turned on by %type, an error message
       shall result.

       precedence
              The keyword %prec  can  be  used  to  change  the
              precedence  level  associated  with  a particular
              grammar rule. Examples of this are in cases where
              a  unary  and  binary operator have the same sym-
              bolic representation, but need to be  given  dif-
              ferent  precedences,  or where the handling of an
              ambiguous if-else construction is necessary.  The
              reserved  symbol  %prec  can  appear  immediately
              after the body of the grammar  rule  and  can  be
              followed  by  a token name or a literal. It shall
              cause the  precedence  of  the  grammar  rule  to
              become  that  of the following token name or lit-
              eral. The action for the rule as a whole can fol-
              low %prec.
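
       A  common  use  is  unary minus, sketched below. UMINUS is
       a placeholder token name that exists only  to  carry  a
       precedence level and is not expected from yylex().

```yacc
%token NUMBER
%left  '+' '-'
%left  '*' '/'
%left  UMINUS        /* highest level, used only through %prec */

%%

expr : expr '-' expr
     | expr '*' expr
     | '-' expr %prec UMINUS   /* binds tighter than '*' */
     | NUMBER
     ;
```

       Without %prec the last-token default would give  the  unary
       rule  the  precedence  of  binary '-', so "- a * b" would
       group as "-(a * b)".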


       If  a  program  section  follows,  the application shall
       ensure that the grammar rules are terminated by %%.

   Programs Section
       The programs section can include the definition  of  the
       lexical  analyzer  yylex(), and any other functions; for
       example, those used in  the  actions  specified  in  the
       grammar  rules.   It is unspecified whether the programs
       section precedes or follows the semantic actions in  the
       output  file; therefore, if the application contains any
       macro definitions and declarations intended to apply  to
       the  code  in  the semantic actions, it shall place them
       within "%{ ... %}" in the declarations section.

   Input Grammar
       The following input to yacc  yields  a  parser  for  the
       input  to yacc. This formal syntax takes precedence over
       the preceding text syntax description.

       The lexical structure is defined less precisely; Lexical
       Structure  of the Grammar defines most terms. The corre-
       spondence between the  previous  terms  and  the  tokens
       below is as follows.

       IDENTIFIER
              This  corresponds  to  the concept of name, given
              previously. It also includes literals as  defined
              previously.

       C_IDENTIFIER
              This  is  a name, and additionally it is known to
              be followed by a colon.  A literal  cannot  yield
              this token.

       NUMBER A  string of digits (a non-negative decimal inte-
              ger).

       TYPE, LEFT, MARK, LCURL, RCURL

              These correspond directly to  %type,  %left,  %%,
              %{, and %}.

       { ... }
              This  indicates  C-language source code, with the
              possible inclusion of  '$'  macros  as  discussed
              previously.


              /* Grammar for the input to yacc. */
              /* Basic entries. */
              /* The following are recognized by the lexical analyzer. */


              %token    IDENTIFIER      /* Includes identifiers and literals */
              %token    C_IDENTIFIER    /* identifier (but not literal)
                                           followed by a :. */
              %token    NUMBER          /* [0-9][0-9]* */


              /* Reserved words : %type=>TYPE %left=>LEFT, and so on */


              %token    LEFT RIGHT NONASSOC TOKEN PREC TYPE START UNION


              %token    MARK            /* The %% mark. */
              %token    LCURL           /* The %{ mark. */
              %token    RCURL           /* The %} mark. */


              /* 8-bit character literals stand for themselves; */
              /* tokens have to be defined for multi-byte characters. */


              %start    spec


              %%


              spec  : defs MARK rules tail
                    ;
              tail  : MARK
                    {
                      /* In this action, set up the rest of the file. */
                    }
                    | /* Empty; the second MARK is optional. */
                    ;
              defs  : /* Empty. */
                    |    defs def
                    ;
              def   : START IDENTIFIER
                    |    UNION
                    {
                      /* Copy union definition to output. */
                    }
                    |    LCURL
                    {
                      /* Copy C code to output file. */
                    }
                      RCURL
                    |    rword tag nlist
                    ;
              rword : TOKEN
                    | LEFT
                    | RIGHT
                    | NONASSOC
                    | TYPE
                    ;
              tag   : /* Empty: union tag ID optional. */
                    | '<' IDENTIFIER '>'
                    ;
              nlist : nmno
                    | nlist nmno
                    ;
              nmno  : IDENTIFIER         /* Note: literal invalid with %type. */
                    | IDENTIFIER NUMBER  /* Note: invalid with %type. */
                    ;


              /* Rule section */


              rules : C_IDENTIFIER rbody prec
                    | rules  rule
                    ;
              rule  : C_IDENTIFIER rbody prec
                    | '|' rbody prec
                    ;
              rbody : /* empty */
                    | rbody IDENTIFIER
                    | rbody act
                    ;
              act   : '{'
                      {
                        /* Copy action, translate $$, and so on. */
                      }
                      '}'
                    ;
              prec  : /* Empty */
                    | PREC IDENTIFIER
                    | PREC IDENTIFIER act
                    | prec ';'
                    ;

   Conflicts
       The  parser  produced  for  an input grammar may contain
       states in which conflicts  occur.  The  conflicts  occur
       because the grammar is not LALR(1). An ambiguous grammar
       always contains at least one LALR(1) conflict. The  yacc
       utility   shall  resolve  all  conflicts,  using  either
       default rules or user-specified precedence rules.

       Conflicts   are   either   shift/reduce   conflicts   or
       reduce/reduce  conflicts.   A  shift/reduce  conflict is
       where, for a given state and lookahead  symbol,  both  a
       shift  action  and  a  reduce  action  are  possible.  A
       reduce/reduce conflict is where, for a given  state  and
       lookahead  symbol, reductions by two different rules are
       possible.

       The rules below describe how to specify what actions  to
       take  when  a conflict occurs. Not all shift/reduce con-
       flicts can be successfully resolved this way because the
       conflict  may  be due to something other than ambiguity,
       so incautious use of these facilities can cause the lan-
       guage  accepted  by the parser to be much different from
       that which was intended. The description file shall con-
       tain  sufficient  information to understand the cause of
       the conflict. Where ambiguity is the reason  either  the
       default  or explicit rules should be adequate to produce
       a working parser.

       The declared precedences and associativities (see
       Declarations Section) are used to resolve parsing conflicts
       as follows:

        1. A precedence and associativity  is  associated  with
           each grammar rule; it is the precedence and associa-
           tivity of the last token or literal in the  body  of
           the rule. If the %prec keyword is used, it overrides
           this default. Some grammar rules might not have both
           precedence and associativity.


        2. If  there  is  a shift/reduce conflict, and both the
           grammar rule and the input  symbol  have  precedence
           and  associativity  associated  with  them, then the
           conflict is resolved in favor of the  action  (shift
           or reduce) associated with the higher precedence. If
           the precedences are the same, then the associativity
           is  used;  left  associative  implies  reduce, right
           associative  implies  shift,   and   non-associative
           implies an error in the string being parsed.


        3. When there is a shift/reduce conflict that cannot be
           resolved by rule 2, the  shift  is  done.  Conflicts
           resolved this way are counted in the diagnostic out-
           put described in Error Handling .


        4. When there is a reduce/reduce conflict, a  reduction
           is  done  by the grammar rule that occurs earlier in
           the input sequence.  Conflicts resolved this way are
           counted  in the diagnostic output described in Error
           Handling .


       Conflicts resolved by precedence or associativity  shall
       not  be  counted  in  the shift/reduce and reduce/reduce
       conflicts reported by yacc, either on standard error  or
       in the description file.
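
       The  shift/reduce rules above can be sketched as a small
       C function. This is an illustration only, not code  that
       yacc  emits:  the  type  and function names are hypothet-
       ical, a level of 0 stands for "no precedence  declared",
       and  the  sketch assumes that symbols of equal precedence
       share one associativity, so it consults the token's.

```c
#include <assert.h>

/* Hypothetical names: none of these identifiers come from yacc itself. */
enum assoc  { ASSOC_NONE, ASSOC_LEFT, ASSOC_RIGHT, ASSOC_NONASSOC };
enum action { ACT_SHIFT, ACT_REDUCE, ACT_ERROR };

struct prec {
    int level;          /* 0 means no precedence was declared */
    enum assoc assoc;   /* ASSOC_NONE when level is 0 */
};

/*
 * Resolve a shift/reduce conflict between a rule and a lookahead token.
 * *counted is set to 1 when the conflict falls through to rule 3 and so
 * would appear in the reported conflict totals, otherwise to 0.
 */
enum action resolve_shift_reduce(struct prec rule, struct prec token,
                                 int *counted)
{
    assert(counted != 0);
    *counted = 0;

    if (rule.level == 0 || token.level == 0) {
        *counted = 1;               /* rule 3: default to shift */
        return ACT_SHIFT;
    }
    if (rule.level > token.level)
        return ACT_REDUCE;          /* rule 2: higher precedence wins */
    if (token.level > rule.level)
        return ACT_SHIFT;

    /* Equal precedence: associativity decides. */
    switch (token.assoc) {
    case ASSOC_LEFT:     return ACT_REDUCE;
    case ASSOC_RIGHT:    return ACT_SHIFT;
    case ASSOC_NONASSOC: return ACT_ERROR;
    default:             *counted = 1; return ACT_SHIFT;
    }
}
```

       For  a  rule ending in '+' with lookahead '*', where '*'
       has the higher precedence, the function  selects  shift,
       which is what makes a + b * c group as a + (b * c).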

   Error Handling
       The  token  error  shall be reserved for error handling.
       The name error can be used in grammar  rules.  It  indi-
       cates  places where the parser can recover from a syntax
       error. The default value of  error  shall  be  256.  Its
       value  can  be  changed  using a %token declaration. The
       lexical analyzer should not return the value of error.

       The parser shall detect a syntax error when it is  in  a
       state  where  the  action  associated with the lookahead
       symbol is error. A semantic action can cause the  parser
       to  initiate error handling by executing the macro YYER-
       ROR. When  YYERROR  is  executed,  the  semantic  action
       passes  control  back  to  the parser. YYERROR cannot be
       used outside of semantic actions.

       When the parser detects  a  syntax  error,  it  normally
       calls yyerror() with the character string "syntax error"
       as its argument. The call  shall  not  be  made  if  the
       parser  is  still  recovering from a previous error when
       the error is detected. The parser is  considered  to  be
       recovering  from  a  previous error until the parser has
       shifted over at least three normal input  symbols  since
       the  last  error  was  detected or a semantic action has
       executed the macro yyerrok. The parser  shall  not  call
       yyerror() when YYERROR is executed.

       The macro function YYRECOVERING shall return 1 if a syn-
       tax error has been detected and the parser has  not  yet
       fully  recovered  from  it.  Otherwise,  zero  shall  be
       returned.
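
       The  "three  normal  input symbols" rule can be modelled
       with a single counter. The sketch below is an assumption
       about  one  possible  bookkeeping  scheme,  not  the code
       of any particular implementation; the structure and func-
       tion  names are hypothetical, and errors raised by YYERROR
       (which never call yyerror()) are left out.

```c
#include <assert.h>

/* Hypothetical model of the recovery state; the names are illustrative. */
struct recovery {
    int remaining;   /* normal symbols still to shift before full recovery */
    int reports;     /* number of times yyerror() would have been called */
};

/* A syntax error was detected by the parser (not raised by YYERROR). */
void on_syntax_error(struct recovery *r)
{
    assert(r != 0);
    if (r->remaining == 0)
        r->reports++;       /* not recovering: report the error */
    r->remaining = 3;       /* restart the three-symbol count */
}

/* A normal input symbol (not the error token) was shifted. */
void on_shift(struct recovery *r)
{
    assert(r != 0);
    if (r->remaining > 0)
        r->remaining--;
}

/* Effect of the yyerrok macro: act as if fully recovered. */
void on_yyerrok(struct recovery *r)
{
    assert(r != 0);
    r->remaining = 0;
}

/* Effect of the YYRECOVERING macro: 1 while recovering, otherwise 0. */
int recovering(const struct recovery *r)
{
    assert(r != 0);
    return r->remaining != 0;
}
```

       A second error detected after only two shifts is  there-
       fore  not  reported,  and  it  restarts the count at three.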

       When a syntax error  is  detected  by  the  parser,  the
       parser  shall  check if a previous syntax error has been
       detected. If a previous error was detected,  and  if  no
       normal input symbols have been shifted since the preced-
       ing error was detected, the parser checks if the  looka-
       head  symbol is an endmarker (see Interface to the Lexi-
       cal Analyzer ). If it is, the parser shall return with a
       non-zero value. Otherwise, the lookahead symbol shall be
       discarded and normal parsing shall resume.

       When YYERROR is executed or when the  parser  detects  a
       syntax error and no previous error has been detected, or
       at least one normal input symbol has been shifted  since
       the  previous  error  was detected, the parser shall pop
       back one state at a time until the parse stack is  empty
       or  the current state allows a shift over error.  If the
       parser empties the parse stack, it shall return  with  a
       non-zero value. Otherwise, it shall shift over error and
       then resume normal parsing. If the parser had already
       read a lookahead symbol when the error was detected,  that
       symbol shall still be the lookahead symbol when parsing is
       resumed.
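
       The stack unwinding described above might look like  the
       following  helper.  It  is  a sketch under stated assump-
       tions: the parse stack is taken to be an array of  state
       numbers,  and  can_shift_error  is a hypothetical per-state
       flag table; real parsers derive this  from  their  action
       tables.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: states[0..*depth-1] is the parse stack, top last.
 * can_shift_error[s] is non-zero when state s has a shift on the error
 * token. Returns the state that will shift error, or -1 if the stack was
 * emptied (the parser would then return a non-zero value).
 */
int pop_to_error_state(const int *states, size_t *depth,
                       const int *can_shift_error)
{
    assert(states != NULL && depth != NULL && can_shift_error != NULL);
    while (*depth > 0) {
        int s = states[*depth - 1];
        if (can_shift_error[s])
            return s;       /* this state allows a shift over error */
        (*depth)--;         /* pop back one state */
    }
    return -1;
}
```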

       The  macro  yyerrok in a semantic action shall cause the
       parser to act as if it has fully recovered from any pre-
       vious errors. The macro yyclearin shall cause the parser
       to discard the current lookahead token. If  the  current
       lookahead  token  has not yet been read, yyclearin shall
       have no effect.

       The macro YYACCEPT shall cause the parser to return with
       the value zero. The macro YYABORT shall cause the parser
       to return with a non-zero value.

   Interface to the Lexical Analyzer
       The yylex() function is an integer-valued function  that
       returns  a  token  number representing the kind of token
       read. If there is a  value  associated  with  the  token
       returned  by  yylex() (see the discussion of tag above),
       it shall be assigned to the external variable yylval.

       If the parser and yylex() do not agree  on  these  token
       numbers,  reliable  communication  between  them  cannot
       occur. For (single-byte character) literals,  the  token
       is simply the numeric value of the character in the cur-
       rent character set. The numbers  for  other  tokens  can
       either  be  chosen  by  yacc,  or chosen by the user. In
       either case, the #define construct of C is used to allow
       yylex()  to  return  these  numbers  symbolically.   The
       #define statements are put into the code file,  and  the
       header  file if that file is requested. The set of char-
       acters permitted by yacc in an identifier is larger than
       that  permitted  by C. Token names found to contain such
       characters shall not be included in the #define declara-
       tions.

       If  the  token  numbers  are  chosen by yacc, the tokens
       other than literals shall be  assigned  numbers  greater
       than  256,  although no order is implied. A token can be
       explicitly assigned a  number  by  following  its  first
       appearance  in  the  declarations section with a number.
       Names and literals not defined  this  way  retain  their
       default  definition.  All token numbers assigned by yacc
       shall be unique and distinct from the token numbers used
       for  literals  and  user-assigned  tokens.  If duplicate
       token numbers cause conflicts in parser generation, yacc
       shall  report  an  error;  otherwise,  it is unspecified
       whether the token assignment is accepted or an error  is
       reported.

       The end of the input is marked by a special token called
       the endmarker, which has a token number that is zero  or
       negative.  (These  values  are  invalid  for  any  other
       token.) All lexical analyzers shall return zero or nega-
       tive  as  a  token number upon reaching the end of their
       input. If the tokens up to, but excluding, the endmarker
       form  a  structure  that  matches  the start symbol, the
       parser shall accept the input. If the endmarker is  seen
       in any other context, it shall be considered an error.
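
       A minimal hand-written yylex() following  these  conven-
       tions  is sketched below. It assumes that YYSTYPE is int,
       that the grammar declares a single token NUMBER to  which
       the  number 257 has been assigned, and that input comes
       from a string set  through  set_input(),  a  hypothetical
       helper that is not part of any yacc interface.

```c
#include <assert.h>
#include <ctype.h>

#define NUMBER 257          /* assumed token number, greater than 256 */

int yylval;                 /* assumes YYSTYPE is int */
static const char *input_cursor = "";   /* hypothetical input source */

/* Point the scanner at a NUL-terminated string. */
void set_input(const char *text)
{
    assert(text != 0);
    input_cursor = text;
}

int yylex(void)
{
    /* Skip blanks between tokens. */
    while (*input_cursor == ' ' || *input_cursor == '\t')
        input_cursor++;

    if (*input_cursor == '\0')
        return 0;                       /* endmarker */

    if (isdigit((unsigned char)*input_cursor)) {
        int value = 0;
        while (isdigit((unsigned char)*input_cursor))
            value = value * 10 + (*input_cursor++ - '0');
        yylval = value;                 /* value associated with the token */
        return NUMBER;
    }

    /* Single-byte literal: the token is the character's numeric value. */
    return (unsigned char)*input_cursor++;
}
```

       In  a real program the NUMBER definition would come from
       the header file written by yacc -d rather than  from  a
       local #define.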

   Completing the Program
       In  addition  to  yyparse()  and  yylex(), the functions
       yyerror() and main() are required  to  make  a  complete
       program.  The  application  can  supply main() and yyer-
       ror(), or those routines can be obtained from  the  yacc
       library.

   Yacc Library
       The  following  functions  shall appear only in the yacc
       library accessible through the -l y operand to c99; they
       can therefore be redefined by a conforming application:

       int  main(void)

              This  function shall call yyparse() and exit with
              an unspecified value. Other actions  within  this
              function are unspecified.

       int  yyerror(const char *s)

              This  function  shall  write  the  NUL-terminated
              argument to standard error, followed by  a  <new-
              line>.


       The  order of the -l y and -l l operands given to c99 is
       significant; the application shall  either  provide  its
       own main() function or ensure that -l y precedes -l l.

   Debugging the Parser
       The  parser  generated  by  yacc  shall  have diagnostic
       facilities in it  that  can  be  optionally  enabled  at
       either compile time or at runtime (if enabled at compile
       time). The compilation of the runtime debugging code  is
       under  the control of YYDEBUG, a preprocessor symbol. If
       YYDEBUG has a non-zero value, the debugging  code  shall
       be included. If its value is zero, the code shall not be
       included.

       In parsers where the debugging code has  been  included,
       the  external  int yydebug can be used to turn debugging
       on (with a non-zero value) and off (zero value) at  run-
       time. The initial value of yydebug shall be zero.

       When  -t is specified, the code file shall be built such
       that, if YYDEBUG is not already defined  at  compilation
       time  (using  the  c99  -D YYDEBUG option, for example),
       YYDEBUG shall be set explicitly to 1.  When  -t  is  not
       specified,  the  code  file shall be built such that, if
       YYDEBUG is not already defined, it shall be set  explic-
       itly to zero.

       The  format  of  the debugging output is unspecified but
       includes at least enough information  to  determine  the
       shift and reduce actions, and the input symbols. It also
       provides information about error recovery.
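
       The interplay of YYDEBUG and yydebug can be  illustrated
       as  follows.  The  #ifndef  block  mirrors  what  a code
       file built without -t is required to do; trace_shift() is
       a  hypothetical  hook, and the message it prints is only
       an example, since the format of the debugging output  is
       unspecified.

```c
#include <assert.h>
#include <stdio.h>

#ifndef YYDEBUG
#define YYDEBUG 0           /* what a code file built without -t would do */
#endif

#if YYDEBUG
int yydebug = 0;            /* initial value is zero: tracing starts off */
#endif

/* Hypothetical trace hook; returns 1 if a line was written, else 0. */
int trace_shift(int token, int state)
{
    assert(state >= 0);
#if YYDEBUG
    if (yydebug) {
        fprintf(stderr, "shift token %d, enter state %d\n", token, state);
        return 1;
    }
#else
    (void)token;            /* debugging code not compiled in */
#endif
    return 0;
}
```

       Compiling  with -D YYDEBUG=1 includes the tracing code,
       but nothing is printed until the application  also  sets
       yydebug to a non-zero value.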

   Algorithms
       The parser constructed by  yacc  implements  an  LALR(1)
       parsing algorithm as documented in the literature. It is
       unspecified  whether  the  parser  is  table-driven   or
       direct-coded.

       A  parser generated by yacc shall never request an input
       symbol from yylex() while in  a  state  where  the  only
       actions  other than the error action are reductions by a
       single rule.

       The literature of parsing theory defines these concepts.
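
       The  lookahead  requirement  above  amounts  to  a  test
       on each state. The sketch below uses a hypothetical  sum-
       mary  of a state's actions; it assumes that accept counts
       as an action distinct from error, and  it  does  not  re-
       flect how any particular implementation stores its tables.

```c
#include <assert.h>

/* Hypothetical table entry; real parsers encode this differently. */
struct state_info {
    int shift_count;        /* number of shift actions in the state */
    int reduce_rule_count;  /* number of distinct rules it can reduce by */
    int accepts;            /* non-zero if the state has an accept action */
};

/* Returns 1 if the parser has to call yylex() before acting in this state. */
int needs_lookahead(const struct state_info *s)
{
    assert(s != 0);
    if (s->shift_count == 0 && !s->accepts && s->reduce_rule_count == 1)
        return 0;           /* only action: reduce by a single rule */
    return 1;
}
```

       This property matters to interactive applications:  the
       reduction,  and  its  semantic  action,  can run before the
       user has typed the next token.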

   Limits
       The  yacc  utility may have several internal tables. The
       minimum maximums for these tables are shown in the  fol-
       lowing  table.  The  exact  meaning  of  these values is
       implementation-defined.  The implementation shall define
       the  relationship  between these values and between them
       and any error messages that the implementation may  gen-
       erate should it run out of space for any internal struc-
       ture. An implementation  may  combine  groups  of  these
       resources into a single pool as long as the total avail-
       able to the user does not fall  below  the  sum  of  the
       sizes specified by this section.

                    Table: Internal Limits in yacc

                 Minimum
    Limit        Maximum   Description
    {NTERMS}     126       Number of tokens.
    {NNONTERM}   200       Number of non-terminals.
    {NPROD}      300       Number of rules.
    {NSTATES}    600       Number of states.
    {MEMSIZE}    5200      Length of rules. The total length, in
                           names (tokens and non-terminals), of all
                           the rules of the grammar. The left-hand
                           side is counted for each rule, even if
                           it is not explicitly repeated, as speci-
                           fied in Grammar Rules in yacc .
    {ACTSIZE}    4000      Number of actions. "Actions" here (and
                           in the description file) refer to parser
                           actions (shift, reduce, and so on) not
                           to semantic actions defined in Grammar
                           Rules in yacc .

EXIT STATUS
       The following exit values shall be returned:

        0     Successful completion.

       >0     An error occurred.


CONSEQUENCES OF ERRORS
       If any errors are encountered, the run  is  aborted  and
       yacc  exits  with  a non-zero status. Partial code files
       and header files may be produced. The  summary  informa-
       tion in the description file shall always be produced if
       the -v flag is present.

       The following sections are informative.

APPLICATION USAGE
       Historical implementations experience name conflicts  on
       the  names  yacc.tmp,  yacc.acts,  yacc.debug,  y.tab.c,
       y.tab.h, and y.output if more than one copy of  yacc  is
       running in a single directory at one time. The -b option
       was added to overcome this problem. The related  problem
       of  allowing  multiple  yacc parsers to be placed in the
       same file was addressed by adding a -p option  to  over-
       ride the previously hard-coded yy variable prefix.

       The  description  of the -p option specifies the minimal
       set of function and variable names that  cause  conflict
       when  multiple parsers are linked together. YYSTYPE does
       not need to be changed. Instead, the programmer can  use
       -b  to  give the header files for different parsers dif-
       ferent names, and then the file with the yylex()  for  a
       given  parser  can  include  the header for that parser.
       Names such as yyclearerr  do  not  need  to  be  changed
       because  they  are used only in the actions; they do not
       have linkage. It is possible that an implementation  has
       other  names,  either  internal  ones  for  implementing
       things such as  yyclearerr,  or  providing  non-standard
       features that it wants to change with -p.

       Unary operators that are the same  token  as  a  binary
       operator in general need their precedence adjusted. This
       is handled by the %prec advisory symbol associated  with
       the particular grammar rule defining that unary operator.
       (See Grammar Rules in yacc .) Applications are  not
       required to use %prec for unary operators, but grammars
       that do not need it are rare.

EXAMPLES
       Access  to  the  yacc  library  is obtained with library
       search operands to c99. To use the yacc library main():


              c99 y.tab.c -l y

       Both the  lex  library  and  the  yacc  library  contain
       main().  To access the yacc main():


              c99 y.tab.c lex.yy.c -l y -l l

       This ensures that the yacc library is searched first, so
       that its main() is used.

       The historical yacc libraries have contained two  simple
       functions  that  are  normally  coded by the application
       programmer.  These functions are similar to the  follow-
       ing code:


              #include <locale.h>

              int main(void)
              {
                  extern int yyparse();

                  setlocale(LC_ALL, "");

                  /* If the following parser is one created by lex, the
                     application must be careful to ensure that LC_CTYPE
                     and LC_COLLATE are set to the POSIX locale. */
                  (void) yyparse();
                  return (0);
              }

              #include <stdio.h>

              int yyerror(const char *msg)
              {
                  (void) fprintf(stderr, "%s\n", msg);
                  return (0);
              }

RATIONALE
       The references in Referenced Documents may be helpful in
       constructing the parser generator.  The  referenced
       DeRemer and Pennello
       article (along with the works it references) describes a
       technique to generate parsers that conform to this  vol-
       ume  of IEEE Std 1003.1-2001.  Work in this area contin-
       ues to be done, so implementors should  consult  current
       literature  before  doing  any  new implementations. The
       original Knuth article is the theoretical basis for this
       kind  of parser, but the tables it generates are imprac-
       tically large for reasonable grammars and should not  be
       used. The "equivalent to" wording is  intentional,  to
       ensure that the best tables that are LALR(1) can be
       generated.

       There  has been confusion between the class of grammars,
       the algorithms needed to generate parsers, and the algo-
       rithms  needed to parse the languages. They are all rea-
       sonably orthogonal. In particular,  a  parser  generator
       that  accepts  the full range of LR(1) grammars need not
       generate a table any more complex than one that  accepts
       SLR(1)  (a  relatively  weak class of LR grammars) for a
       grammar that happens to be SLR(1). Such  an  implementa-
       tion need not recognize the case, either; table compres-
       sion can yield the SLR(1) table  (or  one  even  smaller
       than  that)  without  recognizing  that  the  grammar is
       SLR(1). The speed of an LR(1) parser for  any  class  is
       dependent  more  upon  the table representation and com-
       pression (or the code generation if a direct  parser  is
       generated) than upon the class of grammar that the table
       generator handles.

       The speed of the parser generator is somewhat  dependent
       upon the class of grammar it  handles.  However,  the
       algorithms for constructing LR parsers in the original
       Knuth article were judged  by  their  author  to  be
       impractically slow at that time. Although full LR is more
       complex than
       LALR(1),  as computer speeds and algorithms improve, the
       difference (in terms of acceptable wall-clock  execution
       time) is becoming less significant.

       Potential  authors  are  cautioned  that  the referenced
       DeRemer and Pennello article previously cited identifies
       a  bug  (an  over-simplification  of  the computation of
       LALR(1) lookahead sets) in some of the LALR(1) algorithm
       statements that were published before it.  They  should
       take the time to seek out that paper, as well as current
       relevant work, particularly Aho's.

       The -b option was added to provide a portable method for
       permitting yacc to work on multiple separate parsers  in
       the  same  directory.  If a directory contains more than
       one yacc grammar, and both grammars are  constructed  at
       the  same  time  (by,  for example, a parallel make pro-
       gram), conflict results.  While the solution is not his-
       torical practice, it corrects a known deficiency in his-
       torical implementations. Corresponding changes were made
       to  all  sections  that referenced the filenames y.tab.c
       (now "the code file"), y.tab.h (now "the header  file"),
       and y.output (now "the description file").

       The grammar for yacc input is  based  on  System  V
       documentation. The textual description there shows  that
       the ';' is required at the end of the rule. The grammar and
       the implementation do not  require  this.  (The  use  of
       C_IDENTIFIER  causes  a  reduce  to  occur  in the right
       place.)

       Also, in that implementation,  the  constructs  such  as
       %token can be terminated by a semicolon, but this is not
       permitted by the grammar. The keywords  such  as  %token
       can  also  appear  in uppercase, which is again not dis-
       cussed. In most places where '%' is  used,  '\'  can  be
       substituted,  and there are alternate spellings for some
       of the symbols (for example, %LEFT can be "%<"  or  even
       "\<" ).

       Historically,  <tag>  can  contain any characters except
       '>', including white space, in the implementation.  How-
       ever,  since  the  tag  must reference an ISO C standard
       union member,  in  practice  conforming  implementations
       need  to  support  only  the set of characters for ISO C
       standard identifiers in this context.

       Some historical  implementations  are  known  to  accept
       actions  that  are  terminated  by  a period. Historical
       implementations often allow '$' in names.  A  conforming
       implementation  does not need to support either of these
       behaviors.

       Deciding when to use %prec illustrates the difficulty in
       specifying the behavior of yacc. There may be situations
       in which the  grammar  is  not,  strictly  speaking,  in
       error,  and  yet yacc cannot interpret it unambiguously.
       Ambiguities in the grammar can  in  many  instances  be
       resolved by providing additional information,  such  as
       using %type or %union declarations. It is often easier,
       and it usually yields a smaller parser,  to  take  this
       alternative when it is appropriate.

       A program produced without the runtime debugging code is
       usually  smaller  and  slightly  faster  in  historical
       implementations.

       Statistics messages from several historical  implementa-
       tions include the following types of information:


              n/512 terminals, n/300 non-terminals
              n/600 grammar rules, n/1500 states
              n shift/reduce, n reduce/reduce conflicts reported
              n/350 working sets used
              Memory: states, etc. n/15000, parser n/15000
              n/600 distinct lookahead sets
              n extra closures
              n shift entries, n exceptions
              n goto entries
              n entries saved by goto default
              Optimizer space used: input n/15000, output n/15000
              n table entries, n zero
              Maximum spread: n, Maximum offset: n

       The report of internal tables in the description file is
       left implementation-defined because all aspects of these
       limits are also implementation-defined. Some implementa-
       tions may use dynamic allocation techniques and have  no
       specific limit values to report.

       The  format  of  the  y.output file is not given because
       specification of the format  was  not  seen  to  enhance
       application portability.  The  listing  is   primarily
       intended to help human users understand  and  debug  the
       parser;  use  of  y.output  by  a conforming application
       script would be  unusual.  Furthermore,  implementations
       have  not produced consistent output and no popular for-
       mat was apparent. The format selected by the implementa-
       tion  should  be  human-readable,  in  addition  to  the
       requirement that it be a text file.

       Standard error reports are  not  specifically  described
       because  they  are  seldom of use to conforming applica-
       tions and there was no reason  to  restrict  implementa-
       tions.

       Some implementations recognize "={" as equivalent to '{'
       because it appears  in  historical  documentation.  This
       construction  was  recognized and documented as obsolete
       as long ago as 1978, in the referenced Yacc: Yet Another
       Compiler-Compiler.  This  volume of IEEE Std 1003.1-2001
       chose to leave it as obsolete and omit it.

       Multi-byte characters should be recognized by the  lexi-
       cal  analyzer and returned as tokens. They should not be
       returned as multi-byte  character  literals.  The  token
       error  that  is  used  for  error  recovery  is normally
       assigned the value 256 in the historical implementation.
       Thus,  the token value 256, which is used in many multi-
       byte character sets, is not available  for  use  as  the
       value of a user-defined token.

FUTURE DIRECTIONS
       None.

SEE ALSO
       c99, lex

COPYRIGHT
       Portions  of  this  text are reprinted and reproduced in
       electronic form from  IEEE  Std  1003.1,  2003  Edition,
       Standard  for Information Technology -- Portable Operat-
       ing System Interface (POSIX), The Open Group Base Speci-
       fications Issue 6, Copyright (C) 2001-2003 by the Insti-
       tute of Electrical and Electronics  Engineers,  Inc  and
       The  Open Group. In the event of any discrepancy between
       this version and the original IEEE and  The  Open  Group
       Standard,  the original IEEE and The Open Group Standard
       is the referee document. The original  Standard  can  be
       obtained online at
       http://www.opengroup.org/unix/online.html .


IEEE/The Open Group           2003                     YACC(1P)