(S) is space characters.
(*) is any character not mentioned higher in the list in which it occurs.
EOF is end-of-file.
'a' matches the string of characters a.

XXX EOF is poorly handled right now
XXX need to say what actually happens with tokens

data
  &    "process entity"
  <    tag
  (*)  data (append (*))

tag
  !    markup-declaration
  /    close-tag
  ?    pi-open
  (S)  data (emit >(S) before switching)
  (*)  start-tag-name (append (*) to name)

markup-declaration
  '--'       comment-start
  'DOCTYPE'  (start tag not seen && doctype not seen) ? doctype-start : bogus-comment
  '[CDATA['  cdata-section
  (*)        bogus-comment

comment-start
  -    comment-dash
  (*)  comment-start

comment-dash
  -    comment-end
  (*)  comment-dash

comment-end
  >    data (emit comment)
  -    comment-dash
  (*)  comment-start

doctype-start
  (S)  doctype-before-root-name
  (*)  bogus-comment

doctype-before-root-name
  (S)  doctype-before-root-name
  (*)  doctype-root-name

doctype-root-name
  [    doctype-internal-subset
  (S)  doctype-root-name-after
  (*)  doctype-root-name

doctype-root-name-after
  [    doctype-internal-subset
  >    data
  "    doctype-identifier-double-quoted
  '    doctype-identifier-single-quoted
  (*)  doctype-root-name-after

doctype-identifier-double-quoted
  "    doctype-root-name-after
  (*)  doctype-identifier-double-quoted

doctype-identifier-single-quoted
  '    doctype-root-name-after
  (*)  doctype-identifier-single-quoted

doctype-internal-subset
  ]    doctype-internal-subset-after
  %    "process parameter entity"
  (S)  doctype-internal-subset
  <    doctype-tag

doctype-tag
  !    doctype-markupdeclaration
  ?    doctype-pi-open
  (*)  doctype-bogus-comment

doctype-markupdeclaration -- don't need ELEMENT
  'ATTLIST'   doctype-attlist
  'ENTITY'    doctype-entity
  'NOTATION'  doctype-notation
  '--'        doctype-comment
  (*)         doctype-bogus-comment

doctype-attlist
  (S)  doctype-attlist-before-name
  (*)  doctype-bogus-comment

doctype-attlist-before-name
  (S)  doctype-attlist-before-name
  (*)  doctype-attlist-name

doctype-attlist-name
  (S)  doctype-attlist-after-name
  (*)  doctype-attlist-name

doctype-attlist-after-name
  (S)  doctype-attlist-after-name
  >    doctype-internal-subset
  (*)  doctype-attlist-attrname

doctype-attlist-attrname
  (S)  doctype-attlist-after-attrname
  (*)  doctype-attlist-attrname

doctype-attlist-after-attrname
  (S)  doctype-attlist-after-attrname
  (*)  doctype-attlist-attrtype

doctype-attlist-attrtype
  (S)  doctype-attlist-after-attrtype
  (*)  doctype-attlist-attrtype

doctype-attlist-after-attrtype
  (S)  doctype-attlist-after-attrtype
  #    doctype-attlist-attrdecl-maybe
  (*)  doctype-bogus-comment

doctype-attlist-attrdecl-maybe
  (S)  doctype-bogus-comment
  (*)  doctype-attlist-attrdecl

doctype-attlist-attrdecl
  (S)  doctype-attlist-after-attrdecl
  (*)  doctype-attlist-attrdecl

doctype-attlist-after-attrdecl
  (S)  doctype-attlist-after-attrdecl
  "    doctype-attlist-attrval-double-quoted
  '    doctype-attlist-attrval-single-quoted
  (*)  doctype-bogus-comment

doctype-attlist-attrval-double-quoted
  "    doctype-attlist-after-name
  &    "process entity"
  %    "process parameter entity"
  (*)  doctype-attlist-attrval-double-quoted

doctype-attlist-attrval-single-quoted
  '    doctype-attlist-after-name
  &    "process entity"
  %    "process parameter entity"
  (*)  doctype-attlist-attrval-single-quoted

doctype-entity
  (S)  doctype-entity-before-type
  (*)  doctype-bogus-comment

doctype-entity-before-type
  (S)  doctype-entity-before-type
  %    doctype-entity-parameter-maybe
  (*)  doctype-entity-name

doctype-entity-parameter-maybe
  (S)  doctype-entity-parameter
  (*)  doctype-bogus-comment

doctype-entity-parameter
  (S)  doctype-entity-parameter
  (*)  doctype-entity-parameter-name

doctype-entity-parameter-name
  (S)  doctype-entity-after-name
  (*)  doctype-entity-parameter-name

doctype-entity-after-name
  (S)  doctype-entity-after-name
  "    doctype-entity-val-double-quoted
  '    doctype-entity-val-single-quoted
  (*)  doctype-entity-identifier

doctype-entity-val-double-quoted
  XXX what about & and %
  "    doctype-entity-after-val
  (*)  doctype-entity-val-double-quoted

doctype-entity-val-single-quoted
  XXX what about & and %
  '    doctype-entity-after-val
  (*)  doctype-entity-val-single-quoted

doctype-entity-after-val
  >    doctype-internal-subset
  (S)  doctype-entity-after-val
  (*)  doctype-entity-after-val (error)

doctype-entity-identifier
  >    doctype-internal-subset
  "    doctype-entity-identifier-double-quoted
  '    doctype-entity-identifier-single-quoted
  (*)  doctype-entity-identifier

doctype-entity-identifier-double-quoted
  "    doctype-entity-identifier
  (*)  doctype-entity-identifier-double-quoted

doctype-entity-identifier-single-quoted
  '    doctype-entity-identifier
  (*)  doctype-entity-identifier-single-quoted

doctype-entity-name
  (S)  doctype-entity-after-name
  (*)  doctype-entity-name

doctype-notation
  (S)  doctype-notation-identifier
  (*)  doctype-bogus-comment

doctype-notation-identifier
  >    doctype-internal-subset
  "    doctype-notation-identifier-double-quoted
  '    doctype-notation-identifier-single-quoted
  (*)  doctype-notation-identifier

doctype-notation-identifier-double-quoted
  "    doctype-notation-identifier
  (*)  doctype-notation-identifier-double-quoted

doctype-notation-identifier-single-quoted
  '    doctype-notation-identifier
  (*)  doctype-notation-identifier-single-quoted

doctype-comment
  XXX comment

doctype-pi-open
  XXX pi-open

doctype-bogus-comment
  >    doctype-internal-subset (emit comment)
  (*)  doctype-bogus-comment

doctype-internal-subset-after
  >    data
  (*)  doctype-internal-subset-after

cdata-section
  ]    (next two characters are ']>') ? data : cdata-section (reprocess)
  (*)  cdata-section

bogus-comment
  >    data (emit comment)
  (*)  bogus-comment

close-tag
  >    data (do nothing)
  (S)  bogus-comment (reprocess (S))
  (*)  close-tag-name (append (*))

close-tag-name
  (S)  close-tag-name-end
  >    data (also emit close tag)
  (*)  close-tag-name (append (*))

close-tag-name-end
  >    data
  (*)  close-tag-name-end

pi-open
  (S)  bogus-comment
  (*)  pi-name

pi-name
  (S)  (pi-name known) ? pi-attributes-before : pi-normal
  ?    pi-maybe-close
  (*)  pi-name (append (*))

pi-attributes-before
  XXX

pi-normal
  (S)  pi-normal
  ?    pi-maybe-close
  (*)  pi-data

pi-data
  ?    pi-maybe-close
  (*)  pi-data

pi-maybe-close
  ?    pi-maybe-close
  >    data (emit pi)
  (*)  pi-data

start-tag-name
  /    maybe-void-tag
  (S)  tag-attribute-name-before
  (*)  start-tag-name (append (*))

maybe-void-tag
  >    data (emit void tag)
  (*)  tag-attribute-name-before (reprocess character)

tag-attribute-name-before
  (S)  tag-attribute-name-before
  >    data (emit start tag)
  /    maybe-void-tag
  (*)  tag-attribute-name

tag-attribute-name
  (S)  tag-attribute-name-after
  =    tag-attribute-value-before
  >    data (emit start tag)
  /    maybe-void-tag
  (*)  tag-attribute-name
  Important: When leaving tag-attribute-name, drop duplicate attributes.

tag-attribute-name-after
  (S)  tag-attribute-name-after
  =    tag-attribute-value-before
  >    data (emit start tag)
  /    maybe-void-tag
  (*)  tag-attribute-name

tag-attribute-value-before
  "    tag-attribute-value-double-quoted
  '    tag-attribute-value-single-quoted
  &    tag-attribute-value-unquoted (reprocess)
  >    data (emit start tag)
  (*)  tag-attribute-value

tag-attribute-value-double-quoted
  "    tag-attribute-name-before
  &    "process entity"
  (*)  tag-attribute-value-double-quoted

tag-attribute-value-single-quoted
  '    tag-attribute-name-before
  &    "process entity"
  (*)  tag-attribute-value-single-quoted

tag-attribute-value-unquoted
  (S)  tag-attribute-name-before
  &    "process entity"
  >    data (emit start tag)
  (*)  tag-attribute-value-unquoted
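
The lookup rule used throughout the table (try each entry of the current state in order, with (*) as the fallback) can be sketched in Python. This is only an illustration, not the tokenizer itself: the names TABLE, SPACE, step and trace are hypothetical, the exact character set behind (S) is an assumption, and only five states are copied in. The & entry of data is left out because "process entity" is a procedure rather than a state, and the parenthesised actions ("append", "emit", "reprocess") are carried along as notes without being executed.

```python
# Sketch only: a table-driven reading of a few states from the table.
# Assumption: (S) means space, tab, CR or LF; the table does not say exactly.
SPACE = " \t\r\n"

S = "(S)"    # marker for "any space character"
ANY = "(*)"  # marker for "any character not matched higher in the list"

# state -> ordered list of (input, next state, note); order matters.
TABLE = {
    "data": [
        ("<", "tag", ""),
        (ANY, "data", "append (*)"),
    ],
    "tag": [
        ("!", "markup-declaration", ""),
        ("/", "close-tag", ""),
        ("?", "pi-open", ""),
        (S, "data", "emit >(S) before switching"),
        (ANY, "start-tag-name", "append (*) to name"),
    ],
    "close-tag": [
        (">", "data", "do nothing"),
        (S, "bogus-comment", "reprocess (S)"),
        (ANY, "close-tag-name", "append (*)"),
    ],
    "close-tag-name": [
        (S, "close-tag-name-end", ""),
        (">", "data", "also emit close tag"),
        (ANY, "close-tag-name", "append (*)"),
    ],
    "close-tag-name-end": [
        (">", "data", ""),
        (ANY, "close-tag-name-end", ""),
    ],
}


def matches(pattern, ch):
    """Return True if the single character ch matches a table input."""
    if pattern == ANY:
        return True
    if pattern == S:
        return ch in SPACE
    return ch == pattern


def step(state, ch):
    """Return (next state, note) for the first matching entry of state."""
    for pattern, target, note in TABLE[state]:
        if matches(pattern, ch):
            return target, note
    raise ValueError(f"no entry for {ch!r} in state {state!r}")


def trace(text, state="data"):
    """List the state entered after each character of text.

    Raises KeyError if the input leads into a state that was not
    copied into TABLE; "reprocess" notes are not acted on.
    """
    visited = []
    for ch in text:
        state, _note = step(state, ch)
        visited.append(state)
    return visited


if __name__ == "__main__":
    print(trace("</p >"))
```

Keeping each state's entries in an ordered list, rather than a dictionary keyed by character, mirrors the definition of (*): it only makes sense relative to the entries listed above it.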