
SGML

From Wikiversity

Welcome to the Wikiversity Center for the Study of SGML or Standard Generalized Markup Language. This is a content development project where participants create, organize and develop learning resources about SGML.

Remember that this is a wiki. In other words, criticism is good, but contributions are better.

Purpose


Untangle the SGML syntax by describing each production. The participant should be able to understand an SGML declaration and the basics of SGML.

Motivation


I think you can call it "computer archaeology" ;-) The world has moved on, and there is no practical interest behind this topic.

This is an overwhelming task, but IMHO it's worthwhile (and amusing?). As of 2008 there is no complete lay-oriented description of SGML (at least I haven't found any). Most of the SGML-related web resources that were available some years ago (around 2001) are now gone, and I think the English Wikipedia is not the right place for recreating this material (at least, until we can assure its quality).

Schedule


Honestly, I don't think this will be done overnight... maybe in a year or two.

Introduction


The Standard Generalized Markup Language (SGML) is a metalanguage in which one can define markup languages for documents. SGML is a descendant of IBM's Generalized Markup Language (GML), developed in the 1960s by Charles Goldfarb, Edward Mosher and Raymond Lorie.

SGML provides an abstract syntax that can be realized in many different concrete syntaxes. It was originally designed to enable the sharing of machine-readable documents in large projects in government, law and industry, which have to remain readable for several decades. It has also been used extensively in the printing and publishing industries, but its complexity has prevented its widespread application for small-scale general-purpose use.
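
To make the distinction between abstract and concrete syntax a little more tangible, here is a minimal Python sketch. The role names STAGO, ETAGO and TAGC and their reference strings come from the delimiter table in the declaration listed further down this page. The `VARIANT` mapping and the `render_element` helper are hypothetical and exist only for illustration; they are not part of any SGML tool.

```python
# Abstract delimiter roles mapped to the strings used by the
# reference concrete syntax.
REFERENCE = {"STAGO": "<", "ETAGO": "</", "TAGC": ">"}

# A hypothetical variant concrete syntax (an assumption, for illustration only).
VARIANT = {"STAGO": "{", "ETAGO": "{/", "TAGC": "}"}


def render_element(name, content, delims):
    """Render one element using the delimiter strings of a concrete syntax."""
    start_tag = delims["STAGO"] + name + delims["TAGC"]
    end_tag = delims["ETAGO"] + name + delims["TAGC"]
    return start_tag + content + end_tag


if __name__ == "__main__":
    # The same abstract structure, written in two concrete syntaxes.
    print(render_element("p", "Hello", REFERENCE))
    print(render_element("p", "Hello", VARIANT))
```

The element structure is the same in both calls. Only the strings assigned to the delimiter roles change, which is roughly what the SYNTAX part of an SGML declaration controls.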

Participants


I have no earthly idea why anyone in their right mind would sign up for this, but here's a place to do so:

  1. You ~~~~
  2. Mr Rho 07:38, 16 Nov 2017 (UTC) (I have no earthly idea why I signed up for this, but I am not in my right mind at most times)
  3. w:Rjgodoy 13:46, 2 May 2008 (UTC)

Content


(under construction)

An Overview of the SGML declaration


The SGML declaration is composed of the following sections:

  • CHARSET: a description of the document character set.
  • CAPACITY: limits on the total size of the objects (entities, elements, attributes and so on) that a document may use.
  • SCOPE: whether the concrete syntax applies to the document instance only, or to both the document prolog and the instance.
  • SYNTAX: the concrete syntax to be used within the document, which contains:
    • a list of shunned characters, i.e. character numbers that the syntax should avoid (SHUNCHAR),
    • a description of the character set used in the syntax (BASESET and DESCSET),
    • the definition of function characters (FUNCTION),
    • NAMING rules,
    • a list of general and short-reference delimiters (DELIM),
    • a list of reserved names for use in markup declarations (NAMES),
    • QUANTITY: limits on the length or count of individual constructs.
  • FEATURES: optional features which modify the markup.
  • APPINFO: application-specific information.
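
As a rough illustration of this layout, the following Python sketch checks that the outermost section keywords of a declaration appear in the expected order. It is only a sketch under stated assumptions: the names `TOP_LEVEL`, `top_level_sections` and `sections_in_order` are made up for this page, `SAMPLE` is a heavily abbreviated text that a real SGML parser would not accept, and only the six keywords that the declaration listing on this page places at the outermost indentation level are checked.

```python
import re

# Outermost section keywords, in the order they appear in a declaration.
TOP_LEVEL = ["CHARSET", "CAPACITY", "SCOPE", "SYNTAX", "FEATURES", "APPINFO"]

# One pass over the text: a literal ("..." or '...') or a comment
# (-- ... --), whichever starts first, is blanked out.
_LITERAL_OR_COMMENT = re.compile(r'"[^"]*"|\'[^\']*\'|--.*?--', re.DOTALL)


def top_level_sections(declaration):
    """Return the section keywords found outside literals and comments."""
    stripped = _LITERAL_OR_COMMENT.sub(" ", declaration)
    return [tok for tok in stripped.split() if tok in TOP_LEVEL]


def sections_in_order(declaration):
    """True if every section keyword occurs exactly once, in order."""
    return top_level_sections(declaration) == TOP_LEVEL


# Abbreviated sample text (not a complete, valid declaration).
SAMPLE = '''<!SGML "ISO 8879:1986"
  CHARSET  BASESET "ISO 646:1991//CHARSET IRV//ESC 2/8 4/2" DESCSET 0 128 0
  CAPACITY SGMLREF
  SCOPE    DOCUMENT
  SYNTAX   SHUNCHAR NONE -- the word CAPACITY inside a comment is ignored --
           QUANTITY SGMLREF NAMELEN 8
  FEATURES OTHER FORMAL YES
  APPINFO  NONE
>'''

if __name__ == "__main__":
    print(top_level_sections(SAMPLE))
    print(sections_in_order(SAMPLE))
```

Literals and comments are removed before the keyword search so that a keyword quoted in a delimiter string or mentioned in a comment is not mistaken for a section header.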

Reference Concrete Syntax

<!SGML "ISO 8879:1986"
  CHARSET
  BASESET "ISO 646:1991//CHARSET IRV//ESC 2/8 4/2"
  DESCSET
      0   9 UNUSED
      9   2   9         -- TAB, LF --
     11   2 UNUSED
     13   1  13         -- CR --
     14  18 UNUSED
     32  95  32
    127   1 UNUSED
  CAPACITY SGMLREF
    TOTALCAP    35000
    ENTCAP      35000
    ENTCHCAP    35000
    ELEMCAP     35000
    GRPCAP      35000
    EXGRPCAP    35000
    EXNMCAP     35000
    ATTCAP      35000
    ATTCHCAP    35000
    AVGRPCAP    35000
    NOTCAP      35000
    NOTCHCAP    35000
    IDCAP       35000
    IDREFCAP    35000
    MAPCAP      35000
    LKSETCAP    35000
    LKNMCAP     35000
  SCOPE DOCUMENT
  SYNTAX
    SHUNCHAR NONE       -- do not change this --
    BASESET "ISO 646:1991//CHARSET IRV//ESC 2/8 4/2"
    DESCSET 0 128 0
    FUNCTION
      RE    13          -- CR  --
      RS    10          -- LF  --
      SPACE 32          -- SP  --
      TAB   SEPCHAR  9  -- TAB --
    NAMING
      LCNMSTRT  ""      -- in addition to a..z --
      UCNMSTRT  ""      -- in addition to A..Z --
      LCNMCHAR  "-."    -- in addition to 0..9 --
      UCNMCHAR  "-."    -- in addition to 0..9 --
      NAMECASE
        GENERAL YES
        ENTITY  NO
    DELIM
      GENERAL SGMLREF
      MDO     "<!"        -- markup decl open --
      MDC     ">"         -- markup decl close --
      DSO     "["         -- declaration subset open --
      DSC     "]"         -- declaration subset close --
      MSC     "]]"        -- marked section close --
      COM     "--"        -- comment --
      RNI     "#"         -- reserved name indicator --
      LIT     '"'         -- literal --
      LITA    "'"         -- alternative literal --
      GRPO    "("         -- group open --
      GRPC    ")"         -- group close --
      AND     "&"         -- and connector --
      OR      "|"         -- or connector --
      SEQ     ","         -- seq connector --
      OPT     "?"         -- opt occurrence indicator --
      REP     "*"         -- rep occurrence indicator --
      PLUS    "+"         -- plus occ ind, inclusion --
      MINUS   "-"         -- exclusion, omission flag --
      CRO     "&#"        -- character reference open --
      ERO     "&"         -- entity reference open --
      PERO    "%"         -- parameter entity reference open --
      REFC    ";"         -- reference close --
      PIO     "<?"        -- processing instruction open --
      PIC     ">"         -- processing instruction close --
      STAGO   "<"         -- start tag open --
      ETAGO   "</"        -- end tag open --
      TAGC    ">"         -- tag close --
      NET     "/"         -- null end-tag --
      VI      "="         -- value indicator --
    SHORTREF NONE
      "&#TAB;"
      "&#RE;"
      "&#RS;"
      "&#RS;B"
      "&#RS;&#RE;"
      "&#RS;B&#RE;"
      "B&#RE;"
      "&#SPACE;"
      "BB"
      '"'
      "#"
      "%"
      "'"
      "("
      ")"
      "*"
      "+"
      ","
      "-"
      "--"
      ":"
      ";"
      "="
      "@"
      "["
      "]"
      "^"
      "_"
      "{"
      "|"
      "}"
      "~"
    NAMES SGMLREF
      -- Available names for substitution, grouped by area of use.
         Names marked with (*) are used in more than one context: they
         are substituted only once, so the replacement must fit every use.
      DOCTYPE
        ELEMENT
          ANY
          CDATA (*)
          RCDATA (*)
          PCDATA
          EMPTY (*)
          O
        ATTLIST
          ID
          IDREF
          IDREFS
          ENTITY (*)
          ENTITIES
          NOTATION (*)
          NAME
          NAMES
          NMTOKEN
          NMTOKENS
          NUTOKEN
          NUTOKENS
          NUMBER
          NUMBERS
          CDATA (*)
          FIXED
          CONREF
          CURRENT
          REQUIRED
          IMPLIED (*)
        ENTITY (*)
          DEFAULT
          STARTTAG
          ENDTAG
          MD
          MS
          PI
          CDATA (*)
          SDATA
          NDATA
          SUBDOC
          SYSTEM
          PUBLIC
        (marked section keywords)
          CDATA (*)
          RCDATA (*)
          IGNORE
          INCLUDE
          TEMP
        NOTATION (*)
        SHORTREF
        USEMAP
          EMPTY (*)
        LINKTYPE
            SIMPLE
            IMPLIED (*)
          LINK
            INITIAL
          IDLINK
          USELINK
            RESTORE
            EMPTY (*)
          POSTLINK
        (named character entity references)
          RE
          RS
          SPACE
      --
    QUANTITY SGMLREF
      NAMELEN     8
      LITLEN    240
      PILEN     240
      TAGLEN    960
      ATTSPLEN  960
      TAGLVL     24
      ENTLVL     16
      ATTCNT     40
      GRPCNT     32
      GRPGTCNT   96
      GRPLVL     16
      BSEQLEN   960
  FEATURES
    MINIMIZE
      DATATAG   NO      
      OMITTAG   YES
      RANK      NO     
      SHORTTAG  YES
    LINK
      SIMPLE    NO      -- YES requires number --
      IMPLICIT  NO
      EXPLICIT  NO      -- YES requires number --
    OTHER
      CONCUR    NO      -- YES requires number --
      SUBDOC    NO      -- YES requires number --
      FORMAL    YES     
  APPINFO NONE
>
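
The DESCSET lines in the CHARSET section of the declaration above can be read as triples: the first described character number, the number of characters, and either the first corresponding number in the base set or the keyword UNUSED. The Python sketch below expands those triples into a per-character table. The function name `expand_descset` and the tuple representation are assumptions made for this illustration and are not taken from any SGML tool.

```python
def expand_descset(triples):
    """Map each described character number to its base-set number.

    Each triple is (described_start, count, base_start), where base_start
    is either an integer or the string "UNUSED". Unused positions map to None.
    """
    mapping = {}
    for start, count, base in triples:
        for offset in range(count):
            mapping[start + offset] = None if base == "UNUSED" else base + offset
    return mapping


# The document character set description from the CHARSET section above.
DOCUMENT_DESCSET = [
    (0, 9, "UNUSED"),
    (9, 2, 9),        # TAB, LF
    (11, 2, "UNUSED"),
    (13, 1, 13),      # CR
    (14, 18, "UNUSED"),
    (32, 95, 32),
    (127, 1, "UNUSED"),
]

if __name__ == "__main__":
    mapping = expand_descset(DOCUMENT_DESCSET)
    used = sorted(n for n, base in mapping.items() if base is not None)
    print(len(mapping), "positions described,", len(used), "in use")
```

Read this way, the seven triples cover positions 0 to 127 without gaps. TAB, LF, CR and positions 32 to 126 are mapped to the same numbers in the base set, and the remaining control characters and position 127 are declared UNUSED.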
  • SGML in the English Wikipedia
  • The SGML Declaration, in SGML and HTML Explained, Martin Bryan, 1997.
  • SGML Declarations, Wayne Wohler, IBM Corporation, 1994.
  • The SGML Handbook, Charles F. Goldfarb, Oxford University Press, 1990. ISBN 0198537379, ISBN 9780198537373.