% This program is copyright (C) 1982 by D. E. Knuth; all rights are reserved. % Copying of this file is authorized only if (1) you are D. E. Knuth, or if % (2) you make absolutely no changes to your copy. (The WEB system provides % for alterations via an auxiliary file; the master file should stay intact.) % See Appendix H of the WEB manual for hints on how to install this program. % And see Appendix A of the TRIP manual for details about how to validate it. % TeX is a trademark of the American Mathematical Society. % METAFONT is a trademark of Addison-Wesley Publishing Company. % Version 0 was released in September 1982 after it passed a variety of tests. % Version 1 was released in November 1983 after thorough testing. % Version 1.1 fixed ``disappearing font identifiers'' et alia (July 1984). % Version 1.2 allowed `0' in response to an error, et alia (October 1984). % Version 1.3 made memory allocation more flexible and local (November 1984). % Version 1.4 fixed accents right after line breaks, et alia (April 1985). % Version 1.5 fixed \the\toks after other expansion in \edefs (August 1985). % Version 2.0 (almost identical to 1.5) corresponds to "Volume B" (April 1986). % Version 2.1 corrected anomalies in discretionary breaks (January 1987). % Version 2.2 corrected "(Please type...)" with null \endlinechar (April 1987). % Version 2.3 avoided incomplete page in premature termination (August 1987). % Version 2.4 fixed \noaligned rules in indented displays (August 1987). % Version 2.5 saved cur_order when expanding tokens (September 1987). % Version 2.6 added 10sp slop when shipping leaders (November 1987). % Version 2.7 improved rounding of negative-width characters (November 1987). % Version 2.8 fixed weird bug if no \patterns are used (December 1987). % Version 2.9 made \csname\endcsname's "relax" local (December 1987). % Version 2.91 fixed \outer\def\a0{}\a\a bug (April 1988). % Version 2.92 fixed \patterns, also file names with complex macros (May 1988). % Version 2.93 fixed negative halving in allocator when mem_min<0 (June 1988). % Version 2.94 kept open_log_file from calling fatal_error (November 1988). % Version 2.95 solved that problem a better way (December 1988). % Version 2.96 corrected bug in "Infinite shrinkage" recovery (January 1989). % Version 2.97 corrected blunder in creating 2.95 (February 1989). % Version 2.98 omitted save_for_after at outer level (March 1989). % Version 2.99 caught $$\begingroup\halign..$$ (June 1989). % Version 2.991 caught .5\ifdim.6... (June 1989). % Version 2.992 introduced major changes for 8-bit extensions (September 1989). % Version 2.993 fixed a save_stack synchronization bug et alia (December 1989). % Version 3.0 fixed unusual displays; was more \output robust (March 1990). % Version 3.1 fixed nullfont, disabled \write{\the\prevgraf} (September 1990). % Version 3.14 fixed unprintable font names and corrected typos (March 1991). % Version 3.141 more of same; reconstituted ligatures better (March 1992). % Version 3.1415 preserved nonexplicit kerns, tidied up (February 1993). % Version 3.14159 allowed fontmemsize to change; bulletproofing (March 1995). % A reward of $327.68 will be paid to the first finder of any remaining bug, % not counting changes introduced after August 1989. % Although considerable effort has been expended to make the TeX program % correct and reliable, no warranty is implied; the author disclaims any % obligation or liability for damages, including but not limited to % special, indirect, or consequential damages arising out of or in % connection with the use or performance of this software. This work has % been a ``labor of love'' and the author hopes that users enjoy it. % Here is TeX material that gets inserted after \input webmac \def\hang{\hangindent 3em\noindent\ignorespaces} \def\textindent#1{\hangindent2.5em\noindent\hbox to2.5em{\hss#1 }\ignorespaces} \font\ninerm=cmr9 \let\mc=\ninerm % medium caps for names like SAIL \def\PASCAL{Pascal} \def\ph{\hbox{Pascal-H}} \def\pct!{{\char`\%}} % percent sign in ordinary text \font\logo=logo10 % font used for the METAFONT logo \def\MF{{\logo META}\-{\logo FONT}} \def\<#1>{$\langle#1\rangle$} \def\section{\mathhexbox278} \def\(#1){} % this is used to make section names sort themselves better \def\9#1{} % this is used for sort keys in the index via @@:sort key}{entry@@> \outer\def\N#1. \[#2]#3.{\MN#1.\vfil\eject % begin starred section \def\rhead{PART #2:\uppercase{#3}} % define running headline \message{*\modno} % progress report \edef\next{\write\cont{\Z{\?#2]#3}{\modno}{\the\pageno}}}\next \ifon\startsection{\bf\ignorespaces#3.\quad}\ignorespaces} \let\?=\relax % we want to be able to \write a \? \def\title{\TeX82} \def\topofcontents{\hsize 5.5in \vglue 0pt plus 1fil minus 1.5in \def\?##1]{\hbox to 1in{\hfil##1.\ }} } \def\botofcontents{\vskip 0pt plus 1fil minus 1.5in} \pageno=3 \def\glob{13} % this should be the section number of "" \def\gglob{20, 26} % this should be the next two sections of "" @* \[1] Introduction. This is \TeX, a document compiler intended to produce typesetting of high quality. The \PASCAL\ program that follows is the definition of \TeX82, a standard @:PASCAL}{\PASCAL@> @!@:TeX82}{\TeX82@> version of \TeX\ that is designed to be highly portable so that identical output will be obtainable on a great variety of computers. The main purpose of the following program is to explain the algorithms of \TeX\ as clearly as possible. As a result, the program will not necessarily be very efficient when a particular \PASCAL\ compiler has translated it into a particular machine language. However, the program has been written so that it can be tuned to run efficiently in a wide variety of operating environments by making comparatively few changes. Such flexibility is possible because the documentation that follows is written in the \.{WEB} language, which is at a higher level than \PASCAL; the preprocessing step that converts \.{WEB} to \PASCAL\ is able to introduce most of the necessary refinements. Semi-automatic translation to other languages is also feasible, because the program below does not make extensive use of features that are peculiar to \PASCAL. A large piece of software like \TeX\ has inherent complexity that cannot be reduced below a certain level of difficulty, although each individual part is fairly simple by itself. The \.{WEB} language is intended to make the algorithms as readable as possible, by reflecting the way the individual program pieces fit together and by providing the cross-references that connect different parts. Detailed comments about what is going on, and about why things were done in certain ways, have been liberally sprinkled throughout the program. These comments explain features of the implementation, but they rarely attempt to explain the \TeX\ language itself, since the reader is supposed to be familiar with {\sl The \TeX book}. @.WEB@> @:TeXbook}{\sl The \TeX book@> @ The present implementation has a long ancestry, beginning in the summer of~1977, when Michael~F. Plass and Frank~M. Liang designed and coded a prototype @^Plass, Michael Frederick@> @^Liang, Franklin Mark@> @^Knuth, Donald Ervin@> based on some specifications that the author had made in May of that year. This original proto\TeX\ included macro definitions and elementary manipulations on boxes and glue, but it did not have line-breaking, page-breaking, mathematical formulas, alignment routines, error recovery, or the present semantic nest; furthermore, it used character lists instead of token lists, so that a control sequence like \.{\\halign} was represented by a list of seven characters. A complete version of \TeX\ was designed and coded by the author in late 1977 and early 1978; that program, like its prototype, was written in the {\mc SAIL} language, for which an excellent debugging system was available. Preliminary plans to convert the {\mc SAIL} code into a form somewhat like the present ``web'' were developed by Luis Trabb~Pardo and the author at the beginning of 1979, and a complete implementation was created by Ignacio~A. Zabala in 1979 and 1980. The \TeX82 program, which @^Zabala Salelles, Ignacio Andres@> was written by the author during the latter part of 1981 and the early part of 1982, also incorporates ideas from the 1979 implementation of @^Guibas, Leonidas Ioannis@> @^Sedgewick, Robert@> @^Wyatt, Douglas Kirk@> \TeX\ in {\mc MESA} that was written by Leonidas Guibas, Robert Sedgewick, and Douglas Wyatt at the Xerox Palo Alto Research Center. Several hundred refinements were introduced into \TeX82 based on the experiences gained with the original implementations, so that essentially every part of the system has been substantially improved. After the appearance of ``Version 0'' in September 1982, this program benefited greatly from the comments of many other people, notably David~R. Fuchs and Howard~W. Trickey. A final revision in September 1989 extended the input character set to eight-bit codes and introduced the ability to hyphenate words from different languages, based on some ideas of Michael~J. Ferguson. @^Fuchs, David Raymond@> @^Trickey, Howard Wellington@> @^Ferguson, Michael John@> No doubt there still is plenty of room for improvement, but the author is firmly committed to keeping \TeX82 ``frozen'' from now on; stability and reliability are to be its main virtues. On the other hand, the \.{WEB} description can be extended without changing the core of \TeX82 itself, and the program has been designed so that such extensions are not extremely difficult to make. The |banner| string defined here should be changed whenever \TeX\ undergoes any modifications, so that it will be clear which version of \TeX\ might be the guilty party when a problem arises. @^extensions to \TeX@> @^system dependencies@> If this program is changed, the resulting system should not be called `\TeX'; the official name `\TeX' by itself is reserved for software systems that are fully compatible with each other. A special test suite called the ``\.{TRIP} test'' is available for helping to determine whether a particular implementation deserves to be known as `\TeX' [cf.~Stanford Computer Science report CS1027, November 1984]. @d banner=='This is TeX, Version 3.14159' {printed when \TeX\ starts} @ Different \PASCAL s have slightly different conventions, and the present @!@:PASCAL H}{\ph@> program expresses \TeX\ in terms of the \PASCAL\ that was available to the author in 1982. Constructions that apply to this particular compiler, which we shall call \ph, should help the reader see how to make an appropriate interface for other systems if necessary. (\ph\ is Charles Hedrick's modification of a compiler @^Hedrick, Charles Locke@> for the DECsystem-10 that was originally developed at the University of Hamburg; cf.\ {\sl SOFTWARE---Practice \AM\ Experience \bf6} (1976), 29--42. The \TeX\ program below is intended to be adaptable, without extensive changes, to most other versions of \PASCAL, so it does not fully use the admirable features of \ph. Indeed, a conscious effort has been made here to avoid using several idiosyncratic features of standard \PASCAL\ itself, so that most of the code can be translated mechanically into other high-level languages. For example, the `\&{with}' and `\\{new}' features are not used, nor are pointer types, set types, or enumerated scalar types; there are no `\&{var}' parameters, except in the case of files; there are no tag fields on variant records; there are no assignments |real:=integer|; no procedures are declared local to other procedures.) The portions of this program that involve system-dependent code, where changes might be necessary because of differences between \PASCAL\ compilers and/or differences between operating systems, can be identified by looking at the sections whose numbers are listed under `system dependencies' in the index. Furthermore, the index entries for `dirty \PASCAL' list all places where the restrictions of \PASCAL\ have not been followed perfectly, for one reason or another. @!@^system dependencies@> @!@^dirty \PASCAL@> @ The program begins with a normal \PASCAL\ program heading, whose components will be filled in later, using the conventions of \.{WEB}. @.WEB@> For example, the portion of the program called `\X\glob:Global variables\X' below will be replaced by a sequence of variable declarations that starts in $\section\glob$ of this documentation. In this way, we are able to define each individual global variable when we are prepared to understand what it means; we do not have to define all of the globals at once. Cross references in $\section\glob$, where it says ``See also sections \gglob, \dots,'' also make it possible to look at the set of all global variables, if desired. Similar remarks apply to the other portions of the program heading. Actually the heading shown here is not quite normal: The |program| line does not mention any |output| file, because \ph\ would ask the \TeX\ user to specify a file name if |output| were specified here. @^system dependencies@> @d mtype==t@&y@&p@&e {this is a \.{WEB} coding trick:} @f mtype==type {`\&{mtype}' will be equivalent to `\&{type}'} @f type==true {but `|type|' will not be treated as a reserved word} @p @t\4@>@@/ program TEX; {all file names are defined dynamically} label @@/ const @@/ mtype @@/ var @@/ @# procedure initialize; {this procedure gets things started properly} var @@/ begin @@; end;@# @t\4@>@@/ @t\4@>@@/ @ The overall \TeX\ program begins with the heading just shown, after which comes a bunch of procedure declarations and function declarations. Finally we will get to the main program, which begins with the comment `|start_here|'. If you want to skip down to the main program now, you can look up `|start_here|' in the index. But the author suggests that the best way to understand this program is to follow pretty much the order of \TeX's components as they appear in the \.{WEB} description you are now reading, since the present ordering is intended to combine the advantages of the ``bottom up'' and ``top down'' approaches to the problem of understanding a somewhat complicated system. @ Three labels must be declared in the main program, so we give them symbolic names. @d start_of_TEX=1 {go here when \TeX's variables are initialized} @d end_of_TEX=9998 {go here to close files and terminate gracefully} @d final_end=9999 {this label marks the ending of the program} @= start_of_TEX@t\hskip-2pt@>, end_of_TEX@t\hskip-2pt@>,@,final_end; {key control points} @ Some of the code below is intended to be used only when diagnosing the strange behavior that sometimes occurs when \TeX\ is being installed or when system wizards are fooling around with \TeX\ without quite knowing what they are doing. Such code will not normally be compiled; it is delimited by the codewords `$|debug|\ldots|gubed|$', with apologies to people who wish to preserve the purity of English. Similarly, there is some conditional code delimited by `$|stat|\ldots|tats|$' that is intended for use when statistics are to be kept about \TeX's memory usage. The |stat| $\ldots$ |tats| code also implements diagnostic information for \.{\\tracingparagraphs} and \.{\\tracingpages}. @^debugging@> @d debug==@{ {change this to `$\\{debug}\equiv\null$' when debugging} @d gubed==@t@>@} {change this to `$\\{gubed}\equiv\null$' when debugging} @f debug==begin @f gubed==end @# @d stat==@{ {change this to `$\\{stat}\equiv\null$' when gathering usage statistics} @d tats==@t@>@} {change this to `$\\{tats}\equiv\null$' when gathering usage statistics} @f stat==begin @f tats==end @ This program has two important variations: (1) There is a long and slow version called \.{INITEX}, which does the extra calculations needed to @.INITEX@> initialize \TeX's internal tables; and (2)~there is a shorter and faster production version, which cuts the initialization to a bare minimum. Parts of the program that are needed in (1) but not in (2) are delimited by the codewords `$|init|\ldots|tini|$'. @d init== {change this to `$\\{init}\equiv\.{@@\{}$' in the production version} @d tini== {change this to `$\\{tini}\equiv\.{@@\}}$' in the production version} @f init==begin @f tini==end @= @@/ @!init @@;@+tini @ If the first character of a \PASCAL\ comment is a dollar sign, \ph\ treats the comment as a list of ``compiler directives'' that will affect the translation of this program into machine language. The directives shown below specify full checking and inclusion of the \PASCAL\ debugger when \TeX\ is being debugged, but they cause range checking and other redundant code to be eliminated when the production system is being generated. Arithmetic overflow will be detected in all cases. @^system dependencies@> @^Overflow in arithmetic@> @= @{@&$C-,A+,D-@} {no range check, catch arithmetic overflow, no debug overhead} @!debug @{@&$C+,D+@}@+ gubed {but turn everything on when debugging} @ This \TeX\ implementation conforms to the rules of the {\sl Pascal User @:PASCAL}{\PASCAL@> @^system dependencies@> Manual} published by Jensen and Wirth in 1975, except where system-dependent @^Wirth, Niklaus@> @^Jensen, Kathleen@> code is necessary to make a useful system program, and except in another respect where such conformity would unnecessarily obscure the meaning and clutter up the code: We assume that |case| statements may include a default case that applies if no matching label is found. Thus, we shall use constructions like $$\vbox{\halign{\ignorespaces#\hfil\cr |case x of|\cr 1: $\langle\,$code for $x=1\,\rangle$;\cr 3: $\langle\,$code for $x=3\,\rangle$;\cr |othercases| $\langle\,$code for |x<>1| and |x<>3|$\,\rangle$\cr |endcases|\cr}}$$ since most \PASCAL\ compilers have plugged this hole in the language by incorporating some sort of default mechanism. For example, the \ph\ compiler allows `|others|:' as a default label, and other \PASCAL s allow syntaxes like `\&{else}' or `\&{otherwise}' or `\\{otherwise}:', etc. The definitions of |othercases| and |endcases| should be changed to agree with local conventions. Note that no semicolon appears before |endcases| in this program, so the definition of |endcases| should include a semicolon if the compiler wants one. (Of course, if no default mechanism is available, the |case| statements of \TeX\ will have to be laboriously extended by listing all remaining cases. People who are stuck with such \PASCAL s have, in fact, done this, successfully but not happily!) @d othercases == others: {default for cases not listed explicitly} @d endcases == @+end {follows the default case in an extended |case| statement} @f othercases == else @f endcases == end @ The following parameters can be changed at compile time to extend or reduce \TeX's capacity. They may have different values in \.{INITEX} and in production versions of \TeX. @.INITEX@> @^system dependencies@> @= @!mem_max=30000; {greatest index in \TeX's internal |mem| array; must be strictly less than |max_halfword|; must be equal to |mem_top| in \.{INITEX}, otherwise |>=mem_top|} @!mem_min=0; {smallest index in \TeX's internal |mem| array; must be |min_halfword| or more; must be equal to |mem_bot| in \.{INITEX}, otherwise |<=mem_bot|} @!buf_size=500; {maximum number of characters simultaneously present in current lines of open files and in control sequences between \.{\\csname} and \.{\\endcsname}; must not exceed |max_halfword|} @!error_line=72; {width of context lines on terminal error messages} @!half_error_line=42; {width of first lines of contexts in terminal error messages; should be between 30 and |error_line-15|} @!max_print_line=79; {width of longest text lines output; should be at least 60} @!stack_size=200; {maximum number of simultaneous input sources} @!max_in_open=6; {maximum number of input files and error insertions that can be going on simultaneously} @!font_max=75; {maximum internal font number; must not exceed |max_quarterword| and must be at most |font_base+256|} @!font_mem_size=20000; {number of words of |font_info| for all fonts} @!param_size=60; {maximum number of simultaneous macro parameters} @!nest_size=40; {maximum number of semantic levels simultaneously active} @!max_strings=3000; {maximum number of strings; must not exceed |max_halfword|} @!string_vacancies=8000; {the minimum number of characters that should be available for the user's control sequences and font names, after \TeX's own error messages are stored} @!pool_size=32000; {maximum number of characters in strings, including all error messages and help texts, and the names of all fonts and control sequences; must exceed |string_vacancies| by the total length of \TeX's own strings, which is currently about 23000} @!save_size=600; {space for saving values outside of current group; must be at most |max_halfword|} @!trie_size=8000; {space for hyphenation patterns; should be larger for \.{INITEX} than it is in production versions of \TeX} @!trie_op_size=500; {space for ``opcodes'' in the hyphenation patterns} @!dvi_buf_size=800; {size of the output buffer; must be a multiple of 8} @!file_name_size=40; {file names shouldn't be longer than this} @!pool_name='TeXformats:TEX.POOL '; {string of length |file_name_size|; tells where the string pool appears} @.TeXformats@> @ Like the preceding parameters, the following quantities can be changed at compile time to extend or reduce \TeX's capacity. But if they are changed, it is necessary to rerun the initialization program \.{INITEX} @.INITEX@> to generate new tables for the production \TeX\ program. One can't simply make helter-skelter changes to the following constants, since certain rather complex initialization numbers are computed from them. They are defined here using \.{WEB} macros, instead of being put into \PASCAL's |const| list, in order to emphasize this distinction. @d mem_bot=0 {smallest index in the |mem| array dumped by \.{INITEX}; must not be less than |mem_min|} @d mem_top==30000 {largest index in the |mem| array dumped by \.{INITEX}; must be substantially larger than |mem_bot| and not greater than |mem_max|} @d font_base=0 {smallest internal font number; must not be less than |min_quarterword|} @d hash_size=2100 {maximum number of control sequences; it should be at most about |(mem_max-mem_min)/10|} @d hash_prime=1777 {a prime number equal to about 85\pct! of |hash_size|} @d hyph_size=307 {another prime; the number of \.{\\hyphenation} exceptions} @^system dependencies@> @ In case somebody has inadvertently made bad settings of the ``constants,'' \TeX\ checks them using a global variable called |bad|. This is the first of many sections of \TeX\ where global variables are defined. @= @!bad:integer; {is some ``constant'' wrong?} @ Later on we will say `\ignorespaces|if mem_max>=max_halfword then bad:=14|', or something similar. (We can't do that until |max_halfword| has been defined.) @= bad:=0; if (half_error_line<30)or(half_error_line>error_line-15) then bad:=1; if max_print_line<60 then bad:=2; if dvi_buf_size mod 8<>0 then bad:=3; if mem_bot+1100>mem_top then bad:=4; if hash_prime>hash_size then bad:=5; if max_in_open>=128 then bad:=6; if mem_top<256+11 then bad:=7; {we will want |null_list>255|} @ Labels are given symbolic names by the following definitions, so that occasional |goto| statements will be meaningful. We insert the label `|exit|:' just before the `\ignorespaces|end|\unskip' of a procedure in which we have used the `|return|' statement defined below; the label `|restart|' is occasionally used at the very beginning of a procedure; and the label `|reswitch|' is occasionally used just prior to a |case| statement in which some cases change the conditions and we wish to branch to the newly applicable case. Loops that are set up with the |loop| construction defined below are commonly exited by going to `|done|' or to `|found|' or to `|not_found|', and they are sometimes repeated by going to `|continue|'. If two or more parts of a subroutine start differently but end up the same, the shared code may be gathered together at `|common_ending|'. Incidentally, this program never declares a label that isn't actually used, because some fussy \PASCAL\ compilers will complain about redundant labels. @d exit=10 {go here to leave a procedure} @d restart=20 {go here to start a procedure again} @d reswitch=21 {go here to start a case statement again} @d continue=22 {go here to resume a loop} @d done=30 {go here to exit a loop} @d done1=31 {like |done|, when there is more than one loop} @d done2=32 {for exiting the second loop in a long block} @d done3=33 {for exiting the third loop in a very long block} @d done4=34 {for exiting the fourth loop in an extremely long block} @d done5=35 {for exiting the fifth loop in an immense block} @d done6=36 {for exiting the sixth loop in a block} @d found=40 {go here when you've found it} @d found1=41 {like |found|, when there's more than one per routine} @d found2=42 {like |found|, when there's more than two per routine} @d not_found=45 {go here when you've found nothing} @d common_ending=50 {go here when you want to merge with another branch} @ Here are some macros for common programming idioms. @d incr(#) == #:=#+1 {increase a variable by unity} @d decr(#) == #:=#-1 {decrease a variable by unity} @d negate(#) == #:=-# {change the sign of a variable} @d loop == @+ while true do@+ {repeat over and over until a |goto| happens} @f loop == xclause {\.{WEB}'s |xclause| acts like `\ignorespaces|while true do|\unskip'} @d do_nothing == {empty statement} @d return == goto exit {terminate a procedure call} @f return == nil @d empty=0 {symbolic name for a null constant} @* \[2] The character set. In order to make \TeX\ readily portable to a wide variety of computers, all of its input text is converted to an internal eight-bit code that includes standard ASCII, the ``American Standard Code for Information Interchange.'' This conversion is done immediately when each character is read in. Conversely, characters are converted from ASCII to the user's external representation just before they are output to a text file. Such an internal code is relevant to users of \TeX\ primarily because it governs the positions of characters in the fonts. For example, the character `\.A' has ASCII code $65=@'101$, and when \TeX\ typesets this letter it specifies character number 65 in the current font. If that font actually has `\.A' in a different position, \TeX\ doesn't know what the real position is; the program that does the actual printing from \TeX's device-independent files is responsible for converting from ASCII to a particular font encoding. @^ASCII code@> \TeX's internal code also defines the value of constants that begin with a reverse apostrophe; and it provides an index to the \.{\\catcode}, \.{\\mathcode}, \.{\\uccode}, \.{\\lccode}, and \.{\\delcode} tables. @ Characters of text that have been converted to \TeX's internal form are said to be of type |ASCII_code|, which is a subrange of the integers. @= @!ASCII_code=0..255; {eight-bit numbers} @ The original \PASCAL\ compiler was designed in the late 60s, when six-bit character sets were common, so it did not make provision for lowercase letters. Nowadays, of course, we need to deal with both capital and small letters in a convenient way, especially in a program for typesetting; so the present specification of \TeX\ has been written under the assumption that the \PASCAL\ compiler and run-time system permit the use of text files with more than 64 distinguishable characters. More precisely, we assume that the character set contains at least the letters and symbols associated with ASCII codes @'40 through @'176; all of these characters are now available on most computer terminals. Since we are dealing with more characters than were present in the first \PASCAL\ compilers, we have to decide what to call the associated data type. Some \PASCAL s use the original name |char| for the characters in text files, even though there now are more than 64 such characters, while other \PASCAL s consider |char| to be a 64-element subrange of a larger data type that has some other name. In order to accommodate this difference, we shall use the name |text_char| to stand for the data type of the characters that are converted to and from |ASCII_code| when they are input and output. We shall also assume that |text_char| consists of the elements |chr(first_text_char)| through |chr(last_text_char)|, inclusive. The following definitions should be adjusted if necessary. @^system dependencies@> @d text_char == char {the data type of characters in text files} @d first_text_char=0 {ordinal number of the smallest element of |text_char|} @d last_text_char=255 {ordinal number of the largest element of |text_char|} @= @!i:integer; @ The \TeX\ processor converts between ASCII code and the user's external character set by means of arrays |xord| and |xchr| that are analogous to \PASCAL's |ord| and |chr| functions. @= @!xord: array [text_char] of ASCII_code; {specifies conversion of input characters} @!xchr: array [ASCII_code] of text_char; {specifies conversion of output characters} @ Since we are assuming that our \PASCAL\ system is able to read and write the visible characters of standard ASCII (although not necessarily using the ASCII codes to represent them), the following assignment statements initialize the standard part of the |xchr| array properly, without needing any system-dependent changes. On the other hand, it is possible to implement \TeX\ with less complete character sets, and in such cases it will be necessary to change something here. @^system dependencies@> @= xchr[@'40]:=' '; xchr[@'41]:='!'; xchr[@'42]:='"'; xchr[@'43]:='#'; xchr[@'44]:='$'; xchr[@'45]:='%'; xchr[@'46]:='&'; xchr[@'47]:='''';@/ xchr[@'50]:='('; xchr[@'51]:=')'; xchr[@'52]:='*'; xchr[@'53]:='+'; xchr[@'54]:=','; xchr[@'55]:='-'; xchr[@'56]:='.'; xchr[@'57]:='/';@/ xchr[@'60]:='0'; xchr[@'61]:='1'; xchr[@'62]:='2'; xchr[@'63]:='3'; xchr[@'64]:='4'; xchr[@'65]:='5'; xchr[@'66]:='6'; xchr[@'67]:='7';@/ xchr[@'70]:='8'; xchr[@'71]:='9'; xchr[@'72]:=':'; xchr[@'73]:=';'; xchr[@'74]:='<'; xchr[@'75]:='='; xchr[@'76]:='>'; xchr[@'77]:='?';@/ xchr[@'100]:='@@'; xchr[@'101]:='A'; xchr[@'102]:='B'; xchr[@'103]:='C'; xchr[@'104]:='D'; xchr[@'105]:='E'; xchr[@'106]:='F'; xchr[@'107]:='G';@/ xchr[@'110]:='H'; xchr[@'111]:='I'; xchr[@'112]:='J'; xchr[@'113]:='K'; xchr[@'114]:='L'; xchr[@'115]:='M'; xchr[@'116]:='N'; xchr[@'117]:='O';@/ xchr[@'120]:='P'; xchr[@'121]:='Q'; xchr[@'122]:='R'; xchr[@'123]:='S'; xchr[@'124]:='T'; xchr[@'125]:='U'; xchr[@'126]:='V'; xchr[@'127]:='W';@/ xchr[@'130]:='X'; xchr[@'131]:='Y'; xchr[@'132]:='Z'; xchr[@'133]:='['; xchr[@'134]:='\'; xchr[@'135]:=']'; xchr[@'136]:='^'; xchr[@'137]:='_';@/ xchr[@'140]:='`'; xchr[@'141]:='a'; xchr[@'142]:='b'; xchr[@'143]:='c'; xchr[@'144]:='d'; xchr[@'145]:='e'; xchr[@'146]:='f'; xchr[@'147]:='g';@/ xchr[@'150]:='h'; xchr[@'151]:='i'; xchr[@'152]:='j'; xchr[@'153]:='k'; xchr[@'154]:='l'; xchr[@'155]:='m'; xchr[@'156]:='n'; xchr[@'157]:='o';@/ xchr[@'160]:='p'; xchr[@'161]:='q'; xchr[@'162]:='r'; xchr[@'163]:='s'; xchr[@'164]:='t'; xchr[@'165]:='u'; xchr[@'166]:='v'; xchr[@'167]:='w';@/ xchr[@'170]:='x'; xchr[@'171]:='y'; xchr[@'172]:='z'; xchr[@'173]:='{'; xchr[@'174]:='|'; xchr[@'175]:='}'; xchr[@'176]:='~';@/ @ Some of the ASCII codes without visible characters have been given symbolic names in this program because they are used with a special meaning. @d null_code=@'0 {ASCII code that might disappear} @d carriage_return=@'15 {ASCII code used at end of line} @d invalid_code=@'177 {ASCII code that many systems prohibit in text files} @ The ASCII code is ``standard'' only to a certain extent, since many computer installations have found it advantageous to have ready access to more than 94 printing characters. Appendix~C of {\sl The \TeX book\/} gives a complete specification of the intended correspondence between characters and \TeX's internal representation. @:TeXbook}{\sl The \TeX book@> If \TeX\ is being used on a garden-variety \PASCAL\ for which only standard ASCII codes will appear in the input and output files, it doesn't really matter what codes are specified in |xchr[0..@'37]|, but the safest policy is to blank everything out by using the code shown below. However, other settings of |xchr| will make \TeX\ more friendly on computers that have an extended character set, so that users can type things like `\.^^Z' instead of `\.{\\ne}'. People with extended character sets can assign codes arbitrarily, giving an |xchr| equivalent to whatever characters the users of \TeX\ are allowed to have in their input files. It is best to make the codes correspond to the intended interpretations as shown in Appendix~C whenever possible; but this is not necessary. For example, in countries with an alphabet of more than 26 letters, it is usually best to map the additional letters into codes less than~@'40. To get the most ``permissive'' character set, change |' '| on the right of these assignment statements to |chr(i)|. @^character set dependencies@> @^system dependencies@> @= for i:=0 to @'37 do xchr[i]:=' '; for i:=@'177 to @'377 do xchr[i]:=' '; @ The following system-independent code makes the |xord| array contain a suitable inverse to the information in |xchr|. Note that if |xchr[i]=xchr[j]| where |i= for i:=first_text_char to last_text_char do xord[chr(i)]:=invalid_code; for i:=@'200 to @'377 do xord[xchr[i]]:=i; for i:=0 to @'176 do xord[xchr[i]]:=i; @* \[3] Input and output. The bane of portability is the fact that different operating systems treat input and output quite differently, perhaps because computer scientists have not given sufficient attention to this problem. People have felt somehow that input and output are not part of ``real'' programming. Well, it is true that some kinds of programming are more fun than others. With existing input/output conventions being so diverse and so messy, the only sources of joy in such parts of the code are the rare occasions when one can find a way to make the program a little less bad than it might have been. We have two choices, either to attack I/O now and get it over with, or to postpone I/O until near the end. Neither prospect is very attractive, so let's get it over with. The basic operations we need to do are (1)~inputting and outputting of text, to or from a file or the user's terminal; (2)~inputting and outputting of eight-bit bytes, to or from a file; (3)~instructing the operating system to initiate (``open'') or to terminate (``close'') input or output from a specified file; (4)~testing whether the end of an input file has been reached. \TeX\ needs to deal with two kinds of files. We shall use the term |alpha_file| for a file that contains textual data, and the term |byte_file| for a file that contains eight-bit binary information. These two types turn out to be the same on many computers, but sometimes there is a significant distinction, so we shall be careful to distinguish between them. Standard protocols for transferring such files from computer to computer, via high-speed networks, are now becoming available to more and more communities of users. The program actually makes use also of a third kind of file, called a |word_file|, when dumping and reloading base information for its own initialization. We shall define a word file later; but it will be possible for us to specify simple operations on word files before they are defined. @= @!eight_bits=0..255; {unsigned one-byte quantity} @!alpha_file=packed file of text_char; {files that contain textual data} @!byte_file=packed file of eight_bits; {files that contain binary data} @ Most of what we need to do with respect to input and output can be handled by the I/O facilities that are standard in \PASCAL, i.e., the routines called |get|, |put|, |eof|, and so on. But standard \PASCAL\ does not allow file variables to be associated with file names that are determined at run time, so it cannot be used to implement \TeX; some sort of extension to \PASCAL's ordinary |reset| and |rewrite| is crucial for our purposes. We shall assume that |name_of_file| is a variable of an appropriate type such that the \PASCAL\ run-time system being used to implement \TeX\ can open a file whose external name is specified by |name_of_file|. @^system dependencies@> @= @!name_of_file:packed array[1..file_name_size] of char;@;@/ {on some systems this may be a \&{record} variable} @!name_length:0..file_name_size;@/{this many characters are actually relevant in |name_of_file| (the rest are blank)} @ The \ph\ compiler with which the present version of \TeX\ was prepared has extended the rules of \PASCAL\ in a very convenient way. To open file~|f|, we can write $$\vbox{\halign{#\hfil\qquad&#\hfil\cr |reset(f,@t\\{name}@>,'/O')|&for input;\cr |rewrite(f,@t\\{name}@>,'/O')|&for output.\cr}}$$ The `\\{name}' parameter, which is of type `{\bf packed array $[\langle\\{any}\rangle]$ of \\{char}}', stands for the name of the external file that is being opened for input or output. Blank spaces that might appear in \\{name} are ignored. The `\.{/O}' parameter tells the operating system not to issue its own error messages if something goes wrong. If a file of the specified name cannot be found, or if such a file cannot be opened for some other reason (e.g., someone may already be trying to write the same file), we will have |@!erstat(f)<>0| after an unsuccessful |reset| or |rewrite|. This allows \TeX\ to undertake appropriate corrective action. @:PASCAL H}{\ph@> @^system dependencies@> \TeX's file-opening procedures return |false| if no file identified by |name_of_file| could be opened. @d reset_OK(#)==erstat(#)=0 @d rewrite_OK(#)==erstat(#)=0 @p function a_open_in(var f:alpha_file):boolean; {open a text file for input} begin reset(f,name_of_file,'/O'); a_open_in:=reset_OK(f); end; @# function a_open_out(var f:alpha_file):boolean; {open a text file for output} begin rewrite(f,name_of_file,'/O'); a_open_out:=rewrite_OK(f); end; @# function b_open_in(var f:byte_file):boolean; {open a binary file for input} begin reset(f,name_of_file,'/O'); b_open_in:=reset_OK(f); end; @# function b_open_out(var f:byte_file):boolean; {open a binary file for output} begin rewrite(f,name_of_file,'/O'); b_open_out:=rewrite_OK(f); end; @# function w_open_in(var f:word_file):boolean; {open a word file for input} begin reset(f,name_of_file,'/O'); w_open_in:=reset_OK(f); end; @# function w_open_out(var f:word_file):boolean; {open a word file for output} begin rewrite(f,name_of_file,'/O'); w_open_out:=rewrite_OK(f); end; @ Files can be closed with the \ph\ routine `|close(f)|', which @^system dependencies@> should be used when all input or output with respect to |f| has been completed. This makes |f| available to be opened again, if desired; and if |f| was used for output, the |close| operation makes the corresponding external file appear on the user's area, ready to be read. These procedures should not generate error messages if a file is being closed before it has been successfully opened. @p procedure a_close(var f:alpha_file); {close a text file} begin close(f); end; @# procedure b_close(var f:byte_file); {close a binary file} begin close(f); end; @# procedure w_close(var f:word_file); {close a word file} begin close(f); end; @ Binary input and output are done with \PASCAL's ordinary |get| and |put| procedures, so we don't have to make any other special arrangements for binary~I/O. Text output is also easy to do with standard \PASCAL\ routines. The treatment of text input is more difficult, however, because of the necessary translation to |ASCII_code| values. \TeX's conventions should be efficient, and they should blend nicely with the user's operating environment. @ Input from text files is read one line at a time, using a routine called |input_ln|. This function is defined in terms of global variables called |buffer|, |first|, and |last| that will be described in detail later; for now, it suffices for us to know that |buffer| is an array of |ASCII_code| values, and that |first| and |last| are indices into this array representing the beginning and ending of a line of text. @= @!buffer:array[0..buf_size] of ASCII_code; {lines of characters being read} @!first:0..buf_size; {the first unused position in |buffer|} @!last:0..buf_size; {end of the line just input to |buffer|} @!max_buf_stack:0..buf_size; {largest index used in |buffer|} @ The |input_ln| function brings the next line of input from the specified file into available positions of the buffer array and returns the value |true|, unless the file has already been entirely read, in which case it returns |false| and sets |last:=first|. In general, the |ASCII_code| numbers that represent the next line of the file are input into |buffer[first]|, |buffer[first+1]|, \dots, |buffer[last-1]|; and the global variable |last| is set equal to |first| plus the length of the line. Trailing blanks are removed from the line; thus, either |last=first| (in which case the line was entirely blank) or |buffer[last-1]<>" "|. An overflow error is given, however, if the normal actions of |input_ln| would make |last>=buf_size|; this is done so that other parts of \TeX\ can safely look at the contents of |buffer[last+1]| without overstepping the bounds of the |buffer| array. Upon entry to |input_ln|, the condition |first @p function input_ln(var f:alpha_file;@!bypass_eoln:boolean):boolean; {inputs the next line or returns |false|} var last_nonblank:0..buf_size; {|last| with trailing blanks removed} begin if bypass_eoln then if not eof(f) then get(f); {input the first character of the line into |f^|} last:=first; {cf.\ Matthew 19\thinspace:\thinspace30} if eof(f) then input_ln:=false else begin last_nonblank:=first; while not eoln(f) do begin if last>=max_buf_stack then begin max_buf_stack:=last+1; if max_buf_stack=buf_size then @; end; buffer[last]:=xord[f^]; get(f); incr(last); if buffer[last-1]<>" " then last_nonblank:=last; end; last:=last_nonblank; input_ln:=true; end; end; @ The user's terminal acts essentially like other files of text, except that it is used both for input and for output. When the terminal is considered an input file, the file variable is called |term_in|, and when it is considered an output file the file variable is |term_out|. @^system dependencies@> @= @!term_in:alpha_file; {the terminal as an input file} @!term_out:alpha_file; {the terminal as an output file} @ Here is how to open the terminal files in \ph. The `\.{/I}' switch suppresses the first |get|. @^system dependencies@> @d t_open_in==reset(term_in,'TTY:','/O/I') {open the terminal for text input} @d t_open_out==rewrite(term_out,'TTY:','/O') {open the terminal for text output} @ Sometimes it is necessary to synchronize the input/output mixture that happens on the user's terminal, and three system-dependent procedures are used for this purpose. The first of these, |update_terminal|, is called when we want to make sure that everything we have output to the terminal so far has actually left the computer's internal buffers and been sent. The second, |clear_terminal|, is called when we wish to cancel any input that the user may have typed ahead (since we are about to issue an unexpected error message). The third, |wake_up_terminal|, is supposed to revive the terminal if the user has disabled it by some instruction to the operating system. The following macros show how these operations can be specified in \ph: @^system dependencies@> @d update_terminal == break(term_out) {empty the terminal output buffer} @d clear_terminal == break_in(term_in,true) {clear the terminal input buffer} @d wake_up_terminal == do_nothing {cancel the user's cancellation of output} @ We need a special routine to read the first line of \TeX\ input from the user's terminal. This line is different because it is read before we have opened the transcript file; there is sort of a ``chicken and egg'' problem here. If the user types `\.{\\input paper}' on the first line, or if some macro invoked by that line does such an \.{\\input}, the transcript file will be named `\.{paper.log}'; but if no \.{\\input} commands are performed during the first line of terminal input, the transcript file will acquire its default name `\.{texput.log}'. (The transcript file will not contain error messages generated by the first line before the first \.{\\input} command.) @.texput@> The first line is even more special if we are lucky enough to have an operating system that treats \TeX\ differently from a run-of-the-mill \PASCAL\ object program. It's nice to let the user start running a \TeX\ job by typing a command line like `\.{tex paper}'; in such a case, \TeX\ will operate as if the first line of input were `\.{paper}', i.e., the first line will consist of the remainder of the command line, after the part that invoked \TeX. The first line is special also because it may be read before \TeX\ has input a format file. In such cases, normal error messages cannot yet be given. The following code uses concepts that will be explained later. (If the \PASCAL\ compiler does not support non-local |@!goto|\unskip, the @^system dependencies@> statement `|goto final_end|' should be replaced by something that quietly terminates the program.) @= if format_ident=0 then begin write_ln(term_out,'Buffer size exceeded!'); goto final_end; @.Buffer size exceeded@> end else begin cur_input.loc_field:=first; cur_input.limit_field:=last-1; overflow("buffer size",buf_size); @:TeX capacity exceeded buffer size}{\quad buffer size@> end @ Different systems have different ways to get started. But regardless of what conventions are adopted, the routine that initializes the terminal should satisfy the following specifications: \yskip\textindent{1)}It should open file |term_in| for input from the terminal. (The file |term_out| will already be open for output to the terminal.) \textindent{2)}If the user has given a command line, this line should be considered the first line of terminal input. Otherwise the user should be prompted with `\.{**}', and the first line of input should be whatever is typed in response. \textindent{3)}The first line of input, which might or might not be a command line, should appear in locations |first| to |last-1| of the |buffer| array. \textindent{4)}The global variable |loc| should be set so that the character to be read next by \TeX\ is in |buffer[loc]|. This character should not be blank, and we should have |loc @p function init_terminal:boolean; {gets the terminal input started} label exit; begin t_open_in; loop@+begin wake_up_terminal; write(term_out,'**'); update_terminal; @.**@> if not input_ln(term_in,true) then {this shouldn't happen} begin write_ln(term_out); write(term_out,'! End of file on the terminal... why?'); @.End of file on the terminal@> init_terminal:=false; return; end; loc:=first; while (loc which converts single-character strings into the ASCII code number of the single character involved, while it converts other strings into integers and builds a string pool file. Thus, when the string constant \.{"."} appears in the program below, \.{WEB} converts it into the integer 46, which is the ASCII code for a period, while \.{WEB} will convert a string like \.{"hello"} into some integer greater than~255. String number 46 will presumably be the single character `\..'; but some ASCII codes have no standard visible representation, and \TeX\ sometimes needs to be able to print an arbitrary ASCII character, so the first 256 strings are used to specify exactly what should be printed for each of the 256 possibilities. Elements of the |str_pool| array must be ASCII codes that can actually be printed; i.e., they must have an |xchr| equivalent in the local character set. (This restriction applies only to preloaded strings, not to those generated dynamically by the user.) Some \PASCAL\ compilers won't pack integers into a single byte unless the integers lie in the range |-128..127|. To accommodate such systems we access the string pool only via macros that can easily be redefined. @d si(#) == # {convert from |ASCII_code| to |packed_ASCII_code|} @d so(#) == # {convert from |packed_ASCII_code| to |ASCII_code|} @= @!pool_pointer = 0..pool_size; {for variables that point into |str_pool|} @!str_number = 0..max_strings; {for variables that point into |str_start|} @!packed_ASCII_code = 0..255; {elements of |str_pool| array} @ @= @!str_pool:packed array[pool_pointer] of packed_ASCII_code; {the characters} @!str_start : array[str_number] of pool_pointer; {the starting pointers} @!pool_ptr : pool_pointer; {first unused position in |str_pool|} @!str_ptr : str_number; {number of the current string being created} @!init_pool_ptr : pool_pointer; {the starting value of |pool_ptr|} @!init_str_ptr : str_number; {the starting value of |str_ptr|} @ Several of the elementary string operations are performed using \.{WEB} macros instead of \PASCAL\ procedures, because many of the operations are done quite frequently and we want to avoid the overhead of procedure calls. For example, here is a simple macro that computes the length of a string. @.WEB@> @d length(#)==(str_start[#+1]-str_start[#]) {the number of characters in string number \#} @ The length of the current string is called |cur_length|: @d cur_length == (pool_ptr - str_start[str_ptr]) @ Strings are created by appending character codes to |str_pool|. The |append_char| macro, defined here, does not check to see if the value of |pool_ptr| has gotten too high; this test is supposed to be made before |append_char| is used. There is also a |flush_char| macro, which erases the last character appended. To test if there is room to append |l| more characters to |str_pool|, we shall write |str_room(l)|, which aborts \TeX\ and gives an apologetic error message if there isn't enough room. @d append_char(#) == {put |ASCII_code| \# at the end of |str_pool|} begin str_pool[pool_ptr]:=si(#); incr(pool_ptr); end @d flush_char == decr(pool_ptr) {forget the last character in the pool} @d str_room(#) == {make sure that the pool hasn't overflowed} begin if pool_ptr+# > pool_size then overflow("pool size",pool_size-init_pool_ptr); @:TeX capacity exceeded pool size}{\quad pool size@> end @ Once a sequence of characters has been appended to |str_pool|, it officially becomes a string when the function |make_string| is called. This function returns the identification number of the new string as its value. @p function make_string : str_number; {current string enters the pool} begin if str_ptr=max_strings then overflow("number of strings",max_strings-init_str_ptr); @:TeX capacity exceeded number of strings}{\quad number of strings@> incr(str_ptr); str_start[str_ptr]:=pool_ptr; make_string:=str_ptr-1; end; @ To destroy the most recently made string, we say |flush_string|. @d flush_string==begin decr(str_ptr); pool_ptr:=str_start[str_ptr]; end @ The following subroutine compares string |s| with another string of the same length that appears in |buffer| starting at position |k|; the result is |true| if and only if the strings are equal. Empirical tests indicate that |str_eq_buf| is used in such a way that it tends to return |true| about 80 percent of the time. @p function str_eq_buf(@!s:str_number;@!k:integer):boolean; {test equality of strings} label not_found; {loop exit} var j: pool_pointer; {running index} @!result: boolean; {result of comparison} begin j:=str_start[s]; while jbuffer[k] then begin result:=false; goto not_found; end; incr(j); incr(k); end; result:=true; not_found: str_eq_buf:=result; end; @ Here is a similar routine, but it compares two strings in the string pool, and it does not assume that they have the same length. @p function str_eq_str(@!s,@!t:str_number):boolean; {test equality of strings} label not_found; {loop exit} var j,@!k: pool_pointer; {running indices} @!result: boolean; {result of comparison} begin result:=false; if length(s)<>length(t) then goto not_found; j:=str_start[s]; k:=str_start[t]; while jstr_pool[k] then goto not_found; incr(j); incr(k); end; result:=true; not_found: str_eq_str:=result; end; @ The initial values of |str_pool|, |str_start|, |pool_ptr|, and |str_ptr| are computed by the \.{INITEX} program, based in part on the information that \.{WEB} has output while processing \TeX. @.INITEX@> @^string pool@> @p @!init function get_strings_started:boolean; {initializes the string pool, but returns |false| if something goes wrong} label done,exit; var k,@!l:0..255; {small indices or counters} @!m,@!n:text_char; {characters input from |pool_file|} @!g:str_number; {garbage} @!a:integer; {accumulator for check sum} @!c:boolean; {check sum has been checked} begin pool_ptr:=0; str_ptr:=0; str_start[0]:=0; @; @; exit:end; tini @ @d app_lc_hex(#)==l:=#; if l<10 then append_char(l+"0")@+else append_char(l-10+"a") @= for k:=0 to 255 do begin if (@) then begin append_char("^"); append_char("^"); if k<@'100 then append_char(k+@'100) else if k<@'200 then append_char(k-@'100) else begin app_lc_hex(k div 16); app_lc_hex(k mod 16); end; end else append_char(k); g:=make_string; end @ The first 128 strings will contain 95 standard ASCII characters, and the other 33 characters will be printed in three-symbol form like `\.{\^\^A}' unless a system-dependent change is made here. Installations that have an extended character set, where for example |xchr[@'32]=@t\.{\'^^Z\'}@>|, would like string @'32 to be the single character @'32 instead of the three characters @'136, @'136, @'132 (\.{\^\^Z}). On the other hand, even people with an extended character set will want to represent string @'15 by \.{\^\^M}, since @'15 is |carriage_return|; the idea is to produce visible strings instead of tabs or line-feeds or carriage-returns or bell-rings or characters that are treated anomalously in text files. Unprintable characters of codes 128--255 are, similarly, rendered \.{\^\^80}--\.{\^\^ff}. The boolean expression defined here should be |true| unless \TeX\ internal code number~|k| corresponds to a non-troublesome visible symbol in the local character set. An appropriate formula for the extended character set recommended in {\sl The \TeX book\/} would, for example, be `|k in [0,@'10..@'12,@'14,@'15,@'33,@'177..@'377]|'. If character |k| cannot be printed, and |k<@'200|, then character |k+@'100| or |k-@'100| must be printable; moreover, ASCII codes |[@'41..@'46, @'60..@'71, @'141..@'146, @'160..@'171]| must be printable. Thus, at least 80 printable characters are needed. @:TeXbook}{\sl The \TeX book@> @^character set dependencies@> @^system dependencies@> @= (k<" ")or(k>"~") @ When the \.{WEB} system program called \.{TANGLE} processes the \.{TEX.WEB} description that you are now reading, it outputs the \PASCAL\ program \.{TEX.PAS} and also a string pool file called \.{TEX.POOL}. The \.{INITEX} @.WEB@>@.INITEX@> program reads the latter file, where each string appears as a two-digit decimal length followed by the string itself, and the information is recorded in \TeX's string memory. @= @!init @!pool_file:alpha_file; {the string-pool file output by \.{TANGLE}} tini @ @d bad_pool(#)==begin wake_up_terminal; write_ln(term_out,#); a_close(pool_file); get_strings_started:=false; return; end @= name_of_file:=pool_name; {we needn't set |name_length|} if a_open_in(pool_file) then begin c:=false; repeat @; until c; a_close(pool_file); get_strings_started:=true; end else bad_pool('! I can''t read TEX.POOL.') @.I can't read TEX.POOL@> @ @= begin if eof(pool_file) then bad_pool('! TEX.POOL has no check sum.'); @.TEX.POOL has no check sum@> read(pool_file,m,n); {read two digits of string length} if m='*' then @ else begin if (xord[m]<"0")or(xord[m]>"9")or@| (xord[n]<"0")or(xord[n]>"9") then bad_pool('! TEX.POOL line doesn''t begin with two digits.'); @.TEX.POOL line doesn't...@> l:=xord[m]*10+xord[n]-"0"*11; {compute the length} if pool_ptr+l+string_vacancies>pool_size then bad_pool('! You have to increase POOLSIZE.'); @.You have to increase POOLSIZE@> for k:=1 to l do begin if eoln(pool_file) then m:=' '@+else read(pool_file,m); append_char(xord[m]); end; read_ln(pool_file); g:=make_string; end; end @ The \.{WEB} operation \.{@@\$} denotes the value that should be at the end of this \.{TEX.POOL} file; any other value means that the wrong pool file has been loaded. @^check sum@> @= begin a:=0; k:=1; loop@+ begin if (xord[n]<"0")or(xord[n]>"9") then bad_pool('! TEX.POOL check sum doesn''t have nine digits.'); @.TEX.POOL check sum...@> a:=10*a+xord[n]-"0"; if k=9 then goto done; incr(k); read(pool_file,n); end; done: if a<>@$ then bad_pool('! TEX.POOL doesn''t match; TANGLE me again.'); @.TEX.POOL doesn't match@> c:=true; end @* \[5] On-line and off-line printing. Messages that are sent to a user's terminal and to the transcript-log file are produced by several `|print|' procedures. These procedures will direct their output to a variety of places, based on the setting of the global variable |selector|, which has the following possible values: \yskip \hang |term_and_log|, the normal setting, prints on the terminal and on the transcript file. \hang |log_only|, prints only on the transcript file. \hang |term_only|, prints only on the terminal. \hang |no_print|, doesn't print at all. This is used only in rare cases before the transcript file is open. \hang |pseudo|, puts output into a cyclic buffer that is used by the |show_context| routine; when we get to that routine we shall discuss the reasoning behind this curious mode. \hang |new_string|, appends the output to the current string in the string pool. \hang 0 to 15, prints on one of the sixteen files for \.{\\write} output. \yskip \noindent The symbolic names `|term_and_log|', etc., have been assigned numeric codes that satisfy the convenient relations |no_print+1=term_only|, |no_print+2=log_only|, |term_only+2=log_only+1=term_and_log|. Three additional global variables, |tally| and |term_offset| and |file_offset|, record the number of characters that have been printed since they were most recently cleared to zero. We use |tally| to record the length of (possibly very long) stretches of printing; |term_offset| and |file_offset|, on the other hand, keep track of how many characters have appeared so far on the current line that has been output to the terminal or to the transcript file, respectively. @d no_print=16 {|selector| setting that makes data disappear} @d term_only=17 {printing is destined for the terminal only} @d log_only=18 {printing is destined for the transcript file only} @d term_and_log=19 {normal |selector| setting} @d pseudo=20 {special |selector| setting for |show_context|} @d new_string=21 {printing is deflected to the string pool} @d max_selector=21 {highest selector setting} @= @!log_file : alpha_file; {transcript of \TeX\ session} @!selector : 0..max_selector; {where to print a message} @!dig : array[0..22] of 0..15; {digits in a number being output} @!tally : integer; {the number of characters recently printed} @!term_offset : 0..max_print_line; {the number of characters on the current terminal line} @!file_offset : 0..max_print_line; {the number of characters on the current file line} @!trick_buf:array[0..error_line] of ASCII_code; {circular buffer for pseudoprinting} @!trick_count: integer; {threshold for pseudoprinting, explained later} @!first_count: integer; {another variable for pseudoprinting} @ @= selector:=term_only; tally:=0; term_offset:=0; file_offset:=0; @ Macro abbreviations for output to the terminal and to the log file are defined here for convenience. Some systems need special conventions for terminal output, and it is possible to adhere to those conventions by changing |wterm|, |wterm_ln|, and |wterm_cr| in this section. @^system dependencies@> @d wterm(#)==write(term_out,#) @d wterm_ln(#)==write_ln(term_out,#) @d wterm_cr==write_ln(term_out) @d wlog(#)==write(log_file,#) @d wlog_ln(#)==write_ln(log_file,#) @d wlog_cr==write_ln(log_file) @ To end a line of text output, we call |print_ln|. @= procedure print_ln; {prints an end-of-line} begin case selector of term_and_log: begin wterm_cr; wlog_cr; term_offset:=0; file_offset:=0; end; log_only: begin wlog_cr; file_offset:=0; end; term_only: begin wterm_cr; term_offset:=0; end; no_print,pseudo,new_string: do_nothing; othercases write_ln(write_file[selector]) endcases;@/ end; {|tally| is not affected} @ The |print_char| procedure sends one character to the desired destination, using the |xchr| array to map it into an external character compatible with |input_ln|. All printing comes through |print_ln| or |print_char|. @= procedure print_char(@!s:ASCII_code); {prints a single character} label exit; begin if @ then if selector @= procedure print(@!s:integer); {prints string |s|} label exit; var j:pool_pointer; {current character code position} @!nl:integer; {new-line character to restore} begin if s>=str_ptr then s:="???" {this can't happen} @.???@> else if s<256 then if s<0 then s:="???" {can't happen} else begin if selector>pseudo then begin print_char(s); return; {internal strings are not expanded} end; if (@) then if selector= procedure slow_print(@!s:integer); {prints string |s|} var j:pool_pointer; {current character code position} begin if (s>=str_ptr) or (s<256) then print(s) else begin j:=str_start[s]; while j= wterm(banner); if format_ident=0 then wterm_ln(' (no format preloaded)') else begin slow_print(format_ident); print_ln; end; update_terminal; @ The procedure |print_nl| is like |print|, but it makes sure that the string appears at the beginning of a new line. @= procedure print_nl(@!s:str_number); {prints string |s| at beginning of line} begin if ((term_offset>0)and(odd(selector)))or@| ((file_offset>0)and(selector>=log_only)) then print_ln; print(s); end; @ The procedure |print_esc| prints a string that is preceded by the user's escape character (which is usually a backslash). @= procedure print_esc(@!s:str_number); {prints escape character, then |s|} var c:integer; {the escape character code} begin @; if c>=0 then if c<256 then print(c); slow_print(s); end; @ An array of digits in the range |0..15| is printed by |print_the_digs|. @= procedure print_the_digs(@!k:eight_bits); {prints |dig[k-1]|$\,\ldots\,$|dig[0]|} begin while k>0 do begin decr(k); if dig[k]<10 then print_char("0"+dig[k]) else print_char("A"-10+dig[k]); end; end; @ The following procedure, which prints out the decimal representation of a given integer |n|, has been written carefully so that it works properly if |n=0| or if |(-n)| would cause overflow. It does not apply |mod| or |div| to negative arguments, since such operations are not implemented consistently by all \PASCAL\ compilers. @= procedure print_int(@!n:integer); {prints an integer in decimal form} var k:0..23; {index to current digit; we assume that $|n|<10^{23}$} @!m:integer; {used to negate |n| in possibly dangerous cases} begin k:=0; if n<0 then begin print_char("-"); if n>-100000000 then negate(n) else begin m:=-1-n; n:=m div 10; m:=(m mod 10)+1; k:=1; if m<10 then dig[0]:=m else begin dig[0]:=0; incr(n); end; end; end; repeat dig[k]:=n mod 10; n:=n div 10; incr(k); until n=0; print_the_digs(k); end; @ Here is a trivial procedure to print two digits; it is usually called with a parameter in the range |0<=n<=99|. @p procedure print_two(@!n:integer); {prints two least significant digits} begin n:=abs(n) mod 100; print_char("0"+(n div 10)); print_char("0"+(n mod 10)); end; @ Hexadecimal printing of nonnegative integers is accomplished by |print_hex|. @p procedure print_hex(@!n:integer); {prints a positive integer in hexadecimal form} var k:0..22; {index to current digit; we assume that $0\L n<16^{22}$} begin k:=0; print_char(""""); repeat dig[k]:=n mod 16; n:=n div 16; incr(k); until n=0; print_the_digs(k); end; @ Old versions of \TeX\ needed a procedure called |print_ASCII| whose function is now subsumed by |print|. We retain the old name here as a possible aid to future software arch\ae ologists. @d print_ASCII == print @ Roman numerals are produced by the |print_roman_int| routine. Readers who like puzzles might enjoy trying to figure out how this tricky code works; therefore no explanation will be given. Notice that 1990 yields \.{mcmxc}, not \.{mxm}. @p procedure print_roman_int(@!n:integer); label exit; var j,@!k: pool_pointer; {mysterious indices into |str_pool|} @!u,@!v: nonnegative_integer; {mysterious numbers} begin j:=str_start["m2d5c2l5x2v5i"]; v:=1000; loop@+ begin while n>=v do begin print_char(so(str_pool[j])); n:=n-v; end; if n<=0 then return; {nonpositive input produces no output} k:=j+2; u:=v div (so(str_pool[k-1])-"0"); if str_pool[k-1]=si("2") then begin k:=k+2; u:=u div (so(str_pool[k-1])-"0"); end; if n+u>=v then begin print_char(so(str_pool[k])); n:=n+u; end else begin j:=j+2; v:=v div (so(str_pool[j-1])-"0"); end; end; exit:end; @ The |print| subroutine will not print a string that is still being created. The following procedure will. @p procedure print_current_string; {prints a yet-unmade string} var j:pool_pointer; {points to current character code} begin j:=str_start[str_ptr]; while j term_offset:=0; {the user's line ended with \<\rm return>} decr(selector); {prepare to echo the input} if last<>first then for k:=first to last-1 do print(buffer[k]); print_ln; incr(selector); {restore previous status} end; @* \[6] Reporting errors. When something anomalous is detected, \TeX\ typically does something like this: $$\vbox{\halign{#\hfil\cr |print_err("Something anomalous has been detected");|\cr |help3("This is the first line of my offer to help.")|\cr |("This is the second line. I'm trying to")|\cr |("explain the best way for you to proceed.");|\cr |error;|\cr}}$$ A two-line help message would be given using |help2|, etc.; these informal helps should use simple vocabulary that complements the words used in the official error message that was printed. (Outside the U.S.A., the help messages should preferably be translated into the local vernacular. Each line of help is at most 60 characters long, in the present implementation, so that |max_print_line| will not be exceeded.) The |print_err| procedure supplies a `\.!' before the official message, and makes sure that the terminal is awake if a stop is going to occur. The |error| procedure supplies a `\..' after the official message, then it shows the location of the error; and if |interaction=error_stop_mode|, it also enters into a dialog with the user, during which time the help message may be printed. @^system dependencies@> @ The global variable |interaction| has four settings, representing increasing amounts of user interaction: @d batch_mode=0 {omits all stops and omits terminal output} @d nonstop_mode=1 {omits all stops} @d scroll_mode=2 {omits error stops} @d error_stop_mode=3 {stops at every opportunity to interact} @d print_err(#)==begin if interaction=error_stop_mode then wake_up_terminal; print_nl("! "); print(#); end @= @!interaction:batch_mode..error_stop_mode; {current level of interaction} @ @=interaction:=error_stop_mode; @ \TeX\ is careful not to call |error| when the print |selector| setting might be unusual. The only possible values of |selector| at the time of error messages are \yskip\hang|no_print| (when |interaction=batch_mode| and |log_file| not yet open); \hang|term_only| (when |interaction>batch_mode| and |log_file| not yet open); \hang|log_only| (when |interaction=batch_mode| and |log_file| is open); \hang|term_and_log| (when |interaction>batch_mode| and |log_file| is open). @= if interaction=batch_mode then selector:=no_print@+else selector:=term_only @ A global variable |deletions_allowed| is set |false| if the |get_next| routine is active when |error| is called; this ensures that |get_next| and related routines like |get_token| will never be called recursively. A similar interlock is provided by |set_box_allowed|. @^recursion@> The global variable |history| records the worst level of error that has been detected. It has four possible values: |spotless|, |warning_issued|, |error_message_issued|, and |fatal_error_stop|. Another global variable, |error_count|, is increased by one when an |error| occurs without an interactive dialog, and it is reset to zero at the end of every paragraph. If |error_count| reaches 100, \TeX\ decides that there is no point in continuing further. @d spotless=0 {|history| value when nothing has been amiss yet} @d warning_issued=1 {|history| value when |begin_diagnostic| has been called} @d error_message_issued=2 {|history| value when |error| has been called} @d fatal_error_stop=3 {|history| value when termination was premature} @= @!deletions_allowed:boolean; {is it safe for |error| to call |get_token|?} @!set_box_allowed:boolean; {is it safe to do a \.{\\setbox} assignment?} @!history:spotless..fatal_error_stop; {has the source input been clean so far?} @!error_count:-1..100; {the number of scrolled errors since the last paragraph ended} @ The value of |history| is initially |fatal_error_stop|, but it will be changed to |spotless| if \TeX\ survives the initialization process. @= deletions_allowed:=true; set_box_allowed:=true; error_count:=0; {|history| is initialized elsewhere} @ Since errors can be detected almost anywhere in \TeX, we want to declare the error procedures near the beginning of the program. But the error procedures in turn use some other procedures, which need to be declared |forward| before we get to |error| itself. It is possible for |error| to be called recursively if some error arises when |get_token| is being used to delete a token, and/or if some fatal error occurs while \TeX\ is trying to fix a non-fatal one. But such recursion @^recursion@> is never more than two levels deep. @= procedure@?normalize_selector; forward;@t\2@>@/ procedure@?get_token; forward;@t\2@>@/ procedure@?term_input; forward;@t\2@>@/ procedure@?show_context; forward;@t\2@>@/ procedure@?begin_file_reading; forward;@t\2@>@/ procedure@?open_log_file; forward;@t\2@>@/ procedure@?close_files_and_terminate; forward;@t\2@>@/ procedure@?clear_for_error_prompt; forward;@t\2@>@/ procedure@?give_err_help; forward;@t\2@>@/ @t\4\hskip-\fontdimen2\font@>@;@+@!debug@+procedure@?debug_help; forward;@;@+gubed @ Individual lines of help are recorded in the array |help_line|, which contains entries in positions |0..(help_ptr-1)|. They should be printed in reverse order, i.e., with |help_line[0]| appearing last. @d hlp1(#)==help_line[0]:=#;@+end @d hlp2(#)==help_line[1]:=#; hlp1 @d hlp3(#)==help_line[2]:=#; hlp2 @d hlp4(#)==help_line[3]:=#; hlp3 @d hlp5(#)==help_line[4]:=#; hlp4 @d hlp6(#)==help_line[5]:=#; hlp5 @d help0==help_ptr:=0 {sometimes there might be no help} @d help1==@+begin help_ptr:=1; hlp1 {use this with one help line} @d help2==@+begin help_ptr:=2; hlp2 {use this with two help lines} @d help3==@+begin help_ptr:=3; hlp3 {use this with three help lines} @d help4==@+begin help_ptr:=4; hlp4 {use this with four help lines} @d help5==@+begin help_ptr:=5; hlp5 {use this with five help lines} @d help6==@+begin help_ptr:=6; hlp6 {use this with six help lines} @= @!help_line:array[0..5] of str_number; {helps for the next |error|} @!help_ptr:0..6; {the number of help lines present} @!use_err_help:boolean; {should the |err_help| list be shown?} @ @= help_ptr:=0; use_err_help:=false; @ The |jump_out| procedure just cuts across all active procedure levels and goes to |end_of_TEX|. This is the only nontrivial |@!goto| statement in the whole program. It is used when there is no recovery from a particular error. Some \PASCAL\ compilers do not implement non-local |goto| statements. @^system dependencies@> In such cases the body of |jump_out| should simply be `|close_files_and_terminate|;\thinspace' followed by a call on some system procedure that quietly terminates the program. @= procedure jump_out; begin goto end_of_TEX; end; @ Here now is the general |error| routine. @= procedure error; {completes the job of error reporting} label continue,exit; var c:ASCII_code; {what the user types} @!s1,@!s2,@!s3,@!s4:integer; {used to save global variables when deleting tokens} begin if history; incr(error_count); if error_count=100 then begin print_nl("(That makes 100 errors; please try again.)"); @.That makes 100 errors...@> history:=fatal_error_stop; jump_out; end; @; exit:end; @ @= loop@+begin continue: clear_for_error_prompt; prompt_input("? "); @.?\relax@> if last=first then return; c:=buffer[first]; if c>="a" then c:=c+"A"-"a"; {convert to uppercase} @; end @ It is desirable to provide an `\.E' option here that gives the user an easy way to return from \TeX\ to the system editor, with the offending line ready to be edited. But such an extension requires some system wizardry, so the present implementation simply types out the name of the file that should be edited and the relevant line number. @^system dependencies@> There is a secret `\.D' option available when the debugging routines haven't been commented~out. @^debugging@> @= case c of "0","1","2","3","4","5","6","7","8","9": if deletions_allowed then @; @t\4\4@>@;@+@!debug "D": begin debug_help; goto continue;@+end;@+gubed@/ "E": if base_ptr>0 then begin print_nl("You want to edit file "); @.You want to edit file x@> slow_print(input_stack[base_ptr].name_field); print(" at line "); print_int(line); interaction:=scroll_mode; jump_out; end; "H": @; "I":@; "Q","R","S":@; "X":begin interaction:=scroll_mode; jump_out; end; othercases do_nothing endcases;@/ @ @ @= begin print("Type to proceed, S to scroll future error messages,");@/ @.Type to proceed...@> print_nl("R to run without stopping, Q to run quietly,");@/ print_nl("I to insert something, "); if base_ptr>0 then print("E to edit your file,"); if deletions_allowed then print_nl("1 or ... or 9 to ignore the next 1 to 9 tokens of input,"); print_nl("H for help, X to quit."); end @ Here the author of \TeX\ apologizes for making use of the numerical relation between |"Q"|, |"R"|, |"S"|, and the desired interaction settings |batch_mode|, |nonstop_mode|, |scroll_mode|. @^Knuth, Donald Ervin@> @= begin error_count:=0; interaction:=batch_mode+c-"Q"; print("OK, entering "); case c of "Q":begin print_esc("batchmode"); decr(selector); end; "R":print_esc("nonstopmode"); "S":print_esc("scrollmode"); end; {there are no other cases} print("..."); print_ln; update_terminal; return; end @ When the following code is executed, |buffer[(first+1)..(last-1)]| may contain the material inserted by the user; otherwise another prompt will be given. In order to understand this part of the program fully, you need to be familiar with \TeX's input stacks. @= begin begin_file_reading; {enter a new syntactic level for terminal input} {now |state=mid_line|, so an initial blank space will count as a blank} if last>first+1 then begin loc:=first+1; buffer[first]:=" "; end else begin prompt_input("insert>"); loc:=first; @.insert>@> end; first:=last; cur_input.limit_field:=last-1; {no |end_line_char| ends this line} return; end @ We allow deletion of up to 99 tokens at a time. @= begin s1:=cur_tok; s2:=cur_cmd; s3:=cur_chr; s4:=align_state; align_state:=1000000; OK_to_interrupt:=false; if (last>first+1) and (buffer[first+1]>="0")and(buffer[first+1]<="9") then c:=c*10+buffer[first+1]-"0"*11 else c:=c-"0"; while c>0 do begin get_token; {one-level recursive call of |error| is possible} decr(c); end; cur_tok:=s1; cur_cmd:=s2; cur_chr:=s3; align_state:=s4; OK_to_interrupt:=true; help2("I have just deleted some text, as you asked.")@/ ("You can now delete more, or insert, or whatever."); show_context; goto continue; end @ @= begin if use_err_help then begin give_err_help; use_err_help:=false; end else begin if help_ptr=0 then help2("Sorry, I don't know how to help in this situation.")@/ @t\kern1em@>("Maybe you should try asking a human?"); repeat decr(help_ptr); print(help_line[help_ptr]); print_ln; until help_ptr=0; end; help4("Sorry, I already gave what help I could...")@/ ("Maybe you should try asking a human?")@/ ("An error might have occurred before I noticed any problems.")@/ ("``If all else fails, read the instructions.''");@/ goto continue; end @ @= if interaction>batch_mode then decr(selector); {avoid terminal output} if use_err_help then begin print_ln; give_err_help; end else while help_ptr>0 do begin decr(help_ptr); print_nl(help_line[help_ptr]); end; print_ln; if interaction>batch_mode then incr(selector); {re-enable terminal output} print_ln @ A dozen or so error messages end with a parenthesized integer, so we save a teeny bit of program space by declaring the following procedure: @p procedure int_error(@!n:integer); begin print(" ("); print_int(n); print_char(")"); error; end; @ In anomalous cases, the print selector might be in an unknown state; the following subroutine is called to fix things just enough to keep running a bit longer. @p procedure normalize_selector; begin if log_opened then selector:=term_and_log else selector:=term_only; if job_name=0 then open_log_file; if interaction=batch_mode then decr(selector); end; @ The following procedure prints \TeX's last words before dying. @d succumb==begin if interaction=error_stop_mode then interaction:=scroll_mode; {no more interaction} if log_opened then error; @!debug if interaction>batch_mode then debug_help;@+gubed@;@/ history:=fatal_error_stop; jump_out; {irrecoverable error} end @= procedure fatal_error(@!s:str_number); {prints |s|, and that's it} begin normalize_selector;@/ print_err("Emergency stop"); help1(s); succumb; @.Emergency stop@> end; @ Here is the most dreaded error message. @= procedure overflow(@!s:str_number;@!n:integer); {stop due to finiteness} begin normalize_selector; print_err("TeX capacity exceeded, sorry ["); @.TeX capacity exceeded ...@> print(s); print_char("="); print_int(n); print_char("]"); help2("If you really absolutely need more capacity,")@/ ("you can ask a wizard to enlarge me."); succumb; end; @ The program might sometime run completely amok, at which point there is no choice but to stop. If no previous error has been detected, that's bad news; a message is printed that is really intended for the \TeX\ maintenance person instead of the user (unless the user has been particularly diabolical). The index entries for `this can't happen' may help to pinpoint the problem. @^dry rot@> @= procedure confusion(@!s:str_number); {consistency check violated; |s| tells where} begin normalize_selector; if history help1("I'm broken. Please show this to someone who can fix can fix"); end else begin print_err("I can't go on meeting you like this"); @.I can't go on...@> help2("One of your faux pas seems to have wounded me deeply...")@/ ("in fact, I'm barely conscious. Please fix it and try again."); end; succumb; end; @ Users occasionally want to interrupt \TeX\ while it's running. If the \PASCAL\ runtime system allows this, one can implement a routine that sets the global variable |interrupt| to some nonzero value when such an interrupt is signalled. Otherwise there is probably at least a way to make |interrupt| nonzero using the \PASCAL\ debugger. @^system dependencies@> @^debugging@> @d check_interrupt==begin if interrupt<>0 then pause_for_instructions; end @= @!interrupt:integer; {should \TeX\ pause for instructions?} @!OK_to_interrupt:boolean; {should interrupts be observed?} @ @= interrupt:=0; OK_to_interrupt:=true; @ When an interrupt has been detected, the program goes into its highest interaction level and lets the user have nearly the full flexibility of the |error| routine. \TeX\ checks for interrupts only at times when it is safe to do this. @p procedure pause_for_instructions; begin if OK_to_interrupt then begin interaction:=error_stop_mode; if (selector=log_only)or(selector=no_print) then incr(selector); print_err("Interruption"); @.Interruption@> help3("You rang?")@/ ("Try to insert some instructions for me (e.g.,`I\showlists'),")@/ ("unless you just want to quit by typing `X'."); deletions_allowed:=false; error; deletions_allowed:=true; interrupt:=0; end; end; @* \[7] Arithmetic with scaled dimensions. The principal computations performed by \TeX\ are done entirely in terms of integers less than $2^{31}$ in magnitude; and divisions are done only when both dividend and divisor are nonnegative. Thus, the arithmetic specified in this program can be carried out in exactly the same way on a wide variety of computers, including some small ones. Why? Because the arithmetic calculations need to be spelled out precisely in order to guarantee that \TeX\ will produce identical output on different machines. If some quantities were rounded differently in different implementations, we would find that line breaks and even page breaks might occur in different places. Hence the arithmetic of \TeX\ has been designed with care, and systems that claim to be implementations of \TeX82 should follow precisely the @:TeX82}{\TeX82@> calculations as they appear in the present program. (Actually there are three places where \TeX\ uses |div| with a possibly negative numerator. These are harmless; see |div| in the index. Also if the user sets the \.{\\time} or the \.{\\year} to a negative value, some diagnostic information will involve negative-numerator division. The same remarks apply for |mod| as well as for |div|.) @ Here is a routine that calculates half of an integer, using an unambiguous convention with respect to signed odd numbers. @p function half(@!x:integer):integer; begin if odd(x) then half:=(x+1) div 2 else half:=x @!div 2; end; @ Fixed-point arithmetic is done on {\sl scaled integers\/} that are multiples of $2^{-16}$. In other words, a binary point is assumed to be sixteen bit positions from the right end of a binary computer word. @d unity == @'200000 {$2^{16}$, represents 1.00000} @d two == @'400000 {$2^{17}$, represents 2.00000} @= @!scaled = integer; {this type is used for scaled integers} @!nonnegative_integer=0..@'17777777777; {$0\L x<2^{31}$} @!small_number=0..63; {this type is self-explanatory} @ The following function is used to create a scaled integer from a given decimal fraction $(.d_0d_1\ldots d_{k-1})$, where |0<=k<=17|. The digit $d_i$ is given in |dig[i]|, and the calculation produces a correctly rounded result. @p function round_decimals(@!k:small_number) : scaled; {converts a decimal fraction} var a:integer; {the accumulator} begin a:=0; while k>0 do begin decr(k); a:=(a+dig[k]*two) div 10; end; round_decimals:=(a+1) div 2; end; @ Conversely, here is a procedure analogous to |print_int|. If the output of this procedure is subsequently read by \TeX\ and converted by the |round_decimals| routine above, it turns out that the original value will be reproduced exactly; the ``simplest'' such decimal number is output, but there is always at least one digit following the decimal point. The invariant relation in the \&{repeat} loop is that a sequence of decimal digits yet to be printed will yield the original number if and only if they form a fraction~$f$ in the range $s-\delta\L10\cdot2^{16}funity then s:=s+@'100000-50000; {round the last digit} print_char("0"+(s div unity)); s:=10*(s mod unity); delta:=delta*10; until s<=delta; end; @ Physical sizes that a \TeX\ user specifies for portions of documents are represented internally as scaled points. Thus, if we define an `sp' (scaled @^sp@> point) as a unit equal to $2^{-16}$ printer's points, every dimension inside of \TeX\ is an integer number of sp. There are exactly 4,736,286.72 sp per inch. Users are not allowed to specify dimensions larger than $2^{30}-1$ sp, which is a distance of about 18.892 feet (5.7583 meters); two such quantities can be added without overflow on a 32-bit computer. The present implementation of \TeX\ does not check for overflow when @^Overflow in arithmetic@> dimensions are added or subtracted. This could be done by inserting a few dozen tests of the form `\ignorespaces|if x>=@'10000000000 then @t\\{report\_overflow}@>|', but the chance of overflow is so remote that such tests do not seem worthwhile. \TeX\ needs to do only a few arithmetic operations on scaled quantities, other than addition and subtraction, and the following subroutines do most of the work. A single computation might use several subroutine calls, and it is desirable to avoid producing multiple error messages in case of arithmetic overflow; so the routines set the global variable |arith_error| to |true| instead of reporting errors directly to the user. Another global variable, |remainder|, holds the remainder after a division. @= @!arith_error:boolean; {has arithmetic overflow occurred recently?} @!remainder:scaled; {amount subtracted to get an exact division} @ The first arithmetical subroutine we need computes $nx+y$, where |x| and~|y| are |scaled| and |n| is an integer. We will also use it to multiply integers. @d nx_plus_y(#)==mult_and_add(#,@'7777777777) @d mult_integers(#)==mult_and_add(#,0,@'17777777777) @p function mult_and_add(@!n:integer;@!x,@!y,@!max_answer:scaled):scaled; begin if n<0 then begin negate(x); negate(n); end; if n=0 then mult_and_add:=y else if ((x<=(max_answer-y) div n)and(-x<=(max_answer+y) div n)) then mult_and_add:=n*x+y else begin arith_error:=true; mult_and_add:=0; end; end; @ We also need to divide scaled dimensions by integers. @p function x_over_n(@!x:scaled;@!n:integer):scaled; var negative:boolean; {should |remainder| be negated?} begin negative:=false; if n=0 then begin arith_error:=true; x_over_n:=0; remainder:=x; end else begin if n<0 then begin negate(x); negate(n); negative:=true; end; if x>=0 then begin x_over_n:=x div n; remainder:=x mod n; end else begin x_over_n:=-((-x) div n); remainder:=-((-x) mod n); end; end; if negative then negate(remainder); end; @ Then comes the multiplication of a scaled number by a fraction |n/d|, where |n| and |d| are nonnegative integers |<=@t$2^{16}$@>| and |d| is positive. It would be too dangerous to multiply by~|n| and then divide by~|d|, in separate operations, since overflow might well occur; and it would be too inaccurate to divide by |d| and then multiply by |n|. Hence this subroutine simulates 1.5-precision arithmetic. @p function xn_over_d(@!x:scaled; @!n,@!d:integer):scaled; var positive:boolean; {was |x>=0|?} @!t,@!u,@!v:nonnegative_integer; {intermediate quantities} begin if x>=0 then positive:=true else begin negate(x); positive:=false; end; t:=(x mod @'100000)*n; u:=(x div @'100000)*n+(t div @'100000); v:=(u mod d)*@'100000 + (t mod @'100000); if u div d>=@'100000 then arith_error:=true else u:=@'100000*(u div d) + (v div d); if positive then begin xn_over_d:=u; remainder:=v mod d; end else begin xn_over_d:=-u; remainder:=-(v mod d); end; end; @ The next subroutine is used to compute the ``badness'' of glue, when a total~|t| is supposed to be made from amounts that sum to~|s|. According to {\sl The \TeX book}, the badness of this situation is $100(t/s)^3$; however, badness is simply a heuristic, so we need not squeeze out the last drop of accuracy when computing it. All we really want is an approximation that has similar properties. @:TeXbook}{\sl The \TeX book@> The actual method used to compute the badness is easier to read from the program than to describe in words. It produces an integer value that is a reasonably close approximation to $100(t/s)^3$, and all implementations of \TeX\ should use precisely this method. Any badness of $2^{13}$ or more is treated as infinitely bad, and represented by 10000. It is not difficult to prove that $$\hbox{|badness(t+1,s)>=badness(t,s) >=badness(t,s+1)|}.$$ The badness function defined here is capable of computing at most 1095 distinct values, but that is plenty. @d inf_bad = 10000 {infinitely bad value} @p function badness(@!t,@!s:scaled):halfword; {compute badness, given |t>=0|} var r:integer; {approximation to $\alpha t/s$, where $\alpha^3\approx 100\cdot2^{18}$} begin if t=0 then badness:=0 else if s<=0 then badness:=inf_bad else begin if t<=7230584 then r:=(t*297) div s {$297^3=99.94\times2^{18}$} else if s>=1663497 then r:=t div (s div 297) else r:=t; if r>1290 then badness:=inf_bad {$1290^3<2^{31}<1291^3$} else badness:=(r*r*r+@'400000) div @'1000000; end; {that was $r^3/2^{18}$, rounded to the nearest integer} end; @ When \TeX\ ``packages'' a list into a box, it needs to calculate the proportionality ratio by which the glue inside the box should stretch or shrink. This calculation does not affect \TeX's decision making, so the precise details of rounding, etc., in the glue calculation are not of critical importance for the consistency of results on different computers. We shall use the type |glue_ratio| for such proportionality ratios. A glue ratio should take the same amount of memory as an |integer| (usually 32 bits) if it is to blend smoothly with \TeX's other data structures. Thus |glue_ratio| should be equivalent to |short_real| in some implementations of \PASCAL. Alternatively, it is possible to deal with glue ratios using nothing but fixed-point arithmetic; see {\sl TUGboat \bf3},1 (March 1982), 10--27. (But the routines cited there must be modified to allow negative glue ratios.) @^system dependencies@> @d set_glue_ratio_zero(#) == #:=0.0 {store the representation of zero ratio} @d set_glue_ratio_one(#) == #:=1.0 {store the representation of unit ratio} @d float(#) == # {convert from |glue_ratio| to type |real|} @d unfloat(#) == # {convert from |real| to type |glue_ratio|} @d float_constant(#) == #.0 {convert |integer| constant to |real|} @= @!glue_ratio=real; {one-word representation of a glue expansion factor} @* \[8] Packed data. In order to make efficient use of storage space, \TeX\ bases its major data structures on a |memory_word|, which contains either a (signed) integer, possibly scaled, or a (signed) |glue_ratio|, or a small number of fields that are one half or one quarter of the size used for storing integers. If |x| is a variable of type |memory_word|, it contains up to four fields that can be referred to as follows: $$\vbox{\halign{\hfil#&#\hfil&#\hfil\cr |x|&.|int|&(an |integer|)\cr |x|&.|sc|\qquad&(a |scaled| integer)\cr |x|&.|gr|&(a |glue_ratio|)\cr |x.hh.lh|, |x.hh|&.|rh|&(two halfword fields)\cr |x.hh.b0|, |x.hh.b1|, |x.hh|&.|rh|&(two quarterword fields, one halfword field)\cr |x.qqqq.b0|, |x.qqqq.b1|, |x.qqqq|&.|b2|, |x.qqqq.b3|\hskip-100pt &\qquad\qquad\qquad(four quarterword fields)\cr}}$$ This is somewhat cumbersome to write, and not very readable either, but macros will be used to make the notation shorter and more transparent. The \PASCAL\ code below gives a formal definition of |memory_word| and its subsidiary types, using packed variant records. \TeX\ makes no assumptions about the relative positions of the fields within a word. Since we are assuming 32-bit integers, a halfword must contain at least 16 bits, and a quarterword must contain at least 8 bits. @^system dependencies@> But it doesn't hurt to have more bits; for example, with enough 36-bit words you might be able to have |mem_max| as large as 262142, which is eight times as much memory as anybody had during the first four years of \TeX's existence. N.B.: Valuable memory space will be dreadfully wasted unless \TeX\ is compiled by a \PASCAL\ that packs all of the |memory_word| variants into the space of a single integer. This means, for example, that |glue_ratio| words should be |short_real| instead of |real| on some computers. Some \PASCAL\ compilers will pack an integer whose subrange is `|0..255|' into an eight-bit field, but others insist on allocating space for an additional sign bit; on such systems you can get 256 values into a quarterword only if the subrange is `|-128..127|'. The present implementation tries to accommodate as many variations as possible, so it makes few assumptions. If integers having the subrange `|min_quarterword..max_quarterword|' can be packed into a quarterword, and if integers having the subrange `|min_halfword..max_halfword|' can be packed into a halfword, everything should work satisfactorily. It is usually most efficient to have |min_quarterword=min_halfword=0|, so one should try to achieve this unless it causes a severe problem. The values defined here are recommended for most 32-bit computers. @d min_quarterword=0 {smallest allowable value in a |quarterword|} @d max_quarterword=255 {largest allowable value in a |quarterword|} @d min_halfword==0 {smallest allowable value in a |halfword|} @d max_halfword==65535 {largest allowable value in a |halfword|} @ Here are the inequalities that the quarterword and halfword values must satisfy (or rather, the inequalities that they mustn't satisfy): @= init if (mem_min<>mem_bot)or(mem_max<>mem_top) then bad:=10;@+tini@;@/ if (mem_min>mem_bot)or(mem_max0)or(max_quarterword<127) then bad:=11; if (min_halfword>0)or(max_halfword<32767) then bad:=12; if (min_quarterwordmax_halfword) then bad:=13; if (mem_min=max_halfword)or@| (mem_bot-mem_min>max_halfword+1) then bad:=14; if (font_basemax_quarterword) then bad:=15; if font_max>font_base+256 then bad:=16; if (save_size>max_halfword)or(max_strings>max_halfword) then bad:=17; if buf_size>max_halfword then bad:=18; if max_quarterword-min_quarterword<255 then bad:=19; @ The operation of adding or subtracting |min_quarterword| occurs quite frequently in \TeX, so it is convenient to abbreviate this operation by using the macros |qi| and |qo| for input and output to and from quarterword format. The inner loop of \TeX\ will run faster with respect to compilers that don't optimize expressions like `|x+0|' and `|x-0|', if these macros are simplified in the obvious way when |min_quarterword=0|. @^inner loop@>@^system dependencies@> @d qi(#)==#+min_quarterword {to put an |eight_bits| item into a quarterword} @d qo(#)==#-min_quarterword {to take an |eight_bits| item out of a quarterword} @d hi(#)==#+min_halfword {to put a sixteen-bit item into a halfword} @d ho(#)==#-min_halfword {to take a sixteen-bit item from a halfword} @ The reader should study the following definitions closely: @^system dependencies@> @d sc==int {|scaled| data is equivalent to |integer|} @= @!quarterword = min_quarterword..max_quarterword; {1/4 of a word} @!halfword=min_halfword..max_halfword; {1/2 of a word} @!two_choices = 1..2; {used when there are two variants in a record} @!four_choices = 1..4; {used when there are four variants in a record} @!two_halves = packed record@;@/ @!rh:halfword; case two_choices of 1: (@!lh:halfword); 2: (@!b0:quarterword; @!b1:quarterword); end; @!four_quarters = packed record@;@/ @!b0:quarterword; @!b1:quarterword; @!b2:quarterword; @!b3:quarterword; end; @!memory_word = record@;@/ case four_choices of 1: (@!int:integer); 2: (@!gr:glue_ratio); 3: (@!hh:two_halves); 4: (@!qqqq:four_quarters); end; @!word_file = file of memory_word; @ When debugging, we may want to print a |memory_word| without knowing what type it is; so we print it in all modes. @^dirty \PASCAL@>@^debugging@> @p @!debug procedure print_word(@!w:memory_word); {prints |w| in all ways} begin print_int(w.int); print_char(" ");@/ print_scaled(w.sc); print_char(" ");@/ print_scaled(round(unity*float(w.gr))); print_ln;@/ @^real multiplication@> print_int(w.hh.lh); print_char("="); print_int(w.hh.b0); print_char(":"); print_int(w.hh.b1); print_char(";"); print_int(w.hh.rh); print_char(" ");@/ print_int(w.qqqq.b0); print_char(":"); print_int(w.qqqq.b1); print_char(":"); print_int(w.qqqq.b2); print_char(":"); print_int(w.qqqq.b3); end; gubed @* \[9] Dynamic memory allocation. The \TeX\ system does nearly all of its own memory allocation, so that it can readily be transported into environments that do not have automatic facilities for strings, garbage collection, etc., and so that it can be in control of what error messages the user receives. The dynamic storage requirements of \TeX\ are handled by providing a large array |mem| in which consecutive blocks of words are used as nodes by the \TeX\ routines. Pointer variables are indices into this array, or into another array called |eqtb| that will be explained later. A pointer variable might also be a special flag that lies outside the bounds of |mem|, so we allow pointers to assume any |halfword| value. The minimum halfword value represents a null pointer. \TeX\ does not assume that |mem[null]| exists. @d pointer==halfword {a flag or a location in |mem| or |eqtb|} @d null==min_halfword {the null pointer} @= @!temp_ptr:pointer; {a pointer variable for occasional emergency use} @ The |mem| array is divided into two regions that are allocated separately, but the dividing line between these two regions is not fixed; they grow together until finding their ``natural'' size in a particular job. Locations less than or equal to |lo_mem_max| are used for storing variable-length records consisting of two or more words each. This region is maintained using an algorithm similar to the one described in exercise 2.5--19 of {\sl The Art of Computer Programming}. However, no size field appears in the allocated nodes; the program is responsible for knowing the relevant size when a node is freed. Locations greater than or equal to |hi_mem_min| are used for storing one-word records; a conventional \.{AVAIL} stack is used for allocation in this region. Locations of |mem| between |mem_bot| and |mem_top| may be dumped as part of preloaded format files, by the \.{INITEX} preprocessor. @.INITEX@> Production versions of \TeX\ may extend the memory at both ends in order to provide more space; locations between |mem_min| and |mem_bot| are always used for variable-size nodes, and locations between |mem_top| and |mem_max| are always used for single-word nodes. The key pointers that govern |mem| allocation have a prescribed order: $$\advance\thickmuskip-2mu \hbox{|null<=mem_min<=mem_bot= @!mem : array[mem_min..mem_max] of memory_word; {the big dynamic storage area} @!lo_mem_max : pointer; {the largest location of variable-size memory in use} @!hi_mem_min : pointer; {the smallest location of one-word memory in use} @ In order to study the memory requirements of particular applications, it is possible to prepare a version of \TeX\ that keeps track of current and maximum memory usage. When code between the delimiters |@!stat| $\ldots$ |tats| is not ``commented out,'' \TeX\ will run a bit slower but it will report these statistics when |tracing_stats| is sufficiently large. @= @!var_used, @!dyn_used : integer; {how much memory is in use} @ Let's consider the one-word memory region first, since it's the simplest. The pointer variable |mem_end| holds the highest-numbered location of |mem| that has ever been used. The free locations of |mem| that occur between |hi_mem_min| and |mem_end|, inclusive, are of type |two_halves|, and we write |info(p)| and |link(p)| for the |lh| and |rh| fields of |mem[p]| when it is of this type. The single-word free locations form a linked list $$|avail|,\;\hbox{|link(avail)|},\;\hbox{|link(link(avail))|},\;\ldots$$ terminated by |null|. @d link(#) == mem[#].hh.rh {the |link| field of a memory word} @d info(#) == mem[#].hh.lh {the |info| field of a memory word} @= @!avail : pointer; {head of the list of available one-word nodes} @!mem_end : pointer; {the last one-word node used in |mem|} @ If memory is exhausted, it might mean that the user has forgotten a right brace. We will define some procedures later that try to help pinpoint the trouble. @p @@/ @ @ The function |get_avail| returns a pointer to a new one-word node whose |link| field is null. However, \TeX\ will halt if there is no more room left. @^inner loop@> If the available-space list is empty, i.e., if |avail=null|, we try first to increase |mem_end|. If that cannot be done, i.e., if |mem_end=mem_max|, we try to decrease |hi_mem_min|. If that cannot be done, i.e., if |hi_mem_min=lo_mem_max+1|, we have to quit. @p function get_avail : pointer; {single-word node allocation} var p:pointer; {the new node being got} begin p:=avail; {get top location in the |avail| stack} if p<>null then avail:=link(avail) {and pop it off} else if mem_end end; end; link(p):=null; {provide an oft-desired initialization of the new node} @!stat incr(dyn_used);@+tats@;{maintain statistics} get_avail:=p; end; @ Conversely, a one-word node is recycled by calling |free_avail|. This routine is part of \TeX's ``inner loop,'' so we want it to be fast. @^inner loop@> @d free_avail(#)== {single-word node liberation} begin link(#):=avail; avail:=#; @!stat decr(dyn_used);@+tats@/ end @ There's also a |fast_get_avail| routine, which saves the procedure-call overhead at the expense of extra programming. This routine is used in the places that would otherwise account for the most calls of |get_avail|. @^inner loop@> @d fast_get_avail(#)==@t@>@;@/ begin #:=avail; {avoid |get_avail| if possible, to save time} if #=null then #:=get_avail else begin avail:=link(#); link(#):=null; @!stat incr(dyn_used);@+tats@/ end; end @ The procedure |flush_list(p)| frees an entire linked list of one-word nodes that starts at position |p|. @^inner loop@> @p procedure flush_list(@!p:pointer); {makes list of single-word nodes available} var @!q,@!r:pointer; {list traversers} begin if p<>null then begin r:=p; repeat q:=r; r:=link(r); @!stat decr(dyn_used);@+tats@/ until r=null; {now |q| is the last node on the list} link(q):=avail; avail:=p; end; end; @ The available-space list that keeps track of the variable-size portion of |mem| is a nonempty, doubly-linked circular list of empty nodes, pointed to by the roving pointer |rover|. Each empty node has size 2 or more; the first word contains the special value |max_halfword| in its |link| field and the size in its |info| field; the second word contains the two pointers for double linking. Each nonempty node also has size 2 or more. Its first word is of type |two_halves|\kern-1pt, and its |link| field is never equal to |max_halfword|. Otherwise there is complete flexibility with respect to the contents of its other fields and its other words. (We require |mem_max= @!rover : pointer; {points to some node in the list of empties} @ A call to |get_node| with argument |s| returns a pointer to a new node of size~|s|, which must be 2~or more. The |link| field of the first word of this new node is set to null. An overflow stop occurs if no suitable space exists. If |get_node| is called with $s=2^{30}$, it simply merges adjacent free areas and returns the value |max_halfword|. @p function get_node(@!s:integer):pointer; {variable-size node allocation} label found,exit,restart; var p:pointer; {the node currently under inspection} @!q:pointer; {the node physically after node |p|} @!r:integer; {the newly allocated node, or a candidate for this honor} @!t:integer; {temporary register} begin restart: p:=rover; {start at some free node in the ring} repeat @; @^inner loop@> p:=rlink(p); {move to the next node in the ring} until p=rover; {repeat until the whole list has been traversed} if s=@'10000000000 then begin get_node:=max_halfword; return; end; if lo_mem_max+2; overflow("main memory size",mem_max+1-mem_min); {sorry, nothing satisfactory is left} @:TeX capacity exceeded main memory size}{\quad main memory size@> found: link(r):=null; {this node is now nonempty} @!stat var_used:=var_used+s; {maintain usage statistics} tats@;@/ get_node:=r; exit:end; @ The lower part of |mem| grows by 1000 words at a time, unless we are very close to going under. When it grows, we simply link a new node into the available-space list. This method of controlled growth helps to keep the |mem| usage consecutive when \TeX\ is implemented on ``virtual memory'' systems. @^virtual memory@> @= begin if hi_mem_min-lo_mem_max>=1998 then t:=lo_mem_max+1000 else t:=lo_mem_max+1+(hi_mem_min-lo_mem_max) div 2; {|lo_mem_max+2<=tmem_bot+max_halfword then t:=mem_bot+max_halfword; rlink(q):=rover; llink(q):=p; link(q):=empty_flag; node_size(q):=t-lo_mem_max;@/ lo_mem_max:=t; link(lo_mem_max):=null; info(lo_mem_max):=null; rover:=q; goto restart; end @ Empirical tests show that the routine in this section performs a node-merging operation about 0.75 times per allocation, on the average, after which it finds that |r>p+1| about 95\pct! of the time. @= q:=p+node_size(p); {find the physical successor} @^inner loop@> while is_empty(q) do {merge node |p| with node |q|} begin t:=rlink(q); if q=rover then rover:=t; llink(t):=llink(q); rlink(llink(q)):=t;@/ q:=q+node_size(q); end; r:=q-s; if r>p+1 then @; if r=p then if rlink(p)<>p then @; node_size(p):=q-p {reset the size in case it grew} @ @= begin node_size(p):=r-p; {store the remaining size} @^inner loop@> rover:=p; {start searching here next time} goto found; end @ Here we delete node |p| from the ring, and let |rover| rove around. @= begin rover:=rlink(p); t:=llink(p); llink(rover):=t; rlink(t):=rover; goto found; end @ Conversely, when some variable-size node |p| of size |s| is no longer needed, the operation |free_node(p,s)| will make its words available, by inserting |p| as a new empty node just before where |rover| now points. @^inner loop@> @p procedure free_node(@!p:pointer; @!s:halfword); {variable-size node liberation} var q:pointer; {|llink(rover)|} begin node_size(p):=s; link(p):=empty_flag; q:=llink(rover); llink(p):=q; rlink(p):=rover; {set both links} llink(rover):=p; rlink(q):=p; {insert |p| into the ring} @!stat var_used:=var_used-s;@+tats@;{maintain statistics} end; @ Just before \.{INITEX} writes out the memory, it sorts the doubly linked available space list. The list is probably very short at such times, so a simple insertion sort is used. The smallest available location will be pointed to by |rover|, the next-smallest by |rlink(rover)|, etc. @p @!init procedure sort_avail; {sorts the available variable-size nodes by location} var p,@!q,@!r: pointer; {indices into |mem|} @!old_rover:pointer; {initial |rover| setting} begin p:=get_node(@'10000000000); {merge adjacent free areas} p:=rlink(rover); rlink(rover):=max_halfword; old_rover:=rover; while p<>old_rover do @; p:=rover; while rlink(p)<>max_halfword do begin llink(rlink(p)):=p; p:=rlink(p); end; rlink(p):=rover; llink(rover):=p; end; tini @ The following |while| loop is guaranteed to terminate, since the list that starts at |rover| ends with |max_halfword| during the sorting procedure. @= if p@^Chinese characters@>@^Japanese characters@> and styles of type. It is suggested that Chinese and Japanese fonts be handled by representing such characters in two consecutive |char_node| entries: The first of these has |font=font_base|, and its |link| points to the second; the second identifies the font and the character dimensions. The saving feature about oriental characters is that most of them have the same box dimensions. The |character| field of the first |char_node| is a ``\\{charext}'' that distinguishes between graphic symbols whose dimensions are identical for typesetting purposes. (See the \MF\ manual.) Such an extension of \TeX\ would not be difficult; further details are left to the reader. In order to make sure that the |character| code fits in a quarterword, \TeX\ adds the quantity |min_quarterword| to the actual code. Character nodes appear only in horizontal lists, never in vertical lists. @d is_char_node(#) == (#>=hi_mem_min) {does the argument point to a |char_node|?} @d font == type {the font code in a |char_node|} @d character == subtype {the character code in a |char_node|} @ An |hlist_node| stands for a box that was made from a horizontal list. Each |hlist_node| is seven words long, and contains the following fields (in addition to the mandatory |type| and |link|, which we shall not mention explicitly when discussing the other node types): The |height| and |width| and |depth| are scaled integers denoting the dimensions of the box. There is also a |shift_amount| field, a scaled integer indicating how much this box should be lowered (if it appears in a horizontal list), or how much it should be moved to the right (if it appears in a vertical list). There is a |list_ptr| field, which points to the beginning of the list from which this box was fabricated; if |list_ptr| is |null|, the box is empty. Finally, there are three fields that represent the setting of the glue: |glue_set(p)| is a word of type |glue_ratio| that represents the proportionality constant for glue setting; |glue_sign(p)| is |stretching| or |shrinking| or |normal| depending on whether or not the glue should stretch or shrink or remain rigid; and |glue_order(p)| specifies the order of infinity to which glue setting applies (|normal|, |fil|, |fill|, or |filll|). The |subtype| field is not used. @d hlist_node=0 {|type| of hlist nodes} @d box_node_size=7 {number of words to allocate for a box node} @d width_offset=1 {position of |width| field in a box node} @d depth_offset=2 {position of |depth| field in a box node} @d height_offset=3 {position of |height| field in a box node} @d width(#) == mem[#+width_offset].sc {width of the box, in sp} @d depth(#) == mem[#+depth_offset].sc {depth of the box, in sp} @d height(#) == mem[#+height_offset].sc {height of the box, in sp} @d shift_amount(#) == mem[#+4].sc {repositioning distance, in sp} @d list_offset=5 {position of |list_ptr| field in a box node} @d list_ptr(#) == link(#+list_offset) {beginning of the list inside the box} @d glue_order(#) == subtype(#+list_offset) {applicable order of infinity} @d glue_sign(#) == type(#+list_offset) {stretching or shrinking} @d normal=0 {the most common case when several cases are named} @d stretching = 1 {glue setting applies to the stretch components} @d shrinking = 2 {glue setting applies to the shrink components} @d glue_offset = 6 {position of |glue_set| in a box node} @d glue_set(#) == mem[#+glue_offset].gr {a word of type |glue_ratio| for glue setting} @ The |new_null_box| function returns a pointer to an |hlist_node| in which all subfields have the values corresponding to `\.{\\hbox\{\}}'. The |subtype| field is set to |min_quarterword|, since that's the desired |span_count| value if this |hlist_node| is changed to an |unset_node|. @p function new_null_box:pointer; {creates a new box node} var p:pointer; {the new node} begin p:=get_node(box_node_size); type(p):=hlist_node; subtype(p):=min_quarterword; width(p):=0; depth(p):=0; height(p):=0; shift_amount(p):=0; list_ptr(p):=null; glue_sign(p):=normal; glue_order(p):=normal; set_glue_ratio_zero(glue_set(p)); new_null_box:=p; end; @ A |vlist_node| is like an |hlist_node| in all respects except that it contains a vertical list. @d vlist_node=1 {|type| of vlist nodes} @ A |rule_node| stands for a solid black rectangle; it has |width|, |depth|, and |height| fields just as in an |hlist_node|. However, if any of these dimensions is $-2^{30}$, the actual value will be determined by running the rule up to the boundary of the innermost enclosing box. This is called a ``running dimension.'' The |width| is never running in an hlist; the |height| and |depth| are never running in a~vlist. @d rule_node=2 {|type| of rule nodes} @d rule_node_size=4 {number of words to allocate for a rule node} @d null_flag==-@'10000000000 {$-2^{30}$, signifies a missing item} @d is_running(#) == (#=null_flag) {tests for a running dimension} @ A new rule node is delivered by the |new_rule| function. It makes all the dimensions ``running,'' so you have to change the ones that are not allowed to run. @p function new_rule:pointer; var p:pointer; {the new node} begin p:=get_node(rule_node_size); type(p):=rule_node; subtype(p):=0; {the |subtype| is not used} width(p):=null_flag; depth(p):=null_flag; height(p):=null_flag; new_rule:=p; end; @ Insertions are represented by |ins_node| records, where the |subtype| indicates the corresponding box number. For example, `\.{\\insert 250}' leads to an |ins_node| whose |subtype| is |250+min_quarterword|. The |height| field of an |ins_node| is slightly misnamed; it actually holds the natural height plus depth of the vertical list being inserted. The |depth| field holds the |split_max_depth| to be used in case this insertion is split, and the |split_top_ptr| points to the corresponding |split_top_skip|. The |float_cost| field holds the |floating_penalty| that will be used if this insertion floats to a subsequent page after a split insertion of the same class. There is one more field, the |ins_ptr|, which points to the beginning of the vlist for the insertion. @d ins_node=3 {|type| of insertion nodes} @d ins_node_size=5 {number of words to allocate for an insertion} @d float_cost(#)==mem[#+1].int {the |floating_penalty| to be used} @d ins_ptr(#)==info(#+4) {the vertical list to be inserted} @d split_top_ptr(#)==link(#+4) {the |split_top_skip| to be used} @ A |mark_node| has a |mark_ptr| field that points to the reference count of a token list that contains the user's \.{\\mark} text. This field occupies a full word instead of a halfword, because there's nothing to put in the other halfword; it is easier in \PASCAL\ to use the full word than to risk leaving garbage in the unused half. @d mark_node=4 {|type| of a mark node} @d small_node_size=2 {number of words to allocate for most node types} @d mark_ptr(#)==mem[#+1].int {head of the token list for a mark} @ An |adjust_node|, which occurs only in horizontal lists, specifies material that will be moved out into the surrounding vertical list; i.e., it is used to implement \TeX's `\.{\\vadjust}' operation. The |adjust_ptr| field points to the vlist containing this material. @d adjust_node=5 {|type| of an adjust node} @d adjust_ptr==mark_ptr {vertical list to be moved out of horizontal list} @ A |ligature_node|, which occurs only in horizontal lists, specifies a character that was fabricated from the interaction of two or more actual characters. The second word of the node, which is called the |lig_char| word, contains |font| and |character| fields just as in a |char_node|. The characters that generated the ligature have not been forgotten, since they are needed for diagnostic messages and for hyphenation; the |lig_ptr| field points to a linked list of character nodes for all original characters that have been deleted. (This list might be empty if the characters that generated the ligature were retained in other nodes.) The |subtype| field is 0, plus 2 and/or 1 if the original source of the ligature included implicit left and/or right boundaries. @d ligature_node=6 {|type| of a ligature node} @d lig_char(#)==#+1 {the word where the ligature is to be found} @d lig_ptr(#)==link(lig_char(#)) {the list of characters} @ The |new_ligature| function creates a ligature node having given contents of the |font|, |character|, and |lig_ptr| fields. We also have a |new_lig_item| function, which returns a two-word node having a given |character| field. Such nodes are used for temporary processing as ligatures are being created. @p function new_ligature(@!f,@!c:quarterword; @!q:pointer):pointer; var p:pointer; {the new node} begin p:=get_node(small_node_size); type(p):=ligature_node; font(lig_char(p)):=f; character(lig_char(p)):=c; lig_ptr(p):=q; subtype(p):=0; new_ligature:=p; end; @# function new_lig_item(@!c:quarterword):pointer; var p:pointer; {the new node} begin p:=get_node(small_node_size); character(p):=c; lig_ptr(p):=null; new_lig_item:=p; end; @ A |disc_node|, which occurs only in horizontal lists, specifies a ``dis\-cretion\-ary'' line break. If such a break occurs at node |p|, the text that starts at |pre_break(p)| will precede the break, the text that starts at |post_break(p)| will follow the break, and text that appears in the next |replace_count(p)| nodes will be ignored. For example, an ordi