% This program is copyright (C) 1985 by Oren Patashnik; all rights reserved. % Copying of this file is authorized only if (1) you are Oren Patashnik, or if % (2) you make absolutely no changes to your copy. (The WEB system provides % for alterations via an auxiliary file; the master file should stay intact.) % See Appendix H of the WEB manual for hints on how to install this program. % Version 0.98f was released in March 1985. % Version 0.98g was released in April; it removed some system dependencies % (introducing term_in and term_out in place of just tty, and removing % some nonlocal goto's) and it gave context for certain parsing errors. % Version 0.98h was released in April; it patched a bug in the output % line-breaking routine that can arise with some nonstandard style files. % Version 0.98i was released in May; its main change split up the main program % and some procedures to help certain compilers cope with size % limitations, among other things changing error and warning macros so % they'd produce (much) less inline code; it also redefined the class of % legal style-file identifiers---although this affects only the bizarre % ones, it makes BibTeX's error messages more coherent; and it had many % minor changes, including about a 15% speed-up on TOPS-20. % Version 0.99a was released in January 1988. Its main changes: allowed the % inclusion of entire .bib files (rather than just those entries % \cited or \nocited); made the sorting algorithm stable; eliminated % any case conversion for file names; allowed concatenation in database % fields and string definitions; handled hyphenated names properly; % handled accented characters properly; implemented new empty$, % preamble$, text.length$, text.prefix$, and warning$ built-in functions; % allowed a new cross-referencing feature; and made many minor fixes, % including about a 40% speed-up on TOPS-20. % Version 0.99b was released in February 1988. 
It changed text.length$ and % text.prefix$ to not count braces as text characters, and it changed % text.prefix$ to add any necessary matching right braces. % Version 0.99c was released in February 1988. It removed two begin-end pairs % that, for convention only, surrounded entire modules, but that elicited % label-related complaints from some compilers. % Please report any bugs to Oren Patashnik (PATASHNIK@@SCORE.STANFORD.EDU) % Although considerable effort has been expended to make the BibTeX program % correct and reliable, no warranty is implied; the author disclaims any % obligation or liability for damages, including but not limited to % special, indirect, or consequential damages arising out of or in % connection with the use or performance of this software. % This program was written by Oren Patashnik, in consultation with Leslie % Lamport, to be used with Lamport's LaTeX document preparation system. % Some modules were taken from Knuth's TeX and TeXware with his permission. % Here is TeX material that gets inserted after \input webmac \def\hang{\hangindent 3em\indent\ignorespaces} \font\ninerm=cmr9 \let\mc=\ninerm % medium caps for names like PASCAL \def\PASCAL{{\mc PASCAL}} \def\ph{{\mc PASCAL-H}} \def\<#1>{$\langle#1\rangle$} \def\section{\mathhexbox278} \def\(#1){} % this is used to make section names sort themselves better \def\9#1{} % this is used for sort keys in the index via @@:sort key}{entry@@> % Note: WEAVE will typeset an upper-case `E' in a PASCAL identifier a % bit strangely so that the `TeX' in the name of this program is typeset % correctly; if this becomes a problem remove these three lines to get % normal upper-case `E's in PASCAL identifiers \def\drop{\kern-.1667em\lower.5ex\hbox{E}\kern-.125em} % middle of TeX \catcode`E=13 \uppercase{\def E{e}} \def\\#1{\hbox{\let E=\drop\it#1\/\kern.05em}} % italic type for identifiers \font\sc=cmcsc10 \def\BibTeX{{\rm B\kern-.05em{\sc i\kern-.025em b}\kern-.08em 
T\kern-.1667em\lower.7ex\hbox{E}\kern-.125emX}} \def\LaTeX{{\rm L\kern-.36em\raise.3ex\hbox{\sc a}\kern-.15em T\kern-.1667em\lower.7ex\hbox{E}\kern-.125emX}} \def\title{\BibTeX\ } \def\today{\ifcase\month\or January\or February\or March\or April\or May\or June\or July\or August\or September\or October\or November\or December\fi \space\number\day, \number\year} \def\topofcontents{\null\vfill \def\titlepage{F} \centerline{\:\titlefont The {\:\ttitlefont \BibTeX} preprocessor} \vskip 15pt \centerline{(Version 0.99c---\today)} \vfill} \pageno=\contentspagenumber \advance\pageno by 1 @* Introduction. @^documentation@> @^space savings@> @^system dependencies@> @^wizard@> @!@:BibTeX}{\BibTeX@> @!@:BibTeX documentation}{\BibTeX\ documentation@> @:LaTeX}{\LaTeX@> \BibTeX\ is a preprocessor (with elements of postprocessing as explained below) for the \LaTeX\ document-preparation system. It handles most of the formatting decisions required to produce a reference list, outputting a \.{.bbl} file that a user can edit to add any finishing touches \BibTeX\ isn't designed to handle (in practice, such editing almost never is needed); with this file \LaTeX\ actually produces the reference list. Here's how \BibTeX\ works. It takes as input (a)~an \.{.aux} file produced by \LaTeX\ on an earlier run; (b)~a \.{.bst} file (the style file), which specifies the general reference-list style and specifies how to format individual entries, and which is written by a style designer (called a wizard throughout this program) in a special-purpose language described in the \BibTeX\ documentation---see the file {\.{btxdoc.tex}}; and (c)~\.{.bib} file(s) constituting a database of all reference-list entries the user might ever hope to use. 
\BibTeX\ chooses from the \.{.bib} file(s) only those entries specified by the \.{.aux} file (that is, those given by \LaTeX's \.{\\cite} or \.{\\nocite} commands), and creates as output a \.{.bbl} file containing these entries together with the formatting commands specified by the \.{.bst} file (\BibTeX\ also creates a \.{.blg} log file, which includes any error or warning messages, but this file isn't used by any program). \LaTeX\ will use the \.{.bbl} file, perhaps edited by the user, to produce the reference list. Many modules of \BibTeX\ were taken from Knuth's \TeX\ and \TeX ware, with his permission. All known system-dependent modules are marked in the index entry ``system dependencies''; Dave Fuchs helped exorcise unwanted ones. In addition, a few modules that can be changed to make \BibTeX\ smaller are marked in the index entry ``space savings''. Megathanks to Howard Trickey, for whose suggestions future users and style writers would be eternally grateful, if only they knew. The |banner| string defined here should be changed whenever \BibTeX\ gets modified. @d banner=='This is BibTeX, Version 0.99c' {printed when the program starts} @ @^system dependencies@> Terminal output goes to the file |term_out|, while terminal input comes from |term_in|. On our system, these (system-dependent) files are already opened at the beginning of the program, and have the same real name. @d term_out == tty @d term_in == tty @ @^system dependencies@> This program uses the term |print| instead of |write| when writing on both the |log_file| and (system-dependent) |term_out| file, and it uses |trace_pr| when in |trace| mode, for which it writes on just the |log_file|. If you want to change where either set of macros writes to, you should also change the other macros in this program for that set; each such macro begins with |print_| or |trace_pr_|. 
@d print(#) == begin write(log_file,#); write(term_out,#); end @d print_ln(#) == begin write_ln(log_file,#); write_ln(term_out,#); end @d print_newline == print_a_newline {making this a procedure saves a little space} @# @d trace_pr(#) == begin write(log_file,#); end @d trace_pr_ln(#) == begin write_ln(log_file,#); end @d trace_pr_newline == begin write_ln(log_file); end @= procedure print_a_newline; begin write_ln(log_file); write_ln(term_out); end; @ @^debugging@> @^statistics@> Some of the code below is intended to be used only when diagnosing the strange behavior that sometimes occurs when \BibTeX\ is being installed or when system wizards are fooling around with \BibTeX\ without quite knowing what they are doing. Such code will not normally be compiled; it is delimited by the codewords `$|debug|\ldots|gubed|$', with apologies to people who wish to preserve the purity of English. Similarly, there is some conditional code delimited by `$|stat|\ldots|tats|$' that is intended only for use when statistics are to be kept about \BibTeX's memory/cpu usage, and there is conditional code delimited by `$|trace|\ldots|ecart|$' that is intended to be a trace facility for use mainly when debugging \.{.bst} files. @d debug == @{ { remove the `|@{|' when debugging } @d gubed == @t@>@} { remove the `|@}|' when debugging } @f debug == begin @f gubed == end @# @d stat == @{ { remove the `|@{|' when keeping statistics } @d tats == @t@>@} { remove the `|@}|' when keeping statistics } @f stat == begin @f tats == end @# @d trace == @{ { remove the `|@{|' when in |trace| mode } @d ecart == @t@>@} { remove the `|@}|' when in |trace| mode } @f trace == begin @f ecart == end @ @^system dependencies@> We assume that |case| statements may include a default case that applies if no matching label is found, since most \PASCAL\ compilers have plugged this hole in the language by incorporating some sort of default mechanism. 
For example, the \ph\ compiler allows `|others|:' as a default label, and other \PASCAL s allow syntaxes like `\ignorespaces|else|\unskip' or `\\{otherwise}' or `\\{otherwise}:', etc. The definitions of |othercases| and |endcases| should be changed to agree with local conventions. Note that no semicolon appears before |endcases| in this program, so the definition of |endcases| should include a semicolon if the compiler wants one. (Of course, if no default mechanism is available, the |case| statements of \BibTeX\ will have to be laboriously extended by listing all remaining cases. People who are stuck with such \PASCAL s have in fact done this, successfully but not happily!) @d othercases == others: {default for cases not listed explicitly} @d endcases == @+end {follows the default case in an extended |case| statement} @f othercases == else @f endcases == end @ Labels are given symbolic names by the following definitions, so that occasional |goto| statements will be meaningful. We insert the label `|exit|:' just before the `\ignorespaces|end|\unskip' of a procedure in which we have used the `|return|' statement defined below (and this is the only place `|exit|:' appears). This label is sometimes used for exiting loops that are set up with the |loop| construction defined below. Another generic label is `|loop_exit|:'; it appears immediately after a loop. Incidentally, this program never declares a label that isn't actually used, because some fussy \PASCAL\ compilers will complain about redundant labels. 
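As an illustration only, here is how the |loop| and |loop_exit| convention might look in C. It uses a hypothetical procedure |skip_white| that advances past blanks and tabs in a buffer. In C one would ordinarily write |break|, but the explicit |goto| mirrors the labels of the \PASCAL\ code.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of the first character that is neither a blank nor a
   tab, or len if there is none (hypothetical helper). */
static size_t skip_white(const char *buf, size_t len)
{
    size_t pos = 0;
    assert(buf != NULL);
    for (;;) {                      /* loop == while true do */
        if (pos >= len)
            goto loop_exit;
        if (buf[pos] != ' ' && buf[pos] != '\t')
            goto loop_exit;
        pos++;                      /* incr(pos) */
    }
loop_exit:                          /* label immediately after the loop */
    return pos;
}
```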
@d exit=10 {go here to leave a procedure} @d loop_exit=15 {go here to leave a loop within a procedure} @d loop1_exit=16 {the first generic label for a procedure with two} @d loop2_exit=17 {the second} @ @^for loops@> And |while| we're discussing loops: This program makes into |while| loops many that would otherwise be |for| loops because of Standard \PASCAL\ limitations (it's a bit complicated---standard \PASCAL\ doesn't allow a global variable as the index of a |for| loop inside a procedure; furthermore, many compilers have fairly severe limitations on the size of a block, including the main block of the program; so most of the code in this program occurs inside procedures, and since for other reasons this program must use primarily global variables, it doesn't use many |for| loops). @ @^program conventions@> This program uses this convention: If there are several quantities in a boolean expression, they are ordered by expected frequency (except perhaps when an error message results) so that execution will be fastest; this is more an attempt to understand the program than to make it faster. @ Here are some macros for common programming idioms. @d incr(#) == #:=#+1 {increase a variable by unity} @d decr(#) == #:=#-1 {decrease a variable by unity} @d loop == @+ while true do@+ {repeat over and over until a |goto| happens} @f loop == xclause {\.{WEB}'s |xclause| acts like `\ignorespaces|while true do|\unskip'} @d do_nothing == {empty statement} @d return == goto exit {terminate a procedure call} @f return == nil @d empty=0 {symbolic name for a null constant} @d any_value=0 {this appeases \PASCAL's boolean-evaluation scheme} @* The main program. @^system dependencies@> @:LaTeX}{\LaTeX@> This program first reads the \.{.aux} file that \LaTeX\ produces, (\romannumeral1) determining which \.{.bib} file(s) and \.{.bst} file to read and (\romannumeral2) constructing a list of cite keys in order of occurrence. The \.{.aux} file may have other \.{.aux} files nested within. 
Second, it reads and executes the \.{.bst} file, (\romannumeral1) determining how and in which order to process the database entries in the \.{.bib} file(s) corresponding to those cite keys in the list (or in some cases, to all the entries in the \.{.bib} file(s)), (\romannumeral2) determining what text is to be output for each entry and determining any additional text to be output, and (\romannumeral3) actually outputting this text to the \.{.bbl} file. In addition, the program sends error messages and other remarks to the |log_file| and terminal. @d close_up_shop=9998 {jump here after fatal errors} @d exit_program=9999 {jump here if we couldn't even get started} @p @t\4@>@@/ program BibTEX; {all files are opened dynamically} label close_up_shop,@!exit_program @; const @ type @ var @@; @@; @ @# begin initialize; print_ln(banner);@/ @; @; close_up_shop: @; exit_program: end. @ @^overflow in arithmetic@> @^system dependencies@> If the first character of a \PASCAL\ comment is a dollar sign, \ph\ treats the comment as a list of ``compiler directives'' that will affect the translation of this program into machine language. The directives shown below specify full checking and inclusion of the \PASCAL\ debugger when \BibTeX\ is being debugged, but they cause range checking and other redundant code to be eliminated when the production system is being generated. Arithmetic overflow will be detected in all cases. @= @{@&$C-,A+,D-@} {no range check, catch arithmetic overflow, no debug overhead} @!debug @{@&$C+,D+@}@+ gubed {but turn everything on when debugging} @ @^bottom up@> @^gymnastics@> @^mooning@> All procedures in this program (except for |initialize|) are grouped into one of the seven classes below, and these classes are dispersed throughout the program. However: Much of this program is written top down, yet \PASCAL\ wants its procedures bottom up.
Since mooning is neither a technically nor a socially acceptable solution to the bottom-up problem, this section instead performs the topological gymnastics that \.{WEB} allows, ordering these classes to satisfy \PASCAL\ compilers. There are a few procedures still out of place after this ordering, though, and the other modules that complete the task have ``gymnastics'' as an index entry. @= @@; @@; @@; @@; @@; @@; @ @ This procedure gets things started properly. @= procedure initialize; var @ begin @; if (bad > 0) then begin write_ln (term_out,bad:0,' is a bad bad'); goto exit_program; end; @; pre_def_certain_strings;@/ get_the_top_level_aux_file_name; end; @ @^space savings@> @^system dependencies@> These parameters can be changed at compile time to extend or reduce \BibTeX's capacity. They are set to accommodate about 750 cites when used with the standard styles, although |pool_size| is usually the first limitation to be a problem, often when there are 500 cites. @= @!buf_size=1000; {maximum number of characters in an input line (or string)} @!min_print_line=3; {minimum \.{.bbl} line length: must be |>=3|} @!max_print_line=79; {the maximum: must be |>min_print_line| and | @^system dependencies@> These parameters can also be changed at compile time, but they're needed to define some \.{WEB} numeric macros so they must be so defined themselves. @d hash_size=5000 {must be |>= max_strings| and |>= hash_prime|} @d hash_prime=4253 {a prime number about 85\% of |hash_size| and |>= 128| and |< @t$2^{14}-2^6$@>|} @d file_name_size=40 {file names shouldn't be longer than this} @d max_glob_strs=10 {maximum number of |str_global_var| names} @d max_glb_str_minus_1 = max_glob_strs-1 {to avoid wasting a |str_global_var|} @ In case somebody has inadvertently made bad settings of the ``constants,'' \BibTeX\ checks them using a global variable called |bad|. This is the first of many sections of \BibTeX\ where global variables are defined. 
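The digit encoding used for |bad| can be sketched in C. The structure |bib_params| and the function |check_constants| are hypothetical. The comparisons mirror the \PASCAL\ checks, and each failed check appends its number to |bad| as a decimal digit (check 11 appends two digits).

```c
#include <assert.h>

/* Hypothetical bundle of the compile-time "constants" that are checked;
   field names mirror the WEB identifiers. */
struct bib_params {
    int buf_size, min_print_line, max_print_line;
    int hash_size, hash_prime, max_strings, max_cites;
    int ent_str_size, glob_str_size;
};

/* Returns 0 if the settings are consistent; otherwise the decimal digits
   of the result list, in order, the numbers of the failed checks. */
static int check_constants(const struct bib_params *p)
{
    int bad = 0;
    assert(p != 0);
    if (p->min_print_line < 3) bad = 1;
    if (p->max_print_line <= p->min_print_line) bad = 10 * bad + 2;
    if (p->max_print_line >= p->buf_size) bad = 10 * bad + 3;
    if (p->hash_prime < 128) bad = 10 * bad + 4;
    if (p->hash_prime > p->hash_size) bad = 10 * bad + 5;
    if (p->hash_prime >= (16384 - 64)) bad = 10 * bad + 6;
    if (p->max_strings > p->hash_size) bad = 10 * bad + 7;
    if (p->max_cites > p->max_strings) bad = 10 * bad + 8;
    if (p->ent_str_size > p->buf_size) bad = 10 * bad + 9;
    if (p->glob_str_size > p->buf_size) bad = 100 * bad + 11; /* two digits */
    return bad;
}
```

For instance, setting both |min_print_line| and |max_print_line| to 2 fails checks 1 and 2, giving |bad=12|. An oversized |glob_str_size| alone gives 11.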
@= @!bad:integer; {is some ``constant'' wrong?} @ Each digit-value of |bad| has a specific meaning. @= bad := 0; if (min_print_line < 3) then bad:=1; if (max_print_line <= min_print_line) then bad:=10*bad+2; if (max_print_line >= buf_size) then bad:=10*bad+3; if (hash_prime < 128) then bad:=10*bad+4; if (hash_prime > hash_size) then bad:=10*bad+5; if (hash_prime >= (16384-64)) then bad:=10*bad+6; if (max_strings > hash_size) then bad:=10*bad+7; if (max_cites > max_strings) then bad:=10*bad+8; if (ent_str_size > buf_size) then bad:=10*bad+9; if (glob_str_size > buf_size) then bad:=100*bad+11; {well, almost each} @ A global variable called |history| will contain one of four values at the end of every run: |spotless| means that no unusual messages were printed; |warning_message| means that a message of possible interest was printed but no serious errors were detected; |error_message| means that at least one error was found; |fatal_message| means that the program terminated abnormally. The value of |history| does not influence the behavior of the program; it is simply computed for the convenience of systems that might want to use such information. 
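The |history| bookkeeping is a small state machine. The following C sketch assumes file-scope variables in place of the \PASCAL\ globals. As in the \PASCAL\ code, |mark_error| increments |err_count| in its else branch even when |history| is already |fatal_message|.

```c
#include <assert.h>

enum run_history { SPOTLESS = 0, WARNING_MESSAGE = 1,
                   ERROR_MESSAGE = 2, FATAL_MESSAGE = 3 };

static enum run_history history = SPOTLESS; /* how bad was this run? */
static int err_count;  /* set when history first becomes warning or error */

static void mark_warning(void)
{
    if (history == WARNING_MESSAGE)
        err_count++;
    else if (history == SPOTLESS) {
        history = WARNING_MESSAGE;
        err_count = 1;
    }
    /* once an error has been seen, warnings are no longer counted */
}

static void mark_error(void)
{
    if (history < ERROR_MESSAGE) {
        history = ERROR_MESSAGE;
        err_count = 1;
    } else {
        err_count++;
    }
}

static void mark_fatal(void)
{
    history = FATAL_MESSAGE;
}
```

For example, two warnings leave |err_count=2|. A subsequent error resets it to 1, and later warnings no longer change it.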
@d spotless=0 {|history| value for normal jobs} @d warning_message=1 {|history| value when non-serious info was printed} @d error_message=2 {|history| value when an error was noted} @d fatal_message=3 {|history| value when we had to stop prematurely} @= procedure mark_warning; begin if (history = warning_message) then incr(err_count) else if (history = spotless) then begin history := warning_message; err_count := 1; end; end; @# procedure mark_error; begin if (history < error_message) then begin history := error_message; err_count := 1; end else {|history = error_message|} incr(err_count); end; @# procedure mark_fatal; begin history := fatal_message; end; @ For the two states |warning_message| and |error_message| we keep track of the number of messages given; but since |warning_message|s aren't so serious, we ignore them once we've seen an |error_message|. Hence we need just the single variable |err_count| to keep track. @= @!history:spotless..fatal_message; {how bad was this run?} @!err_count:integer; @ The |err_count| gets set or reset when |history| first changes to |warning_message| or |error_message|, so we don't need to initialize it. @= history := spotless; @* The character set. @^ASCII code@> (The following material is copied (almost) verbatim from \TeX. Thus, the same system-dependent changes should be made to both programs.) In order to make \TeX\ readily portable between a wide variety of computers, all of its input text is converted to an internal seven-bit code that is essentially standard ASCII, the ``American Standard Code for Information Interchange.'' This conversion is done immediately when each character is read in. Conversely, characters are converted from ASCII to the user's external representation just before they are output to a text file. Such an internal code is relevant to users of \TeX\ primarily because it governs the positions of characters in the fonts. 
For example, the character `\.A' has ASCII code $65=@'101$, and when \TeX\ typesets this letter it specifies character number 65 in the current font. If that font actually has `\.A' in a different position, \TeX\ doesn't know what the real position is; the program that does the actual printing from \TeX's device-independent files is responsible for converting from ASCII to a particular font encoding. \TeX's internal code is relevant also with respect to constants that begin with a reverse apostrophe. @ Characters of text that have been converted to \TeX's internal form are said to be of type |ASCII_code|, which is a subrange of the integers. @= @!ASCII_code=0..127; {seven-bit numbers} @ @^character set dependencies@> @^system dependencies@> The original \PASCAL\ compiler was designed in the late 60s, when six-bit character sets were common, so it did not make provision for lower-case letters. Nowadays, of course, we need to deal with both capital and small letters in a convenient way, especially in a program for typesetting; so the present specification of \TeX\ has been written under the assumption that the \PASCAL\ compiler and run-time system permit the use of text files with more than 64 distinguishable characters. More precisely, we assume that the character set contains at least the letters and symbols associated with ASCII codes @'40 through @'176; all of these characters are now available on most computer terminals. Since we are dealing with more characters than were present in the first \PASCAL\ compilers, we have to decide what to call the associated data type. Some \PASCAL s use the original name |char| for the characters in text files, even though there now are more than 64 such characters, while other \PASCAL s consider |char| to be a 64-element subrange of a larger data type that has some other name. 
In order to accommodate this difference, we shall use the name |text_char| to stand for the data type of the characters that are converted to and from |ASCII_code| when they are input and output. We shall also assume that |text_char| consists of the elements |chr(first_text_char)| through |chr(last_text_char)|, inclusive. The following definitions should be adjusted if necessary. @d text_char == char {the data type of characters in text files} @d first_text_char=0 {ordinal number of the smallest element of |text_char|} @d last_text_char=127 {ordinal number of the largest element of |text_char|} @= i:0..last_text_char; {this is the first one declared} @ The \TeX\ processor converts between ASCII code and the user's external character set by means of arrays |xord| and |xchr| that are analogous to \PASCAL's |ord| and |chr| functions. @= @!xord: array [text_char] of ASCII_code; {specifies conversion of input characters} @!xchr: array [ASCII_code] of text_char; {specifies conversion of output characters} @ @^character set dependencies@> @^system dependencies@> Since we are assuming that our \PASCAL\ system is able to read and write the visible characters of standard ASCII (although not necessarily using the ASCII codes to represent them), the following assignment statements initialize most of the |xchr| array properly, without needing any system-dependent changes. On the other hand, it is possible to implement \TeX\ with less complete character sets, and in such cases it will be necessary to change something here. 
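The |xchr| assignments and the |xord| inversion described in this part can be sketched in C. The sketch assumes a host whose external character set is itself ASCII, so that the visible codes map to themselves. The table names follow the \PASCAL, and the function |init_char_tables| is hypothetical.

```c
#include <assert.h>

#define FIRST_TEXT_CHAR 0
#define LAST_TEXT_CHAR 127
#define TAB 011
#define INVALID_CODE 0177

static unsigned char xchr[128]; /* internal ASCII code -> external char */
static unsigned char xord[128]; /* external char -> internal ASCII code */

static void init_char_tables(void)
{
    int i;
    for (i = 040; i <= 0176; i++) xchr[i] = (unsigned char)i;
    xchr[0] = ' ';                 /* codes 0 and 0177 do not appear in text */
    xchr[0177] = ' ';
    for (i = 1; i <= 037; i++) xchr[i] = ' ';  /* blank out control codes */
    xchr[TAB] = TAB;                           /* except tab */

    for (i = FIRST_TEXT_CHAR; i <= LAST_TEXT_CHAR; i++) xord[i] = INVALID_CODE;
    for (i = 1; i <= 0176; i++)    /* a later i overwrites an earlier one */
        xord[xchr[i]] = (unsigned char)i;
}
```

Because the inversion loop runs upward, the blank assigned to |xchr[1..@'37]| ends up with |xord| value @'40, not 1.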
@= xchr[@'40]:=' '; xchr[@'41]:='!'; xchr[@'42]:='"'; xchr[@'43]:='#'; xchr[@'44]:='$'; xchr[@'45]:='%'; xchr[@'46]:='&'; xchr[@'47]:='''';@/ xchr[@'50]:='('; xchr[@'51]:=')'; xchr[@'52]:='*'; xchr[@'53]:='+'; xchr[@'54]:=','; xchr[@'55]:='-'; xchr[@'56]:='.'; xchr[@'57]:='/';@/ xchr[@'60]:='0'; xchr[@'61]:='1'; xchr[@'62]:='2'; xchr[@'63]:='3'; xchr[@'64]:='4'; xchr[@'65]:='5'; xchr[@'66]:='6'; xchr[@'67]:='7';@/ xchr[@'70]:='8'; xchr[@'71]:='9'; xchr[@'72]:=':'; xchr[@'73]:=';'; xchr[@'74]:='<'; xchr[@'75]:='='; xchr[@'76]:='>'; xchr[@'77]:='?';@/ xchr[@'100]:='@@'; xchr[@'101]:='A'; xchr[@'102]:='B'; xchr[@'103]:='C'; xchr[@'104]:='D'; xchr[@'105]:='E'; xchr[@'106]:='F'; xchr[@'107]:='G';@/ xchr[@'110]:='H'; xchr[@'111]:='I'; xchr[@'112]:='J'; xchr[@'113]:='K'; xchr[@'114]:='L'; xchr[@'115]:='M'; xchr[@'116]:='N'; xchr[@'117]:='O';@/ xchr[@'120]:='P'; xchr[@'121]:='Q'; xchr[@'122]:='R'; xchr[@'123]:='S'; xchr[@'124]:='T'; xchr[@'125]:='U'; xchr[@'126]:='V'; xchr[@'127]:='W';@/ xchr[@'130]:='X'; xchr[@'131]:='Y'; xchr[@'132]:='Z'; xchr[@'133]:='['; xchr[@'134]:='\'; xchr[@'135]:=']'; xchr[@'136]:='^'; xchr[@'137]:='_';@/ xchr[@'140]:='`'; xchr[@'141]:='a'; xchr[@'142]:='b'; xchr[@'143]:='c'; xchr[@'144]:='d'; xchr[@'145]:='e'; xchr[@'146]:='f'; xchr[@'147]:='g';@/ xchr[@'150]:='h'; xchr[@'151]:='i'; xchr[@'152]:='j'; xchr[@'153]:='k'; xchr[@'154]:='l'; xchr[@'155]:='m'; xchr[@'156]:='n'; xchr[@'157]:='o';@/ xchr[@'160]:='p'; xchr[@'161]:='q'; xchr[@'162]:='r'; xchr[@'163]:='s'; xchr[@'164]:='t'; xchr[@'165]:='u'; xchr[@'166]:='v'; xchr[@'167]:='w';@/ xchr[@'170]:='x'; xchr[@'171]:='y'; xchr[@'172]:='z'; xchr[@'173]:='{'; xchr[@'174]:='|'; xchr[@'175]:='}'; xchr[@'176]:='~';@/ xchr[0]:=' '; xchr[@'177]:=' '; {ASCII codes 0 and |@'177| do not appear in text} @ @^character set dependencies@> @^system dependencies@> Some of the ASCII codes without visible characters have been given symbolic names in this program because they are used with a special meaning. 
The |tab| character may be system dependent. @d null_code=@'0 {ASCII code that might disappear} @d tab=@'11 {ASCII code treated as |white_space|} @d space=@'40 {ASCII code treated as |white_space|} @d invalid_code=@'177 {ASCII code that should not appear} @ @^character set dependencies@> @^system dependencies@> @:TeXbook}{\sl The \TeX book@> The ASCII code is ``standard'' only to a certain extent, since many computer installations have found it advantageous to have ready access to more than 94 printing characters. Appendix~C of {\sl The \TeX book\/} gives a complete specification of the intended correspondence between characters and \TeX's internal representation. If \TeX\ is being used on a garden-variety \PASCAL\ for which only standard ASCII codes will appear in the input and output files, it doesn't really matter what codes are specified in |xchr[1..@'37]|, but the safest policy is to blank everything out by using the code shown below. However, other settings of |xchr| will make \TeX\ more friendly on computers that have an extended character set, so that users can type things like `\.^^Z' instead of `\.{\\ne}'. At MIT, for example, it would be more appropriate to substitute the code $$\hbox{|for i:=1 to @'37 do xchr[i]:=chr(i);|}$$ \TeX's character set is essentially the same as MIT's, even with respect to characters less than~@'40. People with extended character sets can assign codes arbitrarily, giving an |xchr| equivalent to whatever characters the users of \TeX\ are allowed to have in their input files. It is best to make the codes correspond to the intended interpretations as shown in Appendix~C whenever possible; but this is not necessary. For example, in countries with an alphabet of more than 26 letters, it is usually best to map the additional letters into codes less than~@'40. @= for i:=1 to @'37 do xchr[i]:=' '; xchr[tab]:=chr(tab); @ This system-independent code makes the |xord| array contain a suitable inverse to the information in |xchr|. 
Note that if |xchr[i]=xchr[j]| where |i<j<@'177|, the value of |xord[xchr[i]]| will turn out to be |j| or more; hence, standard ASCII code numbers will be used instead of codes below @'40 in case there is a coincidence. @= for i:=first_text_char to last_text_char do xord[chr(i)]:=invalid_code; for i:=1 to @'176 do xord[xchr[i]]:=i; @ Also, various characters are given symbolic names; all the ones this program uses are collected here. We use the sharp sign as the |concat_char|, rather than something more natural (like an ampersand), for uniformity of database syntax (ampersand is a valid character in identifiers). @d double_quote = """" {delimits strings} @d number_sign = "#" {marks an |int_literal|} @d comment = "%" {ignore the rest of a \.{.bst} or \TeX\ line} @d single_quote = "'" {marks a quoted function} @d left_paren = "(" {optional database entry left delimiter} @d right_paren = ")" {corresponding right delimiter} @d comma = "," {separates various things} @d minus_sign = "-" {for a negative number} @d equals_sign = "=" {separates a field name from a field value} @d at_sign = "@@" {the beginning of a database entry} @d left_brace = "{" {left delimiter of many things} @d right_brace = "}" {corresponding right delimiter} @d period = "." {these are three} @d question_mark = "?" {string-ending characters} @d exclamation_mark = "!" {of interest in \.{add.period\$}} @d tie = "~" {the default space char, in \.{format.name\$}} @d hyphen = "-" {like |white_space|, in \.{format.name\$}} @d star = "*" {for including entire database} @d concat_char = "#" {for concatenating field tokens} @d colon = ":" {for lower-casing (usually title) strings} @d backslash = "\" {used to recognize accented characters} @ These arrays give a lexical classification for the |ASCII_code|s; |lex_class| is used for general scanning and |id_class| is used for scanning identifiers. @= @!lex_class: array [ASCII_code] of lex_type; @!id_class: array [ASCII_code] of id_type; @ Every character has two lexical classifications. The first is general, and the second tells whether the character is legal in identifiers.
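A C sketch of the two classification tables follows. It assumes the host compiler uses ASCII, so that C character literals coincide with the internal |ASCII_code| values. The enumeration names mirror the \.{WEB} constants, and |init_lex_tables| is hypothetical. As in the \PASCAL, the general defaults are assigned before the exceptions.

```c
#include <assert.h>

enum { ILLEGAL = 0, WHITE_SPACE = 1, ALPHA = 2, NUMERIC = 3,
       SEP_CHAR = 4, OTHER_LEX = 5 };
enum { ILLEGAL_ID_CHAR = 0, LEGAL_ID_CHAR = 1 };

static unsigned char lex_class[128], id_class[128];

static void init_lex_tables(void)
{
    /* space, tab, and the ten punctuation marks barred from identifiers */
    static const char not_in_ids[] = " \t\"#%'(),={}";
    int i;

    for (i = 0; i <= 0177; i++) lex_class[i] = OTHER_LEX;
    for (i = 0; i <= 037; i++) lex_class[i] = ILLEGAL;
    lex_class[0177] = ILLEGAL;          /* invalid_code */
    lex_class['\t'] = WHITE_SPACE;
    lex_class[' '] = WHITE_SPACE;
    lex_class['~'] = SEP_CHAR;          /* tie */
    lex_class['-'] = SEP_CHAR;          /* hyphen */
    for (i = '0'; i <= '9'; i++) lex_class[i] = NUMERIC;
    for (i = 'A'; i <= 'Z'; i++) lex_class[i] = ALPHA;
    for (i = 'a'; i <= 'z'; i++) lex_class[i] = ALPHA;

    for (i = 0; i <= 0177; i++) id_class[i] = LEGAL_ID_CHAR;
    for (i = 0; i <= 037; i++) id_class[i] = ILLEGAL_ID_CHAR;
    for (i = 0; not_in_ids[i] != '\0'; i++)
        id_class[(unsigned char)not_in_ids[i]] = ILLEGAL_ID_CHAR;
}
```

Note that |invalid_code| is |illegal| for |lex_class|. As in the original assignments, however, it remains a legal identifier character.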
@d illegal = 0 {the unrecognized |ASCII_code|s} @d white_space = 1 {things like |space|s that you can't see} @d alpha = 2 {the upper- and lower-case letters} @d numeric = 3 {the ten digits} @d sep_char = 4 {things sometimes treated like |white_space|} @d other_lex = 5 {when none of the above applies} @d last_lex = 5 {the same number as on the line above} @# @d illegal_id_char = 0 {a few forbidden ones} @d legal_id_char = 1 {most printing characters} @= @!lex_type = 0..last_lex;@/ @!id_type = 0..1; @ @^character set dependencies@> @^system dependencies@> Now we initialize the system-dependent |lex_class| array. The |tab| character may be system dependent. Note that the order of these assignments is important here. @= for i:=0 to @'177 do lex_class[i] := other_lex; for i:=0 to @'37 do lex_class[i] := illegal; lex_class[invalid_code] := illegal; lex_class[tab] := white_space; lex_class[space] := white_space; lex_class[tie] := sep_char; lex_class[hyphen] := sep_char; for i:=@'60 to @'71 do lex_class[i] := numeric; for i:=@'101 to @'132 do lex_class[i] := alpha; for i:=@'141 to @'172 do lex_class[i] := alpha; @ @^character set dependencies@> @^system dependencies@> And now the |id_class| array. @= for i:=0 to @'177 do id_class[i] := legal_id_char; for i:=0 to @'37 do id_class[i] := illegal_id_char; id_class[space] := illegal_id_char; id_class[tab] := illegal_id_char; id_class[double_quote] := illegal_id_char; id_class[number_sign] := illegal_id_char; id_class[comment] := illegal_id_char; id_class[single_quote] := illegal_id_char; id_class[left_paren] := illegal_id_char; id_class[right_paren] := illegal_id_char; id_class[comma] := illegal_id_char; id_class[equals_sign] := illegal_id_char; id_class[left_brace] := illegal_id_char; id_class[right_brace] := illegal_id_char; @ The array |char_width| gives relative printing widths of each |ASCII_code|, and |string_width| will be used later to sum up |char_width|s in a string. 
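Summing the widths is straightforward. This C sketch copies the cmr10 widths assigned in this part into a table, indexed by ASCII code, and adds them up. It is a simplification: |string_width| here is a hypothetical function over a plain C string, and it does not attempt the special handling that constants such as |ss_width| suggest.

```c
#include <assert.h>

/* Widths for codes 040..0176; every other code has width 0. */
static const int char_width[128] = {
    [040] = 278, 278, 500, 833, 500, 833, 778, 278,
    [050] = 389, 389, 500, 778, 278, 333, 278, 500,
    [060] = 500, 500, 500, 500, 500, 500, 500, 500,
    [070] = 500, 500, 278, 278, 278, 778, 472, 472,
    [0100] = 778, 750, 708, 722, 764, 681, 653, 785,
    [0110] = 750, 361, 514, 778, 625, 917, 750, 778,
    [0120] = 681, 778, 736, 556, 722, 750, 750, 1028,
    [0130] = 750, 750, 611, 278, 500, 278, 500, 278,
    [0140] = 278, 500, 556, 444, 556, 444, 306, 500,
    [0150] = 556, 278, 306, 528, 278, 833, 556, 500,
    [0160] = 556, 528, 392, 394, 389, 556, 528, 722,
    [0170] = 528, 528, 444, 500, 1000, 500, 500,
};

/* Sum of char_width over the characters of s; non-ASCII bytes count 0. */
static int string_width(const char *s)
{
    int width = 0;
    assert(s != 0);
    for (; *s != '\0'; s++) {
        unsigned char c = (unsigned char)*s;
        if (c < 128)
            width += char_width[c];
    }
    return width;
}
```

With these widths, for example, the string ``ab'' measures $500+556=1056$ units.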
@= @!char_width : array [ASCII_code] of integer; @!string_width : integer; @ @^character set dependencies@> @^system dependencies@> Now we initialize the system-dependent |char_width| array, for which |space| is the only |white_space| character given a nonzero printing width. The widths here are taken from Stanford's June~'87 $cmr10$~font and represent hundredths of a point (rounded), but since they're used only for relative comparisons, the units have no meaning. @d ss_width = 500 {character |@'31|'s width in the $cmr10$ font} @d ae_width = 722 {character |@'32|'s width in the $cmr10$ font} @d oe_width = 778 {character |@'33|'s width in the $cmr10$ font} @d upper_ae_width = 903 {character |@'35|'s width in the $cmr10$ font} @d upper_oe_width = 1014 {character |@'36|'s width in the $cmr10$ font} @= for i:=0 to @'177 do char_width[i] := 0; @# char_width[@'40] := 278; char_width[@'41] := 278; char_width[@'42] := 500; char_width[@'43] := 833; char_width[@'44] := 500; char_width[@'45] := 833; char_width[@'46] := 778; char_width[@'47] := 278; char_width[@'50] := 389; char_width[@'51] := 389; char_width[@'52] := 500; char_width[@'53] := 778; char_width[@'54] := 278; char_width[@'55] := 333; char_width[@'56] := 278; char_width[@'57] := 500; char_width[@'60] := 500; char_width[@'61] := 500; char_width[@'62] := 500; char_width[@'63] := 500; char_width[@'64] := 500; char_width[@'65] := 500; char_width[@'66] := 500; char_width[@'67] := 500; char_width[@'70] := 500; char_width[@'71] := 500; char_width[@'72] := 278; char_width[@'73] := 278; char_width[@'74] := 278; char_width[@'75] := 778; char_width[@'76] := 472; char_width[@'77] := 472; char_width[@'100] := 778; char_width[@'101] := 750; char_width[@'102] := 708; char_width[@'103] := 722; char_width[@'104] := 764; char_width[@'105] := 681; char_width[@'106] := 653; char_width[@'107] := 785; char_width[@'110] := 750; char_width[@'111] := 361; char_width[@'112] := 514; char_width[@'113] := 778; char_width[@'114] := 625; 
char_width[@'115] := 917; char_width[@'116] := 750; char_width[@'117] := 778;
char_width[@'120] := 681; char_width[@'121] := 778; char_width[@'122] := 736; char_width[@'123] := 556;
char_width[@'124] := 722; char_width[@'125] := 750; char_width[@'126] := 750; char_width[@'127] :=1028;
char_width[@'130] := 750; char_width[@'131] := 750; char_width[@'132] := 611; char_width[@'133] := 278;
char_width[@'134] := 500; char_width[@'135] := 278; char_width[@'136] := 500; char_width[@'137] := 278;
char_width[@'140] := 278; char_width[@'141] := 500; char_width[@'142] := 556; char_width[@'143] := 444;
char_width[@'144] := 556; char_width[@'145] := 444; char_width[@'146] := 306; char_width[@'147] := 500;
char_width[@'150] := 556; char_width[@'151] := 278; char_width[@'152] := 306; char_width[@'153] := 528;
char_width[@'154] := 278; char_width[@'155] := 833; char_width[@'156] := 556; char_width[@'157] := 500;
char_width[@'160] := 556; char_width[@'161] := 528; char_width[@'162] := 392; char_width[@'163] := 394;
char_width[@'164] := 389; char_width[@'165] := 556; char_width[@'166] := 528; char_width[@'167] := 722;
char_width[@'170] := 528; char_width[@'171] := 528; char_width[@'172] := 444; char_width[@'173] := 500;
char_width[@'174] :=1000; char_width[@'175] := 500; char_width[@'176] := 500;

@* Input and output.
The basic operations we need to do are (1)~inputting and outputting of
text characters to or from a file; (2)~instructing the operating system
to initiate (``open'') or to terminate (``close'') input or output to or
from a specified file; and (3)~testing whether the end of an input file
has been reached.

@=
@!alpha_file=packed file of text_char; {files that contain textual data}

@ @^system dependencies@>
Most of what we need to do with respect to input and output can be
handled by the I/O facilities that are standard in \PASCAL, i.e., the
routines called |get|, |put|, |eof|, and so on.
But standard \PASCAL\ does not allow file variables to be associated
with file names that are determined at run time, so it cannot be used to
implement \BibTeX; some sort of extension to \PASCAL's ordinary |reset|
and |rewrite| is crucial for our purposes. We shall assume that
|name_of_file| is a variable of an appropriate type such that the
\PASCAL\ run-time system being used to implement \BibTeX\ can open a file
whose external name is specified by |name_of_file|. \BibTeX\ does no
case conversion for file names.

@=
@!name_of_file:packed array[1..file_name_size] of char;
  {on some systems this is a \&{record} variable}
@!name_length:0..file_name_size;
  {this many characters are relevant in |name_of_file| (the rest are blank)}
@!name_ptr:0..file_name_size+1; {index variable into |name_of_file|}

@ @^system dependencies@>
@:PASCAL H}{\ph@>
The \ph\ compiler with which the present version of \TeX\ was prepared
has extended the rules of \PASCAL\ in a very convenient way. To open
file~|f|, we can write
$$\vbox{\halign{#\hfil\qquad&#\hfil\cr
|reset(f,@t\\{name}@>,'/O')|&for input;\cr
|rewrite(f,@t\\{name}@>,'/O')|&for output.\cr}}$$
The `\\{name}' parameter, which is of type
`\ignorespaces|packed array[@t\<\\{any}>@>] of text_char|', stands for
the name of the external file that is being opened for input or output.
Blank spaces that might appear in \\{name} are ignored.

The `\.{/O}' parameter tells the operating system not to issue its own
error messages if something goes wrong. If a file of the specified name
cannot be found, or if such a file cannot be opened for some other
reason (e.g., someone may already be trying to write the same file), we
will have |@!erstat(f)<>0| after an unsuccessful |reset| or |rewrite|.
This allows \BibTeX\ to undertake appropriate corrective action.

\BibTeX's file-opening procedures return |false| if no file identified
by |name_of_file| could be opened.
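On a C-based system the same contract (report failure through a |boolean| instead of letting the run-time system print its own message) falls out of the standard |fopen| function, which returns a null pointer when a file cannot be opened. The following is a sketch under that assumption, not \BibTeX's code: the file name is passed as an ordinary C string rather than through the blank-padded |name_of_file|.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Open a text file for input.  fopen prints no message of its own; a null
   result simply means the file could not be opened, and the caller decides
   what corrective action to take. */
static bool a_open_in(FILE **f, const char *name)
{
    assert(f != NULL && name != NULL);
    *f = fopen(name, "r");
    return *f != NULL;
}

/* Open (creating or truncating) a text file for output. */
static bool a_open_out(FILE **f, const char *name)
{
    assert(f != NULL && name != NULL);
    *f = fopen(name, "w");
    return *f != NULL;
}
```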
@d reset_OK(#)==erstat(#)=0
@d rewrite_OK(#)==erstat(#)=0

@=
function erstat(var f:file):integer; extern; {in the runtime library}
@#@t\2@>
function a_open_in(var f:alpha_file):boolean; {open a text file for input}
begin reset(f,name_of_file,'/O'); a_open_in:=reset_OK(f);
end;
@#
function a_open_out(var f:alpha_file):boolean; {open a text file for output}
begin rewrite(f,name_of_file,'/O'); a_open_out:=rewrite_OK(f);
end;

@ @^system dependencies@>
Files can be closed with the \ph\ routine `|close(f)|', which should be
used when all input or output with respect to |f| has been completed.
This makes |f| available to be opened again, if desired; and if |f| was
used for output, the |close| operation makes the corresponding external
file appear on the user's area, ready to be read.

@=
procedure a_close(var f:alpha_file); {close a text file}
begin close(f);
end;

@ Text output is easy to do with the ordinary \PASCAL\ |put| procedure,
so we don't have to make any other special arrangements.
The treatment of text input is more difficult, however, because
of the necessary translation to |ASCII_code| values, and because
\TeX's conventions should be efficient and they should
blend nicely with the user's operating environment.

@ Input from text files is read one line at a time, using a routine
called |input_ln|. This function is defined in terms of global variables
called |buffer| and |last|. The |buffer| array contains |ASCII_code|
values, and |last| is an index into this array marking the end of a line
of text. (Occasionally, |buffer| is used for something else, in which
case it is copied to a temporary array.)

@=
@!buffer:buf_type; {usually, lines of characters being read}
@!last:buf_pointer; {end of the line just input to |buffer|}

@ @^save space@>
@^space savings@>
@^system dependencies@>
The type |buf_type| is used for |buffer|, for saved copies of it, or for
scratch work.
It's not |packed| because otherwise the program would run much slower
on some systems (more than 25 percent slower, for example, on a TOPS-20
operating system). But on systems that are byte-addressable and that
have a good compiler, packing |buf_type| would save lots of space
without much loss of speed. Other modules that have packable arrays are
also marked with a ``space savings'' index entry.

@=
@!buf_pointer = 0..buf_size; {an index into a |buf_type|}
@!buf_type = array[buf_pointer] of ASCII_code; {for various buffers}

@ @^kludge@>
And while we're at it, we declare another buffer for general use.
Because buffers are not packed and can get large, we use |sv_buffer|
for several purposes; this is a bit kludgy, but it helps keep the stack
from overflowing on some machines. It's used when reading the entire
database file (in the \.{read} command) and when doing name-handling
(through the alias |name_buf|) in the |built_in| functions
\.{format.names\$} and \.{num.names\$}.

@=
@!sv_buffer : buf_type;
@!sv_ptr1 : buf_pointer;
@!sv_ptr2 : buf_pointer;
@!tmp_ptr,@!tmp_end_ptr : integer; {copy pointers only, usually for buffers}

@ @.BibTeX capacity exceeded@>
When something in the program wants to be bigger or something out
there wants to be smaller, it's time to call it a run. Here's the
first of several macros that have associated procedures so that they
produce less inline code.

@d overflow(#)==begin {fatal error---close up shop}
    print_overflow;
    print_ln(#:0);
    goto close_up_shop;
    end

@=
procedure print_overflow;
begin
print ('Sorry---you''ve exceeded BibTeX''s ');
mark_fatal;
end;

@ @.this can't happen@>
When something happens that the program thinks is impossible, call the
maintainer.
@d confusion(#)==begin {fatal error---close up shop}
    print (#);
    print_confusion;
    goto close_up_shop;
    end

@=
procedure print_confusion;
begin
print_ln ('---this can''t happen');
print_ln ('*Please notify the BibTeX maintainer*');
mark_fatal;
end;

@ @:BibTeX capacity exceeded}{\quad buffer size@>
When a buffer overflows, it's time to complain (and then quit).

@=
procedure buffer_overflow;
begin
overflow('buffer size ',buf_size);
end;

@ @:BibTeX capacity exceeded}{\quad buffer size@>
The |input_ln| function brings the next line of input from the specified
file into available positions of the buffer array and returns the value
|true|, unless the file has already been entirely read, in which case it
returns |false| and sets |last:=0|. In general, the |ASCII_code| numbers
that represent the next line of the file are input into |buffer[0]|,
|buffer[1]|, \dots, |buffer[last-1]|; and the global variable |last| is
set equal to the length of the line. Trailing |white_space| characters
are removed from the line (|white_space| characters are explained in the
character-set section%
---most likely they're blanks); thus, either |last=0| (in which case the
line was entirely blank) or |lex_class[buffer[last-1]]<>white_space|. An
overflow error is given if the normal actions of |input_ln| would make
|last>buf_size|.

Standard \PASCAL\ says that a file should have |eoln| immediately before
|eof|, but \BibTeX\ needs only a weaker restriction: If |eof| occurs in
the middle of a line, the system function |eoln| should return a |true|
result (even though |f^| will be undefined).
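The contract just described can be sketched in C as follows. This is an illustration under stated assumptions, not \BibTeX's routine: |BUF_SIZE| is a hypothetical capacity, characters are read with |getc| and stored without |xord| translation, and only blank and tab count as |white_space| (matching the |lex_class| initialization earlier). A last line with no terminating newline is still returned, which corresponds to the weaker |eoln| requirement.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define BUF_SIZE 1000                 /* hypothetical stand-in for buf_size */

static unsigned char buffer[BUF_SIZE + 1];
static int last;                      /* length of the line now in buffer */

/* Read the next line of f into buffer[0..last-1] (newline excluded) and
   drop trailing blanks and tabs.  Returns false, with last == 0, only
   when f has no more characters at all. */
static bool input_ln(FILE *f)
{
    int c;
    assert(f != NULL);
    last = 0;
    c = getc(f);
    if (c == EOF)
        return false;
    while (c != EOF && c != '\n') {
        if (last >= BUF_SIZE) {       /* BibTeX calls buffer_overflow here */
            fputs("input line too long for buffer\n", stderr);
            exit(EXIT_FAILURE);
        }
        buffer[last++] = (unsigned char)c;
        c = getc(f);
    }
    while (last > 0 && (buffer[last - 1] == ' ' || buffer[last - 1] == '\t'))
        last--;                       /* remove trailing white space */
    return true;
}
```

A line containing only blanks and tabs comes back as |true| with |last=0|, which is how an entirely blank line is distinguished from the end of the file.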
@=
function input_ln(var f:alpha_file) : boolean;
                                {inputs the next line or returns |false|}
label loop_exit;
begin
last:=0;
if (eof(f)) then input_ln:=false
else
    begin
    while (not eoln(f)) do
        begin
        if (last >= buf_size) then
            buffer_overflow;
        buffer[last]:=xord[f^];
        get(f);
        incr(last);
        end;
    get(f);
    while (last > 0) do             {remove trailing |white_space|}
        if (lex_class[buffer[last-1]] = white_space) then
            decr(last)
        else
            goto loop_exit;
loop_exit:
    input_ln:=true;
    end;
end;

@* String handling.
\BibTeX\ uses variable-length strings of seven-bit characters.
Since \PASCAL\ does not have a well-developed string mechanism,
\BibTeX\ does all its string processing by home-grown (predominantly
\TeX's) methods. Unlike \TeX, however, \BibTeX\ does not use a
|pool_file| for string storage; it creates its few pre-defined strings
at run-time.

The necessary operations are handled with a simple data structure. The
array |str_pool| contains all the (seven-bit) ASCII codes in all the
strings \BibTeX\ must ever search for (generally identifier names), and
the array |str_start| contains indices of the starting points of each
such string. Strings are referred to by integer numbers, so that string
number |s| comprises the characters |str_pool[j]| for
|str_start[s]<=j<str_start[s+1]|.

@=
@!str_pool : packed array[pool_pointer] of ASCII_code; {the characters}
@!str_start : packed array[str_number] of pool_pointer; {the starting pointers}
@!pool_ptr : pool_pointer; {first unused position in |str_pool|}
@!str_ptr : str_number; {start of the current string being created}
@!str_num : str_number; {general index variable into |str_start|}
@!p_ptr1,@!p_ptr2 : pool_pointer; {several procedures use these locally}

@ Here |pool_pointer| and |str_number| are the types of pointers into
|str_pool| and |str_start|, respectively.

@=
@!pool_pointer = 0..pool_size; {for variables that point into |str_pool|}
@!str_number = 0..max_strings; {for variables that point into |str_start|}

@ These macros send a string in |str_pool| to an output file.
@d max_pop = 3 {---see the |built_in| functions section}
@#
@d print_pool_str(#) == print_a_pool_str(#)
                        {making this a procedure saves a little space}
@#
@d trace_pr_pool_str(#) == begin
                           out_pool_str(log_file,#);
                           end

@ @^kludge@>
@^system dependencies@>
@:this can't happen}{\quad Illegal string number@>
And here are the associated procedures. Note: The |term_out| file is
system dependent.

@=
procedure out_pool_str (var f:alpha_file; @!s:str_number);
var i:pool_pointer;
begin
{allowing |str_ptr <= s < str_ptr+max_pop| is a \.{.bst}-stack kludge}
if ((s<0) or (s>=str_ptr+max_pop) or (s>=max_strings)) then
    confusion ('Illegal string number:',s:0);
for i := str_start[s] to str_start[s+1]-1 do
    write(f,xchr[str_pool[i]]);
end;
@#
procedure print_a_pool_str (@!s:str_number);
begin
out_pool_str(term_out,s);
out_pool_str(log_file,s);
end;

@ @.WEB@>
Several of the elementary string operations are performed using \.{WEB}
macros instead of using \PASCAL\ procedures, because many of the
operations are done quite frequently and we want to avoid the
overhead of procedure calls. For example, here is
a simple macro that computes the length of a string.

@d length(#) == (str_start[#+1]-str_start[#])
                        {the number of characters in string number \#}

@ @:BibTeX capacity exceeded}{\quad pool size@>
Strings are created by appending character codes to |str_pool|. The
macro called |append_char|, defined here, does not check to see if the
value of |pool_ptr| has gotten too high; this test is supposed to be
made before |append_char| is used.

To test if there is room to append |l| more characters to |str_pool|,
we shall write |str_room(l)|, which aborts \BibTeX\ and gives an error
message if there isn't enough room.
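A minimal C sketch of this pool discipline, together with |make_string| and |flush_string| from the next few modules, may make the bookkeeping easier to follow. The small capacities and the helper |new_string| are hypothetical, and the overflow routine simply exits instead of jumping to |close_up_shop|. As with the \.{WEB} macro, |append_char| performs no check of its own; |str_room| must be called first.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define POOL_SIZE 64                  /* hypothetical, deliberately tiny */
#define MAX_STRINGS 16

static unsigned char str_pool[POOL_SIZE];
static int str_start[MAX_STRINGS + 1];
static int pool_ptr;                  /* first unused position in str_pool */
static int str_ptr;                   /* number of the string being built */

#define LENGTH(s) (str_start[(s) + 1] - str_start[(s)])

static void overflow(const char *what)
{
    fprintf(stderr, "capacity exceeded: %s\n", what);
    exit(EXIT_FAILURE);
}

static void init_pool(void)
{
    pool_ptr = 0;
    str_ptr = 1;                      /* string number 0 stays unused */
    str_start[str_ptr] = pool_ptr;
}

static void str_room(int l)           /* call before appending l characters */
{
    if (pool_ptr + l > POOL_SIZE)
        overflow("pool size");
}

static void append_char(unsigned char c)  /* no bounds check, like the macro */
{
    str_pool[pool_ptr++] = c;
}

static int make_string(void)          /* the current string becomes official */
{
    if (str_ptr == MAX_STRINGS)
        overflow("number of strings");
    str_ptr++;
    str_start[str_ptr] = pool_ptr;
    return str_ptr - 1;
}

static void flush_string(void)        /* discard the most recent string */
{
    assert(str_ptr > 1);
    str_ptr--;
    pool_ptr = str_start[str_ptr];
}

/* Hypothetical helper: copy a C string into the pool and make it official. */
static int new_string(const char *s)
{
    int n = (int)strlen(s);
    str_room(n);
    for (int k = 0; k < n; k++)
        append_char((unsigned char)s[k]);
    return make_string();
}
```

After |init_pool|, adding \.{aux} and then \.{bib} yields string numbers 1 and 2; flushing then removes \.{bib} and resets |pool_ptr| to 3.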
@d append_char(#) == {put |ASCII_code| \# at the end of |str_pool|}
    begin
    str_pool[pool_ptr]:=#;
    incr(pool_ptr);
    end
@#
@d str_room(#) == {make sure that the pool hasn't overflowed}
    begin
    if (pool_ptr+# > pool_size) then
        pool_overflow;
    end

@=
procedure pool_overflow;
begin
overflow('pool size ',pool_size);
end;

@ @:BibTeX capacity exceeded}{\quad number of strings@>
Once a sequence of characters has been appended to |str_pool|, it
officially becomes a string when the function |make_string| is called.
It returns the string number of the string it just made.

@=
function make_string : str_number; {current string enters the pool}
begin
if (str_ptr=max_strings) then
    overflow('number of strings ',max_strings);
incr(str_ptr);
str_start[str_ptr]:=pool_ptr;
make_string := str_ptr - 1;
end;

@ These macros destroy and recreate the string at the end of the pool.

@d flush_string == begin
                   decr(str_ptr);
                   pool_ptr := str_start[str_ptr];
                   end
@#
@d unflush_string == begin
                     incr(str_ptr);
                     pool_ptr := str_start[str_ptr];
                     end

@ This subroutine compares string |s| with another string that appears
in the buffer |buf| between positions |bf_ptr| and |bf_ptr+len-1|; the
result is |true| if and only if the strings are equal.

@=
function str_eq_buf (@!s:str_number; var buf:buf_type;
                     @!bf_ptr,@!len:buf_pointer) : boolean;
                                        {test equality of strings}
label exit;
var i : buf_pointer;                    {running}
@!j : pool_pointer;                     {indices}
begin
if (length(s) <> len) then              {strings of unequal length}
    begin
    str_eq_buf := false;
    return;
    end;
i := bf_ptr;
j := str_start[s];
while (j < str_start[s+1]) do
    begin
    if (str_pool[j] <> buf[i]) then
        begin
        str_eq_buf := false;
        return;
        end;
    incr(i);
    incr(j);
    end;
str_eq_buf := true;
exit:
end;

@ This subroutine compares two |str_pool| strings and returns |true| if
and only if the strings are equal.
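Both comparisons reject strings of unequal length before looking at any characters. The C sketch below applies the same two tests to a hand-built, hypothetical pool in which string~1 is \.{bibdata}, string~2 is \.{bibstyle}, and string~0 is unused, as in \BibTeX; |memcmp| stands in for the explicit character loop of |str_eq_str|.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical pool: string s is pool[start[s] .. start[s+1]-1]. */
static const char pool[] = "bibdatabibstyle";
static const int start[] = {0, 0, 7, 15};   /* 0 unused, 1 "bibdata", 2 "bibstyle" */

static int length(int s) { return start[s + 1] - start[s]; }

/* Does string s equal buf[bf_ptr .. bf_ptr+len-1]? */
static bool str_eq_buf(int s, const char *buf, int bf_ptr, int len)
{
    assert(bf_ptr >= 0 && len >= 0);
    if (length(s) != len)
        return false;                        /* strings of unequal length */
    for (int i = bf_ptr, j = start[s]; j < start[s + 1]; i++, j++)
        if (pool[j] != buf[i])
            return false;
    return true;
}

/* Do two pool strings have the same characters? */
static bool str_eq_str(int s1, int s2)
{
    if (length(s1) != length(s2))
        return false;
    return memcmp(pool + start[s1], pool + start[s2], (size_t)length(s1)) == 0;
}
```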
@=
function str_eq_str (@!s1,@!s2:str_number) : boolean;
label exit;
begin
if (length(s1) <> length(s2)) then
    begin
    str_eq_str := false;
    return;
    end;
p_ptr1 := str_start[s1];
p_ptr2 := str_start[s2];
while (p_ptr1 < str_start[s1+1]) do
    begin
    if (str_pool[p_ptr1] <> str_pool[p_ptr2]) then
        begin
        str_eq_str := false;
        return;
        end;
    incr(p_ptr1);
    incr(p_ptr2);
    end;
str_eq_str:=true;
exit:
end;

@ @:BibTeX capacity exceeded}{\quad file name size@>
This procedure copies file name |file_name| into the beginning of
|name_of_file|, if it will fit. It also sets the global variable
|name_length| to the appropriate value.

@=
procedure start_name (@!file_name:str_number);
var p_ptr: pool_pointer;        {running index}
begin
if (length(file_name) > file_name_size) then
    begin
    print ('File=');
    print_pool_str (file_name);
    print_ln (',');
    file_nm_size_overflow;
    end;
name_ptr := 1;
p_ptr := str_start[file_name];
while (p_ptr < str_start[file_name+1]) do
    begin
    name_of_file[name_ptr] := chr (str_pool[p_ptr]);
    incr(name_ptr);
    incr(p_ptr);
    end;
name_length := length(file_name);
end;

@ @:BibTeX capacity exceeded}{\quad file name size@>
Yet another complaint-before-quitting.

@=
procedure file_nm_size_overflow;
begin
overflow('file name size ',file_name_size);
end;

@ @:BibTeX capacity exceeded}{\quad file name size@>
This procedure copies file extension |ext| into the array |name_of_file|
starting at position |name_length+1|. It also sets the global variable
|name_length| to the appropriate value.
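Taken together, |start_name|, |add_extension|, and the |add_area| procedure that follows can build a name such as \.{texinputs:plain.bst} piece by piece, checking |file_name_size| at every step. The C sketch below mirrors that sequence under two assumptions: |FILE_NAME_SIZE| is a hypothetical limit, and the name is kept NUL-terminated instead of blank-padded. Each routine returns |false| where \BibTeX\ would call |file_nm_size_overflow|.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define FILE_NAME_SIZE 40                      /* hypothetical limit */

static char name_of_file[FILE_NAME_SIZE + 1];  /* +1 for the terminating NUL */
static int name_length;

/* Copy file_name to the start of name_of_file, if it fits. */
static bool start_name(const char *file_name)
{
    size_t n;
    assert(file_name != NULL);
    n = strlen(file_name);
    if (n > (size_t)FILE_NAME_SIZE)
        return false;                          /* BibTeX: file_nm_size_overflow */
    memcpy(name_of_file, file_name, n);
    name_length = (int)n;
    name_of_file[name_length] = '\0';
    return true;
}

/* Append an extension such as ".bst". */
static bool add_extension(const char *ext)
{
    size_t n = strlen(ext);
    if ((size_t)name_length + n > (size_t)FILE_NAME_SIZE)
        return false;
    memcpy(name_of_file + name_length, ext, n);
    name_length += (int)n;
    name_of_file[name_length] = '\0';
    return true;
}

/* Prefix an area such as "texinputs:", shifting the name up first. */
static bool add_area(const char *area)
{
    size_t n = strlen(area);
    if ((size_t)name_length + n > (size_t)FILE_NAME_SIZE)
        return false;
    memmove(name_of_file + n, name_of_file, (size_t)name_length);
    memcpy(name_of_file, area, n);
    name_length += (int)n;
    name_of_file[name_length] = '\0';
    return true;
}
```

Calling |start_name| with \.{plain}, then |add_extension| with \.{.bst}, then |add_area| with \.{texinputs:} leaves the 19-character name \.{texinputs:plain.bst} in |name_of_file|.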
@=
procedure add_extension(@!ext:str_number);
var p_ptr: pool_pointer;        {running index}
begin
if (name_length + length(ext) > file_name_size) then
    begin
    print ('File=',name_of_file,', extension=');
    print_pool_str (ext);
    print_ln (',');
    file_nm_size_overflow;
    end;
name_ptr := name_length + 1;
p_ptr := str_start[ext];
while (p_ptr < str_start[ext+1]) do
    begin
    name_of_file[name_ptr] := chr (str_pool[p_ptr]);
    incr(name_ptr);
    incr(p_ptr);
    end;
name_length := name_length + length(ext);
name_ptr := name_length+1;
while (name_ptr <= file_name_size) do   {pad with blanks}
    begin
    name_of_file[name_ptr] := ' ';
    incr(name_ptr);
    end;
end;

@ @:BibTeX capacity exceeded}{\quad file name size@>
This procedure copies the default logical area name |area| into the
array |name_of_file| starting at position 1, after shifting up the rest
of the filename. It also sets the global variable |name_length| to the
appropriate value.

@=
procedure add_area(@!area:str_number);
var p_ptr: pool_pointer;        {running index}
begin
if (name_length + length(area) > file_name_size) then
    begin
    print ('File=');
    print_pool_str (area);
    print (name_of_file,',');
    file_nm_size_overflow;
    end;
name_ptr := name_length;
while (name_ptr > 0) do         {shift up name}
    begin
    name_of_file[name_ptr+length(area)] := name_of_file[name_ptr];
    decr(name_ptr);
    end;
name_ptr := 1;
p_ptr := str_start[area];
while (p_ptr < str_start[area+1]) do
    begin
    name_of_file[name_ptr] := chr (str_pool[p_ptr]);
    incr(name_ptr);
    incr(p_ptr);
    end;
name_length := name_length + length(area);
end;

@ This system-independent procedure converts upper-case characters to
lower case for the specified part of |buf|. It is system independent
because it uses only the internal representation for characters.
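Because only the internal codes of the letters are consulted, the same idea carries over directly to C on an ASCII system. In this sketch (not \BibTeX's code; the buffer is a plain |unsigned char| array) the loop needs no separate |len>0| test, since it does not execute at all when |len=0|.

```c
#include <assert.h>

#define CASE_DIFFERENCE ('a' - 'A')

/* Convert upper-case letters in buf[bf_ptr .. bf_ptr+len-1] to lower case. */
static void lower_case(unsigned char *buf, int bf_ptr, int len)
{
    assert(bf_ptr >= 0 && len >= 0);
    for (int i = bf_ptr; i < bf_ptr + len; i++)
        if (buf[i] >= 'A' && buf[i] <= 'Z')
            buf[i] += CASE_DIFFERENCE;
}

/* The converse: lower-case letters become upper case. */
static void upper_case(unsigned char *buf, int bf_ptr, int len)
{
    assert(bf_ptr >= 0 && len >= 0);
    for (int i = bf_ptr; i < bf_ptr + len; i++)
        if (buf[i] >= 'a' && buf[i] <= 'z')
            buf[i] -= CASE_DIFFERENCE;
}
```

Characters outside the letter ranges, such as braces, pass through unchanged, and only the requested slice of the buffer is touched.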
@d case_difference = "a" - "A"

@=
procedure lower_case (var buf:buf_type; @!bf_ptr,@!len:buf_pointer);
var i:buf_pointer;
begin
if (len > 0) then
  for i := bf_ptr to bf_ptr+len-1 do
    if ((buf[i]>="A") and (buf[i]<="Z")) then
        buf[i] := buf[i] + case_difference;
end;

@ This system-independent procedure is the same as the previous except
that it converts lower- to upper-case letters.

@=
procedure upper_case (var buf:buf_type; @!bf_ptr,@!len:buf_pointer);
var i:buf_pointer;
begin
if (len > 0) then
  for i := bf_ptr to bf_ptr+len-1 do
    if ((buf[i]>="a") and (buf[i]<="z")) then
        buf[i] := buf[i] - case_difference;
end;

@* The hash table.
All static strings that \BibTeX\ might have to search for, generally
identifiers, are stored and retrieved by means of a fairly standard
hash-table algorithm (but slightly altered here) called the method of
``coalescing lists'' (cf.\ Algorithm 6.4C in {\sl The Art of Computer
Programming}). Once a string enters the table, it is never removed.
The actual sequence of characters forming a string is stored in the
|str_pool| array.

The hash table consists of the four arrays |hash_next|, |hash_text|,
|hash_ilk|, and |ilk_info|. The first array, |hash_next[p]|, points to
the next identifier belonging to the same coalesced list as the
identifier corresponding to~|p|. The second, |hash_text[p]|, points to
the |str_start| entry for |p|'s string. If position~|p| of the hash
table is empty, we have |hash_text[p]=0|; if position |p| is either
empty or the end of a coalesced hash list, we have |hash_next[p]=empty|;
an auxiliary pointer variable called |hash_used| is maintained in such a
way that all locations |p>=hash_used| are nonempty. The third,
|hash_ilk[p]|, tells how this string is used (as ordinary text, as a
variable name, as an \.{.aux} file command, etc.).
The fourth, |ilk_info[p]|, contains information specific to the
corresponding |hash_ilk|---for |integer_ilk|s: the integer's value; for
|cite_ilk|s: a pointer into |cite_list|; for |lc_cite_ilk|s: a pointer
to a |cite_ilk| string; for |command_ilk|s: a constant to be used in a
|case| statement; for |bst_fn_ilk|s: function-specific information; for
|macro_ilk|s: a pointer to its definition string; for
|control_seq_ilk|s: a constant for use in a |case| statement; for all
other |ilk|s it contains no information. This |ilk|-specific
information is set in other parts of the program rather than here in
the hashing routine.

@d hash_base = empty + 1 {lowest numbered hash-table location}
@d hash_max = hash_base + hash_size - 1 {highest numbered hash-table location}
@d hash_is_full == (hash_used=hash_base) {test if all positions are occupied}
@#
@d text_ilk = 0 {a string of ordinary text}
@d integer_ilk = 1 {an integer (possibly with a |minus_sign|)}
@d aux_command_ilk = 2 {an \.{.aux}-file command}
@d aux_file_ilk = 3 {an \.{.aux} file name}
@d bst_command_ilk = 4 {a \.{.bst}-file command}
@d bst_file_ilk = 5 {a \.{.bst} file name}
@d bib_file_ilk = 6 {a \.{.bib} file name}
@d file_ext_ilk = 7 {one of \.{.aux}, \.{.bst}, \.{.bib}, \.{.bbl}, or \.{.blg}}
@d file_area_ilk = 8 {one of \.{texinputs:} or \.{texbib:}}
@d cite_ilk = 9 {a \.{\\citation} argument}
@d lc_cite_ilk = 10 {a \.{\\citation} argument converted to lower case}
@d bst_fn_ilk = 11 {a \.{.bst} function name}
@d bib_command_ilk = 12 {a \.{.bib}-file command}
@d macro_ilk = 13 {a \.{.bst} macro or a \.{.bib} string}
@d control_seq_ilk = 14 {a control sequence specifying a foreign character}
@d last_ilk = 14 {the same number as on the line above}

@=
@!hash_loc=hash_base..hash_max; {a location within the hash table}
@!hash_pointer=empty..hash_max; {either |empty| or a |hash_loc|}
@#
@!str_ilk=0..last_ilk; {the legal string types}

@ @=
@!hash_next : packed array[hash_loc] of hash_pointer; {coalesced-list link}
@!hash_text : packed array[hash_loc] of str_number; {pointer to a string}
@!hash_ilk : packed array[hash_loc] of str_ilk; {the type of string}
@!ilk_info : packed array[hash_loc] of integer; {|ilk|-specific info}
@!hash_used : hash_base..hash_max+1; {allocation pointer for hash table}
@!hash_found : boolean; {set to |true| if it's already in the hash table}
@!dummy_loc : hash_loc; {receives |str_lookup| value whenever it's useless}

@ @=
@!k:hash_loc;

@ Now it's time to initialize the hash table; note that |str_start[0]|
must be unused if |hash_text[k] := 0| is to have the desired effect.

@=
for k:=hash_base to hash_max do
    begin
    hash_next[k] := empty;
    hash_text[k] := 0;  {thus, no need to initialize |hash_ilk| or |ilk_info|}
    end;
hash_used := hash_max + 1;      {nothing in table initially}

@ Here is the subroutine that searches the hash table for a
(string,~|str_ilk|) pair, where the string is of length |l>=0| and
appears in |buf[j..(j+l-1)]|. If it finds the pair, it returns the
corresponding hash-table location and sets the global variable
|hash_found| to |true|. Otherwise it sets |hash_found| to |false|, and
if the parameter |insert_it| is |true|, it inserts the pair into the
hash table, inserts the string into |str_pool| if not previously
encountered, and returns its location. Note that two different pairs
can have the same string but different |str_ilk|s, in which case the
second pair encountered, if |insert_it| were |true|, would be inserted
into the hash table though its string wouldn't be inserted into
|str_pool| because it would already be there.
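The search-then-insert logic of |str_lookup|, together with the hash function given in a later module, can be modeled in C. This is a simplified sketch under stated assumptions, not \BibTeX's routine: the table is tiny (|HASH_SIZE| and |HASH_PRIME| are hypothetical values), keys are NUL-terminated C strings whose storage the caller keeps alive (so |str_pool| and the |old_string| reuse are left out), and a full table yields |EMPTY| instead of an |overflow|. As in \BibTeX, a collision is resolved by linking the end of the coalesced list to the highest-numbered free slot, found by moving |hash_used| downward.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define HASH_SIZE 23                 /* hypothetical; BibTeX's table is larger */
#define HASH_PRIME 19                /* prime, a bit over 80% of HASH_SIZE */
#define EMPTY 0
#define HASH_BASE (EMPTY + 1)
#define HASH_MAX (HASH_BASE + HASH_SIZE - 1)

static int hash_next[HASH_MAX + 1];          /* coalesced-list links */
static const char *hash_text[HASH_MAX + 1];  /* NULL marks an empty slot */
static int hash_ilk[HASH_MAX + 1];
static int hash_used = HASH_MAX + 1;         /* slots >= hash_used are taken */
static bool hash_found;

/* h := 2h + c, reduced below HASH_PRIME after every character. */
static int hash_code(const char *s)
{
    int h = 0;
    for (; *s != '\0'; s++) {
        h = h + h + (unsigned char)*s;
        while (h >= HASH_PRIME)
            h -= HASH_PRIME;
    }
    return h;
}

/* Find the pair (s, ilk); if absent and insert_it is true, add it.
   Returns its slot, or EMPTY if it was not found and not inserted. */
static int str_lookup(const char *s, int ilk, bool insert_it)
{
    int p;
    assert(s != NULL);
    p = hash_code(s) + HASH_BASE;            /* 0 <= h < HASH_PRIME */
    hash_found = false;
    for (;;) {
        if (hash_text[p] != NULL && hash_ilk[p] == ilk
            && strcmp(hash_text[p], s) == 0) {
            hash_found = true;
            return p;
        }
        if (hash_next[p] == EMPTY)
            break;                           /* end of this coalesced list */
        p = hash_next[p];
    }
    if (!insert_it)
        return EMPTY;
    if (hash_text[p] != NULL) {              /* p is occupied: link a free slot */
        do {
            if (hash_used == HASH_BASE)
                return EMPTY;                /* table full (BibTeX: overflow) */
            hash_used--;
        } while (hash_text[hash_used] != NULL);
        hash_next[p] = hash_used;
        p = hash_used;
    }
    hash_text[p] = s;
    hash_ilk[p] = ilk;
    return p;
}
```

Looking up \.{entry} under one |ilk| and then under another stores the second pair in a new slot linked from the first, since the pairs differ even though the strings match.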
@d max_hash_value = hash_prime+hash_prime-2+127 {|h|'s maximum value}
@d do_insert == true {insert string if not found in hash table}
@d dont_insert == false {don't insert string}
@#
@d str_found = 40 {go here when you've found the string}
@d str_not_found = 45 {go here when you haven't}

@=
function str_lookup(var buf:buf_type; @!j,@!l:buf_pointer; @!ilk:str_ilk;
                    @!insert_it:boolean) : hash_loc; {search the hash table}
label str_found,@!str_not_found;
var h:0..max_hash_value;        {hash code}
@!p:hash_loc;                   {index into |hash_| arrays}
@!k:buf_pointer;                {index into |buf| array}
@!old_string:boolean;           {set to |true| if it's an already encountered string}
@!str_num:str_number;           {pointer to an already encountered string}
begin @;
p:=h+hash_base; {start searching here; note that |0<=h; if (hash_next[p]=empty) then {location |p| may or may not be empty}
    begin
    if (not insert_it) then goto str_not_found;
    @;
    goto str_found;
    end;
p:=hash_next[p];                {old and new locations |p| are not empty}
end;
str_not_found: do_nothing;      {don't insert pair; function value meaningless}
str_found: str_lookup:=p;
end;

@ @^for loops@>
@.WEB@>
The value of |hash_prime| should be roughly 85\% of |hash_size|, and it
should be a prime number (it should also be less than
$2^{14}-2^{6}=16320$ because of \.{WEB}'s simple-macro bound). The
theory of hashing tells us to expect fewer than two table probes, on
the average, when the search is successful.

@=
begin
h := 0;                 {note that this works for zero-length strings}
k := j;
while (k < j+l) do      {not a |for| loop in case |j = l = 0|}
    begin
    h:=h+h+buf[k];
    while (h >= hash_prime) do h:=h-hash_prime;
    incr(k);
    end;
end

@ Here we handle the case in which we've already encountered this
string; note that even if we have, we'll still have to insert the pair
into the hash table if |str_ilk| doesn't match.
@=
begin
if (hash_text[p]>0) then                {there's something here}
  if (str_eq_buf(hash_text[p],buf,j,l)) then  {it's the right string}
    if (hash_ilk[p] = ilk) then         {it's the right |str_ilk|}
        begin
        hash_found := true;
        goto str_found;
        end
    else
        begin                           {it's the wrong |str_ilk|}
        old_string := true;
        str_num := hash_text[p];
        end;
end

@ @^for loops@>
@:BibTeX capacity exceeded}{\quad hash size@>
This code inserts the pair in the appropriate unused location.

@=
begin
if (hash_text[p]>0) then                {location |p| isn't empty}
    begin
    repeat
        if (hash_is_full) then
            overflow('hash size ',hash_size);
        decr(hash_used);
    until (hash_text[hash_used]=0);     {search for an empty location}
    hash_next[p]:=hash_used;
    p:=hash_used;
    end;                                {now location |p| is empty}
if (old_string) then                    {it's an already encountered string}
    hash_text[p] := str_num
else
    begin                               {it's a new string}
    str_room(l);                        {make sure it'll fit in |str_pool|}
    k := j;
    while (k < j+l) do                  {not a |for| loop in case |j = l = 0|}
        begin
        append_char(buf[k]);
        incr(k);
        end;
    hash_text[p] := make_string;        {and make it official}
    end;
hash_ilk[p] := ilk;
end

@ @^string pool@>
Now that we've defined the hash-table workings we can initialize the
string pool. Unlike \TeX, \BibTeX\ does not use a |pool_file| for
string storage; instead it inserts its pre-defined strings into
|str_pool|---this makes one file fewer for the \BibTeX\ implementor to
deal with. This section initializes |str_pool|; the pre-defined strings
will be inserted into it shortly; and other strings are inserted while
processing the input files.

@=
pool_ptr:=0;
str_ptr:=1;             {hash table must have |str_start[0]| unused}
str_start[str_ptr]:=pool_ptr;

@ The longest pre-defined string determines type definitions used to
insert the pre-defined strings into |str_pool|.
@d longest_pds=12 {the length of `\.{change.case\$}'}

@=
@!pds_loc = 1..longest_pds;
@!pds_len = 0..longest_pds;
@!pds_type = packed array [pds_loc] of char;

@ The variables in this program beginning with |s_| specify the
locations in |str_pool| for certain often-used strings. Those here have
to do with the file system; the next section will actually insert them
into |str_pool|.

@=
@!s_aux_extension : str_number;         {\.{.aux}}
@!s_log_extension : str_number;         {\.{.blg}}
@!s_bbl_extension : str_number;         {\.{.bbl}}
@!s_bst_extension : str_number;         {\.{.bst}}
@!s_bib_extension : str_number;         {\.{.bib}}
@!s_bst_area : str_number;              {\.{texinputs:}}
@!s_bib_area : str_number;              {\.{texbib:}}

@ @^important note@>
@^system dependencies@>
It's time to insert some of the pre-defined strings into |str_pool| (and
thus the hash table). These system-dependent strings should contain no
upper-case letters, and they must all be exactly |longest_pds|
characters long (even if fewer characters are actually stored). The
|pre_define| routine appears shortly.

Important notes: These pre-definitions must not have any glitches or the
program may bomb because the |log_file| hasn't been opened yet, and
|text_ilk|s should be pre-defined later, for \.{.bst}-function-execution
purposes.

@=
pre_define('.aux        ',4,file_ext_ilk);
s_aux_extension := hash_text[pre_def_loc];
pre_define('.bbl        ',4,file_ext_ilk);
s_bbl_extension := hash_text[pre_def_loc];
pre_define('.blg        ',4,file_ext_ilk);
s_log_extension := hash_text[pre_def_loc];
pre_define('.bst        ',4,file_ext_ilk);
s_bst_extension := hash_text[pre_def_loc];
pre_define('.bib        ',4,file_ext_ilk);
s_bib_extension := hash_text[pre_def_loc];
pre_define('texinputs:  ',10,file_area_ilk);
s_bst_area := hash_text[pre_def_loc];
pre_define('texbib:     ',7,file_area_ilk);
s_bib_area := hash_text[pre_def_loc];

@ This global variable gives the hash-table location of pre-defined
strings generated by calls to |str_lookup|.
@=
@!pre_def_loc : hash_loc;

@ This procedure initializes a pre-defined string of length at most
|longest_pds|.

@=
procedure pre_define (@!pds:pds_type; @!len:pds_len; @!ilk:str_ilk);
var i : pds_len;
begin
for i:=1 to len do
    buffer[i] := xord[pds[i]];
pre_def_loc := str_lookup(buffer,1,len,ilk,do_insert);
end;

@ These constants all begin with |n_| and are used for the |case|
statement that determines which command to execute. The variable
|command_num| is set to one of these and is used to do the branching,
but it must have the full |integer| range because at times it can
assume an arbitrary |ilk_info| value (though it will be one of the
values here when we actually use it).

@d n_aux_bibdata = 0            {\.{\\bibdata}}
@d n_aux_bibstyle = 1           {\.{\\bibstyle}}
@d n_aux_citation = 2           {\.{\\citation}}
@d n_aux_input = 3              {\.{\\@@input}}
@#
@d n_bst_entry = 0              {\.{entry}}
@d n_bst_execute = 1            {\.{execute}}
@d n_bst_function = 2           {\.{function}}
@d n_bst_integers = 3           {\.{integers}}
@d n_bst_iterate = 4            {\.{iterate}}
@d n_bst_macro = 5              {\.{macro}}
@d n_bst_read = 6               {\.{read}}
@d n_bst_reverse = 7            {\.{reverse}}
@d n_bst_sort = 8               {\.{sort}}
@d n_bst_strings = 9            {\.{strings}}
@#
@d n_bib_comment = 0            {\.{comment}}
@d n_bib_preamble = 1           {\.{preamble}}
@d n_bib_string = 2             {\.{string}}

@=
@!command_num : integer;

@ @^important note@>
Now we pre-define the command strings; they must all be exactly
|longest_pds| characters long.

Important note: These pre-definitions must not have any glitches or the
program may bomb because the |log_file| hasn't been opened yet.
@=
pre_define('\citation   ',9,aux_command_ilk);
ilk_info[pre_def_loc] := n_aux_citation;
pre_define('\bibdata    ',8,aux_command_ilk);
ilk_info[pre_def_loc] := n_aux_bibdata;
pre_define('\bibstyle   ',9,aux_command_ilk);
ilk_info[pre_def_loc] := n_aux_bibstyle;
pre_define('\@@input     ',7,aux_command_ilk);
ilk_info[pre_def_loc] := n_aux_input;
@#
pre_define('entry       ',5,bst_command_ilk);
ilk_info[pre_def_loc] := n_bst_entry;
pre_define('execute     ',7,bst_command_ilk);
ilk_info[pre_def_loc] := n_bst_execute;
pre_define('function    ',8,bst_command_ilk);
ilk_info[pre_def_loc] := n_bst_function;
pre_define('integers    ',8,bst_command_ilk);
ilk_info[pre_def_loc] := n_bst_integers;
pre_define('iterate     ',7,bst_command_ilk);
ilk_info[pre_def_loc] := n_bst_iterate;
pre_define('macro       ',5,bst_command_ilk);
ilk_info[pre_def_loc] := n_bst_macro;
pre_define('read        ',4,bst_command_ilk);
ilk_info[pre_def_loc] := n_bst_read;
pre_define('reverse     ',7,bst_command_ilk);
ilk_info[pre_def_loc] := n_bst_reverse;
pre_define('sort        ',4,bst_command_ilk);
ilk_info[pre_def_loc] := n_bst_sort;
pre_define('strings     ',7,bst_command_ilk);
ilk_info[pre_def_loc] := n_bst_strings;
@#
pre_define('comment     ',7,bib_command_ilk);
ilk_info[pre_def_loc] := n_bib_comment;
pre_define('preamble    ',8,bib_command_ilk);
ilk_info[pre_def_loc] := n_bib_preamble;
pre_define('string      ',6,bib_command_ilk);
ilk_info[pre_def_loc] := n_bib_string;

@* Scanning an input line.
This section describes the various |buffer| scanning routines. The two
global variables |buf_ptr1| and |buf_ptr2| are used in scanning an input
line. Between scans, |buf_ptr1| points to the first character of the
current token and |buf_ptr2| points to that of the next. The global
variable |last|, set by the function |input_ln|, marks the end of the
current line; it equals 0 at the end of the current file. All the
procedures and functions in this section will indicate an end-of-line
when it's the end of the file.
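To make the two-pointer convention concrete, here is a C sketch of |scan1| and |scan_alpha| applied to an \.{.aux}-file line. It is illustrative only: the NUL-terminated line, the |set_line| helper, and the ASCII letter test standing in for |lex_class[c]=alpha| are assumptions of the sketch. As in \BibTeX, |buf_ptr1| marks the start of the token just scanned and |buf_ptr2| the character after it.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

static const char *buffer;            /* the current line (hypothetical setup) */
static int last;                      /* its length */
static int buf_ptr1, buf_ptr2;        /* token start, and the position after it */

#define TOKEN_LEN (buf_ptr2 - buf_ptr1)

static void set_line(const char *line)
{
    assert(line != NULL);
    buffer = line;
    last = (int)strlen(line);
    buf_ptr1 = buf_ptr2 = 0;
}

/* Scan up to, but not including, char1; false if end-of-line comes first. */
static bool scan1(char char1)
{
    buf_ptr1 = buf_ptr2;
    while (buf_ptr2 < last && buffer[buf_ptr2] != char1)
        buf_ptr2++;
    return buf_ptr2 < last;
}

/* Scan a run of ASCII letters; false if there is none. */
static bool scan_alpha(void)
{
    buf_ptr1 = buf_ptr2;
    while (buf_ptr2 < last
           && ((buffer[buf_ptr2] >= 'A' && buffer[buf_ptr2] <= 'Z')
               || (buffer[buf_ptr2] >= 'a' && buffer[buf_ptr2] <= 'z')))
        buf_ptr2++;
    return TOKEN_LEN > 0;
}
```

On the line \.{\\citation\{knuth:tex\}}, skipping the backslash and calling |scan_alpha| picks out \.{citation} (8 characters); stepping past the left brace and calling |scan1| with a right brace then yields the 9-character key \.{knuth:tex}. In C the bound test is written first in each loop, so short-circuit evaluation never reads past |last|.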
@d token_len == (buf_ptr2 - buf_ptr1) {of the current token} @d scan_char == buffer[buf_ptr2] {the current character} @= @!buf_ptr1:buf_pointer; {points to the first position of the current token} @!buf_ptr2:buf_pointer; {used to find the end of the current token} @ These macros send the current token, in |buffer[buf_ptr1]| to |buffer[buf_ptr2-1]|, to an output file. @d print_token == print_a_token {making this a procedure saves a little space} @# @d trace_pr_token == begin out_token(log_file); end @ @^system dependencies@> And here are the associated procedures. Note: The |term_out| file is system dependent. @= procedure out_token (var f:alpha_file); var i:buf_pointer; begin i := buf_ptr1; while (i < buf_ptr2) do begin write(f,xchr[buffer[i]]); incr(i); end; end; @# procedure print_a_token; begin out_token(term_out); out_token(log_file); end; @ This function scans the |buffer| for the next token, starting at the global variable |buf_ptr2| and ending just before either the single specified stop-character or the end of the current line, whichever comes first, respectively returning |true| or |false|; afterward, |scan_char| is the first character following this token. @= function scan1 (@!char1:ASCII_code) : boolean; begin buf_ptr1 := buf_ptr2; {scan until end-of-line or the specified character} while ((scan_char <> char1) and (buf_ptr2 < last)) do incr(buf_ptr2); if (buf_ptr2 < last) then scan1 := true else scan1 := false; end; @ This function is the same but stops at |white_space| characters as well. @= function scan1_white (@!char1:ASCII_code) : boolean; begin buf_ptr1 := buf_ptr2; {scan until end-of-line, the specified character, or |white_space|} while ((lex_class[scan_char] <> white_space) and (scan_char <> char1) and (buf_ptr2 < last)) do incr(buf_ptr2); if (buf_ptr2 < last) then scan1_white := true else scan1_white := false; end; @ This function is similar to |scan1|, but stops at either of two stop-characters as well as the end of the current line. 
@= function scan2 (@!char1,@!char2:ASCII_code) : boolean; begin buf_ptr1 := buf_ptr2; {scan until end-of-line or the specified characters} while ((scan_char <> char1) and (scan_char <> char2) and (buf_ptr2 < last)) do incr(buf_ptr2); if (buf_ptr2 < last) then scan2 := true else scan2 := false; end; @ This function is the same but stops at |white_space| characters as well. @= function scan2_white (@!char1,@!char2:ASCII_code) : boolean; begin buf_ptr1 := buf_ptr2; {scan until end-of-line, the specified characters, or |white_space|} while ((scan_char <> char1) and (scan_char <> char2) and (lex_class[scan_char] <> white_space) and (buf_ptr2 < last)) do incr(buf_ptr2); if (buf_ptr2 < last) then scan2_white := true else scan2_white := false; end; @ This function is similar to |scan2|, but stops at any of three stop-characters as well as the end of the current line. @= function scan3 (@!char1,@!char2,@!char3:ASCII_code) : boolean; begin buf_ptr1 := buf_ptr2; {scan until end-of-line or the specified characters} while ((scan_char <> char1) and (scan_char <> char2) and (scan_char <> char3) and (buf_ptr2 < last)) do incr(buf_ptr2); if (buf_ptr2 < last) then scan3 := true else scan3 := false; end; @ This function scans for letters, stopping at the first nonletter; it returns |true| if there is at least one letter. @= function scan_alpha : boolean; begin buf_ptr1 := buf_ptr2; {scan until end-of-line or a nonletter} while ((lex_class[scan_char] = alpha) and (buf_ptr2 < last)) do incr(buf_ptr2); if (token_len = 0) then scan_alpha := false else scan_alpha := true; end; @ These are the possible values for |scan_result|; they're set by the |scan_identifier| procedure and are described in the next section. @d id_null = 0 @d specified_char_adjacent = 1 @d other_char_adjacent = 2 @d white_adjacent = 3 @= @!scan_result : id_null..white_adjacent; @ This procedure scans for an identifier, stopping at the first |illegal_id_char|, or stopping at the first character if it's |numeric|.
It sets the global variable |scan_result| to |id_null| if the identifier is null, else to |white_adjacent| if it ended at a |white_space| character or an end-of-line, else to |specified_char_adjacent| if it ended at one of |char1| or |char2| or |char3|, else to |other_char_adjacent| if it ended at a nonspecified, non|white_space| |illegal_id_char|. By convention, when some calling code really wants just one or two ``specified'' characters, it merely repeats one of the characters. @= procedure scan_identifier (@!char1,@!char2,@!char3:ASCII_code); begin buf_ptr1 := buf_ptr2; if (lex_class[scan_char] <> numeric) then {scan until end-of-line or an |illegal_id_char|} while ((id_class[scan_char] = legal_id_char) and (buf_ptr2 < last)) do incr(buf_ptr2); if (token_len = 0) then scan_result := id_null else if ((lex_class[scan_char] = white_space) or (buf_ptr2 = last)) then scan_result := white_adjacent else if ((scan_char = char1) or (scan_char = char2) or (scan_char = char3)) then scan_result := specified_char_adjacent else scan_result := other_char_adjacent; end; @ The next two functions scan for an integer, setting the global variable |token_value| to the corresponding integer. @d char_value == (scan_char - "0") {the value of the digit being scanned} @= @!token_value : integer; {the numeric value of the current token} @ This function scans for a nonnegative integer, stopping at the first nondigit; it sets the value of |token_value| accordingly. It returns |true| if the token was a legal nonnegative integer (i.e., consisted of one or more digits).
@= function scan_nonneg_integer : boolean; begin buf_ptr1 := buf_ptr2; token_value := 0; {scan until end-of-line or a nondigit} while ((lex_class[scan_char] = numeric) and (buf_ptr2 < last)) do begin token_value := token_value*10 + char_value; incr(buf_ptr2); end; if (token_len = 0) then {there were no digits} scan_nonneg_integer := false else scan_nonneg_integer := true; end; @ This function scans for an integer, stopping at the first nondigit; it sets the value of |token_value| accordingly. It returns |true| if the token was a legal integer (i.e., consisted of an optional |minus_sign| followed by one or more digits). @d negative == (sign_length = 1) {if this integer is negative} @= function scan_integer : boolean; var sign_length : 0..1; {1 if there's a |minus_sign|, 0 if not} begin buf_ptr1 := buf_ptr2; if (scan_char = minus_sign) then {it's a negative number} begin sign_length := 1; incr(buf_ptr2); {skip over the |minus_sign|} end else sign_length := 0; token_value := 0; {scan until end-of-line or a nondigit} while ((lex_class[scan_char] = numeric) and (buf_ptr2 < last)) do begin token_value := token_value*10 + char_value; incr(buf_ptr2); end; if (negative) then token_value := -token_value; if (token_len = sign_length) then {there were no digits} scan_integer := false else scan_integer := true; end; @ This function scans over |white_space| characters, stopping either at the first nonwhite character or the end of the line, respectively returning |true| or |false|.
@= function scan_white_space : boolean; begin {scan until end-of-line or a nonwhite} while ((lex_class[scan_char] = white_space) and (buf_ptr2 < last)) do incr(buf_ptr2); if (buf_ptr2 < last) then scan_white_space := true else scan_white_space := false; end; @ The |print_bad_input_line| procedure prints the current input line, splitting it at the character being scanned: It prints |buffer[0]|, |buffer[1]|, \dots, |buffer[buf_ptr2-1]| on one line and |buffer[buf_ptr2]|, \dots, |buffer[last-1]| on the next (and both lines start with a colon between two |space|s). Each |white_space| character is printed as a |space|. @= procedure print_bad_input_line; var bf_ptr : buf_pointer; begin print (' : '); bf_ptr := 0; while (bf_ptr < buf_ptr2) do begin if (lex_class[buffer[bf_ptr]] = white_space) then print (xchr[space]) else print (xchr[buffer[bf_ptr]]); incr(bf_ptr); end; print_newline; print (' : '); bf_ptr := 0; while (bf_ptr < buf_ptr2) do begin print (xchr[space]); incr(bf_ptr); end; bf_ptr := buf_ptr2; while (bf_ptr < last) do begin if (lex_class[buffer[bf_ptr]] = white_space) then print (xchr[space]) else print (xchr[buffer[bf_ptr]]); incr(bf_ptr); end; print_newline;@/ bf_ptr := 0; while ((bf_ptr < buf_ptr2) and (lex_class[buffer[bf_ptr]] = white_space)) do incr(bf_ptr); if (bf_ptr = buf_ptr2) then print_ln ('(Error may have been on previous line)'); mark_error; end; @ This little procedure exists because it's used by at least two other procedures and thus saves some space. @= procedure print_skipping_whatever_remains; begin print ('I''m skipping whatever remains of this '); end; @* Getting the top-level auxiliary file name. @^system dependencies@> These modules read the name of the top-level \.{.aux} file. Some systems will try to find this on the command line; if it's not there it will come from the user's terminal. In either case, the name goes into the |char| array |name_of_file|, and the files relevant to this name are opened. 
@d aux_found=41 {go here when the \.{.aux} name is legit} @d aux_not_found=46 {go here when it's not} @= @!aux_name_length : 0..file_name_size+1; {\.{.aux} name sans extension} @ @^system dependencies@> @^user abuse@> I mean, this is truly disgraceful. A user has to type something in to the terminal just once during the entire run. And it's not some complicated string where you have to get every last punctuation mark just right, and it's not some fancy list where you get nervous because if you forget one item you have to type the whole thing again; it's just a simple, ordinary, file name. Now you'd think a five-year-old could do it; you'd think it's so simple a user should be able to do it in his sleep. But noooooooooo. He had to sit there droning on and on about who knows what until he exceeded the bounds of common sense, and he probably didn't even realize it. Just pitiful. What's this world coming to? We should probably just delete all his files and be done with him. Note: The |term_out| file is system dependent. @d sam_you_made_the_file_name_too_long == begin sam_too_long_file_name_print; goto aux_not_found; end @= procedure sam_too_long_file_name_print; begin write (term_out,'File name `'); name_ptr := 1; while (name_ptr <= aux_name_length) do begin write (term_out,name_of_file[name_ptr]); incr(name_ptr); end; write_ln (term_out,''' is too long'); end; @ @^system dependencies@> @^user abuse@> We've abused the user enough for one section; suffice it to say here that most of what we said last module still applies. Note: The |term_out| file is system dependent. 
@d sam_you_made_the_file_name_wrong == begin sam_wrong_file_name_print; goto aux_not_found; end @= procedure sam_wrong_file_name_print; begin write (term_out,'I couldn''t open file name `'); name_ptr := 1; while (name_ptr <= name_length) do begin write (term_out,name_of_file[name_ptr]); incr(name_ptr); end; write_ln (term_out,''''); end; @ @^system dependencies@> This procedure consists of a loop that reads and processes a (nonnull) \.{.aux} file name. It's this module and the next two that must be changed on those systems using command-line arguments. Note: The |term_out| and |term_in| files are system dependent. @= procedure get_the_top_level_aux_file_name; label aux_found,@!aux_not_found; var @@/ begin check_cmnd_line := false; {many systems will change this} loop begin if (check_cmnd_line) then @ else begin write (term_out,'Please type input file name (no extension)--'); if (eoln(term_in)) then {so the first |read| works} read_ln (term_in); aux_name_length := 0; while (not eoln(term_in)) do begin if (aux_name_length = file_name_size) then begin while (not eoln(term_in)) do {discard the rest of the line} get(term_in); sam_you_made_the_file_name_too_long; end; incr(aux_name_length); name_of_file[aux_name_length] := term_in^; get(term_in); end; end; @; aux_not_found: check_cmnd_line := false; end; aux_found: {now we're ready to read the \.{.aux} file} end; @ @^system dependencies@> The switch |check_cmnd_line| tells us whether we're to check for a possible command-line argument. @= @!check_cmnd_line : boolean; {|true| if we're to check the command line} @ @^system dependencies@> Here's where we do the real command-line work. Those systems needing more than a single module to handle the task should add the extras to the ``System-dependent changes'' section. 
@= begin do_nothing; {the ``default system'' doesn't use the command line} end @ Here we orchestrate this \.{.aux} name's handling: we add the various extensions, try to open the files with the resulting name, and store the name strings we'll need later. @= begin if ((aux_name_length + length(s_aux_extension) > file_name_size) or@| (aux_name_length + length(s_log_extension) > file_name_size) or@| (aux_name_length + length(s_bbl_extension) > file_name_size)) then sam_you_made_the_file_name_too_long; @; @; goto aux_found; end @ Here we set up definitions and declarations for files opened in this section. Each element in |aux_list| (except for |aux_list[aux_stack_size]|, which is always unused) is a pointer to the appropriate |str_pool| string representing the \.{.aux} file name. The array |aux_file| contains the corresponding \PASCAL\ |file| variables. @d cur_aux_str == aux_list[aux_ptr] {shorthand for the current \.{.aux} file} @d cur_aux_file == aux_file[aux_ptr] {shorthand for the current |aux_file|} @d cur_aux_line == aux_ln_stack[aux_ptr] {line number of current \.{.aux} file} @= @!aux_file : array[aux_number] of alpha_file; {open \.{.aux} |file| variables} @!aux_list : array[aux_number] of str_number; {the open \.{.aux} file list} @!aux_ptr : aux_number; {points to the currently open \.{.aux} file} @!aux_ln_stack : array[aux_number] of integer; {open \.{.aux} line numbers} @# @!top_lev_str : str_number; {the top-level \.{.aux} file's name} @# @!log_file : alpha_file; {the |file| variable for the \.{.blg} file} @!bbl_file : alpha_file; {the |file| variable for the \.{.bbl} file} @ Where |aux_number| is the obvious. @= @!aux_number = 0..aux_stack_size; {gives the |aux_list| range} @ @^system dependencies@> We must make sure the (top-level) \.{.aux}, \.{.blg}, and \.{.bbl} files can be opened. 
@= begin name_length := aux_name_length; {set to last used position} add_extension (s_aux_extension); {this also sets |name_length|} aux_ptr := 0; {initialize the \.{.aux} file stack} if (not a_open_in(cur_aux_file)) then sam_you_made_the_file_name_wrong; @# name_length := aux_name_length; add_extension (s_log_extension); {this also sets |name_length|} if (not a_open_out(log_file)) then sam_you_made_the_file_name_wrong; @# name_length := aux_name_length; add_extension (s_bbl_extension); {this also sets |name_length|} if (not a_open_out(bbl_file)) then sam_you_made_the_file_name_wrong; end @ @:this can't happen}{\quad Already encountered auxiliary file@> This code puts the \.{.aux} file name, both with and without the extension, into the hash table, and it initializes |aux_list|. Note that all previous top-level \.{.aux}-file stuff must have been successful. @= begin name_length := aux_name_length; add_extension (s_aux_extension); {this also sets |name_length|} name_ptr := 1; while (name_ptr <= name_length) do begin buffer[name_ptr] := xord[name_of_file[name_ptr]]; incr(name_ptr); end; top_lev_str := hash_text[ str_lookup(buffer,1,aux_name_length,text_ilk,do_insert)]; cur_aux_str := hash_text[ str_lookup(buffer,1,name_length,aux_file_ilk,do_insert)]; {note that this has initialized |aux_list|} if (hash_found) then begin trace print_aux_name; ecart@/ confusion ('Already encountered auxiliary file'); end; cur_aux_line := 0; {this finishes initializing the top-level \.{.aux} file} end @ Print the name of the current \.{.aux} file, followed by a |newline|. @= procedure print_aux_name; begin print_pool_str (cur_aux_str); print_newline; end; @* Reading the auxiliary file(s). @^auxiliary-file commands@> Now it's time to read the \.{.aux} file. 
The only commands we handle are \.{\\citation} (there can be arbitrarily many, each having arbitrarily many arguments), \.{\\bibdata} (there can be just one, but it can have arbitrarily many arguments), \.{\\bibstyle} (there can be just one, and it can have just one argument), and \.{\\@@input} (there can be arbitrarily many, each with one argument, and they can be nested to a depth of |aux_stack_size|). Each of these commands is assumed to be on just a single line. The rest of the \.{.aux} file is ignored. @d aux_done=31 {go here when finished with the \.{.aux} files} @= ,@!aux_done @ We keep reading and processing input lines until none are left. This is part of the main program; hence, because of the |aux_done| label, there's no conventional |begin|-|end| pair surrounding the entire module. @= print ('The top-level auxiliary file: '); print_aux_name; loop begin {|pop_the_aux_stack| will exit the loop} incr(cur_aux_line); if (not input_ln(cur_aux_file)) then {end of current \.{.aux} file} pop_the_aux_stack else get_aux_command_and_process; end; trace trace_pr_ln ('Finished reading the auxiliary file(s)'); ecart@/ aux_done: last_check_for_aux_errors; @ When we find an error in the input, we print a message and flush the rest of the line. This macro must be called from within a procedure that has an |exit| label. @d aux_err_return == begin aux_err_print; return; {flush this input line} end @d aux_err(#) == begin print (#); aux_err_return; end @= procedure aux_err_print; begin print ('---line ',cur_aux_line:0,' of file '); print_aux_name;@/ print_bad_input_line; {this call does the |mark_error|} print_skipping_whatever_remains; print_ln ('command') end; @ @:this can't happen}{\quad Illegal auxiliary-file command@> Here are a bunch of macros whose print statements are used at least twice. Thus we save space by making the statements procedures. This macro complains when there's a repeated command that's to be used just once.
@d aux_err_illegal_another(#) == begin aux_err_illegal_another_print (#); aux_err_return; end @= procedure aux_err_illegal_another_print (@!cmd_num : integer); begin print ('Illegal, another \bib'); case (cmd_num) of n_aux_bibdata : print ('data'); n_aux_bibstyle : print ('style'); othercases confusion ('Illegal auxiliary-file command') endcases; print (' command'); end; @ This one complains when a command is missing its |right_brace|. @d aux_err_no_right_brace == begin aux_err_no_right_brace_print; aux_err_return; end @= procedure aux_err_no_right_brace_print; begin print ('No "',xchr[right_brace],'"'); end; @ This one complains when a command has stuff after its |right_brace|. @d aux_err_stuff_after_right_brace == begin aux_err_stuff_after_right_brace_print; aux_err_return; end @= procedure aux_err_stuff_after_right_brace_print; begin print ('Stuff after "',xchr[right_brace],'"'); end; @ And this one complains when a command has |white_space| in its argument. @d aux_err_white_space_in_argument == begin aux_err_white_space_in_argument_print; aux_err_return; end @= procedure aux_err_white_space_in_argument_print; begin print ('White space in argument'); end; @ @^auxiliary-file commands@> @:this can't happen}{\quad Unknown auxiliary-file command@> We're not at the end of an \.{.aux} file, so we see if the current line might be a command of interest. A command of interest will be a line without blanks, consisting of a command name, a |left_brace|, one or more arguments separated by commas, and a |right_brace|. 
@= procedure get_aux_command_and_process; label exit; begin buf_ptr2 := 0; {mark the beginning of the next token} if (not scan1(left_brace)) then {no |left_brace|---flush line} return; command_num := ilk_info[ str_lookup(buffer,buf_ptr1,token_len,aux_command_ilk,dont_insert)]; if (hash_found) then case (command_num) of n_aux_bibdata : aux_bib_data_command; n_aux_bibstyle : aux_bib_style_command; n_aux_citation : aux_citation_command; n_aux_input : aux_input_command; othercases confusion ('Unknown auxiliary-file command') endcases; exit: end; @ Here we introduce some variables for processing a \.{\\bibdata} command. Each element in |bib_list| (except for |bib_list[max_bib_files]|, which is always unused) is a pointer to the appropriate |str_pool| string representing the \.{.bib} file name. The array |bib_file| contains the corresponding \PASCAL\ |file| variables. @d cur_bib_str == bib_list[bib_ptr] {shorthand for current \.{.bib} file} @d cur_bib_file == bib_file[bib_ptr] {shorthand for current |bib_file|} @= @!bib_list : array[bib_number] of str_number; {the \.{.bib} file list} @!bib_ptr : bib_number; {pointer for the current \.{.bib} file} @!num_bib_files : bib_number; {the total number of \.{.bib} files} @!bib_seen : boolean; {|true| if we've already seen a \.{\\bibdata} command} @!bib_file : array[bib_number] of alpha_file; {corresponding |file| variables} @ Where |bib_number| is the obvious. @= @!bib_number = 0..max_bib_files; {gives the |bib_list| range} @ @= bib_ptr := 0; {this makes |bib_list| empty} bib_seen := false; {we haven't seen a \.{\\bibdata} command yet} @ @:auxiliary-file commands}{\quad \.{\\bibdata}@> A \.{\\bibdata} command will have its arguments between braces and separated by commas. There must be exactly one such command in the \.{.aux} file(s). No case conversion is done on the file names.
@= procedure aux_bib_data_command; label exit; begin if (bib_seen) then aux_err_illegal_another (n_aux_bibdata); bib_seen := true; {now we've seen a \.{\\bibdata} command} while (scan_char <> right_brace) do begin incr(buf_ptr2); {skip over the previous stop-character} if (not scan2_white(right_brace,comma)) then aux_err_no_right_brace; if (lex_class[scan_char] = white_space) then aux_err_white_space_in_argument; if ((last > buf_ptr2+1) and (scan_char = right_brace)) then aux_err_stuff_after_right_brace; @; end; exit: end; @ Here's a procedure we'll need shortly. It prints the name of the current \.{.bib} file, followed by a |newline|. @= procedure print_bib_name; begin print_pool_str (cur_bib_str); print_pool_str (s_bib_extension); print_newline; end; @ This macro is similar to |aux_err| but it complains specifically about opening a file for a \.{\\bibdata} command. @d open_bibdata_aux_err(#) == begin print (#); print_bib_name; aux_err_return; {this does the |mark_error|} end @ @:BibTeX capacity exceeded}{\quad number of \.{.bib} files@> Now we add the just-found argument to |bib_list| if it hasn't already been encountered as a \.{\\bibdata} argument and if, after appending the |s_bib_extension| string, the resulting file name can be opened. 
@= begin if (bib_ptr = max_bib_files) then overflow('number of database files ',max_bib_files); cur_bib_str := hash_text[ str_lookup(buffer,buf_ptr1,token_len,bib_file_ilk,do_insert)]; if (hash_found) then {already encountered this as a \.{\\bibdata} argument} open_bibdata_aux_err ('This database file appears more than once: '); start_name (cur_bib_str); add_extension (s_bib_extension); if (not a_open_in(cur_bib_file)) then begin add_area (s_bib_area); if (not a_open_in(cur_bib_file)) then open_bibdata_aux_err ('I couldn''t open database file '); end; trace trace_pr_pool_str (cur_bib_str); trace_pr_pool_str (s_bib_extension); trace_pr_ln (' is a bibdata file'); ecart@/ incr(bib_ptr); end @ Here we introduce some variables for processing a \.{\\bibstyle} command. @= @!bst_seen : boolean; {|true| if we've already seen a \.{\\bibstyle} command} @!bst_str : str_number; {the string number for the \.{.bst} file} @!bst_file : alpha_file; {the corresponding |file| variable} @ And we initialize. @= bst_str := 0; {mark |bst_str| as unused} bst_seen := false; {we haven't seen a \.{\\bibstyle} command yet} @ @:auxiliary-file commands}{\quad \.{\\bibstyle}@> A \.{\\bibstyle} command will have exactly one argument, and it will be between braces. There must be exactly one such command in the \.{.aux} file(s). No case conversion is done on the file name. @= procedure aux_bib_style_command; label exit; begin if (bst_seen) then aux_err_illegal_another (n_aux_bibstyle); bst_seen := true; {now we've seen a \.{\\bibstyle} command} incr(buf_ptr2); {skip over the |left_brace|} if (not scan1_white(right_brace)) then aux_err_no_right_brace; if (lex_class[scan_char] = white_space) then aux_err_white_space_in_argument; if (last > buf_ptr2+1) then aux_err_stuff_after_right_brace; @; exit: end; @ @:this can't happen}{\quad Already encountered style file@> Now we open the file whose name is the just-found argument appended with the |s_bst_extension| string, if possible.
@= begin bst_str := hash_text[ str_lookup(buffer,buf_ptr1,token_len,bst_file_ilk,do_insert)]; if (hash_found) then begin trace print_bst_name; ecart@/ confusion ('Already encountered style file'); end; start_name (bst_str); add_extension (s_bst_extension); if (not a_open_in(bst_file)) then begin add_area (s_bst_area); if (not a_open_in(bst_file)) then begin print ('I couldn''t open style file '); print_bst_name;@/ bst_str := 0; {mark as unused again} aux_err_return; end; end; print ('The style file: '); print_bst_name; end @ Print the name of the \.{.bst} file, followed by a |newline|. @= procedure print_bst_name; begin print_pool_str (bst_str); print_pool_str (s_bst_extension); print_newline; end; @ Here we introduce some variables for processing a \.{\\citation} command. Each element in |cite_list| (except for |cite_list[max_cites]|, which is always unused) is a pointer to the appropriate |str_pool| string. The cite-key list is kept in order of occurrence with duplicates removed. @d cur_cite_str == cite_list[cite_ptr] {shorthand for the current cite key} @= @!cite_list : packed array[cite_number] of str_number; {the cite-key list} @!cite_ptr : cite_number; {pointer for the current cite key} @!entry_cite_ptr : cite_number; {cite pointer for the current entry} @!num_cites : cite_number; {the total number of distinct cite keys} @!old_num_cites : cite_number; {set to a previous |num_cites| value} @!citation_seen : boolean; {|true| if we've seen a \.{\\citation} command} @!cite_loc : hash_loc; {the hash-table location of a cite key} @!lc_cite_loc : hash_loc; {and of its lower-case equivalent} @!lc_xcite_loc : hash_loc; {a second |lc_cite_loc| variable} @!cite_found : boolean; {|true| if we've already seen this cite key} @!all_entries : boolean; {|true| if we're to use the entire database} @!all_marker : cite_number; {we put the other entries in |cite_list| here} @ Where |cite_number| is the obvious. 
@= @!cite_number = 0..max_cites; {gives the |cite_list| range} @ @= cite_ptr := 0; {this makes |cite_list| empty} citation_seen := false; {we haven't seen a \.{\\citation} command yet} all_entries := false; {by default, use just the entries explicitly named} @ @^case mismatch@> @^entire database inclusion@> @^whole database inclusion@> @:LaTeX}{\LaTeX@> @:auxiliary-file commands}{\quad \.{\\citation}@> A \.{\\citation} command will have its arguments between braces and separated by commas. Upper/lower cases are considered to be different for \.{\\citation} arguments, which is the same as the rest of \LaTeX\ but different from the rest of \BibTeX. A cite key needn't exactly case-match its corresponding database key to work, although two cite keys that are case-mismatched will produce an error message. (A {\sl case mismatch\/} is a mismatch, but only because of a case difference.) A \.{\\citation} command having \.{*} as an argument indicates that the entire database will be included (almost as if a \.{\\nocite} command that listed every cite key in the database, in order, had been given at the corresponding spot in the \.{.tex} file). @d next_cite = 23 {read the next argument} @= procedure aux_citation_command; label next_cite,@!exit; begin citation_seen := true; {now we've seen a \.{\\citation} command} while (scan_char <> right_brace) do begin incr(buf_ptr2); {skip over the previous stop-character} if (not scan2_white(right_brace,comma)) then aux_err_no_right_brace; if (lex_class[scan_char] = white_space) then aux_err_white_space_in_argument; if ((last > buf_ptr2+1) and (scan_char = right_brace)) then aux_err_stuff_after_right_brace; @; next_cite: end; exit: end; @ @^kludge@> We must check whether (the lower-case version of) this cite key has been previously encountered, and proceed accordingly. The alias kludge helps keep the stack from overflowing on some machines.
@d ex_buf1== ex_buf {an alias, used only in this module} @= begin trace trace_pr_token; trace_pr (' cite key encountered'); ecart@/ @; tmp_ptr := buf_ptr1; while (tmp_ptr < buf_ptr2) do begin ex_buf1[tmp_ptr] := buffer[tmp_ptr]; incr(tmp_ptr); end; lower_case (ex_buf1, buf_ptr1, token_len); {convert to `canonical' form} lc_cite_loc := str_lookup(ex_buf1,buf_ptr1,token_len,lc_cite_ilk,do_insert); if (hash_found) then {already encountered this as a \.{\\citation} argument} @ else @; {it's a new cite key---add it to |cite_list|} end @ Here we check for a \.{\\citation} command having \.{*} as an argument, indicating that the entire database will be included. @= begin if (token_len = 1) then if (buffer[buf_ptr1] = star) then begin trace trace_pr_ln ('---entire database to be included'); ecart@/ if (all_entries) then begin print_ln ('Multiple inclusions of entire database'); aux_err_return; end else begin all_entries := true; all_marker := cite_ptr; goto next_cite; end; end; end @ @^case mismatch errors@> We've previously encountered the lower-case version, so we check that the actual version exactly matches the actual version of the previously-encountered cite key(s). @= begin trace trace_pr_ln (' previously'); ecart@/ dummy_loc := str_lookup(buffer,buf_ptr1,token_len,cite_ilk,dont_insert); if (not hash_found) then {case mismatch error} begin print ('Case mismatch error between cite keys '); print_token; print (' and '); print_pool_str (cite_list[ilk_info[ilk_info[lc_cite_loc]]]); print_newline; aux_err_return; end; end @ @:this can't happen}{\quad Cite hash error@> Now we add the just-found argument to |cite_list| if there isn't anything funny happening. 
@= begin trace trace_pr_newline; ecart@/ cite_loc := str_lookup(buffer,buf_ptr1,token_len,cite_ilk,do_insert); if (hash_found) then hash_cite_confusion; check_cite_overflow (cite_ptr); cur_cite_str := hash_text[cite_loc]; ilk_info[cite_loc] := cite_ptr; ilk_info[lc_cite_loc] := cite_loc; incr(cite_ptr); end @ @:this can't happen}{\quad Cite hash error@> Here's a serious complaint (that is, a bug) concerning hash problems. This is the first of several similar bug-procedures that exist only because they save space. @= procedure hash_cite_confusion; begin confusion ('Cite hash error'); end; @ @^fetish@> @:BibTeX capacity exceeded}{\quad number of cite keys@> Complain if somebody's got a cite fetish. This procedure is called when we're about to add another cite key to |cite_list|. It assumes that |cite_loc| gives the potential cite key's hash-table location. @= procedure check_cite_overflow (@!last_cite : cite_number); begin if (last_cite = max_cites) then begin print_pool_str (hash_text[cite_loc]); print_ln (' is the key:'); overflow('number of cite keys ',max_cites); end; end; @ @:auxiliary-file commands}{\quad \.{\\\AT!input}@> An \.{\\@@input} command will have exactly one argument, it will be between braces, and it must have the |s_aux_extension|. No case conversion is done on the file name. @= procedure aux_input_command; label exit; var aux_extension_ok : boolean; {to check for a correct file extension} begin incr(buf_ptr2); {skip over the |left_brace|} if (not scan1_white(right_brace)) then aux_err_no_right_brace; if (lex_class[scan_char] = white_space) then aux_err_white_space_in_argument; if (last > buf_ptr2+1) then aux_err_stuff_after_right_brace; @; exit: end; @ @:BibTeX capacity exceeded}{\quad number of \.{.aux} files@> We must check that this potential \.{.aux} file won't overflow the stack, that it has the correct extension, and that we haven't encountered it before (to prevent, among other things, an infinite loop).
@= begin incr(aux_ptr); if (aux_ptr = aux_stack_size) then begin print_token; print (': '); overflow('auxiliary file depth ',aux_stack_size); end; aux_extension_ok := true; if (token_len < length(s_aux_extension)) then@/ aux_extension_ok := false {else |str_eq_buf| might bomb the program} else if (not str_eq_buf(s_aux_extension, buffer, buf_ptr2-length(s_aux_extension), length(s_aux_extension))) then aux_extension_ok := false; if (not aux_extension_ok) then begin print_token; print (' has a wrong extension'); decr(aux_ptr); aux_err_return; end; cur_aux_str := hash_text[ str_lookup(buffer,buf_ptr1,token_len,aux_file_ilk,do_insert)]; if (hash_found) then begin print ('Already encountered file '); print_aux_name; decr(aux_ptr); aux_err_return; end; @; end @ We check that this \.{.aux} file can actually be opened, and then open it. @= begin start_name (cur_aux_str); {extension already there for \.{.aux} files} name_ptr := name_length+1; while (name_ptr <= file_name_size) do {pad with blanks} begin name_of_file[name_ptr] := ' '; incr(name_ptr); end; if (not a_open_in(cur_aux_file)) then begin print ('I couldn''t open auxiliary file '); print_aux_name; decr(aux_ptr); aux_err_return; end; print ('A level-',aux_ptr:0,' auxiliary file: '); print_aux_name; cur_aux_line := 0; end @ Here we close the current-level \.{.aux} file and go back up a level, if possible, by decrementing |aux_ptr|. @= procedure pop_the_aux_stack; begin a_close (cur_aux_file); if (aux_ptr=0) then goto aux_done else decr(aux_ptr); end; @ @^gymnastics@> That's it for processing \.{.aux} commands, except for finishing the procedural gymnastics. @= @ @ We must complain if anything's amiss. @d aux_end_err(#) == begin aux_end1_err_print; print (#); aux_end2_err_print; end @= procedure aux_end1_err_print; begin print ('I found no '); end; @# procedure aux_end2_err_print; begin print ('---while reading file '); print_aux_name; mark_error; end; @ Before proceeding, we see if we have any complaints. 
@= procedure last_check_for_aux_errors; begin num_cites := cite_ptr; {record the number of distinct cite keys} num_bib_files := bib_ptr; {and the number of \.{.bib} files} if (not citation_seen) then aux_end_err ('\citation commands') else if ((num_cites = 0) and (not all_entries)) then aux_end_err ('cite keys'); if (not bib_seen) then aux_end_err ('\bibdata command') else if (num_bib_files = 0) then aux_end_err ('database files'); if (not bst_seen) then aux_end_err ('\bibstyle command') else if (bst_str = 0) then aux_end_err ('style file'); end; @* Reading the style file. This part of the program reads the \.{.bst} file, which consists of a sequence of commands. Each \.{.bst} command consists of a name (for which case differences are ignored) followed by zero or more arguments, each enclosed in braces. @d bst_done=32 {go here when finished with the \.{.bst} file} @d no_bst_file=9932 {go here when skipping the \.{.bst} file} @= ,@!bst_done,@!no_bst_file @ The |bbl_line_num| gets initialized along with the |bst_line_num|, so it's declared here too. @= @!bbl_line_num : integer; {line number of the \.{.bbl} (output) file} @!bst_line_num : integer; {line number of the \.{.bst} file} @ This little procedure exists because it's used by at least two other procedures and thus saves some space. @= procedure bst_ln_num_print; begin print ('--line ',bst_line_num:0,' of file '); print_bst_name; end; @ When there's a serious error parsing the \.{.bst} file, we flush the rest of the current command; a blank line is assumed to mark the end of a command (but for the purposes of error recovery only). Thus, error recovery will be better if style designers leave blank lines between \.{.bst} commands. This macro must be called from within a procedure that has an |exit| label. 
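The recovery strategy described above, flushing input up to the next blank line, amounts to a short loop. Here is a C sketch. It assumes that the style file is held as an array of lines with trailing blanks already stripped, and |skip_to_blank_line| is a hypothetical name.

```c
#include <assert.h>

/* Returns 1 with *line_num on the first blank line at or after the line
   where the error occurred, or 0 if the file ends first (the case where
   BibTeX jumps to bst_done). */
int skip_to_blank_line(const char *lines[], int num_lines, int *line_num) {
    assert(*line_num >= 0 && *line_num < num_lines);
    while (lines[*line_num][0] != '\0') {     /* line not blank yet */
        if (*line_num + 1 == num_lines)
            return 0;                         /* end of file, no blank line */
        ++*line_num;                          /* like incr(bst_line_num) */
    }
    return 1;
}
```

Because a blank line ends the flush, a style designer who separates commands by blank lines limits the damage of one bad command to that command alone.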
@d bst_err_print_and_look_for_blank_line_return == begin bst_err_print_and_look_for_blank_line; return; end @d bst_err(#) == begin {serious error during \.{.bst} parsing} print (#); bst_err_print_and_look_for_blank_line_return; end @= procedure bst_err_print_and_look_for_blank_line; begin print ('-'); bst_ln_num_print; print_bad_input_line; {this call does the |mark_error|} while (last <> 0) do {look for a blank input line} if (not input_ln(bst_file)) then {or the end of the file} goto bst_done else incr(bst_line_num); buf_ptr2 := last; {to input the next line} end; @ When there's a harmless error parsing the \.{.bst} file (harmless syntactically, at least) we give just a |warning_message|. @d bst_warn(#) == begin {non-serious error during \.{.bst} parsing} print (#); bst_warn_print; end @= procedure bst_warn_print; begin bst_ln_num_print; mark_warning; end; @ Here's the outer loop for reading the \.{.bst} file---it keeps reading and processing \.{.bst} commands until none are left. This is part of the main program; hence, because of the |bst_done| label, there's no conventional |begin|-|end| pair surrounding the entire module. @= if (bst_str = 0) then {there's no \.{.bst} file to read} goto no_bst_file; {this is a |goto| so that |bst_done| is not in a block} bst_line_num := 0; {initialize things} bbl_line_num := 1; {best spot to initialize the output line number} buf_ptr2 := last; {to get the first input line} loop begin if (not eat_bst_white_space) then {the end of the \.{.bst} file} goto bst_done; get_bst_command_and_process; end; bst_done: a_close (bst_file); no_bst_file: a_close (bbl_file); @ This \.{.bst}-specific scanning function skips over |white_space| characters (and comments) until hitting a nonwhite character or the end of the file, respectively returning |true| or |false|. It also updates |bst_line_num|, the line counter.
@= function eat_bst_white_space : boolean; label exit; begin loop begin if (scan_white_space) then {hit a nonwhite character on this line} if (scan_char <> comment) then {it's not a comment character; return} begin eat_bst_white_space := true; return; end; if (not input_ln(bst_file)) then {end-of-file; return |false|} begin eat_bst_white_space := false; return; end; incr(bst_line_num); buf_ptr2 := 0; end; exit: end; @ It's often illegal to end a \.{.bst} command in certain places, and this is where we come to check. @d eat_bst_white_and_eof_check(#) == begin if (not eat_bst_white_space) then begin eat_bst_print; bst_err (#); end; end @= procedure eat_bst_print; begin print ('Illegal end of style file in command: '); end; @ We must attend to a few details before getting to work on this \.{.bst} command. @= procedure get_bst_command_and_process; label exit; begin if (not scan_alpha) then bst_err ('"',xchr[scan_char],'" can''t start a style-file command'); lower_case (buffer, buf_ptr1, token_len); {ignore case differences} command_num := ilk_info[ str_lookup(buffer,buf_ptr1,token_len,bst_command_ilk,dont_insert)]; if (not hash_found) then begin print_token; bst_err (' is an illegal style-file command'); end; @; exit: end; @ @^style-file commands@> @:this can't happen}{\quad Unknown style-file command@> Here we determine which \.{.bst} command we're about to process, and then go to it. 
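Case-insensitive command recognition, as in |get_bst_command_and_process|, can be sketched in C with a small table: lower-case the token, then search. The numbering below is hypothetical because it simply follows the order of the |case| statement. A linear search also replaces the |bst_command_ilk| hash lookup.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* The ten command names from the case statement, in the same order. */
static const char *const bst_commands[] = {
    "entry", "execute", "function", "integers", "iterate",
    "macro", "read", "reverse", "sort", "strings"
};
#define NUM_BST_COMMANDS (sizeof bst_commands / sizeof bst_commands[0])

/* Returns the command number, or -1 for an illegal style-file command. */
int lookup_bst_command(const char *token) {
    char lc[16];
    size_t n;
    assert(token != NULL);
    n = strlen(token);
    if (n >= sizeof lc) return -1;        /* longer than any command name */
    for (size_t i = 0; i <= n; i++)       /* also copies the final '\0' */
        lc[i] = (char)tolower((unsigned char)token[i]);
    for (size_t k = 0; k < NUM_BST_COMMANDS; k++)
        if (strcmp(lc, bst_commands[k]) == 0) return (int)k;
    return -1;
}
```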
@= case (command_num) of n_bst_entry : bst_entry_command; n_bst_execute : bst_execute_command; n_bst_function : bst_function_command; n_bst_integers : bst_integers_command; n_bst_iterate : bst_iterate_command; n_bst_macro : bst_macro_command; n_bst_read : bst_read_command; n_bst_reverse : bst_reverse_command; n_bst_sort : bst_sort_command; n_bst_strings : bst_strings_command; othercases confusion ('Unknown style-file command') endcases @ We need data structures for the function definitions, the entry variables, the global variables, and the actual entries corresponding to the cite-key list. First we define the classes of `function's used. Functions in all classes are of |bst_fn_ilk| except for |int_literal|s, which are of |integer_ilk|; and |str_literal|s, which are of |text_ilk|. @d built_in = 0 {the `primitive' functions} @d wiz_defined = 1 {defined in the \.{.bst} file} @d int_literal = 2 {integer `constants'} @d str_literal = 3 {string `constants'} @d field = 4 {things like `author' and `title'} @d int_entry_var = 5 {integer entry variable} @d str_entry_var = 6 {string entry variable} @d int_global_var = 7 {integer global variable} @d str_global_var = 8 {string global variable} @d last_fn_class = 8 {the same number as on the line above} @ @:this can't happen}{\quad Unknown function class@> Here's another bug report. @= procedure unknwn_function_class_confusion; begin confusion ('Unknown function class'); end; @ @:this can't happen}{\quad Unknown function class@> Occasionally we'll want to |print| the name of one of these function classes. 
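In a C rendering, the class numbers above map naturally onto an enumeration plus a table of printable names. These are the same strings that |print_fn_class| and |trace_pr_fn_class| spell out case by case. The sketch below is illustrative only, and |fn_class_name| is a hypothetical helper.

```c
#include <assert.h>
#include <string.h>

/* Mirrors the nine function classes defined above (same numeric values). */
enum fn_class {
    BUILT_IN = 0, WIZ_DEFINED, INT_LITERAL, STR_LITERAL, FIELD,
    INT_ENTRY_VAR, STR_ENTRY_VAR, INT_GLOBAL_VAR, STR_GLOBAL_VAR,
    LAST_FN_CLASS = STR_GLOBAL_VAR
};

static const char *const fn_class_names[] = {
    "built-in", "wizard-defined", "integer-literal", "string-literal",
    "field", "integer-entry-variable", "string-entry-variable",
    "integer-global-variable", "string-global-variable"
};

/* Compile-time check that every class has exactly one name (C11). */
_Static_assert(sizeof fn_class_names / sizeof fn_class_names[0]
                   == LAST_FN_CLASS + 1, "one name per function class");

/* Returns NULL for an out-of-range class, the case in which BibTeX
   calls unknwn_function_class_confusion. */
const char *fn_class_name(int cls) {
    if (cls < BUILT_IN || cls > LAST_FN_CLASS) return NULL;
    return fn_class_names[cls];
}
```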
@= procedure print_fn_class (@!fn_loc : hash_loc); begin case (fn_type[fn_loc]) of built_in : print ('built-in'); wiz_defined : print ('wizard-defined'); int_literal : print ('integer-literal'); str_literal : print ('string-literal'); field : print ('field'); int_entry_var : print ('integer-entry-variable'); str_entry_var : print ('string-entry-variable'); int_global_var : print ('integer-global-variable'); str_global_var : print ('string-global-variable'); othercases unknwn_function_class_confusion endcases; end; @ @:this can't happen}{\quad Unknown function class@> This version is for printing when in |trace| mode. @= trace procedure trace_pr_fn_class (@!fn_loc : hash_loc); begin case (fn_type[fn_loc]) of built_in : trace_pr ('built-in'); wiz_defined : trace_pr ('wizard-defined'); int_literal : trace_pr ('integer-literal'); str_literal : trace_pr ('string-literal'); field : trace_pr ('field'); int_entry_var : trace_pr ('integer-entry-variable'); str_entry_var : trace_pr ('string-entry-variable'); int_global_var : trace_pr ('integer-global-variable'); str_global_var : trace_pr ('string-global-variable'); othercases unknwn_function_class_confusion endcases; end; ecart @ Besides the function classes, we have types based on \BibTeX's capacity limitations and one based on what can go into the array |wiz_functions| explained below. 
@d quote_next_fn = hash_base - 1 {special marker used in defining functions} @d end_of_def = hash_max + 1 {another such special marker} @= @!fn_class = 0..last_fn_class; {the \.{.bst} function classes} @!wiz_fn_loc = 0..wiz_fn_space; {|wiz_defined|-function storage locations} @!int_ent_loc = 0..max_ent_ints; {|int_entry_var| storage locations} @!str_ent_loc = 0..max_ent_strs; {|str_entry_var| storage locations} @!str_glob_loc = 0..max_glb_str_minus_1; {|str_global_var| storage locations} @!field_loc = 0..max_fields; {individual field storage locations} @!hash_ptr2 = quote_next_fn..end_of_def; {a special marker or a |hash_loc|} @ @^save space@> @^space savings@> @^system dependencies@> We store information about the \.{.bst} functions in arrays the same size as the hash-table arrays and in locations corresponding to their hash-table locations. The two arrays |fn_info| (an alias of |ilk_info| described earlier) and |fn_type| accomplish this: |fn_type| specifies one of the above classes, and |fn_info| gives information dependent on the class. Six other arrays give the contents of functions: The array |wiz_functions| holds definitions for |wiz_defined| functions---each such function consists of a sequence of pointers to hash-table locations of other functions (with the two special-marker exceptions above); the array |entry_ints| contains the current values of |int_entry_var|s; the array |entry_strs| contains the current values of |str_entry_var|s; an element of the array |global_strs| contains the current value of a |str_global_var| if the corresponding |glb_str_ptr| entry is empty, otherwise the nonempty entry is a pointer to the string; and the array |field_info|, for each field of each entry, contains either a pointer to the string or the special value |missing|. 
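The layout of |wiz_functions| can be sketched in C. Each definition is a run of hash-table locations terminated by |end_of_def|. The sketch returns the run's starting index, which is the kind of class-dependent information |fn_info| might hold for a |wiz_defined| function. The sketch makes two assumptions. First, the bounds |HASH_BASE| and |HASH_MAX| are arbitrary small numbers. Second, |quote_next_fn| is taken to mean that the following function is pushed rather than executed; the WEB text says only that it is a special marker used in defining functions. All names are hypothetical.

```c
#include <assert.h>

#define HASH_BASE 1                   /* arbitrary bounds for the sketch */
#define HASH_MAX 100
#define QUOTE_NEXT_FN (HASH_BASE - 1) /* marker just below every hash location */
#define END_OF_DEF (HASH_MAX + 1)     /* marker just above every hash location */
#define WIZ_FN_SPACE 64

static int wiz_functions[WIZ_FN_SPACE];
static int wiz_def_ptr = 0;           /* next free slot */

/* Appends one definition (hash locations, possibly preceded by
   QUOTE_NEXT_FN) plus its END_OF_DEF terminator. Returns the start
   index, or -1 if the storage is exhausted. */
int store_wiz_def(const int body[], int len) {
    if (wiz_def_ptr + len + 1 > WIZ_FN_SPACE) return -1;
    int start = wiz_def_ptr;
    for (int i = 0; i < len; i++) {
        assert(body[i] == QUOTE_NEXT_FN ||
               (body[i] >= HASH_BASE && body[i] <= HASH_MAX));
        wiz_functions[wiz_def_ptr++] = body[i];
    }
    wiz_functions[wiz_def_ptr++] = END_OF_DEF;
    return start;
}

/* Counts the function references in a stored definition, skipping markers. */
int count_calls(int start) {
    int n = 0;
    for (int p = start; wiz_functions[p] != END_OF_DEF; p++)
        if (wiz_functions[p] != QUOTE_NEXT_FN) n++;
    return n;
}
```

Because both markers lie just outside the range of hash locations, they can never be confused with a real function. That is why |hash_ptr2| is declared as the subrange |quote_next_fn..end_of_def|.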
The array |global_strs| isn't packed (that is, it isn't |array| \dots\ |of packed array| \dots$\,$) to increase speed on some systems; however, on systems that are byte-addressable and that have a good compiler, packing |global_strs| would save lots of space without much loss of speed. @d fn_info == ilk_info {an alias used with functions} @# @d missing = empty {a special pointer for missing fields} @= @!fn_loc : hash_loc; {the hash-table location of a function} @!wiz_loc : hash_loc; {the hash-table location of a wizard function} @!literal_loc : hash_loc; {the hash-table location of a literal function} @!macro_name_loc : hash_loc; {the hash-table location of a macro name} @!macro_def_loc : hash_loc; {the hash-table location of a macro definition} @!fn_type : packed array[hash_loc] of fn_class; @!wiz_def_ptr : wiz_fn_loc; {storage location for the next wizard function} @!wiz_fn_ptr : wiz_fn_loc; {general |wiz_functions| location} @!wiz_functions : packed array[wiz_fn_loc] of hash_ptr2; @!int_ent_ptr : int_ent_loc; {general |int_entry_var| location} @!entry_ints : array[int_ent_loc] of integer; @!num_ent_ints : int_ent_loc; {the number of distinct |int_entry_var| names} @!str_ent_ptr : str_ent_loc; {general |str_entry_var| location} @!entry_strs : array[str_ent_loc] of packed array[0..ent_str_size] of ASCII_code; @!num_ent_strs : str_ent_loc; {the number of distinct |str_entry_var| names} @!str_glb_ptr : 0..max_glob_strs; {general |str_global_var| location} @!glb_str_ptr : array[str_glob_loc] of str_number; @!global_strs : array[str_glob_loc] of array[0..glob_str_size] of ASCII_code; @!glb_str_end : array[str_glob_loc] of 0..glob_str_size; {end markers} @!num_glb_strs : 0..max_glob_strs; {number of distinct |str_global_var| names} @!field_ptr : field_loc; {general |field_info| location} @!field_parent_ptr,@!field_end_ptr : field_loc; {two more for doing cross-refs} @!cite_parent_ptr,@!cite_xptr : cite_number; {two others for doing cross-refs} @!field_info : packed 
array[field_loc] of str_number; @!num_fields : field_loc; {the number of distinct field names} @!num_pre_defined_fields : field_loc; {so far, just one: \.{crossref}} @!crossref_num : field_loc; {the number given to \.{crossref}} @!no_fields : boolean; {used for |tr_print|ing entry information} @ Now we initialize storage for the |wiz_defined| functions and we initialize variables so that the first |str_entry_var|, |int_entry_var|, |str_global_var|, and |field| name will be assigned the number~0. Note: The variables |num_ent_strs| and |num_fields| will also be set when pre-defining strings. @= wiz_def_ptr := 0; num_ent_ints := 0; num_ent_strs := 0; num_fields