1987 Content-Addressable Memories PDF
Content-Addressable
Memories
Second Edition
Series Editors:
Library of Congress Cataloging in Publication Data. Kohonen, Teuvo, Content-addressable memories. (Springer
series in information sciences; 1) Bibliography: p. Includes index. 1. Associative storage. 2. Information storage
and retrieval systems. I. Title. II. Series. TK7895.M4K63 1987 004.5 87-4765
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned,
specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on
microfilms or in other ways, and storage in data banks. Duplication of this publication or parts thereof is only
permitted under the provisions of the German Copyright Law of September 9, 1965, in its version of June 24, 1985,
and a copyright fee must always be paid. Violations fall under the prosecution act of the German Copyright Law.
© Springer-Verlag Berlin Heidelberg 1980 and 1987
Softcover reprint of the hardcover 2nd edition 1987
The use of registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific
statement, that such names are exempt from the relevant protective laws and regulations and therefore free for
general use.
Offset printing and bookbinding: Brühlsche Universitätsdruckerei, Giessen
2153/3150-543210
Preface to the Second Edition
Designers and users of computer systems have long been aware of the fact
that inclusion of some kind of content-addressable or "associative" functions
in the storage and retrieval mechanisms would allow a more effective and
straightforward organization of data than with the usual addressed memories,
with the result that the computing power would be significantly increased.
However, although the basic principles of content-addressing have been known
for over twenty years, the hardware content-addressable memories (CAMs) have
found their way only to special roles such as small buffer memories and con-
trol units. This situation now seems to be changing: Because of the develop-
ment of new technologies such as very-large-scale integration of semiconduc-
tor circuits, charge-coupled devices, magnetic-bubble memories, and certain
devices based on quantum-mechanical effects, an increasing amount of active
searching functions can be transferred to memory units. The prices of the more
complex memory components which earlier were too high to allow the application
of these principles to mass memories will be reduced to a fraction of the to-
tal system costs, and this will certainly have a significant impact on the
new computer architectures.
In order to advance the new memory principles and technologies, more in-
formation ought to be made accessible to a common user. To date it has been
extremely difficult to gain an overview of content-addressable memories; dur-
ing the course of their development many different principles have been tried,
and many electronic technologies on which these solutions have been based have
become obsolete. More than one thousand papers have been published on content
addressing, but this material has never been available in book form. Numerous
difficulties have also been caused by the fact that many developments have
been classified for long periods of time, and unfortunately there still exists
material which is unreachable for a common user. The main purpose of this book
has been to overcome these difficulties by presenting most of the relevant
results in a systematic form, including comments concerning their practical
applicability and future development.
1.1 Introduction....................................................... 1
1.1.1 Various Motives for the Development of Content-
Addressable Memories........................................ 2
1.1.2 Definitions and Explanations of Some Basic Concepts ......... 3
References 343
1.1 Introduction
The subject area of this book consists of various principles, methods, and
devices of computer technology which are often grouped under the heading
associative memory. A word of caution at this point will be necessary: the
field of all phenomena related to associative memory is probably much wider
than that ever covered in computer and information sciences. There is no
other original model for associative memory than in human memory and think-
ing; unfortunately and surprisingly the experimental results do not yet
allow us to deduce what the detailed memory mechanism thereby applied is.
Only indirect evidence is available [1.1]. Similarly, while research in the
artificial intelligence techniques claims to deal with correct reasoning
and problem-solving, this does not necessarily imply that the real biolog-
ical information processes should resemble even the best artificial ones.
In order that a physical or abstract system could be named "associative
memory", one should stipulate at least that its function comply with the
phenomenological features of human memory expressed in the Classical Laws
of Association; we shall revert to them a bit later on in Sect. 1.4.1.
It seems that there exist two common views of associative memory. One of
them, popular in computer engineering, refers to a principle of organization
and/or management of memories which is also named content-addressing, or
searching of data on the basis of their contents rather than by their loca-
tion. The coverage of this book in broad outline coincides with it. The
other view is more abstract: memory is thereby understood as a semantic
representation of knowledge, usually in terms of relational structures. We
shall explain this latter concept in Sect. 1.3.
The purpose of this book is in the first place practical. Although it
would be very interesting to expound the meaning of associative memory in
its most general sense, such discussions can be found elsewhere (cf [1.1,2]).
The primary scope of this presentation is to serve as a text and reference
require special memories (cf Chap. 2). This, however, does not yet bring about
the other feature which would be very desirable in large problems, namely,
retrieving of a great number of variables from the memory simultaneously.
4) While associations were originally considered for the description of
interrelations or cross-references between pieces of information only, it
has later turned out that searching of data by its partial content can effec-
tively be utilized in the manipulation of arithmetic algorithms. Such content
addressing can be made in a highly parallel fashion, i.e., simultaneously over
a great number of data elements, usually at a rather low level, referring
to transformations that occur in the binary representations. The problem of
parallel computation has also another direction where content-addressability
is used to control higher-level algorithmic functions in parallel (cf Chap. 6).
5) Content-addressable memory functions have recently been found extremely
useful in the implementation of buffer memory organizations (cf Chap. 5)
which in large memory systems have provided very high average performance at
reasonable cost.
6) Small and fast content-addressable memories have been found useful for
the implementation of programmed sequential control operations in central
processing units as well as other devices.
7) One task of future computer technology is to make machines interpret
verbal statements given in the natural language. While some solutions for
this exist, nonetheless it may turn out that some newer forms of associative
or content-addressable memory are applicable for its handling in a simpler
and yet more efficient way.
in this book ought to be understood only as one possible mechanism for the
storage and recollection of associations.
PARHAMI [1.5] remarks that even parallelism in searching operations is
not essential, as far as the functional operation is concerned. Instead he
proposes three definitions of which we shall combine the latter two, and
substitute the word "content-addressable" for "associative":
Content-addressable memory: a storage device that stores data in a number
of cells. The cells can be accessed or loaded on the basis of their contents.
Content-addressable processor: a content-addressable memory in which more
sophisticated data transformations can be performed on the contents of a
number of cells selected according to the contents, or a computer or computer
system that uses such memory as an essential component for storage or pro-
cessing, respectively.
It may be necessary to clarify at this point that accessing data on the
basis of their content always means some comparison of an external search
argument with part or all of the information stored in all cells. Whether
this is done by software, mechanical scanning or parallel electronic circuits,
is immaterial in principle; however, a "genuine" content-addressable memory
performs all comparisons in parallel. Another fact to emphasize is that com-
parison by equality match between the search argument and the stored item is
not the only mode used. If the stored data have numerical values, the pur-
pose of content-addressable search may be to locate all cells the contents
of which satisfy certain magnitude relations with respect to the search ar-
guments, for instance, being greater than or less than the given limit, or
between two specified limits. Content-addressable searching is sometimes
performed without reference to an external search argument, for instance,
when locating the maximum or minimum in a set of stored numbers. Finally,
searching on the basis of best match of the search argument with the various
stored data, in the sense of some metric, may be necessary. This is already
very near to the basic task of pattern recognition, and we shall revert to
it with the various similarity measures in Sect. 1.4.2.
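The magnitude-based and extremum searches just described are easy to state in present-day programming terms. The following Python sketch (not from the original text; function and variable names are illustrative) emulates by a sequential loop what a content-addressable memory would do in parallel over all cells:

```python
def search_between(memory, lo, hi):
    # magnitude search: locate all cells whose contents lie
    # between two specified limits (inclusive)
    return [addr for addr, value in enumerate(memory) if lo <= value <= hi]

def search_extremes(memory):
    # search without an external argument: locate the cells
    # holding the maximum and the minimum of the stored numbers
    return memory.index(max(memory)), memory.index(min(memory))

cells = [17, 3, 42, 8, 25]
between = search_between(cells, 5, 20)   # cells 0 and 3 hold 17 and 8
extremes = search_extremes(cells)        # cell 2 holds the maximum, cell 1 the minimum
```

A hardware CAM performs all these comparisons simultaneously; the loop here only models the functional result.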
Various alternative names for the CAM have been suggested, e.g., associa-
tive store, content-addressed memory [1.8], data-addressed memory [1.9],
catalog memory [1.10], multiple instantaneous response file [1.11], parallel
search file [1.12], and (parallel) search memory [1.13]. The diversity of
content-addressable computers is yet greater, and many names used for them
refer only to a specific construct (cf Chap. 6).
The two main principles of content addressing are the one based on a data-
dependent memory mapping, implemented by programming techniques (software),
and the other which uses special hardware constructs for the storage and
retrieval of data items. It is remarkable that both of these principles were
invented almost simultaneously, around 1955; this shows that with the advent
of the first commercial computer systems there already existed a strong need
for content addressing. While over twenty years have now passed since the
introduction of these methods, no essentially new issues have been presented
in favor of one or the other. Therefore both principles are reviewed here
as they are currently standing, and any conclusions about their future status
are left to the reader.
Table 1.1. Randomly chosen names, and their numerical values computed from
the first two letters (see text)
there are 26² = 676 different pairs of letters, and a memory area with a
corresponding number of locations is reserved for the table. (With the com-
puters of the 1950s, this would have been a tremendous investment in memory
space.) Now assume that names with random beginnings are used; the probabil-
ity for different names to have different addresses is then not very small.
Table 1.1 is a set of randomly chosen first names of persons used as identifiers;
their numerical values are given in the third column. These values
are calculated in the following way: denote A = 0, B = 1, ... , Z = 25. A
pair of letters is regarded as an integer in a basis of 26, so, e.g., IS =
8·26 + 18 = 226. The address in the table which is defined by the numerical
value of the name, by the application of some simple rule, is named calcu-
lated address. At sample 16 a name which has the same beginning as No. 14
was picked up; it is said that a conflict or collision occurred. Since both
names cannot be stored in the same location, a reserve location, easily
reachable from the calculated address, shall be found. Chapter 2 deals ex-
tensively with this problem. Let it suffice here to mention one possibility:
the next empty location (in this case 183) following the calculated address
is used. If a search for an empty location is made cyclically over the ta-
ble, such a location can be found sooner or later as long as the table is
not completely full. It is striking that if the table is not more than half
full and the names are chosen randomly, there is a high probability for
finding an empty location within a reach of a few locations from the cal-
culated address.
In order to resolve whether an entry is stored at the calculated address
or one of its reserve locations, the name or its unique identifier must be
stored together with the data. Assume that the name itself is used. The
correct location is found when the stored name agrees with that used as a
search argument. Table 1.2 exemplifies the contents of part of the memory,
corresponding to the example shown in Table 1.1.
Table 1.2. Partial contents of the memory area used to store the data cor-
responding to Table 1.1
A search for, say, the data associated with HANS is performed easily.
The calculated address of HANS is 182. Since such an identifier is not found
there, a search from the next locations reveals that HANS was stored at
address 183. The associated data, D(16), are then found.
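The hash-coding scheme of this example can be condensed into a short Python sketch (not part of the original text; HARRIET stands in for the unnamed sample No. 14, and the data labels D(14), D(16) are illustrative):

```python
def calc_address(name):
    # numerical value of the first two letters, with A = 0, ..., Z = 25,
    # read as a two-digit integer in base 26
    a, b = (ord(c) - ord('A') for c in name[:2])
    return a * 26 + b

TABLE_SIZE = 26 * 26            # 676 locations, one per letter pair
table = [None] * TABLE_SIZE     # each location holds (name, data) or None

def store(name, data):
    addr = calc_address(name)
    while table[addr] is not None:       # collision: location occupied
        addr = (addr + 1) % TABLE_SIZE   # cyclic search for a reserve location
    table[addr] = (name, data)

def search(name):
    addr = calc_address(name)
    while table[addr] is not None:
        if table[addr][0] == name:       # stored name must match the argument
            return table[addr][1]
        addr = (addr + 1) % TABLE_SIZE
    return None

store("HARRIET", "D(14)")   # calculated address 182
store("HANS", "D(16)")      # also 182; stored in reserve location 183
```

After these two insertions, a search for HANS probes location 182, finds a non-matching name there, and retrieves the data from reserve location 183, exactly as in the text.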
Obviously the average number of trials to find a stored item from the
neighborhood of its calculated address depends on the loading of the table
and also on its size; moreover, the number of trials is smallest if the
items are scattered uniformly in the table. Instead of using sampled letters
Fig. 1.2. Comparison circuit for one CAM word location. FF = bit-storage
flip-flop, E = logical equivalence gate, A = logical AND gate
Chapter 3 will deal in more detail with logical principles, and Chap. 4
with different hardware realizations developed for the CAMs.
1.3 Associations
Fig. 1.3a-c. Representation of associated items in computer memory: a) in the
same location, b) in consecutive locations, c) through pointers (illustrated
by dashed arrows)
The following notations for the relation between x and y are used:

xRy ,   x →_R y .
This somewhat abstract definition may become more clear if simple semantic
relations are considered. Assume that x and y are nouns and R is a verbal
construction. Assume two lists of specific cases of x and y:
Not all pairs (x,y) may have relevance; for instance, paper A may not deal
with digital electronics. Meaningful pairs are found upon documentation,
whereby the following types of observation must be made:
The construction 'deals with' here now defines the type of relation and it
can be put to correspond to R. It is, of course, possible to construct many
kinds of qualifier R.
The representation of a relation in memory can be an ordered triple of
the type (A,R,B), e.g., ('paper A', 'deals with', 'pattern recognition')
where the quotation marks are used to denote literal contents. Incidentally,
this has the format of an association between A, R, and B. We shall a bit
later revert to relations when discussing structures of knowledge represen-
table by them. Before that, let us consider how relations can be retrieved
from memory.
Representation and Searching of Relations by Content Addressing. There exist
many methods for fast searching of listed items. Content addressing is the
fastest of them since an item is locatable directly, and sorting of the
stored items is not required as in most other methods. This principle is
particularly advantageous with large data bases. We shall first exemplify
searching of a relation by content addressing using hash coding.
The preliminary example of association by inference showed us that it
will be necessary to retrieve a direct association on the basis of any of
its component items. In the retrieval of information from relational struc-
tures discussed below in Sect. 1.3.2 it will further be found necessary to
13
search for a relation on the basis of any of the following combinations used
as search argument:
A,R,B, (A,R), (R,B), (A,B), (A,R,B)
Here an ordered pair or triple is simply understood as the concatenation of
its literal component items. Searching on the basis of (A,R,B) is needed
only to check whether a relation exists in memory or not.
The same relation may thus have seven different types of search argument.
If a relation is stored in memory by hash coding, in fact seven copies of it
could be stored, each one in a separate memory area or table reserved for
this type of search argument. Other methods for the handling of this task
will be discussed in Sect. 2.6.
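The seven-copies scheme can be sketched in Python (an illustration, not the book's implementation): each triple is entered into an index under every combination of components that may later serve as a search argument. A hash-coded realization would keep seven separate tables; a dictionary keyed by argument type models the same behavior.

```python
from collections import defaultdict

index = defaultdict(list)   # one logical table per type of search argument

def store_relation(A, R, B):
    triple = (A, R, B)
    # the seven combinations: three single items, three pairs, and the triple
    for key in [(A,), (R,), (B,), (A, R), (R, B), (A, B), (A, R, B)]:
        index[key].append(triple)

store_relation('paper A', 'deals with', 'pattern recognition')
found = index[('paper A', 'deals with')]   # retrieves the whole triple
```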
Retrieval of relations is much more straightforward if the triples can be
stored in the word locations of a content-addressable memory. Using masking
of fields as discussed in Chap. 3, any of the combinations shown above can
be defined as a search argument which has to match with the corresponding part
of the stored relation, whereby retrieving of the whole relation is made in a
single parallel readout operation.
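The masked comparison can be modeled in software as follows (a sketch; a genuine CAM would evaluate all stored words simultaneously rather than in a loop):

```python
def cam_search(memory, argument, mask):
    # each stored word is a tuple of fields; fields with mask bit 0 are
    # ignored, fields with mask bit 1 must equal the search argument
    return [word for word in memory
            if all(m == 0 or w == a for w, a, m in zip(word, argument, mask))]

triples = [('paper A', 'deals with', 'pattern recognition'),
           ('paper B', 'deals with', 'digital electronics')]

# search with the combination (A): only the first field is unmasked
hits = cam_search(triples, ('paper B', None, None), (1, 0, 0))
```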
A special hardware solution for the representation and retrieval of rela-
tions, not only by external search arguments but also using associations by
inference will be presented in Sect. 6.3.
Attributes and Descriptors. The position of an item in an ordered set de-
fines a specific role for this item. The position may, for instance, corres-
pond to a particular attribute, and the item stored at this position is then
the particular value given to this attribute. For instance, in a dictionary
example the first word in an ordered pair may be a type of English form,
e.g., 'horse', and the second that of a French form, namely, 'cheval'. The
expressions 'English form' and 'French form' are here attributes, whereas
'horse' and 'cheval' are their values, respectively. Perhaps even more
illustrative examples of attributes can be found in various personal records,
an example of which is in Table 1.3.
This example shows that some values may be numerical, while the others are
nonnumerical. As an ordered set this record would be written
(DOE, John, Male, 45, 5ft. 7in., Australian) .
The set of values corresponding to attributes should carefully be distin-
guished from a set of descriptors used to specify, e.g., a scientific work.
The descriptors have independent values and accordingly they form only an
unordered set. An "association" can be formed only between this set and the
specification of the document, e.g.,
({associative, content-addressable, memory, processor}, THIS BOOK).
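The distinction can be illustrated in present-day notation (an illustration only): the attribute-value record is a keyed, ordered structure, while the descriptors form a plain unordered set.

```python
# attribute-value record: the attribute determines the role of each value
record = {'Name': 'DOE, John', 'Sex': 'Male', 'Age': 45,
          'Height': '5ft. 7in.', 'Citizenship': 'Australian'}

# descriptors: an unordered set, associated with the document as a whole
descriptors = frozenset({'associative', 'content-addressable', 'memory', 'processor'})
association = (descriptors, 'THIS BOOK')
```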
Fig. 1.4. Relational structure which corresponds to Table 1.4
Fig. 1.5. Compound relation
(year, x, 1978),
(topics, x, AI).
In this case, searching is performed in two passes. One of them locates all
associations which respond to a search argument (year, 1978), and thus yields
a set of tentative values for x. The second pass locates all relations which
respond to the search argument (topics, AI), and a second set of values for
x is obtained. Finally it is necessary only to form the intersection of the
two sets. This is not exactly the way in which processors of associative lan-
guages handle the searching: e.g., for more information about LEAP, see
[1.16,17,20,21]. Another solution, explained in detail in Sect. 6.3, is based
on special parallel hardware operations, but it should be noticed that
the same operations are readily implementable by programming, although not
in parallel.
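The two-pass search and the intersection of the candidate sets may be sketched in Python (the stored relations below are hypothetical stand-ins for the entries of Table 1.4):

```python
relations = [
    ('year',   'article x1', 1978), ('topics', 'article x1', 'AI'),
    ('year',   'book x2',    1979), ('topics', 'book x2',    'robotics'),
]

def values_for(attribute, value):
    # one pass: collect all x responding to the search argument (attribute, value)
    return {x for a, x, v in relations if a == attribute and v == value}

# pass 1 and pass 2, then the intersection of the two sets of values for x
result = values_for('year', 1978) & values_for('topics', 'AI')
```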
In order to set the constructs and methods discussed in this book in their
right perspective, it will be necessary to consider some further aspects of
associative memory, especially associative recall, in order to understand
the limitations of present-day content-addressable memories.
First of all it shall be pointed out that the computer-technological
"associations" have a completely rigid structure: either there exists a link
between associated items or not. This way of definition of association seems
to follow from two facts. One of them is a widespread tradition in informa-
tion sciences and logic to operate only on discrete-valued variables and
propositions, whereby it may also be believed that these are the elements
of higher-level information processes. (Some newer approaches, like the
fuzzy-set theory [1.22], in fact tend away from this tradition.) Although it
is generally correctly stated that thinking processes operate on symbolic
representations, the meaning of "symbolic" is often confused with "provided
with a unique discrete code". The second reason which has determined the
format of associations in computer science is more practical. There has
existed no other powerful tool for the implementation of artificial informa-
tion processes than the digital computer, and the latter has strongly guided
the selection of abstract concepts which are used for the description of in-
formation processes. Although some attempts have been made to interpret
associations as collective or integral effects, e.g., [1.23-36], their
demonstration by computer simulation has turned out very cumbersome and not
rewarding. The demonstrations have mostly dealt with rather trivial examples.
In principle, however, this extension of the concept of associative memory
In a small book entitled On Memory and Reminiscence [1.37], the famous Greek
philosopher Aristotle (384-322 B.C.) stated a set of observations on human
memory which were later compiled as the Classical Laws of Association. The
conventional way for their expression is:
The Laws of Association
Mental items (ideas, perceptions, sensations or feelings) are connected
in memory under the following conditions:
1) If they occur simultaneously ("spatial contact").
2) If they occur in close succession ("temporal contact").
3) If they are similar.
4) If they are contrary.
Some objections to these laws might be presented by a contemporary scien-
tist. In view of our present acquaintance with various computing devices, it
seems necessary to distinguish the phases of operation signified as writing
(or storage) and reading (or recall). Obviously simultaneity or close suc-
cession of signals is necessary for their becoming mutually conditioned or
encoded in a physical system, whereas in a process of recall, the evoked
item (or part of it) might have, e.g., a high positive correlation (simi-
larity) or a negative one (contrast) to that item which is used as the input
key or search argument. Consequently, it seems that Laws 1 and 2 relate to
writing, and 3 and 4 to reading, respectively. Moreover, these laws seem to
neglect one further factor which is so natural that its presence has seldom
been considered. This is the background or context in which primary percep-
tions occur, and this factor is to a great extent responsible for the high
capacity and selectivity of human memory.
The following significant features in the operation of human associative
memory shall be noticed: 1) Information is in the first place searched from
the memory on the basis of some measure of similarity relating to the key
pattern. 2) The memory is able to store representations of structured se-
quences. 3) The recollections of information from memory are dynamic pro-
cesses, similar to the behavior of many time-continuous physical systems.
In fact, Aristotle made several remarks on the recollections being a syn-
thesis of memorized information, not necessarily identical to the original
occurrence.
The Laws of Association now give rise to some important issues and they
will be discussed below in Sects. 1.4.2-5.
The useful capacity of memory for patterned information depends on its abili-
ty to recall the wanted items with sufficient selectivity. If there existed
a unique representation for every occurrence, then content-addressable search-
ing could simply be based on exact match between the search argument and the
stored representations as in content-addressable computer memories. However,
natural patterns and especially their physiological representations in neural
realms are always contaminated by several kinds of error and noise, and the
separation of such patterns naturally leads to the problematics thoroughly
discussed in the subject area of pattern recognition research; most of its
special problems, however, fall outside the scope of this book.
The genuine process of associative recall thus ought to take into account
some extended form of the concept of similarity. Several approaches, in an
approximately ascending order of generality, to the definition of this con-
cept are made below.
Hamming Distance. Perhaps the best known measure of similarity, or in fact
dissimilarity between digital representations is the Hamming distance. Ori-
ginally this measure was defined for binary codes [1.38], but it is readily
applicable to comparison of any ordered sets which consist of discrete-valu-
ed elements.
Consider two ordered sets x and y which consist of distinct, nonnumerical
symbols such as the logical 0 and 1, or letters from the English alphabet.
Their comparison for dissimilarity may be based on the number of different
symbols in them. This number is known as the Hamming distance ρ_H which can
be defined for sequences of equal length only: e.g.,
x = (1,0,1,1,1,0)
y = (1,1,0,1,0,1) ,   ρ_H(x,y) = 4 ;

and

u = (p,a,t,t,e,r,n)
v = (w,e,s,t,e,r,n) ,   ρ_H(u,v) = 3 .
For binary patterns x = (ξ_1, ..., ξ_n) and y = (η_1, ..., η_n), assuming
ξ_i and η_i as Boolean variables, the Hamming distance may be expressed
formally as a circuit operation

ρ_H(x,y) = bitcount {ξ_i ⊻ η_i | i = 1, ..., n} ,

where the function bitcount S determines the number of elements in the set
S which attain the value logical 1; the Boolean expression occurring as an
element in the above set is the EXCLUSIVE OR (EXOR) function of ξ_i and η_i.
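In present-day code the same definition reads as follows (a sketch, not from the original text):

```python
def hamming(x, y):
    # number of positions in which two equal-length sequences differ;
    # for binary sequences this equals the bitcount of the elementwise EXOR
    assert len(x) == len(y), "defined for sequences of equal length only"
    return sum(1 for a, b in zip(x, y) if a != b)
```

Applied to the two binary patterns above the function gives 4, and to 'pattern' and 'western' it gives 3.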
The restriction imposed on the lengths of representations, or numbers
of elements in sets, can be avoided in many ways (cf, e.g., the definitions
of Levenshtein distances a bit later on). As an introductory example, con-
sider two unordered sets A and B which consist of distinct, identifiable
elements; if they had to represent binary codes, then for these elements one
could select, e.g., the indices of all bit positions with value 1. Denote
the number of elements in set S by n(S). The following distance measure
[cf Sect. 2.7 and (2.36)] has been found to yield a simple and effective
resolution between unordered sets:
C = Σ_{i=1}^{n} ξ_i η_i .   (1.3)
In case one of the sequences may be shifted with respect to the other by
an arbitrary amount, the comparison can better be based on a translationally
invariant measure, the maximum correlation over a specified interval:

C_m = max_k Σ_{i=1}^{n} ξ_i η_{i+k} ,   k = −n, −n+1, ..., +n .   (1.4)
In this case the sequences {ξ_i} and {η_i} are usually defined outside the
range i = 1, ..., n, too. Of course, shifted comparison can be applied with
any of the methods discussed below.
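Both the plain correlation (1.3) and the maximum correlation over shifts (1.4) are direct to compute; the following Python sketch (illustrative, with terms outside the defined range contributing zero) shows the translational invariance:

```python
def correlation(x, y):
    # C = sum_i x_i * y_i over the common range
    return sum(a * b for a, b in zip(x, y))

def max_correlation(x, y):
    # C_m: maximum of the correlation over all relative shifts k = -n ... +n;
    # terms falling outside the defined range contribute zero
    n = len(x)
    return max(sum(x[i] * y[i + k] for i in range(n) if 0 <= i + k < n)
               for k in range(-n, n + 1))

x = [3, 1, 0]
y = [0, 3, 1]               # x shifted right by one position
c = correlation(x, y)       # small, because the sequences are misaligned
cm = max_correlation(x, y)  # large, found at the shift k = 1
```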
When similarity is measured in terms of C or C_m, two assumptions are usually
involved: 1) The amount of information gained in an elementary comparison
is assumed directly proportional to respective signal intensities. 2) The
amounts of information gained from elementary comparisons are assumed additive.
It will be necessary to emphasize that correlation methods are most suit-
able for the detection of periodic signals which are contaminated by Gaussian
noise; since the distributions of natural patterns may often not be Gaussian,
other criteria of comparison, some of which are discussed below, must be
considered, too.
Direction Cosines. If the relevant information in patterns or signals is
contained only in the relative magnitudes of their components, then similarity
can often be better measured in terms of direction cosines defined in the
following way. If x ∈ Rⁿ and y ∈ Rⁿ are regarded as Euclidean vectors, then

cos α = <x,y> / (||x|| ||y||)   (1.5)

is by definition the cosine of their mutual angle, with <x,y> the scalar
product of x and y, and ||x|| the Euclidean norm of x. Notice that if the
norms of vectors are standardized to unity, then (1.5) complies with (1.3),
or cos α = C.
Notice that the value cos α = 1 is defined to represent exact match;
vector y is then equal to x multiplied by a scalar, y = ax (a ∈ R). On the
other hand, if cos α = 0 or <x,y> = 0, vectors x and y are said to be
orthogonal.
The quality of results obtained in a comparison by direction cosines,
too, is dependent on the noise being Gaussian. This measure is frequently
used in the identification of acoustic spectra (e.g., in speech recognition).
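As a sketch (not from the original text), the direction cosine of (1.5) in Python:

```python
import math

def cos_angle(x, y):
    # <x,y> / (||x|| * ||y||): the cosine of the angle between x and y
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny)

cos_angle([1.0, 0.0], [2.0, 0.0])   # parallel vectors: exact match, value 1
cos_angle([1.0, 0.0], [0.0, 3.0])   # orthogonal vectors: value 0
```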
Euclidean Distance. Another measure frequently applied to the comparison of
vectors x ∈ Rⁿ and y ∈ Rⁿ is the Euclidean distance

ρ_E(x,y) = ||x − y|| = [ Σ_{i=1}^{n} (ξ_i − η_i)² ]^{1/2} .   (1.6)

Although seemingly natural for the detection of differences, ρ_E(x,y) in
reality often yields worse results in comparison than the previous method, on
account of its greater sensitivity to the lengths of the vectors to be
compared; notice that ||x − y||² = ||x||² + ||y||² − 2<x,y>. On the other hand,
if the lengths of the vectors are normalized, the results obtained are
identical with those obtained by the previous methods. Often ρ_E is applicable
to comparisons made in parameter spaces.
Measures of Similarity in the Minkowski Metric. Obviously (1.6) is a special
case of the distance which defines the Minkowski metric:

ρ_M(x,y) = [ Σ_{i=1}^{n} |ξ_i − η_i|^λ ]^{1/λ} .   (1.7)
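A compact sketch of the Minkowski distance (with λ = 2 reducing to the Euclidean distance of (1.6), and λ = 1 to the city-block distance):

```python
def minkowski(x, y, lam):
    # (sum_i |x_i - y_i|^lam)^(1/lam)
    return sum(abs(a - b) ** lam for a, b in zip(x, y)) ** (1.0 / lam)

minkowski([0, 0], [3, 4], 2)   # Euclidean distance: 5.0
minkowski([0, 0], [3, 4], 1)   # city-block distance: 7.0
```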
Tanimoto Similarity. For two real-valued vectors x and y the Tanimoto
measure of similarity is defined as

S_T(x,y) = <x,y> / (||x||² + ||y||² − <x,y>) .   (1.8)

The origin of this measure is in the comparison of sets. Assume that A and
B are two unordered sets of distinct (nonnumerical) elements, e.g., identifiers
or descriptors in documents, or distinct features in patterns. The
similarity of A and B may be defined as the ratio of the number of their
common elements to the number of all different elements; if n(X) is the
number of elements in set X, then the similarity is

S_T(A,B) = n(A ∩ B) / n(A ∪ B) = n(A ∩ B) / [n(A) + n(B) − n(A ∩ B)] .   (1.9)
Notice that if x and y above were binary vectors, with components ∈ {0,1}
the value of which corresponds to the exclusion or inclusion of a particular
element, respectively, then <x,y>, ||x||², and ||y||² would be directly
comparable to n(A ∩ B), n(A), and n(B), correspondingly. Obviously (1.8)
is a generalization of (1.9) for real-valued vectors.
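The set form and the vector form of the Tanimoto measure are easily checked against each other in Python (a sketch; for binary indicator vectors the two values coincide, as noted above):

```python
def tanimoto_sets(A, B):
    # n(A ∩ B) / (n(A) + n(B) - n(A ∩ B))
    common = len(A & B)
    return common / (len(A) + len(B) - common)

def tanimoto_vectors(x, y):
    # <x,y> / (||x||^2 + ||y||^2 - <x,y>)
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (sum(a * a for a in x) + sum(b * b for b in y) - dot)

A, B = {1, 2, 3}, {2, 3, 4}
x, y = [1, 1, 1, 0], [0, 1, 1, 1]   # indicator vectors of A and B over {1,2,3,4}
```

Here both functions yield the same similarity, 0.5.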
The Tanimoto measure has been used with success in the evaluation of rele-
vance between documents [1.41,42]; the descriptors can thereby be provided
with individual weights. If a_ik is the weight assigned to the kth descriptor
of the ith document, then the similarity of two documents denoted by x_i and
x_j is obtained by defining
Weighted Measures for Similarity. The components of x and y above were assum-
ed independent. In practical applications they may be generated in a stochastic
process which defines a statistical dependence between them; it can be shown
that for vectors with normally distributed noise the optimal separation is
obtained if instead of the scalar product, the inner product is defined as
(a ≡ b) = (a ∧ b) ∨ (¬a ∧ ¬b) ,   (1.13)
S_M(x,y) = [ Σ_{i=1}^{n} [e(ξ_i, η_i)]^p ]^{1/p} ,   (1.16)

with p some real value, is one possibility. Notice that with p = 1 the linear
sum is obtained, and with p → −∞, S_M(x,y) will approach min_i [e(ξ_i, η_i)].
This method has two particular advantages when compared with, say, the
correlation method: 1) Matching or mismatching of low signal values is taken
into account. 2) The operations max and min are computationally, by digital
or analog means, much simpler than the formation of products needed in cor-
relation methods. For this reason, too, p = 1 in (1.16) might be preferred.
It has turned out in many applications that there is no big difference
in the comparison results based on different similarity measures, and so it
is the computational simplicity which ought to be taken into account in the
first place.
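As an illustration of (1.16), the sketch below takes e(ξ, η) = min(ξ, η); this particular choice of e is an assumption made for the example, being one of the min/max comparisons mentioned above:

```python
def minkowski_similarity(x, y, p=1.0):
    """Similarity of the form (1.16); components are assumed positive.
    e(xi, yi) is taken as min(xi, yi) for this illustration."""
    e = [min(xi, yi) for xi, yi in zip(x, y)]
    if p == 1.0:
        return sum(e)                          # the linear sum
    return sum(ei ** p for ei in e) ** (1.0 / p)
```

With p = 1 the result is the linear sum of the componentwise minima; with strongly negative p it approaches the smallest componentwise match.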
Variational Similarity. The following principle can be combined with many
comparison criteria. Its leading idea is that patterned representations are
allowed to be marred by deformations or local scale transformations, and the
comparison for similarity is then performed by considering only small pieces
of patterns at a time. The pieces are shifted relative to each other to find
their maximum degree of matching. As the partial patterns must be connected
anyway, the matching must be done sequentially, whereby this becomes a kind
of variational problem.
The matching procedure, here termed variational similarity, is illustrated
by a symbol string matching example. The three possible types of error that
can occur in strings are: 1) Replacement or substitution error (change of
a symbol into another one). 2) Insertion error (occurrence of an extra sym-
bol). 3) Deletion error (dropping of a symbol). Errors of the latter two
types stretch or constrict the string, respectively, and their effect is
analogous with scale transformations.
Assume that one of the strings is a reference, and for simplicity, in the
other one to be compared, two or more errors of the same type are not allowed
in adjacent symbols. (Several errors may, however, occur in the string dis-
tant from each other.) Consider two strings written at the sides of a lattice
as shown in Fig. 1.6. A line shall connect lattice points which are selected
by the following rule:
A Dynamic Matching Procedure: Assume that the line has already been defined to go through the lattice point (i,j); the next point shall be selected from (i+1,j), (i+1,j+1), and (i+1,j+2). Compare the symbol pairs corresponding to these three points. If there is one match only, take the corresponding point for the next point on the line. If there is no match, select (i+1,j+1) for the next point on the line. If there are matches at (i+1,j+1) and some other point, select (i+1,j+1). If there are matches only at (i+1,j) and (i+1,j+2), select for the next point (i+1,j+2).
A matching score is now determined by counting the number of matching pairs
of symbols along the above line.
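The rule above translates almost line by line into code; here is a sketch in which the string to be compared is written along the i axis and the reference along the j axis (the function name and the boundary handling are mine):

```python
def variational_score(test, ref):
    """Dynamic matching: one step per symbol of `test`; the `ref` index
    may stay, advance by one, or skip one symbol at each step."""
    i, j = 0, 0
    score = 1 if test[0] == ref[0] else 0
    while i + 1 < len(test):
        cands = [(i + 1, j), (i + 1, j + 1), (i + 1, j + 2)]
        matches = [(a, b) for a, b in cands if b < len(ref) and test[a] == ref[b]]
        if not matches:
            i, j = i + 1, j + 1            # no match: go diagonally
        elif (i + 1, j + 1) in matches:
            i, j = i + 1, j + 1            # diagonal preferred whenever it matches
        elif len(matches) == 1:
            i, j = matches[0]              # a single match elsewhere
        else:
            i, j = i + 1, j + 2            # matches only at (i+1,j) and (i+1,j+2)
        if j < len(ref) and test[i] == ref[j]:
            score += 1
    return score
```

For 'PETER' against the reference 'PETER' the score is 5; deleting one symbol ('PTER') still yields 4, since the skip move absorbs the deletion.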
Fig. 1.6. Two examples of variational matching (the strings PETERS and PATTERN written along the sides of a lattice)
Since the different types of error may occur with different frequencies, an improved measure is the weighted Levenshtein distance (WLD), defined as the minimum total cost p·α + q·β + r·γ over all sequences of p replacements, q insertions, and r deletions that transform one string into the other, where α, β, and γ are the respective unit costs.
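In software, a weighted Levenshtein distance of this kind is usually evaluated by dynamic programming; a sketch with costs alpha, beta, gamma for replacement, insertion, and deletion (names mine):

```python
def wld(a, b, alpha=1.0, beta=1.0, gamma=1.0):
    """Weighted Levenshtein distance between strings a and b.
    alpha, beta, gamma: unit costs of replacement, insertion, deletion."""
    m, n = len(a), len(b)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i * gamma                    # delete i symbols of a
    for j in range(1, n + 1):
        d[0][j] = j * beta                     # insert j symbols of b
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = d[i-1][j-1] + (0.0 if a[i-1] == b[j-1] else alpha)
            d[i][j] = min(sub, d[i][j-1] + beta, d[i-1][j] + gamma)
    return d[m][n]
```

With unit costs this reduces to the plain Levenshtein distance.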
When dealing with samples of natural images or signals, one big problem is
the filling up of any available memory capacity sooner or later. By advanced
instrumentation it is easy to gather a great deal more information than can
ever be represented in the memory systems, at least in their random-access
parts. This is the problem of infinite memory, as it is often referred to.
This problem has been known in systems theory, especially in the theory of
adaptive (optimal) filters. The central principle in the latter is to repre-
sent a dynamical system or filtering operation by a finite set of parameters.
These parameters are recursively updated by all received signals whereby they
can be regarded as a kind of memory for all received information.
A characteristic of all constructs discussed in this book is that the
memory shall be able to recall the stored representations as faithfully as
possible, with a selectivity depending on the measure of similarity thereby
applied. However, mere memory-dependent responses or classifications may not
be regarded sufficient to represent the operation of associative memory. It
seems that the available memory capacity could be utilized more efficiently
if it were possible to represent only significant details accurately, and
for those parts which are more common, to use average or stereotypic repre-
sentations. It seems that the human memory to a great extent operates in
this way; although a single occurrence can cause clear memory traces, nonethe-
less recollections are often only stereotypic, affected by earlier experiences.
This must be due to the fact that biological systems almost without exception
tend to optimize their resources, and this principle must also be reflected
in the utilization of memory.
One elementary way for the description of "infinite" memory is to assume
that there are representations in the memory for only a finite set of items
{x(i)}, but corresponding to every item there exists a state variable y(i)
which is supposed to average over a great number of realizations of x(i).
Consider, for simplicity, the occurrences of x(i) as discrete items x_k^(i). The state variable is assumed to change as

y_(k+1)^(i) = f(y_k^(i), x_k^(i)) ,   (1.19)

where f can attain many possible forms; the simplest of them is a weighted sum of y_k^(i) and x_k^(i), whereby a moving arithmetic average is formed over the sequence of {x_k^(i)}. In a more complex case the recursion may also involve scale transformations. When something is recalled from memory, then it may
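With f chosen as a weighted sum, (1.19) reduces to an exponential moving average; a minimal sketch (the constant gain alpha is my assumption):

```python
def update_state(y, x, alpha=0.1):
    """One step of the recursion (1.19) with f a weighted sum:
    the state y averages over the realizations x fed into it."""
    return (1.0 - alpha) * y + alpha * x
```

Repeated application drives the state toward the mean of the observed realizations.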
This is one of the basic problems of linear algebra, and it has a simple
answer:
Lemma [1.1]: If all the x_k, k ∈ S, are linearly independent (no one can be expressed as a linear combination of the others), then a unique solution of (1.20) exists, and it is

M = Y(XᵀX)⁻¹Xᵀ ,
where X = [x_1, ..., x_N] and Y = [y_1, ..., y_N] are matrices with the x_k and y_k as their columns, respectively, and the superscript T denotes the transpose of a matrix. If the x_k are linearly dependent, then there exists a unique approximative solution in the sense of least squares,

M̂ = YX⁺ ,   (1.22)

where X⁺ denotes the pseudoinverse of X, and the recollection corresponding to the key x is

y = M̂x .   (1.23)
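The optimal linear associative mapping is easy to try numerically; the sketch below assumes NumPy and its numpy.linalg.pinv for the pseudoinverse:

```python
import numpy as np

# Store three pattern pairs (x_k, y_k) in a linear mapping M = Y X+.
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 3))   # keys as columns (linearly independent w.p. 1)
Y = rng.standard_normal((5, 3))   # associated responses as columns

M = Y @ np.linalg.pinv(X)         # optimal linear associative mapping

# With linearly independent keys the recall M x_k = y_k is exact.
assert np.allclose(M @ X, Y)
```

For linearly dependent keys the same formula gives the best least-squares approximation instead of exact recall.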
are infinitely many forms of nonlinear transformations. The patterns can also
be stored in spatially separate locations and recalled by a matching process;
this is also a nonlinear mapping, and the performance in recall is finally
dependent only on the similarity criterion applied (cf Sect. 1.4.3).
It should also be pointed out that there exist nonlinear estimators corresponding to (1.22) which are specified by pairs of data (x_k, y_k) and which thus implement a kind of associative recall, too. These estimators can be constructed, e.g., by stochastic approximation techniques (cf, e.g., [1.56]).
Autoassociative Recall by Projection Operators. Very interesting associative recollections are obtained if the optimal linear associative mappings are considered in the case that y_k = x_k ∈ Rⁿ. In this case (1.22) becomes

M̂ = XX⁺ .   (1.24)
f: I × S → S ,   g: I × S → O ,   (1.26)

where × denotes the Cartesian product. These expressions may also be written more explicitly. If s_i ∈ S is one of the internal states, if subscript i is used to denote a discrete time index, and if the present input is i_i ∈ I, then the new state and the present output are

s_(i+1) = f(i_i, s_i) ,   o_i = g(i_i, s_i) .   (1.27)
Fig. 1.10. Associative memory for structured sequences (combinational circuits around a CAM; inputs A, B, D, output C)
{(A(t), B(t), D(t))}   (1.28)
recollection of the rest of the sequence, no further keys A(k), k = 2,3, ...
shall be necessary. (The reading could also be started in the middle of a
sequence.) The CAM is assumed to produce a recollection A(1) at its output port; this pattern will be mediated to the D input by a unit time delay as D(2) = A(1). Reading will now be continued automatically, but the A(k) input, k = 2, 3, ... thereafter attains an "empty" or ∅ value. The next input is (∅, B, A(1)) which will match in its specified part with the second term in the sequence (1.28). Consequently, it will produce the output A(2), and a continued process will retrieve the rest of the sequence A(3), A(4), ..., A(N).
It may be clear that the CAM is able to store a great number of independent sequences, each one being retrievable by its own key input, consisting of an (A,B) pattern.
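A toy software analog of this arrangement, assuming a Python dict in place of the CAM and None for the "empty" value (the names store_sequence and recall are mine):

```python
# The key part (B, D) of each stored triple (A, B, D) selects the output A.
def store_sequence(cam, name, seq):
    """Store the triples {(A(t), B, A(t-1))} of one named sequence."""
    prev = None                      # D(1) is the "empty" value
    for a in seq:
        cam[(name, prev)] = a
        prev = a                     # unit delay: the next D is the present output

def recall(cam, name):
    """Feed each recollection back through the delay until no triple matches."""
    out, prev = [], None
    while (name, prev) in cam:
        prev = cam[(name, prev)]
        out.append(prev)
    return out
```

The sketch assumes that the symbols within one sequence are distinct; a repeated symbol would make the key (B, D) ambiguous, which is exactly the multiple-match situation discussed in the text.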
We have intentionally avoided the mention of one further problem. It may
occur that some input pattern, say, (~,B,A(k)) may match several triples
stored in the memory. This is the multiple-match situation, and in pattern
recognition tasks it would be named reject. In usual information retrieval
this is a normal situation since all matching items will have to be found.
If content-addressable memories are used to generate structured sequences,
then the output C, however, must always be unique. The following possibilities for the handling of this problem exist:
1) It may be forbidden to store triples which would cause rejects.
2) One may apply similarity measures with a higher resolution in order that
the probability for rejects become negligible.
3) One may arbitrarily choose one of the responding items, e.g., in the
order of some priority applied in readout (cf Sect. 3.3.2), or at random.
4) The output may be some synthesis of the multiple responding items.
Each of the four procedures mentioned above has a different philosophy.
The first of them may be applied, e.g., when designing automatic, programmable control circuits based on the CAM. The second method is typical for
sequential pattern recognition in which special associative memories are
used. The last two methods have more theoretical interest and they may re-
late to simulation of heuristic behavior in automatic problem solving tasks.
It should be noticed that the system model of Fig. 1.10 can have many
different embodiments: the CAM may be a hardware circuit, a hash-coding
scheme, or a distributed memory, and the matching criterion applied in recall
the same item span a class which then should be recognizable in an invariant
way. This kind of demand may at first seem to restrict seriously the appli-
cability of hash coding. Fortunately, this need not be the case; although
it will be necessary, for the achievement of error-tolerance, to add some
redundancy to the memorized information as well as to the computing opera-
tions, the growth of costs remains modest, and significant benefits can
still be gained by the use of hash coding. The demonstration given in Sect.
2.7 is intended to point out that hash coding may constitute a straight-
forward solution to searching, even in the case of incomplete key information.
There exist some searching tasks which, however, do not easily lend them-
selves to hash coding. The most important of these cases are magnitude search,
or location of all items an attribute of which is greater or less than a
specified limit, and the search on Boolean variables, especially if the key-
words have to give a certain value to a complicated combinational function.
As for magnitude search, if the ranges of attribute values can be divided
into a small number of intervals, it is still possible to provide each of them
with a distinct keyword, whereby the searching would be based on identifica-
tion of the intervals by hash coding. Most often, however, searching on the
basis of magnitudes as well as on Boolean arguments is implemented as a batch
run, by scanning all entries sequentially and substituting their keyword
values to a function which indicates the matching condition, or by more tra-
ditional methods discussed below. Thus, before proceeding to the details of
hash coding, it may be necessary to mention that the methods for the handling
of large files seem to have developed from the beginning in two distinct di-
rections. The hash coding methods constitute one of them, but the other main
line in the data management business is the one in which the searching prob-
lems are solved by ordering or sorting the entries and structuring the files
in accordance with their argument values. One advantage thereby achieved is
the possibility of performing magnitude searches directly. The highlights of
this latter approach are search trees and multidimensional indexing. Search
trees as well as ordering and indexing of entries, in fact, facilitate the
manipulation of data by their contents, but there are some good reasons to
leave these topics outside the scope of this book. One of them is that the
principles of hash coding, although actually belonging to the area of com-
puter programming, have been included in this kind of book since there has
for long existed a debate about the superiority of hardware versus software
in content addressing, and the latter is usually identified with hash coding.
This book is now intended to provide some documentary material to the reader
for settling this question. The second reason for not expounding the other
searching and sorting methods is that there already exist many good text-
books and survey articles on those topics; it seems unnecessary to duplicate
them here. Readers who are interested in data management may find an over-
view to the material they need, e.g., in the books of KNUTH [2.1], MARTIN
[2.2], FLORES [2.3], DESMONDE [2.4], and LEFKOVITZ [2.5]. Reviews on the
same topics have been published in the articles [2.6-9]. Special questions
partly relevant to content addressing have been dealt with, e.g., in the
papers [2.10-19]. It ought to be emphasized, too, that the searching of
entries by their data contents is one central task in certain business-
oriented high-level computer languages such as COBOL; there was a working
group in 1965, named CODASYL Data Base Task Group, which has published two
reports, [2.20,21] on the development of a generalized data management sys-
tem.
To recapitulate, if the problem is to find entries from a data base by
given identifiers or multiple descriptors, as the case is with documents,
publications, etc., then hash coding usually offers effective means for per-
forming the task. If, on the other hand, the searching conditions are spec-
ified by giving the values or ranges of certain attributes, especially using
magnitude relations and Boolean functions, then the traditional sorting and
tree searching methods might be preferred. Hash coding has been selected
for presentation in this book since it is what many high-level programming
languages and linguistic data structures make extensive use of, and a view
is generally held that hash coding constitutes one of the most direct imple-
mentations of associative and content-addressable processing functions in
computer science.
The discussion of hash coding to follow is carried out by first intro-
ducing the fundamental ideas and then proceeding to quantitative comparisons.
References to survey articles and other collateral reading are found in
Sect. 2.9.
v = Σ_(i=0)^(N) d_i wⁱ ,   (2.1)

where the d_i are the numerical values assigned to the characters and w is the radix.
For instance, for the English alphabet, one may choose A = 0, B = 1, ..., Z = 25. The word 'ABE' thus has the numerical value 0·26² + 1·26 + 4 = 30.
It has turned out that this method works rather well in the numerical con-
version of English words and names.
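The conversion (2.1) is a plain radix conversion; a short sketch (the function name is mine):

```python
def key_value(word, alphabet="ABCDEFGHIJKLMNOPQRSTUVWXYZ"):
    """Numerical conversion (2.1): treat the word as a base-w number,
    w being the alphabet size (26 for English)."""
    w = len(alphabet)
    v = 0
    for ch in word:
        v = v * w + alphabet.index(ch)   # Horner's rule
    return v
```

For 'ABE' this yields 0·26² + 1·26 + 4 = 30, as in the text.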
Relating to another application of hash coding, the following example
shows how index variables can be converted into a numerical form suitable
for hash coding. This application describes a subtle method for the storage
and reading of sparse matrices [2.22]. A sparse matrix is one which has only
a small fraction of its elements nonzero, and they are distributed in some
arbitrary way over the matrix array. In order not to waste memory for the
representation of zero values, the nonzero elements can be stored in a hash
table whereby the pair of indices (i,j) that indicates the position of this
kind of element in the matrix array is now regarded as its "keyword". Assume
that i E {1,2, ... ,p} and j E {1,2, ... ,q}; it is then possible to represent
the numerical value of the "keyword" (i ,j) in either of the following ways:
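One natural conversion, given here purely as an illustrative sketch (not necessarily the book's exact formula), linearizes the index pair row by row:

```python
def sparse_key(i, j, p, q):
    """Map the index pair (i, j), 1 <= i <= p, 1 <= j <= q, to a unique
    integer in 0 .. p*q - 1 (row-wise linearization; an assumed choice)."""
    assert 1 <= i <= p and 1 <= j <= q
    return (i - 1) * q + (j - 1)
```

The resulting integer can then be hashed like any other keyword value; column-wise linearization, (j − 1)·p + (i − 1), works equally well.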
recursively, using the last number obtained as the argument for the next
step. In order to decrease correlations in the sequence, it has been found
advantageous to modify the algorithm, e.g., into the following form:
or, alternatively,
all the states excluding zero; in other words, the length of the sequence is 2ⁿ − 1.
An example of a linear sequential circuit generating a maximum-length sequence is shown in Fig. 2.2. For details with other lengths of the register, see,
for instance, [2.24].
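A feedback shift register of this kind is easy to simulate; the following sketch assumes a 4-bit register with feedback taps corresponding to the primitive polynomial x⁴ + x³ + 1 (these details are my assumption, not necessarily the circuit of Fig. 2.2):

```python
def lfsr_sequence(taps=(4, 3), n=4, seed=1):
    """Fibonacci LFSR: XOR the tapped bits, shift left, insert the feedback.
    With a primitive polynomial the period is maximal: 2**n - 1 states."""
    state = seed
    out = []
    for _ in range(2 ** n - 1):
        out.append(state)
        fb = ((state >> (taps[0] - 1)) ^ (state >> (taps[1] - 1))) & 1
        state = ((state << 1) | fb) & ((1 << n) - 1)
    return out
```

Starting from any nonzero seed, the register runs through all 15 nonzero 4-bit states before repeating.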
even and odd when v is odd; an odd-even unbalance in the keyword values
would then directly be reflected in an unbalanced occupation between even
and odd hash addresses. Neither should H be a power of the radix used in
the numerical conversion (for instance, in the case of the English alphabet H should not be of the form 26^p) since v mod H would then always be identical if the p last letters of the keywords were the same. KNUTH [2.1] has pointed out that if w is the radix used in numerical conversion, then it is advisable to avoid table sizes the multiples of which are near any of the values w^k, where k is a small integer. Moreover, it is good to select a
prime number for H.
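The division method itself is a one-liner; a sketch with an assumed prime table size H:

```python
def hash_division(v, H=101, B=0):
    """Division-method hash: v mod H, offset by the table base address B.
    H is best chosen prime, per the considerations above."""
    return v % H + B
```

Keywords whose values differ by a multiple of H collide, which is why the structure of H relative to the conversion radix matters.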
Special consideration is necessary if the hash table size is a power of
the radix used in the computer arithmetic (e.g., r = 2 or 10). In this case,
namely,

v mod H = (d_(p−1) d_(p−2) ... d_0)_r ,   (2.5)

where d_(p−1) through d_0 are the p least significant digits of v, which together with B then define the hash address. In other words, no separate computation for division need then be performed. As the computation of the hash function is thereby made very convenient, it need only be remembered that the numerical conversion must not be carried out in a base whose radix w is a power of the radix r of the arithmetic.
The Multiplication Method. This method is to be preferred with computers
having a slower algorithm for division than for multiplication. While there
are no great differences in speed between these algorithms in contemporary
computers, there is yet another aspect of this approach; namely, by this
method the normalized hash address in the interval [0,1) is computed first.
After that the result is applicable to a hash table of an arbitrary size,
without any consequences possibly resulting from an improper table size, as
in the division method.
Assume that v is the nonnegative integer obtained in the numerical conversion of the key; it shall fall into the range of representable values. Further, let c be a constant chosen from the range [0,1). The normalized hash address φ(v) in the range [0,1) is first defined as

φ(v) = cv mod 1 ,

that is, as the fraction part of the product cv, whereafter the final hash address becomes h(v) = ⌊H·φ(v)⌋, with H the hash table size. In integer arithmetic, the practical computation of φ(v) proceeds in the following way. Let b be the word length in bits, whereby d = 2^b is the upper bound of the representable integers. Consider c as the integer cd, and for φ(v), take the last b digits of the double-length product (cd)v; the radix point must then be imagined at the left of these digits. Notice further that if the table size H is a power of 2, e.g., H = 2^p, then the leading p digits of the above b digits constitute the hash address. (With the multiplication method, there is no harm in the table size being a power of 2.)
The quality of the multiplication method obviously depends on the selection of the constant c. It has been pointed out [2.1] that if Φ is the "golden ratio", then c = Φ − 1 = 0.618034 is a rather good choice.
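In code, the multiplication method with the golden-ratio constant may look as follows (a sketch; the word length b is an assumed parameter):

```python
def hash_mult(v, H, b=32):
    """Multiplication-method hash with b-bit integer arithmetic."""
    d = 1 << b
    c = int(0.6180339887 * d)      # golden-ratio constant, scaled to an integer
    frac = (c * v) % d             # last b bits of the double-length product c*d*v
    return (H * frac) >> b         # floor(H * frac / d), in the range 0 .. H-1
```

Because the normalized address is formed first, H need not be prime, nor is a power-of-two H harmful here.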
Hash-bit Extraction. Digit Analysis. One of the oldest and by far the simplest
of the hashing functions is named hash-bit extraction. When the keyword is
regarded as a binary string, then a suitable number of its bits, taken from
some predetermined positions on the string, are simply concatenated to form
a binary number corresponding to the hash address. The number of such hash
bits should be large enough to address the complete hash table. Usually a
special bit analysis is accompanied with the selection of hash bits: when a
statistical sample of keywords is available, it is advisable to pick up such
bits for hash bits in which approximately as many 0's and 1's occur.
Instead of bits, it is possible to think of the binary string as a binary-
coded representation of digits by dividing the string into suitable segments.
The statistical distribution of the various digits in each segment may be
studied, and digits corresponding to the best distributions are chosen for
the hash address. This method is named digit analysis.
The Mid-Square Method. One method which has very good randomizing properties
and in some comparisons (cf Sect. 2.5.2), with a particular hash table or-
ganization, may yield better results than the others, is named the mid-square
method. In this algorithm, the keyword value v is squared, after which the
hash bits are extracted from its middle part. This algorithm is computational-
ly very simple. Notice that if the keywords are of different lengths, the
hash bits are not taken from fixed positions, which further increases the
degree of randomization.
A precaution with the mid-square is to check that v does not contain a
lot of zeroes, in which case the middle part of the square may not become
very well randomized.
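A sketch of the mid-square method for fixed-width keys (the particular field widths are my assumptions):

```python
def mid_square(v, p=8, b=32):
    """Mid-square hashing: square the b-bit key v and extract p hash bits
    from the middle of the 2b-bit square."""
    sq = v * v
    shift = (2 * b - p) // 2           # centre the extracted field
    return (sq >> shift) & ((1 << p) - 1)
```

The middle bits of the square depend on all bits of v, which is the source of the randomization.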
In the multiplication method, the address bits could be chosen in a simi-
lar way from the middle of the product; no experimental evidence is available,
where X and Y denote bit strings, and ⊕ is the EXCLUSIVE OR operation over corresponding bit positions.
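Such XOR folding can be sketched as follows (the segment width p is an assumed parameter):

```python
def fold_xor(v, p=16):
    """Folding: split the key value into p-bit segments and combine them
    with bitwise EXCLUSIVE OR."""
    h, mask = 0, (1 << p) - 1
    while v:
        h ^= v & mask
        v >>= p
    return h
```

Every segment of the key thus influences the final p-bit hash value.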
Radix Conversion. This method is based on the idea that if the same number
is expressed in two digital representations with radices which are relatively
prime to each other, the respective digits generally have very little cor-
relation. Notice that this effect was already utilized in the basic numerical
conversion whereby the number of characters in the alphabet was selected to
be a prime number, and this number was chosen for the radix (2.1). If radix
conversion is taken into consideration in later steps, the keyword is then
P = [1 − (1 − d/H)^N]^d .   (2.10)

For instance, for H = 10⁴ bits, d = 10 bits, and N = 1000, we have P ≈ 0.01.
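Reading (2.10) as P = [1 − (1 − d/H)^N]^d, the exponent placement that reproduces the quoted figure, a quick numerical check:

```python
# Numerical check of P = [1 - (1 - d/H)^N]^d for H = 10^4, d = 10, N = 1000.
H, d, N = 10_000, 10, 1000
P = (1 - (1 - d / H) ** N) ** d
print(round(P, 4))   # approximately 0.01
```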
In short, the various methods for the handling of collisions can be divided into two main categories: 1) Methods called open addressing, open hash, or open overflow, which utilize empty locations of the hash table as reserve locations. 2) Methods using a separate overflow area for the colliding items.
The techniques applied in open addressing contain many detailed and even
complex ideas for the improvement of efficiency in searching. In contrast to
this, the methods using an overflow area are relatively simple and straight-
forward, and if the efficiency were measured in terms of searches to retrieve
an entry, the performance of the latter method might seem superior (cf Sect.
2.5.1). Therefore there must be good reasons for the particularly high inter-
est in the open-addressing methods. One of them is that large hash tables
are usually held in secondary storages such as disk memories, and the con-
tents are only partly buffered in the fast primary memory in which, however,
the searching computations are in effect performed. It will be easy to under-
stand that for efficient buffering, it is advantageous to have all information
referred to in a program in a small local memory area, and this is made pos-
Primary Clustering. Quadratic and Random Probing. The most important motive
Assume that for some reason, for two distinct keywords Kl and K2 , there now
happens to be
f_i(K_1) = f_j(K_2)   (i ≢ j mod H and K_1 ≠ K_2) .   (2.12)
Since the above condition has to hold when f i (K 1) and f j (K 2 ) are arbitrary
members in two independent probing sequences, obviously primary clustering
is associated with linear probing only.
Quadratic probing, as one of the many possible nonlinear methods, will be able to eliminate primary clustering. It has a nice feature of being easily computable. This probing procedure was originally introduced by MAURER [2.33] in the following form:

g_i = a·i + b·i² .   (2.14)
It has been concluded from practical experiments that if the hash table size
is a power of two, the probing sequences tend to become cyclic, and only a part of the table is then covered. For this reason, it was suggested that
the hash table size ought to be a prime, in which case the gi will be guar-
anteed to span exactly half the hash table before the sequence becomes cyclic.
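A compact sketch of open addressing with quadratic probing (the coefficients a = b = 1 and a prime table size are my assumptions):

```python
def quadratic_probe(h, i, H, a=1, b=1):
    """i-th reserve address for calculated address h: (h + a*i + b*i*i) mod H."""
    return (h + a * i + b * i * i) % H

def insert(table, key, hash_fn, H):
    """Open addressing with quadratic probing; `table` is a list of length H,
    with None marking an empty location."""
    h = hash_fn(key)
    for i in range(H):
        slot = quadratic_probe(h, i, H)
        if table[slot] is None or table[slot] == key:
            table[slot] = key
            return slot
    raise RuntimeError("table full or probe sequence exhausted")
```

With a prime H the probed addresses are guaranteed to span half the table before cycling, per the result above.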
Basic Probing Methods. The simplest type of hash table makes use of only one
memory area which is shared by the calculated addresses and their reserve
locations. This is called open addressing. The addresses of the reserve lo-
cations are derived from the calculated address, by incrementing the latter
according to some rule. A procedure by which the reserve locations are deter-
mined in relation to the calculated address was named probing. In linear
probing, the calculated address is incremented or decremented by unity until
an empty location is found. The increments can also be computed by various
algorithms. The best known procedures of the latter type are quadratic prob-
ing (often inaccurately named quadratic hash) in which the sequence of re-
serve locations is defined by increments to the calculated address which are
computed from a quadratic function, and random probing in which the corre-
sponding increments are pseudorandom numbers. It should be noted that the
addresses always have to be incremented cyclically: if H is the number of
locations in the hash table, and the table starts at B, and if the incremented address is f_i, then the true address in the probing sequence is f_i mod H + B.
sible by open addressing. Another aspect is that there are no unique theo-
retical criteria by which the optimal performance in multilevel memories
could be defined, first of all because a compromise between speed and demand
of memory space always has to be made. In view of the large differences in
access times between primary, secondary, tertiary, etc. storages, and taking
into account the possibility for transferring data between the memory levels
by blocks in a single access, the performance, thus, cannot simply be mea-
sured in terms of memory accesses per retrieval; the overall speed should
actually be evaluated by benchmarking runs and performing a practical cost
analysis.
The procedure by which the reserve locations in the open-addressing method are found is named probing. In principle, probing is a trial-and-error method
of finding an empty location. Whatever method is used in probing, the sequence
followed in it must be deterministic because the same sequence is followed
in the storage of entries as well as during retrieval.
In recent years, effective methods for probing and associated organization-
al procedures have been developed. It has thereby become possible to reduce
the number of accesses per retrieval to a fraction of that achievable by the
simplest methods, especially at a high degree of loading of the hash table.
These methods are here reviewed in a systematic order, starting with the basic
ones. It is believed that the choice of the most suitable variant will be
clear from the context.
Identifiers and Special Markers in the Hash Table. The accesses to the calculated address and its probed reserve locations are made in the same sequences
during storage and retrieval, except for minor deviations from this when
entries are deleted from the hash table, as described below. The probing se-
quences are thus unique, but there usually exists no provision for the in-
dication of the position of an entry in them. In order to resolve which one
of the reserve locations belongs to a particular entry, a copy of the corre-
sponding keyword may be stored at the calculated or probed address, whichever
is proper. During search, probing is continued until a matching keyword is
found. Instead of the complete keyword, sometimes a simpler identifier of it
can be used for the resolution between the colliding entries. For instance,
if the division method is applied in the hashing algorithm, and the separate
chaining method discussed in Sect. 2.3.4 is used for the handling of colli-
sions, the quotient of the division (which, in most cases occurring in prac-
tice, is representable by a few bits) will do as identifier. (With all the
other collision handling methods, however, complete keywords have to be used
for identification.)
A = (h + i²) mod H ,
B = [H + 2h − (h + i²) mod H] mod H ,   (i = 1, 2, ..., (H − 1)/2)   (2.15)
for which h = h(K) is the calculated address. It can be shown that addresses
obtained from A cover half the table and those generated by B the rest.
The above method has been worked out in an easily computable form by
DAY [2.36]. The computational algorithm is given below without proof:
1) Set i to -H.
2) Calculate the hash address h = h(K).
3) If location h is empty or contains K, the search is concluded.
Otherwise set i = i + 2.
4) Set h = (h + Iii) mod H.
5) If i < H, return to 3; otherwise the table is full.
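DAY's algorithm transcribes directly; a sketch that searches for a key, or for the empty slot where it should go (the table representation is mine):

```python
def day_probe(table, key, hash_fn, H):
    """DAY's quadratic-probing search [2.36]; H should be prime.
    Returns the address holding `key`, or the first empty address probed."""
    i = -H                               # step 1
    h = hash_fn(key)                     # step 2
    while True:
        if table[h] is None or table[h] == key:
            return h                     # step 3: empty slot, or the key itself
        i += 2                           # step 3, otherwise
        if i >= H:                       # step 5
            raise RuntimeError("table full")
        h = (h + abs(i)) % H             # step 4
```

The successive increments |i| = H − 2, H − 4, ... realize the quadratic sequence without any multiplication.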
HOPGOOD and DAVENPORT [2.37] have found that if the coefficients in the
quadratic function (2.11) are selected such that b = 1/2, then, with the
hash table size a power of two, the length of the probing sequence is
H - R + 1 in which R = a + b. With R = 1, the entire table will be searched.
The probing algorithm in this case takes on the following form:
With this method, the end of the probing sequence has been reached when the
ith and (i+l)th probed addresses are the same; this condition can be used
to indicate that the entire table has been searched.
ACKERMAN [2.38] has generalized the above maximum length results for the
case in which the table size is a power of a prime. BATAGELJ [2.39] has
pointed out that the maximum probing period can be reached if the factori-
zation of the table size contains at least one prime power, or is twice the
product of distinct odd primes. Some additional theoretical results on the
period of search with quadratic probing and related algorithms have been
presented by ECKER [2.40].
With random probing, two randomizing algorithms are needed: one for the
computation of the hashing function h(v) where v is the numerical value of
the keyword, and one for the definition of a sequence of pseudorandom numbers
{d_i}. The address of the first probed location is f_1 = (h + d_1) mod H, and further reserve locations have the addresses f_i = (f_(i−1) + d_i) mod H, for i = 2, 3, .... Because of the mod H operation, it might seem that no re-
strictions on the range of the di are necessary. Nonetheless, some simple
restrictions imposed on the di will be shown useful. In order to avoid un-
necessary probing operations, no part of the table should be examined twice
until all the H locations have been probed. This is so if di is relatively
prime to H, i.e., di and H do not have common divisors other than unity, and
if di is less than H. For instance, if H is a power of 2, then di is allowed
to be any odd positive number less than H, and if H is a prime number, then
di can be any positive number less than H.
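A sketch of random probing in which the pseudorandom increments come from a deterministically seeded generator, so that the same sequence recurs at storage and at retrieval as required; Python's random.Random stands in for the book's generator:

```python
import random

def random_probe_sequence(h, H, key_seed, n):
    """First n probed addresses for calculated address h.
    With H prime, every increment 0 < d < H is relatively prime to H."""
    rng = random.Random(key_seed)      # same seed -> same probing sequence
    seq, f = [], h
    for _ in range(n):
        d = rng.randrange(1, H)        # 0 < d_i < H
        f = (f + d) % H
        seq.append(f)
    return seq
```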
Secondary Clustering. Double Hashing. One of the problems associated with the use of open addressing is named secondary clustering: whenever two or more collisions occur with the same address, it is then always the same sequence of locations which is probed. Secondary clustering will somewhat
deteriorate the performance of quadratic as well as random probing with long
chains of reserve locations. If we do not consider multiple-keyword search-
ing applications in this connection, we can assume that the colliding entries
have different keywords. It is then possible to make the probing increment
depend on the keyword by the application of another hashing algorithm. Thus, in this kind of double hashing, two hashing functions are defined:
h(v) = the hash address corresponding to the numerical value v of the keyword,
i(v) = the "hash increment" corresponding to v.
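A common realization, shown here as an assumed sketch (the particular form of i(v) is mine, chosen so that the increment is never zero):

```python
def double_hash_probes(v, H, n):
    """First n addresses probed under double hashing; H is assumed prime,
    so any increment 0 < i(v) < H generates the whole table."""
    h = v % H                  # h(v): the calculated address
    step = 1 + v % (H - 2)     # i(v): key-dependent increment, never zero
    return [(h + k * step) % H for k in range(n)]
```

Keys colliding at the same address h generally receive different increments, which removes secondary clustering.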
BELL [2.41] has chosen the function b(K) in the following way in order to
make it rapid to compute and yet able to make secondary clustering unlikely.
If the initial hash address is computed by the division method, h(K) =
v(K) mod H + B, there will be available a key-dependent number, namely, the
quotient (denoted Q in the following) at no extra cost of computation. The
quotient may now be chosen for the parameter b(K), whereby the method is
named the quadratic quotient method. In order to cover the whole table, the
algorithm might be modified in analogy with (2.15); notice that a table, the
size of which is a power of two, can no longer be searched fully since one
is no longer free to choose the parameter b equal to 1/2. BELL has later
remarked, in response to a comment of LAMPORT [2.42], that the probing func-
tion
(2.18)
cation, and probings to reach the new entry in the place where the old one
was earlier, respectively, are thereby reduced.
The computational load in BRENT's method during storage, because of aux-
iliary probing operations performed during reorganization, is somewhat heavier
than with the simpler methods. On the other hand, the computations in re-
trieval are the same as with other methods. As will be pointed out in Sect.
2.5.2, with large hash tables stored in secondary storage it is possible to
compute rather complicated hashing functions in a time comparable with a
single memory access, and if the number of accesses is reduced by the new
method, then the overall searching time is correspondingly reduced, too.
Consider first that the hash table is not full and the probing sequence
relating to the new key K will terminate in s steps when an empty location
is found. Assume that the probed addresses are defined by an algorithm of
the form

   f_j = (r + q · j) mod H ,   j = 1, 2, 3, ...   (2.20)
where r = r(K) and q = q(K) are key-dependent parameters, and H is the hash
table size. If T(f_i) is the keyword actually found at address f_i (notice
that in a probing sequence associated with a particular key, the latter is
not stored in the probed locations except in the last one) then another
probing sequence, formally similar to (2.20),

   f_ij = (r_i + q_i · j) mod H ,   j = 1, 2, 3, ...

with

   q_i = q[T(f_i)] ,   (2.21)
If this procedure is used with the linear quotient method, then one should
select
The efficiency of this method, when compared with the other methods, has
been reported in Sect. 2.5.1.
In most data management applications, the time spent for retrieving operations
is very precious, whereas more time is usually available for the entrance of
items into the table (initially, or in updating operations). Since the probing
calculations have already been made during the storage phase, it seems unnec-
essary to duplicate them during retrieval, if the results can somehow be
stored. The calculated address sequence of the reserve locations can indeed
be recorded and utilized in search in a convenient way. Using an additional
field in every memory location, the address sequences can be defined by
chaining or the linked list data structure. Starting with the calculated hash
address, every memory location is made to contain the address of the next
reserve location (which is computed when the items are entered). During re-
trieval, the address of the next location is then immediately available with-
out arithmetic operations. These addresses are also named pointers. This type
of data structure is also called direct chaining, in order to distinguish it
Fig. 2.3. Direct chaining. [Figure: a chain of memory locations, each with
a content field and a pointer field giving the next probed address]
when data are deleted from an unchained hash table. It is necessary only to
notice that in a hash table it is immaterial which particular locations are
marked empty. The contents (identifier + entry) of the memory location which
is next in the chain are copied in place of the deleted item, and the location
which was next in the chain becomes available for new items and is marked
empty. The contents of the rest of the chain need not be altered. (Notice
that it is not practicable to change the pointer of the location which was
prior to the deleted item, since in a usual list structure there is no
provision for tracing the chain backwards.)
The end of the chain may be indicated by giving a particular value that
is easy to check to the last pointer: a constant (e.g., zero), the calculated
address, or the address of the location itself will do.
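The scheme can be sketched as follows; this is an illustrative toy implementation, not the exact layout of any particular system (free slots are taken from the table itself, so chains may coalesce, and None plays the role of the end-of-chain constant):

```python
class ChainedHashTable:
    """Toy direct-chaining table: each slot holds [key, data, next_slot];
    next_slot is the pointer to the next reserve location, None ends the
    chain."""
    def __init__(self, H):
        self.H = H
        self.slots = [None] * H

    def _free_slot(self):
        # stand-in for a real reserve-location policy (probing, free list)
        return next(i for i, s in enumerate(self.slots) if s is None)

    def insert(self, key, data):
        i = hash(key) % self.H               # calculated address
        if self.slots[i] is None:
            self.slots[i] = [key, data, None]
            return
        while self.slots[i][2] is not None:  # follow pointers to chain end
            i = self.slots[i][2]
        j = self._free_slot()                # reserve location, found at store time
        self.slots[j] = [key, data, None]
        self.slots[i][2] = j                 # record the pointer for retrieval

    def search(self, key):
        i = hash(key) % self.H
        while i is not None and self.slots[i] is not None:
            k, data, nxt = self.slots[i]
            if k == key:
                return data
            i = nxt                          # no arithmetic, just the pointer
        return None

t = ChainedHashTable(5)
for k, d in [(1, "a"), (6, "b"), (11, "c")]:   # all collide at address 1
    t.insert(k, d)
assert t.search(6) == "b" and t.search(99) is None
```

Retrieval never recomputes the probing function: it merely follows stored pointers, which is the point of the method.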
There exists a possibility for compressing the link fields needed to store
the pointers [2.49]. When the probing sequence is defined by a recursive
algorithm which generates the hash increments, then the next reserve location
can be defined by giving the probe number, i.e., the number of steps in re-
cursion by which the new location can be found. The probe number, instead of
a pointer, is then stored in the link field. This method may be named pseudo-
chaining. It should be noted, however, that if the maximum length of a probing
sequence may equal the hash table size, in the worst case the probe number
contains as many bits as the hash address; if the link field is then dimen-
sioned according to the maximum length in probing, no space savings are
achieved.
this method is also called separate chaining, chaining with separate lists,
or noncoalesced chaining (Fig. 2.4).
Fig. 2.4. Separate chaining through an overflow area. The results of handling
two collisions at the first location and one collision at the third location
are shown, whereby the Roman numerals tell in what order the entries have
been stored
locations. Notice that the remainder already defines the calculated address,
and the keyword is uniquely determined by the quotient and the remainder.
The quotient is in many practical cases only a few bits long, and the space
thereby saved compensates to a great extent the extra space needed for the
pointers in the prime hash table.
Finally it may be mentioned that with multilevel storages, most of the
benefits of an overflow area can be achieved, and yet one is able to keep
the reserve address locally near the calculated address if the overflow area
is divided into several parts, each one belonging to a particular range of
calculated addresses. Only that part which corresponds to the calculated
address needs to be buffered in the primary storage.
2.3.5 Rehashing
With the use of a separate overflow area, no serious problem arises from the
prime hash table having a load factor which exceeds unity, possibly even being
a multiple of unity. Nonetheless there seems to be a practical limit to the
load factor, often considered to be around 0.8 to 0.9. With
open addressing, the filling up of the hash table is a more serious incident,
and there must exist an automatic procedure to handle this case. One possi-
bility is to assign additional addresses to the hash table with a continued
range of the original hashing function; since the old entries must stay
where they are, the distribution of occupied locations is not expected to
become uniform over the combined table. Another, preferable approach devised
by HOPGOOD [2.51,52] is to assign a completely new table of a greater size
and to rehash or reallocate all the entries into the new table. Transition
to the new table can be made when the old one becomes, say, 80 percent full.
The entries can be moved from the original table to the new table serially;
notice that in this case it is necessary to buffer complete keywords instead
of their identifiers in the hash table locations for the calculation of the
new hashing function.
Rehashing techniques have been discussed in detail by BAYS [2.53,54], as
well as SEVERANCE and DUHNE [2.55].
Conflict Flag. A simple and effective way to cut down unnecessary searches
has been devised by FURUKAWA [2.56]. This method makes use of the fact that
in a substantial portion of attempts, a search will prove unsuccessful, i.e.,
the entry looked for does not exist in the hash table. It is now possible to
reserve a special bit named conflict flag at every location to obtain the
above answer more quickly. If during the search the search argument or its
identifier disagrees with the keyword or its identifier at the calculated
address, this may be due to either of the following possibilities: either
an entry with this keyword is in the table but has been set aside to a
reserve location due to a collision, or it does not exist. The conflict flag
can now be used to decide between these cases. If the flag was initially
0 and was set to 1 when the first collision with this location occurred,
then, provided that the value 0 was found during searching at the
calculated address and the keywords disagreed, one can be sure that the
keyword cannot be found in any of the reserve locations either, and the
search is immediately found unsuccessful.
The usage flag, in connection with the conflict flag, can also be used
to mark deleted items. Assume that the conflict flag in any location is set
to 1 if it is followed by a reserve location. The value 0 given to the usage
flag upon deletion of an item indicates that the location is empty (available),
but if the conflict flag has the value 1, this is an indication that the chain
of reserve locations continues.
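A minimal sketch of the conflict-flag test, under the simplifying assumption that each location stores just a keyword and the flag (the function and field names are illustrative, not from the source):

```python
def search_with_conflict_flag(table, probes, key):
    """table: dict address -> (keyword, conflict_flag); probes: the probe
    sequence for key, starting at the calculated address.  Returns the
    address holding key, or None.  When the flag at the calculated address
    is still 0 and the keyword disagrees, the miss is certain after a
    single access."""
    first = probes[0]
    entry = table.get(first)
    if entry is None:
        return None                   # empty calculated address: miss
    keyword, flag = entry
    if keyword == key:
        return first
    if flag == 0:
        return None                   # no collision ever happened here:
                                      # key cannot sit in a reserve location
    for addr in probes[1:]:           # otherwise walk the reserve locations
        e = table.get(addr)
        if e is None:
            return None
        if e[0] == key:
            return addr
    return None

# Flag 0 at address 0: searching "B" fails after one access.
assert search_with_conflict_flag({0: ("A", 0)}, [0, 1, 2], "B") is None
# Flag 1: "B" was pushed to a reserve location and is found there.
assert search_with_conflict_flag({0: ("A", 1), 1: ("B", 0)}, [0, 1, 2], "B") == 1
```

The saving is precisely in the unsuccessful case: without the flag, the whole probe chain would have to be examined before the miss is certain.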
Ordered Hash Tables. The concept of an ordered hash table was introduced
by AMBLE and KNUTH [2.57], in an attempt to speed up retrievals in case the
search is unsuccessful. This method is based on the observation that if the
entries in a list are ordered, for instance, alphabetically or by their
magnitudes, the missing of an item can be detected by comparing the relative
order of the search argument and the list items. As soon as the ordering re-
lation is reversed, one can be sure that the item can no longer exist in the
rest of the chain. Fortunately an algorithm for the ordering of items during
insertion is very simple, and it is described by the following computational
steps.
Assume that the probed locations in the chain are indexed by p, 1 ≤ p ≤ L,
and L+1 is the index of the next reserve location to be assigned. Denote by
K_p the keywords existing in the chain, let K_{L+1} = 0, and let K be the search
argument. The ordered insertion of items, which is exemplified below in the
case of magnitude ordering, is defined by the following algorithm:
1) Set p = 1.
2) If K_p = 0, set K_p = K and stop.
3) If K_p < K, interchange the values of K_p and K.
4) Set p = p + 1 and return to 2.
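The four steps translate almost literally into code; this sketch uses a plain list in place of the chain of reserve locations (0 marks an empty slot, as above, and indexing is 0-based):

```python
def ordered_insert(chain, K):
    """Insert keyword K into the probe chain (a list of keywords, 0 =
    empty slot), keeping the occupied prefix in decreasing magnitude
    order, following the four steps above."""
    p = 0                        # step 1 (0-based here)
    while True:
        if chain[p] == 0:        # step 2: empty slot found, store and stop
            chain[p] = K
            return
        if chain[p] < K:         # step 3: swap, push the smaller key onwards
            chain[p], K = K, chain[p]
        p += 1                   # step 4

chain = [0] * 5
for key in [17, 42, 8, 99]:
    ordered_insert(chain, key)
assert chain == [99, 42, 17, 8, 0]   # occupied prefix in decreasing order
```

An unsuccessful search can now stop at the first keyword smaller than the search argument, instead of walking to the end of the chain.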
Hash Index Table. The data items which are stored together with the keywords
or their identifiers in the hash table may sometimes demand a lot of space.
Their lengths may also vary, but a normal hash table has to be dimensioned
according to the longest one. This then normally results in an inefficient
use of the memory space. Moreover, it is cumbersome to manipulate long
entries when making hash table searches. If one can afford an extra memory
access (which with secondary storages is necessary anyway), the operation be-
comes much more flexible when the entries are stored in a separate memory
area, and only pointers to this area, showing the beginnings of the records,
are stored in the prime hash table together with the identifiers. It should
0/1 flag, the true address is found from an auxiliary table or register
array, and the contents of the address field only define the index or ad-
dress of the corresponding location. Indirect addressing also facilitates
reallocation of programs into another memory area without a need to move the
data. It is further possible to continue indirect addressing on several
levels, every time looking for a 0/1 flag in the location found in order to
deduce whether literal contents or a further address is meant.
[Figure: three instruction formats with a 0/1 flag. (a) op_1, flag 0,
operand; (b) op_2, flag 1, ad(operand); (c) op_3, flag 1, ad(ad(operand))]
32 bits of information, less than 17 bits per entry have then been used on
the average.
In general, advanced data structures based on hash linking might be use-
ful in certain high-level languages developed for artificial intelligence
research, list processing, etc.
Formats and Contents of Hash Table Locations. The size of the storage
location in the hash table is determined primarily by the amount of information
to be stored, thereby taking into account possibilities for compression of
the keyword identifier (cf the division method of hashing) and efficient
utilization of the memory space by direct/indirect addressing as described
above. However, the addressing principle used in most computers dictates
that the hash table location must be a multiple of a word or byte. Sometimes
even one excess bit to be stored may mean an extra word and extra readout
operation to be performed when searching a hash table location, and, there-
fore, the elements of the location must be chosen carefully. This problem is
alleviated if there is space left in the last word belonging to the location.
Obviously the indispensable constituents of a hash table location are the
keyword or its identifier, and the associated data or their pointer. If open
addressing is used, a further item to be stored is a flag which indicates the
end of a probing sequence. On the other hand, this flag can be dispensed with
if a dummy location, with a special code as its contents, is appended to the
sequence. One might also take the much longer route of probing in retrieval
until an empty location is found, or until the whole table has been probed,
but this is not very practical. When chaining is used, a special value of the
link word usually indicates the termination of a chain.
Optional contents of the hash table location are the following one-bit
markers: the usage flag, the conflict flag (cf Sect. 2.5.3), the flag which
marks a deleted item, and the link flag discussed above. The usage flag and
the link flag can be replaced by a special value given to the contents of
the location, e.g., all bits equal to 0 may mean an empty location, and all
bits equal to 1 in a particular field may mean that the rest of the location
is a hash link. Consequently, entries with these data values then cannot be
stored. The usage flag facilitates a faster operation; the flag indicating
a deleted item may be replaced by the conflict flag (Sect. 2.5.3), and it is
unnecessary if chaining is used.
Examples of hash table locations are given in Fig. 2.6.
Fig. 2.6. Various types of hash table location. Legend: ID = keyword identi-
fier; P_i = pointer to data area (index); P_o = pointer to overflow area (or
next entry in direct chaining); T = terminal flag; U = usage flag; D = flag
denoting a deleted entry; C = conflict flag; L = link flag
Fig. 2.7. Indexing of words in the hash table (see text). In this example
it is assumed that k = 4
address is used in this case as the index to an element of the entry in each
section which results in less computation during searching; especially with
symbol tables, this is a rather important benefit.
Buffering of Hash TabZes. Bucket Organization. When an auxiliary storage
device is used for a large hash table, only part of the table which contains
the calculated address may be buffered in the fast primary memory. If linear
probing is used for the handling of collisions, it is then likely for one
to find a substantial number of the reserve locations, possibly all, in the
buffered area. Some systems programs allow flexible swapping of data be-
tween primary and secondary storages in quantities named pages; with disk
memories, a typical page size is 256 words which corresponds to an address-
able sector on a track. In order to increase the probability for finding
all reserve locations on the same page, the increments in probing algorithms
can be taken cyclically, in the modulus of the page size. Only if a page is
filled up completely will the chain of reserve locations be continued on the
next page. Although the distribution of occupied
addresses in the hash table thereby becomes less uniform, the overall speed
of retrieval is probably increased as the number of slow accesses to the
auxiliary storage is thereby reduced.
Another useful principle which is commonplace, especially in connection
with auxiliary storages, is named bucket organization. In it, several con-
secutive locations are combined under a common hash address. Such a combi-
nation is named bucket, and it has a fixed number of "slots" for colliding
entries. There is thus no need to look for reserve locations at other ad-
dresses as long as all slots of the bucket are not full. In the latter case,
another bucket is appended to the chain of reserve locations, and only one
pointer per bucket is needed (Fig. 2.8).
Fig. 2.8. Two buckets with four "slots" each, chained; the latter partly
empty
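The bucket-with-overflow-pointer arrangement of Fig. 2.8 can be sketched as follows (the slot count and class names are illustrative; a real implementation would map hash addresses to prime buckets):

```python
SLOTS = 4                        # fixed number of slots per bucket

class Bucket:
    def __init__(self):
        self.entries = []        # up to SLOTS (key, data) pairs
        self.overflow = None     # the single pointer per bucket

def bucket_insert(bucket, key, data):
    # walk the chain until a bucket with a free slot is reached,
    # appending a new overflow bucket only when all slots are taken
    while len(bucket.entries) == SLOTS:
        if bucket.overflow is None:
            bucket.overflow = Bucket()
        bucket = bucket.overflow
    bucket.entries.append((key, data))

def bucket_search(bucket, key):
    # a linear examination of the slots, bucket by bucket
    while bucket is not None:
        for k, d in bucket.entries:
            if k == key:
                return d
        bucket = bucket.overflow
    return None

b = Bucket()
for i in range(6):               # six colliding entries: 4 + 2 in overflow
    bucket_insert(b, i, str(i))
assert len(b.entries) == SLOTS and len(b.overflow.entries) == 2
assert bucket_search(b, 5) == "5"
```

Only one pointer is spent per SLOTS entries, which is the overhead argument made below.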
In linear probing, overflows can also be stored in the next bucket which
has empty space for them. There obviously exists an optimum number of slots
per bucket in a particular application. This depends at least on two opposing
factors. With a greater number of slots in a bucket, the common pointer re-
presents a smaller overhead. On the other hand, since the number of hash
addresses then becomes significantly smaller than the number of available
memory locations, it is expected that the distribution of entries in the
hash table becomes uneven, with a result of poorer efficiency in its uti-
lization and an increased number of collisions. The theoretical optimum
seems to be very shallow; we shall revert to the question of bucket size
at the end of Sect. 2.5.
Quite independently of theoretical optimality considerations, one may
easily realize that the reading mechanism of a rotating memory device such
as a disk favors large buckets; if the bucket is made equivalent to a track,
its contents are transferable to the primary storage in one operation.
HIGGINS and SMITH [2.59] refer to a possibility of making the bucket
size grow in a geometric series in a chain; the leading idea is that the
frequency of overflows from the first bucket, especially with multiple key-
words, depends on the frequency of usage of that keyword, and the disk space
might thereby be utilized more effectively. This idea which may be justified,
e.g., in library indexing is in contradiction to the ideas of KNUTH [2.1]
and SEVERANCE and DUHNE [2.55] who suggest that the overflow buckets (with
single keywords) should be smaller than the prime buckets.
It seems that the principle of buffering of data was understood already
in the early days of computer technology. Even the bucket organization was
described in the first papers published on hash coding, e.g., those by
PETERSON [2.60] and BUCHHOLZ [2.61]. Incidentally, the first applications
of hash coding were meant for large files!
On the Use of Buckets for Variable-Length Entries. Since the bucket organi-
zation is normally connected with the use of secondary storages, even long
entries can be stored in the locations. The searching time depends primarily
on the time needed to transfer a bucket to the primary storage, whereas a
linear examination of the "slots" can be made by fast machine instructions
in the mainframe computer. One possibility with variable-length entries is
now to pack them linearly (contiguously) within the bucket, without consid-
eration of any slot boundaries. It is necessary only to demarcate the dif-
ferent entries using, e.g., special bit groups. For a comparison of various
methods for variable-length entries, see, e.g., the article of McKINNEY [2.62].
Of all hash tables implemented so far, maybe the most complex ones are those
designed for various artificial intelligence languages, for instance, SAIL
(Stanford Artificial Intelligence Language). A subset of it which was ini-
tially developed as an independent version for the storage and manipulation
of associative (relational) data structures is named LEAP, and it was imple-
mented around 1967 by FELDMAN and ROVNER [2.63-66]. The language processor
of LEAP contains many features which are not pertinent to this discussion;
however, the special hash table organization of it is reviewed in this sub-
section. The most important new feature which distinguishes the hash table
of LEAP from other forms discussed earlier is that there are several types
of memory location in the same table. Their divided roles allow an optimal
usage of the data space. This solution is partly due to the fact that entries
are encoded by multiple keywords and made retrievable by any combination of
them. Although multikey searching procedures will otherwise be presented in
Sect. 2.6, this particular case may serve as an introductory example of them.
It must be emphasized, however, that the most important reason for the defi-
nition of so many types of location has been the necessity of facilitating
an automatic generation and manipulation of data structures by the language
processor.
Since the data structure of LEAP was designed for a paged, time-shared
computer TX2 with a large secondary storage, the memory space could be allo-
cated redundantly, which resulted in a high speed in retrieval. The version
of LEAP implemented in SAIL is actually a restricted version, and it is the
original form which is discussed here.
In order to facilitate the understanding of the data structures presented,
it is necessary first to describe the modes of searching expected to
occur. The entries in associative languages are constructs named "associations"
which are represented by ordered triples (A,O,V). The elements A, 0, and V
have the following roles: 0 is the name of an object, A is an attribute of it,
and V is some value (e.g., name) associated with the pair (A,O). For example,
a verbal statement "the color of an apple is red" has the following equivalent
representation: (color, apple, red). The value can also be another entry,
namely, a triple linked to (A,O). We shall abandon the consideration of data
structures thereby formed in this context; let their discussion be postponed
to Chap. 6. The aspect on which we shall concentrate here is that a triple is
always stored as such in the memory space, but it must be retrievable on the
basis of one, two, or all three of its elements A, 0, and V. The case in which
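The retrieval modes just described can be illustrated with a toy triple store; this sketch uses a plain linear scan and makes no attempt to reproduce LEAP's hash-table organization, which is described below:

```python
class TripleStore:
    """Toy store for "associations" (A, O, V), retrievable on the basis
    of one, two, or all three elements; None means "unconstrained"."""
    def __init__(self):
        self.triples = []

    def add(self, a, o, v):
        self.triples.append((a, o, v))

    def match(self, a=None, o=None, v=None):
        return [t for t in self.triples
                if (a is None or t[0] == a)
                and (o is None or t[1] == o)
                and (v is None or t[2] == v)]

s = TripleStore()
s.add("color", "apple", "red")       # "the color of an apple is red"
s.add("color", "sky", "blue")
assert s.match(a="color", o="apple") == [("color", "apple", "red")]
assert len(s.match(a="color")) == 2  # retrieval by a single element
```

LEAP replaces the linear scan with hash addressing on the compound keyword (A, O) plus inverted lists for the single elements, as the following subsections explain.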
The two higher order digits of v(A) were identified with the page number. Let
the number formed of the remaining digits be denoted v'(A). The relative
address of the entry on the page was obtained by a hashing function

   h = (v'(A) ⊕ v(O)) mod H ,   (2.24)
where ⊕ denotes the EXCLUSIVE OR operation over respective bits in the binary
strings of v'(A) and v(O), and h corresponds to the binary number thereby
formed, taken modulo H.
A property of the EXCLUSIVE OR function is that (x ⊕ z = y ⊕ z) implies
(x = y). Accordingly, in order to resolve between the colliding items in the
hash table, it was considered enough to use v(O) as the identifier of the
entry. Notice that the quotient in the mod H division of Eq. (2.24) might
have required yet less space as an identifier (cf Sect. 2.3.1).
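A sketch of such an EXCLUSIVE OR hash for compound keywords, together with a check of the cancellation property used above (the table size is illustrative and the page-number split of v(A) is omitted here):

```python
import random

H = 1 << 10                        # illustrative table size (power of two)

def compound_hash(v_a, v_o):
    """Relative hash address for the compound keyword (A, O): EXCLUSIVE
    OR of the element values, taken modulo H, in the spirit of (2.24)."""
    return (v_a ^ v_o) % H

# The cancellation property: (x ^ z == y ^ z) holds exactly when x == y,
# because XOR with a fixed z is a bijection.
random.seed(1)
for _ in range(1000):
    x, y, z = (random.getrandbits(16) for _ in range(3))
    assert (x ^ z == y ^ z) == (x == y)

assert compound_hash(0b1010, 0b0110) == 0b1100
```

This is why, among entries formed with the same A part, v(O) alone suffices to tell colliding compound keywords apart, as the text argues.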
Subchain Organization of Reserve Locations. The special subchain organization
of the reserve locations applied in LEAP was intended for speedup of retrie-
val. Due to multiple keys and unlimited occurrence of the elements A, O, and
V in the triples, a normal case would be that there are two types of collisions
in the hash table; they may result from the mapping of distinct compound
keywords on the same address, or occurrence of identical compound keywords.
The reserve locations were, therefore, structured in two dimensions, shown
horizontally and vertically, respectively, in Fig. 2.9.
Fig. 2.9. An example of two-way chaining in LEAP, with six colliding entries
(see text). Fields on the top row contain different keyword identifiers ID_1,
ID_2, and ID_3. Field F contains various flags (location type identifiers).
A: see Fig. 2.10. Notice that no keyword identifiers are needed in members of
the subchains except the head; the same goes for the A fields (see Fig. 2.10).
Notice, too, that the circular list structure facilitates a continuous trans-
fer from one subchain to another. V1 through V5: values of "association"
cess to all entries with a particular compound keyword. The ends of the
chains are always indicated by pointers back to the calculated address.
The extra blank fields as well as the A fields in the entries shown in
Fig. 2.9 shall be explained below. Observe the number of necessary pointers,
which varies with the role of the entry; it, therefore, seems possible to
optimize the utilization of memory space by using variable-size locations.
The operations associated with the language processors, in fact, require
that yet more types of locations be defined. For an immediate check of whether
a chain of colliding entries was appended to the calculated address, a special
conflict flag can be used in the field on the upper row shown blank. In this
field it is also possible to show whether a subchain has been appended to an
entry. Further specified roles of entries, indicatable by flags stored in
this field are: last member in a horizontal chain, member in the vertical chain,
member which contains its internal name (for language processor operation),
and type of memory location used as independent register.
Inverted List. In order to make the stored entries accessible on the basis of
single elements A, 0, and V, respectively, the inverted list (hash index
table) structure is used. A convention is made to store the list accessible
by A (called briefly the A list) on the same page as the hash table access-
ible by (A,O); accordingly, this page is called A-type page. The 0 list and
V list are on O-type and V-type pages, respectively. The keyword A is hash
addressed, and in the corresponding location there are stored the identifier
of A, and a pointer to one of the entries (the one with this particular A
[Figure: chain of entries labelled with the compound keywords (A1,O1) and
(A1,O2), linked through the A pointer fields]
Fig. 2.10. The inverted list structure which facilitates searching on the
basis of Al (see text). Notice that all entries shown in this picture are
accessible by Al through the subchains. Notice that if there exist other
chains of reserve locations similar to that shown in this picture which
have the same Al value, they must be appended to this A chain. The A list is
always confined on the A-type page
value which was stored first in the table). Further entries which contain
the same A value are linked to the same list, by using the A pointer fields
shown in Fig. 2.9. Let Al be a distinct A value. The chained structure used
for the inverted list is shown in Fig. 2.10. The entries shall be identical
with those occurring in Fig. 2.9. For clarity, the various types of compound
keywords (A,O) thought to occur in this example have been written below the
storage location in question. As usual, the end of the A chain is indicated
by a pointer to the calculated address (A).
Up to this point in this chapter, the emphasis has been on the introduction
of various designs for hash tables. For a practical application, one has to
choose from many options. Some of these choices are clear, for instance,
one may easily deduce whether open addressing or overflow area has to be
used, since this depends on the nature of the hash table as well as the type
and size of memory device available. For instance, let us recall that sym-
bol tables are usually kept in the primary storage, and chaining through the
overflow area is thereby considered most effective. On the other hand, with
multilevel memories, it was considered significantly beneficial to store all
related data close to each other, whereby open addressing, or partitioned
overflow area, is advantageous.
The size of a memory location is also rather obvious from the amount of
information to be stored with an entry, and it is normally a multiple of
the length of an addressable word. But one of the most important architectural
features which is not quite easy to decide is the bucket size, i.e., the num-
ber of slots in the bucket; although an addressing mechanism of the secondary
storage may make it convenient to use, say, one track for a bucket, nonethe-
less for the best efficiency it is desirable to learn how the number of mem-
ory accesses theoretically depends on the number of stored items with dif-
ferent bucket sizes.
Another option for which plenty of alternatives are available, but only a
few of them are to be recommended, is the hashing function. The quality of
different functions is relative with respect to the keywords, and cannot be
known until benchmarking runs have been carried out in real situations. It
seems that the problem of the best hashing function now has a rather gener-
ally valid answer, and in spite of the fact that so many alternatives have
been introduced, the recommended choice in most cases is the division method,
Average Number of Accesses with Random Probing. With the aid of the follow-
ing simplified example, it becomes possible to gain an insight into the most
important property of hash coding, namely, the average number of accesses as
a function of loading of the hash table. This example describes an open ad-
dressing situation when random probing with independent double hashing is
used, and ideally randomized hash addresses are assumed. In other words,
degradation of performance due to primary and secondary clustering is com-
pletely ignored. For further simplicity, only the case of successful search
is considered in this example, i.e., it is assumed that the entry to be lo-
cated really exists in the table so that the search ends when it has been
found. Moreover, all keywords are assumed different (multiple-response cases
are not considered). The average number of accesses to the hash table during
readout can then be derived theoretically using elementary statistical consid-
erations. It may be clear from the above reservations that an optimistic
estimate to the average number of accesses is thereby obtained.
Assume that N entries are stored at random and with a uniform distribu-
tion in a hash table containing H locations. The load factor of this table
is defined as α = N/H. The expected or average number of trials to find an
empty location is obtained in the following way. The probability for finding
the empty location on the first probing is 1 - α; if probing is assumed com-
pletely independent of hash addressing, the probability for finding an empty
location on the second but not on the first trial is [1 - (1 - α)](1 - α) =
α(1 - α); the corresponding probability on the third, but not on the first
nor the second, trial is [1 - (1 - α + α(1 - α))](1 - α) = α²(1 - α), etc.
The average number of trials T_av is obtained when the ordinal number of the
trial is weighted by the corresponding probability, i.e.,

   T_av = (1 - α) Σ_{i=1}^∞ i α^{i-1} = 1/(1 - α)   (2.25)
It is now assumed that the same address sequences are followed during the
storage and retrieval of items (which holds true if no items have been de-
leted during use). As a rule, the length of an address sequence grows with
the filling up of the table. It is to be noted, however, that when an entry
is retrieved, the corresponding sequence may be new or old. It is noteworthy,
too, that the table was initially empty, and when it was filled up so that
the final load factor is α, the average number of probings made during this
process is smaller than the expected number of probes at the last step. If
the number of probes is averaged over the history of the table, the value so
obtained then must be the same as the expected number of accesses in the
readout process. With random probing, the distribution of entries in the
table can be assumed to stay random and uniform at all times, and if the
table has a large number of locations, the expected number of searches made
during readout may be approximated by the integral
   E_r = (1/α) ∫₀^α dx/(1 - x) = -(1/α) ln(1 - α)   (2.26)

For instance, when the hash table is half full, E_r(0.5) ≈ 1.39. We shall
return to the above expression E_r = E_r(α) with the analysis of other probing
methods.
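The result can be checked by simulation; this sketch stores entries by ideally random, independent probes, so the history-averaged storage cost should approach E_r(α) of (2.26) (function name and parameters are this sketch's own):

```python
import random
from math import log

def avg_successful_search(H, alpha, runs=50):
    """Monte Carlo estimate of the average number of accesses for a
    successful search under ideal random probing: each probe is an
    independent uniform address, as assumed in the derivation above."""
    rng = random.Random(0)
    probes = entries = 0
    for _ in range(runs):
        occupied = [False] * H
        for _ in range(int(alpha * H)):
            while True:                   # probe until an empty slot
                probes += 1
                a = rng.randrange(H)
                if not occupied[a]:
                    occupied[a] = True
                    break
            entries += 1
    # Retrieval repeats the storage sequence, so this history average
    # approximates the expected search length E_r(alpha).
    return probes / entries

est = avg_successful_search(256, 0.5)
theory = -(1 / 0.5) * log(1 - 0.5)        # (2.26) at half load
assert abs(est - theory) < 0.1
```

The agreement confirms the argument in the text that the average over the filling history equals the expected readout cost.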
The idea of random probing, and the above result were publicly mentioned
for the first time by McILROY [2.67], based on an idea of VYSSOTSKY. Meth-
ods for derivation of (2.26) that have been presented in the literature are
somewhat different.
Average Number of Searches in Linear Probing. Although linear probing is
the simplest method for finding the reserve locations, its statistical analysis
turns out to be tedious, in view of the fact that the probing sequences may
coalesce. ERSHOV, who was one of the first to introduce hash tables and
linear probing [2.68], conjectured that if the table is less than 2/3 full,
then the average number of searches would be less than two. In the classical
work of PETERSON [2.60], the average number of searches was determined by
computer simulations, not only for simple hash tables but for bucketed tables,
too (cf the discussion later on). These figures are consistently higher by
a few percent than those obtained later on. One of the first reasonably
accurate theoretical analyses of a probing scheme very similar
to the linear probing method described above was made by SCHAY and SPRUTH
[2.69] who used a Markov chain model to compute the average length of probing
sequences. This analysis, too lengthy to be described here, applies to the
normal linear probing scheme, too. Thus, if certain statistical assumptions
and approximations (such as stationarity of the stochastic processes in the
Markov chains, and replacement of the binomial distribution by the Poisson
distribution) are made, one arrives at the following approximative expression
for the average number of searches:
E_L(a) = (2 - a)/(2 - 2a) = 1 + a/(2(1 - a)) .   (2.27)
The validity of (2.27) has later been rigorously discussed by KRAL [2.70] as
well as KNUTH [2.1] (cf also Table 2.1). It is noteworthy that the above
expression describes the number of accesses in the case that the search was
successful; we shall revert to the case that the entry does not exist in
the table later on in Sect. 2.5.3. If the hash table is half full, (2.27)
yields the value E_L(0.5) = 1.5; moreover, E_L(2/3) = 2.
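Formulas (2.26) and (2.27) are easily tabulated side by side; a small Python sketch:

```python
import math

def E_linear(a):
    """Successful-search average with linear probing, (2.27)."""
    return (2 - a) / (2 - 2 * a)

def E_random(a):
    """Successful-search average with random probing, (2.26)."""
    return -math.log(1 - a) / a

# Linear probing degrades much faster as the table fills up.
for a in (0.5, 2 / 3, 0.9):
    print(f"a = {a:.3f}:  linear {E_linear(a):.2f}   random {E_random(a):.2f}")
```

At a = 0.9 linear probing already needs 5.5 accesses against 2.56 for random probing, illustrating the coalescence penalty discussed above.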
Comparison of Searches in Random, Linear, and Quadratic Probing. One of numerous
It can be clearly seen that quadratic probing which eliminates this effect
is theoretically almost as good as random probing. Another interesting point
to notice is the validity of Et(a) when compared with numerical results.
In the experiments reported in Table 2.2, the keywords were drawn from
algorithms published in "Communications of the Association for Computing
Machinery", using (presumably) the division method over the table size H =
2048. Only the first three characters of the identifiers were selected for
the keywords. It may be mentioned that the linear probing algorithm was
tested with several constant integers as the probing increment (denoted by
a), and the value a = 173 reported in Table 2.2 is an optimum obtained in ten
runs.
Comparison of Quadratic Probing and the Quadratic and Linear Quotient Methods.
Those numbers which are available to us about the quadratic and linear quotient
methods are due to BELL [2.41] as well as BELL and KAMAN [2.43]. Unfortunately,
comparable figures have been reported only for the average number of trials
to insert a new entry; in random probing with independent double hashing,
this would be given by the theoretical expression T_av = (1 - a)^(-1) [cf (2.25)].
The corresponding values for the other methods are given in Table 2.3. The
hash addresses were random.
15,320 probings, respectively. The time per probing was 10 percent less with
the linear quotient method, however.
Comparable simulation results are available about the average number of
searches with quadratic probing and the quadratic quotient method (for random
keys) [2.41], and they are given in Table 2.4. In view of the previous numbers
(Table 2.3) one can deduce that the last column is approximately valid for the
linear quotient method, too.
Table 2.4. Average number of searches in quadratic probing and the quadratic
quotient method
Load factor a   Random probing (2.26)   Quadratic probing   Quadratic quotient
0.5             1.386                   1.44                1.38
0.6             1.527                   1.61                1.52
0.7             1.720                   1.84                1.72
0.8             2.012                   2.18                2.01
0.9             2.558                   2.79                2.55
Average Number of Searches from a Rearranged Hash Table Using the Linear
Quotient Method. The method of BRENT (cf Sect. 2.3.2) for the reduction of
searches in open addressing by rearrangement of the reserve locations has
the best performance so far reported for any open-addressing method. BRENT
has carried out a theoretical analysis as the result of which the following
serial expression for the average number of searches EB(a) is obtained:
Table 2.5. Average number of searches with Brent's improvement of the linear
quotient method
Searches with Coalesced Chaining. With open addressing, the number of prob-
ings increases progressively as the load factor approaches unity. However,
if the reserve locations are defined using direct or coalesced chaining,
the tedious probing calculations are made only during insertion of entries
into the table, while the unsuccessful probings are not involved in the
searches during retrieval. The number of searches from a chained hash table
is therefore greatly reduced even if the table is completely full. KNUTH
[2.1] has worked out the following approximation for the average number of
(successful) searches E_c(a) from a hash table with coalesced chaining
(assuming uniform hashing):
E_c(a) = 1 + a/4 + (1/(8a)) (e^(2a) - 1 - 2a) .   (2.29)
Some figures computed from (2.29) are given in Table 2.6, together with ref-
erence values for uniform random probing.
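Expression (2.29) is easy to evaluate directly; a Python sketch:

```python
import math

def Ec(a):
    """Knuth's approximation (2.29) for successful searches with
    coalesced chaining (uniform hashing assumed)."""
    return 1 + a / 4 + (math.exp(2 * a) - 1 - 2 * a) / (8 * a)

# Even a completely full table stays below two accesses on the average.
for a in (0.5, 0.9, 1.0):
    print(a, round(Ec(a), 3))
```

At a = 1 the value is about 1.80, which illustrates the text's remark that the number of searches stays modest even when the chained table is completely full.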
Searches with Separate Chaining (Through the Overflow Area). The load factor
in open addressing cannot exceed unity. With the use of a separate overflow
area, only the calculated addresses exist in the hash table; accordingly, if
no restrictions are imposed on the size of the overflow area, an arbitrary
number of entries can be stored in the latter. If the load factor a is again
defined as the ratio of the number of stored items to the number of addresses
in the hash table, it may exceed the value unity by any amount. The average
number of searches vs a may then be evaluated. Actually a comparison with
the other methods based on this figure is not quite justified since the
demand of memory in the overflow area is not taken into account, and an overly
optimistic view of separate chaining may thereby be obtained. However, the
hash table and the overflow area usually have different organization, and it
becomes difficult to define the load factor in any other simple way.
The analysis of searches with separate chaining is very simple and can be
delineated in the following way. If N keywords are randomly and uniformly
hash addressed over a table of size H, the average number of collisions at a
particular address is N/H = a. As all collisions will be handled by chaining
the overflowing entries separately for every calculated address, the average
number of reserve locations will also equal a. If the sought entry really
exists in the table, the expected length of search is
E = 1 + a/2 .   (2.30)
(One search is always needed for looking at the calculated address.) The
same result was obtained by JOHNSON [2.71] on more complicated grounds.
Indirect chaining has also been analyzed by HEISING [2.72].
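The expected search length 1 + a/2 of (2.30) can be checked with a short simulation; the Python sketch below uses invented sizes and counts, for each stored item, its 1-based position in its chain (the first position being the calculated address):

```python
import random

def avg_successful_search(H=1000, N=2000, seed=7):
    """Simulate separate chaining: N items hashed uniformly into H
    calculated addresses, overflows chained separately per address.
    A successful search costs the item's 1-based chain position."""
    rng = random.Random(seed)
    chains = [[] for _ in range(H)]
    for item in range(N):
        chains[rng.randrange(H)].append(item)
    total = sum(pos + 1 for chain in chains for pos, _ in enumerate(chain))
    return total / N

a = 2000 / 1000           # with separate chaining, a may exceed unity
print(avg_successful_search(), "theory 1 + a/2 =", 1 + a / 2)
```

Note that the load factor here is 2, which open addressing could not support at all.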
A comparison of searches with random probing and with coalesced and
separate chaining, respectively, is given in Table 2.6.
Table 2.6. Average number of searches with random probing and with coalesced
and separate chaining (successful search)
Table 2.7. Average number of bucket accesses vs bucket size with linear
probing and assuming successful searches
For all values of load factor, the number of searches decreases monotonically
with increasing bucket size. If the whole hash table formed one bucket, the
number of retrievals would equal unity. It would be absurd to state that this
is the optimum, and an obvious flaw in this analysis is that neither the
effect of bucket size on the transfer time of records to the primary memory,
nor on the time needed for comparisons connected with the bucket access, is
taken into account. Accurate evaluation of the optimal bucket size is very
difficult since the computing costs may be defined in many ways. SEVERANCE
and DUHNE [2.55] have made one analysis in which they arrived at a theoretical
optimum for the bucket size which is between 4 and 18. This question needs
further investigation, however. For instance, independently of optimality
consideration, the reading mechanism of the secondary storage usually favors
a bucket size equivalent to that of one track or an integral number of pages.
Another figure which may be helpful in the design of bucketed organizations
is the percentage of entries overflowing from the buckets during insertion.
The results given in Table 2.8 contain theoretical results with random keys,
calculated by SEVERANCE and DUHNE, as well as practical results that are
averages from several different practical files (cf [2.29]). The agreement
this property; further advantages are achieved if the probed buckets are
chained (directly): the unnecessary probings during retrieval are thereby
eliminated, which brings about a significant advantage in view of the rel-
atively slow operation of the secondary memory devices. Reorganization of
hash tables after deletions is also made more flexible by chaining: no en-
tries need to be moved when others are deleted, and this is an advantage when
the buckets contain a lot of data.
The extra pointers needed in a chained structure represent an overhead in
memory resources which, however, is negligible if the bucket size is much
greater than unity.
Handling of Overflows from Buckets with Separate Chaining. It has been pointed
out by several authors that if one wants to apply chaining with separate
lists, it is advantageous to have the reserve buckets on the same memory
area, e.g., on the same track as the calculated address. However, in the
bucket-per-track case there would be no sense in reserving an overflow area
on the same track as the bucket since it would then be more natural just to
make the bucket correspondingly bigger. If, on the other hand, there are
several buckets on a track, for instance, one corresponding to every address-
able sector, the transfer time of a bucket may be significantly shorter than
that of a whole track (due to the serial nature of transfer, and some possible
restrictions set by the operating system). All buckets can then be made to
share a common overflow area at the end of the track, and for a sufficient
storage capacity, this could be chained by the entries as explained below.
KNUTH [2.1] has made a further observation, namely, that there is no reason to
make the bucket size in the overflow area equal to that in the hash table.
In view of the fact that overflows from buckets are relatively rare events,
the reserve buckets would be very sparsely occupied and space would be wasted.
Instead KNUTH suggests that the overflowing entries as such could form lists,
i.e., the bucket size in the overflow area could be unity.
The average number of searches with bucket organization and separate
chaining of the overflowing entries, given in Table 2.9, is based on the
above suggestion of KNUTH. The figures are from the work of SEVERANCE and
DUHNE. These values do not, however, describe the average time of search
since, for instance, if the complete overflow area were retrieved in the
same memory access as the bucket itself, the number of retrievals would be
trivially equal to one.
The increase of accesses with bucket size for a > 1 is due to chaining
of individual entries in the overflow area: every entry thereby needs one
access, and this effect does not manifest itself until the load factor and
the bucket size are both large.
Table 2.9. Average number of searches with bucket organization and separate
chaining of overflowing entries (successful search)
[Figure: average number of searches vs load factor (0.5 to 1.0); graphical content not recoverable]
2.5.3 Special Considerations for the Case in Which the Search Is Unsuccessful
age length of successful search is L/2. In hash tables, the situation is not
that simple and needs more careful analysis. For instance, with linear prob-
ing, the average spacing of reserve locations in a chain grows with the
filling up of the table. If an entry exists in the chain, it will be found
when, on the average, half of the chain is retrieved. On the other hand, the
number of probings on the first half of the chain is smaller since the aver-
age spacing is smaller, and retrieval of the rest of the chain (for unsuccess-
ful search) takes progressively more probings. The analysis of an unsuccessful
search should actually be carried out for every collision handling method
separately. This would extend the material presented in this chapter consider-
ably, and we shall refer to a couple of the simplest cases only. However, if
unsuccessful searches would play an important role in practical applications,
the conflict flag method or the ordered hash table described earlier can be
used.
In the simplest case, the average number of unsuccessful searches is ac-
tually the same as the number of trials for inserting a new item. The average
number of unsuccessful searches in the cases of linear and random probing
are approximately given by the following theoretical expressions [2.1]
Linear probing:   (1/2) [1 + (1 - a)^(-2)] ,   (2.31)

Random probing:   (1 - a)^(-1) .   (2.32)
Numerical values computed from these formulas, together with figures refer-
ring to successful searches, are given in Table 2.10.
Load factor a
0.5 0.6 0.7 0.8 0.9
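Formulas (2.31) and (2.32) can be tabulated directly; a Python sketch:

```python
def U_linear(a):
    """Average probes in an unsuccessful search, linear probing (2.31)."""
    return 0.5 * (1 + (1 - a) ** -2)

def U_random(a):
    """Average probes in an unsuccessful search, random probing (2.32)."""
    return 1 / (1 - a)

for a in (0.5, 0.8, 0.9):
    print(a, round(U_linear(a), 2), round(U_random(a), 2))
```

At a = 0.9 linear probing already needs over 50 probes for an unsuccessful search against 10 for random probing, which motivates the conflict flag and ordered hash table devices mentioned above.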
possible choice for such mass memories would be magnetic disk; special de-
vices such as optic ones may be used in extremely large archives, too.
In view of the above considerations, it seems necessary, within the scope
of this book, to have a special discussion of methods connected with the
searching of entries on the basis of multiple keywords, or the multi-key
searching. The problematics of this field are discussed in this section in
a rather general form, without reference to particular applications. Some
standard principles of multi-key searching are presented first. After that,
special solutions implementable only by hash coding are introduced.
It ought to be emphasized that definition of a searching task is often
made by giving a set of names and their attributes. In this case the semantic
differentiation of keywords enters into the addressing problem. Searching
from semantic structures will be discussed in Chap. 6.
Any ordered set of data items may be named a list. A list structure results
when an element of a list is another list, and if such a nested organization
is continued, the list structure may grow very complex.
A list or list structure itself is an entry in memory which must be access-
ible in a convenient way. There is a particular point in the list, usually
its first item, which is called the head or header, and which is identifiable
by the name of the list. The names can be kept sorted in which case usual
searching methods (described in numerous textbooks) can be applied. A con-
venient method for the location of list heads is the application of hash
coding on the names, whereby the latter need not be sorted. The other items
may be linked to the list in a number of ways; the simplest way is to use
consecutive locations.
Examples of lists the items of which are not stored consecutively but
scattered over the memory area are the chains of reserve locations in hash
coding; the calculated address then comprises the list head. The sequence of
addresses can be computed by a probing algorithm at every retrieval, or di-
rectly defined with the aid of pointers. A linked list is any collection of
items which is chained using pointers. A one-way (linked) list allows tracing
of its items in one direction only. In a two-way list, also named a symmetric
or doubly-linked list, every item contains two pointers, one pointing for-
ward and the other backward, with the result that the list can easily be
traced in both directions. A circular or threaded list, also named a ring,
is a one-way list the last pointer of which points back to the list head.
Various types of linked list are shown in Fig. 2.12.
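The list types above can be sketched in a few lines of Python (the class and function names are illustrative, not from the text); here a circular list is built and traced past its head:

```python
class Node:
    """One item of a linked list; 'next' is the forward pointer."""
    def __init__(self, data):
        self.data = data
        self.next = None

def make_ring(items):
    """Build a circular (threaded) list: the last pointer returns
    to the list head."""
    head = Node(items[0])
    cur = head
    for x in items[1:]:
        cur.next = Node(x)
        cur = cur.next
    cur.next = head          # close the ring
    return head

def trace(head, limit):
    """Follow forward pointers; in a ring the traversal must be bounded."""
    out, cur = [], head
    for _ in range(limit):
        out.append(cur.data)
        cur = cur.next
    return out

print(trace(make_ring(["a", "b", "c"]), 5))   # wraps past the head
```

A two-way list would simply add a second, backward pointer to every Node.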
Fig. 2.12. Various types of list; (a) one-way list, (b) circular list,
(c) two-way (double-linked) circular list
For implementing a Zist structure, some items must have at least two
pointers thus defining a node. A tree is a list structure with branchings
starting from a node named root, and it has no recurrent paths to the same
node. A binary tree, an example of which is given in Fig. 2.13, always has
two branches at every node, except for the last items. A graph is a general
list structure which may have recurrent paths.
There are various methods for the retrieval of information from graphs,
as well as means to reduce graphs to simpler forms. We shall forgo such
discussions here since list structures are considered only in a few in-
stances in this book. The principal applications of list structures will be
discussed in Sect. 2.8 and Chap. 6. For references on lists and related data
structures, see, e.g. [2.81-86].
Multilists. Structures which are formed when the same item may occur in many
lists, i.e., when several independent chains pass through the same set of
items, are named multilists. They differ from meshy graphs by the fact that
the common items do not behave like nodes, but every partial list, starting
from its head, can be followed in a unique way up to its end. For this pur-
pose, the same principle as that used in the hash table may be applied: the
identifiers of the individual list heads must be stored together with the
respective pointers, for the resolution between different chains (Fig. 2.14).
[Figs. 2.14 and 2.15 diagrams omitted]
Fig. 2.15. Usual multilist
is made to the last partial list which is normally much shorter than the
complete chain of a usual multilist (Fig. 2.16).
[Figs. 2.16 and 2.17 diagrams omitted]
In the so-called cellular multilist, the lengths of the lists are not
limited; instead, a chain is not allowed to exceed the boundaries of a physi-
cal memory block, for instance, a track (cell, bucket) (Fig. 2.17).
The advantages of this organization are that all data in a partial list
are always transferred to the primary storage whenever one memory block is
buffered in it, and furthermore, it is usually deducible from the directory
that only a few blocks need to be examined. For instance, if in the example
of Fig. 2.17 a combination search on the condition 'X and Y' were to be per-
formed, examination of cell 3 is unnecessary since Y does not have entries
in it. Moreover it is possible to carry out the searches in different cells
along the shortest chains, for instance, on cells 0 and 1 along the Y chain,
and on cell 2 along the X chain.
The cellular serial organization (Fig. 2.18) has some features from the
cellular multilist and the bucketed organization. In the directory, only
indices of those memory cells are indicated in which entries with a particular
keyword occur. A complete linear search, however, has to be performed through
the corresponding cells. The updating of this organization is simple, and if
one cell holds only a few records, the operation is relatively fast.
[Fig. 2.18 diagram omitted]
Fig. 2.18. Cellular serial organization
[Fig. 2.19: inverted lists of document addresses for keywords W, X, Y, Z; layout not recoverable]
Fig. 2.19. Inverted list
There are some opposing opinions about one detail of the probing algorithm,
namely, whether the reserve addresses should be on the same page as the cal-
culated address and thus be computed modulo 256, or whether they should con-
tinue on the next page and be defined modulo 220. The choice is largely dic-
tated by how many times the same keyword may occur in different entries on
the average. Especially if the load factor is relatively small, modulo 256
probing seems profitable since this then reduces the number of disk accesses.
The sorting tables B and C are used in the combination search in the fol-
lowing way. They are hash-addressed with respect to the pointers found in
the hash index table. The search argument presented first picks up a set of
pointers in table A. These pointers (their copies) are then stored in table
B at addresses which are hashing functions of the pointers themselves. No
other associated information is stored in these locations. The second keyword
picks up another set of pointers independently of those found in the first
step. However, when the second pointers are hash addressed, it directly be-
comes possible to check whether they have already been recorded in table B.
If this is the case, one can deduce that both search arguments must occur in
these entries. In the opposite case the pointers found first may be forgotten.
If no further search arguments have been given, the pointers for which a co-
incidence was verified are entered into a list according to which the final
look-up in the document area is made. If, on the other hand, a third search
argument is given, those pointers for which a coincidence was found are first
copied into table C, again using hash addressing. When the pointers picked
up by the third search argument are hash addressed for table C, it then be-
comes possible to check whether they coincide with the pointers found in the
first two passes. If the result is positive, one can reason that all the
three search arguments presented so far are contained in entries correspond-
ing to these pointers. If the number of search arguments was three, the point-
ers can now be listed. With a still greater number of search arguments, one
first copies the last pointers into table B, after its previous contents have
been cleared. In this way tables B and C are used in turn until all search
arguments are exhausted. The "sifting" procedure described above continuously
reduces the number of possible candidates until only those having keywords
in agreement with all the specified search arguments are left.
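The alternating use of tables B and C can be sketched compactly; in the Python fragment below, hash-addressed sets stand in for the sorting tables, and the pointer data are invented:

```python
def sift(pointer_sets):
    """Intersect the pointer sets picked up by successive search
    arguments. Python sets stand in for the hash-addressed sorting
    tables B and C; their roles alternate between passes."""
    table_b = set(pointer_sets[0])          # pointers from the first argument
    for pointers in pointer_sets[1:]:
        table_c = {p for p in pointers if p in table_b}   # coincidences only
        table_b = table_c                   # table roles swap for the next pass
    return table_b

# Invented pointer data: entries 3 and 9 carry all three search arguments.
print(sorted(sift([[1, 3, 5, 9], [3, 2, 9, 7], [9, 4, 3]])))
```

Each pass shrinks the candidate set, exactly as the sifting procedure describes.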
By virtue of hash coding applied at all steps of retrieval, the complete
search can be performed extremely quickly. In a real application described
in this example, the sifting time was much less than that needed to type in
the keywords.
The multi-key searching methods discussed above were based on the determi-
nation of intersections of lists associated with individual keywords. This
subsection approaches the problem in a different way, by regarding all the
keywords associated with an entry as a single compound keyword. As long as
all keywords are involved in a search, the implementation is trivial, similar
to that of a single-keyword case. A problem arises when arbitrary keywords
are left unspecified.
Hashing of AZZ Keyword Combinations. A solution for combination search which
is fastest of all, but also demands the maximum amount of memory space, is
to hash the entries for all different combinations of their keywords. Thus,
for n keywords in an entry, there are Σ_{k=1}^{n} (n choose k) = 2^n - 1
combinations for which copies of the entry have to be stored in the hash table.
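The combination count is easily enumerated; a Python sketch:

```python
from itertools import combinations

def keyword_subsets(keywords):
    """All non-empty combinations of an entry's keywords; a copy of
    the entry would be hashed under every one of them."""
    n = len(keywords)
    subsets = []
    for k in range(1, n + 1):
        subsets.extend(combinations(keywords, k))
    return subsets

subs = keyword_subsets(("dog", "black", "big"))
print(len(subs))    # 2**3 - 1 = 7 combinations
```

The exponential growth in n shows why this fastest solution also demands the maximum amount of memory.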
A standard procedure for the definition of a hashing function for compound
keywords is the following. Assume that B(K_1), B(K_2), ..., B(K_k) are the
binary bit strings corresponding to the keywords K_1, K_2, ..., K_k, respectively,
and they all have been defined by some hashing algorithm, having the same
length as the hash address. For the bit string B corresponding to the hashing
function, then, the following expression is used:
B = B(K_1) ⊕ B(K_2) ⊕ ... ⊕ B(K_k) ,   (2.33)

where ⊕ denotes the bitwise EXCLUSIVE OR operation.
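Assuming the combining operation of (2.33) is the bitwise EXCLUSIVE OR of the per-keyword strings, a Python sketch follows; the built-in hash() merely stands in for the per-keyword hashing algorithm (an assumption, not the book's algorithm):

```python
def compound_hash(keywords, nbits=16):
    """Combine per-keyword bit strings B(K_i) into one string B by
    bitwise EXCLUSIVE OR. Python's hash() stands in for the
    per-keyword hashing algorithm; nbits is the address length."""
    b = 0
    for kw in keywords:
        b ^= hash(kw) & ((1 << nbits) - 1)   # truncate to address length
    return b

# XOR is order independent: any ordering of the keywords gives the same B.
print(compound_hash(["black", "dog"]) == compound_hash(["dog", "black"]))
```

Order independence is what makes the scheme usable for unordered keyword sets.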
Let us once again recall that the usual objective in content addressing has
been to locate all items which match the given search arguments in their
specified parts. This is obviously not the way in which biological associa-
tive memory operates; for instance, for a human being it is possible to re-
call memorized occurrences on the basis of rather vague key information.
On the other hand, a noteworthy feature in the operation of human associa-
tive memory, when compared with the content-addressable memories, is that
it does not list all matching recollections but usually concentrates on that
one which has the highest degree of matching. Associative recall from a bio-
logical memory is probably more akin to the operation of pattern recognition
devices. In the latter, the representation of an object is given in the form
of a set of simultaneous or subsequent signals. After an analysis of some
characteristics (often named features) of this set of values, a result is
obtained which usually defines the identity of the object, or alternatively,
triggers a response. The classical approach to artificial pattern recogni-
tion is based on operational units named Perceptrons, the mathematical
equivalent of which is a discriminant function. One discriminant function is de-
fined for every category of the patterns to be classified, and it attains a
scalar value when the representative signal values are substituted into it.
The classification decision for a particular pattern is based on comparison
of the relative magnitudes of the discriminant functions.
In this context we shall not be involved further in the problematics of
pattern recognition or biological associative memory; it may suffice to refer
to a recent discourse by the author [2.94]. Instead, since the implementation
of large parallel networks of Perceptrons has proved technically very
tedious, it may be more interesting in this context to look for the most
effective software implementations of this task. If the features may take on
only discrete values, hash coding provides a very efficient solution, present-
ed in this section. The procedure described here was recently developed by
the author in cooperation with REUHKALA [2.95,96] and it directly implements
a mode of retrieval called proximity search, or identification of one or a
few patterns which have the highest degree of matching with the search ar-
gument given. The first of the central ideas applied in this method is to
represent a keyword or a relevant portion of an item by multiple attributes
or features derived from it. These features are then regarded as multiple
keywords for the item by which the item is addressable in a hash table. The
second of the central ideas in this method is a procedure for the collection
of the same type, would occur close to each other. On the other hand, we
would like to allow double errors in arbitrary combinations.
There is yet another aspect which ought to be mentioned in this connection.
We shall set no restrictions on the words to be handled, and so the strings
of letters can be considered as being stochastically (not necessarily uni-
formly) distributed. Errors are generated in stochastic processes, too, but
it is more justified to assume that their type and occurrence in the strings
has a uniform distribution. Now, contrary to the objectives in error-tolerant
coding, there exists no possibility of designing an identification procedure
for arbitrary erroneous words which would guarantee their identification to
100 percent accuracy. One should realize that errors are inherently undetect-
able if they change a legitimate word into another legitimate word, and so
the identifiability depends on the contents of the keywords, although to a
smaller extent.
Assume now for simplicity that only trigrams are used for features. Their
address in a hash index table may be computed in the following way. Assume
that the alphabet consists of 26 letters. If the letters in a trigram are
denoted by x_{i-1}, x_i, and x_{i+1} (each coded as an integer 0, ..., 25),
respectively, the trigram shall first attain the numerical value

v = 26^2 x_{i-1} + 26 x_i + x_{i+1} .   (2.34)

If H is the size of the hash index table and the table begins at address B,
the division method gives for the hashing function

h = B + (v mod H) .   (2.35)
If the hash table size, especially when using disk memories as auxiliary
storages, is selected to be a prime, the radix 26 used in numerical conversion
seems profitable, whereby the division method is expected to guarantee a rath-
er uniform distribution of hash addresses.
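The radix-26 conversion and division method can be sketched as follows; the table size H (a prime) and base address B here are illustrative values only, and the letter coding 0..25 is an assumption:

```python
def trigram_value(trigram):
    """Radix-26 numerical value of a trigram, in the spirit of (2.34);
    letters are assumed coded 0..25."""
    base = ord('a')
    x1, x2, x3 = (ord(c) - base for c in trigram.lower())
    return 26 * 26 * x1 + 26 * x2 + x3

def trigram_address(trigram, H=17389, B=0):
    """Division-method address, as in (2.35). H (a prime table size)
    and B (the table's base address) are illustrative values only."""
    return B + (trigram_value(trigram) % H)

print(trigram_value("abc"), trigram_address("abc"))
```

The values range over the 26^3 = 17,576 possible triples mentioned below.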
Organization of the Hash Table. With multiple keywords, the chains of reserve
addresses tend to become long; multilevel storages ought to be preferred,
therefore. It is advisable to make the hash table rather large in order to
keep its load factor low. Open addressing with linear probing (modulo page
size) is thereby applicable. The bucket organization is also natural with
secondary storages. It is noteworthy that an additional advantage is provided
by the adoption of buckets: in this example the number of hash addresses was
limited by the number of different triples, being only 17,576. When using
bucket organization, the hash table size equals this number multiplied by
the number of slots per bucket.
The linked list structure is also natural in connection with bucket or-
ganization, whereby the time of retrieval, which in this method tends to be
significantly longer than in simple hash coding, can be kept reasonable.
If the division method is chosen for the hashing algorithm, the extra
advantage thereby gained is a minimal need of space for the keyword identi-
fier; the significance of this benefit will be appreciated with large tables.
The overall organization is delineated in Fig. 2.21. Every location in
the hash index table contains an identifier for a trigram, and a pointer to
a document area where the original entries can be found.

[Fig. 2.21 diagram omitted]
Fig. 2.21. Illustration of the redundant hash addressing method. The letters
A, B, ..., X stand for trigram features. Notice that there is a collision of
feature A, and the next empty address in the hash index table is used as a
reserve location

The bucket organization is not shown explicitly; as a reserve location,
the next empty slot to the calculated address is used in the illustration.
(In the example of Fig. 2.21, there is a collision of feature 'A'; the
colliding item is placed in the next location.)
A search argument will pick up as many pointers from the hash index table
as there are features in it. As long as at least one of the original features
from the correct keyword exists in the search argument, there will be at least
one pointer to the correct entry. By the nature of this method, however, an
erroneous search argument generates wrong pointers, too. In any case, how-
ever, the maximum number of different pointers thus produced is the same as
the number of features, or, which is equivalent, the number of letters in the
search argument. All of the candidates indicated by these pointers can now be
studied more carefully. As mentioned earlier, the position of a trigram in
a candidate entry will be compared with that of the search argument, and only
if the relative displacement is not more than, say, two positions is the
match accepted as true.
Alternative Modes of Search. In the continuation, the redundant hash address-
ing system may be developed in either of the following directions.
For one thing, a combination search with several, possibly erroneous,
keywords can be performed. The procedure is essentially the same as that
discussed in Sect. 2.6.2. If there are several search arguments associated
with the same entry, during storage their features are entered into the hash
index table independently, as described above. The retrieving operation con-
sists of several passes, one for each search argument, and every pass produces
a number of pointers in the way described for the principle of redundant hash
addressing. By the application of another hashing algorithm to all of these pointers, they are entered into the sorting tables B and C (cf. Fig. 2.20), or compared with pointers entered at earlier passes. When all the search arguments have been exhausted, the number of remaining candidate pointers has been reduced, and among the pointers finally left there are probably none other than those which identify the sought entries.
Another possible continuation for redundant hash addressing is a statisti-
cal analysis of the features, and the identification of a single entry which
is most "similar" to the search argument. This mode of operation is termed
proximity search. One of its applications is the correction of misspelt words: the search argument has to be compared with reference words stored in a dictionary (equivalent to the document area described above), and the one which agrees best with the search argument is picked up. The latter mode of
identification will be expounded below in more detail.
in which nx and ny are the numbers of features extracted from X and Y, re-
spectively, and ne is the number of matching features. In the redundant hash
coding method, ne is the number of accepted pointers converging to an entry
in the document area.
When counting the pointers, it will be expedient to construct a match
table at the time of each search. The pointers picked up from the hash index
table by the features are entered into this match table if the corresponding
features in the search argument and the stored keyword match. If the pointer already existed in the match table, only a count index of this pointer is incremented. If the number of search arguments is small, as is the case in general, the inquiry for pointers stored in this table can be made by linear search. For the utmost speed, this table may also be made hash addressable.
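The match-table bookkeeping can be sketched as follows. The similarity function shown is one common feature-count formula and is only an assumed stand-in, since the exact expression is not reproduced here; the function names are hypothetical.

```python
from collections import Counter

def count_pointers(accepted_pointers):
    """Match table: count how many accepted pointers converge
    to each entry in the document area (the value n_e per entry)."""
    return Counter(accepted_pointers)

def best_entry(accepted_pointers):
    """Proximity search: take the entry with the most converging
    pointers as the one most similar to the search argument."""
    entry, n_e = count_pointers(accepted_pointers).most_common(1)[0]
    return entry, n_e

def similarity(n_x, n_y, n_e):
    """A common feature-overlap similarity (an assumption here,
    not necessarily the book's exact formula)."""
    return n_e / (n_x + n_y - n_e)
```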
The efficiency of this method has been tested with two fairly large dictionaries of keywords. One of them consisted of the 1021 most common English words [2.97]; the second dictionary had only surnames in it, picked from World Who's Who in Science [2.98]. In the latter case, some transliteration
was performed to fit the English alphabet. All possible types of single and
double errors were investigated; the single errors were made to occur in each
character position of each word, but the double errors were generated at random with a probability which is uniform for every possible erroneous letter and for every position. The statistics used in these experiments were rather large: approximately 50,000 realizations of erroneous search arguments were applied for each of the dictionaries. The results, expressed in percentages, are shown in Tables 2.11 and 2.12.
Table 2.11

Word     Deletions     1    0    0    1    1    0    2    0    0
length   Replacements  0    1    0    1    0    1    0    2    0
         Insertions    0    0    1    0    1    1    0    0    2

 3  Depth 1   96    -   39
    Depth 2   97    -   42
 4  Depth 1   89   93   98    -   83   47    -   75
    Depth 2   93   95   98    -   84   49    -   80
 5  Depth 1   88   98   99   23   64   74   19   46   93
    Depth 2   96   98   99   37   71   80   26   51   97
 6  Depth 1   95   99   99   46   90   93   25   75   97
    Depth 2   96   99   99   65   95   97   53   79   97
 7  Depth 1   96   99   99   57   94   98   26   91   99
    Depth 2   96   99   99   77   95   98   55   97   99
 8  Depth 1   98  100   99   76   98   99   36   97   99
    Depth 2   98  100   99   92   98   99   64   98   99
 9  Depth 1   98  100   99   92   98  100   67   99  100
    Depth 2   98  100   99   95   98  100   87   99  100
10  Depth 1   98   99  100   93   98  100   81   98  100
    Depth 2   98   99  100   93   98  100   84   98  100
Table 2.12

Word     Deletions     1    0    0    1    1    0    2    0    0
length   Replacements  0    1    0    1    0    1    0    2    0
         Insertions    0    0    1    0    1    1    0    0    2

 3  Depth 1   95    -   36
    Depth 2   95    -   36
 4  Depth 1   98   96   99    -   83   50    -   80
    Depth 2   99   98   99    -   84   50    -   83
 5  Depth 1   95   99   99   38   70   75   33   51   95
    Depth 2   97   99   99   45   72   80   35   54   98
 6  Depth 1   99   99  100   63   95   92   52   78   98
    Depth 2   99   99  100   73   98   98   67   81   99
 7  Depth 1   99   99  100   69   98   98   50   93   99
    Depth 2   99   99  100   85   99   99   76   98   99
Register   A   B   C   D   E   blank
  1        5       2                     Portal register
  2        3
  3            8       4
  4                             1        data (CAD)
  5            6
  6                        7    1        data (AB)
  7                             1        data (ABE)
  8        9                    1        data (CAB)
  9                             1        data (CABA)
 10
The TRIE system consists of a number of registers, each one capable of hold-
ing 27 pointers. The value of every pointer is 0 before the pointer is set
during storage operations. The number of registers depends on the number of
entries to be stored. The first of the registers, indexed by 1, is the portaL
register which corresponds to the first letter of every keyword. Assume
that the word 'CAD' has to be stored. The pointer in the portal register
corresponding to 'C' is set to point at register 2, which happens to be the
first empty one. Into its 'A' position, a pointer to register 3 is set. As-
sume now that the end of a keyword is always indicated by typing a blank. If
the next character position is always studied when scanning the letters, and
if a blank is found, the letter in question is known to be the last one.
Therefore we write a pointer 4 (next empty location) into the 'D' position of register 3, but in register 4, the pointer in the 'blank' field contains the terminal marker, in FREDKIN's scheme pointer 1. The seventh field in
every register may be used to hold a pointer to an associated entry in a
separate document area. This completes the writing of the first entry.
The storage of further items in the TRIE now divides into two different
procedures depending on whether the new keyword has a beginning which has
occurred earlier or not. Consider first the latter case, and let the new
keyword to be stored be 'ABE'. The fact that 'A' has not occurred before can
be deduced from the 'A' position in the portal register, which up to this
point was O. This pointer is now set to point at the next empty register
which is 5. The pointer in the 'B' position of register 5 is set to 6, and
the 'E' position in register 6 is set to 7. Again, a pointer to the portal
register and to the document area, corresponding to the entry associated
with the keyword 'ABE', are set in register 7. All operations needed to store
the second item have now been carried out.
Let the next case to be considered have the first two letters in common
with an earlier keyword, e.g., let the new keyword be 'CAB'. An inspection
of the portal register shows that 'C' has occurred before, and following the
pointer found at 'C', register 2 is examined next. As 'A' is also found, together with a pointer to register 3, register 3 is subjected to examination next. The letter 'B', however, was not found; therefore, the 'B' location can be used to store a pointer to a new empty register, which is 8. The pointer in the 'blank' location of register 8 is set to 1, and a pointer to the entry associated with 'CAB' is simultaneously set.
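The TRIE operations described above can be sketched as follows. The register layout with 27 pointer fields, the terminal marker 1, and a separate field for a document-area pointer follow the description; the class and method names themselves are of course hypothetical.

```python
BLANK = 26  # index of the 'blank' field; letters 'A'..'Z' map to 0..25

class TrieRegister:
    def __init__(self):
        self.ptr = [0] * 27   # every pointer is 0 before it is set
        self.data = None      # pointer to the associated entry

class Trie:
    def __init__(self):
        # index 0 is unused so that register numbers match the figure;
        # register 1 is the portal register
        self.regs = [None, TrieRegister()]

    def _field(self, ch):
        return BLANK if ch == ' ' else ord(ch) - ord('A')

    def store(self, keyword, data):
        r = 1                               # start at the portal register
        for ch in keyword:
            f = self._field(ch)
            if self.regs[r].ptr[f] == 0:    # letter not seen here before:
                self.regs.append(TrieRegister())
                self.regs[r].ptr[f] = len(self.regs) - 1  # next empty reg
            r = self.regs[r].ptr[f]
        self.regs[r].ptr[BLANK] = 1         # terminal marker (FREDKIN)
        self.regs[r].data = data

    def retrieve(self, keyword):
        r = 1
        for ch in keyword:
            f = self._field(ch)
            if self.regs[r].ptr[f] == 0:
                return None                 # prefix never stored
            r = self.regs[r].ptr[f]
        # accept only if the terminal marker confirms a complete keyword
        return self.regs[r].data if self.regs[r].ptr[BLANK] == 1 else None
```

Storing 'CAD', 'AB', 'ABE', 'CAB', and 'CABA' in this order reproduces exactly the register contents of the illustration above.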
The measures for the storage of an entry with a keyword which totally
contains an old keyword and has extra letters in addition, as well as the
case when the new keyword is identical with the front part of an old keyword,
Historical. It seems that the possibilities for hash coding were realized
soon after the introduction of the first commercial digital computers. In
1953, the basic idea appeared in IBM internal reports: Luhn was the first to
suggest a hashing method which resolved the collisions using buckets and
chaining. An algorithmic method for the definition of reserve locations is
due to Lin. The idea of linear probing was introduced by Amdahl in 1954
during a project in which an assembler was developed for the IBM 701 computer.
Other members of the team were Boehme, Rochester, and Samuel. Independent
work was done in Russia where ERSHOV [2.68] published the open addressing,
linear probing method in 1957.
A few extensive review articles date from the early years. DUMEY [2.116] in 1956, while discussing indexing methods, also introduced the division method of hashing. An extensive article, describing the application
of hash coding in large files occurring in practice, was published by PETERSON
in 1957 [2.60]. This work contained many extensive analyses of the effect of
bucket size on the number of searches. Hashing functions were studied by
BUCHHOLZ in 1963 [2.61].
Books. The principles of hash coding can be found in some books written either
on systems programming or on data management. The monograph on compiling
techniques by HOPGOOD [2.52] contains a short section on the principles of
hash coding. There is a somewhat longer review which also contains many
statistical analyses in the book of KNUTH [2.1]. In the book of MARTIN [2.2], topics of hash coding have also been discussed.
Review Articles. In addition to those writings mentioned above, there are a
few review articles that can be mentioned as a reference. One of them was
written in 1968 by MORRIS [2.22], and besides an introduction to the basic
principles, it also contained answers to many detailed questions concerning
the relative merits of different solutions.
More recent review articles have been written by SEVERANCE [2.114], SEVE-
RANCE and DUHNE [2.55], MAURER and LEWIS [2.117], as well as SORENSON et al.
[2.118]. A thorough comparison of the hashing functions can be found in the
articles of LUM et al. [2.29] and KNOTT [2.119].
Developments in Hash-Coding Methods. A few recent studies of the hashing
methods may be mentioned. The influence of the statistical properties of
keywords on the hashing method has been considered by BOOKSTEIN [2.120],
DOSTER [2.121], and SAMSON and DAVIS [2.122]. An analysis routine for search
methods has been designed by SEVERANCE and CARLIS [2.123]; optimal or im-
proved table arrangements have been discussed by YUBA and HOSHI [2.124],
RIVEST [2.125], and LITWIN [2.126,127]; the qualities of some hashing al-
gorithms have been considered by AJTAI et al. [2.128], GUIBAS [2.129], GUIBAS
and SZEMEREDI [2.130], LYON [2.131], KRAMLI and PERGEL [2.132], FORTUNE and
HOPCROFT [2.133], as well as THARP [2.134]. A special dynamic hashing method
has been developed by LARSON [2.135]. So called extendible hashing is suggest-
ed by FAGIN et al. [2.136] . The partial-match problem has been handled using
projection functions by BURKHARD [2.137]. In multikey searching, the statisti-
cal principal component analysis has been applied by LEE et al. [2.138].
Various applications are described by HILL [2.139], GRISS [2.140], IMAI et al.
[2.141], WIPKE et al. [2.142], HODES and FELDMAN [2.143], LEWIS [2.144], and
NACHMENS and BERILD [2.145]. Differences between files have been discussed by
HECKEL [2.146], and concurrency in access by GRONER and GOEL [2.147].
Special Hardware for Hash Coding. The worst handicap of the usual hash coding methods, namely the sequential access to memory, can to a great extent be relieved by special hardware. It is possible to control a multi-bank memory
by a set of hash address generators and some auxiliary logic circuits thereby
facilitating parallel access to many locations. Such systems have been pro-
posed by GOTO et al. [2.148] as well as IDA and GOTO [2.149]. Fast hashing
operations for this hardware are described by FABRY [2.150].
Chapter 3 Logic Principles of Content-Addressable Memories
operands must be read and written serially, i.e., one at a time, which con-
sumes a great deal of time; many large problems could better be handled in
parallel, i.e., simultaneously over a set of variables. 2) With increasing memory capacity, the length of the address code increases, with the result that more space must be reserved in each memory location which contains an address reference. For this reason, instead of addressing the whole memory
system directly, it is more customary to divide the memory space of large
computers into several smaller banks which are accessed individually, there-
by using indexing of the banks in machine instructions, as well as relative
and indirect addressing for the reduction of the address range handled with-
in a program. Then, however, for the determination of the absolute addresses,
several auxiliary arithmetic operations must be performed with each memory
reference.
In order to operate on many memory locations simultaneously, and to simplify the searching of the operands, it would be very desirable to base the computations on content-addressable memories, and for more than 20 years many attempts have been made at their development. Content-addressable memories would be especially advantageous from the point of view of high-level
programming languages which refer to their operands and procedures by sym-
bolic names and other identifiers. The computer architectures would probably
be quite different from those currently applied if highly parallel computations, distributed all over the data files, were simply and directly implementable by hardware. Unfortunately, large content-addressable memories have not
been realized; when it comes to direct access to data on the basis of their
symbolic representation, i.e., by names or alphanumeric identifiers, it seems that software methods such as those discussed in Chap. 2 have mostly been resorted to. No parallel processing, however, is involved in software content addressing.
Nonetheless, hardware content-addressable memories have, in fact, been
used as special parts in computing systems, in certain organizational solu-
tions whereby the CAMs can effectively perform fast buffering, checking, and
bookkeeping operations needed in the moving and rearrangement of data. In this
way, although the memory system as a whole is not content addressable, the
CAM devices can effectively contribute to the speed of usual operations by
making the operands more directly available to the processing circuits. Two such applications, the virtual memory and dynamic memory allocation, will be discussed later in Sects. 5.1 and 5.2, respectively.
One has to realize further that a substantial part of automatic computa-
tions, and the majority of all installed computing devices are concerned with
pure searching and sorting tasks which in fact comprise the main bulk in ad-
x ⊕ y = (x ∧ ȳ) ∨ (x̄ ∧ y)    (3.2)
(3.3)
Note that when the i:th bit is masked, then mji is identically 1 because ci = 1. The masked mismatch in the i:th bit position is indicated by the
129
Boolean function
(3.4)
(3.5)
⋁_{i=0}^{n} mji    (3.6)
(3.7)
comparisons are carried out serially (cf. Sect. 3.4), then the sets Gj and Lj are represented by two machine variables gji and lji, respectively, which are updated during the process of recursion. Consider the following recursive, simultaneous expressions:
gj,i+1 = (sji ∧ āi) ∨ (gji ∧ (ai ≡ sji)),   lj,i+1 = (s̄ji ∧ ai) ∨ (lji ∧ (ai ≡ sji))    (3.9)
are defined:
(3.10)
with gj0 = lj0 = 0. As before, when all digits have been exhausted, the last values of gji and lji indicate the magnitude relation. This result may be justified in the following way. If the bits ai and sji agree, then gj,i+1 = gji and lj,i+1 = lji. However, if they disagree, gj,i+1 becomes 1 if sji = 1 and ai = 0, but zero if sji = 0 and ai = 1. Similarly, lj,i+1 becomes 1 if sji = 0 and ai = 1, but zero if sji = 1 and ai = 0. After the recursion, Sj is found greater than A if the last values are gj,n+1 = 1, lj,n+1 = 0, and less than A if gj,n+1 = 0, lj,n+1 = 1. Equality of Sj and A holds if gj,n+1 = lj,n+1 = 0.
A paradigm of logic circuits, the so-called iterative circuit, may be used to implement (3.9) in hardware; its block scheme is shown in Fig. 3.1. The
bit values to be compared together with their inversions are available as
the contents of two registers, and the rectangular blocks situated between
the respective bit positions contain identical logic circuits, defined by
(3.9). Results from the recursive operation are then represented by signals
propagated to the right, and the outputs of the last stage indicate the final
result of comparison.
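The recursion justified above can be simulated in a few lines (a sketch only; bit lists are taken least significant bit first, so that the most significant disagreement is the last one processed and thereby determines the outcome):

```python
def compare_magnitude(a_bits, s_bits):
    """Bit-serial magnitude comparison of a stored word S against the
    search argument A, following the recursion justified above.
    a_bits and s_bits are lists of 0/1, least significant bit first."""
    g = l = 0                       # g_j0 = l_j0 = 0
    for a, s in zip(a_bits, s_bits):
        if a != s:                  # on disagreement, overwrite (g, l)
            g, l = (1, 0) if s == 1 else (0, 1)
        # on agreement, (g, l) are left unchanged
    if g == 1:
        return 'greater'            # S > A
    if l == 1:
        return 'less'               # S < A
    return 'equal'
```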
Fig. 3.1. Iterative logic circuit for the parallel comparison of magnitudes
The FS circuit is a combinational logic circuit with the following truth tables (Table 3.1), obtained directly from the recursion: one table gives gj,i+1 and the other lj,i+1 as a function of the bit pair (ai, sji) and of the previous value of the respective variable.

Table 3.1

         gj,i+1                       lj,i+1
ai  sji  gji=0  gji=1       ai  sji  lji=0  lji=1
0   0      0      1         0   0      0      1
0   1      1      1         0   1      0      0
1   0      0      0         1   0      1      1
1   1      0      1         1   1      0      1
The comparison of the search argument with all stored words is in principle
based on either the equivalence function (3.1), or the mismatch (3.2). In
practice, it is most economical to provide every bit of the search argument
with its logical negation, and these values are then broadcast in parallel
along pairs of lines to respective bit positions of all storage locations.
The double lines can also be controlled in a way which directly implements
the masking function in (3.3) or (3.4), as described below. The results of
bit comparisons in a word have to be collected by an AND or NOR circuit with
very high fan-in (number of inputs). Implementation of this function by a
logic gate is not a practicable solution. One way to circumvent this problem
would be to compute the NOR function recursively as shown in Fig. 3.3a, where-
by an arbitrary number of bit cells can be chained. A drawback to this solu-
tion is a delay in the cascaded stages which drastically increases the access
Fig. 3.3. (a) Bit cell of an all-parallel CAM, with iterative-circuit implementation of word comparison (mismatch, Mij) and addressed readout (Bij). (b) Bit cell of an all-parallel CAM, with Wired-AND implementation of word comparison (match, Mi) and addressed readout (B̄j)
time. A much better alternative, whereby high speed and also rather good electrical characteristics can be achieved, is the so-called Wired-AND function, which means that the output circuit of the bit logic is equivalent to a
switch connected between the word output line and the ground potential. If,
and only if, all the parallel switches are open, the word line potential is
able to rise high, whereas a single closed switch is able to clamp the word
line to the ground potential.
Figure 3.3b shows the complete circuit of a standard commercial CAM bit
cell. Besides the comparison logic, every bit cell contains provisions for
addressed reading and writing of data.
All of the logic gates shown in Fig. 3.3b are standard NAND gates except
those which are connected to the Mi and Bj lines which in addition have to
perform Wired-AND logic. For this purpose they must have a suitable output
circuit, e.g., the open-collector circuit if bipolar transistors are used.
Writing. Gates G3 and G4 form the bit-storage flip-flop which is set and reset by the signals Wj(1) and Wj(0), through gates G1 and G2, respectively, under the condition that the address line signal Ai is high. The output of G3 represents the stored bit value, and it will be set to 1 if Wj(0) = 0, Wj(1) = 1, and Ai = 1. The value 0 is set at the signal values Wj(0) = 1, Wj(1) = 0, and Ai = 1. If in a writing operation a bit value of the addressed word shall be left unaltered, the writing can be masked out by holding Wj(1) = Wj(0) = 0.
Reading. For addressed reading of the bit storage it is only necessary to hold the Ai line high, whereby G5 directly mediates the inverted bit value to the B̄j line, which is common to this bit in all words. Content-addressable reading, taking into account masking, is done using the lines Cj(0) and Cj(1) which pass all words at the same bit position. Then one sets Cj(0) = 0, Cj(1) = 1. If now the stored bit was 1, the output of neither G6 nor G7 is low (both output switches are open), with the result that these gates, when connected to the Mi line, allow it to rise high. A similar condition is expected if the search argument has the value 0, whereby Cj(0) = 1, Cj(1) = 0, and if the stored bit is 0. On the other hand, if the search argument bit disagrees with the stored bit, then the reader may convince himself that either G6 or G7 will switch down. This is sufficient to keep the potential of the Mi line low, and it is seen that a single mismatch in one of the bit positions is able to give the value 0 to the Mi signal. When this bit position is masked in the comparison operation, it will be sufficient to hold
Consider a rectangular m-word, n-bit CAM array built of bit cells of the
type shown in Fig. 3.3b. The basic searching task is to locate all stored
words in the array which match with the search argument in its unmasked
portions, whereby the Cj(0) and Cj(1) lines are controlled in the way which defines the masked search argument. A normal situation is that several Mi
lines yield a response. The contents of these words must then be read out in
full, one at a time. It is for this purpose that the CAM has a provision for
addressed reading. For the sequential readout of all responding words, how-
ever, some problems discussed below must be solved. First of all there must
exist some sort of queue serving organization which is able to pick up only
one response at a time, usually in the top-down order, and to subsequently
reactivate the corresponding word location, this time using the address line.
The selected word then appears at the B̄j lines, albeit with every bit complemented. The addressed readout operation will be repeated for all responding
words.
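The masked equality search over the array and the sequential readout of all responders can be modeled behaviorally as follows (a sketch only; words are bit tuples and the function names are hypothetical — in the hardware, of course, all word comparisons happen in parallel):

```python
def masked_search(cam_words, argument, mask):
    """All-parallel equality search: bit position i takes part in the
    comparison only where mask[i] == 1 (an unmasked position).
    Returns the indices of responding words (the active M_i lines)."""
    return [i for i, word in enumerate(cam_words)
            if all(not m or w == a
                   for w, a, m in zip(word, argument, mask))]

def read_out_all(cam_words, responders):
    """Sequential addressed readout of every responding word."""
    return [cam_words[i] for i in responders]
```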
There exist two principally different strategies for queue serving. For
both of them it is necessary to provide every Mi line of the CAM array with
a buffer flip-flop. Another necessary circuit is some kind of priority resolver which shall single out one of the multiple responses. The flip-flops together form the response store, and the latter, combined with the priority resolver, is named the multiple-response resolver.
In the first queue serving strategy, the parallel equality comparison in
the CAM array is performed only once. The priority resolver displays the
uppermost response at the respective output line, whereby the corresponding
word can be read out. After that, the uppermost active buffer flip-flop is
reset whereby the response which is next in order appears uppermost in the
response store and can be served, and so on. In the second strategy, all
flip-flops except the uppermost "responding" one are reset to leave only one
response, whereby information about the lower responses is lost. The corre-
sponding word can then be read out immediately. This location in the memory
must now be marked in some way to show that it has been processed, for instance, using a "validity bit" reserved in the word for this purpose. To continue, new content-addressable reading operations must be performed to find the rest of the matching words which are yet unprocessed.
Fig. 3.4a,b. Multiple-response resolver: (a) with nondestructive "select first/next" control with JK flip-flops, (b) with destructive "select first" control. (Mj: match bit; Ij: inhibit signal; Oj: output signal)
In view of the fact that plenty of buffer flip-flops are usually needed,
the solution shown in Fig. 3.4b, which implements the second strategy, might
look more attractive because simple bistable circuits can be used whereby
the total costs are reduced. Then one can permit the repeated content-ad-
dressable readout operations necessary in the second strategy, especially
since they are executable almost as fast as the resetting operations in the
response store. Notice that the iterative OR network at the output produces an output called the inhibit vector: all of its output signals below the uppermost response are 1, and they are used for resetting the flip-flops.
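The behavior of the inhibit vector and of the destructive "select first" control can be sketched as follows (responses are listed in top-down order; the function names are illustrative):

```python
def inhibit_vector(matches):
    """Every position below the uppermost response gets a 1; these
    signals reset the lower flip-flops (the destructive strategy)."""
    seen = 0
    out = []
    for m in matches:
        out.append(seen)      # becomes 1 once a response has been passed
        seen = seen or m
    return out

def select_first(matches):
    """Keep only the uppermost response, as in Fig. 3.4b."""
    inh = inhibit_vector(matches)
    return [m & (1 - i) for m, i in zip(matches, inh)]
```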
It should be realized that the sequencing control is different in these
two cases. In Fig. 3.4a it is very simple: the first response is automatical-
ly displayed at the outputs, and the "Select next" control is used only to
reset the uppermost "responding" flip-flop. The "Select first" control in
Fig. 3.4b, on the other hand, is applied after content-addressable reading,
to single out the uppermost response and to reset all the lower flip-flops.
The subsequent marking of the validity bit, resetting of the response store,
and repetition of content-addressable reading operations need some central
control not shown in the diagram.
In both of the above methods, the last stage in the cascade produces an
extra signal which indicates that there is at least one response. This in-
formation is necessary in some extremal-searching algorithms (cf Sect. 3.4.5).
A drawback of the cascaded networks is that with a large number of storage
locations it takes an appreciable time for the iterative logic to settle down
to its final state, because of the propagation of signals through all the
gates. For example, if the delay per stage were 10 ns, it would initially take 10 μs to search through a 1000-word store, although this time decreases for the lower bits. For this reason, a circuit, the upper part of which is shown in Fig. 3.5, was suggested by FOSTER [3.2] for the generation of the inhibit
vector to speed up priority resolution. This circuit must be used in connection
with the principle of Fig. 3.4b. It uses the idea of a binary tree, with the
root on the right; for more details, the reader should consult the original
article. It has been shown by FOSTER that if there are 2^n words in the memory, the worst-case delay is only 2n - 1 gate delays. On the other hand, the number of gates in the circuit is only 3·2^(n-1) - 1, or approximately 1 1/2 gates per word. In a one-million word memory with 10 ns gate delay, the worst-case total delay would be only 400 ns.
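FOSTER's figures are easy to check numerically; 2n - 1 = 39 gate delays of 10 ns give 390 ns, i.e., roughly the quoted 400 ns:

```python
# Checking the binary-tree priority resolver figures:
# with 2**n words, the worst-case delay is 2n - 1 gate delays and the
# gate count is 3 * 2**(n - 1) - 1, about 1 1/2 gates per word.
n = 20                        # a one-million word memory = 2**20 words
words = 2 ** n
delay_ns = (2 * n - 1) * 10   # 10 ns per gate delay
gates = 3 * 2 ** (n - 1) - 1
print(delay_ns)               # 390 ns, roughly the quoted 400 ns
print(gates / words)          # just under 1.5 gates per word
```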
Modular Tree Organization for the Priority Resolver. The structure of the
circuit of Fig. 3.5 is irregular and thus not very suitable for fabrication
by the LSI technology. Another method which conveys a similar idea but is
implementable by standard modules is due to ANDERSON [3.3].
Consider the block depicted in Fig. 3.6a. Without the IA ("inhibit all")
signal it could, in fact, be a simple generator of the inhibit vector, and, being a one-level circuit, it is even faster than the construct shown in Fig. 3.5. Its drawback is that the number of inputs to the OR circuits increases
linearly with the number of words, which makes this solution impractical for
large memories. For CAMs with a small number of locations this might be ap-
plicable as such, however (cf, e.g., the buffer memory organization dis-
cussed in Sect. 5.1).
The block scheme of a complete CAM system is shown in Fig. 3.9. Its central
parts are the CAM array, the search argument register, the mask register,
the response store, which consists of a flip-flop for every Mi line, the
multiple match resolver, and some auxiliary circuits needed for interconnec-
tion and control.
The mask bits explained in Sect. 3.2 are stored in the mask register which
together with the search argument forms the Cj(1) and Cj(0) signals. During writing this unit forms the Wj(1) and Wj(0) signals, respectively, whereby the (masked) search argument will be written into a word location defined by the address code given at the input of the address decoder.
The output lines of the multiple-match resolver could in principle be
connected directly to the address lines of the associative memory, whereby
Fig. 3.9. Block scheme of a complete CAM system (read/write control, search argument register, mask register, external address input, and word output)
the complete contents of all matching words can be read one item at a time.
In practice, CAM arrays are normally made by integrated circuit technology,
and for the addressed reading and writing there is always a built-in address
decoder. For this reason the output from the multiple-match resolver must be encoded, whereby only a small number of feedback lines is needed, as shown in Fig. 3.9.
It is also possible to have both content-addressable and usual memory
arrays in the same memory system. This type of memory might then be used as a simple Catalog Memory, the main features of which were shown in Fig. 1.1.
The left-hand part is a normal all-parallel CAM intended to hold keywords,
and the right-hand subsystem is an addressable (linear-select) memory for
the associated items. If the catalog memory is used to represent symbol tab-
les, then it may be assumed that all keywords in its content-addressable
part are different. During search, all entered keywords are compared in par-
allel with the search argument. If all bits of the stored keyword match with
those of the search argument, then the corresponding output line is activated.
This line again acts as the address line in the linear-select memory. Notice
that entries can be stored at arbitrary locations, for instance, in the order
of their declaration.
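In software terms, the catalog memory behaves like a small associative table; the following behavioral sketch (a hypothetical class, with a sequential loop standing in for the parallel keyword comparison) captures the store and search operations just described:

```python
class CatalogMemory:
    """Sketch of the catalog memory of Fig. 1.1: a content-addressable
    part holding keywords and a linear-select part for associated items.
    A fully matching keyword activates the address line of its own row."""
    def __init__(self):
        self.keywords = []    # content-addressable part (all different)
        self.items = []       # linear-select part

    def store(self, keyword, item):
        # entries may be stored at arbitrary locations, e.g.,
        # in the order of their declaration
        self.keywords.append(keyword)
        self.items.append(item)

    def search(self, argument):
        for row, kw in enumerate(self.keywords):  # parallel in hardware
            if kw == argument:                    # all bits match
                return self.items[row]            # activated address line
        return None
```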
Various Uses of Masking. The provisions for masking the search argument can be applied in several operations. The most important of these may be in the
searching on selected attributes which are parts of the search argument. The
words in the content-addressable memory may be composed of several fields
describing different attributes, and a subset of these can be addressed by
unmasking the search argument correspondingly.
Another important application of masking is the writing of new data into memory locations which happen to be vacant. Because a list of the occupied positions usually cannot be maintained, the empty places must be found automatically. For this purpose there is an additional vacancy indicator bit in each word which is comparable to a data bit. In an empty location this bit is initially 0, and it is marked 1 when an item is entered. After deletion of the entry this bit is again reset to 0. The vacant places can now be found by masking all other bits except the vacancy indicator bit, and performing an equality search. The vacant location into which a word is written is then determined using, e.g., the multiple-match resolver.
The third possible use of masking is in the magnitude search on numerical entries explained in the next subsection.
The all-parallel CAMs usually do not have logic circuitry which would direct-
ly implement the magnitude comparison, e.g., by the algorithm (3.9) or (3.10).
As a search on the basis of magnitude comparisons, or magnitude search as it is usually called, may become necessary in some systems which have an all-parallel CAM, it may be useful to show how the organization can be modified
simply for this mode of searching.
The response store which is included in the organization shown in Fig. 3.9 can now be utilized effectively. Let us recall that its contents are
normally reset to 0 before usual searches. Since the following searching
algorithm is made in several passes, it is necessary only to leave the values
obtained in each pass in the response store where the final search results
are accumulated. Initially all bits of the response store are set to 0, and
every search operation changes the value to 1 if a match is found. If the
bit value is 1 after the previous pass, a value 0 (mismatch) obtained on the
match line in a new searching operation cannot reset the value to 0, however.
It is further assumed that the CAM is associated with, say, a computing
device such that the search argument bits can be scanned and changed auto-
matically, and the mask can be set depending on the bit values.
The following algorithm is based on an idea of CHU [3.15].
Algorithm Which Finds All Words Greater than the Search Argument:
1) Set all bits in the response store to O.
2) Scan the search argument from left to right and choose the first 0 for a
target bit.
3) Change the target bit to 1 and mask away all bits to the right of it,
thereby forming a new search argument.
4) Perform a search on the equality match condition relating to the new
search argument.
5) Set those bits in the response store to 1 which correspond to matching
words.
Comment: It is easy to deduce that those words which correspond to 1's
in the response store are certainly greater than the original search ar-
gument, whereas nothing can be said of the other words until this algo-
rithm stops.
6) Change the target bit to 0, retain all bit values to the left of it in
the search argument, and scan the search argument further to the right.
If no 0's are found, or the bits of the search argument have been exhaust-
ed, stop. Otherwise choose the next 0 for the new target bit and repeat
from step 3.
This algorithm leaves 1's in the response store at all words which are greater
than the search argument.
Example:
Below are shown a search argument, the contents of a CAM, and the contents
of the response store after the two searching passes corresponding to the
two target bits.
                     T1       T2   Response store
Search argument   1  0  1  1  0    Pass 1   Pass 2
Word 1            1  0  1  1  1      0        1
Word 2            0  1  0  1  0      0        0
Word 3            1  1  0  1  0      1        1
Word 4            1  0  1  0  0      0        0
Word 5            1  1  0  0  0      1        1
Word 6            1  0  1  1  0      0        0
In order to find all words which are less than the search argument, the
above algorithm is modified by changing the bit values 0, 1, and 0 mentioned
at steps 2, 3, and 6 into 1, 0, and 1, respectively.
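The passes above can also be simulated in software. The following sketch is our own model (function names such as masked_equality_search are ours, not from the original design): the CAM is a list of equal-length bit strings, and the response store is a list of flags which can be set to 1 during a pass but never reset within one search.

```python
# Software model of the multi-pass greater-than search on an all-parallel CAM.

def masked_equality_search(cam, argument, unmasked_bits):
    """Per word: 1 if all unmasked bit positions equal the argument's bits."""
    return [int(all(w[i] == argument[i] for i in unmasked_bits)) for w in cam]

def search_greater_than(cam, argument):
    response = [0] * len(cam)                 # step 1: reset the response store
    for t, bit in enumerate(argument):
        if bit != '0':                        # steps 2 and 6: each 0 is a target bit
            continue
        # step 3: change the target bit to 1; bits to its right are masked away
        new_arg = argument[:t] + '1' + argument[t + 1:]
        unmasked = range(t + 1)               # only bits up to the target take part
        # steps 4-5: equality search; every match sets a response bit to 1
        hits = masked_equality_search(cam, new_arg, unmasked)
        response = [r | h for r, h in zip(response, hits)]
    return response

cam = ['10111', '01010', '11010', '10100', '11000', '10110']
print(search_greater_than(cam, '10110'))      # -> [1, 0, 1, 0, 1, 0]
```

The printed result reproduces the Pass 2 column of the example: words 1, 3, and 5 are the only entries greater than 10110.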
Because magnitude search and searching on more complex matching condi-
tions are normally made using special hardware provisions not provided in
the above design, the discussion of other searching modes is postponed to
Sect. 3.4 where it is carried out naturally in the context of content-ad-
dressable processing with word-parallel, bit-serial CAMs.
Fig. 3.10. a) Linear-select RAM module. b) Application of the RAM module
to bit-slice addressing
results from the various bit slices. If a comparison for equality were the
only operation to be performed, then it would be necessary only to form
somehow the logical products of bit comparisons as defined by (3.3); results
from the successive bits can be collected recursively as indicated by (3.7).
The comparison of words with the search argument is most often done in
word-parallel, bit-serial CAMs with the aid of a results storage which in this
case is very simple. (More complex recursive operations are described in
Sect. 3.4.5.) The results storage consists of a set of flip-flops, one for
every word location, which is set to 1 from a common line before searching,
and which is conditionally reset by an EXOR gate taking one input from the
memory output, and the other from the search argument register as shown
in Fig. 3.11.
Fig. 3.11. Control organization of the results storage
The idea is to reset the flip-flops in the results store whenever a bit value
read from the memory array disagrees with that of the search argument. The
schematic principle shown in Fig. 3.11 does not take into account the asyn-
chronism of the bit values to be compared; for a practical solution, some
extra circuits for the resetting operation are necessary. When all bit slices
have been read, there are ones left only in such flip-flops which correspond
to completely matching words.
For a complete search in the array, only n reading cycles have to be per-
formed. This is usually considerably fewer than the m cycles needed in linear
search.
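The flip-flop update just described can be sketched as follows; this is a software model of our own, not a circuit description. One results flag per word is preset to 1 from the common line and is conditionally reset whenever a bit slice read from the array disagrees with the corresponding search-argument bit.

```python
# Software model of the word-parallel, bit-serial equality search.

def bit_serial_search(words, argument):
    results = [1] * len(words)            # flip-flops preset from a common line
    for j in range(len(argument)):        # one reading cycle per bit slice
        for i, w in enumerate(words):
            if w[j] != argument[j]:       # EXOR of memory output and argument bit
                results[i] = 0            # conditional reset on mismatch
    return results

words = ['0110', '1011', '1011', '0001']
print(bit_serial_search(words, '1011'))   # -> [0, 1, 1, 0]
```

After the n bit-slice cycles, ones are left only at the completely matching words, independently of the number m of words.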
It is to be noted that the word bits can be read for comparison in any
order, by the application of a corresponding address sequence at the decoder
inputs.
Fig. 3.12. Modified RAM bit cell for reading and writing by words as well
as by bit slices. One of the write control lines is the address selector,
another the data bit to be written, respectively [3.15]
When compared with the principle of Sect. 3.4.1, the CAM arrays described
here need somewhat more hardware; they contain m decoders, and each of them
must receive a different address code computed by some functional circuits.
The searching principle applied in the two designs described below is in
general termed skew addressing or, originally, skewed storage techniques
[3.16,17]. The corresponding hardware may be named a skew network.
Adder Skew Network. Every address decoder in this design (Fig. 3.15) is pro-
vided with an arithmetical addition circuit (full adder), capable of forming
the sum of the externally given address code and a constant (wired-in) number
which corresponds to the number of the row. This sum is taken modulo m where
m is the number of bits. By a proper control, the adders can be disabled,
i.e., a replica of the bit-slice address is then obtained at their outputs.
Data read out and written into the memory is buffered by a memory register,
capable of performing end-around shifting operations by a number of steps
corresponding to the bit slice address. Shifting can be in the upward or
downward direction depending on a particular access operation.
Fig. 3.15. Adder skew network
Consider first writing of a bit slice into the memory. The two's com-
plement of the bit-slice address is switched to the adder inputs, the addition
circuits are enabled, and the data to be written are transferred into the
memory register where they are shifted downwards end-around by an amount corre-
sponding to the bit-slice address. By a writing command, this slice is trans-
ferred into proper bit storages of the memory array; for instance, if the bit
position was 1, the map of the slice would be that indicated by the heavy
borderlines in Fig. 3.14.
Reading of a bit slice is done by giving the two's complement of the bit-
slice address, enabling the adders, and performing a reading operation whereby
the slice is transferred into the memory register. The data obtained, however,
are still in a permuted order, and a number of upwards shifting operations
(end-around) must be performed to align the contents of the slice.
Writing of a word proceeds by first giving the word address to the adder
inputs, disabling the adders, and transferring the data into the memory re-
gister where they are shifted downwards by the amount given in the word ad-
dress. A writing command then transfers the data into proper bit storages in
the memory array. It should be noted that word i will thereby be written into
column i of the memory array, with its contents rotated by an amount corre-
sponding to the word address.
Reading of a word is an operation inverse to the previous one. The word
address is presented at adder inputs, the adders are disabled, and a reading
command is given. Contents of the memory array corresponding to one original
word appear in the memory register in which it is shifted upwards by an amount
corresponding to the word address.
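The mapping realized by the adder skew network can be illustrated with a toy model of our own; a square m-by-m array is assumed for simplicity. Word i is stored in column i, rotated downwards by i positions, so that both a whole word and a whole bit slice occupy exactly one cell in every row and can therefore be accessed in a single memory cycle.

```python
# Toy model of skewed storage: bit j of word i sits in row (i + j) % m of
# column i.  The rotations below model the end-around shifts of the memory
# register; the per-column adders are implicit in the (i + j) % m indexing.

m = 8
array = [[0] * m for _ in range(m)]      # array[row][column]

def write_word(i, bits):                 # bits[j] is bit j of word i
    for j in range(m):
        array[(i + j) % m][i] = bits[j]  # word rotated down by its address

def read_word(i):
    return [array[(i + j) % m][i] for j in range(m)]

def write_bit_slice(j, bits):            # bits[i] is bit j of word i
    for i in range(m):
        array[(i + j) % m][i] = bits[i]

def read_bit_slice(j):
    return [array[(i + j) % m][i] for i in range(m)]
```

Because the row index (i + j) mod m differs in every column both for fixed i (a word) and for fixed j (a bit slice), neither access ever addresses two cells in the same row.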
EXOR Skew Network. Another skew addressing principle due to BATCHER [3.17]
in which the adders are replaced by simpler EXOR gates is presented in Fig.
3.16. The main idea is that when the address mode register contains mere
zeroes, the writing and reading occurs by bit slices, and when the register
is full of ones, writing and reading by words is done. The memory mapping
is less lucid than in the adder skew network, but can be followed easily
with examples. It can be shown that for any contents of the address mode
register the memory mapping is one-to-one and thus reversible. In the EXOR
skew network, it is further possible to define other slices than by bits or
words. This becomes possible by having special codes in the address mode
register. For instance, a k-bit field in every kth word (with k even) may be
addressed.
A special permutation operation at the I/O port of this network is needed:
if w is the index of the word to be written and b its bit position, then the
permuted bit position (row in the memory) is w ⊕ b, the bit-by-bit EXOR of
w and b.
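Since EXOR with a constant is its own inverse, this permutation is one-to-one, and for a fixed w (a word access) as well as for a fixed b (a bit-slice access) the permuted rows are all distinct. A quick illustrative check, ours rather than from the original design documents:

```python
# The rows occupied by one word (fixed w) and by one bit slice (fixed b)
# under the mapping row = w ^ b must in both cases exhaust all m rows.

m = 8
for w in range(m):
    assert {w ^ b for b in range(m)} == set(range(m))   # word in m distinct rows
for b in range(m):
    assert {w ^ b for w in range(m)} == set(range(m))   # slice in m distinct rows
print("XOR mapping is one-to-one for words and for bit slices")
```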
Fig. 3.16. EXOR skew network
Fig. 3.17. Shift register implementation of the word-parallel, bit-serial CAM
Besides the above methods which are based on standard random-access memories,
there is another principle according to which a word-parallel, bit-serial
content-addressable memory can be built of standard components. In this so-
lution, delineated in Fig. 3.17, all memory locations are either shift re-
gisters or cheap dynamic memories such as the CCD discussed in Sect. 4.3.4.
The search argument and the mask word are stored in separate registers simi-
lar to the storage locations. The contents of every location, including those
of the search argument and mask word registers, are made to recirculate in
synchronism through the results store, in which the masked search argument
is compared in parallel with all storage locations, one bit position at a
time. In this solution, too, a results store is needed the contents of which
are evaluated recursively during the recirculation.
In this latter type of memory, the words may be stored with either the
most significant or the least significant bits to the right, depending on
the comparison algorithm applied (cf Sect. 3.2). However, there is no possi-
bility of skipping any bit positions as might be done in the first version by
special bit slice addressing.
It should be noted that all words which exactly match the search argument
are indicated by the values gj0 = ℓj0 = 0 after searching.
It is a straightforward task to design a switching circuit which connects
relevant signals from the results store into a multiple-match resolver to
carry out the readout operations as shown in Fig. 3.8.
The Results Storage of STARAN. One of the largest content-addressable parallel
computers implemented is the STARAN of Goodyear Aerospace Corporation. The
central memory in it is based on the EXOR skew network principle (cf Sect.
3.4.2). The processor system will further be described in Sect. 6.5.1; its
basic CAM is expandable in blocks of 256-word by 256-bit arrays up to a capa-
city of 8 K words. Here the memory may be regarded as a normal word-parallel,
bit-serial CAM.
In the results storage of STARAN there are two sequentially operating inter-
mediate results stores, with bits named Xi and Yi, respectively. There is a
common register F with flip-flops Fi which may appear as a search argument
register (cf Sect.6.5.1). Further there exists a set of flip-flops Mi which
can be used, e.g., for masking out words during writing; in this way it is
possible to select a subset of locations which become subject to parallel
computation.
The two intermediate results stores Xi and Yi have the following specified
roles: Xi can be switched to many modes of sequential operation and it is the
primary results store. Yi is an auxiliary intermediate results store which
has a simpler sequential control. It should be pointed out that although
STARAN operates on bit slices, these can be picked up in any order from the
CAM array by stored-program control (machine instructions). Accordingly, the
recursive operations described below are not synchronous on pairs of bit val-
ues as in the magnitude search example given above; it may be more proper to
imagine that the results stores acquire new values in synchronism with machine
instructions:
    Xi ← Xi ∨ (Yi ∧ (Fi op1 Xi)) ,   Yi ← Fi op2 Yi ,        (3.11)
where op1 and op2 are any Boolean operations referring to two variables. It
may be generally known that there exist 16 different truth tables for two
Boolean variables, and any of these can be defined in the machine instruc-
tions by which Xi and Yi are controlled.
An Example of the Use of Results Storage for the Implementation of Addition
over Specified Fields. Assume a content-addressable array with four fields,
corresponding to variables Vi, Ai, Bi, and Si in every word, with i the word
index. The purpose in this example is to form the sums Ai + Bi and to store
the results in the Si fields. This addition shall be conditional on sums
being formed only in words which have a Vi value matching with the search
argument in the corresponding field.
In this particular example [3.18], the function Fi op1 Xi is defined to
be identical to Fi, and for Fi op2 Yi, Fi ⊕ Yi (EXCLUSIVE OR) is taken.
Denote the operands Ai = (ai,n-1, ai,n-2, ..., ai0), Bi = (bi,n-1, bi,n-2,
..., bi0), and Si = (si,n-1, si,n-2, ..., si0). The addition of Ai and Bi
commences with the least significant bits. During a bit cycle, Yi shall form
the sum bit and Xi the carry bit, respectively.
The first step is to load the Mi flip-flops, by performing a content-
addressable search over the Vi fields (with the Ai, Bi, and Si fields masked
off). The Mi are thereby set according to the responses. For the initial
values, Xi = Yi = 0 are set. The first bit slice, the set of all ai0, is read
into the Fi flip-flops by having 1 in the search argument at this position and
masking off all other bits. According to (3.11), Xi then attains the value
0 and Yi ← ai0. Next the bit slice bi0 is read into the Fi flip-flops where-
by Xi ← ai0 ∧ bi0, Yi ← ai0 ⊕ bi0. Because there was no previous carry, a new
carry (Xi = 1) is correctly obtained if, and only if, ai0 = bi0 = 1, and Yi
is formed as the mod 2 addition of ai0 and bi0 which is correct, too. This
value is written into the bit slices si0 on the condition that Mi is 1.
Finally, Yi ← Xi is set, and Xi is reset to 0.
The next bit positions, j = 1 through n-1, are operated in the same way.
In the first step the bit slice aij is read into Fi, whereby according to
(3.11)
    Xi ← Xi ∨ (Yi ∧ aij) ,   Yi ← Yi ⊕ aij ,                 (3.12)
and in the second step the bit slice bij is read into Fi, whereby
    Xi ← Xi ∨ (Yi ∧ bij) ,   Yi ← Yi ⊕ bij .                 (3.13)
The writing of Yi back into the bit slice sij proceeds next conditionally on
the value Mi = 1. As the third mapping,
    Yi ← Xi                                                  (3.14)
is performed, whereafter Xi is again reset to 0. Denoting by cj-1 the carry
from the previous bit position, substitution into (3.12,13) yields
    Xi = (aij ∧ bij) ∨ ((aij ⊕ bij) ∧ cj-1) ,   Yi = aij ⊕ bij ⊕ cj-1 .   (3.15)
These equations are known as the equations of a full adder, i.e., Xi repre-
sents the new carry and Yi the sum digit, respectively. Since the recursion
started correctly at the least significant bit, the correctness of the algo-
rithm follows by complete induction.
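The recursion can be checked with a small software model. This is our own sketch, not STARAN's actual microprogram; words are kept as bit lists with the least significant bit first, and the hypothetical function cam_add applies the slice-by-slice updates (3.12)-(3.14) to every word in parallel.

```python
# Software model of the bit-serial addition: X accumulates the carry,
# Y the running sum digit, F buffers the bit slice just read, and M masks
# the words which take part in the computation.

def cam_add(A, B, M, n):
    """Return S with S[i] = A[i] + B[i] (mod 2**n) wherever M[i] == 1."""
    mwords = len(A)
    S = [[0] * n for _ in range(mwords)]
    X = [0] * mwords                     # intermediate results store (carry)
    Y = [0] * mwords                     # intermediate results store (sum)
    for j in range(n):                   # one pass per bit position
        for i in range(mwords):
            F = A[i][j]                  # read bit slice a_ij       (3.12)
            X[i] |= Y[i] & F
            Y[i] ^= F
            F = B[i][j]                  # read bit slice b_ij       (3.13)
            X[i] |= Y[i] & F
            Y[i] ^= F
            if M[i]:                     # conditional write of the sum digit
                S[i][j] = Y[i]
            Y[i] = X[i]                  # carry becomes next Y      (3.14)
            X[i] = 0
    return S
```

A short trace confirms the full-adder behaviour: for every masked word the pair (X, Y) after (3.13) equals the carry and sum of a_ij, b_ij, and the previous carry.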
on the basis of more complex searching conditions, too, not only on the basis
of equality or magnitude. For instance, logic functions defined over spec-
ified bits can be used to indicate matching. Moreover it will often be nec-
essary to locate entries the bit pattern of which is closest to that of the
search argument, or which have a value next greater or smaller to that of the
entries found in the first search. The following review covers some such
operations that occur in parallel processing. Although these procedures are
discussed in the context of word-parallel, bit-serial memories, nonetheless
many of these methods are in principle applicable to all-parallel memories
as well. However, for the verification of complex matching conditions such
as those based on Boolean variables, it may sometimes be necessary to add
new functions into the comparison logic, and the extra expenditures are then
tolerable only if the logic circuit is common to the memory location, as is
the case with this type of memory.
Between-Limits Search. In this, as well as in most of the other modes of
search discussed, the results storage must contain a further auxiliary storage
flip-flop in addition to the g and ℓ flip-flops already involved. The pur-
pose of this third flip-flop, briefly called results store below, is to form
the intersection of those sets of responses which are found in partial searches
on one matching criterion only. The idea is to initially load the results
store with ones, of which every partial search resets a subset to zero upon
mismatch. In this way ones are left only in such results stores for which all
matching conditions are satisfied.
The between-limits search, or retrieval of all words which have their
numerical value between two specified limits, is simply done by first search-
ing for all numbers greater than the lower limit, and then performing the
next search for numbers less than the upper limit. If the bit-cancellation
method described above is used, only the between-limits matches are left in
the results store.
Search for Maximum (Minimum). On account of the special nature of binary
numbers, a search for maximum among the stored values can be made in several
passes in the following way. No external search argument is used with this
method. Starting with the most significant bits (MSBs), values '1' are set
in the search argument register into its highest, next to highest, etc. bit
position in succession. By masked equality match, all words are first search-
ed which have the value '1' in the highest bit position, and proceeding then
to lower positions. If, when proceeding from the MSB towards lower bits, a
search does not pick up any candidates (all words have '0' in this bit posi-
tion), then this occurrence must be detected in the output circuits, and no
cancellations must be performed in the results store during this pass. The
words found in the first passes are possible candidates for further searches
because all the other numbers must necessarily be smaller. This process is
repeated until all bits of the search argument have been exhausted. It is
possible that there exist several identical greatest numbers which all are
found in this way.
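The maximum search can be sketched as follows; this is an illustrative software model under our own naming. The candidate set plays the role of the results store, and a pass that picks up no candidate leaves the store untouched.

```python
# Multi-pass maximum search: no external search argument; candidates are
# narrowed from the MSB downwards by masked equality matches on a growing
# prefix of '1' and '0' decisions.

def search_maximum(words):               # words: equal-length bit strings
    n = len(words[0])
    results = [1] * len(words)           # results store preset to ones
    prefix = ''
    for _ in range(n):                   # MSB towards lower bits
        hits = [r and w.startswith(prefix + '1')
                for r, w in zip(results, words)]
        if any(hits):                    # some candidate has '1' here:
            results = [int(h) for h in hits]   # cancel all the others
            prefix += '1'
        else:                            # every candidate has '0' here:
            prefix += '0'                # no cancellations this pass
    return results                       # ones mark the (possibly tied) maxima

words = ['01011', '10010', '10010', '00111']
print(search_maximum(words))             # -> [0, 1, 1, 0]
```

Note that several identical greatest numbers are all left marked, as in the text.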
The search for minimum is obviously the dual of the search for maximum.
Starting with the MSBs, the highest zeros are checked in similar passes as
before.
Search for the Nearest-Below (Nearest-Above). By combining the successive
searches for numbers less than a given value, and searching for the maximum
in this subset, the nearest-below to the given value is found. Similarly, a
combination of the greater-than and minimum searches yields the number nearest
above the given value.
Search for the Absolute-Nearest. After the nearest-below and the nearest-above
to a given value are found, the number that is absolutely nearest to the given
value is determined by comparing the differences from the given value.
Ordered Retrieval (Ordering). This is nothing else than a successive search
for maxima or minima in the remaining subset from which the maximum or mini-
mum, respectively, has been deleted at the preceding step. It can be imple-
mented as a combination of previous algorithms. However, ordered retrieval
of items from a file, in numerical or lexicographical order, has sometimes
been regarded so important a task that special hardware-supported algorithms
for it have been designed; one has to mention the works of LEWIN [3.19],
SEEBER and LINDQVIST [3.20], WEINSTEIN [3.21], JOHNSON and McANDREW [3.22],
MILLER [3.23], CHLOUBA [3.24], SEEBER [3.25], WOLINSKY [3.26], and
RAMAMOORTHY et al. [3.27]. Hardware sorting networks have been invented by
BATCHER [3.28].
Proximity Search. There are several occasions on which the maximum number of
bits matching with those of the search argument is a useful criterion for
location of the entry. It is now necessary to augment the memory system,
especially the results store, by bit counters which for each word count the
number of bits agreeing with the bits of the search argument. If these bit
counters are also made content-addressable, a maximum search over them yields
words that have their value in proximity to that of the search argument.
161
~oscilator
Local I
CP CP
START
.....--..., I .....--'--...,
Delay I
linas
Associatad
data out
STOP
START
- single path
~ parallel paths
- - - control signal
Interrogator word in for data paths
CP clock pulsa
The standard coding of information on magnetic tape and other media used
for archival storage is based on bytes, eight bits wide. One byte could thus
in principle encode up to 256 different symbols or characters which may re-
present letters, numbers, as well as control and signaling marks that occur
in a data stream. Most common of all coding systems is the ASCII (American
Standard Code for Information Interchange) which is used in practically every
teletypewriter and other terminal. A few characters of the ASCII code are
represented in Table 3.2.
The above coding is often used within central processing units, too, especial-
ly if the computer system is oriented towards administrative EDP in which the
main bulk of data may consist of numbers as well as text. Many large-scale
computers such as the IBM Systems 360 and 370 have a standard option in their
instruction repertoire of addressing their memory systems by byte locations.
The transfer of information between the CPU and peripherals, as well as the
remote transmission of data is usually made by bytes, very often through
eight parallel lines or channels on which the bits of a character appear
simultaneously.
One further remark ought to be made at this point. While the access to the
storage location of main memories, disks, and drums can be based on address,
usually no such possibility exists for magnetic tape. This is due to the tape
transport mechanism which is based on pulleys and does not allow accurate
stopping of the tape. For this reason, the files of contiguous data must be
Fig. 3.20. Structure of a record: the delimiter A and the record-length
field are followed by the information fields and the search results; the
record type is either E = empty or V = non-empty. Each information field
consists of a field name and the field information
Reserved Characters. Every position in the string shown in Fig. 3.20 corre-
sponds to an eight-bit character code and one marker bit. Most of the char-
acter codes represent data being, for instance, arbitrary symbols of the
ASCII code. For this searching algorithm it is necessary to introduce some
auxiliary codes for special delimiters and values; there are many unused bit
combinations in the ASCII code for this purpose. Symbols of special character
codes are defined in Table 3.3.
One may make a convention that the first field in the record always in-
dicates its total length, whereas the beginning of this field is completely
identifiable by A. The other fields are then called information fields.
Processing Operations. In the real device the records pass the reading heads;
since the beginning of a record along the track is not restricted in any way,
the various processing circuits, one per reading head, are normally in dif-
ferent phases. During one revolution, however, any external control such as
buffering of one character of the search argument is always identical for
all processing circuits, and it is thus assumed that a similar processing
step will be carried out for all records during one revolution of the disk.
To describe the various processing operations, it will be equivalent to
assume that some processing element makes several recurrent passes over the
record from left to right, every time looking only for one particular char-
acter and its associated marking, and being able to change the found symbol
or its marker or any symbol or marker to the right of it. The operation per-
formed in one pass is defined by a machine instruction, a set of which is de-
fined, e.g., in Table 3.4.
Table 3.4 lists the instructions which describe the mode of operation of
the processing element (processing circuit) during one pass (revolution). A
program can be formed of these instructions. When studying programs written
out of these instructions, it may be helpful to imagine that every instruc-
tion takes at least one pass to be executed. So, for instance, the search
instructions set markers which are then later studied and modified by the
propagate, expand, and contract instructions at the next passes. In reality,
when searching for a matching string, only one character at a time can be
held in the buffers of the processing circuits. The searching begins by
Instruction            Definition

search for             Find all occurrences of the string s1s2...sn and
s1s2...sn              set the marker in the character which immediately
                       follows sn. (This instruction applies to a single
                       character, too.)

search for marked      Same as before except that for a string to match,
s1s2...sn              its first character must be marked; this marker is
                       then reset

search for ψs          Find all occurrences of a character which has the
                       relation ψ with s and set the marker of the following
                       character. (E.g., if ψ is >, then for a character to
                       qualify, its numerical value must be greater than
                       that of s. Possible relations are <, ≤, >, ≥, ≠)

search for marked ψs   Analogous to "search for marked s1s2...sn"

search and set s       Find all occurrences of character s and set their
                       markers

propagate to s         Whenever a marker is found, reset it and set the
                       marker in the first character s following it

propagate i            Whenever a marker is found, reset it and set the
                       marker of the i:th symbol to its right

expand to s            Whenever a marker is found, set the markers of all
                       characters following it up to and including the first s

expand i               Whenever a marker is found, set the markers of the
                       first i characters following it

expand i or to s       Whenever a marker is found, set the markers of the
                       first i characters following it but only up to and
                       including s if s occurs earlier

contract i             In a row of contiguous markers, reset the first i
                       ones

add s                  Add the numerical value of s to the numerical value
                       of all marked characters and replace these by the
                       character representing the sum

replace by s           Replace all marked characters by s
searching for the first characters of the search argument. When a matching
character is found, the marker in the next position is set. At the next pass,
if a marker is found and the corresponding character matches, a new marker
is set, etc. In this way the matching proceeds recursively, one character at
a time, which takes as many passes as there are characters in the search ar-
gument.
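This recursive marking can be modelled in software as follows. The sketch is ours (one loop iteration per revolution, ignoring the asynchronism of the real tracks); it implements only the basic "search for s1s2...sn" behaviour of leaving a marker on the character immediately following each occurrence.

```python
# Toy model of the marker-based string search: each pass scans the record
# for one character of the search argument, and a marker survives to the
# next position only if the marked character also matches.

def search_for(record, markers, pattern):
    """Leave a marker on the character immediately following each occurrence."""
    for k, ch in enumerate(pattern):     # one revolution per pattern character
        new = [0] * len(markers)
        for pos, c in enumerate(record[:-1]):
            first = (k == 0) or markers[pos]   # later characters need a marker
            if first and c == ch:
                new[pos + 1] = 1         # set marker in the next position
        markers[:] = new                 # markers found this pass replace the old
    return markers

record = list('xmagnetism')
markers = [0] * len(record)
search_for(record, markers, 'magnet')
print(markers.index(1))                  # marker sits on the 'i' after 'magnet'
```

As in the text, matching an n-character argument takes n passes, one character held in the buffer at a time.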
Comment: The following list of instructions is almost directly from [3.50]
but in view of the fact that the programming examples given in the original
work seem to need some revision, one might alternatively study a simpler
instruction set which uses more special characters (see an example below).
To these instructions which refer to the searching phase only, there must
be added the operations associated with reading and writing of records. The
purpose of searching operations is to leave markers only in such records
which satisfy all searching criteria and which, accordingly, can be located
by these markers in the actual reading process. Writing must be preceded by
a program which finds an empty and large enough slot for a record and marks
this location available. For reading and writing, a host computer may be
needed.
There are plenty of standard methods to implement the instructions given
in Table 3.4 and the associated stored program control; the logic described
in [3.50] gives some outlines for them. No attempt is made here to go into
details. Similar logic principles will be discussed in Chapter 6. The block
diagram given in Fig. 3.21 shows the most important parts of the processing
circuits and the common control.
Example:
This program marks all records which contain the word 'magnet' followed
by at least 2 nonblank characters. It is assumed that all markers have
been reset before searching, and if the word is found, a marker is set
in the 'p' character.
Program                  Comment
search for magnet        if found, the next character is marked
expand to B              a row of markers is first set; (B = blank)
contract 2               if 'magnet' is followed by less than 2
                         nonblank letters, all markers are deleted
propagate to p
Comment
The programs could be made simpler if two types of "don't care" symbols
were defined: o = any character including blank,
K = any character excluding blank.
Then, for instance, to find all words which have 'magnet' as the root
followed by 2 to 6 nonblank characters, the following program might be
used:
search for magnetKKooooB
propagate to p
Reading of matching records from a rotating device looks difficult be-
cause the markers which indicate matching are at the trailing end of the
record. There are now two possible ways for reading the marked records. In
one of them, the descriptors and attributes are located in front of the above-
mentioned markers but the information to be read out is written behind these
marker positions. If necessary, duplicates of the descriptors could be written
in this area. Another way is first to read the length of every record, found
behind the A character. If at the end of the record a marker is found, the
program waits until the beginning of this record at the next revolution. The
beginning address is obtained from the address of the marker by subtracting
the length of the record.
Hardware Organization. The overall organization of the byte-serial CAM is
shown in Fig. 3.21. There are external control circuits common to all pro-
cessing circuits and they are connected to a stored-program control device,
for example, a host computer. Each of the processing circuits, one per track,
contains a buffer for the character to be compared, as well as a few flip-
flops and some logic functions for the sequential operations which are con-
nected with the reading and writing of markers. It may be recalled that the
data representing the search argument, and data read from and written into
the memory are transferred to and from the processing circuits one character
at a time. Since input to and output from the rotating device are always
through the processing circuits, these must be selectable by address decoders.
Fig. 3.21. Overall organization of the byte-serial CAM (buffers and the
results storage)
Y - X = Y + X̄ + 1 .   (3.16)
In the usual CAMs, those parts of the search argument which have to match
with the stored words are defined by setting up a mask. The masking pattern
is the same for all stored words, and it thus excludes the same bit positions
in all locations from comparison. On the other hand, in the byte-serial con-
tent-addressable search it was possible to define the "don't care" characters
individually for all strings because the coding capacity of the characters
allowed the use of special symbols. This naturally leads to a question whether
the binary '0' and '1', as well as the "don't care" value φ could be set in
the usual CAMs, too, i.e., at bit positions which can individually be defined
for every stored word. If this were possible, then those parts of information
which are regarded to be uncertain could be masked and skipped upon retrieval.
Another possible area of application for CAMs which can be masked within
the memory is in the implementation of logic functions for computation and
values can be assigned in four different ways to three symbols, e.g., 0 = (1,0),
1 = (0,1), and φ = (0,0). In a hardware design, two bistable circuits together
with some comparison logic are needed for every bit cell.
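A possible software rendering of such a bit cell follows. This is our own sketch; in particular, the pairing of the argument bit values with C1 and C2 is an assumption of the model, not taken from a specific circuit.

```python
# Two-flip-flop functional-memory bit cell: the stored symbol is encoded in
# (C1, C2), and a cell signals a mismatch only when the flip-flop probed by
# the opposing argument bit is set.  With the "don't care" code (0,0), no
# mismatch can be signalled at any argument value.

ENCODE = {'0': (1, 0), '1': (0, 1), 'phi': (0, 0)}   # phi = "don't care"

def cell_mismatch(stored, argument_bit):
    c1, c2 = ENCODE[stored]
    return c2 if argument_bit == 0 else c1   # assumed probe assignment

def word_matches(stored_word, argument):
    return not any(cell_mismatch(s, a)
                   for s, a in zip(stored_word, argument))

print(word_matches(['1', 'phi', '0'], (1, 0, 0)))   # -> True
print(word_matches(['1', 'phi', '0'], (1, 1, 1)))   # -> False
```

The stored φ thus masks a bit position individually per word, which ordinary argument-side masking cannot do.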
G7 of Fig. 3.3, no mismatching signals can be caused on the M line at any value
combination of C1 and C2 whatsoever.
Only a few electronic implementations of the above principle have been
suggested, although all basic circuit constructs of the CAMs are in principle
amenable to the FM.
    F = ⋁_P (A1^p1 ∧ A2^p2 ∧ ... ∧ An^pn)                    (3.17)
where the superscript pi, i = 1,2,...,n is assumed to attain one of the
values 0, φ, and 1; P is a set of combinations of the superscripts, and
Ai^pi is an operational notation with the meaning
    Ai^0 = Āi ,   Ai^φ = 1 ,   and   Ai^1 = Ai .
The expressions Āi and Ai are named literals corresponding to an independent
variable Ai.
The notations expressed in (3.17) have now a very close relationship to
the so-called compressed truth table which is obtained, e.g., by the well-
known Quine-McCluskey method (cf. e.g., [3.62]). The following example, Table
3.5, shows a usual and a compressed truth table, respectively.
      Usual                  Compressed
A  B  C |  F             A  B  C |  F
0  0  0 |  0             0  1  φ |  1
0  0  1 |  0             1  φ  0 |  1
0  1  0 |  1
0  1  1 |  1
1  0  0 |  1
1  0  1 |  0
1  1  0 |  1
1  1  1 |  0
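The way the compressed table is read off can be sketched in a few lines; the code and names are illustrative, not the book's:

```python
# Each row of the compressed table is a product term in which 'phi' marks a
# don't-care position; F is the OR over all rows matching the input.
COMPRESSED = [('0', '1', 'phi'), ('1', 'phi', '0')]  # the rows with F = 1

def row_matches(row, abc):
    return all(r == 'phi' or int(r) == bit for r, bit in zip(row, abc))

def F(a, b, c):
    return int(any(row_matches(row, (a, b, c)) for row in COMPRESSED))
```

Evaluating F over all eight inputs reproduces the usual truth table column exactly.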
C  = S3 ∧ S2 ∧ S1 ∧ S0 ,
R3 = (¬S3 ∧ S2 ∧ S1 ∧ S0) ∨ (S3 ∧ ¬S2) ∨ (S3 ∧ ¬S1) ∨ (S3 ∧ ¬S0) ,
R2 = (¬S2 ∧ S1 ∧ S0) ∨ (S2 ∧ ¬S1) ∨ (S2 ∧ ¬S0) ,
R1 = (¬S1 ∧ S0) ∨ (S1 ∧ ¬S0) ,
R0 = ¬S0 .                                                        (3.18)
The combined truth table is shown in Table 3.6. The φ's of the Input Table
and the 0's of the Output Table are indicated by blanks, and this convention
shall be followed throughout the rest of this section.
In the hardware implementation of the combined truth table, the Input
Table has a similar FM counterpart to that described above, with MATCH signals
obtained as outputs at every word line (row). Several logic sums are formed
by OR circuits, one for every Boolean function. The inputs to these OR
circuits are taken from those word lines which correspond to the bit value 1
in the columns of the Output Table.

Table 3.6
Row   S3  S2  S1  S0 |  C  R3  R2  R1  R0
 1                0  |                  1
 2            0   1  |              1
 3            1   0  |              1
 4        0   1   1  |          1
 5        1   0      |          1
 6        1       0  |          1
 7    0   1   1   1  |      1
 8    1   0          |      1
 9    1       0      |      1
10    1           0  |      1
11    1   1   1   1  |   1
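The whole mechanism — product-term rows in the Input Table, Wired-OR word lines per output column — can be sketched as follows; the row list follows the terms of (3.18), and the Python rendering is illustrative only:

```python
# Functional Memory 1 as an Input Table of product terms (don't cares marked
# None) and an Output Table that ORs the matching word lines into each column.
ROWS = [  # (S3, S2, S1, S0) pattern -> set of output columns receiving a 1
    ((None, None, None, 0), {'R0'}),
    ((None, None, 0, 1),    {'R1'}),
    ((None, None, 1, 0),    {'R1'}),
    ((None, 0, 1, 1),       {'R2'}),
    ((None, 1, 0, None),    {'R2'}),
    ((None, 1, None, 0),    {'R2'}),
    ((0, 1, 1, 1),          {'R3'}),
    ((1, 0, None, None),    {'R3'}),
    ((1, None, 0, None),    {'R3'}),
    ((1, None, None, 0),    {'R3'}),
    ((1, 1, 1, 1),          {'C'}),
]

def increment(s):
    bits = [(s >> k) & 1 for k in (3, 2, 1, 0)]        # S3..S0
    out = {c: 0 for c in ('C', 'R3', 'R2', 'R1', 'R0')}
    for pattern, ones in ROWS:
        if all(p is None or p == b for p, b in zip(pattern, bits)):
            for c in ones:                             # Wired-OR per column
                out[c] = 1
    return out['C'], out['R3'] * 8 + out['R2'] * 4 + out['R1'] * 2 + out['R0']
```

For every 4-bit S the memory yields S + 1, with the carry C raised on overflow.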
It may be recalled that the main objective in the introduction of func-
tional memories was the implementation of logic operations by programming.
Accordingly, any specified hard-wired operations, such as that described
above in which word lines were connected to OR circuits according to the
functions to be implemented, should not be allowable. The Output Table can
now be made programmable by providing every location in it, i.e., every
crossing of the word lines and columns, with a usual flip-flop which can be
read by the word line signal, and by connecting the output circuits of all
flip-flops of one column by a Wired-OR function. The value 1 is written into
all flip-flops in which the Output Table has a 1, and so the vertical output
line at every column will receive a resultant signal which corresponds to
the hard-wired operation described above. We shall revert to a similar pro-
grammed output operation with Functional Memory 2 below.
Search-Next-Read. Table 3.6 can further be compressed by introducing the
Search-Next-Read function, as named by FLINDERS et al. The above example,
due to its special properties, may yield a rather optimistic view of the
applicability of this method, but similar cases may occur rather often. The
central idea is that some input and output rows, not necessarily adjacent
ones, may resemble each other. For example, in this example, input row 2 has
a 1 in the same position as output row 1 has it; if a read operation were
possible in the Input Table, output row 1 could be represented by input row 2.
If we now make the convention that after searching for input rows, the
next input row is automatically read, output row 1 could be deleted. Simi-
larly, the output rows 2, 3, 4, 5, 7, 8, and 9 could be deleted, because
they have 1's in the same positions as the input rows 3, 4, 5, 6, 8, 9, and
10, respectively, have them. (Notice that rows can easily be reordered in
order to represent as many output rows by next input rows as possible.) A
separate problem arises with output rows 6, 10, and 11, which cannot be re-
presented by the next input rows. The solution is that these left-over out-
put rows are added behind the corresponding input rows in the table, but now
every row is provided with an additional tag bit; for words not allowed to
occur as search arguments, such as the three mentioned last, this tag
bit is 1, and it is 0 for the rest of the words. The source word, the search
argument, now has an extra bit in this position, and its value is 0 during
searching; during reading it is 1. The words provided with tag 1, therefore,
cannot match any search argument during searching, and do not interfere.
During reading, the word next to the matched word is always read. Table 3.7
shows the more compressed table.
Table 3.7
Row         Tag  S3  S2  S1  S0 |  C  R3  R2  R1  R0
 1           0                0 |
 2           0            0   1 |
 3           0            1   0 |
 4           0        0   1   1 |
 5           0        1   0     |
 6, input    0        1       0 |
 6, output   1                  |          1
 7           0    0   1   1   1 |
 8           0    1   0         |
 9           0    1       0     |
10, input    0    1           0 |
10, output   1                  |      1
11, input    0    1   1   1   1 |
11, output   1                  |   1
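The tag-bit convention can be sketched in miniature; the memory layout and names below are illustrative, not the book's:

```python
# Sketch of the Search-Next-Read principle: words carry a leading tag bit;
# the search argument always presents tag 0, so words tagged 1 can never
# match a search, but can still be read as the "next" word after a match.
def search_next_read(memory, argument):
    """memory: list of (tag, pattern) with None as don't care; returns the
    OR of the words following each matching word."""
    n = len(memory[0][1])
    result = [0] * n
    for k, (tag, pattern) in enumerate(memory):
        hit = tag == 0 and all(p is None or p == a
                               for p, a in zip(pattern, argument))
        if hit and k + 1 < len(memory):
            nxt = memory[k + 1][1]          # read the next word
            result = [r | (b == 1) for r, b in zip(result, nxt)]
    return result
```

Searching a two-word memory [(0, (1, 0)), (1, (0, 1))] with argument (1, 0) matches the first word and reads out the tagged word behind it; the tagged word itself never matches.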
The previously discussed functional memory was intended for a direct imple-
mentation of two-level logic, i.e., for disjunctive Boolean forms. With a
minor amount of additional circuitry, certain multilevel logic expressions
can advantageously be represented by a functional memory which thus can fur-
ther be compressed. The first-level operation in these expressions is always
AND, and the second-level operation is OR. The logic operation on the third
level is EXOR (EXclusive OR), and the fourth level implements the ANDNOT
function. These particular operations on different levels were selected for
this implementation because it will be easy to transform truth tables into
such expressions, as will be demonstrated below. In particular, the EXOR and
ANDNOT forms can be found easily, e.g., by direct inspection of Karnaugh
maps as illustrated by a couple of examples.
Let us first consider the Karnaugh map shown in Fig. 3.23a. The inter-
sectional area of the two terms shown by dotted lines cannot be represented
by their Boolean sum, but EXOR is a function which is directly suited for
this purpose: the EXOR of these terms has zeros in the intersection.
Fig. 3.23a,b. Finding terms for Functional Memory 2: a) for the EXOR
function, F = C ⊕ (B ∧ D); b) for the ANDNOT function, G = C ∧ ¬(B ∧ D)
(see text)
Another example, the Karnaugh map of Fig. 3.23b, represents a case in which
a small fraction of 1's is missing from an otherwise "complete" Boolean pro-
duct term. By multiplying the "complete" term by the negation of another
suitable term, i.e., by forming the ANDNOT of these terms, the 1's are covered.
A heuristic procedure for finding a simple three-level expression (not
necessarily the absolutely simplest one) is thus to try normal methods for
covering the 1's in a Karnaugh map by the simplest terms. If one then other-
wise had an almost "ideal" solution, except that a few 1's were missing from
some terms, or that some 0's existed in the intersections of two "complete"
terms, the solution can be "patched" by introducing the third level as indi-
cated in these examples.
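Assuming the functions reconstructed for Fig. 3.23 above, the two patches can be checked exhaustively; the code is a sketch and the function names are mine:

```python
def F(b, c, d):          # F = C EXOR (B AND D), as in Fig. 3.23a
    return c ^ (b & d)

def G(b, c, d):          # G = C ANDNOT (B AND D), as in Fig. 3.23b
    return c & (1 - (b & d))

# The EXOR of the two terms equals their Boolean sum with the intersection
# C AND B AND D carved out:
for b in (0, 1):
    for c in (0, 1):
        for d in (0, 1):
            assert F(b, c, d) == (c | (b & d)) & (1 - (c & b & d))
```

The loop confirms that EXOR has zeros exactly in the intersection of the two terms, which is what the Boolean sum cannot represent.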
Whatever procedure for finding a multilevel Boolean expression of the
above type is utilized, the result is then readily implementable by Functional
Memory 2. It consists of two partial tables, the Input Table and the Output
Table. The former is similar to the Input Table of Functional Memory 1, and
the searching operation is similar, too. The fundamental difference lies in
the Output Table which has a special column, consisting of usual flip-flops
and an EXOR circuit for every Boolean function which has to be represented
by this memory.
Fig. 3.24. One column of the Output Table: flip-flops with readout logic
Consider Fig. 3.24, which shows one column of the Output Table. A pair of
flip-flops is shown schematically at every word line. The operation of these
flip-flops is such that whenever the word line is activated (equivalent to
reading the flip-flops), the states of the flip-flops are made to appear
on the Bit 0 line and Bit 1 line, respectively. The output circuits of all
flip-flops connected to the Bit 0 line and Bit 1 line, respectively, are such
Table 3.8
Flip-flop states   10   01   00   11
Abbreviation        0    1    X    Y
The Bit 0 and Bit 1 lines are connected to an EXOR circuit. As a whole,
the logic operations performed by this column, as explained below in more
detail, are signified by the name Read Right-EXOR-Left.
Implementation of four-level logic expressions by means of Functional
Memory 2 needs a careful study and shall be discussed in the following with
the aid of examples relating to various partial operations on the different
levels.
1) Implementation of AND operations (first level) occurs in the Input Table.
The results are the Boolean product terms which correspond to word line
outputs.
2) Implementation of the logic sum of these Boolean products (second level)
as well as the EXOR of two such partial expressions (third level) is done
using the Read Right-EXOR-Left operation. Assume that a three-level Boolean
expression found, e.g., from the Karnaugh map is of the form
F = (A1 ∨ A2 ∨ ... ∨ AM) ⊕ (B1 ∨ B2 ∨ ... ∨ BN) ,   (3.19)
where the Ai and Bj , i = 1...M and j = 1...N, are Boolean product terms
already computed by the Input Table, and their signals occur on the word
lines. Information is now written into the output column in the following
way: at all word lines which correspond to terms of the type Ai' the value
combination 10 is written into the corresponding pair of flip-flops, and
at word lines corresponding to the Bj terms, the value combination to be
written into the flip-flops shall be 01. Thus, if there were only value
combinations 10 and 01 in the Output Table, the expression F of (3.19)
would be implemented by the output columns, as can readily be seen.
An Example, Derived from the Karnaugh Map. The map shown in Fig. 3.25 is
represented by the Boolean function

F = (¬A ∧ C) ∨ (¬A ∧ D) ∨ (¬B ∧ C) ∨ (¬B ∧ D) ,   (3.23)

but it has a striking correlation to the very simple form F' = C ∨ D. Only
three squares ought to be "carved out" from F'.
Fig. 3.25. Karnaugh map for (3.23)
The dotted line represents a term that can now be used to inhibit F' in the
three mentioned squares to yield F, and the corresponding Boolean function
would then be

F = (C ∨ D) ∧ ¬(A ∧ B) .   (3.24)

The functional memory for this functional form is represented in Table 3.9.
Table 3.9
A  B  C  D |  F
      1    |  1
         1 |  1
1  1       |  Y
When the word line corresponding to the term A ∧ B is activated, the Y state
in the output bit cell yields a 1 at both output lines of this column, and so
the EXOR of these signals yields the result 0.
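The column of Table 3.9, as reconstructed above, can be simulated directly; the code is a sketch and all names are illustrative:

```python
# Read Right-EXOR-Left: states '0' -> (1,0), '1' -> (0,1), 'X' -> (0,0),
# 'Y' -> (1,1); the column output is the EXOR of the wired-OR Bit 0 and
# Bit 1 lines.
STATE = {'0': (1, 0), '1': (0, 1), 'X': (0, 0), 'Y': (1, 1)}
ROWS = [  # (A, B, C, D) input pattern (None = don't care), output state for F
    ((None, None, 1, None), '1'),   # term C
    ((None, None, None, 1), '1'),   # term D
    ((1, 1, None, None),    'Y'),   # term A AND B inhibits via the EXOR
]

def read_column(abcd):
    bit0 = bit1 = 0
    for pattern, state in ROWS:
        if all(p is None or p == v for p, v in zip(pattern, abcd)):
            s0, s1 = STATE[state]
            bit0 |= s0              # wired-OR of all activated flip-flop pairs
            bit1 |= s1
    return bit0 ^ bit1              # Read Right-EXOR-Left
```

Over all 16 inputs, read_column reproduces F = (C ∨ D) ∧ ¬(A ∧ B): whenever the A ∧ B row is activated, its Y state raises both lines, and the EXOR collapses to 0.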
Increment Table Solved by the Read Right-EXOR-Left. Without derivation, the
increment table (Table 3.6) is now shown in a compressed form in Table 3.10,
using Functional Memory 2; this solution can easily be verified. The "don't
care" states are shown by blanks in the Input Table, and the state notations
defined in Table 3.8 have been used in the Output Table.
It may not always be realized that simple logic circuits have a close rela-
tionship to content-addressable memories. The wiring of a logic circuit is
Table 3.10
S3  S2  S1  S0 |  C  R3  R2  R1  R0
             0 |  0   0   0   0   1
         0     |  0   0   0   1   X
     0         |  0   0   1   X   X
 0             |  0   1   X   X   X
               |  1   X   X   X   X
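The five-row solution, as reconstructed above, can be verified mechanically; the Python sketch and names are mine:

```python
# Each output column is the EXOR of two wired-OR lines fed by the flip-flop
# pairs of the activated rows (Read Right-EXOR-Left).
STATE = {'0': (1, 0), '1': (0, 1), 'X': (0, 0), 'Y': (1, 1)}
ROWS = [  # ((S3, S2, S1, S0), states for columns C, R3, R2, R1, R0)
    ((None, None, None, 0), '00001'),
    ((None, None, 0, None), '0001X'),
    ((None, 0, None, None), '001XX'),
    ((0, None, None, None), '01XXX'),
    ((None, None, None, None), '1XXXX'),
]

def fm2_increment(s):
    bits = [(s >> k) & 1 for k in (3, 2, 1, 0)]
    lines = [[0, 0] for _ in range(5)]        # Bit 0 / Bit 1 line per column
    for pattern, states in ROWS:
        if all(p is None or p == b for p, b in zip(pattern, bits)):
            for col, st in enumerate(states):
                b0, b1 = STATE[st]
                lines[col][0] |= b0
                lines[col][1] |= b1
    c, r3, r2, r1, r0 = (b0 ^ b1 for b0, b1 in lines)
    return (c << 4) | (r3 << 3) | (r2 << 2) | (r1 << 1) | r0
```

For every 4-bit S this yields S + 1: e.g., the C column combines the always-active '1' row with a '0' in every row requiring some Si = 0, so C = 1 ⊕ ¬(S3 ∧ S2 ∧ S1 ∧ S0).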
Fig. 3.26. Switching matrix as a read-only FM
All the digital searching algorithms, including those discussed in the pre-
vious sections, can be defined using a formal notation. This formalism might
be used in the design of microprogrammed associative memories. It is based on
a subset of the APL language of IVERSON [3.73] which has been used in the
description of CPU functions in large systems, and the present form has been
worked out by FALKOFF [3.74]. The main advantage of the language of IVERSON
in this connection is that micro-operations such as masking, substitution,
as well as compression and expansion of the dimension of logical vectors
(i.e., vectors with logical variables as components), can be expressed in
a very compact way.
Statements. The elementary operations are expressed by statements numbered
by rows. Although several operations sometimes might be performed in parallel,
they are here shown on separate lines for better clarity. The statements are
executed in numerical succession except when the order of a sequence is
changed, whereupon a jump on a branching condition is made.
Notation. The content of a bit cell is a binary scalar. Contents of operational
registers are binary vectors, and the memory matrix corresponds to the con-
tents of an array of bit cells. Functional variables can be implemented by
hardware, the circuit logic, and they are integral parts of the memory. The
constants are logic values defined either by hardware connections or by
binary values buffered in registers of the control logic. Notice that the
indices in the following ascend to the right and downwards, always starting
with the value 1.
Machine Variables
M     memory matrix, an r by n array
x     argument vector, 1 by n
m     mask vector, 1 by n
s     result vector, r by 1
t, g  auxiliary result vectors, r by 1

Functional Variables
X     matrix of r rows, each identical to x
P     matrix of operators, same dimension as M
Constants
ε     row or column vector of binary ones
εj    row or column unit vector: the j:th component is 1, all others are 0
αj    row or column prefix vector: the i:th components, i ≤ j, are 1, all
      others are 0.
Statements
Let a, b, and c denote arbitrary vectoral arrays, and let arbitrary logical
vectors be denoted by u and v. Let ū be the Boolean complement of u. Matrices
formed of vectors are denoted by A, B, C, and U, respectively. Then it is
defined:

Operation      Definition
C ← a × b      C is formed as the outer product of a and b, i.e., Cij = ai bj,
               where the subscripts denote array indices.
c ← ∨/U        c is a column vector that is formed as the OR-reduction of
               each row of U, i.e., ci = 1 if at least one Uij = 1, 1 ≤ j ≤ n.
c ← ∧/U        c is a column vector that is formed as the AND-reduction of
               each row of U, i.e., ci = 1 if all Uij = 1, 1 ≤ j ≤ n.
C ← u/A        compression of A by u: C is obtained from A by deleting all
               columns of the array for which u has the value 0.
C ← u\A        expansion of A by u: C has the same number of columns as u
               has components, with the columns consisting of zeroes where u
               has zeroes, and the other columns picked up from A in the same
               order. Notice that u/C = A and ū/C = 0, a zero matrix.
C ← \A,u,B\    "row mesh" of A and B: the columns of C are obtained from the
               columns of A and B retaining their order, but always taking a
               column from A when u = 0, and from B when u = 1. Notice that
               u/C = B, ū/C = A.
c ← +/u        c is the arithmetical sum of the component values of u.
c ← (u ≠ v)    if c, u, and v have the same dimension, c has ones where the
               components of u and v disagree and zeroes where u and v agree.
x ← x*         x attains a value x* that is given externally.
a : b          branch on comparison of a and b: transfer to another line c
               may be indicated, e.g., as follows: "if =, go to c",
               which means that if a = b, a jump to the line c is to be made;
               otherwise the next line is executed.
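The structural operations above can be sketched in plain Python over lists of 0/1 columns; the function names are mine, not part of the notation:

```python
def compress(u, A_cols):
    """u/A: delete the columns of A where u has a 0."""
    return [col for uj, col in zip(u, A_cols) if uj]

def expand(u, A_cols):
    """u\\A: insert zero columns where u has a 0."""
    cols = iter(A_cols)
    zero = [0] * len(A_cols[0])
    return [next(cols) if uj else list(zero) for uj in u]

def mesh(A_cols, u, B_cols):
    """\\A,u,B\\: take a column from A when u = 0, from B when u = 1."""
    ai, bi = iter(A_cols), iter(B_cols)
    return [next(bi) if uj else next(ai) for uj in u]
```

As noted in the definitions, compressing an expansion recovers the original array, and compressing a mesh by u recovers B.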
Examples of the Use of This Notation. In the following, the rows are denoted
by superscripts and columns by subscripts. So M^k is the k:th row of the mem-
ory matrix. A comparison relation of M^k with the unmasked argument vector x
is denoted by M^k = x or M^k ≠ x, and the result is a row vector in both cases;
in the former case it has ones in the places where the bits of M^k and x agree
and 0 otherwise. In the latter relation the bits are 1 if the respective com-
ponents disagree and 0 otherwise. The comparison for equality of all rows in
the store with the argument vector, and a subsequent substitution of the
matching results into a flip-flop s(k) of the results store, is denoted by

s(k) ← ∧/ m/(M^k = x) ,   (3.25)
which means that a comparison of all the respective bits of M^k and x is made,
and in combining the results, only those bits corresponding to components with
the value 1 in the mask vector m are taken into account. (In this formalism,
0 means disabling a component.) In practice, however, the comparison circuits
at the masked bits are disabled, i.e., these comparison operations are can-
celled. For this purpose the operator vectors 1 and = are introduced. Because
1 and = are operators, relations are obtained from them by applying these
expressions to a second operand vector. So, for an arbitrary vector a, the
relation 'a 1 ε' is defined to have the value 1 for all components (identity
comparison), and 'a = ε' means the usual comparison. By row meshing, the fol-
lowing variable operator vector is defined:
p ← \1, m, =\ ,   (3.26)
in which an identity relation holds for the components of m that are 0, and
the comparison relation for components that are 1. The notation

s(k) ← ∧/(M^k p x)   (3.27)

is the masked comparison operation that has occurred, e.g., in the wired-
logic example.
Parallel-by-Bit Equality Search. Let us assume that the following operations
can be implemented by the circuit logic so that they need not be computed in
separate steps (this is where the following algorithm differs from that pre-
sented by FALKOFF):

X = ε × x ,   P = (≠ ε) × m .   (3.28)

The matrix P is an operator matrix of the same dimension as M. Then the al-
gorithm for parallel-by-bit equality search, including masking, is expressed as

1) x ← x*
2) m ← m*
3) s ← ∨/(M P X)

At the third step, use is made of the results store, in which a one is first
set in all positions.
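The effect of the masked one-step comparison can be sketched as follows; the Python rendering and names are illustrative:

```python
# Masked equality search: every stored word is compared with the (replicated)
# argument in one step; masked bit positions contribute no mismatch.
def equality_search(M, x, m):
    """M: r lists of n bits; returns s with s[k] = 1 iff word k agrees with
    x on every bit position where the mask m is 1."""
    return [int(all(mj == 0 or wk == xj
                    for wk, xj, mj in zip(word, x, m)))
            for word in M]
```

With M = [[1,0,1], [1,1,1], [0,0,1]], argument [1,1,1] and mask [1,0,1], the middle bit is ignored and the first two words match.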
Serial-by-Bit Equality Search. A cycle counter is represented by a variable
index j which represents the contents of a counting register or storage lo-
cation. Again it is assumed that there exists circuit logic to implement the
functional variable X described above. The j:th columns of M and X are de-
noted by Mj and Xj, respectively. This time the algorithm is written as

1) x ← x*
2) m ← m*
3) j ← 0
4) s ← 0
5) j : n ; if =, stop.
6) j ← j + 1
7) mj : 0 ; if =, go to 5.
8) s ← s ∨ (Mj ≠ Xj) ; go to 5.
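The bit-serial variant can be sketched in the same style; the code and names are illustrative:

```python
# Bit-serial equality search: one column per step, mismatches OR-ed into s,
# masked columns skipped; matched words are those left with s = 0.
def serial_equality_search(M, x, m):
    r, n = len(M), len(x)
    s = [0] * r
    for j in range(n):
        if m[j] == 0:
            continue                      # masked column: skip it
        for k in range(r):
            s[k] |= M[k][j] ^ x[j]        # accumulate bit mismatches
    return [1 - sk for sk in s]           # 1 = match
```

It gives the same result as the parallel version, at the cost of n memory cycles instead of one.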
Still Another Algorithm for Equality Search. If the content of any memory
element can be inverted by a control signal derived from the argument vector,
the following (unmasked) comparison algorithm can be used. If a bit of x is 0,
(less than). The argument is not masked. (Note: The array indices increase
to the right.)

1) x ← x*
2) j ← 0
3) g ← 0
4) ℓ ← 0
5) j : n ; if =, stop.
6) j ← j + 1
7) t ← g
8) g ← g ∨ (¬ℓ ∧ ¬Xj ∧ Mj)
9) ℓ ← ℓ ∨ (¬t ∧ Xj ∧ ¬Mj) ; go to 5.
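The recurrences, as reconstructed above, can be sketched as follows; the Python rendering and names (greater, less) are mine:

```python
# Bit-serial magnitude search, scanning from the most significant bit:
# 'greater' marks words already known to exceed x, 'less' marks words known
# to be below x; the first differing bit decides.
def magnitude_search(M, x):
    r = len(M)
    greater = [0] * r
    less = [0] * r
    for j in range(len(x)):               # j runs from the MSB
        t = list(greater)                 # t <- old value of greater
        for k in range(r):
            greater[k] |= (1 - less[k]) & (1 - x[j]) & M[k][j]
            less[k] |= (1 - t[k]) & x[j] & (1 - M[k][j])
    return greater, less                  # less[k] = 1 iff word k < x
```

Against the argument 101 (five), the stored words 011, 101, and 110 are classified as below, equal, and above, respectively.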
2) j ← 0
3) s ← 0
4) t ← s
5) j : n ; if =, stop.
6) j ← j + 1
7) mj : 0 ; if =, go to 5.
8) m ← εj
9) s ← ∨/(M P X) ∨ s ; go to 5.
2) j : n ; if =, stop.
3) j ← j + 1
4) m*j : 0 ; if =, go to 2.
5) xj ← 1
6) m ← m* ∧ αj
7) s ← 0
8) s ← ∨/(M P X) ∨ s
9) +/s̄ : 1 ; if >, go to 2; if =, stop.
10) xj ← 0 ; go to 2.
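One classical use of such a prefix-masked, bit-by-bit refinement is the bit-serial maximum search; the sketch below follows that reading of the fragment (my reconstruction, not the book's text):

```python
# Bit-serial maximum search: bits of the argument are tried from the most
# significant end; a 1 is kept in position j only if some stored word still
# matches the prefix assembled so far.
def maximum_search(M):
    n = len(M[0])
    x = [0] * n
    for j in range(n):
        x[j] = 1
        candidates = [w for w in M if w[:j + 1] == x[:j + 1]]
        if not candidates:
            x[j] = 0          # no word carries a 1 here; keep a 0 instead
    return x
```

After n search steps, x equals the largest stored word; each step needs only one masked equality search of the memory.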
In this chapter, several detailed circuits and physical principles for the
implementation of CAM functions are presented. It may be necessary to point
out that not all of them have been used in practice; it is quite possible
that certain solutions, although patented, may not prove practicable. It
may be said definitely that semiconductor CAM circuits which are based on
approved logic switching principles have already established their status
in computer technology. Magnetic elements, although sometimes extensively
studied, must now be considered obsolete. An important exception is the
magnetic-bubble memories which are beginning to replace disk memories; it
seems possible to add active searching and sorting functions to them. The
oldest cryotronic principles, although once extensively studied, too, can
no longer compete with LSI semiconductor circuits in packing density. On
the other hand, the most modern superconducting switches, the Josephson
junction devices, have the highest packing density of all known electronic
switches, and they are seriously considered for ultra-high-speed computers.
These devices have already been applied to buffer memories, although they
too are still at an experimental stage of development. Finally, the optical,
especially holographic memories ought to be mentioned. The general trend has
for some time been away from them, since comparable packing densities are
achievable by LSI circuits.
Hardware implementations for CAMs have existed since 1956 [1.10] and CAM
arrays currently exist as off-the-shelf semiconductor components. However,
since the leading principles of electronic components have changed at a
frantic pace during recent years, one may understand why solutions for the
CAMs have not crystallized. One aspect is that the CAM components are needed
in much smaller quantities than the circuits used in random logic and stan-
dard memory devices, which keeps their prices still high and their techno-
logy lagging compared to that of other circuits. Some promising new designs
have been suggested and studied by computer simulations, but there are only a
few commercial CAM components, fabricated by rather conventional technology
in arrays of at most 64 bits, available at this writing. Since
the future development of CAM technology cannot easily be forecast, I have
tried to concentrate in this chapter only on the explanation of the basic
principles of operation of these devices, and have left aside details of
their prevailing specifications as well as structures of auxiliary circuits
thereby needed. In any case, this review contains mainly such designs which
can be mass-produced at reasonable prices and which, therefore, can most
seriously be considered in practice. The main purpose in my presentation has
been to render possible the understanding of the functional principles and
the state-of-the-art of this technology.
Contrary to what is generally believed, the degree of complexity of a bit
cell in a hardware content-addressable memory (CAM) need not be high, at
least in principle; cf., e.g., the extremely simple solutions presented in
Sect. 4.2.1. It is even noteworthy that a CAM can be built of quite usual,
readily available standard memory modules by a new organization as shown in
Sects. 3.4.1,2 and 4.3.1,3. Nonetheless CAMs have not yet acquired a position
in computer technology which theoretically they could have. When looking for
reasons for it, one may find a few: 1) The above-mentioned lack of demand
which does not call for mass production. The fact that CAMs have not found
their way into the main memories of general-purpose computers except as small
auxiliary devices may be because it has not yet become clear what would be
the standard way of programming a computer with content-addressable memory,
and accordingly, what architecture should be selected. Widespread adoption of
new designs cannot be expected until new disciplines for their use are generally
accepted. One may recall that the boom of the third-generation computers was
to a large extent due to certain identical basic solutions applied by most
known manufacturers. 2) The simplest and cheapest principles for CAM bit cells
have not yet achieved a stage of technological development which would guar-
antee a high stability and wide noise margin. 3) The more reliable CAM bit
cells already developed involve many transistors because of the mixed modes
of access (addressed and content addressable), and are, therefore, rather
expensive. 4) Coordinate selection which in usual memories allows a high di-
rectly addressed capacity is not readily amenable to implementation in CAM
arrays; the linear-select mode of addressing is normally used. 5) The control
organizations of CAM systems, because of the rather large word length normally
used, are more complex and expensive than with usual memories; large CAMs
may, therefore, not become cost effective except in quite special devices
and applications.
The logic structures shown in Fig. 3.3 can have many implementations by the
various logic families, first of all by TTL, ECL, and MOS. Since these cir-
cuits are usually realized by large-scale integration techniques, it is pos-
sible and even necessary to include an appreciable number of active elements
per bit cell, in order to retain ample safety margins against noise, and to
make the interfacing with other units easier. However, because it is also
desirable to tend to higher storage capacities, some suggestions have been
made ultimately to simplify the basic bit cell. It may be stated that the
more complex circuit structures have been commercially available for at least
ten years, whereas the simplest constructs are still at an experimental stage
of development.
In addition to active electronic circuits, this section also reviews some
superconductive devices, old and new ones, by which CAM devices might be im-
plementable.
Fig. 4.1. Bipolar CAM bit cell; C(0) and C(1) are the control lines
If both control lines are held at a potential which is about -0.2 V or more
negative, they will draw all the collector currents and keep the emitter-base
junctions E1 and E2 cut off. This cell then cannot contribute to the word-
line current. This corresponds to the resting state or, alternatively, the
situation in which the bit position is masked. Assume now that C(0) = 0 and
C(1) is made equal to 1 (the 0 value corresponds to a voltage which is equal to
or less than -0.2 V, and the value 1 to, say, voltages over +2 V); if the
flip-flop was in the '0' state (Q2 on), then the collector current of Q2 will
switch from the C(1) line to the word line. If, on the other hand, the state
of the flip-flop was '1' (Q1 on), then the C(1) signal has no effect on the
distribution of currents. Conversely, if C(0) = 1 and C(1) = 0,
then the word line will receive current if and only if the state of the flip-
flop was '1'. Let the signals C(0) and C(1) represent the search argument bit
in the same way as in the circuits of Fig. 3.3. It may be obvious from the
foregoing that a current which is switched from this flip-flop to the word
line indicates a bit mismatch; only if the search argument matches the
states of the flip-flops at all bit positions will the word line be current-
less (M = 0).
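The current-switching rule just described can be sketched at the logic level; the code and names are illustrative only:

```python
# A cell contributes word-line current (a bit mismatch) only when the
# control line opposite to its stored state is driven; with both control
# lines at rest the bit is masked and the cell stays silent.
def cell_current(state, c0, c1):
    """state: stored bit; c0, c1: the C(0) and C(1) control lines."""
    return (c1 and state == 0) or (c0 and state == 1)

def word_line_current(word, argument, mask):
    total = 0
    for state, (bit, care) in zip(word, zip(argument, mask)):
        c0 = care and bit == 0      # drive C(0) to hunt for a stored 0 mismatch
        c1 = care and bit == 1      # drive C(1) to hunt for a stored 1 mismatch
        total += cell_current(state, c0, c1)
    return total                    # zero current <=> match (M = 0)
```

A word matches exactly when the summed word-line current is zero, which is the M = 0 condition of the text.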
A Complete TTL Bit Cell. A practical solution for a bit cell in a commercial
bipolar all-parallel CAM is shown in Fig. 4.2. It is actually a hybrid of a
TTL flip-flop and of a special diode-logic, emitter-follower comparison cir-
cuit [4.1].
Fig. 4.2. TTL CAM bit cell
The TTL flip-flop of this circuit differs from that of a usual TTL random-
access memory cell in that there are extra resistors in series with the base
leads. Because of these resistors, a sufficient voltage swing is guaranteed
for the diode logic. The latter, together with the emitter followers, imple-
ments an AND-OR function. The OR operation is in fact a Wired-OR; the M
line is assumed to have a proper impedance towards the ground, whereby any
single emitter follower (Q3 or Q4 in Fig. 4.2) is able to raise the potential
of the M line high. The same lines are used for writing and reading of infor-
mation (cf. the signals W(0), W(1), C(0), and C(1) in Fig. 3.3, respectively),
and so an additional WE (write enable) signal is needed to select either
mode of operation.
During writing, the WE signal has a high potential. For words with a low
address-line potential, this has no effect on the state of the flip-flops.
In a word which has a high address-line potential, the new state of the flip-
flop now depends on the potentials of the first emitters E11 and E12. One
of these (the one corresponding to the side to which the current is switched)
is kept at a low potential, whereas the opposite emitter is now given a pos-
itive potential which is sufficient to cut off the current from that side.
This state remains in the flip-flop after the potential of both E11 and
E12 has been returned to its previous low value.
The logic operations for content-addressable reading, for which the logic
gates G6 and G7 were provided in Fig. 3.3, are implemented in the TTL bit cell
by the diodes D1, D2, D3, D4 and the transistors Q3 and Q4.
The addressed reading of all the bits of a word, selected by the address
signal A, is performed by a normal selector-gate function implemented by
the diodes D5 and D6 and the transistor Q5.
Using high-speed components, e.g., transistors and diodes with Schottky-
barrier junctions, the access delay can be made as low as 10 to 15 ns.
TTL cells have been described by ASPINALL et al. [4.2], KINNIMENT et al.
[4.3], HILLIS and HART [4.4], HUGHES [4.5], and SERT JOURNAL [4.6].
ECL Bit Cell. The bipolar content-addressable bit cell can also be implement-
ed in ECL (emitter-coupled logic); a typical circuit consisting of 28 bipolar
transistors and six diodes per bit is shown in Fig. 4.3 (without detailed
explanation, which is left to the reader [4.7]). The speed of ECL is signifi-
cantly higher than that of TTL, but the supply voltage must be stable in
order to guarantee a noise-free nonsaturated operation. Difficulties in the
achievement of the even quality necessary for the operation of this circuit are
perhaps the reason why the price of ECL memories is still rather high. How-
Fig. 4.3. ECL CAM bit cell
ever, the price is decaying quite fast with time and, according to some
forecasts, will approach that of TTL memories in the 1980s.
Special Bipolar CAM Bit Cells. It is obvious that plenty of new circuit prin-
ciples for CAM bit cells can be developed. Since price has been a limiting
factor in the acceptance of CAMs, several attempts have been made to simplify
the basic circuitry. Some such designs, as suggested by MATSUE [4.8],
MURPHY [4.9], and BERGER et al. [4.10], are mentioned here. In all of them,
the state of a flip-flop can be set and read in the linear-select as well as
the content-addressable mode using special output transistor stages: for
reading or searching, they behave as amplifying gates, whereas for the
writing of information into the flip-flops, the output transistor is used in
the backward direction, the output current obtained from the base being used
to set the flip-flop. It was suggested by MURPHY that the flip-flops can be
provided with simultaneous horizontal and vertical sense logic; this is a
feature that might also be used in other memory circuits. We would also like
to mention a special two-way memory organization of CHU [3.15] which makes
extensive use of this feature.
Fig. 4.4. The bit cell of BERGER et al. [4.10]
The bit cell of BERGER et al. is shown in Fig. 4.4. The two PNP junction
structures represent symmetrical transistors. During searching, their lower
P regions act as emitters, whereas during reading, the upper P regions are
the emitters.
The voltage values shown in parentheses in Fig. 4.4 are stable-state va-
lues when Q1 is conducting. The state of the flip-flop formed of Q1 and Q2
can be read (in the linear-select mode) by raising the potential of the word
line 1 (WL 1) by about 0.8 V, i.e., by forcing a current into WL 1. The out-
put transistors Q5 and Q6 then act as grounded-base logic gates, and the
current is transmitted to either B(0) or B(1), depending on whether Q1 or
Q2 is conducting. In the condition shown in Fig. 4.4, the current flows into
B(0), and it must be noticed that this is equivalent to sensing a 1. During
searching, the output transistors are used in the reverse direction. To search
for 1's, the potential of B(1) is raised by 0.8 V, and to search for 0's, the
same is done for B(0). The mismatches are indicated by a current on WL 1:
for example, if the state is 1 as indicated in Fig. 4.4, and B(0) is activat-
ed, then Q5 conducts. If the state were 0, Q5 would have been cut off.
The writing of information is done by controlling the output transistors
in the backward direction, i.e., by sinking enough current at the collectors
of the flip-flop through the base of either Q5 or Q6. The write-enable oper-
ation is implemented by holding the word line WL 2 at about -0.9 V, and im-
pressing a current on the selected bit line.
A Bipolar FM Bit Cell. One possible implementation of a cell for the FM is a
double TTL flip-flop, of which the content-addressable reading function, si-
milar to that depicted in Fig. 4.1, is shown in Fig. 4.5 [3.59].
Fig. 4.5. A bipolar FM bit cell with its Bit 0 and Bit 1 lines
The states of the cell, as shown in Fig. 4.5, are defined in the following
ways: state '0', Q1 and Q3 on; state '1', Q2 and Q4 on; state 'ϕ', Q2 and
Q3 on. The fourth state, with Q1 and Q4 on, is forbidden in Functional Memory 1,
while it is utilized in Functional Memory 2. The bit value 0 is found
from the memory by holding the bit 0 line high and the bit 1 line low. The
bit value 1 is found with the bit 0 line low and the bit 1 line high, respectively.
If masking is needed, both bit lines are held low. The matching of
all specified bits of a word is indicated by the absence of mismatch, i.e.,
by the absence of word line current in the searching operation. If the cell
is in the 'ϕ' state, it does not contribute to the word line mismatch current
for any bit line control condition whatsoever. The logic MATCH signal is
obtained by complementation of the logic value of the word line currents. In
the following it is assumed that true MATCH signals are always obtained from
the complete memory array.
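The match logic described above — a word matches when no unmasked bit raises a mismatch indication, and MATCH is the complement of that indication — can be summarized in software. The following Python fragment is an illustrative model only (the names and data representation are assumptions, not part of the original circuit description); it scans sequentially the word locations that the hardware would interrogate in parallel:

```python
# Illustrative model of word-parallel masked equality search in a CAM:
# each unmasked bit differing from the search argument would draw a
# mismatch current on the word line; MATCH is the absence of that current.

def masked_match(word, argument, mask):
    """True iff 'word' equals 'argument' in every unmasked bit position.
    word, argument, mask are equal-length tuples of 0/1; mask[i] == 1
    means bit i is masked (ignored) in the comparison."""
    mismatch = any(m == 0 and w != a
                   for w, a, m in zip(word, argument, mask))
    return not mismatch                 # MATCH = complement of mismatch

def search(memory, argument, mask):
    """Return the indices of all matching word locations. The hardware
    tests every location simultaneously; this loop only models the result."""
    return [i for i, word in enumerate(memory)
            if masked_match(word, argument, mask)]

memory = [(1, 0, 1, 1), (1, 1, 1, 1), (0, 0, 1, 0)]
# Search for bit 0 = 1 and bit 3 = 1, with bits 1-2 masked:
print(search(memory, (1, 0, 0, 1), (0, 1, 1, 0)))   # -> [0, 1]
```

Masking a bit thus simply removes it from the comparison, exactly as holding both bit lines low does in the circuit.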
Other Bipolar Circuits. Further designs of CAM circuits based on bipolar
components have been presented by BIDWELL and PRICER [4.11], REPCHICK [4.12],
HILBERG [4.13], PRICER [4.14], and ORLIKOVSKII [4.15].
MOS Bit Cells. The circuits of KOO [3.4], BURN and SCOTT [4.16], WALD [4.17],
LEA [4.18], SHEAD [4.19], as well as PRANGISHVILI et al. [4.20] are very
much related in principle. In each of them, the bilateral nature of MOS transistors
is utilized in order to implement an AND function during reading and
writing. All of the MOS bit cells mentioned have been implemented as integrated
circuits, but it is difficult to guess which of them will be preferred
in mass production. Making an arbitrary choice, the circuit of WALD
shall be described here. It is depicted in Fig. 4.6.
The operation of this circuit can be studied in two parts, first dealing
with the writing and reading of information in the addressed mode. The second
discussion is then related to the content-addressable search. Writing is performed
in a similar way as in random-access MOS memories. The word control
line W is activated, whereby the switching transistors Q3 and Q4 are made to
conduct. One of the bit lines B(0) and B(1) is kept low (grounded), whereby
the flip-flop formed of Q1 and Q2 is set to a corresponding state. If B(0)
is grounded, the new state will be 1. During addressed reading, the word
line W is also activated, but the bit lines B(0) and B(1) are now connected
to high-impedance sense amplifiers. Depending on the present state of the
flip-flop, current is now impressed on one of the bit lines through Q3 or Q4,
thus indicating the state of the flip-flop. Only one of the word lines is
activated at a time, so a Wired-OR operation is implemented on the bit lines.
Fig. 4.6. MOS CAM bit cell
Fig. 4.7. MOS CAM bit cell with bilateral read/write. (The control signals
correspond to those explained with Fig. 4.8)
An ultimately simplified circuit, with only four MOS transistors per bit
cell, has been invented by MUNDY [4.27-29]. In order to make its understanding
Fig. 4.8. Simple MOS CAM bit cell

easier, one half-cell of it, containing only two transistors, is first explained
with the aid of Fig. 4.8. This analysis follows closely that of LEA
[4.30,31]. The half-cell is able to act as a simple addressable memory element
in addressed writing and reading modes, as will be explained below. The
MOS transistors are of the P-channel type, and the binary information in the
cell is stored in the form of electrical charge at the gate capacitance of
Q1. Presence of a charge corresponds to the value 1, and absence of it to the
value 0. There are neither constant supply voltages nor reference potentials
in this cell, and its state transitions, as well as transfer of information
from it, are completely mediated by three lines with signals W, D, and VG,
respectively. The operation of the cell is "dynamic" in the sense that the
stored data will be lost by charge leakage unless special "refreshing" operations
are performed intermittently with the aid of external circuits (the
VG control circuit). These operations are equivalent to read/write cycles. The
D (data) as well as the W (write) signal assume one of the states logical 0
(+12 V) or logical 1 (0 V). (The negative logic convention is thereby applied.)
The WR, RS, and VG signals assume different voltage combinations
depending on the operations to be performed. For their description, three
operational states, termed State 1, State 2, and State 3, respectively, have
been defined in Table 4.1.
State    W [V]    WR [V]    RS [V]    VG [V]
1          0       -8       +12       -2
2        +12       -8       +12      +11
3          X      +12        -8      +12

X = either 0 or +12
By convention, the state of the bit cell is said to be 1 if the left half-cell
is in state 0 and the right half-cell in state 1, respectively. Conversely,
the state of the bit cell is 0 with the left half-cell in state 1
and the right half-cell in state 0. The two other value combinations (0,0)
and (1,1) of the half-cells are forbidden in this design, but at least the
first of them may be utilized in the functional memories described in Sect.
3.7. This kind of complicated arrangement is necessary because it is
easy to indicate a match between the data line signal value 1 and the corresponding
state of the half-cell, whereas the half-cell states 0 are not
Fig. 4.10a,b. Cryotron memory: a) bit cell, b) cell array (enable control
not shown)
will be at least one stable operating point at (0, Is). In order to make the
device switch from this point, it is possible to lower the peak value Im by
the application of control current to a value Im(H) < Is, whereby a switching
cycle, similar to that encountered in tunnel diode circuits, is triggered.
This cycle ends in different ways depending on the value of R. Three principally
different cases with different values of R can now be distinguished:
1) The load line always makes a second intersection with the right-hand
branch.
2) There is a second intersection only in the absence of control current.
3) There is never a second intersection.
With a typical junction, these cases may correspond to values of R of, say,
0.3 Ω, 0.2 Ω, and 0.1 Ω, respectively.
In the first case, termed the latching operation, the circuit makes a
transition which converges to the second stable point. The exact form of the
trajectory, which is usually oscillatory, depends on the circuit reactances.
In order to reset the circuit back to the operating point (0, Is) it is necessary
to lower the supply current for a moment in order to shift the load
line below the valley point (Vv, Iv).
In the second case, named the nonlatching operation, there will be a trajectory
which also converges to the second stable point as long as the control current
is on; after the control current is removed, there will be an automatic transition
to the only remaining stable point (0, Is).
The main motive for the introduction of bit-serial and word-serial CAM architectures
has always been ultimately to lower the cost of the bit cell. One of
the word-parallel, bit-serial memory technologies which was accepted for the
first large-scale CAM installations is the plated-wire memory principle discussed
below in Sect. 4.3.2. It was later replaced by LSI random-access memory
modules, not only because of higher speed but also because of the skew-addressing
mode of operation used.
The memory component needed for reading and writing by bit slices, as explained
in Sect. 3.4.2, shall be a linear-select random-access memory (RAM) module.
Since with bit-serial operation it would be desirable to have a small access
time, the Schottky-barrier-diode-clamped bipolar (TTL-type) memory modules
would perhaps be most suitable. An example of a commercial 256-word by 1-bit
memory module with built-in decoder and Wired-OR output is the Intel 3106 shown
in Fig. 4.15; it is to be noted that no control is necessary for reading,
whereas for writing, the WR pin must be activated. If these modules are to be
connected into a larger array, up to 2 K words of storage locations, the chip
select (CS) inputs can be used.
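The address decoding implied by such an expansion is straightforward. The sketch below is illustrative only (the pin-level details of the Intel 3106 are not taken from the text): with eight 256-word modules, the low 8 address bits go to each module's built-in decoder, and the high 3 bits select one chip via its CS input:

```python
# Illustrative address split for 2 K words built from eight hypothetical
# 256-word x 1-bit modules: high bits drive chip select, low bits the
# module's internal decoder.

def decode(address):
    """Split an 11-bit word address (0..2047) into
    (chip_select_index, word_address_within_chip)."""
    assert 0 <= address < 2048
    return address >> 8, address & 0xFF

print(decode(0))      # -> (0, 0)    first word of chip 0
print(decode(300))    # -> (1, 44)   word 44 of chip 1
print(decode(2047))   # -> (7, 255)  last word of chip 7
```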
Assume now that all words with the '0' state in this bit position have to
be found. When the IS current equal to +IC is applied on the bit line, all
cores in this bit column with the state '0' will switch to '1', thereby inducing
a voltage signal in the SW line. If the state was '1', no induction
voltage due to a state change will be detected. Similarly, if all words with
the bit state '1' have to be searched, the bit current is set to -IC. The
no provision for addressed reading since this memory is normally used for
location of variables in parallel computations. Its physical principle is the
following. When a nonmagnetic substrate wire is plated with a thin nickel-iron
The two primary objectives in the design of electronic circuits for active
semiconductor memories are a high packaging density and a small power/speed
ratio. The MOS (metal-oxide-semiconductor) circuits are advantageous in both
of these respects. The linear dimension of a single transistor on the silicon
chip is typically of the order of 20 μm, and one bit of storage is typically
implementable by six transistors.
There are two basic types of MOS shift registers which may be taken into
consideration in a word-parallel, bit-serial content-addressable memory,
namely, the static and the dynamic one. One stage (bit position) of a static
MOS shift register, together with the waveforms of the timing signals needed
to effect one shifting operation, is shown in Fig. 4.19. Information in the static
Fig. 4.19a,b. One stage of a static MOS shift register: a) circuit, b) timing signal waveforms
shift register will be saved for indefinite periods of time, and the shifting
can be started asynchronously.
Contrary to the above circuit, information in a dynamic MOS shift register
is volatile and will be lost unless the memory is "refreshed", i.e., the
contents are continually shifted end-around at a certain minimum clock frequency.
Accordingly, the operation of a dynamic shift register is usually
synchronous. One practical implementation, the four-phase shift register together
with the waveforms of its four clocking signals, is shown in Fig. 4.20.
Fig. 4.20a,b. One stage of a dynamic (four-phase) MOS shift register: a) circuit, b) control signals
In this solution, the four clock phases must be divided cyclically between
the different stages so that the gates open in succession and the bit values
"ride on the waves". For more details of these and related circuits, see,
e.g., [3.62].
Another elementary component for dynamic memory which has the same fabrication
technology as the MOS circuits but does not operate on standard logic
signals is the charge-coupled device (CCD). It, too, holds great promise for
future computer technology. The construction of a CCD is even simpler than
that of the MOS shift registers, and the packaging density is correspondingly
higher. In return, its speed of operation is somewhat inferior to that of
the shift registers, a drawback which, in large-scale word-parallel operation,
is not particularly serious, however. A review of the present state of CCDs can
be found in [4.163].
The basic design of a CCD is a silicon substrate, coated with a thin
(0.12 μm) insulating SiO2 layer over which a series of metal electrodes is
sputtered (Fig. 4.21a). Contrary to the MOS components, no metal electrodes
are evaporated directly onto the semiconductor in this device. When a sufficiently
high voltage is applied on one of the electrodes (positive with
P-type silicon), the induced field will attract minority carriers, with
the result that a local conductive channel (N-type in this example), as well
as a potential well, are formed. The channel and the well are capable of holding
mobile charges (electrons in this case). By a suitable timing of the control
voltages applied on consecutive electrodes, the channel and the well can be
made to move along the surface (Fig. 4.21b). Any amount of mobile charge contained
within them will then be moved, too.
The first task is to load the channel at one end of the structure with
a proper amount of mobile charge, whereafter, since the surrounding substrate
in the neutral state is a good insulator, the charge (and the binary information
associated with it) will be propagated along the surface. As there is a minor
amount of leakage of charge from the moving channel, however, the electrode
structure cannot be made indefinitely long, and the memory must be "refreshed"
by amplification of the signals received at the other end, whereafter they
are left to recirculate in the same way as with delay-line memories and the
dynamic shift registers.
As the memory capacity of a CCD, or the maximum useful length of the electrode
structure, directly depends on leakage effects, especially those which
occur near the surface, some improvements to this structure have recently
been made. One of them is the so-called buried channel, which means that if,
for instance, on a P-type silicon substrate, a thin layer of N-silicon is
formed by suitable doping, then the potential well, as well as the channel,
will be shifted in the vertical direction below the p-n junction, away from
the charge traps and other causes of leakage at the surface. An efficiency
of charge transfer from one electrode location to another of about 99.99 per
cent is thereby attainable, with the result that the recirculating bit string
can be made several thousand bit positions long. This is already more
than needed in a word-parallel, bit-serial content-addressable memory, and
the new types of CCD might be considered as a memory medium in the word-serial
designs, too.
One practical electrode structure, together with two-phase controlling
waveforms, and the shape of potential wells which results from auxiliary
implanted electrodes (due to extra P-type doping) are shown in Fig. 4.21c
and d [4.163].
Some typical specifications of CCD devices, those already achieved, and
projections for the 1980s, are shown in Table 4.2 [4.163].
Storage Mechanisms of the MBM. It may have become clear from the foregoing
that the bubbles are made to carry binary information: if the spacing of the
bubbles can be controlled and maintained during their movement, then an existing
bubble may represent the value '1' and an empty location in a row of
bubbles is equivalent to '0'. In fact, bit densities of 10^6 bits/in^2
(155 kbits/cm^2) are readily achievable. But although bubbles can be made to
move distances which are long when compared with their diameter, special provisions
are nonetheless necessary to preserve the stability and configuration of bubble trains.
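The quoted unit conversion is easily checked (1 in = 2.54 cm, so 1 in^2 = 6.4516 cm^2):

```python
# Verify the quoted bubble density: 10^6 bits/in^2 in kbits/cm^2.
bits_per_in2 = 1e6
bits_per_cm2 = bits_per_in2 / 2.54**2   # 1 in^2 = 6.4516 cm^2
print(round(bits_per_cm2 / 1e3))        # -> 155 (kbits/cm^2)
```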
New bubbles are created in the major loop by the externally controlled
generate-current electrode; this, however, only implements the writing of
the bit value '1', which takes about 10 μs. To write a '0', a bubble must be
removed from the train. This is done at the replicate electrode, which either
passes the bubble or annihilates it. Detection of bubbles for readout from
the major loop is also made at this electrode structure. If the magnetoresistive
effect is used for detection, bubbles are elongated under a chevron-type
Permalloy structure to maximize coupling. Output signals have a level
of a few mV and they must be amplified.
Transfer of information between the major and minor loops is made at
transfer positions at the tops of the minor loops. A record is usually stored
so that it occupies the same bit position in the minor loops. If a record is
suitably positioned in the major loop, writing of it into the minor loops or
reading of it can occur in parallel over all bits.
In order for blocks of data to be correctly transferred between the major
and the minor loops, they must enter the major loop correctly timed such
that, e.g., the segment in the minor loop into which a block is written will
be rotated into the right position in synchronism with the contents of the
major loop, whereafter the transfer can begin.
The major-minor-loop organization, as mentioned, was intended to speed
up addressed reading and writing. For the selection of a particular loop and
the segment in it, an external counter system is necessary, very much analogous
to the selector system used to define a track and sector in a magnetic-disk
memory.
Several alternative architectures, however, can be devised. It seems
that independent storage loops have some value in practice, especially
in the content-addressable organization discussed below. The interactive
loop organization, on the other hand, can be developed in many ways. The
following are only a few ideas: 1) Loops differing in length by a fixed amount
can be used to shift or shuffle bubble trains relative to each other in mutual
read-write operations. 2) Propagation of bubbles in the loops can be made bidirectional,
which further aids in shuffling, sorting, and many kinds of
editing, indexing, and stack operations. 3) Architectures with several major
loops allow, e.g., interleaving operations. 4) The loops may have junctions,
and the flow of bubbles can be steered into alternative branches depending
on control signals acting at the junctions ("bubble ladder"). This is an
especially powerful method for sorting. 5) Clock rates can be made different
for different loops, and the trains can be stopped ("frozen") and restarted.
This operation preferably ought to be program-controlled.
An Architecture for Content-Addressable MBM. The first suggestion for a bubble
CAM was based on logic operations performed by the electrode structure [4.167].
It seems, however, that the word-parallel, bit-serial CAM architecture, in
which MBMs are used as delay lines in the word locations, is particularly
suitable for this technology. The most central idea in making a content-addressable
MBM cost-effective is to implement the results storage by magnetic-bubble
technology, too.
MURAKAMI [4.168] was the first to propose a word-parallel, bit-serial CAM
using shift-register bubble memories. A somewhat simpler design, devised by
LEE and CHANG ([4.169], cf. also [4.166]), will be reviewed here. This structure
implements the parallel equality-match operation and also opens the
passages at the outputs of all matching words for their readout.
The mechanism used to register matching conditions is based on the so-called
loadable-latch control of bubble propagation. A bubble, because of
the magnetic polarization it causes, can be latched at a small Permalloy disk
which belongs to the control structure. Latching occurs only at the coincidence
of the bubble and an electrical current simultaneously flowing in a
control conductor (Fig. 4.25); without this current the train of bubbles will
pass the disk. After a bubble has been latched at a disk, however, the magnetic
dipole moment of the bubble will be enough to repel other bubbles and
to prevent their flow along their intended path.
226
Argument bits:
0 1 0
.0 ••
0
.0
0 .0
0 ~o
•• 0. •• 0 ••0 .,......
First Bubble 0 0 1 0 1
pass: trains
0000 000 00 0
0 0 0 0
0.0. 0.0 O. 0
0 0 0
Permalloy disk
Control conductor
~
0 0
0.00
0
0.0
0 °CO
0
0
00.0
01 OO~O
00
0 0
0
1
0
Second
pass: •• 0
.. ~O ·~O Ct;
• 0.0 .0 • .0 .... - - --
0 0 0 0
word has been received, the latch has become loaded if there is at least one
coincidence of the bit value 1 between the search argument and the bubble train.
Matching by logical equivalence means, however, that the bit values
0 also have to be compared. This is done in a second pass, during which the bits
of the search argument as well as those of the bubble train are inverted. In
the MBM, inverting of bits occurs at the I/O position of the loops. During
this second pass of operation, again, a bubble is latched at the disk if
there is at least one coincidence of the current pulse with an existing
bubble in the train. After both passes, the latch will remain unloaded if
and only if there was a mismatch at every bit position of the original search
argument and the original bubble train. In other words, this mechanism detects
the logical equivalence of the stored information with the logical
complement of the search argument applied. Naturally, the search argument
then must be the complement of the keyword information.
Masking of bits is simple in this method: it is necessary only to keep
the control conductor currentless in both passes when handling the corresponding
bit position.
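The two-pass rule can be modelled compactly. The Python sketch below is an illustrative abstraction only (names and encoding are assumptions, not LEE and CHANG's notation): the latch loads on any coincidence of a control-current pulse with a bubble in either pass, and an unloaded latch signals equality of the stored train with the complement of the applied argument, so the complemented keyword is applied:

```python
# Illustrative model of the two-pass loadable-latch equality match.

def latch_after_two_passes(train, applied_argument, mask):
    """train, applied_argument, mask: equal-length lists of 0/1;
    mask[i] == 1 means no control current in either pass at bit i."""
    loaded = False
    # First pass: latch on coincidence of current (argument 1) and bubble.
    loaded |= any(m == 0 and a == 1 and b == 1
                  for b, a, m in zip(train, applied_argument, mask))
    # Second pass: argument and train both inverted -> coincidence of 0's.
    loaded |= any(m == 0 and a == 0 and b == 0
                  for b, a, m in zip(train, applied_argument, mask))
    return loaded

def equality_search(trains, key, mask):
    """Apply the complement of the keyword; an unloaded latch == match."""
    applied = [1 - k for k in key]
    return [i for i, t in enumerate(trains)
            if not latch_after_two_passes(t, applied, mask)]

trains = [[1, 0, 1], [1, 1, 1], [0, 0, 1]]
print(equality_search(trains, key=[1, 0, 1], mask=[0, 0, 0]))  # -> [0]
```

A masked bit position simply carries no control current in either pass and therefore never loads the latch, exactly as stated above.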
The mechanism presented above does not yet involve any provision for multiple-match
resolution; this feature must be implemented by additional bubble
logic or by an external electronic circuit. A 256 Kbit MBM array, complete
with signal connections but without external magnets, is shown in Fig. 4.26.
Further Works on Bubble CAMs. A number of articles have been published on
MBMs applied to content-addressable search. KLUGE [4.170] describes a simultaneous
readout logic implementable by the MBM technology. Various features
of MBMs have been described by AVAEVA et al. [4.171,172], ALLAN [4.173],
LEE and NADEN [4.174], IL'YASHENKO et al. [4.175-179], and NADEN [4.180a].
An up-to-date review of magnetic-bubble devices has recently been worked
out by ESCHENFELDER [4.180b].
Fig. 4.26. A 256 Kbit MBM array with connectors (by the courtesy of
Hitachi, Ltd.)
nation of a signal level which results from many superimposed wave intensities;
it is finally the stability of the discrimination circuits which determines
the maximum capacity of these memories. It may be mentioned that discrimination
methods for the resolution of parallel matches have been tried in
electronic memories, too, but they have not been shown practicable. Parallel
matching on the basis of superimposed signal levels cannot be used in magnetic
memories at all, since the variations in the signal '0' and '1' levels, due to
partial switching and reversible polarization, are too large to allow this.
In this section a straightforward suggestion for a parallel-readout (content-addressable)
memory by magneto-optical means is presented first. After
that, suggestions for content-addressable memory based on holography are discussed.
It has been claimed that optical memories would possess the storage capacity
necessary for extensive files. For the time being it is too early to
verify this, because even the usual (addressed) optical memories exist mainly
as laboratory versions. As for content-addressable holographic memories, it
must be pointed out that they have not yet been implemented as
mass-memory versions even in the laboratory. Nonetheless the presentation of
their principles may be useful to show the feasibility of this approach.
The memory elements in this solution are thin-film dots of magnetizable material.
The writing of information in a magneto-optical memory is made as in
a conventional magnetic thin-film memory, in a way analogous to that described
for the plated-wire memory. For the content-addressable reading of
information from the magnetized dots, which is more interesting in this connection,
the longitudinal magneto-optical effect as suggested by SMITH and
HARTE [4.181,182] can be used.
In Fig. 4.27, a collimated, plane-polarized beam of light falling on the
magnetic dots is passed through electro-optical films, one for each bit line.
Light is then reflected from the magnetic dots to an analyzer which consists
of a set of photodetectors, one per word. The light intensities reflected
from all bits in a word are summed up. In front of the photodetector there
is another polarizing sheet, the analyzer. Its polarizing direction is such
that if there were no extra change in the polarization plane by the electro-
optical films associated with the polarizer or by the dots, all light would
be absorbed by the analyzers. The reading of bits can now be performed using
Fig. 4.27. Magneto-optical CAM readout: bit interrogators (polarizer with electro-optical films) and word detectors (analyzer with photodetectors)
(4.1)
(4.2)
Now FA(r)FA*(r) and FB(r)FB*(r) are the wave intensities, which are assumed
constant with r over plate P, and for simplicity they can both be normalized
to unity. On the other hand, -A[FA(r)FB*(r)]FA(r) is a term in which the
expression FA(r)FB*(r) represents a so-called destructive interference and
is not able to produce any image. The noise caused by this term may be neglected.
It is further possible to assume that the initial transmittance of
the plate is T(r) = T, i.e., constant with r. For the field behind the hologram
it is, therefore, possible to write

(4.3)

F(r) ≈ (T - 2NA) FA(r) - A Σ(k=1..N) [FAk*(r) FA(r)] FBk(r) .   (4.4)
The recollection, represented by the sum over N terms, is then a linear mixture
of images of the B patterns, with relative intensities that depend on
the degree of matching of the field patterns FAk(r) with the field pattern
FA(r) that occurs during reading. So if the A pattern used during reading
as the search argument were identical with one of the earlier A patterns
used as "keyword", then for one value of k, FAk*(r) FA(r) would be equal to
unity, and the corresponding term in the sum, representing the associated
field FBk(r), would dominate. The other terms in the mixture in general have
variable phases and represent superimposed noise due to "crosstalk" from the
other patterns. If the various A patterns were independent random images or
randomized by modulation with irregular "speckles", or if they had their nonzero
picture elements at different places, the noise content would probably
remain small. When the information patterns must represent arbitrary information,
however, this assumption is in general not valid.
On the Concept of Addressed Storage Location in Holographic Memories. As
mentioned above, when a content-addressable memory is used to store arbitrary
(normally binary) patterns, a large storage capacity cannot be achieved by
the previous superposition method on account of the crosstalk noise between
the patterns. It then becomes necessary to store the partial holograms
that represent different entries on locally separate areas of the film, corresponding
to addressed storage locations. The addressed writing and reading
of the small holograms can be done by the same technique that has been applied
with conventional holographic memories; this will briefly be reviewed below.
When it comes to content-addressable reading, however, special provisions
are necessary.
A holographic memory plane may comprise, for example, an array of 100 by
100 small holograms, each with an area of 1 mm by 1 mm. Every small hologram
beams are small, the distribution of the electromagnetic field over the hologram
corresponds to the two-dimensional Fourier transform of the spatial distribution
of the field amplitude in the A pattern. Assume, without loss of
generality, that the field amplitude of the A pattern is a real function A(x),
where x is the spatial coordinate vector of the plane in which the A pattern
is defined. In other words, it is assumed that (for a unit area)

∫(Sx) A²(x) dx = 1 ,

where Sx denotes the plane corresponding to x. Assume now that the field of
an A pattern recorded on the hologram is Ak(x), with the corresponding field at
plate P equal to FAk(r). When the field on the output plane, corresponding
to the recollection of the B beam, is integrated spatially and its intensity
is detected as shown in Fig. 4.29 (notice that during reading the real B beam
has been switched off), for the output one obtains

IB ∝ [∫(Sx) Ak(x) A(x) dx]² ,   (4.6)

where this expression results from general properties of Fourier transforms.
Since it was assumed that Ak(x) and A(x) are real amplitude distributions,
(4.6) then states that the output is directly proportional to the
square of the inner product of the patterns Ak(x) and A(x).
Fig. 4.29. Response from a hologram with spatial integration of the output
beam intensity
In practice, the patterns Ak(x) and A(x) are formed of binary picture
elements corresponding to one of two light intensities; one of the values is
usually zero. The output IB is then zero if and only if all light spots of
A(x) match with dark spots in Ak(x).
It was assumed in the above analysis that the B pattern had a constant
intensity. It can be shown that the same analysis basically applies to a
system in which the B beam carries an information pattern.
There was also another feature which was purposely neglected above,
namely, that the small holograms were formed by interference of three beams.
The narrow control beam, however, has no effect on the readout, since its
"recollection" points in a direction where detection of light intensity is
not made in this example.
A masked equality search can now be implemented in the following way.
Every bit position in Ak(x) and A(x) must be provided with two squares, one
of which is dark and the other light. Let us call these intensity values 0
and 1, respectively. The logic value of the bit position is '0' if the value
combination in the squares is, say, (1,0), whereas the bit value is '1' if (0,1)
is written into the squares. If the value combination is (0,0), then it
corresponds to a masked bit position; in the previous content-addressable
memories, masking was usually made in the search argument, whereas in the
functional memories discussed in Sect. 3.7, masking can also reside in the
stored information. The value combination (1,1) shall be forbidden.
Consider now the integral in IB; the integrand is zero at all places in
which either Ak(x) or A(x) has a masked value. If and only if the bit values
of Ak(x) and A(x) at all unmasked positions are logic complements of each
other will the output IB be zero. This suggests a straightforward implementation
of the equality match operation: the bit values of the pattern to be used
as the search argument are inverted logically, and these values, together with
the masking markings, then constitute the pattern A(x). A match in unmasked
portions is indicated by the absence of light in the output detector corresponding
to this small hologram.
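The dual-rail encoding and match criterion can be summarized in a short model. In the sketch below (an illustrative abstraction, not from the text; 1 denotes a light square, 0 a dark one), the detector output plays the role of the overlap integral, and a match is its vanishing:

```python
# Illustrative model of masked equality search with two-squares-per-bit
# encoding: '0' -> (1, 0), '1' -> (0, 1), masked 'X' -> (0, 0).

ENCODE = {'0': (1, 0), '1': (0, 1), 'X': (0, 0)}

def pattern(bits):
    """Flatten a string like '1X0' into its two-squares-per-bit pattern."""
    return [s for b in bits for s in ENCODE[b]]

def matches(stored, argument):
    """Equality match: the argument bits are logically inverted (masked
    positions stay masked); the detector sees light iff some square is
    lit in both patterns, so zero overlap means match."""
    inverted = ''.join({'0': '1', '1': '0', 'X': 'X'}[b] for b in argument)
    overlap = sum(s * a for s, a in zip(pattern(stored), pattern(inverted)))
    return overlap == 0        # no light at the detector -> match

print(matches('101', '1X1'))   # -> True:  bits 0 and 2 agree, bit 1 masked
print(matches('101', '111'))   # -> False: bit 1 differs
```

Note that masked positions contribute two dark squares in either pattern, so they can never produce light at the detector, which is exactly the masking behavior described above.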
In holography, the coherent light beam corresponding to A(x) is, during
reading, spread simultaneously over all small holograms. If in the output
plane there is an intensity detector for each of the small holograms, their
responses then correspond to multiple response detection, and they may be buffered
by flip-flop states in a results store. The rest of a holographic content-addressable
memory system is then analogous to that described earlier, e.g.,
in Sect. 3.3.3. The responses can in turn be handled by a multiple response
resolver, and the address code thereby obtainable is used to control the
final readout process.
Addressed Reading. During writing, a constructive interference between three
beams, namely, the A beam, the B beam, and the narrow control beam was formed.
The combination of the A and control beams, or either beam alone, is then
Many systems and organizations exist in computer technology and data management
in which the CAM is included as an auxiliary memory unit or an otherwise
essential system part. This chapter reviews a few such examples: advanced
memory architectures and control structures implementable by CAM circuits.
The main emphasis is on pointing out the functional role of the CAM in the
system, and its interaction with the other units. Some of the most important
applications are mentioned in the proper context, and some related hardware
principles can be found in Chap. 6. The particular applications discussed in
this chapter are:
1) Virtual memory.
2) Dynamic memory allocation.
3) Content-addressable buffer.
4) Programmable logic.
Fig. 5.1. Memory hierarchy: primary storage, secondary storage, tertiary storage, and peripheral units
are transmission systems using eight or nine parallel lines, i.e., one byte
wide. One stored byte is usually equivalent to a character code, most often
expressed in ASCII. The data transfer paths between the slow ferrite-
core memories and the CPU are normally wider, one to four word lengths, and
the transfer may be interleaved in the sense that the reading command is
simultaneously given to several memory units or memory banks, but the data
words are read in escalated order, being transmitted through common lines.
Interleaving is advantageous in speeding up data transfer if the access time
of a memory unit is significantly longer than the transmission time.
Interleaving. There exist many organizational solutions in computer engineer-
ing for which the principal motivation has been to circumvent handicaps
characteristic of a particular technology. One of them is interleaving, which
was introduced to speed up retrieving of large but slow ferrite-core memories
used in the big third-generation computer installations. These memories, used
as backing storages, were physically separate from the CPU (mainframe), where-
as the primary storages (mainframe memories) were parts of the CPU. Communi-
cation between the backing storages and the CPU was through data paths, the
interconnection cables, including as many as 128 parallel lines.
By virtue of the electronic technology used, the transmission delays,
and especially the intervals at which the CPU was able to receive parallel
words from the data path were much shorter than the access times of the ferrite-
core memories, although the latter were of the order of 1 μs. In order to
fully utilize the transmission band width of the data path, the latter was
used to multiplex information from several sources. A typical large third-
[Figure: interleaving timing — a reading command is given to all banks simultaneously; after the access time, the data of blocks i+1 to i+5 become available in the memory registers and are transmitted in order]
Paged Virtual Memory. Assume that the backing storage is divided into blocks
(pages) with, say, 256 words each. Assume that there were 4 Mwords in the
backing storage; then each word could be identified by giving a 14-bit block
address, and an 8-bit word address or within-block address. The block address
is hereupon named the tag. Now assume that the primary storage has a capacity
which is a multiple of 256. Actually the primary storage could be fairly
small, in our fictive example 1 K, corresponding to four sections, one block
each. The primary storage is hereupon named the buffer. A control logic will
be assumed in the computer system which is able automatically to transfer or
copy the contents of any block of the backing storage to any block section
in the buffer. The CPU, during computation, then could directly make use of
the information stored in the fast buffer provided that it were known from
which block in the backing storage it was taken. For this purpose every sec-
tion of the buffer is provided with a tag register into which the correspond-
ing block address is automatically written every time when a block is trans-
ferred. Assume now that the CPU issues an instruction referring to a word in
the backing memory, and the corresponding block has already been transferred
to the buffer. When the 14 most significant bits of the address part of the
instruction are compared with the contents of the tag registers, one of the
latter is found to agree indicating that the needed word has been buffered in
the corresponding section. Thus, using the eight least significant bits of
the address, the relative position of the word within the block will be found,
and the word can be read out of the buffer.
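The tag-matching step described above can be sketched in software. The following is a minimal model of the fictive example, with all names illustrative: a 22-bit word address is split into a 14-bit tag and an 8-bit within-block address, and the tag is compared against the tag register of every buffer section.

```python
# Sketch of the fictive paged example: 4 Mwords of backing storage,
# 256-word blocks, four buffer sections with one tag register each.
# All names are illustrative, not taken from any real machine.

BLOCK_SIZE = 256          # words per block (8-bit within-block address)
NUM_SECTIONS = 4          # buffer sections, one tag register each

def split_address(addr):
    """Return (tag, within_block) for a 22-bit word address."""
    return addr >> 8, addr & 0xFF

def buffer_lookup(addr, tag_registers, buffer_sections):
    """Compare the tag against every tag register; on a hit, read the
    word from the matching section, otherwise signal a miss."""
    tag, offset = split_address(addr)
    for section, stored_tag in enumerate(tag_registers):
        if stored_tag == tag:              # tag comparison: a hit
            return buffer_sections[section][offset]
    return None                            # miss: block must be transferred

# Example: the block with tag 3 is buffered in section 1.
tags = [None, 3, None, None]
sections = [[0] * BLOCK_SIZE for _ in range(NUM_SECTIONS)]
sections[1][5] = 42                        # word 5 of the buffered block
print(buffer_lookup((3 << 8) | 5, tags, sections))   # hit → 42
print(buffer_lookup((7 << 8) | 5, tags, sections))   # miss → None
```

In hardware all four comparisons happen in parallel; the sequential loop here is only a functional model.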
Apart from considerations associated with the automatic transfer of in-
formation, there will arise some principal problems. For instance, there are
only four sections in the buffer of this example. When, and in which order
shall they be filled, and what is to be done when all are full? Obviously the
sections can be filled in any order since a tag is sufficient to identify a
block. As to the remaining parts of the question, obviously if the buffer was
initially empty the transfer of the first block could be initiated at the
first memory-reference instruction; the next references are then likely to
be made to the same block. We shall discuss this argument, associated with
the philosophy of buffering, below in Sect. 5.1.3 in more detail. As the first
rule-of-thumb, it may be stated that the transfer of a block must be initiated
every time when the sought word does not already exist in the buffer, and this
can be easily deduced from the fact that the contents of none of the tag re-
gisters agrees with the 14 most significant bits of the address given in the
instruction. Now assume that all sections are already full. It seems very
likely that the block in the backing storage to which the latest instruction
referred is more important than some of the old blocks in the buffer, since
the next references are also likely to be made to the same new block. It
seems wise to replace one of the old blocks in the buffer by the new one.
Which one, however, will be abandoned? There are actually two algorithms
applied for this decision. By one of them, the block to be deleted is select-
ed at random. A more sensible strategy might be to delete that block which
was referred to the longest time ago. These replacement algorithms will be
discussed in Sect. 5.1.4 in more detail.
It has been customary to restrict the usage of the word "virtual memory"
to a memory organization of the above type in which data are buffered by blocks
or pages, e.g., 256 words in size, and in which the backing storage is a
relatively slow device such as disk memory. In principle, however, the same
strategy could be applied between any two levels of addressable memories
with greatly different speeds. In particular, another type of virtual memory
which seems to have become accepted even in small computers, primarily due to
developments in semiconductor component technology, is the cache. This is a
buffer which works according to the same principle as the virtual memory but
which, however, is mainly intended to speed up the access time of the primary
storage up to the ultimate limit. In a cache, usually an extremely fast bi-
polar buffer memory is associated with a larger conventional semiconductor
memory. If a cache buffer is used, then the range of direct addresses is that
of the largest configuration of the primary memory. As for swapping of pages
between the primary and secondary storages, different criteria are then used
by the operating system.
It may be stated that by virtue of the buffer, the computer may seem to
have a one-level memory with capacity that of the secondary storage and speed
which is essentially the same as that of the primary storage.
When compared with the virtual memory organization implemented between the
primary and secondary storages, there are certain characteristic features of
the cache because of which certain address mappings will prove more effective
than those used in the usual virtual memory. First, the capacity of the pri-
mary storage is usually a few decades smaller than that of the secondary
memory. Since the primary memory now takes the role of the backing memory,
the cache buffer, being the fastest part of the memory system, then must be
yet smaller. This means that only fragments of a program being run can usually
be stored in it at a time, while in a paged memory, a complete procedure may
be stored on one 256-word page. The relative difference in speed between the
buffer and the primary storage is also much smaller, say, 20 to 1, as com-
pared with that of the primary-to-secondary memory difference which may be
10³ to 1 or higher. For this reason, the transfer of data between the buffer
and the primary storage must be made in rather small chunks, using wider path-
ways (with more parallel lines), whereby the block size must be selected
smaller.
It is difficult to give any universal rules for the dimensioning of the
cache because different programs may make widely different patterns of me-
mory references. The optimization must be performed by benchmarking runs on
a mixture of typical programs, say, operating systems, FORTRAN, ALGOL, COBOL,
etc. procedures, and possibly assembly-language programs.
Freely Loadable Cache. The simplest buffering principle, similar to that dis-
cussed in the preliminary example of virtual memory, is hereupon called freely
loadable since any block of the primary memory can be buffered in any section
of the cache. (Originally this mapping was called "fully associative" [5.1]
but it may be advisable to avoid an inflationary usage of the attribute "asso-
ciative".) If there are plenty of sections in the cache, the set of tag re-
gisters can be replaced by a single all-parallel content-addressable memory
(CAM); agreement of the most significant digits of the address issued by the
CPU, the virtual address, with the tags stored in the CAM is then made by a
content-addressable reading operation. (This is just the place in which the
CAMs have their most important application in general-purpose computers.)
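In software, the all-parallel CAM of the freely loadable cache can be modeled by a dictionary from tags to section numbers: where the hardware compares every stored tag in parallel, the dictionary hashes the tag. The following is an illustrative sketch only, with invented names.

```python
# Functional model of a freely loadable cache: the dict plays the
# role of the CAM (tag -> section), and any block may occupy any section.

class FreelyLoadableCache:
    def __init__(self, num_sections, block_size):
        self.block_size = block_size
        self.cam = {}                        # tag -> section index
        self.sections = [None] * num_sections

    def lookup(self, virtual_address):
        tag, offset = divmod(virtual_address, self.block_size)
        section = self.cam.get(tag)          # content-addressable read
        if section is None:
            return None                      # miss
        return self.sections[section][offset]

    def load(self, tag, block, section):
        """Any block may go to any section; record its tag in the CAM."""
        for t, s in list(self.cam.items()):  # evict the section's old tag
            if s == section:
                del self.cam[t]
        self.cam[tag] = section
        self.sections[section] = block

cache = FreelyLoadableCache(num_sections=4, block_size=256)
cache.load(tag=17, block=list(range(256)), section=2)
print(cache.lookup(17 * 256 + 10))   # hit → 10
print(cache.lookup(99 * 256))        # miss → None
```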
For comparison with the other memory mappings, the freely loadable cache is
once again depicted in Fig. 5.4. The primary storage which supplies the cache
is in this discussion called backing memory.
[Fig. 5.4. Memory mapping in the freely loadable cache: any block of the backing memory may map into any tagged section of the buffer]
[Fig. 5.5. Miss rate as a function of block size for cache capacities of 2K to 16K bytes]
The first large computers with buffer memory were IBM System/360 Models
85 and 95, as well as System/370 Models 155, 165, and 195. All of these are
byte-oriented computers. A typical backing memory in them was a large and
slow ferrite-core memory, divided in several (typically four) memory banks
housed in separate cabinets. The address capacity selected for the machine
instruction was 16 Mbytes, and justified by simulations such as those shown
in Fig. 5.5, a block size of 64 bytes was selected. There were 262 144 (256 K)
blocks in the backing storage, and a typical size of a freely loadable
cache was 128 blocks. The tag word capable of identifying 256 K blocks thus
had to be 18 bits long.
Direct-Mapping Cache. From the organizational point of view, the freely load-
able cache is simplest. At the time when the first cache memories were de-
signed (around 1965-67) it was thought that the access time of an 18-bit by
128-word CAM would be significantly higher than that of smaller arrays. Now-
adays this is no longer the case. Nonetheless, a significantly smaller tag
memory than the previous method can be used if the blocks of the backing stor-
age are allowed to be mapped only into particular sections of the cache such
that, e.g., the number of the section in the cache is equal to the block address
modulo 128; see the exemplification in Fig. 5.6. This is named the direct-
mapping cache principle.
The format of an address word in the direct-mapping cache is shown in
Fig. 5.6. The 11 most significant bits are now sufficient to identify the
transferred block since the section into which a block can be buffered is
now predetermined, and is given by the next 7 bits. The restriction imposed
upon the locations of blocks in the buffer affects the replacement algorithm,
and then an optimal set of blocks can no longer be maintained in the buffer;
[Fig. 5.6. Memory mapping in the direct-mapping cache: each block of the backing memory maps into one predetermined section of the buffer]
notice that transferring a block into the cache determines which one of the
old blocks must go.
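The direct-mapping address split described above can be illustrated with the numbers quoted in the text (11-bit tag, 7-bit section number, 64-byte blocks). This is a hedged sketch, not a description of any particular machine's circuitry.

```python
# Direct-mapping sketch: an 18-bit block address splits into an 11-bit tag
# and a 7-bit section number (block address modulo 128); a 6-bit offset
# addresses a byte within a 64-byte block. Names are illustrative.

def direct_map_fields(byte_address):
    offset = byte_address & 0x3F           # 6-bit within-block address
    block = byte_address >> 6              # 18-bit block address
    section = block & 0x7F                 # block address mod 128
    tag = block >> 7                       # remaining 11 bits
    return tag, section, offset

def direct_map_lookup(byte_address, tag_memory, cache_sections):
    tag, section, offset = direct_map_fields(byte_address)
    if tag_memory[section] == tag:         # a single comparison suffices
        return cache_sections[section][offset]
    return None                            # miss: the new block must replace
                                           # the occupant of this fixed section

tag_mem = [None] * 128
cache_secs = [None] * 128
tag_mem[0] = 1                             # block 128 (tag 1) in section 0
cache_secs[0] = list(range(64))
print(direct_map_lookup((128 << 6) | 7, tag_mem, cache_secs))   # hit → 7
print(direct_map_fields(0)[1], direct_map_fields(128 << 6)[1])  # → 0 0
```

The last line shows the restriction: blocks 0 and 128 compete for the same section, which is why an optimal set of blocks can no longer be maintained.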
"Set-Associative" Cache. This principle, like the previous one, was also
intended to make the tag memory smaller, although in a different way. In
Fig. 5.7, the principle applied is demonstrated.
[Fig. 5.7. "Set-associative" cache: the buffer is divided into sets of two tagged sections each, and a block of the backing memory maps into one predetermined set]
The cache is divided into sets, with a small number (e.g., two) of blocks in
each. Analogously to the direct mapping principle, a block from the backing
storage can be mapped only into a set the number of which is the block address
modulo 64; the order of the block within the set can be arbitrary, however.
In this solution, too, the tag can be shorter (12 bits) since it only needs
to identify a group of 64 blocks in the backing storage. The six following
bits identify the within-block address of a byte. The tag is only one bit
longer than that of the direct-mapping cache, and the freedom in location of
the block within a set makes it possible to maintain a better set of buffered
blocks in the cache since one is now free to decide which one of the blocks
within a set will be replaced.
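The set-associative mapping can likewise be sketched with the figures from the text (64 sets, two blocks per set, 12-bit tag, 64-byte blocks); all names are illustrative.

```python
# "Set-associative" sketch: a block maps into set = (block address mod 64)
# and may occupy either of the two positions in that set; the 12-bit tag
# identifies the block within its group of 64.

NUM_SETS = 64

def set_assoc_lookup(byte_address, sets):
    """sets[i] is a list of (tag, block_data) pairs, at most two long."""
    offset = byte_address & 0x3F
    block = byte_address >> 6
    set_no = block % NUM_SETS
    tag = block // NUM_SETS                # 12-bit tag
    for stored_tag, data in sets[set_no]:  # only two parallel comparisons
        if stored_tag == tag:
            return data[offset]
    return None

sets = [[] for _ in range(NUM_SETS)]
sets[5].append((3, bytes(range(64))))      # block 3*64+5 buffered in set 5
addr = (3 * NUM_SETS + 5) * 64 + 9         # byte 9 of that block
print(set_assoc_lookup(addr, sets))        # hit → 9
```

The freedom to choose which of the two positions to replace within a set is what distinguishes this from the direct mapping.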
The "set-associative" as well as the direct-mapping cache were mainly re-
viewed above since they are in use in computers existing at the present time.
Nonetheless, with the present CAM technology they would hardly be adopted
for new designs.
Sector Cache. The purpose of this design was to radically decrease the number
of tag words in the cache so that instead of a CAM, a small set (in the ex-
ample below, 16) of faster special registers could be used. The idea is to
divide the backing storage as well as the cache into sectors, each one capable
of holding, say, 16 blocks. This was in fact the detailed design adopted for
the IBM System/360 Model 85 (see Fig. 5.8).
[Fig. 5.8. Sector cache: one tag register per sector of the buffer; each block within a sector carries a valid bit (V) indicating its occupancy]
A sector of the backing storage can map into any sector of the cache, and
each sector of the cache needs only one tag register. Now, in contrast to the
direct-mapping approach, blocks must be mapped congruently within the sector,
i.e., the relative address of the block within the sector remains the same.
In the first commercial design, as mentioned above, the cache was able to
hold 16 sectors with 16 blocks each, which is the same number as in the other
alternatives exemplified above. It must be noted, however, that there is no
need to transfer an entire sector to the cache in buffering, and this would
indeed take too much time. It will be enough to transfer only the block needed.
This will leave some unused capacity in the buffer, and furthermore it is
necessary to have a "usage flag", or a "valid bit" for every block in the
cache to show its occupancy.
Blocks from different sectors cannot be mixed in one sector of the cache
since the tag is common to a sector; in replacement, a whole old sector must
be emptied. This is an extra restriction when compared to the other designs.
The control of the sector cache is obviously more complicated than with the
other methods.
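A minimal software model of the sector cache may clarify the two-level outcome of a lookup; the field names and return values below are invented for illustration.

```python
# Sector-cache sketch (Model 85 style): one tag per 16-block sector, a
# valid bit per block, and only the needed block is transferred on a miss.

BLOCKS_PER_SECTOR = 16

class Sector:
    def __init__(self):
        self.tag = None                         # identifies the backing sector
        self.valid = [False] * BLOCKS_PER_SECTOR
        self.blocks = [None] * BLOCKS_PER_SECTOR

def sector_lookup(sector_tag, block_in_sector, cache_sectors):
    for sec in cache_sectors:                   # one tag register per sector
        if sec.tag == sector_tag:
            if sec.valid[block_in_sector]:      # block already transferred?
                return sec.blocks[block_in_sector]
            return "block miss"                 # sector present, block not yet
    return "sector miss"                        # a whole old sector must go

cache = [Sector() for _ in range(16)]
cache[0].tag = 7
cache[0].valid[3] = True
cache[0].blocks[3] = "data"
print(sector_lookup(7, 3, cache))               # → data
print(sector_lookup(7, 4, cache))               # → block miss
print(sector_lookup(9, 0, cache))               # → sector miss
```

The "block miss" case is the cheap one: only one block need be fetched into the already-tagged sector; the "sector miss" forces an old sector to be emptied.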
It may have become clear that one of the central problems in buffering is to
predict what set of addresses are likely to be referred to next since this
will affect the size of the buffer needed. Another related problem is to pre-
dict which sets of addresses already buffered are going to be needed in the
most distant future because it would then be possible to determine the block
that is next in the order to be replaced by a new one. One may assume that
address references at large are ergodic processes whereby predictions can be
based on past events; on the average, those addresses which were used a longer
time ago are also likely to be used later. A strategy in which the block into
which any memory references were made the longest time ago is replaced by a
new one is named the LRU algorithm (least recently used block is to go)
[5.4,5].
Bookkeeping of the Usage of Blocks. Assume for simplicity that the freely
loadable cache organization is used. The sections for blocks in the cache
are numbered, and every time when an address reference is made into a section,
the corresponding number is recorded in a special list. In order to comply
with the speed of computing, this list must be implemented by hardware. It
is not quite obvious, however, which list principle should be used. The first
one which comes into mind might be the list named queue (FIFO, first-in-first-
out) that must be modified, however, because of possible occurrence of the
same number in a sequence which then would be placed in several positions of
the list. Consider the storage structure shown in Fig. 5.9a into which a
number enters at the left end, and all contents of the list are thereby shift-
ed one step to the right.
The number at the right overflows and it would represent the oldest item
in the list, provided that all numbers were different. Assume now that a
number may occur several times in the sequence. The problem thereby caused is
that when the new number enters, the same number already occurring in the
[Fig. 5.9a-c. Ordering of block references in a queue]
list must be withdrawn (Fig. 5.9b) and the gap thereby formed must be closed
by shifting only those items in front of it (Fig. 5.9c).
Instead of the FIFO, another list organization named stack, pushdown list,
or LIFO (last-in-first-out) can be used. The stack may be visualized as a
vertically ordered array in which every new item is placed at the top; the
oldest item resides at the bottom. If a number to be entered already exists
in the stack, it must be withdrawn and all numbers above it pushed down by
one step.
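The stack bookkeeping just described is easy to render in software; the withdrawal of an already-present number is exactly the "close the gap" operation of Fig. 5.9. The sequence of reference numbers below is chosen only for illustration.

```python
# Pushdown-list (stack) bookkeeping of block references: a new reference
# goes on top; a number already in the stack is first withdrawn, so the
# bottom of the stack always holds the least recently used block.

def touch(stack, section_number):
    """Record a reference; top = most recent, bottom = LRU."""
    if section_number in stack:
        stack.remove(section_number)   # withdraw the old occurrence
    stack.insert(0, section_number)    # enter the number at the top

refs = []
for n in [2, 9, 7, 6, 5, 3, 7]:        # an illustrative reference sequence
    touch(refs, n)
print(refs)        # → [7, 3, 5, 6, 9, 2]; the LRU block is refs[-1] == 2
```

Note the cost that motivates the alternatives below: `remove` shifts items, which is precisely the operation that is awkward in hardware.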
Identification of a number in the list (by content-addressable search or
by a slower process of serial scan), and restriction of the shift operation
to the front end are possible but somewhat complex solutions. The hardware im-
plementation of the list must combine in itself the CAM and shift register
operations. For this reason there would be considerable interest in alter-
native solutions. One of them is a tricky method which uses a simple binary
matrix without any shifting operations. This principle was applied for the
System/360 Model 85 cache control. Consider Fig. 5.10 which exemplifies this
method for a four-block buffer. The circuit implementation will be shown
later in Fig. 5.12.
Records of the block numbers are represented by certain binary numbers auto-
matically generated in the following way. In the matrix, the row representing
the number of the block is set full of ones except for the column with the same
block number; this column is written full of zeroes. It is very easy to see
that the binary numbers so formed on the respective rows are always ordered
in the same way as the four latest block references (although these numbers
are not identical with block numbers), and moreover, after the cache has been
filled up, the least recently used block is represented by a row of zeroes.
If there is a zero decoder for each row, that one giving a response can then
be used to control writing into the corresponding section of the buffer.
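A software rendering of the binary-matrix method may make the mechanism concrete; the reference sequence is invented for illustration.

```python
# Binary-matrix LRU: on a reference to block i, row i is written full of
# ones and then column i full of zeroes. Once the cache has filled up,
# the least recently used block is the one with an all-zero row.

def reference(matrix, i):
    n = len(matrix)
    for j in range(n):
        matrix[i][j] = 1               # row i := all ones
    for k in range(n):
        matrix[k][i] = 0               # column i := all zeroes

def lru_block(matrix):
    """Index of the all-zero row (the 'zero decoder' response)."""
    for i, row in enumerate(matrix):
        if not any(row):
            return i
    return None

m = [[0] * 4 for _ in range(4)]
for block in [0, 1, 2, 3, 1, 0]:       # six references to a 4-block buffer
    reference(m, block)
print(lru_block(m))                    # → 2 (referenced longest ago)
```

The invariant is that entry (i, j) is one exactly when block i was referenced more recently than block j, so the row values are always ordered like the recency of the references, with no shifting needed.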
Yet another solution is the numbering counter method [5.5]. A sequential
number is formed in a special numbering counter at every memory reference
made to the cache. Every section is provided with a register into which the
sequential number is written upon reference to this block. The smallest num-
ber stored then indicates the least recently used block. The most difficult
problem in this design is to compare the sequential numbers fast enough, in
order to determine their minimum. This operation may be done by a special
logic circuit which in effect implements the content-addressable minimum-
search algorithm earlier described in Sect. 3.4.5. In other words, a small
word-parallel, bit-serial CAM with special search argument bit control logic
is needed for this purpose.
One problem with the numbering counter method arises when the counter over-
flows. In this case the contents of the registers associated with the blocks
must be renumbered. It has turned out that in order to avoid extra complica-
tions of control, the LRU order may at this point be forgotten and all blocks
renumbered in their physical order. Of course, this causes a disturbance in
the LRU order but if overflows of the numbering counter are relatively rare
events (e.g., with 12-bit numbering counters once in every 4096 memory re-
ferences), this disturbance does not last long and its effect on the average
performance of the cache is negligible.
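The numbering counter method, including the simple renumbering on overflow described above, can be sketched as follows; the class and its parameters are illustrative, and the minimum search stands in for the bit-serial CAM logic of Sect. 3.4.5.

```python
# Numbering-counter LRU sketch: each section stores the sequence number
# of its latest reference; the minimum marks the LRU section. On counter
# overflow the sections are simply renumbered in their physical order.

COUNTER_BITS = 12
COUNTER_MAX = 1 << COUNTER_BITS

class NumberingCounterLRU:
    def __init__(self, num_sections):
        self.counter = 0
        self.numbers = [0] * num_sections

    def reference(self, section):
        self.counter += 1
        if self.counter == COUNTER_MAX:          # overflow: renumber blocks
            self.numbers = list(range(len(self.numbers)))
            self.counter = len(self.numbers)     # in their physical order
        self.numbers[section] = self.counter

    def lru_section(self):
        """Minimum search over the stored sequence numbers."""
        return self.numbers.index(min(self.numbers))

lru = NumberingCounterLRU(4)
for s in [0, 1, 2, 3, 1, 0]:
    lru.reference(s)
print(lru.lru_section())                         # → 2
```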
Random Replacement. In view of the complexity of control circuits needed to
implement the LRU replacement algorithm, sometimes a much simpler rule can
be used, especially if there are many blocks in the buffer and if the ulti-
mate speed is not of paramount importance. Especially in minicomputer envi-
ronment, the block to be replaced can be decided quite randomly. For this
purpose, an elapsed-time clock with a binary display register can be used:
a number formed of the least significant bits, capable of numbering all blocks
of the buffer, may be regarded random enough for this purpose.
The LFU (Least-Frequently-Used) Rule. Although the LRU algorithm is the most
commonly used in cache control, and random replacement has the simplest im-
plementation, it may be expedient to mention one further algorithm suggested
for this purpose, namely, the LFU (least-frequently-used) rule. The idea in
this method is to count references to every block during a certain elapsed
period, and to choose the block with the smallest number of counts for re-
placement. If the address references were describable by an ergodic stochastic
process, there would not be much theoretical difference between the LRU and
LFU criteria. In view of the possibility of simple implementation of LRU
control logic as shown in Fig. 5.10, the LFU, however, has seldom been con-
sidered in practice.
The computing processes normally modify stored data and possibly the program
code, too. When these modifications are made in the contents of the buffer,
a problem arises since a buffered block is then no longer identical with its
primary source in the backing storage. It might seem that this problem does
not manifest itself until the block is to be replaced; then the contents of
the modified block must be written into the backing storage. This updating
principle is named post-storing. It must be mentioned, however, that virtual
memories are frequently used in multiprocessing environments (cf Sects. 5.1.7,
6.6.1) whereby the same block of the backing storage may be buffered in sev-
eral places. For this reason, all information in the backing storage ought to
be immediately updated. This means revision of the backing storage every time
changes are made in the buffer, and this principle is named through-storing.
Immediate updating increases data traffic to the backing storage but the sys-
tem control, especially in multiprocessing is simplified; the interlocking
considerations caused by different buffers are less problematic. On the other
hand, with post-storing, the system performance is better. A choice between
these two updating methods is very much case-dependent.
The computations in through-storing are generally slower than in post-
storing since every intermediate result word must immediately and uncondition-
ally be written into the buffer as well as into the backing storage. Especially
in recursive computing operations it would be significantly faster to iterate
a program loop with references only to the buffer memory. In general, writing
into the memory occurs much more seldom than reading of data from it, espe-
cially in view of the fact that the buffer may also be used to store program
code from which the instructions are fetched; thus the difference in the
average speed between through-storing and post-storing is not remarkably
great. Further, one has also to take into account the fact that in post-storing,
when a new block is transferred into the buffer, the old one must first be
saved by copying it into the backing storage, which will cause an additional
delay in the execution of a reading instruction under the miss condition.
This subsection discusses some details of the cache and the automatic control
necessary for buffering operations. The example given is fictive and combines
ideas from several sources.
[Fig. 5.11. Cache organization: the address from the instruction register or program counter (PC) is matched in the CAM; the LRU control matrix with its column control selects the block section to be replaced; a miss detector and block counter control transfers from the backing storage via the data bus]
Consider Fig. 5.11. The cache operation starts when the CPU issues a me-
mory-reference instruction in its instruction register. A similar sequence
of operations will be carried out during instruction fetching; in this case
the memory address of the stored program word is given by the program counter.
The reading sequence in both cases shall end with the appearance of the sought
data on the data bus; it shall be fetched from the buffer or the backing
storage. Additionally, if the data is read from the backing storage, the
whole contents of the corresponding block shall be copied into the buffer,
into a section indicated by the LRU control. Transfer of data shall be made
via the data bus, one word at a time. As it is most important to have the
sought word immediately available for the CPU, the block transfer commences
with reading of the needed word, whereafter the rest of the words in the block
are read and transferred to the buffer in cyclical order. In the through-
storing principle discussed in this example, there will be no need to write
the displaced block into the backing storage; it can simply be deleted.
In the case in which the memory-reference instruction was that of writing
data into a memory location, the cache control in the through-storing method
is rather simple: the data presented on the data bus must unconditionally be
written into the buffer as well as into the backing memory.
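The through-storing write path just described can be sketched in a few lines; the data structures are invented for illustration and stand in for the CAM, buffer, and backing storage.

```python
# Through-storing sketch: every write goes unconditionally to the backing
# storage, and also to the buffer when the block is present, so a displaced
# block can simply be deleted on replacement without copying it back.

def write_through(tag, offset, word, cam, sections, backing):
    section = cam.get(tag)               # content-addressable tag match
    if section is not None:
        sections[section][offset] = word # update the buffered copy
    backing[tag][offset] = word          # always update the backing storage

cam = {5: 0}                             # block 5 buffered in section 0
sections = [[0] * 4]
backing = {5: [0] * 4, 6: [0] * 4}
write_through(5, 2, 99, cam, sections, backing)  # hit: both copies updated
write_through(6, 1, 77, cam, sections, backing)  # miss: backing storage only
print(sections[0][2], backing[5][2], backing[6][1])   # → 99 99 77
```

Under post-storing, by contrast, only the buffered copy would be updated and the backing storage revised at replacement time.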
In this example only reading of operands from the memory system and writing
the results into it are discussed. The cache is assumed freely loadable and
the LRU algorithm control is assumed to be implemented by the binary matrix
principle shown in Fig. 5.10.
The following notation shall be used in the explanation of micro-operations:
AR = number represented by the set of least significant bits in the address given in the instruction register of the CPU, equivalent to the "within-block address" in the buffer as well as in the backing storage;
BR = number represented by the set of most significant bits in the address given in the instruction register, corresponding to the block address in the backing storage;
(BR,AR) = the complete operand address;
BUS = set of data signals on the data bus, one word wide;
M(N) = contents of the backing storage location (virtual memory location) with address N;
B(I,J) = contents of the storage location J in buffer section I;
CAM(I) = contents of the storage location I in the CAM;
Z(I) = output of the zero decoder associated with row I of the LRU control matrix;
MR(I) = output of the multiple-response resolver with Z(J), J = 0,1,...,m−1 as its inputs, where m is the number of sections in the buffer;
MISS = output of the zero decoder associated with the outputs of the CAM;
BC = contents of the cycle counter which counts the number of words transferred to a block.
In addition to the above definition of machine variables, the following
notation conventions shall be made: the condition for a micro-operation to
be performed on value x of function f is denoted (f = x:). Assignment of
(vectorial) value X to a (vectorial) variable Y is denoted X → Y. Further,
assignment operations are initiated with escalation in time roughly given by
the order of numbering of the rows. (This description of micro-operations
does not actually comply with the formalism known as register transfer lan-
guage [5.6]: actually the numbered rows only show what happens at a particular
time.)
The operation of the LRU control matrix needs a special discussion. The
writing of new information must be made in two steps. First, the row I selected
by a response from the CAM is written full of ones. After that, the column I
similarly selected by a response from the CAM is written full of zeroes. Both
of these writing operations are conditional on that MISS = 0, i.e., one of
the responses is nonzero. One possible solution for a bit cell in the matrix,
showing the row and column writing control, is shown in Fig. 5.12.
[Fig. 5.12. A possible bit cell structure for the LRU control matrix (cf. Figs. 5.10, 11): the CAM response on a row gates the row write-1 and column write-0 controls]
2) The MISS signal is used as a reading command to the backing storage. The
(BR,AR) address has automatically been mediated to the backing storage.
MR(I₀), one of the multiple-response resolver outputs, indicates the
section to be replaced.
Writing command is given to the CAM.
3) m → BC; BR → CAM(I₀) (m: number of blocks)
4) M(BR,AR) → BUS
5) Writing command is given to the buffer.
6) BUS → B(I₀,AR)
BC − 1 → BC
7) BC = 0: Stop.
BC ≠ 0: (AR + 1) mod b → AR (b: block size = AR capacity)
8) Reading command is given to the backing storage. Return to step 4.
Notice that the operand needed by the CPU was fetched from the backing storage
as early as possible, namely, at step 4 of the first cycle.
Writing Operations. Assume that the following machine instruction has to be
executed: "Write the result given on the data bus into the virtual memory
location (BR,AR) using the through-storing principle", whereby the address is
given in the instruction register. The following sequential phases are in-
cluded in the execution:
1) Reading command is given to the CAM.
Writing command is given to the backing storage.
2) One of the sections of the buffer, say I₀, receives a selection control
from the CAM. The location is further selected by AR.
Writing command is given to the buffer.
3) BUS → B(I₀,AR)
4) BUS → M(BR,AR)
Notice the timing of commands and transfer operations which is due to dif-
ferent delays.
[Figure: multiprocessor organization with common large memories shared by several CPUs, each provided with its own cache]
The operating system of the multiprocessor network shall take care of accesses
to the common memories which have to be made in an escalated order; upon con-
flict of reading or writing requests, the latter have to be ordered according
to their priority. Some of the CPUs in the multiprocessor network may be de-
dicated to management of files, e.g., taking care of data traffic between the
primary and secondary storages. It is usually necessary to have some sort of
interlocking control so that the CPUs cannot change intermediate results or
status words of each other, if these have to reside in the common memories.
As mentioned earlier, in a multiprocessor system it is necessary to apply
the through-storing principle of updating between the cache and the backing
storage.
If the CPUs have no working memory other than the cache, the latter should
have a sufficient capacity, say, a few K words. With the present semiconductor
technology this is still an inexpensive solution. On the other hand, such a
cache effectively decouples the CPU from backing memory operations.
Cache memory systems in multiprocessor architectures have been described
by NESSETT [5.7], AGRAWAL [5.8], AGRAWAL et al. [5.9], as well as JONES and
JUNOD [5.10,11].
Although by now a lot of experience from virtual memories and operating cache
units in real computer systems already exists, nonetheless it is difficult to
Assume that the main memory is divided into "frames", each one capable of
holding one page, e.g., 256 consecutive words of a program. All programs,
hereupon named segments (of program code) are similarly assumed to be di-
vided into pages; there may be one or more pages in a segment, and the last
page may not be full. The same page, however, shall not contain parts from
different programs. The data needed by a program are assumed to be included
in the segment, too. Upon programming, the operand addresses and the line num-
bers of machine instructions which are referenced in jump instructions are
always written relative to the beginning of the segment, the latter having
the line number zero.
When a program is loaded into the main memory, the pages belonging to a
segment may be allocated in arbitrary frames. With the use of the dynamic
loader it is not necessary to make any changes in the address fields of ma-
chine instructions upon loading. Instead, in order to find the actual word
location in the memory, the true address of operand or jump instruction is
computed at the time of execution of the instruction. The conversion into
true or absolute addresses is made with the aid of a memory map which is
stored in a special content-addressable memory. The procedure by which the
loading of segments into the main memory is decided, and creation and main-
tenance of the memory map are explained later. Let us first concentrate on
address conversion on the basis of ready-made memory maps.
The memory-reference address which is given in a machine instruction con-
sists of two parts: the page number relative to the origin of a segment,
starting with 0, and the line number or within-page address. The segments
or programs are identified by a particular identification number. In order
to find the actual address, it is necessary only to find the converted page
frame number, since the line number is not changed in loading. The CAM used
for the page number conversion is shown in Fig. 5.14. Its consecutive loca-
tions have a one-to-one correspondence with the consecutive page frames in
the main memory.
Only part of the CAM, namely, the fields RPN and SN corresponding to the
relative page number within the segment and the segment number, respectively,
is needed for this discussion. A content-addressable search with (RPN,SN) as
the search argument and the other field NPF (next page frame, discussed later)
masked off is then performed. Only one response is assumed to result since
segments must have different identification numbers. This response is converted
into an absolute page frame address by an encoder. The encoded number and the
line number together define the complete main memory address. If the machine
instruction defined an operand fetch, then the memory address must be trans-
ferred to the address register AR of the main memory at the execution phase
of the operation. If, on the other hand, the instruction defined a jump in the
program, then the memory address must first be stored in the program counter
Fig. 5.14a,b. The CAM used for address conversion in dynamic memory
allocation: a) word fields, b) address conversion
PC from which the contents are transferred to the address register of the
main memory at the next instruction-fetching phase.
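The conversion described above can be sketched in a few lines of Python; the names and data are illustrative, not taken from any actual hardware. The CAM is modeled as a list of (RPN, SN) field pairs whose index plays the role of the page frame number, and the encoder output is simply that index.

```python
# Sketch of the address conversion of Fig. 5.14 (illustrative names).
# The CAM is modeled as a list whose index i corresponds to page frame i
# of the main memory; each location holds the fields (RPN, SN).

PAGE_SIZE = 256  # words per page, as in the text

def convert_address(cam, segment_number, page_number, line_number):
    """Content-addressable search with (RPN, SN) as the argument and the
    NPF field masked off; the single responding location's index is the
    page frame number."""
    responses = [frame for frame, (rpn, sn) in enumerate(cam)
                 if rpn == page_number and sn == segment_number]
    assert len(responses) == 1, "segment numbers must be unique"
    frame = responses[0]                     # output of the encoder
    return frame * PAGE_SIZE + line_number   # frame number ++ line number

# Segment 7 occupies frames 3 and 5 (relative pages 0 and 1);
# segment 0 stands for the empty segment.
cam = [(0, 0), (1, 0), (2, 0), (0, 7), (3, 0), (1, 7)]
print(convert_address(cam, segment_number=7, page_number=1, line_number=17))
# → 1297  (frame 5 * 256 words + line 17)
```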
Chaining of Pages in the Memory Map. Dynamic maintenance of the memory map
requires that all empty page frames be chained in some order into a single
linked list from which frames are then assigned to the program segment that
is loaded next. After loading, the list of remaining empty frames must be
updated which is a very simple operation as shown below. Whenever a loaded
program becomes useless, its page frames can be released by appending them
to the list of empty pages. Initially the whole main memory is empty and the
pages are linked in consecutive order. After a few loading operations the
structure of the list starts looking more random.
The linked list organization can be implemented within the CAM using the
next page frame field NPF, shown in Fig. 5.14, which is equivalent to a pointer
as discussed earlier in Chap. 2. This field is simply set to indicate the next
page frame in the list of empty pages. Similarly all stored program segments
are chained whereby the NPF indicates the next page belonging to a segment.
It should be noticed that this chaining is in no way necessary from the point
A segment is loaded in two phases. In the first of them, a memory map corre-
sponding to the new segment is constructed in the CAM. In the second phase,
the actual transfer of pages, say, from the backing memory storage into the
main memory is carried out. The following discussion refers to Fig. 5.15
which illustrates the main units of the control hardware.
Fig. 5.15. Control hardware for dynamic memory allocation
Before loading, the operating system checks that the size of the program
segment does not exceed that of the empty segment, and determines the number
of pages. This value is set into the number-of-pages counter NPC shown in
Fig. 5.15. An arbitrary identification number, different from that reserved
for the empty segment and from the other numbers already used, must also be
assigned to this segment. The operating system may take care of listing the
segment numbers and their beginning addresses. It is further assumed that
the first frame address in the list of empty pages is available in a special
frame address counter FAC which will be updated during loading.
The loading starts with copying the address of the first frame of the
empty segment from the FAC register into the address register of the CAM,
and simultaneously recording it in the list of beginning addresses of pro-
grams. This address selects the corresponding location in the CAM into which
a new relative page number from the relative page number counter RPNC, this
time 0, and the new segment identification number from the segment number
register SNR are stored, using an appropriate mask in writing. [One may re-
call that in the circuits of the all-parallel CAM shown in Fig. 3.3, masking
is implemented by W(0) = W(1) = 0.] The contents of FAC and RPNC are incre-
mented by one, and those of NPC decremented by one. After that, an addressed
reading of the same CAM location makes the next page frame address from the
NPF field available at the output, and this is copied into the address re-
gister of the CAM; a new location in the chain is thereby selected. The con-
tents of RPNC and SNR are written into the RPN and SN fields of the new lo-
cation, respectively, and the above reading and writing phases are alternated
until the contents of the NPC become zero. By that time the loading of the
segment has been concluded. In order to update the linked lists, it is neces-
sary only to perform an additional writing operation to insert an end marker
into the NPF field of the last location.
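The map-construction phase can be sketched as follows; the data structures and names (END marker, dictionary fields) are illustrative choices, not part of the original design. Each CAM location corresponds to a page frame and holds the fields NPF, RPN, and SN.

```python
# Sketch of the map-construction phase of segment loading.
# 'END' plays the role of the end marker in the NPF field.

END = -1

def build_map(cam, fac, segment_number, n_pages):
    """Walks the chain of empty frames starting at FAC, writing the
    relative page number (RPNC) and segment number (SNR) into each
    location; returns (first frame of the segment, updated FAC)."""
    first, addr = fac, fac           # CAM address register <- FAC
    rpnc, npc = 0, n_pages           # relative-page and number-of-pages counters
    last = addr
    while npc > 0:
        cam[addr]['RPN'] = rpnc      # masked write of the RPN and SN fields
        cam[addr]['SN'] = segment_number
        rpnc, npc = rpnc + 1, npc - 1
        last = addr
        addr = cam[addr]['NPF']      # addressed read of NPF selects the next frame
    fac = addr                       # start of the remaining empty list
    cam[last]['NPF'] = END           # end marker closes the new chain
    return first, fac

# Frames 0-3 are initially empty and chained 0 -> 1 -> 2 -> 3.
cam = [{'NPF': i + 1, 'RPN': None, 'SN': 0} for i in range(4)]
cam[3]['NPF'] = END
first, fac = build_map(cam, fac=0, segment_number=5, n_pages=2)
print(first, fac, cam[0]['SN'], cam[0]['NPF'])   # → 0 2 5 1
```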
Since the frame address counter FAC was always updated when pages were
loaded, its last value after loading is valid for the storage of the beginning
of the empty segment and can be utilized when the next segment is loaded.
The actual loading of a program segment commences with copying the address
of its first page frame from the above-mentioned list into the main memory
address register, in its frame address part. The same address indicates the
beginning of the memory map of this segment in the CAM and is thus copied
into the address register of the CAM. The line address begins at O. It is
assumed that the line-address part of the main memory address register is
simultaneously a line counter, capable of counting upwards modulo 256, whereby
the writing of one page into the given frame is easily implemented. No attempt
is made here to describe the control of the backing memory which is simply
assumed to deliver the whole program segment, one word at a time, on the mem-
ory bus upon demand. When one page has been loaded, which is indicated by an
overflow from the line counter, the next page address is read from the select-
ed location in the CAM, and this address is copied into the address register
of the CAM as well as that of the main memory. Loading continues with the next
page, etc., until an end marker is encountered in the NPF field of the CAM.
It may be simplest to continue loading, although the last page is not full,
until the next overflow from the line counter is obtained.
Releasing of a Program Segment. When a program segment becomes useless, it
is not necessary to perform any unloading operations. It will be enough to
indicate in the memory map that the corresponding space is available for
other programs, and as mentioned earlier, this is effected by appending its
linked list into the linked list of "empty" frames, in fact in front of it.
The appending is done very simply, e.g., by setting the NPF field (pointer)
of the last location in the program segment equal to the contents of the
FAC counter, and thereafter updating the FAC counter corresponding to the
first frame of the released segment.
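The release operation amounts to a constant-time splice of two linked lists, which can be sketched as below (names illustrative):

```python
# Sketch of releasing a segment: the released chain is prepended to the
# empty list by pointing the last location's NPF at the old FAC value
# and setting FAC to the segment's first frame.

def release_segment(cam, fac, first_frame, last_frame):
    cam[last_frame]['NPF'] = fac    # old empty list now follows the segment
    return first_frame              # new FAC: released chain comes first

# Empty list starts at frame 2; the released segment occupies 0 -> 1.
cam = [{'NPF': 1}, {'NPF': -1}, {'NPF': 3}, {'NPF': -1}]
fac = release_segment(cam, fac=2, first_frame=0, last_frame=1)
print(fac, cam[1]['NPF'])   # → 0 2
```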
Illustrative Example. An example of the contents of a memory map before and
after loading of a segment is shown in Fig. 5.16a,b, respectively.
Fig. 5.16a,b. Contents of the memory map (a) before and (b) after loading
of a segment
Comment: The example given in this section has been presented in a simple
form, mainly to illustrate the role of CAM in a memory organization.
In the previous types of virtual memory, the identification of data was still
based on an explicitly given address. Another interesting type of buffer which
is intended for extensive search operation is completely content-addressable,
Fig. 5.17. Organization of the content-addressable buffer: a 64-word by
256-bit CAM array
Fig. 5.17 presents its major parts. The buffer CAM is a bit-serial, word-
parallel array, 64 words by 256 bits, capable of performing all the usual
modes of search described in Chap. 3. The search time in the CAM is 100 μs
at maximum and this may well comply with the speed of buffering. The disk
memory has 72 fixed heads, one for every track of which 64 are used to hold
data. The purpose is to carry out an exhaustive search of the contents of the
disk, and since the disk is rotating continuously but the CAM is loaded and
read in turn, in alternating phases, it is found most effective to divide
the tracks into sectors 256 bits each, and to read only every second sector
at each rotation. To search the complete disk, two rotations are thus needed.
It takes 100 μs to read one sector and to simultaneously write the data, as
64-bit parallel words, into the consecutive locations of the CAM. Thus a
continuous operation at maximum speed can be guaranteed for the CAM. As there
were 384 sectors in the original design, the whole memory capacity was 6.29
million bits or 24 K words, 256 bits each; the search time was about 76.8 ms.
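The quoted figures can be checked with a few lines of arithmetic (one sector of 256 bits per track, read from 64 data tracks in parallel, fills the 64-word CAM; every second sector is read per rotation):

```python
# Arithmetic check of the capacity and search-time figures in the text.

tracks, sectors, sector_bits = 64, 384, 256
words = sectors * tracks                 # one 256-bit word per data track per sector
bits = words * sector_bits
sector_time_us = 100
search_time_ms = 2 * sectors * sector_time_us / 1000   # two full rotations

print(words, bits, search_time_ms)       # → 24576 6291456 76.8
```

That is, 24 K words of 256 bits each, about 6.29 million bits, and a 76.8 ms exhaustive search, in agreement with the text.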
On the Use of Hierarchical CAM in Large Data Bases. The 24 K words, or 6.29
million bits of the previous design may not yet be regarded as a mass storage
of data. Some special data bases [5.76] may consist of archives with memory
units capable of storing as many as 10^12 bits of data. Now it is to be noted
that the information stored in such archives is organized in categories of
which only a small subset will be selected for a particular searching task.
Nonetheless, for instance with personal files, it is not uncommon to have a
search performed over 100 000 to one million records, each one possibly
equivalent to a maximum of 100 character codes. In view of the present CAM
technology it must be stated that: 1) the word length in a content-addressable
array (especially in a word-parallel, bit-serial CAM) can be rather long, say,
512 bits so that a complete record can be stored in one word, 2) present tech-
nology facilitates significantly shorter search times than 100 μs, 3) the
loading of the CAM array can be made from several sources, e.g., by inter-
leaving techniques described in Sect. 5.1.1.
In view of the high inherent cost of special computer systems dedicated to
file management, the relative costs of even large content-addressable arrays
may remain modest; the maximum capacity is mainly dictated by technical rea-
sons. For instance, if the addressed writing were done in the linear-select
addressing mode, the decoding costs may limit the capacity to, say, 8 K words.
Speedup of Buffering in the Hierarchical CAM. If an electronic, say, bipolar,
all-parallel CAM array were used for buffering, its access time, usually
less than 10 ns, would be much shorter than the intervals at which subsequent words
can be transferred from the backing storage. By interleaving, the loading time
of the array can be reduced to a fraction. Notice that the order or location
of the buffered words in the CAM can be arbitrary from the point of view of
content-addressable search.
Another possibility to reduce the buffering time is to divide the CAM
array into several parts which are then loaded simultaneously. For this solution
the addressing system applied in addressed writing must be changed, e.g., in
the way illustrated in Fig. 5.18 which uses independent address registers
and selection circuits.
Fig. 5.18. Partitioning of the buffer CAM for fast parallel loading.
D = decoder, M.M.R. = multiple-match resolver, aij = addresses issued by
parallel sources of data during loading of the array
During the history of development of digital computers there has always been
a tendency to replace wired logic by programming. To a large extent this has
materialized in the arithmetic units, many of which are nowadays controlled
by flexible microprograms. The definition of micro-operations is usually made
using random-access memories (RAMs) with rather long words.
Once in a while a suggestion is made that the central control logic, too,
might be implementable by programming, especially using content-addressable
memories. Although this has been tried in some special systems (cf, e.g.,
[5.81]), nonetheless the CAMs have not become widely accepted in control
circuits. In principle, programmed logic is an elegant method of making the
circuitry universal and its design as well as documentation easy. Since the
programs can be stored in read-only memories, there is not any loss of in-
formation when the supply power is switched off. Anyway, loading of a control
memory is a simple task, similar to that of loading a high-level program,
provided that the system has a simple bootstrap loader. Maybe the most seri-
ous objection against the CAMs in control circuits is that there also exist
other, maybe even more amenable alternatives for programmed functions as dis-
cussed below in Sect. 5.4.4. Nonetheless, the programming of logic by con-
tent-addressable memories deserves a discussion, and we shall attempt to
approach this problem systematically.
The logic functions are primarily represented by truth tables. Assume N in-
dependent Boolean variables, whereby a completely defined truth table
has 2^N rows. It is possible to represent several logic functions of the same
variables in a common table. Consider the example given in Table 5.1, where
x, y, and z are independent Boolean variables and f1, f2, and f3 their
Boolean functions.
x y z f1 f2 f3
0 0 0 0 0 1
0 0 1 0 1 1
0 1 0 0 0 1
0 1 1 1 0 1
1 0 0 0 0 1
1 0 1 0 1 0
1 1 0 0 0 1
1 1 1 0 0 1
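Stored in a standard memory, Table 5.1 becomes a 2^3-location RAM: the input combination is the address, and the word read out gives (f1, f2, f3). A minimal sketch:

```python
# Table 5.1 as a RAM lookup: address = (x, y, z), word = (f1, f2, f3).

ram = {
    (0, 0, 0): (0, 0, 1),
    (0, 0, 1): (0, 1, 1),
    (0, 1, 0): (0, 0, 1),
    (0, 1, 1): (1, 0, 1),
    (1, 0, 0): (0, 0, 1),
    (1, 0, 1): (0, 1, 0),
    (1, 1, 0): (0, 0, 1),
    (1, 1, 1): (0, 0, 1),
}
print(ram[(0, 1, 1)])   # → (1, 0, 1)
```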
Assume that the output functions attain the value 1 on a few rows only, and
many of the input value combinations are forbidden or undefined, as is the case
frequently in practice. It then seems that a standard memory module with 2^N
locations would have plenty of waste capacity, and for this reason there might
be some interest in an alternative solution, based on a CAM array or its
read-only equivalent. Consider the example given in Table 5.2.
A B C D E f1 f2
0 0 1 0 1 0 1
0 1 0 0 1 1 1
1 0 1 1 1 1 0
The l's for f1 or f2 correspond to the normal product terms (cf, e.g., [3.62])
which are assumed familiar to the reader. In this example we have
f1 = (Ā∧B∧C̄∧D̄∧E) ∨ (A∧B̄∧C∧D∧E)
This time we would store the binary values of the left half of Table 5.2 in
a 3-word by 5-bit CAM with (A,B,C,D,E) as its search argument. If now every row
is thought to correspond to word-line output, then, in order to form f1 it is
necessary only to connect the second and third output to an OR gate. In order
to form f2, another OR gate should be connected to the first and second
outputs. (These OR gates can also be programmed as indicated in
Sect. 3.7.2). The operation of the circuit is easily proven by stating that
f1 attains the value 1 if and only if (A,B,C,D,E) matches with the second or
the third row, and correspondingly f2 attains the value 1 for matches on the
first or second row.
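The CAM-plus-OR-gate circuit of this example can be sketched as follows; the exact-match test stands in for the unmasked content-addressable search, and the word-line-to-OR-gate wiring follows Table 5.2.

```python
# CAM implementation of Table 5.2: the three stored words respond to an
# unmasked match with (A, B, C, D, E); f1 ORs the word lines of rows 2
# and 3, f2 those of rows 1 and 2.

rows = [(0, 0, 1, 0, 1),    # word line 1
        (0, 1, 0, 0, 1),    # word line 2
        (1, 0, 1, 1, 1)]    # word line 3

def logic(a, b, c, d, e):
    m = [(a, b, c, d, e) == row for row in rows]   # word-line outputs
    f1 = m[1] or m[2]                              # OR gate for f1
    f2 = m[0] or m[1]                              # OR gate for f2
    return int(f1), int(f2)

print(logic(0, 1, 0, 0, 1))   # → (1, 1)
print(logic(1, 0, 1, 1, 1))   # → (1, 0)
```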
5.4.3 FM Implementation
Obviously there is no reason for using memories for logic functions unless
an appreciable number of logic gates can thereby be replaced. Now we shall
consider the case in which the whole central control logic of a computer or
other automatically controlled information processor is to be replaced by
programming. A typical minicomputer involves of the order of 50 independent
logic variables, and of the order of 50 control functions in which these
variables enter as arguments. Normally very different subsets of variables,
with only a few variables in each, enter the Boolean expressions.
The complete truth table would have about 10^15 rows which excludes the
RAM solution. With such a high number of rows there is no hope for ending
up with a reasonable implementation with the CAM, either. This leaves only
the functional memory (FM) implementations as discussed in Sect. 3.7.
Below we shall exemplify only the use of Functional Memory 1 in a way
which does not require any sophistication in the logic design.
Assume that the control conditions of a computing device have been derived
into a set of simplified Boolean expressions which are in the disjunctive
normal form, or as a logical sum of logical products. (Every product term is
a prime implicant.) In view of the fact that the storage capacity is rela-
tively cheap, it may not be necessary to strive for absolutely simplest ex-
pressions. All of the product terms that occur in the expressions are now
directly converted back into corresponding rows of the combined truth table
in the Quine-McCluskey method (cf Table 3.5 in Sect. 3.7.2, as well as [3.62]).
These rows which contain "don't care" values are then stored in an FM array.
The word line outputs of the FM array which correspond to ones in a function
are connected by a set of OR circuits, one for every Boolean function, in
the same way as done in Sect. 4.4.2. Alternatively, the OR functions may be
programmed.
Example 5.1:
Assume that the following expressions have to be implemented by a CAM:

f1 = (A∧B̄∧C) ∨ (D̄∧E) ∨ (F∧Ḡ)
f2 = B∧D∧F
f3 = (A∧B̄∧C) ∨ (K∧L)    (5.2)
The contents of the FM, using the explicit notation ø for "don't care",
are then:

A B C D E F G K L   f1 f2 f3
1 0 1 ø ø ø ø ø ø    1  0  1
ø ø ø 0 1 ø ø ø ø    1  0  0
ø ø ø ø ø 1 0 ø ø    1  0  0
ø 1 ø 1 ø 1 ø ø ø    0  1  0
ø ø ø ø ø ø ø 1 1    0  0  1
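The masked matching of the functional memory can be sketched as below; `None` stands for the "don't care" value ø, and the per-function word lists model the programmed OR connections. The names are illustrative.

```python
# Functional-memory sketch of (5.2): each stored row keeps only the
# cared-for bit positions; a row responds when the search argument
# agrees with it on those positions.

N = None  # don't care (ø)
fm = [((1, 0, 1, N, N, N, N, N, N), ('f1', 'f3')),
      ((N, N, N, 0, 1, N, N, N, N), ('f1',)),
      ((N, N, N, N, N, 1, 0, N, N), ('f1',)),
      ((N, 1, N, 1, N, 1, N, N, N), ('f2',)),
      ((N, N, N, N, N, N, N, 1, 1), ('f3',))]

def evaluate(arg):
    """arg = (A, B, C, D, E, F, G, K, L); returns the function outputs."""
    out = {'f1': 0, 'f2': 0, 'f3': 0}
    for word, funcs in fm:
        if all(w is None or w == a for w, a in zip(word, arg)):
            for f in funcs:
                out[f] = 1          # programmed OR connection fires
    return out

# A = 1, B = 0, C = 1, all other variables 0:
print(evaluate((1, 0, 1, 0, 0, 0, 0, 0, 0)))   # → {'f1': 1, 'f2': 0, 'f3': 1}
```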
In the first matrix, the vertical signal lines correspond to independent logic
variables or their negations, and each of the horizontal output lines is equiv-
alent to a logical product term. The product expressions are defined by switch-
ing elements (in the example, diodes) connected between the line crossings
in a proper way. The horizontal lines are inputs to the second matrix which
forms the logic sum of product terms whereby its vertical lines correspond to
the logic function outputs. The switching elements in the first matrix must
be such that they generate a set of AND functions, and in the second matrix
they must generate OR functions, respectively. During the fabrication process
a switching element has been provided at every line crossing; proper subsets
of elements are activated or passivated, e.g., by a suitable photolithographic
process using special masks, or the switching elements can be set by high
fields obtainable from automatically controlled programming devices.
The PLA as a Read-Only CAM and FM. Consider once again the first switching
matrix. Any Boolean function of the variables x1, x2, ..., xn can be
expanded as

f(x1, x2, ..., xn) = V_{i=0}^{2^n - 1} [x1^i1 ∧ x2^i2 ∧ ... ∧ xn^in ∧ f(i1, i2, ..., in)]    (5.3)

where each of the superscripts i1, i2, ..., in attains a constant binary value
∈ {0,1}. These values together form the binary-number representation of the
index i. Further, xj^ij is an operational notation with the meaning

xj^ij = xj if ij = 1, and xj^ij = x̄j if ij = 0.    (5.4)

Notice that f(i1, i2, ..., in) ∈ {0,1} is a Boolean constant. If there existed a
standard universal modular logic circuit which would implement the right-hand
side of (5.3), then any Boolean function of n variables would be implementable
by it.
As an alternative, the expansion

f(x1, x2, ..., xn) = V_{i=0}^{2^(n-1) - 1} [x1^i1 ∧ x2^i2 ∧ ... ∧ x(n-1)^i(n-1) ∧ f(i1, i2, ..., i(n-1), xn)]    (5.5)

can be applied which contains fewer terms, but in which f(i1, i2, ..., i(n-1), xn)
is not constant; it may take on any of the values 0, 1, xn, or x̄n.
An example illustrating both solutions for the implementation of a uni-
versal logic module (ULM) is given below. The circuit which directly imple-
ments (5.3) or (5.5) is a multiplexer which is available as a standard LSI
circuit component [5.83].
Example 5.2:
The two expressions of f(x,y,z) are:

f(x,y,z) = [x̄∧ȳ∧z̄∧f(0,0,0)] ∨ [x̄∧ȳ∧z∧f(0,0,1)]
         ∨ [x̄∧y∧z̄∧f(0,1,0)] ∨ [x̄∧y∧z∧f(0,1,1)]
         ∨ [x∧ȳ∧z̄∧f(1,0,0)] ∨ [x∧ȳ∧z∧f(1,0,1)]
         ∨ [x∧y∧z̄∧f(1,1,0)] ∨ [x∧y∧z∧f(1,1,1)]    (5.6)

or

f(x,y,z) = [x̄∧ȳ∧f(0,0,z)] ∨ [x̄∧y∧f(0,1,z)]
         ∨ [x∧ȳ∧f(1,0,z)] ∨ [x∧y∧f(1,1,z)]    (5.7)
Fig. 5.20a,b. Two implementations of logic by a multiplexer: a) based on
(5.6), with data inputs f(0,0,0) = 0, f(0,0,1) = 0, ..., f(1,1,1) = 1;
b) based on (5.7), with data inputs f(0,0,z) = 0, f(0,1,z) = 1,
f(1,0,z) = 1, f(1,1,z) = z
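Both multiplexer realizations of the example can be sketched directly; the function values are those legible in Fig. 5.20, and the index arithmetic stands in for the multiplexer's select inputs.

```python
# Sketch of a universal logic module as a multiplexer (Example 5.2),
# with the function values of Fig. 5.20.

def mux8(x, y, z):
    """Direct implementation of (5.6): an 8-to-1 multiplexer whose data
    inputs are the constants f(i1, i2, i3)."""
    data = [0, 0, 1, 1, 1, 1, 0, 1]       # f(0,0,0), f(0,0,1), ..., f(1,1,1)
    return data[4 * x + 2 * y + z]

def mux4(x, y, z):
    """Implementation of (5.7): a 4-to-1 multiplexer whose data inputs
    may also be the residue variable z."""
    data = [0, 1, 1, z]                   # f(0,0,z), f(0,1,z), f(1,0,z), f(1,1,z)
    return data[2 * x + y]

# Both realizations agree on all input combinations.
print(all(mux8(x, y, z) == mux4(x, y, z)
          for x in (0, 1) for y in (0, 1) for z in (0, 1)))   # → True
```

The second form needs half the data inputs because the last variable is absorbed into the data, which is exactly the saving that (5.7) expresses.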
With a much higher number of variables, say, 50 or more, which is almost the lower
limit in practice, the usual random-access memory falls outside consideration.
Maybe the most important argument in favor of the functional-memory im-
plementation is flexibility which manifests itself in logic design, testing,
correction of errors, and documentation which even with large systems can be
managed; the difference is immense when compared e.g., with random logic
circuits with which any updatings imply new drawings, wiring diagrams, and
much tracing of wiring. For all these reasons the designers of LSI circuits
ought to reconsider the importance of the FM.
This last chapter is intended to show in what ways the structures of content-
addressable hardware can further be developed and how CAM functions can be
applied to the implementation of other than pure searching functions. In order
to increase parallelism and flexibility in searching and computing operations,
a few fundamental lines exist of which the following three are believed
to represent the basic directions: 1) More logic functions can be added to
each cell in an all-parallel CAM, and the cells can be made to intercommuni-
cate locally, in order to distribute the processing algorithms over the me-
mory hardware in a highly parallel manner. 2) An array processor can be built
of many higher-level processing elements (e.g., microprocessors) whereby the
CAM or an equivalent circuitry may be applied to define the individual control
conditions and intercommunication between the processing elements. 3) The re-
sults storage of a conventional CAM array, in particular that of a word-paral-
lel, bit-serial CAM can be made more versatile, and for the control and manip-
ulation of the memory array as well as of the results storage, a powerful host
computer can be used.
It is somewhat striking that all the new possibilities opened by the large-
scale integration technology have not led to novel and more complex internal
structures of memories. Obviously the distributed-logic devices, because of
the high degree of complexity needed in their cells, have to date been too
expensive to make these suggestions practicable for high-capacity memories.
A preliminary counterexample given in Sect. 3.6.3 has clearly demonstrated
that if information retrieval were the only task, and this is what the
distributed-logic devices were originally meant for, it would seem more profit-
able to trade off speed and parallelism for a big increase in memory capacity
and greatly lowered costs. The situation, however, may change in the near
future when certain new technologies such as the CCD and the magnetic-bubble
Fig. 6.1. One-dimensional distributed-logic memory, showing one cell with
its search-argument, match, and micro-operation control lines
Comment 1:
In the first three micro-operations, only part of the specifications may be
given: e.g., match xi1 = 1, xin = 0 means that xi2 through xi,n-1 are "don't
cares" and Ci can have any value.
Comment 2:
For reading, there must be only one active cell. If Mi = 1 for several i,
reading must be preceded by a priority-resolution microprogram as explained
below. (In practice, a multiple-response resolver might be preferred for
this purpose.)
Example 6.1:
This microprogram locates all character strings which match with an ex-
ternal search argument, given as a sequence of characters.
Assume that initially each Mi = 0. Denote the string of external char-
acters (search argument) by {ξ1, ..., ξk, ..., ξn}. The microprogram
leaves Mi = 1 in the cells which follow each matching string.
Microprogram                Comments
1) k = 1
2) match xi = ξk            Marks Mi = 1 in all cells that match with ξ1
3) set Ci = 0
4) store Ci = 1             Marks Ci = 1 in each cell which follows a
5) set Mi = 0               string matching so far
6) right
7) set Ci = 0
8) store Ci = 1
9) set Mi = 0
10) k = k + 1
Example 6.2:
This microprogram singles out the leftmost active cell in the array,
leaving Mi = 1 in it and resetting Mi in the others:
1) set Ci = 0
2) mark
3) store Ci = 1
4) set Mi = 0
5) right
6) store Ci = 0
7) set Mi = 0
8) match Ci = 1
For further illustration, assume that the values of the pairs (Mi, Ci) for
i = 0,1,...,6 are shown typographically as strings with a space between
the pairs. The transformed contents after each (numbered) microinstruc-
tion are shown below.
(Mi' Ci )
Initial contents: 01 11 10 00 10 01 00
1) 00 10 10 00 10 00 00
2) 00 10 10 10 10 10 10
3) 00 11 11 11 11 11 11
4) 00 01 01 01 01 01 01
5) 00 01 11 11 11 11 11
6) 00 01 10 10 10 10 10
7) 00 01 00 00 00 00 00
8) 00 11 00 00 00 00 00
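The trace above can be reproduced by a short simulation. The micro-operation semantics used here are inferred from the printed trace, not from a formal definition: `set` writes unconditionally, `store` writes only into active cells (Mi = 1), `mark` activates every cell from the leftmost active one rightwards, `right` copies Ci-1 into Mi, and `match Ci = 1` activates the cells with Ci = 1.

```python
# Simulation of the micro-operations of Example 6.2 on the (Mi, Ci) pairs.

M = [0, 1, 1, 0, 1, 0, 0]   # initial contents, i = 0 ... 6
C = [1, 1, 0, 0, 0, 1, 0]
n = len(M)

C = [0] * n                                   # 1) set Ci = 0
first = M.index(1)                            # 2) mark
M = [1 if i >= first else 0 for i in range(n)]
C = [1 if M[i] else C[i] for i in range(n)]   # 3) store Ci = 1
M = [0] * n                                   # 4) set Mi = 0
M = [0] + C[:-1]                              # 5) right
C = [0 if M[i] else C[i] for i in range(n)]   # 6) store Ci = 0
M = [0] * n                                   # 7) set Mi = 0
M = [1 if C[i] else 0 for i in range(n)]      # 8) match Ci = 1

print(M)   # → [0, 1, 0, 0, 0, 0, 0]  (only the leftmost active cell remains)
```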
Microprogram                        Comments
1) match xi = ξ1
2) set Ci = 0
3) store Ci = 1                     Sets xi1 = xi2 = 1 in each cell which
4) set Mi = 0                       follows a character matching with ξ1.
5) right                            If ξ1 does not occur in the memory,
6) set xi1 = 1, xi2 = 0             sets xi1 = 1, xi2 = 0 in all cells.
7) store xi2 = 1
8) set Mi = 0
9) k = 2

10) match xi = ξk, xi1 = 1          Sets xi1 = 1 in all cells following
11) set Ci = 0                      those in which a match occurred at
12) store Ci = 1                    the last character comparison, and
13) set Mi = 0                      when no more than one error had oc-
14) right                           curred so far. (Notice that this sub-
15) set xi1 = 0                     program is iterated for k = 2 ... n.)
16) store xi1 = 1                   Sets xi1 = 0 in all other cases. Does
17) set Mi = 0, Ci = 0              not yet alter xi2.
18) match xi2 = 1                   Leaves otherwise correct values for
19) store Ci = 1, xi2 = 0           xi1 and xi2 after the last matching
20) set Mi = 0                      operation, except sets xi1 = 0, xi2 = 1
21) right                           if the string had been correct so far
22) store xi2 = 1                   and if an error occurred at the last
23) set Mi = 0                      character comparison.
This example shows that the microprograms tend to become rather long.
It should be realized, however, that each program step is executed in
parallel over all cells.
An Example of Bit Control Logic in a Linear DLM. As stated above, the cell
of a DLM must include two types of logic control: one for the storage, writing,
reading, and matching of the data bits xi1 through xin, and the other for the
sequential control of the status flip-flops Mi and Ci. From the point of view
of production costs, it is the bit logic which is more decisive, and an exam-
ple of it (according to [6.4]) is shown in Fig. 6.2a. A control circuit for
the Mi and Ci flip-flops is delineated in Fig. 6.2b. Since an asynchronous
principle of sequential operation was used, a double-rank shifting method
had to be applied to propagate the status of the Mi flip-flop into the adja-
cent cell. It is for this reason that so many micro-operations were necessary
in a simple sequential matching (e.g., set Mi = 0, match xi = ξ, set Ci = 0,
store Ci = 1, set Mi = 0, right) to isolate the partial operations of sending
away old information and receiving new information.
The external control of the DLM which interprets and executes the micro-
programs must be such that it distributes a sequence of control signals,
corresponding to the micro-operations listed in Table 6.1, to the respective com-
mon control lines. It is hoped that the circuit diagrams of Fig. 6.2 are
otherwise self-explanatory.
Microprograms Written for DLMs. To recapitulate, it may be mentioned that at
least the following microprograms have been written for DLMs: retrieval,
editing, and moving of variable-length character strings [6.1-4]; bulk addi-
tion, subtraction, multiplication, and Boolean operations on many sets of
operands [6.6,7]; matrix inversion by the Gauss-Jordan elimination procedure
[6.6]; multiple-response resolution [6.5,6]; searching for maximum and mini-
mum values [6.6,7]; and magnitude comparison [6.6].
Group-Organized DLM. If the principal mode of use of a DLM is parallel oper-
ation on numerical variables, e.g., magnitude search or bulk arithmetic, then
Fig. 6.2a,b. Circuit logic of the DLM: a) bit cell, b) status flip-flops
(control lines: MATCH, SET, STORE, MARK, LEFT, RIGHT)
placed by circuit logic. The adjacency of the cells is defined in such a way
that the left and right neighbor of an X cell can only be found within the
same group, whereas the neighbors of the Y cells are the Y cells of adjacent
groups.
In accordance with the word-parallel, bit-serial CAM operation, the numer-
ical variables are stored in "bit slices" of the X cells, i.e., horizontally
in each group.
Fig. 6.3. Group-organized DLM, one group (X cells with their input, output,
and control lines; Y cells Yi1, Yi2, ..., Yim; and the group flip-flop Gi)
An active cell is one with Mij = 1. The following microprogram
exemplifies the operation.
Example 6.4:
This microprogram adds pairs of binary numbers in parallel over all cell
groups. The two operands are stored in the xij2 and xij3 bit slices of
the X cells in each group, and the sums are left in the sets of Mij flip-
flops, respectively. These additions will be performed only in groups
which have Gi = 1.
For the carry bits, the flip-flops xij1 are used.
Microprogram
The sum and carry bits are determined in the usual way: if the carry bit
is xij1 = 1, then the sum (match) bit will become 1 if and only if
(xij2, xij3) is (0,0) or (1,1), and if xij1 = 0, the sum (match) bit be-
comes 1 if and only if (xij2, xij3) is (0,1) or (1,0). This is implemented
at steps 6 through 10 of the microprogram. Before that, the correct carry
bits had to be computed in the first part of the program. When scanning
the bit positions from right to left, one will note that the first carry
cannot be generated until a bit combination (xij2, xij3) = (1,1) is found,
after which the carry is propagated to the left until a combination
(xij2, xij3) = (0,0) is found. When proceeding further left, the generation
of a new carry again requires the occurrence of the operand bit combination
(1,1).
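The carry-scan rule described above can be sketched for one group as follows; the list representation of the bit slices and the helper names are illustrative. Bit position 0 is the least significant, and the scan mimics the generate/kill/propagate behavior of the operand bit pairs.

```python
# Sketch of the bulk-addition rule of Example 6.4 for one group.

def add_bit_slices(op_a, op_b):
    """op_a, op_b: operand bit slices (index 0 = least significant bit);
    returns the sum bit slice left in the Mij flip-flops."""
    n = len(op_a)
    carry = [0] * (n + 1)
    for j in range(n):                        # scan from right (LSB) to left
        a, b = op_a[j], op_b[j]
        if (a, b) == (1, 1):
            carry[j + 1] = 1                  # (1,1) generates a carry
        elif (a, b) == (0, 0):
            carry[j + 1] = 0                  # (0,0) kills the carry
        else:
            carry[j + 1] = carry[j]           # otherwise it propagates
    # Sum (match) bit: with carry-in 1, sum = 1 iff the operand bits are
    # equal; with carry-in 0, sum = 1 iff they differ.
    return [(op_a[j] ^ op_b[j]) ^ carry[j] for j in range(n)]

def to_bits(v, n): return [(v >> j) & 1 for j in range(n)]
def to_int(bits): return sum(b << j for j, b in enumerate(bits))

a, b, n = 13, 11, 5
s = add_bit_slices(to_bits(a, n), to_bits(b, n))
print(to_int(s))   # → 24
```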
Further Works on the DLM. A cryotron implementation of the DLM is reported by
CRANE and LAANE [6.8]. Suggestions for generalizations and further algorithms
have been made by EDWARDS [6.9] and SPIEGELTHAL [6.10]. An algorithmic lan-
guage for DLM has been presented by TREMBLAY [6.11]. STURMAN [6.12,13] has sug-
gested that a general-purpose computer be implemented by the DLM structure.
Cost-effectiveness, due to the high cost of memory modules, is the worst
handicap thereby met.
Further DLM applications have been described by SMATHERS [6.14]. Later, in
Sect. 6.4.4, we shall describe the content-addressable computer named PEPE;
its input section has been implemented on the DLM principle.
At this point it may be proper to mention that the cells can be controlled
to carry out the following modes of logic: 1) combinational logic operations
along the horizontal cascades of cells; 2) combinational logic operations
along the vertical cascades of cells; 3) sequential logic operations whereby
the states of the Yij flip-flops can be made to change individually in
synchronism with the system clock.
It may further be pointed out that if the ACAM array is to realize Boolean
functions, the stored bit values Yij are used to program this logic, i.e.,
they can be set to define particular functions. It will further be shown
below that content-addressable search actually belongs to these functions.
Clocking signals are not needed to compute these functions. If, on the other
hand, the array is used for the processing of numerical information, the
Yij, organized as words by the rows, are used to represent stored data.
Clocking thereby allows recursive transformation of the contents. When the
array is used as a data memory and processor, the various signals serve the
following purposes:
Cell Logic. Controlled by the three signals ai, bi, and cj, one of eight
processing modes, described by the logic equations in Table 6.3, can be
defined for each cell. The result of a cell operation is in general a new
value for the stored bit Yij, and new values for the intercommunication
variables xi+1,j and zi,j+1. It may further be noted that there are control
conditions under which the intercommunication signals zij and xij are
constant along the whole row or column, respectively; the zij or xij can
then be used as extra control signals. With cj = 0, the cell operates in the
unclocked mode in which only combinational logic operations are performed.
During the clocked (arithmetic) operations, which in general change the bit
value, cj = 1. If it is desirable to disable (mask) a particular column in
arithmetic operations, its cj is made 0.
The particular cell logic defined by Table 6.3 represents only one paradigm
of ACAM, designed to demonstrate the processing ability of such an array. A
set of array functions implementable by this logic is explained below in more
detail. Each bit cell must be realized by about 36 logic gates, which is
almost three times as many as in the DLM, and about five times as many as in
the all-parallel CAM. It is to be noted that the bit storage for Yij must
have the structure of a clocked flip-flop to isolate the old and new values
from each other. Two-phase (master-slave) flip-flops seem advantageous,
whereby in certain electronic circuits two external clocking signals may be
used.
Array Functions. A justification for the cell logic defined in Table 6.3 is
now given by a detailed discussion of the functions that the array is
supposed to implement.
1) Equality Search: With (cj, bi, ai) = (0,0,1) the x word is propagated
vertically as such through the array. If now zi1 = 1, the leftmost bit cell
on a row forms an output zi2 which is the logical equivalence function of
xi1 and Yi1, as immediately verified by a comparison of the truth tables of
(x ⊕ y) and (x ≡ y). The result of this bit comparison (zi2) is propagated to
the next cell on the right, which performs a similar bit comparison if
zi2 = 1. If, on the other hand, zi2 = 0, all the further zij are 0 for
j = 3, ..., n. An induction shows that the final output zi,n+1 indicates the
word match, i.e., zi,n+1 = 1 if and only if xij and Yij match at all
positions on the row. Masked equality search is implemented by setting
cj = 1 at all masked bit positions. Table 6.3 now shows that for
(cj, bi, ai) = (1,0,1) the cell propagates the xij and zij values as such,
i.e., this bit position cannot produce a mismatch signal.
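The row-wise propagation of the match signal can be sketched as follows; this is an illustrative Python model, in which lists of bits stand in for the Yij rows and the mask vector for the cj control bits:

```python
def equality_search(words, argument, mask):
    """Word-parallel equality search: z enters each row as 1 and every
    unmasked cell ANDs in the bit equivalence (x = y); a masked column
    (mask[j] = 1, i.e. cj = 1) passes z through unchanged."""
    results = []
    for y in words:                        # each row i, in parallel in the ACAM
        z = 1                              # zi1 = 1
        for j, yj in enumerate(y):
            if not mask[j]:
                z &= 1 if argument[j] == yj else 0
        results.append(z)                  # zi,n+1: the word-match signal
    return results

words = [[1, 0, 1], [1, 1, 1], [0, 0, 1]]
assert equality_search(words, [1, 0, 1], [0, 0, 0]) == [1, 0, 0]
assert equality_search(words, [1, 0, 1], [0, 1, 0]) == [1, 1, 0]   # middle bit masked
```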
2) Searching on the Basis of Logical Implication: With (cj, bi, ai) = (0,0,0)
and zi1 = 1 another content-addressable search is performed. A response
is obtained at all words which do not imply the search argument, i.e., which
have Yij = 1 in at least one position in which xij = 0. Masking of a bit is
again possible by setting cj = 1.
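The implication criterion can be modelled in the same illustrative style as the equality search above (the list representation is again an assumption of the sketch, not of the hardware):

```python
def implication_search(words, argument, mask):
    """Search on the basis of logical implication: the response is 1 for
    every word that does NOT imply the argument, i.e. that has y = 1 in
    at least one unmasked position where the argument bit is 0."""
    results = []
    for y in words:
        violated = any(yj == 1 and argument[j] == 0
                       for j, yj in enumerate(y) if not mask[j])
        results.append(1 if violated else 0)
    return results

# argument 101: only the word 010 has a 1 where the argument has a 0
assert implication_search([[1, 0, 0], [0, 1, 0], [0, 0, 0]],
                          [1, 0, 1], [0, 0, 0]) == [0, 1, 0]
```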
3) Permutation Switch: When the control pattern is (0,1,0) and Yij = 1,
it is seen from Table 6.3 that the cell transmits signals xi+1,j and zi,j+1
which are obtained from xij and zij by interchanging them. If Yij = 0, no
interchange occurs. By this function, after writing proper values to the
cells, any permutation of the input signals becomes possible, whereby the
set of input signals is also shifted downward by an amount depending on the
number of permutations made. (The paths of the signals in the array resemble
fletched wires.)
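The interchange rule can be traced with a small simulation; the following Python sketch assumes, for illustration only, that the x signals enter at the top of the columns and the z signals at the left of the rows:

```python
def permutation_switch(y, x_in, z_in):
    """Permutation switch: x signals travel down the columns and z signals
    travel along the rows; a cell with y = 1 interchanges the two signals,
    while a cell with y = 0 passes them through unchanged."""
    x = list(x_in)                        # signals currently descending the columns
    z_out = []
    for i, row in enumerate(y):
        z = z_in[i]                       # signal entering row i from the left
        for j, yij in enumerate(row):
            if yij:
                x[j], z = z, x[j]         # the interchange of xij and zij
        z_out.append(z)
    return x, z_out                       # bottom-edge and right-edge outputs

# with the off-diagonal cells programmed to 1, the column inputs (1, 2)
# emerge on the row outputs in reversed order
assert permutation_switch([[0, 1], [1, 0]], [1, 2], [0, 0]) == ([0, 0], [2, 1])
```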
An ACAM Specially Designed for a Kalman Filter. KAUTZ and PEASE [6.17] have
designed a content-addressable processor for use in target tracking, optimal
filtering, etc., which primarily means an embodiment of the Kalman filter.
The array functions were selected according to requirements arising in matrix
processing, but at little extra cost it was possible to include a number of
other functions, among them the computation of the Fast Fourier Transform.
The processor was designed around an ACAM array, the bit cells of which are
implementable by about 50 logic gates.
Development of this system has mainly been carried out on the software
level (microprograms, macros, routines, and application-oriented programs).
Other Iterative-Cell Array Processors. Cell arrays related to that of KAUTZ
have been proposed by HOOD et al. [6.18]. A distributed-logic structure has
been presented by TREPP [6.19]. Linear arrays that can be chained in many
ways have recently been suggested by FINNILA and LOVE [6.20]. Array
algorithms have been devised by BERKOVICH et al. [6.21].
A common feature of all memory designs discussed so far is that they were in-
tended for the storage and retrieval of independent items such as numbers,
strings, and possibly compound identifiers consisting of (name, attribute)-
pairs. While such content addressability may be enough for simple document
retrieval and parallel computation, it does not yet cope with many of the
more complex tasks that occur in problems of artificial intelligence, lan-
guage understanding, and in general, when dealing with semantic expressions
which are stored in memory in the form of relational structures. Retrieval
of information from such structures means that all items which occur in a
specified "context", i.e., which are related to a set of other items in a
specified way, must be spotted and read out. To put it in another way, assume
that the elementary items form relations which are ordered sets of these
elements. The searching problem may be formulated by expressing a system of
relations in which some elements are left unknown, and the task is to find
all possible values for these unknowns which simultaneously satisfy all stated
relations, i.e., for which a corresponding set of relations can be found in
the data base referred to.
In Chapter 1, the structure of semantic associative memory was described
briefly, and it was mentioned that the searching problem is usually formulated
and solved using certain high-level computer languages such as LEAP. This
section discusses a special hardware memory system named association-storing
processor (ASP) which is especially designed for the parallel storage and re-
trieval of semantic data structures. This design, introduced by SAVITT,
LOVE, RUTMAN, and others [1.14,15, 6.22-28], was preceded by a careful
software study to find out the processing functions to be embodied in
hardware. To the knowledge of this author there exists no hardware
implementation of the ASP up to the present time.
A corollary of the requirements embodied in the ASP design is that a number
of parallel searches, in general involving many different search arguments,
are proceeding simultaneously. This is not possible in the other content-
addressable memories in which the search argument is given externally.
Representation of Relations in the ASP. Semantic data structures can implicitly
be defined and represented by a set of relations or "associations" which in
its simplest form consists of ordered triples of the form (A,R,B). Here A and
B are two items and R, the link label, specifies the relation between A and B.
A structure resembling a network results when several relations share common
(identical) items. An example of this will be given in Fig. 6.5. The storage
of the data structure itself presents no problems: the relations, the triples,
are stored as such. It is in the retrieval that an analysis of the structure
must be performed.
The ASP is a regular array of memory cells provided with parallel pro-
cessing logic. Each cell in it can be used to store any of the following
types of information: 1) A single item or link label (e.g., A, B, or R).
2) The coded representation of a relation. 3) The coded representation of a
compound item which has the form of a relation but can be used instead of an
item in another relation.
The array of cells in the ASP is preferably a square. In principle, the
ASP could be linear, too. However, in square arrays, the average distance be-
tween a pair of cells may be several orders of magnitude shorter than in a
linear array, whereby faster communication becomes possible. The two-dimen-
sional geometry is also suitable in large-scale integration of planar struc-
tures.
The ASP in fact belongs to the category of distributed logic memories,
too, and it has many features in common with the usual DLMs. For instance,
in its memory cells, data as well as various flags are stored. The cells can
be identified by global content-addressing, and they are locally connected
for intercommunication. However, while this latter feature in the DLM orig-
inally served the propagation of the matching status to the next cell in
order to facilitate sequential matching of characters in connected strings,
intercommunication in the ASP serves a wider purpose: to dispatch coded
information over longer distances to addressed destinations, very much in
the same way as the pointers discussed in Chap. 2 defined locations where
the relevant items could be found. Although the cells are interconnected
only locally, they can be made to pass signals in the same way as shift
registers do.
It is another characteristic feature of the ASP that the same memory array
can be used to store "memorized" data and data structures, as well as
descriptions of the searching criteria; the latter, named control structures,
usually consist of a set of relations in some of which one or two of the
elements are unknown and denoted by special symbols. The control structure
is usually written in the form of a data structure which contains unknown
items or link labels. For these, all values have to be found such that when
they are substituted into the control structure, the latter matches with some
part of the "memorized" data structure. The values so found then constitute
the set of all possible answers to the query. The searching task is somewhat
analogous to solving an equation. While an equation, upon substitution of
[Fig. 6.5. An example of ASP data structures]
Description of the ASP Array. The memory structure described below permits
completely parallel processing of its contents, i.e., simultaneously over any
number of specified relations.
[Fig. 6.6. A small ASP memory array storing the information of Fig. 6.5; relation cells hold triples of address codes, with symbolic contents such as (B,R3,E) and (X1,R2,X2) written beneath them]
Consider Fig. 6.6, which exemplifies a small-size memory array with
information corresponding to that used in Fig. 6.5 stored in it. The cells
are designated by their addresses, which consist of a pair of coordinates
(r, c) with r the row and c the column. Each cell contains three data fields
of equal size and some flag fields. If the cell is used to represent a
literal item or link label, all of the three fields can be concatenated. If,
however, the cell has to hold a triple, i.e., a relation or compound item,
its elements are represented indirectly, by the address codes of locations at
which the literal items and the link label are stored. In this mode of
representation, the contents of the above three fields are equivalent to
pointers. If a triple contains unknown elements, special reserved codes,
distinguishable from addresses, must be used for them. The actual values
stored in the cells are written as constants in Fig. 6.6; for clarity,
symbolic descriptions of the contents are written beneath the cells.
Each cell, in addition to the contents of its three data fields, can ex-
press its own address code in the form of wired-in signals. Any of these four
addresses can be switched and transmitted into the communication lines, using
global control signals. The switching and transmission is activated only in
cells which have their match flip-flop set; however, the field to be switch-
ed can be defined individually in each cell, using two control bits reserved
in the cells for this purpose. These bits can again be set in a separate pro-
cessing step by global control, with the aid of match flip-flops for the lo-
cation of the corresponding cells. It may thus be obvious that a great number
of cells can simultaneously transmit information into the communication lines
where they are propagated like bits in a shift register, under the timing
control of a common system clock.
The cells can be made to pass address codes to each other in the upward
(south-to-north) direction as well as to the left (from east to west).
Codes transmitted to the west are replicas of those entering the cell either
from the east or from the south. If the codes come from the east, the pro-
pagation is automatically continued unless the received code matches with
the cell address, in which case the propagation is stopped and the match
flip-flop is set in that cell. If the code was received from the south, it
will automatically turn to the west if the row address part of it matches
with the row address of the cell. Otherwise the code will continue travelling
to the north.
Codes transmitted to the north may be replicas of codes entering the cell
either from the south or from the east, or they may originate within the cell
(being the contents of one of the three data fields or the cell address).
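The routing rules can be traced with a small sketch. The Python function below is illustrative only: it assumes the destination lies to the north-west of the sender (with row 1 at the top); codes to other destinations wrap around in shift-register fashion, which this sketch omits:

```python
def route_code(sender, target):
    """Trace an address code (r, c) through the array: it travels north
    from the sending cell until the row field matches the cell row, then
    turns west until the column field matches, where it stops and sets
    the match flip-flop of that cell."""
    r, c = sender
    tr, tc = target
    path = [(r, c)]
    while r > tr:                         # northward leg (row 1 is at the top)
        r -= 1
        path.append((r, c))
    while c > tc:                         # westward leg after the turn
        c -= 1
        path.append((r, c))
    return path

assert route_code((3, 3), (1, 1)) == [(3, 3), (2, 3), (1, 3), (1, 2), (1, 1)]
```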
of the search argument, the ASP needs an external argument register and a
mask register.
It seems that the original reports and subsequent reviews of the ASP over-
look the simple fact that the content-addressable search function can be ap-
plied only to cells in which items or link labels are stored (in literal
form); the elements in relations, on the other hand, are address codes which
are not directly known and thus cannot be used as (masked) search argument.
For the content-addressable search of relations which contain the address
code of a particular item or link label, it seems necessary to apply the
special "box-car" function discussed below.
2) Write: This function writes the contents of the search argument register,
corresponding to unmasked fields, into all cells whose match flip-flop is
set.
In particular, the write function can be used to set some of the flags
(transmit flag, box-car flag, relation identifier, etc.), described below in
their proper context, in all cells whose match flip-flop is set.
3) Reset: This function resets the match flip-flop in each cell and, in
addition, sets the sequence flip-flop, similar to that of the
multiple-response storage described in Sect. 3.3.2, to the value 1. The
latter is also used to collect intermediate results from several passes,
as explained in the next function.
4) Pulse: The purpose of this function is to form intersections of the sets
of responses obtained in subsequent passes. The pulse function resets the
sequence flip-flop in each cell whose match flip-flop was not set, and resets
the match flip-flop. Thus, the sequence flip-flop forms the logical AND of
its previous contents and of the match flip-flop value.
5) Context Addressing: This memory function, being one partial step in con-
text-addressable retrieval, aims at the location of stored items or link labels
which could be possible solutions for a relation that occurs in a control
structure. As a control structure is often complex, this function is usually
applied in several passes as explained below.
Assume first that the transmit flags have been set in all cells which con-
tain relations of the form (A,R,X), with A and R specified and X arbitrary;
such cells must be located by the special "box-car" function explained below
in paragraph 6. The context-addressing function transmits all the values cor-
responding to X, from cells whose transmit flag has been set, into the inter-
communication lines. Notice that many codes which are mutually different may
be sent away. When these codes meet cells with matching addresses, they set
the match flip-flop in these cells.
It is obvious that the context-addressing function can be applied to re-
lations in which the unknown occurs in any field.
The context-addressing function is applied in several passes in the case
in which an unknown is shared by several relations. (Such a control structure
may be regarded as a system of equations, all of which must simultaneously be
satisfied.) Assume, for instance, that the control structure is of the form
A --R1--> X --R2--> B, which is equivalent to a system of two relations,
(A,R1,X) and (X,R2,B). A preparatory operation is to reset the match
flip-flops, and to set all sequence flip-flops to 1 by the reset function.
The retrieval commences with a set of operations (described in paragraph 6
in connection with the box-car function) such that the match flip-flops
become set in all relation cells whose left and middle fields contain the
addresses of A and R1, respectively. Then a write function is applied to
simultaneously set the transmit flags in these cells. It is to be noted
that this first processing
phase may select a number of relations which do not contain solutions for the
whole control structure but only for one relation, whereas all final
solutions are certainly contained in the set of selected cells. A
context-addressing function executed next masks off the A and R1 fields
and sends the
contents of the third fields into the intercommunication lines. When these
codes hit cells with matching addresses, they set the match flip-flops in
them. In this way, a set of candidates for solutions are located. A pulse
function applied next resets the match flip-flops and leaves the value 1 in
sequence flip-flops, thereby identifying the candidates obtained at the first
pass. The next part of the retrieval is similar to the phases described above.
It begins with a set of operations which locate all relation cells whose
middle and right fields contain addresses of R2 and B, respectively. The
transmit flags in these cells are set. The contents of the fields corresponding
to X are dispatched into the intercommunication lines, and they set the match
flip-flops in all cells with matching address. A pulse function executed next
resets the sequence flip-flop if the match flip-flop at the second pass was
not set, and resets the match flip-flop. Thus, the value 1 remains in the se-
quence flip-flop if and only if this cell responded to both context-addressing
functions described above, and this value signifies all final solutions, i.e.,
those cells which contain a value of X which satisfies the complete control
structure.
It may be clear that more complicated control structures are handled in a
similar way in several passes.
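The two-pass procedure above can be condensed into a short sketch. The following Python model is illustrative pseudocode: the relations are given as literal triples rather than address codes, and the dictionaries stand in for the match and sequence flip-flops:

```python
def context_search(relations, cells, A, R1, R2, B):
    """Two-pass context addressing for the control structure (A,R1,X) and
    (X,R2,B): each pass marks the match flip-flops of the cells addressed
    from the X field, and the pulse function ANDs the marks into the
    sequence flip-flops."""
    sequence = dict.fromkeys(cells, 1)    # reset: all sequence flip-flops := 1
    # first pass: relations (A, R1, ?) dispatch their rightmost field
    match = dict.fromkeys(cells, 0)
    for left, link, right in relations:
        if left == A and link == R1 and right in match:
            match[right] = 1              # the code sets the match flip-flop
    for addr in cells:                    # pulse: sequence := sequence AND match
        sequence[addr] &= match[addr]
    # second pass: relations (?, R2, B) dispatch their leftmost field
    match = dict.fromkeys(cells, 0)
    for left, link, right in relations:
        if link == R2 and right == B and left in match:
            match[left] = 1
    for addr in cells:
        sequence[addr] &= match[addr]
    return {addr for addr, s in sequence.items() if s}

relations = [('A', 'R1', 'X1'), ('A', 'R1', 'X2'),
             ('X2', 'R2', 'B'), ('X3', 'R2', 'B')]
assert context_search(relations, ['X1', 'X2', 'X3'], 'A', 'R1', 'R2', 'B') == {'X2'}
```

Only X2 survives both passes, since X1 satisfies the first relation only and X3 the second only.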
all relations of the form (A,X1,X) were found; a writing function can be
used to set the transmit flags in this subset. Next the literal items R are
searched and their box-car flags are set; these flags are reset in other
cells. Another box-car function, with the leftmost and rightmost fields masked
off, dispatches the addresses from the middle fields of all relations found
in the first pass. The box-cars can be received only by cells which have
the address code of R in the middle field. These cells constitute the
solution to the above problem.
7) Read: This function initiates a simultaneous transmission of information
from all cells whose match flip-flop is set. The intercommunication lines
are used, which means that only one specified field can be read out at a
time. These signals are made to travel towards the cell in the upper left
corner with address (1,1), which is attached to the output port. This is a
box-car type transmission with address (1,1) as the "locomotive" and the
specified field as the box-car. Whenever a match occurs, only the box-car is
sent out. Notice that any number of cells may be outputting their contents,
producing a stream of data; if there is a conflict at cell (1,1), it is
handled in the normal way. Multiple responses thus present no problem with
the ASP.
8) Mass write: This is a function for the automatic construction and storage
of a set of relations which have a regular structure. For instance, if {A k}
is a set of items stored in arbitrary cells, and the elements of this set can
be located by some characteristics which facilitate content-addressable search,
then the mass write function can be used to create a set of relations of the
type (Ak,R,B) and to store their representations in some set of cells which
happen to be empty. As in the CAM, information in the ASP can be stored in
arbitrary cells.
It will be necessary to introduce a further marker in each cell, named
usage flag, to indicate its vacancy. Let this flag have the value 1 for an
empty cell. A signal line which passes all cells on each row may be used to
form the wired-OR function of the corresponding usage flags, indicating
whether any empty cells exist on that row.
For the mass write function, all cells Ak have to be searched and their
transmit flags set. When the mass write function is called out, these cells
simultaneously send their addresses to the west, provided that the signal
line described above announces that there is vacancy on the same row. When
during the execution of this function an address signal meets a cell which is
empty, it writes its value into all the three fields of the cell, sets the
match flip-flop, and turns the usage flag off. It will be necessary to write
the information into all fields because it is not yet specified at this stage
which field shall be reserved for the Ak. As soon as all vacancies on a row
are filled, the address codes being propagated on that row must be deflected
to the north; when they meet a row with a vacancy, they are made to turn to
the west. Conflicts are handled as described earlier. After the address
codes of
all the Ak have been written, the write function is applied to store the
common address codes of R and B in the proper fields. If R and B are stored
in the memory, their addresses can be found out by a search, with R or B as
the search argument, followed by a read function with the cell address
specified as the data to be read out.
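The allocation step of the mass write can be sketched as follows. This Python model is deliberately simplified: it assumes a code entering a row from the east is absorbed by the easternmost vacancy, that rows wrap around to the north, and that conflicts do not arise; the grid-of-flags representation is an assumption of the sketch:

```python
def mass_write(usage, senders):
    """Simplified mass-write allocation: usage[r][c] = 1 marks an empty
    cell; each sender address moves north (wrapping around) until a row's
    wired-OR vacancy line answers, then turns west and is absorbed by the
    first empty cell met, whose usage flag is turned off."""
    rows, cols = len(usage), len(usage[0])
    placed = {}
    for addr in senders:                  # each Ak cell transmits its address
        r = addr[0]
        while not any(usage[r]):          # wired-OR line: any vacancy on row r?
            r = (r - 1) % rows            # deflect the code to the north
        c = max(j for j in range(cols) if usage[r][j])   # easternmost vacancy
        usage[r][c] = 0                   # cell absorbs the code; usage flag off
        placed[addr] = (r, c)
    return placed

# the sender at (1,0) finds its own row full and is deflected to row 0
assert mass_write([[0, 1], [0, 0]], [(1, 0)]) == {(1, 0): (0, 1)}
```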
Figure 6.7 illustrates the signal trajectories in six of the memory func-
tions. The cells with their transmit flags set are denoted by T; other flags
turned on are M = match flip-flop, B = box-car flag, and U = usage flag.
[Fig. 6.7. Signal trajectories in the ASP for six memory functions: search, write, context-address, box-car, read, and mass write]
It will be obvious that for the implementation of all the memory functions
discussed above, the cell structure of the ASP has to be yet more complex
than that of the CAMs and DLMs. The number of control and communication
lines is also appreciable. It is striking, however, what an enormous amount
of parallelism in global as well as local operations is thereby achievable;
as the internal structure moreover is regular, and thus suitable for
large-scale integration, the principles of this architecture ought to be
considered as an alternative to the control principles applied in present
computers.
A --R1--> X1 --R2--> X2 --R3--> B,
the address code R3 in the middle (irrespective of Xl and X2), and two box-
car functions are issued, one with the addresses of the leftmost fields, the
other with those of the rightmost fields as box-cars. Using the match and
sequence flip-flops of the transmitting cells, it is then possible to screen
out only those transmitting cells which receive the box-car in both of these
latter box-car functions. It may be clear that these are the cells which have
the address code of the final solution in their Xl and X2 fields.
The final task is to read out the pairs of lateral items (X1,X2). Because
the read function is able to output only one item at a time, reading of the
pairs is a somewhat lengthy, although straightforward, task. For instance,
the X1 items can readily be located on the basis of the address codes found
in the relations, and read out. The address codes of the X1 can similarly be
read. It is now necessary to make an external list of the X1 and their
address codes. To find the related items X2, the address code of each X1 in
turn is used as the search argument to locate the corresponding relations,
after which reading of the related X2 is a straightforward task.
Programs for the ASP. Let it be restated that all the design features of the
ASP were aimed at the location of items or link labels which occur within a
syntactic structure; this may be followed by processing steps in which a new
value is given to these items or labels, new relations are created and
connected to the items so found, or relations connected to these items are
deleted. In general, thus, the syntactic data structure stored in the memory
is changed by processing. This kind of change means updating of the stored
knowledge. A simpler query, however, aims only at the retrieval of specified
items, without any changes made in the memory.
A typical processing step, which also might be a simple retrieval, can be
defined by an instruction which is represented graphically; an instruction
consists of a control structure, described earlier, and a replacement
structure. The latter shall be substituted for the control structure
everywhere in the stored data structures. Assume that the control structure
matches with the data structure; this is signified as "success" (S), and the
opposite is "failure" (F). When the instructions are provided with labels
for their identification, S or F defines which instruction shall be executed
next. This is equivalent to conditional branching in usual computers. An
example of a complete instruction is given in Fig. 6.8.
The bookkeeping of instructions, and the detailed execution of memory
functions contained in them, can be controlled by a simple microprogram held
in an external microprogram memory. The structure of the instructions, in the
form of a set of relations having special symbols for their unknowns, however,
is held in the ASP memory.
[Fig. 6.8a,b. An ASP instruction: (a) its format, consisting of a control structure, a replacement structure, and a branch; (b) its effect upon a data structure, shown before and after execution]
very desirable feature from the programming point of view, especially when
dealing with complex problems.
one on the left always has the higher index. The cell with the lowest index
value directly communicates with the root.
The tree can be a binary one, as in Fig. 6.10, or any cell may be connected
to more neighbors. The rails always connect bilaterally a pair of cells
which have consecutive indices. If this were the only mode of
intercommunication, the cells could be drawn as a linear array. However,
when the array is conceived as a tree, it is possible to define shortcut
paths for signals, such as the one which would connect the points A and B in
Fig. 6.10; this will decrease the propagation time, especially if the whole
array is partitioned into small subtrees.
While the array and ensemble processor architectures described above are very
effective for the handling of bulk computations in special applications such
as filtering of radar data, it has been proven that computer systems built
around a word-parallel, bit-serial CAM are the most cost-effective and flex-
ible ones over a range of diversified parallel computing and searching prob-
lems. Such computer systems will hereupon be named bit-slice content-address-
able processors. The structures and operations of their central parts, the
CAM array and the results storage, were already discussed in much detail in
Sects. 3.4 and 4.3.
It may be illustrative to first compare the organization of a "highly
parallel" processor, say, a group-oriented DLM, and that of a simple word-
parallel, bit-serial CAM system. The memory array of the latter is here visu-
alized using shift registers .
[Fig. 6.12. Comparison of a DLM and a bit-slice processor]
Storage of the operands as shown on the left in Fig. 6.12 directly implies
that if arithmetic operations are to be performed, some intercommunication
logic must be provided between the X cells to take care of the carry
signals. On the other hand, if the storage of the operands is made as shown
on the right, and the operands are rotated through the results storage by
shifting, the iterative intercommunication logic is replaced by a single
sequential logic circuit per word location. Thus, if these systems are used
mainly for arithmetic operations, there is no significant difference in
speed between them, since bit-serial steps of operation must anyway be
executed in both. On the other hand, it is clear that the memory hardware of
the latter is very much cheaper per bit, especially with long operands; the
memory itself has to perform nothing else but shifting.
[Block diagram of the STARAN system: CAM array, program control, I/O, direct memory access (DMA), control bus, and external-function logic]
The CAM Array. In STARAN, addressed reading and writing of words is done in
parallel. It is mainly for this purpose that the skew addressing principle
was introduced. The EXOR skew network for diagonal addressing, built of
commercially available EXOR chips, is named the flip network (FN) in this
system. It is to be noted, however, that a programmer does not "see" the
permutation of data in skew addressing; the array normally appears to him as
simply being addressable by word locations or bit slices. The EXOR skew
network could, however, be programmed to pick up other types of slices from
the CAM array as well, by the application of a particular argument in the
address-mode register (cf. Fig. 3.16).
Because of the skew addressing principle, the words and bit slices of a
memory unit have to be of equal length. The CAM array of STARAN therefore
consists of square modules called blocks; a size of 256 words by 256 bits
was chosen for them, mainly in accordance with the size of the most usual
commercial RAM modules. Selection of words, bit slices, or other bit
patterns in a block is made by an 8-bit address and an 8-bit address-mode
argument. The data read out appear in parallel at a 256-bit I/O port, or
they communicate via a 256-bit wide path with the results storage. The array
may contain 1 to 32 blocks, the maximum number being determined by the
indexing capacity of the machine instruction words which control the overall
system operation. Bit slices over sets of blocks can be defined by
programming.
Since the writing and reading principles as well as the logic of the results
storage were already discussed in Sects. 3.4.2,4, there is not much to
be added here to the description of the CAM hardware.
The Results Storage Hardware. The results storage of STARAN (cf. Sect. 3.4.4) implements operations of the following forms, with the destinations indicated:
    Operation            Destination
    X, Y, or M           X or Y
    r(X), r(Y), r(F)     X, Y, and F, resp.
    f_α(F,X,Y)           X                    α = 0,1,...,15
    g_β(F,Y)             Y                    β = 0,1,...,15
    h(F)                 X or Y
    k(X,Y)               X or Y
    m(F)                 F
where the subscript i defines a particular bit in the registers, and op_α
defines one of the 16 possible Boolean functions of two variables.
g_β: The pointer FP1, in a way described above, selects one of two functions
which are of the form F_i op_β Y_i.
h: F is shifted right end-around by 32N bits, N = 0,1,...,7, whereby F_32
through F_255 must be 0.
k: If all Y_i are 0, the value of this function is X, otherwise it is Y.
m: The order of bits in F is reversed.
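The 16 possible Boolean functions op_α mentioned above can be enumerated compactly: the four bits of the index α form the truth table of the function. The sketch below uses this standard enumeration; the actual encoding of α in STARAN may differ:

```python
# Sketch: the 16 Boolean functions of two variables, indexed so that
# the four bits of alpha are the truth table of op_alpha.  This is
# the conventional enumeration, not necessarily the STARAN encoding.

def op(alpha, x, y):
    """Bit 2*x + y of alpha gives op_alpha(x, y)."""
    return (alpha >> (2 * x + y)) & 1

AND = 0b1000   # true only for x = 1, y = 1
XOR = 0b0110
OR  = 0b1110

assert [op(AND, x, y) for x, y in ((0,0),(0,1),(1,0),(1,1))] == [0, 0, 0, 1]
assert op(XOR, 1, 0) == 1 and op(XOR, 1, 1) == 0
assert op(OR, 0, 0) == 0
```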
2) Instructions for the Handling of Multiple Responses: The control block
has two field pointers FP1 and FP2, of which FP1 here defines a block and FP2
the word or bit slice. The instruction "find first responder" determines the
address code of the uppermost 1 in the Y register (within a block), using the
built-in multiple-match resolver, and loads it into (FP1, FP2). The "reset
first responder" instruction resets the uppermost 1 in Y. A single compound
instruction combining the above two exists, too. The "reset other responders"
singles out the uppermost 1 in Y and resets the other ones.
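The effect of these instructions on the response vector can be sketched in software (a behavioral model only; the hardware resolver works combinationally, and the function names below are illustrative):

```python
# Behavioral sketch of multiple-response handling: Y is the response
# bit vector, and the resolver singles out its "uppermost"
# (lowest-indexed) 1.

def find_first_responder(Y):
    """Return the index of the uppermost 1 in Y, or None if no response."""
    for i, bit in enumerate(Y):
        if bit:
            return i
    return None

def reset_first_responder(Y):
    """The compound 'find and reset' step: clear the uppermost 1 in place."""
    i = find_first_responder(Y)
    if i is not None:
        Y[i] = 0
    return i

def reset_other_responders(Y):
    """Keep only the uppermost 1, resetting all the others."""
    i = find_first_responder(Y)
    return [1 if j == i else 0 for j in range(len(Y))]

Y = [0, 0, 1, 0, 1, 1]
assert find_first_responder(Y) == 2
assert reset_other_responders(Y) == [0, 0, 1, 0, 0, 0]
assert reset_first_responder(Y) == 2 and Y == [0, 0, 0, 0, 1, 1]
```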
Between a pair of reading and writing instructions, the field pointers FP1,
FP2, etc. may be modified, e.g., by incrementing or decrementing them. This
allows various shifting and permutation operations to be performed.
Program Control. All of the above instructions dealt solely with the CAM
system. The stream of instructions which defines the information process,
however, must also contain instructions which are connected with the stored
program control and which refer to the control memory. These include instruc-
tions for unconditional and conditional branching, control of program loops
of specified length, handling of priority interrupts, communication with the
The computing operations in STARAN and similar computers were mainly
restricted to bit slices, whereas the word-addressing provision was primarily
used to guarantee fast I/O. In other words, the computational variables which
were stored in word locations were processed independently, by simultaneous
execution of similar program steps on all or a selected subset of them (bulk
processing). However, for interrelated variables a further dimension in parallel
(Figure: the orthogonal computer — a horizontal arithmetic unit, a vertical arithmetic unit, and the orthogonal memory unit connecting them.)
The block in the vertical arithmetic unit named function generator has a
logic circuit at every bit position, capable of forming any of the 16 Boolean
functions of two variables, which are the corresponding bits in registers V_i.
Alternatively, one of the bits may be that found in a register, and the
second one is read from the memory. The result can be substituted into any
vertical register or bit slice. In order to speed up arithmetic operations,
each bit position in the vertical arithmetic unit is further provided with
a full adder network and a carry bit storage; there are several carry
registers C_j to allow several arithmetic operations to be performed simultaneously.
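Bit-serial addition with per-word carry storage, as described above, can be modeled as follows (a behavioral sketch; the hardware performs the inner loop over words in parallel, one full-adder step per bit slice):

```python
# Sketch of bit-serial addition in a vertical arithmetic unit: one
# full-adder step per bit slice, with a carry register holding one
# carry bit per word.  The loop over words stands in for the
# hardware's parallelism.

def add_fields(words_a, words_b, nbits):
    """Add field B to field A in every word, LSB slice first."""
    carry = [0] * len(words_a)          # carry register, one bit per word
    result = [0] * len(words_a)
    for b in range(nbits):              # one pass per bit slice
        for w in range(len(words_a)):   # done in parallel by the hardware
            x = (words_a[w] >> b) & 1
            y = (words_b[w] >> b) & 1
            s = x ^ y ^ carry[w]                       # full-adder sum
            carry[w] = (x & y) | (carry[w] & (x ^ y))  # full-adder carry
            result[w] |= s << b
    return result

assert add_fields([3, 10, 7], [5, 10, 1], 8) == [8, 20, 8]
```

The several carry registers C_j of the text correspond to keeping several such `carry` vectors alive at once, so that independent additions can be interleaved.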
Several attempts have been made to present a general systematics for parallel
computers, and for content-addressable parallel processors in particular
[6.60-69]. In view of the fact that most existing parallel processors
comprise unique solutions for special purposes, it may be more advisable to
restrict this discussion to the main lines along which different configurations
can be identified; the number of individual systems is too large to be
reviewed here.
One of the first attempts to classify existing computer systems is due to
FLYNN [6.62]. He divided computers into four categories:
1) SISD (Single Instruction Stream, Single Data Stream) machines, also named
uniprocessors; this is the fundamental type of general-purpose computers.
2) MISD (Multiple Instruction Stream, Single Data Stream) machines, also named
pipeline processors. These could be particular general-purpose computers,
(Figure: parallel-processor configurations Machine I through Machine VI, built of data memory (DM), program memory (PM), distributed-logic data memory, combined data memory and processing units (DM+PU), and control units (CU).)
Applications. The following listing tries to cover the most important
applications of the content-addressable processors:
Text Handling
DYKE and LEA [6.196], PRONINA and CHUDIN [6.197], LOVE and BAER [6.198],
and LEA [6.199].
Mathematical Problems
Parallel processing is needed in many mathematical tasks of which the follow-
ing are examples: matrix computations [6.223-226]; Fast Fourier Transform
[6.227-228]; Hadamard transform [6.229]; solution of differential equations
[6.230]; linear decision processes [6.231]; multidomain algorithm evaluation
[6.232,233]; statistical processing [6.234], and spatial problems [6.235].
Miscellaneous
The basic principles and most of the practical solutions explained in Chap.
2 have remained almost unchanged in later implementations. The works reviewed
in this section mainly contain new analyses, refined details, and
applications. There is one novel idea, though, linear hashing (Sect. 7.1.5),
which seems to constitute the most important advance in hash coding in the
1980s. It neatly continues the original philosophy of locating the item
by a few arithmetic mappings. (This principle should not be confused with
linear probing, or with the linear quotient method, discussed in Sect. 2.3.2.)
One of the traditional features of hash tables, the bucket organization, has
further been developed by LYON [7.81,82]. A comparison of various bucket-
organization methods can be found in QUITTNER et al. [7.83].
For indirect addressing of data spaces in general, see [7.84] by CREMERS
and HIBBARD.
A hybrid approach for overflow handling has been presented by SCHEUERMANN
[7.85]. Another efficient combination of index tables and hashing has been
suggested by QUITTNER [7.86].
Retrieval from hash tables can be reduced by several methods: ordering of
the tables (GONNET and MUNRO [7.87]), using predictors (NISHIHARA and IKEDA
[7.88]), split sequence search (LODI and LUCCIO [7.89]), self-organizing
search (BURKHARD [7.90]), direct rehashing (MADDISON [7.91]), repeated hash-
ing (LARSON [7.92]), and reorganization of the table (SCHOLL [7.93]).
Another organizational idea can be found in [7.94] by ASTAKHOV.
Criteria for efficient packing of hash tables have been presented by
LYON [7.95] and MUEHLBACHER [7.96].
It was stated above that most of the fundamental ideas in hash coding have
remained almost unchanged since their introduction in the 1950s. There is,
however, one significant exception. The concept of linear hashing (actually,
linear virtual hashing) was developed around 1980 by LITWIN [7.97,98] for
the management of files that can expand dynamically during use. The average
number of accesses to the table or file stays reasonably close to unity at
rather high loading of the table, and even a great number of insertions can
be made without heavy reorganization.
The most serious disadvantage of the usual hashing methods is that if
the load factor approaches unity, the average number of accesses needed to
search for an item grows rapidly. This handicap is worse with open addressing,
whereas in chaining through an overflow area, each item in the chain may
add an extra access to memory, too. The rehashing method discussed in Sect.
2.3.5 solved this problem by moving all the items into a new, bigger table
when the old one became filled up, whereby a completely new hashing function
had to be chosen.
It would be more reasonable, however, to allocate additional address space
to the old hash table gradually, according to need; modification of the
hashing function cannot thereby be avoided, but it turns out that this kind
of dynamic hashing function is derivable from the static one in a simple way.
The bucket organization (Sect. 2.4.2) will now be assumed. Let the address
space of the hash table first be 0 ... N-1, with a bucket containing a certain
number of slots for items at each address. Any number of new addresses
N, N+1, ... shall be appendable upon need. Denote the key by K. The original
hashing function, h_0(K), shall hash uniformly over 0 ... N-1. When any of
the buckets is filled, the items overflowing are first chained to it.
The original idea of LITWIN is that every time an overflow from any
bucket occurs, the new item is chained to it, but, in addition, a new bucket
is appended to the memory, and one of the old buckets in the numerical order
(i.e., not necessarily the one where the overflow occurred) is rehashed.
Rehashing shall split the contents of the old bucket between it and the
bucket appended last. If there was a chain appended, all items in it are
rehashed, too.
Assume that in total p + 1 overflows have occurred, whereby address p is,
in turn, to be rehashed; then its contents are randomly split between the
addresses p and p + N. A rehashing function h_1(K') which does this can be,
e.g., of the form
h_1(K') = h_0(K') + N b_0(K')                                    (7.1)
where b_0(K') is an extra hash bit computed at the same time as h_0(K').
Notice that K' does not stand only for the key which caused the overflow, but
for any of the keywords at address p. It may be obvious that with growing p
most of the buckets and chains appended to them will be rehashed. After N
overflows, the address space will be 0 ... 2N-1. The next-level hashing
function shall then be
h_2(K) = h_1(K) + 2N b_1(K)                                      (7.2)
and the process shall continue with p = 0 again. At level i, the hashing
function is
h_i(K) = h_{i-1}(K) + 2^{i-1} N b_{i-1}(K) .
For searching with key K it will be necessary and sufficient to know the
current value of p and the current highest level i. Assume that key K shall
be located. In the following, K is assumed to exist in the table. The
searching starts with the hashing function h_i(K). Let m be the correct
bucket to be found. The following procedure can easily be deduced to compute
m (although the complete argument is omitted here):
begin
  if p = 0 then m := h_i(K)
  else m := h_{i-1}(K);
  if m < p then m := h_i(K)
end
Notice that no accesses to the memory are necessary until the correct
address is known. (It can be shown that if K is not found at the address m
defined by this procedure, it does not exist in the table.) The total average
number of accesses is slightly greater than unity, because some buckets
may still have chains although most of them will have been rehashed away.
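The whole scheme can be sketched in a few lines. The following toy implementation uses the common modular family h_i(K) = K mod (2^i N), which has the same splitting property as the hash-bit construction described above (bucket p at level i rehashes into p and p + 2^i N), and splits on every overflow, as in LITWIN's original proposal. The names and the capacity CAP are illustrative:

```python
# A minimal sketch of linear hashing: buckets 0...N-1 initially,
# one bucket appended and one bucket split per overflow, level
# incremented once every bucket of the current level has been split.

N, CAP = 4, 2   # initial address space and bucket capacity (toy values)

class LinearHashTable:
    def __init__(self):
        self.i, self.p = 0, 0                 # current level and split pointer
        self.buckets = [[] for _ in range(N)]

    def address(self, key):
        """The search procedure of the text: no memory access needed."""
        m = key % (2 ** self.i * N)
        if m < self.p:                        # bucket m was already split
            m = key % (2 ** (self.i + 1) * N)
        return m

    def insert(self, key):
        m = self.address(key)
        self.buckets[m].append(key)           # overflow items simply chain on
        if len(self.buckets[m]) > CAP:
            self._split()

    def _split(self):
        # Append one new bucket and rehash old bucket p (not necessarily
        # the bucket that overflowed) with the next-level function.
        self.buckets.append([])
        old, self.buckets[self.p] = self.buckets[self.p], []
        for k in old:
            self.buckets[k % (2 ** (self.i + 1) * N)].append(k)
        self.p += 1
        if self.p == 2 ** self.i * N:         # whole level split: start over
            self.i, self.p = self.i + 1, 0

    def find(self, key):
        return key in self.buckets[self.address(key)]

t = LinearHashTable()
keys = [k * 7 for k in range(40)]
for k in keys:
    t.insert(k)
assert all(t.find(k) for k in keys)
assert not t.find(1)
```

Note how `address` mirrors the begin/end procedure above: the bucket is computed purely arithmetically, and memory is touched only once the correct address is known.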
For the many details, performance, and variants of this method, see
[7.99] by MULLIN, [7.100-103] by LARSON, and [7.104] by RAMAMOHANARAO and
SACKS-DAVIS.
The linear hashing scheme was preceded by the ideas of dynamic hashing and
extendible hashing (Sect. 2.9). Their principles were analogous, namely,
allocation of new buckets upon need and splitting of overflowing buckets.
However, because the systematic linear order, characteristic of linear
hashing, was not applied, the dynamic hashing function had to be defined
using auxiliary index tables or directories. The address structure thereby
became a tree or TRIE (Sect. 2.8). The following works, in addition to the
original ones of LARSON [2.135] and FAGIN et al. [2.136], shall be mentioned:
SCHOLL [7.105-107], REGNIER [7.108], FROST [7.109], FROST and PETERSON [7.110],
RAMAMOHANARAO and LLOYD [7.111], MULLIN [7.112], VEKLEROV [7.113], KAWAGOE
[7.114], FAGIN et al. [7.115], TAMMINEN [7.116-123], YAO [7.124], LLOYD and
RAMAMOHANARAO [7.125], MENDELSON [7.126], FLAJOLET [7.127], BECHTHOLD and
KOPERT [7.128], BRYANT [7.129], and HUANG [7.130].
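For contrast with linear hashing, a compact sketch of the directory idea behind these schemes may be useful. It loosely follows the extendible hashing of FAGIN et al.; the identity hash and the capacity constant are illustrative assumptions:

```python
# Directory-based dynamic (extendible) hashing: the low d bits of
# the hash index a directory of 2**d bucket pointers; splitting an
# overflowing bucket either halves its pointer group or doubles the
# directory.  h(K) = K is assumed for simplicity.

CAPACITY = 2  # bucket capacity (toy value)

class ExtendibleHashTable:
    def __init__(self):
        self.d = 0                    # global depth: directory has 2**d slots
        self.dir = [[[], 0]]          # each entry: [bucket, local depth]

    def _index(self, key):
        return key & ((1 << self.d) - 1)   # low d bits select a slot

    def insert(self, key):
        entry = self.dir[self._index(key)]
        entry[0].append(key)
        while len(entry[0]) > CAPACITY:    # split until overflow resolved
            bucket, l = entry
            if l == self.d:           # no spare bit left: double the directory
                self.dir = self.dir + self.dir   # both halves alias old entries
                self.d += 1
            b0, b1 = [], []           # split the bucket on hash bit l
            for k in bucket:
                (b1 if (k >> l) & 1 else b0).append(k)
            e0, e1 = [b0, l + 1], [b1, l + 1]
            for j in range(len(self.dir)):
                if self.dir[j] is entry:
                    self.dir[j] = e1 if (j >> l) & 1 else e0
            entry = self.dir[self._index(key)]

    def find(self, key):
        return key in self.dir[self._index(key)][0]

u = ExtendibleHashTable()
for k in [10, 21, 33, 46, 58, 7]:
    u.insert(k)
assert all(u.find(k) for k in [10, 21, 33, 46, 58, 7])
assert not u.find(11)
```

The directory is exactly the auxiliary index table mentioned above; doubling it corresponds to descending one level in the implied TRIE, which is the structural cost linear hashing avoids.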
In external hashing, in general, a small internal table is used to
direct accesses to an external storage. This method facilitates addressing
of collections of data structures: LIPTON et al. [7.131], GONNET and LARSON
[7.132], BELL and DEEN [7.133], and LARSON and RAMAKRISHNA [7.134].
In addition to the traditional symbol lists and data base management
applications, it will be interesting to mention the following new applications
of hash-coding methods: high-performance memory management (THAKKAR and
KNOWLES [7.147]), sorting (DUCOIN [7.148]), storing a sparse table (TARJAN
and CHI-CHIH YAO [7.149], FREDMAN et al. [7.150]), storage structures for a
al. [7.191,192]). A hybrid memory cell has been suggested by HOU [7.193]. For
high-performance complementary CAM/RAM circuits, see HARASZTI [7.194].
The following LSI CAM arrays have been reported: 1-Kbit array (NIKAIDO et
al. [7.195]), 1.5-Kbit array (VERNAY et al. [7.196]), and 4-Kbit array
(OGURA et al. [7.197]). An 8-Kbit content-addressable and re-entrant memory
has been published by KADOTA et al. [7.198]. A CAM with 256-byte "superwords"
has been reported by LAMB [7.199].
A re-orderable-content RAM array has been used in CAM arrays by PUCKNELL
and RAYMOND [7.200,201]. Description of a parallel cellular memory has been
given by PEREZ [7.202].
A methodology for the testing of CAMs has been developed by GILES and
HUNTER [7.203].
A CAM chip for virtual memory has been designed by HAMAZAKI [7.210]. The
implementation of a paged-memory management unit is described by COHEN and
McGARITY [7.211]. The realization of the LRU algorithm is discussed by
SCHUBERT [7.212]. Parallel garbage collection can be implemented using an
associative tag (SHIN et al. [7.213]). Memory defects can be corrected by
spare components, using a CAM for control (anon.) [7.214]; similarly,
fault-tolerant MOS RAMs can be implemented (HARASZTI [7.215]). Reference data
structures using CAM have been organized by GEKHT and FROLOV [7.216].
The fifth-generation computers have called for new architectures. Content
addressing for their information management has been discussed by MALLER
and SCARROT [7.217]. Performance enhancement of computation in general has
been discussed by MALMS et al. [7.218].
Logical minimization of multilevel coding functions using optical CAM
has been devised by MIRSALEHI and GAYLORD [7.219]. Truth-table lookup,
especially of residue numbers, and its optical CAM implementation have been
suggested by GUEST and GAYLORD [7.220], GUEST et al. [7.221], MIRSALEHI and
GAYLORD [7.222], GAYLORD et al. [7.223], as well as GAYLORD and MIRSALEHI
[7.224]. Multivalued logic seems to be another useful application area for
the CAM (PAPACHRISTOU [7.225]). The number of storage locations thereby
needed has been estimated by BUTLER [7.226].
Arithmetic operations are implementable by content addressing (NIKITIN
et al. [7.227], PAPACHRISTOU and KAI HWANG [7.228]).
Content-addressable memory systems have further been suggested for the
following special applications: pattern recognition (MALMS [7.229], BHAVSAR
et al. [7.230], BADI'I and MAJD [7.231]), image analysis (SNYDER and SAVAGE
[7.232], SNYDER and COWART [7.233]), designing assemblers (SINHA and SRIMANI
[7.234], BADI'I and JAYAWARDENA [7.235]), evaluation of logic programs
(NAKAMURA [7.236], OLDFIELD [7.237]), for a time-division switch cell
After submission of the manuscript for the first edition of this book, a
special issue on data base machines was published in Computer [7.262]. The
reader is advised to look at this 73-page edition because many of the ideas
of Chapter 6 appear there in slightly updated form.
Hardware systems for text information retrieval can be found in HOLLAAR
and KUEHN [7.263]. An associative backend processor for data base management
is described by HURSON [7.264]. Performance analysis of several alternative
data base machines has been given by HAWTHORN and DeWITT [7.265]. A
multiprocessor system which supports relational database systems has been
presented by DeWITT [7.266]. Memory allocation for multiprocessor systems
with content-addressable memories is discussed in KARTASHEV and KARTASHEV
[7.267].
A very ambitious objective of encoding and retrieving large amounts of
text by a special high-density MOS circuitry has been envisioned by GOSER
et al. [7.268]. They suggest an adaptive, distributed character cell which
is able to tolerate fabrication errors, bound to occur at extremely high
component densities.
7.5.1 General
Finally, the recent work on optical associative memories should be
mentioned, especially on systems storing items in distributed form, as
superimposed memory traces. Many of them aim at the implementation of the
optimal associative mappings described in Sect. 1.4.4 and in [1.1] and [7.278].
There exist in fact two types of these devices. In one of them, the matrix
operations for associative recall are performed by multiplying light inten-
sities using transmission filters, and summing up convergent light beams
locally, cf. PSALTIS and FARHAT [7.279]. The second type is based on holo-
graphy.
3.43 D.C. Alexander, R.H. Dennard, F.L. Post: IBM Advanced Systems, 17.022,
May 1961
3.44 F.H. Young: Oregon State Univ., Dept. Math., In-House Doc. 1962
3.45 P.T. Rux: Oregon State Univ., July 1967 (AD-660 792)
3.46 P.T. Rux: Oregon State Univ., Feb. 1968 (AD-671 910)
3.47 P.T. Rux: IEEE Trans. C-18, 512 (1969)
3.48 P.T. Rux, F.W. Weingarten, F.H. Young: IEEE Comput. Group Repository,
No. 67-72, March 1967
3.49 B. Parhami: Tech. Report UCLA-ENG-7213, Univ. California LA (1972)
3.50 B. Parhami: Proc. AFIPS 1972 FJCC, p. 681 (1972)
3.51 G.L. Hollander: Proc. 1956 JCC, p. 128 (1956)
3.52 D. Warren: IEEE Symp. on Search Memory, May 1964 (IEEE, New York 1964)
3.53 R.I. Roth: U.S. Patent No. 3,257,646, June 21 (1966)
3.54 D.L. Slotnick: Adv. in Comput. 10, 291 (1970)
3.55 L.D. Healy, G.J. Lipovski, K.L. Doty: Proc. AFIPS 1972 FJCC, p. 691
(1972)
3.56 N. Minsky: Proc. AFIPS 1972 FJCC, p. 587 (1972)
3.57 G.B. Houston, R.H. Simonsen: U.S. Patent No. 3,906,455, Sep. 16 (1975)
3.58 Chyuan Shiun Lin, D.C.P. Smith: ACM Trans. Database Syst. 1, 53 (1976)
3.59 M. Flinders, P.L. Gardner, R.J. Llewelyn, J.F. Minshull: Proc. 1970 IEEE
Int. Comput. Group Conf., p. 314
3.60 P.L. Gardner: IEEE Trans. C-20, 764 (1971)
3.61 P.L. Gardner: U.K. Patent No.1 281 387, July 12 (1972)
3.62 T. Kohonen: Digital Circuits and Devices (Prentice-Hall, Englewood
Cliffs, N.J. 1972)
3.63 J.R. Brown, Jr.: Proc. Spec. Tech. Conf. on Nonlinear Magnetics,
Los Angeles, Cal., Nov. 1961
3.64 M.H. Lewin, H.R. Beelitz, J.A. Rajchman: Proc. AFIPS 1963 FJCC 24,
101 (1963)
3.65 G.G. Pick, D.B. Brick: Am. Doc. Inst., 26th Annu. Meeting, p. 245,
Oct. 1963
3.66 G.G. Pick: Proc. AFIPS 1964 FJCC 26, 107 (1964)
3.67 E.L. Younker, C.H. Heckler, D.P. Masher, J.M. Yarborough: Stanford
Res. Inst., Oct. 1964 (AD-609 126)
3.68 E.L. Younker, C.H. Heckler, D.P. Masher, J.M. Yarborough: Proc. SJCC
25, 515 (1964)
3.69 S.T.C.: B.P. 1013241, Dec. 1965
3.70 M.H. Lewin: U.S. Patent No. 3,245,052, Apr. 5 (1966)
3.71 R.A. Henle, I.T. Ho, G.A. Maley, R. Waxman: Proc. FJCC 1969, p. 61 (1969)
3.72 D.C. Wyland: Comput. Des. No.9, p. 61 (1971)
3.73 K.E. Iverson: A Programming Language (Wiley, New York 1962)
3.74 A.D. Falkoff: J. ACM 9,488 (1962)
3.75 G. Estrin: Proc. WJCC, May 1960, p. 33
3.76 J. Ausley: Moore School of Electr. Eng., M. Sc. Thesis, 1961
3.77 M.J. Flynn: Purdue Univ., Ph. D. Thesis, BTP-62-1782, Jun 1961
3.78 R.H. Fuller: Disser. Absts. 24, 1960 (1963)
3.79 R.H. Fuller: UCLA, Dept. of Eng., Rept. No. 63-25 (1963)
3.80 Computer Command and Control: Rept. No. 5-101-5, Jan. 1964
3.81 J.E. McAteer, J.A. Capobianco, R.L. Koppel: Proc. 1964 FJCC, p. 81
(1964)
3.82 A.E. Slade: Am. Doc. Inst. 27th Annu. Meeting (1964)
3.83 S. Sohara: IEEE Symp. on Search Memory (1964)
3.84 A.V. Campi, R.M. Dunn, B.H. Gray: IEEE Trans. AES-1, 168 (1965)
3.85 W.F. Chow: Univac, Quart. Prog. Rept., Oct. 1965 (AD-477 446)
3.86 W.F. Chow: Sperry Rand Corp. Quart. Rept. 1965 (AD-472 571)
3.87 W.F. Chow: Sperry Rand Corp. Quart. Rept. 1966 (AD-804 628)
3.88 R.W. Haas, E.H. Blevis: Marquardt Corp., July 1965 (AD-620 915)
3.89 R.R. Seeber, A.B. Lindqvist: Proc. IFIP Congr. 2, 479 (1965)
4.59 J. Goldberg, M.W. Green: "Large Files for Information Retrieval Base
on Simultaneous Interrogation of All Items", in Large-Capacity Memory
Techniques for Computing Systems, ed. by M.C. Yovits (MacMillan Co.,
New York 1962) p. 63
4.60 K. Goser, H.G. Kadereit: Proc. IEEE 56, 121 (1968)
4.61 C.C. Green, B. Raphael: Stanford Res. Inst. May 1967 (AD-656 789)
4.62 M.W. Green: Suppl. C. Quart. Rept. 2, AF-30(602)-2142, RADC July 1960
4.63 M.W. Green: U.S. Patent No. 3,243,785, March 29 (1966)
4.64 R.S. Green, J. Minker, W.E. Shindle: Auerbach Corp., Management Rept.,
Vol I, Final Rept. July 1966 (AD-489 660)
4.65 R.S. Green, J. Minker, W.E. Shindle: Auerbach Corp., Tech. Disc.
Final Rept., July 1966 Vol II (AD-489 661)
4.66 C.H. Heckler, Jr.: In Multiple Instantaneous Response File, p. 195,
ed. by J. Goldberg, RADC-TR-61-233, 1961 (AD-266 169)
4.67 H.G. Kadereit, K. Goser: Proc. IEEE 56, 121 (1968)
4.68 A.B. Lindqvist: IBM Tech. Discl. Bull. 7,1115 (1965)
4.69 H.T. Mann, J.L. Rogers: Proc. Nat. Aerosp. Electron. Conv. (1962) p. 359
4.70 W.L. McDermid, R.I. Roth: U.S. Patent No. 3,242,468, March 22 (1966)
4.71 B.T. McKeever: Proc. AFIPS 1965 FJCC 28, 371 (1965)
4.72 V.L. Newhouse, R.E. Fruin: Proc. AFIPS 1962 SJCC 21, 89 (1962)
4.73 V.L. Newhouse, R.E. Fruin: Electronics 35, 31 (1962)
4.74 J.P. Pritchard: Texas Instruments F.T. Rept. 100-T66, RADC-TR-66-775
(AD-811 983)
4.75 J.P. Pritchard: Texas Instruments, Final Rept. May 1965 (AD-618 491)
4.76 J.P. Pritchard: IEEE Spectrum 3, 46 (1966)
4.77 J.P. Pritchard: IEEE Comput. Group News 2, 25 (1968)
4.78 J.P. Pritchard, L.D. Wald: Proc. Int. Conf. Nonlinear Magn. (1964)
p. 2-5-1
4.79 J.P. Pritchard, L.D. Wald: IEEE Trans. MAG-1, 68 (1965)
4.80 J.A. Rajchman: ONR Rept. ACR-97, Inform. Syst. Summaries, July 1964
4.81 G. Retiz: IEEE Symp. on Search Memory, May 1964
4.82 J.L. Rogers: TRW Space Tech. Lab., Quart. Rept. Apr. 1963
4.83 J.L. Rogers: TRW Space Tech. Lab., Quart. Rept. Aug. 1963
4.84 J.L. Rogers: ONR Rept. ACR-97, Task No. NR-348-002, RR 003-10-02, 1964
4.85 J.L. Rogers, A. Wolinsky: TRW Space Tech. Labs., Final Rept.
No. NR 3839 (1001), May 1964
4.86 J.L. Rogers, A. Wolinsky: U.S. Gov. Res. Repts. 39, 166 (1964)
4.87 H. Rosenberg: U.S. Patent No. 3,235,839, Feb. 15 (1966)
4.88 G.B. Rosenberger: IBM Data Syst. Div. Final Tech. Rept. 1964 (AD-602 067)
4.89 R.F. Rosin: Proc. AFIPS 1962 SJCC 21, 203 (1962)
4.90 P. Schupp, T. Singer: Mitre Corp., Aug. 1963 (AD-416 301)
4.91 R.R. Seeber, Jr.: Proc. EJCC 18, 179 (1960)
4.92 R.R. Seeber, Jr.: IBM Data Systems, TR-00,756, Nov. 1960
4.93 R.R. Seeber, Jr.: Proc. Nat. Conf. ACM 14 (1960)
4.94 R.R. Seeber, Jr., A.J. Scriver, Jr.: U.S. Patent No. 3,191,155,
June 22 (1965)
4.95 A.E. Slade: Proc. Int. Symp. Theory Switching, (1959) p. 326
4.96 A.E. Slade: Proc. IRE 50, 81 (1962)
4.97 Space Technology Laboratories, Inc.: Rept. Proposal 0739.00, July 1961
4.98 E.D. Van De Rift: In Multiple Instantaneous Response File, p.158, ed. by
J. Goldberg, RADC-TR-61-233, 1961 (AD-266 169)
4.99 I.D. Voitovich: Rept. No. FTD-HT-23-942-68, May 5, 1969 (AD-695 318)
4.100 C. Yang: "A Study of Cryotron Associative Memory in Digital Systems",
M.Sc. Thesis, Northwestern Univ. (1964)
4.101 C.C. Yang, J.T. Tou: J. Franklin Inst. 284, 109 (1967)
4.102 S.S. Yau, C.C. Yang: Proc. Nat. Electronic Conf. 22, 764 (1966)
4.103 S.S. Yau, C.C. Yang: Northwestern Univ. Tech. Rept., Nov. 1966
(AD-644 439)
5.68 P.B. Berra: Proc. COMPSAC 1978 (IEEE, New York 1978) p. 698
5.69 A.J. Symonds: IBM Syst. J. 7, 229 (1968)
5.70 J. Fotheringham: Commun. ACM 4, 435 (1961)
5.71 D. Aspinall, D.J. Kinniment, D.B.G. Edwards: IFIP Edinburgh, Aug. 1968,
p. D81
5.72 H.R. Holt, J.A. Timmons, D.C. Gunderson: Honeywell Inc., Final Tech.
Rept. 12099-FR1, Sept. 1968
5.73 M.H.J. Baylis, D.G. Fletcher, D.J. Howarth: Inform. Proc. 68
(North-Holland, Amsterdam 1969) p. 831
5.74 Y. Chu: Computer Organization and Microprogramming (Prentice-Hall,
Englewood Cliffs, N.J. 1972)
5.75 R. Moulder: Proc. AFIPS Nat. Conf. Comput. Composition and EXpo. 42,
171 (1973)
5.76 W.B. Riley: Electronics 45, 91 (1972)
5.77 J.L. Gertz: Infotech. State-of-the-Art Rept. (Maidenhead, England
1976) p. 273
5.78 K. Koch: IBM Tech. Discl. Bull. 15, 3088 (1973)
5.79 J.R. Carlberg: Taylor Naval Ship Res. and Dev. Center, Rept. No.
DTNSRDC-77-0083, Aug. 1977
5.80 M. Takesue: Inform. Process. Soc. Jpn. 19, 158 (1978)
5.81 C.V.W. Armstrong: Proc. 2nd Annu. Symp. Comput. Archit. (IEEE, New
York 1975) p. 34
5.82 C.E. Shannon: Bell Syst. Tech. J. 28, 59 (1949)
5.83 T.F. Tabloski, F.J. Mowle: IEEE Trans. C-25, 684 (1976)
5.84 W.E. Donath: IBM J. Res. Dev. 18, 401 (1974)
5.85 B.A. Holum: IBM Confidential, SRI Term Paper, No. 11-31, April 1964
5.86 F.T. Baker, W.E. Triest, C.H. Forbes, N. Jacobs, J. Schenken: IBM,
Final Rept. May 1966, AF-30(602)-3573
5.87 D.C. Gunderson, J.P. Francis, W.L. Heimerdinger: Honeywell Inc.,
Rept. No. 12029, Dec. 1966 (RADC TR-66-573)
5.88 D.C. Gunderson, W.L. Heimerdinger, J.P. Francis: "A Multiprocessor
with Associative Control", in Prospects for Simulation and Simulators
of Dynamic Systems (Spartan Books, New York 1967) p. 183
5.89 R. Gonzales, D.C. Gunderson, J.A. Timmons: Honeywell Inc., Final
Rept. Nov. 1967 (AD-662 361)
5.90 R.P. Bair: Moore School of Electr. Eng., May 1968 (AD-674 199)
5.91 L.D. Wald, G.A. Anderson: Final Rept. NAS 12-2087, Sept. 1971
5.92 F. Tsui: IBM Tech. Discl. Bull. 15, 2342 (1972)
5.93 L. Hellerman, G.E. Hoernes: IEEE Trans. C-17, 1144 (1968)
5.94 I.N. Hooton: In Automatic Acquisition and Reduction of Nuclear Data
(Ges. für Kernforschung G.m.b.H., Karlsruhe 1964) p. 338
5.95 I.N. Hooton: In Ref. 5.94, p. 349
5.96 H. Meyer, W. Stuber: In Ref. 5.94, p. 357
5.97 E. Blanca, A. Carriere: CEA-R-3394, Dec. 1967
5.98 M.D. Johnson, D.C. Gunderson: Proc. 1970 Int. Telemetry Conf.,
April 1970, p. 109
5.99 L. Rettelbusch, H. Pfahlbusch: Nachrichtentech. Elektron. 24, 340
(1974)
5.100 T.L. Saxton, C.-C. Huang: IEEE Trans. C-26, 170 (1977)
5.101 R.R. Seeber, Jr.: Commun. ACM 4, 301 (1961)
5.102 E.S. Gershuny, O.L. Lamb: IBM Tech. Discl. Bull. 15, 1109 (1972)
5.103 B.A. Crane: IEEE Trans. C-17, 691 (1968)
5.104 B.H. Scheff: Electron. Prog. 10, 31 (1966)
5.105 C. Peters: NTIS AD-824 213
5.106 S.N. Porter: J. ACM 13, 369 (1966)
5.107 R.C. Minnick: IEEE Trans. EC-13, 685 (1964)
5.108 C.C. Yang, S.S. Yau: IEEE Trans. EC-15, 522 (1966)
5.109 S.S. Yau, M. Orsic: IEEE Trans. C-19, 259 (1970)
5.110 S.S. Yau, C.K. Tang: IEEE Trans. C-19, 141 (1970)
5.111 C. Barre: Electron. Appl. Ind. (France) 250, 21 (1978)
6.1 C.Y. Lee, M.C. Paull: Proc. IEEE 51,924 (1963)
6.2 C. Lee, M. Paull: Proc. IEEE 52, 312 (1964)
6.3 C.Y. Lee: "Content-Addressable and Distributed Logic Memories", in
Applied Automata Theory, ed. by J.T. Tau (Academic Press, New York
1968)
6.4 E.S. Lee: Proc. AFIPS 1963 SJCC, p. 381 (1963)
6.5 R.S. Gaines, C.Y. Lee: IEEE Trans. EC-14, 72 (1965)
6.6 B.A. Crane, J.A. Githens: IEEE Trans. EC-14, 186 (1965)
6.7 G. Nemeth: Helsinki U.Tech., Dept. Tech. Phys. Report TKK-F-A347
(1978)
6.8 B.A. Crane, R.R. Laane: Proc. AFIPS 1967 SJCC, p. 517 (1967)
6.9 R.P. Edwards: Proc. IEEE 52, 83 (1964)
6.10 E.S. Spiegelthal: Proc. IEEE 52, 74 (1964)
6.11 A. Tremblay: Cybernetics XIX, 105 (1976)
6.12 J.N. Sturman: IEEE Trans. C-17, 2 (1968)
6.13 J.N. Sturman: IEEE Trans. C-17, 10 (1968)
6.14 J.E. Smathers: Ph. D. Dissert., Oregon State Univ. (1969)
6.15 W.H. Kautz, K.N. Levitt, A. Waksman: IEEE Trans. C-17, 443 (1968)
6.16 W.H. Kautz: J. ACM 18, 19 (1971)
6.17 W.H. Kautz, M.C. Pease III: AD 763 710 (1971)
6.18 J. Hood, M. Mark, J. Cotton: Proc. 1976 Int. Conf. Parallel Processing,
Aug. 24-27, 1976, p. 168
6.19 R. Trepp: RADC-TR-66-182, June 1966
6.20 C.A. Finnila, H.H. Love, Jr.: IEEE Trans. C-26, 112 (1977)
6.21 S. Ya. Berkovich, Ya.Ya. Kochin, G.M. Lapir: Autom. Remote Control
35, 1342 (1974)
6.22 D.A. Savitt, H.H. Love, R.E. Troop: AD 488 538 (1966)
6.23 D.A. Savitt, H.H. Love, R.E. Troop, R.A. Rutman: Association Storing
Processor Interpretive Program - Program Logic Manual. Final Report,
Hughes Aircraft Co., FR-11-558 (1968)
6.24 H.H. Love, D.A. Savitt: RADC-TR-65-32 (1965)
6.25 H.H. Love, D.A. Savitt: In Associative Information Techniques, ed.
by E.L. Jacks (American Elsevier, New York 1971) p. 147
6.26 H.H. Love, R.A. Rutman: Hughes Aircraft, FR-68-11-1179, Dec. 1968
6.27 H.H. Love: Hughes Aircraft, FR-69-11-487, Jun. 1969
6.28 R.A. Rutman: Hughes Aircraft, FR-69-11-208, Feb. 1969
6.29 J.H. Holland: 1959 EJCC, p. 108
6.30 G.J. Lipovski: Proc. AFIPS 1970 SJCC, p. 385 (1970)
6.31 G.H. Schmitz: Final Rep. Contr. No. DAH 60-72-C0050 (1972)
6.32 Proc. 1972 Sagamore Comp. Conf. (IEEE, New York 1972)
6.33 W.S. Litzler: 1973 Swieeeco Record of Technical Papers, p. 482
6.34 E.C. Stanke II: RADC-TR-77-366 (1978)
6.35 J.A. Githens: "An Associative, Highly-Parallel Computer for Radar
Data Processing", in Parallel Processor Systems, Technologies, and
Applications, ed. by L.C. Hobbs, D.J. Theis, J. Trimble, H. Titus,
I. Highberg (Spartan Books, New York 1970)
6.36 R.O. Berg, M.D. Johnson: Proc. IEEE 1970 Int. Comp. Group Conf.,
Washington, p. 336
6.37 J.A. Githens: Proc. NAECON 1970, p. 290
6.38 J.A. Githens: Proc. IEEE 1972 Int. Comp. Soc. Conf., p. 57
6.39 R.O. Berg, H.G. Schmitz, S.J. Nuspl: Proc. NAECON 1972, p. 312
6.40 J.A. Cornell: Proc. WESCON 1972, p. 1/3-1
6.41 J.A. Cornell: Proc. COMPCON 1972, p. 69
6.42 K.E. Batcher: WESCON Tech. Papers 16, 1 (1972)
6.43 J.A. Rudolph: Proc. AFIPS 1972 FJCC, p. 229 (1972)
6.83 Proc. 1976 Int. Conf. on Parallel Processing, Aug. 24-27, 1976
(IEEE, New York 1976)
6.84 Proc. 1977 Int. Conf. on Parallel Processing, Aug. 23-26, 1977
(IEEE, New York 1977)
6.85 Control Eng. 9, 22 (1962)
6.86 R.H. Fuller: General Precision-Librascope Inc., Interim Rept.,
AD-608 427, October 1964
6.87 R.H. Fuller: Comput. Des. 6, 43 (1967)
6.88 R.H. Fuller: Proc. AFIPS 1967 SJCC, p. 471
6.89 R.H. Fuller: General Precision, ONR/RADC Seminar on Assoc. Proc. 1967
6.90 R.H. Fuller, R.M. Bird, J.N. Medick: "Associative Processor Study",
Librascope Div. General Precision, Oct. 1964
6.91 R.H. Fuller, R.M. Bird, R.M. Worthy: RADC-TR-65 210, AD-621 516,
August 1965
6.92 Westinghouse Defense and Space Center, Final Rept. June 1964,
AD-602 693
6.93 J.A. Feldman: M.I.T. Lincoln Lab. Tech. Note 1965-13, April 1965,
AD-614 634
6.94 General Precision Inc.: "Associative Processing Techniques"
(Librascope Group, 1965)
6.95 D.L. Reich: "Associative Memories and Information Retrieval",
in Some Problems in Information Science, ed. by M. Kochen
(Scarecrow Press, New York 1965)
6.96 J.A. Dugan, R.S. Green, J. Minker, W.E. Shindle: Proc. ACM 21st
Nat. Conf. 1966, p. 347
6.97 K.E. Knight: Datamation 12, 40 (1966)
6.98 M.A. Knapp: "RADC Programs in Associative Processing", ONR/RADC
Seminar on Assoc. Proc., May 1967
6.99 H.I. Jauvits: Interim Rept., Lab. For Electronics Inc., FFB, 1968,
NASA-CR-86076
6.100 J.A. Rudolph: Proc. IEEE Region 6 Conf., Apr. 1969, p. 179
6.101 M.H. Cannell, A.J. Nickelson, M.F. Owens, K.W. Wadman, M.L. Urban:
Mitre Corp., Repts. Nos. MTR-1735-Rev-1, MTR-863, AD-879 281
Dec. 1970
6.102 L.C. Hobbs, D.J. Theis: "Survey of Parallel Processor Approaches and
Techniques", Symp. on Parallel Proc. Systems Technologies and
Applications, Monterey 1969 (Papers ed. by L.C. Hobbs et al. 1970)
6.103 J.C. Murtha: NAECON '70 Records, May 1970, p. 298
6.104 W.C. Meilander, R.G. Gall: "Evaluation of the Goodyear associative
processor in an operational ATC environment", IEEE Comp. Soc. Conf.,
Boston, Mass., Sep. 1971
6.105 M. Minsky, S. Papert: "On Some Associative, Parallel and Analog
Computations", in Associative Information Techniques, ed. by E.J.
Jacks (American Elsevier, New York 1971)
6.106 K.J. Thurber, R.O. Berg: Comput. Des. 10, 103 (1971)
6.107 B. Parhami: "Design Techniques for Associative Memories and Processors",
UCLA, Comput. Sci., Rept. No. PB-220 714 (1973)
6.108 K.J. Thurber, P.C. Patton: IEEE Trans. C-22, 1140 (1973)
6.109 R.M. Lea: Computer 8, 25 (1975)
6.110 L.C. Higbie: Comput. Electr. Eng. 2, 397 (1975)
6.111 L.C. Higbie: Comput. Des. 15, 75 (1976)
6.112 B.W. Prentice, R. Katz, R. Komadja, H. Lee: Boeing Comput. Services Inc.,
Seattle, Wash. Jan. 1975, RADC-TR-74-326, AD-A005 308
6.113 K.J. Thurber, L.D. Wald: Comput. Surv. 7, 215 (1975)
6.114 D. Lewin: "Introduction to Associative Processors", Proc. Conf. Comput.
Archit., St. Raphael, France, 12-24 Sept. 1976, ed. by G.G. Boylaye,
D.W. Lewin
6.115 M.W. Summers: Rome Air Devel. Cent. Rept. RADC-TR-75-318, Jan. 1976,
AD-A021 232
6.116 Proc. IEEE 1977 Int. Conf. on Parallel Processing. Aug. 23-26, 1977
(IEEE, New York 1977)
6.117 Infotech Int.: Future Systems. State of the Art Rept. (Maidenhead,
England, 1977)
6.118 S.S. Yau, H.S. Fung: Comput. Surv. 9, 3 (1977)
6.119 N.J. Zimmerman, H.J. Sips: Informatie (Netherlands) 20, 3 (1978)
6.120 D.L. Slotnick, W.C. Borck, R.C. McReynolds: Proc. AFIPS 1962 FJCC 22,
97 (1962)
6.121 Westinghouse Defense and Space Center: "Parallel Network Computer
(SOLOMON) Applications Analyses", August 1964, AD-606 578
6.122 F.W. Weingarten, P.T. Rux, J.A. Boles: "On an Associative Memory for
Nebula Computer", Dept. of Math., Oregon State Univ., In-House Doc.,
1964
6.123 J.A. Boles: "The Logical Design of the Nebula Computer"; Ph. D. Thesis,
Oregon State Univ. (1968) AD-673 990
6.124 IBM: "Project Lightning", AD-250 678 (1960)
6.125 IBM: "Project Lightning", U.S. Gov. Res. Repts. 36, 124(A) (1961)
6.126 S.H. Unger: Proc. IRE 46, 1744 (1958)
6.127 J.H. Holland: Proc. WJCC, 259 (1960)
6.128 W.T. Comfort: IBM Report No. 62-825-496 (1962)
6.129 E.A. Feigenbaum, H.A. Simon: Proc. IFIP Congr. 1962, p. 177
6.130 P.M. Davies: Proc. 1963 IEEE Pacific Comput. Conf. (1963) p. 109
6.131 P. Davies: "Associative Processors", IEEE Symp. on Search Memory,
May 1964
6.132 P.M. Davies: U.S. Patent No. 3,320,594, May 16, 1967
6.133 E.V. Evreinov, Y.G. Kosarev: Kibernetika 4, 3 (1963)
6.134 R.G. Ewing, P.M. Davies: Proc. FJCC 25, 147 (1964)
6.135 B. Hasbrouck, N.S. Prywes, D. Lefkovitz, N. Kornfield: Comput. Command
and Control Co., April 1965, AD-466 313
6.136 R.G. Gall: "Hybrid Associative Computer Study", Vol. I, AD-489 929
(Goodyear Aerospace Corp., 1966)
6.137 R.G. Gall: "Hybrid Associative Computer Study", Vol. II, AD-489 930
(Goodyear Aerospace Corp., 1966)
6.138 R.G. Gall, D.E. Brotherton: "Associative List Selector", AD-802 993
(Goodyear Aerospace Corp., 1966)
6.139 D.L. Rohrbacher: "Advanced Computer Organization Study", AD-631 870
and AD-631 387 (April 1966)
6.140 J.L. Cass: "Organization and Applications of Associative File
Processors", ONR/RADC Seminar on Associative Processing, May 1967
6.141 T. Feng: "An Associative Processor"; Ph. D. Dissertation, Univ. of
Michigan (1967)
6.142 T. Feng: "An Associative Processor", Tech. Rept., Systems Engineering
Lab., Univ. of Michigan, Dec. 1967
6.143 T. Feng: "An Associative Processor", Michigan Univ. Rept.
No. 06920-17-T, AD-682 353 (Jan. 1969)
6.144 T. Feng: Proc. Nat. Electron. Conf. XXIV, 257 (1968)
6.145 Auerbach Publ. Inc.: TECH Note 1374-TR-500-1 (AD-679 227) (1968)
6.146 W.A. Lea: NASA-TM-X1544, March 1968
6.147 R.M. Lea: Radio and Electron. Eng. 46, 487 (1976)
6.148 R.M. Lea: Comput. J. 21, 45 (1978)
6.149 MIT Lincoln Lab.: Rept. No. ESD-TR-6890 (1968)
6.150 H.H. Love: Hughes Aircraft Co., Rept. No. FR-69-11-487, AD-855 770
(1969)
6.151 H.H. Love: Proc. Sagamore Comput. Conf. Parallel Process., Aug. 22-24,
1973 (IEEE, New York 1973) p. 103
6.152 P.M. Melliar-Smith: Proc. FJCC 1969, p. 201
6.153 J.E. Shore, F.A. Polkinghorn: NRL Rept. NRL-6961, Nov. 1969,
AD-702 394
6.154 W.S. Tuma: Goodyear Aerospace Corp., Rept. No. GER-14566, AD-862 134
(1969)
6.155 R.R. Kressler: Air Force Report No. AFAL-TR-70-142, Aug. 1970
6.156 R.O. Berg, K.J. Thurber: NAECON '71 Record, p. 206 (1971)
6.157 J.E. Shore, T.L. Collins: Rept. of NRL Progress, p. 15, March 1972
6.158 R.A. Urban: Nat. Electron. Conf. 1972, p. 318
6.159 R.D. Arnold: Colorado Univ. Rept. CU CS 051 74, NSF GH 660, August
1974
6.160 D.L. Baldauf: Mitre Corp., Bedford, Mass., MTR-2879, ESD-TR-74-199
(AD-A003 414), Nov. 1974
6.161 L.A. Gambino: Army Engineer. Topographic Labs., AD-A056 438, Jun. 1978
6.162 G.J. Lipovski: Proc. 5th Annual Symp. Comput. Archit. (IEEE, New York
1978) p. 31
6.163 S.Ya. Berkovich, Yu.Ya. Kochin, G.M. Lapir: Autom. Remote Control 35,
1342 (1974)
6.164 H.K. Resnick: California Univ., Livermore Lawrence Rad. Lab., Computer
Inf. Center, Vol. 3, Publication No. 6 (1975)
6.165 L. Kerschberg, E.A. Ozkarahan, J.E.S. Pacheco: Proc. 2nd Int. Conf.
Software Engineering, San Francisco, Cal., 13-15 Oct., 1976
(IEEE, New York 1976) p. 505
6.166 C.Y. Hicks: ACM Comput. Sci. Conf., 31 Jan.-2 Feb., 1977, Atlanta,
Georgia
6.167 R.R. Seeber, A.B. Lindquist: Proc. AFIPS 1963 FJCC 24, 489 (1963)
6.168 J.S. Squire, S.M. Paleis: Proc. AFIPS 1963 SJCC, 395 (1963)
6.169 R.S. Entner: "The Advanced Avionic Digital Computer", Symp. Parallel
Processor Systems, Tech. & Appl., Monterey, June 1969
6.170 L.J. Koczela, G. Wang: IEEE Electron. Comp., p. 520, June 1969
6.171 G.J. Lipovski: Report R-424, Coordinated Sci. Lab., Univ. of Illinois,
July 1969 (AD-692 195)
6.172 C.C. Foster: Goodyear Aerospace Corp. Doc. GER-11772 (1964)
6.173 M.J. Kroeger: Goodyear Aerospace Corp. Doc. GER-16378, RADC-TR-76-352
(1976)
6.174 Z.H. Glanz: Int. Electr. Electron. Conf. and Expos., 29 Sep.-1 Oct.,
1975, Toronto, Canada
6.175 B. Parhami, A. Avizienis: Symp. on Comput. Archit., Univ. of Florida,
Gainesville, p. 141 (1973)
6.176 K.J. Thurber, P.C. Patton: COMPCON '72, p. 275 (1972)
6.177 G.J. Nutt: Acta Infor. 6, 211 (1976)
6.178 A.P. Kisylia: Illinois Univ. Rept. No. R-390, Aug. 1968, AD-675 310
6.179 R.R. Linde, R. Gaten, T.F. Peng: Proc. AFIPS Nat. Comput. Conf. 42,
187 (1973)
6.180 C.R. DeFiore: Datamation 16, 47 (1970)
6.181 C.R. DeFiore, N.J. Stillman, P.B. Berra: Proc. ACM Nat. Conf.,
Aug. 3-5, 1971, p. 28
6.182 V.L. Arlazarov, S.Ya. Berkovich, A.A. Leman, M.Z. Rosenfeld: Avtom.
Telemekh. 12, 184 (1971)
6.183 G. Salton: Commun. ACM 15, 658 (1972)
6.184 C.R. DeFiore, P.B. Berra: Proc. AFIPS Nat. Comput. Conf. and
Exposition 42, 181 (1973)
6.185 C.R. DeFiore, P.B. Berra: IEEE Trans. C-23, 121 (1974)
6.186 R. Moulder: Proc. Sagamore Comput. Conf. Parallel Process., Sagamore
Lake, N.Y. 1973 (IEEE, New York 1973) p. 161
6.187 E.A. Ozkarahan, S.A. Schuster, K.C. Smith: Proc. AFIPS Nat. Comput.
Conf. Expo. 44, 379 (1975)
6.188 E.A. Ozkarahan, S.A. Schuster, K.C. Sevcik: ACM Trans. Database Syst.
2, 175 (1977)
6.230 P.A. Gilmore: Proc. AFIPS 1971 FJCC, 39, 411 (1971)
6.231 W.F. Beausoleil, R.M. Chittenden, G.H. Ottaway: IBM Tech. Discl.
Bull. 20, 2770 (1977)
6.232 W.C. Liles, J.C. Demmel, I.S. Reed, J.D. Mallett, L.E. Brennan:
Rept. No. TSC-PD-8525-1-Vol.-1, Apr. 1978, AD-A054 357
6.233 W.C. Liles, J.C. Demmel, I.S. Reed, J.D. Mallett, L.E. Brennan:
Rept. No. TSC-PD-8525-1-Vol.-2, Apr. 1978, AD-A054 358
6.234 M.E. Sherry: Amer. Document. Instit. 27th Ann. Meeting, 1964
6.235 M.A. Wesley, S.K. Chang, J.H. Mommens: Proc. AFIPS 1972 FJCC, 461
(1972)
6.236 D.C. Gunderson: WESCON Tech. Papers (Session 9, 1966)
6.237 J.P. Hayes: Univ. of Illinois, Comput. Lab. Rept. 227, June 1967
6.238 J. Previte, E. Tippie: EMI-TM-67-1, Feb. 1967
6.239 V.A. Orlando, P.B. Berra: Proc. AFIPS 1972 FJCC, 859 (1972)
6.240 G.M. Popova, I.V. Prangishvili: Avtom. Telemekh. 1, 171 (1972)
6.241 L.D. Wald, T.R. Armstrong, C.C. Huang, T.L. Saxton: RADC-TR-73-19
Final Tech. Rept., Feb. 1973
6.242 W.T. Cheng, T.Y. Feng: Proc. 1974 Sagamore Comput. Conf. Parallel
Processing, Aug. 20-23, 1974 (IEEE, New York) p. 53
6.243 W. Cheng, T. Feng: AD-A009 873, Syracuse Univ., Dept. of Electr.
and Comput. Eng., March 1975 (RADC-TR-75-65)
6.244 H.O. Welch: Proc. 1977 Int. Conf. Parallel Processing, ed. by J. Baer,
p. 186 (IEEE, New York 1977)
6.245 D.D. Marshall: Proc. 1977 Int. Conf. Parallel Processing, ed. by J.
Baer, p. 199 (IEEE, New York 1977)
6.246 R. Napoli: Elettrotecnica 65, 641 (1978)
6.247 N.V. Findler: Cybernetica (Namur) 10, 229 (1967)
6.248 N.V. Findler, W.R. McKinzie: Proc. Int. Joint Conf. Artificial
Intelligence, May 1969, p. 259
6.249 C.C. Foster: Univ. of Mass., Comput. Sci. Dept., TNCS-00023,
(Dec. 1970)
6.250 J.E. Shore: Rept. of NRL Prog., April 1972, p. 12
6.251 B.F. Meyers: 8th Hawaii Int. Conf. Syst. Sci., 1975, p. 113
6.252 W. Ash, E. Sibley: Univ. of Michigan, Tech. Rept. 5, June 1967
AD-672 206
6.253 W.L. Ash, E.H. Sibley: Proc. ACM 23rd Nat. Conf., 1968, p. 143
6.254 W.L. Ash: Univ. of Michigan Rept. TR-17, May 1969 (AD-689 861)
6.255 E.H. Sibley, R.W. Taylor, D.G. Gordon: Proc. AFIPS 1968 FJCC 33, 545
(1968)
6.256 P.D. Rovner, J.A. Feldman: MIT, Lincoln Lab. (AD-655 810), April 1967
6.257 P.D. Rovner, J.A. Feldman: In Information Processing 68 (North-Holland,
Amsterdam 1969) p. 579
6.258 J.A. Feldman, P.D. Rovner: Stanford Univ. Rept. No. AI-Memo-66,
Aug. 1968 (AD-675 037)
6.259 P.D. Rovner, D.A. Henderson, Jr.: Proc. Int. Joint Conf. Artificial
Intelligence, May 1969, p. 9
6.260 J.A. Feldman, J.R. Low, D.C. Swinehart, R.H. Taylor: Proc. AFIPS
1972 FJCC 41, 1193 (1972)
6.261 J.A. Feldman: Abst. of Tech. Repts. Comput. Sci. Dept. of Univ.
Rochester, TR9, Nov. 1976
7.1 R.M. Cowan, M.L. Griss: In Symbolic and Algebraic Computation, ed.
by E.W. Ng (Springer, Berlin, Heidelberg 1979) p. 266
7.2 R. Devillers, G. Louchard: BIT 19, 302 (1979)
7.3 J. Hemenway, E. Teja: EDN Mag. 24, 108 (1979)
7.4 T. Gunji, E. Goto: J. Inf. Process. 3, 1 (1980)
7.5 E. Goto, M. Sassa, Y. Kanada: J. Inf. Process. 3, 13 (1980)
7.6 E. Goto, M. Terashima: J. Inf. Process. 3, 23 (1980)
7.220 C.C. Guest, T.K. Gaylord: Appl. Opt. 19, 1201 (1980)
7.221 C.C. Guest, M.M. Mirsalehi, T.K. Gaylord: IEEE Trans. C-33, 927
(1984)
7.222 M.M. Mirsalehi, T.K. Gaylord: Appl. Opt. 25, 2277 (1986)
7.223 M.M. Mirsalehi, T.K. Gaylord: IEEE Trans. C-35, 829 (1986)
7.224 T.K. Gaylord, M.M. Mirsalehi, C.C. Guest: Opt. Eng. 24, 48 (1985)
7.225 C.A. Papachristou: In Proc. of 11th Int. Symp. on Multiple-Valued
Logic (IEEE, New York 1981) p. 62
7.226 J.T. Butler: In Proc. of the 13th Int. Symposium on Multiple-Valued
Logic (IEEE, New York 1983) p. 94
7.227 G.A. Nikitin, B.V. Vinnikov, I.L. Kaftannikov: Autom. Control &
Comput. Sci. 18, 23 (1984)
7.228 C.A. Papachristou, Kai Hwang: In Proc. of 7th Symposium on Computer
Arithmetic (IEEE Comput. Soc. Press, Silver Spring, MD 1985) p. 182
7.229 M. Malms: Regelungstechnische Praxis 25, 270 (1983)
7.230 V.C. Bhavsar, T.Y.T. Chan, L. Goldfarb: In 1985 IEEE Computer
Society Workshop on Computer Architecture for Pattern Analysis and
Image Database Management (IEEE Comput. Soc. Press, Washington, DC
1985) p. 126
7.231 F. Badi'i, F. Majd: In 1985 IEEE Computer Society Workshop on Computer
Architecture for Pattern Analysis and Image Database Management
(IEEE Comput. Soc. Press, Washington, DC 1985) p. 183
7.232 W.E. Snyder, C.D. Savage: IEEE Trans. C-31, 963 (1982)
7.233 W. Snyder, A. Cowart: IEEE Trans. PAMI-5, 349 (1983)
7.234 B. Sinha, P.K. Srimani: Inf. Sci. 24, 201 (1981)
7.235 F. Badi'i, J. Jayawardena: In Proc. of 7th Int. Conf. on Pattern
Recognition (IEEE Comput Soc. Press, Silver Spring, MD 1984) p. 659
7.236 K. Nakamura: J. Logic Progr. 1, 285 (1984)
7.237 J.V. Oldfield: IEE Proc. I 133, 123 (1986)
7.238 M. Demange: IBM Tech. Discl. Bull. 26, 267 (1983)
7.239 S. Bozinovski, C. Anderson: In Proc. of MELECON '83, Mediterranean
Electrotechnical Conference, ed. by E.N. Protonotarios, G.I.
Stassinopoulos, P.P. Civalleri (IEEE, New York 1983) p. 13
7.240 Anon.: IBM Tech. Discl. Bull. 27, 7069 (1985)
7.241 H. Yamamoto, T. Furukawa: Trans. Inst. Electron. & Commun. Eng. Jpn.,
J68A, 524 (1985)
7.242 S. Kaczamarek, P. Gofta: In 6th European Conference on
Electrotechnics - EUROCON 84, Computers in Communication and Control
(Peter Peregrinus, London 1984) p. 84
7.243 M.F. Deering: Byte 10, 193 (1985)
7.244 E.J. Schuegraf: In Communicating Information, Proc. of the 44th ASIS
Annual Meeting (Knowledge Ind. Publications, White Plains, NY 1981)
p. 329
7.245 T. Ichikawa, N. Kamibayashi: J. Inst. Electron. & Commun. Eng. Jpn.
64, 609 (1981)
7.246 J. Koller: Elektronik 32, 45 (1983)
7.247 T. Durham: Computing 8 (1983)
7.248 B. Svensson: Dissertation, Dept. of Computer Eng., University of
Lund, Sweden (1983)
7.249 C. Fernstrom: Dissertation, Dept. of Computer Eng., University of
Lund, Sweden (1983)
7.250 I. Kruzela: Dissertation, Dept. of Computer Eng., University of
Lund, Sweden (1983)
7.251 C.C. Foster: Massachusetts Univ. Rep. AD-A123 028/3 (1982)
7.252 C. Weems, S. Levitan, C. Foster: In Proc. IEEE Int. Conf. on Circuits
and Computers ICCC '82 (IEEE, New York 1982) p. 236
7.253 S. Berkovich, J.M. Pullen: In Proc. of the IEEE Int. Conf. on
Computer Design: VLSI in Computers ICCD '84 (IEEE Comput. Soc. Press,
Silver Spring, MD 1984) p. 382
7.254 K.E. Batcher: IEEE Trans. C-31, 377 (1982)
7.255 Anon.: Multiple Instruction Associative Processor (MIAP),
PB80-980220, PB80-925110, PC E02 NTIS
7.256 S.A. Gerasimova, V.M. Zakharchenko: Sov. J. Opt. Technol. 48, 404
(1981)
7.257 S. Kumar, S.N. Maheshwari, P.C.P. Bhatt: In Proc. of the First Int.
Conf. on Supercomputing Systems SCS '85 (IEEE Comput. Soc. Press,
Washington, DC 1985) p. 641
7.258 D. Parkinson, H.M. Liddell: IEEE Trans. C-32, 32 (1983)
7.259 O. Wing: In Proc. IEEE Int. Conf. on Computer Design: VLSI in
Computers ICCD '83 (IEEE Computer Soc. Press, Silver Spring, MD 1983)
p. 247
7.260 L. Wallis: Electronic Design 32, 217 (1984)
7.261 Y. Shimazu, T. Tamati: Trans. Inf. Process. Soc. Jpn. 26, 53 (1985)
7.262 Computer, Vol. 12, No. 3 (1979)
7.263 L.A. Hollaar, J.J. Kuehn: In Proc. of the Sixth Annual International
ACM SIGIR Conference on Research and Development in Information
Retrieval, No. 24, p. 3 (1983)
7.264 A. Hurson: In Proc. of IEEE Computer Society Workshop on Computer
Architecture for Pattern Analysis and Image Database Management
(IEEE, New York 1981) p. 225
7.265 P. Hawthorn, D.J. DeWitt: AD-A104 927/9, Report No. CSTR-383,
Wisconsin Univ.-Madison, Dept. of Computer Sciences (1980)
7.266 D.J. DeWitt: IEEE Trans. C-28, 59 (1979)
7.267 S.P. Kartashev, S.I. Kartashev: IEEE Trans. C-33, 28 (1984)
7.268 K. Goser, C. Foelster, U. Rueckert: Inf. Sci. 34, 61 (1984)
7.269 D. Lawton, S. Levitan, C. Weems, E. Riseman, A. Hanson, M. Callahan:
Proc. SPIE Int. Soc. Opt. Eng. 504, 92 (1984)
7.270 C. Weems, D. Lawton, S. Levitan, E. Riseman, A. Hanson, M. Callahan:
In Proceedings CVPR '85: IEEE Computer Society Conference on
Computer Vision and Pattern Recognition (IEEE Comput. Soc. Press,
Silver Spring, MD 1985) p. 598
7.271 A.M. Veronis: In IEEE SOUTHEASTCON '83 Conference Proc. (IEEE, New
York 1983) p. 119
7.272 C. Weems, D.T. Lawton: Proc. SPIE Int. Soc. Opt. Eng. 435, 121 (1983)
7.273 J.L. Potter: In Proc. of the IEEE Int. Conf. on Computer Design: VLSI
in Computers ICCD '84 (IEEE Comput. Soc. Press, Silver Spring, MD
1984) p. 520
7.274 R.M. Lea: IEE Proc. I 133, 105 (1986)
7.275 M.E. Steenstrup, D.T. Lawton, C. Weems: In Proc. of IEEE Comput. Soc.
Conf. on Computer Vision and Pattern Recognition, ed. by H.J. Siegel,
L. Siegel (IEEE Comput. Soc. Press, Silver Spring, MD 1983) No. 2,
p. 492
7.276 D.R. McGregor: In IEE Colloquium on VLSI Special Purpose Computer
Architectures and Implementations (IEE, London 1985) p. 6
7.277 W.R. Cyre: AD-A082 324/5, Control Data Corp., Minneapolis (1979)
7.278 T. Kohonen: Self-Organization and Associative Memory, Springer Ser.
Inform. Sci., Vol. 8 (Springer, Berlin, Heidelberg 1984)
7.279 D. Psaltis, N. Farhat: Opt. Lett. 10, 98 (1985)
7.280 IEEE Spectrum, Vol. 23, No. 8 (1986) (Special issue)
7.281 H. Mada: Appl. Opt. 24, 2063 (1985)
7.282 D.A. Gregory, H.K. Liu: Appl. Opt. 23, 4560 (1984)
7.283 H.J. Caulfield: Opt. Commun. 55, 80 (1985)
7.284 A.D. Fisher, C.L. Giles: In Proc. of the IEEE 1985 COMPCON Spring
(IEEE Computer Society Press, Silver Spring, MD 1985) p. 342
7.285 B.H. Soffer, G.J. Dunning, Y. Owechko, E. Marom: Opt. Lett. 11, 118
(1986)
7.286 P.J. Becker, H. Bolle, A. Keller, W. Kistner, W.D. Riecke:
FB-DV-79-05, Bundesministerium für Forschung und Technologie,
Bonn-Bad Godesberg, FRG, March (1979)
7.287 S.A. Gerasimova, V.M. Zakharchenko: Sov. J. Opt. Technol. 48, 404
(1981)
7.288 A.A. Verbovetskii: Autom. Remote Control 45, 1382 (1984)
7.289 C. Warde, J. Kottas: Appl. Opt. 25, 940 (1986)
7.290 D.Z. Anderson: Opt. Lett. 11, 56 (1986)
7.291 A. Yariv, S.-K. Kwong: Opt. Lett. 11, 186 (1986)
7.292 Digest of Technical Papers, Optical Society of America 1985 Annual
Meeting, Washington, DC, October 14-18, 1985
7.293 Proc. of the SPIE Special Institute on Optical and Hybrid Computing,
Leesburg, VA, March 24-27, 1986 (in press)
Subject Index