Address correspondence to Roger Ratcliff, Psychology Department,
Northwestern University, Evanston, IL, 60208.
According to Jacoby's two factor theory, performance in recognition memory is determined by the combination of an unconscious familiarity process and a conscious intentional recollection process. But according to the global memory models, performance in recognition is typically determined by only a single process. In this article, we compare the two factor theory, single process theories, and other two process theories against empirical and simulated data, outlining conditions under which data cannot discriminate between the theories and conditions under which a single process is not sufficient.
The hallmark of memory research over the past 25 years has been the development of models: theoretical accounts of how people perform when they are given tasks for which previously learned information is useful or required. The goal has been to account for patterns of experimental data within a concise theoretical framework, a framework that can also lead to predictions that can be tested against new data. With the development of data-driven theories has come the realization that memory processes are not fully open to introspection. Our intuitions about how retrieval from memory operates are certainly incomplete and probably often wrong.
The newest challenge to intuitions and models alike has been the proliferation of demonstrations that memory can affect performance in the absence of awareness (e.g., Jacoby & Witherspoon, 1982; Schacter, Bowers & Booker, 1989; Warrington & Weiskrantz, 1968). Amnesiac patients show effects of prior learning without being able to remember the learning episode itself. Normal subjects show effects of prior learning on tasks which do not ask for or require the use of previously learned information, and their performance on such tasks is influenced differently by some variables than is their performance on direct tests of memory.
As demonstrations of memory without awareness have proliferated, so have ideas about how to understand the phenomenon. A proposal that has garnered much attention is that performance on indirect tests, those tasks for which direct recollection is not required, is mediated by an "implicit" memory system whereas performance on direct tests is mediated by a different system, an "explicit" memory system (Schacter & Tulving, 1991; Squire, 1992). Countering the notion of implicit memory systems are proposals that differences in performance on indirect versus direct tasks come about because of different processes operating in the same memory system (e.g., Hintzman, 1990; Jacoby & Witherspoon, 1982; Nosofsky, 1988). Also opposing the notion of an implicit memory system is Jacoby's idea (Jacoby, 1991; Jacoby & Dallas, 1981; Jacoby, Toth, & Yonelinas, 1993; Jacoby, Woloshyn, & Kelley, 1989; Jacoby, Yonelinas, & Jennings, in press) that performance on the two kinds of tasks is a mixture of exactly two processes, one conscious and the other unconscious, with indirect tasks relying more heavily on unconscious processes than direct tasks.
These different proposals illustrate the differences among researchers' intuitions. Clashes among intuitions have sometimes inspired models of memory. In an effort to understand memory without awareness, Jacoby (1991) has developed the two process idea into a model he has labeled "two factor theory"; one factor is an unconscious automatic process that assesses the familiarity of a probe to memory and the other factor is conscious recollection. Jacoby has applied the theory to a number of both direct and indirect tasks, including cued recall (Jacoby, Toth, & Yonelinas, 1993), fame judgements (Jacoby, Woloshyn, & Kelley, 1989), stem completion (Jacoby, Yonelinas, & Jennings, in press), and perception of briefly flashed stimuli (Jacoby & Kelley, 1992). In each case, the theory leads to the same conclusion, that the familiarity process remains invariant across manipulations that would be expected to affect, and in fact do appear to affect, only the recollection process (Jacoby, Yonelinas, & Jennings, in press).
Two factor theory has also been applied to recognition memory (Jacoby, 1991; Yonelinas, in press). For this task, unlike some other tasks, two factor theory stands in opposition to previously developed models. The global memory models (Gillund & Shiffrin, 1984; Hintzman, 1988; Murdock, 1982; Ratcliff & McKoon, 1988) assume that performance in a recognition memory task typically depends not on two processes, but just one process. The single process is labeled "familiarity." Although Jacoby used the same label for one of the factors of two factor theory, familiarity is defined differently in the two theories. Two factor theory also assumes that the same familiarity process and the same recollection process operate in all memory tasks, whereas global memory models allow the possibility that different processes can operate in different tasks. The aim of the research described in this article was to pit the two factor theory of recognition against one of the global memory models, Gillund and Shiffrin's model SAM (1984). As both an alternative to the new implicit memory proposals and an alternative to the more traditional global memory models, Jacoby's model has the potential to play an important role in understanding retrieval processes, so it is critical to subject the model to the most stringent tests possible. In the sections that follow, we first describe Jacoby's two factor theory and its application to recognition, and then the global memory model SAM (Gillund & Shiffrin, 1984) with which we evaluated two factor theory. We then examine how both models can explain data from experiments that were originally interpreted as support for two factor theory. We also compare the two factor theory to a different two process theory, Atkinson and Juola's familiarity plus search model.
The great interest in the proposal of implicit memory systems has come about because of task dissociations: indirect measures can show influences of prior experiences that are not shown by direct measures, and performance on indirect tasks can be affected by different variables than performance on direct tasks. One factor that has made these dissociations exciting to researchers is the linking of performance on indirect tasks to memory without awareness; that is, indirect tasks have appeared to offer the possibility of investigating and perhaps eventually understanding the unconscious.
However, to investigate unconscious processing, it must be separated cleanly from conscious processing. The problem is that task dissociations cannot necessarily accomplish this. As Jacoby pointed out, research using task dissociations to investigate unconscious processes has relied on the assumption that there is a one-to-one mapping between a task and a process; the methods that have been used require that a task be "factor-pure" for the process it is designed to measure (Jacoby, 1991). But indirect tasks do not necessarily provide pure tests of implicit memory; they can be contaminated by subjects' explicit recall of earlier experiences (cf Richardson-Klavehn & Bjork, 1988; Ratcliff & McKoon, 1994a; 1994b). Moreover, Jacoby argues, performance on direct tests can also be the result of a mixture of conscious and unconscious processes.
In Jacoby's view, unconscious influences in direct tests can take either of two forms. Usually, the effect of unconscious influences is to facilitate performance. In recognition, for example, the unconscious familiarity of a test item can lead to a correct positive response even when conscious attempts to recognize it fail. But unconscious influences can also interfere with performance on direct recollection. In this regard, Jacoby (1991) cites the early Warrington and Weiskrantz (1968) finding that when amnesics were asked to recall or recognize words from a studied list, incorrect responses were often intrusions from earlier lists. Normal subjects' performance can also show decrements from unconscious influences. In a study by Jacoby, Woloshyn, and Kelley (1989), subjects were asked to judge whether or not each name in a list was the name of a famous person. The judgements were preceded by study of some of the nonfamous names. Subjects were told that previously studied names were all nonfamous, so they could use retrieval of names from the studied list to reduce their likelihood of making the error of judging a nonfamous name famous. But when direct recollection was made difficult by adding a concurrent task to be performed during the fame judgements, the familiarity produced by prior study led to an increased probability of judging previously studied nonfamous names to be famous.
If performance on direct tests of memory is influenced by both conscious and unconscious processes, then empirically investigating either or both requires separating their influences on performance. By Jacoby's view, this can never be done through task manipulations because performance always reflects a mixture of conscious and unconscious processes. For instance a concurrent task might limit conscious intentional recollection but not eliminate it completely. Instead, the separation of conscious and unconscious processes must be accomplished theoretically. In Jacoby's two factor theory, it is assumed that there is one conscious intentional recollection process and one unconscious process, and the theory shows how the two combine to produce performance. Simultaneously, the theory provides a method for separating and identifying the contributions of each of the two processes, with the hope of eventually understanding them by examining how they are independently affected by different variables.
The process dissociation method uses a "commonsense approach of measuring intentional control (recollection) as the difference between performance when one is trying to as compared with trying not to engage in some act" of memory retrieval (Jacoby, Toth, & Yonelinas, 1993). In other words, performance on a task for which recollection facilitates the production of some class of responses is compared against performance on a task for which recollection facilitates the suppression of those responses. It is assumed that recollection alone is responsible for the difference between performance in the two cases, so the difference provides a measure of conscious recollection, from which a measure of unconscious familiarity can be calculated through the model explained below.
For recognition, two factor theory can be operationalized in a situation in which subjects study two lists of words (e.g., List 1 and List 2, Yonelinas, in press). In one condition, they are instructed to respond positively to words from List 1, and negatively to words from List 2 and to new words. In a second condition, they are instructed to respond negatively to words from List 1 and to new words and positively to words from List 2. Thus, subjects are to try to respond positively to words from List 1 in the first condition, the "inclusion" condition, and to try not to respond positively to words from List 1 in the second condition, the "exclusion" condition. In the inclusion condition, both the unconscious process, labeled "familiarity", and conscious recollection contribute to the probability of a correct yes response for words from List 1:
P(Include) = P(I) = P(R) + P(F) - P(R)*P(F) = P(R) + P(F)*(1 - P(R))     (1)
where P(R) is the probability of successful recollection and P(F) is the
probability that the familiarity of a test item is higher than the
criterion value that is necessary for a positive response.
Both processes also influence performance in the exclusion condition. A
yes response in the exclusion condition (an incorrect response) is due to
familiarity exceeding the positive response criterion when there is a
failure of the recollection process:
P(Exclude) = P(E) = P(F)*(1 - P(R))     (2)
The probability of recollection is calculated as
the difference between the include and exclude scores (when a subject is
trying to respond positively versus when the subject is trying not to
respond positively):
P(R) = P(I) - P(E)     (3)
Then familiarity is:
P(F) = P(E)/(1 - P(R))     (4)
These equations are the two factor theory as applied to recognition memory. They are based on the assumption that recollection and familiarity have statistically independent influences on memory. That is, whether an item's familiarity is high or low has no bearing on its likelihood of recollection, and vice versa. An item's familiarity is unchanging across tasks; once encoded, an item's familiarity value is the same in any memory retrieval task. Jacoby, Yonelinas, and Jennings (in press) present support for the independence assumption in contrast to some other possible assumptions that might be made about the relation of the two factors (see Curran & Hintzman, in press).
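As a minimal sketch, the four equations can be written as a forward model (Equations 1 and 2) and its inverse (Equations 3 and 4). The function names and the numerical values are hypothetical, chosen only for illustration; they are not data from any experiment discussed here.

```python
def predicted_rates(p_r, p_f):
    """Forward model: yes-rates predicted by two factor theory."""
    p_include = p_r + p_f * (1.0 - p_r)   # Equation 1
    p_exclude = p_f * (1.0 - p_r)         # Equation 2
    return p_include, p_exclude


def process_dissociation(p_include, p_exclude):
    """Inverse: recover (P(R), P(F)) from include and exclude yes-rates."""
    p_r = p_include - p_exclude           # Equation 3
    if p_r >= 1.0:
        raise ValueError("P(F) is undefined when P(R) = 1")
    p_f = p_exclude / (1.0 - p_r)         # Equation 4
    return p_r, p_f


# Hypothetical parameter values (illustration only).
p_i, p_e = predicted_rates(p_r=0.4, p_f=0.5)
print("P(I), P(E):", p_i, p_e)
print("recovered P(R), P(F):", process_dissociation(p_i, p_e))
```

Running the inverse on the forward model's output returns the starting parameters, which is all the round trip is meant to show: the estimates are exact only if the data were in fact generated by Equations 1 and 2.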
In addition to the assumptions embodied in these equations, Jacoby (Jacoby, Toth, & Yonelinas, 1993; Yonelinas, in press) adopted signal detection theory as an account of familiarity. Values of familiarity are assumed to be distributed across a continuum from high to low, and subjects are assumed to respond according to whether the familiarity value of a test item is above or below a criterion amount of familiarity. To the standard signal detection theory assumptions, Jacoby and Yonelinas added the assumption that the distributions of previously studied and nonstudied items have equal variance.
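The equal-variance signal detection assumption can be illustrated with a short sketch. It assumes normally distributed familiarity with a common standard deviation for studied and nonstudied items; the means, criterion, and function names are hypothetical and serve only to show the computation.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def p_yes(mean, criterion, sd=1.0):
    """Probability that a familiarity value exceeds the response criterion."""
    return 1.0 - NormalDist(mean, sd).cdf(criterion)


def d_prime(p_old, p_new):
    """Equal-variance sensitivity: z(P(yes | old)) - z(P(yes | new))."""
    return _STD_NORMAL.inv_cdf(p_old) - _STD_NORMAL.inv_cdf(p_new)


# Hypothetical distributions: studied items centered at 1.0, new items at 0.0.
criterion = 0.5
p_old = p_yes(mean=1.0, criterion=criterion)
p_new = p_yes(mean=0.0, criterion=criterion)
print("P(yes | old), P(yes | new):", p_old, p_new)
print("d':", d_prime(p_old, p_new))
```

Under equal variance, the recovered d' equals the separation between the two means in standard deviation units, regardless of where the criterion is placed.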
In summary, the two factor theory has been proposed as a means of dealing with difficult issues that have plagued research on conscious versus unconscious processes, issues recently brought into sharp focus by findings of dissociations between tasks and effects that intuitively seem to involve conscious recollection and tasks and effects that intuitively seem to involve memory without awareness. Jacoby's theory states that two factors, unconscious familiarity and conscious recollection, influence performance independently on all tasks, and the theory provides a method for investigating their separate influences on performance as a means of eventual development of an understanding of both conscious and unconscious processing.
Support for two factor theory comes in two forms. First, when the process dissociation method is applied, it appears to separate the effects of a number of variables into conscious versus unconscious influences in intuitively predictable ways that are invariant across a range of tasks (see Jacoby, Yonelinas, & Jennings, in press, for an overview). For example, the variable of full versus divided attention during test has a large influence on conscious processing but little influence on unconscious processing, and this is true for stem completion, cued recall, recognition, and fame judgements (Jacoby, Yonelinas, & Jennings, in press). Two factor theory itself does not predict which factor should be influenced by dividing attention (independent notions about how attention interacts with unconscious processes do that), but the theory does predict that if dividing attention affects only recollection in one task, it should affect only recollection in all tasks, and Jacoby's data are consistent with this prediction. The second form of support for two factor theory is that the measure of unconscious processing that is derived from process dissociation is affected in generally the same ways by experimental manipulations as performance on indirect tests of memory. This would be predicted because indirect tests are thought to rely mostly on unconscious processes.
In this article, the main domain of investigation is recognition memory. Jacoby (1991) found support for the two factor theory for recognition by comparing the contributions to performance of recollection and familiarity as they were estimated from process dissociation (by the equations above) to their contributions as measured directly by manipulation of full versus divided attention. According to the theory, dividing attention at test should severely reduce recollection and give a relatively pure measure of familiarity. This amount of familiarity should match the estimate of familiarity derived for a full attention test from the process dissociation equations and, as predicted, the two values, observed and estimated, were nearly equal (Jacoby, 1991).
The claim of two factor theory that performance in recognition memory reflects exactly two processes contradicts the accounts that current global memory models have put forward for performance in most recognition memory experiments (Gillund & Shiffrin, 1984; Hintzman, 1988; Murdock, 1982). In these models, recognition is typically modeled by a single process. Another process, such as a recall process, could be added, but is not required by the data for most applications. A key difference between the global memory models and two factor theory is two factor theory's assumption that the familiarity process is unchanging between include and exclude conditions. In the global memory models, a single familiarity process can allow retrieval to focus on different kinds of information across experimental conditions.
For the purposes of this article, we exemplify the single process models with Gillund and Shiffrin's SAM model (1984). Like all of the global memory models, SAM assumes that learned items are stored in long-term memory and that a test item is matched against all the items in long-term memory in parallel (hence the label "global"). For SAM, long-term memory stores associative strengths between items in memory and items that might be presented as tests of memory (cues). An item to be learned is encoded into a working memory buffer. While in the buffer, the strength of the item as a possible future test item is increased by strengthening the association between the item as a cue and itself in long term memory, strengthening the association between the item as a cue and other items in the buffer at the same time, and strengthening the association between the item and the context in which it is learned. The association between the item as a cue and itself in memory is called self-strength, the strength between the item and the other items in the buffer at the same time is called inter-item strength, and the strength between an item and its context is called context strength. An item that is never encoded into the buffer (a "new" item on a recognition test) is assumed to have some pre-experimental, residual strength. The amount of increase in each strength value is a function of the time spent in the buffer. Also, the amount of the increase in each strength value is variable: the value is multiplied by 0.5 with probability 1/3 or it is multiplied by 1.5 with probability 1/3 or it is left unchanged with probability 1/3. This assumption about variability leads to normally distributed familiarity values by the central limit theorem once many strengths are multiplied and summed (Equation 5 below).
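The variability assumption can be sketched as follows. The three multipliers and their equal probabilities come from the description above; the base increment, the seed, and the function name are hypothetical choices made for illustration.

```python
import random

# Multipliers applied to a strength increment, each with probability 1/3.
MULTIPLIERS = (0.5, 1.0, 1.5)


def noisy_increment(base_increment, rng):
    """Return the base increment scaled by a randomly chosen multiplier."""
    return base_increment * rng.choice(MULTIPLIERS)


rng = random.Random(12345)  # fixed seed so the sketch is repeatable
base = 1.0                  # hypothetical base increment
samples = [noisy_increment(base, rng) for _ in range(30000)]

mean = sum(samples) / len(samples)
variance = sum((x - mean) ** 2 for x in samples) / len(samples)
print("sample mean:", mean)          # expected value is the base increment
print("sample variance:", variance)  # expected value is base**2 / 6
```

Because the three multipliers are symmetric around 1, the variability leaves the mean increment unchanged and contributes a variance of one sixth of the squared base increment.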
Table 1 shows a part of the association structure that might be built
when two lists of words are learned. Item 2 from List 1, for example,
is encoded with some value of strength (S) between itself as a
test item and itself as an item in memory and some other value of strength
between itself and other items that were in the buffer at
the same time (e.g. Items 1 and 3 in List 1). There is also some
value of strength between the list context (C) in which an item was
studied and the item in memory. For the two list situation, it is
assumed that for an item encoded in one list, there is some small
residual context strength between the context of the other list and the
item (R_C), because there is some overlap in general context between the
two lists. There is also some residual strength (R) from an item as a
test to all items in memory which were not encoded in the buffer at the
same time as itself.
INSERT TABLE 1 ABOUT HERE
For recognition, there is a single retrieval process: a test probe is matched against all the items in memory. The probe is made up of a test item and the relevant context(s). The match process produces a global value of familiarity: a value above a criterion leads to a positive response and a value below the criterion leads to a negative response. For the situation in which two lists of words were studied, a test probe is made up of the test item and the contexts of the two lists. The contributions to familiarity are weighted to allow the relative contributions of the two context strengths and the item strength to vary across different experimental situations. If, for example, items from List 1 were to receive a positive response and items from List 2 a negative response, then the context strength for List 1 would be weighted more heavily than the context strength for List 2. The three weights are assumed to sum to one. Familiarity for a test item j with contexts C1 and C2 (for List 1 and List 2) is computed by summing over all items in memory (all i):
F(j; C1, C2) = sum over i of S(C1, i)^w1 * S(C2, i)^w2 * S(j, i)^w3     (5)
The equation for familiarity represents a single process for
recognition, a process by which responses are determined by the sum
over items in memory of joint multiplicative functions of the strengths
of association between the test context and items in memory and the
test item and items in memory. Summing over values obtained from the
multiplicative function leads, by the central limit theorem, to
normally distributed familiarity values. The choice of a positive
versus negative response is made by comparing the familiarity value for
a test item in context to a criterion.
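A small sketch of the familiarity computation may help. It assumes that the three weights enter as exponents on the strengths, following the usual SAM formulation; that functional form, the function name, and every strength value below are assumptions made for illustration, not parameters from the fits reported here.

```python
def familiarity(ctx1, ctx2, item, weights):
    """Sum over memory items i of ctx1[i]**w1 * ctx2[i]**w2 * item[i]**w3.

    ctx1[i], ctx2[i]: strength from each list context to memory item i.
    item[i]: strength from the test item to memory item i.
    """
    w1, w2, w3 = weights
    if abs(w1 + w2 + w3 - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")
    return sum(c1 ** w1 * c2 ** w2 * s ** w3
               for c1, c2, s in zip(ctx1, ctx2, item))


# Hypothetical memory: items 0 and 1 from List 1, items 2 and 3 from List 2.
ctx1 = [1.0, 1.0, 0.2, 0.2]  # List 1 context: full strength, then residual
ctx2 = [0.2, 0.2, 1.0, 1.0]  # List 2 context: residual, then full strength
# Test item is List 1 item 0: self-strength, inter-item strength, residuals.
item = [2.0, 0.5, 0.1, 0.1]

f_include = familiarity(ctx1, ctx2, item, (0.6, 0.1, 0.3))  # List 1 weighted up
f_exclude = familiarity(ctx1, ctx2, item, (0.1, 0.6, 0.3))  # List 2 weighted up
print("include:", f_include, "exclude:", f_exclude)
```

With these made-up strengths, the same List 1 item yields a higher familiarity value when the List 1 context is weighted heavily than when the List 2 context is weighted heavily, even though nothing about the stored strengths has changed.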
The single process stands in clear
opposition to Jacoby's additive two factor model. An essential point
about the difference between the two models is that the single
familiarity process in SAM is influenced by list context, so that the
value of familiarity for a test item in an include condition differs
from its value in an exclude condition.
The Gillund and Shiffrin model, like Hintzman's and Murdock's models,
is supported by a wide range of data over a number of experimental
variables and tasks. For recognition, SAM successfully accounts for
the effects of variables such as study time, list length, encoding
context, and word frequency. With the same memory structure but
different processing assumptions, it has been applied to recall and cued
recall (and see also a related categorization model, Nosofsky, 1988).
Other global memory models have been similarly successful with various
independent variables in tasks assessing frequency judgment, recency
judgment, categorization, serial order, and so on. The global memory models
have also been successfully applied to priming phenomena in recognition
and lexical decision (Dosher & Rosedale, 1989; McKoon & Ratcliff, 1992;
Ratcliff & McKoon, 1988; 1994c). In most cases, the models provide
not only qualitative accounts of data but also close quantitative fits
to parametric data.
While two factor theory
can marshal a number of intuitively compelling and interesting
dissociations and decompositions of data in its support, the global memory
model likewise can marshal a range of successful applications to data and
interpretations of empirical findings.
Jacoby's use of inclusion versus exclusion conditions in recognition
memory experiments has yielded new data against which the global memory
models have not been tested. We first addressed the conflict between
the two factor theory and single process models by examining whether
the single process model SAM can account for data from three
inclusion/exclusion experiments. We then turned to a second, critical
issue: how to evaluate the process dissociation method if two factor
theory and a single process theory can equally well account for
inclusion/exclusion data.
In Yonelinas' Experiment 1, subjects were given two lists of words to
study. There were two conditions: At test, subjects were either
instructed to respond yes to words from the first list and no to
words from the second list (and no to new words) or they were
instructed to respond no to words from the first list (and no to
new words) and yes to words from the second list. The first list was
"included" in the first condition and "excluded" in the second
condition, and the second list was "excluded" in the first condition
and "included" in the second.
Yonelinas asked subjects to make their responses on a
six-point confidence judgment scale, but for our purposes, we used the
positive-negative split to produce two response categories, grouping
high, medium, and low confidence positive responses into one category
and high, medium, and low confidence negative responses into the other
category. Yonelinas used short lists (10 items each) and long
lists (30 items each).
Yonelinas' data are shown in the first two rows of Table 2. The
probability of a positive response to items when they were in the
include condition is shown in column one, the probability of a positive
response to items when they were in the exclude condition is shown in
column two, and the probability of a positive response to items that
were not from either list is shown in column three. Using the
equations given above for process dissociation (Equations 1 to 4), the
probabilities of recollection and familiarity can be calculated and
these are shown in the remaining columns. These probabilities exhibit
the kind of dissociation that has been claimed to support the
two factor theory: list length affects recollection but not
familiarity. The support for the theory comes only from the existence
of a dissociation of the list length effect for familiarity versus
recollection. The theory does not specify why list length should
affect recollection and why it should not affect familiarity, so the
actual form of the dissociation provides no particular support for the
theory.
Yonelinas and Jacoby (1994) reported a second experiment in which list
length was manipulated. Subjects were given one list of words to study,
with the words on the list alternating in presentation modality between
visual and auditory. At test, they were instructed to respond
positively to words that had been presented in one of the modalities and
negatively to words that had been presented in the other modality and to
new words. The length of the studied list was either 60 words or 30
words. Results of the experiment are shown in the first two rows of
Table 3, along with the probabilities of recollection and familiarity
as calculated from the two factor theory. The results again show a
dissociation between recollection and familiarity, with list length
affecting recollection but not familiarity.
INSERT TABLES 2 AND 3 ABOUT HERE
To apply the SAM model to these data, the size of the encoding buffer
was assumed to be four words (the same assumption as was made by
Gillund & Shiffrin for study lists in which single words were presented
individually). From SAM's assumptions about how strengths are built up
during encoding and the equation for the calculation of the global
familiarity of a test probe, explicit expressions can be derived for
the mean and variance of each of the necessary distributions of
familiarity values: the mean and variance for included test items, the
mean and variance for excluded test items, and the mean and variance
for new items. Because the distributions are approximately normal,
standard signal detection theory can be used to compute the
probabilities of positive responses in the different conditions. A
least-squares minimization routine was used to estimate the values of
the parameters of the model that best fit the data.
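The fitting logic can be sketched in miniature. The sketch below assumes normal familiarity distributions with known means and standard deviations and fits only the criterion by a crude grid search over squared error; the actual fits estimated several SAM parameters with a least-squares minimization routine, so this is a simplified stand-in. All names and values are hypothetical.

```python
from statistics import NormalDist


def predicted_yes(dists, criterion):
    """P(yes) for each condition, given (mean, sd) of its familiarity."""
    return {name: 1.0 - NormalDist(mu, sd).cdf(criterion)
            for name, (mu, sd) in dists.items()}


def sse(observed, predicted):
    """Sum of squared differences between observed and predicted yes-rates."""
    return sum((observed[k] - predicted[k]) ** 2 for k in observed)


def fit_criterion(dists, observed, lo, hi, steps=2001):
    """Grid search for the criterion that minimizes squared error."""
    best_c, best_err = None, float("inf")
    for n in range(steps):
        c = lo + (hi - lo) * n / (steps - 1)
        err = sse(observed, predicted_yes(dists, c))
        if err < best_err:
            best_c, best_err = c, err
    return best_c, best_err


# Hypothetical (mean, sd) pairs for included, excluded, and new items.
dists = {"include": (1.8, 1.0), "exclude": (0.9, 1.0), "new": (0.0, 0.8)}
# "Observed" rates generated from a known criterion, to check recovery.
observed = predicted_yes(dists, criterion=1.2)
best_c, best_err = fit_criterion(dists, observed, lo=0.0, hi=3.0)
print("fitted criterion:", best_c, "squared error:", best_err)
```

The search recovers the generating criterion because the synthetic rates contain no noise; with real data the minimum error would be greater than zero.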
Tables 2 and 3 show that SAM fits the include/exclude data well. The
differences between the real and estimated data are within the bounds
of experimental error. The parameter values used to produce the
estimated data are listed in the tables. The parameter that varies to fit
the include versus exclude conditions is the weight assigned to each list
context's contribution to familiarity; the weight given to the strength for a
list context is high if the items from the list are to be "included",
and the weight is low if items from the list are to be "excluded."
Unlike the two factor theory, the familiarity value of a test item is
not constant across these two conditions; instead, it is a function of
the include versus exclude task requirements as represented in the
model by a change in context weighting. In SAM, the single retrieval
process focuses on different information in the include versus exclude
conditions, in contrast to two factor theory which uses different
contributions from two processes in the include versus exclude
conditions.
The presence of both included and excluded items in the same test list
gives less freedom to SAM in fitting the data than would be the case
for other include/exclude paradigms. The only parameter free to vary
to account for list length is the familiarity criterion (see Gillund &
Shiffrin, 1984). Because there are more studied items in memory
contributing to familiarity for a test item from a long list than a
test item from a short list, familiarity is higher on average for items from a
long list and familiarity is more variable for items from a long list.
This is true for both old test items and new test items. It is true for
new test items because they are matched against a larger number of
studied items from a long list than from a short list (see Gillund &
Shiffrin, 1984). Because of the higher and more variable familiarity
values for both old and new test items, the criterion familiarity value
is higher for long than short lists.
The relative values of the other parameters (the learning parameters)
are typical of other SAM fits (Gillund & Shiffrin, 1984): self-strength
is higher than inter-item strength which in turn is higher than
residual strength, and context strength is higher for the list in which
an item was learned than residual context strength for the other list.
To provide generality to other experimental variables, SAM was also fit
to the data from Yonelinas' Experiment 3 which used a strength
manipulation (as opposed to the length manipulation in the prior two
experiments). In this experiment, there were again two lists of words.
The words were studied in pairs, either for 1 s or for 3 s, with study
time a within-list variable. At test, subjects were instructed as in
Experiment 1, either to "exclude" the first list or to "exclude" the
second list. The data are shown in Table 4, along with the
probabilities of recollection and familiarity derived by the process
dissociation method.
SAM was fit to these data in the same way as for Yonelinas' Experiment
1, except that the encoding buffer was assumed to hold two words (i.e.
one pair) at a time instead of four words. The estimated data in Table
4 show that again SAM fits the data well. The include versus exclude
difference comes from shifting weight from one list context to the
other, as with the other experiments. The difference between strongly
and weakly encoded items (long and short study time) comes about
because the encoding strength values (self, inter-item, and context) are
multiplied by study time and by a scaling factor of 0.41. This scaling
factor allows the model to match the empirically observed rate at which d'
increases with study time (see Shiffrin, Ratcliff, & Clark, 1990).
INSERT TABLE 4 ABOUT HERE
It should be pointed out that, for Yonelinas' Experiments 1 and 3 and
Yonelinas and Jacoby's Experiment 1, it is relatively easy for SAM to
fit the data because there are relatively few constraints on the
model. A more rigorous test of SAM would require fits to a number of
different study/test conditions simultaneously. However, our purpose
here is the simple demonstration that a single process model can
equally well account for some of the data that have been used to
support the two factor theory. SAM predicts the qualitative
differences in performance for long versus short lists, for long versus
short study times, and for list discrimination instructions, and it can
fit the effects quantitatively. Two factor theory predicts that its
two processes will sometimes dissociate, but not whether (or how) they
will do so for list length or study time.
The implication of SAM's success in accounting for Yonelinas and
Jacoby's data is that the process dissociation method could be applied
equally well to predicted data from SAM as to real data from subjects.
Even though the predicted data points were generated from a single
process, the process dissociation method would still provide estimates
of the contributions of two processes. The method gives no way to tell
whether data were produced by two processes or a single process.
When the process dissociation method was applied to the data points
generated from SAM, the resulting estimates of familiarity and
recollection were almost the same as when the method was applied to the
real data (see Tables 2, 3, and 4). The
estimates of familiarity and recollection given by the method for the
SAM-generated data have to be almost identical to those given for the
real data because SAM's fits to the data were so close. But the
estimates and their interpretation are valid only under two factor
theory, not under SAM. According to two factor theory, the results of
Yonelinas' Experiment 1 and Yonelinas and Jacoby's Experiment 1
show that list length
affects recollection, a conscious process, but not familiarity, an
unconscious process. In SAM, the results are interpreted to show that
list length does affect SAM's familiarity process. What is learned
about conscious versus unconscious processes in recognition depends on
accepting two factor theory. For Yonelinas' Experiment 3,
two factor theory and SAM both interpret the results to show that
strength of encoding affects familiarity, but two factor theory also
has strength affecting recollection. Again, what is learned about
retrieval depends on the choice of model.
SAM can account for the study time effects and list length effects in
Yonelinas and Jacoby's experiments, and Gillund and Shiffrin (1984)
have shown that the model can account for the effects of these
and other variables simultaneously. The focussing mechanism
of weighted contexts allows the model to explain list discrimination
data (e.g., Anderson & Bower, 1972) and the same mechanism allows
it to explain the data from the include versus exclude conditions. But
the global match process by which SAM calculates familiarity would
never be expected to apply to all memory retrieval situations. Free
recall, for example, is modeled with a different process, a repeated
sampling and recovery process (Gillund & Shiffrin, 1984; Raaijmakers &
Shiffrin, 1981). There should also be situations in which more than
one process is required to explain performance. A candidate for such a
situation is provided by another include/exclude experiment by Jacoby
(Experiment 3, Jacoby, 1991).
In Jacoby's experiment, subjects heard the words of one list. In
another list, they read some of the words and solved anagrams to
produce others for themselves. In the include
condition, subjects were asked to respond positively to all of the
studied words. In the exclude condition, they were asked to respond
positively only to the words that they had heard; they were warned to
respond negatively both to words that were studied in their normal form
(the "read" words) and to words that were presented as anagrams.
Table 5 shows Jacoby's data. The difference between the probabilities of
responding positively in the include and exclude conditions is
much larger for the anagram words than the read words, in accord with
the expectation that the extra work required for the anagrams at study
would lead to better memory at test. The table also shows estimates
of familiarity and recollection derived from process dissociation.
Jacoby assumed that the difference between the include and exclude
conditions was a measure of the probability of recollecting the anagram
and read words (Equation 3). He also assumed that familiarity
sometimes led subjects to respond positively to read and anagram words
when they were supposed to be excluded, so that familiarity could be
calculated from Equation 4. The probability of recollecting an anagram
word, as derived from process dissociation, is much higher than the
probability of recollecting a read word, and familiarity is also higher
for an anagram than a read word.
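As a minimal sketch of the calculation just described, the following code computes the two estimates from a pair of include and exclude probabilities. It assumes that Equations 3 and 4 take the standard process dissociation form (recollection as the include minus exclude difference, and familiarity as the exclude probability divided by one minus recollection). The function name is hypothetical. The input values are the anagram probabilities reported in this article, P(I)=0.80 and P(E)=0.29.

```python
def process_dissociation(p_include, p_exclude):
    """Return (recollection, familiarity) estimates.

    Assumed form of the process dissociation equations:
        R = P(I) - P(E)
        F = P(E) / (1 - R)
    """
    recollection = p_include - p_exclude
    if recollection >= 1.0:
        raise ValueError("familiarity is undefined when recollection is 1")
    familiarity = p_exclude / (1.0 - recollection)
    return recollection, familiarity


# Anagram test words: P(I) = 0.80, P(E) = 0.29
r, f = process_dissociation(0.80, 0.29)
print(f"recollection = {r:.2f}, familiarity = {f:.2f}")
```

The same function applies without change to observed proportions or to proportions predicted by a model, which is the point at issue in this article.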
INSERT TABLE 5 ABOUT HERE
The question we addressed was whether SAM could account for the
data from all the conditions (include versus exclude for read, heard,
and anagram test words). We suspected that it could not, based on the
intuition that there might be some recall contributing to performance
for anagram test words. To address the question, we first found
parameters that allowed SAM to fit the data for the words that were
read and the words that were heard. We started with the read and heard
words because we thought that, if more than SAM's single process were
needed to fit all the data, it would most likely be needed for the
anagram test words.
To model the read and heard test words in the include
and exclude conditions, the encoding parameters were kept constant
except that self-strength was allowed to be different for the two kinds
of words (it might also be reasonable to allow interitem strength to
vary but it was unnecessary because SAM's fit was exact without this;
also, decoding anagrams probably suppressed interitem rehearsal). The
criterion value of familiarity for dividing positive from negative test
responses was different for the include and exclude
conditions because the include and exclude items were presented in
different test lists. The context weights were set to differentiate
the lists: items from the read and anagram list required positive
responses in the include condition and negative responses in the
exclude condition, so the weighting of the read/anagram list context
had to be high in the include condition and lower in the exclude
condition. The weighting of the heard context had, correspondingly, to
vary in the opposite way, higher when read/anagram words were excluded
than when they were included. The parameter values are shown in Table
5, and they are reasonable compared with fits of SAM to other data. The
predicted probabilities of positive responses for read, heard, and new
words exactly match the data.
It is important to note, once again, the difference between SAM's account
of the include versus exclude conditions and two factor theory's account.
In SAM, the familiarity of a test word in the include condition
is different than in the exclude condition
(F
Given that SAM can accommodate the data for the read and the heard test
words, the question was whether it could simultaneously accommodate the
data for the anagram test words. In Jacoby's experiment, all four
kinds of test words were mixed within a test list, so there was no way
for subjects to change the positive/negative criterion from one kind of
test word to another. Also, the weights for the list contexts could
not be different for the read words than the anagram words because they
were studied in the same list. Thus, all the test parameters were
fixed. In addition, the interitem strength parameter could not
vary between read versus anagram words, again because they were studied
in the same list. The only parameter free to vary between the anagram
and read words was the self-strength parameter.
To find out whether there was a value of anagram self-strength that
could fit the data, it was varied over the range shown in Figure 1.
The figure shows how the probability of a positive response to an
anagram test item varies as a function of anagram self-strength. The
figure displays the probability that the familiarity value is above the
criterion for a positive response in the include condition
(P(F
One possibility is that anagram test words evoke a recall process
(Raaijmakers & Shiffrin, 1981) in addition to the familiarity process.
The recall
process in SAM is defined differently than the recollection process in
two factor theory; SAM's recall was defined by Raaijmakers and Shiffrin in a
specific and detailed way that allowed the SAM model to account for a
number of aspects of recall data. For the anagram test words, the recall
process could be evoked either
in both the include and exclude conditions or only in the exclude
condition. It might be reasonable to assume recall for both include
and exclude because solving anagrams would make them very strongly
encoded. On the other hand, using recall only in the exclude condition
might make sense because it is only in the exclude condition that words
from one list are to be distinguished from words in the other list; in
the include condition, all highly familiar test words from either list
are to be given a positive response. We examined the consequences of
adding recall to the exclude condition alone and to both conditions,
and the results are presented in the following section.
The goal was to evaluate what can be learned about retrieval processing
from SAM versus what can be learned from two factor theory. The
question is not whether SAM with its two processes can accommodate the
data, because that will be guaranteed by the flexibility gained from
adding a second process. Rather the question is whether the
conclusions that can be drawn about retrieval are the same for the two
models.
One of the ways SAM could accommodate the data for the anagram test
words in addition to the read and heard test words is to assume that
recall is used for the anagram test words in both the include and
exclude conditions equally. We can use the difference between SAM's best
predictions based on familiarity alone and the empirical data to
provide an estimate of what the recall process must contribute to
performance.
With these assumptions, the probability of a positive response to an
anagram test word in the include condition is given by:
P(I) = P(R) + P(F_I > C)[1 - P(R)]
In the exclude condition, the probability of a positive response to an
anagram test word is given by:
P(E) = P(F_E > C)[1 - P(R)]
Here P(F_I > C) and P(F_E > C) are the probabilities that familiarity
exceeds the criterion in the include and exclude conditions.
In essence, recall adds to the probability of a positive response in
the include condition and subtracts from the probability of a positive
response in the exclude condition, in order to make up the difference
between the data and the probabilities of positive responses based on
familiarity alone.
Solving for P(R),
P(R) = (P(F_E > C) - P(E)) / P(F_E > C)
P(E)/(1 - P(I)) = P(F_E > C)/(1 - P(F_I > C))
With the assumption of recall contributing to performance equally for
anagram test words in both the include and exclude conditions,
SAM estimates the probability of recall at .37. This account contrasts
with the two factor theory account. For SAM, familiarity provides the
only basis for responses for read and heard test words, and recall adds
to familiarity for anagram test words. For two factor theory,
responses for all test words are based on both familiarity and
recollection.
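The estimate of recall can be illustrated with a short sketch. It assumes, as one reading of the account above, that recall succeeds with probability P(R) in both conditions, that successful recall produces a positive response in the include condition and a negative response in the exclude condition, and that responses are otherwise based on familiarity alone. All function and variable names are hypothetical. The inputs are the values reported in this article (P(I)=0.80, P(E)=0.29, P(R)=0.37).

```python
def predict_with_recall(pf_include, pf_exclude, p_recall):
    """Predicted P(I) and P(E) when recall operates in both conditions.

    pf_include, pf_exclude: probability that familiarity exceeds the
    criterion in the include and exclude conditions (familiarity alone).
    """
    p_i = p_recall + pf_include * (1.0 - p_recall)
    p_e = pf_exclude * (1.0 - p_recall)
    return p_i, p_e


def implied_familiarity(p_include, p_exclude, p_recall):
    """Invert the assumed equations: familiarity-only probabilities
    implied by observed P(I), P(E) and a given recall probability."""
    pf_include = (p_include - p_recall) / (1.0 - p_recall)
    pf_exclude = p_exclude / (1.0 - p_recall)
    return pf_include, pf_exclude


pf_i, pf_e = implied_familiarity(0.80, 0.29, 0.37)
print(f"implied P(F_I > C) = {pf_i:.3f}, implied P(F_E > C) = {pf_e:.3f}")

# The ratio constraint does not involve P(R):
# P(E) / (1 - P(I)) equals P(F_E > C) / (1 - P(F_I > C)).
print(0.29 / (1 - 0.80), pf_e / (1 - pf_i))
```

Because the ratio constraint is free of P(R), it picks out the value of anagram self-strength at which a single recall probability can reconcile both conditions.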
It turned out that the probability of extra information contributing to
performance for anagram test words estimated by SAM (P(R)=0.37) was
about the same as the probability of extra recollection for anagram
test words over read test words in two factor theory. In two factor
theory, the difference in the probability of recollection for anagram
versus read test words was 0.40 (see Table 5). However, the two
accounts will not always be consistent in this way. Figure 1 shows
that the amount of extra information needed from a recall process will
vary as a function of the familiarity values in the include versus
exclude conditions. At low values of anagram self-strength, a
considerable amount of information must be added from recall to
increase the probability of a yes response in the include condition and
decrease the probability of a yes response in the exclude condition.
But at high values of self-strength, the amount that must be added by
recall is less (because the familiarity scores for the include and
exclude conditions diverge). So it is not necessarily the case that SAM, under the
assumptions outlined above, would estimate the same contribution from
recall to differentiate anagram from read test words as two factor
theory.
INSERT FIGURES 1 AND 2 ABOUT HERE
We used simulations to examine the generality of this lack of
equivalence across a range of possible empirical results. The
probabilities of positive responses for read test words in the include
and exclude conditions were fixed at the values obtained in Jacoby's
experiment. The probabilities of positive responses for anagram test
words in the include and exclude conditions were systematically varied
(from their real values of P(I)=0.80 and P(E)=0.29) to simulate
different levels of recall. From these probabilities, we used process
dissociation to calculate the difference in recollection for read
versus anagram test words, and for SAM, we calculated the contribution
from recall (i.e., what is not accounted for by familiarity) for anagram test
words. For process dissociation, the
difference is simply the estimate of recollection for anagrams minus
the estimate of recollection for read words
(P(R
Because all highly familiar test words should be given a positive
response in the include condition, it might be reasonable for subjects
to adopt a strategy of attempting recall only for the exclude
condition. In the exclude condition, the instruction is to respond
positively only for words that were heard. Jacoby (1991) assumed that
subjects do not rely entirely on recollection to do this, that they
still respond positively to highly familiar words when recollection
fails (see Equation 1; see also discussion by Curran & Hintzman,
in press). We followed that assumption for the exclude condition here.
The probability of a positive response for an anagram test word in the
include condition is simply
P(I) = P(F_I > C)
and the probability of a positive response in the exclude condition is
P(E) = P(F_E > C)[1 - P(R)]
The two ways that we have discussed of adding a recall process to SAM's
account of Jacoby's include/exclude data do not, of course, exhaust all
possibilities. For example, there might be recall in both the include
and exclude conditions for anagram test words but the probability of
its success might be different in the two conditions instead of the
same as we assumed above. There might be a recall process operating
for the heard and read test words, instead of just the anagram test
words, especially in the exclude condition. Our goal was not to
provide a definitive model for Jacoby's data (the data do not provide
enough constraints to do that for the SAM model; more comprehensive sets
of data would be required for complete
model testing). Our concern was to show
that there exist plausible explanations of the data that are different
from two factor theory's explanation, and that what is learned about
retrieval processes is different under the different theories.
This general conclusion is the same as was reached for the Yonelinas
and Jacoby experiments for which SAM's single familiarity process was a
sufficient account of the data. According to two factor theory,
performance in recognition memory always involves exactly two
processes, whereas in SAM performance can often be explained as the
outcome of just one process. The variables that affect SAM's single
familiarity process are different than those that affect two factor
theory's familiarity process. When a recall process is added to SAM,
it is understood quite differently from recollection in two factor
theory. Recollection is assumed to participate in all retrieval
processes whereas the recall process in SAM can be added in different
ways to model different test conditions. As before, the picture given
by two factor theory of conscious versus unconscious retrieval is not
the same picture that would be given of retrieval processing by SAM.
The often compelling intuition that the retrieval of information
from memory involves two processes, even for recognition tasks, is not
new (Atkinson & Juola, 1973; Jacoby & Dallas, 1981; Mandler, 1980).
For example, Mandler (1980) postulated a familiarity process and a
recollection process, and proposed that familiarity was a fast
retrieval process running in parallel with the slower recollection
retrieval process. Mandler's model was applied to explain a range of
recognition and recall data and the hypotheses about the time course of
the two processes have also been tested (e.g., Mandler & Boeck, 1974).
The model would apply to data from Jacoby's include versus exclude
manipulation in the same way as two factor
theory because it makes the same assumption about the independence
of the two processes as two factor theory.
A different two process model was developed by Atkinson and Juola
(1973) for recognition. In their model, there are two processes of
retrieval but the two processes are not always both
executed. If the familiarity of a test item is above some criterion
value or below some second criterion value, then a response is made
directly. The second process, a "search" process, is initiated only if
the value of familiarity falls between the two criteria. This model
was successfully applied across a range of reaction time data
(Atkinson, Herrmann, & Westcourt, 1974; Atkinson & Juola, 1973).
Both the Mandler and the Atkinson and Juola models have been explicitly
tested and it has been argued that data do not in general support their
assumptions about two processes (e.g., Gillund & Shiffrin, 1984;
Monsell, 1978). The issue of concern here is whether the process
dissociation method is compatible with the general assumption of two
retrieval processes in recognition or limited to the two factor
theory. More specifically, the question is whether different two
process models that fit the data equally well will produce the same or
different estimates of the contributions of the two processes, when the
models are applied to data from include/exclude recognition
experiments. To address this question, we followed the same logic as
with comparisons of SAM and the two factor theory: We fit the Atkinson
and Juola model to include/exclude data and then compared the parameter
estimates from the Atkinson and Juola model to the parameter estimates
from process dissociation.
The data chosen to model were those from Yonelinas' Experiment 1, shown
in Table 6. The first step in the analysis of the Atkinson and Juola
model was to find parameters of the model that would fit Yonelinas'
data. The model was originally intended to deal with retrieval of
words from lists that were so highly memorized that the search process
would always give perfect performance. That was not the case in
Yonelinas' experiment, and so we assumed some lesser degree of
learning. We assumed that study led to a higher degree of learning for
words from short lists than long lists, so that the distributions of
values of familiarity used by the familiarity retrieval process were
ordered, with the mean of the new word distribution set at zero, the
mean of the distribution for words from long lists above zero, and the
mean of the distribution for words from short lists farther above
zero. These distributions were assumed to be normal, each with
associated variance.
At test, the familiarity retrieval process determines whether the
familiarity of a test word is above the positive criterion or below the
negative criterion, and if it is, then a response is executed. If
familiarity is between the two criteria, then in the original model the
result of the search process determined the response (always accurate).
In our application, we assumed the search process would not always
succeed. We added two parameters: p, the probability that the search
process successfully finds a word in a studied list, and q, the
probability of a positive response if the search fails (see Atkinson,
Herrmann, & Westcourt, 1974, p. 113, footnote 5).
In the include condition, a positive response can come about if
the familiarity of a test item is above the positive criterion, or if it
is between the two criteria and the search process is successful or there
is a positive guess:
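P(I) = P(F > C_p) + P(C_n < F < C_p)[p + (1 - p)q]    (10)
where C_p and C_n denote the positive and negative criteria.
In the exclude condition, a positive response can come about if the
familiarity of a test item is above the positive criterion, or if it
is between the two criteria and the search process fails and there is a
positive guess:
P(E) = P(F > C_p) + P(C_n < F < C_p)(1 - p)q    (11)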
The number of parameters is greater than the number of data points, and
the model can easily fit the data (the model was designed to deal
with reaction time data in addition to the accuracy data considered here).
In fitting the model, we discovered that the guessing process could trade
off against the familiarity process so that the same level of
performance could be obtained from a few positive responses due to high
familiarity and a high positive guessing rate, or many positive responses
due to high familiarity and a lower guessing rate. To illustrate this,
we fit the model to the data twice. Table 6 shows the
results of the two different fits, and Table 7 shows the values of the
parameters that were used to produce the fits.
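The modified Atkinson and Juola model described above can be sketched in a few lines. This is an illustration under stated assumptions, not the fitting procedure used for Tables 6 and 7. Familiarity is taken to be normally distributed, the names `c_low` and `c_high` stand for the negative and positive criteria, and all parameter values in the example are hypothetical.

```python
from statistics import NormalDist


def atkinson_juola(mean, sd, c_low, c_high, p, q):
    """Return (P(I), P(E), P(S)) for one class of test items.

    mean, sd : parameters of the normal familiarity distribution
    c_low    : negative criterion (respond "no" directly below it)
    c_high   : positive criterion (respond "yes" directly above it)
    p        : probability that the search finds the word in a studied list
    q        : probability of a positive guess when the search fails
    """
    fam = NormalDist(mean, sd)
    p_above = 1.0 - fam.cdf(c_high)                  # fast positive response
    p_between = fam.cdf(c_high) - fam.cdf(c_low)     # search is executed
    p_include = p_above + p_between * (p + (1.0 - p) * q)
    p_exclude = p_above + p_between * (1.0 - p) * q
    p_search = p * p_between                         # yes based on search
    return p_include, p_exclude, p_search


# Hypothetical parameter values, chosen only for illustration.
p_i, p_e, p_s = atkinson_juola(mean=1.0, sd=1.0, c_low=0.0, c_high=1.5,
                               p=0.6, q=0.3)
print(f"P(I) = {p_i:.3f}, P(E) = {p_e:.3f}, P(I) - P(E) = {p_i - p_e:.3f}, "
      f"P(S) = {p_s:.3f}")
```

In this sketch the include minus exclude difference always equals the contribution of the search process, whatever values are chosen for the criteria and the guessing rate. The include and exclude probabilities therefore fix the search contribution, but not how the remaining positive responses divide between familiarity and guessing.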
INSERT TABLES 6 AND 7 ABOUT HERE
The first conclusion is that a different two process theory can fit the
include/exclude data as well as Jacoby's two factor theory can. The
second issue is how the process dissociation method fares in light of
this mimicking. Process dissociation produces the estimates of the
contributions of familiarity and recollection shown in Table 6 for the
Jacoby model. Are these also accurate estimates of
familiarity and the search process for the Atkinson and Juola model?
From Equations 10 and 11 above, we can derive
estimates of familiarity and
search directly from the Atkinson and Juola model and compare them to
the estimates derived via process dissociation. The probability of a
yes response based on the search process is the probability of
executing a search (familiarity falling between the negative and
positive criteria, C_n and C_p) multiplied by the probability of the
search being successful:
P(S) = p P(C_n < F < C_p) = P(I) - P(E).
For both fits of the model to the data, the estimate of the
contribution of the search process for the Atkinson and Juola model is
the same as the estimate of the contribution of the recollection
process for two factor theory. But the estimates of familiarity
derived from the two models are different. In Jacoby's model,
the unconscious familiarity process is not affected by list length. In the
Atkinson and Juola model, familiarity is affected by list length,
in either direction: in the first fit, familiarity is greater
for a long list than a short list and in the other fit, it is less.
This occurs because of the trading off mentioned above between the
different components of the model,
search, familiarity, and guessing.
In the two fits we present, familiarity and guessing trade off against each
other. This is not a positive aspect of the fits of the model to data,
but it is to be expected when a limited range of data is modeled relative to
the range of data for which the model was designed.
The conclusion offered by these simulations is that the include/exclude
data do not support estimates of two components that are the same for
all two process models. This is similar to the situation when two
factor theory was compared to SAM. The picture given of conscious
versus unconscious processing is different when it is drawn from
Atkinson and Juola's model than when it is drawn from two factor
theory.
In this section, we investigate the predictions of two factor theory
for the shapes of z-ROC curves for recognition memory and test those
predictions against the data from an include/exclude experiment. It
has become clear from recent research that the global memory models are
inconsistent with the slopes of the z-ROC curves obtained in
recognition memory experiments. If it turned out that two factor
theory was consistent with the slopes, then this would constitute major
support for that theory.
The problem with z-ROC curves for the global memory models arises
because of the models' assumptions about the relative variability in
familiarity values for old versus new test items. Empirically, the ratio of
the standard deviation of new item familiarity values to the standard
deviation of old item familiarity values can be obtained from signal
detection theory using confidence judgment data. This is done by
plotting the z-transforms of the hit and false alarm rates against each
other for each level of confidence to produce a z-ROC curve. For the
global memory models, the underlying distributions of familiarity
values are normal (either directly or by the central limit theorem
applied to sums of discrete values), so the slope
of the z-ROC is the ratio of the new item standard deviation to the old
item standard deviation,
Yonelinas (in press; Jacoby, Toth, & Yonelinas, 1993) proposed an
explanation of the z-ROC slopes in terms of two factor theory. In
two factor theory, the distributions of familiarity values for old and
new test items are assumed to be normal and to have equal variance, so
familiarity alone would lead to a z-ROC slope of one. This assumption
of equal variance is usually justified as derived from "standard signal
detection theory." However, standard signal detection theory was applied
to experimental tasks where the noise (in auditory perception say) was
the same in signal and noise conditions and a constant signal was added
to noise to produce the signal condition. In most of the work, the
signal and noise distributions were allowed to have different standard
deviations and the equal variance case was seen as a special case.
There is no special reason to assume equal variances other than the
appeal to standard theory, and allowing the ratio of variances to be
derived from the fits of the model to data would not lead to a test of
Yonelinas' proposal.
According to Yonelinas' account, the slope of the z-ROC is less than 1
because of the recollection process. When subjects make high
confidence positive responses, some of them are based on familiarity
and some on recollection. The addition of the recollection based
responses in the high confidence category causes an increased standard
deviation for old items, which in turn makes the slope of the z-ROC
less than one. With this assumption about recollection contributing to
high confidence positive responses, Yonelinas attempted to show that
two factor theory was consistent with z-ROC functions observed in his
experiments.
Yonelinas' proposal provides a test of an inherent prediction of two
factor theory. Previous support for the theory has been the intuitive
reasonableness of the theory's accounts of patterns of dissociations
and patterns of the relative contributions of recollection and
familiarity to processing. For example, while it might be reasonable
that list length affects recollection but not familiarity as in
the experiments discussed above, two factor theory itself does not make that
prediction. The theory alone would be equally consistent with the
opposite outcome. In contrast, if Yonelinas' proposal about z-ROC
curves fails, then the signal detection assumptions of two factor
theory fail. In the sections that follow, we present the results of
several different evaluations of Yonelinas' proposal, and show that the
proposal is not consistent with empirical z-ROC curves.
For our first analysis, we calculated what the shape of z-ROC curves
should be according to two factor theory. We assumed that the theory
was correct, that recognition performance in confidence judgement tasks
is based on the two processes, familiarity and recollection, and then
examined the forms of predicted hypothetical z-ROC curves.
We began with data from an experiment by Ratcliff et al. (1994;
Experiment 4). In that experiment, subjects studied lists of words
that were either high or low frequency, encoded either strongly or
weakly (i.e., studied for a long time or a short time). At test,
subjects were instructed to respond positively to any word that had
been studied, using a six-point confidence scale. This corresponds to an
"include" condition in that subjects are instructed to respond positively
to all studied words. From the z-transforms
of the hit and false alarm rates at each confidence level,
z-ROC curves were produced (as described at the beginning of the
appendix to this article). The experiment did not use an exclude
condition, so we could not calculate an empirical measure of
recollection by using the process dissociation method. But performance
in an include condition must, by two factor theory, depend on both
recollection and familiarity. We examined a range of possible values of
recollection, looking for a value that would make two factor theory
consistent with the empirically obtained z-ROC functions.
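For readers unfamiliar with the construction of a z-ROC curve from confidence ratings, the following sketch shows one common way to do it. It is not the procedure given in the appendix. Response counts are accumulated from the most confident "old" category downward, each cumulative hit and false alarm rate is z-transformed, and a least-squares slope is computed. The function names and the counts in the example are hypothetical.

```python
from statistics import NormalDist


def zroc_points(old_counts, new_counts):
    """Return a list of (z(false alarm rate), z(hit rate)) points.

    Counts are ordered from the most confident "old" response to the most
    confident "new" response. The last category is dropped because its
    cumulative rate is 1, which has no finite z-transform.
    """
    z = NormalDist().inv_cdf
    n_old, n_new = sum(old_counts), sum(new_counts)
    cum_old = cum_new = 0.0
    points = []
    for o, n in zip(old_counts[:-1], new_counts[:-1]):
        cum_old += o
        cum_new += n
        points.append((z(cum_new / n_new), z(cum_old / n_old)))
    return points


def zroc_slope(points):
    """Least-squares slope of z(hit rate) on z(false alarm rate)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical six-point confidence counts for old and new test words.
old = [60, 30, 20, 15, 15, 10]
new = [10, 15, 20, 25, 35, 45]
print(f"z-ROC slope = {zroc_slope(zroc_points(old, new)):.2f}")
```

If the underlying old and new distributions are normal, the slope estimates the ratio of the new item standard deviation to the old item standard deviation.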
The methods by which we examined two factor theory's predictions for
z-ROC functions are described in detail below. Figure 3 shows the
results for one experimental condition (weakly encoded high frequency
words). The z-ROC curve obtained directly from the data is shown by
the diamonds, and it has the slope less than one that is characteristic
of recognition memory. The other z-ROC curves are predictions from two
factor theory, each based on a different hypothetical value of
recollection (the probability of a yes response based on
recollection, P(R), varied from 0 to 0.45). Not all of these values
for recollection are actually possible in two factor theory. What we
show in the following analyses is that in general there are no values
for recollection that are consistent simultaneously with two factor
theory and the empirical z-ROC functions.
INSERT FIGURE 3 ABOUT HERE
To generate the curves in Figure 3 and to test two factor theory
requires a multistep algorithm that is given in detail in the
appendix. The algorithm begins with data from confidence judgements
with "include" instructions, that is, subjects are instructed to
respond positively to all studied items. The algorithm first uses
confidence judgement data to obtain a d' value for the familiarity
process (see Appendix, steps 1-3). This is done by collapsing over
the positive half of the confidence categories to get one hit rate and
one false alarm rate. From this hit rate, this false alarm rate, a
hypothetical value of P(R), the process dissociation equations, and the
two factor theory's assumption that familiarity distributions are
normal with equal variance, a d' can be calculated for the familiarity
process alone, separate from the hypothetical recollection process.
Once d' is obtained, then it can be used with the empirical false alarm
rates for the different confidence judgement categories to obtain a
familiarity based hit rate for each confidence category. (Because of
the assumptions of normality and equal variance for old and new item
distributions of familiarity, the familiarity based hit rates and the
empirical false alarm rates must give a z-ROC slope of one.)
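The steps described in this section can be sketched as follows. This is only an illustration; the appendix gives the algorithm actually used. The sketch assumes that, under include instructions, the hit rate is H = P(R) + [1 - P(R)]H_f, where H_f is the familiarity based hit rate, and that recollection is added back in the same way at every confidence category. The function names and the rates in the example are hypothetical.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal, for z-transforms


def familiarity_dprime(hit, fa, p_r):
    """d' of the familiarity process after removing recollection.

    hit, fa : hit and false alarm rates collapsed over the positive
              half of the confidence scale
    p_r     : hypothetical probability of recollection
    """
    fam_hit = (hit - p_r) / (1.0 - p_r)
    if not 0.0 < fam_hit < 1.0:
        raise ValueError("P(R) is not possible for this hit rate")
    return _STD.inv_cdf(fam_hit) - _STD.inv_cdf(fa)


def predicted_zroc(cum_fa, d_prime, p_r):
    """Predicted (z(FA), z(hit)) points for familiarity plus recollection.

    cum_fa : empirical cumulative false alarm rate at each confidence level
    """
    points = []
    for fa in cum_fa:
        z_fa = _STD.inv_cdf(fa)
        fam_hit = _STD.cdf(z_fa + d_prime)      # equal variance: slope of one
        hit = p_r + (1.0 - p_r) * fam_hit       # add recollection back in
        points.append((z_fa, _STD.inv_cdf(hit)))
    return points


# Hypothetical rates, for illustration only.
cum_fa = [0.03, 0.08, 0.20, 0.40, 0.65]
for p_r in (0.0, 0.2, 0.4):
    d = familiarity_dprime(hit=0.70, fa=0.20, p_r=p_r)
    if d <= 0:
        continue  # reject values of P(R) that leave no familiarity effect
    curve = predicted_zroc(cum_fa, d, p_r)
    print(p_r, [(round(x, 2), round(y, 2)) for x, y in curve])
```

With P(R) set to zero, the predicted points fall on a line with a slope of one. With larger values of P(R), the predicted hit rates are raised most, in z units, at the strictest criteria, which bends the predicted curve.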
The algorithm gives the familiarity based hit rate for each confidence
category and the data give the false alarm rate for each category. To
generate the predicted z-ROC curve for familiarity plus recollection,
the hypothetical value of P(R) can be added back in at each confidence
category to give predicted hit rates for familiarity and recollection
combined (see Appendix, step 4). These predicted hit rates and the
empirical false alarm rates are then used to give the predicted z-ROC
curve. The z-ROC curves for the 10 hypothetical values of P(R) shown
in Figure 3 were generated in this way. Not all of the 10 values of
P(R) are actually possible; values of 0.35 and above (very high values
of recollection) give d' values for the familiarity based process that
are not greater than zero. None of the remaining predicted z-ROC
curves match the shape of the real z-ROC curve from the data. That
curve overlaps the curve for the third lowest value of recollection at
lower z
The method just described for generating predicted z-ROC curves uses
the same hypothetical value of P(R) for all confidence categories to
predict the hit rates for recollection and familiarity combined.
Another way of generating predicted z-ROC curves is to use the
empirical hit rates to predict what the values of P(R) should be for
each confidence category (see Appendix, step 5). The hit rates and the
false alarm rates from the data and the familiarity based hit rates
from the algorithm are used to predict what the probability of
recollection should be at each confidence interval (P(R)
Rejecting these values and those for which d' for the familiarity based
process was not greater than zero (.35 and above), leaves only the
hypothetical P(R) values of .2 to .3. For these values, the bend of
the ROC curve (the U shape) is quite large, large enough to be
empirically detectable (see Figure 3). Examination of empirical z-ROC
functions (including many from single subjects tested over many
sessions) in Ratcliff, McKoon, & Tindall (1994) shows only a small
fraction of the total cases for which the z-ROC functions have this
shape, so data do not in general support this two factor theory
prediction.
The values of P(R) can be submitted to a further constraint. The
predicted probabilities of recollection (P(R)
Yonelinas (in press) also obtained P(R)
The preceding analyses were based on data from an experiment in which
there were only include conditions. Without an exclude condition,
there is no way to estimate recollection directly from the data. All
the analyses were based on hypothetical values of recollection. To
pursue the analyses, we collected include/exclude data using the list
discrimination procedure that Yonelinas (in press) used in his
experiments. Subjects studied two lists of words, and then they were
cued as to whether the words of the first list or the words of the
second list were to be given positive responses.
To provide the strongest test of two factor theory, we chose
experimental conditions for which the slope of the z-ROC curve would be
farthest from predictions from two factor theory for the familiarity
process alone, that is, a slope as much less than one as possible.
This is a strong test because the recollection process has to be
assumed to move the slope far from one. We also picked conditions in
which recollection seemed most unlikely to be able to do this. If
recollection could move the slope far from one, under conditions where
recollection was intuitively unlikely, then two factor theory would
have passed a strong test. The conditions we used were low frequency
words studied at a fast presentation rate. These conditions give low
z-ROC slopes and a low probability of recall, which suggests a low
probability of recollection (see Glanzer & Adams, 1990; Glanzer et al.,
1993; Ratcliff et al., 1994).
Subjects. The subjects were 8 undergraduates from Northwestern
University paid to participate in the experiment.
Each subject participated in one 50 min
session.
Materials. The pool of 865 low frequency words used by Ratcliff et
al. (1994) was used for this experiment. For the experimental study
and test lists, only words from this pool were used. There was also a
pool of high frequency words used only for practice lists.
Procedure and Design. All stimuli were presented on the screen of a
PC, and keys of the PC's keyboard were used to record responses.
Each block of the experiment consisted of two lists of words to be
studied followed by a single test list. There were 16 words in each
study list, presented at a rate of 750 ms per word. The beginning of
each list was signaled to the subjects by the instruction to press the
space bar on the keyboard. At the end of the second list, subjects
were told which list's words were to receive a positive response.
They were also instructed to flip an
index card to show which was the positive list; the card was used to
make sure subjects noted the instruction and to serve as a reminder if
they needed one during the list. There were 48 words in the test list,
16 from the first studied list, 16 from the second, and 16 new words
that had not appeared on either studied list. The words of the test
list were presented one at a time, each remaining on the PC screen
until a response key was pressed. There was a 250 ms blank screen
following each response and then the next test word was presented.
Subjects were instructed to respond on an eight point confidence scale,
with responses ranging from extremely sure negative to extremely sure
positive. For the positive end of the scale, the m, comma, period,
and ?/ keys of the keyboard were used. For the negative end, the
keys were z, x, c, and v. Labels for the response keys were shown
on the index card that subjects flipped to show which list required
positive responses. Subjects were instructed to try to use the full
range of response keys.
There were 20 blocks in an experimental session, the first two used
only for practice. For half of the blocks, a positive response was
required for the first study list and for the other half of the
blocks, a positive response was required for the second study list.
Words for the study lists, new words for the test lists, the orders
of presentation of words in the study and test lists, and the order of
the two kinds of blocks were all decided randomly, with the randomization
changed after every second subject.
For test words that were from the study list designated for positive
responses (the include condition) the proportion of positive responses
was 0.553. For test words from the other study list (the exclude
condition) the proportion of positive responses was 0.418. For new test
words, the proportion of positive responses was 0.147, which leads to d'
values (based on the equal variance assumption) of 1.18 for the include
condition and 0.84 for the exclude condition. There were over 2000
responses in each of the include, exclude, and new test word conditions,
giving good stability to the data.
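As a check on the arithmetic, the equal variance d' values can be recomputed from the three reported proportions. The sketch below uses only the Python standard library; the function name is hypothetical.

```python
from statistics import NormalDist


def d_prime(hit, fa):
    """Equal variance d': z(hit rate) minus z(false alarm rate)."""
    z = NormalDist().inv_cdf
    return z(hit) - z(fa)


d_include = d_prime(0.553, 0.147)  # include condition vs. new words
d_exclude = d_prime(0.418, 0.147)  # exclude condition vs. new words
print(round(d_include, 2), round(d_exclude, 2))  # about 1.18 and 0.84
```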
Figure 5 shows the z-ROC curve for test words in the include condition.
According to Yonelinas' application of two factor theory,
responses to
these test words should be based on both familiarity and recollection.
Using process dissociation, we plotted the z-ROC curve for familiarity
alone.
Following Yonelinas (in press), we
estimated the probability of recollection (P(R)) by calculating a hit
rate for the exclude condition and a hit rate for the include condition,
and subtracting the exclude hit rate from the include hit rate (Equation
1). The hit rates were calculated by summing the numbers of responses
in each of the high positive, high medium positive, low medium
positive, and low positive confidence categories and dividing by the
total number of responses across all confidence categories for the
class of items (include or exclude). From P(R) and the hit rate at
each confidence category in the include condition, a familiarity based
hit rate was calculated for each confidence category (Equations 1 and
2). The familiarity based z-ROC was obtained from these hit rates and
the false alarm rates from the data.
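The estimation steps just described can be sketched as follows. The sketch assumes the independence form of the process dissociation equations (include = R + (1 - R)F, exclude = (1 - R)F), which is our reading of Equations 1 and 2. Only the overall proportions of positive responses are reported in the text, so they stand in here for the hit rate at a single criterion; the function names are hypothetical.

```python
def recollection(include_hit, exclude_hit):
    """P(R) estimated as include hit rate minus exclude hit rate."""
    return include_hit - exclude_hit


def familiarity_hits(include_hits, p_r):
    """Familiarity based hit rate at each criterion: (h - R) / (1 - R)."""
    return [(h - p_r) / (1.0 - p_r) for h in include_hits]


p_r = recollection(0.553, 0.418)       # about 0.135
fam = familiarity_hits([0.553], p_r)   # about 0.483 at the collapsed criterion
```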
The estimates of the slopes of the z-ROC curves and their intercepts
along with the standard errors in those estimates are shown in the first
three rows of Table
8. As expected from two factor theory, the recovered familiarity
slopes are nearer one than the slopes for the include or exclude
conditions. But they are still significantly different from one.
The slopes for the include and exclude conditions are around 0.7 (in the
range of those found by Glanzer & Adams, 1990, and Ratcliff et al., 1994).
The derived familiarity based z-ROC has a slope of 0.857 with a standard
error of 0.045 which means it is significantly different from one.
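The size of the departure from one can be made concrete with a simple calculation. The sketch below assumes a large-sample normal approximation for the slope estimate; that choice of test is an assumption of the sketch, not a description of the analysis actually used.

```python
from statistics import NormalDist

slope, se = 0.857, 0.045  # familiarity slope and standard error from Table 8

# Distance of the slope from one, in standard error units.
z = (1.0 - slope) / se
# Two-sided tail probability under the normal approximation.
p_two_sided = 2.0 * (1.0 - NormalDist().cdf(z))
```

The slope lies a little more than three standard errors below one, so under this approximation the difference is significant at conventional levels.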
INSERT FIGURE 5 AND TABLE 8 HERE
According to two factor theory, the slopes of z-ROC curves from
recognition memory experiments are
generally less than one because a recollection process contributes to
high confidence positive responses. Taking out this process should
leave familiarity alone, for which the z-ROC should be linear with a
slope of one. The data from the experiment presented here contradict
two factor theory. For experimental conditions designed to produce a
slope much less than one and low probability of recollection
(conditions that provide an extreme test of two factor theory), the
familiarity slope was significantly different from one.
The failure of two factor theory to predict the data raises a question
about the use of signal detection theory as the process that underlies
familiarity.
Two factor theory assumes that the old item and new item distributions
of familiarity have equal standard deviations. This assumption is
questionable. What it means is that if an item is studied, its
familiarity is increased by an amount that is constant no matter what
its position was in the new item distribution. For example, an item
originally with familiarity one standard deviation below the mean of
the new item familiarity distribution will have, after study,
familiarity exactly the same distance below the mean of the old item
distribution. This implies that, for the familiarity based process,
there will be no item effects in learning-
no items easier to learn than others beyond baseline
differences- and this seems to contradict what we know about item
effects. The assumption of equal standard deviations comes from the
classical application of signal detection theory to perception where a
fixed signal is added to noise and so the z-ROC is expected to have a
slope of one. It is not obvious that the assumption should transfer to
memory, and it appears that it does not work when combined with two
factor theory. Unfortunately, if the assumption of equal standard
deviations in old and new item familiarity is dropped from the two
factor theory, then the theory has no way to predict either the shapes
or the slopes of z-ROC curves so the theory will not be constrained;
it will be consistent with almost any reasonable slope of the z-ROC
function.
Another point that should be made explicit about the use of signal
detection, a point of difference between the two factor theory and the
global memory models, is that in two factor theory, a test item
contacts its representation in memory to read out its familiarity (as
in strength theory, Norman & Wickelgren, 1969). The test item does not
contact other items in memory; if it did, the effects of these other
items would be included in the determination of standard deviations in
familiarity values as they are in the global memory models.
According to the two factor theory, z-ROC curves should have slopes
equal to one after applying process dissociation. In general, if the
slope of the empirical include (and exclude) z-ROC is less than one,
application of process dissociation is guaranteed to produce a slope
nearer one. Therefore, finding that the estimated slope of the
recovered familiarity z-ROC is nearer one than the slope of the
data is not a strong test of the theory.
To illustrate this point in more detail, assume that recognition
confidence judgement responses come from a single underlying strength
process, such as in the SAM model, with no second recollection
process. Further assume that there are three different distributions
of strength values, one for new items (mean=0, SD=1.0), one for items
to be excluded (mean=1.0, SD=1.25), and one for items to be included
(mean=1.5, SD=1.25). Distributions like these are what the z-ROC data
from Ratcliff et al. (1992) imply if the familiarity distributions are
normal. The distributions are shown in Figure 6, with seven confidence
judgement criteria. If this were a true description of underlying
processing arising from a single familiarity based process (e.g., one
of the global memory models), there would be no recollection component,
but the process dissociation equations could still be applied to the
data. For the purposes of this illustration, the estimate of
recollection is taken to be the difference between the include and
exclude distributions at the highest confidence response category; the
difference is 0.12. Then the slope of the recovered z-ROC for the
hypothetical familiarity process is 1.0 (obtained as for the
experimental data above). Both the original z-ROC curve for the
include data and the recovered familiarity z-ROC are shown in Figure
7. The slope of the recovered curve is nearer one than the slope for
the original data. The bottom two lines in Table 8 show the linear
regressions for the include condition (the slope is the ratio of the
standard deviations, 1.0/1.25, and the intercept is the difference in
means, 1.5, divided by the included distribution standard deviation,
1.25) and the "familiarity" z-ROC regression slopes and intercepts.
INSERT FIGURES 6 AND 7 ABOUT HERE
This example illustrates that the process dissociation method is
guaranteed to make the slope of the z-ROC larger. The method does
so because it removes probability density from the upper right hand
tail of the observed distribution, which reduces the standard deviation for
the old items.
Figure 7 also shows a bend at the high confidence end of the
recovered z-ROC curve similar to that in Figure 3. If two factor
theory were correct, the z-ROC curve should be linear, with no bend.
For a single process model, if the original distributions are normal,
then removing probability density from the high end necessarily results
in the bend in the z-ROC function. In other words, the bend
indicates a non-normal distribution in "familiarity" after process
dissociation, a contradiction of two factor theory but consistent with
single process models. The appearance of a bend in the recovered
familiarity z-ROC is a strong pointer to a failure of process
dissociation under the assumption of normal distributions (see also the
"familiarity" z-ROC function in Figure 5).
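The single process illustration can be reproduced approximately with a short script. The three strength distributions are those given in the text. The seven criterion placements are hypothetical, because Figure 6 is not available, so the recovered slope need not match the value in Table 8. The familiarity hit rates are recovered with (h - R) / (1 - R), which assumes the independence form of the process dissociation equations.

```python
from statistics import NormalDist

STD = NormalDist()
NEW = NormalDist(0.0, 1.0)
EXCLUDE = NormalDist(1.0, 1.25)
INCLUDE = NormalDist(1.5, 1.25)

# Hypothetical criteria, lowest first; the last is the highest confidence.
CRITERIA = [-0.8, -0.3, 0.2, 0.7, 1.2, 1.7, 2.2]


def p_above(dist, c):
    """Probability that a strength value exceeds criterion c."""
    return 1.0 - dist.cdf(c)


def ls_slope(xs, ys):
    """Least squares slope of ys regressed on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


fa = [p_above(NEW, c) for c in CRITERIA]
inc = [p_above(INCLUDE, c) for c in CRITERIA]

# "Recollection": include minus exclude at the highest confidence criterion.
p_r = p_above(INCLUDE, CRITERIA[-1]) - p_above(EXCLUDE, CRITERIA[-1])
fam = [(h - p_r) / (1.0 - p_r) for h in inc]

z_fa = [STD.inv_cdf(p) for p in fa]
z_inc = [STD.inv_cdf(p) for p in inc]
z_fam = [STD.inv_cdf(p) for p in fam]

include_slope = ls_slope(z_fa, z_inc)      # ratio of SDs, 1.0 / 1.25
recovered_slope = ls_slope(z_fa, z_fam)    # larger than include_slope
# Slopes between adjacent points; the last is at the high confidence end.
local_slopes = [(z_fam[i + 1] - z_fam[i]) / (z_fa[i + 1] - z_fa[i])
                for i in range(len(z_fa) - 1)]
```

With these criteria the estimate of recollection is close to 0.12 even though no recollection process generated the data, the recovered slope is larger than the include slope, and the local slopes increase toward the high confidence end, which is the bend described above.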
The process dissociation method is very appealing. It offers a method
of separating conscious from unconscious components of processing, with
the hope that such a separation will lead to better understanding of
both. If correct, the method would begin to solve age old questions
about the relative contributions of the conscious and unconscious to
processing in any task. But the method is built upon a specific model
of processing (as noted by Jacoby, 1991) and it must be considered in
that context, not as a model-free procedure and not as a purely empirical
procedure.
The potential strength of the process dissociation method lies in the
include versus exclude manipulation. In recognition memory, for each of
the sets of data we considered, the account of include/exclude data
given by two factor theory was different from that given by other
theories. It follows that the explanation of how unconscious
processing is affected by experimental variables will be different for
the different theories. For example, for the experimental results
considered in this article, if the SAM account of recognition is
correct, then the estimates from process dissociation of how
familiarity and recollection are affected by experimental variables are
wrong, or if the process dissociation estimates are correct, then the
SAM account is wrong.
The strongest form of the logic of our argument is: Suppose SAM is the
correct description of underlying processing and we generate data from
SAM; then if we apply process dissociation to produce estimates of the
contributions of two processes, those estimates will be incorrect
because the data came only from one process. We fit SAM to
experimental data to be sure this argument applies in the range of
normal performance on recognition with include and exclude
instructions.
The obvious challenge that arises from this situation is to find some
way of choosing which theoretical account is correct. A traditional
measure is falsifiability. For recognition memory, SAM can potentially
fail in a multiplicity of ways internal to itself by making predictions
that are incorrect. In contrast, two factor theory has just two
assumptions that can lead to internally generated predictions. One is
the assumption that familiarity is described by signal detection theory
with equal variance in old and new item values of familiarity. We
discussed this assumption in earlier sections of this article, and
showed that it can be falsified. However, it can be viewed as an
auxiliary assumption unnecessary to the basic two factor theory.
Process dissociation can still be applied to data, with or without the
signal detection theory assumptions. The second potentially
falsifiable assumption of two factor theory is that the recollection
and familiarity processes are independent. This assumption has been
criticized by Curran and Hintzman (in press) and Joordens and Merikle
(1993). However, whatever the
result of those critiques, two factor theory can emerge with the process
dissociation method intact. Even if there is dependence between the
two factors, their relative influences on performance can still be
computed from data (see Joordens & Merikle, 1993). Process
dissociation is applied to exactly two performance measures
(probability of a positive response in the include condition and
probability of a positive response in the exclude condition), and two
measures can always be fit by two parameters, so the model is not
falsifiable at this level.
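The point that two measures can always be fit by two parameters can be shown directly. The sketch below assumes the independence form of the process dissociation equations; the function names are hypothetical, and the data pairs other than the first are invented for illustration.

```python
def fit_two_factor(include, exclude):
    """Solve include = R + (1 - R)F and exclude = (1 - R)F for R and F."""
    r = include - exclude
    f = exclude / (1.0 - r)
    return r, f


def predict(r, f):
    """Predicted (include, exclude) probabilities from R and F."""
    return r + (1.0 - r) * f, (1.0 - r) * f


# Any pair with include >= exclude is reproduced exactly.
pairs = [(0.553, 0.418), (0.90, 0.10), (0.30, 0.30)]
fits = [fit_two_factor(inc, exc) for inc, exc in pairs]
errors = [abs(predict(r, f)[0] - inc) + abs(predict(r, f)[1] - exc)
          for (r, f), (inc, exc) in zip(fits, pairs)]
```

Because the fit is exact for every admissible pair, the fit itself carries no information about whether the two process description is correct.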
The SAM model has been very successful with recall phenomena
(Raaijmakers & Shiffrin, 1981) and with recall and recognition
interactions (Gillund & Shiffrin, 1984). With relatively few
assumptions, it affords a reasonably coherent view of the effects on
performance of
a large number of independent variables in terms of the behavior
of underlying parameters. It might seem that the model
has enough freedom and enough parameters to deal with any pattern of
experimental results. This would be correct if there was a one to one
correspondence between parameters and empirical effects such that
adjustments to one parameter completely controlled predictions for one
variable, or if all the parameters varied in unprincipled ways to
account for the effects of every variable.
However, this is not the case. There are many situations in which the model
is tightly constrained, and it requires insight into the structure of
the model to determine what situations provide such constraint. One way
in which predictions have been falsified is
SAM's failure to predict
the behavior of the z-ROC data discussed earlier in this article (see
Ratcliff et al., 1992; Ratcliff et al., 1994). SAM fails, in part,
because of the way variability is introduced into the encoding process,
and changing this would result in a new and different model (see also
Ratcliff, Shiffrin, & Clark, 1990; Shiffrin, Ratcliff, & Clark, 1990).
The point is that there are potentially multiple ways (some not
intuitively apparent) in which SAM could be falsified by failures of
predictions generated from its assumptions. But, in fact, the model
has been remarkably accurate in its predictions (both qualitatively and
quantitatively), as exemplified by the following list of
successes:
List length. In the typical experiment, subjects do not know whether a
list of words they are given is going to be a long list or a short
list, so the encoding parameters of SAM remain constant across
different list lengths (the self, interitem, and residual strength
parameters). Increasing list length simply increases the number of
items encoded into memory. The result is larger values of familiarity
for items from longer lists and larger variability in their familiarity
values. For example, for a new item presented as a test word,
familiarity is twice as large for a list twice as long. This means
that the familiarity criterion that separates positive from negative
responses has to be moved as list length changes in order to keep it
between the old and new item distributions (Gillund & Shiffrin, 1984,
p. 64, and see the sections above where SAM was fit to Yonelinas and
Jacoby's data).
When test items are presented, they must enter the short-term memory
buffer just as study items do. They add to the items from the
study list to increase the total number of items in the experiment.
Therefore, their effects are modeled in the same way as variations in
list length, and effects due to the position of a test word in the test
list are accurately predicted.
Study time. The only variable that changes as a function of study
time is the amount of strength that accumulates during encoding for
each studied item. The strength parameters are fixed,
multiplied by the amount of time an item spends in the encoding buffer
(with some scaling factor). As with list length, the criterion has to
be adjusted to keep it between the old and new distributions because
both old and new items have higher familiarity values for lists with
longer study times (see Gillund & Shiffrin, 1984, p. 64, and above).
List context effects. The context parameter in SAM was designed to
allow retrieval to focus on subsets of information in memory, for
example items from studied lists versus all the other items in memory, and
items from one studied list versus another. In modeling list
context effects, none of the encoding parameters can vary. The only
parameter that can be adjusted is the weight placed on the context
parameter for each context in the retrieval cue.
Rehearsal instructions. With "maintenance" instructions, subjects
are instructed to rehearse each item during its entire presentation
time and not to rehearse any other items during that time. With
"elaboration" instructions, they are instructed to use the presentation
time for an item to relate it to other items in the study list. The
only parameters adjusted to fit data for this manipulation are the
self-strength and inter-item strength encoding parameters (Gillund &
Shiffrin, 1984, p. 25).
Item effects. Similarity of distractor test words is modeled by
varying the residual strength parameter (with small adjustments to the
criterion). Word frequency effects are also explained with the
residual strength parameter: lower frequency distractors have a lower
residual strength, so they are farther from the distribution of studied
items than would be higher frequency distractors. For a complete
discussion of word frequency and its interactions with other variables,
see Gillund and Shiffrin (1984).
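The list length account above can be illustrated with a deliberately simplified sketch of global familiarity. It assumes that familiarity is the sum, over all stored images, of the product of the context strength and the item strength to that image, with cue weights of one, no variability, and inter-item strengths ignored. These simplifications, the function names, and the parameter values are all assumptions of the sketch; they are not the fitted model.

```python
def new_item_familiarity(list_length, context=1.0, residual=0.1):
    """A new test item has only residual strength to every stored image."""
    return list_length * context * residual


def old_item_familiarity(list_length, context=1.0, self_strength=0.5,
                         residual=0.1):
    """An old item has self strength to its own image, residual to the rest."""
    own = context * self_strength
    others = (list_length - 1) * context * residual
    return own + others


def midpoint_criterion(list_length):
    """A criterion halfway between the old and new item mean familiarity."""
    return 0.5 * (old_item_familiarity(list_length)
                  + new_item_familiarity(list_length))


short, long_ = 16, 32
ratio = new_item_familiarity(long_) / new_item_familiarity(short)  # 2.0
shift = midpoint_criterion(long_) - midpoint_criterion(short)      # > 0
```

Doubling the list length doubles the familiarity of a new item, and a criterion placed between the old and new item means has to move upward to stay there.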
Demonstrations of the kind just summarized show how SAM accounts for
empirical data in principled ways, and point to the most salient
contrast between the global memory models and two factor theory. The
global memory models' goal is to simultaneously and quantitatively
explain the effects of a number of different variables on recall and
recognition (and other tasks for some of the models). The goal of two
factor theory is to explore possible dissociations between a conscious
retrieval process, recollection, and an unconscious process,
familiarity. If global memory models are ultimately found wanting, it
will likely be because they are internally falsified by their own
predictions. If two factor theory ultimately comes to be viewed with
suspicion, it will likely be because its explanations of data are
implausible for external reasons. Of course, both models should
provide reasonable interpretations of data across a range of variables
and tasks. But this kind of evaluation is difficult because what is
reasonable must be defined from outside the theories. If we
hypothesized that some variable would affect familiarity in the two
factor theory and residual strength in SAM, but we turned out to be
wrong- the variable affected two factor theory's recollection and SAM's
focussing weights- then it would not be the theories that failed but
our intuitive hypotheses. Failure of our intuitions is not necessarily
grounds for rejection of a model; the model might be correct and our
intuitions wrong. However, at the same time, two factor theory gains
considerable strength from the consistency of its estimates of
familiarity and recollection across a range of tasks. The prediction
of this consistency and the evaluation that the prediction is a
reasonable one come not from two factor theory but from other sources
external to the theory.
Our conclusions apply to recognition memory, the domain of testing in
this article. We believe testing can be especially provocative in
this domain because there exist several well developed models. But for
many tasks to which process dissociation might most fruitfully be
applied, such as those that have been used to postulate implicit memory
systems, there are no other well developed models against which to test
two factor theory. For those domains, two factor theory will serve as
the default model against which future models will be tested.
Atkinson, R. C., Herrmann, D.J., & Westcourt, K.T. (1974). Search
processes in recognition memory. In R.L. Solso (Ed.), Theories in
cognitive psychology: The Loyola Symposium (pp. 101-146). Hillsdale,
NJ: Erlbaum.
Atkinson, R. C., & Juola, J. F. (1973). Factors influencing the speed
and accuracy of word recognition. In S. Kornblum (Ed.), Attention
and Performance IV (pp. 583-612). New York: Academic Press.
Curran, T. & Hintzman, D. (in press). Violations of the independence
assumption in process dissociation. Journal of Experimental
Psychology: Learning, Memory, and Cognition.
Dosher, B.A., & Rosedale, G. (1989). Integrated retrieval cues as a
mechanism for priming in retrieval from memory. Journal of
Experimental Psychology: General, 2, 191-211.
Gillund, G., & Shiffrin, R.M. (1984). A retrieval model for both
recognition and recall. Psychological Review, 91, 1-67.
Glanzer, M., Adams, J.K., Iverson, G.J., & Kim, K. (1993). The
regularities of recognition memory. Psychological Review, 100,
546-567.
Glanzer, M., & Adams, J.K. (1990). The mirror effect in recognition
memory: Data and theory. Journal of Experimental Psychology: Learning,
Memory, and Cognition, 16, 5-16.
Hintzman, D. (1988). Judgments of frequency and recognition memory
in a multiple-trace memory model. Psychological Review,
95, 528-551.
Hintzman, D.L. (1990). Human learning and memory: Connections and
dissociations. Annual Review of Psychology, 41, 109-139.
Jacoby, L. L. (1991). A process dissociation framework: Separating
automatic from intentional uses of memory. Journal of Memory and
Language, 30, 513-541.
Jacoby, L.L. & Dallas, M. (1981). On the relationship between
autobiographical memory and perceptual learning. Journal of
Experimental Psychology: General, 3, 306-340.
Jacoby, L. L. & Kelley, C. M. (1992). A process-dissociation framework
for investigating unconscious influences: Freudian slips, projective
tests, subliminal perception, and signal detection theory. Current
Directions in Psychological Science, 1, 174-179.
Jacoby, L.L., Toth, J.P., & Yonelinas, A.P. (1993). Separating conscious
and unconscious influences of memory: Measuring recollection. Journal of
Experimental Psychology: General, 122, 139-154.
Jacoby, L. L., Woloshyn, V., & Kelley, C. (1989). Becoming famous
without being recognized: Unconscious influences of memory produced
by dividing attention. Journal of Experimental Psychology: General,
118, 115-125.
Jacoby, L.L., Yonelinas, A.P., & Jennings, J. (in press). The relation
between conscious and unconscious (automatic) influences: A declaration
of independence. To appear in J. Cohen & J.W. Schooler (Eds.),
Scientific approaches to the questions of consciousness. Hillsdale,
NJ: Erlbaum.
Joordens, S. & Merikle, P. M. (1993). Independence or redundancy? Two
models of conscious and unconscious influences. Journal of Experimental
Psychology: General, 122, 462-467.
Kendall, M.G., & Stuart, A. (1976). The Advanced Theory of
Statistics, Vol. 1. New York: Macmillan.
Mandler, G. (1980). Recognizing: The judgment of previous occurrence.
Psychological Review, 87, 252-271.
Mandler, G., & Boek, W.J. (1974). Retrieval processes in recognition.
Memory and Cognition, 2, 613-615.
McKoon, G., & Ratcliff, R. (1992b). Spreading activation versus compound
cue accounts of priming: Mediated priming revisited. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 18,
1155-1172.
Monsell, S. (1978). Recency, immediate recognition memory, and reaction
time. Cognitive Psychology, 10, 465-501.
Murdock, B.B. (1982). A theory for the storage and retrieval of item
and associative information. Psychological Review, 89, 609-626.
Norman, D.A. & Wickelgren, W.A. (1969). Strength theory of decision
rules and latency in short-term memory. Journal of Mathematical
Psychology, 6, 192-208.
Nosofsky, R.M. (1988). Exemplar-based accounts of relations between
classification, recognition, and typicality. Journal of Experimental
Psychology: Learning, Memory, & Cognition, 14, 700-708.
Raaijmakers, J.G.W. & Shiffrin, R.M. (1981). Search of associative
memory. Psychological Review, 88, 93-134.
Ratcliff, R., Allbritton, D.W., & McKoon, G. (1994). Manuscript in
preparation.
Ratcliff, R., Clark, S. E., & Shiffrin, R.M. (1990). The list-strength
effect: I. Data and discussion. Journal of Experimental
Psychology: Learning, Memory, and Cognition, 16, 163-178.
Ratcliff, R., & McKoon, G. (1988b). A retrieval theory of priming in
memory. Psychological Review, 95, 385-408.
Ratcliff, R., & McKoon, G. (1994a). Bias and the priming
of object decisions. In press, Journal of
Experimental Psychology: Learning, Memory, and Cognition.
Ratcliff, R., & McKoon, G. (1994b).
Bias effects in implicit memory and information processing. Submitted
Ratcliff, R., & McKoon, G. (1994c). Retrieving information from
memory: Spreading activation theories versus compound cue theories.
Psychological Review, 101, 177-184.
Ratcliff, R., McKoon, G., & Tindall, M. H. (1994). Empirical generality
of data from recognition memory receiver-operating characteristic
functions and implications for the global memory models. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 20, 763-785.
Ratcliff, R., Sheu, C-F., & Gronlund, S. (1992). Testing Global Memory
Models using ROC Curves. Psychological Review, 99, 518-535.
Richardson-Klavehn, A., & Bjork, R.A. (1988). Measures of memory. Annual
Review of Psychology, 39, 475-543.
Schacter, D.L., Bowers, J., & Booker, J. (1989). Intention, awareness,
and implicit memory: The retrieval intentionality criterion. In S.
Lewandowsky, J.C. Dunn, & K. Kirsner (Eds.), Implicit memory:
Theoretical issues (pp. 47-65). Hillsdale, NJ: Erlbaum.
Shiffrin, R.M., Ratcliff, R., & Clark, S. E., (1990). The list strength
effect: II. Theoretical mechanisms. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 16,
179-195.
Squire, L.R. (1992). Memory and the hippocampus: A synthesis from
findings with rats, monkeys, and humans. Psychological Review, 99,
195-231.
Tulving, E., & Schacter, D.L. (1990). Priming and human memory systems.
Science, 247, 301-306.
Warrington, E.K., & Weiskrantz, L. (1968). New method of testing
long-term retention with special reference to amnesic patients. Nature
(London), 217, 972-974.
Yonelinas, A.P. (in press). Receiver operating characteristics in
recognition memory: Evidence for a dual-process model. Journal of
Experimental Psychology: Learning, Memory & Cognition.
This research was supported by NIMH grants HD MH44640 and MH00871
to Roger Ratcliff and by NIDCD grant R01-DC01240 and NSF grant
SBR-9221940 to Gail McKoon.
Correspondence concerning the article should be addressed to Roger
Ratcliff, Psychology Department, Northwestern University, Evanston, IL,
60208.
A hit rate and a false alarm rate for
each of six confidence categories can be calculated from data as follows:
Assume that the numbers of responses for old items in each category are
n
Note: An empirical z-ROC curve can be generated from the z scores for
these hit and false alarm rates for each of the categories; the z-ROC for
one experimental condition from Ratcliff et al. (1994) is shown by the
diamonds in Figure 3.
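The calculation of hit and false alarm rates from response counts can be sketched as follows. The sketch assumes the usual cumulative scoring, in which the rate at a criterion is the proportion of responses at or above that confidence category, so six categories give five points on the z-ROC. The counts and function names are hypothetical.

```python
from statistics import NormalDist


def cumulative_rates(counts):
    """Cumulative proportions, counts ordered from most to least sure 'old'.

    The final category is dropped because its cumulative rate is always 1.
    """
    total = sum(counts)
    rates, running = [], 0
    for n in counts[:-1]:
        running += n
        rates.append(running / total)
    return rates


def z_roc(old_counts, new_counts):
    """(z false alarm, z hit) pairs, one per confidence criterion."""
    z = NormalDist().inv_cdf
    hits = cumulative_rates(old_counts)
    fas = cumulative_rates(new_counts)
    return [(z(f), z(h)) for h, f in zip(hits, fas)]


# Hypothetical counts for six confidence categories.
old_counts = [40, 25, 15, 10, 6, 4]
new_counts = [5, 10, 15, 20, 25, 25]
points = z_roc(old_counts, new_counts)
```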
TO TEST TWO FACTOR THEORY: ASSUME SOME VALUE of P(R) (between 0 and 0.45); then:
1. Assume that the top half of the confidence judgement categories
are all positive responses in order to calculate the hit rate hF
hF
2. In the two factor theory, distributions of familiarity for old and
new items are assumed to be normal with equal variance, so d' tables can
be used to calculate d' for the familiarity process from hF
3. With d' for the familiarity process, we can then calculate hit rates
based on familiarity alone for all the confidence categories (hF
***There are two alternatives, 4 and 5 below, that can follow from this
point and both are used in this article.
4. Using process dissociation again, we can
add the assumed value of P(R) back to the familiarity process at each
confidence level to
generate the predicted empirical z-ROC curve for familiarity and
recollection combined,
plotting the predicted hit rates (hP
hP
Curves obtained in this way for 10 values of P(R) are plotted in Figure 3.
This way of generating a predicted z-ROC function assumes that P(R) is
constant across confidence categories. P(R) was calculated by summing
across the positive response categories; most of the recollection based
responses should have come from the highest confidence positive
category but it is possible some recollection based responses would
occur in the medium and low confidence categories. If so, then the
predicted hit rates might be a little too large in the upper right hand
corner of the z-ROC function. However, allowing the amount of
recollection at each of the positive confidence levels to be free
parameters would weaken the predictive power of the model.
5. The values of hF
P(R)
These values of P(R)
The figures are not available on the web; if they are needed, they
should be obtained from the authors.
= 
S
where w1, w2, and w3 are the weights, S
.
(P(F_I)) can be different from its value in an exclude condition
(P(F_E)). In fact, context was originally (Gillund & Shiffrin, 1984)
made part of the test probe to allow the recognition process to focus
on recently learned items and so it is exactly the mechanism to deal
with list discrimination effects. Familiarity as defined in the two
factor theory is not dependent on list context; the value of
familiarity for a test item is the same in an include test condition as
in an exclude test condition.
Experiments 1 and 3, Yonelinas (in press) and Experiment 1, Yonelinas &
Jacoby (1994)
The Process Dissociation Method
(F_I for read items is different from F_E for read items, and F_I for
heard items is different from F_E for heard items).
In two factor theory, probability of a yes response based on familiarity
is the same in the include and exclude conditions.
(P(F_I)) and in the exclude condition (P(F_E)).
The figure also gives the empirical
probabilities of a positive response in the include and exclude
conditions (.80 and .29). As the figure shows, there is no value of
anagram self-strength that allows SAM to predict the empirical values.
SAM cannot simultaneously accommodate the read, heard, anagram, and new
test items in the include and exclude conditions with its single
familiarity process. In terms of the SAM model, it must be that the
anagram manipulation does more than just change the familiarity of the
anagram versus read items.
Estimating Recall from SAM: Recall of anagram test words in both the include and exclude conditions
For the include condition,
P(I) = P(R) + P(F_I) - P(R)P(F_I),                (6)
where P(R) is the probability of a positive response based on recall and
P(F_I) is the probability in the include condition that the
familiarity value exceeds the criterion for a positive response.
For the exclude condition,
P(E) = P(F_E) - P(R)P(F_E),                (7)
where P(F_E) is the probability in the exclude condition that
the familiarity value exceeds the criterion for a positive response.
Solving Equation 7 for P(R),
P(R) = (P(F_E) - P(E))/P(F_E),                (8)
and eliminating P(R) from the above equations,
P(E)/(1 - P(I)) = P(F_E)/(1 - P(F_I)).                (9)
Equation 9 shows that the ratio of the familiarity values for anagram
test items in the exclude versus include conditions is fixed by the
experimental data (P(E), .29, and P(I), .80 in Jacoby's experiment fix
the ratio in Equation 9 to be 1.45), and this in turn determines the
self-strength parameter for anagrams (because it is the only parameter
free to vary for anagram familiarity values). Across the possible
values of the self-strength parameter (Figure 1), the only value of
self-strength that produces the correct ratio is 4.91 (where P(F_I)
is .68 and P(F_E) is .46). Using these values in Equation 8 yields a
value for P(R) of .37.
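The arithmetic in this passage can be checked with a short sketch. It assumes the include and exclude familiarity probabilities are written P(F_I) and P(F_E); those subscripts and all function names are notation adopted here for illustration, not taken from the SAM implementation. The numerical inputs are the values quoted in the text.

```python
# Illustrative check of Equations 6-9. The names p_fi and p_fe stand for
# P(F_I) and P(F_E); they are assumed notation, not the authors' code.

def p_include(p_r, p_fi):
    """Equation 6: P(I) = P(R) + P(F_I) - P(R)P(F_I)."""
    return p_r + p_fi - p_r * p_fi

def p_exclude(p_r, p_fe):
    """Equation 7: P(E) = P(F_E) - P(R)P(F_E)."""
    return p_fe - p_r * p_fe

def recall_from_exclude(p_e, p_fe):
    """Equation 8: P(R) = (P(F_E) - P(E)) / P(F_E)."""
    return (p_fe - p_e) / p_fe

def data_ratio(p_i, p_e):
    """Left side of Equation 9: P(E) / (1 - P(I))."""
    return p_e / (1.0 - p_i)

def familiarity_ratio(p_fi, p_fe):
    """Right side of Equation 9: P(F_E) / (1 - P(F_I))."""
    return p_fe / (1.0 - p_fi)

# Values quoted in the text: P(I) = .80, P(E) = .29, P(F_I) = .68, P(F_E) = .46.
P_I, P_E, P_FI, P_FE = 0.80, 0.29, 0.68, 0.46
P_R = recall_from_exclude(P_E, P_FE)

print("ratio fixed by the data:", data_ratio(P_I, P_E))
print("ratio from familiarity: ", familiarity_ratio(P_FI, P_FE))
print("P(R) from Equation 8:   ", P_R)
print("P(I) implied by Eq. 6:  ", p_include(P_R, P_FI))
print("P(E) implied by Eq. 7:  ", p_exclude(P_R, P_FE))
```

Because the familiarity probabilities are quoted to two decimals, the two sides of Equation 9 agree only approximately, and the include probability implied by Equation 6 is close to, rather than exactly, .80.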
Estimating Recall from SAM: Recall of anagram test words in only the exclude condition
In the include condition, a positive response is based on familiarity
alone, so P(I) = P(F_I).
The probability of a positive response for an anagram in the exclude
condition is the same as Equation 7,
P(E) = P(F_E) - P(R)P(F_E).
The probability of recollection can be estimated from Equation 8 where
the value of P(F_E) is obtained from Figure 1. In Figure 1, the
function for inclusion familiarity P(F_I) reaches the value 0.8
(i.e., P(F_I)=P(I)=0.8 from the data) at the point where
P(F_E) = .533. With P(E) = .29 (from the data), P(R) is
estimated to be .456, using Equation 8. This is about 15% higher than
the value of P(R) estimated from the process dissociation equations.
The Atkinson and Juola Model Tested Against Include/Exclude Data

The data presented by Ratcliff et al. (1992) showed a roughly straight
line z-ROC function with slope of about 0.8 for both weakly encoded
items and strongly encoded items. The constant value of the slope across
different strengths of encoding is difficult if not impossible for the
current global memory models to accommodate. For example, SAM predicts
that the standard deviation of old item familiarity should increase
relative to the standard deviation of new item familiarity as a
function of overall level of familiarity. The predicted increase comes
from the way variability of encoding is introduced into the model.
Increasing the mean value of strength that results from encoding (e.g.,
by increasing study time) increases the variance in the encoded strength
values. This assumption lies at the heart of the model; changing this
assumption to fit the z-ROC data would be tantamount to proposing a new
model requiring new fits to all experimental data. The difficulties
presented by the z-ROC data are similarly critical for the other global
memory models (Hintzman's model, 1988, also predicts that the standard
deviation for old item familiarity increases with strength, and
Murdock's model, 1982, predicts almost equal standard deviations for old
and new item familiarity values; see Ratcliff et al., 1992).
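The link between the z-ROC slope and the two standard deviations can be illustrated with a minimal sketch. It assumes normal familiarity distributions for old and new items, in which case the z-ROC is a straight line with slope equal to the new item standard deviation divided by the old item standard deviation. The function name, the criterion placements, and the means and standard deviations below are hypothetical values chosen for illustration; this is not a simulation of SAM or of any other global memory model.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal, used to convert rates to z scores

def zroc_slope(mu_old, sd_old, criteria, mu_new=0.0, sd_new=1.0):
    """Least-squares slope of z(hit rate) against z(false alarm rate).

    Old and new item familiarity are assumed normal; each criterion gives
    one (false alarm, hit) pair, as each confidence boundary would.
    """
    old = NormalDist(mu_old, sd_old)
    new = NormalDist(mu_new, sd_new)
    xs = [STD.inv_cdf(1.0 - new.cdf(c)) for c in criteria]  # z(false alarm)
    ys = [STD.inv_cdf(1.0 - old.cdf(c)) for c in criteria]  # z(hit)
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical criteria standing in for five confidence boundaries.
CRITERIA = [-0.5, 0.0, 0.5, 1.0, 1.5]

# Same standard deviation ratio, different mean strength: the slope is unchanged.
print(zroc_slope(1.0, 1.25, CRITERIA))
print(zroc_slope(2.0, 1.25, CRITERIA))
# A larger old item standard deviation lowers the slope.
print(zroc_slope(2.0, 1.60, CRITERIA))
```

Under these assumptions a slope of about 0.8 corresponds to an old item standard deviation about 1.25 times the new item standard deviation, and a slope that stays constant across encoding strength requires that ratio to stay constant while the mean changes.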
z-ROC Curves based on Familiarity plus Recollection
).
These values must all be positive (at no confidence level can the
probability of a yes response due to recollection be zero or
negative). If any value of the hypothetical P(R) that was used to
generate the familiarity d' leads to a negative or zero value of
any P(R)_i, then it must be rejected as inconsistent with two
factor theory. Values of P(R) less than or equal to 0.15 in Figure 3
must be rejected for this reason.
The probability of recollection (P(R)_i) at each
confidence category must never decrease from the highest confidence
positive category to lower confidence categories. This is because hit
rates come from cumulating correct positive responses from the highest
confidence category down to the lower confidence categories, so the
number of responses based on recollection can only increase across
these categories, never decrease. To test this, we used data from
Experiment 4 in Ratcliff et al. (1994). Moving from highest positive
to lowest confidence categories is equivalent to increasing the false
alarm rate, and so the predicted probabilities of recollection can be
plotted against the false alarm rate as it changes across confidence
categories. This was done for all the conditions of the experiment,
strongly and weakly encoded low and high frequency words, for all
values of P(R) that did not yield a zero or negative d' or a
probability of recollection less than zero. The results are displayed
in Figure 4; each panel shows a subset of the different P(R) values for
different conditions of the experiment. The value of P(R) used to
generate the P(R)_i is always the same as the middle value of P(R)_i
(because the middle split is used to obtain d' and P(R) in the
algorithm presented in the appendix).
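The two constraints described above can be stated compactly as a check on a list of per-category recollection probabilities. This is a minimal sketch: the function name and the example values are hypothetical, and the list is assumed to be ordered from the highest confidence positive category down to the lower confidence categories.

```python
def consistent_with_two_factor(p_r_by_category, tol=1e-12):
    """Check the two constraints on the per-category values P(R)_i.

    `p_r_by_category` is ordered from the highest confidence positive
    category to lower confidence categories. Every value must be
    positive, and no value may be smaller than the one before it.
    """
    all_positive = all(p > 0.0 for p in p_r_by_category)
    never_decreases = all(
        later >= earlier - tol
        for earlier, later in zip(p_r_by_category, p_r_by_category[1:])
    )
    return all_positive and never_decreases

# Hypothetical values, not data from the experiments.
print(consistent_with_two_factor([0.20, 0.25, 0.25, 0.30]))  # allowed pattern
print(consistent_with_two_factor([0.30, 0.25, 0.10]))        # decreasing values
print(consistent_with_two_factor([0.00, 0.10, 0.20]))        # a zero value
```

A pattern that declines toward the negative confidence categories fails the second constraint.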
What the panels show is that there are almost no values of P(R) that are
consistent with two factor theory; instead of holding constant or
perhaps increasing across confidence categories, the P(R)_i
generally decrease from the midpoint to the most confident negative
category.
values that decreased
like those in Figure 4. He attributed the decrease to floor and
ceiling effects on accuracy. However, most of the data for Figure
4 are not subject to floor and ceiling problems and so contradict two
factor theory.
from the Two Factor
Theory
The counts n_1 through n_6 are the numbers of responses for old items
from the high confidence negative to the high confidence positive
categories, and m_1 through m_6 are the corresponding numbers of
responses for new items. The hit and false alarm rates are computed as
follows: If N = \sum_{j=1}^{6} n_j and M = \sum_{j=1}^{6} m_j, then the
hit rate for a category i is
h_i = (\sum_{j=i}^{6} n_j)/N,
and the false alarm rate for a category i is
f_i = (\sum_{j=i}^{6} m_j)/M.
In step 1, hF_4 is the probability based on familiarity alone of a yes
response to an old test item. (If the data use include and exclude
conditions as in Experiment 1, then P(R) can be estimated from the
difference in hit rates for the two conditions.) Using process
dissociation (Equation 1), P(F)=(P(I)-P(R))/(1-P(R)), so
hF_4 = (h_4 - P(R))/(1 - P(R)).
In step 2, d' for the familiarity process is calculated from hF_4 and
f_4.
In step 3, the hit rates based on familiarity alone (hF_i) are
calculated using the d' and the false alarm rates for those categories
(f_i).
In step 4, the predicted hit rates (hP_i) are plotted against false
alarm rates (f_i):
hP_i = hF_i + P(R) - hF_i P(R).
In step 5, the values of hF_i represent hit rates based on familiarity
alone, obtained by assuming that the value of P(R) is constant across
confidence categories. An alternative is to use the empirical h_i, the
familiarity based hF_i, and the process dissociation equations to
calculate P(R)_i for each confidence category:
P(R)_i = (h_i - hF_i)/(1 - hF_i).
These values of P(R)_i can be plotted for each of the f_i
(i.e., for each confidence category), as shown in Figure 4.
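The steps of the algorithm can be sketched in a few lines of code. This is an illustration under stated assumptions rather than the program used for the article: the function names, the dictionary keys, and the response counts are hypothetical, the normal distribution functions in Python's standard library stand in for d' tables, and the cumulative rate for category 1 (which is always 1) is dropped because it has no finite z score.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal, used in place of d' tables

def cumulative_rates(counts):
    """Cumulative proportions for categories 2 through 6.

    `counts` runs from the high confidence negative category to the
    high confidence positive category. The rate for category 1 equals 1
    and is omitted.
    """
    total = sum(counts)
    rates, running = [], 0
    for count in reversed(counts):
        running += count
        rates.append(running / total)
    rates.reverse()
    return rates[1:]

def two_factor_curves(old_counts, new_counts, p_r):
    """Steps 1-5 for one assumed value of P(R)."""
    h = cumulative_rates(old_counts)  # empirical hit rates h_2..h_6
    f = cumulative_rates(new_counts)  # false alarm rates f_2..f_6
    mid = len(h) // 2                 # middle split (category 4 of 6)
    # Step 1: remove recollection from the middle hit rate.
    hf_mid = (h[mid] - p_r) / (1.0 - p_r)
    if not 0.0 < hf_mid < 1.0:
        raise ValueError("assumed P(R) is inconsistent with the middle hit rate")
    # Step 2: equal-variance d' for the familiarity process.
    d_prime = STD.inv_cdf(hf_mid) - STD.inv_cdf(f[mid])
    # Step 3: familiarity-only hit rates at every category.
    hf = [STD.cdf(STD.inv_cdf(fa) + d_prime) for fa in f]
    # Step 4: add the constant P(R) back to get predicted hit rates.
    hp = [x + p_r - x * p_r for x in hf]
    # Step 5: recollection implied at each category by the empirical hit rates.
    p_r_i = [(hit - x) / (1.0 - x) for hit, x in zip(h, hf)]
    return {"f": f, "h": h, "d_prime": d_prime, "hF": hf, "hP": hp, "P(R)_i": p_r_i}

# Hypothetical response counts for illustration only.
OLD = [20, 30, 40, 50, 60, 100]
NEW = [100, 60, 50, 40, 30, 20]
result = two_factor_curves(OLD, NEW, p_r=0.2)
for fa, pred, rec in zip(result["f"], result["hP"], result["P(R)_i"]):
    print(round(fa, 3), round(pred, 3), round(rec, 3))
```

At the middle split the predicted hit rate reproduces the empirical hit rate and the value of P(R)_i equals the assumed P(R), as the algorithm requires; the other categories are free to deviate, and it is those deviations that Figures 3 and 4 display.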