Two Factor Theory, Single Process Theories, and Recognition Memory

Roger Ratcliff

Northwestern University

Trish Van Zandt

Johns Hopkins University

Gail McKoon

Northwestern University

Address correspondence to Roger Ratcliff, Psychology Department, Northwestern University, Evanston, IL, 60208.

Abstract

According to Jacoby's two factor theory, performance in recognition memory is determined by the combination of an unconscious familiarity process and a conscious intentional recollection process. But according to the global memory models, performance in recognition is typically determined by only a single process. In this article, we compare the two factor theory, single process theories, and other two process theories against empirical and simulated data, outlining conditions under which data cannot discriminate between the theories and conditions under which a single process is not sufficient.


The hallmark of memory research over the past 25 years has been the development of models: theoretical accounts of how people perform when they are given tasks for which previously learned information is useful or required. The goal has been to account for patterns of experimental data within a concise theoretical framework, a framework that can also lead to predictions that can be tested against new data. With the development of data driven theories has come the realization that memory processes are not fully open to introspection. Our intuitions about how retrieval from memory operates are certainly incomplete and probably often wrong.

The newest challenge to intuitions and models alike has been the proliferation of demonstrations that memory can affect performance in the absence of awareness (e.g., Jacoby & Witherspoon, 1982; Schacter, Bowers & Booker, 1989; Warrington & Weiskrantz, 1968). Amnesiac patients show effects of prior learning without being able to remember the learning episode itself. Normal subjects show effects of prior learning on tasks which do not ask for or require the use of previously learned information, and their performance on such tasks is influenced differently by some variables than is their performance on direct tests of memory.

As demonstrations of memory without awareness have proliferated, so have ideas about how to understand the phenomenon. A proposal that has garnered much attention is that performance on indirect tests, those tasks for which direct recollection is not required, is mediated by an "implicit" memory system whereas performance on direct tests is mediated by a different system, an "explicit" memory system (Schacter & Tulving, 1991; Squire, 1992). Countering the notion of implicit memory systems are proposals that differences in performance on indirect versus direct tasks come about because of different processes operating in the same memory system (e.g., Hintzman, 1990; Jacoby & Witherspoon, 1982; Nosofsky, 1988). Also opposing the notion of an implicit memory system is Jacoby's idea (Jacoby, 1991; Jacoby & Dallas, 1981; Jacoby, Toth, & Yonelinas, 1993; Jacoby, Woloshyn, & Kelley, 1989; Jacoby, Yonelinas, & Jennings, in press) that performance on the two kinds of tasks is a mixture of exactly two processes, one conscious and the other unconscious, with indirect tasks relying more heavily on unconscious processes than direct tasks.

These different proposals illustrate the differences among researchers' intuitions. Clashes among intuitions have sometimes inspired models of memory. In an effort to understand memory without awareness, Jacoby (1991) has developed the two process idea into a model he has labeled "two factor theory"; one factor is an unconscious automatic process that assesses the familiarity of a probe to memory and the other factor is conscious recollection. Jacoby has applied the theory to a number of both direct and indirect tasks, including cued recall (Jacoby, Toth, & Yonelinas, 1993), fame judgements (Jacoby, Woloshyn, & Kelley, 1989), stem completion (Jacoby, Yonelinas, & Jennings, in press), and perception of briefly flashed stimuli (Jacoby & Kelley, 1992). In each case, the theory leads to the same conclusion, that the familiarity process remains invariant across manipulations that would be expected to affect, and in fact do appear to affect, only the recollection process (Jacoby, Yonelinas, & Jennings, in press).

Two factor theory has also been applied to recognition memory (Jacoby, 1991; Yonelinas, in press). For this task, unlike some other tasks, two factor theory stands in opposition to previously developed models. The global memory models (Gillund & Shiffrin, 1984; Hintzman, 1988; Murdock, 1982; Ratcliff & McKoon, 1988) assume that performance in a recognition memory task typically depends not on two processes, but just one process. The single process is labeled "familiarity." Although Jacoby used the same label for one of the factors of two factor theory, familiarity is defined differently in the two theories. Two factor theory also assumes that the same familiarity process and the same recollection process operate in all memory tasks, whereas global memory models allow the possibility that different processes can operate in different tasks. The aim of the research described in this article was to pit the two factor theory of recognition against one of the global memory models, Gillund and Shiffrin's model SAM (1984). As both an alternative to the new implicit memory proposals and an alternative to the more traditional global memory models, Jacoby's model has the potential to play an important role in understanding retrieval processes, so it is critical to subject the model to the most stringent tests possible. In the sections that follow, we first describe Jacoby's two factor theory and its application to recognition, and then the global memory model SAM (Gillund & Shiffrin, 1984) with which we evaluated two factor theory. We then examine how both models can explain data from experiments that were originally interpreted as support for two factor theory. We also compare the two factor theory to a different two process theory, Atkinson and Juola's familiarity plus search model.

Jacoby's Two Factor Theory of Recognition Memory

The great interest in the proposal of implicit memory systems has come about because of task dissociations: indirect measures can show influences of prior experiences that are not shown by direct measures, and performance on indirect tasks can be affected by different variables than performance on direct tasks. One factor that has made these dissociations exciting to researchers is the linking of performance on indirect tasks to memory without awareness; that is, indirect tasks have appeared to offer the possibility of investigating and perhaps eventually understanding the unconscious.

However, to investigate unconscious processing, it must be separated cleanly from conscious processing. The problem is that task dissociations cannot necessarily accomplish this. As Jacoby pointed out, research using task dissociations to investigate unconscious processes has relied on the assumption that there is a one-to-one mapping between a task and a process; the methods that have been used require that a task be "factor-pure" for the process it is designed to measure (Jacoby, 1991). But indirect tasks do not necessarily provide pure tests of implicit memory; they can be contaminated by subjects' explicit recall of earlier experiences (cf. Richardson-Klavehn & Bjork, 1988; Ratcliff & McKoon, 1994a; 1994b). Moreover, Jacoby argues, performance on direct tests can also be the result of a mixture of conscious and unconscious processes.

In Jacoby's view, unconscious influences in direct tests can take either of two forms. Usually, the effect of unconscious influences is to facilitate performance. In recognition, for example, the unconscious familiarity of a test item can lead to a correct positive response even when conscious attempts to recognize it fail. But unconscious influences can also interfere with performance on direct recollection. In this regard, Jacoby (1991) cites the early Warrington and Weiskrantz (1968) finding that when amnesics were asked to recall or recognize words from a studied list, incorrect responses were often intrusions from earlier lists. Normal subjects' performance can also show decrements from unconscious influences. In a study by Jacoby, Woloshyn, and Kelley (1989), subjects were asked to judge whether or not each name in a list was the name of a famous person. The judgements were preceded by study of some of the nonfamous names. Subjects were told that previously studied names were all nonfamous, so they could use retrieval of names from the studied list to reduce their likelihood of making the error of judging a nonfamous name famous. But when direct recollection was made difficult by adding a concurrent task to be performed during the fame judgements, the familiarity produced by prior study led to an increased probability of judging previously studied nonfamous names to be famous.

If performance on direct tests of memory is influenced by both conscious and unconscious processes, then empirically investigating either or both requires separating their influences on performance. By Jacoby's view, this can never be done through task manipulations because performance always reflects a mixture of conscious and unconscious processes. For instance a concurrent task might limit conscious intentional recollection but not eliminate it completely. Instead, the separation of conscious and unconscious processes must be accomplished theoretically. In Jacoby's two factor theory, it is assumed that there is one conscious intentional recollection process and one unconscious process, and the theory shows how the two combine to produce performance. Simultaneously, the theory provides a method for separating and identifying the contributions of each of the two processes, with the hope of eventually understanding them by examining how they are independently affected by different variables.

The process dissociation method uses a "commonsense approach of measuring intentional control (recollection) as the difference between performance when one is trying to as compared with trying not to engage in some act" of memory retrieval (Jacoby, Toth, & Yonelinas, 1993). In other words, performance on a task for which recollection facilitates the production of some class of responses is compared against performance on a task for which recollection facilitates the suppression of those responses. It is assumed that recollection alone is responsible for the difference between performance in the two cases, so the difference provides a measure of conscious recollection, from which a measure of unconscious familiarity can be calculated through the model explained below.

For recognition, two factor theory can be operationalized in a situation in which subjects study two lists of words (e.g., List 1 and List 2; Yonelinas, in press). In one condition, they are instructed to respond positively to words from List 1, and negatively to words from List 2 and to new words. In a second condition, they are instructed to respond negatively to words from List 1 and to new words and positively to words from List 2. Thus, subjects are to try to respond positively to words from List 1 in the first condition, the "inclusion" condition, and to try not to respond positively to words from List 1 in the second condition, the "exclusion" condition. In the inclusion condition, both the unconscious process, labeled "familiarity", and conscious recollection contribute to the probability of a correct yes response for words from List 1:

P(Include) = P(I) = P(R)+P(F)-P(R)*P(F) = P(R)+P(F)*(1-P(R))          (1)
where P(R) is the probability of successful recollection and P(F) is the probability that the familiarity of a test item is higher than the criterion value that is necessary for a positive response. Both processes also influence performance in the exclusion condition. A yes response in the exclude condition (an incorrect response) is due to familiarity exceeding the positive response criterion when there is a failure of the recollection process:

P(Exclude) = P(E) = P(F)*(1-P(R)).          (2)
The probability of recollection is calculated as the difference between the include and exclude scores (when a subject is trying to respond positively versus when the subject is trying not to respond positively):

P(R) = P(I)-P(E).          (3)

Then familiarity is:

P(F) = P(E)/(1-P(R)).          (4)

These equations are the two factor theory as applied to recognition memory. They are based on the assumption that recollection and familiarity have statistically independent influences on memory. That is, whether an item's familiarity is high or low has no bearing on its likelihood of recollection, and vice versa. An item's familiarity is unchanging across tasks; once encoded, an item's familiarity value is the same in any memory retrieval task. Jacoby, Yonelinas, and Jennings (in press) present support for the independence assumption in contrast to some other possible assumptions that might be made about the relation of the two factors (see Curran & Hintzman, in press).
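As an illustration only, Equations 3 and 4 can be written as a small Python function. The function name and the include/exclude scores used here are hypothetical and are not taken from any of the experiments discussed in this article; the round-trip check simply confirms that the estimates reproduce the inputs through Equations 1 and 2.

```python
def process_dissociation(p_include, p_exclude):
    """Estimate (P(R), P(F)) from include and exclude scores.

    Equation 3: P(R) = P(I) - P(E)
    Equation 4: P(F) = P(E) / (1 - P(R))
    """
    p_r = p_include - p_exclude
    if p_r >= 1.0:
        # Equation 4 divides by (1 - P(R)), so P(F) is undefined here.
        raise ValueError("P(F) is undefined when P(R) equals 1")
    p_f = p_exclude / (1.0 - p_r)
    return p_r, p_f


# Hypothetical scores, chosen only to make the arithmetic easy to follow.
p_r, p_f = process_dissociation(p_include=0.70, p_exclude=0.30)

# Round trip through Equations 1 and 2 recovers the original scores.
p_include_check = p_r + p_f * (1.0 - p_r)
p_exclude_check = p_f * (1.0 - p_r)

print(f"P(R) = {p_r:.3f}, P(F) = {p_f:.3f}")
```

Note that the estimates depend on the independence assumption built into Equations 1 and 2; the function itself cannot check whether that assumption holds for a given data set.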

In addition to the assumptions embodied in these equations, Jacoby (Jacoby, Toth, & Yonelinas, 1993; Yonelinas, in press) adopted signal detection theory as an account of familiarity. Values of familiarity are assumed to be distributed across a continuum from high to low, and subjects are assumed to respond according to whether the familiarity value of a test item is above or below a criterion amount of familiarity. To the standard signal detection theory assumptions, Jacoby and Yonelinas added the assumption that the distributions of previously studied and nonstudied items have equal variance.
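Under the equal variance assumption, a probability of exceeding the criterion can be converted to a distance between distributions in the usual signal detection way. The following is a minimal sketch of that conversion, not a reproduction of Jacoby's or Yonelinas' analysis; the function names and the probabilities are hypothetical.

```python
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def d_prime(p_old, p_new):
    """Equal-variance d': z(P(yes | old)) - z(P(yes | new))."""
    z = _STANDARD_NORMAL.inv_cdf
    return z(p_old) - z(p_new)


def criterion_from_new(p_new):
    """Criterion location in SD units above the new-item mean."""
    return -_STANDARD_NORMAL.inv_cdf(p_new)


# Hypothetical values: P(F) = 0.50 for studied items and a
# positive-response rate of 0.20 for new items.
example_d_prime = d_prime(0.50, 0.20)
example_criterion = criterion_from_new(0.20)
print(f"d' = {example_d_prime:.3f}, criterion = {example_criterion:.3f}")
```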

In summary, the two factor theory has been proposed as a means of dealing with difficult issues that have plagued research on conscious versus unconscious processes, issues recently brought into sharp focus by findings of dissociations between tasks and effects that intuitively seem to involve conscious recollection and tasks and effects that intuitively seem to involve memory without awareness. Jacoby's theory states that two factors, unconscious familiarity and conscious recollection, influence performance independently on all tasks, and the theory provides a method for investigating their separate influences on performance as a means of eventual development of an understanding of both conscious and unconscious processing.

Support for two factor theory comes in two forms. First, when the process dissociation method is applied, it appears to separate the effects of a number of variables into conscious versus unconscious influences in intuitively predictable ways that are invariant across a range of tasks (see Jacoby, Yonelinas, & Jennings, in press, for an overview). For example, the variable of full versus divided attention during test has a large influence on conscious processing but little influence on unconscious processing, and this is true for stem completion, cued recall, recognition, and fame judgements (Jacoby, Yonelinas, & Jennings, in press). Two factor theory itself does not predict which factor should be influenced by dividing attention (independent notions about how attention interacts with unconscious processes do that), but the theory does predict that if dividing attention affects only familiarity in one task, it should affect only familiarity in all tasks, and Jacoby's data are consistent with this prediction. The second form of support for two factor theory is that the measure of unconscious processing that is derived from process dissociation is affected in generally the same ways by experimental manipulations as performance on indirect tests of memory. This would be predicted because indirect tests are thought to rely mostly on unconscious processes.

In this article, the main domain of investigation is recognition memory. Jacoby (1991) found support for the two factor theory for recognition by comparing the contributions to performance of recollection and familiarity as they were estimated from process dissociation (by the equations above) to their contributions as measured directly by manipulation of full versus divided attention. According to the theory, dividing attention at test should severely reduce recollection and give a relatively pure measure of familiarity. This amount of familiarity should match the estimate of familiarity derived for a full attention test from the process dissociation equations and, as predicted, the two values, observed and estimated, were nearly equal (Jacoby, 1991).

Single Process Models for Recognition Memory

The claim of two factor theory that performance in recognition memory reflects exactly two processes contradicts the accounts that current global memory models have put forward for performance in most recognition memory experiments (Gillund & Shiffrin, 1984; Hintzman, 1988; Murdock, 1982). In these models, recognition is typically modeled by a single process. Another process, such as a recall process, could be added, but is not required by the data for most applications. A key difference between the global memory models and two factor theory is two factor theory's assumption that the familiarity process is unchanging between include and exclude conditions. In the global memory models, a single familiarity process can allow retrieval to focus on different kinds of information across experimental conditions.

For the purposes of this article, we exemplify the single process models with Gillund and Shiffrin's SAM model (1984). Like all of the global memory models, SAM assumes that learned items are stored in long-term memory and that a test item is matched against all the items in long-term memory in parallel (hence the label "global"). For SAM, long-term memory stores associative strengths between items in memory and items that might be presented as tests of memory (cues). An item to be learned is encoded into a working memory buffer. While in the buffer, the strength of the item as a possible future test item is increased by strengthening the association between the item as a cue and itself in long-term memory, strengthening the association between the item as a cue and other items in the buffer at the same time, and strengthening the association between the item and the context in which it is learned. The association between the item as a cue and itself in memory is called self-strength, the strength between the item and the other items in the buffer at the same time is called inter-item strength, and the strength between an item and its context is called context strength. An item that is never encoded into the buffer (a "new" item on a recognition test) is assumed to have some pre-experimental, residual strength. The amount of increase in each strength value is a function of the time spent in the buffer. Also, the amount of the increase in each strength value is variable: the value is multiplied by 0.5 with probability 1/3 or it is multiplied by 1.5 with probability 1/3 or it is left unchanged with probability 1/3. This assumption about variability leads to normally distributed familiarity values by the central limit theorem once many strengths are multiplied and summed (Equation 5 below).
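The variability assumption can be illustrated with a short simulation. This is a sketch under stated assumptions rather than the SAM implementation: the function name, the base increment of 1.0, and the number of samples are hypothetical. With the three multipliers equally likely, the expected multiplier is 1 and its variance is 1/6, and the simulation should approximate both.

```python
import random
import statistics

# Multipliers described in the text, each applied with probability 1/3.
MULTIPLIERS = (0.5, 1.0, 1.5)


def noisy_increment(base_increment, rng):
    """Return a strength increment perturbed by one random multiplier."""
    return base_increment * rng.choice(MULTIPLIERS)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [noisy_increment(1.0, rng) for _ in range(30000)]

sample_mean = statistics.fmean(samples)
sample_var = statistics.pvariance(samples)
print(f"mean = {sample_mean:.3f}, variance = {sample_var:.3f}")
```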

Table 1 shows a part of the association structure that might be built when two lists of words are learned. Item 2 from List 1, for example, is encoded with some value of strength (S) between itself as a test item and itself as an item in memory and some other value of strength between itself and other items that were in the buffer at the same time (e.g. Items 1 and 3 in List 1). There is also some value of strength between the list context (C) in which an item was studied and the item in memory. For the two list situation, it is assumed that for an item encoded in one list, there is some small residual context strength between the context of the other list and the item (R), because there is some overlap in general context between the two lists. There is also some residual strength (R) from an item as a test to all items in memory which were not encoded in the buffer at the same time as itself.

INSERT TABLE 1 ABOUT HERE

For recognition, there is a single retrieval process: a test probe is matched against all the items in memory. The probe is made up of a test item and the relevant context(s). The match process produces a global value of familiarity: a value above a criterion leads to a positive response and a value below the criterion leads to a negative response. For the situation in which two lists of words were studied, a test probe is made up of the test item and the contexts of the two lists. The contributions to familiarity are weighted to allow the relative contributions of the two context strengths and the item strength to vary across different experimental situations. If, for example, items from List 1 were to receive a positive response and items from List 2 a negative response, then the context strength for List 1 would be weighted more heavily than the context strength for List 2. The three weights are assumed to sum to one. Familiarity for a test item j with contexts C1 and C2 (for List 1 and List 2) is computed by summing over all items in memory (all i):

F = \sum_{i} S_{C1,i}^{w_1} S_{C2,i}^{w_2} S_{j,i}^{w_3}          (5)
where w_1, w_2, and w_3 are the weights, S_{C1,i} is the strength of the List 1 context to item i, S_{C2,i} is the strength of the List 2 context to item i, and S_{j,i} is the strength of test item j to item i. The computation of familiarity for Item 1 from List 1 is shown with Table 1 where C1=C and C2=R.

The equation for familiarity represents a single process for recognition, a process by which responses are determined by the sum over items in memory of joint multiplicative functions of the strengths of association between the test context and items in memory and the test item and items in memory. Summing over values obtained from the multiplicative function leads, by the central limit theorem, to normally distributed familiarity values. The choice of a positive versus negative response is made by comparing the familiarity value for a test item in context to a criterion.
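The familiarity computation can be sketched in a few lines of Python. The sketch assumes that the weights enter as exponents on the strengths, as in Gillund and Shiffrin's (1984) formulation; the strength values, the weights, and the four-item memory are hypothetical and do not come from Table 1 or from any fitted parameters.

```python
def familiarity(context1, context2, item, w1, w2, w3):
    """Sum over items in memory of the weighted product of strengths.

    context1[i]: strength of the List 1 context to item i
    context2[i]: strength of the List 2 context to item i
    item[i]:     strength of the test item to item i
    Assumption: each weight is an exponent on its strength.
    """
    if abs(w1 + w2 + w3 - 1.0) > 1e-9:
        raise ValueError("the three weights are assumed to sum to one")
    return sum(
        (c1 ** w1) * (c2 ** w2) * (s ** w3)
        for c1, c2, s in zip(context1, context2, item)
    )


# Hypothetical memory: two List 1 items followed by two List 2 items.
C, R = 1.0, 0.1                  # own-list and residual strengths
SELF, INTER = 2.0, 0.5           # self-strength and inter-item strength
context1 = [C, C, R, R]
context2 = [R, R, C, C]
test_item = [SELF, INTER, R, R]  # probe is the first List 1 item

# Shifting weight between the list contexts changes familiarity.
f_include = familiarity(context1, context2, test_item, 0.5, 0.1, 0.4)
f_exclude = familiarity(context1, context2, test_item, 0.1, 0.5, 0.4)
print(f"include: {f_include:.3f}, exclude: {f_exclude:.3f}")
```

With these illustrative values, the same List 1 probe is more familiar when the List 1 context is weighted heavily than when the List 2 context is weighted heavily, which is the property that distinguishes SAM's familiarity from familiarity in two factor theory.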

The single process stands in clear opposition to Jacoby's additive two factor model. An essential point about the difference between the two models is that the single familiarity process in SAM is influenced by list context so that the value of familiarity for a test item in an include condition (P(F)) can be different from its value in an exclude condition (P(F)). In fact, context was originally (Gillund & Shiffrin, 1984) made part of the test probe to allow the recognition process to focus on recently learned items and so it is exactly the mechanism to deal with list discrimination effects. Familiarity as defined in the two factor theory is not dependent on list context; the value of familiarity for a test item is the same in an include test condition as in an exclude test condition.

The Gillund and Shiffrin model, like Hintzman's and Murdock's models, is supported by a wide range of data over a number of experimental variables and tasks. For recognition, SAM successfully accounts for the effects of variables such as study time, list length, encoding context, and word frequency. With the same memory structure but different processing assumptions, it has been applied to recall and cued recall (and see also a related categorization model, Nosofsky, 1988). Other global memory models have been similarly successful with various independent variables in tasks assessing frequency judgment, recency judgment, categorization, serial order, and so on. The global memory models have also been successfully applied to priming phenomena in recognition and lexical decision (Dosher & Rosedale, 1989; McKoon & Ratcliff, 1992; Ratcliff & McKoon, 1988; 1994c). In most cases, the models provide not only qualitative accounts of data but also close quantitative fits to parametric data. While two factor theory can marshal a number of intuitively compelling and interesting dissociations and decompositions of data in its support, the global memory model likewise can marshal a range of successful applications to data and interpretations of empirical findings.

A Single Process Theory Can Account for Include/Exclude Data

Jacoby's use of inclusion versus exclusion conditions in recognition memory experiments has yielded new data against which the global memory models have not been tested. We first addressed the conflict between the two factor theory and single process models by examining whether the single process model SAM can account for data from three inclusion/exclusion experiments. We then turned to a second, critical issue: how to evaluate the process dissociation method if two factor theory and a single process theory can equally well account for inclusion/exclusion data.

Experiments 1 and 3, Yonelinas (in press) and Experiment 1, Yonelinas & Jacoby (1994)

In Yonelinas' Experiment 1, subjects were given two lists of words to study. There were two conditions: At test, subjects were either instructed to respond yes to words from the first list and no to words from the second list (and no to new words) or they were instructed to respond no to words from the first list (and no to new words) and yes to words from the second list. The first list was "included" in the first condition and "excluded" in the second condition, and the second list was "excluded" in the first condition and "included" in the second. Yonelinas asked subjects to make their responses on a six-point confidence judgment scale, but for our purposes, we used the positive-negative split to produce two response categories, grouping high, medium, and low confidence positive responses into one category and high, medium, and low confidence negative responses into the other category. Yonelinas used short lists (10 items each) and long lists (30 items each).

Yonelinas' data are shown in the first two rows of Table 2. The probability of a positive response to items when they were in the include condition is shown in column one, the probability of a positive response to items when they were in the exclude condition is shown in column two, and the probability of a positive response to items that were not from either list is shown in column three. Using the equations given above for process dissociation (Equations 1 to 4), the probabilities of recollection and familiarity can be calculated and these are shown in the remaining columns. These probabilities exhibit the kind of dissociation that has been claimed to support the two factor theory: list length affects recollection but not familiarity. The support for the theory comes only from the existence of a dissociation of the list length effect for familiarity versus recollection. The theory does not specify why list length should affect recollection and why it should not affect familiarity, so the actual form of the dissociation provides no particular support for the theory.

Yonelinas and Jacoby (1994) reported a second experiment in which list length was manipulated. Subjects were given one list of words to study, with the words on the list alternating in presentation modality between visual and auditory. At test, they were instructed to respond positively to words that had been presented in one of the modalities and negatively to words that had been presented in the other modality and to new words. The length of the studied list was either 60 words or 30 words. Results of the experiment are shown in the first two rows of Table 3, along with the probabilities of recollection and familiarity as calculated from the two factor theory. The results again show a dissociation between recollection and familiarity, with list length affecting recollection but not familiarity.

INSERT TABLES 2 AND 3 ABOUT HERE

To apply the SAM model to these data, the size of the encoding buffer was assumed to be four words (the same assumption as was made by Gillund & Shiffrin for study lists in which single words were presented individually). From SAM's assumptions about how strengths are built up during encoding and the equation for the calculation of the global familiarity of a test probe, explicit expressions can be derived for the mean and variance of each of the necessary distributions of familiarity values: the mean and variance for included test items, the mean and variance for excluded test items, and the mean and variance for new items. Because the distributions are approximately normal, standard signal detection theory can be used to compute the probabilities of positive responses in the different conditions. A least-squares minimization routine was used to estimate the values of the parameters of the model that best fit the data.
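The last two steps, turning normal familiarity distributions into response probabilities and choosing parameters by least squares, can be sketched as follows. This is a deliberately reduced illustration: the means, standard deviations, and observed proportions are hypothetical, only the criterion is estimated, and a grid search stands in for the minimization routine, whose details are not given here.

```python
from statistics import NormalDist


def p_yes(mean, sd, criterion):
    """Probability that a normal familiarity value exceeds the criterion."""
    return 1.0 - NormalDist(mean, sd).cdf(criterion)


def sum_squared_error(predicted, observed):
    """Least-squares objective over the response probabilities."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


# Hypothetical (mean, sd) pairs for include, exclude, and new items.
FAMILIARITY = [(2.0, 1.0), (1.2, 1.0), (0.5, 0.8)]
# Hypothetical observed proportions of positive responses.
OBSERVED = [0.70, 0.35, 0.10]


def predict(criterion):
    """Predicted positive-response probabilities for each item type."""
    return [p_yes(mean, sd, criterion) for mean, sd in FAMILIARITY]


# Grid search over the criterion only (an assumption of this sketch).
grid = [i / 100 for i in range(301)]
best_criterion = min(
    grid, key=lambda c: sum_squared_error(predict(c), OBSERVED)
)
best_fit = predict(best_criterion)
print(best_criterion, [round(p, 3) for p in best_fit])
```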

Tables 2 and 3 show that SAM fits the include/exclude data well. The differences between the real and estimated data are within the bounds of experimental error. The parameter values used to produce the estimated data are listed in the tables. The parameter that varies to fit the include versus exclude conditions is the weight assigned to each list context's contribution to familiarity; the weight given to the strength for a list context is high if the items from the list are to be "included", and the weight is low if items from the list are to be "excluded." Unlike the two factor theory, the familiarity value of a test item is not constant across these two conditions; instead, it is a function of the include versus exclude task requirements as represented in the model by a change in context weighting. In SAM, the single retrieval process focuses on different information in the include versus exclude conditions, in contrast to two factor theory which uses different contributions from two processes in the include versus exclude conditions.

The presence of both included and excluded items in the same test list gives less freedom to SAM in fitting the data than would be the case for other include/exclude paradigms. The only parameter free to vary to account for list length is the familiarity criterion (see Gillund & Shiffrin, 1984). Because there are more studied items in memory contributing to familiarity for a test item from a long list than a test item from a short list, familiarity is higher on average for items from a long list and familiarity is more variable for items from a long list. This is true for both old test items and new test items. It is true for new test items because they are matched against a larger number of studied items from a long list than from a short list (see Gillund & Shiffrin, 1984). Because of the higher and more variable familiarity values for both old and new test items, the criterion familiarity value is higher for long than short lists.
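The claim that familiarity is both higher and more variable for longer lists can be checked with a simplified simulation. The sketch below is an assumption-laden reduction of SAM: a new test item's familiarity is treated as a plain sum of noisy residual strengths, one per studied item, with context strengths and weights omitted. The residual value, trial count, and function names are hypothetical; the memory sizes of 20 and 60 correspond to two short lists of 10 items and two long lists of 30 items.

```python
import random
import statistics

MULTIPLIERS = (0.5, 1.0, 1.5)  # each applied with probability 1/3


def new_item_familiarity(n_studied, residual, rng):
    """Sum of noisy residual strengths to every studied item."""
    return sum(residual * rng.choice(MULTIPLIERS) for _ in range(n_studied))


def summarize(n_studied, trials=2000, residual=0.1, seed=1):
    """Mean and variance of simulated familiarity for new test items."""
    rng = random.Random(seed)
    values = [
        new_item_familiarity(n_studied, residual, rng) for _ in range(trials)
    ]
    return statistics.fmean(values), statistics.pvariance(values)


short_mean, short_var = summarize(n_studied=20)
long_mean, long_var = summarize(n_studied=60)
print(f"short lists: mean = {short_mean:.3f}, variance = {short_var:.4f}")
print(f"long lists:  mean = {long_mean:.3f}, variance = {long_var:.4f}")
```

Because both the mean and the variance grow with the number of items in memory, a fixed criterion would produce more positive responses to new items after long lists, which is why the criterion is allowed to move with list length.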

The relative values of the other parameters (the learning parameters) are typical of other SAM fits (Gillund & Shiffrin, 1984): self-strength is higher than inter-item strength which in turn is higher than residual strength, and context strength is higher for the list in which an item was learned than residual context strength for the other list.

To provide generality to other experimental variables, SAM was also fit to the data from Yonelinas' Experiment 3 which used a strength manipulation (as opposed to the length manipulation in the prior two experiments). In this experiment, there were again two lists of words. The words were studied in pairs, either for 1 s or for 3 s, with study time a within-list variable. At test, subjects were instructed as in Experiment 1, either to "exclude" the first list or to "exclude" the second list. The data are shown in Table 4, along with the probabilities of recollection and familiarity derived by the process dissociation method.

SAM was fit to these data in the same way as for Yonelinas' Experiment 1, except that the encoding buffer was assumed to hold two words (i.e. one pair) at a time instead of four words. The estimated data in Table 4 show that again SAM fits the data well. The include versus exclude difference comes from shifting weight from one list context to the other, as with the other experiments. The difference between strongly and weakly encoded items (long and short study time) comes about because the encoding strength values (self, interitem, and context) are multiplied by study time multiplied by 0.41. The scaling factor, 0.41, allows the model to match the empirically observed rate at which d' increases with study time (see Shiffrin, Ratcliff, & Clark, 1990).
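The study time scaling amounts to a single multiplication, shown here as a minimal sketch. The base strength of 1.0 and the function name are hypothetical; only the scaling factor of 0.41 and the 1 s and 3 s study times come from the text.

```python
STUDY_TIME_SCALE = 0.41  # scaling factor reported in the text


def scaled_strength(base_strength, study_time_s):
    """Encoding strength after scaling by study time (in seconds)."""
    return base_strength * study_time_s * STUDY_TIME_SCALE


# Hypothetical base strength of 1.0 for the two study times.
weak = scaled_strength(1.0, 1.0)
strong = scaled_strength(1.0, 3.0)
print(f"1 s: {weak:.2f}, 3 s: {strong:.2f}")
```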

INSERT TABLE 4 ABOUT HERE

It should be pointed out that, for Yonelinas' Experiments 1 and 3 and Yonelinas and Jacoby's Experiment 1, it is relatively easy for SAM to fit the data because there are relatively few constraints on the model. A more rigorous test of SAM would require fits to a number of different study/test conditions simultaneously. However, our purpose here is the simple demonstration that a single process model can equally well account for some of the data that have been used to support the two factor theory. SAM predicts the qualitative differences in performance for long versus short lists, for long versus short study times, and for list discrimination instructions, and it can fit the effects quantitatively. Two factor theory predicts that its two processes will sometimes dissociate, but not whether (or how) they will do so for list length or study time.

The Process Dissociation Method

The implication of SAM's success in accounting for Yonelinas and Jacoby's data is that the process dissociation method could be applied equally well to predicted data from SAM as to real data from subjects. Even though the predicted data points were generated from a single process, the process dissociation method would still provide estimates of the contributions of two processes. The method gives no way to tell whether data were produced by two processes or a single process.

When the process dissociation method was applied to the data points generated from SAM, the resulting estimates of familiarity and recollection were almost the same as when the method was applied to the real data (see Tables 2, 3, and 4). The two sets of estimates have to be almost identical because SAM's fits to the data were so close. But the estimates, and their interpretation, are valid only under two factor theory, not under SAM. According to two factor theory, the results of Yonelinas' Experiment 1 and Yonelinas and Jacoby's Experiment 1 show that list length affects recollection, a conscious process, but not familiarity, an unconscious process. In SAM, the results are interpreted to show that list length does affect SAM's familiarity process. What is learned about conscious versus unconscious processes in recognition depends on accepting two factor theory. For Yonelinas' Experiment 3, two factor theory and SAM both interpret the results to show that strength of encoding affects familiarity, but two factor theory also has strength affecting recollection. Again, what is learned about retrieval depends on the choice of model.

When a Single Familiarity Process Is Not Enough

SAM can account for the study time effects and list length effects in Yonelinas and Jacoby's experiments, and Gillund and Shiffrin (1984) have shown that the model can account for the effects of these and other variables simultaneously. The focussing mechanism of weighted contexts allows the model to explain list discrimination data (e.g. Anderson & Bower, 1972) and the same mechanism allows it to explain the data from the include versus exclude conditions. But the global match process by which SAM calculates familiarity would never be expected to apply to all memory retrieval situations. Free recall, for example, is modeled with a different process, a repeated sampling and recovery process (Gillund & Shiffrin, 1984; Raaijmakers & Shiffrin, 1981). There should also be situations in which more than one process is required to explain performance. A candidate for such a situation is provided by another include/exclude experiment by Jacoby (Experiment 3, Jacoby, 1991).

In Jacoby's experiment, subjects heard the words of one list, and in another list, they read some of the words and they were asked to solve anagrams to produce some of the words for themselves. In the include condition, subjects were asked to respond positively to all of the studied words. In the exclude condition, they were asked to respond positively only to the words that they had heard; they were warned to respond negatively both to words that were studied in their normal form (the "read" words) and to words that were presented as anagrams.

Table 5 shows Jacoby's data. The difference between the probabilities of responding positively in the include and exclude conditions is much larger for the anagram words than the read words, in accord with the expectation that the extra work required for the anagrams at study would lead to better memory at test. The table also shows estimates of familiarity and recollection derived from process dissociation. Jacoby assumed that the difference between the include and exclude conditions was a measure of the probability of recollecting the anagram and read words (Equation 3). He also assumed that familiarity sometimes led subjects to respond positively to read and anagram words when they were supposed to be excluded, so that familiarity could be calculated from Equation 4. The probability of recollecting an anagram word, as derived from process dissociation, is much higher than the probability of recollecting a read word, and familiarity is also higher for an anagram than a read word.
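
As a worked illustration, the process dissociation estimates for the anagram words can be computed from the include and exclude probabilities reported for Jacoby's experiment (.80 and .29). The sketch assumes that Equations 3 and 4 take the standard process dissociation form, recollection = P(I) - P(E) and familiarity = P(E)/(1 - recollection); the function name is hypothetical.

```python
def process_dissociation(p_include, p_exclude):
    """Estimate recollection and familiarity from include/exclude 'yes' rates.

    Assumed form (standard process dissociation equations):
        recollection = P(I) - P(E)
        familiarity  = P(E) / (1 - recollection)
    """
    recollection = p_include - p_exclude
    if recollection >= 1.0:
        raise ValueError("familiarity is undefined when recollection equals 1")
    familiarity = p_exclude / (1.0 - recollection)
    return recollection, familiarity

# Anagram test words in Jacoby's experiment: P(I) = .80, P(E) = .29.
r_anagram, f_anagram = process_dissociation(0.80, 0.29)
print(round(r_anagram, 2), round(f_anagram, 2))
```

These estimates are meaningful only if the independence assumptions of two factor theory hold; the same arithmetic can be applied to any pair of include and exclude probabilities, whatever process generated them.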

INSERT TABLE 5 ABOUT HERE

The question we addressed was whether SAM could account for the data from all the conditions (include versus exclude for read, heard, and anagram test words). We suspected that it could not, based on the intuition that there might be some recall contributing to performance for anagram test words. To address the question, we first found parameters that allowed SAM to fit the data for the words that were read and the words that were heard. We started with the read and heard words because we thought that, if more than SAM's single process were needed to fit all the data, it would most likely be needed for the anagram test words.

To model the read and heard test words in the include and exclude conditions, the encoding parameters were kept constant except that self-strength was allowed to be different for the two kinds of words (it might also be reasonable to allow interitem strength to vary but it was unnecessary because SAM's fit was exact without this; also, decoding anagrams probably suppressed interitem rehearsal). The criterion value of familiarity for dividing positive from negative test responses was different for the include and exclude conditions because the include and exclude items were presented in different test lists. The context weights were set to differentiate the lists: items from the read and anagram list required positive responses in the include condition and negative responses in the exclude condition, so the weighting of the read/anagram list context had to be high in the include condition and lower in the exclude condition. The weighting of the heard context had, correspondingly, to vary in the opposite way, higher when read/anagram words were excluded than when they were included. The parameter values are shown in Table 5, and they are reasonable compared with fits of SAM to other data. The predicted probabilities of positive responses for read, heard, and new words exactly match the data.

It is important to note, once again, the difference between SAM's account of the include versus exclude conditions and two factor theory's account. In SAM, the familiarity of a test word in the include condition is different than in the exclude condition (F_I for read items is different from F_E for read items, and F_I for heard items is different from F_E for heard items, where the subscripts I and E denote the include and exclude conditions). In two factor theory, probability of a yes response based on familiarity is the same in the include and exclude conditions.

Given that SAM can accommodate the data for the read and the heard test words, the question was whether it could simultaneously accommodate the data for the anagram test words. In Jacoby's experiment, all four kinds of test words were mixed within a test list, so there was no way for subjects to change the positive/negative criterion from one kind of test word to another. Also, the weights for the list contexts could not be different for the read words than the anagram words because they were studied in the same list. Thus, all the test parameters were fixed. In addition, the interitem strength parameter could not vary between read versus anagram words, again because they were studied in the same list. The only parameter free to vary between the anagram and read words was the self-strength parameter.

To find out whether there was a value of anagram self-strength that could fit the data, it was varied over the range shown in Figure 1. The figure shows how the probability of a positive response to an anagram test item varies as a function of anagram self-strength. The figure displays the probability that the familiarity value is above the criterion for a positive response in the include condition (P(F_I)) and in the exclude condition (P(F_E)). The figure also gives the empirical probabilities of a positive response in the include and exclude conditions (.80 and .29). As the figure shows, there is no value of anagram self-strength that allows SAM to predict the empirical values. SAM cannot simultaneously accommodate the read, heard, anagram, and new test items in the include and exclude conditions with its single familiarity process. In terms of the SAM model, it must be that the anagram manipulation does more than just change the familiarity of the anagram versus read items.

One possibility is that anagram test words evoke a recall process (Raaijmakers & Shiffrin, 1981) in addition to the familiarity process. The recall process in SAM is defined differently than the recollection process in two factor theory; SAM's recall was defined by Raaijmakers and Shiffrin in a specific and detailed way that allowed the SAM model to account for a number of aspects of recall data. For the anagram test words, the recall process could be evoked either in both the include and exclude conditions or only in the exclude condition. It might be reasonable to assume recall for both include and exclude because solving anagrams would make them very strongly encoded. On the other hand, using recall only in the exclude condition might make sense because it is only in the exclude condition that words from one list are to be distinguished from words in the other list; in the include condition, all highly familiar test words from either list are to be given a positive response. We examined the consequences of adding recall to the exclude condition alone and to both conditions, and the results are presented in the following section.

The goal was to evaluate what can be learned about retrieval processing from SAM versus what can be learned from two factor theory. The question is not whether SAM with its two processes can accommodate the data, because that will be guaranteed by the flexibility gained from adding a second process. Rather the question is whether the conclusions that can be drawn about retrieval are the same for the two models.

Estimating Recall from SAM:
Recall of anagram test words in both the include and exclude conditions

One of the ways SAM could accommodate the data for the anagram test words in addition to the read and heard test words is to assume that recall is used for the anagram test words in both the include and exclude conditions equally. We can use the difference between SAM's best predictions based on familiarity alone and the empirical data to provide an estimate of what the recall process must contribute to performance.

With these assumptions, the probability of a positive response to an anagram test word in the include condition is given by:

P(I) = P(R) + P(F_I) - P(R)P(F_I),          (6)
where P(R) is the probability of a positive response based on recall and P(F_I) is the probability in the include condition that the familiarity value exceeds the criterion for a positive response.

In the exclude condition, the probability of a positive response to an anagram test word is given by:

P(E) = P(F_E) - P(R)P(F_E),          (7)
where P(F_E) is the probability in the exclude condition that the familiarity value exceeds the criterion for a positive response.

In essence, recall adds to the probability of a positive response in the include condition and subtracts from the probability of a positive response in the exclude condition, in order to make up the difference between the data and the probabilities of positive responses based on familiarity alone.

Solving for P(R),

P(R) = (P(F_E) - P(E))/P(F_E),          (8)
and eliminating P(R) from the above equations,

P(E)/(1 - P(I)) = P(F_E)/(1 - P(F_I)).          (9)
Equation 9 shows that the ratio of the familiarity values for anagram test items in the exclude versus include conditions is fixed by the experimental data (P(E), .29, and P(I), .80 in Jacoby's experiment fix the ratio in Equation 9 to be 1.45), and this in turn determines the self-strength parameter for anagrams (because it is the only parameter free to vary for anagram familiarity values). Across the possible values of the self-strength parameter (Figure 1), the only value of self-strength that produces the correct ratio is 4.91 (where P(F_I) is .68 and P(F_E) is .46). Using these values in Equation 8 yields a value for P(R) of .37.
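
The arithmetic behind Equations 6 through 9 can be checked with a short sketch. It uses only the values quoted above (P(I) = .80, P(E) = .29, include familiarity .68, exclude familiarity .46); the function and variable names are hypothetical, and the familiarity values are the rounded ones reported in the text, so the agreement is approximate.

```python
def recall_from_exclude(p_fam_exclude, p_exclude):
    """Equation 8: recall probability implied by the exclude condition."""
    return (p_fam_exclude - p_exclude) / p_fam_exclude

def predicted_include(p_recall, p_fam_include):
    """Equation 6: positive response from recall or from familiarity."""
    return p_recall + p_fam_include - p_recall * p_fam_include

def predicted_exclude(p_recall, p_fam_exclude):
    """Equation 7: positive response from familiarity when recall fails."""
    return p_fam_exclude - p_recall * p_fam_exclude

P_I, P_E = 0.80, 0.29                    # empirical values for anagram words
P_F_INCLUDE, P_F_EXCLUDE = 0.68, 0.46    # familiarity values at self-strength 4.91

data_ratio = P_E / (1 - P_I)                     # left side of Equation 9
model_ratio = P_F_EXCLUDE / (1 - P_F_INCLUDE)    # right side of Equation 9
p_recall = recall_from_exclude(P_F_EXCLUDE, P_E)

print("ratio from data: ", data_ratio)
print("ratio from model:", model_ratio)
print("P(R):            ", p_recall)
print("predicted P(I):  ", predicted_include(p_recall, P_F_INCLUDE))
print("predicted P(E):  ", predicted_exclude(p_recall, P_F_EXCLUDE))
```

With the rounded familiarity values the model ratio is about 1.44, close to the 1.45 required by the data, and the recovered recall probability is about .37.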

With the assumption of recall contributing to performance equally for anagram test words in both the include and exclude conditions, SAM estimates the probability of recall at .37. This account contrasts with the two factor theory account. For SAM, familiarity provides the only basis for responses for read and heard test words, and recall adds to familiarity for anagram test words. For two factor theory, responses for all test words are based on both familiarity and recollection.

It turned out that the probability of extra information contributing to performance for anagram test words estimated by SAM (P(R)=0.37) was about the same as the probability of extra recollection for anagram test words over read test words in two factor theory. In two factor theory, the difference in the probability of recollection for anagram versus read test words was 0.40 (see Table 5). However, the two accounts will not always be consistent in this way. Figure 1 shows that the amount of extra information needed from a recall process will vary as a function of the familiarity values in the include versus exclude conditions. At low values of anagram self strength, a considerable amount of information must be added from recall to increase the probability of a yes response in the include condition and decrease the probability of a yes response in the exclude condition. But at high values of self-strength, the amount that must be added by recall is less (because the familiarity include and exclude scores diverge). So it is not necessarily the case that SAM, under the assumptions outlined above, would estimate the same contribution from recall to differentiate anagram from read test words as two factor theory.

INSERT FIGURES 1 AND 2 ABOUT HERE

We used simulations to examine the generality of this lack of equivalence across a range of possible empirical results. The probabilities of positive responses for read test words in the include and exclude conditions were fixed at the values obtained in Jacoby's experiment. The probabilities of positive responses for anagram test words in the include and exclude conditions were systematically varied (from their real values of P(I)=0.80 and P(E)=0.29) to simulate different levels of recall. From these probabilities, we used process dissociation to calculate the difference in recollection for read versus anagram test words, and for SAM, we calculated the contribution from recall (i.e., what is not accounted for by familiarity) for anagram test words. For process dissociation, the difference is simply the estimate of recollection for anagrams minus the estimate of recollection for read words (P(R_anagram) - P(R_read)). For SAM, the probability of recall is defined by Equation 8: the probability of recall depends on familiarity in the exclude condition, and familiarity in the exclude condition must be in a fixed ratio with familiarity in the include condition, a ratio determined by the probabilities of positive responses in the include and exclude conditions (see Equation 9). For the simulations, we used three different values of that ratio, which spanned a range around the real data value from Jacoby's experiment, 1.45, a range that might be obtained from experiments. Figure 2 shows the results of the simulations, with the difference in recollection for the read versus anagram conditions calculated from process dissociation plotted against recall calculated from SAM, for the three different ratios. The large X shows the point for the SAM fit to the data described above, with recall from SAM calculated at 0.37 and the difference in recollection from two factor theory calculated at 0.40.
The results in the figure show that the estimates of the two theories often match, but also that there are significant mismatches. For example, for the ratio 0.72 (line 1) the miss is 8% at the upper right point and for the ratio 2.0 (line 3) the miss is 9% in the bottom left corner. The difference in recollection for read versus anagram test words given by two factor theory need not match the amount of recall that must be added to SAM's single familiarity process to account for the anagram data.

Estimating Recall from SAM:
Recall of anagram test words in only the exclude condition

Because all highly familiar test words should be given a positive response in the include condition, it might be reasonable for subjects to adopt a strategy of attempting recall only for the exclude condition. In the exclude condition, the instruction is to respond positively only for words that were heard. Jacoby (1991) assumed that subjects do not rely entirely on recollection to do this, that they still respond positively to highly familiar words when recollection fails (see Equation 1; see also discussion by Curran & Hintzman, in press). We followed that assumption for the exclude condition here.

The probability of a positive response for an anagram test word in the inclusion condition is simply

P(I) = P(F_I).
The probability of a positive response for an anagram in the exclude condition is the same as Equation 7,

P(E) = P(F_E) - P(R)P(F_E).

The probability of recollection can be estimated from Equation 8, where the value of P(F_E) is obtained from Figure 1. In Figure 1, the function for inclusion familiarity P(F_I) reaches the value 0.8 (i.e., P(F_I) = P(I) = 0.8 from the data) at the point where exclusion familiarity P(F_E) = .533. With P(E) = .29 (from the data), P(R) is estimated to be .456, using Equation 8. This is about 15% higher than the value of P(R) estimated from the process dissociation equations.
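
The exclude-only estimate can be reproduced in a few lines. The sketch assumes the values quoted above (exclusion familiarity of .533 read from Figure 1, and P(E) = .29); the variable names are hypothetical.

```python
# Recall applied only in the exclude condition.
# Include condition: P(I) equals the include familiarity, so no recall term.
# Exclude condition: P(E) = P(F_E) - P(R) * P(F_E), solved for P(R) (Equation 8).

P_E = 0.29            # empirical exclude probability for anagram words
P_F_EXCLUDE = 0.533   # exclusion familiarity where inclusion familiarity equals .80

p_recall_exclude_only = (P_F_EXCLUDE - P_E) / P_F_EXCLUDE

# Check: substituting the estimate back into Equation 7 recovers P(E).
p_exclude_check = P_F_EXCLUDE - p_recall_exclude_only * P_F_EXCLUDE

print(round(p_recall_exclude_only, 3), round(p_exclude_check, 2))
```

The estimate is larger than the one obtained when recall is assumed in both conditions, because here the whole gap between exclusion familiarity and the observed exclude probability must be closed by recall alone.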

The two ways that we have discussed of adding a recall process to SAM's account of Jacoby's include/exclude data do not, of course, exhaust all possibilities. For example, there might be recall in both the include and exclude conditions for anagram test words but the probability of its success might be different in the two conditions instead of the same as we assumed above. There might be a recall process operating for the heard and read test words, instead of just the anagram test words, especially in the exclude condition. Our goal was not to provide a definitive model for Jacoby's data. (The data do not provide enough constraints to do that for the SAM model; more comprehensive sets of data would be required for complete model testing). Our concern was to show that there exist plausible explanations of the data that are different from two factor theory's explanation, and that what is learned about retrieval processes is different under the different theories.

This general conclusion is the same as was reached for the Yonelinas and Jacoby experiments for which SAM's single familiarity process was a sufficient account of the data. According to two factor theory, performance in recognition memory always involves exactly two processes, whereas in SAM performance can often be explained as the outcome of just one process. The variables that affect SAM's single familiarity process are different than those that affect two factor theory's familiarity process. When a recall process is added to SAM, it is understood quite differently from recollection in two factor theory. Recollection is assumed to participate in all retrieval processes whereas the recall process in SAM can be added in different ways to model different test conditions. As before, the picture given by two factor theory of conscious versus unconscious retrieval is not the same picture that would be given of retrieval processing by SAM.

Two Factor Theory Compared to other Two Process Models

The often compelling intuition that the retrieval of information from memory involves two processes, even for recognition tasks, is not new (Atkinson & Juola, 1973; Jacoby & Dallas, 1981; Mandler, 1980). For example, Mandler (1980) postulated a familiarity process and a recollection process, and proposed that familiarity was a fast retrieval process running in parallel with the slower recollection retrieval process. Mandler's model was applied to explain a range of recognition and recall data and the hypotheses about the time course of the two processes have also been tested (e.g., Mandler & Boeck, 1974). The model would apply to data from Jacoby's include versus exclude manipulation in the same way as two factor theory because it makes the same assumption about the independence of the two processes as two factor theory.

A different two process model was developed by Atkinson and Juola (1973) for recognition. In their model, there are two processes of retrieval but both processes are not always executed. If the familiarity of a test item is above some criterion value or below some second criterion value, then a response is made directly. The second process, a "search" process, is initiated only if the value of familiarity falls between the two criteria. This model was successfully applied across a range of reaction time data (Atkinson, Herrmann & Westcourt, 1974; Atkinson & Juola, 1973).

Both the Mandler and the Atkinson and Juola models have been explicitly tested and it has been argued that data do not in general support their assumptions about two processes (e.g., Gillund & Shiffrin, 1984; Monsell, 1978). The issue of concern here is whether the process dissociation method is compatible with the general assumption of two retrieval processes in recognition or limited to the two factor theory. More specifically, the question is whether different two process models that fit the data equally well will produce the same or different estimates of the contributions of the two processes, when the models are applied to data from include/exclude recognition experiments. To address this question, we followed the same logic as with comparisons of SAM and the two factor theory: We fit the Atkinson and Juola model to include/exclude data and then compared the parameter estimates from the Atkinson and Juola model to the parameter estimates from process dissociation.

The Atkinson and Juola Model Tested Against Include/Exclude Data

The data chosen to model were those from Yonelinas' Experiment 1, shown in Table 6. The first step in the analysis of the Atkinson and Juola model was to find parameters of the model that would fit Yonelinas' data. The model was originally intended to deal with retrieval of words from lists that were so highly memorized that the search process would always give perfect performance. That was not the case in Yonelinas' experiment, and so we assumed some lesser degree of learning. We assumed that study led to a higher degree of learning for words from short lists than long lists, so that the distributions of values of familiarity used by the familiarity retrieval process were ordered, with the mean of the new word distribution set at zero, the mean of the distribution for words from long lists above zero, and the mean of the distribution for words from short lists farther above zero. These distributions were assumed to be normal, each with associated variance.

At test, the familiarity retrieval process determines whether the familiarity of a test word is above the positive criterion or below the negative criterion, and if it is, then a response is executed. If familiarity is between the two criteria, then in the original model the result of the search process determined the response (always accurate). In our application, we assumed the search process would not always succeed. We added two parameters: p, the probability that the search process successfully finds a word in a studied list, and q, the probability of a positive response if the search fails (see Atkinson, Herrmann, & Westcourt, 1974, p. 113, footnote 5).

In the include condition, a positive response can come about if the familiarity of a test item is above the positive criterion, or if it is between the two criteria and the search process is successful or there is a positive guess:

P(I) = P(F>Chigh) + (p+(1-p)q) P(Clow<F<Chigh).          (10)

In the exclude condition, a positive response can come about if the familiarity of a test item is above the positive criterion, or if it is between the two criteria and the search process fails and there is a positive guess:

P(E) = P(F>Chigh) + (1-p)q P(Clow<F<Chigh).          (11)

The number of parameters is greater than the number of data points, and the model can easily fit the data (the model was designed to deal with reaction time data in addition to the accuracy data considered here). In fitting the model, we discovered that the guessing process could trade off against the familiarity process so that the same level of performance could be obtained from a few positive responses due to high familiarity and a high positive guessing rate, or many positive responses due to high familiarity and a lower guessing rate. To illustrate this, we fit the model to the data twice. Table 6 shows the results of the two different fits, and Table 7 shows the values of the parameters that were used to produce the fits.
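
The trade-off between familiarity and guessing can be seen directly from Equations 10 and 11. The sketch below evaluates the two equations for two hypothetical parameter sets; the probability values are invented for illustration and are not the parameters reported in Table 7.

```python
def aj_predictions(p_high, p_middle, p_search, q_guess):
    """Include and exclude 'yes' probabilities from Equations 10 and 11.

    p_high   : P(F > Chigh), familiarity above the positive criterion
    p_middle : P(Clow < F < Chigh), familiarity between the two criteria
    p_search : p, probability that the search succeeds
    q_guess  : q, probability of a positive guess when the search fails
    """
    p_include = p_high + (p_search + (1 - p_search) * q_guess) * p_middle
    p_exclude = p_high + (1 - p_search) * q_guess * p_middle
    return p_include, p_exclude

# Hypothetical fit A: more responses from high familiarity, less guessing.
fit_a = aj_predictions(p_high=0.20, p_middle=0.50, p_search=0.60, q_guess=0.20)
# Hypothetical fit B: fewer responses from high familiarity, more guessing.
fit_b = aj_predictions(p_high=0.10, p_middle=0.50, p_search=0.60, q_guess=0.70)

print("fit A:", fit_a)
print("fit B:", fit_b)
```

Both parameter sets give the same include and exclude probabilities, so accuracy data alone cannot separate the contribution of familiarity from that of guessing in this model.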

INSERT TABLES 6 AND 7 ABOUT HERE

The first conclusion is that a different two process theory can fit the include/exclude data as well as Jacoby's two factor theory can. The second issue is how the process dissociation method fares in light of this mimicking. Process dissociation produces the estimates of the contributions of familiarity and recollection shown in Table 6 for the Jacoby model. Are these also accurate estimates of familiarity and the search process for the Atkinson and Juola model? From Equations 10 and 11 above, we can derive estimates of familiarity and search directly from the Atkinson and Juola model and compare them to the estimates derived via process dissociation. The probability of a yes response based on the search process is the probability of executing a search multiplied by the probability of the search being successful:

P(S) = p P(Clow<F<Chigh)

= P(I) - P(E).
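
The identity between the search contribution and the include minus exclude difference can be verified numerically. The sketch assumes a normal familiarity distribution, as in our application of the model; the mean, standard deviation, criteria, and values of p and q are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

def aj_normal(mu, sigma, c_low, c_high, p_search, q_guess):
    """Equations 10 and 11 with normally distributed familiarity.

    Returns (P(I), P(E), P(S)), where P(S) = p * P(Clow < F < Chigh).
    """
    familiarity = NormalDist(mu, sigma)
    p_high = 1 - familiarity.cdf(c_high)                        # P(F > Chigh)
    p_middle = familiarity.cdf(c_high) - familiarity.cdf(c_low) # P(Clow < F < Chigh)
    p_include = p_high + (p_search + (1 - p_search) * q_guess) * p_middle
    p_exclude = p_high + (1 - p_search) * q_guess * p_middle
    p_from_search = p_search * p_middle
    return p_include, p_exclude, p_from_search

# Hypothetical parameter values for an old-item distribution.
p_i, p_e, p_s = aj_normal(mu=1.0, sigma=1.0, c_low=0.0, c_high=2.0,
                          p_search=0.6, q_guess=0.3)
print(p_i, p_e, p_s, p_i - p_e)
```

Whatever the familiarity and guessing parameters, the difference P(I) - P(E) equals the search contribution, which is why the search estimate coincides with the recollection estimate from process dissociation while the familiarity estimates need not.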

For both fits of the model to the data, the estimate of the contribution of the search process for the Atkinson and Juola model is the same as the estimate of the contribution of the recollection process for two factor theory. But the estimates of familiarity derived from the two models are different. In Jacoby's model, the unconscious familiarity process is not affected by list length. In the Atkinson and Juola model, familiarity is affected by list length, in either direction: in the first fit, familiarity is greater for a long list than a short list and in the other fit, it is less. This occurs because of the trading off mentioned above between the different components of the model, search, familiarity, and guessing. In the two fits we present, familiarity and guessing trade off against each other. This is not a positive aspect of the fits of the model to data, but it is to be expected when a limited range of data is modeled relative to the range of data for which the model was designed.

The conclusion offered by these simulations is that the include/exclude data do not support estimates of two components that are the same for all two process models. This is similar to the situation when two factor theory was compared to SAM. The picture given of conscious versus unconscious processing is different when it is drawn from Atkinson and Juola's model than when it is drawn from two factor theory.

Predicting the Slopes of Z-ROC Functions from Two Factor Theory

In this section, we investigate the predictions of two factor theory for the shapes of z-ROC curves for recognition memory and test those predictions against the data from an include/exclude experiment. It has become clear from recent research that the global memory models are inconsistent with the slopes of the z-ROC curves obtained in recognition memory experiments. If it turned out that two factor theory was consistent with the slopes, then this would constitute major support for that theory.

The problem with z-ROC curves for the global memory models arises because of the models' assumptions about the relative variability in familiarity values for old versus new test items. Empirically, the ratio of the standard deviation of new item familiarity values to the standard deviation of old item familiarity values can be obtained from signal detection theory using confidence judgment data. This is done by plotting the z-transforms of the hit and false alarm rates against each other for each level of confidence to produce a z-ROC curve. For the global memory models, the underlying distributions of familiarity values are normal (either directly or by the central limit theorem applied to sums of discrete values), so the slope of the z-ROC is the ratio of the new item standard deviation to the old item standard deviation, σ_new/σ_old. The data presented by Ratcliff et al. (1992) showed a roughly straight line z-ROC function with slope of about 0.8 for both weakly encoded items and strongly encoded items. The constant value of the slope across different strengths of encoding is difficult if not impossible for the current global memory models to accommodate. For example, SAM predicts that the standard deviation of old item familiarity should increase relative to the standard deviation of new item familiarity as a function of overall level of familiarity. The predicted increase comes from the way variability of encoding is introduced into the model. Increasing the mean value of strength that results from encoding (e.g. by increasing study time) increases the variance in the encoded strength values. This assumption lies at the heart of the model; changing this assumption to fit the z-ROC data would be tantamount to proposing a new model requiring new fits to all experimental data.
The difficulties presented by the z-ROC data are similarly critical for the other global memory models (Hintzman's model, 1988, also predicts that the standard deviation for old item familiarity increases with strength, and Murdock's model, 1982, predicts almost equal standard deviations for old and new item familiarity values, see Ratcliff et al., 1992).
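
The relation between the z-ROC slope and the ratio of standard deviations can be illustrated with a short sketch. It assumes normal familiarity distributions for old and new items and a set of confidence criteria; the distribution parameters and criterion values are hypothetical, chosen so that the ratio of new to old standard deviations is 0.8.

```python
from statistics import NormalDist

def zroc_points(mu_old, sd_old, criteria, mu_new=0.0, sd_new=1.0):
    """Return (z(false alarm rate), z(hit rate)) for each confidence criterion."""
    standard = NormalDist()
    old_items = NormalDist(mu_old, sd_old)
    new_items = NormalDist(mu_new, sd_new)
    points = []
    for c in criteria:
        hit_rate = 1 - old_items.cdf(c)          # old items above the criterion
        false_alarm_rate = 1 - new_items.cdf(c)  # new items above the criterion
        points.append((standard.inv_cdf(false_alarm_rate),
                       standard.inv_cdf(hit_rate)))
    return points

def least_squares_slope(points):
    """Slope of the best-fitting straight line through the points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx

criteria = [-0.5, 0.0, 0.5, 1.0, 1.5]   # hypothetical confidence criteria
slope = least_squares_slope(zroc_points(mu_old=1.0, sd_old=1.25, criteria=criteria))
print(slope)   # ratio of new to old standard deviations, 1.0 / 1.25
```

Any model in which the old item standard deviation grows with strength therefore predicts that the z-ROC slope falls as strength increases, which is the pattern the data do not show.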

Yonelinas (in press; Jacoby, Toth, & Yonelinas, 1993) proposed an explanation of the z-ROC slopes in terms of two factor theory. In two factor theory, the distributions of familiarity values for old and new test items are assumed to be normal and to have equal variance, so familiarity alone would lead to a z-ROC slope of one. This assumption of equal variance is usually justified as derived from "standard signal detection theory." However, standard signal detection was applied to experimental tasks where the noise (in auditory perception, say) was the same in signal and noise conditions and a constant signal was added to noise to produce the signal condition. In most of the work, the signal and noise distributions were allowed to have different standard deviations and the equal variance case was seen as a special case. There is no special reason to assume equal variances other than the appeal to standard theory, and allowing the ratio of variances to be derived from the fits of the model to data would not lead to a test of Yonelinas' proposal.

According to Yonelinas' account, the slope of the z-ROC is less than 1 because of the recollection process. When subjects make high confidence positive responses, some of them are based on familiarity and some on recollection. The addition of the recollection based responses in the high confidence category causes an increased standard deviation for old items, which in turn makes the slope of the z-ROC less than one. With this assumption about recollection contributing to high confidence positive responses, Yonelinas attempted to show that two factor theory was consistent with z-ROC functions observed in his experiments.

Yonelinas' proposal provides a test of an inherent prediction of two factor theory. Previous support for the theory has been the intuitive reasonableness of the theory's accounts of patterns of dissociations and patterns of the relative contributions of recollection and familiarity to processing. For example, while it might be reasonable that list length affects recollection but not familiarity as in the experiments discussed above, two factor theory itself does not make that prediction. The theory alone would be equally consistent with the opposite outcome. In contrast, if Yonelinas' proposal about z-ROC curves fails, then the signal detection assumptions of two factor theory fail. In the sections that follow, we present the results of several different evaluations of Yonelinas' proposal, and show that the proposal is not consistent with empirical z-ROC curves.

z-ROC Curves based on Familiarity plus Recollection

For our first analysis, we calculated what the shape of z-ROC curves should be according to two factor theory. We assumed that the theory was correct, that recognition performance in confidence judgement tasks is based on the two processes, familiarity and recollection, and then examined the forms of predicted hypothetical z-ROC curves.

We began with data from an experiment by Ratcliff et al. (1994; Experiment 4). In that experiment, subjects studied lists of words that were either high or low frequency, encoded either strongly or weakly (i.e., studied for a long time or a short time). At test, subjects were instructed to respond positively to any word that had been studied, using a six-point confidence scale. This corresponds to an "include" condition in that subjects are instructed to respond positively to all studied words. From the z-transforms of the hit and false alarm rates at each confidence level, z-ROC curves were produced (as described at the beginning of the appendix to this article). The experiment did not use an exclude condition, so we could not calculate an empirical measure of recollection by using the process dissociation method. But performance in an include condition must, by two factor theory, depend on both recollection and familiarity. We examined a range of possible values of recollection, looking for a value that would make two factor theory consistent with the empirically obtained z-ROC functions.

The methods by which we examined two factor theory's predictions for z-ROC functions are described in detail below. Figure 3 shows the results for one experimental condition (weakly encoded high frequency words). The z-ROC curve obtained directly from the data is shown by the diamonds, and it has the slope less than one that is characteristic of recognition memory. The other z-ROC curves are predictions from two factor theory, each based on a different hypothetical value of recollection (the probability of a yes response based on recollection, P(R), varied from 0 to 0.45). Not all of these values for recollection are actually possible in two factor theory. What we show in the following analyses is that in general there are no values for recollection that are consistent simultaneously with two factor theory and the empirical z-ROC functions.

INSERT FIGURE 3 ABOUT HERE

To generate the curves in Figure 3 and to test two factor theory requires a multistep algorithm that is given in detail in the appendix. The algorithm begins with data from confidence judgements with "include" instructions, that is, subjects are instructed to respond positively to all studied items. The algorithm first uses confidence judgement data to obtain a d' value for the familiarity process (see Appendix, steps 1-3). This is done by collapsing over the positive half of the confidence categories to get one hit rate and one false alarm rate. From this hit rate, this false alarm rate, a hypothetical value of P(R), the process dissociation equations, and the two factor theory's assumption that familiarity distributions are normal with equal variance, a d' can be calculated for the familiarity process alone, separate from the hypothetical recollection process. Once d' is obtained, then it can be used with the empirical false alarm rates for the different confidence judgement categories to obtain a familiarity based hit rate for each confidence category. (Because of the assumptions of normality and equal variance for old and new item distributions of familiarity, the familiarity based hit rates and the empirical false alarm rates must give a z-ROC slope of one.)

The algorithm gives the familiarity based hit rate for each confidence category and the data give the false alarm rate for each category. To generate the predicted z-ROC curve for familiarity plus recollection, the hypothetical value of P(R) can be added back in at each confidence category to give predicted hit rates for familiarity and recollection combined (see Appendix, step 4). These predicted hit rates and the empirical false alarm rates are then used to give the predicted z-ROC curve. The z-ROC curves for the 10 hypothetical values of P(R) shown in Figure 3 were generated in this way. Not all of the 10 values of P(R) are actually possible; values of 0.35 and above (very high values of recollection) give d' values for the familiarity based process that are not greater than zero. None of the remaining predicted z-ROC curves match the shape of the real z-ROC curve from the data. That curve overlaps the curve for the third lowest value of recollection at lower zfa values and it overlaps the fourth highest value of recollection at higher zfa values.

The method just described for generating predicted z-ROC curves uses the same hypothetical value of P(R) for all confidence categories to predict the hit rates for recollection and familiarity combined. Another way of generating predicted z-ROC curves is to use the empirical hit rates to predict what the values of P(R) should be for each confidence category (see Appendix, step 5). The hit rates and the false alarm rates from the data and the familiarity based hit rates from the algorithm are used to predict what the probability of recollection should be at each confidence category (a category-level P(R)). These category-level values must all be positive (at no confidence level can the probability of a yes response due to recollection be zero or negative). If any value of the hypothetical P(R) that was used to generate the familiarity d' leads to a negative or zero category-level P(R), then it must be rejected as inconsistent with two factor theory. Values of P(R) less than or equal to 0.15 in Figure 3 must be rejected for this reason.

Rejecting these values and those for which d' for the familiarity based process was not greater than zero (.35 and above) leaves only the hypothetical P(R) values of .2 to .3. For these values, the bend of the z-ROC curve (the U shape) is quite large, large enough to be empirically detectable (see Figure 3). Examination of empirical z-ROC functions (including many from single subjects tested over many sessions) in Ratcliff, McKoon, and Tindall (1994) shows only a small fraction of the total cases for which the z-ROC functions have this shape, so the data do not in general support this two factor theory prediction.

The values of P(R) can be submitted to a further constraint. The predicted probabilities of recollection (P(R)) at each confidence category must never decrease from the highest confidence positive category to lower confidence categories. This is because hit rates come from cumulating correct positive responses from the highest confidence category down to the lower confidence categories, so the number of responses based on recollection can only increase across these categories, never decrease. To test this, we used data from Experiment 4 in Ratcliff et al. (1994). Moving from highest positive to lowest confidence categories is equivalent to increasing the false alarm rate, and so the predicted probabilities of recollection can be plotted against the false alarm rate as it changes across confidence categories. This was done for all the conditions of the experiment, strongly and weakly encoded low and high frequency words, for all hypothetical values of P(R) that did not yield a zero or negative d' or a predicted recollection probability less than zero. The results are displayed in Figure 4; each panel shows a subset of the different P(R) values for different conditions of the experiment. The hypothetical value of P(R) used to generate the predicted category-level P(R) values is always the same as the predicted value at the middle category (because the middle split is used to obtain d' and P(R) in the algorithm presented in the appendix). What the panels show is that there are almost no values of P(R) that are consistent with two factor theory; instead of holding constant or perhaps increasing across confidence categories, the predicted P(R) values generally decrease from the midpoint to the most confident negative category.
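The category-level recollection estimates and the two constraints on them (positivity and no decrease across categories) can be sketched as follows. This is an illustration only: it assumes the independence form hit = P(R) + (1 - P(R)) x familiarity hit rate, the function names are hypothetical, and the "data" are generated from a hypothetical single unequal-variance process (new items N(0, 1), old items N(1, 1.25)) rather than taken from Experiment 4.

```python
from statistics import NormalDist

_STD = NormalDist()

def category_recollection(hit_rates, fa_rates, d_prime):
    """Step 5: solve hit = R + (1 - R) * fam_hit for R at each category."""
    values = []
    for hit, fa in zip(hit_rates, fa_rates):
        fam_hit = _STD.cdf(_STD.inv_cdf(fa) + d_prime)  # equal-variance familiarity
        values.append((hit - fam_hit) / (1.0 - fam_hit))
    return values

def consistent_with_two_factor(r_values):
    """All estimates positive and never decreasing, with categories
    ordered from the highest confidence positive category downward."""
    all_positive = all(r > 0 for r in r_values)
    non_decreasing = all(b >= a for a, b in zip(r_values, r_values[1:]))
    return all_positive and non_decreasing

# Hypothetical cumulative rates from a single process, strictest criterion first.
criteria = [2.0, 1.5, 1.0, 0.5, 0.0]
new, old = NormalDist(0.0, 1.0), NormalDist(1.0, 1.25)
fa_rates = [1.0 - new.cdf(c) for c in criteria]
hit_rates = [1.0 - old.cdf(c) for c in criteria]

# Familiarity d' from the middle split with a hypothetical P(R) of 0.20.
p_r = 0.20
mid = len(criteria) // 2
fam_mid = (hit_rates[mid] - p_r) / (1.0 - p_r)
d_fam = _STD.inv_cdf(fam_mid) - _STD.inv_cdf(fa_rates[mid])

r_values = category_recollection(hit_rates, fa_rates, d_fam)
ok = consistent_with_two_factor(r_values)
```

In this hypothetical case the estimate at the middle category equals the P(R) that was assumed, and the estimates fall off from the midpoint toward the lower confidence categories, so the monotonicity constraint is violated.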

Yonelinas (in press) also obtained P(R) values that decreased like those in Figure 4. He attributed the decrease to floor and ceiling effects on accuracy. However, most of the data for Figure 4 are not subject to floor and ceiling problems and so contradict two factor theory.

Experiment 1

The preceding analyses were based on data from an experiment in which there were only include conditions. Without an exclude condition, there is no way to estimate recollection directly from the data. All the analyses were based on hypothetical values of recollection. To pursue the analyses, we collected include/exclude data using the list discrimination procedure that Yonelinas (in press) used in his experiments. Subjects studied two lists of words, and then they were cued as to whether the words of the first list or the words of the second list were to be given positive responses.

To provide the strongest test of two factor theory, we chose experimental conditions for which the slope of the z-ROC curve would be farthest from predictions from two factor theory for the familiarity process alone, that is, a slope as much less than one as possible. This is a strong test because the recollection process has to be assumed to move the slope far from one. We also picked conditions in which recollection seemed most unlikely to be able to do this. If recollection could move the slope far from one, under conditions where recollection was intuitively unlikely, then two factor theory would have passed a strong test. The conditions we used were low frequency words studied at a fast presentation rate. These conditions give low z-ROC slopes and a low probability of recall, which suggests a low probability of recollection (see Glanzer & Adams, 1990; Glanzer et al., 1993; Ratcliff et al., 1994).

Method

Subjects. The subjects were 8 undergraduates from Northwestern University paid to participate in the experiment. Each subject participated in one 50 min session.

Materials. The pool of 865 low frequency words used by Ratcliff et al. (1994) was used for this experiment. For the experimental study and test lists, only words from this pool were used. There was also a pool of high frequency words used only for practice lists.

Procedure and Design. All stimuli were presented on the screen of a PC, and keys of the PC's keyboard were used to record responses.

Each block of the experiment consisted of two lists of words to be studied followed by a single test list. There were 16 words in each study list, presented at a rate of 750 ms per word. The beginning of each list was signaled to the subjects by the instruction to press the space bar on the keyboard. At the end of the second list, subjects were given an instruction to tell them for words of which list they were to give a positive response. They were also instructed to flip an index card to show which was the positive list; the card was used to make sure subjects noted the instruction and to serve as a reminder if they needed one during the list. There were 48 words in the test list, 16 from the first studied list, 16 from the second, and 16 new words that had not appeared on either studied list. The words of the test list were presented one at a time, each remaining on the PC screen until a response key was pressed. There was a 250 ms blank screen following each response and then the next test word was presented. Subjects were instructed to respond on an eight point confidence scale, with responses ranging from extremely sure negative to extremely sure positive. For the positive end of the scale, the m, comma, period, and ?/ keys of the keyboard were used. For the negative end, the keys were z, x, c, and v. Labels for the response keys were shown on the index card that subjects flipped to show which list required positive responses. Subjects were instructed to try to use the full range of response keys.

There were 20 blocks in an experimental session, the first two used only for practice. For half of the blocks, a positive response was required for the first study list and for the other half of the blocks, a positive response was required for the second study list. Words for the study lists, new words for the test lists, and the orders of presentation of words in study and test lists and the order of the two kinds of blocks were decided randomly, with the randomization changed after every second subject.

Results

For test words that were from the study list designated for positive responses (the include condition) the proportion of positive responses was 0.553. For test words from the other study list (the exclude condition) the proportion of positive responses was 0.418. For new test words, the proportion of positive responses was 0.147, which leads to d' values (based on the equal variance assumption) of 1.18 for the include condition and 0.84 for the exclude condition. There were over 2000 responses in each of the include, exclude, and new test word conditions, giving good stability to the data.
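The d' values above follow from the reported proportions under the equal variance assumption. A minimal Python check (the function name is ours, for illustration):

```python
from statistics import NormalDist

def d_prime(hit_rate, fa_rate):
    """Equal-variance d': difference between the z-transformed rates."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

d_include = d_prime(0.553, 0.147)  # include condition vs. new words
d_exclude = d_prime(0.418, 0.147)  # exclude condition vs. new words
print(round(d_include, 2), round(d_exclude, 2))
```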

Figure 5 shows the z-ROC curve for test words in the include condition. According to Yonelinas' application of two factor theory, responses to these test words should be based on both familiarity and recollection. Using process dissociation, we plotted the z-ROC curve for familiarity alone. Following Yonelinas (in press), we estimated the probability of recollection (P(R)) by calculating a hit rate for the exclude condition and a hit rate for the include condition, and subtracting the exclude hit rate from the include hit rate (Equation 1). The hit rates were calculated by summing the numbers of responses in each of the high positive, high medium positive, low medium positive, and low positive confidence categories and dividing by the total number of responses across all confidence categories for the class of items (include or exclude). From P(R) and the hit rate at each confidence category in the include condition, a familiarity based hit rate was calculated for each confidence category (Equations 1 and 2). The familiarity based z-ROC was obtained from these hit rates and the false alarm rates from the data.
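The calculation described above can be sketched in Python. The overall include and exclude proportions are those reported in the Results; the category-by-category cumulative include hit rates are hypothetical placeholders (only the last, the midpoint rate, comes from the data), and the familiarity formula assumes the independence form hit = P(R) + (1 - P(R)) x familiarity hit rate.

```python
def recollection_estimate(include_rate, exclude_rate):
    """P(R) as the include hit rate minus the exclude hit rate."""
    return include_rate - exclude_rate

def familiarity_hit_rates(include_hit_rates, p_r):
    """Familiarity based hit rate at each confidence category."""
    # Assumed form: hit = p_r + (1 - p_r) * fam_hit, solved for fam_hit.
    return [(h - p_r) / (1.0 - p_r) for h in include_hit_rates]

p_r = recollection_estimate(0.553, 0.418)

# Hypothetical cumulative include hit rates, highest confidence first;
# the final value is the reported overall proportion of positive responses.
include_cumulative = [0.20, 0.35, 0.45, 0.553]
fam_rates = familiarity_hit_rates(include_cumulative, p_r)
```

The z-transforms of these familiarity based hit rates, plotted against the z-transforms of the empirical false alarm rates, would give the familiarity based z-ROC.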

The estimates of the slopes of the z-ROC curves and their intercepts along with the standard errors in those estimates are shown in the first three rows of Table 8. As expected from two factor theory, the recovered familiarity slopes are nearer one than the slopes for the include or exclude conditions. But they are still significantly different from one. The slopes for the include and exclude conditions are around 0.7 (in the range of those found by Glanzer & Adams, 1990, and Ratcliff et al., 1994). The derived familiarity based z-ROC has a slope of 0.857 with a standard error of 0.045 which means it is significantly different from one.
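One way to see why a slope of 0.857 with a standard error of 0.045 differs reliably from one is a simple z test. The sketch below assumes a normal approximation for the slope estimate, which is our assumption for illustration and not necessarily the test used for Table 8.

```python
from statistics import NormalDist

def slope_z_test(slope, std_error, null_value=1.0):
    """z statistic and two-sided p value for slope against null_value."""
    z = (slope - null_value) / std_error
    p_two_sided = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

z_stat, p_value = slope_z_test(0.857, 0.045)
```

The slope lies a little more than three standard errors below one, so the two-sided p value under this approximation is below .01.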

INSERT FIGURE 5 AND TABLE 8 HERE

Discussion

According to two factor theory, the slopes of z-ROC curves from recognition memory experiments are generally less than one because a recollection process contributes to high confidence positive responses. Taking out this process should leave familiarity alone, for which the z-ROC should be linear with a slope of one. The data from the experiment presented here contradict two factor theory. For experimental conditions designed to produce a slope much less than one and low probability of recollection (conditions that provide an extreme test of two factor theory), the familiarity slope was significantly different from one.

The failure of two factor theory to predict the data raises a question about the use of signal detection theory as the process that underlies familiarity. Two factor theory assumes that the old item and new item distributions of familiarity have equal standard deviations. This assumption is questionable. What it means is that if an item is studied, its familiarity is increased by an amount that is constant no matter what its position was in the new item distribution. For example, an item originally with familiarity one standard deviation below the mean of the new item familiarity distribution will have, after study, familiarity exactly the same distance below the mean of the old item distribution. This implies that, for the familiarity based process, there will be no item effects in learning- no items easier to learn than others beyond baseline differences- and this seems to contradict what we know about item effects. The assumption of equal standard deviations comes from the classical application of signal detection theory to perception where a fixed signal is added to noise and so the z-ROC is expected to have a slope of one. It is not obvious that the assumption should transfer to memory, and it appears that it does not work when combined with two factor theory. Unfortunately, if the assumption of equal standard deviations in old and new item familiarity is dropped from the two factor theory, then the theory has no way to predict either the shapes or the slopes of z-ROC curves so the theory will not be constrained; it will be consistent with almost any reasonable slope of the z-ROC function.

Another point that should be made explicit about the use of signal detection, a point of difference between the two factor theory and the global memory models, is that in two factor theory, a test item contacts its representation in memory to read out its familiarity (as in strength theory, Norman & Wickelgren, 1969). The test item does not contact other items in memory; if it did, the effects of these other items would be included in the determination of standard deviations in familiarity values as they are in the global memory models.

Process Dissociation, z-ROC Functions, and Single Process Models

According to the two factor theory, z-ROC curves should have slopes equal to one after applying process dissociation. In general, if the slope of the empirical include (and exclude) z-ROC is less than one, application of process dissociation is guaranteed to produce a slope nearer one. Therefore, finding that the estimated slope of the recovered familiarity z-ROC is nearer one than the slope of the data is not a strong test of the theory.

To illustrate this point in more detail, assume that recognition confidence judgement responses come from a single underlying strength process, such as in the SAM model, with no second recollection process. Further assume that there are three different distributions of strength values, one for new items (mean=0, SD=1.0), one for items to be excluded (mean=1.0, SD=1.25), and one for items to be included (mean=1.5, SD=1.25). Distributions like these are what the z-ROC data from Ratcliff et al. (1992) imply if the familiarity distributions are normal. The distributions are shown in Figure 6, with seven confidence judgement criteria. If this were a true description of underlying processing arising from a single familiarity based process (e.g., one of the global memory models), there would be no recollection component, but the process dissociation equations could still be applied to the data. For the purposes of this illustration, the estimate of recollection is taken to be the difference between the include and exclude distributions at the highest confidence response category; the difference is 0.12. Then the slope of the recovered z-ROC for the hypothetical familiarity process is 1.0 (obtained as for the experimental data above). Both the original z-ROC curve for the include data and the recovered familiarity z-ROC are shown in Figure 7. The slope of the recovered curve is nearer one than the slope for the original data. The bottom two lines in Table 8 show the linear regressions for the include condition (the slope is the ratio of the standard deviations, 1.0/1.25, and the intercept is the difference in means, 1.5, divided by the included distribution standard deviation, 1.25) and the "familiarity" z-ROC regression slopes and intercepts.

INSERT FIGURES 6 AND 7 ABOUT HERE

This example illustrates that the process dissociation method is guaranteed to make the slope of the z-ROC larger. The method does so because it removes probability density from the upper right hand tail of the observed distribution which reduces the standard deviation for the old items.

Figure 7 also shows a bend at the high confidence end of the recovered z-ROC curve similar to that in Figure 3. If two factor theory were correct, the z-ROC curve should be linear, with no bend. For a single process model, if the original distributions are normal, then removing probability density from the high end necessarily results in the bend in the z-ROC function. In other words, the bend indicates a non-normal distribution in "familiarity" after process dissociation, a contradiction of two factor theory but consistent with single process models. The appearance of a bend in the recovered familiarity z-ROC is a strong pointer to a failure of process dissociation under the assumption of normal distributions (see also the "familiarity" z-ROC function in Figure 5).
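The single process illustration can be reproduced in outline with a short Python sketch. The new and include distributions and the recollection estimate of 0.12 are taken from the text; the seven criterion placements are hypothetical, so the exact recovered slope will depend on them and is not expected to match Table 8. The recovery step assumes the independence form hit = P(R) + (1 - P(R)) x familiarity hit rate.

```python
from statistics import NormalDist

STD = NormalDist()               # standard normal, used for z-transforms
NEW = NormalDist(0.0, 1.0)       # new item strengths (from the text)
INCLUDE = NormalDist(1.5, 1.25)  # to-be-included item strengths (from the text)
P_R = 0.12                       # "recollection" estimate quoted in the text
CRITERIA = [-0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]  # hypothetical placements

def fit_line(xs, ys):
    """Ordinary least squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

z_fa = [STD.inv_cdf(1.0 - NEW.cdf(c)) for c in CRITERIA]
include_hits = [1.0 - INCLUDE.cdf(c) for c in CRITERIA]
z_include = [STD.inv_cdf(h) for h in include_hits]

# Apply process dissociation as if a recollection process existed.
recovered_hits = [(h - P_R) / (1.0 - P_R) for h in include_hits]
z_recovered = [STD.inv_cdf(f) for f in recovered_hits]

include_slope, include_intercept = fit_line(z_fa, z_include)
recovered_slope, _ = fit_line(z_fa, z_recovered)

# Local slopes at the two ends of the recovered curve reveal the bend.
lax_end = (z_recovered[0] - z_recovered[1]) / (z_fa[0] - z_fa[1])
strict_end = (z_recovered[-2] - z_recovered[-1]) / (z_fa[-2] - z_fa[-1])
```

Under these assumptions the include z-ROC is a straight line with slope 1.0/1.25 and intercept 1.5/1.25, the recovered slope is nearer one, and the recovered curve is steeper at the high confidence end than at the low confidence end, which is the bend described above.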

Conclusion

The process dissociation method is very appealing. It offers a method of separating conscious from unconscious components of processing, with the hope that such a separation will lead to better understanding of both. If correct, the method would begin to solve age old questions about the relative contributions of the conscious and unconscious to processing in any task. But the method is built upon a specific model of processing (as noted by Jacoby, 1991) and it must be considered in that context, not as a model-free procedure and not as a purely empirical procedure.

The potential strength of the process dissociation method lies in the include versus exclude manipulation. In recognition memory, for each of the sets of data we considered, the account of include/exclude data given by two factor theory was different than that given by other theories. It follows that the explanation of how unconscious processing is affected by experimental variables will be different for the different theories. For example, for the experimental results considered in this article, if the SAM account of recognition is correct, then the estimates from process dissociation of how familiarity and recollection are affected by experimental variables are wrong, or if the process dissociation estimates are correct, then the SAM account is wrong.

The strongest form of the logic of our argument is: Suppose SAM is the correct description of underlying processing and we generate data from SAM; then if we apply process dissociation to produce estimates of the contributions of two processes, those estimates will be incorrect because the data came only from one process. We fit SAM to experimental data to be sure this argument applies in the range of normal performance on recognition with include and exclude instructions.

The obvious challenge that arises from this situation is to find some way of choosing which theoretical account is correct. A traditional measure is falsifiability. For recognition memory, SAM can potentially fail in a multiplicity of ways internal to itself by making predictions that are incorrect. In contrast, two factor theory has just two assumptions that can lead to internally generated predictions. One is the assumption that familiarity is described by signal detection theory with equal variance in old and new item values of familiarity. We discussed this assumption in earlier sections of this article, and showed that it can be falsified. However, it can be viewed as an auxiliary assumption unnecessary to the basic two factor theory. Process dissociation can still be applied to data, with or without the signal detection theory assumptions. The second potentially falsifiable assumption of two factor theory is that the recollection and familiarity processes are independent. This assumption has been criticized by Curran and Hintzman (in press) and Joordens and Merikle (1993). However, whatever the result of those critiques, two factor theory can emerge with the process dissociation method intact. Even if there is dependence between the two factors, their relative influences on performance can still be computed from data (see Joordens & Merikle, 1993). Process dissociation is applied to exactly two performance measures (probability of a positive response in the include condition and probability of a positive response in the exclude condition), and two measures can always be fit by two parameters, so the model is not falsifiable at this level.

The SAM model has been very successful with recall phenomena (Raaijmakers & Shiffrin, 1981) and with recall and recognition interactions (Gillund & Shiffrin, 1984). With relatively few assumptions, it affords a reasonably coherent view of the effects on performance of a large number of independent variables in terms of the behavior of underlying parameters. It might seem that the model has enough freedom and enough parameters to deal with any pattern of experimental results. This would be correct if there was a one to one correspondence between parameters and empirical effects such that adjustments to one parameter completely controlled predictions for one variable, or if all the parameters varied in unprincipled ways to account for the effects of every variable. However, this is not the case. There are many situations in which the model is tightly constrained, and it requires insight into the structure of the model to determine what situations provide such constraint. One way in which predictions have been falsified is SAM's failure to predict the behavior of the z-ROC data discussed earlier in this article (see Ratcliff et al., 1992; Ratcliff et al., 1994). SAM fails, in part, because of the way variability is introduced into the encoding process, and changing this would result in a new and different model (see also Ratcliff, Shiffrin, & Clark, 1990; Shiffrin, Ratcliff, & Clark, 1990).

The point is that there are potentially multiple ways (some not intuitively apparent) in which SAM could be falsified by failures of predictions generated from its assumptions. But, in fact, the model has been remarkably accurate in its predictions (both qualitatively and quantitatively), as exemplified by the following list of successes:

List length. In the typical experiment, subjects do not know whether a list of words they are given is going to be a long list or a short list, so the encoding parameters of SAM remain constant across different list lengths (the self, interitem, and residual strength parameters). Increasing list length simply increases the number of items encoded into memory. The result is larger values of familiarity for items from longer lists and larger variability in their familiarity values. For example, for a new item presented as a test word, familiarity is twice as large for a list twice as long. This means that the familiarity criterion that separates positive from negative responses has to be moved as list length changes in order to keep it between the old and new item distributions (Gillund & Shiffrin, 1984, p. 64, and see the sections above where SAM was fit to Yonelinas and Jacoby's data).

When test items are presented, they must enter the short-term memory buffer just as study items do. They add to the items from the study list to increase the total number of items in the experiment. Therefore, their effects are modeled in the same way as variations in list length, and effects due to the position of a test word in the test list are accurately predicted.

Study time. The only variable that changes as a function of study time is the amount of strength that accumulates during encoding for each studied item. The strength parameters are fixed, multiplied by the amount of time an item spends in the encoding buffer (with some scaling factor). As with list length, the criterion has to be adjusted to keep it between the old and new distributions because both old and new items have higher familiarity values for lists with longer study times (see Gillund & Shiffrin, 1984, p. 64, and above).

List context effects. The context parameter in SAM was designed to allow retrieval to focus on subsets of information in memory, for example items from studied lists versus all the other items in memory, and items from one studied list versus another. In modeling list context effects, none of the encoding parameters can vary. The only parameter that can be adjusted is the weight placed on the context parameter for each context in the retrieval cue.

Rehearsal instructions. With "maintenance" instructions, subjects are instructed to rehearse each item during its entire presentation time and not to rehearse any other items during that time. With "elaboration" instructions, they are instructed to use the presentation time for an item to relate it to other items in the study list. The only parameters adjusted to fit data for this manipulation are the self-strength and inter-item strength encoding parameters (Gillund & Shiffrin, 1984, p. 25).

Item effects. Similarity of distractor test words is modeled by varying the residual strength parameter (with small adjustments to the criterion). Word frequency effects are also explained with the residual strength parameter: lower frequency distractors have a lower residual strength, so they are farther from the distribution of studied items than would be higher frequency distractors. For a complete discussion of word frequency and its interactions with other variables, see Gillund and Shiffrin (1984).
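As a rough illustration of the list length point above, the following Python sketch computes mean summed familiarity in a deliberately simplified SAM-like scheme. It omits the variability in encoded strengths and the interitem strength parameter, both of which matter in the full model (Gillund & Shiffrin, 1984); the function names, parameter values, and the midpoint criterion rule are hypothetical.

```python
def new_item_familiarity(list_length, context, residual):
    """Mean familiarity of an unstudied test word: one
    context x residual term for each image stored from the list."""
    return list_length * context * residual

def old_item_familiarity(list_length, context, self_strength, residual):
    """Mean familiarity of a studied word: its own image contributes
    context x self strength; every other image contributes
    context x residual strength."""
    own_image = context * self_strength
    other_images = (list_length - 1) * context * residual
    return own_image + other_images

def midpoint_criterion(list_length, context, self_strength, residual):
    """Hypothetical rule: place the criterion halfway between the means."""
    old = old_item_familiarity(list_length, context, self_strength, residual)
    new = new_item_familiarity(list_length, context, residual)
    return (old + new) / 2.0

# Hypothetical parameter values, held constant across list lengths.
params = dict(context=0.5, self_strength=2.0, residual=0.1)
short_new = new_item_familiarity(16, params["context"], params["residual"])
long_new = new_item_familiarity(32, params["context"], params["residual"])
short_criterion = midpoint_criterion(16, **params)
long_criterion = midpoint_criterion(32, **params)
```

With the encoding parameters held constant, doubling the list length doubles new item familiarity in this sketch, and a criterion that stays between the old and new item means has to move upward.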

Demonstrations of the kind just summarized show how SAM accounts for empirical data in principled ways, and point to the most salient contrast between the global memory models and two factor theory. The global memory models' goal is to simultaneously and quantitatively explain the effects of a number of different variables on recall and recognition (and other tasks for some of the models). The goal of two factor theory is to explore possible dissociations between a conscious retrieval process, recollection, and an unconscious process, familiarity. If global memory models are ultimately found wanting, it will likely be because they are internally falsified by their own predictions. If two factor theory ultimately comes to be viewed with suspicion, it will likely be because its explanations of data are implausible for external reasons. Of course, both models should provide reasonable interpretations of data across a range of variables and tasks. But this kind of evaluation is difficult because what is reasonable must be defined from outside the theories. If we hypothesized that some variable would affect familiarity in the two factor theory and residual strength in SAM, but we turned out to be wrong- the variable affected two factor theory's recollection and SAM's focussing weights- then it would not be the theories that failed but our intuitive hypotheses. Failure of our intuitions is not necessarily grounds for rejection of a model; the model might be correct and our intuitions wrong. However, at the same time, two factor theory gains considerable strength from the consistency of its estimates of familiarity and recollection across a range of tasks. The prediction of this consistency and the evaluation that the prediction is a reasonable one come not from two factor theory but from other sources external to the theory.

Our conclusions apply to recognition memory, the domain of testing in this article. We believe testing can be especially provocative in this domain because there exist several well developed models. But for many tasks to which process dissociation might most fruitfully be applied, such as those that have been used to postulate implicit memory systems, there are no other well developed models against which to test two factor theory. For those domains, two factor theory will serve as the default model against which future models will be tested.


References

Atkinson, R. C., Herrmann, D.J., & Westcourt, K.T. (1974). Search processes in recognition memory. In R.L. Solso (Ed.), Theories in cognitive psychology: The Loyola Symposium (pp. 101-146). Hillsdale, NJ: Erlbaum.

Atkinson, R. C., & Juola, J. F. (1973). Factors influencing the speed and accuracy of word recognition. In S. Kornblum (Ed.), Attention and Performance IV (pp. 583-612). New York: Academic Press.

Curran, T. & Hintzman, D. (in press). Violations of the independence assumption in process dissociation. Journal of Experimental Psychology: Learning, Memory, and Cognition.

Dosher, B.A., & Rosedale, G. (1989). Integrated retrieval cues as a mechanism for priming in retrieval from memory. Journal of Experimental Psychology: General, 118, 191-211.

Gillund, G., & Shiffrin, R.M. (1984). A retrieval model for both recognition and recall. Psychological Review, 91, 1-67.

Glanzer, M., Adams, J.K., Iverson, G.J., & Kim, K. (1993). The regularities of recognition memory. Psychological Review, 100, 546-567.

Glanzer, M., & Adams, J.K. (1990). The mirror effect in recognition memory: Data and theory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 5-16.

Hintzman, D. (1988). Judgments of frequency and recognition memory in a multiple-trace memory model. Psychological Review, 95, 528-551.

Hintzman, D.L. (1990). Human learning and memory: Connections and dissociations. Annual Review of Psychology, 41, 109-139.

Jacoby, L. L. (1991). A process dissociation framework: Separating automatic from intentional uses of memory. Journal of Memory and Language, 30, 513-541.

Jacoby, L.L. & Dallas, M. (1981). On the relationship between autobiographical memory and perceptual learning. Journal of Experimental Psychology: General, 110, 306-340.

Jacoby, L. L. & Kelley, C. M. (1992). A process-dissociation framework for investigating unconscious influences: Freudian slips, projective tests, subliminal perception, and signal detection theory. Current Directions in Psychological Science, 1, 174-179.

Jacoby, L.L., Toth, J.P., & Yonelinas, A.P. (1993). Separating conscious and unconscious influences of memory: Measuring recollection. Journal of Experimental Psychology: General, 122, 139-154.

Jacoby, L. L., Woloshyn, V., & Kelley, C. (1989). Becoming famous without being recognized: Unconscious influences of memory produced by dividing attention. Journal of Experimental Psychology: General, 118, 115-125.

Jacoby, L.L., Yonelinas, A.P., & Jennings, J. (in press). The relation between conscious and unconscious (automatic) influences: A declaration of independence. To appear in J. Cohen & J.W. Schooler (Eds.), Scientific approaches to the questions of consciousness. Hillsdale, NJ: Erlbaum.

Joordens, S. & Merikle, P. M. (1993). Independence or redundancy? Two models of conscious and unconscious influences. Journal of Experimental Psychology: General, 122, 462-467.

Kendall, M.G., & Stewart, A. (1976). The Advanced Theory of Statistics, Vol. 1. New York: Macmillan.

Mandler, G. (1980). Recognizing: The judgment of previous occurrence. Psychological Review, 87, 252-271.

Mandler, G., & Boek, W.J. (1974). Retrieval processes in recognition. Memory and Cognition, 2, 613-615.

McKoon, G., & Ratcliff, R. (1992b). Spreading activation versus compound cue accounts of priming: Mediated priming revisited. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 1155-1172.

Monsell, S. (1978). Recency, immediate recognition memory, and reaction time. Cognitive Psychology, 10, 465-501.

Murdock, B.B. (1982). A theory for the storage and retrieval of item and associative information. Psychological Review, 89, 609-626.

Norman, D.A. & Wickelgren, W.A. (1969). Strength theory of decision rules and latency in short-term memory. Journal of Mathematical Psychology, 6, 192-208.

Nosofsky, R.M. (1988). Exemplar-based accounts of relations between classification, recognition, and typicality. Journal of Experimental Psychology: Learning, Memory, & Cognition, 14, 700-708.

Raaijmakers, J.G.W. & Shiffrin, R.M. (1981). Search of associative memory. Psychological Review, 88, 93-134.

Ratcliff, R., Allbritton, D.W., & McKoon, G. (1994). Manuscript in preparation.

Ratcliff, R., Clark, S. E., & Shiffrin, R.M. (1990). The list-strength effect: I. Data and discussion. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 163-178.

Ratcliff, R., & McKoon, G. (1988b). A retrieval theory of priming in memory. Psychological Review, 95, 385-408.

Ratcliff, R., & McKoon, G. (1994a). Bias and the priming of object decisions. In press, Journal of Experimental Psychology: Learning, Memory, and Cognition.

Ratcliff, R., & McKoon, G. (1994b). Bias effects in implicit memory and information processing. Submitted.

Ratcliff, R., & McKoon, G. (1994c). Retrieving information from memory: Spreading activation theories versus compound cue theories. Psychological Review, 101, 177-184.

Ratcliff, R., McKoon, G., & Tindall, M. H. (1994). Empirical generality of data from recognition memory receiver-operating characteristic functions and implications for the global memory models. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 763-785.

Ratcliff, R., Sheu, C-F., & Gronlund, S. (1992). Testing Global Memory Models using ROC Curves. Psychological Review, 99, 518-535.

Richardson-Klavehn, A., & Bjork, R.A. (1988). Measures of memory. Annual Review of Psychology, 39, 475-543.

Schacter, D.L., Bowers, J., & Booker, J. (1989). Intention, awareness, and implicit memory: The retrieval intentionality criterion. In S. Lewandowsky, J.C. Dunn, & K. Kirsner (Eds.), Implicit memory: Theoretical issues (pp. 47-65). Hillsdale, NJ: Erlbaum.

Shiffrin, R.M., Ratcliff, R., & Clark, S. E. (1990). The list strength effect: II. Theoretical mechanisms. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 179-195.

Squire, L.R. (1992). Memory and the hippocampus: A synthesis from findings with rats, monkeys, and humans. Psychological Review, 99, 195-231.

Tulving, E., & Schacter, D.L. (1990). Priming and human memory systems. Science, 247, 301-306.

Warrington, E.K., & Weiskrantz, L. (1968). New method of testing long-term retention with special reference to amnesic patients. Nature (London), 217, 972-974.

Yonelinas, A.P. (in press). Receiver operating characteristics in recognition memory: Evidence for a dual-process model. Journal of Experimental Psychology: Learning, Memory & Cognition.


Author Note

This research was supported by NIMH grants HD MH44640 and MH00871 to Roger Ratcliff and by NIDCD grant R01-DC01240 and NSF grant SBR-9221940 to Gail McKoon.

Correspondence concerning the article should be addressed to Roger Ratcliff, Psychology Department, Northwestern University, Evanston, IL, 60208.


Appendix

Algorithm to Generate z-ROC Curves and P(R) from the Two Factor Theory

A hit rate and a false alarm rate for each of six confidence categories can be calculated from data as follows: Assume that the numbers of responses for old items in each category are $n_1$ through $n_6$ for the high confidence negative to the high confidence positive categories and that the numbers of responses for new items in each category are $m_1$ through $m_6$ for the high confidence negative to the high confidence positive categories. Then the hit and false alarm rates are computed as follows: If $N=\sum_{i=1}^{6} n_i$ and $M=\sum_{i=1}^{6} m_i$, then the hit rate for a category $i$ is $h_i=\sum_{k=1}^{i} n_k/N$, and the false alarm rate for a category $i$ is $f_i=\sum_{k=1}^{i} m_k/M$.

Note: An empirical z-ROC curve can be generated from the z scores for these hit and false alarm rates for each of the categories; the z-ROC for one experimental condition from Ratcliff et al. (1994) is shown by the diamonds in Figure 3.

To test two factor theory, assume some value of P(R) (between 0 and 0.45); then:

1. Assume that the top half of the confidence judgment categories are all positive responses in order to calculate the hit rate hF, the probability based on familiarity alone of a yes response to an old test item. (If the data use include and exclude conditions as in Experiment 1, then P(R) can be estimated from the difference in hit rates for the two conditions.) Using process dissociation (Equation 1), P(F)=(P(I)-P(R))/(1-P(R)), so

hF= (h-P(R))/(1-P(R))

2. In the two factor theory, distributions of familiarity for old and new items are assumed to be normal with equal variance, so d' tables can be used to calculate d' for the familiarity process from hF and f.

3. With d' for the familiarity process, we can then calculate hit rates based on familiarity alone for all the confidence categories (hF) using the d' and the false alarm rates for those categories (f).
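Steps 1 through 3 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the program used for the article: the function names and the example rates are hypothetical, and the standard library's normal distribution stands in for the d' tables mentioned in step 2.

```python
from statistics import NormalDist

# Standard normal distribution; used here in place of d' tables (assumption).
_STD_NORMAL = NormalDist()


def familiarity_hit_rate(h, p_r):
    """Step 1: remove an assumed P(R) from the overall hit rate h."""
    return (h - p_r) / (1.0 - p_r)


def d_prime(h_f, f):
    """Step 2: equal-variance normal d' = z(hF) - z(f)."""
    return _STD_NORMAL.inv_cdf(h_f) - _STD_NORMAL.inv_cdf(f)


def familiarity_hit_rates(d, false_alarm_rates):
    """Step 3: familiarity-only hit rate at each criterion, Phi(z(f) + d')."""
    return [_STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(f) + d)
            for f in false_alarm_rates]


# Hypothetical values for illustration only (not data from the article).
p_r = 0.2                                   # assumed P(R)
h_mid, f_mid = 0.8, 0.2                     # rates at the yes/no split
fa_rates = [0.05, 0.10, 0.20, 0.40, 0.60]   # one false alarm rate per criterion

h_f_mid = familiarity_hit_rate(h_mid, p_r)  # (0.8 - 0.2) / (1 - 0.2)
d = d_prime(h_f_mid, f_mid)
h_f_all = familiarity_hit_rates(d, fa_rates)
```

All rates passed to `inv_cdf` must lie strictly between 0 and 1, so a category with a cumulative rate of exactly 0 or 1 has to be left out of the input lists.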

There are two alternatives, 4 and 5 below, that can follow from this point; both are used in this article.

4. Using process dissociation again, we can add the assumed value of P(R) back to the familiarity process at each confidence level to generate the predicted empirical z-ROC curve for familiarity and recollection combined, plotting the predicted hit rates (hP) against false alarm rates (f):

hP = hF + P(R) - hF P(R).

Curves obtained in this way for 10 values of P(R) are plotted in Figure 3.

This way of generating a predicted z-ROC function assumes that P(R) is constant across confidence categories. P(R) was calculated by summing across the positive response categories; most of the recollection based responses should have come from the highest confidence positive category but it is possible some recollection based responses would occur in the medium and low confidence categories. If so, then the predicted hit rates might be a little too large in the upper right hand corner of the z-ROC function. However, allowing the amount of recollection at each of the positive confidence levels to be free parameters would weaken the predictive power of the model.
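The calculation in step 4 can be sketched as follows. The sketch assumes, as the text does, a single value of P(R) for all confidence categories; the function names and the input rates are hypothetical and are not taken from the article's data.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, for converting rates to z scores


def predicted_hit_rates(h_f_rates, p_r):
    """Step 4: hP = hF + P(R) - hF * P(R), with P(R) constant across categories."""
    return [h_f + p_r - h_f * p_r for h_f in h_f_rates]


def z_roc_points(hit_rates, false_alarm_rates):
    """Return (z(f), z(h)) pairs, one point of the z-ROC per criterion."""
    return [(_STD_NORMAL.inv_cdf(f), _STD_NORMAL.inv_cdf(h))
            for h, f in zip(hit_rates, false_alarm_rates)]


# Hypothetical familiarity-only hit rates and false alarm rates.
h_f_rates = [0.40, 0.55, 0.75, 0.90, 0.96]
fa_rates = [0.05, 0.10, 0.20, 0.40, 0.60]

h_p = predicted_hit_rates(h_f_rates, p_r=0.2)
points = z_roc_points(h_p, fa_rates)
```

Calling `predicted_hit_rates` with `p_r=0` returns the familiarity-only rates unchanged, and repeating the call over a set of P(R) values gives a family of predicted curves of the kind described for Figure 3.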

5. The values of hF represent hit rates based on familiarity alone, obtained by assuming that the value of P(R) is constant across confidence categories. An alternative is to use the empirical h, the familiarity based hF, and the process dissociation equations to calculate P(R) for each confidence category:

P(R) = (h - hF)/(1 - hF).

These values of P(R) can be plotted for each of the f (i.e., for each confidence category), as shown in Figure 4.
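Step 5 inverts the combination rule of step 4, and can be sketched as below. The function name and the example rates are hypothetical; the rates were chosen so that every category implies the same amount of recollection, which is the case in which the constant P(R) assumption of step 4 holds exactly.

```python
def recollection_by_category(hit_rates, h_f_rates):
    """Step 5: P(R) = (h - hF) / (1 - hF) for each confidence category."""
    return [(h - h_f) / (1.0 - h_f) for h, h_f in zip(hit_rates, h_f_rates)]


# Hypothetical empirical and familiarity-only hit rates, one per category.
h_emp = [0.52, 0.64, 0.80, 0.92, 0.968]
h_fam = [0.40, 0.55, 0.75, 0.90, 0.96]

p_r_by_category = recollection_by_category(h_emp, h_fam)
```

With real data the estimates would generally differ across categories, and a familiarity-only hit rate of exactly 1 would make the ratio undefined, so such a category would need to be excluded.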

The figures are not available from www; if they are needed, they should be obtained from the authors.