Northwestern University Law Review : Colloquy : 2007 : Cole
Is the "Junk" DNA Designation Bunk?
By Simon A. Cole
A recent exchange on the Colloquy between Professors Joh and Kaye reflects a larger debate over civil liberties and DNA databases that has been raging for several years. In an essay drawing attention to the constitutional vacuum surrounding "abandoned" DNA—that is, DNA that we shed in public as we go about our daily lives—Joh raised a host of concerns about the proliferation of DNA databases, concerns that are heightened for "abandoned" DNA. In response, Kaye suggested that some of Joh's concerns constituted "science fiction." In this Colloquy Essay, I am primarily concerned with only one of the several issues of dispute between Joh and Kaye: whether the genetic markers used in law enforcement databases, colloquially characterized as "junk DNA," constitute a threat to privacy.
Part I presents some background on the civil liberties debate over DNA databases. In Part II, I seek to clarify the debate over the biological significance of forensic genetic markers. I conclude that, although Joh relied somewhat on innuendo by attributing potential biological function to forensic STRs, Kaye, in his rebuttal, also overstated his argument by claiming that forensic DNA has no predictive value or medical significance. At issue is what is meant by the term "medical significance." I hope to clarify the debate by drawing a sharp distinction between causal function and predictive significance.
In Part III, however, I suggest that the debate itself may have little significance. Whether or not a particular genetic marker "has predictive value" has a great deal to do with whether the relevant scientific actors choose to invest it with such value. Therefore, even if forensic DNA contains meaningful information, civil libertarians' efforts to slow the spread of genetic databases by exaggerating the value of genetic information are likely to be counterproductive. By hyping the supposed predictive value of all genetic information, civil libertarians may actually facilitate rather than forestall the spread of facile genetic determinism.
I. The DNA Database Debate
Civil libertarians worry about the inclusion of genetic information in law enforcement databases for several reasons, most hinging on an argument known as "genetic exceptionalism." Adherents of this argument contend that genetic information is more intimate and private than the information (such as biographical information, photographs, and fingerprints) that is already routinely held in law enforcement databases and is generally not viewed as a threat to civil liberties. Therefore, the storage of genetic information allegedly entails a greater violation of individual privacy than the storage of other forms of information.
Genetic data, it is argued, contains intimate information about an individual. Privacy advocates contend that "DNA samples can provide insights into personal family relationships, disease predisposition, physical attributes, and ancestry." Given this understanding of genetic information, civil libertarians and privacy advocates perceive a number of potential dangers from law enforcement storage of genetic information. First, the government itself may abuse the information stored in such databases, either through law enforcement agencies or through information leakage to other government agencies; after all, the memories of Japanese internment in the United States are not so old. A government agency ordered to round up individuals of a certain ethnic descent could, conceivably, perform ancestry testing on a genetic database to automate the process. Somewhat more speculatively, if a (real or spurious) genetic basis for criminal behavior were established, the government could seek to round up those individuals possessing the genetic stigma.
Even if the government itself were not inclined to abuse the information, the database may present a target for malefactors (or entrepreneurs) interested in the information. Perhaps the most salient threat is that employers or health insurers would want access to genetic information in order to mine it for disease propensity, potentially leading to "genetic discrimination." Although there are a number of state statutes banning such discrimination and a federal statute under consideration, such statutes could someday be repealed.
II. On "Junk DNA"
The databases, however, are only as dangerous as the genetic data stored in them. What information is stored in a DNA database? Thirteen loci (locations on the human genome) known as short tandem repeats, or "STRs," are examined to produce the DNA profiles that are standard for databases in the United States. At each locus, people have two "alleles," repeating sequences of DNA base pairs; one is maternally inherited, one paternally. The numbers of repeats vary across individuals. A DNA profile, which is what is stored in a DNA database, is simply a list of the number of repeats found in each of the two copies of the repeating sequence for each locus.
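To make the structure of such a profile concrete, it can be sketched in a few lines of Python. This is an illustration only, not any database's actual format: the three locus names are CODIS core loci, but the repeat counts are invented, and `normalize` and `matches` are hypothetical helper names introduced here.

```python
# Illustrative sketch only; the repeat counts are invented, not real data.
from typing import Dict, Tuple

# A profile maps each locus name to the repeat counts of its two alleles
# (one maternally inherited, one paternally inherited).
Profile = Dict[str, Tuple[int, int]]


def normalize(profile: Profile) -> Dict[str, Tuple[int, ...]]:
    """Sort each allele pair so that the order in which the two
    repeat counts were written down does not matter."""
    return {locus: tuple(sorted(pair)) for locus, pair in profile.items()}


def matches(a: Profile, b: Profile) -> bool:
    """Hypothetical helper: two profiles match when every locus
    carries the same pair of repeat counts."""
    return normalize(a) == normalize(b)


# Only three loci are shown for brevity; a full profile would list thirteen.
crime_scene: Profile = {"TH01": (6, 9), "TPOX": (8, 11), "D18S51": (14, 17)}
suspect: Profile = {"TH01": (9, 6), "TPOX": (11, 8), "D18S51": (17, 14)}
other: Profile = {"TH01": (7, 9), "TPOX": (8, 8), "D18S51": (14, 17)}

print(matches(crime_scene, suspect))  # same pairs at every locus
print(matches(crime_scene, other))    # differs at TH01 and TPOX
```

The point of the sketch is how little is stored: a short table of integers, two per locus, with no DNA sequence attached.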
Early in the debate over DNA typing, the loci used in law enforcement DNA testing were popularly characterized as lacking any particular biological function; they were commonly labeled "junk DNA." Among biologists the term "junk" was quickly replaced by "non-coding," but the colloquialism has stuck. The term is unfortunate because, as Kaye points out, there are several different biological entities that fall under the label "junk DNA," and also because, as Joh points out, much "junk DNA" is now believed to have a function. Furthermore, this terminology conflates two different issues. In one sense, "junk DNA" implies a lack of biological function—that the DNA doesn't do anything. This is the current understanding with regard to the STRs used in forensic profiling. Although biologists are discovering functions for some types of "junk DNA," none have yet claimed that the forensic STRs do function. In another sense, however, the term "junk" implies an absence of meaning—that forensic STRs are essentially empty information devoid of biological significance. But the two issues are distinct: a biological marker may be significant, in that it indicates a propensity for physical traits, but not functional, in that it does not cause those physical traits.
In the debate over DNA databases the term "junk DNA" has been invoked by defenders of the databases to blunt privacy concerns. Their argument is that, since forensic STRs are non-functional "junk," the genetic data stored in databases is meaningless. Therefore, there are no privacy concerns regarding DNA profiles. What this argument misses is that forensic STRs may have significance, even if they do not function, so there still may be reason for privacy concerns. For instance, if a particular allele at one of the CODIS loci correlates with certain physical traits, knowledge of an individual's allele at that locus could help predict whether that individual will develop those traits even if that allele plays no role in actually causing them.
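The difference between predictive and causal significance can be illustrated with a toy calculation. The counts below are entirely hypothetical and are chosen only to show the logic: a marker allele that does nothing itself can still shift the odds that a person carries a separate, trait-causing variant, simply because the two tend to be inherited together.

```python
# Toy illustration with invented counts; not real genetic data.
# Each key is (carries_marker_allele, carries_causal_variant) and each
# value is a hypothetical number of people in that group.
counts = {
    (True, True): 80,
    (True, False): 120,
    (False, True): 20,
    (False, False): 780,
}


def risk_given_marker(has_marker: bool) -> float:
    """Share of people with the given marker status who also carry
    the causal variant."""
    with_variant = counts[(has_marker, True)]
    total = with_variant + counts[(has_marker, False)]
    return with_variant / total


risk_with_marker = risk_given_marker(True)      # 80 / 200
risk_without_marker = risk_given_marker(False)  # 20 / 800

print(risk_with_marker, risk_without_marker)
```

In this made-up population the marker plays no part in producing the trait, yet knowing a person's marker status moves the estimate of carrying the causal variant from 2.5 percent to 40 percent. That is predictive significance without causal function.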
In her Essay about the practice of law enforcement seizure of "abandoned" DNA, Professor Joh argued that the claim that forensic STRs have no significance was disingenuous. Joh argued that abandoned DNA should be afforded greater legal protection than, say, abandoned fingerprints because of the greater intrusiveness of genetic information. In response to database advocates' contention that DNA profiles contain only innocuous information, Joh cited several scientific reports debunking claims that various types of DNA labeled "junk" have no biological function. This was not the first time someone had claimed that the "junk" designation for the CODIS STRs might be overblown.
In his reply, Professor Kaye pointed out that there are a great many different types of non-coding DNA, of which the CODIS STRs are just one. He explained that the scientific reports cited by Joh concerned other types of non-coding DNA, not the CODIS STRs, argued that "Joh's account sweeps all noncoding DNA under the same rug," and accused Joh of "alter[ing]" one scientific report concerning non-coding DNA to appear to refer to STRs when, in fact, it referred to a different type of non-coding DNA. The scientific report that suggested the junk DNA designation "may well go down as one of the biggest mistakes in the history of molecular biology" was discussing "the possibility that the intervening noncoding sequences may be transmitting parallel information in the form of RNA molecules." Kaye argued, "It is a leap from this possibility to the conclusion that the forensic STRs—which do not generate RNA molecules and are not conserved across species—are functional or . . . will prove useful for predicting disease." Moreover, Kaye notes, "no forensic STR locus has been found to be predictive."
Kaye persuasively demonstrates that Joh may have been over-inclusive in her conceptualization of non-coding DNA. Certainly, it is plausible that a legal scholar lacking the sophisticated understanding of genetics that Kaye clearly wields may overlook the nuances that he describes. However, Kaye's conclusion, that "any claim that the DNA profiles currently used for identification constitute 'predictive medical information' is false," seems to paint with an equally broad brush.
In particular, Kaye's source for his assertion that "no forensic STR locus has been found to be predictive," an article by John Butler, supports that statement. But, several paragraphs later, Butler says the following:
This means that even though the forensic STRs don't, as far as we know, cause disease, they correlate with the genes (or assortments of genes) that do, and thus may be useful for tracking which individuals have the disease-causing genes. Essentially, Butler claims that the STRs are useful in "following" or "tracking" genetic disease. Whether or not the medical community ultimately chooses to use forensic STRs or some other genetic test to screen for any particular disease, it is misleading to claim that forensic STRs have no medical significance, are devoid of information, or are completely innocuous from a privacy standpoint.
If this is the case, then Kaye's claim that there is currently no plausible theory supporting the idea that STRs might be predictive of disease is not fully informative. Butler states that some forensic STRs are already predictive, though not causal, of disease, and more may ultimately turn out to be. Predictive relationships are not necessarily causal, as Kaye understands as well as anyone. Just as Joh's account elides the distinction between different kinds of non-coding DNA, Kaye's elides the distinction between causal and predictive significance.
The debate can perhaps be clarified by distinguishing more sharply between function and significance. Joh imputed function to forensic STRs on the basis of functionality claims that pertain to other types of non-coding DNA. Kaye, in debunking the functionality of forensic STRs, simultaneously dismisses their significance. But the significance, if not the functionality, of forensic STRs remains. Forensic STRs are potentially significant because they may turn out to be useful for predicting physical traits.
Therefore, forensic STRs may, in fact, be precisely the kind of "predictive medical information" that concerns privacy advocates. This state of affairs may not raise enormous privacy issues, especially compared to DNA samples, but it does not render forensic STRs immune to privacy concerns, either. Joh is correct that claims that forensic STRs "cannot reveal medical information" are overblown, although not quite for the reasons she stated. The claims are overblown not because STRs have causal functions, but because they may have predictive utility. Under such circumstances, dismissive statements that forensic STRs "tell us nothing truly private," "have no meaning except as a representation of molecular sequences at . . . loci that are not indicative of an individual's personal traits or propensities," "are not socially or medically significant," "reveal nothing about propensities to disease, behavioral traits, or the like," are "like fingerprints or license plate numbers . . . only useful for identification purposes," or that a DNA profile "is very much like a social security number—though it is longer and is assigned by chance, not by the federal government" and "can tell nothing about a person" elide the distinction between causal function and associative utility. Such statements seem overbroad given that Kaye agrees that forensic STRs can be and are used to track disease. If not downright misleading, such statements at least have a high likelihood of being misunderstood by the lay public, and even by scholars, to mean that forensic STRs have no biological significance whatsoever and therefore pose no privacy threat, when in fact they do have some significance and may pose a threat.
III. On Genetic Exceptionalism
All that being said, I am not a genetic exceptionalist. In fact, I believe that about the worst thing we can do in the debates over the use of forensic genetic information is attribute exaggerated predictive powers to genes. I have elsewhere cautioned libertarian-leaning scholars against endorsing the predictive power of genes, even when attempting to alert people about the potential abuses of forensic genetic technologies. I believe that the ideology of genetic determinism poses a greater threat to liberty than does the technology of DNA databanking, and I could not make this point better than Kaye does when he writes, "[a] warrant requirement will not make much difference to a society that, under the sway of a naive and discredited theory of genetic determinism, is willing to lock people away on the basis of their genes."
One reason that I am not a genetic exceptionalist is that genetic exceptionalism incorrectly portrays fingerprints as devoid of hereditary information. As I have argued elsewhere, the widespread view of fingerprints as devoid of information stems from a social decision not to invest in research exploring correlations between fingerprint patterns and race, ethnicity, disease, and behavioral propensities, not from a biological absence of such correlations. In other words, it is not true that fingerprint patterns cannot be correlated with perceived ethnicity or even disease or behavioral characteristics. It is simply that we, as a society, have chosen not to make much of these correlations. Kaye is among the few scholars to correctly understand this admittedly counterintuitive point, and he is correct when he notes that "Joh's assertion that fingerprints 'cannot reveal any more information [than identity] about the person from whom they have been collected' is mistaken."
Thus, when we speak of "predictive medical value," we are actually speaking of correlations, and all sorts of somatic markers may have causal or non-causal correlations with all sorts of biological attributes: ancestry, disease, behavioral propensity, and the like. Some genetic markers have stronger correlations than fingerprint patterns, but the apparent "strength" of biological correlations derives only in part from nature. The strength of these correlations also reflects society's decision to invest scientific resources toward finding such correlations. Even though the strength of a particular correlation may be statistical fact, the strength of known correlations is to some extent a social fact, in that it is dependent on how and in what areas research investments are made in order to develop the strongest possible correlations. Today, of course, we invest heavily in genetic correlations—and little in fingerprint correlations.
In this sense, it is somewhat misguided for us to debate whether or not genetic information "has" predictive value because, to some extent, the answer will be determined by how hard we as a society try to impute predictive value to that genetic information. This is why I am worried about the increasing use of forensic DNA technology: not because I think that "junk DNA" has predictive value, but because I am worried that the widespread collection and social investment in such information will provide an irresistible temptation to treat it as if it does have such value. And the temptation will be to construct correlations along lines that have social resonance—which is to say, especially in the realm of criminal justice in the U.S. today, along racial lines.
Because of this temptation, Kaye's "optimistic" future in which "researchers will find alleles associated with propensities such as risk-taking that are more common in some groups than others, but such alleles will not be unique to any group" and, therefore, will not implicate race, is somewhat naïve. Professor Duster, for example, argues that researchers will be more likely to exploit group differences that pertain to socially resonant categories like race than non-resonant categories. Thus, even if stigmatized alleles are not unique to any group, variations in their appearance in different racial groups may still be exploited to suggest racial correlations with medical or behavioral propensities. Genetic exceptionalism, despite its best intentions, does not forestall such a future; it facilitates it.
One example of this can even be found for the forensic STRs. Phenotypic profiling, the prediction of the "race" of an unknown perpetrator, is already being used in criminal investigations. This practice, which Kaye has found constitutionally unobjectionable and, at least potentially, "statistically valid," would seem to belie the conception of forensic STRs as "not socially or medically significant." They "predict race." Of course, "race" is not thought to have any coherent biological meaning, so what they really predict is phenotypically perceived race or what Professor Hammonds calls "embodied" race. But, through "race," we are back to disease prediction because of the current resurrection of race in biomedical research. In short, "junk DNA," like "race," may lack biological meaning but have plenty of social meaning. Under such circumstances, it is a stretch to call it "junk."
The privacy threat posed by forensic STRs may not be great. However, if citizens concerned about genetic privacy are being asked to make policy decisions about the implementation of genetic databases, they should be clearly and completely informed about the potential privacy risks posed by forensic genetic databases. Calling forensic STRs "junk," "not socially or medically significant," or "as meaningless as fingerprints" does not inform clearly or completely. If some forensic STRs are correlated with genes that cause physical traits, though they do not cause the physical traits themselves, the public can be informed of that fact. The public can decide for itself whether and to what extent the privacy risk offsets the benefits of genetic databases. Blending a relative lack of risk into a claim of no risk at all may reassure the public in the short term, but in the long term, as the blurring of the facts becomes known, will only breed misunderstanding and mistrust.
*. Associate Professor of Criminology, Law & Society, University of California, Irvine; Ph.D. (science & technology studies), Cornell University; A.B., Princeton University. This material is partially based upon work supported by the National Science Foundation under Grant Nos. SES-0115305 and IIS-0527729 and the National Institutes of Health under Grant No. HG-03302. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation or the National Institutes of Health. For comments on an earlier draft of this manuscript, I am grateful to Norah Rudin, Lawrence Mueller, William C. Thompson, Michael Montoya, Jonathan Kahn, Troy Duster, David Kaye, Elizabeth Joh, and Richard McCleary. Responsibility for any errors is mine.
1. Elizabeth E. Joh, Reclaiming "Abandoned" DNA: The Fourth Amendment and Genetic Privacy, 100 Nw. U. L. Rev. 857 (2006) (link); David H. Kaye, Science Fiction and Shed DNA, 101 Nw. U. L. Rev. Colloquy 62 (2006), http://www.law.northwestern.edu/lawreview/colloquy/2006/7/ (link).
2. See generally DNA and the Criminal Justice System (David Lazer ed., 2004) (exploring the civil liberties implications of the development of DNA databases).
3. See George Annas, Genetic Privacy, in DNA and the Criminal Justice System, supra note 2, at 135–36; Elizabeth E. Joh, supra note 1, at 873.
4. Tania Simoncelli, Dangerous Excursions: The Case Against Expanding Forensic DNA Databases to Innocent Persons, 34 J.L. Med. & Ethics 390, 392 (2006).
6. See, e.g., Tania Simoncelli & Barry Steinhardt, California's Proposition 69: A Dangerous Precedent for Criminal DNA Databases, 33 J.L. Med. & Ethics 279, 288 (2005).
7. Mark A. Rothstein & Meghan K. Talbott, The Expanding Use of DNA in Law Enforcement: What Role for Privacy?, 34 J.L. Med. & Ethics 153, 158 (2006).
8. Barry Steinhardt, Privacy and Forensic DNA Data Banks, in DNA and the Criminal Justice System, supra note 2, at 173, 182.
10. These 13 STRs are sometimes called the CODIS STRs because they are the STRs used by the Combined DNA Index System administered by the FBI. All U.S. DNA databases use these CODIS STRs in order to be compatible with FBI standards.
11. Kaye, supra note 1, at 64.
12. Joh, supra note 1, at 870.
13. See, e.g., Richard Ingham, Landmark Study Prompts DNA Rethink, Discovery Channel, June 14, 2007, http://dsc.discovery.com/news/2007/06/14/genetics_hea_print.html (link); Colin Nickerson, DNA Study Challenges Basic Ideas in Genetics: Genome 'Junk' Appears Essential, Boston Globe, June 14, 2007; Kaye, supra note 1, at 64–65.
14. See, e.g., Akhil Reed Amar, The Supreme Court 1999 Term—Foreword: The Document and the Doctrine, 114 Harv. L. Rev. 26, 126 (2000) (link) ("[T]here is a clean way of protecting private information of this sort [disease predisposition] by using only part of the DNA code, so-called 'junk DNA,' that only identifies a person but tells us nothing truly private—the DNA equivalent of a fingerprint."); Lisa Schriner Lewis, The Role Genetic Information Plays in the Criminal Justice System, 47 Ariz. L. Rev. 519, 523 (2005) (link) ("[T]he use of junk or non-coding sequences means that while the obtained genetic profile can distinguish individuals, it does not reveal physical traits or genetic predisposition to diseases or conditions."); Randall S. Murch & Bruce Budowle, Are Developments in Forensic Applications of DNA Technology Consistent with Privacy Protections?, in Genetic Secrets: Protecting Privacy and Confidentiality in the Genetic Era 212, 224 (Mark A. Rothstein ed., 1997) ("The predisposition of a donor to one or more genetically induced conditions is generally not retrievable from forensic genetic data; only the potential for the individualization of a donor to the exclusion of all others (or an exclusion of the evidence sample) by the genetic information can be obtained.").
15. It should be noted, however, that the debate over the biological significance of DNA profiles is somewhat of a distraction from the larger issue of the threat to privacy posed by the banking of DNA samples. In every U.S. state except Wisconsin, the state is permitted to store the original biological samples, containing the full complement of genetic information. Tania Simoncelli, supra note 4, at 392. Even those who are sanguine about the privacy threat posed by profiles are concerned about the storage of samples. Michael E. Smith, Let's Make the DNA Identification Database as Inclusive as Possible, 34 J.L. Med. & Ethics 385, 387 (2006).
16. Joh, supra note 1, at 870.
17. Id. at n.74.
18. Pamela Sankar, DNA-Typing: Galton's Eugenic Dream Realized?, in Documenting Individual Identity 273, 285–86 (Jane Caplan & John Torpey eds., 2001); Steinhardt, supra note 8, at 173.
19. Kaye, supra note 1, at 64.
20. Id. at 65.
24. Id. (emphasis added).
25. Id. at 64.
26. Id. at 62–63.
27. See id. at 64; John M. Butler, Genetics and Genomics of Core Short Tandem Repeat Loci Used in Human Identity Testing, 51 J. Forensic Sci. 253, 260 (2006).
28. Id. at 260 (quoting Colin Kimpton et al., Report on the Second EDNAP Collaborative STR Exercise, 71 Forensic Sci. Int'l 137 (1995)).
29. David L. Faigman et al., Science in the Law: Standards, Statistics and Research Issues, 135 (2002).
30. Supra note 24 and accompanying text.
31. Joh, supra note 1, at 870.
32. Amar, supra note 14.
33. David H. Kaye & Michael E. Smith, DNA Databases for Law Enforcement: The Coverage Question and the Case for a Population-Wide Database, in DNA and the Criminal Justice System, supra note 2, at 247, 256.
34. D.H. Kaye, Who Needs Special Needs? On the Constitutionality of Collecting DNA and Other Biometric Data from Arrestees, 34 J.L. Med. & Ethics 188, 194 (2006).
35. Id. (emphasis added).
36. Telephone Interview by Congressional Quarterly with David Kaye (Sept. 16, 2003), available at http://www.law.asu.edu/?id=8606 (link). Kaye's use of the fingerprint analogy is complicated because Kaye recognizes the little-known point that fingerprints are not devoid of hereditary information, albeit not particularly useful information. See infra note 42 and accompanying text. Thus, when Kaye calls something (like a DNA locus) "as meaningless as fingerprints," David H. Kaye, Two Fallacies About DNA Data Banks for Law Enforcement, 67 Brook. L. Rev. 179, 188 (2001) (link), he may not necessarily mean that it actually is meaningless, only that it has little meaning. However, it must be recognized that most lay readers and policymakers and not a few scholars, inculcated in the popular view that fingerprints are empty signifiers, would read such an assertion to mean that the DNA locus is truly "meaningless."
37. D.H. Kaye & Michael E. Smith, DNA Identification Databases: Legality, Legitimacy, and the Case for Population-Wide Coverage, 2003 Wis. L. Rev. 413, 431 (2003). See also Michael E. Smith, Let's Make the DNA Identification Database as Inclusive as Possible, 34 J.L. Med. & Ethics 385, 387 (2006).
38. Kaye's view appears to be that we need not sound alarms unless we can foresee a clear pathway toward diagnostic uses of forensic STRs. Since such a pathway is not clear at this point in time, he sees little point in alarming the public, perhaps unnecessarily. This is perhaps simply a difference in policy preference, but I would tend to err on the side of informing the public of the potential risks. My view is largely informed by my sense that Kaye's subtle understanding of the science is likely to be misread as claiming that forensic STRs are completely meaningless and innocuous.
39. Simon A. Cole, The Myth of Fingerprints, 19 GeneWatch 6 (Nov.–Dec. 2006).
40. Kaye, supra note 1, at 66.
41. See Simon A. Cole, Suspect Identities: A History of Fingerprinting and Criminal Identification, 100–01 (2001).
42. David H. Kaye, supra note 36, at 185.
43. Kaye, supra note 1, at 64.
44. Kaye, supra note 1, at 67.
45. Troy Duster, Comparative Perspectives and Competing Explanations: Taking on the Newly Configured Reductionist Challenge to Sociology, 71 Am. Soc. Rev. 1, 10 (2006).
46. See, e.g., Pilar N. Ossorio, About Face: Forensic Genetic Testing for Race and Visible Traits, 34 J.L. Med. & Ethics 277 (2006).
47. Edward J. Imwinkelried & D.H. Kaye, DNA Typing: Emerging or Neglected Issues, 76 Wash. L. Rev. 413, 449 (2001).
48. Id. at 450.
49. D.H. Kaye, supra note 34, at 194.
50. Evelynn M. Hammonds, New Technologies of Race, in Processed Lives: Gender and Technology in Everyday Life 107 (Jennifer Terry & Melodie Calvert eds., 1997).
51. E.g., Duster, supra note 45; Jenny Reardon, Race to the Finish: Identity and Governance in the Age of Genomics (2005); Jonathan Kahn, How a Drug Becomes 'Ethnic': Law, Commerce, and the Production of Racial Categories in Medicine, 4 Yale J. Health Pol'y L. & Ethics 1 (2004); Michael J. Montoya, Bioethnic Conscription: Genes, Race, and Mexicana/o Ethnicity in Diabetes Research, 22 Cultural Anthropology 94 (2007).
52. D.H. Kaye, supra note 34, at 194.
53. David H. Kaye, supra note 36, at 188.
Copyright 2007 Northwestern University
Cite as: 102 Nw. U. L. Rev. Colloquy 54 (2007), http://www.law.northwestern.edu/lawreview/colloquy/2007/23/.
Persistent URL: http://www.law.northwestern.edu/lawreview/colloquy/2007/23/