Unravelling a killer’s DNA: Identity, genetic privacy and the Golden State Killer case

Originally posted 14 September 2018

In April 2018, it was reported that the so-called “Golden State Killer” (GSK) – a rapist and murderer who had terrorized northern California in the late 1970s and early 1980s – was finally tracked down and arrested using DNA evidence. A special “cold case” investigative unit extracted the suspect’s DNA from a rape kit from one of the decades-old crime scenes. This DNA was used to generate a profile on an online, open-source genealogy service called “GEDmatch.” The suspect’s DNA matched several individuals who appeared to be distant relatives of the killer and who shared a great-great-great-grandparent with him. By tracing the ancestry of these individuals, the investigators were able to identify this common ancestor. They then began to work forwards, reconstructing family trees of all the descendants of this individual using traditional genealogical methods including church records, newspapers, and other historical documents. Eventually police identified a living individual from these trees who matched the profile of the GSK. Detectives then confirmed that the DNA of this suspect matched the original rape kit before arresting Joseph James DeAngelo.

The details of the story have been set out in the news media (for example: Winton et al., 2018; Zhang 2018). Alongside this, a few commentators have also articulated worries about the “dystopian” (Selk 2018) implications of this form of investigation, especially the implications for privacy. In particular, the GSK case seems to suggest a future world in which we can all be tracked and traced via not only our own DNA, but also the DNA of our distant relations.

Scholars in science and technology studies (STS) have long been concerned with the social and political implications of new genetic and biotechnologies. Over the last several decades, they have articulated various notions of “biological citizenship” or “biocitizenship.” Rose and Novas (2005), for example, argue that biocitizenship encompasses “all those citizenship projects that have linked their conceptions of citizens to beliefs about the biological existence of human beings, as individuals, as families and lineages, as communities, as populations, races, and as a species.” Although these ideas have roots in the nineteenth and twentieth centuries, Rose and Novas suggest that we are experiencing new forms of biocitizenship in the twenty-first century. These forms are rooted not only in new biotechnologies but also in “biovalue” (Waldby 2002) and “biosociality” – namely, in the fact that biological objects (including human bodies) are the grounds for many new forms of value as well as new ways of grouping humans in the twenty-first century. Rather than linking us only to our immediate families and communities, new biotechnologies (and their digital penumbra) afford us possibilities of new connections to others based on DNA, disease status, and so on.

I suggest that the “solving” of the GSK case was not some sort of fortunate or unfortunate side effect of the technologies of genetics, genomics, and DNA data-banking. Rather, the ability to access and generate the kinds of data necessary for this detective work is exactly what these technologies have been designed to achieve. From an STS perspective, we might say that the GSK case manifests, in an especially potent form, the new forms of biocitizenship and genetic identity that have been emerging from new genetic and biotechnologies for several decades.

I argue here that the threat posed by the GSK case is part of the broader social and political problem of unrestricted sharing and circulation of personal data. Mitigating the “dystopian” risks of such data sharing will require, amongst other things, re-thinking notions of privacy in far more “collective” terms that offer possibilities for placing limits on such data circulation.

Private companies, public data

The possibility of the kind of forensic investigation involved in the GSK case is, of course, premised on the widespread existence of personal genetic information. In particular, it is premised on the fact that a large enough proportion of the population has participated in genetic profiling; this, in turn, has been enabled by the availability of cheap DNA profiling technology and aggressive marketing by firms such as Ancestry.com and 23andMe.

The majority of these profiles have been collected by these private companies and are stored in their proprietary databases. In the first instance, then, the GSK case is tied to the personal genomics industry and the forms of value it has created – this includes both the financial value associated with these products as well as the perceived social and medical value of knowledge about individual health and individual ancestry that the companies sell.

However, the GSK investigation did not directly utilize such proprietary databases. Although information in such databases could potentially be tapped by law enforcement with a warrant or subpoena, 23andMe claims that it has never shared any of its data under such circumstances (Balsamo 2018; see also: https://www.23andme.com/law-enforcement-guide/). Instead, the investigation relied on the fact that individuals had chosen to make their data available on public, online databases such as GEDmatch. In most cases, this means that individuals have downloaded their data from commercial private websites and then uploaded it into public ones.

This is done in the hope that public, open source data may provide additional, valuable information about personal health and ancestry that is not available on the more heavily regulated commercial sites. In other words, it draws directly on the notion that one’s personal genome contains information of value – value that may be realized by the individual. One review of GEDmatch on the blog “The Legal Genealogist” notes the reason for using the site: “Gedmatch offers a range of utilities that make it a little easier to extract every bit of potentially useful information out of your autosomal test results” (Russell 2017).

Police investigators were able to use this open information to find links between the crime-scene DNA and individuals who had voluntarily uploaded their data. The ability to carry this through is the result of attempts by individuals to capitalize on the value of their own investment in personal genomics and in their DNA by sharing it publicly. Many of the concerns about personal genomics have centered on the possibilities of breaches and illegitimate uses of genomic data by insurance companies or employers. The use of this “open” data in the GSK case suggests that risks emerge not only from malfeasance, but also from the broader conception of DNA as value. At least for some people, the potential realization of this value justifies the risk of harms.

New relations

The GSK investigation also depended on the exploitation of new social groupings that emerge from new biotechnologies. Specifically, sites such as GEDmatch are designed to expose and make legible links based on similarities in autosomal DNA. Many of the tools on the website – such as 2D and 3D chromosome browsers – are explicitly designed for this purpose. Charts show “selected matches not only as they match you but as they match each other, including a very useful option for displaying estimated distance to the most recent common ancestor” (Russell 2017). This is exactly what investigators needed to do their work. The existence of these ready-made tools, as well as the plausibility of this sort of matching, suggests how deeply the GSK case is embedded within regimes of biosociality in which we are increasingly linked to other individuals via DNA and other forms of personal data.

These new “digital” links bring some relations to the fore and push others into the background. Kim TallBear’s (2013) work has examined how DNA and DNA testing have been used to remake notions of tribe and identity in Native American communities in the US. These regimes privilege some forms of relatedness (namely, genetic ones) while other kinds of community and historical ties are undermined. Tracking suspects via DNA matching similarly acts to increase the significance of genetic ties (cousins, uncles, great-grandparents) at the expense of other possible conceptions of family, identity, and community.

Connections

The re-energization of these distant family ties brings into focus the most troubling aspects of the GSK case. First, the investigation involved persons, living and dead, totally unconnected to the crimes. These individuals were likely involved wholly without their knowledge or consent. Second, the evidence that eventually ensnared the GSK came not from the killer himself, but from these other individuals. What is perhaps most shocking to people reading about this case is that we can be implicated not by what we ourselves have shared, but by what has been shared by others. In this case, these others are sufficiently distant relatives to be total strangers.

These kinds of connections and linkages are not incidental – they are central to the ways in which genetic technologies do their work, drawing geographically and temporally distant people into social and political proximity with one another. They are directed precisely towards the formation of groupings and the increasing investment of meaning in such groupings.

Moreover, these methods rely on the basic facts of genetic inheritance: that we share, on average, about half our DNA with our closest relatives (parents, children, siblings), one quarter with grandparents, grandchildren, uncles, and aunts, one eighth with great-grandparents, great-grandchildren, first cousins, and so on. Such overlaps mean that whatever we choose to reveal about ourselves also reveals significant amounts of information about others. This seems to make DNA quite different from other kinds of personal data. A social security number can be changed, but one’s DNA cannot – we cannot, as far as current technology allows anyway, dissociate ourselves from it.
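The arithmetic behind these fractions can be sketched in a few lines of Python. This is only an illustration of the halving rule described above, not a description of how GEDmatch or the investigators computed matches; the function names are hypothetical, and the values are averages that ignore the randomness of recombination (actual sharing between distant relatives varies, and can fall to nothing detectable).

```python
from fractions import Fraction


def lineal_fraction(generations: int) -> Fraction:
    """Expected share of autosomal DNA in common with a direct ancestor
    or descendant `generations` steps away (1 = parent or child)."""
    if generations < 1:
        raise ValueError("generations must be at least 1")
    # Each parent-child step halves the expected shared fraction.
    return Fraction(1, 2) ** generations


def full_cousin_fraction(degree: int) -> Fraction:
    """Expected share in common between full cousins of the given degree
    (1 = first cousins, who share a pair of grandparents)."""
    if degree < 1:
        raise ValueError("degree must be at least 1")
    # Cousins of degree n share an ancestral couple n + 1 generations up.
    # There are two connecting paths (one through each member of the couple),
    # and each path crosses 2 * (n + 1) parent-child steps.
    steps_per_path = 2 * (degree + 1)
    return 2 * Fraction(1, 2) ** steps_per_path


if __name__ == "__main__":
    for g, label in [(1, "parent"), (2, "grandparent"), (3, "great-grandparent")]:
        print(f"{label}: {lineal_fraction(g)}")
    for n in range(1, 5):
        share = full_cousin_fraction(n)
        print(f"cousin of degree {n}: {share} ({float(share):.3%})")
```

On this simplified model, relatives who share a great-great-great-grandparent (fourth cousins, if descended from the same ancestral couple) have on average only about 0.2 percent of their DNA in common. That such a small overlap was enough to begin an identification underlines how far the reach of a single uploaded profile extends.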

The main risks that seem to be associated with the GSK case, then, are linked to the fact that this kind of DNA police work ties us irrevocably to others; surveil them and you are surveilling us too. It links our privacy directly to the privacy of others and it makes us dependent on the actions of others with whom we may have little or nothing in common. This poses a quandary in a culture in which notions of privacy are based on individual rights. Within the American legal tradition in particular, privacy is associated with the individual self, not families, communities, or other groups (Warren and Brandeis, 1890). As the GSK case makes clear, however, DNA extends beyond the boundaries of the individual. Not only does it stretch to distant relatives, but it also extends across time: both ancestors and descendants share significant fractions of our DNA. A choice to upload our own DNA to a site such as GEDmatch also implicitly makes a decision about the levels of privacy that our children and great-grandchildren may be entitled to expect. Do we have the right to make such a decision on behalf of others?

Collective privacy

Dealing with DNA privacy requires a different conception of privacy that moves beyond the individual. Articulating a notion of “collective privacy” is likely to be the only means through which we can mitigate some of the risks associated with DNA technologies. According to one definition, “Collective privacy is a means to protect data and information about groups of people rather than individuals, and is a key issue when data collection occurs on identifiable groups that wish to manage or control information about themselves” (Kamira, 2006, p. 44). This seems pertinent to the new “identifiable groups” drawn together through DNA technologies.

Notions of collective privacy have been developed in two separate domains of scholarship. First, collective privacy has been articulated as a way to protect the rights of indigenous communities against aggressive regimes attempting to appropriate DNA and other bio-samples. In the Human Genome Diversity Project (HGDP), for example, human geneticists attempted to collect blood samples from isolated populations throughout the globe in order to create a record of human genetic diversity (Reardon 2004). During the project, questions of informed consent became paramount: who within a community had the ability to give consent? Should one person be able to speak for many?

Indigenous communities worried that unique patterns in their DNA might be appropriated to create biological products, to undermine claims to rights or land, or even to target weapons or diseases against them. DNA taken from any one individual within a group might have consequences for the whole community. As such, these communities have long had to grapple with notions of collective privacy for tackling the challenges presented by applications of biotechnology and biomedicine.

The second domain in which collective privacy has been explored in detail is the study of online social media and big data. Here, users often share photos, videos, or other media that include private information not only about themselves but about family, friends, co-workers, and so on. “It is not simply user-generated data that has caused the possible privacy breaches; rather, the networked nature of such user data from users and their social ties, and the uncoordinated actions of individuals (both in terms of information sharing and information management), lead to the new privacy challenge” (Jia and Xu 2016). In the domain of big data, the demarcation of groups according to various algorithmic analytics creates risks of discrimination on a collective level that may not exist on the individual level (Mantelero 2016).

Circulations of data

“Collective privacy” suggests that we should have some say over what happens to data that can affect our lives, even if we do not “own” it directly. It recognizes that societies have collective interests that go beyond the aggregation of personal confidentiality and private interest. In her recent account of the uses and threats of big data, Cathy O’Neil (2017) argues that systems and algorithms that collect personal information about us (our habits, shopping, etc., online and offline) need to be open to scrutiny. Without such oversight, algorithms and the companies that control them will have created “weapons of math destruction” that discriminate against the poor and minorities, as well as likely being unfair for almost everyone else. They do this by aggregating information about our past behavior into patterns that place us into broad categories: “unlikely to make loan payments on time,” or “depressed and vulnerable,” for instance.

Like our DNA, these patterns and categorizations implicitly connect us to people we probably will never know. Like our DNA, personal data may be put to use in ways that are unconnected to the original purpose for which they were collected. Like our DNA, they may be acting to our disadvantage in ways we will never know. O’Neil’s solution is to make the ways in which these data are collected and used more transparent. Another complementary possibility is that we retain some level of control over data; this may be impossible at an individual level, but as communities, groups, or as a society we might have more success in controlling what can and cannot be done with data. Rather than treating data privacy (whether online shopping or DNA) as an individual problem, we need to think about ways to act collectively to protect our rights and to set limits on what may and may not be permitted. As in the realm of big data, this may include “audits” of data trails, greater oversight over those using data, and more regulations about when and how data can be circulated or shared. Collective ownership over data implies an ongoing responsibility and stewardship over data.

While the arrest of the GSK is no doubt a good outcome, the kinds of data circulations that undergird this investigation raise serious questions about how else we may all be linked and sorted via our DNA in the future. The commercial regimes emerging around genetic and biotechnologies have vested DNA with value – this value is conceptualized as something that can be captured, bought, and sold by the individual. However, these regimes also leave companies, government agencies, and law enforcement largely free to capture this value in a wide variety of ways through the sharing and circulation of DNA data. Such circulations potentially leave all of us vulnerable – we have little control or say in how such data may be used in the future. This is the core of the worry that led RAICES to reject offers from DNA testing companies to help re-unite immigrant children with their parents: the short-term outcomes may be positive but it is unclear how such data may circulate in the future (Kofman 2018). By conceptualizing DNA data as a collective, rather than individual, good we may be in a better position to realize the benefits of DNA testing while guarding against potential future harms.

References

Balsamo, M. (2018) A look at DNA testing that ID’d a suspected serial killer. Washington Post, 26 April. https://www.washingtonpost.com/national/health-science/a-look-at-dna-testing-that-idd-a-suspected-serial-killer/2018/04/26/

Jia, H. and Xu, H. (2016) Measuring individuals’ concerns over collective privacy on social networking sites. Cyberpsychology: Journal of Psychosocial Research on Cyberspace 10, no. 1: article 4.

Kamira, R. (2006) Kaitiakitanga and health informatics: introducing useful indigenous concepts of governance in the health sector. In: Dyson, L. E., Hendriks, M., and Grant, S., eds. Information Technology and Indigenous People. Hershey, PA: Idea Group. Pp. 30-51.

Kofman, A. (2018) DNA testing may help reunite families separated by Trump. But it could create a privacy nightmare. The Intercept, 27 June. https://theintercept.com/2018/06/27/immigration-families-dna-testing/

Mantelero, A. (2016) From group privacy to collective privacy: towards a new dimension of privacy and data protection in the big data era. In: Taylor, L., Floridi, L., van der Sloot, B., eds. Group Privacy. Philosophical Studies Series, 126. Springer. Pp. 139-157.

O’Neil, C. (2017) Weapons of math destruction: how big data increases inequality and threatens democracy. New York: Broadway Books.

Reardon, J. (2004) Race to the finish: Identity and governance in an age of genomics. Princeton: Princeton University Press.

Rose, N. and Novas, C. (2005) Biological Citizenship. In: Ong and Collier, eds. Global Assemblages: Technology, Politics, and Ethics as Anthropological Problems. Oxford: Blackwell. Pp. 439-463.

Russell, J.G. (2017) Updated look at GedMatch. The Legal Genealogist, weblog, 26 March. https://www.legalgenealogist.com/2017/03/26/updated-look-at-gedmatch/

Selk, A. (2018) The ingenious and ‘dystopian’ DNA technique police used to hunt the ‘Golden State Killer’ suspect. Washington Post, 28 April. https://www.washingtonpost.com/news/true-crime/wp/2018/04/27/golden-state-killer-dna-website-gedmatch-was-used-to-identify-joseph-deangelo-as-suspect-police-say/

TallBear, K. (2013) Native American DNA: Tribal belonging and the false promise of genetic science. Minneapolis: University of Minnesota Press.

Waldby, C. (2002) Stem cells, tissue cultures and the production of biovalue. Health: an interdisciplinary journal for the social study of health, illness, and medicine 6, no. 3: 305-323.

Warren, S.D., and Brandeis, L.D. (1890) The right to privacy. Harvard Law Review 4, no. 5: 193–220.

Winton, R., Lien, T., St. John, P., and Oreskes, B. (2018) The first step in finding the Golden State Killer suspect: finding his great-great-great grandparents on genealogy site. Los Angeles Times, 27 April. http://www.latimes.com/local/lanow/la-me-golden-state-dna-match-20180427-story.html

Zhang, S. (2018) How a genealogy website led to the alleged Golden State Killer. The Atlantic, 27 April. https://www.theatlantic.com/science/archive/2018/04/golden-state-killer-east-area-rapist-dna-genealogy/559070/