« "The Stockbridge-Munsee Casino Proposal And The Potential For An Economic Turnaround In Sullivan County" by Benjamin Kern | Main | "Rape By Fraud, Deception, Or Impersonation - An Addition To New York's Penal Law: Rape In The First Degree Statute" by Daniel J. Slomnicki »

"Big Data And Criminal Law" by Samuel Yellen

Big Data And Criminal Law

by Samuel Yellen

Introduction: Big Data as a Three Step Process

Law enforcement agencies today can collect, store, and quickly analyze more data than ever before. This trend of rapidly analyzing large amounts of data in business and government is known as Big Data. It is the direct result of the convergence of two trends in computing: the ability to collect and store large amounts of information, and the widespread availability of inexpensive computing resources that permit calculations and predictions to be made on a large scale. Big Data is a significant trend in government. Over the past decade, there has been a series of revelations relating to the government's increasing appetite for information, and constituents are demanding more efficient work from their law enforcement agencies. See, e.g., Kenneth Cukier, Data, Data Everywhere, ECONOMIST, Feb. 27, 2010, at 3-5.

In order to illustrate the problems that Big Data poses for criminal law, I look at Big Data as a process: first, the collection of information; second, the storage of that information; and third, the use of that information. Accordingly, I have written this article in three parts, each of which corresponds to a step in the Big Data process.

I. Collection and Surveillance

The first step in the Big Data process is collection and surveillance. One of the challenges facing courts and lawmakers is where to draw the line between the individual's Fourth Amendment right to be free from unreasonable search and the government's interest in collecting and using information. The government has historically been restricted from collecting information both by the Constitution and by statutes such as the Wiretap Act, the Electronic Communications Privacy Act (ECPA), and the Stored Communications Act. The courts have struggled with how to adapt the Fourth Amendment's concept of reasonableness to new technologies such as GPS trackers, smartphones, and the Internet. The flexibility inherent in the idea of reasonableness has permitted the Fourth Amendment to stay relevant as technology has changed.

Unfortunately, there is not yet a generally accepted doctrine that meets the particular challenges presented by such new technologies, namely the extent to which users willingly and unwillingly disclose personal information to third parties, and the high level of detail such technologies provide about an individual's private life. Under the third-party doctrine of United States v. Miller, an individual has no expectation of privacy in information shared with a third party. For example, emails and web searches may reside with a third party such as Google or Microsoft, and under Miller the government may lawfully obtain them without a warrant. The second issue, whether highly particularized information collected by the government may constitute an unreasonable search, was taken up in United States v. Jones.

In United States v. Jones, the United States Supreme Court declined to update the reasonable expectation of privacy for the Internet age, although several members of the Court appeared ready to do so. In Jones, the police placed a GPS tracking device on the defendant's car without a warrant. The defendant appealed on the grounds that this violated his Fourth Amendment right against unreasonable search and seizure. The government argued that the surveillance was reasonable under the public-view doctrine: the GPS only tracked his movements on public roads, and agents could have lawfully observed him as he drove. The GPS device merely made their job easier, though it was also far more effective than a team of agents could have been. The D.C. Circuit Court of Appeals relied on a mosaic theory of privacy, under which a technology that gives officers too high a level of particularity about a defendant's movements violates his Fourth Amendment rights:

"Prolonged surveillance reveals types of information not revealed by short-term surveillance, such as what a person does repeatedly, what he does not do, and what he does ensemble. These types of information can each reveal more about a person than does any individual trip viewed in isolation." United States v. Maynard, 615 F.3d 544, 562 (D.C. Cir. 2010), aff'd in part sub nom. , United States v. Jones, 132 S. Ct. 945 (2012).

The Supreme Court was not entirely hostile to the mosaic theory, but the majority opinion written by Justice Scalia rested on a trespass theory. It held that by attaching the GPS device to the defendant's car, the police physically intruded on his property, and that intrusion constituted a search. Jones, 132 S. Ct. at 949. See, e.g., David Gray et al., Fighting Cybercrime After United States v. Jones, 103 J. CRIM. L. & CRIMINOLOGY 745 (2013).

Notions of reasonable privacy have changed before when confronted with new technology. In the well-known case of Katz v. United States, the Supreme Court ruled that a man using a pay phone had a reasonable expectation of privacy while on the phone, and therefore surveillance of his phone conversation was unconstitutional. 389 U.S. 347 (1967). The Katz decision overturned what had been the prevailing notion of privacy since Olmstead v. United States. See 277 U.S. 438, 457, 464, 466 (1928). In Olmstead, the Court applied a trespass standard and held that wiretapping was constitutional because the phone tap was installed down the road from the defendant's house. Id.

While the Katz decision created a privacy formulation that met the needs of a new technology, telephone communications, it is much harder to find an appropriate model of privacy to fit today's environment of constant sharing. There is no one right answer to the problem of privacy in today's environment. Big Data enthusiasts and optimists argue for rewriting privacy laws to allow society to gain the full benefits of computerized analysis, while other commentators disagree. Omer Tene & Jules Polonetsky, Big Data for All: Privacy and User Control in the Age of Analytics, 11 NW. J. TECH. & INTELL. PROP. 239 (2013); cf. Paul Ohm, The Underwhelming Benefits of Big Data, 161 U. PA. L. REV. ONLINE 339, 346 (2013).

In Jones, Justice Alito notes that traditional notions of privacy do not map neatly onto today's interconnected world, because people receive benefits in exchange for sharing information with third parties:

"The Katz test rests on the assumption that this hypothetical reasonable person has a well-developed and stable set of privacy expectations. But technology can change those expectations. Dramatic technological change may lead to periods in which popular expectations are in flux and may ultimately produce significant changes in popular attitudes. New technology may provide increased convenience or security at the expense of privacy, and many people may find the tradeoff worthwhile."

Jones, 132 S. Ct. at 962.

Ultimately, Justice Alito suggests that defining privacy expectations is a task best left to the legislature. Id. at 964 (citing Orin S. Kerr, The Fourth Amendment and New Technologies: Constitutional Myths and the Case for Caution, 102 MICH. L. REV. 801, 804-05 (2004)).

The challenge of balancing the Fourth Amendment right to privacy with the government's interest in policing effectively has led some commentators to propose that we view privacy not as an absolute right, but in instrumentalist terms. They argue that there are costs and benefits associated with infringements on privacy, and that we should simply perform a cost-benefit analysis to determine whether a given act of surveillance should be allowed. Steven Penney, Reasonable Expectations of Privacy and Novel Search Technologies: An Economic Approach, 97 J. CRIM. L. & CRIMINOLOGY 477, 480 (2007). Whether based on a weighing of rights and interests or on an instrumentalist approach, courts and the legislature have yet to specifically delineate the bounds of privacy in the present age of Big Data.

II. Storing Information: Using Databases

After the government collects information, the next logical step is for it to store the information in a database. Once the information is stored, law enforcement officers may query it, or certain services may be refused based on a match, as with the "No Fly List." A number of commentators have observed that no coherent legal framework yet exists for government databases, one that would address such issues as when they should be used and how they should be maintained.

Fred H. Cate proposes such a legal framework. He suggests that there is already substantial agreement about the key features a legal framework for data mining should have: it should require agencies to verify their programs' effectiveness, keep records updated, and include some measure of judicial oversight. Fred H. Cate, Government Data Mining: The Need for A Legal Framework, 43 HARV. C.R.-C.L. L. REV. 435, 487-88 (2008). The closest we have come legislatively is the Privacy Act, passed in 1974. Cate explains that the Privacy Act requires agencies to "(1) store only relevant and necessary personal information and only for purposes required to be accomplished by statute or executive order; (2) collect information to the extent possible from the data subject; (3) maintain records that are accurate, complete, timely, and relevant; and (4) establish administrative, physical, and technical safeguards to protect the security of records." Id. at 464-65. Unfortunately, many of the most critical databases, such as the FBI's NCIC database, have been exempted from the requirements of the Privacy Act. 5 U.S.C.A. § 552a(j)-(k) (2013).

In addition to the absence of a legal framework, it is very difficult for someone incorrectly included in a database to challenge that determination. A person may be added to a database without a hearing. One of the most obvious examples of a database affecting people's rights is the government's No Fly List, and Anya Bernstein has commented on the ways such lists negatively affect those listed. In some cases a person's only remedy may be to sue for expungement rather than to correct a specific mistake in the database, and even then it can be difficult for a plaintiff to establish standing. See Bernstein at 511. Bernstein argues that agencies often treat false positives as costless; that is, identifying someone as a terrorist when he is not imposes no cost on the agency itself. Bernstein at 463. A good legal framework for government databases would incentivize agencies to maintain accurate databases and to minimize both false positives and false negatives.
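Bernstein's point about false positives can be made concrete with a toy cost model (the scenario, numbers, and function are invented for illustration; they describe no real agency's accounting): if the agency's own ledger assigns zero cost to wrongly listing an innocent person, listing always looks "optimal," whereas an accounting that counts the burden on the wrongly listed person can flip the decision.

```python
# Toy illustration: expected cost of a watchlist decision under two
# accounting schemes. All numbers are invented for illustration.

def expected_cost(p_threat, cost_miss, cost_false_positive, list_person):
    """Expected cost of listing (or not listing) a person whose
    probability of being a genuine threat is p_threat."""
    if list_person:
        # A cost arises only if the person is innocent (false positive).
        return (1 - p_threat) * cost_false_positive
    # A cost arises only if the person is a genuine threat (a miss).
    return p_threat * cost_miss

p = 0.01  # a flagged person with a 1% chance of being a genuine threat

# Agency accounting: false positives are treated as costless.
agency_list = expected_cost(p, cost_miss=1000, cost_false_positive=0, list_person=True)
agency_skip = expected_cost(p, cost_miss=1000, cost_false_positive=0, list_person=False)
print(agency_list < agency_skip)  # True: listing always "wins"

# Social accounting: the burden on the wrongly listed person counts too.
social_list = expected_cost(p, cost_miss=1000, cost_false_positive=50, list_person=True)
social_skip = expected_cost(p, cost_miss=1000, cost_false_positive=50, list_person=False)
print(social_list < social_skip)  # False: at 1%, listing is no longer cheaper
```

The design point is simply that whichever costs enter the formula drive the agency's incentives, which is why a legal framework that imposes a cost for inaccuracy matters.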

III. Using the Data: Predicting Crime

The promise of Big Data is that it may allow police departments to identify patterns and use them to more effectively prevent crime and intercept inchoate crimes. But making arrests based on patterns may offend the Constitution, especially when a computer-generated prediction treats a certain class of individuals differently. Andrew Guthrie Ferguson has examined many of these types of prediction analyses. He explains that local law enforcement agencies have had some success with property crimes, because once one such crime occurs it is likely that a similar one will occur nearby. See Andrew Guthrie Ferguson, Predictive Policing and Reasonable Suspicion, 62 EMORY L.J. 259, 281 (2012). Already, several communities are using predictive policing methods to determine areas where crime is likely to occur and to police them more closely. See, e.g., Predictive Policing: Don't Even Think About It, ECONOMIST, July 20, 2013.
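The hot-spot approach described above can be sketched in a few lines (a deliberately simplified illustration; the incident coordinates, grid, and function are invented, and real systems are far more sophisticated): count recent property-crime incidents per map grid cell and flag the heaviest cells for extra patrol.

```python
# Toy sketch of hot-spot style predictive policing for property crime:
# count recent incidents per grid cell and flag the heaviest cells.
from collections import Counter

incidents = [  # (x, y) grid cells of recent burglaries (invented data)
    (1, 2), (1, 2), (2, 2), (1, 2), (8, 9), (1, 3), (2, 2),
]

def hot_spots(incidents, top_n=2):
    """Return the top_n grid cells ranked by recent incident count."""
    counts = Counter(incidents)
    return [cell for cell, _ in counts.most_common(top_n)]

print(hot_spots(incidents))  # [(1, 2), (2, 2)]
```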

While property crimes are perhaps a more straightforward use of predictive crime-fighting technology, communities are working to put more technology in the hands of the officer on the beat. To illustrate some of the potential constitutional pitfalls of predictive policing, it is helpful to imagine a hypothetical traffic stop. Imagine that after a traffic stop a police officer queries an electronic program which takes into account the time, the location of the stop, the make and model of the car, and the age of the driver. Based upon an unknown algorithm, the program recommends that the officer search the car. If the search of the car yields drugs, was the search constitutional under the Fourth Amendment? If it later becomes known that the algorithm disproportionately affects a certain race, does it violate the Equal Protection Clause? What if the machine prints out a receipt stating that the probable cause for the search was 25% based on the model of the car, 25% on the time of the stop, and 50% on the location?
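One way to picture the hypothetical program and its receipt is as a weighted score (every factor, weight, threshold, and input value below is invented for illustration; no real police system is described):

```python
# Hypothetical sketch of the stop-scoring program in the traffic-stop
# hypothetical. All factors, weights, and thresholds are invented.

FACTOR_WEIGHTS = {          # contribution of each factor to the score
    "car_model": 0.25,
    "time_of_stop": 0.25,
    "location": 0.50,
}

def score_stop(factors):
    """Combine per-factor risk values (0.0-1.0) into one score, and
    produce the 'receipt': each factor's share of the final score."""
    score = sum(FACTOR_WEIGHTS[f] * v for f, v in factors.items())
    receipt = ({f: FACTOR_WEIGHTS[f] * v / score for f, v in factors.items()}
               if score else {})
    return score, receipt

SEARCH_THRESHOLD = 0.5  # invented cutoff for recommending a search

score, receipt = score_stop(
    {"car_model": 0.8, "time_of_stop": 0.6, "location": 0.9})
print(score >= SEARCH_THRESHOLD)  # whether a search is recommended
print(receipt)                    # the memorialized factor breakdown
```

The receipt in the hypothetical corresponds to the per-factor shares this sketch computes; memorializing them is precisely what would allow later review of the system's reasoning.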

The Supreme Court has not yet ruled on whether a statistical profile may be sufficient grounds for probable cause under the Fourth Amendment. Although it has heard at least six cases in which the Drug Enforcement Administration (DEA) Drug Courier Profile was a basis for probable cause, in each case it looked only at the factors underlying the profile. See Daniel J. Steinbock, Data Matching, Data Mining, and Due Process, 40 GA. L. REV. 1, 29-30 (2005) (internal citations omitted). The Drug Courier Profile is not perfectly analogous to a computerized profile, because the Drug Courier Profile is much simpler: the agent is given a list of criteria to look for and then makes his own determination. A computerized model has the potential to be much more accurate than human experts. See Grove et al., Clinical Versus Mechanical Prediction: A Meta-Analysis, 12 PSYCHOLOGICAL ASSESSMENT 19, 19-30 (2000), http://www.ncbi.nlm.nih.gov/pubmed/10752360 (mechanical prediction typically as good as or better than human judgment across 146 types of assessments).

One of the difficult questions raised by predictions and other statistical approaches is determining when a prediction unfairly affects a class of the population. Under the Supreme Court's analysis in Whren v. United States, a traffic stop supported by probable cause cannot be challenged under the Fourth Amendment on the ground that race improperly motivated it; such a claim may be brought only under the Equal Protection Clause. Significant research has been done on whether police disproportionately stop minority motorists and whether such disproportionality evinces racism. Some economic models have been proposed that look at the connection between stops made and arrests made to determine whether the disproportionality is justified. Bernard E. Harcourt, Rethinking Racial Profiling: A Critique of the Economics, Civil Liberties, and Constitutional Literature, and of Criminal Profiling More Generally, 71 U. CHI. L. REV. 1275, 1306-07 (2004).
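The basic intuition behind these economic models can be shown with a toy "hit rate" comparison (the groups, counts, and function are invented for illustration, not drawn from any real dataset): if searches of each group turn up contraband at the same rate, the disproportion in search counts is at least consistent with officers responding to evidence rather than to group membership; sharply unequal hit rates suggest the opposite.

```python
# Toy illustration of an outcome ("hit rate") test: compare the share
# of searches that actually find contraband across groups. All counts
# are invented for illustration.

def hit_rate(searches, hits):
    """Fraction of searches of a group that found contraband."""
    return hits / searches

stops = {
    # group: (searches conducted, searches finding contraband)
    "group_a": (200, 50),
    "group_b": (400, 100),
}

rates = {g: hit_rate(s, h) for g, (s, h) in stops.items()}
print(rates)  # equal 0.25 hit rates, despite group_b being searched twice as often
```

This is only the simplest form of such a test; the literature cited above debates at length what equal or unequal hit rates actually prove.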

One problem that may occur is that a law enforcement agency bases its actions on a criterion that is highly correlated with race. For example, in New York City's controversial Stop and Frisk program, targeting neighborhoods with higher crime also led to a disproportionate number of stops of minorities. New York commissioned two different reports on its policing and reached two different conclusions, which illustrates the difficulty of determining racial bias through statistics. See, e.g., Dasha Kabakova, The Lack of Accountability for the New York Police Department's Investigative Stops, 10 CARDOZO PUB. L. POL'Y & ETHICS J. 539, 560 (2012).

In the hypothetical above, transparency can go a long way toward addressing the constitutional issues raised by predictive policing. If the DEA Drug Courier Profile cases are any indication, a prediction may survive a Fourth Amendment challenge if the factors underlying the decision are explained. To survive a claim under the Equal Protection Clause, a police department could offer expert testimony as to whether the system unfairly applies the law. Finally, a receipt would be helpful because it memorializes the decision-making of the prediction system at the time the arrest or search was made.

First Amendment Challenges

In addition to the Fourth and Fourteenth Amendment challenges, Big Data techniques may also offend the First Amendment rights of assembly and expression. For example, in May 2013, it came to light that the IRS was scrutinizing the applications of certain political groups likely to be part of the Tea Party. The problem arose because the IRS created an algorithm to select which applications to scrutinize, and that algorithm singled out certain types of political groups, potentially offending their First Amendment right to freedom of expression. See Hans A. von Spakovsky, Protecting the First Amendment From the IRS, HERITAGE FOUNDATION, Oct. 2, 2013.

Even if the IRS employees conducting the reviews were not acting maliciously, it is clear that the public was uncomfortable with the notion of treating groups differently based on their political orientation. If we were to apply some of the criteria used for race-based encounters (i.e., are the offending rates different, do the enforcement actions lead to similar rates of arrest), then the IRS might be able to legitimately increase scrutiny of a certain political group. For example, hypothetically speaking, were there a political party that actively encouraged tax fraud, the IRS might legitimately scrutinize that group's members for tax fraud, but it might have to prove that its system for selecting applications for scrutiny was empirically supported.

Commentators have also pointed to freedom of association as being vulnerable to government infringement through Big Data. See Katherine J. Strandburg, Freedom of Association in A Networked World: First Amendment Regulation of Relational Surveillance, 49 B.C. L. REV. 741 (2008). Software companies regularly sell the government products promising social network analysis. These types of programs examine many interactions between individuals and entities, whether through call logs, websites, emails, or financial transactions, in order to quickly find the salient connections for government agencies. See, e.g., Andy Greenberg, How a 'Deviant' Philosopher Built Palantir, A CIA-Funded Data-Mining Juggernaut, FORBES, Sept. 2, 2013. The examination of networks of friends and associates may be a very powerful tool for law enforcement agencies seeking to disrupt terrorists or organized crime. At the same time, courts and the legislature must protect the individual rights of law-abiding citizens. If law enforcement agencies are transparent about their programs and their intentions, then the boundaries of such programs can be informed by a healthy public debate.
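At its core, the link analysis described above builds a graph from pairwise interactions and ranks individuals by connectedness. A minimal sketch (the call records and names are invented, and commercial products are vastly more sophisticated):

```python
# Minimal sketch of social network (link) analysis: build a graph from
# pairwise interactions and find the most-connected individual. All
# names and call records are invented for illustration.
from collections import defaultdict

calls = [  # (caller, callee) pairs drawn from hypothetical call logs
    ("alice", "bob"), ("alice", "carol"), ("alice", "dave"),
    ("bob", "carol"), ("eve", "dave"),
]

# Undirected adjacency: each person maps to their set of contacts.
graph = defaultdict(set)
for a, b in calls:
    graph[a].add(b)
    graph[b].add(a)

# Degree centrality: the person with the most distinct contacts is the hub.
hub = max(graph, key=lambda person: len(graph[person]))
print(hub)  # alice (3 distinct contacts)
```

Real systems layer many more signals (timing, transaction amounts, shared addresses) on top of this graph structure, which is exactly why commentators see relational surveillance as so revealing of associations.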


In order for Big Data to fulfill its promise of more efficient policing and reduced crime, it is important that courts and the legislature demand transparency from law enforcement agencies. Transparency would help defendants challenge searches and seizures when their constitutional rights are violated. But the benefits of transparency are not limited to defendants. Law enforcement agencies and the public would benefit too, because transparency would help earn the public's trust in new methods of policing. Only when these new and powerful methods can be properly scrutinized and understood will they most effectively achieve the desired goal of more efficient policing and reduced crime.

Samuel Yellen is a first-year J.D. candidate at SUNY at Buffalo Law School.



This page contains a single entry from the blog posted on November 13, 2013 9:32 PM.
