Tax evasion and illegal drug smuggling are often not observable events for federal law enforcement officials. But to manage federal law enforcement activities effectively, the officials and policy-makers in charge must have an idea of what is happening.

The challenge of measuring unobserved events is one faced by many federal leaders. There are five methods that can help government performance analysts estimate basic information about them.
___________________________________________
This article originally appeared as part of a new report from the IBM Center for the Business of Government, “Five Methods for Measuring Unobserved Events: A Case Study of Federal Law Enforcement,” by John Whitley.

___________________________________________

The Need for a Statistical Framework

Law enforcement can face tough measurement challenges, but the fields of statistics and econometrics have developed a framework for dealing with them. It is useful to begin with a brief overview of that framework. All violations of a federal law can be thought of as elements of a prospective data population.

The scope of the population can be defined in various ways — e.g., immigrants illegally entering the United States in a calendar year, or the illegal drugs smuggled across the southwest land border between the United States and Mexico.

To manage their operations effectively, federal law enforcement officials need insight into these unobserved violations; i.e., they need to know the properties or parameters of this population of data, such as its size and distribution.

Law enforcement officials are generally able to observe subsets, or samples, of this population. The most obvious is the subset of violators apprehended or arrested. Detailed documentation of apprehensions or arrests is generally retained in administrative records. In addition, there may be other available sources of data, often partial and incomplete, that shed light on various aspects of the population, e.g., survey data on drug usage or the footprints in the desert of illegal border-crossers.

Actions can also be taken to increase the available data, such as increasing the size of the observable subset, drawing additional samples from the population, or generating a sample of new data that mimics the characteristics of the population of interest. The methods described here use such samples to make estimates of the total population.

When using a sample to estimate parameters of the underlying (unobserved) population, an important statistical property is whether the estimate is biased. Bias occurs when the estimate systematically diverges from the true value of the population parameter being estimated.

An unbiased and therefore preferred estimate does not systematically diverge from the true value. One primary cause of bias is a poor sample that is not randomly selected. A sample is random when every element of the population has an equal probability of being included. Examples of non-random samples may include:

  • Records on individuals apprehended smuggling drugs across the border. These records may not be representative of all individuals who attempt to smuggle drugs across the border if slower, less prepared individuals are more likely to be caught.
  • Survey data on the propensity to illegally migrate to the United States collected from urban Mexican households. These data may not be representative of the propensity of all Mexicans, both rural and urban, to illegally migrate.

In these cases, estimates of population parameters made from the sample data may be biased and thus misleading. It is important for government managers who develop performance measures to be constantly vigilant for bias in their estimates. It may not be possible to eliminate all potential biases in data, but the analyst must be aware of the major potential biases in their data and their possible effects.

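The first example above can be made concrete with a small simulation. The Python sketch below builds a hypothetical population of smugglers in which slower individuals are more likely to be apprehended, then compares the apprehension records with a truly random sample. Every number in it (the population size, the uniform "speed" score, the apprehension probabilities) is an assumption chosen for illustration, not data from the report.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def simulate_selection_bias(population_size=100_000, seed=0):
    """Compare a trait's true mean with the means seen in two kinds of sample.

    Hypothetical setup: each smuggler has a 'speed' score drawn uniformly
    from [0, 1], and the chance of apprehension is 0.5 * (1 - speed), so
    slower smugglers are over-represented in apprehension records.
    """
    rng = random.Random(seed)
    speeds = [rng.random() for _ in range(population_size)]

    # Non-random sample: inclusion depends on the trait being measured.
    apprehended = [s for s in speeds if rng.random() < 0.5 * (1 - s)]

    # Random sample of the same size: every element is equally likely to be drawn.
    random_sample = rng.sample(speeds, len(apprehended))

    return mean(speeds), mean(apprehended), mean(random_sample)


true_mean, apprehended_mean, random_mean = simulate_selection_bias()
print(f"true population mean:      {true_mean:.3f}")
print(f"apprehension-records mean: {apprehended_mean:.3f}")
print(f"random-sample mean:        {random_mean:.3f}")
```

Under these assumptions the apprehension records put average speed near one third, while the population and the random sample both sit near one half. The records diverge systematically from the true value, which is what bias means here.
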
A final note on the need for a statistical framework in the area of law enforcement: although often related, the challenges of measuring unobserved events are different from the challenges associated with infrequent events.

The Department of Defense prepares to fight wars, but fortunately these are very infrequent. DHS prepares for a nuclear attack on a major U.S. city, but fortunately this has never happened. Measuring the performance of military capability when there is no war to fight, or of response and recovery capabilities for a terrorist nuclear attack when there has never been one, is very important but is not the focus of this report.

Five Data Estimating Methods

Method One: Administrative records. Once a performance manager has identified the outcomes that need to be measured and is beginning the task of developing a measurement strategy, the first action is to identify all relevant data currently captured by the agency or by others. In the best-case scenario, the performance manager may discover relevant data at a lower level in the organization (e.g., at the field offices) or in another organization (e.g., in a survey conducted by the Census Bureau that asks a pertinent question).

Or it could be that estimation of the outcome is possible, but only by combining multiple sources of data that are spread across organizations. For example, Immigration and Customs Enforcement is responsible for law enforcement concerning individuals who enter the United States on visas but violate them by staying past the required departure date.

The rate of visa overstay is not directly observable to federal law enforcement. The number of visas issued and their required departure dates are known, but who actually departs and when is not. Everyone leaving the United States by commercial air or maritime transport is known, however, because they are identified in passenger manifest documents maintained by the transportation companies. Matching visa records against these manifests offers a way to estimate the overstay rate.
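As a rough illustration of combining the two sources, the Python sketch below matches visa records against departure manifests. The record layout (a traveler ID mapped to a date), the function name, and the sample data are all hypothetical. Departures that never reach a manifest (for example, over land) would be counted as overstays, so the result is best read as an upper bound.

```python
from datetime import date


def estimate_overstay_rate(visas, departures, as_of):
    """Estimate the share of due visas with no timely recorded departure.

    visas:      {traveler_id: required_departure_date}
    departures: {traveler_id: recorded_departure_date} from manifests
    as_of:      only visas whose departure date has passed are counted
    """
    due = {tid: required for tid, required in visas.items() if required <= as_of}
    if not due:
        raise ValueError("no visas have reached their required departure date")

    # An apparent overstay has no manifest record, or a record after the deadline.
    overstays = sorted(
        tid for tid, required in due.items()
        if tid not in departures or departures[tid] > required
    )
    return len(overstays) / len(due), overstays


# Hypothetical records, invented for illustration.
visas = {
    "A": date(2023, 1, 31),
    "B": date(2023, 2, 15),
    "C": date(2023, 3, 1),
    "D": date(2023, 12, 31),  # not yet due on the as_of date below
}
departures = {"A": date(2023, 1, 20), "B": date(2023, 3, 1)}

rate, overstay_ids = estimate_overstay_rate(visas, departures, date(2023, 6, 30))
print(f"apparent overstay rate: {rate:.2f}, travelers: {overstay_ids}")
```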

Another potentially useful result from inventorying the available data is to identify data — or new data that could be generated — that could, with the use of clever empirical methods, support analyses that would estimate the outcomes of interest. In the Recidivism Analysis text box (page 16 of the report), a method for using apprehension data to infer an apprehension rate, and subsequently the rate of illegal immigration, is presented.
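The report's text box is not reproduced here, so the following Python sketch is not its method. It shows one simple way an inference of this kind can work, under strong assumptions that are mine: every apprehended person is returned and tries again until successful, and the chance of apprehension is the same on every attempt. Under those assumptions, the share of apprehensions that are followed by a re-apprehension estimates the apprehension rate, and the rest follows by arithmetic. The function name and the counts are hypothetical.

```python
def repeated_trials_estimate(total_apprehensions, followed_by_reapprehension):
    """Infer an apprehension rate and successful entries from recidivism counts.

    Assumes (hypothetically) that everyone apprehended tries again until
    successful and that the apprehension probability p is constant, so the
    fraction of apprehensions followed by another one estimates p.
    """
    if not 0 < followed_by_reapprehension < total_apprehensions:
        raise ValueError("need 0 < re-apprehensions < total apprehensions")

    p = followed_by_reapprehension / total_apprehensions
    attempts = total_apprehensions / p          # apprehensions = p * attempts
    successful_entries = attempts - total_apprehensions
    return p, attempts, successful_entries


# Invented counts for illustration only.
p, attempts, successful_entries = repeated_trials_estimate(1_000, 400)
print(f"apprehension rate: {p:.2f}")
print(f"estimated attempts: {attempts:.0f}")
print(f"estimated successful entries: {successful_entries:.0f}")
```

If some people give up after being caught, the re-apprehension share understates the apprehension rate, so an estimate like this is only as good as the no-deterrence assumption behind it.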

Method Two: Surveys. Surveys are a commonly used data collection method in policy and social science research. Surveys involve asking a set of questions to a sample population. They can be conducted by telephone or mail, online, or in person. The goal is to obtain a sample of sufficient quality, e.g., size and representation, to enable inferences to be drawn about the population from analysis of the data. Surveys may be conducted on a regular, recurring basis to create estimates through time or can be conducted on a one-time basis to answer specific questions at a point in time.

There are numerous surveys already being conducted by the government and private organizations that provide valuable information on federal law enforcement issues. The U.S. Bureau of the Census and its many supporting surveys provide some of the most comprehensive data about the United States. Other federal agencies conduct a wide range of surveys that include specific emphasis on law enforcement issues, such as the National Crime Victimization Survey (NCVS) discussed in the National Crime Victimization Survey box (on page 17 of the report).

Surveys are also conducted by academic researchers, think tanks, and private companies. In some situations, there may already be a recurring survey conducted that is close to, but not exactly, what the performance manager needs; a cost-effective way to get started is to partner with the organization conducting the existing survey to expand it in a way that would be useful for the law enforcement performance measurement.

There are, of course, challenges and limitations to surveys. It can be costly to implement a new survey that contacts enough individuals to be statistically valid. It can also be challenging to elicit truthful answers, particularly on issues of interest to law enforcement like criminal activity. There are generally few consequences for responding untruthfully on a survey, and on questions related to criminal activity, fear of how the information will be used may be a strong incentive to lie.
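To give a sense of what "enough individuals" can mean, the Python sketch below applies the textbook sample-size formula for estimating a proportion, n = z² · p(1 − p) / e². It assumes a simple random sample from a large population in which everyone responds truthfully, which the concerns above suggest is optimistic. The function name and default values are illustrative choices, not from the report.

```python
import math
from statistics import NormalDist


def required_sample_size(margin_of_error, confidence=0.95, expected_proportion=0.5):
    """Responses needed to estimate a proportion within +/- margin_of_error.

    expected_proportion=0.5 is the most conservative choice because it
    maximizes p * (1 - p) and therefore the required sample size.
    """
    if not 0 < margin_of_error < 1:
        raise ValueError("margin_of_error must be between 0 and 1")

    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = z ** 2 * expected_proportion * (1 - expected_proportion) / margin_of_error ** 2
    return math.ceil(n)


n_five_points = required_sample_size(0.05)
n_three_points = required_sample_size(0.03)
print(f"+/- 5 points at 95% confidence: {n_five_points} responses")
print(f"+/- 3 points at 95% confidence: {n_three_points} responses")
```

Because the margin of error enters as a square, tightening it from five points to three nearly triples the number of completed responses required, which is one reason new surveys become costly.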

A great deal of work has been done to structure questions and surveys so that they elicit truthful answers and to test for inconsistencies within a survey response. The results allow this risk to be mitigated, but it remains an issue when considering the use of surveys.

Once the decision has been made to explore a survey further, several factors will have to be addressed: the survey method (e.g., telephone, one-on-one), the sampling methodology (e.g., random drawing of names, targeting of particular groups), question design (questions must be unambiguous and must allow estimation of the desired measure), question sequencing, whether to start with a pilot or the full survey, and how the analysis will be conducted once the survey is complete. Professional survey firms and academic sources can be consulted for further guidance.

Method Three: Inspections, Investigations, and Audits. Criminal or administrative investigations offer another way to systematically collect an accurate data sample. The important point about using investigations in the context of measuring unobserved events is that the investigations must be in some way random.

In typical law enforcement operations, proactive investigations are prioritized to follow the most important clues or those that are most likely to lead to a major arrest or disruption of crime. Investigations prioritized in this manner may not provide statistically valid estimates of the underlying level of criminal activity.

Conducting investigations on a more random sample of potential illegal activity represents a major cultural shift for law enforcement operations, but limited and systematic use of them can be a powerful way to collect information about the outcomes the law enforcement organization is trying to effect. See the National Research Program text box (page 18 in the report) and the Administrative Site Visit and Verification Program text box (page 19) for examples of this method.
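When inspections are drawn at random, the share that uncover a violation is an estimate of the underlying violation rate, and a confidence interval conveys how precise that estimate is. The Python sketch below uses the Wilson score interval, a standard choice that I have picked for illustration; the report does not prescribe one. The audit counts are invented, and the calculation is only meaningful if the audited cases really were selected at random.

```python
import math
from statistics import NormalDist


def wilson_interval(violations, audits, confidence=0.95):
    """Point estimate and Wilson score interval for a violation rate."""
    if audits <= 0 or not 0 <= violations <= audits:
        raise ValueError("need audits > 0 and 0 <= violations <= audits")

    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = violations / audits

    denominator = 1 + z ** 2 / audits
    centre = (p_hat + z ** 2 / (2 * audits)) / denominator
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / audits + z ** 2 / (4 * audits ** 2))
        / denominator
    )
    return p_hat, centre - half_width, centre + half_width


# Hypothetical result: 30 violations found in 500 randomly selected audits.
rate, low, high = wilson_interval(30, 500)
print(f"estimated violation rate: {rate:.3f} (95% interval {low:.3f} to {high:.3f})")
```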

Method Four: Experimental Methods. Another method involves actually adding or modifying law enforcement activities in the field in ways that may facilitate estimation of the crime rate. In controlled environments like Ports of Entry or airport security screening, this could involve selecting a randomized subset of individuals who pass the primary screen for a secondary, more rigorous screen. The rate at which violations are identified in the secondary screen can be used to infer the failure rate of the primary screen.

The Randomized Secondary Screening text box (page 19 in the report) describes how CBP conducts these randomized secondary inspections at Ports of Entry. This method is not restricted to physical screening: application processing and other forms of information-based screening can also be given randomized secondary evaluations to assess the accuracy of the primary screening process.
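The inference from a randomized secondary screen to the failure rate of the primary screen comes down to a few lines of arithmetic, sketched below in Python. The sketch assumes that the secondary screen catches every violation it examines and that the secondary sample is truly random. The function name and all counts are hypothetical.

```python
def primary_screen_miss_rate(passed_primary, secondary_inspected,
                             secondary_violations, primary_detections):
    """Estimate how many violations slip past the primary screen.

    passed_primary:       travelers cleared by the primary screen
    secondary_inspected:  cleared travelers randomly sent to secondary
    secondary_violations: violations found among those sent to secondary
    primary_detections:   violations the primary screen caught itself
    """
    if secondary_inspected <= 0:
        raise ValueError("secondary_inspected must be positive")

    # Violation rate among travelers the primary screen cleared.
    rate_among_cleared = secondary_violations / secondary_inspected

    # Scale the sample rate up to everyone who was cleared.
    estimated_missed = rate_among_cleared * passed_primary

    total_violations = estimated_missed + primary_detections
    miss_rate = estimated_missed / total_violations if total_violations else 0.0
    return rate_among_cleared, estimated_missed, miss_rate


# Invented counts for illustration only.
rate_among_cleared, estimated_missed, miss_rate = primary_screen_miss_rate(
    passed_primary=1_000_000,
    secondary_inspected=10_000,
    secondary_violations=50,
    primary_detections=15_000,
)
print(f"violation rate among cleared travelers: {rate_among_cleared:.4f}")
print(f"estimated violations missed by primary: {estimated_missed:.0f}")
print(f"estimated primary-screen miss rate:     {miss_rate:.2f}")
```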

Another example might be the randomized surge of enforcement activity across different geographic regions. The U.S. Border Patrol could conduct an unannounced surge of agents into a randomly selected station and measure the increase in drug apprehensions.

Conducting these random surges across a range of locations and circumstances could be used to infer information about the population of offenders who are evading detection and apprehension during non-surge periods. In addition to performance measurement, these random surges can also provide a deterrent effect to crime and have been used throughout law enforcement. For example, the Amtrak Police Department uses random and unpredictable surges as part of its law enforcement strategy.

Another category of experimental methods is to conduct controlled tests. With this method, the organization is not collecting data about the underlying population; it is generating a new data population in a controlled way so that this new population has the same properties as the population of interest.

Perhaps the most prominent method in this category is red-teaming, also called covert testing or penetration testing. A red team is an independent group that seeks to challenge an organization for the purpose of detecting vulnerabilities. The Red-Teaming text box (page 20 in the report) describes the use of this method in aviation security, where penetration testers physically test security.

Red-teaming's most obvious application to law enforcement performance measurement is in controlled environments such as inbound passenger screening at Ports of Entry (POEs). In less controlled environments, e.g., drug smuggling routes along the border, there are obvious safety issues for the testers that would have to be seriously evaluated before considering this method.

Red-teaming does not have to be physical penetration testing, however. Many federal law enforcement organizations and their counterpart benefit-delivery organizations (e.g., USCIS) engage in information-based screening as well as physical screening. In an information-based screening context, where data are validated to ensure compliance with the law (e.g., visa and citizenship application processing), red-teaming could entail the systematic filing of phony applications to identify failure rates of the screening process. These methods are similar to mystery shopper testing done by the private sector to test service in retail outlets.

Red-teaming aims to conduct controlled tests, but is performed directly in the field (covertly) with the operational activities as they execute their daily mission. An even more controlled test is one that takes place in an artificial environment; e.g., a laboratory or range test. This is done routinely across the government in acquisition programs where both development and operational test and evaluation are standard procedures in the procurement of newly designed complicated government hardware such as defense weapon systems. It can also be used as part of a systematic performance measurement process.

For example, the U.S. Border Patrol uses sensors and radars along the border to detect illegal border crossing. Extensive testing is performed on these items during procurement to understand their false negative and positive rates; that is, the rate at which they miss finding an item of interest and the rate at which they report finding something that is not of interest. Although actual field conditions will differ from these controlled conditions, it may be possible to use the results of these controlled tests to infer likely characteristics of the system in the field.
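The two error rates, and one way a test result might be carried into the field, can be sketched in Python as follows. The function names and every count are hypothetical. The second function assumes the field miss rate equals the tested one, which, as noted above, will not hold exactly.

```python
def sensor_error_rates(targets_presented, targets_detected,
                       nontargets_presented, false_alarms):
    """False negative and false positive rates from a controlled test."""
    if targets_presented <= 0 or nontargets_presented <= 0:
        raise ValueError("test counts must be positive")

    # Share of real items of interest that the sensor missed.
    false_negative_rate = (targets_presented - targets_detected) / targets_presented
    # Share of items not of interest that the sensor reported anyway.
    false_positive_rate = false_alarms / nontargets_presented
    return false_negative_rate, false_positive_rate


def estimate_total_crossings(confirmed_detections, false_negative_rate):
    """Scale confirmed field detections up by the tested detection rate."""
    if not 0 <= false_negative_rate < 1:
        raise ValueError("false_negative_rate must be in [0, 1)")
    return confirmed_detections / (1 - false_negative_rate)


# Invented test results: 200 staged crossings, 170 detected;
# 1,000 non-target events (e.g., animals), 40 false alarms.
fnr, fpr = sensor_error_rates(200, 170, 1_000, 40)

# Invented field figure: 850 detections later confirmed as real crossings.
estimated_crossings = estimate_total_crossings(850, fnr)

print(f"false negative rate: {fnr:.2f}")
print(f"false positive rate: {fpr:.2f}")
print(f"estimated total crossings: {estimated_crossings:.0f}")
```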

Method Five: Technical Measurement. Although there are many more methods that can be used, the final method described here is technical data collection. Well-known examples at the state and local level include red-light and speeding cameras and, more recently, gunfire detectors in some major cities. Examples at the federal level include the use of sensors, radars, and unmanned aerial vehicles to detect illegal immigrants crossing the border, and radiation detectors and X-ray screening of containerized cargo entering the United States. See the Counterfeit Detection and the Measuring Drug Production text boxes (pages 21 and 22) for examples of this method.

This method has a wide range of potential costs. While a red-light camera may be relatively inexpensive (and quickly pay for itself in fine revenue), arraying sensors and radars across the 1,900 miles of southwest land border is very expensive. When considering this method, the performance manager should consider whether there are ways that the unobserved crimes can be observed by technical means and, if so, whether the benefit justifies the cost.

Selecting Between and Implementing the Methods

There is no simple formula that identifies the right methods for performance managers to use. Each measurement challenge is different, and the best methods vary with the circumstances of the individual challenge. There are some rules of thumb, however, that can add structure to the selection process.

• First, the performance manager should identify all relevant data currently captured by the agency and elsewhere. This may lead to a discovery of methods for direct estimation from available data or by expanding existing data sources.

• A second step is to thoroughly examine the context and setting within which the law enforcement activities and performance measure challenges occur. Is there an entity, whether governmental or not, with an incentive to report or measure the crime? If there is an entity suffering an economic loss from the crime (such as credit card companies and banks discussed in the Credit Card and Bank Fraud text box – page 23 in the report), they may already be measuring it. There may also be think tanks, interest groups, or academic organizations that for various reasons have decided to measure the level of crime. Can measurements from these organizations be validated and used?

When data that allow for direct measurement are not discovered, and no other organization can be found that is already measuring or can be persuaded to measure the offense rate, it is time to begin exploring the estimation methods identified above, and any additional methods that may be relevant.

All potentially relevant methods should be identified and then systematically evaluated for their costs and benefits, e.g., how precise the estimates will be, how disruptive to current operations collection will be, and whether they can be implemented in a repeatable and transparent manner. The relevant methods can then be compared to select the best candidates for further development.

This shorter list of the best candidates should then be developed more thoroughly to decide which ones should be fully implemented and how. It may take some trial and error or pilot projects to finalize the selection, and it may take a combination of two or more methods to develop a complete estimate for a particular performance measure.

Technical expertise may need to be obtained. The performance management office may want to hire a statistician, economist, or similar technical expert to conduct the work in-house or to oversee supporting work being done by outside entities.

John Whitley is a senior fellow at the Institute for Defense Analyses and a former director of program analysis and evaluation in the Department of Homeland Security, where he faced these challenges daily.