How can the Army know itself? In November 1976 this question was on the mind of Major General Paul Gorman, the deputy chief of staff for training at US Army Training and Doctrine Command. That month he told a gathered audience at Fort Monmouth that the Army struggled to know itself because it did “not have a lot of data being turned in by ordinary units trying to do their job in a well simulated operational environment.” So Gorman proposed a radical solution that he and his influential boss, General William DePuy, had been developing: the construction of a giant combat simulation in the wilderness to train soldiers and collect that data. With such data, the Army could learn the ground truth about the state of its operational force, as if gazing in a costly but unprecedentedly clear mirror. In doing so, it could learn what worked best in simulated war, and so ready itself for the next real one.

Though the Army has built the combat training centers that Gorman proposed and so gained its clear mirrors, it does not currently look at them. That is because, owing to ambiguities in regulation, the Army does not systematically collect data from those training centers. To meet the current interwar moment, to transform in contact now, the Army must capture rotational training unit performance in a structured, quantified, and regular way. The task is urgent because regular training center rotations reveal the true state of the operational force as no report or inspection can. In doing so these rotations point the way to needed reform, but only if studied in the aggregate. Qualitative observations of the sort found in Center for Army Lessons Learned reports are insufficient on their own. Longitudinal, structured data is also needed.

What Gets Collected at the Training Centers?

The Army’s combat training center rotations are something of a professional and logistical miracle. Rotation costs can run in the tens of millions of dollars. The resultant training is magnificent. Historically, however, the quality of data collection at combat training centers has lagged behind the quality of the training itself. A 1986 report by the General Accounting Office (now the Government Accountability Office) found data collection at the National Training Center unsatisfactory. Thirteen years later, another report found the situation had not much improved, observing that for lack of centralized data the Army “does not know the extent to which center exercises are improving the proficiency of its units and leaders.”

Today the problem abides. Army Regulation 350-50, Combat Training Center Program, which governs combat training center operations, remains ambiguous as to what rotational training unit data must be collected. Beyond the terabytes of raw instrumented data produced by the Combat Training Center–Instrumentation System, most rotation data lives in unstructured take-home packets, the after-action review slide decks prepared for the units that trained. For lack of a regulatory standard, collection practices and take-home packets vary from one observer coach-trainer (OC/T) author to the next, creating gaps that make longitudinal analysis difficult. In this sense the Army leaves money on the table at the training centers each month.

With support from the National Training Center Operations Group and Headquarters, Department of the Army, we reviewed over five hundred rotational documents to manually build an exploratory dataset. The dataset treats as the unit of analysis the battalion-level force-on-force battle (a single phase of force-on-force training at the National Training Center), and it comprises over fifty of these battles. It includes a wealth of unit characteristic data captured by OC/Ts in “roll-out cards” as well as unit performance during the battle, ranging from OC/T-observed daily vehicle operational readiness rates to kill-loss ratios to subjective OC/T assessments such as “Decisive Action Big 12” scores. The National Training Center cavalry squadron OC/T team, “Cobra,” collected the same quantitative performance data on its own initiative for long enough to yield a dataset large enough for multivariate regressions. For this reason the battles discussed below are cavalry battles.
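To make the idea of a structured, battle-level record concrete, a minimal sketch of what one row in such a dataset might look like follows. Every field name here is an illustrative assumption, not the actual roll-out card or take-home packet field:

```python
# Illustrative sketch only: field names are assumptions, not the actual
# roll-out card or take-home packet fields.
import pandas as pd

battle_record = {
    "rotation_id": "anon-rotation-01",     # anonymized rotation identifier
    "battalion_id": "unit_001",            # anonymized unit identifier
    "battle_phase": 1,                     # which force-on-force battle in the rotation
    "cdr_combat_experience_months": 12,    # from roll-out card
    "csm_combat_experience_months": 24,    # from roll-out card
    "avg_vehicle_or_rate": 0.82,           # OC/T-observed daily operational readiness
    "kill_loss_ratio": 1.4,                # force-on-force attrition
    "big12_score": 7.5,                    # subjective OC/T "Decisive Action Big 12" assessment
}

battles = pd.DataFrame([battle_record])   # one row per battalion-level battle
```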

Use Case One: Testing Readiness Reports

Combat training center data can help the Army check whether unit readiness reports mean in fact what they mean in theory. For example, do armored units that report a larger percentage of working tanks in their monthly reports than other units see fewer maintenance issues in battles at the National Training Center? Do units that report the highest percentage of deployable personnel in their monthly reports reliably deploy a larger portion of their formations to the Joint Readiness Training Center than other units? Do commanders who assess their units at higher levels of training proficiency in their mission-essential tasks achieve the best battlefield results?

There are many plausible reasons why reported readiness might differ from readiness as measured at a combat training center. The two measurements may not be intended to capture the same phenomenon. But even if there are good reasons why a unit’s readiness as told in its Unit Status Report and as observed in the Mojave Desert systematically differ, the Army should at the very least know about the difference. We paired our National Training Center rotational performance dataset with the same battalions’ unclassified monthly readiness report inputs from the six months preceding each battle. These inputs were each battalion’s reported personnel deployability rate, its lowest monthly pacing equipment operational readiness rate, its overall equipment operational readiness rate, and its assessed mission-essential task proficiency.

We examined whether these readiness report components were correlated with their theoretical counterparts during combat training center rotations: the percentage of the unit’s assigned personnel deployed to the National Training Center, the average pacing equipment operational readiness rate over the course of a battle, and unit performance as measured by the OC/T-provided “Decisive Action Big 12” scores. We also looked at, among other potential independent variables, unit leader experience as captured on National Training Center “roll-out cards.” We examined the data first using simple pairwise correlations and then with “rapid data modeling,” primarily ordinary least squares (OLS) multivariate regressions.
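A minimal sketch of those two analytic passes, assuming a merged battle-level dataset and the hypothetical column names introduced above, might look like this:

```python
# Sketch of the two analytic passes: pairwise correlations, then OLS regression.
# File and column names are illustrative assumptions.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("ntc_battles.csv")  # hypothetical merged battle-level dataset

# Pass 1: simple pairwise correlations between reported readiness and observed performance
print(df[["reported_mission_task_proficiency", "big12_score"]].corr())
print(df[["reported_deployability_rate", "deployed_personnel_rate"]].corr())

# Pass 2: "rapid data modeling" via an OLS multivariate regression
model = smf.ols(
    "big12_score ~ reported_equipment_or_rate + reported_deployability_rate"
    " + cdr_combat_experience_months + csm_combat_experience_months",
    data=df,
).fit()
print(model.summary())
```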

We found that simple pairwise correlations suggest, at best, uneven correlation between unit readiness reports and unit performance. A unit’s equipment readiness as reported in monthly reports preceding a rotation was weakly and positively correlated with that unit’s overall equipment operational readiness rates during force-on-force battles, but not with its pacing equipment performance during those battles. A unit’s reported training proficiency before a rotation and its “Decisive Action Big 12” scores were nearly perfectly uncorrelated (a coefficient of -.009). Personnel deployability was moderately negatively correlated (a coefficient of -.488), meaning the more personnel a unit reported as deployable before a rotation, the smaller the share of its formation it actually deployed for the rotation. This sort of initial finding—that personnel readiness reporting represents the inverse of what it is supposed to represent—is exactly the sort of insight that only structured data can credibly test. Though our dataset is too small to state it conclusively, the finding invites further study.

The linear regressions (both bivariate and multivariate) revealed limited statistically significant correlation between readiness report components and their counterpart measures at combat training centers. They suggest at least two key insights that warrant further investigation: leader experience and equipment health appear to matter more for battlefield performance than any other unit characteristic. Command sergeant major experience, squadron commander experience, and overall equipment operational readiness rates depicted on Unit Status Reports are positively correlated with better performance at the National Training Center.

Figure 1: The relationship between battalion leadership self-reported combat experience and force-on-force performance.
Figure 2: The relationship between prerotation reported fleet health and force-on-force performance.

These findings, if replicable with more data, suggest important relationships (and lack thereof) between how ready a unit says it is and what the Army can expect of that unit. It may be that the best readiness report is not a Unit Status Report but the experience of a unit’s leadership. As noted above, the findings reported here are tentative and their validity is limited by the small sample size of fifty-six nonindependent battles. There are also certainly confounding factors not captured for lack of available data, most importantly the intensity of the opposing force, which chiefs of operations groups at combat training centers often adjust in response to a unit’s proficiency. The findings nonetheless suggest possible relationships that more systematically collected data could credibly test.

Use Case Two: Forecasting Peak Lethality

What if combat training center data could help the Army forecast which units are approaching peak readiness, to inform either wargames or actual deployments? To explore this possibility we move from diagnostic analytics (“why did something happen”) to predictive analytics (“what will happen”) and apply machine learning techniques. We test two common types of machine learning models, random forest and gradient boosting, which account for nonlinear relationships. The machine fits the model without human involvement, a rudimentary form of artificial intelligence.
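For readers interested in the mechanics, a minimal sketch of fitting both model types with scikit-learn follows; the feature and target names are hypothetical, carried over from the earlier sketches rather than taken from the actual dataset.

```python
# Sketch only: file and column names are illustrative assumptions.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import train_test_split

df = pd.read_csv("ntc_battles.csv")  # hypothetical merged battle-level dataset
features = [
    "reported_equipment_or_rate",
    "reported_deployability_rate",
    "cdr_combat_experience_months",
    "csm_combat_experience_months",
]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["big12_score"], test_size=0.25, random_state=0
)

# Both model types capture nonlinear relationships without a hand-specified functional form.
rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X_train, y_train)
gb = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
```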

To our surprise, both types of models were strongly predictive when applied to the data, with the random forest model performing slightly better. It offered significantly greater explanatory power than the earlier linear regression model. The plot below illustrates the model’s predictions compared to the actual test data.

Figure 3: The performance of machine learning models fed combat training center data.

The dotted line is the “perfect prediction” line, with the model’s predictions as black diamonds. The model accounts for approximately 50 percent of the variation, which is notable given how few measurements exist about unit factors. Because the dataset is small, the model’s generalizability is limited, but its accuracy is surprising given so few observations.
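A sketch of the evaluation behind a figure like this, continuing the hypothetical objects from the previous sketch, would score the held-out battles and plot predictions against actual outcomes:

```python
# Continues the rf, X_test, and y_test objects from the previous sketch.
import matplotlib.pyplot as plt
from sklearn.metrics import r2_score

preds = rf.predict(X_test)
print("Held-out R^2:", r2_score(y_test, preds))      # share of variation explained

plt.scatter(y_test, preds, marker="D", color="black")  # predictions as black diamonds
lims = [min(y_test.min(), preds.min()), max(y_test.max(), preds.max())]
plt.plot(lims, lims, linestyle="--")                   # the "perfect prediction" line
plt.xlabel("Actual performance score")
plt.ylabel("Predicted performance score")
plt.show()
```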

These results demonstrate the potential of machine learning techniques to forecast combat readiness. They suggest a method by which a wargame designer can validate assumptions about readiness levels, a joint planner can determine which available units are best suited for a real-world mission, or an OC/T can anticipate issues an upcoming training audience may encounter.

Use Case Three: Reducing Training Accidents

Why do some units suffer training accidents more than others? A common answer is culture. Systematic measures of unit culture, climate, and cohesion are difficult to come by, but they can be captured through soldier surveys. 1st Stryker Brigade Combat Team, 4th Infantry Division has begun conducting a bimonthly anonymous cell phone survey measuring seven indicators of effective unit culture and climate. Broadly, this is a measure of unit cohesion. The “Ivy Raider Culture Survey” collected over 1,700 soldier responses across thirty-one companies and batteries from September to October 2024. It recorded responses across seven numerical culture measurables and unit morale (1–10 Likert scale). Do these survey results mean anything?

Data from Fort Johnson’s Joint Readiness Training Center help us discern whether such surveys capture anything meaningful. At Fort Johnson the operations group measures combat discipline through what it categorizes as mishaps. These include negative incidents such as vehicle accidents, negligent discharges, missing equipment, and violations of exercise rules of engagement. How can we explain variation in mishaps across companies and batteries?

Using linear regression analysis, we found that stronger unit culture at the company and battery level was correlated with fewer negative incidents during the brigade’s recent Joint Readiness Training Center rotation. Statistically significant relationships exist between various measures of company and battery culture prior to the rotation and negative incidents during the rotation (across the twenty-one companies and batteries with complete data). The soldier “Development” score is the strongest correlate, followed by the “Efficient Time Use” score. A “Culture Index,” which sums all of the culture measurable scores, is also negatively correlated with mishaps. We controlled for unit type, such as combat, support, and headquarters.
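A minimal sketch of such a company-level regression, assuming hypothetical column names for the survey scores, mishap counts, and unit types, might look like this:

```python
# Sketch only: file and column names are illustrative assumptions.
import pandas as pd
import statsmodels.formula.api as smf

companies = pd.read_csv("jrtc_culture_mishaps.csv")  # one row per company or battery

# Negative incidents during the rotation regressed on prerotation culture
# scores, controlling for unit type (combat, support, headquarters).
model = smf.ols(
    "mishap_count ~ development_score + efficient_time_use_score + C(unit_type)",
    data=companies,
).fit()
print(model.summary())
```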

Figure 4: Relationship between units’ Joint Readiness Training Center incident count (violations of exercise rules of engagement) and Ivy Raider Culture Survey scores.

Interpreting the regressions suggests that each additional percentage point of soldier “Development” is associated with 3 percent fewer negative incidents. Similarly, as the “Efficient Time Use” and “Culture Index” scores increase by 1 percent, mishaps during rotations drop by almost 2 percent. These results suggest that stronger unit culture is correlated with fewer incidents of indiscipline in a simulated combat environment. The key insight is that investing in unit culture may be a focused method of improving soldier discipline, helping address the Army’s discipline gap.
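A percentage-change reading like this is easiest to see in a log-linear specification. The sketch below, with hypothetical column names, shows one assumed way such an interpretation can be produced; it is an illustration of the arithmetic, not a statement of the exact model specification used.

```python
# Assumed log-linear specification for illustration only.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

companies = pd.read_csv("jrtc_culture_mishaps.csv")

# log1p handles companies or batteries with zero recorded mishaps.
log_model = smf.ols(
    "np.log1p(mishap_count) ~ development_score + C(unit_type)", data=companies
).fit()

beta = log_model.params["development_score"]
pct_change = (np.exp(beta) - 1) * 100  # percent change in mishaps per one-point increase
print(f"{pct_change:+.1f}% change in mishaps per additional point of Development")
```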

Key Considerations and Next Steps

Narcissus reminds us that mirror gazing is not without risks. Nor is performance measurement. Measurement may cost scarce OC/T bandwidth and so reduce training value. It may fix the Army’s attention on junk measures and so lead the Army to wrong conclusions. And it may warp unit behavior by nudging commanders to seek good scores rather than good training. However, these risks are either smaller than imagined or can be mitigated to an acceptable level.

Concerns about OC/T bandwidth overstate the cost of systematic data collection at training centers because OC/Ts already collect almost all the data worth collecting. Anyone who has received a take-home packet from an OC/T after a rotation or worked in an operations group knows that operations groups are constantly collecting data. The question is not whether to collect data but whether to preserve what is collected systematically: ensuring that at least some measures are captured by every team, across every rotation, in a structured dataset rather than in diffuse, inaccessible slide decks. That is a much smaller lift.

But even if the cost is low, is the data high quality? A skeptic might say lethality at a combat training center means little because instrumented gunnery is different from live gunnery. But instrumentation issues in lethality data are what data scientists call noise. The fact is that a lot of useful data has noise. Noise is okay if random, and much noise is. The Army must choose its measures carefully to avoid junk ones, but noise alone is not a reason to avoid collection.

Even if OC/Ts have the bandwidth and the performance measures selected are good ones, does collection not, as Goodhart’s Law suggests, corrupt the measure by warping unit behavior? To preempt Goodhart’s Law, the Army must decouple measurement from incentive—communicating clearly to rotational units that what is collected does not bear on their evaluations. One way to make this guarantee credible is to immediately anonymize collected data. Another way is to measure performance evenly across different warfighting functions, so that the countervailing pressures cancel each other out. A third is to pick measures that are as close to the underlying phenomenon the Army aims to capture as possible—such as OC/T-observed vehicle operational readiness rates. If units responded to the collection of on-the-ground operational readiness rates rather than digitally reported ones by keeping their trucks running, that would be a good thing.
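As an illustration of how immediate anonymization could work in practice, the sketch below replaces a unit identifier with an irreversible pseudonym before the record is stored; the salt, identifier, and field names are hypothetical.

```python
# Illustrative sketch: the salt, identifier, and field names are hypothetical.
import hashlib

SALT = "rotation-specific-secret"  # held by the operations group, not by analysts

def anonymize_unit(uic: str) -> str:
    """Replace a unit identification code with an irreversible pseudonym."""
    return hashlib.sha256((SALT + uic).encode()).hexdigest()[:10]

record = {"unit": anonymize_unit("WABCDE"), "avg_vehicle_or_rate": 0.82}
```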

The above risks are important to mitigate, but they are not so great that they outweigh the large unrealized benefit of systematically capturing structured rotational data. The decisive action rotations at our combat training centers are too valuable and too expensive to neglect. As Gorman knew nearly fifty years ago, the data represents not the only input but an incredibly important one for any interwar Army attempting to see itself and discern a path forward.

There is good news. The combat training centers’ operations groups have begun collecting more quantified data on their own initiative. The Combined Arms Center’s Data and Artificial Intelligence Office is considering adding data as a pillar of the combat training center management philosophy. The Army Research Lab, part of US Army Combat Capabilities Development Command, is gathering instrumented data and take-home packets in its SUNet enclave for researchers with government sponsorship.

The next step is for the Army to codify this progress by amending Army Regulation 350-50 to mandate the collection of certain credible, anonymized performance measures across all combat training centers. A further step would be stipulating in that regulation that the Combined Arms Center control and consolidate this longitudinal data so that it lives close to doctrinal innovation. These are easy wins that ensure the Army makes best use of what is, in addition to a magnificent training opportunity, its costliest and clearest mirror.

Lieutenant Colonel Jon Bate is a US Army infantry officer currently serving as commander of 2-23 IN, 1/4 SBCT at Fort Carson, Colorado. He previously served in the 101st Airborne Division, in the 1st Armored Division, and as assistant professor of economics in the US Military Academy Department of Social Sciences. A Goodpaster Scholar in the Advanced Strategic Planning and Policy Program (ASP3), he holds an MPP from the Harvard Kennedy School and PhD in Political Science from Stanford University.

Captain Theo Lipsky is an armor officer currently studying public policy at Columbia University’s School of International and Public Affairs. He will next teach in the Department of Social Sciences at the US Military Academy.

The views expressed are those of the authors and do not reflect the official position of the United States Military Academy, Department of the Army, or Department of Defense.

Image credit: Sgt. Ryan Gosselin, National Training Center