In 2019, Rudy Guyonneau and Arnaud Le Dez captured a common fear in a Cyber Defense Review article titled “Artificial Intelligence in Digital Warfare.” “The question of AI now tends to manifest under the guise of a mythicized omniscience and therefore, of a mythicized omnipotence,” they wrote. “This can lead to paralysis of people fearful of having to fight against some super-enemy endowed with such an intelligence that it would leave us bereft of solutions.” With the release of ChatGPT in 2022, it looked like that fear had come true. And yet the reality is that AI’s use as an offensive tool has evolved incrementally and not yet created this super-enemy. Much of AI’s real value today lies in the defense.
As Microsoft and OpenAI recently explained, today we see threat actors using AI in interesting but not invincible ways. They found five hacker groups from four countries using AI. At first, the groups used large language models for research, translation, building tools, and writing phishing emails. Later, Microsoft saw the tools suggesting actions after a system had been hacked. Although some argue that modern models could take on more, that seems premature. In stark contrast to fears that AI would unleash a wave of robot hackers on the world, these actors used it for mundane tasks. Defensive cyber forces, on the other hand, could use AI technology that exists today to meaningfully improve cyber defenses in four key ways: accelerating the pace of analysis, improving warning intelligence, developing training programs more efficiently, and delivering more realistic training scenarios.
First, endpoints and network sensors create billions of events per day across the Department of Defense Information Network. Today, “data overload” is not just a theoretical danger. It is a given. As Guyonneau and Le Dez pointed out, though, volume is only half the battle. Cyber analysts must also grapple with “techniques and strategies [that] evolve at a frantic pace, the former through the exigence imposed by early experiences in the field and the rate of technological development, the latter as our understanding of the stakes grows.” It is not just the volume of data in the fifth domain that confounds understanding, but its complexity as well. This ocean of uncertainty is a prime target for two of the most common forms of AI, machine learning and large language models.
Machine learning won’t turn data into knowledge by itself, but it can speed up analysis. These models might not know why an endpoint acts the way it does, but they can spot weird activity. At scale, they shift the burden of sifting through millions of logs onto a computer. As a result, people spend less time searching for the digital needle in the cyber haystack and more time on complex investigations. The challenge of training, tuning, assessing, using, and parsing the output of these algorithms, though, means that few use them well, if at all. Large language models can help. ChatGPT or the open-source Llama 3, for instance, can handle these tricky steps. Instead of coding a support vector machine, I can ask ChatGPT to “Build a support vector machine with this sample data.” Instead of sifting through pages of documentation to tune hyperparameters, I can ask Llama 3 to tune them. Tasks that once took data scientists hours can now take an eager analyst just minutes.
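To make the idea of spotting weird activity concrete, here is a minimal sketch in Python. It is not the support vector machine mentioned above. It swaps in a much simpler statistical stand-in, a modified z-score built on the median, so that it runs with the standard library alone. The host names, the event counts, and the 3.5 threshold are all assumptions chosen for illustration.

```python
from statistics import median


def robust_z_scores(counts):
    """Return a modified z-score per host, based on the median and the
    median absolute deviation (MAD) rather than the mean."""
    values = list(counts.values())
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Every host looks the same, so nothing stands out.
        return {host: 0.0 for host in counts}
    # 0.6745 rescales MAD so scores are comparable to ordinary z-scores.
    return {host: 0.6745 * (v - med) / mad for host, v in counts.items()}


def flag_anomalies(counts, threshold=3.5):
    """Return the hosts whose score exceeds the (assumed) threshold."""
    scores = robust_z_scores(counts)
    return sorted(h for h, s in scores.items() if abs(s) > threshold)


# Hypothetical daily counts of outbound connections per endpoint.
daily_outbound = {
    "ws-001": 120, "ws-002": 131, "ws-003": 118,
    "ws-004": 125, "ws-005": 2900, "ws-006": 122,
}

suspicious = flag_anomalies(daily_outbound)
print(suspicious)
```

The model has no idea why ws-005 is so chatty. It only hands the analyst a short list of endpoints worth a closer look, which is the shift in burden described above.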
Large language models could also accelerate the pace of analysis as the backbone for analyst support tools. Cyber analysts start many investigations based on opaque alarms. For example, an alert that “Trojan:Win32” malware could have infected an endpoint might entail hours of work just to gather basic information. A large language model could instead create a brief report that explained the alert, assessed suspect files, collected facts about the host that raised the alarm, and offered next steps for the investigation. The prominent threat hunting and incident response firm Red Canary already does this with what it calls “GenAI agents.” Externalizing mundane tasks like these would drastically accelerate the pace of analysis.
As a stepping stone between manual and semiautonomous investigations, one of my projects used large language models to build analyst playbooks. These playbooks guide junior analysts to approach complex investigations in much the same way their more experienced counterparts do. They promote analytic rigor. The process of researching, understanding, and then creating detections and investigation strategies for such a vast array of malicious activities, though, takes months. Over the years I have seen many pursue this lofty goal yet inevitably fail. Using large language models and a bit of Python, though, I built a library of over six hundred playbooks, one for each technique in MITRE’s ATT&CK matrix (a taxonomy of malicious actions in the cyber domain), in a few hours.
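The general shape of such a pipeline can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code behind the project described above. The prompt wording, the `build_playbooks` helper, and the `fake_model` stand-in are all hypothetical, and a real run would swap `fake_model` for a call to whichever model is available. The three technique IDs are real ATT&CK entries used as a small sample.

```python
# A small sample of ATT&CK techniques. A full run would load every
# technique from the published ATT&CK data instead of this dictionary.
TECHNIQUES = {
    "T1059": "Command and Scripting Interpreter",
    "T1566": "Phishing",
    "T1003": "OS Credential Dumping",
}

# Hypothetical prompt wording, shown only to illustrate the structure.
PROMPT_TEMPLATE = (
    "Write an investigation playbook for MITRE ATT&CK technique "
    "{tid} ({name}). Include a plain-language summary, data sources "
    "to check, detection ideas, and step-by-step triage questions."
)


def build_playbooks(techniques, ask_model):
    """Ask the model for one playbook per technique and collect them."""
    playbooks = {}
    for tid, name in techniques.items():
        prompt = PROMPT_TEMPLATE.format(tid=tid, name=name)
        playbooks[tid] = ask_model(prompt)
    return playbooks


def fake_model(prompt):
    """Stand-in for a real model call so the sketch runs offline."""
    return "DRAFT PLAYBOOK\n" + prompt


library = build_playbooks(TECHNIQUES, fake_model)
print(len(library), "playbooks drafted")
```

Because the model call is passed in as a function, the same loop works with a hosted or a locally run model. The drafts still need review by an experienced analyst before junior analysts rely on them.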
Second, machine learning could also help derive meaning from internet-wide scanning data for improved warning intelligence. The intelligence cycle has struggled to keep pace with the cyber domain. Many reports on servers used to launch attacks or control malware implants, for example, arrive far too late to do any good. They provide interesting but seldom actionable information. By finding the traits of those servers from internet-wide scans and training machine learning models to spot them, cyber analysts can use these tools on live data feeds to quickly find new malicious servers. Rather than acting on similar insights in days or weeks as reports make their way out of an intelligence cycle, this approach would operationalize intelligence at machine speed.
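A toy version of this workflow might look like the following Python sketch. It uses a nearest-centroid classifier, one of the simplest machine learning models, written with the standard library only. The four server traits, the training rows, and the scanned addresses (drawn from ranges reserved for documentation) are invented for illustration and are not real indicators.

```python
def centroid(rows):
    """Average each feature column across a list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train(labeled):
    """Build one centroid per label from (features, label) pairs."""
    by_label = {}
    for features, label in labeled:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(rows) for label, rows in by_label.items()}


def predict(model, features):
    """Return the label whose centroid is closest to the features."""
    return min(model, key=lambda label: distance(model[label], features))


# Hypothetical traits per server, each 0 or 1:
# [self_signed_cert, default_tls_subject, uncommon_port, empty_http_body]
TRAINING = [
    ([1, 1, 1, 1], "malicious"),
    ([1, 1, 0, 1], "malicious"),
    ([1, 0, 1, 1], "malicious"),
    ([0, 0, 0, 0], "benign"),
    ([1, 0, 0, 0], "benign"),
    ([0, 0, 1, 0], "benign"),
]

model = train(TRAINING)

# Hypothetical new results from a live scan feed.
new_scan = {"198.51.100.7": [1, 1, 1, 0], "203.0.113.20": [0, 0, 0, 1]}
verdicts = {ip: predict(model, feats) for ip, feats in new_scan.items()}
print(verdicts)
```

A production model would need far more data, richer features, and careful measurement of false positives. The point of the sketch is only that once the traits are known, scoring each new server takes a fraction of a second.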
Third, AI could better prepare analysts for defensive cyber missions. Training, for instance, takes a lot of time and is hard to do well. I dealt with this just last year in the new 3rd Multi-Domain Task Force. Assigned to an Army service component command rather than part of the Cyber Mission Force, the unit’s large cyber formation stood up without access to the training it needed to do its job or any plan to obtain it. We found ourselves once again reinventing the wheel by building our own training program. We planned to spend over a year on this project. After some experimentation, though, we found a way to use large language models to create the entire curriculum, including lesson plans, training material, and even some hands-on exercises and assessments, in just a few hours.
Finally, AI could also improve hands-on training. Realistic scenarios are exceedingly difficult to build, run, and maintain. So much so, in fact, that they do not exist. Michael Schwille, Scott Fisher, and Eli Albright recently described the challenges they faced when they tried to bring data-driven operations, which rely on real-world data, into an Army exercise. As Guyonneau and Le Dez pointed out in their 2019 article, though, “If the corresponding data exists and can be acquired, a cyberteammate has the capacity to simulate any type of environment, whether friendly, neutral, or adversarial.” An AI agent can handle almost everything. Where an entire team would have manually set up cyber ranges, an agent could generate code describing that cyber range and then deploy it, following a common industry practice called infrastructure as code. An agent could also run realistic scenarios with synthetic actors that respond to trainees’ actions in real time. No longer must analysts suffer through small, contrived events based on canned scripts put on by under-resourced training cells.
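The core of infrastructure as code is that the environment is described as data, which can be generated, reviewed, versioned, and redeployed. The Python sketch below shows the kind of description an agent might produce for a small range. The schema, the host roles, and the `build_range` function are hypothetical, and a real deployment would hand a description like this to whatever provisioning tooling the unit already uses.

```python
import json


def build_range(name, workstations, subnet_prefix="10.10.0"):
    """Describe a small training range as plain data: one domain
    controller, N workstations, and one network sensor."""
    hosts = [{
        "name": f"{name}-dc",
        "role": "domain-controller",
        "ip": f"{subnet_prefix}.10",
    }]
    for i in range(1, workstations + 1):
        hosts.append({
            "name": f"{name}-ws{i:02d}",
            "role": "workstation",
            "ip": f"{subnet_prefix}.{100 + i}",
        })
    hosts.append({
        "name": f"{name}-sensor",
        "role": "network-sensor",
        "ip": f"{subnet_prefix}.250",
    })
    return {"range": name, "network": f"{subnet_prefix}.0/24", "hosts": hosts}


# Generate and render a hypothetical three-workstation range.
spec = build_range("alpha", workstations=3)
rendered = json.dumps(spec, indent=2)
print(rendered)
```

Changing one argument yields a range with thirty workstations instead of three, which is the kind of rework that costs a manual team days.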
There are valuable roles for AI to play in cyber operations. As Jenny Jun recently described it, with admirable brevity, the effects of AI in the cyber domain will be “sharper swords, tougher shields.” On the offensive side, though, those roles remain small for now, as Microsoft and OpenAI observed, and ultimately might not make offensive cyber operations relevant at the tactical level. Much of AI’s value today lies in defensive cyber operations. As a cyber analyst, I have access to hundreds of billions of new records per day, a prime target for machine learning. When paired with improved warning intelligence, also through machine learning, this technology presents an opportunity to drastically reduce the amount of time threat actors go undiscovered, or even to neutralize a campaign before it starts. An analyst support tool, built on top of a large language model, could further accelerate my pace of analysis. In the lead-up to those operations, AI could help lessen the crushing burden of building and running training. Unlike many lofty ideas that over-promise and under-deliver, these goals are realistic and achievable with the resources line units have today. We say we want innovation. Here is the opportunity; we must seize it. This is how we can move toward meaningful use of AI at the tactical cyber edge.
Captain Zachary Szewczyk commissioned into the Cyber Corps in 2018 after graduating from Youngstown State University with an undergraduate degree in computer science and information systems. He has supported or led defensive cyberspace operations from the tactical to the strategic level, including several high-level incident responses. He currently serves in the 3rd Multi-Domain Task Force.
The views expressed are those of the author and do not reflect the official position of the United States Military Academy, Department of the Army, or Department of Defense.
Image credit: NORAD/NORTHCOM Public Affairs