AI on Cybersecurity

I was prompted to write this entry after seeing never-ending discussions around AI and its effect on cybersecurity among my peers.

While LLMs and AI systems are hardly new, the debate was recently re-ignited by Anthropic’s Mythos marketing push and the rapid adoption of agentic workflows in both software development and security.

The topic has most certainly been covered before, but I figured that rather than repeat the same points across multiple conversations, I could keep them all in one place and refer people here.

Another thing to note is that AI as a topic is still heavily polarized, so I’ll try to separate controversial takes and mark them as OPINIONs. Additionally, to facilitate productive discussion, I will mark things happening around me as OBSERVATIONs.

Disclaimer: all of the following are my personal thoughts which do not relate to any of my current or previous work and employers.

To begin, we need to agree on what AI is and what it is not.

I won’t go over AI history or design, since many far more knowledgeable people have done that (for a brief introduction, I can recommend chapter 2 of the AI Engineering book by Chip Huyen). Rather than regurgitate the same information in a distilled form, I’ll assume that the reader is somewhat familiar with the topic.

Since we’re much more interested in the consequences of AI for cybersecurity, I’ll take a rough reductionist approach and treat AI as:

A token prediction machine, and
Probabilistic in nature, but
Capable of generalizing to tasks not explicitly present in its training data

OPINION: I do not believe that increasing model size and training data will eventually lead to AGI. I understand why the people and companies behind AI drive that narrative, and I see the similarities with perceived human intelligence that make others think so, but I find the difference in information processing and learning between humans and LLMs quite stark. Humans do not need to read all available books just to understand how to drink from a cup with a lid and no bottom. I do not reject the notion that AGI is conceptually possible, but I assume that a qualitative leap in technology is required to reach it. You can probably describe me as AGI-agnostic.

01. The Good

Given AI’s nature, it is only natural that it excels at text processing in the broadest sense.

Implementation amplification.

OBSERVATION: AI has largely taken over routine code and script generation. People around me now spend more time making architectural decisions and iterating on products rather than writing or editing code themselves.

AI amplifies security operators’ capabilities by lowering the entry barrier and easing the transition between programming languages and systems, which would usually bottleneck a human specialist.

OBSERVATION: Ease of access to code generation makes people overly reliant on it.

OPINION: This may negatively affect long-term skill acquisition and career development. By removing much of the frustration from the learning process, AI also removes some of the resilience-building mindset that is crucial to a cybersecurity career. I plan to explore this topic in greater depth in a future article.

It is quite obvious that the economics of cyber-attacks are changing fast. Implementation is becoming cheaper, while design, architecture, and decision-making are becoming more valuable.

For threat actors capable of properly harnessing these tools, the implications are significant. Activities that once required weeks or months of development effort can now be compressed into days or even hours through rapid prototyping, code modification, and campaign iteration. This works particularly well with phishing campaigns and malware obfuscation.

OPINION: Just as Metasploit and Kali once gave rise to the so-called “script kiddie”, AI is likely to give rise to the “prompt kiddie”, if it has not already.

For defenders, this may affect threat intelligence. By quickly swapping the assets used in attacks, threat actors deprive defenders of the time and resources needed to react to them. This reduces the valuable information to a narrow subset leaked through operational security mistakes (e.g., a single email used by a malware developer to sign multiple otherwise unconnected GitHub repositories). It also pushes threat intelligence towards the more proactive work of covertly monitoring adversaries, rather than an open exchange of stock information that may prematurely tip off an adversary.

Leveraging the ‘AI’ part

A very specific case of malware enablement by LLMs is just-in-time (JIT) malware, a concept borrowed from software development.

Any security engineer probably knows that the hardest part of hiding malware is the need to ship elaborate logic without tripping EDR hooks or bloating the sample size. JIT malware sidesteps this by shipping a much smaller footprint: hardcoded prompts and simple logic that invokes either a local or a remote model and uses its reasoning capabilities to generate behavior on the fly.

OBSERVATION: This is somewhat similar to a very curious project called ‘VibeOS’, showcased at a recent Microsoft conference.

Analysis amplification.

Understanding a software project, even with readily available source code, can be tedious and time-consuming. AI significantly lowers that barrier through summarization and by providing a somewhat intelligent query interface.

More often, source code is not available and we must resort to reverse engineering. This is where AI capability extension via MCP tools such as GhidraMCP shines, once again simplifying large portions of the workflow and, in some cases, performing substantial parts of the analysis for us.

That naturally means one-day vulnerabilities can be expected to be weaponized much sooner, if not immediately, after vendor patches become available. What was previously a roadblock for threat actors now becomes a commodity, rapidly closing the safety window of the vulnerability management cycle.

On the flipside, the same capabilities also empower defenders. AI-assisted analysis makes it easier to dissect malware, extract IOCs, understand attack chains, and triage security incidents.

AI performs surprisingly well in dynamic black-box analysis.

This is evident in what is publicly known about Mythos and similar agent-driven approaches to security research attempted by the community. Properly harnessed agents appear to be a natural evolution of traditional fuzzing techniques, capable of uncovering obscure vulnerabilities that would otherwise remain hidden.

OPINION: It remains unclear whether the current surge in vulnerability discovery is sustainable. My gut feeling is that discoverable findings will quickly dwindle, after which productivity gains will normalize. The situation reminds me somewhat of PortSwigger’s research cycle: developing a new approach, reaping the rewards, and returning to the status quo soon after publication.

Bureaucracy automation.

OBSERVATION: AI is increasingly integrated into back-office systems, chat platforms, ticketing systems, and workflow automation to offload a significant amount of self-inflicted management overhead present in most day-to-day jobs.

OPINION: While convenient, it has already produced AI fatigue, with people skimming over the immediately recognizable style of writing. It’s a fascinating case of a machine mechanically capturing the meaning in a way that waters it down to nothing, something I would call an erosion of entropy. For critical requests, there is a growing preference for human-only filings, as well as outright rejection of AI-generated content.

To summarize, the changes brought by AI are:

Code generation (generic help, diminishing time-to-attack, malware obfuscation and just-in-time execution)
Code analysis (static, reverse, dynamic)
Back-office and workflow automation

02. The Bad

The token-prediction nature of AI ultimately dictates its drawbacks.

It’s too generic

LLMs are being applied to problems that aren’t fundamentally language problems.

I get it: it’s nice to have one prompt to rule them all. But at some point you’re trying to hit a nail with a microscope.

What are the most common cybersecurity needs in relation to data? Anomaly and similarity detection. The former can be used to find bad stuff, the latter to verify good. You don’t need LLMs for either. I’m sure they are capable of producing some results, just not good results. These tasks are better handled by specialized neural networks designed with one goal in mind, not textual processing and reasoning. They are also much cheaper.
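
To make the point concrete, here is a minimal sketch of anomaly detection with no language model involved. It swaps in an even simpler technique than the specialized neural networks mentioned above: a modified z-score based on the median and the median absolute deviation. The function name, the sample data, and the 3.5 threshold are illustrative assumptions, not a recommendation for production use.

```python
from statistics import median


def mad_anomalies(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold.

    Uses the median and the median absolute deviation (MAD), which are far
    less distorted by the outliers themselves than mean and standard deviation.
    """
    values = list(values)
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # Degenerate case: most values are identical, so flag anything that differs.
        return [i for i, v in enumerate(values) if v != med]
    # 0.6745 scales the MAD so the score is comparable to a standard z-score.
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical data: failed logins per hour for one account.
failed_logins = [12, 15, 11, 14, 13, 12, 240, 14]
print(mad_anomalies(failed_logins))  # prints [6]
```

A few lines of statistics already flag the obvious spike. Real detection pipelines need far more than this, but none of it is a language problem.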

OPINION: I am skeptical about attempts to obtain meaningful gains by bolting LLMs onto cybersecurity products.

Thinking is hard

Reasoning models (LRMs) were a natural development of regular models. By splitting the original request into multiple smaller logical chunks, they generate more context and improve overall accuracy. It works, somewhat. Controversial research from Apple showed that the accuracy of reasoning models may improve only up to a certain level of complexity, after which they might collapse completely. Others in the industry have also stated quite directly that no ‘reasoning’ as a concept is happening anywhere.

This is also directly applicable to the ‘agentic’ approach that is currently taking over. Instead of an intelligent system being built and executed autonomously, what we have is multiple separate models, each running with its own context and processing what are basically each other’s prompts, to ‘plan’, ‘judge’, and ‘execute’. The inherent unreliability of LLMs makes agentic engineering akin to putting a band-aid over leaking pipes. It also starts to look awfully like primitive algorithmic programming, with the caveat that each block runs a highly sophisticated (and costly) language model instead of a simple instruction. Yet, results are still flaky at best.
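
The resemblance to plain algorithmic programming is easier to see in code. Below is a rough sketch of a plan/judge/execute loop, assuming each ‘model’ is just a function from a prompt string to a response string. The names `run_agent`, `fake_planner`, `fake_executor`, and `fake_judge` are hypothetical stand-ins, not any real framework’s API.

```python
from typing import Callable, List

# Assumption: a "model" is anything that maps a prompt string to a response string.
Model = Callable[[str], str]


def run_agent(task: str, planner: Model, executor: Model, judge: Model,
              max_retries: int = 2) -> List[str]:
    """Split a task into steps, execute each step, and retry until the judge passes it."""
    plan = planner(f"Split this task into one step per line: {task}")
    steps = [line.strip() for line in plan.splitlines() if line.strip()]
    results = []
    for step in steps:
        for _attempt in range(max_retries + 1):
            output = executor(step)
            verdict = judge(f"Step: {step}\nOutput: {output}\nAnswer PASS or FAIL.")
            if verdict.strip().upper().startswith("PASS"):
                results.append(output)
                break
        else:
            # Every attempt was rejected by the judge: give up on the whole task.
            raise RuntimeError(f"step failed after retries: {step}")
    return results


# Deterministic stubs standing in for real model calls.
def fake_planner(prompt: str) -> str:
    return "enumerate hosts\nsummarize findings"


def fake_executor(step: str) -> str:
    return f"done: {step}"


def fake_judge(prompt: str) -> str:
    return "PASS"


print(run_agent("recon", fake_planner, fake_executor, fake_judge))
```

Stripped of the models, this is a for-loop with a retry counter. All the apparent intelligence and all the flakiness live inside the three callables.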

For us, it means that we can be more abstract in our requests and delegate the logical splitting to the models, but we won’t get an intelligent assistant capable of complex thought.

OPINION: In my tests, current LLM capabilities in offensive security remind me of a genius-level kindergartener. If a task is easily decomposable into a linear sequence of activities, it will be completed without a hitch. If at any point unusual thinking is required to account for some weird quirk, everything collapses. More practically, you can delegate something like initial reconnaissance on a compromised machine, but any complex task like privilege escalation remains significantly less reliable than many demonstrations suggest and still benefits heavily from human guidance.

Similarly, models struggle to maintain the operational security that is crucial for red teaming. Even assuming you have your hands on a jailbroken variant with its ethics fine-tuning removed, or were able to fool the model with roleplay prompting, the nuance of some actions being ‘stealthy’ while others are ‘noisy’ is hardly understood.

OPINION: this drawback might be rectified once more specialized training data is acquired from subject matter experts.

Context matters

Models do not possess a persistent understanding of an environment. They rely entirely on information provided through context, which may be incomplete, inaccurate, or outdated. This becomes particularly problematic in cybersecurity, where critical details are often buried across thousands of log entries, source files, tickets, and infrastructure components.

In addition, context windows are limited both by hard constraints (the architecture and implementation choices of a given model) and by soft economic considerations tied to the number of tokens required to pass context back and forth and their associated cost.

As a result, effective use of AI frequently becomes an exercise in context engineering: deciding what information to include, what to omit, and how to present it in a way that allows the model to reason effectively.
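
As a toy illustration of that trade-off, the sketch below picks which log lines to send to a model under a fixed token budget. Everything in it is an assumption made for the example: the function name `select_context`, keyword counting as a relevance score, and whitespace-separated words as a crude stand-in for real tokenization.

```python
def select_context(lines, keywords, token_budget):
    """Pick the most relevant lines that fit the budget, kept in original order."""

    def cost(line):
        # Crude approximation: one token per whitespace-separated word.
        return max(1, len(line.split()))

    def score(line):
        lowered = line.lower()
        return sum(lowered.count(k.lower()) for k in keywords)

    # Highest score first; ties are broken by original position.
    ranked = sorted(range(len(lines)), key=lambda i: (-score(lines[i]), i))
    chosen, used = [], 0
    for i in ranked:
        if score(lines[i]) == 0:
            break  # everything after this point is irrelevant
        if used + cost(lines[i]) <= token_budget:
            chosen.append(i)
            used += cost(lines[i])
    return [lines[i] for i in sorted(chosen)]


logs = [
    "INFO service started",
    "ERROR failed login for admin from 10.0.0.5",
    "INFO heartbeat ok",
    "WARN failed login for root from 10.0.0.5",
    "INFO heartbeat ok",
]
for line in select_context(logs, ["failed", "error"], token_budget=16):
    print(line)
```

With a budget of 16 both failed-login lines fit. With a budget of 10 only the ERROR line does, and the model never sees the second attempt. Whoever chooses the scoring and the budget decides what the model is able to notice.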

Hallucinations

Be it a consequence of supervised fine-tuning on human knowledge of varying quality, which introduces contradictions, or of an inconsistency in the model’s internal state, sometimes we simply get a wrong answer. What can be fine for code development, where mistakes are caught and fixed later on, is critical for cybersecurity, where one wrong command can bring down infrastructure and cause financial loss and unnecessary headaches.

OBSERVATION: lately, I hardly ever encounter hallucinations. The non-obvious problem with this improvement is that hallucinations are becoming more subtle: you never know when things may go wrong.

To summarize, the noticeable limitations are:

Product applicability
Complex thoughts
Reliability

03. The Ugly

OPINION: this whole section is heavily opinionated. Keep that in mind.

While the job market may experience near-term turbulence due to hype and unrealistic management expectations, I do not believe cybersecurity will contract, either as a market or as a job market.

In the best-case scenario security-wise, the net gain for cybersecurity will be zero. In the worst-case scenario, cybersecurity will hyperinflate as people try to contain emerging threats. Yes, some jobs will be lost. It’s inevitable that older technologies lose relevance and some classes of risk are eliminated. It has happened before: first came the shift from binary and network exploitation towards application and infrastructure security, then the shift from application and infrastructure security towards identity and cloud security. A new shift towards AI models and their application security is already happening.

New tech, same mistakes

OBSERVATION: Non-technical people have embraced the agentic approach. Working applications are being made with a single prompt.

As of now, there are no security guidelines or reviewers in the chain for this kind of AI-generated code. Naturally, this means that many unsupervised projects generated through AI are likely to contain elementary security vulnerabilities.

OPINION: I suspect that many AI-assisted projects will never mature due to the unaddressable cost of maintenance, lack of human direction, and accumulation of AI-induced bugs.

The same goes for the infrastructure around AI: although the community is actively securing it, proper configuration is another question. Things like Ollama exposing a port for arbitrary model execution are no fun.
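
A quick way to notice this kind of exposure on your own hosts is a plain TCP reachability check. The sketch below assumes Ollama’s default port, 11434, and uses a hypothetical helper named `is_port_open`. It only tells you that something accepts connections on that port, not that it is Ollama or that it is misconfigured.

```python
import socket

OLLAMA_DEFAULT_PORT = 11434  # assumption: the instance runs on Ollama's default port


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Checks the local machine only; repeat from another host on the network
    # to see whether the port is reachable from outside.
    print(is_port_open("127.0.0.1", OLLAMA_DEFAULT_PORT))
```

If the same check succeeds from a machine that has no business talking to the model server, the listener is bound more widely than intended.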

Then there are new attack surfaces. OpenClaw, however fun and innovative it is, is THE security nightmare. Models, agents, their actions and prompts with which they were launched are all new layers stacking on top of other existing problems.

Is it unsolvable? No. Does it bring more work (and headaches) to cybersecurity? Absolutely.

Responsibility

AI can triage alerts. AI can prioritize incidents. AI can recommend actions. But when one of those recommendations causes damage, responsibility has to reside somewhere. When a probabilistic system fails, who is accountable? Is it the person who typed an ambiguous prompt? Is it the model provider that trained the weights? Is it the data provider who supplied the training dataset? Or maybe the Sun, which accidentally flipped a bit that affected the outcome?

OBSERVATION: AI is being experimented with as a summarizer to parse copious amounts of logs and bring meaningful signals to the surface.

OPINION: While no consensus has been reached on this topic, I am personally wary of relying on a probabilistic tool for critical decisions.

It is always good to have a scapegoat. When your tool works with probabilities, it’s hard to find one. This is unacceptable for things like critical infrastructure, government, and the military.

People

A claude.md file can be written to provide context for coding style, agreements, etc., but it’s nearly impossible to do the same for such a chaotic socio-economic entity as a company.

There is another essential commodity that is baked into humans and cannot be easily replaced: social connections. It’s impossible to replicate politics, influence, interdependence, and all the other subtle but important factors in decision-making.

At the end of the day, there has to be a person with a face to whom security questions will be addressed.

99. How AI Changes Cybersecurity

It’s time to wrap up.

I could be lame and give you my ‘predictions for the Next 5 Years’, but there is a simpler observation: life moves on, and technology changes all the time. As long as computer systems can be used for nefarious goals and monetary gain, cybersecurity is here to stay. As with everything else, it transforms and adapts, and so should you. Instead of completely rejecting the changes or buying into the hype, you have to maintain a fragile balance of seeing the technology for what it’s worth and using it to your benefit, ignoring the rest.

OPINION: The perception of AI as an automation and work multiplication tool is much healthier and closer to its nature than either complete replacement or autonomous work outsourcing.

Or I can be completely wrong.

Life will tell.
