AI-Powered Security Operations: Building Detection Rules from Threat Intelligence with LLMs

Introduction

The modern threat intelligence ecosystem produces data at a scale that overwhelms human analysts. A single STIX/TAXII feed can deliver hundreds of indicators per hour: IP addresses, file hashes, domain names, URLs, tactics, techniques, and procedures. The downstream SIEM needs actionable detection rules derived from this data — but translating raw IOC bundles into well-formed Wazuh XML rules or Suricata signatures is tedious, error-prone manual work that does not scale. Large language models are uniquely suited to bridge this gap: they understand structured threat intelligence formats, can generate syntactically correct rule templates, and can reason about the semantics of an attack to produce context-appropriate severity levels and alert metadata.

This article covers the full architecture of an AI-powered security operations pipeline: from parsing STIX bundles and generating detection rules to false positive triage, human-in-the-loop approval workflows, and the underappreciated risk of adversarial feed manipulation.

The Gap Between IOC Feeds and SIEM Rules

IOC feeds speak the language of indicators: “this IP address was observed commanding a Cobalt Strike beacon” or “this SHA256 hash is associated with ransomware dropper X.” SIEMs speak the language of log events: “when a Windows agent reports a network connection to this IP on port 443, generate a high-severity alert.” The translation between these two representations requires:

  • Understanding which log sources are relevant to each indicator type (network IOCs map to firewall/DNS/proxy logs; file hash IOCs map to endpoint FIM or AV logs)
  • Knowing the appropriate field names and log formats for each data source (e.g., Wazuh FIM events use syscheck.sha256_after; Sysmon events use EventID 1 with win.eventdata.hashes)
  • Selecting appropriate severity levels based on threat actor profile, indicator confidence, and asset criticality context
  • Writing syntactically correct rule DSL for the target SIEM (Wazuh’s XML rule schema, Suricata’s rule language, Sigma YAML)

Each of these steps is individually straightforward but collectively laborious when performed at feed scale. An LLM with few-shot examples of each rule type can automate the translation reliably enough to dramatically reduce analyst workload — if the output goes through a validation and approval gate before deployment.
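The routing logic in the first two bullets can be sketched as a small lookup table. This is a minimal sketch: the parent SIDs and field names follow the Wazuh conventions used later in this article, and `IOC_ROUTING`/`route_ioc` are illustrative names.

```python
# Mapping from indicator type to the log source and decoded field it should
# match against. Parent SIDs and field names follow common Wazuh conventions;
# adjust them for your own decoder set.
IOC_ROUTING = {
    "ip": {"log_source": "firewall/network", "parent_sid": 62001,
           "match_field": "srcip"},
    "file_hash": {"log_source": "fim", "parent_sid": 550,
                  "match_field": "syscheck.sha256_after"},
    "domain": {"log_source": "dns", "parent_sid": 82200,
               "match_field": "data.dns.question.name"},
}

def route_ioc(indicator_type):
    """Return the routing entry for an indicator type; fail loudly on unknowns."""
    try:
        return IOC_ROUTING[indicator_type]
    except KeyError:
        raise ValueError(f"no routing rule for indicator type {indicator_type!r}")
```

Failing loudly on unknown types matters here: silently dropping an unroutable indicator would mean an IOC arrives in the feed and never produces a rule or a review-queue entry.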

Parsing STIX/TAXII Bundles with LLMs

STIX 2.1 is a JSON-based standard for representing threat intelligence objects (Indicators, Attack Patterns, Malware, Threat Actors, etc.) and their relationships. A STIX Indicator object for a malicious IP looks like:

{
  "type": "indicator",
  "spec_version": "2.1",
  "id": "indicator--a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "name": "Cobalt Strike C2 - 10.0.1.99",
  "pattern": "[ipv4-addr:value = '10.0.1.99']",
  "pattern_type": "stix",
  "valid_from": "2026-04-01T00:00:00Z",
  "confidence": 85,
  "labels": ["malicious-activity"],
  "external_references": [
    {
      "source_name": "example-corp-threat-intel",
      "description": "Confirmed C2 server for Cobalt Strike campaign targeting financial sector"
    }
  ]
}

An LLM prompt that extracts structured fields for rule generation:

import json

import ollama  # pip install ollama; assumes a local Ollama server is running

system_prompt = """You are a security rule generator. Extract the following fields from STIX Indicator objects:
- indicator_type: one of [ip, domain, url, file_hash, process]
- value: the actual indicator value
- confidence: integer 0-100
- severity: one of [low, medium, high, critical] based on confidence and labels
- context: one-sentence description of the threat
Return a JSON object with these fields. Return null if the object is not an Indicator."""

user_prompt = f"Extract fields from this STIX object:\n{json.dumps(stix_object, indent=2)}"

response = ollama.chat(
    model="llama3:8b",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt}
    ],
    format="json"
)
extracted = json.loads(response["message"]["content"])

Using structured output (format="json" in Ollama, or the response_format parameter with OpenAI-compatible APIs) constrains the model to valid JSON output, making downstream parsing reliable. For TAXII feeds, use the taxii2client Python library to pull collections and iterate over STIX bundles.
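A minimal sketch of the TAXII pull, assuming a TAXII 2.1 server reachable with basic auth. The function names are illustrative, and the taxii2client import is deferred into the fetch function so the pure filtering logic can be used and tested without the package installed.

```python
def filter_indicators(objects):
    """Keep only STIX Indicator objects from an envelope's object list."""
    return [obj for obj in objects if obj.get("type") == "indicator"]

def fetch_indicators(collection_url, user, password):
    """Pull all objects from a TAXII 2.1 collection and keep the Indicators."""
    from taxii2client.v21 import Collection  # pip install taxii2-client
    collection = Collection(collection_url, user=user, password=password)
    envelope = collection.get_objects()  # STIX envelope: {"objects": [...]}
    return filter_indicators(envelope.get("objects", []))
```

In production you would also pass `added_after` filtering so each scheduled run only pulls objects newer than the previous poll, rather than the whole collection.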

Automatic Wazuh Rule Generation

Wazuh rules are XML documents that match against decoded log fields. An IP-based rule that alerts when any monitored host communicates with a known C2 IP:

<rule id="100901" level="12">
  <if_sid>62001</if_sid>
  <srcip>10.0.1.99</srcip>
  <description>Known C2 IP detected: Cobalt Strike campaign (confidence: 85)</description>
  <mitre>
    <id>T1071</id>
  </mitre>
  <group>threat_intel,cobalt_strike,c2</group>
</rule>

An LLM prompt template for generating Wazuh rules from structured IOC data:

rule_prompt = f"""Generate a Wazuh XML detection rule for this indicator:
- Type: {ioc['indicator_type']}
- Value: {ioc['value']}
- Severity: {ioc['severity']}
- Context: {ioc['context']}
- Confidence: {ioc['confidence']}

Requirements:
- Rule ID must be in range 100900-100999
- Use level 7 for medium, 10 for high, 12 for critical
- Include MITRE ATT&CK ID if applicable (use T1071 for C2, T1566 for phishing, T1190 for exploitation)
- Group must include 'threat_intel'
- Match against the correct Wazuh parent SID for this log type
- For IP IOCs: use <srcip> or <dstip> tags against firewall/network logs (parent SID 62001)
- For file hash IOCs: use <field name="syscheck.sha256_after"> against FIM events (parent SID 550)
- For domain IOCs: use <field name="data.dns.question.name"> against DNS query logs (parent SID 82200)

Return only the XML rule, no explanation."""

A severity mapping function that translates STIX confidence to Wazuh level:

def confidence_to_wazuh_level(confidence: int, labels: list) -> int:
    if "malicious-activity" in labels and confidence >= 80:
        return 12  # critical
    elif confidence >= 60:
        return 10  # high
    elif confidence >= 40:
        return 7   # medium
    else:
        return 5   # low
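Before handing a generated rule to Wazuh's own tooling, a cheap structural pre-check catches the most common generation failures. A sketch (`check_generated_rule` is an illustrative name; it enforces the requirements stated in the prompt above):

```python
# Lightweight structural checks on an LLM-generated Wazuh rule: well-formed
# XML, rule id in the reserved range, a known level, and the required group.
# A pre-filter only; real validation still goes through the Wazuh toolchain.
import xml.etree.ElementTree as ET

def check_generated_rule(rule_xml):
    """Return a list of validation errors; an empty list means the rule passed."""
    try:
        rule = ET.fromstring(rule_xml)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    errors = []
    rule_id = int(rule.get("id", "-1"))
    if not 100900 <= rule_id <= 100999:
        errors.append(f"rule id {rule_id} outside reserved range 100900-100999")
    if int(rule.get("level", "0")) not in (5, 7, 10, 12):
        errors.append("level must be one of 5/7/10/12 per the severity mapping")
    group = rule.findtext("group") or ""
    if "threat_intel" not in group:
        errors.append("group must include 'threat_intel'")
    return errors
```

Rules that fail this check can be bounced straight back to the generator for a retry instead of consuming a slot in the analyst review queue.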

Automatic Suricata Rule Generation

Suricata rules operate at the network packet level and follow a distinct syntax from SIEM rules. A Suricata rule for the same C2 IP:

alert tcp $HOME_NET any -> 10.0.1.99 any (
  msg:"ET THREAT_INTEL Cobalt Strike C2 IP Detected (confidence:85)";
  threshold: type limit, track by_src, seconds 300, count 1;
  classtype:command-and-control;
  sid:9000901;
  rev:1;
  metadata:affected_product Windows, attack_target Client_and_Server,
            created_at 2026-04-03, confidence High;
)

(The rule is wrapped across lines here for readability; in a .rules file, each rule is written on a single line.)

The LLM prompt for Suricata rule generation adds protocol-awareness requirements — the model must choose between tcp/udp/http/dns headers, construct correct content-matching syntax, and manage SID namespaces. Providing three to five few-shot examples of correct Suricata rules covering each protocol type significantly improves output quality and consistency:

suricata_prompt = f"""Generate a Suricata IDS rule for this indicator.

Examples of correctly formatted rules:
{few_shot_examples}

Now generate a rule for:
- Indicator type: {ioc['indicator_type']}
- Value: {ioc['value']}
- Threat context: {ioc['context']}
- SID: {next_available_sid}

Rules:
- Use 'alert' action (not drop or reject)
- Choose correct protocol header based on indicator type
- Add threshold to prevent alert flooding (limit 1 per 300s per source)
- Classtype must match threat category
- Metadata must include confidence level
- Return only the rule text, no explanation."""
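Managing the SID namespace mentioned above is easiest with a small allocator that tracks SIDs already issued. A sketch, assuming a reserved local block of 9000900-9000999 to match the example rule (the range is an assumption; pick your own reserved block):

```python
class SidAllocator:
    """Hands out unused SIDs from a reserved local range."""

    def __init__(self, start=9000900, end=9000999, in_use=None):
        # in_use seeds the allocator with SIDs from already-deployed rules.
        self._next = start
        self._end = end
        self._in_use = set(in_use or ())

    def allocate(self):
        """Return the lowest free SID, skipping SIDs already in use."""
        while self._next in self._in_use:
            self._next += 1
        if self._next > self._end:
            raise RuntimeError("local SID namespace exhausted")
        sid = self._next
        self._in_use.add(sid)
        self._next += 1
        return sid
```

Feed `next_available_sid` in the prompt from this allocator rather than letting the model pick: SID collisions between generated rules are exactly the kind of systematic error LLMs produce when left to manage counters themselves.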

False Positive Reduction with LLM-Assisted Triage

Generated rules fired against a live environment inevitably produce false positives — benign traffic or events that match the rule pattern. LLMs can assist in triage by analyzing clusters of alert events and reasoning about whether they represent genuine threats or benign anomalies.

A triage prompt that analyzes an alert cluster:

triage_prompt = f"""Analyze this cluster of security alerts and determine if they represent
a genuine threat or a false positive.

Rule that fired: {rule_description}
Number of events: {len(alert_events)}
Source IPs: {source_ips}
Destination IPs: {dest_ips}
Time range: {time_range}
Sample events:
{json.dumps(alert_events[:5], indent=2)}

Respond with:
1. Verdict: TRUE_POSITIVE, FALSE_POSITIVE, or NEEDS_INVESTIGATION
2. Confidence: 0-100
3. Reasoning: 2-3 sentences explaining your verdict
4. Recommended action: one of [escalate, tune_rule, suppress_source, close]"""

Route the LLM triage output to an analyst review queue. Do not automate rule suppression based on LLM triage alone — use it to prioritize which alerts get human attention first and to pre-populate analyst notes in the case management system.
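The numbered response format in the triage prompt can be parsed into a structured record before it reaches the review queue. A sketch that falls back to NEEDS_INVESTIGATION whenever the model deviates from the format (`parse_triage` is an illustrative name):

```python
import re

VERDICTS = {"TRUE_POSITIVE", "FALSE_POSITIVE", "NEEDS_INVESTIGATION"}

def parse_triage(response_text):
    """Extract verdict, confidence, and recommended action from the numbered
    free-text response; unparseable input defaults to the safe side."""
    verdict = re.search(r"Verdict:\s*(\w+)", response_text)
    confidence = re.search(r"Confidence:\s*(\d+)", response_text)
    action = re.search(r"Recommended action:\s*(\w+)", response_text)
    v = verdict.group(1).upper() if verdict else ""
    return {
        "verdict": v if v in VERDICTS else "NEEDS_INVESTIGATION",
        "confidence": int(confidence.group(1)) if confidence else 0,
        "action": action.group(1) if action else "escalate",
    }
```

The safe-side defaults matter: a malformed model response should surface as something an analyst looks at, never as a silently closed alert.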

Human-in-the-Loop Validation Workflows

No AI-generated rule should reach production without human review. The approval workflow should be lightweight enough that it does not become a bottleneck, but rigorous enough that a poorly-generated rule cannot cause mass false positives or, worse, be deliberately crafted to suppress legitimate alerts.

A minimal approval workflow using a ticketing or review queue system:

  1. LLM generates rule from IOC — rule goes to pending_review state with associated IOC metadata, confidence score, and a syntax validation result
  2. Automated syntax check — run Wazuh’s wazuh-logtest (the successor to ossec-logtest) or Suricata’s test mode (suricata -T) against the generated rule; reject syntactically invalid rules before human review
  3. Analyst review — reviewer sees the rule, the source IOC, the LLM’s reasoning, and any test results. One-click approve/reject/modify with mandatory comment on reject
  4. Staged deployment — approved rules deploy to a canary set of agents first; monitor for false positive rate over 24 hours before full rollout
  5. Feedback loop — analyst rejection reasons are logged and used to improve LLM prompts over time

Prompt Engineering for Security: Structured Output and Few-Shot Examples

Prompt engineering for rule generation has specific requirements that differ from general LLM usage. Key principles:

Structured output over free text: Always request JSON or explicit format responses. Free-text rule generation produces inconsistent output that breaks downstream parsers. Use Ollama’s format="json" or a grammar constraint for structured fields, then generate the rule text as a single field within that structure.

Few-shot examples: Provide 3-5 examples of correct rules for each indicator type. The model needs to learn your organization’s specific SID namespace, group naming conventions, and MITRE mapping preferences. Generic instructions without examples produce generic rules that need heavy editing.

Negative examples: Include examples of common mistakes and explicitly instruct the model to avoid them. For Wazuh rules, common LLM mistakes include: using non-existent field names, incorrect parent SID references, and omitting required attributes. A “never do” list in the prompt eliminates 80% of systematic errors.
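One way to apply this: keep the “never do” list as a constant appended to the generation prompt, seeded with the mistakes listed above and extended from logged reviewer rejections (NEVER_DO and with_negative_examples are illustrative names):

```python
# Negative examples appended to the Wazuh generation prompt. The entries
# cover the systematic mistakes noted above; extend the list as reviewers
# log new rejection reasons.
NEVER_DO = """Never do any of the following:
- Never invent field names; use only the fields listed in the requirements above
- Never reference a parent SID other than the one given for this log type
- Never omit the id or level attribute on the <rule> element
- Never emit more than one <rule> element per response"""

def with_negative_examples(base_prompt):
    """Append the never-do list to a generation prompt."""
    return base_prompt + "\n\n" + NEVER_DO
```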

Chain-of-thought for severity mapping: For complex severity decisions, ask the model to reason step by step before producing the final severity level. Surfacing the reasoning makes errors visible and correctable during review.

Integration Architecture

The full pipeline from feed ingestion to rule deployment:

TAXII Feeds / CSV Downloads
         |
         v
   Feed Ingestion Service
   (deduplicate, validate, enrich)
         |
         v
   LLM Rule Generator
   (Ollama / local model)
         |
         v
   Syntax Validator
    (wazuh-logtest / suricata -T)
         |
         v
   Review Queue (n8n / TheHive / custom)
         |
      [Human Approval]
         |
         v
   Canary Deployment
   (subset of agents / sensors)
         |
      [24h monitoring]
         |
         v
   Full Rollout
   (Puppet / Ansible push to all nodes)

Run the LLM rule generator as a scheduled task (hourly or on feed update). Deduplicate against already-deployed rules using indicator value hashing to avoid regenerating rules for IOCs already in your ruleset. Store all generated rules, their source IOCs, and their review history in a database for audit purposes.
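The deduplication step can be sketched as a normalized hash over indicator type and value, checked against the set of keys for already-deployed rules (`ioc_dedup_key` is an illustrative name):

```python
import hashlib

def ioc_dedup_key(indicator_type, value):
    """Stable dedup key: case- and whitespace-insensitive hash of type:value."""
    normalized = f"{indicator_type.strip().lower()}:{value.strip().lower()}"
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def is_new_ioc(indicator_type, value, deployed_keys):
    """True if no rule for this indicator has been generated before."""
    return ioc_dedup_key(indicator_type, value) not in deployed_keys
```

Normalizing before hashing matters because the same IOC often arrives from multiple feeds with cosmetic differences (case, stray whitespace) that would otherwise defeat the dedup check.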

Risk: Adversarial Manipulation of Threat Feeds

The most underappreciated risk in AI-assisted rule generation is feed poisoning: an adversary who can influence the content of a threat intelligence feed you consume can potentially cause your system to generate rules that suppress detection of their own activity.

Concrete attack scenarios:

  • IOC injection against trusted parties: An attacker submits a false IOC claiming that a legitimate business partner’s IP is a C2 server; your system generates a rule that buries analysts in alerts on benign partner traffic. More subtly, a falsely high-confidence “benign” classification of their actual C2 IP can cause the system to suppress existing rules targeting it.
  • Rule exhaustion: Flooding a feed with thousands of low-quality IOCs causes your LLM pipeline to generate thousands of noisy rules, overwhelming analyst review queues and degrading the signal-to-noise ratio of your SIEM.
  • Prompt injection via STIX fields: An adversary who can control the description or name fields of a STIX indicator can attempt to inject instructions into your LLM prompt. Mitigation: sanitize all feed-derived strings before interpolating them into prompts, and use structured JSON extraction rather than passing raw indicator text directly to the rule generation prompt.

Mitigations include: maintaining a minimum confidence threshold for rule generation (reject IOCs below 60% confidence), using multiple independent feed sources and requiring corroboration before generating rules for any single-source IOC, and running generated rules through an independent review that cannot be influenced by the feed source.
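These mitigations combine into a pre-generation gate. A minimal sketch, assuming the structured IOC fields from earlier in the article (the MIN_CONFIDENCE value and the character allow-list are illustrative choices, not a complete injection defense):

```python
import re

MIN_CONFIDENCE = 60  # reject low-confidence IOCs outright

def sanitize_feed_string(text, max_len=200):
    """Strip characters commonly used in prompt-injection payloads (quotes,
    braces, backticks) and truncate, before interpolating into any prompt."""
    cleaned = re.sub(r"[^\w\s.,:/@-]", "", text)
    return cleaned[:max_len]

def should_generate_rule(ioc, sources_seen):
    """Gate generation on confidence and independent-source corroboration."""
    return ioc.get("confidence", 0) >= MIN_CONFIDENCE and sources_seen >= 2
```

Sanitization is deliberately crude: the stronger defense remains structured JSON extraction, so feed-controlled prose never reaches the rule-generation prompt as free text.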

Summary

AI-powered rule generation addresses a real and growing bottleneck in security operations: the gap between the volume of threat intelligence data and the capacity of human analysts to translate it into actionable detection rules. LLMs — particularly smaller models running locally — are well-suited to this task: they understand structured formats, can maintain consistent output schemas with proper prompting, and execute at speeds that keep pace with high-volume feeds. The critical design constraints are human oversight before deployment, rigorous syntax validation, staged rollout to control false positive blast radius, and awareness of the adversarial manipulation risk that is unique to AI-augmented pipelines. Applied correctly, this architecture can reduce rule authoring time from hours per indicator to minutes per batch, while maintaining the analyst oversight that keeps the detection ruleset trustworthy.
