Result: Automatically Generating Rules of Malicious Software Packages via Large Language Model

Title:

Automatically Generating Rules of Malicious Software Packages via Large Language Model

Authors:

Zhang, Xiangrui, orcid:0009-0002-9124-, Li, Qiang

Contributors:

Zhang, Xiangrui, Li, Qiang

Publisher Information:

Zenodo

Publication Year:

2025

Collection:

Zenodo

Subject Terms:

LLM, Pypi, Rule, Yara

Document Type:

Electronic resource (software)

Language:

English

Relation:

https://zenodo.org/records/15278659; oai:zenodo.org:15278659; https://doi.org/10.5281/zenodo.15278659

DOI:

10.5281/zenodo.15278659

Availability:

https://doi.org/10.5281/zenodo.15278659
https://zenodo.org/records/15278659
https://github.com/zhang-xr/RuleLLM

Rights:

MIT License ; mit ; https://opensource.org/licenses/MIT

Accession Number:

edsbas.3D9EC554

Database:

BASE

Additional Information

# Malware Detection Rule Generator

A Python-based tool that automatically generates, refines, and validates malware detection rules from code samples. The system supports both YARA and Semgrep rule generation and uses large language models (LLMs) to analyze malicious code patterns.

## Features

- **Automated Rule Generation**: Analyzes malicious code samples to generate detection rules
- **Multiple Rule Types**: Supports both YARA and Semgrep rule formats
- **Intelligent Clustering**: Groups similar malware samples for more effective rule generation
- **Rule Refinement**: Improves initial rules through pattern analysis and optimization
- **Validation System**: Automatically validates and fixes generated rules
- **Progress Tracking**: Maintains detailed logs of the generation process

## Installation

1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd malware-rule-generator
   ```

2. Install the required dependencies:

   ```bash
   pip install -r requirements.txt
   ```

Required dependencies include:

- openai
- yara-python
- semgrep
- torch
- transformers
- scikit-learn
- numpy
- tqdm

## Configuration

Create a `config.ini` file in the project root:

```ini
[Settings]
# Model name, e.g., gpt-4-0125-preview
Model =
# Your API key
ModelApiKey = <your-api-key>
# API base URL, such as https://api.openai.com/v1
BaseURL =
```

## Workflow

The system uses a three-phase process to generate malware detection rules:

1. **Clustering & Preprocessing**:
   - Uses `cluster_malware.py` to analyze and cluster similar malware samples
   - Leverages CodeBERT to generate embeddings for code samples
   - Applies K-means clustering to group similar malware
   - Filters out large and heavily encoded samples to improve processing efficiency
2. **Rule Generation Process**:
   - Selects samples from each cluster
   - `Generator` analyzes samples and creates initial rules
   - `Refiner` optimizes rules to improve detection efficiency and reduce false positives
   - `Fixer` validates rule format and fixes any syntax errors
3. **Output & Logging**:
   - Generates ...
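The `config.ini` described in the Configuration section can be read with Python's standard `configparser` module. The sketch below is an assumption about how such a file might be loaded, not the project's own loader, and the name `parse_settings` is hypothetical. It enables `inline_comment_prefixes` because `configparser` does not strip trailing `# ...` comments by default. It also keeps key case, since `configparser` lowercases keys unless `optionxform` is overridden.

```python
import configparser

# Keys taken from the README's [Settings] example.
REQUIRED_KEYS = ("Model", "ModelApiKey", "BaseURL")


def parse_settings(ini_text: str) -> dict[str, str]:
    """Hypothetical loader: return the non-empty [Settings] values from config.ini text."""
    parser = configparser.ConfigParser(
        inline_comment_prefixes=("#", ";"),  # strip "value  # note" trailing comments
        interpolation=None,                  # treat '%' in API keys literally
    )
    parser.optionxform = str  # keep "ModelApiKey" instead of lowercasing it
    parser.read_string(ini_text)
    if not parser.has_section("Settings"):
        raise KeyError("config.ini has no [Settings] section")
    settings = parser["Settings"]
    # A line like "Model = # e.g., gpt-4-0125-preview" parses to an empty value.
    missing = [k for k in REQUIRED_KEYS if not settings.get(k, "").strip()]
    if missing:
        raise ValueError(f"empty or missing settings: {', '.join(missing)}")
    return {k: settings[k].strip() for k in REQUIRED_KEYS}
```

To read a file on disk, pass `Path("config.ini").read_text()`. Failing on an empty value such as `Model =` surfaces a misconfiguration early instead of sending an empty model name to the API.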
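The Clustering & Preprocessing phase filters out large and heavily encoded samples, but the README does not say how either property is measured. The following minimal sketch rests on explicit assumptions. It treats a sample as too large above a hypothetical 100 KB limit. It treats a sample as heavily encoded when more than 30% of its characters fall inside long unbroken runs of base64/hex alphabet characters. The thresholds, the regular expression, and the function names are illustrative only.

```python
import re

# Hypothetical thresholds; the project's actual cut-offs are not stated in the README.
MAX_BYTES = 100_000
MAX_ENCODED_RATIO = 0.3
# Assumed heuristic: 100+ consecutive base64/hex-alphabet characters look like an encoded blob.
ENCODED_RUN = re.compile(r"[A-Za-z0-9+/=]{100,}")


def encoded_ratio(source: str) -> float:
    """Fraction of characters that sit inside long base64/hex-looking runs."""
    if not source:
        return 0.0
    encoded = sum(len(m.group()) for m in ENCODED_RUN.finditer(source))
    return encoded / len(source)


def keep_sample(source: str) -> bool:
    """True if a sample is small enough and not dominated by encoded blobs."""
    too_large = len(source.encode("utf-8")) > MAX_BYTES
    return not too_large and encoded_ratio(source) <= MAX_ENCODED_RATIO


def filter_samples(samples: dict[str, str]) -> dict[str, str]:
    """Drop oversized or heavily encoded samples before embedding and clustering."""
    return {name: src for name, src in samples.items() if keep_sample(src)}
```

Running this filter before embedding keeps very long inputs and opaque payload strings away from CodeBERT, which has a fixed maximum input length.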
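The Rule Generation Process begins by selecting samples from each cluster, but the README does not specify the selection strategy. One plausible approach, assumed here rather than taken from the project, is to pick the samples whose embeddings lie closest to their cluster's centroid, so the LLM sees the most typical members. The function `select_representatives` and its `per_cluster` parameter are hypothetical. In practice, `labels` would be the per-sample cluster IDs produced by K-means, for example scikit-learn's `KMeans(n_clusters=...).fit_predict(embeddings)`.

```python
import math
from collections import defaultdict


def select_representatives(
    names: list[str],
    embeddings: list[list[float]],
    labels: list[int],
    per_cluster: int = 2,
) -> dict[int, list[str]]:
    """Hypothetical selector: up to `per_cluster` samples per cluster, nearest the centroid first."""
    members: dict[int, list[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        members[label].append(idx)

    selected: dict[int, list[str]] = {}
    for label, idxs in sorted(members.items()):
        dim = len(embeddings[idxs[0]])
        # Centroid = element-wise mean of this cluster's embedding vectors.
        centroid = [sum(embeddings[i][d] for i in idxs) / len(idxs) for d in range(dim)]
        # Rank members by Euclidean distance to the centroid (stable sort keeps input order on ties).
        ranked = sorted(idxs, key=lambda i: math.dist(embeddings[i], centroid))
        selected[label] = [names[i] for i in ranked[:per_cluster]]
    return selected
```

Centroid-nearest selection favors typical cluster members. Random sampling would instead expose the model to more variation, at the cost of less focused rules.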
