Result: Cross-Modal Prompt Inversion

Title:
Cross-Modal Prompt Inversion
Authors:
Publisher Information:
Zenodo
Publication Year:
2025
Collection:
Zenodo
Document Type:
E-resource; software
Language:
unknown
DOI:
10.5281/zenodo.15603408
Rights:
Creative Commons Attribution 4.0 International ; cc-by-4.0 ; https://creativecommons.org/licenses/by/4.0/legalcode
Accession Number:
edsbas.CADACCEE
Database:
BASE

Further Information

This project includes all the Python code required for our reverse prompt engineering experiments across three modalities: text, image, and video. For clarity, the code is organized into three separate folders, one per modality. The implementation follows the two-step inference approach proposed in the paper: Direct Inversion as the first step and Fine-tuning as the second. The project also provides comprehensive datasets and evaluation frameworks for all modalities. The following sections describe each modality folder, the two-step implementation, and the available datasets.

1. Text Prompt Inversion (text_prompt_inversion/):
This folder implements the reverse prompt engineering approach for the text modality, targeting text-to-text models.

Step 1 - Default Direct Inversion: Default_DI_for_text.ipynb implements the first step of the proposed approach, performing direct inversion on text prompts using pre-trained models without additional training. The notebook includes both implementation and evaluation components.

Step 2 - Fine-tuning: The Fine-tuning/ directory contains the implementation of the second step, in which models are fine-tuned with reinforcement learning (RL). Fine-tuning starts from the direct inversion (DI) model as the initial checkpoint; training parameters can be customized through the configuration files in scripts/training/task_configs/.

Environment Setup: A complete environment configuration (txt2txt.yml) and accompanying documentation are provided.

Datasets: The text-modality experiments use two comprehensive datasets:
• Alpaca-GPT4 Dataset: available at Alpaca-GPT4, with a processed version on Hugging Face at cyprivlab/Alpaca-GPT4 (https://huggingface.co/datasets/cyprivlab/Alpaca-GPT4/)
• RetrievalQA Dataset: source available at RetrievalQA, with a processed version on Hugging Face at cyprivlab/GPT4RQA (https://huggingface.co/datasets/cyprivlab/GPT4RQA)

2. Image Prompt Inversion (image_prompt_inversion/):
This folder ...
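The environment setup and processed datasets described above for the text modality could be obtained along these lines. This is a sketch, not the repository's documented procedure: the environment name "txt2txt" is an assumption (use whatever name txt2txt.yml actually declares), and the download commands assume the standard huggingface_hub CLI.

```shell
# Create the conda environment from the provided spec file.
# NOTE: the environment name "txt2txt" is an assumption; use the
# name declared inside txt2txt.yml if it differs.
conda env create -f txt2txt.yml
conda activate txt2txt

# Optionally pre-fetch the processed datasets from Hugging Face
# (requires the huggingface_hub CLI: pip install huggingface_hub).
huggingface-cli download --repo-type dataset cyprivlab/Alpaca-GPT4
huggingface-cli download --repo-type dataset cyprivlab/GPT4RQA
```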
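The record states that the direct-inversion notebook bundles evaluation components but does not name the metric. As an illustration only, a common choice for judging how well a recovered prompt matches the ground truth is token-level F1 overlap; the function below is a hypothetical sketch, not code from the repository.

```python
# Illustrative sketch only: token-level F1 between a recovered prompt
# and the ground-truth prompt. The function name and the metric choice
# are assumptions, not taken from the repository's evaluation code.
from collections import Counter


def token_f1(recovered: str, reference: str) -> float:
    """Token-level F1 between a recovered prompt and the ground truth."""
    rec = recovered.lower().split()
    ref = reference.lower().split()
    if not rec or not ref:
        return 0.0
    # Multiset intersection counts shared tokens with multiplicity.
    overlap = sum((Counter(rec) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(rec)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    ground_truth = "Write a short poem about the ocean"
    recovered = "Write a poem about the ocean"
    print(f"token F1: {token_f1(recovered, ground_truth):.3f}")
```

A score of 1.0 means the recovered prompt matches the reference token-for-token; dropping or adding words lowers precision or recall accordingly.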