Result: Cross-Modal Prompt Inversion

Title:
Cross-Modal Prompt Inversion
Authors:
Publisher Information:
Zenodo
Publication Year:
2025
Collection:
Zenodo
Document Type:
E-resource; software
Language:
unknown
DOI:
10.5281/zenodo.15603408
Rights:
Creative Commons Attribution 4.0 International ; cc-by-4.0 ; https://creativecommons.org/licenses/by/4.0/legalcode
Accession Number:
edsbas.CADACCEE
Database:
BASE

Further Information

This project includes all the Python code required for our reverse prompt engineering experiments across three modalities: text, image, and video. For clarity, the code is organized into three separate folders, one per modality. The implementation follows the two-step inference approach proposed in the paper: Direct Inversion as the first step and Fine-tuning as the second. The project also provides comprehensive datasets and evaluation frameworks for all modalities. The following sections describe each modality folder, the two-step implementation, and the available datasets.

1. Text Prompt Inversion (text_prompt_inversion/):
This folder implements the reverse prompt engineering approach for the text modality, targeting text-to-text models.

Step 1 - Default Direct Inversion: Default_DI_for_text.ipynb implements the first step of the proposed approach, performing direct inversion on text prompts using pre-trained models without additional training. The notebook includes both implementation and evaluation components.

Step 2 - Fine-tuning: The Fine-tuning/ directory contains the implementation of the second step, in which models are fine-tuned with reinforcement learning (RL). Fine-tuning starts from the direct inversion (DI) model as the initial checkpoint; training parameters can be customized through the configuration files in scripts/training/task_configs/.

Environment Setup: A complete environment configuration (txt2txt.yml) and accompanying documentation are provided.

Datasets: The text-modality experiments use two comprehensive datasets:
• Alpaca-GPT4 Dataset: available at Alpaca-GPT4, with a processed version on Hugging Face at cyprivlab/Alpaca-GPT4 (https://huggingface.co/datasets/cyprivlab/Alpaca-GPT4/)
• RetrievalQA Dataset: source available at RetrievalQA, with a processed version on Hugging Face at cyprivlab/GPT4RQA (https://huggingface.co/datasets/cyprivlab/GPT4RQA)

2. Image Prompt Inversion (image_prompt_inversion/):
This folder ...
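The environment setup and processed datasets described above for the text modality could be obtained along these lines. This is a sketch, not the repository's documented procedure: the environment name "txt2txt" is an assumption (use whatever name txt2txt.yml actually declares), and the download commands assume the standard huggingface_hub CLI.

```shell
# Create the conda environment from the provided spec file.
# NOTE: the environment name "txt2txt" is an assumption; use the
# name declared inside txt2txt.yml if it differs.
conda env create -f txt2txt.yml
conda activate txt2txt

# Optionally pre-fetch the processed datasets from Hugging Face
# (requires the huggingface_hub CLI: pip install huggingface_hub).
huggingface-cli download --repo-type dataset cyprivlab/Alpaca-GPT4
huggingface-cli download --repo-type dataset cyprivlab/GPT4RQA
```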
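The record states that the direct-inversion notebook bundles evaluation components but does not name the metric. As an illustration only, a common choice for judging how well a recovered prompt matches the ground truth is token-level F1 overlap; the function below is a hypothetical sketch, not code from the repository.

```python
# Illustrative sketch only: token-level F1 between a recovered prompt
# and the ground-truth prompt. The function name and the metric choice
# are assumptions, not taken from the repository's evaluation code.
from collections import Counter


def token_f1(recovered: str, reference: str) -> float:
    """Token-level F1 between a recovered prompt and the ground truth."""
    rec = recovered.lower().split()
    ref = reference.lower().split()
    if not rec or not ref:
        return 0.0
    # Multiset intersection counts shared tokens with multiplicity.
    overlap = sum((Counter(rec) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(rec)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    ground_truth = "Write a short poem about the ocean"
    recovered = "Write a poem about the ocean"
    print(f"token F1: {token_f1(recovered, ground_truth):.3f}")
```

A score of 1.0 means the recovered prompt matches the reference token-for-token; dropping or adding words lowers precision or recall accordingly.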