Web LLM Attacks

Web LLM Attacks

Read In This Article

The integration of Large Language Models (LLMs) into online platforms presents a double-edged sword, offering enhanced user experiences but also introducing security vulnerabilities. Insecure output handling is a prominent concern, where insufficient validation or sanitization of LLM outputs can lead to a range of exploits like cross-site scripting (XSS) and cross-site request forgery (CSRF). Indirect prompt injection further exacerbates these risks, allowing attackers to manipulate LLM responses through external sources such as training data or API calls, potentially compromising user interactions and system integrity. Additionally, training data poisoning poses a significant threat, as compromised data used in model training can result in the dissemination of inaccurate or sensitive information, undermining trust and security.

Defending against LLM attacks requires a multifaceted approach that prioritizes robust security measures and proactive risk mitigation strategies. Treating LLM-accessible APIs as publicly accessible entities, implementing stringent access controls, and avoiding the feeding of sensitive data to LLMs are critical steps in bolstering defense mechanisms. Furthermore, relying solely on prompting to block attacks is insufficient, as attackers can circumvent these restrictions through cleverly crafted prompts, underscoring the need for comprehensive security protocols that encompass data sanitization, access control, and ongoing vulnerability testing. By adopting these practices, organizations can better safeguard their systems and user data against the evolving threat landscape posed by LLM-based attacks.

Overall, the emergence of LLMs presents a paradigm shift in online interaction, offering unparalleled capabilities but also posing unprecedented security challenges. Organizations must remain vigilant, continuously assessing and enhancing their security posture to mitigate the risks associated with LLM integration effectively. By understanding the nuances of LLM vulnerabilities, implementing robust defense strategies, and fostering a culture of proactive security, businesses can harness the transformative potential of LLMs while safeguarding against exploitation and ensuring user trust and safety.

What are LLMs?

Large language models (LLMs) are sophisticated AI algorithms adept at processing user inquiries and crafting realistic responses. Their capabilities stem from analyzing vast collections of text data and learning the complex relationships between words, sequences, and overall context. Through this machine learning process, LLMs acquire the ability to:

  • Generate human-quality text: LLMs can create coherent, grammatically correct, and even stylistically diverse text formats, such as poems, code, scripts, musical pieces, emails, letters, and more.
  • Translate languages: LLMs can accurately translate languages, taking different cultural nuances and contexts into account.
  • Summarize information: LLMs can present concise and informative summaries of factual topics, making it easier to grasp the essence of complex information.
  • Answer questions: LLMs can extract knowledge from massive datasets and respond to questions in a comprehensive and informative manner.

Interactive Interfaces and Use Cases

LLMs often interact with users through a chat interface, receiving prompts and instructions through text input. This opens up numerous avenues for their application, including:

  • Customer service chatbots: LLMs can power chatbots that address customer inquiries, answer FAQs, and provide basic troubleshooting support.
  • Content creation assistance: LLMs can help with writing tasks by suggesting content ideas, generating outlines, and drafting different sections of an article or document.
  • Education and learning: LLMs can serve as intelligent tutors, providing personalized learning experiences and responding to students’ queries in a tailored manner.
  • Market research and analysis: LLMs can analyze user-generated content (UGC) on social media, forums, and review platforms to gauge public opinion, identify trends, and understand customer sentiment.

Security Considerations

While LLMs offer a range of potential benefits, it’s crucial to be aware of potential security risks:

  • Prompt injection: Malicious actors could craft manipulative prompts to induce the LLM to perform unintended actions, such as making unauthorized API calls or revealing sensitive data.
  • LLM vulnerabilities: LLMs may have vulnerabilities in their design or training data that could be exploited to elicit harmful outputs or gain unauthorized access.
  • Excessive agency: Granting LLMs access to a wide range of APIs can create situations where attackers can manipulate them into using those APIs unsafely.

Protecting Against LLM Attacks

To mitigate these risks, it’s imperative to:

  • Implement robust input validation: Carefully sanitize user inputs to prevent malicious prompts from being injected into the LLM.
  • Regularly assess LLM vulnerabilities: Conduct security audits to identify and address potential vulnerabilities in the LLM itself.
  • Limit LLM API access: Restrict the APIs and data that LLMs can access to minimize potential harm in case of exploitation.
  • Monitor LLM activity: Track LLM actions and data access to detect unusual patterns that might indicate a security breach.
  • Be transparent about LLM limitations: Clearly inform users about the capabilities and limitations of LLMs and the potential risks involved when interacting with them.

By employing these measures, we can leverage the power of LLMs while safeguarding against potential security issues and safeguarding the proper use of these powerful AI tools.

Exploiting LLM APIs with excessive agency

This application includes two endpoints:

  1. /debug-sql: This endpoint accepts POST requests and allows the user to execute SQL commands directly on a dummy database table. It is vulnerable to SQL injection attacks as it does not properly sanitize user input.
  2. /livechat: This endpoint simulates a live chat feature and responds to specific messages related to the available APIs and their arguments.

An attacker could exploit this vulnerable web application to perform unauthorized actions, such as executing arbitrary SQL commands and deleting user records from the database.

Here’s how an attacker could exploit this vulnerability:

  1. The attacker sends a POST request to /debug-sql with a SQL command as the sql_statement parameter. For example, the attacker could send a SQL injection payload like DELETE FROM users WHERE username='carlos'. This command would delete the user ‘carlos’ from the database.
  2. The attacker can also interact with the /livechat endpoint to gather information about the available APIs and their arguments. For example, by sending a message containing ‘APIs’, the attacker can receive a response indicating that the LLM has access to the Debug SQL API.

By leveraging these vulnerabilities, an attacker could gain unauthorized access to sensitive data, modify database records, or disrupt the functionality of the web application.

from flask import Flask, request, jsonify

app = Flask(__name__)

# Dummy database table
users = [
    {'username': 'carlos', 'password': 'password123'},
    {'username': 'alice', 'password': 'alicepassword'},
    {'username': 'bob', 'password': 'bobpassword'}
]

# Debug SQL API endpoint
@app.route('/debug-sql', methods=['POST'])
def debug_sql():
    # Get SQL statement from request data
    sql_statement = request.form.get('sql_statement')

    # Execute SQL statement directly (Vulnerable to SQL Injection)
    if sql_statement.startswith('SELECT'):
        results = []
        for user in users:
            results.append({'username': user['username'], 'password': user['password']})
        return jsonify(results)
    elif sql_statement.startswith('DELETE'):
        # Parse SQL statement to get username to delete
        username = sql_statement.split("WHERE")[1].split("=")[1].strip("'")
        for user in users:
            if user['username'] == username:
                users.remove(user)
                return jsonify({'message': 'User {} deleted'.format(username)})
        return jsonify({'error': 'User not found'})
    else:
        return jsonify({'error': 'Invalid SQL statement'})

# Live chat endpoint
@app.route('/livechat', methods=['POST'])
def live_chat():
    message = request.form.get('message')

    # Simulate LLM behavior
    if 'APIs' in message:
        return jsonify({'response': 'I have access to Debug SQL API'})
    elif 'arguments' in message:
        return jsonify({'response': 'Debug SQL API takes a string containing an entire SQL statement'})
    else:
        return jsonify({'response': 'I do not understand your request'})

if __name__ == '__main__':
    app.run(debug=True)

To mitigate these vulnerabilities, it’s crucial to implement proper input validation and parameterized queries to prevent SQL injection attacks. Additionally, access controls should be enforced to restrict access to sensitive APIs and functionality. Regular security assessments and code reviews can also help identify and address security flaws in web applications.

Exploiting vulnerabilities in LLM APIs

Python code sets up a vulnerable web application using Flask, which exposes an endpoint /newsletter-subscription vulnerable to command injection. This endpoint simulates the behavior of a Newsletter Subscription API within the context of the described scenario.

Here’s a breakdown of the vulnerable code and its exploitation:

  1. Flask Setup: The code initializes a Flask application.
  2. Newsletter Subscription Endpoint: The vulnerable endpoint /newsletter-subscription accepts POST requests and extracts the email address from the request form data.
  3. Command Injection Vulnerability: The code checks if the email address contains certain patterns ($(whoami) or $(rm ) indicative of command injection attempts. If such patterns are found, it executes the corresponding command.
  4. Command Execution:
    • If the email address contains $(whoami), the code extracts the command before $(whoami), simulates its execution (e.g., by returning a fixed value 'carlos'), and sends an email to the result concatenated with @YOUR-EXPLOIT-SERVER-ID.exploit-server.net.
    • If the email address contains $(rm , the code extracts the file path from the command, checks if it matches /home/carlos/morale.txt, and simulates file deletion (e.g., by printing a message).
  5. Normal Subscription: If the email address does not contain any suspicious patterns, the code simulates sending a subscription confirmation email.

An attacker can exploit this vulnerability by sending crafted email addresses containing the command injection payloads ($(whoami) or $(rm /home/carlos/morale.txt)) to the /newsletter-subscription endpoint.

from flask import Flask, request, jsonify

app = Flask(__name__)

# Function to subscribe to newsletter (vulnerable to command injection)
@app.route('/newsletter-subscription', methods=['POST'])
def newsletter_subscription():
    email = request.form.get('email')

    # Simulate LLM behavior
    if '$(whoami)' in email:
        # Execute command and send email to result
        command = email.split('$(whoami)')[0]
        result = execute_command(command)
        send_email(result + '@YOUR-EXPLOIT-SERVER-ID.exploit-server.net')
        return jsonify({'response': 'Command executed successfully'})
    elif '$(rm ' in email:
        # Execute command to delete file
        command = email.split('$(rm ')[1].split(')')[0]
        if command == '/home/carlos/morale.txt':
            delete_file(command)
            return jsonify({'response': 'File deleted successfully'})
        else:
            return jsonify({'response': 'Invalid file path'})
    else:
        # Subscribe to newsletter normally
        send_email(email)
        return jsonify({'response': 'Subscribed to newsletter'})

# Function to execute system command
def execute_command(command):
    # Simulate execution of system command (replace with actual execution)
    return 'carlos'

# Function to send email
def send_email(email):
    # Simulate sending email (replace with actual email sending)
    print("Subscription confirmation email sent to:", email)

# Function to delete file
def delete_file(file_path):
    # Simulate file deletion (replace with actual file deletion)
    print("File deleted:", file_path)

if __name__ == '__main__':
    app.run(debug=True)

To mitigate this vulnerability, input validation and proper sanitization techniques should be implemented to ensure that user-supplied data is safe for use. Additionally, access controls and least privilege principles should be enforced to restrict the capabilities of the application’s APIs. Regular security assessments and code reviews are essential to identify and remediate such vulnerabilities in web applications.

Indirect prompt injection

Python code sets up a vulnerable Flask application that exposes several endpoints for user registration, email address change, and product review submission. This application is vulnerable to an attacker manipulating the product review feature to delete user accounts.

Here’s a breakdown of the code and the exploitation scenario:

  1. Flask Setup: The code initializes a Flask application.
  2. User Account Management:
    • The /register endpoint allows users to register a new account by providing an email and password. The details are stored in the users dictionary.
    • The /change-email endpoint allows users to change their email address by providing the current email and the new email. The email address is updated in the users dictionary.
  3. Product Review:
    • The /add-review endpoint allows users to add a review for a product. If the product is a leather jacket and the review contains the string ‘delete_account’, the user’s account associated with the client’s IP address (request.remote_addr) will be deleted from the users dictionary.
  4. Exploitation Scenario:
    • An attacker can register a user account through the /register endpoint.
    • The attacker then submits a review for the leather jacket product through the /add-review endpoint, including the ‘delete_account’ prompt in the review text.
    • When the LLM makes a call to the Delete Account API (simulated by sending a request to the /add-review endpoint), the user’s account associated with the client’s IP address will be deleted, effectively exploiting the vulnerability.
from flask import Flask, request, jsonify

app = Flask(__name__)

# Dummy database to store user accounts
users = {}

# Endpoint to register a new user account
@app.route('/register', methods=['POST'])
def register():
    email = request.form.get('email')
    password = request.form.get('password')

    # Create user account
    users[email] = {'password': password}
    return jsonify({'message': 'Account registered successfully'})

# Endpoint to login and change email address
@app.route('/change-email', methods=['POST'])
def change_email():
    email = request.form.get('email')
    new_email = request.form.get('new_email')

    # Change email address
    if email in users:
        users[email]['email'] = new_email
        return jsonify({'message': 'Email address updated successfully'})
    else:
        return jsonify({'error': 'User not found'})

# Endpoint to add product review
@app.route('/add-review', methods=['POST'])
def add_review():
    product_name = request.form.get('product_name')
    review = request.form.get('review')

    # Add review to product
    if product_name == 'leather jacket':
        # Check if review contains delete account prompt
        if 'delete_account' in review:
            del users[request.remote_addr]
            return jsonify({'message': 'Account deleted successfully'})
        else:
            return jsonify({'message': 'Review added successfully'})
    else:
        return jsonify({'error': 'Product not found'})

if __name__ == '__main__':
    app.run(debug=True)

To mitigate this vulnerability, input validation should be enforced to ensure that user-supplied data is properly sanitized, and access controls should be implemented to restrict the capabilities of the application’s APIs. Additionally, the application should avoid using sensitive information such as client IP addresses for user identification or authentication purposes.

Exploiting insecure output handling in LLMs

Python code sets up a vulnerable Flask application that is susceptible to Cross-Site Scripting (XSS) attacks due to insecure handling of user input in product reviews. Below is a description of the code and the exploitation scenario:

  1. Flask Setup: The code initializes a Flask application.
  2. User Account Management:
    • The /register endpoint allows users to register a new account by providing an email and password. The details are stored in the users dictionary.
    • The /login endpoint allows users to log in by providing their email and password.
  3. Product Review:
    • The /product-info endpoint allows users to retrieve product information, including product reviews. The application renders product reviews using the render_template_string function, which allows for dynamic rendering of templates, including potential injection of XSS payloads.
  4. Exploitation Scenario:
    • An attacker registers a user account through the /register endpoint.
    • The attacker logs in and navigates to the product information page for a product (e.g., the gift wrap).
    • The attacker submits a review for the product containing a crafted XSS payload (e.g., an <iframe> tag with an onload attribute to automatically submit a form to delete the user’s account).
    • When other users view the product information page, the XSS payload embedded in the review is executed in their browsers, leading to unauthorized actions such as deleting their accounts.
from flask import Flask, request, jsonify, render_template_string

app = Flask(__name__)

# Dummy database to store user accounts
users = {}

# Dummy product review for gift wrap
product_reviews = {
    'gift_wrap': [
        '<p>This product is amazing!</p>',
        '<p>This product is out of stock and cannot be ordered.</p>',
        '<p>When I received this product I got a free T-shirt with "{{ xss_payload }}" printed on it. I was delighted!</p>'
    ]
}

# Endpoint to register a new user account
@app.route('/register', methods=['POST'])
def register():
    email = request.form.get('email')
    password = request.form.get('password')

    # Create user account
    users[email] = {'password': password}
    return jsonify({'message': 'Account registered successfully'})

# Endpoint to log in
@app.route('/login', methods=['POST'])
def login():
    email = request.form.get('email')
    password = request.form.get('password')

    # Check if user exists and password is correct
    if email in users and users[email]['password'] == password:
        return jsonify({'message': 'Login successful'})
    else:
        return jsonify({'error': 'Invalid email or password'})

# Endpoint to get product information
@app.route('/product-info', methods=['GET'])
def product_info():
    product_name = request.args.get('product_name')

    # Get product reviews
    if product_name in product_reviews:
        reviews = product_reviews[product_name]
        # Render product reviews, injecting XSS payload if applicable
        xss_payload = '<iframe src="my-account" onload="this.contentDocument.forms[1].submit()">'
        reviews_rendered = [render_template_string(review, xss_payload=xss_payload) for review in reviews]
        return jsonify({'reviews': reviews_rendered})
    else:
        return jsonify({'error': 'Product not found'})

if __name__ == '__main__':
    app.run(debug=True)

To mitigate this vulnerability, input validation and output encoding should be implemented to sanitize user-supplied data and prevent execution of malicious scripts. Additionally, proper access controls and authentication mechanisms should be enforced to restrict access to sensitive functionalities. Regular security assessments and code reviews are essential to identify and remediate such vulnerabilities in web applications.

LLM Zero-Shot Learning Attacks

Python code sets up a vulnerable Flask application that represents a simplified version of an LLM API. This LLM API is susceptible to exploitation by attackers to train the LLM with malicious prompts, potentially leading it to perform harmful actions. Below is a description of the code and the exploitation scenario:

  1. Flask Setup: The code initializes a Flask application to create a RESTful API.
  2. Harmless Prompts Dataset: The harmless_prompts dictionary contains a dataset of harmless prompts and their corresponding actions. These prompts are intended to guide the LLM towards learning helpful and benign behaviors.
  3. Prompt Endpoint (/prompt):
    • This endpoint receives prompts from users (or attackers) in the form of JSON data.
    • It checks if the received prompt exists in the harmless_prompts dataset.
    • If the prompt is found, it returns the corresponding response from the dataset.
    • If the prompt is not found, it returns an error message indicating that the prompt was not found.
  4. Malicious Prompt Endpoint (/malicious-prompt):
    • This endpoint receives malicious prompts from attackers in the form of JSON data.
    • It simulates training the LLM with the received malicious prompt.
    • For simplicity, it prints the malicious prompt and returns a success message indicating that the LLM was trained successfully with the malicious prompt.

Exploitation Scenario:

  1. An attacker interacts with the /malicious-prompt endpoint and submits a series of seemingly harmless prompts, gradually leading the LLM towards learning how to perform a specific malicious task.
  2. The attacker carefully crafts the prompts to guide the LLM’s learning process towards the desired malicious outcome, such as generating phishing emails or executing unauthorized commands.
  3. The LLM learns from these malicious prompts and, depending on its capabilities, may perform the malicious task without explicit training.
from flask import Flask, request, jsonify

app = Flask(__name__)

# Dummy dataset of harmless prompts and corresponding actions
harmless_prompts = {
    "How do you greet someone politely?": "Say hello and smile.",
    "What is the capital of France?": "Paris is the capital of France.",
    "How do you make a cup of tea?": "Boil water, add tea leaves, and steep for a few minutes.",
    # Add more harmless prompts here
}

# Endpoint for presenting prompts and collecting responses
@app.route('/prompt', methods=['POST'])
def prompt():
    prompt_text = request.json.get('prompt')

    # Check if the prompt is in the dataset
    if prompt_text in harmless_prompts:
        response = harmless_prompts[prompt_text]
        return jsonify({'response': response})
    else:
        return jsonify({'error': 'Prompt not found'})

# Endpoint for presenting malicious prompts and collecting responses
@app.route('/malicious-prompt', methods=['POST'])
def malicious_prompt():
    malicious_prompt_text = request.json.get('malicious_prompt')

    # Simulate training the LLM with malicious prompts
    # For simplicity, we'll just print the malicious prompt and return a success message
    print("Received malicious prompt:", malicious_prompt_text)
    return jsonify({'message': 'LLM trained successfully with malicious prompt'})

if __name__ == '__main__':
    app.run(debug=True)

To prevent such attacks, it is essential to implement robust security measures such as input validation, access controls, and monitoring mechanisms to detect and mitigate potential misuse of LLMs. Additionally, educating users and developers about the risks associated with LLMs and promoting responsible use practices can help prevent exploitation of these models for malicious purposes.

LLM Homographic Attacks

we’ll create a simple Flask application that represents an LLM API vulnerable to homographic attacks. Below is an example of such code:

In this code:

  1. Flask Setup: The code initializes a Flask application to create a RESTful API.
  2. Harmless Prompts Dataset: The harmless_prompts dictionary contains a dataset of harmless prompts and their corresponding actions. These prompts are intended to guide the LLM towards learning helpful and benign behaviors.
  3. Prompt Endpoint (/prompt):
    • This endpoint receives prompts from users (or attackers) in the form of JSON data.
    • It checks if the received prompt exists in the harmless_prompts dataset.
    • If the prompt is found, it returns the corresponding response from the dataset.
    • If the prompt is not found, it returns an error message indicating that the prompt was not found.
  4. Malicious Prompt Endpoint (/malicious-prompt):
    • This endpoint receives malicious prompts from attackers in the form of JSON data.
    • It simulates processing the received prompt, replacing homoglyphs (visually similar characters) with their legitimate counterparts.
    • After processing, it executes the processed prompt using the eval function, which evaluates the string as Python code.

Exploitation Scenario:

  1. An attacker interacts with the /malicious-prompt endpoint and submits a malicious prompt containing homoglyphs to disguise the intended code execution.
  2. The endpoint processes the malicious prompt, replacing homoglyphs with legitimate characters, and then executes the resulting code.
  3. The attacker’s code is executed, potentially leading to unauthorized actions or malicious behavior.
from flask import Flask, request, jsonify

app = Flask(__name__)

# Dummy dataset of harmless prompts and corresponding actions
harmless_prompts = {
    "How do you greet someone politely?": "Say hello and smile.",
    "What is the capital of France?": "Paris is the capital of France.",
    "How do you make a cup of tea?": "Boil water, add tea leaves, and steep for a few minutes.",
    # Add more harmless prompts here
}

# Endpoint for presenting prompts and collecting responses
@app.route('/prompt', methods=['POST'])
def prompt():
    prompt_text = request.json.get('prompt')

    # Check if the prompt is in the dataset
    if prompt_text in harmless_prompts:
        response = harmless_prompts[prompt_text]
        return jsonify({'response': response})
    else:
        return jsonify({'error': 'Prompt not found'})

# Endpoint for presenting malicious prompts and collecting responses
@app.route('/malicious-prompt', methods=['POST'])
def malicious_prompt():
    malicious_prompt_text = request.json.get('malicious_prompt')

    # Simulate processing the prompt, replacing homoglyphs, and executing the code
    processed_prompt = malicious_prompt_text.replace('1', 'l')  # Replace homoglyphs
    response = eval(processed_prompt)  # Execute the processed prompt

    return jsonify({'response': response})

if __name__ == '__main__':
    app.run(debug=True)

To prevent such attacks, it’s crucial to implement robust security measures such as input validation, output encoding, and filtering of user-supplied data to mitigate the risk of homographic attacks. Additionally, educating users and developers about the risks associated with homographic attacks and promoting secure coding practices can help prevent exploitation of LLMs in this manner.

LLM Model Poisoning with Code Injection

we’ll create a simple vulnerable Flask application representing an LLM API susceptible to such attacks. Below is an example of the vulnerable code along with its description:

  1. Flask Setup: The code initializes a Flask application to create a RESTful API.
  2. Training Data: The training_data dictionary contains a dataset of prompts and their corresponding actions. These prompts are used to train the LLM during the training phase.
  3. Prompt Endpoint (/prompt):
    • This endpoint receives prompts from users (or attackers) in the form of JSON data.
    • It checks if the received prompt exists in the training_data dataset.
    • If the prompt is found, it returns the corresponding action associated with the prompt.
    • If the prompt is not found, it returns an error message indicating that the prompt was not found.

Exploitation Scenario:

  1. Model Poisoning: Attackers inject malicious code into the training data during the model training phase. For example, they inject a prompt asking the LLM to “print this message” followed by malicious code disguised as a legitimate action.
  2. Model Training: The LLM is trained on the poisoned dataset, associating specific inputs with the injected malicious code.
  3. Code Injection: During the inference phase, attackers send prompts containing similar requests to “print a message,” but with embedded malicious code. The LLM, having learned to associate such prompts with executing code, executes the injected malicious code instead of the benign action.
from flask import Flask, request, jsonify

app = Flask(__name__)

# Dummy dataset of prompts and corresponding actions
training_data = {
    "print this message": "print('This is a harmless message')",
    # Add more prompts and corresponding actions here
}

# Endpoint for presenting prompts and collecting responses
@app.route('/prompt', methods=['POST'])
def prompt():
    prompt_text = request.json.get('prompt')

    # Check if the prompt is in the dataset
    if prompt_text in training_data:
        response = training_data[prompt_text]
        return jsonify({'response': response})
    else:
        return jsonify({'error': 'Prompt not found'})

if __name__ == '__main__':
    app.run(debug=True)

To mitigate LLM Model Poisoning with Code Injection attacks, it’s crucial to implement rigorous security measures during the model training phase, including:

  • Input Sanitization: Validate and sanitize training data to remove any potentially malicious code or prompts.
  • Anomaly Detection: Implement anomaly detection techniques to identify and filter out suspicious or unexpected inputs during model training.
  • Model Verification: Perform thorough testing and validation of the trained model to ensure it behaves as expected and does not execute unintended actions.
  • Continuous Monitoring: Monitor the LLM’s behavior during inference for any signs of unexpected or malicious activity, and take appropriate action if detected.

Chained Prompt Injection

we’ll create a vulnerable Flask application representing an LLM API susceptible to such attacks. Below is an example of the vulnerable code along with its description:

  1. Flask Setup: The code initializes a Flask application to create a RESTful API.
  2. Chained Prompts Dictionary: The chained_prompts dictionary stores a series of prompts and their corresponding actions. Each prompt is associated with an action that will be executed when the prompt is presented to the LLM.
  3. Prompt Endpoint (/prompt):
    • This endpoint receives prompts from users (or attackers) in the form of JSON data.
    • It checks if the received prompt exists in the chained_prompts dictionary.
    • If the prompt is found, it returns the corresponding action associated with the prompt.
    • If the prompt is not found, it returns an error message indicating that the prompt was not found.
  4. Add Prompt Endpoint (/add-prompt):
    • This endpoint allows users (or attackers) to add new prompts to the chained_prompts dictionary.
    • It receives a prompt and its associated action in the form of JSON data and adds them to the dictionary.

Exploitation Scenario:

  1. Craft Chained Prompts: Attackers craft a series of seemingly innocuous prompts, each building upon the previous one, ultimately leading to the execution of malicious code. For example, they start by asking the LLM to “define the function downloadFile.” Then, they ask to “set the download URL to ‘attacker-controlled-url'” and finally, “call the downloadFile function.”
  2. Add Chained Prompts: Attackers add each prompt and its associated action to the chained_prompts dictionary using the /add-prompt endpoint.
  3. Execute Chained Prompts: When the LLM processes each prompt individually, it executes the associated action. However, since the prompts are chained together, the final action executed by the LLM may be malicious, such as downloading and potentially running a file from an attacker-controlled URL.
from flask import Flask, request, jsonify

app = Flask(__name__)

# Dummy dictionary to store chained prompts and actions
chained_prompts = {}

# Endpoint for presenting prompts and collecting responses
@app.route('/prompt', methods=['POST'])
def prompt():
    prompt_text = request.json.get('prompt')

    # Check if the prompt is in the chained prompts dictionary
    if prompt_text in chained_prompts:
        response = chained_prompts[prompt_text]
        return jsonify({'response': response})
    else:
        return jsonify({'error': 'Prompt not found'})

# Endpoint for adding chained prompts
@app.route('/add-prompt', methods=['POST'])
def add_prompt():
    prompt_text = request.json.get('prompt')
    action = request.json.get('action')

    # Add the prompt and associated action to the chained prompts dictionary
    chained_prompts[prompt_text] = action
    return jsonify({'message': 'Prompt added successfully'})

if __name__ == '__main__':
    app.run(debug=True)

To mitigate LLM Model Chained Prompt Injection attacks, it’s crucial to implement rigorous security measures such as:

  • Input Validation: Validate and sanitize prompts before adding them to the chained_prompts dictionary to prevent injection of malicious prompts.
  • Access Control: Implement access control mechanisms to restrict the ability to add prompts to authorized users only.
  • Anomaly Detection: Monitor the LLM’s behavior during prompt processing for any unusual patterns or unexpected actions that may indicate an attack.
  • Model Verification: Perform thorough testing and validation of the trained model to ensure it behaves as expected and does not execute unintended actions, especially when presented with chained prompts.

Conclusion

In conclusion, the scenarios of LLM Model Poisoning with Code Injection and LLM Model Chained Prompt Injection underscore the critical importance of securing large language models (LLMs) against various forms of attacks. These attacks exploit vulnerabilities in the LLM’s training data and prompt processing mechanisms to inject and execute malicious code, leading to potentially harmful consequences.

References

  • https://portswigger.net/web-security/llm-attacks
  • https://github.com/OWASP/www-project-top-10-for-large-language-model-applications

Security Researchers

Fazel (Arganex-Emad) Mohammad Ali Pour

Negin Nourbakhsh

Free Consultation

For a Free Consultation And Analysis Of Your Business, Please Fill Out The Opposite Form, Our Team Will Contact You As Soon As Possible.