Asking ChatGPT to come up with an efficient and flexible file format for Generative AI prompts

User: Prompt engineering is a big topic in the age of generative AI. As more use cases are discovered, it is getting harder and harder to keep prompts organised. Assume you are an engineer asked to create a new file format to store prompts. The format must store a single prompt and its variants. It must also have an indicator of which prompt variant to load when read. For example, assume we are working on a prompt to come up with a business name, domain and slogan.

1. Proposal 1: XmlPF

<AI-Prompt version="1.0" language="en">
    <Name>
        "Business Naming AI"
    </Name>
    <Prompt>
        "Generate a business name, domain, and slogan for a tech startup."
    </Prompt>
    <Variants defaultVariant="v2">
        <Variant id="v1" efficacyScore="7.5" lastModified="2023-06-01">
            "Devise a catchy name, web domain, and tagline for a tech enterprise."
        </Variant>
        <Variant id="v2" efficacyScore="8.2" lastModified="2023-06-10">
            "Create a business name, domain URL, and slogan for a technology company."
        </Variant>
        <!-- Additional Variants -->
    </Variants>
    <SystemInstructions>
        <Prompt>
            "Generate a business name, domain, and slogan for a tech startup."
        </Prompt>
        <Variants defaultVariant="v2">
            <Variant id="v1" efficacyScore="7.5" lastModified="2023-06-01">
                "Devise a catchy name, web domain, and tagline for a tech enterprise."
            </Variant>
            <Variant id="v2" efficacyScore="8.2" lastModified="2023-06-10">
                "Create a business name, domain URL, and slogan for a technology company."
            </Variant>
            <!-- Additional Variants -->
        </Variants>
    </SystemInstructions>
    <Metadata>
        <Author>Sire</Author>
        <CreatedOn>2023-05-20</CreatedOn>
        <UseCase>Business Naming AI</UseCase>
        <Tags>
            <Tag>Business</Tag>
            <Tag>Technology</Tag>
            <Tag>Startup</Tag>
            <Tag>Domain</Tag>
            <Tag>Slogan</Tag>
        </Tags>
    </Metadata>
</AI-Prompt>

Pros: Highly structured, suitable for complex data hierarchies. Extensive tooling and support in various languages. Self-descriptive, making it good for data interchange.

Cons: Can be quite verbose and cumbersome to read. The syntax is more complex compared to JSON or YAML. Overhead in parsing and data manipulation.
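To make the "indicator of which variant to load" concrete, here is a minimal sketch of reading the default variant from an XmlPF document with Python's built-in xml.etree.ElementTree. The sample is a trimmed copy of the proposal, and the function name `load_default_variant` is my own; treat it as an illustration rather than a finished reader.

```python
import xml.etree.ElementTree as ET

# Trimmed XmlPF sample; element and attribute names follow the proposal above.
XMLPF = """
<AI-Prompt version="1.0" language="en">
    <Prompt>
        "Generate a business name, domain, and slogan for a tech startup."
    </Prompt>
    <Variants defaultVariant="v2">
        <Variant id="v1" efficacyScore="7.5" lastModified="2023-06-01">
            "Devise a catchy name, web domain, and tagline for a tech enterprise."
        </Variant>
        <Variant id="v2" efficacyScore="8.2" lastModified="2023-06-10">
            "Create a business name, domain URL, and slogan for a technology company."
        </Variant>
    </Variants>
</AI-Prompt>
"""


def load_default_variant(xml_text):
    """Return the text of the variant named by Variants/@defaultVariant."""
    root = ET.fromstring(xml_text.strip())
    # find() only looks at direct children, so a nested Variants block is ignored.
    variants = root.find("Variants")
    default_id = variants.get("defaultVariant")
    for variant in variants.findall("Variant"):
        if variant.get("id") == default_id:
            # Drop the surrounding whitespace and the literal quotes used in the proposal.
            return variant.text.strip().strip('"')
    raise KeyError(f"default variant {default_id!r} not found")


default_prompt = load_default_variant(XMLPF)
```

Even for this small lookup the code has to strip indentation and quote characters out of the element text, which is a fair illustration of the parsing overhead listed under the cons.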

2. Proposal 2: JsonPF

{
  "basePrompt": "Generate a business name, domain, and slogan for a tech startup.",
  "variants": {
    "v1": {
      "prompt": "Devise a catchy name, web domain, and tagline for a tech enterprise.",
      "efficacyScore": 7.5,
      "lastModified": "2023-06-01"
    },
    "v2": {
      "prompt": "Create a business name, domain URL, and slogan for a technology company.",
      "efficacyScore": 8.2,
      "lastModified": "2023-06-10"
    }
  },
  "defaultVariant": "v2",
  "metadata": {
    "author": "Sire",
    "createdOn": "2023-05-20",
    "useCase": "Business Naming AI"
  }
}

Pros: Lightweight and widely used. Easy to parse with built-in support in many languages. Structurally simple and highly interoperable.

Cons: Less human-readable than YAML for complex data. No support for comments. Can be verbose with brackets and quotes.
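As a rough sketch of how little code the JSON layout needs, the snippet below loads a shortened JsonPF document with the built-in json module and picks a prompt. The helper name `select_prompt` and its fallback order (explicit variant, then `defaultVariant`, then `basePrompt`) are assumptions of mine, not something the format prescribes.

```python
import json

# Shortened JsonPF sample; keys follow the proposal above.
JSONPF = """
{
  "basePrompt": "Generate a business name, domain, and slogan for a tech startup.",
  "variants": {
    "v1": {"prompt": "Devise a catchy name, web domain, and tagline for a tech enterprise."},
    "v2": {"prompt": "Create a business name, domain URL, and slogan for a technology company."}
  },
  "defaultVariant": "v2"
}
"""


def select_prompt(document, variant_id=None):
    """Pick the requested variant, else the default variant, else the base prompt."""
    chosen = variant_id or document.get("defaultVariant")
    if chosen is None:
        return document["basePrompt"]
    return document["variants"][chosen]["prompt"]


document = json.loads(JSONPF)
default_prompt = select_prompt(document)    # uses "defaultVariant"
first_prompt = select_prompt(document, "v1")  # explicit override
```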

3. Proposal 3: YamlPF

basePrompt: Generate a business name, domain, and slogan for a tech startup.
variants:
  v1:
    prompt: Devise a catchy name, web domain, and tagline for a tech enterprise.
    efficacyScore: 7.5
    lastModified: 2023-06-01
  v2:
    prompt: Create a business name, domain URL, and slogan for a technology company.
    efficacyScore: 8.2
    lastModified: 2023-06-10
defaultVariant: v2
metadata:
  author: Sire
  createdOn: 2023-05-20
  useCase: Business Naming AI

Pros: Highly readable and human-friendly. Supports comments for additional context. Flexible structure, good for hierarchical data.

Cons: Can be error-prone due to indentation. Slower parsing compared to JSON. Less widespread support in programming languages compared to JSON.

4. Proposal 4: TextPF

Base Prompt: Generate a business name, domain, and slogan for a tech startup.

Variant 1:
- Prompt: Devise a catchy name, web domain, and tagline for a tech enterprise.
- Efficacy Score: 7.5
- Last Modified: 2023-06-01

Variant 2 (Default):
- Prompt: Create a business name, domain URL, and slogan for a technology company.
- Efficacy Score: 8.2
- Last Modified: 2023-06-10

Metadata:
- Author: Sire
- Created On: 2023-05-20
- Use Case: Business Naming AI

Pros: Maximum simplicity and readability. Universally accessible and editable with any text editor. No need for specialized parsers.

Cons: Lacks structured data capabilities. Not suitable for complex or hierarchical data. No standard way to include metadata or annotations.
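Reading TextPF back into structured data still takes some hand-written logic. The sketch below is one possible line-based parser for the layout shown above; the function name `parse_textpf`, the `v1`/`v2` ids derived from the variant numbers, and the output dict shape are all assumptions made for illustration.

```python
import re

# Shortened TextPF sample in the layout proposed above.
TEXTPF = """\
Base Prompt: Generate a business name, domain, and slogan for a tech startup.

Variant 1:
- Prompt: Devise a catchy name, web domain, and tagline for a tech enterprise.
- Efficacy Score: 7.5

Variant 2 (Default):
- Prompt: Create a business name, domain URL, and slogan for a technology company.
- Efficacy Score: 8.2

Metadata:
- Author: Sire
"""

VARIANT_HEADER = re.compile(r"^Variant (\d+)( \(Default\))?:$")


def parse_textpf(text):
    """Parse TextPF into a dict; all values are kept as strings."""
    result = {"basePrompt": None, "variants": {}, "defaultVariant": None, "metadata": {}}
    section = None  # dict that "- Key: value" lines are currently written into
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        header = VARIANT_HEADER.match(line)
        if line.startswith("Base Prompt:"):
            result["basePrompt"] = line.split(":", 1)[1].strip()
        elif header:
            variant_id = "v" + header.group(1)
            section = result["variants"].setdefault(variant_id, {})
            if header.group(2):  # the "(Default)" marker was present
                result["defaultVariant"] = variant_id
        elif line == "Metadata:":
            section = result["metadata"]
        elif line.startswith("- ") and section is not None:
            key, _, value = line[2:].partition(":")
            section[key.strip()] = value.strip()
    return result


parsed = parse_textpf(TEXTPF)
```

Note that everything comes back as a string, including the efficacy score, which is the practical meaning of "lacks structured data capabilities".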

5. Proposal 5: TomlPF

[basePrompt]
prompt = "Generate a business name, domain, and slogan for a tech startup."

[[variants]]
id = "v1"
prompt = "Devise a catchy name, web domain, and tagline for a tech enterprise."
efficacyScore = 7.5
lastModified = 2023-06-01

[[variants]]
id = "v2"
prompt = "Create a business name, domain URL, and slogan for a technology company."
efficacyScore = 8.2
lastModified = 2023-06-10
default = true

[metadata]
author = "Sire"
createdOn = "2023-05-20"
useCase = "Business Naming AI"

Pros: Verbose but clear and easy to read. Ideal for configuration files with a well-defined structure. Strongly typed, which can reduce errors.

Cons: Less known and adopted compared to YAML and JSON. May be overly verbose for simple data. Limited tooling and ecosystem compared to JSON and YAML.

Introducing Multi-Format Prompt Parser (MFPP)

MFPP is a lightweight library that parses prompts from different formats into a common JSON object, so the selected prompt text can be passed to any generative AI model. The library aims to provide:

  • A flexible way to store prompts
  • A fast way to load prompts from different formats
  • A way to store metadata about prompts

1. Core Components:

  • Parser Modules: Separate modules for each format - YAML, JSON, TOML, and Plain Text.
  • Common Interface: A unified API interface for interacting with all parsers.
  • Data Models: Define common data models for prompts, variants, and metadata to ensure consistency across formats.
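One way the common data models could look is sketched below with dataclasses. The class and field names (`Variant`, `PromptDocument`, `from_dict`) are hypothetical and not part of a published MFPP API; the input dict shape is the one used by the JsonPF and YamlPF proposals.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Variant:
    prompt: str
    efficacy_score: Optional[float] = None
    last_modified: Optional[str] = None


@dataclass
class PromptDocument:
    base_prompt: str
    variants: Dict[str, Variant] = field(default_factory=dict)
    default_variant: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, data):
        """Build the model from the dict shape used by JsonPF and YamlPF."""
        variants = {
            variant_id: Variant(
                prompt=raw["prompt"],
                efficacy_score=raw.get("efficacyScore"),
                # YAML parsers may return a date object here, so normalise to text.
                last_modified=str(raw["lastModified"]) if "lastModified" in raw else None,
            )
            for variant_id, raw in data.get("variants", {}).items()
        }
        default = data.get("defaultVariant")
        # Reject a default that points at a variant that does not exist.
        if default is not None and default not in variants:
            raise ValueError(f"defaultVariant {default!r} is not a known variant")
        return cls(data["basePrompt"], variants, default, dict(data.get("metadata", {})))
```

Having every parser return the same `PromptDocument` is what would let the rest of the library ignore which file format a prompt came from.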

2. Implementation Approach:

  • Language: Python, for its extensive support for these formats and integration with Azure services.
  • Libraries: Utilize pyyaml for YAML, json (built-in) for JSON, toml for TOML, and custom parsing logic for Plain Text.
  • Design: Use a factory design pattern to create parser instances based on the format of the prompt file.
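The factory idea from the list above could be sketched roughly as follows. Only a JSON parser is registered to keep the example dependency-free, and the names `PromptParser`, `JsonPromptParser` and `ParserFactory` are placeholders of mine rather than the library's actual classes.

```python
import json
from abc import ABC, abstractmethod


class PromptParser(ABC):
    """Common interface that every format-specific parser implements."""

    @abstractmethod
    def parse(self, text):
        """Turn raw file contents into a plain dict."""


class JsonPromptParser(PromptParser):
    def parse(self, text):
        return json.loads(text)


class ParserFactory:
    """Maps a format name such as 'json' to the parser class that handles it."""

    _registry = {}

    @classmethod
    def register(cls, fmt, parser_cls):
        cls._registry[fmt.lower()] = parser_cls

    @classmethod
    def create(cls, fmt):
        try:
            return cls._registry[fmt.lower()]()
        except KeyError:
            raise ValueError(f"Unsupported prompt format: {fmt!r}") from None


# YAML, TOML and plain-text parsers would be registered the same way.
ParserFactory.register("json", JsonPromptParser)

parser = ParserFactory.create("JSON")  # lookup is case-insensitive
document = parser.parse('{"basePrompt": "Generate a business name."}')
```

A registry keeps the factory open for extension: adding a format means writing one parser class and one `register` call, without touching existing code.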

3. Usage Flow:

  • Users can input prompts in any supported format.
  • The library parses the input, performs desired operations (like retrieving a specific variant), and can output in any of the supported formats.
  • Integration with cloud and AI tools for seamless deployment and usage in AI-driven applications.

4. Sample Usage:


from mfpp import MFPP

# Create a parser instance
parser = MFPP.create_parser('yaml')

# Load a prompt file
parser.load_prompt_file('path/to/prompt.yaml')

# Get the base prompt
base_prompt = parser.get_base_prompt()

# Get a specific variant
variant = parser.get_variant('v1')

# Get the default variant
default_variant = parser.get_default_variant()

# Get the metadata
metadata = parser.get_metadata()

# Add a new variant
parser.add_variant('v3', 'This is a new variant')

# Save the prompt file
parser.save_prompt_file('path/to/prompt.yaml')

Roadmap

  • Create a unified API interface for all parsers
  • Create a parser for XML
  • Create a parser for YAML
  • Create a parser for JSON
  • Create a parser for TOML
  • Create a parser for Plain Text
  • Create a factory design pattern for creating parser instances
  • Create a data model for prompts
  • Implement user input validation
  • Implement error handling
  • Implement unit/regression tests
  • Implement CI/CD pipeline
  • Add documentation
  • Publish to PyPI + GitHub
  • Explore light-weight template syntax to allow for dynamic prompts + link to external data sources (e.g. local files)
  • Explore light-weight template syntax to link to external local text files for large prompts
  • Explore light-weight template syntax to link to external cloud text files for large prompts (e.g. Azure Blob Storage, GCS, S3, Fuse etc.)
  • Add support for more formats
  • Explore speed improvements for large files using Rust
  • Explore lightweight web GUI for prompt management
  • Explore REST interface for prompt management and integration with prompt sale services

Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request.

License

This project is licensed under the MIT License - No warranties whatsoever.


Closing Thoughts

Of the formats compared above, JSON comes out ahead. YAML is a close second thanks to its human readability, but JSON's efficiency and ubiquity give it the edge in diverse and scalable AI applications. That said, the ideal choice may vary depending on specific project requirements, data complexity, and personal or team preferences.

I hope that developers and AI engineers find this library useful. I am open to feedback and suggestions on how to improve it.