AI Writing Style - Prompt Experiment

August 16, 2023
Michael Taylor

“It’s dull and uncreative” - this is the usual criticism when I tell people AI is great at writing content for SEO. And they’re right. If you don’t know Prompt Engineering, the chances are ChatGPT is just regurgitating text approximating the average of the internet, with all of the naughty bits RLHF’d out (Reinforcement Learning From Human Feedback: a method of fine-tuning OpenAI uses to make ChatGPT behave itself).

The default writing style you get back from ChatGPT isn’t all it’s capable of. It can write in Shakespearean prose just as easily as it can simulate a Donald Trump speech or generate a poem in Chaucer’s Middle English. The model can replicate any writing style you desire; you just have to know how to ask. With well-known writing styles that’s easy, because the model saw plenty of writing samples and endless commentary in its training data.

However, when you start using AI to write content for work, chances are Chaucer won’t cut it for your B2B SaaS blog. More likely you want to replicate your own writing style, or match a style guide or a selection of samples from the publication you’re writing for. If you weren’t a famous writer before ChatGPT’s 2021 training data cutoff, chances are it won’t know how to replicate your style. However, because it has seen every possible writing style on the internet, it can do a great job of approximating your desired style given a few samples. The method I’ve written about in the past is ‘unbundling’, where you extract a list of patterns, or memes, from the style you’re trying to replicate, in order to distill its essence into the prompt.

https://www.saxifrage.xyz/post/ai-writer 
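In practice, the unbundling step is just one extra model call: feed in a few writing samples and ask for a bulleted list of stylistic patterns, which then gets pasted into the rewriting prompt. Here’s a minimal sketch in Python, assuming the OpenAI chat completions API; the prompt wording and function name are my own for illustration, not the exact ones from that post:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def unbundle_style(samples: list[str]) -> str:
    """Ask the model to extract a bulleted list of stylistic patterns from writing samples."""
    joined = "\n\n---\n\n".join(samples)
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are an expert writing analyst."},
            {"role": "user", "content": (
                "Here are some writing samples:\n\n" + joined +
                "\n\nList the recurring stylistic patterns of this writer as bullet points, "
                "without repeating any of the content of the samples."
            )},
        ],
    )
    return resp.choices[0].message.content
```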

This method works unreasonably well, but I think it’s still the weakest part of my content writing pipeline, and I have heard from other people who struggle with it. I recently wrote about Prompt Optimization, the process of running two competing prompts head to head multiple times to see on average how they perform relative to each other. That experiment – adding writing samples to the prompt as well as instructions – actually failed to improve performance. When I measured the response from the two prompts 10 times each, they had the same average embedding distance (lower is more similar, higher is less similar) to the reference text (a sample I manually rewrote for the experiment).

https://www.saxifrage.xyz/post/prompt-optimization
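To make the embedding distance metric concrete: both the model’s response and the hand-written reference text get embedded, and the distance between the two vectors is the score. A small sketch of how that can be computed, assuming OpenAI embeddings and cosine distance (the actual repository may use a different model or distance function):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(text: str) -> np.ndarray:
    """Embed a piece of text with the OpenAI embeddings endpoint."""
    resp = client.embeddings.create(model="text-embedding-ada-002", input=text)
    return np.array(resp.data[0].embedding)

def embedding_distance(response: str, reference: str) -> float:
    """Cosine distance between response and reference; lower means more similar."""
    a, b = embed(response), embed(reference)
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```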

I hate losing, and this prompt is super important to me, so I decided to dedicate another day to seeing if I could beat my control prompt. In the end I came up with 5 new test variations, all using different approaches. The experiment was run the same way as in the Prompt Optimization post, and embedding distance was still the metric I used to measure the similarity between the response and the reference text. To make it a fairer test, I added two more test cases, meaning I generated AI content for two more topics (Memetics and the Skyscraper Technique from SEO, to add to the Value-Based Pricing case from my past post), then manually rewrote them for the reference text. You can see all the prompts and test cases in the GitHub repository. Each of the prompts ran 10 times for each test case, so that's 6 x 10 x 3 = 180 test runs (cost me about $24 with some mistakes).
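Structurally, the test harness is just a nested loop: for each prompt variant and each test case, generate the rewrite ten times and average the embedding distance to that case’s reference text. A rough sketch, reusing the `client` and `embedding_distance` helpers from the snippet above (the data structures here are illustrative, not the actual repository code):

```python
from statistics import mean

def rewrite(system_prompt: str, draft: str) -> str:
    """Rewrite a draft with one of the candidate style prompts."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": draft},
        ],
    )
    return resp.choices[0].message.content

def run_experiment(prompts: dict[str, str], cases: list[dict], runs: int = 10) -> dict[str, float]:
    """Average embedding distance per prompt across all test cases and runs."""
    results = {}
    for label, system_prompt in prompts.items():   # prompt variants A-F
        distances = []
        for case in cases:                         # 3 test cases
            for _ in range(runs):                  # 10 runs each
                response = rewrite(system_prompt, case["draft"])
                distances.append(embedding_distance(response, case["reference"]))
        results[label] = mean(distances)           # lower is better
    return results
```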

Here were the results:

Here are some interesting findings:

  • Only two of the prompts I tested improved on my control, B and G (lower is better)
  • One of the prompts performed significantly worse than the control, C from PromptPerfect
  • Some test cases were easier than others, with the original case being the hardest

Here’s an explanation of the prompts tested:

  • A: Control - the standard prompt I always use
  • B: 1-Shot Writing Sample - I provided one sample of text, and asked GPT-4 to describe the writing style
  • C: PromptPerfect ChatGPT - I tested out their prompt optimization service to see if it worked, optimizing for the ChatGPT model
  • D: PromptPerfect GPT-4 - I realized you could invest more credits in a heavily optimized prompt for GPT-4, so I maxed everything out to see if it made a difference
  • E: 3-Shot Rewriting Example - I gave 3 samples of the input text to GPT-4 and the rewritten version, and asked it to describe the writing style 
  • F: 3-Shot Writing Sample - same as above, except without the input text, only the final samples of my writing

What I find super interesting in this experiment is that, at least in my findings, prompt optimization tools like PromptPerfect don’t seem to help. I’m not sure what they do to optimize the prompts (a custom model? Ask GPT-4 to rewrite it?) but the initial rewrite performed worse than the control, and even the maxed out extra credits version didn’t perform significantly better. The prompts certainly looked more… prompty? But I wonder if they’re truly optimizing for performance or just “make this prompt look cooler”.

The winning prompt, B (1-Shot Writing Sample), performed almost 20% better and was also one of the simplest. This is great, because it saves us tokens, and we don’t have to collect a bunch of writing samples to get excellent performance. All I did was provide one writing sample (the value-based pricing reference text) and ask GPT-4 to `describe this writing style without mentioning value-based pricing or repeating any of the content of the text`.
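In code, that step is a single call: pass the reference text plus the instruction quoted above, and capture the style description for use in the next prompt. A sketch under the same assumptions as the earlier snippets (the wrapper and variable names are mine, not from the repository):

```python
def describe_style(sample: str, topic: str) -> str:
    """One-shot: ask GPT-4 to describe the writing style of a single sample."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": (
                sample + "\n\n"
                f"Describe this writing style without mentioning {topic} "
                "or repeating any of the content of the text."
            ),
        }],
    )
    return resp.choices[0].message.content

# reference_text holds the value-based pricing sample I rewrote by hand
style_description = describe_style(reference_text, topic="value-based pricing")
```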

I took the result and replaced the previous bullet points I had in my old prompt (i.e. I replaced everything other than the instructions at the top). Note: in none of the variations did I include any samples of text, only the instructions gleaned from the text (otherwise it would have biased the test). The end result was as follows:

```
You will be provided with the sample text.
Your task is to rewrite the text into a different writing style.
The writing style can be described as follows:
1. **Informative and Analytical**: The writer presents detailed information about different strategies, especially the main theme of the text, and breaks down its benefits, challenges, and implementation steps. This depth of information shows that the writer has a solid grasp of the topic.
2. **Structured and Organized**: The writing follows a logical flow, starting with a brief overview of different approaches, delving into a deep dive on the topic, and concluding with potential challenges and contexts where it might be best applied.
3. **Conversational Tone with Professionalism**: While the information is presented in a professional manner, the writer uses a conversational tone ("Here’s how to implement..."), which makes it more relatable and easier for readers to understand.
4. **Practical and Actionable**: The writer not only explains the concept but also offers actionable advice ("Here’s how to implement X") with step-by-step guidance based on real world experience.
5. **Balanced Perspective**: The writer doesn’t just present the benefits of the topic but also discusses its challenges, which gives a well-rounded perspective to readers.
6. **Examples and Analogies**: To make concepts clearer, the writer uses concrete examples (e.g., how much a company might save per month) and analogies (e.g., making comparisons to popular frames of reference). This helps readers relate to the concepts and understand them better.
7. **Direct and Clear**: The writer uses straightforward language without excessive jargon. Concepts are broken down into digestible bits, making it accessible for a broad audience, even if they're not well-versed in business strategies.
In essence, this writing style is a blend of professional analysis with practical, actionable advice, written in a clear and conversational tone.
```
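To use it, the whole block above becomes the system message and the draft to be rewritten goes in as the user message, exactly as in the `rewrite` helper sketched earlier (the variable names here are placeholders):

```python
STYLE_PROMPT = """You will be provided with the sample text.
Your task is to rewrite the text into a different writing style.
... (the full style description above) ..."""

rewritten = rewrite(STYLE_PROMPT, draft_text)  # draft_text = the AI-generated first draft
```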

What can you learn from this? Well, that it pays to do proper A/B testing of your prompts, because you really can’t predict what will work. In my previous test I found that adding writing samples didn’t improve the performance of my instructions, and in this test I found that prompt optimization tools don’t seem to work (in this case at least; they may work for other use cases). The final prompt that worked was also the simplest, which is a great result, particularly if you’re building a tool around this: that’s a lot of tokens saved, and it makes for a simpler user experience if you only need one writing sample. Test your prompts, people!
