Use (AI) agents or get left behind? A response

2026-02-09

A couple of weeks ago I read a post by Tim Dettmers outlining how he has tried to use AI to automate various tasks, with some examples of successes and failures. While his attempt to automate email writing with AI was unsuccessful, he did mention that he uses AI for writing blog posts. I’m not sure that writing one or two blog posts with AI is really a successful use-case, but it did get me to try and implement something similar.

For context, I have a lot of projects that I’m working on, some of them related to AI in some capacity. These are projects in psychology, low-resource languages and datasets, and tools for authors, not to mention my teaching and other duties. I would like to be more productive in each of these areas, but there are only so many hours in a day to spend on any given task. This has meant that I have tried to streamline my workflow and course prep, and to focus my efforts on completing certain tasks in blocks of time.

Blogging is something that I like to do, as it helps me think through things and stay mentally engaged. I also find that it doesn’t take a lot of time to write down thoughts, and I can generally write a single long post in an hour or two. Having said that, blogging has taken a back seat to other considerations lately, which is why there have been some long stretches between blog posts. This brings me to a query that I tried to answer, inspired by Tim’s post.

Query

Can I code a blog generator in an evening that could produce a blog post I’ve been meaning to write for a year, and possibly additional posts in the future?

The blog post

The blog post in question is a follow-up to a post that I wrote a year ago about an experiment with LLMs that I conducted for an author friend. The “Part 2” was intended as a discussion of the challenges in using LLMs to automate another aspect of language viewed through a psychological lens: implicit motives. I haven’t gotten around to writing it in part because I haven’t had the time, but I have some ideas of what shape it should take, so I thought this could be a good test of an agentic blog generation tool.

The attempt

There’s a companion post to this one that details the steps I took and the code that I used, but here I just want to highlight some key points of this exercise.

  1. I have various data to work with in order to guide the LLM toward generation of the kind of writing I want: the “Part 1” blog post, an outline (bullet points), and some research papers related to the topic of automating implicit motive coding.
  2. I have access to a 24 GB GPU and a long-context model (Phi-3-mini-128k-instruct) that allows me to feed a lot of information to the model, to use a long prompt, and to generate a decent-length post.
  3. My goal is to generate a decently long post that summarizes some existing research on implicit motives and highlights the role of LLMs in supporting the automatic identification of implicit motive imagery in text. This should be a post that requires minimal editing - I could write something appropriate in about 2 hours, but if I can code a tool in 2 hours to do something similar, and the process can be re-used for other posts, then I would consider that time well-spent.
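The ingestion step described above can be sketched roughly as follows. This is a hypothetical, minimal example — the function name, section markers, and placeholder content are mine, not taken from the companion post — showing how the Part 1 post, the bullet-point outline, and the research notes might be stitched into a single long-context prompt:

```python
def build_prompt(part1_post: str, outline: str, paper_notes: list[str]) -> str:
    """Assemble one long-context prompt from the three source materials."""
    papers = "\n\n".join(paper_notes)
    return (
        "You are helping draft Part 2 of a blog series on using LLMs "
        "to automate implicit motive coding.\n\n"
        f"=== Part 1 (for tone and continuity) ===\n{part1_post}\n\n"
        f"=== Outline for Part 2 ===\n{outline}\n\n"
        f"=== Research notes ===\n{papers}\n\n"
        "Write a complete blog post that follows the outline, matches the "
        "tone of Part 1, and summarizes the research notes."
    )

# Illustrative usage with placeholder content:
prompt = build_prompt(
    part1_post="...text of the Part 1 post...",
    outline="- background on implicit motives\n- role of LLMs\n- challenges",
    paper_notes=["Notes on paper A", "Notes on paper B"],
)
```

A 128k-context model makes this brute-force concatenation viable: everything fits in one prompt, so no retrieval or chunking machinery is needed.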

The result

After writing some code to ingest the data, and a prompt to have the LLM process it for generation, I ended up with some output. I literally only spent 2 hours on the whole process - one hour to write the code, another hour to play around with prompts to modify the output. The result was not terrible, but it was a bit flat and shorter than I wanted, despite playing around with min-/max-token settings in generation. The result can be seen below - see the companion post for details on how it was generated.
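For the length-control knobs mentioned above, a sketch of what I mean by min-/max-token settings: with Hugging Face transformers these correspond to the `min_new_tokens` and `max_new_tokens` arguments to `model.generate()`. The helper and the specific numbers below are illustrative, not the values from the companion post:

```python
def generation_kwargs(target_words: int, words_per_token: float = 0.75) -> dict:
    """Translate a rough word target into token bounds for model.generate()."""
    target_tokens = int(target_words / words_per_token)
    return {
        "min_new_tokens": int(target_tokens * 0.8),  # floor, to discourage short output
        "max_new_tokens": int(target_tokens * 1.5),  # ceiling on generation length
        "do_sample": True,
        "temperature": 0.7,
    }

kwargs = generation_kwargs(1200)  # aim for roughly a 1200-word post
```

In practice a minimum-token floor only forces the model to keep emitting tokens; it doesn’t make the extra text substantive, which is consistent with the flat output I got.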

Title: Some notes on LLMs in real-world contexts (Part 2)

Summary of previous post: In the first part of this series, we explored the use of large language models (LLMs) like GPT-3 and BERT for text-based applications, such as generating descriptions from keywords and classifying implicit motives in text. We discussed the process of training LLMs for these tasks, including dataset selection, model fine-tuning, and performance evaluation. We also examined the challenges and benefits of using LLMs for these applications, as well as the success of recent models in comparison to previous ones.

Description of using large language models to automate Implicit Motive Classification: In this section, we delve into the use of LLMs for classifying implicit motives in text, a task that has been traditionally performed by human experts. We discuss the process of training LLMs to classify implicit motives, including the selection of appropriate models, fine-tuning on relevant datasets, and the evaluation of model performance. We also highlight the recent success of our implicit motives classifier, which outperforms previous models on various metrics.

Challenges of this approach: Despite the recent success of LLMs in classifying implicit motives, there are still significant challenges in using them for this task. One of the main challenges is the brittleness of LLMs, as they struggle to generalize to unseen data and may produce erroneous classifications. Additionally, LLMs may not capture the nuanced understanding of implicit motives that human experts possess, as they are limited to the information present in the text.

Benefits of this approach: The use of LLMs for classifying implicit motives offers several benefits over traditional human-based approaches. Firstly, LLMs can process large amounts of text data quickly and accurately, making them well-suited for classifying implicit motives on a large scale. Secondly, LLMs can capture subtle semantic representations of text, allowing for the extraction of nuanced cues that correlate with implicit motives. This can lead to more accurate and reliable classifications.

Success of current model vs previous models: Our current implicit motives classifier, implemented using the Electra-based model, has demonstrated superior performance compared to previous models. It has achieved high levels of accuracy and generalization, outperforming previous models on various metrics. This success can be attributed to the advancements in LLM technology and the availability of large-scale training data.

Takeaways

This attempt at automation (i.e. using AI ‘agents’) was somewhat underwhelming. For my use-case, I don’t think it was particularly helpful. I would rather spend an hour or two writing a complete blog post than editing one produced by this tool. However, there are some critiques that could be made here that could potentially change the situation, though not without more effort/expense than I’m willing to put in.

  1. Could I use a better model?
    The model I’m using is only a 3.8-billion-parameter model. It runs relatively quickly and gives decent output, but I could probably get a better result from a frontier model. Am I willing to run a large model on expensive hardware, or wait an hour for a result? Not really.

  2. Could I write a better prompt?
    Sure. But am I willing to spend a lot of time finessing it, just to have the output change when the input does? Maybe not.

Having said that, there are some promising aspects of this exercise. I could imagine thinking through a post, writing bullet points in a note, and running it through this tool. Maybe I would use several prompts with different emphases to generate a few different versions, then read through them, and possibly choose one to edit. This is not much different from what I do already, though all the ‘versioning’ is done in my head while I let the notes sit. The challenge with an ‘agent-assisted’ process would be ensuring that the tool itself is not biasing my creative output in a generic direction, but it might make for slightly faster iteration.

Another observation is that most of the “agentic” AI workflows I’m aware of are using voice tools or dictation to query/instruct an AI model. This is more like having a conversation with someone/thing than what I coded for my tool. I have thought about using dictation for my blog posts, since I generally know what I want to say and I just need to get it down. It’s usually faster to speak than to type, so this could be where the speedup in productivity comes from for most people using an agentic AI workflow. This is actually what we’ve found to be the case for fiction authors, which is why Bookscribe.ai exists.

The challenge for me, then, is not really in automating a particular task, but in getting comfortable with a different input method (voice) for interacting with computers. This does seem to be a skill that will become increasingly important as language models and other automated systems become more capable. For now, I don’t think the AI tools are really worth using, but I intend to keep experimenting with individual use-cases, and we’ll see how it goes.