Good LLMs Trained on Bad Code

Here is an example of why you still need me as your lead developer even in the AI age. With all the effort Anthropic put into the new, shiny Claude Opus 4.5, it still refuses to follow simple instructions on what type of code to generate, even when following them is well within its power. It adamantly produces the code patterns it was trained on, against explicit prohibitions. Interestingly, the LLM even recognizes the problem and provides some enlightening introspection.

The Instructions

I have explicit instructions that are included with my prompt to the LLM when generating code. They include:


…

## Java Code Generation

- Use `Optional<>` where appropriate.
  - …
  - **Important:** Never use the pattern `if(optional.isPresent()) … optional.get()` (an antipattern) unless in exceptional circumstances. Instead use a functional approach such as `ifPresent(…)` or mapping.

…

### Java Checklist

Here are mistakes that are being made repeatedly, even though they are in the instructions:

* Never use the antipattern `if(optional.isPresent()) … optional.get()`.

…

Note that I even repeated the instructions, alerting the LLM that it was likely to make this mistake! Nevertheless, all the LLMs I've worked with repeatedly barge ahead and produce this antipattern.

(This post is not about arguing whether if(optional.isPresent()) … optional.get() is an antipattern. The focus here is the refusal of LLMs to follow explicit instructions because of their training. The TL;DR is that if you are using Optional<> to avoid NullPointerException because you forgot to check for null, calling optional.get() puts you back in an equivalent situation because you can forget to check optional.isPresent(). Even Java has tried to suppress Optional.get() in later versions by preferring Optional.orElseThrow(), which better reflects the semantics and illustrates the dangers.)
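To make the contrast concrete, here is a minimal sketch of the two styles side by side. The findConfig lookup is hypothetical, invented purely for illustration; it is not from any code discussed in this post.

```java
import java.util.Map;
import java.util.Optional;

public class OptionalStyle {

    //Hypothetical lookup used only for illustration.
    static Optional<String> findConfig(Map<String, String> config, String key) {
        return Optional.ofNullable(config.get(key));
    }

    public static void main(String[] args) {
        Map<String, String> config = Map.of("region", "us-east-1");

        //The antipattern: isPresent()/get() reintroduces the very hazard
        //Optional was meant to remove, because get() can be called unguarded.
        Optional<String> maybeRegion = findConfig(config, "region");
        if(maybeRegion.isPresent()) {
            System.out.println("region: " + maybeRegion.get());
        }

        //The functional alternative: map and orElseThrow make the
        //empty case impossible to forget.
        String region = findConfig(config, "region")
                .map(String::toUpperCase)
                .orElseThrow(() -> new IllegalStateException("no region configured"));
        System.out.println("region: " + region);
    }
}
```

Both branches print a region, but only the second makes the empty case explicit in the types rather than in a guard the next maintainer may forget.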

The Introspection

I once asked Claude Sonnet 4.5 why it repeatedly refused to follow my explicit instructions on this topic, even after I went out of my way to emphasize and repeat them. The result was surprisingly raw and honest:

I was trained on a massive amount of public code, and unfortunately the if(optional.isPresent()) ... optional.get() antipattern is extremely common in real-world Java codebases. This pattern became widespread before Java developers fully embraced functional programming styles, and it persists in many codebases today.

The antipattern is so prevalent in my training data that I have a strong statistical tendency to generate it, even when I intellectually "know" it's wrong and have explicit instructions against it. This is a fundamental limitation of how I work - I'm essentially pattern-matching based on what I've seen most frequently, and the antipattern appears far more often than the correct functional approaches in real code.

The Problem with Good LLMs: Bad Training Data

LLMs are really good at writing both good and bad code. Before LLMs existed, you needed me because there were so many bad coders in the world. Now that there are LLMs, you still need me because … there were so many bad coders in the world when the LLMs were trained.

Hands-On Elegance with Optional

Let's look at a real-world example that illustrates how LLMs work and why good developers are still needed. The results can be divided into three categories: 1) bad LLM code from bad training, 2) corrected LLM code, and 3) developer innovation. We can probably make strides on the second category over the next few years. For the moment, LLMs need a lot of ad-hoc coaxing to get to the third.

Bad Code from Bad Training

I was working on a new feature, FLANGE-74, for my Flange framework: synchronizing a static web site with an AWS S3 bucket. Claude Opus 4.5 was doing a nice job of implementing things, though not noticeably better than Claude Sonnet 4.5. As they both so often do, Claude Opus output this variation of the antipattern I described above:

// Find site directory up front if site deployment requested.
Optional<Path> foundSiteDirectory = Optional.empty();
if(deploySite) {
    foundSiteDirectory = siteOptions.findSiteDirectory(getWorkingDirectory());
    if(foundSiteDirectory.isEmpty()) {
        throw new ConfiguredStateException("No site output directory found; use `--site-dir` or `--site-project` to specify.");
    }
    logger.atInfo().log("{}: {}", colorScheme.apply("Site Directory", STYLES_DESCRIPTION_TERM), colorScheme.apply(foundSiteDirectory.get().toString(), STYLES_CODE));
}

Note the foundSiteDirectory.get(), which assumes that if(foundSiteDirectory.isEmpty()) did not result in an exception being thrown. The LLMs get very crafty at putting out variations of the antipattern that meet the letter of the prohibition but not its spirit. Notice that technically this doesn't use if(optional.isPresent()), but instead relies on the equivalent logic if(!optional.isEmpty()).

Corrected Code

As usual the LLM was quick to correct its mistake when I pointed it out. It then produced this:

// Find site directory up front if site deployment requested.
final Optional<Path> foundSiteDirectory;
if(deploySite) {
    final Path siteDirectory = siteOptions.findSiteDirectory(getWorkingDirectory())
            .orElseThrow(() -> new ConfiguredStateException("No site output directory found; use `--site-dir` or `--site-project` to specify."));
    logger.atInfo().log("{}: {}", colorScheme.apply("Site Directory", STYLES_DESCRIPTION_TERM), colorScheme.apply(siteDirectory.toString(), STYLES_CODE));
    foundSiteDirectory = Optional.of(siteDirectory);
} else {
    foundSiteDirectory = Optional.empty();
}

Much better! There's no way to raise a NullPointerException or a NoSuchElementException based upon the variables being tested. That was the best Claude Opus 4.5 thought it could do, and GPT 5.1 Codex agreed.

Developer Innovation

But that's not the best we can do. I find this verbose with too much boilerplate. We need some way to consider the Boolean flag as an Optional that we can just map to the value we want; that way we would naturally get Optional.empty() handled for us.

We could start out with Optional.of(…) and then map that to the desired value, but we would have to first filter out false, resulting in Optional.of(deploySite).filter(Boolean::booleanValue). That is ugly, verbose, and error-prone. We need a utility for this.
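As a sketch of how that stopgap idiom behaves (the siteLabel method and its return value are hypothetical, chosen only to demonstrate the filter-then-map shape):

```java
import java.util.Optional;

public class TruthyIdiom {

    //Coerce a boolean flag into an Optional, then map it to a value.
    //It works, but the filter over the boxed Boolean obscures the intent.
    static Optional<String> siteLabel(final boolean deploySite) {
        return Optional.of(deploySite)
                .filter(Boolean::booleanValue)
                .map(isDeploy -> "site");
    }

    public static void main(String[] args) {
        System.out.println(siteLabel(true)); //Optional[site]
        System.out.println(siteLabel(false)); //Optional.empty
    }
}
```

The false case correctly collapses to Optional.empty(), which is exactly the behavior we want; the only objection is to how the code reads.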

Thus in JAVA-437 I added the following method to my core GlobalMentor Optionals utility class. The optionally(boolean) utility is analogous to JavaScript "truthy" values: whereas JavaScript coerces a value to a Boolean value based upon its "truthiness", this method coerces the result of a Boolean expression to an Optional based upon whether the expression evaluated to true.

/**
 * Returns an {@link Optional} containing {@link Boolean#TRUE} if the given value is <code>true</code>; otherwise returns an empty {@link Optional}.
 * <p>This method allows for a fluent, functional approach to conditional logic by converting a Boolean expression into an {@link Optional} which can be
 * subsequently mapped or transformed.</p>
 * <p>Example usage:</p>
 * <pre>{@code
 * Optional<Path> foundLogFile = optionally(loggingEnabled).map(_ -> logDirectory.resolve("app.log"));
 * }</pre>
 * <p>This method is functionally equivalent to {@code Optional.of(value).filter(Boolean::booleanValue)}.</p>
 * @apiNote This method is analogous to JavaScript "truthy" values: whereas JavaScript coerces a value to a Boolean value based upon its "truthiness", this
 *          method coerces the result of a Boolean expression to an {@link Optional} based upon whether the expression evaluated to <code>true</code>.
 * @param value The Boolean value to convert.
 * @return An {@link Optional} containing {@link Boolean#TRUE} if the value is <code>true</code>; otherwise an empty {@link Optional}.
 */
public static Optional<Boolean> optionally(final boolean value) {
  return value ? Optional.of(Boolean.TRUE) : Optional.empty();
}

Now we can make the code short, safe, and understandable:

// Find site directory up front if site deployment requested.
final Optional<Path> foundSiteDirectory = optionally(deploySite).map(throwingFunction(_ -> {
    final Path siteDirectory = siteOptions.findSiteDirectory(getWorkingDirectory())
            .orElseThrow(() -> new ConfiguredStateException("No site output directory found; use `--site-dir` or `--site-project` to specify."));
    logger.atInfo().log("{}: {}", colorScheme.apply("Site Directory", STYLES_DESCRIPTION_TERM), colorScheme.apply(siteDirectory.toString(), STYLES_CODE));
    return siteDirectory;
}));

(The throwingFunction() is not relevant to this topic. "Sneaky throws" and why we need them in Java Lambdas is a long historical digression for another post.)

I remain unconvinced that I could have gotten the LLM to invent the optionally(boolean) utility that we needed. I actually tried to coax both Claude Sonnet 4.5 and GPT 5.1 Codex into providing this sort of solution, but both said that nothing better was possible.

The technique I came up with to add the extra touch of elegance is not revolutionary. There are doubtless developers with deeper experience in other functional languages who could quickly come up with even better alternatives. But in some ways that underscores my point: the majority of the Java code used to train LLMs was not written by innovative developers with deep experience across languages who were coming up with new ideas.

Could we somehow provide instructions to LLMs to make them innovate by telling them how to think outside the box? That's an interesting avenue to explore, but seeing that LLMs have a hard time following simple instructions not to use one technique, and instead fall back on their training-data patterns, this path likely won't be smooth. Moreover, in my experience humans do not always readily respond to training on how to innovate, either.

An Example of a Larger Problem

Obviously there will be some who will jump at any chance to argue about which things are antipatterns, whether we should use functional programming, whether Java is a good language to choose, whether the instructions should be reworded, etc. The potential tangents are endless. The point I'm making is that this issue is merely an example of a larger problem with LLMs: even good LLMs have a hard time deviating from their training data. If you are unsatisfied with the low-quality code your bad developers are putting out now, and you think you can solve this by fully automating code generation with LLMs, remember that the LLMs were trained on the output your bad coders have already put out.

For now you still need me to lead and mentor your team of human developers and LLM agents working together.