String and Character Array Alternative in Java

A String in Java is a quasi-primitive type made for passing around sequences of Unicode characters such as are found in spoken languages, and the class does a good job at this. But what if you want to indicate certain characters without regard to order, such as the characters allowed in a username? Most programmers will use a String for this task as well, while others will resort to an array of char. Both approaches have their drawbacks, and I've come up with a solution superior to both, one whose usefulness has surprised even me.

Let's say you want to trim characters from the end of a string, and would like to specify which characters should be candidates for trimming. Most programmers would create a method with a signature something like this:

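public static String trimEnd(String string, String characters)
{
    int i=string.length()-1;
    while(i>=0)
    {
        if(characters.indexOf(string.charAt(i))<0) //stop at the first character not in the trim set
        {
            break;
        }
        i--; //otherwise keep moving toward the start of the string
    }
    return string.substring(0, i+1);
}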

There are ways to make this method even more efficient, but it's sufficient to point out that the approach of using a String to represent a set of characters has several drawbacks:

A naive alternative might be to switch to an admittedly more semantically appropriate approach:

public static String trimEnd(String string, char[] characters);

This second approach alleviates all but the first drawback of the first approach, but it has one even greater drawback that more than negates any benefits: arrays in Java are not immutable! Where do we store the characters we want to trim? Good programming practice calls for defining the characters in some reusable place rather than hard-coding them inline. We would therefore use something like this in some definitions class:

public static final char[] WHITESPACE_CHARS=new char[]{' ', '\t', '\r', '\n'};

While final prevents the variable itself from being reassigned, any renegade piece of code could at any time modify the definition of whitespace using a simple WHITESPACE_CHARS[0]='X';. This is extremely dangerous; when programming, write code that trusts no one, not even the original programmer.
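To make the danger concrete, here is a small sketch. The class names Definitions and MutableArrayDemo and the isWhitespace() helper are hypothetical, made up purely for this illustration:

```java
final class Definitions
{
    //final protects the reference, not the contents of the array
    public static final char[] WHITESPACE_CHARS=new char[]{' ', '\t', '\r', '\n'};
}

final class MutableArrayDemo
{
    /** Reports whether the given character is currently listed as whitespace. */
    static boolean isWhitespace(char character)
    {
        for(final char c : Definitions.WHITESPACE_CHARS)
        {
            if(c==character)
            {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args)
    {
        System.out.println(isWhitespace(' ')); //the space starts out as whitespace
        Definitions.WHITESPACE_CHARS[0]='X'; //compiles and runs despite final
        System.out.println(isWhitespace(' ')); //the space is no longer recognized
        System.out.println(isWhitespace('X')); //while 'X' now counts as whitespace
    }
}
```

Every caller that relies on the shared definition silently changes behavior after that one assignment, and nothing in the compiler or runtime objects.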

So what do we do? There is no way to make an array immutable in Java, but we can make a thin and smart wrapper class around a character array that guards all access to the array. When we do this, we find that we gain a multitude of other, unexpected benefits beyond String and char[]. This is because we are using a class built for the specific purpose, rather than trying to ride on the coattails of classes meant for other applications.

Let's call this new class Characters. Under the covers it will have an array of characters, but access to that array will be strictly controlled. We'll make the class final (analogous to String and other quasi-primitive classes) to prevent subclasses from subterfuge.

public final class Characters
{
    private final char[] chars;
    private final int minChar;
    private final int maxChar;

    public Characters(char... characters)
    {
        …

Now that all designation of the characters passes through the constructor, we can do any preprocessing we want, such as:

With all this pre-knowledge of the characters we have, a method such as Characters.contains() can be very efficient. Rather than walking the entire set using String.indexOf(char), for example, we can first check to see if the given character falls outside the known bounds. If so, we don't even need to check any characters:

public boolean contains(char character)
{
    if(character < minChar || character > maxChar)
    {
        return false;
    }
    for(final char c : chars)
    {
        if(c == character)
        {
            return true;
        }
        else if(c > character)
        {
            return false;
        }
    }
    return false;
}

Notice also that, because we sorted the characters in our array in the constructor, if we find a character greater than our given character, we know that there can be no later characters that match our character, preventing the need to check the rest of the array.
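Putting the pieces so far together, the constructor and contains() might look roughly like the following. This is only a minimal sketch under assumptions, not the actual class: the name CharactersSketch, the defensive copy, the duplicate removal, and the sentinel bounds used for an empty set are all choices made for this example.

```java
import java.util.Arrays;

final class CharactersSketch
{
    private final char[] chars; //sorted, without duplicates, never exposed
    private final int minChar;
    private final int maxChar;

    CharactersSketch(char... characters)
    {
        final char[] sorted=characters.clone(); //defensive copy so the caller's array cannot alias ours
        Arrays.sort(sorted);
        int count=0;
        for(int i=0; i<sorted.length; i++) //compact the sorted array, skipping duplicates
        {
            if(i==0 || sorted[i]!=sorted[i-1])
            {
                sorted[count++]=sorted[i];
            }
        }
        chars=Arrays.copyOf(sorted, count);
        //assumption: an empty set gets bounds that reject every character
        minChar=count>0 ? chars[0] : Integer.MAX_VALUE;
        maxChar=count>0 ? chars[count-1] : -1;
    }

    boolean contains(char character)
    {
        if(character<minChar || character>maxChar) //outside the known bounds
        {
            return false;
        }
        for(final char c : chars)
        {
            if(c==character)
            {
                return true;
            }
            else if(c>character) //sorted order: no later character can match
            {
                return false;
            }
        }
        return false;
    }
}
```

Because the constructor clones its argument, a caller who later modifies the array it passed in cannot change what the instance contains.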

With only these simple improvements, the Characters class already provides immense value, in readability, semantics, and even algorithm efficiency:

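public static String trimEnd(String string, Characters characters)
{
    int i=string.length()-1;
    while(i>=0)
    {
        if(!characters.contains(string.charAt(i))) //stop at the first character not in the set
        {
            break;
        }
        i--; //otherwise keep moving toward the start of the string
    }
    return string.substring(0, i+1);
}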

The method signature will no longer clash with a trimEnd(String, String), which, as you would expect, trims an occurrence of the given string from the end of the input string rather than trimming individual characters.

We can now safely store our whitespace characters in the global definition without fear of it being modified:

public static final Characters WHITESPACE_CHARACTERS=new Characters(' ', '\t', '\r', '\n');

We can add all sorts of builder methods to Characters to assist in defining sets of characters. For instance, a Characters.add(char...) method would produce a new instance of Characters containing the additional supplied characters. The following example creates one set of characters to represent control characters, and then creates another instance containing all control characters as well as the space character:

public static final Characters CONTROL_CHARACTERS=new Characters('\t', '\r', '\n');
public static final Characters WHITESPACE_CHARACTERS=CONTROL_CHARACTERS.add(' ');
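One way an add(char...) method could work is to concatenate the existing and the new characters and hand the result to the constructor, so that the original instance is never touched. A minimal sketch, assuming a pared-down stand-in class: CharactersWithAdd is a hypothetical name, and for brevity its contains() uses Arrays.binarySearch rather than the bounds-checked loop shown earlier.

```java
import java.util.Arrays;

final class CharactersWithAdd
{
    private final char[] chars; //sorted private copy

    CharactersWithAdd(char... characters)
    {
        chars=characters.clone();
        Arrays.sort(chars);
    }

    /** Returns a new instance holding these characters plus the given ones; this instance is unchanged. */
    CharactersWithAdd add(char... characters)
    {
        final char[] combined=Arrays.copyOf(chars, chars.length+characters.length);
        System.arraycopy(characters, 0, combined, chars.length, characters.length);
        return new CharactersWithAdd(combined);
    }

    boolean contains(char character)
    {
        return Arrays.binarySearch(chars, character)>=0;
    }
}
```

Since add() always returns a fresh instance, a shared definition such as the control characters stays exactly as it was defined, no matter how many derived sets are built from it.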

After creating the initial implementation of the Characters class, I started integrating it into my String manipulation functions and parsing routines. While I knew it was an implementation fit for its purpose, even I have been surprised at its usefulness, and at the extent to which it makes code more elegant and more efficient. If you need to pass around a "set of characters" as opposed to a sequence of letters, try the Characters class. The latest version of Characters is freely downloadable via Subversion, and is distributed under the open-source Apache License, Version 2.0. You'll probably want to grab the entire Maven-buildable globalmentor-core library source code while you're there.