Don’t use Stream filter().map(); Use mapMulti() Instead

Let’s discover if code using filter() and map() could be inefficient for certain use cases and why mapMulti() could be a better alternative.

What is mapMulti()?

mapMulti() was added to the Stream API in Java 16. The official Javadoc for Stream.mapMulti() is always worth reading; it contains useful details.

If you prefer a more human explanation instead, here it is:

1) It’s a one-to-many intermediate operation. Every element can be transformed into 0 or more elements. This means it can be used to filter out elements and transform them. That’s why we can choose it instead of using filter() and map() chained together.

2) It differs from the flatMap() method because it doesn’t require a Stream to be returned. That’s why it’s more efficient than flatMap() when we also need to filter elements.
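
To make the "0 or more elements" part concrete, here is a minimal sketch. The class name MapMultiBasics, the method expandEvens, and the sample data are made up for illustration; they are not part of the Stream API. Odd numbers produce no output at all, while each even number produces two outputs (the number and its square).

```java
import java.util.List;

class MapMultiBasics {

    // For each input number: emit nothing if it is odd,
    // otherwise emit the number and its square (two outputs for one input).
    static List<Integer> expandEvens(List<Integer> numbers) {
        return numbers.stream()
                .<Integer>mapMulti((n, downstream) -> {
                    if (n % 2 == 0) {
                        downstream.accept(n);
                        downstream.accept(n * n);
                    }
                })
                .toList();
    }

    public static void main(String[] args) {
        System.out.println(expandEvens(List.of(1, 2, 3, 4))); // prints [2, 4, 4, 16]
    }
}
```

Calling the consumer zero times behaves like filter(), calling it once behaves like map(), and calling it several times behaves like flatMap(), all without creating a Stream per element.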

Now we will try to provide alternative solutions to a problem using mapMulti(). Then we will compare the performance of the solutions using JMH.

Problem Definition

Given a list of Strings representing numbers, extract only the even integers from the list.

Some Examples:

["1", "error", "42", "3", "banana", "4"] -> [42, 4]
["1", "error", "3", "banana"] -> []

Solution 1: filter() and map()

The code most people would write first chains filter() and map():

var evenNumbers = lines.stream()
        .filter(line -> {
            try {
                return Integer.parseInt(line) % 2 == 0;
            } catch (NumberFormatException e) {
                return false;
            }
        })
        .map(Integer::parseInt) 
        .toList();

First, filter() parses the string to check whether it is an even number. Then map() parses the same string again to convert it into an integer. This is redundant work: parseInt() is called twice for every element that passes the filter.
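
If you want to see the redundant calls for yourself, one way is to count them. The sketch below wraps Integer.parseInt() in a counting helper; ParseCounter, PARSE_CALLS, and countingParse are hypothetical names introduced only for this demonstration.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class ParseCounter {

    // Counts every attempt to parse, including the attempts that throw.
    static final AtomicInteger PARSE_CALLS = new AtomicInteger();

    static int countingParse(String line) {
        PARSE_CALLS.incrementAndGet();
        return Integer.parseInt(line);
    }

    // Same shape as Solution 1, but with the counting helper.
    static List<Integer> filterThenMap(List<String> lines) {
        return lines.stream()
                .filter(line -> {
                    try {
                        return countingParse(line) % 2 == 0;
                    } catch (NumberFormatException e) {
                        return false;
                    }
                })
                .map(ParseCounter::countingParse)
                .toList();
    }

    public static void main(String[] args) {
        List<String> lines = List.of("1", "error", "42", "3", "banana", "4");
        PARSE_CALLS.set(0);
        System.out.println(filterThenMap(lines)); // prints [42, 4]
        System.out.println(PARSE_CALLS.get());    // prints 8: six calls in filter(), two more in map()
    }
}
```

For the six input strings of the first example, the pipeline performs eight parse attempts: one per element in filter(), plus one more in map() for each of the two even numbers.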

Solution 2: flatMap()

The flatMap() approach avoids redundant parsing but introduces another inefficiency:

var evenNumbers = lines.stream()
        .flatMap(line -> {
            try {
                // Parse once and reuse the result.
                int number = Integer.parseInt(line);
                if (number % 2 == 0) {
                    return Stream.of(number);
                }
                return Stream.empty();
            } catch (NumberFormatException e) {
                return Stream.empty();
            }
        })
        .toList();

While it solves the double parsing issue, it creates a new stream for each line, even for invalid entries. Each of those streams has to be allocated and then traversed and closed by flatMap(), which adds overhead.

Solution 3: mapMulti()

With mapMulti() we can solve both problems by avoiding double parsing and useless stream creations:

var evenNumbers = lines.stream()
        .<Integer>mapMulti((line, consumer) -> {
            try {
                int number = Integer.parseInt(line);
                if (number % 2 == 0) {
                    consumer.accept(number);
                }
            } catch (NumberFormatException ignored) { }
        })
        .toList();

This code is pretty nasty, but it is the most efficient of the three solutions: it parses each line exactly once and creates no stream per line.
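
One possible way to make the mapMulti() version easier on the eyes is to move the lambda body into a named method and pass it as a method reference. This is only a sketch: EvenNumbers, acceptIfEven, and extract are names I made up, and the logic is the same as in Solution 3.

```java
import java.util.List;
import java.util.function.Consumer;

class EvenNumbers {

    // Pushes the parsed value downstream only when the line is a valid even integer.
    static void acceptIfEven(String line, Consumer<Integer> downstream) {
        try {
            int number = Integer.parseInt(line);
            if (number % 2 == 0) {
                downstream.accept(number);
            }
        } catch (NumberFormatException ignored) {
            // Not a number: emit nothing for this line.
        }
    }

    static List<Integer> extract(List<String> lines) {
        return lines.stream()
                .<Integer>mapMulti(EvenNumbers::acceptIfEven)
                .toList();
    }

    public static void main(String[] args) {
        System.out.println(extract(List.of("1", "error", "42", "3", "banana", "4"))); // prints [42, 4]
        System.out.println(extract(List.of("1", "error", "3", "banana")));            // prints []
    }
}
```

The pipeline itself shrinks to a single readable line, and the helper can be unit-tested on its own.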

Performance Comparison

We will use JMH to compare the performance of the three solutions.

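import java.util.List;
import java.util.Random;
import java.util.concurrent.TimeUnit;
import java.util.stream.Stream;

import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)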
@Warmup(iterations = 3)
@Measurement(iterations = 5)
@Fork(1)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Thread)
public class ProblemBenchmark {

    private List<String> lines;

    @Setup
    public void prepare() {
        Random random = new Random();
        lines = Stream.generate(() -> {
            if (random.nextInt(10) == 0) {
                return "error";
            } else {
                return String.valueOf(random.nextInt(1000));
            }
        }).limit(10000).toList();
    }

    @Benchmark
    public List<Integer> filterMap() {
        return lines.stream()
                .filter(line -> {
                    try {
                        return Integer.parseInt(line) % 2 == 0;
                    } catch (NumberFormatException e) {
                        return false;
                    }
                })
                .map(Integer::parseInt)
                .toList();
    }

    @Benchmark
    public List<Integer> flatMap() {
        return lines.stream()
                .flatMap(line -> {
                    try {
                        int number = Integer.parseInt(line);
                        if (number % 2 == 0) {
                            return Stream.of(number);
                        }
                        return Stream.empty();
                    } catch (NumberFormatException e) {
                        return Stream.empty();
                    }
                })
                .toList();
    }

    @Benchmark
    public List<Integer> mapMulti() {
        return lines.stream()
                .<Integer>mapMulti((line, consumer) -> {
                    try {
                        int number = Integer.parseInt(line);
                        if (number % 2 == 0) {
                            consumer.accept(number);
                        }
                    } catch (NumberFormatException ignored) { }
                })
                .toList();
    }

}

As you might expect, mapMulti() is not only less error-prone but also faster than flatMap() and filter().map(), although the gap between the best and the worst score is only about 10%.

Benchmark	Mode	Cnt	Score	Error	Units
mapMulti	avgt	5	0.690	± 0.017	ms/op
flatMap	avgt	5	0.722	± 0.018	ms/op
filterMap	avgt	5	0.758	± 0.037	ms/op

filter() + map(): Inherent (but Small) Inefficiency

When you use .filter().map(), the operations are applied lazily as part of a single stream pipeline. No intermediate collection is built between them; the operations are simply chained as stages. However, each stage adds a layer of processing that can introduce overhead.

Each .filter() and .map() call adds a separate step in the pipeline.
Every element flows through all intermediate operations, adding a level of indirection.
Each step adds a function call overhead, even if the function is simple.

Even if JIT optimizations keep this overhead minimal, it is still there.
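
To visualize how each element travels through the stages one after the other, here is a small tracing sketch. PipelineTrace and trace are hypothetical names, and recording events from inside the lambdas is done purely for demonstration on a sequential stream; it is not something to do in production code.

```java
import java.util.ArrayList;
import java.util.List;

class PipelineTrace {

    // Records the order in which the filter and map stages see each element.
    static List<String> trace(List<Integer> numbers) {
        List<String> events = new ArrayList<>();
        numbers.stream()
                .filter(n -> {
                    events.add("filter " + n);
                    return n % 2 == 0;
                })
                .map(n -> {
                    events.add("map " + n);
                    return n * 10;
                })
                .forEach(n -> events.add("result " + n));
        return events;
    }

    public static void main(String[] args) {
        trace(List.of(1, 2, 3, 4)).forEach(System.out::println);
    }
}
```

For the input 1, 2, 3, 4 the recorded order is: filter 1, filter 2, map 2, result 20, filter 3, filter 4, map 4, result 40. Every element is handed from one stage to the next individually, so each surviving element pays for one call per stage.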

If we benchmark filter().map() against mapMulti() with both performing the same simple operations and no extra logic, we can see that mapMulti() is still faster.

Benchmark Summary:

filterMap: Average Time = 143.193 microseconds ± 2.278
mapMulti: Average Time = 129.308 microseconds ± 0.796

In this case, mapMulti() is faster than filter().map() by about 10%. On a huge dataset, that difference could be significant. The likely reason is that mapMulti() avoids the overhead of chaining multiple stages and reduces the number of function calls. The error margin is also smaller for mapMulti().
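
The code behind this second benchmark is not shown here, so the following is only a guess at what "the same operations without extra logic" could look like. It assumes a list of integers, a trivial even check, and doubling as the transformation; SimplePipelines and both method bodies are my own illustration, not the benchmarked code.

```java
import java.util.List;
import java.util.stream.IntStream;

class SimplePipelines {

    // Two stages: keep even numbers, then double them.
    static List<Integer> filterMap(List<Integer> numbers) {
        return numbers.stream()
                .filter(n -> n % 2 == 0)
                .map(n -> n * 2)
                .toList();
    }

    // One stage doing both the check and the transformation.
    static List<Integer> mapMulti(List<Integer> numbers) {
        return numbers.stream()
                .<Integer>mapMulti((n, downstream) -> {
                    if (n % 2 == 0) {
                        downstream.accept(n * 2);
                    }
                })
                .toList();
    }

    public static void main(String[] args) {
        List<Integer> numbers = IntStream.rangeClosed(1, 10).boxed().toList();
        System.out.println(filterMap(numbers)); // prints [4, 8, 12, 16, 20]
        System.out.println(mapMulti(numbers));  // prints [4, 8, 12, 16, 20]
    }
}
```

Both methods return the same list; the only difference is whether the work is split across two pipeline stages or done in one.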

Conclusion

While filter()+map() and flatMap() are useful, mapMulti() offers a more efficient and streamlined way to filter and transform data in Java Streams.

Next time you’re working with transformations where filtering and mapping overlap, consider using mapMulti() for cleaner and more efficient code. It reduces redundant operations, avoids extra stream creation, and simplifies the transformation logic.

When to use filter() then?

1) when you just need to filter elements without transformation or additional logic

2) when the stream dataset is small (the vast majority of standard use cases)

3) when you want to ensure the code is more readable and maintainable. mapMulti code is nasty.

4) when you don’t care about the performance difference (who does?)

5) when you don’t want to use a new method that you don’t know yet

Thanks to José Paumard for his great lesson, in which he explains the benefits of mapMulti() as a prerequisite for understanding the new Stream Gatherers API that will be released in JDK 24.

Bonus: Alternative (and faster) Solutions

In this article, we have considered only filter().map(), flatMap(), and mapMulti(), but there are other ways to solve this problem. They are outside the scope of the article, but here are some examples:

Regex:

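// Caveat: "\\d+" rejects negative numbers, and a digit string that overflows int
// (e.g. "99999999999") still makes Integer.parseInt throw NumberFormatException.
var evenNumbers = lines.stream()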
    .filter(line -> line.matches("\\d+") && Integer.parseInt(line) % 2 == 0)
    .map(Integer::parseInt)
    .toList();

Plain old Basic For Loop:

var result = new java.util.ArrayList<Integer>();
for(String line: lines) {
    try {
        int number = Integer.parseInt(line);
        if (number % 2 == 0) {
            result.add(number);
        }
    } catch (NumberFormatException ignored) { }
}
return result;

For the sake of completeness, I have also benchmarked these solutions. Here are the results:

Benchmark	Mode	Cnt	Score	Error	Units
regex	avgt	5	0.513	± 0.109	ms/op
forLoop	avgt	5	0.610	± 0.015	ms/op

Bonus 2: how things change with parallel streams

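With parallel streams, the overhead of filter().map() is reduced, and the performance difference is much smaller than before. When the work is spread across multiple cores, the overhead of chaining operations becomes negligible. Note that the error margin for mapMultiParallel (± 0.235 ms/op) is larger than its score, so the ranking of the first three results should not be read as significant.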

Nice!

Benchmark	Mode	Cnt	Score	Error	Units
mapMultiParallel	avgt	5	0.132	± 0.235	ms/op
filterMapParallel	avgt	5	0.138	± 0.004	ms/op
flatMapParallel	avgt	5	0.139	± 0.002	ms/op
regexSolutionParallel	avgt	5	0.151	± 0.008	ms/op
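
The parallel benchmark code is not listed above. A minimal sketch of what the mapMulti() variant presumably looks like, assuming the only change is swapping stream() for parallelStream(), is shown below (ParallelEvenNumbers and extractParallel are names I made up):

```java
import java.util.List;

class ParallelEvenNumbers {

    // Same logic as Solution 3, but the source list is processed in parallel.
    // The lambda is stateless, so it is safe to run on several threads.
    static List<Integer> extractParallel(List<String> lines) {
        return lines.parallelStream()
                .<Integer>mapMulti((line, downstream) -> {
                    try {
                        int number = Integer.parseInt(line);
                        if (number % 2 == 0) {
                            downstream.accept(number);
                        }
                    } catch (NumberFormatException ignored) {
                        // Invalid line: emit nothing.
                    }
                })
                .toList();
    }

    public static void main(String[] args) {
        System.out.println(extractParallel(List.of("1", "error", "42", "3", "banana", "4"))); // prints [42, 4]
    }
}
```

Because a List is an ordered source, toList() still returns the elements in their original order even though they are processed on several threads.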