How does Parallel Stream work internally? Is it always better?

How It Works Internally

  1. Source Splitting: The stream's Spliterator divides the data source into chunks.
  2. ForkJoinPool: By default, each chunk becomes a task in ForkJoinPool.commonPool().
  3. Parallel Execution: Worker threads, together with the calling thread, process chunks concurrently.
  4. Combining Results: Partial results from the chunks are merged back (e.g., by a Collector's combiner).

[1, 2, 3, 4, 5, 6, 7, 8]
        ↓ Spliterator splits
  [1,2,3,4]    [5,6,7,8]
     ↓ split      ↓ split
 [1,2] [3,4]  [5,6] [7,8]
   ↓     ↓      ↓     ↓    ← 4 worker threads
  f(1,2) f(3,4) f(5,6) f(7,8)
     ↓     ↓      ↓     ↓
      combine    combine
          ↓
       final result
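
The split/compute/combine shape in the diagram can be imitated by hand with a RecursiveTask. This is only a sketch of the pattern, not the stream library's actual implementation; the class name SplitAndCombineSketch, the SumTask helper, and the leaf size of 2 are assumptions chosen to mirror the diagram.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class SplitAndCombineSketch {

    // Hypothetical task: sums data[from, to), splitting until a chunk has at most 2 elements.
    static class SumTask extends RecursiveTask<Long> {
        private static final int THRESHOLD = 2; // illustrative leaf size, as in the diagram
        private final int[] data;
        private final int from;
        private final int to;

        SumTask(int[] data, int from, int to) {
            this.data = data;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Long compute() {
            if (to - from <= THRESHOLD) {
                long sum = 0;
                for (int i = from; i < to; i++) {
                    sum += data[i]; // leaf: process the chunk sequentially
                }
                return sum;
            }
            int mid = (from + to) >>> 1;
            SumTask left = new SumTask(data, from, mid);
            SumTask right = new SumTask(data, mid, to);
            left.fork();                      // hand the left half to the pool
            long rightSum = right.compute();  // work on the right half in this thread
            return left.join() + rightSum;    // combine the two partial results
        }
    }

    static long parallelSum(int[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5, 6, 7, 8};
        System.out.println(parallelSum(data));                    // hand-rolled fork/join
        System.out.println(Arrays.stream(data).parallel().sum()); // same sum via a parallel stream
    }
}
```

Both lines print the same sum; the stream version hides the splitting and joining behind the terminal operation.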

The Common Pool Problem

// ALL parallel streams in your app share the same pool by default!
ForkJoinPool.commonPool(); // default parallelism = availableProcessors() - 1

// A slow stream blocks others (a terminal operation is needed for anything to run):
list1.parallelStream().map(x -> slowIOCall(x)).collect(Collectors.toList());  // ⚠️ Hogs the pool
list2.parallelStream().map(x -> fastCompute(x)).collect(Collectors.toList()); // ⚠️ Starved
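
One way to observe the shared pool is to record which threads a parallel stream actually runs on. The sketch below assumes nothing beyond the JDK; the class name CommonPoolInspection and the helper threadsUsed are made up for illustration.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class CommonPoolInspection {

    // Hypothetical helper: collects the names of all threads that processed an element.
    static Set<String> threadsUsed(int elementCount) {
        Set<String> names = ConcurrentHashMap.newKeySet(); // thread-safe set
        IntStream.range(0, elementCount)
                 .parallel()
                 .forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        System.out.println("CPUs:                    " + Runtime.getRuntime().availableProcessors());
        System.out.println("Common pool parallelism: " + ForkJoinPool.commonPool().getParallelism());
        // Typically shows common-pool worker threads plus the calling thread.
        System.out.println("Threads used:            " + threadsUsed(100_000));
    }
}
```

The common pool's parallelism can also be changed at JVM startup with the system property java.util.concurrent.ForkJoinPool.common.parallelism, though that setting still applies to every parallel stream in the process.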

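Workaround: run the stream inside a custom pool. This relies on ForkJoinPool behavior that the Stream API does not document, so treat it as a convention rather than a guarantee:

ForkJoinPool customPool = new ForkJoinPool(4);
try {
    customPool.submit(() ->
        list.parallelStream().forEach(this::process)
    ).get(); // throws InterruptedException, ExecutionException
} finally {
    customPool.shutdown(); // release the pool's threads when done
}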
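
A self-contained version of the custom-pool approach might look like the following. It is a minimal sketch: the class name CustomPoolSketch, the method squaresInCustomPool, and the squaring workload are placeholders, and the isolation it gets depends on current JDK behavior rather than a documented contract.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;

public class CustomPoolSketch {

    // Hypothetical helper: runs a parallel stream from inside a dedicated ForkJoinPool.
    static List<Integer> squaresInCustomPool(List<Integer> input, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            // The stream is started from a worker of 'pool', so its tasks stay in that pool.
            return pool.submit(() ->
                    input.parallelStream()
                         .map(x -> x * x)
                         .collect(Collectors.toList())
            ).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("interrupted while waiting for the stream", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("stream task failed", e.getCause());
        } finally {
            pool.shutdown(); // always release the pool's threads
        }
    }

    public static void main(String[] args) {
        System.out.println(squaresInCustomPool(Arrays.asList(1, 2, 3, 4), 2));
    }
}
```

Creating a pool per call, as done here for brevity, is wasteful in real code; a long-lived pool shared by related workloads is usually the better fit.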

Is It Always Better? No!

| Scenario | Sequential | Parallel | Winner |
|---|---|---|---|
| Small dataset (< 10K) | Fast | Overhead of splitting/merging | Sequential |
| Simple operations (identity map) | Fast | Splitting overhead > gain | Sequential |
| I/O-bound (REST/DB calls) | Blocking | Blocks ForkJoinPool threads | Sequential (use async instead) |
| Large dataset + CPU-heavy ops | Slow | Massive speedup | Parallel |
| LinkedList source | N/A | Poor splittability | Sequential |
| ArrayList / Array source | N/A | Excellent splittability | Parallel |
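
For the I/O-bound case, one common form of "async instead" is CompletableFuture with a dedicated executor, so blocking calls never occupy common-pool workers. The sketch below is illustrative only: slowIOCall is a stand-in that sleeps instead of calling a real service, and AsyncIoSketch, fetchAll, and the pool size are assumed names and values.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

public class AsyncIoSketch {

    // Hypothetical stand-in for a blocking REST/DB call.
    static String slowIOCall(int id) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "result-" + id;
    }

    // Starts every call on a dedicated I/O pool, then waits for all of them.
    static List<String> fetchAll(List<Integer> ids, int poolSize) {
        ExecutorService ioPool = Executors.newFixedThreadPool(poolSize);
        try {
            // First pass: launch all calls without waiting on any of them.
            List<CompletableFuture<String>> futures = ids.stream()
                    .map(id -> CompletableFuture.supplyAsync(() -> slowIOCall(id), ioPool))
                    .collect(Collectors.toList());
            // Second pass: gather results in the original order.
            return futures.stream()
                    .map(CompletableFuture::join)
                    .collect(Collectors.toList());
        } finally {
            ioPool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchAll(Arrays.asList(1, 2, 3, 4, 5), 5));
    }
}
```

Collecting the futures into a list before joining matters: joining inside the same lazy pipeline that creates them would run the calls one at a time.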

Data Source Splittability

| Source | Splittability | Good for Parallel? |
|---|---|---|
| ArrayList | Excellent | ✅ Yes |
| int[], long[] | Excellent | ✅ Yes |
| HashSet | Good | ✅ Yes |
| TreeSet | Good | ✅ Yes |
| LinkedList | Poor | ❌ No |
| Stream.iterate() | Poor | ❌ No |
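
Splittability can be probed directly by calling trySplit() on a source's Spliterator and comparing the sizes of the two halves. The following is a rough sketch; SplittabilityProbe and firstSplitSizes are invented names, and the exact LinkedList batch size is an implementation detail of the JDK in use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedList;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class SplittabilityProbe {

    // Hypothetical helper: returns {split-off prefix size, remaining size} after one trySplit().
    static long[] firstSplitSizes(Collection<Integer> source) {
        Spliterator<Integer> rest = source.spliterator();
        Spliterator<Integer> prefix = rest.trySplit(); // null if the source refuses to split
        long prefixSize = (prefix == null) ? 0 : prefix.estimateSize();
        return new long[] {prefixSize, rest.estimateSize()};
    }

    public static void main(String[] args) {
        List<Integer> values = IntStream.range(0, 10_000).boxed().collect(Collectors.toList());

        // ArrayList splits by index, so the two halves are balanced.
        System.out.println("ArrayList:  " + Arrays.toString(firstSplitSizes(new ArrayList<>(values))));

        // LinkedList must walk its nodes and copy a batch into an array, so the split is lopsided.
        System.out.println("LinkedList: " + Arrays.toString(firstSplitSizes(new LinkedList<>(values))));
    }
}
```

Balanced halves keep all workers busy for about the same time, while a lopsided split leaves most of the work in one task and adds copying cost on top.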

Golden Rule

Use parallel streams only when: the dataset is large (roughly 100K+ elements), operations are CPU-intensive, the source splits well (arrays/ArrayLists), and operations are stateless, non-interfering & thread-safe.
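
The "stateless & thread-safe" condition is the easiest one to violate by accident. As a hedged illustration (the class StatelessPipelineSketch and both method names are invented for this example), compare mutating a shared ArrayList from forEach with letting collect build the result:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class StatelessPipelineSketch {

    // Unsafe: several threads mutate a non-thread-safe ArrayList at once.
    // The returned size may be smaller than expected, and the race can even throw.
    static int unsafeEvenCount(int n) {
        List<Integer> shared = new ArrayList<>();
        try {
            IntStream.range(0, n)
                     .parallel()
                     .filter(i -> i % 2 == 0)
                     .forEach(i -> shared.add(i)); // data race on 'shared'
        } catch (RuntimeException e) {
            // e.g. ArrayIndexOutOfBoundsException from concurrent resizing
        }
        return shared.size();
    }

    // Safe: no shared mutable state; each chunk fills its own container and collect merges them.
    static List<Integer> safeEvens(int n) {
        return IntStream.range(0, n)
                        .parallel()
                        .filter(i -> i % 2 == 0)
                        .boxed()
                        .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        System.out.println("expected evens: " + (n / 2));
        System.out.println("unsafe count:   " + unsafeEvenCount(n)); // may differ from run to run
        System.out.println("safe count:     " + safeEvens(n).size());
    }
}
```

The safe variant also preserves encounter order, which forEach on a parallel stream does not promise.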
