Class DistributedCacheStream<Original,R>

java.lang.Object
org.infinispan.stream.impl.AbstractCacheStream<Original,R,Stream<R>,CacheStream<R>>
org.infinispan.stream.impl.DistributedCacheStream<Original,R>
Type Parameters:
Original - the original type of the underlying stream - normally CacheEntry or Object
R - The type of the stream
All Implemented Interfaces:
AutoCloseable, BaseStream<R,Stream<R>>, Stream<R>, BaseCacheStream<R,Stream<R>>, CacheStream<R>

public class DistributedCacheStream<Original,R> extends AbstractCacheStream<Original,R,Stream<R>,CacheStream<R>> implements CacheStream<R>
Implementation of CacheStream that provides support for lazily distributing stream methods to the appropriate nodes.
  • Constructor Details

    • DistributedCacheStream

      public DistributedCacheStream(Address localAddress, boolean parallel, InvocationContext ctx, long explicitFlags, int distributedBatchSize, Executor executor, ComponentRegistry registry, Function<? super Original,?> toKeyFunction, ClusterPublisherManager<?,?> clusterPublisherManager)
      Standard constructor requiring all pertinent information to properly utilize a distributed cache stream
      Parameters:
      localAddress - the local address for this node
      parallel - whether or not this stream is parallel
      ctx - the invocation context when this stream is created
      explicitFlags - bitset of the flags explicitly set for these operations (for example, whether or not a cache loader should be utilized)
      distributedBatchSize - default size of distributed batches
      executor - executor to be used for certain operations that require async processing (e.g. iterator)
      registry - component registry to wire objects with
      toKeyFunction - function that converts an object in the stream to its key, or null if the stream elements are already keys. A non-null value also indicates that the underlying stream contains entries.
      clusterPublisherManager - publisher manager
    • DistributedCacheStream

      protected DistributedCacheStream(AbstractCacheStream other)
      This constructor is to be used only when a user calls a map or flat map method changing back to a regular Stream from an IntStream, DoubleStream etc.
      Parameters:
      other - other instance of AbstractCacheStream to copy details from
  • Method Details

    • getLog

      protected org.infinispan.util.logging.Log getLog()
      Specified by:
      getLog in class AbstractCacheStream<Original,R,Stream<R>,CacheStream<R>>
    • unwrap

      protected CacheStream<R> unwrap()
      Specified by:
      unwrap in class AbstractCacheStream<Original,R,Stream<R>,CacheStream<R>>
    • filter

      public CacheStream<R> filter(Predicate<? super R> predicate)
      Description copied from interface: CacheStream
      Specified by:
      filter in interface CacheStream<Original>
      Specified by:
      filter in interface Stream<Original>
      Returns:
      the new cache stream
    • map

      public <R1> CacheStream<R1> map(Function<? super R,? extends R1> mapper)
      Description copied from interface: CacheStream

      Just like in the cache, null values are not supported.

      Specified by:
      map in interface CacheStream<Original>
      Specified by:
      map in interface Stream<Original>
      Returns:
      the new cache stream
    • mapToInt

      public IntCacheStream mapToInt(ToIntFunction<? super R> mapper)
      Description copied from interface: CacheStream
      Specified by:
      mapToInt in interface CacheStream<Original>
      Specified by:
      mapToInt in interface Stream<Original>
      Parameters:
      mapper - a non-interfering, stateless function to apply to each element
      Returns:
      the new int cache stream
    • mapToLong

      public LongCacheStream mapToLong(ToLongFunction<? super R> mapper)
      Description copied from interface: CacheStream
      Specified by:
      mapToLong in interface CacheStream<Original>
      Specified by:
      mapToLong in interface Stream<Original>
      Parameters:
      mapper - a non-interfering, stateless function to apply to each element
      Returns:
      the new long cache stream
    • mapToDouble

      public DoubleCacheStream mapToDouble(ToDoubleFunction<? super R> mapper)
      Description copied from interface: CacheStream
      Specified by:
      mapToDouble in interface CacheStream<Original>
      Specified by:
      mapToDouble in interface Stream<Original>
      Parameters:
      mapper - a non-interfering, stateless function to apply to each element
      Returns:
      the new double cache stream
    • flatMap

      public <R1> CacheStream<R1> flatMap(Function<? super R,? extends Stream<? extends R1>> mapper)
      Description copied from interface: CacheStream
      Specified by:
      flatMap in interface CacheStream<Original>
      Specified by:
      flatMap in interface Stream<Original>
      Returns:
      the new cache stream
    • flatMapToInt

      public IntCacheStream flatMapToInt(Function<? super R,? extends IntStream> mapper)
      Description copied from interface: CacheStream
      Specified by:
      flatMapToInt in interface CacheStream<Original>
      Specified by:
      flatMapToInt in interface Stream<Original>
      Returns:
      the new cache stream
    • flatMapToLong

      public LongCacheStream flatMapToLong(Function<? super R,? extends LongStream> mapper)
      Description copied from interface: CacheStream
      Specified by:
      flatMapToLong in interface CacheStream<Original>
      Specified by:
      flatMapToLong in interface Stream<Original>
      Returns:
      the new cache stream
    • flatMapToDouble

      public DoubleCacheStream flatMapToDouble(Function<? super R,? extends DoubleStream> mapper)
      Description copied from interface: CacheStream
      Specified by:
      flatMapToDouble in interface CacheStream<Original>
      Specified by:
      flatMapToDouble in interface Stream<Original>
      Returns:
      the new cache stream
    • distinct

      public CacheStream<R> distinct()
      Description copied from interface: CacheStream

      This operation will be invoked both remotely and locally when used with a distributed cache backing this stream. This operation will act as an intermediate iterator operation requiring data be brought locally for proper behavior. This is described in more detail in the CacheStream documentation.

      This intermediate iterator operation will be performed locally and remotely, possibly requiring a subset of all elements to be in memory.

      Any subsequent intermediate operations and the terminal operation are then performed locally.
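      The remote-then-local behavior described above can be pictured with a small toy model. This is only a sketch using plain JDK streams: the inner lists stand in for the data held by each node, and the class and method names are hypothetical, not part of Infinispan.

```java
import java.util.List;
import java.util.stream.Collectors;

// Toy model only: plain lists stand in for the data held by each node.
final class TwoStageDistinctSketch {

    // Stage 1: each "remote node" removes its own duplicates before replying.
    static List<String> distinctOnNode(List<String> nodeData) {
        return nodeData.stream().distinct().collect(Collectors.toList());
    }

    // Stage 2: the originator merges the replies and removes duplicates
    // that appeared on more than one node.
    static List<String> distinctAcrossNodes(List<List<String>> nodes) {
        return nodes.stream()
                .map(TwoStageDistinctSketch::distinctOnNode)
                .flatMap(List::stream)
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<String>> nodes = List.of(
                List.of("a", "b", "a"),
                List.of("b", "c", "c"));
        System.out.println(distinctAcrossNodes(nodes)); // prints [a, b, c]
    }
}
```

      The second distinct() is still needed on the originator because the same element may be reported by more than one node.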

      Specified by:
      distinct in interface CacheStream<Original>
      Specified by:
      distinct in interface Stream<Original>
      Returns:
      the new stream
    • sorted

      public CacheStream<R> sorted()
      Description copied from interface: CacheStream

      This operation is performed entirely on the local node irrespective of the backing cache. This operation will act as an intermediate iterator operation requiring data be brought locally for proper behavior. Beware that this requires holding all entries of this cache in memory at one time. This is described in more detail at CacheStream.

      Any subsequent intermediate operations and the terminal operation are also performed locally.

      Specified by:
      sorted in interface CacheStream<Original>
      Specified by:
      sorted in interface Stream<Original>
      Returns:
      the new stream
    • sorted

      public CacheStream<R> sorted(Comparator<? super R> comparator)
      Description copied from interface: CacheStream

      This operation is performed entirely on the local node irrespective of the backing cache. This operation will act as an intermediate iterator operation requiring data be brought locally for proper behavior. Beware that this requires holding all entries of this cache in memory at one time. This is described in more detail at CacheStream.

      Any subsequent intermediate operations and the terminal operation are then performed locally.

      Specified by:
      sorted in interface CacheStream<Original>
      Specified by:
      sorted in interface Stream<Original>
      Parameters:
      comparator - the comparator to be used for sorting the elements
      Returns:
      the new stream
    • peek

      public CacheStream<R> peek(Consumer<? super R> action)
      Description copied from interface: CacheStream
      Specified by:
      peek in interface CacheStream<Original>
      Specified by:
      peek in interface Stream<Original>
      Parameters:
      action - the action to perform on the stream
      Returns:
      the new stream
    • limit

      public CacheStream<R> limit(long maxSize)
      Description copied from interface: CacheStream

      This intermediate operation will be performed both remotely and locally to reduce how many elements are sent back from each node. More specifically this operation is applied remotely on each node to only return up to the maxSize value and then the aggregated results are limited once again on the local node.

      This operation will act as an intermediate iterator operation requiring data be brought locally for proper behavior. This is described in more detail in the CacheStream documentation.

      Any subsequent intermediate operations and the terminal operation are then performed locally.
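      The two applications of the limit (once per remote node, once more on the local node) can be sketched as follows. This is a toy model built on plain JDK streams; the inner lists stand in for nodes, and all names are hypothetical rather than Infinispan API.

```java
import java.util.List;
import java.util.stream.Collectors;

// Toy model only: each inner list stands in for the elements held by one node.
final class TwoStageLimitSketch {

    // Number of elements sent back when every node applies the limit itself.
    static long elementsSent(List<List<Integer>> nodes, long maxSize) {
        return nodes.stream()
                .mapToLong(node -> Math.min(node.size(), maxSize))
                .sum();
    }

    static List<Integer> limitAcrossNodes(List<List<Integer>> nodes, long maxSize) {
        return nodes.stream()
                // "Remote" limit: each node returns at most maxSize elements.
                .flatMap(node -> node.stream().limit(maxSize))
                // "Local" limit: the aggregated results are limited once again.
                .limit(maxSize)
                .collect(Collectors.toList());
    }
}
```

      In this model, two nodes holding four and three elements send back at most four elements in total for a maxSize of 2, instead of all seven.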

      Specified by:
      limit in interface CacheStream<Original>
      Specified by:
      limit in interface Stream<Original>
      Parameters:
      maxSize - how many elements to limit this stream to.
      Returns:
      the new stream
    • skip

      public CacheStream<R> skip(long n)
      Description copied from interface: CacheStream

      This operation is performed entirely on the local node irrespective of the backing cache. This operation will act as an intermediate iterator operation requiring data be brought locally for proper behavior. This is described in more detail in the CacheStream documentation.

      Depending on the terminal operator, this may or may not require all entries, or the subset remaining after skip is applied, to be in memory all at once.

      Any subsequent intermediate operations and the terminal operation are then performed locally.

      Specified by:
      skip in interface CacheStream<Original>
      Specified by:
      skip in interface Stream<Original>
      Parameters:
      n - how many elements to skip from this stream
      Returns:
      the new stream
    • reduce

      public R reduce(R identity, BinaryOperator<R> accumulator)
      Specified by:
      reduce in interface Stream<Original>
    • reduce

      public Optional<R> reduce(BinaryOperator<R> accumulator)
      Specified by:
      reduce in interface Stream<Original>
    • reduce

      public <U> U reduce(U identity, BiFunction<U,? super R,U> accumulator, BinaryOperator<U> combiner)
      Specified by:
      reduce in interface Stream<Original>
    • collect

      public <R1> R1 collect(Supplier<R1> supplier, BiConsumer<R1,? super R> accumulator, BiConsumer<R1,R1> combiner)

      Note: The accumulator and combiner are applied on each node until all the local stream's values are reduced into a single object. Because of marshalling limitations, the final result of the collector on remote nodes is limited to a size of 2GB. If you need to process more than 2GB of data, you must force the collector to run on the originator with CacheStream.spliterator():

       StreamSupport.stream(stream.filter(entry -> ...)
                                  .map(entry -> ...)
                                  .spliterator(),
                            false)
                    .collect(Collectors.toList());
       

      Note: this method doesn't pay attention to ordering constraints and any sorting performed on the stream will be ignored by this terminal operator. If you wish to have an ordered collector use the collect(Collector) method making sure the Collector.Characteristics.UNORDERED property is not set.
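      As a rough illustration of the first note, the sketch below reduces each node's values into a single container and then merges those containers on the originator. It is a toy model with plain JDK types; the helper name collectAcrossNodes and the list-of-lists representation of nodes are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Supplier;

// Toy model only: each inner list stands in for the values local to one node.
final class PerNodeCollectSketch {

    static <T, C> C collectAcrossNodes(List<List<T>> nodes,
                                       Supplier<C> supplier,
                                       BiConsumer<C, ? super T> accumulator,
                                       BiConsumer<C, C> combiner) {
        C result = supplier.get();
        for (List<T> nodeData : nodes) {
            // On each node: reduce the local values into a single container.
            C partial = nodeData.stream().collect(supplier, accumulator, combiner);
            // On the originator: merge the one container sent back by that node.
            combiner.accept(result, partial);
        }
        return result;
    }

    static List<String> demo() {
        Supplier<List<String>> supplier = ArrayList::new;
        BiConsumer<List<String>, String> accumulator = List::add;
        BiConsumer<List<String>, List<String>> combiner = List::addAll;
        List<List<String>> nodes = List.of(List.of("a", "b"), List.of("c"));
        return collectAcrossNodes(nodes, supplier, accumulator, combiner);
    }
}
```

      Only the single container per node crosses the "network" in this model, which is why the size of that container is what the 2GB limit applies to.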
      Specified by:
      collect in interface CacheStream<Original>
      Specified by:
      collect in interface Stream<Original>
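      Type Parameters:
      R1 - type of the result container
      Parameters:
      supplier - function that creates a new result container
      accumulator - function that folds an element into a result container
      combiner - function that merges the second result container into the first
      Returns:
      the result of the reduction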
    • collect

      public <R1, A> R1 collect(Collector<? super R,A,R1> collector)
      Description copied from interface: CacheStream

      Note that when using a distributed backing cache for this stream the collector must be marshallable. This prevents direct usage of the Collectors class. However, you can use the CacheCollectors static factory methods to create a serializable wrapper, which then creates the actual collector lazily after being deserialized. This allows you to use any method from the Collectors class as you would normally. Alternatively, you can call CacheStream.collect(SerializableSupplier).
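      The idea of shipping a serializable supplier and creating the collector only after deserialization can be sketched with JDK serialization alone. The SerializableSupplier interface below is a hypothetical local stand-in declared for this example, and the round trip through a byte array merely imitates marshalling to another node; it is not how Infinispan marshals objects.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collector;
import java.util.stream.Collectors;

final class SerializableCollectorSketch {

    // Hypothetical local interface, declared only for this sketch.
    interface SerializableSupplier<T> extends Supplier<T>, Serializable {}

    // Serializes and deserializes a value, imitating a trip to another node.
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    static String joinAfterRoundTrip(List<String> values) {
        // Only the supplier is serialized; the Collector itself never is.
        SerializableSupplier<Collector<CharSequence, ?, String>> supplier =
                () -> Collectors.joining(",");
        SerializableSupplier<Collector<CharSequence, ?, String>> received =
                roundTrip(supplier);
        // The actual collector is created lazily, after deserialization.
        return values.stream().collect(received.get());
    }
}
```

      Collectors.joining(",") is used here only as an example of a collector that is not itself serializable.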

      Note: The collector is applied on each node until all the local stream's values are reduced into a single object. Because of marshalling limitations, the final result of the collector on remote nodes is limited to a size of 2GB. If you need to process more than 2GB of data, you must force the collector to run on the originator with CacheStream.spliterator():

       StreamSupport.stream(stream.filter(entry -> ...)
                                  .map(entry -> ...)
                                  .spliterator(),
                            false)
                    .collect(Collectors.toList());
       

      Specified by:
      collect in interface CacheStream<Original>
      Specified by:
      collect in interface Stream<Original>
      Type Parameters:
      R1 - collected type
      A - intermediate collected type if applicable
      Returns:
      the collected value
    • min

      public Optional<R> min(Comparator<? super R> comparator)
      Specified by:
      min in interface Stream<Original>
    • max

      public Optional<R> max(Comparator<? super R> comparator)
      Specified by:
      max in interface Stream<Original>
    • anyMatch

      public boolean anyMatch(Predicate<? super R> predicate)
      Specified by:
      anyMatch in interface Stream<Original>
    • allMatch

      public boolean allMatch(Predicate<? super R> predicate)
      Specified by:
      allMatch in interface Stream<Original>
    • noneMatch

      public boolean noneMatch(Predicate<? super R> predicate)
      Specified by:
      noneMatch in interface Stream<Original>
    • findFirst

      public Optional<R> findFirst()
      Specified by:
      findFirst in interface Stream<Original>
    • findAny

      public Optional<R> findAny()
      Specified by:
      findAny in interface Stream<Original>
    • count

      public long count()
      Specified by:
      count in interface Stream<Original>
    • iterator

      public Iterator<R> iterator()
      Description copied from interface: CacheStream

      Usage of this operator requires closing this stream after you are done with the iterator. The preferred usage is a try-with-resources block on the stream.

      This method has special usage with the BaseCacheStream.SegmentCompletionListener in that as entries are retrieved from the next method it will complete segments.

      This method obeys the CacheStream.distributedBatchSize(int). Note that when using methods such as CacheStream.flatMap(Function), more than one element may be mapped to a given key, so the batch size does not guarantee how many elements are returned per batch.

      Note that the Iterator.remove() method is only supported if no intermediate operations have been applied to the stream and this is not a stream created from a Cache.values() collection.
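      The close-after-iterating pattern recommended above looks roughly like this. The sketch uses a plain java.util.stream.Stream so that it runs without a cache; since CacheStream is also AutoCloseable, the same try-with-resources shape should apply to it. The onClose handler is only there to make the close visible.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Stream;

final class CloseAfterIteratorSketch {

    static List<String> drainAndClose(Stream<String> stream, List<String> log) {
        List<String> seen = new ArrayList<>();
        // try-with-resources closes the stream even if iteration throws.
        try (Stream<String> s = stream.onClose(() -> log.add("closed"))) {
            Iterator<String> it = s.iterator();
            while (it.hasNext()) {
                seen.add(it.next());
            }
        }
        return seen;
    }

    static String demo() {
        List<String> log = new ArrayList<>();
        List<String> seen = drainAndClose(Stream.of("a", "b"), log);
        return seen + " " + log;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [a, b] [closed]
    }
}
```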

      Specified by:
      iterator in interface BaseStream<Original,R>
      Specified by:
      iterator in interface CacheStream<Original>
      Returns:
      the element iterator for this stream
    • spliterator

      public Spliterator<R> spliterator()
      Description copied from interface: CacheStream

      Usage of this operator requires closing this stream after you are done with the spliterator. The preferred usage is a try-with-resources block on the stream.

      Specified by:
      spliterator in interface BaseStream<Original,R>
      Specified by:
      spliterator in interface CacheStream<Original>
      Returns:
      the element spliterator for this stream
    • forEach

      public void forEach(Consumer<? super R> action)
      Description copied from interface: CacheStream

      This operation is performed remotely on the node that is the primary owner for the key tied to each entry in this stream.

      NOTE: This method while being rehash aware has the lowest consistency of all of the operators. This operation will be performed on every entry at least once in the cluster, as long as the originator doesn't go down while it is being performed. This is due to how the distributed action is performed. Essentially the CacheStream.distributedBatchSize(int) value controls how many elements are processed per node at a time when rehash is enabled. After those are complete the keys are sent to the originator to confirm that those were processed. If that node goes down during/before the response those keys will be processed a second time.
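      The at-least-once behavior described in this note can be imitated with a very small model. Everything below is an assumption made for illustration: keys are processed in fixed-size batches, a batch counts as done only once its confirmation reaches the originator, and a lost confirmation causes the whole batch to be processed again. The real implementation is more involved.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only; failBatch is the index of the batch whose confirmation
// is lost (-1 for none).
final class AtLeastOnceSketch {

    static Map<String, Integer> run(List<String> keys, int batchSize, int failBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        Map<String, Integer> timesProcessed = new HashMap<>();
        int batchIndex = 0;
        for (int start = 0; start < keys.size(); start += batchSize, batchIndex++) {
            int end = Math.min(start + batchSize, keys.size());
            List<String> batch = keys.subList(start, end);
            // The owning node runs the action on every key of the batch.
            batch.forEach(k -> timesProcessed.merge(k, 1, Integer::sum));
            if (batchIndex == failBatch) {
                // The confirmation never arrived, so the batch is processed again.
                batch.forEach(k -> timesProcessed.merge(k, 1, Integer::sum));
            }
        }
        return timesProcessed;
    }
}
```

      With keys k1 to k4, a batch size of 2 and the second confirmation lost, k1 and k2 are processed once while k3 and k4 are processed twice. An action that tolerates being applied twice to the same entry is therefore the safer choice.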

      It is possible to have the cache local to each node injected into this instance if the provided Consumer also implements the CacheAware interface. The injection method will be invoked before the consumer's accept() method is invoked.

      This method is run distributed by default with a distributed backing cache. However, if you wish for this operation to run locally, you can use stream().iterator().forEachRemaining(action) for a single-threaded variant. If you wish to have a parallel variant, you can use StreamSupport.stream(Spliterator, boolean), passing in the spliterator from the stream. In either case, remember that you must close the stream after you are done processing the iterator or spliterator.

      Specified by:
      forEach in interface CacheStream<Original>
      Specified by:
      forEach in interface Stream<Original>
      Parameters:
      action - consumer to be run for each element in the stream
    • forEach

      public <K, V> void forEach(BiConsumer<Cache<K,V>,? super R> action)
      Description copied from interface: CacheStream
      Same as CacheStream.forEach(Consumer) except that it takes a BiConsumer that provides access to the underlying Cache that is backing this stream.

      Note that the CacheAware interface is not supported for injection using this method as the cache is provided in the consumer directly.

      Specified by:
      forEach in interface CacheStream<Original>
      Type Parameters:
      K - key type of the cache
      V - value type of the cache
      Parameters:
      action - consumer to be run for each element in the stream
    • forEachOrdered

      public void forEachOrdered(Consumer<? super R> action)
      Specified by:
      forEachOrdered in interface Stream<Original>
    • toArray

      public Object[] toArray()
      Specified by:
      toArray in interface Stream<Original>
    • toArray

      public <A> A[] toArray(IntFunction<A[]> generator)
      Specified by:
      toArray in interface Stream<Original>
    • sequentialDistribution

      public CacheStream<R> sequentialDistribution()
      Description copied from interface: CacheStream
      Disables sending requests to all remote nodes at once; requests are sent to one node at a time instead. This can reduce memory pressure on the originator node at the cost of performance.

      Parallel distribution is enabled by default except for CacheStream.iterator() and CacheStream.spliterator().

      Specified by:
      sequentialDistribution in interface BaseCacheStream<Original,R>
      Specified by:
      sequentialDistribution in interface CacheStream<Original>
      Returns:
      a stream with parallel distribution disabled.
    • parallelDistribution

      public CacheStream<R> parallelDistribution()
      Description copied from interface: BaseCacheStream
      Enables sending requests to all remote nodes at once when a terminal operator is performed. This adds overhead, as results from the various nodes must be processed concurrently, but it should perform faster in the majority of cases.

      Parallel distribution is enabled by default except for CacheStream.iterator() and CacheStream.spliterator().

      Specified by:
      parallelDistribution in interface BaseCacheStream<Original,R>
      Specified by:
      parallelDistribution in interface CacheStream<Original>
      Returns:
      a stream with parallel distribution enabled.
    • filterKeySegments

      public CacheStream<R> filterKeySegments(Set<Integer> segments)
      Description copied from interface: CacheStream
      Filters which entries are returned by what segment they are present in. This method can be substantially more efficient than using a regular CacheStream.filter(Predicate) method as this can control what nodes are asked for data and what entries are read from the underlying CacheStore if present.
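      Why segment filtering can limit which nodes are asked for data is easier to see in a toy model. The mapping from key to segment, the segment count and the ownership table below are all hypothetical; Infinispan uses its own key partitioning and consistent hash.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Toy model only: none of these names are Infinispan API.
final class SegmentFilterSketch {

    static final int NUM_SEGMENTS = 4; // hypothetical segment count

    // Hypothetical key-to-segment mapping based on String.hashCode().
    static int segmentOf(String key) {
        return Math.floorMod(key.hashCode(), NUM_SEGMENTS);
    }

    // Only nodes owning at least one wanted segment need to be contacted.
    static Set<String> nodesToAsk(Map<String, Set<Integer>> ownedSegments,
                                  Set<Integer> wanted) {
        return ownedSegments.entrySet().stream()
                .filter(e -> e.getValue().stream().anyMatch(wanted::contains))
                .map(Map.Entry::getKey)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    // Keys outside the wanted segments are skipped without being read further.
    static List<String> keysInSegments(List<String> keys, Set<Integer> wanted) {
        return keys.stream()
                .filter(k -> wanted.contains(segmentOf(k)))
                .collect(Collectors.toList());
    }
}
```

      A plain filter(Predicate) would instead have to visit every entry on every node before discarding the ones that do not match.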
      Specified by:
      filterKeySegments in interface BaseCacheStream<Original,R>
      Specified by:
      filterKeySegments in interface CacheStream<Original>
      Parameters:
      segments - The segments to use for this stream operation. Any segments not in this set will be ignored.
      Returns:
      a stream with the segments filtered.
    • filterKeySegments

      public CacheStream<R> filterKeySegments(IntSet segments)
      Description copied from interface: CacheStream
      Filters which entries are returned by what segment they are present in. This method can be substantially more efficient than using a regular CacheStream.filter(Predicate) method as this can control what nodes are asked for data and what entries are read from the underlying CacheStore if present.
      Specified by:
      filterKeySegments in interface BaseCacheStream<Original,R>
      Specified by:
      filterKeySegments in interface CacheStream<Original>
      Parameters:
      segments - The segments to use for this stream operation. Any segments not in this set will be ignored.
      Returns:
      a stream with the segments filtered.
    • filterKeys

      public CacheStream<R> filterKeys(Set<?> keys)
      Description copied from interface: CacheStream
      Filters which entries are returned by only returning ones that map to the given keys. This method will be faster than a regular CacheStream.filter(Predicate) if the filter is holding references to the same keys.
      Specified by:
      filterKeys in interface BaseCacheStream<Original,R>
      Specified by:
      filterKeys in interface CacheStream<Original>
      Parameters:
      keys - The keys that this stream will only operate on.
      Returns:
      a stream with the keys filtered.
    • distributedBatchSize

      public CacheStream<R> distributedBatchSize(int batchSize)
      Description copied from interface: CacheStream
      Controls how many keys are returned from a remote node when using a stream terminal operation with a distributed cache to back this stream. This value is ignored when terminal operators that don't track keys are used. Key tracking terminal operators are CacheStream.iterator(), CacheStream.spliterator(), CacheStream.forEach(Consumer). Please see those methods for additional information on how this value may affect them.

      This value may be used in the case of a terminal operator that doesn't track keys if an intermediate operation is performed that requires bringing keys locally to do computations. Examples of such intermediate operations are CacheStream.sorted(), CacheStream.sorted(Comparator), CacheStream.distinct(), CacheStream.limit(long), CacheStream.skip(long).

      This value is always ignored when this stream is backed by a cache that is not distributed as all values are already local.
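      The effect of the batch size on a key-tracking operation can be sketched by splitting the keys held by one "remote node" into the batches in which they would be returned. This is a toy model with hypothetical names, not the actual transport logic.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: remoteKeys stands in for the keys held by one remote node.
final class BatchFetchSketch {

    static List<List<String>> batches(List<String> remoteKeys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> result = new ArrayList<>();
        for (int start = 0; start < remoteKeys.size(); start += batchSize) {
            int end = Math.min(start + batchSize, remoteKeys.size());
            // Each batch corresponds to one response from the remote node.
            result.add(new ArrayList<>(remoteKeys.subList(start, end)));
        }
        return result;
    }
}
```

      In this model a smaller batch size means fewer keys held by the originator at once but more responses per node; five keys with a batch size of 2 arrive in three responses.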

      Specified by:
      distributedBatchSize in interface BaseCacheStream<Original,R>
      Specified by:
      distributedBatchSize in interface CacheStream<Original>
      Parameters:
      batchSize - The size of each batch. This defaults to the state transfer chunk size.
      Returns:
      a stream with the batch size updated
    • segmentCompletionListener

      public CacheStream<R> segmentCompletionListener(BaseCacheStream.SegmentCompletionListener listener)
      Description copied from interface: CacheStream
      Allows registration of a segment completion listener that is notified when a segment has completed processing. If the terminal operator has a short circuit this listener may never be called.

      This method is designed for the sole purpose of use with the CacheStream.iterator() to allow for a user to track completion of segments as they are returned from the iterator. Behavior of other methods is not specified. Please see CacheStream.iterator() for more information.

      Multiple listeners may be registered upon multiple invocations of this method. The ordering of notified listeners is not specified.

      The listener is only used if this stream has not invoked BaseCacheStream.disableRehashAware() and has no flat map based operations. Otherwise, no segments will be notified.

      Specified by:
      segmentCompletionListener in interface BaseCacheStream<Original,R>
      Specified by:
      segmentCompletionListener in interface CacheStream<Original>
      Parameters:
      listener - The listener that will be called back as segments are completed.
      Returns:
      a stream with the listener registered.
    • disableRehashAware

      public CacheStream<R> disableRehashAware()
      Description copied from interface: CacheStream
      Disables tracking of rehash events that could occur to the underlying cache. If a rehash event occurs while a terminal operation is being performed it is possible for some values that are in the cache to not be found. Note that you will never have an entry duplicated when rehash awareness is disabled, only lost values.

      Most terminal operations will run faster with rehash awareness disabled even without a rehash occurring. However, if a rehash occurs with this disabled, be prepared to possibly receive only a subset of values.

      Specified by:
      disableRehashAware in interface BaseCacheStream<Original,R>
      Specified by:
      disableRehashAware in interface CacheStream<Original>
      Returns:
      a stream with rehash awareness disabled.
    • timeout

      public CacheStream<R> timeout(long timeout, TimeUnit unit)
      Description copied from interface: CacheStream
      Sets a given time to wait for a remote operation to respond by. This timeout does nothing if the terminal operation does not go remote.

      If a timeout does occur then a TimeoutException is thrown from the terminal operation invoking thread or on the next call to the Iterator or Spliterator.

      Note that if a rehash occurs this timeout value is reset for the subsequent retry if rehash aware is enabled.

      Specified by:
      timeout in interface BaseCacheStream<Original,R>
      Specified by:
      timeout in interface CacheStream<Original>
      Parameters:
      timeout - the maximum time to wait
      unit - the time unit of the timeout argument
      Returns:
      a stream with the timeout set
    • intCacheStream

      protected DistributedIntCacheStream intCacheStream()
    • doubleCacheStream

      protected DistributedDoubleCacheStream doubleCacheStream()
    • longCacheStream

      protected DistributedLongCacheStream longCacheStream()