Google? Evil? You have no idea

Sky-Tiger · posted on 2014-5-3 23:43

A common day-to-day operation on a data set is grouping its items based on the values of one or more of their properties. As you saw in the earlier transaction currency-grouping example, this operation can be cumbersome, verbose, and error prone when implemented in an imperative style, but it can be easily translated into a single, very readable statement by rewriting it in the more functional style encouraged by Java 8. To give a second example of how this feature works, suppose you want to classify the dishes in the menu according to their type, putting the ones containing meat in one group, the ones with fish in another, and all others in a third. You can easily perform this task using a Collector created with the Collectors.groupingBy factory method as follows:
Map<Dish.Type, List<Dish>> dishesByType = menu.stream().collect(groupingBy(Dish::getType));
This will result in the following Map:
{FISH=[prawns, salmon], OTHER=[french fries, rice, season fruit, pizza],
MEAT=[pork, beef, chicken]}
Here, you pass to the groupingBy method a Function (expressed as a method reference) that extracts the corresponding Dish.Type from each Dish in the Stream. We call this Function a classification function because it’s used to classify the elements of the Stream into different groups. More generally, you create a Collector by passing to the groupingBy method a classification Function that maps each item in the Stream to the key under which that item will be classified. The result of this grouping operation, shown in figure 5.4, is a Map whose keys are the values returned by the classification Function and whose values are Lists of all the items in the Stream for which the classification Function returns that key. In the menu-classification example a key is the type of dish, and its value is a List containing all the dishes of that type.
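To try the grouping above end to end, here is a minimal self-contained sketch. The Dish class below is a hypothetical stand-in (the quoted text does not show it), and the calorie numbers in the sample menu are assumed values used only to build the objects.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-in for the Dish class used in the text; only the
// members needed for this example are modelled.
class Dish {
    enum Type { MEAT, FISH, OTHER }

    private final String name;
    private final int calories;
    private final Type type;

    Dish(String name, int calories, Type type) {
        this.name = name;
        this.calories = calories;
        this.type = type;
    }

    String getName() { return name; }
    int getCalories() { return calories; }
    Type getType() { return type; }

    @Override
    public String toString() { return name; }
}

class GroupingByTypeSketch {
    // Assumed sample data; the calorie values are illustrative only.
    static List<Dish> sampleMenu() {
        return Arrays.asList(
            new Dish("pork", 800, Dish.Type.MEAT),
            new Dish("beef", 700, Dish.Type.MEAT),
            new Dish("chicken", 400, Dish.Type.MEAT),
            new Dish("french fries", 530, Dish.Type.OTHER),
            new Dish("rice", 350, Dish.Type.OTHER),
            new Dish("season fruit", 120, Dish.Type.OTHER),
            new Dish("pizza", 550, Dish.Type.OTHER),
            new Dish("prawns", 300, Dish.Type.FISH),
            new Dish("salmon", 450, Dish.Type.FISH));
    }

    // Dish::getType is the classification function: its return value
    // becomes the key under which each dish is stored.
    static Map<Dish.Type, List<Dish>> groupByType(List<Dish> menu) {
        return menu.stream().collect(Collectors.groupingBy(Dish::getType));
    }

    public static void main(String[] args) {
        // The order in which the keys are printed is not guaranteed.
        System.out.println(groupByType(sampleMenu()));
    }
}
```

Within each group the dishes keep the order they had in the source list, because the stream here is sequential and ordered.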

Sky-Tiger · posted on 2014-5-3 23:49

But it isn’t always possible to use a method reference as a classification Function, because the classification could be more complex than a simple property accessor. For instance, you could decide to classify as “diet” all dishes with 400 calories or fewer, as “normal” the dishes with more than 400 and up to 700 calories, and as “fat” the ones with more than 700 calories. Since the author of the Dish class unhelpfully didn’t provide such an operation as a method, you can’t use a method reference in this case, but you can express this logic in a lambda expression:
public enum CaloricLevel { DIET, NORMAL, FAT }
Map<CaloricLevel, List<Dish>> dishesByCaloricLevel = menu.stream().collect(
    groupingBy(dish -> {
        if (dish.getCalories() <= 400) return CaloricLevel.DIET;
        else if (dish.getCalories() <= 700) return CaloricLevel.NORMAL;
        else return CaloricLevel.FAT;
    }));
So now you’ve seen how to group the dishes in the menu, both by their type and by calories, but what if you want to use both criteria at the same time?
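As a rough, runnable sketch of the calorie-based grouping: the Dish class and the sample menu are again hypothetical stand-ins, and levelOf is a helper name introduced here to hold the lambda's logic. The last method shows one possible way to apply both criteria at once, by passing a second groupingBy collector as the downstream of the first; treat it as an assumption about how the question could be answered, not as the quoted text's own solution.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-in for the Dish class from the text.
class Dish {
    enum Type { MEAT, FISH, OTHER }

    private final String name;
    private final int calories;
    private final Type type;

    Dish(String name, int calories, Type type) {
        this.name = name;
        this.calories = calories;
        this.type = type;
    }

    int getCalories() { return calories; }
    Type getType() { return type; }

    @Override
    public String toString() { return name; }
}

enum CaloricLevel { DIET, NORMAL, FAT }

class CaloricGroupingSketch {
    // Assumed sample data; the calorie values are illustrative only.
    static List<Dish> sampleMenu() {
        return Arrays.asList(
            new Dish("pork", 800, Dish.Type.MEAT),
            new Dish("beef", 700, Dish.Type.MEAT),
            new Dish("chicken", 400, Dish.Type.MEAT),
            new Dish("french fries", 530, Dish.Type.OTHER),
            new Dish("rice", 350, Dish.Type.OTHER),
            new Dish("season fruit", 120, Dish.Type.OTHER),
            new Dish("pizza", 550, Dish.Type.OTHER),
            new Dish("prawns", 300, Dish.Type.FISH),
            new Dish("salmon", 450, Dish.Type.FISH));
    }

    // Same thresholds as the lambda in the text (helper name is ours).
    static CaloricLevel levelOf(Dish dish) {
        if (dish.getCalories() <= 400) return CaloricLevel.DIET;
        else if (dish.getCalories() <= 700) return CaloricLevel.NORMAL;
        else return CaloricLevel.FAT;
    }

    // Grouping with a lambda as the classification function.
    static Map<CaloricLevel, List<Dish>> groupByCaloricLevel(List<Dish> menu) {
        return menu.stream().collect(Collectors.groupingBy(dish -> levelOf(dish)));
    }

    // One possible two-level grouping: first by type, then by caloric level.
    static Map<Dish.Type, Map<CaloricLevel, List<Dish>>> groupByTypeThenLevel(List<Dish> menu) {
        return menu.stream().collect(
            Collectors.groupingBy(Dish::getType,
                Collectors.groupingBy(CaloricGroupingSketch::levelOf)));
    }

    public static void main(String[] args) {
        System.out.println(groupByCaloricLevel(sampleMenu()));
        System.out.println(groupByTypeThenLevel(sampleMenu()));
    }
}
```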

Sky-Tiger · posted on 2014-5-5 17:06

In general it’s impossible (and pointless) to try to give any quantitative hint on when to use a parallel Stream because any suggestion like “use a parallel Stream only if you have at least 1 thousand (or 1 million or whatever number you want) elements” could be correct for a specific operation running on a specific machine, but it could be completely wrong in an even marginally different context. But it’s at least possible to provide some qualitative advice that could be useful when deciding if it makes sense to use a parallel Stream in a certain situation:
If in doubt, measure. Turning a sequential Stream into a parallel one is trivial but not always the right thing to do. As we already demonstrated in this section, a parallel Stream isn’t always faster than the corresponding sequential version. Moreover, parallel Streams can sometimes work in a counterintuitive way, so the first and most important suggestion when choosing between sequential and parallel Streams is to always check their performance with an appropriate benchmark.
Watch out for boxing. Automatic boxing and unboxing operations can dramatically hurt performance. Primitive Streams have been included for this reason, and the performance benefit of using them wherever possible can often outweigh the advantage gained from parallel Streams.
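The two pieces of advice above can be tried out with a small sketch like the following. It sums the first n natural numbers three ways: with a boxed Stream<Long>, with a primitive LongStream, and with a parallel LongStream. The class and method names are made up for this example, and the timing helper is deliberately crude (single run, no warm-up), so treat its numbers as a hint only; a proper benchmark harness would be needed for reliable measurements.

```java
import java.util.function.LongUnaryOperator;
import java.util.stream.LongStream;
import java.util.stream.Stream;

class BoxingSketch {
    // Boxed version: Stream.iterate produces Long objects, so every
    // addition has to unbox its operands and box the result again.
    static long boxedSum(long n) {
        return Stream.iterate(1L, i -> i + 1)
                     .limit(n)
                     .reduce(0L, Long::sum);
    }

    // Primitive version: LongStream works on long values directly.
    static long primitiveSum(long n) {
        return LongStream.rangeClosed(1, n).sum();
    }

    // Same primitive pipeline, executed in parallel.
    static long primitiveParallelSum(long n) {
        return LongStream.rangeClosed(1, n).parallel().sum();
    }

    // Crude single-shot timing in milliseconds; not a real benchmark.
    static long timeMillis(LongUnaryOperator summer, long n) {
        long start = System.nanoTime();
        summer.applyAsLong(n);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        long n = 10_000_000L;
        System.out.println("boxed:              " + timeMillis(BoxingSketch::boxedSum, n) + " ms");
        System.out.println("primitive:          " + timeMillis(BoxingSketch::primitiveSum, n) + " ms");
        System.out.println("primitive parallel: " + timeMillis(BoxingSketch::primitiveParallelSum, n) + " ms");
    }
}
```

All three methods return the same sum; only the cost of getting there differs, and by how much depends on the machine.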

Sky-Tiger · posted on 2014-5-5 17:06

Some operations naturally perform worse on a parallel Stream than on a sequential Stream. In particular, operations such as limit and findFirst that rely on the order of the elements are expensive in a parallel Stream. For example, findAny will perform better than findFirst because it isn’t constrained to operate in the encounter order. You can always turn an ordered Stream into an unordered Stream by invoking the method unordered() on it. So, for instance, if you need N elements of your Stream and you’re not necessarily interested in the first N ones, calling limit on an unordered parallel Stream may execute more efficiently than on a Stream with an encounter order (e.g. when the source is a List).
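A small sketch of the difference, using made-up class and method names: findFirst must respect the encounter order of the List, findAny may return whichever match is found first, and unordered() followed by limit asks for any N matches rather than the first N.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class OrderingSketch {
    // Must return the first even number in encounter order, even in parallel.
    static Optional<Integer> firstEven(List<Integer> numbers) {
        return numbers.parallelStream()
                      .filter(n -> n % 2 == 0)
                      .findFirst();
    }

    // May return any even number; which one is not specified.
    static Optional<Integer> anyEven(List<Integer> numbers) {
        return numbers.parallelStream()
                      .filter(n -> n % 2 == 0)
                      .findAny();
    }

    // Any 'count' even numbers, not necessarily the first ones.
    static List<Integer> someEvens(List<Integer> numbers, int count) {
        return numbers.parallelStream()
                      .unordered()
                      .filter(n -> n % 2 == 0)
                      .limit(count)
                      .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = IntStream.rangeClosed(1, 1000)
                                         .boxed()
                                         .collect(Collectors.toList());
        System.out.println("findFirst: " + firstEven(numbers));
        System.out.println("findAny:   " + anyEven(numbers));
        System.out.println("unordered limit: " + someEvens(numbers, 5));
    }
}
```

Only the findFirst result is predictable; the other two can differ from run to run.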

Sky-Tiger · posted on 2014-5-5 17:06

Consider the total computational cost of the pipeline of operations performed by the Stream. With N being the number of elements to be processed and Q the approximate cost of processing one of these elements through the Stream pipeline, the product N*Q gives a rough qualitative estimate of this cost. A higher value for this cost implies a better chance of good performance when using a parallel Stream.

Sky-Tiger · posted on 2014-5-5 17:07

For a small amount of data, choosing a parallel Stream is almost never a winning decision. The advantage of processing only a few elements in parallel isn’t enough to compensate for the additional cost introduced by the parallelization process.

Sky-Tiger · posted on 2014-5-5 17:07

Take into account how well the data structure underlying the Stream decomposes. For instance, an ArrayList can be split much more efficiently than a LinkedList, because the first can be evenly divided without traversing it, whereas the second has to be traversed in order to be split. Also, the primitive Streams created with the range() factory method can be decomposed very quickly. Finally, as you’ll learn in section 6.3, you can get full control of this decomposition process by implementing your own Spliterator.
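One way to get a feel for how a source decomposes is to ask its Spliterator to split once and look at the sizes of the two parts. The sketch below does this for an ArrayList, a LinkedList, and an IntStream.range source. Class and method names are made up for the example, and the exact sizes printed (the LinkedList ones in particular) are an implementation detail of the JDK in use, not something to rely on.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.IntStream;

class DecompositionSketch {
    // Fills the given list with 0..n-1 and returns it.
    static List<Integer> fill(List<Integer> target, int n) {
        for (int i = 0; i < n; i++) {
            target.add(i);
        }
        return target;
    }

    // Splits the source once and returns
    // { size of the split-off part, size of what remains in the source }.
    static long[] splitSizes(Spliterator<?> source) {
        Spliterator<?> prefix = source.trySplit(); // null if the source won't split
        long prefixSize = (prefix == null) ? 0 : prefix.estimateSize();
        return new long[] { prefixSize, source.estimateSize() };
    }

    public static void main(String[] args) {
        int n = 1000;
        long[] array = splitSizes(fill(new ArrayList<>(), n).spliterator());
        long[] linked = splitSizes(fill(new LinkedList<>(), n).spliterator());
        long[] range = splitSizes(IntStream.range(0, n).spliterator());

        System.out.println("ArrayList:  " + array[0] + " / " + array[1]);
        System.out.println("LinkedList: " + linked[0] + " / " + linked[1]);
        System.out.println("range:      " + range[0] + " / " + range[1]);
    }
}
```

A balanced pair of sizes suggests that the work can be divided evenly among threads; a lopsided pair suggests the opposite.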

Sky-Tiger · posted on 2014-5-5 17:14

The characteristics of a Stream, and how the intermediate operations through the pipeline modify them, can change the performance of the decomposition process. For example, a SIZED Stream can be divided into two equal parts, and then each part can be processed in parallel more effectively, but a filter operation can throw away an unpredictable number of elements, making the size of the Stream itself unknown.
Consider whether a terminal operation has a cheap or expensive merge step. In the second case, the cost caused by the re-aggregation of the partial results generated by each substream can negatively affect the performance of a parallel Stream.
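The effect of filter on the SIZED characteristic mentioned above can be observed directly by asking a Stream for its Spliterator. This is only a sketch for inspection (class and method names are made up here): calling spliterator() is itself a terminal operation, so the Stream can't be used afterwards.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Stream;

class SizedSketch {
    // Reports whether the pipeline still knows its exact size.
    // Note: this consumes the stream.
    static boolean isSized(Stream<?> stream) {
        return stream.spliterator().hasCharacteristics(Spliterator.SIZED);
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6);

        // The List source knows its size.
        System.out.println("source: " + isSized(numbers.stream()));

        // map transforms elements one to one, so the size is still known.
        System.out.println("map:    " + isSized(numbers.stream().map(n -> n * 2)));

        // filter may drop any number of elements, so the size becomes unknown.
        System.out.println("filter: " + isSized(numbers.stream().filter(n -> n > 2)));
    }
}
```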

