Extends the the Java™ Collections Framework by providing type-specific maps, sets, lists and priority queues with a small memory footprint and fast access and insertion; provides also big (64-bit) arrays, sets and lists, and fast, practical I/O classes for binary and text files. It is free software distributed under the Apache License 2.0.

Package Specification

fastutil is formed by three cores:

  • type-specific classes that extend naturally the Java™ Collections Framework;
  • classes that support very large collections;
  • classes for fast and practical access to binary and text files.

The three cores are briefly introduced in the next sections, and then discussed at length in the rest of this overview.

The following features were added in version 8.5.0:

  • type-specific spliterators;
  • primitive stream methods;
  • even more default methods overridden for performance or avoiding boxing/unboxing.

We strongly suggest to activate deprecation warnings in your development environment, as fastutil systematically deprecates all JDK methods for which there is a type-specific alternative.

Type-specific classes

fastutil specializes the most useful HashSet, HashMap, LinkedHashSet, LinkedHashMap, TreeSet, TreeMap, IdentityHashMap, ArrayList and Stack classes to versions that accept a specific kind of key or value (e.g., integers). Besides, there are also several types of priority queues and a large collection of static objects and methods (such as immutable empty containers, comparators implementing the opposite of the natural order, iterators obtained by wrapping an array and so on).

To understand what's going on at a glance, the best thing is to look at the examples provided below. If you already used the Collections Framework, everything should look rather natural. If, in particular, you use an IDE that can suggest you the method names, all you need to know is the right name for the class you need.

Support for very large collections

A set of classes makes it possible to handle very large collections: in particular, collections whose size exceeds 231. Big arrays are arrays-of-arrays handled by a wealth of static methods that act on them as if they were monodimensional arrays with 64-bit indices; big lists provide 64-bit list access; big hash sets provide support for sets whose size is only limited by the amount of core memory.

The usual methods from Arrays and similar classes have been extended to big arrays: have a look at the Javadoc documentation of BigArrays and IntBigArrays to get an idea of the generic and type-specific methods available.

Limited support is available for big atomic arrays of integers and longs.

Fast and practical I/O

fastutil provides replacements for some standard classes of java.io that are plagued by a number of problems (see, e.g., FastBufferedInputStream). The BinIO and TextIO static containers contain dozens of methods that make it possible to load and save quickly (big) arrays to disks, to adapt binary and text file to iterators, and so on. Classes such as IntMappedBigList make it possible to map into memory large file of primitive types and access them as big lists.

More on type-specific classes

All data structures in fastutil extend their standard counterpart interface whenever possible (e.g., Map for maps). Thus, they can be just plugged into existing code, using the standard access methods. However, they also provide (whenever possible) many polymorphic versions of the most used methods that avoid boxing/unboxing. In doing so, they specify more stringent interfaces that extend and strengthen the standard ones (e.g., Int2IntSortedMap or IntListIterator). All generic methods in type-specific interfaces have been deprecated.

Warning: automatic boxing and unboxing can lead you to choose the wrong method when using fastutil. It is also extremely inefficient. We suggest that your programming environment is set so to mark boxing/unboxing as a warning, or even better, as an error.

The main point of type-specific data structures is that the absence of wrappers around primitive types can increase speed and reduce space occupancy by several times. The presence of generics in Java does not change this fact, since there is no genericity for primitive types.

The implementation techniques used in fastutil are quite different than those of java.util: for instance, open-addressing hash tables, threaded AVL trees, threaded red-black trees and exclusive-or lists. An effort has also been made to provide powerful derived objects and to expose them by overriding covariantly return types: for instance, the keys of sorted maps are sorted and iterators on sorted containers are always bidirectional.

More generally, the rationale behing fastutil is that you should never need to code explicitly natural transformations. You do to not need to define an anonymous class to iterate over an array of integers—just wrap it. You do not need to write a loop to put the characters returned by an iterator into a set—just use the right constructor. And so on.

The Names

In general, class names adhere to the general pattern

valuetype collectiontype

for collections, and

keytype 2 valuetype maptype

for maps.

The word "type" here referes to a capitalized primitive type, Object or Reference. In the latter case, we are treating objects, but their equality is established by reference equality (that is, without invoking equals()), similarly to IdentityHashMap. Note that reference-based classes are significantly faster.

Thus, an IntOpenHashSet stores integers efficiently and implements IntSet, whereas a Long2IntAVLTreeMap does the same for maps from longs to integers (but the map will be sorted, tree based, and balanced using the AVL criterion), implementing Long2IntMap. If you need additional flexibility in choosing your hash strategy, you can put, say, arrays of integers in a ObjectOpenCustomHashSet, maybe using the ready-made hash strategy for arrays. A LongLinkedOpenHashSet stores longs in a hash table, but provides a predictable iteration order (the insertion order) and access to first/last elements of the order. A Reference2ReferenceOpenHashMap is similar to an IdentityHashMap. You can manage a priority queue of characters in a heap using a CharHeapPriorityQueue, which implements CharPriorityQueue. Front-coded lists are highly specialized immutable data structures that store compactly a large number of arrays: if you don't know them you probably don't need them.

For a number of data structures that were not available in the Java™ Collections Framework when fastutil was created, an object-based version is contained in it.unimi.dsi.fastutil, and in that case the prefix Object is not used (see, e.g., PriorityQueue).

Since there are eight primitive types in Java, and we support reference-based containers, we get around 2000 (!) classes (some nonsensical classes, such as Boolean2BooleanAVLTreeMap, are not generated). Many classes are generated just to mimic the hierarchy of java.util so to redistribute common code in a similar way. There are also several abstract classes that ease significantly the creation of new type-specific classes by providing automatically generic methods based on the type-specific ones.

The huge number of classes required a suitable division in subpackages. Each subpackage is characterized by the type of elements or keys: thus, for instance, IntSet belongs to it.unimi.dsi.fastutil.ints (the plural is required, as int is a keyword and cannot be used in a package name), as well as Int2ReferenceRBTreeMap. Note that all classes for non-primitive elements and keys are gathered in it.unimi.dsi.fastutil.objects. Finally, a number of non-type-specific classes have been gathered in it.unimi.dsi.fastutil.

An In–Depth Look

The following table summarizes the available interfaces and implementations. To get more information, you can look at a specific implementation in it.unimi.dsi.fastutil or, for instance, it.unimi.dsi.fastutil.ints. Note that a few existing abstract classes have been deprecated and are not listed here (they are no longer necessary due to default methods in the corresponding interfaces).

InterfacesAbstract ImplementationsImplementations
Iterable
CollectionAbstractCollection
SetAbstractSetOpenHashSet, OpenCustomHashSet, ArraySet, OpenHashBigSet
SortedSetAbstractSortedSetRBTreeSet, AVLTreeSet, LinkedOpenHashSet
FunctionAbstractFunction
MapAbstractMapOpenHashMap, OpenCustomHashMap, ArrayMap
SortedMapAbstractSortedMapRBTreeMap, AVLTreeMap, LinkedOpenHashMap
List, BigList†AbstractList, AbstractBigListArrayList, BigArrayBigList, MappedBigList, ArrayFrontCodedList, ImmutableList
PriorityQueue†HeapPriorityQueue, ArrayPriorityQueue, ArrayFIFOQueue
IndirectPriorityQueue†HeapSemiIndirectPriorityQueue, HeapIndirectPriorityQueue, ArrayIndirectPriorityQueue
Stack†ArrayList
Pair†MutablePair, ImmutablePair
Iterator, BigListIterator†Iterators.AbstractIndexBasedIterator
SpliteratorSpliterators.AbstractIndexBasedSpliterator
Comparator
BidirectionalIterator†
ListIterator
Consumer
Predicate
BinaryOperator
Size64‡

†: this class has also a non-type-specific implementation in it.unimi.dsi.fastutil.

‡: this class has only a non-type-specific implementation in it.unimi.dsi.fastutil.

Note that abstract implementations are named by prefixing the interface name with Abstract. Thus, if you want to define a type-specific structure holding a set of integers without the hassle of defining object-based methods, you should inherit from AbstractIntSet.

The following table summarizes static containers, which usually give rise both to a type-specific and to a generic class:

Static Containers
Collections
Sets
SortedSets
Functions
Maps†
SortedMaps
Lists
BigLists
Arrays†
BigArrays†
Heaps
SemiIndirectHeaps
IndirectHeaps
PriorityQueues†
IndirectPriorityQueues†
Iterators
BigListIterators
Spliterators
BigSpliterators
Comparators
Hash‡
HashCommon‡
SafeMath‡

†: this class has also a non-type-specific implementation in it.unimi.dsi.fastutil.

‡: this class has only a non-type-specific implementation in it.unimi.dsi.fastutil.

The static containers provide also special-purpose implementations for all kinds of empty structures (including arrays) and singletons.

Warnings

All classes are not synchronized. If multiple threads access one of these classes concurrently, and at least one of the threads modifies it, it must be synchronized externally. Iterators will behave unpredictably in the presence of concurrent modifications. Reads, however, can be carried out concurrently.

Reference-based classes violate the Map contract. They intentionally compare objects by reference, and do not use the equals() method. They should be used only when reference-based equality is desired (for instance, if all objects involved are canonized, as it happens with interned strings).

Setting the default return value for maps with both non-primitive keys and non-primitive values violates the Map contract, as you might not get null on missing keys.

Linked classes do not implement wholly the SortedMap interface. They provide methods to get the first and last element in iteration order, and to start a bidirectional iterator from any element, but any submap or subset method will cause an UnsupportedOperationException (this may change in future versions).

Substructures in sorted classes allow the creation of arbitrary substructures. In java.util, instead, you can only create contained sub-substructures (BTW, why?). For instance, (new TreeSet()).tailSet(1).tailSet(0) will throw an exception, but (new IntRBTreeSet()).tailSet(1).tailSet(0) won't.

Additional Features and Methods

The new interfaces add some very natural methods and strengthen many of the old ones. Moreover, whenever possible, the object returned is type-specific, or implements a more powerful interface.

More in detail:

  • Keys and values of a map are of the fastutil type you would expect (e.g., the keys of an Int2LongSortedMap are an IntSortedSet and the values are a LongCollection).
  • Hash-based and tree-based maps that return primitive numeric values have an addTo() method (see, e.g., Int2IntOpenHashMap.addTo(int,int)) that adds an increment to the current value of a key; it is most useful to avoid the inefficient procedure of getting a value, incrementing it and then putting it back into the map (typically, when counting the number of occurrences of elements in a sequence).
  • Hash-set implementations have an additional get() method that returns the actual object in the collection that is equal to the query key.
  • Linked hash-based maps and sets have a wealth of additional methods that make it easy to use them as caches. See, for instance, Int2IntLinkedOpenHashMap.putAndMoveToLast(int,int), IntLinkedOpenHashSet.addAndMoveToLast(int) or Int2IntLinkedOpenHashMap.removeFirstInt().
  • Submaps of a sorted map and subsets of a sorted sets are of the fastutil type you would expect, too.
  • Iterators returned by iterator() are type-specific.
  • Spliterators returned by spliterator() are type-specific.
  • Collection has a method returning primitive streams, for example IntCollection.intStream().
  • Sorted structures in fastutil return type-specific bidirectional iterators. This means that you can move back and forth among entries, keys or values.
  • Some classes for maps (check the specification) return a fast entry set (see, e.g., Int2IntOpenHashMap.int2IntEntrySet()); fast entry sets can, in turn, provide a fast iterator that is guaranteed not to create a large number of objects, possibly by returning always the same entry (of course, mutated). There's a also companion fastForEach(). To make it possible to access these fast iterators in for loops, there are static helper methods available (e.g., Int2IntMaps.fastIterable(it.unimi.dsi.fastutil.ints.Int2IntMap)).
  • The type-specific sorted set interfaces, moreover, feature an optional method iterator(from) which creates a type-specific BidirectionalIterator starting from a given element of the domain (not necessarily in the set). See, for instance, IntSortedSet.iterator(int). The method is implemented by all type-specific sorted sets and subsets.
  • Finally, there are constructors that allow you to build easily sets using array and iterators. This means, for instance, that you can create quickly a set of strings with a statement like
    new ObjectOpenHashSet( new String[] { "foo", "bar" } )
    or just "unroll" the integers returned by an iterator into a list with
    new IntArrayList( iterator )

There are a few quirks, however, that you should be aware of:

  • The versions of the get(), put() and remove() methods that return a primitive type cannot, of course, rely on returning null to denote the absence of a certain pair. Rather, they return a default return value, which is set to 0 cast to the return type (false for booleans) at creation, but can be changed using the defaultReturnValue() method (see, e.g., Int2IntMap). Note that changing the default return value does not change anything about the data structure; it is just a way to return a reasonably meaningful result—it can be changed at any time. For uniformity reasons, even maps returning objects can use defaultReturnValue() (of course, in this case the default return value is initialized to null). A submap or subset has an independent default return value (which however is initialized to the default return value of the originator).
  • For all maps that have objects as keys, the get() and remove() methods do not admit polymorphic versions, as Java does not allow return-value polymorphism. Rather, the extended interfaces introduce new methods of the form getvaluetype() and removevaluetype(). Similar problems occur with first(), last(), and so on.
  • Similarly, all iterators have a suitable method nexttype() returning directly a primitive type. And, of course, you have a type-specific version of previous(). Note that the “for each” style of iteration can hide boxing and unboxing: even if you iterate using a primitive variable (as in for(long x: s), where s is a LongSet), the actual iterator used will be object-based, as Java knows nothing about fastutil's type-specific next()/previous() methods.
  • For the same reason, the method Collection.toArray() as a polymorphic version accepting a type-specific array, but there is also an explicitly typed method tokeytypeArray().
  • The standard entry-set iterators for hash-based maps use an entry object that refers to the data contained in the hash table. If you retrieve an entry and delete it, the entry object will become invalid and will throw an ArrayIndexOutOfBoundsException exception. This does not apply to fast iterators (see above).
  • A name clash between the list and collection interfaces forces the deletion method of a collection to be named rem(). At the risk of creating some confusion, remove() reappears in the type-specific set interfaces, so the only really unpleasant effect is that you must use rem() on variables that are collections, but not sets—for instance, type-specific lists, and that a subclass of a type-specific abstract collection must override rem(), rather than remove(), to make all inherited methods work properly.
  • Stacks are interfaces implemented by array-based lists: the interface, moreover, is slightly different from the implementation contained in Stack.
  • In JDK 8 a number of classes appeared specifying primitive-type interfaces: examples are PrimitiveIterator and IntToLongFunction. Whenever possible, fastutil tries to extend such interfaces (e.g., IntIterator extends PrimitiveIterator.OfInt).
  • In the specification of new methods such as Map.computeIfAbsent() we do not always have a suitable type-specific version of Function in the JDK: in this case, we approximate the best we can. For example, a Byte2ByteMap expects an IntUnaryOperator. The default implementation will check that the returned argument actually fits a byte, throwing an exception otherwise. Note that we always specify also a version accepting a function returning an object, as it is useful in case one needs to return null.
  • In the specification of new methods requiring a type-specific Consumer such as Iterable.forEach(java.util.function.Consumer) we try to use the java.util.function correct type, if available. Otherwise, we use the fastutil correct type-specific version, but, as in the previous case, the fastutil version extends the closest Consumer. For example, CharConsumer extends IntConsumer.

Functions

Function (and its type-specific versions) is an interface geared towards mathematical functions (e.g., hashes) which associates values to keys, but in which enumerating keys or values is not possible. It is essentially a Map that does not provide access to set representations. It is of course unfortunate that Java 8 introduced an identically named interface with a different signature: differences and interoperability issues are discussed in the class documentation.

fastutil provides interfaces, abstract implementations and the usual array of wrappers in the suitable static container (e.g., Int2IntFunctions). Implementations will be provided by other projects (e.g., Sux4J). Type-specific functions require just to define their get() methods: thus, they can be defined by lambda expressions.

All fastutil type-specific maps extend their respective type-specific functions: but, alas, we cannot have Map extending Function. Moreover, type-specific functions extend the existing type-specific function from java.util.function that can accommodate the argument and the result with the least widening: for example, Short2CharFunction extends IntUnaryOperator.

Static Container Classes

fastutil provides a number of static methods and singletons, much like Collections. To avoid creating classes with hundreds of methods, there are separate containers for sets, lists, maps and so on. Generic containers are placed in it.unimi.dsi.fastutil, whereas type-specific containers are in the appropriate package. You should look at the documentation of the static classes contained in it.unimi.dsi.fastutil, and in type-specific static classes such as CharSets, Float2ByteSortedMaps, LongArrays, FloatHeaps. Presently, you can easily obtain empty collections, empty type-specific collections, singletons, synchronized versions of any type-specific container and unmodifiable versions of containers and iterators (of course, unmodifiable containers always return unmodifiable iterators).

On a completely different side, the type-specific static container classes for arrays provide several useful methods that allow to treat an array much like an array-based list, hiding completely the growth logic. In many cases, using this methods and an array is even simpler then using a full-blown type-specific array-based list because elements access is syntactically much simpler. The version for objects uses reflection to return arrays of the same type of the argument.

For the same reason, fastutil provides a full implementation of methods that manipulate arrays as type-specific heaps, semi-indirect heaps and indirect heaps. There are also quicksort and mergesort implementations that use arbitrary type-specific comparators.

fastutil offers also a less common choice—a very tuned implementation of radix sort for all primitive types. It is significantly faster than quicksort already at small sizes (say, more than 10000 elements), and should be considered the sorting algorithm of choice if you do not need a generic comparator.

There are several variants provided. First of all you can radix sort in parallel two or even more arrays. You can also perform indirect sorts, for instance if you want to compute the sorting permutation of an array.

The sorting algorithm is a tuned radix sort adapted from Peter M. McIlroy, Keith Bostic and M. Douglas McIlroy, “Engineering radix sort”, Computing Systems, 6(1), pages 5−27 (1993), and further improved using the digit-oracle idea described by Juha Kärkkäinen and Tommi Rantala in “Engineering radix sort for strings”, String Processing and Information Retrieval, 15th International Symposium, volume 5280 of Lecture Notes in Computer Science, pages 3−14, Springer (2008). The basic algorithm is not stable, but this is immaterial for arrays of primitive types. For the indirect case, there is a parameter specifying whether the algorithm should be stable.

Iterators and Comparators

fastutil provides type-specific iterators and comparators. The interface of a fastutil iterator is slightly more powerful than that of a java.util iterator, as it contains a skip() method that allows to skip over a list of elements (an analogous method is provided for bidirectional iterators). For objects (even those managed by reference), the extended interface is named ObjectIterator; it is the return type, for instance, of ObjectCollection.iterator().

A plethora of useful static methods is also provided by various type-specific static containers (e.g., IntIterators) and IntComparators: among other things, you can wrap arrays and standard iterators in type-specific iterators, generate them giving an interval of elements to be returned, concatenate them or pour them into a set.

Spliterators and Streams

fastutil provides type-specific spliterators. These are used to implement fast streams.

Due to Java only supporting primitive Streams of int, long, and double, types for primitives narrower then that (such as ByteCollection) have methods that expose a spliterator widening them to the type supported by the Stream API. For example, ByteCollection will have an intSpliterator() method in addition to a spliterator() method that works in terms of bytes.

Due to the incompatibility of object based streams and the primitive streams (such as IntStream), the stream() methods of the primitive containers could not be updated to return their primitive equivalents. Instead, for primitive collections, a new method is provided that exposes a primitive stream, and the object based stream method is marked deprecated like the other boxing methods do. For example, ByteCollection, ShortCollection, and IntCollection exposes an intStream() method, which should be used in preference to stream() as it will not box and unbox.

Since type-specific collectors would be more than 700, fastutil implements basic collection methods in each container (e.g., IntImmutableList.toList()). These methods must be applied to a stream to obtain a collection containing the output of the stream.

Similar to iterators, a raft of utility methods for type-specific spliterators are provided (e.g. IntSpliterators). Most of them are analogues of the utilities in the type-specific Iterators class.

Currently no utility methods are provided for type-specific streams, as the Java library's API already has good coverage for that (e.g. the static methods of Stream and their primitive equivalents, or classes such as StreamSupport).

Functions, Consumers, Predicates, and Operators

Classes such as IntBinaryOperator were introduced in Java 8 to obtain performance improvements similar to those obtained by fastutil. Unfortunately, the design of all these classes follows an opposite choice with respect to fastutil's: type-specific classes do not extend the generic ones. Moreover, type specificity is rather limited (e.g., there are no short-specific classes). The fact that type-specific classes do not extend generic ones makes it difficult to use lambdas, as often they are (rightly) considered ambiguous by the compiler. The solution implemented in fastutil 8.5.0 is to have an internal type-specific version of these interfaces which extends both the generic one and the JDK type-specific one, allowing also for type widening (e.g., CharConsumer extends IntConsumer). When definining lambdas, the compiler now has a more specific type to choose (the type-specific fastutil interface), and no ambiguity arises.

Queues

fastutil offers two types of queues: direct queues and indirect queues. A direct queue offers type-specific method to enqueue and dequeue elements. An indirect queue needs a reference array, specified at construction time: enqueue and dequeue operations refer to indices in the reference array. The advantage is that it may be possible to notify the change of any element of the reference array, or even to remove an arbitrary element.

Queues have two kind of implementations: array-based implementations, and heap-based implementations. In particular, heap-based indirect queues may be fully indirect or just semi-indirect: in the latter case, there is no need for an explicit indirection array (which saves one integer per queue entry), but not all operations will be available. Note there there are also FIFO queues.

Custom Hashing

Sometimes, the behaviour of the built-in equality and hashing methods is not what you want. In particular, this happens if you store in a hash-based collection arrays, and you would like to compare them by equality. For this kind of applications, fastutil provides custom hash strategies, which define new equality and hashing methods to be used inside the collection. There are even ready-made strategies for arrays. Note, however, that fastutil containers do not cache hash codes, so custom hash strategies must be efficient.

Abstract Classes

fastutil provides a wide range of abstract classes, to help in implementing its interfaces. They take care, for instance, of providing wrappers for non-type-specific method calls, so that you have to write just the (usually simpler) type-specific version. When possible, such wrappers are actually default methods, so no abstract class is necessary.

More on the support for very large collections

With the continuous increase in core memory available, Java arrays are starting to show their size limitation (indices cannot be larger than 231). fastutil proposes to store big arrays using arrays-of-arrays subject to certain size restrictions and a number of supporting static methods. Please read the documentation of BigArrays to understand how big arrays work.

Correspondingly, fastutil proposes a new interface, called Size64, that should be implemented by very large collections. Size64 contains a method Size64.size64() which returns the collection size as a long integer.

fastutil provides big lists, which are lists with 64-bit indices; of course, they implement Size64. An implementation based on big arrays is provided (see, e.g., IntBigArrayBigList), as well as static containers (see, e.g., IntBigLists). Whereas it is unlikely that such collection will be in main memory as big arrays, there are number of situations, such as exposing large files through a list interface or storing a large amount of data using succinct data structures, in which a big list interface is natural.

Unfortunately, lists and big lists, as well as list iterators and big-list iterators, cannot be made compatible: we thus provide adapters (see, e.g., IntBigLists.asBigList(it.unimi.dsi.fastutil.ints.IntList)).

Finally, fastutil provides big hash sets, which are based on big arrays. They are about 30% slower than non-big sets, but their size is limited only by the amount core memory.

More on fast and practical I/O

fastutil includes an I/O package that provides, for instance, fast, unsynchronized buffered input streams, fast, unsynchronized buffered output streams, and a wealth of static methods to store and retrieve data in textual and binary form. The latter, in particular, contain methods that load and store big arrays.

Performance

The main reason behind fastutil is performance, both in time and in space. The relevant methods of type-specific hash maps and sets are something like 2 to 10 times faster than those of the standard classes. Note that performance of hash-based classes on object keys is usually slightly worse than that of java.util, because fastutil classes do not cache hash codes (albeit it will not be that bad if keys cache internally hash codes, as in the case of String). Of course, you can try to get more speed from hash tables using a small load factor: to this purpose, alternative load factors are proposed in Hash.FAST_LOAD_FACTOR and Hash.VERY_FAST_LOAD_FACTOR.

For tree-based classes you have two choices: AVL and red-black trees. The essential difference is that AVL trees are more balanced (their height is at most 1.44 log n), whereas red-black trees have faster deletions (but their height is at most 2 log n). So on small trees red-black trees could be faster, but on very large sets AVL trees will shine. In general, AVL trees have slightly slower updates but faster searches; however, on very large collections the smaller height may lead in fact to faster updates, too.

fastutil reduces enormously the creation and collection of objects. First of all, if you use the polymorphic methods and iterators no wrapper objects have to be created. Moreover, since fastutil uses open-addressing hashing techniques, creation and garbage collection of hash-table entries are avoided (but tables have to be rehashed whenever they are filled beyond the load factor). The major reduction of the number of objects around has a definite (but very difficult to measure) impact on the whole application (as garbage collection runs proportionally to the number of alive objects).

Maps whose iteration is very expensive in terms of object creation (e.g., hash-based classes) usually return a type-specific FastEntrySet whose fastIterator() method significantly reduces object creation by returning always the same entry object, suitably mutated.

Memory Usage

The absence of wrappers makes data structures in fastutil much smaller: even in the case of objects, however, data structures in fastutil try to be space-efficient.

Hash Tables

To avoid memory waste, (unlinked) hash tables in fastutil keep no additional information about elements (such as a list of keys). In particular, this means that enumerations are always linear in the size of the table (rather than in the number of keys). Usually, this would imply slower iterators. Nonetheless, the iterator code includes a single, tight loop; moreover, it is possible to avoid the creation of wrappers. These two facts make in practice fastutil iterators faster than java.util's.

The memory footprint for a table of length ℓ is exactly the memory required for the related types times ℓ. The absence of wrappers around primitive types can reduce space occupancy by several times (this applies even more to serialized data, e.g., when you save such a data structure in a file). These figures can greatly vary with your virtual machine, JVM versions, CPU etc.

More precisely, when you ask for a map that will hold n elements with load factor 0 < f < 1, 2⌈log n / f entries are allocated. When the table is filled up beyond the load factor, it is rehashed doubling its size. When it is emptied below one fourth of the load factor, it is rehashed halving its size; however, a map is never reduced to a size smaller than that at creation time: this approach makes it possible to create maps with a large capacity in which insertions and deletions do not cause immediately rehashing.

In the case of linked hash tables, there is an additional vector of 2⌈log n / f longs that is used to store link information. Each element records the next and previous element (packed together so to be more cache friendly).

Balanced Trees

The balanced trees implementation is also very parsimonious. fastutil is based on the excellent (and unbelievably well documented) code contained in Ben Pfaff's GNU libavl, which describes in detail how to handle balanced trees with threads. Thus, the overhead per entry is two pointers and one integer, which compares well to three pointers plus one boolean of the standard tree maps. The trick is that we use the integer bit by bit, so we consume two bits to store thread information, plus one or two bits to handle balancing. As a result, we get bidirectional iterators in constant space and amortized constant time without having to store references to parent nodes.

It should be mentioned that all tree-based classes have a fixed overhead for some arrays that are used as stacks to simulate recursion; in particular, we need 48 booleans for AVL trees and 64 pointers plus 64 booleans for red-black trees.

Examples

To store a set of longs, we can use a hash-based container:

LongSet s = new LongOpenHashSet();
    

Access methods avoid boxing and unboxing:

s.add(1);
s.contains(2);
    

We can obtain a type-specific iterator on the elements of the set:

// Sum all elements
long t = 0;
for(LongIterator i = s.iterator(); i.hasNext();) t += i.nextLong();
        

Note that “for each” iteration must be avoided:

long t = 0;
for(long x: s) t += x;
        

In the loop above, boxing and unboxing is happening (even if your IDE does not report it). In some cases, a solution is to use a type-specific forEach():

// Print all elements
s.forEach(x -> System.out.println(x));
        

Or we can use fastutil's type-specific version of Java 8's streams:

long t = m.longStream().sum();
    

Suppose instead you want to store a sorted map from longs to integers. We will use a tree of AVL type:

Long2IntSortedMap m = new Long2IntAVLTreeMap();
    

Now we can easily modify and access its content:

m.put(1, 5);
m.put(2, 6);
m.put(3, 7);
m.put(1000000000L, 10);
m.get(1); // This method call will return 5
m.get(4); // This method call will return 0
    

We can also try to change the default return value:

m.defaultReturnValue(-1);
m.get(4); // This method call will return -1
    

We can obtain a type-specific iterator on the key set:

LongBidirectionalIterator i = m.keySet().iterator();
// Now we sum all keys
long s = 0;
while(i.hasNext()) s += i.nextLong();
        

Or we can use again fastutil's type-specific version of Java 8's streams:

long s = m.longStream().sum();
    

We now generate a head map, and iterate bidirectionally over it starting from a given point:

// This map contains only keys smaller than 4
Long2IntSortedMap m1 = m.headMap(4);
// This iterator is positioned between 2 and 3
LongBidirectionalIterator t = m1.keySet().iterator(2);
t.previous(); // This method call will return 2 (t.next() would return 3)
    

Should we need to access the map concurrently, we can wrap it:

// This map can be safely accessed by many threads
Long2IntSortedMap m2 = Long2IntSortedMaps.synchronize(m1);
    

Linked maps are very flexible data structures which can be used to implement, for instance, queues whose content can be probed efficiently:

// This map remembers insertion order.
IntSortedSet s = new IntLinkedOpenHashSet.of(4, 3, 2, 1);
s.firstInt(); // This method call will return 4
s.lastInt(); // This method call will return 1
s.contains(5); // This method will return false
IntBidirectionalIterator i = s.iterator(s.lastInt()); // We could even cast it to a list iterator
i.previous(); // This method call will return 1
i.previous(); // This method call will return 2
s.remove(s.lastInt()); // This will remove the last element in constant time
    

Now, we play with iterators. It is easy to create iterators over intervals or over arrays, and combine them:

IntIterator i = IntIterators.fromTo(0, 10); // This iterator will return 0, 1, ..., 9
int[] a = new int[] { 5, 1, 9 };
IntIterator j = IntIterators.wrap(a); // This iterator will return 5, 1, 9.
IntIterator k = IntIterators.concat(new IntIterator[] { i , j }); // This iterator will return 0, 1, ..., 9, 5, 1, 9
    

It is easy to build lists and sets on the fly using the of static factory methods. For maps you can use the constructors that take key and value arrays (array based constructors for list and set exist too):

IntSet s = IntOpenHashSet.of(1, 2, 3); // This set will contain 1, 2, and 3
Char2IntMap m = new Char2IntRBTreeMap(new char[] { '@', '-' }, new int[] { 0, 1 }); // This map will map '@' to 0 and '-' to 1

Whenever you have some data structure, it is easy to serialize it in an efficient (buffered) way, or to dump their content in textual form:

BinIO.storeObject(s, "foo"); // This method call will save s in the file named "foo"
TextIO.storeInts(s.intIterator(), "foo.txt"); // This method call will save the content of s in ASCII
i = TextIO.asIntIterator("foo.txt"); // This iterator will parse the file and return the integers therein
        

You can also store data (of any size) in native format and access it via memory mapping:

BinIO.storeLongs(l, "foo", ByteOrder.nativeOrder()); // This method call will save the (big) array of longs l in the file named "foo" in native order
c = FileChannel.open(new File("foo").toPath());
m = LongMappedBigList.map(c); // Now you can access the data in l via memory mapping
        

Support for Java 8 primitive streams is included for primitive collections (e.g. intStream), which will work in terms of primitives instead of boxing to wrapper types like the regular stream would do:

IntList l = IntList.of(2, 380, 1297);
int lSum = l.intStream().sum();  // Will be 1679
IntList lTransformed = IntArrayList.toList(l.intStream().map(i -> i + 40)); // Will be 42, 420, 1337
    

You can sort arrays using type-specific comparators specified by lambda expressions (no boxing/unboxing here):

IntArrays.quickSort(a, (x, y) -> Integer.compare(y,  x)); // Sorts in reverse order
    

You can also easily specify complex generic sorting, like sorting indirectly on a while swapping elements in a and b in parallel:

Arrays.quickSort(0, a.length, (i, j) -> Integer.compare(a[i], a[j]), (i, j) -> { IntArrays.swap(a, i, j); IntArrays.swap(b, i, j); }));
    

If you have several cores, you can do it in parallel:

IntArrays.parallelQuickSort(a, (x, y) -> y - x); // Sorts in reverse order
Arrays.parallelQuickSort(0, a.length, (i, j) -> Integer.compare(a[i], a[j]), (i, j) -> { IntArrays.swap(a, i, j); IntArrays.swap(b, i, j); }));
    

Some maps provide a fast iterator on their entry set: such iterators are allowed to reuse the Map.Entry instance they return, resulting is highly reduced garbage collection (e.g., for large hash maps). To easily access such iterators, we can use a helper static method:

Int2IntOpenHashMap m = new Int2IntOpenHashMap();
// Fill the map
for (Int2IntMap.Entry e : Int2IntMaps.fastIterable(m)) {
    // do things with e.getIntKey() and e.getIntValue();
}
    

The code above will not generate an object per enumerated item, as for(Entry<Integer,Integer> e: m.entrySet()) (or even for(Int2IntMap.Entry e: m.int2IntKeySet())) would do.

Packages
Package
Description
Static classes and methods used by all implementations and some non-type-specific classes that do not belong to it.unimi.dsi.fastutil.objects.
Type-specific classes for boolean elements or keys.
Type-specific classes for byte elements or keys.
Type-specific classes for character elements or keys.
Type-specific classes for double elements or keys.
Type-specific classes for float elements or keys.
Type-specific classes for integer elements or keys.
Classes and static methods that make object and primitive-type I/O easier and faster.
Type-specific classes for long elements or keys.
Type-specific classes for object elements or keys.
Type-specific classes for short elements or keys.