Introduction
fastutil
extends the
Java™
Collections Framework by providing type-specific maps, sets, lists
and queues with a small memory footprint and fast access and
insertion; provides also big (64-bit) arrays, sets and lists, and
fast, practical I/O classes for binary and text files. It is
free software distributed under the Apache License 2.0.
The classes implement their standard counterpart interface (e.g.,
Map
for maps) and can be plugged into existing code. Moreover,
they provide additional features (such as bidirectional iterators) that are
not available in the standard classes.
Besides objects and primitive types, fastutil
classes provide support
for references, that is, objects that are compared using the equality
operator rather than the equals()
method.
The sources are generated using a C preprocessor, starting from a set of
driver files. You can peek at the javadoc-generated documentation. In particular, the overview
explains the design choices used in fastutil
.
Core jar
If the standard fastutil
jar is too large, there is a core jar
containing only data structures specific for integers, longs and doubles. Note that
those classes are duplicated in the standard jar, so if you are depending on both
(for example, because of transitive dependencies) you should exclude the core jar.
Previous split versions would provide different classes in different jars, but managing sensibly dependencies turned out to be impossible.
Speed
fastutil
provides in many cases the fastest
implementations available. You can find many other implementations of primitive
collections (e.g., HPPC, Koloboke,
etc.). Sometimes authors are a little bit quick in defining their implementations
the “fastest available“: the truth is, you have to take decisions
in any implementation. These decisions make your implementation faster of slower
in different scenarios. I suggest to always test speed within your own
application, rather than relying on general benchmarks, and ask the authors
for suggestions about how to use the libraries in an optimal way. In particular,
when testing hash-based data structures you should always set explicitly the load factor,
as speed is strongly dependent on the length of collision chains.
Big data structures
With fastutil
6, a new set of classes makes it possible
to handle very large collections: in particular, collections whose size exceeds
231. Big arrays
are arrays-of-arrays handled by a wealth of static methods that act on them
as if they were monodimensional arrays with 64-bit indices, and
big lists provide 64-bit list access.
The size of a hash big set
is limited only by the amount of core memory.
The usual methods from java.util.Arrays
and similar classes have
been extended to big arrays: have a look at the Javadoc documentation of
BigArrays
and
IntBigArrays
to get an idea of the generic and type-specific methods available.
History and Motivation
fastutil
came up as a necessity during the development
of UbiCrawler,
as we needed to manage structures with dozens of millions of items
very efficiently. The same reasons led to the development of the high-performance classes of dsiutils and
MG4J (e.g., MutableString
).