With large hash collisions, the hashed gs-collections seem to perform much worse than the stock HotSpot hashed collections. An initial, superficial analysis also suggests that with large hash collisions gs-collections scales much worse than the stock HotSpot collections.

Generally, large hash collisions are a symptom of bad hash code implementations in user code. Sometimes, however, an attacker can craft input that results in large hash collisions. For example, if an attacker is able to supply strings that end up as keys in hashed collections, they can choose the input so that all keys have the same hash code. This is the case for any key-value based input; examples include XML element and attribute names, XML namespace prefixes, JSON object property names, HTTP header names, and query parameters of any kind. If an attacker is able to create large enough hash collisions, they may be able to degrade application performance enough to mount a DoS attack.
`String#hashCode()` is quite susceptible to hash collisions. There are 2048 strings of length 2 that have the same hash code (see below). Strings have the property that if `a.hashCode() == b.hashCode()` then `(a + b).hashCode() == (b + a).hashCode()`. This means that from these 2048 strings one can generate 4194304 strings of length 4 with the same hash code.

To mitigate this, changes were introduced to HashMap (the implementation of `String#hashCode()` is specified in the Javadoc and Oracle does not want to change it). As a stopgap measure, HashMap hashed strings with MurmurHash 3 using a random seed, calling `sun.misc.Hashing.stringHash32` instead of `String#hashCode()` via an `instanceof` check in `HashMap.hash(Object)`. This change seems to have been rolled back in recent Java 8 versions. The second change was to overflow to a red-black tree instead of a (linked) list when there are too many hash collisions in a slot. This would probably be the most appropriate change for gs-collections as well.

This is all based on Schadcode - Schadenfreude for Hash Functions.
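As a small illustration of the concatenation property described above (this sketch and the `Collisions` class name are mine, not taken from the linked sample code, and it uses the well-known colliding pair `"Aa"`/`"BB"` rather than the 2048 length-2 strings below): both `"Aa"` and `"BB"` hash to 2112, so concatenating eleven such blocks yields 2^11 = 2048 distinct strings that all share a single hash code.

```java
import java.util.HashSet;
import java.util.Set;

public class Collisions {

    // "Aa".hashCode() == "BB".hashCode() == 2112, so concatenating these
    // two-character blocks in any order produces colliding strings.
    static Set<String> collidingStrings() {
        Set<String> strings = new HashSet<>();
        for (int i = 0; i < 2048; i++) { // 2^11 combinations
            StringBuilder sb = new StringBuilder(22);
            for (int bit = 0; bit < 11; bit++) {
                sb.append((i & (1 << bit)) != 0 ? "BB" : "Aa");
            }
            strings.add(sb.toString());
        }
        return strings;
    }

    public static void main(String[] args) {
        Set<String> strings = collidingStrings();
        long hashes = strings.stream().map(String::hashCode).distinct().count();
        System.out.println(strings.size() + " strings, " + hashes + " distinct hash code(s)");
        // prints: 2048 strings, 1 distinct hash code(s)
    }
}
```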
Sample code for producing collisions can be found in marschall/hashcollisions.
The following are results for JMH benchmarks with 2048 elements all having the same hash code.
As you can see, gs-collections is sometimes about a factor of 50 slower than the stock HotSpot collections. If this is indeed a scalability difference, then more elements would produce an even larger difference.
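To illustrate why a linked-list overflow degrades like this while a tree does not (a rough sketch of mine, not part of the benchmarks above; `TreeMap` merely stands in for HashMap's treeified bins, and the comparison counting is a hypothetical instrumentation): finding the last of 2048 fully colliding keys in a list-shaped bucket takes 2048 equality checks, while a balanced tree needs only on the order of log2(2048) ≈ 11 comparisons.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;

public class BucketCost {

    // Worst-case lookup in a linked-list bucket: scan until the key matches.
    static int listComparisons(List<String> bucket, String key) {
        int comparisons = 0;
        for (String s : bucket) {
            comparisons++;
            if (s.equals(key)) {
                break;
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        List<String> bucket = new ArrayList<>();
        AtomicInteger treeComparisons = new AtomicInteger();
        // A red-black tree, standing in for a treeified HashMap bin.
        TreeMap<String, Boolean> tree = new TreeMap<>((a, b) -> {
            treeComparisons.incrementAndGet();
            return a.compareTo(b);
        });
        for (int i = 0; i < 2048; i++) {
            String key = String.format("key%04d", i);
            bucket.add(key);
            tree.put(key, Boolean.TRUE);
        }
        treeComparisons.set(0);
        tree.get("key2047"); // the key at the end of the list bucket
        System.out.println("list: " + listComparisons(bucket, "key2047")
                + " comparisons, tree: " + treeComparisons.get() + " comparisons");
    }
}
```

This linear-versus-logarithmic gap is consistent with the factor-50 slowdown growing as the number of colliding elements grows.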
Jcf = Java Collections Framework (stock HotSpot collections), JDK 1.8u60
The following strings all have the same hash code (1216347):