{"id":255525,"date":"2016-06-06T16:11:14","date_gmt":"2016-06-06T08:11:14","guid":{"rendered":"http:\/\/blog.zhenglei.net\/?p=255525"},"modified":"2016-06-06T16:11:14","modified_gmt":"2016-06-06T08:11:14","slug":"which-hashing-algorithm-is-best-for-uniqueness-and-speed","status":"publish","type":"post","link":"https:\/\/blog.zhenglei.net\/?p=255525","title":{"rendered":"Which hashing algorithm is best for uniqueness and speed?"},"content":{"rendered":"<p><a href=\"http:\/\/programmers.stackexchange.com\/questions\/49550\/which-hashing-algorithm-is-best-for-uniqueness-and-speed\">http:\/\/programmers.stackexchange.com\/questions\/49550\/which-hashing-algorithm-is-best-for-uniqueness-and-speed<\/a><\/p>\n<p>I tested some different algorithms, measuring speed and number of collisions.<\/p>\n<p>I used three different key sets:<\/p>\n<p>    A list of 216,553 English words (in lowercase)<br \/>\n    The numbers &#8220;1&#8221; to &#8220;216553&#8221; (think ZIP codes, and how a poor hash took down msn.com)<br \/>\n    216,553 &#8220;random&#8221; (i.e. 
type 4 uuid) GUIDs<\/p>\n<p>For each corpus, the number of collisions and the average time spent hashing was recorded.<\/p>\n<p>I tested:<\/p>\n<p>    DJB2<br \/>\n    DJB2a (variant using xor rather than +)<br \/>\n    FNV-1 (32-bit)<br \/>\n    FNV-1a (32-bit)<br \/>\n    SDBM<br \/>\n    CRC32<br \/>\n    Murmur2 (32-bit)<br \/>\n    SuperFastHash<\/p>\n<p>Results<\/p>\n<p>Each result contains the average hash time and the number of collisions<\/p>\n<p>Hash           Lowercase      Random UUID  Numbers<br \/>\n=============  =============  ===========  ==============<br \/>\nMurmur            145 ns      259 ns          92 ns<br \/>\n                    6 collis    5 collis       0 collis<br \/>\nFNV-1a            152 ns      504 ns          86 ns<br \/>\n                    4 collis    4 collis       0 collis<br \/>\nFNV-1             184 ns      730 ns          92 ns<br \/>\n                    1 collis    5 collis       0 collis\u25aa<br \/>\nDJB2a             158 ns      443 ns          91 ns<br \/>\n                    5 collis    6 collis       0 collis\u25aa\u25aa\u25aa<br \/>\nDJB2              156 ns      437 ns          93 ns<br \/>\n                    7 collis    6 collis       0 collis\u25aa\u25aa\u25aa<br \/>\nSDBM              148 ns      484 ns          90 ns<br \/>\n                    4 collis    6 collis       0 collis\u25aa\u25aa<br \/>\nSuperFastHash     164 ns      344 ns         118 ns<br \/>\n                   85 collis    4 collis   18742 collis<br \/>\nCRC32             250 ns      946 ns         130 ns<br \/>\n                    2 collis    0 collis       0 collis<br \/>\nLoseLose          338 ns        &#8211;             &#8211;<br \/>\n               215178 collis<\/p>\n<p>Notes:<\/p>\n<p>    The LoseLose algorithm (where hash = hash+character) is truly awful. Everything collides into the same 1,375 buckets<br \/>\n    SuperFastHash is fast, with things looking pretty scattered; but my goodness, the number of collisions. 
I&#8217;m hoping the guy who ported it got something wrong; it&#8217;s pretty bad<br \/>\n    CRC32 is pretty good. Slower, and a 1k lookup table<\/p>\n<p>Do collisions actually happen?<\/p>\n<p>Yes. I started writing my test program to see if hash collisions actually happen &#8211; and are not just a theoretical construct. They do indeed happen:<\/p>\n<p>FNV-1 collisions<\/p>\n<p>    creamwove collides with quists<\/p>\n<p>FNV-1a collisions<\/p>\n<p>    costarring collides with liquid<br \/>\n    declinate collides with macallums<br \/>\n    altarage collides with zinke<br \/>\n    altarages collides with zinkes<\/p>\n<p>Murmur2 collisions<\/p>\n<p>    cataract collides with periti<br \/>\n    roquette collides with skivie<br \/>\n    shawl collides with stormbound<br \/>\n    dowlases collides with tramontane<br \/>\n    cricketings collides with twanger<br \/>\n    longans collides with whigs<\/p>\n<p>DJB2 collisions<\/p>\n<p>    hetairas collides with mentioner<br \/>\n    heliotropes collides with neurospora<br \/>\n    depravement collides with serafins<br \/>\n    stylist collides with subgenera<br \/>\n    joyful collides with synaphea<br \/>\n    redescribed collides with urites<br \/>\n    dram collides with vivency<\/p>\n<p>DJB2a collisions<\/p>\n<p>    haggadot collides with loathsomenesses<br \/>\n    adorablenesses collides with rentability<br \/>\n    playwright collides with snush<br \/>\n    playwrighting collides with snushing<br \/>\n    treponematoses collides with waterbeds<\/p>\n<p>CRC32 collisions<\/p>\n<p>    codding collides with gnu<br \/>\n    exhibiters collides with schlager<\/p>\n<p>SuperFastHash collisions<\/p>\n<p>    dahabiah collides with drapability<br \/>\n    encharm collides with enclave<br \/>\n    grahams collides with gramary<br \/>\n    &#8230;snip 79 collisions&#8230;<br \/>\n    night collides with vigil<br \/>\n    nights collides with vigils<br \/>\n    finks collides with vinic<\/p>\n<p>Randomnessification<\/p>\n<p>The 
other subjective measure is how randomly distributed the hashes are. Mapping the resulting HashTables shows how evenly the data is distributed. All the hash functions show good distribution when mapping the table linearly:<\/p>\n<p>(image not shown)<\/p>\n<p>Or as a Hilbert Map (XKCD is always relevant):<\/p>\n<p>(image not shown)<\/p>\n<p>Except when hashing number strings (&#8220;1&#8221;, &#8220;2&#8221;, &#8230;, &#8220;216553&#8221;) (for example, zip codes), where patterns begin to emerge in most of the hashing algorithms:<\/p>\n<p>SDBM:<\/p>\n<p>(image not shown)<\/p>\n<p>DJB2a:<\/p>\n<p>(image not shown)<\/p>\n<p>FNV-1:<\/p>\n<p>(image not shown)<\/p>\n<p>All except FNV-1a, which still looks plenty random to me:<\/p>\n<p>(image not shown)<\/p>\n<p>In fact, Murmur2 seems to have even better randomness with Numbers than FNV-1a:<\/p>\n<p>(image not shown)<\/p>\n<p>    When I look at the FNV-1a &#8220;number&#8221; map, I think I see subtle vertical patterns. With Murmur I see no patterns at all. What do you think?<\/p>\n<p>The extra \u25aa in the above table denotes how bad the randomness is. 
With FNV-1a being the best, and DJB2x being the worst:<\/p>\n<p>      Murmur2: .<br \/>\n       FNV-1a: .<br \/>\n        FNV-1: \u25aa<br \/>\n         DJB2: \u25aa\u25aa<br \/>\n        DJB2a: \u25aa\u25aa<br \/>\n         SDBM: \u25aa\u25aa\u25aa<br \/>\nSuperFastHash: .<br \/>\n          CRC: \u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa<br \/>\n     Loselose: \u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa<br \/>\n                                        \u25aa<br \/>\n                                 \u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa<br \/>\n                        \u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa<br \/>\n          \u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa\u25aa<\/p>\n<p>I originally wrote this program to decide if I even had to worry about collisions: I do.<\/p>\n<p>And then it turned into making sure that the hash functions were sufficiently random.<br \/>\nFNV-1a algorithm<\/p>\n<p>The FNV1 hash comes in variants that return 32, 64, 128, 256, 512 and 1024 bit hashes.<\/p>\n<p>The FNV-1a algorithm is:<\/p>\n<p>hash = FNV_offset_basis<br \/>\nfor each octetOfData to be hashed<br \/>\n    hash = hash xor octetOfData<br \/>\n    hash = hash * FNV_prime<br \/>\nreturn hash<\/p>\n<p>Where the constants FNV_offset_basis and 
FNV_prime depend on the return hash size you want:<\/p>\n<p>Hash Size    Prime                       Offset<br \/>\n===========  =========================== =================================<br \/>\n32-bit       16777619                    2166136261<br \/>\n64-bit       1099511628211               14695981039346656037<br \/>\n128-bit      309485009821345068724781371 144066263297769815596495629667062367629<br \/>\n256-bit<br \/>\n    prime: 2^168 + 2^8 + 0x63 = 374144419156711147060143317175368453031918731002211<br \/>\n    offset: 100029257958052580907070968620625704837092796014241193945225284501741471925557<br \/>\n512-bit<br \/>\n    prime: 2^344 + 2^8 + 0x57 = 35835915874844867368919076489095108449946327955754392558399825615420669938882575126094039892345713852759<br \/>\n    offset: 9659303129496669498009435400716310466090418745672637896108374329434462657994582932197716438449813051892206539805784495328239340083876191928701583869517785<br \/>\n1024-bit<br \/>\n    prime: 2^680 + 2^8 + 0x8d = 5016456510113118655434598811035278955030765345404790744303017523831112055108147451509157692220295382716162651878526895249385292291816524375083746691371804094271873160484737966720260389217684476157468082573<br \/>\n    offset: 1419779506494762106872207064140321832088062279544193396087847491461758272325229673230371772250864096521202355549365628174669108571814760471015076148029755969804077320157692458563003215304957150157403644460363550505412711285966361610267868082893823963790439336411086884584107735010676915<\/p>\n<p>See the main FNV page for details.<\/p>\n<p>As a practical matter:<\/p>\n<p>    32-bit UInt32,<br \/>\n    64-bit UInt64, and<br \/>\n    128-bit Guid can be useful<\/p>\n<p>All my results are with the 32-bit variant.<br \/>\nFNV-1 better than FNV-1a?<\/p>\n<p>No. FNV-1a is all around better. 
There were more collisions with FNV-1a when using the English word corpus:<\/p>\n<p>Hash    Word Collisions<br \/>\n======  ===============<br \/>\nFNV-1   1<br \/>\nFNV-1a  4<\/p>\n<p>Now compare lowercase and uppercase:<\/p>\n<p>Hash    lowercase word Collisions  UPPERCASE word collisions<br \/>\n======  =========================  =========================<br \/>\nFNV-1   1                          9<br \/>\nFNV-1a  4                          11<\/p>\n<p>In this case FNV-1a isn&#8217;t &#8220;400%&#8221; worse than FNV-1, only 20% worse.<\/p>\n<p>I think the more important takeaway is that there are two classes of algorithms when it comes to collisions:<\/p>\n<p>    collisions rare: FNV-1, FNV-1a, DJB2, DJB2a, SDBM<br \/>\n    collisions common: SuperFastHash, Loselose<\/p>\n<p>And then there&#8217;s how evenly distributed the hashes are:<\/p>\n<p>    outstanding distribution: Murmur2, FNV-1a, SuperFastHash<br \/>\n    excellent distribution: FNV-1<br \/>\n    good distribution: SDBM, DJB2, DJB2a<br \/>\n    horrible distribution: Loselose<\/p>\n<p>Update<\/p>\n<p>Murmur? Sure, why not<\/p>\n<p>Update<\/p>\n<p>@whatshisname wondered how a CRC32 would perform, added numbers to the table.<\/p>\n<p>CRC32 is pretty good. Few collisions, but slower, and the overhead of a 1k lookup table.<\/p>\n<p>Snip all erroneous stuff about CRC distribution &#8211; my bad<\/p>\n<p>Up until today I was going to use FNV-1a as my de facto hash-table hashing algorithm. 
But now I&#8217;m switching to Murmur2:<\/p>\n<p>    Faster<br \/>\n    Better randomnessification of all classes of input<\/p>\n<p>And I really, really hope there&#8217;s something wrong with the SuperFastHash algorithm I found; it&#8217;s too bad to be as popular as it is.<\/p>\n<p>Update: From the MurmurHash3 homepage on Google:<\/p>\n<p>    (1) &#8211; SuperFastHash has very poor collision properties, which have been documented elsewhere.<\/p>\n<p>So I guess it&#8217;s not just me.<\/p>\n<p>Update: I realized why Murmur is faster than the others. MurmurHash2 operates on four bytes at a time. Most algorithms are byte by byte:<\/p>\n<p>for each octet in Key<br \/>\n   AddTheOctetToTheHash<\/p>\n<p>This means that as keys get longer Murmur gets its chance to shine.<\/p>\n<p>Update<br \/>\nGUIDs are designed to be unique, not random<\/p>\n<p>A timely post by Raymond Chen reiterates the fact that &#8220;random&#8221; GUIDs are not meant to be used for their randomness. They, or a subset of them, are unsuitable as a hash key:<\/p>\n<p>    Even the Version 4 GUID algorithm is not guaranteed to be unpredictable, because the algorithm does not specify the quality of the random number generator. 
The Wikipedia article for GUID contains primary research which suggests that future and previous GUIDs can be predicted based on knowledge of the random number generator state, since the generator is not cryptographically strong.<\/p>\n<p>Randomness is not the same as collision avoidance, which is why it would be a mistake to try to invent your own &#8220;hashing&#8221; algorithm by taking some subset of a &#8220;random&#8221; guid:<\/p>\n<p>int HashKeyFromGuid(Guid type4uuid)<br \/>\n{<br \/>\n   \/\/A &#8220;4&#8221; is put somewhere in the GUID.<br \/>\n   \/\/I can&#8217;t remember exactly where, but it doesn&#8217;t matter for<br \/>\n   \/\/the illustrative purposes of this pseudocode<br \/>\n   int guidVersion = ((type4uuid.D3 &amp; 0x0f00) &gt;&gt; 8);<br \/>\n   Assert(guidVersion == 4);<\/p>\n<p>   return (int)GetFirstFourBytesOfGuid(type4uuid);<br \/>\n}<\/p>\n<p>Note: Again, I put &#8220;random GUID&#8221; in quotes, because it&#8217;s the &#8220;random&#8221; variant of GUIDs. A more accurate description would be Type 4 UUID. But nobody knows what type 4, or types 1, 3 and 5 are. So it&#8217;s just easier to call them &#8220;random&#8221; GUIDs.<br \/>\nAll English Words mirrors<\/p>\n<p>    http:\/\/www.filedropper.com\/allenglishwords<br \/>\n    https:\/\/web.archive.org\/web\/20070221060514\/http:\/\/www.sitopreferito.it\/html\/all_english_words.html<\/p>\n<p>edited Nov 9 &#8217;15 at 18:12<\/p>\n<p>answered Apr 23 &#8217;12 at 12:42<br \/>\nIan Boyd<\/p>\n<p>Some of the collisions make awesome band names. Particularly &#8216;Adorable Rentability&#8217; \u2013 mcfinnigan Apr 23 &#8217;12 at 14:35<\/p>\n<p>Also, I&#8217;d love to hear how you generated these results (source code and\/or tools) \u2013 Earlz Apr 23 &#8217;12 at 17:40<\/p>\n<p>@Earlz Development tool is Delphi. I assume you mean the images though. 
For the &#8220;linear&#8221; map I created a square bitmap of size nxn (where n = Ceil(sqrt(hashTable.Capacity))). Rather than simply black for &#8220;list entry is occupied&#8221; and white for &#8220;list entry is empty&#8221;, I used an HSLtoRGB function, where the hue ranged from 0 (red) to 300 (magenta). White is still an &#8220;empty list cell&#8221;. For the Hilbert map I had to hunt Wikipedia for the algorithm that turns an index into an (x,y) coordinate. \u2013 Ian Boyd Apr 23 &#8217;12 at 18:15<\/p>\n<p>It would be really interesting to see how SHA compares, not because it&#8217;s a good candidate for a hashing algorithm here but it would be really interesting to see how any cryptographic hash compares with these made-for-speed algorithms. \u2013 Michael May 25 &#8217;12 at 15:09<\/p>\n<p>Here is a list of hash functions, but the short version is:<\/p>\n<p>    If you just want to have a good hash function, and cannot wait, djb2 is one of the best string hash functions I know. It has excellent distribution and speed on many different sets of keys and table sizes<\/p>\n<p>unsigned long<br \/>\nhash(unsigned char *str)<br \/>\n{<br \/>\n    unsigned long hash = 5381;<br \/>\n    int c;<\/p>\n<p>    while (c = *str++)<br \/>\n        hash = ((hash &lt;&lt; 5) + hash) + c; \/* hash * 33 + c *\/<\/p>\n<p>    return hash;<br \/>\n}<\/p>\n<p>answered Feb 19 &#039;11 at 1:13<br \/>\nDean Harding<\/p>\n<p>Actually djb2 is zero-sensitive, like most such simple hash functions, so you can easily break such hashes. 
It has a bad bias, too many collisions and a bad distribution; it breaks on most smhasher quality tests: see github.com\/rurban\/smhasher\/blob\/master\/doc\/bernstein. His cdb database uses it, but I wouldn&#039;t use it with public access. \u2013 rurban Aug 20 &#039;14 at 6:03<\/p>\n<p>If you want to create a hash map from an unchanging dictionary, you might want to consider perfect hashing https:\/\/en.wikipedia.org\/wiki\/Perfect_hash_function &#8211; during the construction of the hash function and hash table, you can guarantee, for a given dataset, that there will be no collisions.<\/p>\n<p>answered May 25 &#039;12 at 3:16<br \/>\nDamien<\/p>\n<p>+1 I wasn&#039;t aware of such an algorithm \u2013 Earlz May 25 &#039;12 at 5:00<\/p>\n<p>Here&#039;s more about (minimal) Perfect Hashing burtleburtle.net\/bob\/hash\/perfect.html including performance data, although it doesn&#039;t use the most current processor etc. \u2013 Ellie Kesselman May 29 &#039;12 at 12:24<\/p>\n<p>It&#039;s pretty obvious, but worth pointing out that in order to guarantee no collisions, the keys would have to be the same size as the values, unless there are constraints on the values the algorithm can capitalize on. \u2013 devios Apr 4 &#039;13 at 20:34<\/p>\n<p>I improved gperf and provide a nice frontend to most perfect hash generators at github.com\/rurban\/Perfect-Hash It&#039;s not yet finished, but already better than the existing tools. \u2013 rurban Aug 20 &#039;14 at 6:05<\/p>\n<p>@devios: I&#039;ve recently learned that there are several hash table algorithms that guarantee no collisions, even when you use long strings as keys, strings much longer than the hash table index values generated by the hash function, without any constraints on those strings. See cs.stackexchange.com\/questions\/477\/\u2026 . 
\u2013 David Cary Jun 8 &#039;15 at 17:11<\/p>\n<p>CityHash by Google is the algorithm you are looking for. It is not good for cryptography but is good for generating unique hashes.<\/p>\n<p>Read the blog for more details and the code is available here.<\/p>\n<p>CityHash is written in C++. There is also a plain C port.<\/p>\n<p>About 32-bit support:<\/p>\n<p>    All the CityHash functions are tuned for 64-bit processors. That said, they will run (except for the new ones that use SSE4.2) in 32-bit code. They won&#039;t be very fast though. You may want to use Murmur or something else in 32-bit code.<\/p>\n<p>edited May 29 &#039;12 at 11:56<br \/>\nJanX2<\/p>\n<p>answered May 25 &#039;12 at 10:29<br \/>\nVipin Parakkat<\/p>\n<p>Is CityHash pronounced similar to &quot;City Sushi?&quot; \u2013 Eric Mar 20 &#039;13 at 21:20<\/p>\n<p>Have a look at SipHash too, it is meant to replace MurmurHash\/CityHash\/etc.: 131002.net\/siphash \u2013 T\u00f6r\u00f6k Edwin Oct 15 &#039;13 at 8:47<\/p>\n<p>Also see FarmHash, a successor to CityHash. code.google.com\/p\/farmhash \u2013 stevendaniels Mar 18 &#039;15 at 13:15<\/p>\n<p>xxHash claims to be 5x faster than CityHash. \u2013 Clay Bridges May 22 &#039;15 at 15:56<\/p>\n<p>The SHA algorithms (including SHA-256) are designed to be fast.<\/p>\n<p>In fact, their speed can be a problem sometimes. 
In particular, a common technique for storing a password-derived token is to run a standard fast hash algorithm 10,000 times (storing the hash of the hash of the hash of the hash of the &#8230; password).<\/p>\n<p>#!\/usr\/bin\/env ruby<br \/>\nrequire &#039;securerandom&#039;<br \/>\nrequire &#039;digest&#039;<br \/>\nrequire &#039;benchmark&#039;<\/p>\n<p>def run_random_digest(digest, count)<br \/>\n  v = SecureRandom.random_bytes(digest.block_length)<br \/>\n  count.times { v = digest.digest(v) }<br \/>\n  v<br \/>\nend<\/p>\n<p>Benchmark.bmbm do |x|<br \/>\n  x.report { run_random_digest(Digest::SHA256.new, 1_000_000) }<br \/>\nend<\/p>\n<p>Output:<\/p>\n<p>Rehearsal &#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;<br \/>\n   1.480000   0.000000   1.480000 (  1.391229)<br \/>\n&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212;&#8212; total: 1.480000sec<\/p>\n<p>       user     system      total        real<br \/>\n   1.400000   0.000000   1.400000 (  1.382016)<\/p>\n<p>answered Feb 19 &#039;11 at 0:21<br \/>\nyfeldblum<\/p>\n<p>It&#039;s relatively fast, sure, for a cryptographic hashing algorithm. But the OP just wants to store values in a hashtable, and I don&#039;t think a cryptographic hash function is really appropriate for that. \u2013 Dean Harding Feb 19 &#039;11 at 1:10<\/p>\n<p>The question brought up (tangentially, it now appears) the subject of the cryptographic hash functions. That&#039;s the bit I am responding to. \u2013 yfeldblum Feb 22 &#039;11 at 13:14<\/p>\n<p>Just to put people off the idea of &quot;In particular, a common technique for storing a password-derived token is to run a standard fast hash algorithm 10,000 times&quot; &#8212; while common, that&#039;s just plain stupid. There are algorithms designed for these scenarios, e.g., bcrypt. Use the right tools. 
\u2013 TC1 Oct 14 &#039;13 at 13:19<\/p>\n<p>Cryptographic hashes are designed to have a high throughput, but that often means they have high setup, teardown, .rodata and\/or state costs. When you want an algorithm for a hashtable, you usually have very short keys, and lots of them, but do not need the additional guarantees of a cryptographic hash. I use a tweaked Jenkins\u2019 one-at-a-time myself. \u2013 mirabilos Dec 6 &#039;13 at 13:57<\/p>\n<p>I&#039;ve plotted a short speed comparison of different hashing algorithms when hashing files.<\/p>\n<p>The individual plots only differ slightly in the reading method and can be ignored here, since all files were stored in a tmpfs. Therefore, the benchmark was not I\/O-bound, if you are wondering.<\/p>\n<p>    Linear Scale<br \/>\n    Logarithmic scale<\/p>\n<p>Algorithms include: SpookyHash, CityHash, Murmur3, MD5, SHA{1,256,512}.<\/p>\n<p>Conclusions:<\/p>\n<p>    Non-cryptographic hash functions like Murmur3, CityHash and Spooky are pretty close together. One should note that CityHash may be faster on CPUs with SSE 4.2&#039;s CRC instruction, which my CPU does not have. SpookyHash was in my case always a tiny bit ahead of CityHash.<br \/>\n    MD5 seems to be a good tradeoff when using cryptographic hash functions, although SHA256 may be more resistant to the collision vulnerabilities of MD5 and SHA1.<br \/>\n    The complexity of all algorithms is linear &#8211; which is really not surprising since they work blockwise. (I wanted to see if the reading method makes a difference, so you can just compare the rightmost values.)<br \/>\n    SHA256 was slower than SHA512.<br \/>\n    I did not investigate the randomness of the hash functions. But here is a good comparison of the hash functions that are missing in Ian Boyd&#039;s answer. 
This points out that CityHash has some problems in corner cases.<\/p>\n<p>The source used for the plots:<\/p>\n<p>    https:\/\/github.com\/sahib\/rmlint\/tree\/gh-pages\/plots (sorry for the ugly code)<\/p>\n","protected":false},"excerpt":{"rendered":"<p>http:\/\/programmers.stackexchange.com\/que &hellip; <a href=\"https:\/\/blog.zhenglei.net\/?p=255525\">\u7ee7\u7eed\u9605\u8bfb <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[293],"class_list":["post-255525","post","type-post","status-publish","format-standard","hentry","category-linux","tag-hash"],"_links":{"self":[{"href":"https:\/\/blog.zhenglei.net\/index.php?rest_route=\/wp\/v2\/posts\/255525","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blog.zhenglei.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.zhenglei.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.zhenglei.net\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.zhenglei.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=255525"}],"version-history":[{"count":2,"href":"https:\/\/blog.zhenglei.net\/index.php?rest_route=\/wp\/v2\/posts\/255525\/revisions"}],"predecessor-version":[{"id":255527,"href":"https:\/\/blog.zhenglei.net\/index.php?rest_route=\/wp\/v2\/posts\/255525\/revisions\/255527"}],"wp:attachment":[{"href":"https:\/\/blog.zhenglei.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=255525"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.zhenglei.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=255525"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.zhenglei.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=255525"}],"curies":[{"name":"wp","href":"h
ttps:\/\/api.w.org\/{rel}","templated":true}]}}