• Which hashing algorithm is best for uniqueness and speed?

    http://programmers.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed

    I tested some different algorithms, measuring speed and number of collisions.

    I used three different key sets:

    A list of 216,553 English words (in lowercase)
    The numbers “1” to “216553” (think ZIP codes, and how a poor hash took down msn.com)
    216,553 “random” (i.e. type 4 uuid) GUIDs

    For each corpus, the number of collisions and the average time spent hashing was recorded.
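    Collision counting of this kind can be sketched in a few lines (a minimal Python harness, not the author's Delphi test program; `hash_fn` stands in for whichever algorithm is under test):

```python
from collections import Counter

def count_collisions(hash_fn, keys):
    # A collision is a key whose hash value was already produced by an
    # earlier key, so we count occupants beyond one per hash value.
    counts = Counter(hash_fn(k) for k in keys)
    return sum(c - 1 for c in counts.values() if c > 1)
```

    Calling `count_collisions(my_hash, english_words)` then gives the per-corpus collision numbers reported below.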

    I tested:

    DJB2
    DJB2a (variant using xor rather than +)
    FNV-1 (32-bit)
    FNV-1a (32-bit)
    SDBM
    CRC32
    Murmur2 (32-bit)
    SuperFastHash

    Results

    Each result contains the average hash time and the number of collisions:

    Hash           Lowercase        Random UUID      Numbers
    =============  ==============   =============    ==============
    Murmur            145 ns           259 ns           92 ns
                        6 collis         5 collis        0 collis
    FNV-1a            152 ns           504 ns           86 ns
                        4 collis         4 collis        0 collis
    FNV-1             184 ns           730 ns           92 ns
                        1 collis         5 collis        0 collis▪
    DJB2a             158 ns           443 ns           91 ns
                        5 collis         6 collis        0 collis▪▪▪
    DJB2              156 ns           437 ns           93 ns
                        7 collis         6 collis        0 collis▪▪▪
    SDBM              148 ns           484 ns           90 ns
                        4 collis         6 collis        0 collis**
    SuperFastHash     164 ns           344 ns          118 ns
                       85 collis         4 collis    18742 collis
    CRC32             250 ns           946 ns          130 ns
                        2 collis         0 collis        0 collis
    LoseLose          338 ns           –                –
                   215178 collis

    Notes:

    The LoseLose algorithm (where hash = hash+character) is truly awful. Everything collides into the same 1,375 buckets
    SuperFastHash is fast, with things looking pretty scattered; but my goodness, the number of collisions. I'm hoping the guy who ported it got something wrong; it's pretty bad
    CRC32 is pretty good. Slower, and a 1k lookup table
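    The LoseLose note above is easy to verify; a two-line Python sketch of the algorithm (mine, not the benchmark code) shows why everything piles up:

```python
def loselose(s: str) -> int:
    # hash = hash + character: just the sum of the character codes
    h = 0
    for ch in s:
        h += ord(ch)
    return h

# Anagrams always collide, because addition is order-insensitive:
assert loselose("listen") == loselose("silent")
# And the output is bounded by (word length) * ord('z'), which is why
# the whole 216,553-word corpus lands in only ~1,375 distinct buckets.
```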

    Do collisions actually happen?

    Yes. I started writing my test program to see if hash collisions actually happen – and are not just a theoretical construct. They do indeed happen:

    FNV-1 collisions

    creamwove collides with quists

    FNV-1a collisions

    costarring collides with liquid
    declinate collides with macallums
    altarage collides with zinke
    altarages collides with zinkes

    Murmur2 collisions

    cataract collides with periti
    roquette collides with skivie
    shawl collides with stormbound
    dowlases collides with tramontane
    cricketings collides with twanger
    longans collides with whigs

    DJB2 collisions

    hetairas collides with mentioner
    heliotropes collides with neurospora
    depravement collides with serafins
    stylist collides with subgenera
    joyful collides with synaphea
    redescribed collides with urites
    dram collides with vivency

    DJB2a collisions

    haggadot collides with loathsomenesses
    adorablenesses collides with rentability
    playwright collides with snush
    playwrighting collides with snushing
    treponematoses collides with waterbeds

    CRC32 collisions

    codding collides with gnu
    exhibiters collides with schlager

    SuperFastHash collisions

    dahabiah collides with drapability
    encharm collides with enclave
    grahams collides with gramary
    …snip 79 collisions…
    night collides with vigil
    nights collides with vigils
    finks collides with vinic

    Randomnessification

    The other subjective measure is how randomly distributed the hashes are. Mapping the resulting HashTables shows how evenly the data is distributed. All the hash functions show good distribution when mapping the table linearly:

    [image: all hash tables mapped linearly]

    Or as a Hilbert Map (XKCD is always relevant):

    [image: Hilbert map of the hash tables]

    Except when hashing number strings (“1”, “2”, …, “216553”) (for example, zip codes), where patterns begin to emerge in most of the hashing algorithms:

    SDBM:

    [image: SDBM number map]

    DJB2a:

    [image: DJB2a number map]

    FNV-1:

    [image: FNV-1 number map]

    All except FNV-1a, which still looks plenty random to me:

    [image: FNV-1a number map]

    In fact, Murmur2 seems to have even better randomness with Numbers than FNV-1a:

    [image: Murmur2 number map]

    When I look at the FNV-1a “number” map, I think I see subtle vertical patterns. With Murmur I see no patterns at all. What do you think?

    The extra ▪ marks in the table above denote how bad the randomness is, with FNV-1a being the best and DJB2x being the worst:

    Murmur2:       .
    FNV-1a:        .
    FNV-1:         ▪
    DJB2:          ▪▪
    DJB2a:         ▪▪
    SDBM:          ▪▪▪
    SuperFastHash: .
    CRC:           ▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪
    Loselose:      ▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪

    I originally wrote this program to decide if I even had to worry about collisions: I do.

    And then it turned into making sure that the hash functions were sufficiently random.

    FNV-1a algorithm

    The FNV-1 hash comes in variants that return 32-, 64-, 128-, 256-, 512- and 1024-bit hashes.

    The FNV-1a algorithm is:

    hash = FNV_offset_basis
    for each octetOfData to be hashed
        hash = hash xor octetOfData
        hash = hash * FNV_prime
    return hash
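    In Python, the 32-bit variant of the pseudocode above looks like this (the multiply has to be truncated to 32 bits, which languages with fixed-width integers get for free):

```python
FNV_OFFSET_BASIS_32 = 2166136261
FNV_PRIME_32 = 16777619

def fnv1a_32(data: bytes) -> int:
    # xor the octet in first, then multiply by the prime; that order
    # is exactly what distinguishes FNV-1a from FNV-1
    h = FNV_OFFSET_BASIS_32
    for octet in data:
        h ^= octet
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF
    return h
```

    As a sanity check, `fnv1a_32(b"a")` is `0xe40c292c`, matching the test vector on the FNV page.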

    Where the constants FNV_offset_basis and FNV_prime depend on the return hash size you want:

    Hash Size   Prime                        Offset
    =========   ===========================  =================================
    32-bit      16777619                     2166136261
    64-bit      1099511628211                14695981039346656037
    128-bit     309485009821345068724781371  144066263297769815596495629667062367629
    256-bit
        prime:  2^168 + 2^8 + 0x63 = 374144419156711147060143317175368453031918731002211
        offset: 100029257958052580907070968620625704837092796014241193945225284501741471925557
    512-bit
        prime:  2^344 + 2^8 + 0x57 = 35835915874844867368919076489095108449946327955754392558399825615420669938882575126094039892345713852759
        offset: 9659303129496669498009435400716310466090418745672637896108374329434462657994582932197716438449813051892206539805784495328239340083876191928701583869517785
    1024-bit
        prime:  2^680 + 2^8 + 0x8d = 5016456510113118655434598811035278955030765345404790744303017523831112055108147451509157692220295382716162651878526895249385292291816524375083746691371804094271873160484737966720260389217684476157468082573
        offset: 1419779506494762106872207064140321832088062279544193396087847491461758272325229673230371772250864096521202355549365628174669108571814760471015076148029755969804077320157692458563003215304957150157403644460363550505412711285966361610267868082893823963790439336411086884584107735010676915

    See the main FNV page for details.

    As a practical matter:

    32-bit UInt32,
    64-bit UInt64, and
    128-bit Guid can be useful

    All my results are with the 32-bit variant.

    FNV-1 better than FNV-1a?

    No. FNV-1a is better all around. There were more collisions with FNV-1a when using the English word corpus:

    Hash Word Collisions
    ====== ===============
    FNV-1 1
    FNV-1a 4

    Now compare lowercase and uppercase:

    Hash lowercase word Collisions UPPERCASE word collisions
    ====== ========================= =========================
    FNV-1 1 9
    FNV-1a 4 11

    In this case, FNV-1a isn't “400%” worse than FNV-1, only 20% worse.

    I think the more important takeaway is that there are two classes of algorithms when it comes to collisions:

    collisions rare: FNV-1, FNV-1a, DJB2, DJB2a, SDBM
    collisions common: SuperFastHash, Loselose

    And then there’s the how evenly distributed the hashes are:

    outstanding distribution: Murmur2, FNV-1a, SuperFastHash
    excellent distribution: FNV-1
    good distribution: SDBM, DJB2, DJB2a
    horrible distribution: Loselose

    Update

    Murmur? Sure, why not

    Update

    @whatshisname wondered how a CRC32 would perform, added numbers to the table.

    CRC32 is pretty good. Few collisions, but slower, and the overhead of a 1k lookup table.

    Snip all erroneous stuff about CRC distribution – my bad

    Up until today I was going to use FNV-1a as my de facto hash-table hashing algorithm. But now I’m switching to Murmur2:

    Faster
    Better randomnessification of all classes of input

    And I really, really hope there’s something wrong with the SuperFastHash algorithm I found; it’s too bad to be as popular as it is.

    Update: From the MurmurHash3 homepage on Google:

    (1) – SuperFastHash has very poor collision properties, which have been documented elsewhere.

    So I guess it’s not just me.

    Update: I realized why Murmur is faster than the others. MurmurHash2 operates on four bytes at a time. Most algorithms are byte by byte:

    for each octet in Key
        AddTheOctetToTheHash

    This means that as keys get longer Murmur gets its chance to shine.
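    The difference is easy to see in a sketch (illustrative Python of my own, not actual MurmurHash2 — the same toy mixing step in both functions, only the loop granularity changes):

```python
import struct

def bytewise(data: bytes) -> int:
    # one mixing step per octet, like FNV or djb2
    h = 0
    for b in data:
        h = ((h * 33) + b) & 0xFFFFFFFF
    return h

def wordwise(data: bytes) -> int:
    # one mixing step per 32-bit word, like MurmurHash2's inner loop;
    # the few leftover tail bytes are folded in one at a time
    h = 0
    n = len(data) - (len(data) % 4)
    for (w,) in struct.iter_unpack("<I", data[:n]):
        h = ((h * 33) + w) & 0xFFFFFFFF
    for b in data[n:]:
        h = ((h * 33) + b) & 0xFFFFFFFF
    return h
```

    For a 256-byte key the bytewise loop runs 256 iterations against the wordwise loop's 64, which is where Murmur's advantage on long keys comes from.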

    Update
    GUIDs are designed to be unique, not random

    A timely post by Raymond Chen reiterates the fact that “random” GUIDs are not meant to be used for their randomness. They, or a subset of them, are unsuitable as a hash key:

    Even the Version 4 GUID algorithm is not guaranteed to be unpredictable, because the algorithm does not specify the quality of the random number generator. The Wikipedia article for GUID contains primary research which suggests that future and previous GUIDs can be predicted based on knowledge of the random number generator state, since the generator is not cryptographically strong.

    Randomness is not the same as collision avoidance, which is why it would be a mistake to try to invent your own “hashing” algorithm by taking some subset of a “random” GUID:

    int HashKeyFromGuid(Guid type4uuid)
    {
        //A "4" is put somewhere in the GUID.
        //I can't remember exactly where, but it doesn't matter for
        //the illustrative purposes of this pseudocode
        int guidVersion = ((type4uuid.D3 & 0x0f00) >> 8);
        Assert(guidVersion == 4);

        return (int)GetFirstFourBytesOfGuid(type4uuid);
    }

    Note: Again, I put “random GUID” in quotes, because it’s the “random” variant of GUIDs. A more accurate description would be Type 4 UUID. But nobody knows what type 4, or types 1, 3 and 5 are. So it’s just easier to call them “random” GUIDs.
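    The “where the 4 goes” detail from the pseudocode is easy to check; Python's uuid module exposes the version directly (a small illustrative check, unrelated to the test code above):

```python
import uuid

u = uuid.uuid4()
# The version nibble sits at the start of the third group of the
# canonical string form: xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx
assert str(u)[14] == "4"
assert u.version == 4
```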
    All English Words mirrors

    http://www.filedropper.com/allenglishwords
    https://web.archive.org/web/20070221060514/http://www.sitopreferito.it/html/all_english_words.html

    answered Apr 23 '12 at 12:42, edited Nov 9 '15 at 18:12
    Ian Boyd

    some of the collisions make awesome band names. Particularly ‘Adorable Rentability’ – mcfinnigan Apr 23 ’12 at 14:35
    Also, I’d love to hear how you generated these results (source code and/or tools) – Earlz Apr 23 ’12 at 17:40
    @Earlz Development tool is Delphi. i assume you mean the images though. For “linear” map i created a square bitmap of size nxn, (where n = Ceil(sqrt(hashTable.Capacity))). Rather than simply black for list entry is occupied and white for list entry is empty, i used an HSLtoRGB function, where the hue ranged from 0 (red) to 300 (magenta). White is still an “empty list cell”. For the Hilbert map i had to hunt wikipedia for the algorithm that turns an index into an (x,y) coordinate. – Ian Boyd Apr 23 ’12 at 18:15
    It would be really interesting to see how SHA compares, not because it’s a good candidate for a hashing algorithm here but it would be really interesting to see how any cryptographic hash compares with these made for speed algorithms. – Michael May 25 ’12 at 15:09

    Here is a list of hash functions, but the short version is:

    If you just want a good hash function and cannot wait, djb2 is one of the best string hash functions I know. It has excellent distribution and speed on many different sets of keys and table sizes.

    unsigned long
    hash(unsigned char *str)
    {
        unsigned long hash = 5381;
        int c;

        while ((c = *str++))
            hash = ((hash << 5) + hash) + c; /* hash * 33 + c */

        return hash;
    }

    answered Feb 19 '11 at 1:13
    Dean Harding

    Actually djb2 is zero-sensitive, like most such simple hash functions, so you can easily break such hashes. It has a bad bias, too many collisions and a bad distribution; it fails most smhasher quality tests: see github.com/rurban/smhasher/blob/master/doc/bernstein His cdb database uses it, but I wouldn't use it with public access. – rurban Aug 20 '14 at 6:03

    If you want to create a hash map from an unchanging dictionary, you might want to consider perfect hashing (https://en.wikipedia.org/wiki/Perfect_hash_function) – during the construction of the hash function and hash table, you can guarantee, for a given dataset, that there will be no collisions.
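    The idea can be illustrated with a brute-force toy (not how real generators like gperf or CMPH work — they are far cleverer — but it shows the trade: search at construction time, zero collisions at lookup time):

```python
def seeded_hash(key: str, seed: int, table_size: int) -> int:
    # toy FNV-1a-style hash with an extra seed folded in
    h = 2166136261 ^ seed
    for b in key.encode():
        h = ((h ^ b) * 16777619) & 0xFFFFFFFF
    return h % table_size

def find_perfect_seed(keys, table_size, max_seed=1_000_000):
    # search for a seed under which every key gets its own slot
    for seed in range(max_seed):
        slots = {seeded_hash(k, seed, table_size) for k in keys}
        if len(slots) == len(keys):
            return seed
    raise ValueError("no collision-free seed found; enlarge the table")

keys = ["apple", "banana", "cherry", "date", "fig"]
seed = find_perfect_seed(keys, table_size=8)
```

    Once a seed is found, lookups are guaranteed collision-free for this fixed key set.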
    answered May 25 '12 at 3:16
    Damien

    +1 I wasn't aware of such an algorithm – Earlz May 25 '12 at 5:00
    Here's more about (minimal) Perfect Hashing burtleburtle.net/bob/hash/perfect.html including performance data, although it doesn't use the most current processor etc. – Ellie Kesselman May 29 '12 at 12:24
    It's pretty obvious, but worth pointing out that in order to guarantee no collisions, the keys would have to be the same size as the values, unless there are constraints on the values the algorithm can capitalize on. – devios Apr 4 '13 at 20:34

    I improved gperf and provide a nice frontend to most perfect hash generators at github.com/rurban/Perfect-Hash It's not yet finished, but already better than the existing tools. – rurban Aug 20 '14 at 6:05

    @devios: I've recently learned that there are several hash table algorithms that guarantee no collisions, even when you use long strings as keys, strings much longer than the hash table index values generated by the hash function, without any constraints on those strings. See cs.stackexchange.com/questions/477/… . – David Cary Jun 8 '15 at 17:11

    CityHash by Google is the algorithm you are looking for. It is not good for cryptography but is good for generating unique hashes.

    Read the blog for more details and the code is available here.

    CityHash is written in C++. There also is a plain C port.

    About 32-bit support:

    All the CityHash functions are tuned for 64-bit processors. That said, they will run (except for the new ones that use SSE4.2) in 32-bit code. They won't be very fast though. You may want to use Murmur or something else in 32-bit code.

    answered May 25 '12 at 10:29 by Vipin Parakkat, edited May 29 '12 at 11:56 by JanX2

    Is CityHash pronounced similar to "City Sushi?" – Eric Mar 20 '13 at 21:20

    Have a look at SipHash too, it is meant to replace MurmurHash/CityHash/etc. : 131002.net/siphash – Török Edwin Oct 15 '13 at 8:47
    Also see FarmHash, a successor to CityHash. code.google.com/p/farmhash – stevendaniels Mar 18 '15 at 13:15
    xxHash claims to be 5x faster than CityHash. – Clay Bridges May 22 '15 at 15:56

    The SHA algorithms (including SHA-256) are designed to be fast.

    In fact, their speed can be a problem sometimes. In particular, a common technique for storing a password-derived token is to run a standard fast hash algorithm 10,000 times (storing the hash of the hash of the hash of the hash of the … password).

    #!/usr/bin/env ruby
    require 'securerandom'
    require 'digest'
    require 'benchmark'

    def run_random_digest(digest, count)
      v = SecureRandom.random_bytes(digest.block_length)
      count.times { v = digest.digest(v) }
      v
    end

    Benchmark.bmbm do |x|
      x.report { run_random_digest(Digest::SHA256.new, 1_000_000) }
    end

    Output:

    Rehearsal ------------------------------------
       1.480000   0.000000   1.480000 (  1.391229)
    --------------------------- total: 1.480000sec

           user     system      total        real
       1.400000   0.000000   1.400000 (  1.382016)
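    The same chained-hashing benchmark can be sketched with Python's hashlib if Ruby isn't handy (timings will differ; the structure is the point):

```python
import hashlib
import os
import time

def run_random_digest(count: int) -> bytes:
    # hash-of-a-hash-of-a-hash..., starting from one block of random bytes
    v = os.urandom(64)  # SHA-256 block length is 64 bytes
    for _ in range(count):
        v = hashlib.sha256(v).digest()
    return v

start = time.perf_counter()
run_random_digest(1_000_000)
print(f"{time.perf_counter() - start:.3f}s for 1,000,000 rounds")
```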

    answered Feb 19 '11 at 0:21
    yfeldblum


    It's relatively fast, sure, for a cryptographic hashing algorithm. But the OP just wants to store values in a hashtable, and I don't think a cryptographic hash function is really appropriate for that. – Dean Harding Feb 19 '11 at 1:10
    The question brought up (tangentially, it now appears) the subject of the cryptographic hash functions. That's the bit I am responding to. – yfeldblum Feb 22 '11 at 13:14
    Just to put people off the idea of "In particular, a common technique for storing a password-derived token is to run a standard fast hash algorithm 10,000 times" — while common, that's just plain stupid. There are algorithms designed for these scenarios, e.g., bcrypt. Use the right tools. – TC1 Oct 14 '13 at 13:19
    Cryptographic hashes are designed to have high throughput, but that often means they have high setup, teardown, .rodata and/or state costs. When you want an algorithm for a hashtable, you usually have very short keys, and lots of them, but do not need the additional guarantees of a cryptographic hash. I use a tweaked Jenkins' one-at-a-time myself. – mirabilos Dec 6 '13 at 13:57

    I've plotted a short speed comparison of different hashing algorithms when hashing files.

    The individual plots differ only slightly in the reading method and can be ignored here, since all files were stored in a tmpfs. The benchmark was therefore not I/O-bound, in case you are wondering.

    [plots: linear scale, logarithmic scale]

    Algorithms include: SpookyHash, CityHash, Murmur3, MD5, SHA{1,256,512}.

    Conclusions:

    Non-cryptographic hash functions like Murmur3, CityHash and Spooky are pretty close together. Note that CityHash may be faster on CPUs with the SSE 4.2 CRC instruction, which my CPU does not have. SpookyHash was always a tiny bit ahead of CityHash in my case.
    MD5 seems to be a good tradeoff when a cryptographic hash function is required, although SHA-256 may be more secure given the collision vulnerabilities of MD5 and SHA-1.
    The complexity of all algorithms is linear – which is really not surprising, since they work blockwise. (I wanted to see if the reading method makes a difference, so you can just compare the rightmost values.)
    SHA-256 was slower than SHA-512.
    I did not investigate the randomness of the hash functions. But here is a good comparison of the hash functions that are missing from Ian Boyd's answer. It points out that CityHash has some problems in corner cases.

    The source used for the plots:

    https://github.com/sahib/rmlint/tree/gh-pages/plots (sorry for the ugly code)

  • VPS network speed test

    wget --no-check-certificate -O speedtest-cli.py https://github.com/sivel/speedtest-cli/raw/master/speedtest_cli.py

    python speedtest-cli.py

  • flex: input rules are too complicated (>= 32000 NFA states)

    Download the source code from SourceForge and rebuild flex with:

    sed -i "s/#define MAXIMUM_MNS 31999/#define MAXIMUM_MNS 319999/" src/flexdef.h

  • C minimal perfect hash library

    CMPH

    Differences from gperf:

    Supports large key sets
    Supports loading the hash table from a file
    Supports different algorithms

    http://iswsa.acm.org/mphf/index.html

  • AdBlock Plus

    https://adblockplus.org/zh_CN/filters

     

    Writing Adblock Plus filter rules

    The current version of Adblock Plus lets you fine-tune your filters in many different ways. This document describes the available options and how to use them.

    Disclaimer: all filters given here are merely examples and are not meant to be used directly.

    Introduction to Adblock Plus filters

    For users who only write a filter occasionally, this section's description of filters is sufficient.

    Basic filter rules

    The most trivial filters you will define are, of course, banner addresses to block. But such addresses often change every time a page is opened; it might be http://example.com/ads/banner123.gif, where 123 is a random number. Blocking the complete image address is useless here; you need a more general filter, such as http://example.com/ads/banner*.gif or even http://example.com/ads/*

    Note: do not use too many wildcards. The filter http://example.com/* would block all banners, but it would also block other content from example.com that you want to see.

    Defining exception rules

    Sometimes you will notice that a filter blocks ads well most of the time, but in some cases blocks things it shouldn't. You don't want to remove the filter, but you don't want the false positives either.

    That is what exception rules are for – they let you define cases where a filter should not be applied. For example, if you are annoyed that the filter adv also blocks http://example.com/advice.html, you can define the exception rule @@advice. Exception rules are written exactly like filters; you can use wildcards and regular expressions. You only mark the rule as an exception with the @@ prefix.

    Exception rules can do more. If an exception rule starts with http:// or https:// preceded by a pipe symbol (|), it exempts the whole page. For example, with the rule @@|http://example.com, Adblock Plus is disabled on http://example.com pages and blocks nothing there.

    Matching the beginning/end of an address

    Normally Adblock Plus treats every filter as if it had a wildcard at its beginning and its end; the filters ad and *ad* behave identically. Usually this is not a problem, but sometimes you want a filter that matches only the beginning or end of an address. For example, if you want to block all Flash and add the filter swf, it will block more than addresses ending in swf: http://www.example.com/swf/index.html would be blocked as well.

    The solution is the pipe symbol (|), which stands for the very beginning or the very end of an address. For example, the filter swf| blocks http://example.com/annoyingflash.swf but not http://example.com/swf/index.html, and the filter |http://baddomain.example/ blocks http://baddomain.example/banner.gif but not http://gooddomain.example/analyze?http://baddomain.example

    Sometimes you want to block http://example.com/banner.gif as well as https://example.com/banner.gif and http://www.example.com/banner.gif. This is done by putting two pipe symbols (||) in front of the domain part of the filter: ||example.com/banner.gif blocks all of these addresses while not blocking http://badexample.com/banner.gif or http://gooddomain.example/analyze?http://example.com/banner.gif (requires Adblock Plus 1.1 or higher).
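    The ||-anchor semantics can be approximated with an ordinary regular expression (a simplified Python sketch of my own; real Adblock Plus matching is more involved):

```python
import re

def domain_anchor_to_regex(filter_text: str) -> str:
    # '||' means: match right after the scheme, allowing any subdomains
    body = re.escape(filter_text[2:]).replace(r"\*", ".*")
    return r"^[a-z]+://([^/]+\.)?" + body

pat = re.compile(domain_anchor_to_regex("||example.com/banner.gif"))
assert pat.match("http://example.com/banner.gif")
assert pat.match("https://www.example.com/banner.gif")
assert not pat.match("http://badexample.com/banner.gif")
```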

    Marking separator characters

    Often you want to accept any separator character in a filter. For example, you might write a filter that blocks http://example.com/ and http://example.com:8000/ but not http://example.com.ar/. Here the caret symbol (^) is used as a placeholder for a separator character: http://example.com^ (requires Adblock Plus 1.1 or higher).

    A separator character can be anything but a letter, a digit, or one of _ - . %. The end of the address also counts as a separator. For example, the address http://example.com:8000/foo.bar?a=12&b=%D1%82%D0%B5%D1%81%D1%82 can be matched by the filters ^example.com^, ^%D1%82%D0%B5%D1%81%D1%82^ or ^foo.bar^

    Comments

    Any rule that starts with an exclamation mark (!) is treated as a comment. It will still show up in the filter list, in grey rather than black, and Adblock Plus will ignore it when matching, so you can write anything you want there. You can put a comment above a rule describing what it does, or put author information at the top of a filter list (as most filter-list authors do).

    Advanced features

    The features described in this section are usually only of interest to advanced users and filter-list authors. Regular users can skip this part.

    Specifying filter options

    Adblock Plus lets you specify options that change a filter's behavior. Options are listed at the end of the filter after a dollar sign ($), separated by commas (,). For example:

    */ads/*$script,match-case

    Here */ads/* is the actual filter and script and match-case are its options. The currently supported options are:

    • Type options: determine which types of elements a filter (or exception rule) applies to. A filter may explicitly name several element types. The possible types are:
      1. script — external scripts loaded via the HTML script tag
      2. image — regular images, typically loaded via the HTML img tag
      3. background — background images, typically specified via CSS
      4. stylesheet — external CSS stylesheets
      5. object — content handled by browser plugins, e.g. Flash or Java
      6. xbl — XBL bindings (typically loaded via the -moz-binding CSS property); requires Firefox 3 or higher
      7. ping — pings; requires Firefox 3 or higher
      8. xmlhttprequest — requests started by an XMLHttpRequest object; requires Firefox 3 or higher
      9. object-subrequest — requests made by plugins, e.g. Flash; requires Firefox 3 or higher
      10. dtd — DTD files loaded by XML documents; requires Firefox 3 or higher
      11. subdocument — embedded pages, typically included via HTML frames
      12. document — the page itself (only applicable to exception rules)
      13. other — requests of types not covered above (in Firefox 2 this includes XBL bindings and XMLHttpRequests)
    • Inverse type options: specify element types the filter should not apply to. The possible options are: ~script, ~image, ~background, ~stylesheet, ~object, ~xbl, ~ping, ~xmlhttprequest, ~object-subrequest, ~dtd, ~subdocument, ~document, ~other
    • third-party/first-party restriction: with the third-party option, the filter only applies to requests whose origin differs from the page currently being viewed. Similarly, ~third-party restricts the filter to requests with the same origin as the current page.
    • Domain restriction: domain=example.com means the filter only applies on pages from "example.com". Multiple domains can be separated with "|": the filter domain=example.com|example.net applies only on "example.com" or "example.net" pages. If a domain is preceded by "~", the rule does not apply on that domain's pages. For example, domain=~example.com means the filter applies on pages from any domain except example.com, while domain=example.com|~foo.example.com restricts the filter to "example.com" excluding "foo.example.com".
    • match-case — makes the filter match addresses case-sensitively. For example, the rule */BannerAd.gif$match-case blocks http://example.com/BannerAd.gif but not http://example.com/bannerad.gif
    • collapse — overrides the global "collapse blocked elements" setting and makes sure the filter always collapses the elements it blocks. Similarly, the ~collapse option makes sure the filter never collapses elements.
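    The $option syntax is mechanical enough to sketch a parser for (illustrative Python, not Adblock Plus's actual implementation):

```python
def parse_filter(line: str):
    # split "pattern$opt1,opt2" into the pattern and its option list;
    # the options, if any, follow the last '$'
    if "$" in line:
        pattern, _, opts = line.rpartition("$")
        return pattern, opts.split(",")
    return line, []

assert parse_filter("*/ads/*$script,match-case") == ("*/ads/*", ["script", "match-case"])
assert parse_filter("||example.com/banner.gif") == ("||example.com/banner.gif", [])
```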

    Using regular expressions

    If you want finer control over what your filters match, you can use regular expressions. For example, the filter /banner\d+/ matches banner123 and banner321 but not banners. See the regular-expression documentation to learn how to write them.

    Note: you should not use regular expressions to speed up the processing of your filter list. You may have heard claims to that effect, but they are outdated: since Adblock Plus 0.7, basic filters are processed faster than regular expressions.

    Element hiding

    Basic rules

    Sometimes you will find text ads embedded in a web page that cannot be blocked. Looking at the source code, you might find something like this:

        
    <div
     
    class
    =
    "textad"
    >
    
        Cheapest tofu, only here and now!
        
    <div>
    
        
    <div
     
    id
    =
    "sponsorad"
    >
    
        Really cheap tofu, click here!
        
    <div>
    
        
    <textad>
    
        Only here you get the best tofu!
        
    </textad>
    
    

    Since you have to download the page's content, you inevitably download these ads with it. All you can do is hide them so you never see them – that is what element hiding is for.

    The first ad above sits in a div container whose class attribute is "textad". The filter ##div.textad hides it; the ## marks an element-hiding rule, followed by a selector for the elements to hide. Similarly, you can hide elements by their id attribute: ##div#sponsorad hides the second ad. You don't have to specify the element name: ##*#sponsorad works as well. And you can hide by element name alone; for example, ##textad takes care of the third ad.

    The Element Hiding Helper extension helps you select the right element and write the corresponding rule without looking at the page source, although basic HTML knowledge is still useful.

    Note: element-hiding rules work very differently from normal filter rules. In particular, element-hiding rules do not support wildcards.

    Element hiding: rules restricted to specific domains

    Usually you only want to hide a specific ad on a specific site, without the rule affecting other sites. For example, ##*.sponsor might hide legitimate content on some sites. But if you write the rule as example.com##*.sponsor, it only applies on http://example.com/ and http://something.example.com/, not on http://example.org/. You can also specify several domains, separated by commas (,): domain1.example,domain2.example,domain3.example##*.sponsor

    If a domain is preceded by "~", the rule does not apply on that domain (requires Adblock Plus 1.1 or higher). For example, ~example.com##*.sponsor applies on every domain except "example.com", and example.com,~foo.example.com##*.sponsor applies on "example.com" but not on its subdomain "foo.example.com".

    Note: because of the way element hiding is implemented, you can only restrict hiding rules to complete domain names. You cannot use any other part of the address, nor can you write domain as a shorthand for domain.example,domain.test.

    Note: domain-restricted element-hiding rules can also be used to hide parts of the browser's own user interface. For example, the filter browser##menuitem#javascriptConsole hides the JavaScript Console entry in Firefox's Tools menu.

    Attribute selectors

    Some ads are not easy to hide – they have neither an id nor a class attribute. You can use other attributes to hide them; for example, ##table[width="80%"] hides table elements whose width attribute is set to 80%. If you don't want to match the full attribute value, ##div[title*="adv"] hides div elements whose title contains the string "adv". You can also test the beginning and the end of a value: ##div[title^="adv"][title$="ert"] hides div elements whose title starts with "adv" and ends with "ert". As you can see, several conditions can be combined: ##table[width="80%"][bgcolor="white"] hides tables whose width is set to 80% and whose bgcolor attribute is white.

    Advanced selectors

    In general, any CSS selector supported by Firefox can be used for element hiding. For example, the following rule hides everything that follows a div element whose class attribute is "adheader": ##div.adheader + *. For a complete list of selectors, see the W3C CSS specification (Firefox does not yet support all of them).

    Note: this feature is intended for advanced users who are comfortable with CSS selectors. Adblock Plus cannot check your selector syntax; if you use invalid CSS, you may break other filters you already have. Use the JavaScript Console to check for CSS errors.

    Simplified element-hiding syntax

    Adblock Plus still supports the simplified element-hiding syntax (e.g. #div(id=foo)) only for backward compatibility. Using it is discouraged – CSS selectors are preferred – and support for the old syntax may be dropped at some point.

  • Chain socks with http proxy upstream

    Dante supports both socks (socks4/socks5) and http proxies as an upstream proxy.

    logoutput: /var/log/sockd.log

    internal: 0.0.0.0 port=1080
    external: eth0

    clientmethod: none
    socksmethod: none

    user.privileged: root
    user.notprivileged: nobody

    timeout.negotiate: 30
    timeout.io: 86400

    client pass {
        from: 0.0.0.0/0 to: 0.0.0.0/0
        log: connect error
    }

    socks pass {
        from: 0.0.0.0/0 to: 0.0.0.0/0
        log: connect error
        protocol: tcp udp
    }

    route {
        from: 0.0.0.0/0 to: 0.0.0.0/0 via: HTTP_PROXY_IP port = HTTP_PROXY_PORT
        proxyprotocol: http
        command: connect
        protocol: tcp
        method: none
    }

  • road warrior & ssh share port 443

    With the help of SNI in stunnel, we can support both the road-warrior and ssh functions on the same TCP/443 port.

    VPS server:

    Install stunnel v5.31 (built with openssl v1.0.2), listening on port 443

    Install dante v1.4.1, listening on port 1080

    Install openssh, listening on port 22


    Stunnel config for VPS server

    chroot = /var/lib/stunnel/
    pid=/stunnel.pid
    setuid = stunnel
    setgid = stunnel

    ;debug =debug
    debug = err
    ;foreground = yes

    log = append
    ;log = overwrite
    output = /stunnel.log

    cert = /etc/stunnel/stunnel.pem
    ;key = /etc/stunnel/stunnel.pem

    verify = 3
    CApath = /certs

    ; performance
    socket = l:TCP_NODELAY=1

    ;compression = deflate
    compression = zlib

    [tls]
    accept = 0.0.0.0:443
    connect = 127.0.0.1:1080

    [ssh]
    sni = tls:22.vps.server.net
    connect = 127.0.0.1:22

    [socks]
    sni = tls:vps.server.net
    connect = 127.0.0.1:1080

    On the server, stunnel hands [ssh] connections (SNI 22.vps.server.net) to sshd on port 22, and [socks] connections (SNI vps.server.net) to dante on port 1080.


    Stunnel config for a client inside the Corp's network:

    chroot = /var/lib/stunnel/
    pid=/stunnel.pid
    setuid = stunnel
    setgid = stunnel

    ;debug = alert/crit/err/warning/notice/info/debug
    debug = err

    ;foreground = yes

    cert = /etc/stunnel/stunnel.pem

    ;compression = deflate | zlib
    compression = zlib

    client = yes

    ; performance
    socket = l:TCP_NODELAY=1

    [socks-http-proxy]
    accept = 127.0.0.1:1080
    connect = http_proxy_ip:http_proxy_port

    protocol = connect
    protocolHost = vps.server.net:443

    [ssh-http-proxy]
    accept = 0.0.0.0:22
    connect = http_proxy_ip:http_proxy_port
    protocol = connect
    protocolHost = vps.server.net:443
    sni = 22.vps.server.net

     

     

    How to

    Road warrior: set the browser's socks proxy to 127.0.0.1:1080

    SSH to vps.server.net (through the client stunnel listening on local port 22):

    ssh -p 22 user@localhost