2016 Real Estate Inheritance Taxes and Fees at a Glance

http://www.360doc.com/content/16/0311/12/7654794_541292867.shtml


Taxes and fee types
Deed tax (old rules)
A home is treated as an ordinary residence if it meets all three of the following conditions: the residential development has a plot ratio of 1.0 or above, the unit's floor area is 140 m² or less (120 m² plus a 16.7% allowance), and the actual transaction price is below 1.2 times the average transaction price of housing on land of the same grade. Ordinary residences pay deed tax at 1.5% of the transaction or appraised price; all other homes pay 3%.
Deed tax under the new policy
Ordinary residences:
Seller:
Held less than 2 years (whether or not it is the only home): 6.6%
Held 2 years but less than 5 (whether or not it is the only home): 1%
Held 5 years or more and the family's only home: exempt
Held 5 years or more but not the only home: 1%
Buyer:
Buying a home under 90 m²: 1%
Buying a home over 90 m² (only home): 1.5%
Buying a home over 90 m² (not the only home): 2%
Commercial property or property held by a company: 3%
Urban maintenance and construction tax
7% of the business tax.
Education surcharge
3% of the business tax.
Individual income tax
Ordinary residence held less than 2 years: (sale income − total purchase price − (business tax + urban construction tax + education surcharge + stamp duty)) × 20%. Ordinary residence held 2 (inclusive) to 5 years: (sale income − total purchase price − stamp duty) × 20%. Sale of reformed public housing within 5 years: (sale income − affordable-housing price − land-grant fee − reasonable expenses) × 20%, where the affordable-housing price = floor area × 4,000 yuan/m² and the land-grant fee = 1,560 yuan/m² × 1% × floor area. If the home sold is not the family's only home, individual income tax is levied at 1% of the sale price.
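To make these formulas concrete, here is a minimal Python sketch. The function name, the ordering of the checks, and the sample figures are illustrative rather than part of the regulation; the 5-year/only-home exemption quoted later in this article is folded in.

def individual_income_tax(sale_price, purchase_price, other_taxes, stamp_duty,
                          years_held, only_family_home):
    """Rough sketch of the individual income tax rules quoted above (all figures in yuan)."""
    if only_family_home and years_held >= 5:
        return 0.0                               # exempt: only home held 5+ years
    if not only_family_home:
        return sale_price * 0.01                 # not the only home: 1% of the sale price
    if years_held < 2:                           # ordinary home held less than 2 years
        base = sale_price - purchase_price - other_taxes - stamp_duty
    else:                                        # held 2 (inclusive) to 5 years
        base = sale_price - purchase_price - stamp_duty
    return max(base, 0) * 0.20

# e.g. a 1,500,000-yuan sale of a home bought for 1,000,000 and held 3 years:
print(individual_income_tax(1_500_000, 1_000_000, 0, 0, 3, only_family_home=True))  # 100000.0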
Transaction handling fee
2 yuan/m² × floor area
Stamp duty
Total transaction price × 0.05% (temporarily waived since 2009)
Business tax
Under the notice of January 27, 2011: an individual selling a home purchased less than 5 years earlier pays business tax on the full sale price; an individual selling a non-ordinary home purchased 5 or more years earlier pays business tax on the difference between the sale income and the original purchase price; an individual selling an ordinary home purchased 5 or more years earlier is exempt.
(Repealed) Home held less than 5 years: appraised value × 5.6%; ordinary home held 5 years or more: no business tax.
(Current) Home held less than 2 years: appraised value × 5.6%; ordinary home held 2 years or more: no business tax.
On March 30, 2015, the Ministry of Finance and the State Administration of Taxation issued the Notice on Adjusting the Business Tax Policy for Individual Housing Transfers (Caishui [2015] No. 39), which provides: "An individual selling a home purchased less than 2 years earlier pays business tax on the full sale price; an individual selling a non-ordinary home purchased 2 or more years earlier pays business tax on the difference between the sale income and the original purchase price; an individual selling an ordinary home purchased 2 or more years earlier is exempt from business tax."
Value-added tax (VAT)
From May 1, 2016, the pilot program replacing business tax with VAT was rolled out nationwide; all business tax payers in construction, real estate, finance and consumer services were brought into the pilot and now pay VAT instead of business tax. See the Notice of the Ministry of Finance and the State Administration of Taxation on Fully Launching the Pilot Program of Replacing Business Tax with Value-Added Tax (Caishui [2016] No. 36).
In cities other than the four first-tier cities, an individual selling a home purchased less than 2 years earlier pays VAT on the full price at a 5% levy rate; selling a home purchased 2 or more years earlier is exempt from VAT.
In the four first-tier cities (Beijing, Shanghai, Guangzhou and Shenzhen), an individual selling a home purchased less than 2 years earlier pays VAT on the full price at a 5% levy rate; for a non-ordinary home purchased 2 or more years earlier, VAT is charged at 5% on the difference between the sale income and the purchase price; an ordinary home purchased 2 or more years earlier is exempt.
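A compact reading of those VAT rules as a sketch; the city set and the function are mine, while the rates and thresholds are the ones quoted from the notice.

FIRST_TIER = {"Beijing", "Shanghai", "Guangzhou", "Shenzhen"}

def vat_due(city, sale_price, purchase_price, years_held, ordinary_home):
    """VAT on an individual's home sale under the Caishui [2016] No. 36 rules quoted above."""
    if years_held < 2:
        return sale_price * 0.05                             # full price at the 5% levy rate
    if city in FIRST_TIER and not ordinary_home:
        return max(sale_price - purchase_price, 0) * 0.05    # first-tier, non-ordinary: tax the margin
    return 0.0                                               # otherwise exempt after 2 years

print(vat_due("Shenzhen", 3_000_000, 2_000_000, 3, ordinary_home=False))  # 50000.0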
Land appreciation tax
Ordinary residences: exempt. Non-ordinary residences held less than 3 years: total transaction price × 0.5%; 3 to 5 years: total transaction price × 0.25%; 5 years or more: exempt.
Housing ownership registration fee
80 yuan; co-ownership certificate: 20 yuan
Sale contract notarization fee
Payable only when the sale contract needs notarization: total transaction price × 0.3%.
Transfer fees
(1) Deed tax: the same seller and buyer rates as listed above under "Deed tax under the new policy".
(2) Housing transaction handling fee: buyer and seller each pay 2 yuan/m² of floor area.
(3) Housing ownership registration fee: 80 yuan.
(4) Housing appraisal fee: 0.5% of the appraised value.
Calculation method
The deed tax is paid by the buyer, at the following rates:
1. Ordinary residences: 1.5% of the transaction or appraised price.
2. Non-ordinary residences: 3% of the transaction or appraised price.
Transfer fees:
(1) Deed tax: 1% for first-time buyers of homes under 90 m²; 1.5% of the price for 90-140 m²; 3% of the price for homes over 140 m². Borne by the buyer (3% for a second home).
(2) Business tax: exempt if title has been held for 5 full years, otherwise 5.8% of the price. Borne by the seller.
(3) Land appreciation tax: exempt if title has been held for 5 full years, otherwise prepaid at 1% of the price and settled under the progressive rate schedule, with any difference refunded or collected. Borne by the seller.
(4) Income tax: exempt if title has been held for 5 full years, otherwise 1% of the price or 20% of the difference between the home's original value and its current value (the original value is generally taken from the previous deed tax payment). Borne by the seller.
(5) Housing transaction handling fee: 6 yuan/m² of floor area, shared by both parties.
(6) Housing ownership registration fee: 80 yuan. Borne by the buyer.
(7) Housing appraisal fee: 0.5% of the appraised value.
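A back-of-the-envelope sketch of the fee schedule just itemised. The function, the example figures, and the assumption that nothing is exempt are mine; a sale held for five full years would drop several of these items.

def transfer_fees(price, area_m2, first_time_buyer=True):
    """Rough total of the transfer fees listed above, assuming a sale held for
    less than five years and an appraised value equal to the price (in yuan)."""
    if not first_time_buyer:
        deed_rate = 0.03          # second home: 3%
    elif area_m2 < 90:
        deed_rate = 0.01          # first home under 90 m2: 1%
    elif area_m2 <= 140:
        deed_rate = 0.015         # 90-140 m2: 1.5%
    else:
        deed_rate = 0.03          # over 140 m2: 3%
    fees = {
        "deed tax (buyer)": price * deed_rate,
        "business tax (seller)": price * 0.058,
        "land appreciation tax, prepaid (seller)": price * 0.01,
        "income tax (seller)": price * 0.01,
        "handling fee (both parties)": area_m2 * 6,
        "registration fee (buyer)": 80,
        "appraisal fee": price * 0.005,
    }
    return fees, sum(fees.values())

fees, total = transfer_fees(1_000_000, 89)
print(total)   # about 93,614 yuan for a 1,000,000-yuan, 89 m2 first home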
Transaction procedures
Normal transfer procedure
(I) Transaction taxes and fees
1. Business tax (rate 5.6%, paid by the seller)
Under the March 2015 housing policy (and consistent with the Caishui [2015] No. 39 notice quoted above): selling a home purchased less than 2 years earlier is taxed on the full price; selling a non-ordinary home purchased 2 or more years earlier is taxed on the difference between the two transaction prices; selling an ordinary home purchased 2 or more years earlier is exempt.
There are two key points here:
① Whether the purchase is more than 2 years old. Look first at the ownership certificate, then at the deed tax invoice, then at the payment receipt (for reformed public housing, the special receipt for income from the sale of state-owned housing); the earliest of the three dates governs. Generally the receipt predates the deed tax invoice, which predates the ownership certificate; for reformed public housing the earliest date is on the deposit receipt issued by the housing-reform office.
② Whether the property sold is an ordinary or a non-ordinary residence.
Also: if the property sold is non-residential (a shop, office or factory, for example), business tax is levied on the full price regardless of whether the certificate is more than 5 years old.
2. Individual income tax (rate: 1% of the transaction price or 20% of the difference between the two transactions, paid by the seller)
It is levied when a family sells a home that is not its only one. Two conditions matter: ① the home is the family's only residence, and ② it was purchased more than 5 years ago. If both are met, the tax is waived; if either is not met, it must be paid. Note: if the home is the family's only residence but was purchased less than 5 years ago, the tax is first paid as a deposit; if the seller buys another home and obtains title within one year, the deposit is refunded in full or in part, the refund being 1% of the lower of the two transaction prices.
Note: as the basis for the "only family home" test, the local tax bureau checks whether the selling couple owns any other property, including homes already registered with the housing authority even if the ownership certificate has not yet been issued (non-residential property is not counted).
Also note: if the property sold is non-residential, individual income tax is due in all cases; and where business tax was levied on the price difference, the tax bureau likewise levies individual income tax at 20% of the difference.
3. Stamp duty (rate 0.05%, split equally between buyer and seller)
It has been temporarily waived since 2009.
4. Deed tax (rate 1%, 1.5% or 3%, paid by the buyer)
How it is levied: the base rate is 3% of the transaction price; a buyer making a first purchase of an ordinary residence under 90 m² pays 1%, and a buyer making a first purchase of an ordinary residence of 90 m² or more pays 1.5%.
Note: the reduced rates require both a first purchase and an ordinary residence. The concession is assessed per person, so anyone paying deed tax for the first time qualifies. A buyer of a non-ordinary residence or of non-residential property pays 3%.
5. Surveying fee: 1.36 yuan/m²; total = 1.36 yuan/m² × surveyed area. (Under the post-April-2008 policy for reformed public housing, the fee is 200 yuan for homes under 75 m², 300 yuan for 75-144 m², and 400 yuan above 144 m².)
Reformed public housing generally needs to be surveyed; a commercial home also needs surveying if the original ownership certificate does not carry the municipal housing authority's survey stamp.
6. Second-hand housing transaction handling fee: residential, 6 yuan/m² × surveyed area; non-residential, 10 yuan/m² × surveyed area.
7. Housing ownership registration fee: 80 yuan; co-ownership certificate: 20 yuan.
(II) Documents required
1. The local tax bureau needs one set of copies of the seller's and the spouse's ID cards and household registers (plus a copy of the marriage certificate if the couple are not on the same household register), one copy of the buyer's ID card, one online-signed sale agreement, and one set of copies of the ownership certificate (plus a death certificate from the police station if the seller's spouse has died).
2. The housing authority needs one online-signed sale agreement, the original ownership certificate, two new survey drawings, and a copy of the tax-exemption or tax-payment certificate; for provincially administered reformed public housing, two original copies of the Purchased Public Housing Confirmation Form and Annex 1 are also required.
Note: for a transfer of reformed public housing the spouse must appear and sign in person. If the spouse has died but his or her years of service were used in the housing reform, then an inheritance notarization must be completed before the transfer if the death occurred after the reform, and an original death certificate issued by the police station must be submitted if it occurred before. Provincially administered reformed public housing additionally requires two copies of the Purchased Public Housing Confirmation Form stamped by the work unit and the provincial housing-reform office, together with the original housing-reform payment receipts.
Gift transfer procedure
(I) Fees: business tax and individual income tax are waived, but the following are added:
1. Notarization fee: 40 yuan/m² × the area on the ownership certificate
2. Deed tax: the full deed tax is charged regardless of the property's circumstances
All other fees are the same as for a normal transfer.
(II) Documents required
1. The notary office needs one set of copies of the seller's and the spouse's household registers and ID cards, one copy of the buyer's ID card, and one set of copies of the ownership certificate.
2. The local tax bureau can be bypassed; the transfer can be handled directly.
3. The housing authority needs essentially the same documents as for a normal transfer, plus one original notarial certificate.
Inheritance transfer
(I) Fees for an inherited property:
1. Notarization fee: 40 yuan/m² × the area on the ownership certificate
2. Inheritance notarization fee: 80 yuan per case; notarization of renunciation of inheritance: 80 yuan per person
Note: when an inherited property is resold, individual income tax is levied at 20% of the gain; however, if it is the family's only home and was purchased more than 5 years earlier, the tax is waived, and the income-tax refund policy likewise applies.
(II) Documents required
1. The notary office needs the original owner's death certificate, a copy of the ownership certificate, and one set of copies of all parties' ID cards and household registers.
2. The housing authority needs essentially the same documents as for a normal transfer, plus one notarial certificate.
Note: the difficult part of inheritance is notarizing that all the heirs waive their claims, which requires proving that the parties before the notary are all of the heirs and that each of them renounces the inheritance voluntarily.
Partition of property
Partition, also called division of property, means that the co-owners of property divide it by agreement, according to some standard, so that the shares become individually owned. The most common case is partition between spouses, either during the marriage or on divorce. The procedure is to complete a partition notarization at the notary office and then handle the transfer at the housing authority; in addition to the other documents, a copy of the divorce agreement or of the court judgment is required.
Points to note
Are the property documents complete?
The ownership certificate is the only proof that the owner holds title to the home; in a transaction without one, the buyer runs a serious risk of never obtaining the property. An owner who holds a certificate may have mortgaged or resold the home, and even an owner without one today can still mortgage or resell it once a certificate is issued, so it is best to deal only in homes that already have an ownership certificate.
Is title to the home clear?
Some homes have several co-owners, for example co-heirs, family members, or spouses; the buyer should sign the sale contract with all of them. If only some of the co-owners dispose of the jointly owned property on their own, a contract signed with them is generally invalid without the other co-owners' consent.
Is the home currently rented out?
Some second-hand homes carry an encumbrance when transferred: they are leased to a third party. A buyer who looks only at the ownership certificate and the transfer formalities and ignores an existing lease may well end up with a home they cannot move into or use, because China, like most countries, recognizes that "sale does not break a lease": the sale contract cannot defeat a lease concluded earlier. Many buyers and agencies overlook this, and many sellers exploit it, so it gives rise to a good deal of disputes.
Is the land situation clear?
The buyer should check the nature of the land-use right, whether it is allocated or granted. Allocated land is generally used free of charge and the government may take it back without compensation, whereas granted land means the owner has paid the land-grant fee and the buyer enjoys fairly complete rights over the home. The remaining term also matters: if the land-use right runs for only 40 years and the owner has already used a decade or so of it, it is a poor deal for the buyer to pay the same price as for a comparable home with a 70-year term.
Will municipal planning affect the home?
Some owners sell precisely because they know the home faces demolition within about 5 to 10 years, or that a high-rise to be built nearby may affect its light, value, and so on; the buyer should find out the full details of any such planning before purchasing.
Is the welfare housing lawful to sell?
Reformed public housing, comfortable-housing projects and affordable housing are policy housing of a welfare nature; their transfer is subject to restrictions, and state rules govern the nature of the land and the scope of ownership, so the buyer must make sure the sale contract does not conflict with national law.
Does the sale infringe the work unit's rights?
Work-unit housing is generally sold to employees either at cost price or at the standard price; in both cases the land is allocated, so a land-use fee is payable on transfer. Moreover, for standard-price housing the unit usually retains part of the ownership and has a right of first refusal when the employee sells. A buyer who overlooks this may, together with the owner, infringe the unit's lawful rights.
Are property-management fees in arrears?
Some owners transfer a home while owing substantial property-management fees, electricity bills and gas or heating charges; a buyer who purchases without knowing may end up having to bear them all.
Is the agency acting improperly?
Some agencies provide improper services, for example arranging "zero down payment" on a second-hand mortgage, meaning the entire purchase price is effectively obtained from the bank by deception. The buyer may think they have got a bargain, but if the bank finds out, the buyer may have to bear all of the consequences.
Is the contract specific enough?
A second-hand sale contract need not be as comprehensive as a new-home contract, but the details still have to be spelled out: the parties, warranties of title, the price, the transaction method, liability for breach, dispute resolution, the signing date and so on should all be covered.
The five hurdles of a transaction
Pricing
The most critical step in a second-hand sale is valuing the home: an inexperienced owner who prices too high will struggle to find a buyer, while one who prices too low takes a financial loss.
Suggestion: entrust the home to a reputable agency, which can produce a reasonably fair valuation using the market-comparison, income or cost approach. The commonly used market-comparison approach requires an appraiser with plenty of transaction experience and market knowledge, able to price the home fairly accurately from its location, orientation, decoration, age and other factors.
The contract
In recent years disputes have kept arising because the two sides sign contracts that are not rigorous enough. Many details need attention when signing: fixtures and fittings, the payment method, the exact handover date, who pays which taxes and fees, and so on.
Suggestion: if a reputable agency is engaged and a deal is struck, it will supply a standard, detailed contract, which spares both sides a great deal of trouble and avoids disputes caused by sloppy contracts. The seller should then settle all property-management and heating fees in preparation for the next step, the title transfer.
The transfer
Registering the title transfer is the most time- and energy-consuming part of the process. It involves a string of policies and regulations and cumbersome formalities; a buyer without knowledge of property transactions, experience, or familiarity with the authorities' procedures can run around endlessly and still not get everything done, at great cost in effort and money.
Suggestion: reputable agencies can handle the transfer on the parties' behalf; their staff know the relevant rules and procedures and are experienced, so entrusting the transfer to a professional agency saves time, effort and worry, and is most buyers' first choice.
Payment
Handing over the purchase money is what clients worry about most: with sums running from over a hundred thousand to several hundred thousand yuan, any slip means a heavy loss. Settlement risk is real: the buyer fears paying the seller and then, if the transfer goes wrong, never receiving the certificate; the seller fears that once the certificate is in the buyer's name, the buyer will withhold the money.
Suggestion: to address this, large, reputable agencies now offer an escrow ("intermediary guarantee") service: the buyer deposits the purchase money with the agency and the seller deposits the original ownership certificate; the agency handles the transfer, and once the new certificate is issued it hands the money to the seller and the new certificate to the buyer, protecting both sides.
Handover
The handover of the property is the final step in the transaction; once it is completed smoothly the client can move in with peace of mind and the former owner is no longer responsible for the home.
Suggestion: at handover the agency provides a Property Handover Form for both sides to fill in and confirm. The checks mainly cover whether the property matches what the contract promises, whether the furniture and other items to be removed have been cleared out, whether the keys have been handed over, and whether water, electricity, gas, telephone, cable-TV and other charges have been settled.
Tax and fee rules
2013 tax rules
Down-payment ratio for second homes may be raised
Differentiated housing-credit policy will continue to be strictly enforced. In cities where prices are rising too fast, the local branch of the People's Bank of China may, in line with the municipal government's price-control targets for new commercial housing and with policy requirements, further raise the down-payment ratio and mortgage rate for second homes.
Where the individual income tax due on the sale of an owner-occupied home can be verified against the home's original value through tax-collection records, property-registration records or other historical information, it must be levied strictly at 20% of the gain on transfer.
Prefecture-level cities to network housing information by the end of the 12th Five-Year Plan
The notice also requires municipal and county governments to publish their annual housing-land supply plans in the first quarter. By the end of 2013, cities at prefecture level and above must bring eligible migrant workers in stable employment into local housing-security coverage. Construction of urban personal-housing information systems is to be pushed forward vigorously, and by the end of the 12th Five-Year Plan all cities at prefecture level and above are, in principle, to be networked. A long-term mechanism to guide the healthy development of the property market is to be established and improved.
On a 1.5-million-yuan home, the income tax rises by 240,000 yuan
Mr. Wu, who has worked in Beijing for six years, has just paid a deposit on a home, but because of the change in second-hand-housing policy he may face paying more than 200,000 yuan of additional individual income tax.
In Beijing, all taxes and fees on a second-hand purchase are borne by the buyer. Mr. Wu, for example, bought a small Class-2 affordable home in Chaoyang District worth 1.5 million yuan; the home has passed the five-year mark and may be traded, but it is not the owner's only residence.
Under the current policy Mr. Wu pays, on the owner's behalf, individual income tax of 1% of the total price, i.e. 1,500,000 × 1% = 15,000 yuan. Under the new policy, because the affordable home originally cost only 200,000 yuan, tax on the difference would come to (1,500,000 − 200,000) × 20% = 260,000 yuan.
The extra 240,000-odd yuan of tax gives Mr. Wu a headache. He says the small home is for his own use, but the extra 200,000-plus now squeezes out non-investors like him. He hopes the policy will be refined so that non-investment homes receive an individual-income-tax reduction or refund.
Timetable for this round of controls
● All municipalities, cities with independent planning status, and provincial capitals (except Lhasa) must, on the principle of keeping prices basically stable, set annual price-control targets for newly built commercial housing (excluding government-subsidized housing) and publish them in the first quarter.
● Municipal and county governments must publish their annual housing-land supply plans in the first quarter.
● By the end of 2013, cities at prefecture level and above must bring eligible migrant workers in stable employment into local housing-security coverage.
● Construction of urban personal-housing information systems is to be pushed forward vigorously; by the end of the 12th Five-Year Plan, all cities at prefecture level and above are in principle to be networked.
■ Implementation in Beijing
■ Commentary
Policy 2
Differentiated credit may be applied to second homes
In cities where prices are rising too fast, the local branch of the People's Bank of China may, in line with policy requirements, further raise the down-payment ratio and mortgage rate for second homes.
Rumors have recently been rife that the down payment on second-home mortgages would rise to 70% and the rate to 1.3 times the benchmark. The notice published yesterday does not spell out a specific policy for second-home loans, but it does say they may be tightened further.
Chen Zhi and Hu Jinghui both believe that each locality will set its second-home policy in light of its own situation: besides further tightening, the state has also explicitly encouraged demand from households upgrading their homes, so second homes should not be treated indiscriminately; local banks' own circumstances also matter.
"If the down payment really goes to 70%, there is little difference from paying in full," Hu Jinghui said. He argued that second homes should also get differentiated credit: if the second home being bought is an ordinary residence, the ratio should not be raised further.
Policy 3
Purchase restrictions to cover whole cities
The restricted area should cover a city's entire administrative territory.
Chen Zhi sees the requirement that restrictions cover the whole administrative area as one of the toughest clauses in the detailed rules. Previously, some economically developed cities restricted purchases while their surrounding districts did not, leaving room for speculative buying and driving prices up rapidly in the unrestricted areas. The new round of controls makes clear that a restricted city must cover its entire administrative territory.
"The areas around developed cities, such as Xianghe and Zhuozhou near Beijing and the outskirts of Shanghai and Hangzhou, should also be restricted under the policy. That closes every loophole, leaves speculative buying no market at all, and returns housing entirely to its owner-occupied purpose," Chen Zhi said.
Different types of housing should be treated differently
Hu Jinghui notes that reformed public housing and affordable housing bought years ago cost only two or three thousand yuan, or even a few hundred yuan, per square meter, while they now sell for tens of thousands, which means the 20% tax would fall on almost the entire transaction amount.
"Add business tax, deed tax and the rest, and the taxes on a 3-million-yuan home come to 800,000 or 900,000 yuan," Hu Jinghui said. He believes Beijing's implementation rules should differentiate the tax treatment according to the nature of the home.
Who pays the taxes and fees
For housing transactions the state specifies which fees each side bears, so it is clear who should pay what: the buyer bears the transaction taxes on the purchase, and the seller bears the taxes due on the proceeds of the sale. Since the state's 2006 "Fifteen Measures", however, which provide that "from June 1, 2006, the resale of a home held for less than five years is subject to business tax on the full sale price", the picture has become more complicated.
Under the relevant state rules, individual income tax, business tax, land appreciation tax and the education surcharge are all payable by the seller; the first two arise from the sale of the home and the seller is the one who profits, so it is reasonable for the seller to pay them. But these taxes account for a large share of the price and have become an important factor in how sellers set prices. Many sellers therefore lower the asking price accordingly and simply quote a "net" price, leaving all taxes and fees to the buyer. In business, though, a willing seller meets a willing buyer: when each side feels the deal is worth it, the transaction goes through.
As for who should pay, the state does have explicit rules, and making the buyer pay everything is, strictly speaking, unreasonable. In practice, however, taxes and price are linked: if the seller pays the taxes the price will be somewhat higher, and if the buyer pays them the price will be somewhat lower. Agreeing in the contract on who pays is consistent with the principle of party autonomy in civil law, so from that angle it is also reasonable for the buyer to pay.

Open Source UDP File Transfer Tool Comparison
February 3, 2016 | Michael C

When it comes to Internet protocols, TCP has been the dominant protocol used across the web to form connections. TCP helps computers communicate by breaking large data sets into individual packets, transmitting them, and then reassembling them in their original order once they have been received. But as file sizes grew and latency became an issue, the User Datagram Protocol (UDP) gained popularity. UDP-based transfer tools pick up the slack by offering higher throughput, particularly for very large files over long-distance or high-latency links, where TCP's acknowledgment round-trips become the bottleneck.

When comparing the architecture of the two protocols, the main difference is that UDP sends packets without waiting for an acknowledgment that each one has arrived, which means lower bandwidth overhead and latency. TCP, on the other hand, sends packets in order and waits for each acknowledgment before sending more. To better understand the pros and cons of each protocol, below is a basic comparison of the two, followed by a minimal socket-level sketch of the difference:
Feature                  | TCP                                                                  | UDP
Connection               | Connection-oriented: messages travel over one established connection | Connectionless: a program sends packets without first establishing a connection
Usage                    | Applications needing high reliability where timing is less critical  | Applications needing fast transmission
Used by other protocols  | HTTP, HTTPS, FTP, SMTP, Telnet                                       | DNS, DHCP, TFTP, SNMP, RIP, VoIP
Reliability              | All transferred data is guaranteed to arrive in the order specified  | No guarantee of arrival; ordering must be managed by the application layer
Header size              | 20 bytes                                                             | 8 bytes
Data flow control        | Yes; a three-packet handshake sets up the socket before user data can be sent | No flow control
Error checking           | Yes                                                                  | Yes, but with no recovery options
Acknowledgments          | Yes, required before the next transfer takes place                   | No, which allows for faster transmission
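To make the connectionless style concrete, here is a minimal Python sketch of a fire-and-forget UDP file sender and a matching receiver. The host, port and chunk size are arbitrary choices for illustration, and a real transfer tool layers sequencing, acknowledgments and retransmission on top of this.

import socket

def udp_receiver(host="127.0.0.1", port=9999):
    """Bind a UDP socket and read whatever datagrams arrive: no accept(),
    no connection, and no guarantee of delivery or ordering."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    while True:
        data, addr = sock.recvfrom(65535)            # one datagram per call
        print(f"got {len(data)} bytes from {addr}")

def udp_send_file(path, host="127.0.0.1", port=9999, chunk=1400):
    """Chunk a file and fire the chunks at the receiver without waiting for
    acknowledgments; reliability is the application's problem."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    with open(path, "rb") as f:
        while block := f.read(chunk):
            sock.sendto(block, (host, port))

Everything a TCP socket provides for free (ordering, retransmission, backpressure) has to be rebuilt at the application layer, which is exactly where the tools compared below differ.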

There’s a clear tradeoff between the two when it comes to speed versus reliability, but as UDP-based tools have matured they have become more and more trustworthy. Recently, Google announced that its open source, UDP-based QUIC protocol carries about 50% of Chrome’s traffic to Google servers, a share expected to increase in the coming years. As the market shifts and the source code continues to advance, UDP is quickly becoming the file transfer approach of the future. So which UDP tool should you use?
Choosing a UDP Tool

When it comes to choosing a UDP tool there are two main options: buy one from a commercial vendor, or install open source software. Given the relative immaturity of most open source software, it may not be as user friendly, but there are many options out there, so paying developers to manage and configure your UDP transfers is not strictly necessary.

Currently, there are six main UDP file transfer tools available as open source.

Tsunami UDP Protocol: Uses TCP for control and UDP for data transfer over high-speed, long-distance networks. It was designed specifically to offer more throughput than is possible with TCP over the same networks.
UDT (UDP-based Data Transfer): Designed to support global transfer of terabyte-sized data sets, using UDP to carry the bulk of the data with its own reliability control mechanisms.
ENet: Its main goal is to provide a thin, simple and robust network communication layer on top of UDP with reliable, ordered delivery of packets, which it accomplishes by leaving out higher-level networking features.
UFTP: Specializes in distributing large files to a large number of receivers, especially when the distribution takes place over a satellite link, where the delay makes TCP communication inefficient. It has been used widely by The Wall Street Journal to send WSJ pages over satellite to its remote printing plants.
GridFTP: Unlike all the others, GridFTP is based on FTP rather than UDP, but it can be used to solve the same problems encountered when using plain TCP.
Google QUIC: An experimental UDP-based network protocol designed at Google to support multiplexed connections between two endpoints, provide security protection equivalent to TLS/SSL, and reduce latency and bandwidth usage.

Below is a features comparison chart to better help you understand the side by side differences each system supports.

Feature                    | Tsunami          | UDT                        | ENet             | UFTP                       | GridFTP           | Google QUIC
Multi-threaded             | No               | No                         | Yes              | No                         | Yes               | Yes
Protocol overhead          | 20%              | 10%                        | N/A              | ~10%                       | 6-8%, same as TCP | N/A
Encryption                 | No               | No                         | No               | Yes                        | Yes               | Yes
C++ source code            | Yes              | Yes                        | Yes              | Yes                        | Yes               | Yes
Java source code           | No               | Partial                    | No               | No                         | No                | No
Command line               | No               | No                         | No               | Yes                        | Yes               | Yes
Distribution packages      | Source code only | Source code only           | Source code only | Yes                        | Yes               | Yes
UDP-based point-to-point   | Yes              | Yes                        | Yes              | Yes                        | No                | Yes
Firewall friendly          | No               | Partial, no auto-detection | No               | Partial, no auto-detection | No                | No, has had issues with stateful firewalls
Congestion control         | Yes              | Yes                        | N/A              | Yes                        | Yes, using TCP    | Yes
Automatic retry and resume | No               | No                         | Yes              | No (manual resume: yes)    | Yes               | Yes
Jumbo packets              | No               | Yes                        | No               | Yes, up to 8800 bytes      | Yes               | N/A
Support for packet loss    | No               | No                         | Yes              | No                         | Yes               | Yes

Conclusion

When deciding which UDP tool to use, you have several factors to keep in mind. Many of the issues arise when dealing with packet loss and recovery. For instance, UDT has been known to fail completely if even slight packet loss is reported, whereas Google QUIC has made significant advances in addressing this issue. GridFTP may also not be the most user friendly, since it requires a much larger framework and more orchestration to run, given that it is not UDP-based. Considering all these factors in tandem, hopefully you’ll be able to make a more informed decision when implementing UDP file transfer tools on your network.

SSL 3.0 POODLE

The Evolution of SSL/TLS, Part 10 - Security Issues: The POODLE Attack on SSLv3


1. The curious POODLE

POODLE is short for Padding Oracle On Downgraded Legacy Encryption. Whether by coincidence or by design, "poodle" in English is also the name of a dog breed, so Western write-ups of the POODLE attack like to illustrate it with pictures of this pretty little dog, giving the impression that it is SSL's mascot.

Westerners have a habit of giving every project a resounding name (Project Galileo and the like) or an acronym (like SSL) before they even start. POODLE is both an acronym and a catchy name, so it spread around the world quickly. We have picked up this trick fast as well, Baidu's Fengchao (Phoenix Nest) being one example.

The POODLE attack was disclosed by Google researchers in September 2014, so it counts as a recent attack; the write-up is [POODLE]. One of its authors also discovered the 2011 BEAST attack. POODLE targets only SSLv3, but it is essentially very similar to BEAST:

1. It exploits a flaw in CBC-mode block encryption.

2. Its goal is to recover sensitive information from cookies.

3. The guessing works by chosen-plaintext injection, one byte at a time.

4. It collapses the search space: with the POODLE flaw, guessing n bytes takes O(n) probes, about 256 * n requests in total. Without the flaw, pure brute-force guessing would take 2^(8n) attempts.

But it is simpler to carry out than BEAST, and there is no workaround of the kind BEAST had; the only fix is to move to a newer protocol version.

2. How the attack works

POODLE targets only SSLv3 and does not work against TLS 1.0. It exploits a design flaw in the protocol: the padding bytes of an SSLv3 encrypted block are random and unchecked.

Section 5.2.3.2 of RFC 6101 (SSLv3) defines the record format under CBC block ciphers:

block-ciphered struct {
    opaque content[SSLCompressed.length];
    opaque MAC[CipherSpec.hash_size];                   /* 1. The padding has no integrity protection: the MAC covers only the content, the content length, and the sequence number. */
    uint8 padding[GenericBlockCipher.padding_length];   /* 2. The padding is random. That looks harder to guess, but it is a design mistake: random and unauthenticated means the receiver does not care what these bytes contain. */
    uint8 padding_length;
} GenericBlockCipher;

Below is one possible plaintext layout of an HTTP POST; the path and the body are the parts the upper layer, in other words the attacker, can control:

"POST /path ... Cookie: name=value... \r\n\r\nbody ‖ 20-byte MAC ‖ padding"

Suppose encryption produces the blocks C1, C2, ..., Cn (C0 is the IV, which we do not see). For the i-th block, processing works as follows:

1. Pi = Dk(Ci) ⊕ Ci-1, where k is the key, Dk is the decryption function, and Pi is the recovered plaintext.

2. Check and strip the padding at the end of Pn (Pn may consist entirely of padding).

3. Check and strip the MAC, leaving the plaintext.

By injecting chosen plaintext, the attacker arranges two things:

1. Adjust the length of the "path" so that the cookie byte not yet guessed sits at the end of a block (call the ciphertext block containing that byte Ci).

2. Then adjust the length of the "body" so that the last block (Cn) consists entirely of padding (that is, the total POST length is a multiple of the block size).

This is clearly achievable.

The attacker then saves Ci, substitutes it for the final block Cn, and sends the record to the server. Since the padding is filled with random bytes, the record goes through whenever the last byte of the substituted block happens to decrypt to the right value. Suppose each block is 16 bytes and the last byte is Ci[15]. Then

Dk(Ci)[15] ⊕ Cn-1[15] = 15    /* 15 is the padding length */

Combining this with the earlier formula:

Pi = Dk(Ci) ⊕ Ci-1

15 ⊕ Cn-1[15] ⊕ Ci-1[15] = (Dk(Ci)[15] ⊕ Cn-1[15]) ⊕ Cn-1[15] ⊕ Ci-1[15] = Dk(Ci)[15] ⊕ Ci-1[15] = Pi[15]

This recovers the last byte.

Since the attacker cannot guarantee that every value is tried exactly once, one can only say that statistically a hit occurs after about 256 attempts.
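To see this arithmetic in action, here is a self-contained Python toy. It is not real SSL: the "block cipher" is just a keyed byte permutation, the MAC is omitted, and the record is simplified to one secret block plus one padding block; only the CBC/XOR bookkeeping and the SSLv3-style padding check follow the description above.

import os, random

BLOCK = 16

# Stand-in "block cipher": a keyed byte permutation (NOT real crypto; only the
# CBC/XOR arithmetic matters for illustrating the attack).
_perm = list(range(256))
random.Random(1).shuffle(_perm)
_inv = [0] * 256
for i, v in enumerate(_perm):
    _inv[v] = i

def E(b):
    return bytes(_perm[x] for x in b)

def D(b):
    return bytes(_inv[x] for x in b)

def cbc_encrypt(plaintext, iv):
    out, prev = [], iv
    for i in range(0, len(plaintext), BLOCK):
        c = E(bytes(x ^ y for x, y in zip(plaintext[i:i + BLOCK], prev)))
        out.append(c)
        prev = c
    return b"".join(out)

def sslv3_accepts(ciphertext, iv):
    # SSLv3-style check: only the padding *length* byte is validated; the
    # padding bytes themselves are random and carry no MAC.
    blocks = [iv] + [ciphertext[i:i + BLOCK] for i in range(0, len(ciphertext), BLOCK)]
    last = bytes(x ^ y for x, y in zip(D(blocks[-1]), blocks[-2]))
    return last[-1] == BLOCK - 1          # a block made entirely of padding: length byte == 15

secret = b"Cookie: name=vX;"              # toy secret occupying exactly one block
attempts = 0
while True:
    attempts += 1
    iv = os.urandom(BLOCK)                                             # each retry is a fresh request
    plaintext = secret + os.urandom(BLOCK - 1) + bytes([BLOCK - 1])    # final block: padding only
    ct = cbc_encrypt(plaintext, iv)
    c = [iv] + [ct[i:i + BLOCK] for i in range(0, len(ct), BLOCK)]     # c[0]=IV, c[1]=C_i, c[2]=C_n
    forged = ct[:-BLOCK] + c[1]                                        # replace the final block with C_i
    if sslv3_accepts(forged, iv):
        # Dk(C_i)[15] ^ C_{n-1}[15] == 15, so P_i[15] = 15 ^ C_{n-1}[15] ^ C_{i-1}[15]
        recovered = 15 ^ c[1][-1] ^ c[0][-1]
        print("recovered last byte:", bytes([recovered]), "after", attempts, "attempts")
        break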

3. Countermeasure: TLS_FALLBACK_SCSV

The fix defines a dummy cipher suite, TLS_FALLBACK_SCSV {0x56, 0x00}. If the client is connecting with a version lower than the highest it supports, it must include this signal, which tells the server: "I am retrying at a lower version because my previous connection attempt failed." If the server sees the signal but never asked this client to downgrade, then a man-in-the-middle must have lowered the version number in the client's earlier attempt in order to mount an attack, and the server aborts the handshake.
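A sketch of the server-side logic: the function and its arguments are illustrative, while the suite value {0x56, 0x00} and the inappropriate_fallback alert come from the draft cited in the references.

TLS_FALLBACK_SCSV = 0x5600    # dummy cipher-suite value {0x56, 0x00}

def server_check_fallback(client_version, offered_suites, server_max_version):
    """If the client signals that it is retrying at a downgraded version but we
    actually support something newer, somebody tampered with the handshake."""
    if TLS_FALLBACK_SCSV in offered_suites and client_version < server_max_version:
        raise ConnectionError("alert: inappropriate_fallback")
    # otherwise continue the normal handshake

# A client pushed down to SSLv3 (0x0300) while the server supports TLS 1.2 (0x0303):
# server_check_fallback(0x0300, [TLS_FALLBACK_SCSV, 0x002F], 0x0303)  # raises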

When does this signal not help?

1. One side genuinely supports nothing higher than SSLv3.

2. One side does not support the signal at all.

My first reaction to this scheme was: what if the man-in-the-middle strips the signal from both sides of the negotiation? Then I realized that the Session Hash mechanism I had dismissed as overwrought a few days ago does the job here: it protects the integrity of every handshake message sent, so the man-in-the-middle can no longer tamper with them (to be confirmed).

Another question is why a special cipher suite, a rather tricky device, was used instead of a standard extension. The answer is that TLS itself is immune to this attack, so it is SSLv3 that must carry the signal, and existing SSLv3 implementations cannot be modified; a special cipher suite is the form of extension that requires the least code change.

4. Why TLS 1.0 is immune

Let us look at which change in TLS 1.0 fixes the problem.

The relevant data structures in RFC 2246 (TLS 1.0) are unchanged, and the MAC still does not cover the padding, but the content of the padding is now specified:

"Each uint8 in the padding data vector must be filled with the padding length value."

In other words, every padding byte is filled with the padding-length value. If the padding is 7 bytes long, the 8 bytes including the length byte read "7, 7, 7, 7, 7, 7, 7, 7". Tampering with the padding is therefore detected, and the attack degrades to brute force.
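A short sketch of that receiver-side rule (illustrative only, not taken from any implementation):

def tls10_padding_ok(decrypted_record):
    # TLS 1.0 rule: every padding byte must equal the padding-length byte.
    pad_len = decrypted_record[-1]
    padding = decrypted_record[-(pad_len + 1):-1]
    return len(padding) == pad_len and all(b == pad_len for b in padding)

print(tls10_padding_ok(b"some data" + b"\x07" * 8))                          # True: 7,7,7,7,7,7,7 then length 7
print(tls10_padding_ok(b"some data" + b"\x01\x02\x03\x04\x05\x06\x07\x07"))  # False: tampered padding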

References

1. POODLE, https://www.openssl.org/~bodo/ssl-poodle.pdf

2. BEAST, and the issue of Firefox splitting HTTPS records, http://www.unclekevin.org/?p=38

3. TLS SCSV - TLS Fallback Signaling Cipher Suite Value (SCSV) for Preventing Protocol Downgrade Attacks, draft-ietf-tls-downgrade-scsv-05, https://datatracker.ietf.org/doc/draft-ietf-tls-downgrade-scsv/?include_text=1

This entry was published on March 6, 2015, under the categories SSL/TLS, Security, and Network Protocols.

http://support.fortinet.com.cn/uploadfile/2015/0513/20150513020838240.pdf

SSL 3.0 POODLE Vulnerability
Version: 1.0
Date: April 2015
Author: Zhang Yanlong (ylzhang@fortinet.com)
Affected versions: FortiOS (4.3.X, 5.0.X, 5.2.X)
Status: Draft
1. Introduction
1.1 The SSL 3.0 POODLE vulnerability
On October 15, 2014, Google researchers disclosed a very serious vulnerability in the SSL 3.0 protocol. It can be exploited by attackers to intercept encrypted data in transit between browser and server, such as online-banking accounts, e-mail accounts and other private information. The flaw allows a downgrade attack: the attacker tricks the browser into believing that the server does not support the more secure Transport Layer Security (TLS) protocol and forces it to fall back to SSL 3.0; once the browser is talking to the server over SSL 3.0, a man-in-the-middle can decrypt the HTTPS cookies. Google named this the POODLE attack; under a POODLE attack, data transmitted on the connection can no longer be considered encrypted. SSL 3.0 has been superseded by TLS 1.0, TLS 1.1 and TLS 1.2, but for compatibility reasons most TLS implementations still support it.
1.2 How the SSL 3.0 POODLE attack works
For compatibility, most browser versions still support SSL 3.0. The TLS handshake includes version negotiation, and normally the newest protocol version supported by both client and server is used: the client first offers the newest version it supports, and if that handshake fails it retries with an older version. An attacker who can mount a man-in-the-middle attack can make the negotiations at the newer protocol versions fail and thereby force a downgrade, so that client and server end up talking over the insecure SSL 3.0. At that point, because SSL 3.0's CBC block-cipher construction is flawed, the attacker can decrypt parts of the SSL connection, for example the user's cookie data. This is the POODLE attack (Padding Oracle On Downgraded Legacy Encryption). The vulnerability affects the vast majority of SSL servers and clients, so its reach is broad, but to exploit it the attacker must be able to control the traffic between client and server (that is, perform a man-in-the-middle attack). Users' browsers normally connect with the newest security protocol; for compatibility, when that connection fails they fall back to older protocols, including SSL 3.0. The essence of POODLE is that the attacker deliberately makes the secure handshakes fail, triggers the browser's fallback to SSL 3.0, and then uses special techniques to extract a certain number of bytes of private information from the connection that SSL 3.0 is supposed to protect.
1.3 SSL protocol essentials
The SSL protocol was developed by Netscape in the United States; version 3.0 was released in 1996. SSL 3.0 has been around for some 15 years and is still supported by the vast majority of browsers, but it is an obsolete and insecure protocol. SSL (Secure Sockets Layer) is a security protocol for Web applications. It sits between TCP and the application layer; its main job is to encrypt application-layer data such as HTTP and FTP and carry it to its destination over the reliable TCP protocol. The most typical application is HTTPS.
1.4 SSL provides three basic security services:
1) Authenticity: sender and receiver confirm each other's identity and ensure that neither can be impersonated.
2) Confidentiality: all transmitted data is encrypted, so that it cannot be read even if intercepted.
3) Integrity: the data received is identical to the data sent and has not been tampered with.
1.5 The main cryptographic algorithms used by SSL:
1) Asymmetric algorithms: encryption and decryption use different keys, as in the RSA public-key algorithm. They offer a high security level and are hard to break, but encryption and decryption are slow, so they are suitable only for small amounts of data. SSL uses asymmetric cryptography for digital signatures, to verify the identity of the sender (or receiver), and to exchange keys (the key of the symmetric cipher used to encrypt the data, and the key of the MAC algorithm used to verify integrity).
2) Symmetric algorithms: encryption and decryption use the same key; DES, 3DES and RC4 are examples. They are fast and suited to large volumes of data, but less secure. SSL uses symmetric cryptography to encrypt the transmitted data.
3) MAC algorithms: Message Authentication Codes. A MAC is a keyed hash function; it builds on the properties of the MD and SHA families and adds a secret key. SSL uses a MAC to verify message integrity.
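To make the MAC idea concrete, here is a tiny standard-library Python illustration. The key is a made-up stand-in for the MAC secret that SSL derives during its handshake, and real SSL/TLS record MACs are computed over a specific record format that is not reproduced here.

import hmac, hashlib

mac_key = b"negotiated-mac-secret"       # stand-in for the handshake-derived MAC key
record = b"application data"

tag = hmac.new(mac_key, record, hashlib.sha256).digest()   # sender appends this tag to the record
# receiver recomputes the tag over what it received and compares in constant time
print(hmac.compare_digest(tag, hmac.new(mac_key, record, hashlib.sha256).digest()))  # True if intact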

Which hashing algorithm is best for uniqueness and speed?

http://programmers.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed

I tested some different algorithms, measuring speed and number of collisions.

I used three different key sets:

A list of 216,553 English words (in lowercase)
The numbers “1” to “216553” (think ZIP codes, and how a poor hash took down msn.com)
216,553 “random” (i.e. type 4 uuid) GUIDs

For each corpus, the number of collisions and the average time spent hashing was recorded.

I tested:

DJB2
DJB2a (variant using xor rather than +)
FNV-1 (32-bit)
FNV-1a (32-bit)
SDBM
CRC32
Murmur2 (32-bit)
SuperFastHash

Results

Each result contains the average hash time, and the number of collisions

Hash          | Lowercase             | Random UUID        | Numbers
============= | ===================== | ================== | =====================
Murmur        | 145 ns, 6 collis      | 259 ns, 5 collis   | 92 ns, 0 collis
FNV-1a        | 152 ns, 4 collis      | 504 ns, 4 collis   | 86 ns, 0 collis
FNV-1         | 184 ns, 1 collis      | 730 ns, 5 collis   | 92 ns, 0 collis ▪
DJB2a         | 158 ns, 5 collis      | 443 ns, 6 collis   | 91 ns, 0 collis ▪▪▪
DJB2          | 156 ns, 7 collis      | 437 ns, 6 collis   | 93 ns, 0 collis ▪▪▪
SDBM          | 148 ns, 4 collis      | 484 ns, 6 collis   | 90 ns, 0 collis **
SuperFastHash | 164 ns, 85 collis     | 344 ns, 4 collis   | 118 ns, 18742 collis
CRC32         | 250 ns, 2 collis      | 946 ns, 0 collis   | 130 ns, 0 collis
LoseLose      | 338 ns, 215178 collis | -                  | -

Notes:

The LoseLose algorithm (where hash = hash + character) is truly awful: everything collides into the same 1,375 buckets.
SuperFastHash is fast, with things looking pretty scattered, but my goodness, the number of collisions. I’m hoping the guy who ported it got something wrong; it’s pretty bad.
CRC32 is pretty good. Slower, though, and it needs a 1 KB lookup table.

Do collisions actually happen?

Yes. I started writing my test program to see if hash collisions actually happen – and are not just a theoretical construct. They do indeed happen:

FNV-1 collisions

creamwove collides with quists

FNV-1a collisions

costarring collides with liquid
declinate collides with macallums
altarage collides with zinke
altarages collides with zinkes

Murmur2 collisions

cataract collides with periti
roquette collides with skivie
shawl collides with stormbound
dowlases collides with tramontane
cricketings collides with twanger
longans collides with whigs

DJB2 collisions

hetairas collides with mentioner
heliotropes collides with neurospora
depravement collides with serafins
stylist collides with subgenera
joyful collides with synaphea
redescribed collides with urites
dram collides with vivency

DJB2a collisions

haggadot collides with loathsomenesses
adorablenesses collides with rentability
playwright collides with snush
playwrighting collides with snushing
treponematoses collides with waterbeds

CRC32 collisions

codding collides with gnu
exhibiters collides with schlager

SuperFastHash collisions

dahabiah collides with drapability
encharm collides with enclave
grahams collides with gramary
…snip 79 collisions…
night collides with vigil
nights collides with vigils
finks collides with vinic

Randomnessification

The other subjective measure is how randomly distributed the hashes are. Mapping the resulting HashTables shows how evenly the data is distributed. All the hash functions show good distribution when mapping the table linearly:

[image omitted]

Or as a Hilbert Map (XKCD is always relevant):

[image omitted]

Except when hashing number strings (“1”, “2”, …, “216553”) (for example, zip codes), where patterns begin to emerge in most of the hashing algorithms:

SDBM:

[image omitted]

DJB2a:

[image omitted]

FNV-1:

[image omitted]

All except FNV-1a, which still looks plenty random to me:

[image omitted]

In fact, Murmur2 seems to have even better randomness with Numbers than FNV-1a:

[image omitted]

When I look at the FNV-1a “number” map, I think I see subtle vertical patterns. With Murmur I see no patterns at all. What do you think?

The extra ▪ marks in the results table above denote how bad the randomness is, with FNV-1a being the best and DJB2x being the worst:

Murmur2: .
FNV-1a: .
FNV-1: ▪
DJB2: ▪▪
DJB2a: ▪▪
SDBM: ▪▪▪
SuperFastHash: .
CRC: ▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪
Loselose: ▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪▪ (the bar keeps going)

I originally wrote this program to decide if I even had to worry about collisions: I do.

And then it turned into making sure that the hash functions were sufficiently random.
FNV-1a algorithm

The FNV1 hash comes in variants that return 32, 64, 128, 256, 512 and 1024 bit hashes.

The FNV-1a algorithm is:

hash = FNV_offset_basis
for each octetOfData to be hashed
hash = hash xor octetOfData
hash = hash * FNV_prime
return hash

Where the constants FNV_offset_basis and FNV_prime depend on the return hash size you want:

Hash Size Prime Offset
=========== =========================== =================================
32-bit 16777619 2166136261
64-bit 1099511628211 14695981039346656037
128-bit 309485009821345068724781371 144066263297769815596495629667062367629
256-bit
prime: 2^168 + 2^8 + 0x63 = 374144419156711147060143317175368453031918731002211
offset: 100029257958052580907070968620625704837092796014241193945225284501741471925557
512-bit
prime: 2^344 + 2^8 + 0x57 = 35835915874844867368919076489095108449946327955754392558399825615420669938882575126094039892345713852759
offset: 9659303129496669498009435400716310466090418745672637896108374329434462657994582932197716438449813051892206539805784495328239340083876191928701583869517785
1024-bit
prime: 2^680 + 2^8 + 0x8d = 5016456510113118655434598811035278955030765345404790744303017523831112055108147451509157692220295382716162651878526895249385292291816524375083746691371804094271873160484737966720260389217684476157468082573
offset: 1419779506494762106872207064140321832088062279544193396087847491461758272325229673230371772250864096521202355549365628174669108571814760471015076148029755969804077320157692458563003215304957150157403644460363550505412711285966361610267868082893823963790439336411086884584107735010676915

See the main FNV page for details.

As a practical matter:

32-bit UInt32,
64-bit UInt64, and
128-bit Guid can be useful

All my results are with the 32-bit variant.
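For reference, here is a straightforward Python rendering of that 32-bit variant, using the offset basis and prime from the table above. This is my own sketch, not the benchmark code behind the results; the sanity check at the end uses the "costarring"/"liquid" pair listed among the FNV-1a collisions earlier.

FNV_OFFSET_BASIS_32 = 2166136261
FNV_PRIME_32 = 16777619

def fnv1a_32(data: bytes) -> int:
    h = FNV_OFFSET_BASIS_32
    for octet in data:
        h ^= octet
        h = (h * FNV_PRIME_32) & 0xFFFFFFFF   # keep the running hash in 32 bits
    return h

print(fnv1a_32(b"costarring") == fnv1a_32(b"liquid"))   # True: one of the collisions listed above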
FNV-1 better than FNV-1a?

No. FNV-1a is better all around. There were more collisions with FNV-1a when using the English word corpus:

Hash Word Collisions
====== ===============
FNV-1 1
FNV-1a 4

Now compare lowercase and uppercase:

Hash lowercase word Collisions UPPERCASE word collisions
====== ========================= =========================
FNV-1 1 9
FNV-1a 4 11

In this case FNV-1a isn’t “400%” worse than FNV-1, only 20% worse.

I think the more important takeaway is that there are two classes of algorithms when it comes to collisions:

collisions rare: FNV-1, FNV-1a, DJB2, DJB2a, SDBM
collisions common: SuperFastHash, Loselose

And then there’s the how evenly distributed the hashes are:

outstanding distribution: Murmur2, FNV-1a, SuperFastHash
excellent distribution: FNV-1
good distribution: SDBM, DJB2, DJB2a
horrible distribution: Loselose

Update

Murmur? Sure, why not

Update

@whatshisname wondered how a CRC32 would perform, added numbers to the table.

CRC32 is pretty good. Few collisions, but slower, and the overhead of a 1k lookup table.

Snip all erroneous stuff about CRC distribution – my bad

Up until today I was going to use FNV-1a as my de facto hash-table hashing algorithm. But now I’m switching to Murmur2:

Faster
Better randomnessification of all classes of input

And I really, really hope there’s something wrong with the SuperFastHash algorithm I found; it’s too bad to be as popular as it is.

Update: From the MurmurHash3 homepage on Google:

(1) – SuperFastHash has very poor collision properties, which have been documented elsewhere.

So I guess it’s not just me.

Update: I realized why Murmur is faster than the others. MurmurHash2 operates on four bytes at a time. Most algorithms are byte by byte:

for each octet in Key
AddTheOctetToTheHash

This means that as keys get longer Murmur gets its chance to shine.

Update
GUIDs are designed to be unique, not random

A timely post by Raymond Chen reiterates the fact that “random” GUIDs are not meant to be used for their randomness. They, or a subset of them, are unsuitable as a hash key:

Even the Version 4 GUID algorithm is not guaranteed to be unpredictable, because the algorithm does not specify the quality of the random number generator. The Wikipedia article for GUID contains primary research which suggests that future and previous GUIDs can be predicted based on knowledge of the random number generator state, since the generator is not cryptographically strong.

Randomness is not the same as collision avoidance, which is why it would be a mistake to try to invent your own “hashing” algorithm by taking some subset of a “random” GUID:

int HashKeyFromGuid(Guid type4uuid)
{
    // A "4" is put somewhere in the GUID.
    // I can't remember exactly where, but it doesn't matter for
    // the illustrative purposes of this pseudocode.
    int guidVersion = ((type4uuid.D3 & 0x0f00) >> 8);
    Assert(guidVersion == 4);

    return (int)GetFirstFourBytesOfGuid(type4uuid);
}

Note: Again, I put “random GUID” in quotes, because it’s the “random” variant of GUIDs. A more accurate description would be Type 4 UUID. But nobody knows what type 4, or types 1, 3 and 5 are. So it’s just easier to call them “random” GUIDs.
All English Words mirrors

http://www.filedropper.com/allenglishwords
https://web.archive.org/web/20070221060514/http://www.sitopreferito.it/html/all_english_words.html


answered Apr 23 '12 at 12:42 by Ian Boyd (edited Nov 9 '15 at 18:12)

Comments:

"Some of the collisions make awesome band names. Particularly 'Adorable Rentability'." – mcfinnigan, Apr 23 '12
"Also, I'd love to hear how you generated these results (source code and/or tools)." – Earlz, Apr 23 '12
"@Earlz The development tool is Delphi. I assume you mean the images, though. For the 'linear' map I created a square bitmap of size n x n, where n = Ceil(sqrt(hashTable.Capacity)). Rather than simply black for 'list entry is occupied' and white for 'list entry is empty', I used an HSLtoRGB function with the hue ranging from 0 (red) to 300 (magenta); white is still an empty list cell. For the Hilbert map I had to hunt Wikipedia for the algorithm that turns an index into an (x, y) coordinate." – Ian Boyd, Apr 23 '12
"It would be really interesting to see how SHA compares, not because it's a good candidate for a hashing algorithm here, but it would be really interesting to see how any cryptographic hash compares with these made-for-speed algorithms." – Michael, May 25 '12
Answer (30 votes)

Here is a list of hash functions, but the short version is:

If you just want a good hash function and cannot wait, djb2 is one of the best string hash functions I know. It has excellent distribution and speed on many different sets of keys and table sizes:

unsigned long
hash(unsigned char *str)
{
unsigned long hash = 5381;
int c;

while (c = *str++)
hash = ((hash << 5) + hash) + c; /* hash * 33 + c */

return hash;
}

answered Feb 19 '11 at 1:13 by Dean Harding

Comment: "Actually djb2 is zero sensitive, as most such simple hash functions are, so you can easily break such hashes. It has a bad bias, too many collisions and a bad distribution; it breaks on most smhasher quality tests: see github.com/rurban/smhasher/blob/master/doc/bernstein. His cdb database uses it, but I wouldn't use it with public access." – rurban, Aug 20 '14
Answer (26 votes)

If you are wanting to create a hash map from an unchanging dictionary, you might want to consider perfect hashing https://en.wikipedia.org/wiki/Perfect_hash_function – during the construction of the hash function and hash table, you can guarantee, for a given dataset, that there will be no collisions.
answered May 25 '12 at 3:16 by Damien

Comments:

"+1 I wasn't aware of such an algorithm." – Earlz, May 25 '12
"Here's more about (minimal) perfect hashing, including performance data, at burtleburtle.net/bob/hash/perfect.html, although it doesn't use the most current processors etc." – Ellie Kesselman, May 29 '12
"It's pretty obvious, but worth pointing out that in order to guarantee no collisions, the keys would have to be the same size as the values, unless there are constraints on the values the algorithm can capitalize on." – devios, Apr 4 '13
"I improved gperf and provide a nice frontend to most perfect hash generators at github.com/rurban/Perfect-Hash. It's not yet finished, but already better than the existing tools." – rurban, Aug 20 '14
"@devios: I've recently learned that there are several hash table algorithms that guarantee no collisions, even when you use long strings as keys, strings much longer than the hash table index values generated by the hash function, without any constraints on those strings. See cs.stackexchange.com/questions/477/…" – David Cary, Jun 8 '15
Answer (20 votes)

CityHash by Google is the algorithm you are looking for. It is not good for cryptography but is good for generating unique hashes.

Read the blog for more details and the code is available here.

CityHash is written in C++. There also is a plain C port.

About 32-bit support:

All the CityHash functions are tuned for 64-bit processors. That said, they will run (except for the new ones that use SSE4.2) in 32-bit code. They won't be very fast though. You may want to use Murmur or something else in 32-bit code.

answered May 25 '12 at 10:29 by Vipin Parakkat (edited May 29 '12 by JanX2)

Comments:

"Is CityHash pronounced similar to 'City Sushi'?" – Eric, Mar 20 '13
"Have a look at SipHash too; it is meant to replace MurmurHash/CityHash/etc.: 131002.net/siphash" – Török Edwin, Oct 15 '13
"Also see FarmHash, a successor to CityHash: code.google.com/p/farmhash" – stevendaniels, Mar 18 '15
"xxHash claims to be 5x faster than CityHash." – Clay Bridges, May 22 '15
Answer (12 votes)

The SHA algorithms (including SHA-256) are designed to be fast.

In fact, their speed can be a problem sometimes. In particular, a common technique for storing a password-derived token is to run a standard fast hash algorithm 10,000 times (storing the hash of the hash of the hash of the hash of the … password).

#!/usr/bin/env ruby
require 'securerandom'
require 'digest'
require 'benchmark'

def run_random_digest(digest, count)
v = SecureRandom.random_bytes(digest.block_length)
count.times { v = digest.digest(v) }
v
end

Benchmark.bmbm do |x|
x.report { run_random_digest(Digest::SHA256.new, 1_000_000) }
end

Output:

Rehearsal ————————————
1.480000 0.000000 1.480000 ( 1.391229)
————————— total: 1.480000sec

user system total real
1.400000 0.000000 1.400000 ( 1.382016)

answered Feb 19 '11 at 0:21 by yfeldblum

Comments:

"It's relatively fast, sure, for a cryptographic hashing algorithm. But the OP just wants to store values in a hashtable, and I don't think a cryptographic hash function is really appropriate for that." – Dean Harding, Feb 19 '11
"The question brought up (tangentially, it now appears) the subject of the cryptographic hash functions. That's the bit I am responding to." – yfeldblum, Feb 22 '11
"Just to put people off the idea of 'a common technique for storing a password-derived token is to run a standard fast hash algorithm 10,000 times': while common, that's just plain stupid. There are algorithms designed for these scenarios, e.g. bcrypt. Use the right tools." – TC1, Oct 14 '13
"Cryptographic hashes are designed to have a high throughput, but that often means they have high setup, teardown, .rodata and/or state costs. When you want an algorithm for a hashtable, you usually have very short keys, and lots of them, but do not need the additional guarantees of a cryptographic hash. I use a tweaked Jenkins one-at-a-time myself." – mirabilos, Dec 6 '13
Answer (9 votes)

I've plotted a short speed comparasion of different hashing algorithms when hashing files.

The individual plots only differ slightly in the reading method and can be ignored here, since all files were stored in a tmpfs. Therefore the benchmark was not IO-Bound if you are wondering.

[plots omitted: linear scale and logarithmic scale]

Algorithms include: SpookyHash, CityHash, Murmur3, MD5, SHA{1,256,512}.

Conclusions:

Non-cryptographic hash functions like Murmur3, CityHash and Spooky are pretty close together. One should note that CityHash may be faster on CPUs with the SSE 4.2 CRC instruction, which my CPU does not have. In my case SpookyHash was always a tiny bit ahead of CityHash.
MD5 seems to be a good tradeoff when using cryptographic hash functions, although SHA256 may be more secure against the collision vulnerabilities of MD5 and SHA1.
The complexity of all algorithms is linear, which is really not surprising since they work blockwise. (I wanted to see if the reading method makes a difference, so you can just compare the rightmost values.)
SHA256 was slower than SHA512.
I did not investigate the randomness of the hash functions. But here is a good comparison of the hash functions that are missing in Ian Boyd's answer. This points out that CityHash has some problems in corner cases.

The source used for the plots:

https://github.com/sahib/rmlint/tree/gh-pages/plots (sorry for the ugly code)