I have written a new version of the PBKDF2 algorithm in Haskell. It passes almost all of the HMAC-SHA-1 test vectors listed in RFC 6070, but it is not very efficient. How can I improve the code?
When I run it on the test vectors, the third case (see below) never finishes (I left it running for more than 1/2 hour on a 2010 Macbook Pro).
I believe that the foldl'
is my problem. Will foldr
perform better, or do I need to use mutable arrays?
{-# LANGUAGE BangPatterns #-}
{- Copyright 2013, G. Ralph Kuntz, MD. All rights reserved. LGPL License. -}
module Crypto where
import Codec.Utils (Octet)
import qualified Data.Binary as B (encode)
import Data.Bits (xor)
import qualified Data.ByteString.Lazy.Char8 as C (pack)
import qualified Data.ByteString.Lazy as L (unpack)
import Data.List (foldl')
import Data.HMAC (hmac_sha1)
import Text.Bytedump (dumpRaw)
-- Calculate the PBKDF2 as a hexadecimal string
pbkdf2
:: ([Octet] -> [Octet] -> [Octet]) -- pseudo random function (HMAC)
-> Int -- hash length in bytes
-> String -- password
-> String -- salt
-> Int -- iterations
-> Int -- derived key length in bytes
-> String
pbkdf2 prf hashLength password salt iterations keyLength =
let
passwordOctets = stringToOctets password
saltOctets = stringToOctets salt
totalBlocks =
ceiling $ (fromIntegral keyLength :: Double) / fromIntegral hashLength
blockIterator message acc =
foldl' (\(a, m) _ ->
let !m' = prf passwordOctets m
in (zipWith xor a m', m')) (acc, message) [1..iterations]
in
dumpRaw $ take keyLength $ foldl' (\acc block ->
acc ++ fst (blockIterator (saltOctets ++ intToOctets block)
(replicate hashLength 0))) [] [1..totalBlocks]
where
intToOctets :: Int -> [Octet]
intToOctets i =
let a = L.unpack . B.encode $ i
in drop (length a - 4) a
stringToOctets :: String -> [Octet]
stringToOctets = L.unpack . C.pack
-- Calculate the PBKDF2 as a hexadecimal string using HMAC and SHA-1
pbkdf2HmacSha1
:: String -- password
-> String -- salt
-> Int -- iterations
-> Int -- derived key length in bytes
-> String
pbkdf2HmacSha1 =
pbkdf2 hmac_sha1 20
Third test vector
Input:
P = "password" (8 octets)
S = "salt" (4 octets)
c = 16777216
dkLen = 20
Output:
DK = ee fe 3d 61 cd 4d a4 e4
e9 94 5b 3d 6b a2 15 8c
26 34 e9 84 (20 octets)
I was able to get it to complete in ~16 min on my MacBookPro:
by changing the strictness of your fold:
Note how I force the full evaluation of each
zipWith xor
. In order to calculatesum cs
into WHNF, we must know the exact value of each element incs
.This prevents building up a chain of thunks, which I think your existing code was attempting to do, but failing, as
foldl'
only forces the accumulator into WHNF. Since your accumulator was a pair, the WHNF is just(_thunk, _another_thunk)
, so your intermediate thunks were not getting forced.