Catena Variants

Lucks, Stefan; Wenzel, Jakob

doi:10.1007/978-3-319-29938-9_7

Stefan Lucks¹⁷ &
Jakob Wenzel¹⁷

Part of the book series: Lecture Notes in Computer Science ((LNSC,volume 9551))

Included in the following conference series:

International Conference on Passwords

1050 Accesses
1 Citations

Abstract

Catena is a password-scrambling framework characterized by its high flexibility. The user (defender) can simply adapt the underlying (cryptographic) primitives, the underlying memory-hard function, and the time ($\lambda $) and memory (garlic) parameters, to render it suitable for a wide range of applications. This enables Catena to maximize the defense against specific adversaries, their capabilities and goals, and to cope with a high variation of hardware and constraints on the side of the defender. Catena has obtained special recognition of the Password Hashing Competition (PHC), alongside of the winner Argon2.

In addition to the default instantiations presented in the PHC submission, we want to use this document to introduce further variants of Catena, or rather, further instantiations of the Catena framework. Our instantiations use different hash functions, and we evaluate their influence on the computational time and the throughput. Next, we discuss how instantiations of the memory-hard graph-based algorithm influence the computational time and resistance against low-memory attacks. Furthermore, we introduce possible extensions of Catena accommodating strong resistance against GPU- and ASIC-based adversaries, e.g., by providing sequential memory-hardness due to a data-dependent indexing function. At the end, we combine particular instantiations discussed so far to construct full-fledged variants of Catena for certain goals. Hence, this document can be seen as an additional guide to the PHC submission of Catena when considering its usage under certain restrictions.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 34.99; Price excludes VAT (USA)

Softcover Book: USD 44.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
Note that the tweak is given as the hash value of an instantiation-specific identifier, a mode, $\lambda $, the output size and the size of the salt in bit, and the associated data. Thus, the tweak is unique for each system and each user within a system.

References

Alwen, J., Serbinenko, V.: High parallel complexity graphs and memory-hard functions. In: Proceedings of the Forty-Seventh Annual ACM on Symposium on Theory of Computing, STOC 2015, Portland, OR, USA, June 14–17, 2015, pp. 595–603 (2015)
Google Scholar
Aumasson, J.-P.: Password Hashing Competition (2015). https://password-hashing.net/call.html. Accessed 3 September 2015
Aumasson, J.-P.: Password Hashing Competition - Candidates. https://password-hashing.net/candidates.html
Aumasson, J.-P., Neves, S., Wilcox-O’Hearn, Z., Winnerlein, C.: BLAKE2: Simpler, Smaller, Fast as MD5. In: Jacobson, M., Locasto, M., Mohassel, P., Safavi-Naini, R. (eds.) ACNS 2013. LNCS, vol. 7954, pp. 119–135. Springer, Heidelberg (2013)
Chapter Google Scholar
Bernstein, D.J.: Cache-timing attacks on AES (2005)
Google Scholar
Biryukov, A., Dinu, D., Khovratovich, D.: Argon2. Password Hashing Competition, Winner (2015). https://www.cryptolux.org/index.php/Argon2
Biryukov, A., Khovratovich, D.: Tradeoff cryptanalysis of Catena. PHC mailing list: discussions@password-hashing.net
Google Scholar
Biryukov, A., Khovratovich, D.: Tradeoff cryptanalysis of memory-hard functions. IACR Cryptol. ePrint Arch. 2015, 227 (2015)
MathSciNet MATH Google Scholar
Bonneau, J.: The science of guessing: analyzing an anonymized corpus of 70 million passwords. In: 2012 IEEE Symposium on Security and Privacy, May 2012
Google Scholar
Brent, R.P., Gaudry, P., Thomé, E., Zimmermann, P.: Faster Multiplication in GF(2) [x]. In: ANTS, pp. 153–166 (2008)
Google Scholar
Cox, B.: TwoCats (and SkinnyCat): A Compute Time and Sequential Memory Hard Password Hashing Scheme (2014). https://password-hashing.net/submissions/specs/TwoCats-v0.pdf
Forler, C., List, E., Lucks, S., Wenzel, J.: Overview of the candidates for the password hashing competition - and their resistance against garbage-collector attacks. IACR Cryptol. ePrint Arch. 2014, 881 (2014)
Google Scholar
Forler, C., Lucks, S., Wenzel, J.: Catena: A Memory-Consuming Password Scrambler. Cryptology ePrint Archive, Report 2013/525 (2013). http://eprint.iacr.org/
Forler, Christian, Lucks, Stefan, Wenzel, Jakob: Memory-demanding password scrambling. In: Sarkar, Palash, Iwata, Tetsu (eds.) ASIACRYPT 2014, Part II. LNCS, vol. 8874, pp. 289–305. Springer, Heidelberg (2014)
Google Scholar
Forler, C., Lucks, S., Wenzel, J.: The Catena Password-Scrambling Framework. Password Hashing Competition, 2nd round submission (2015). https://password-hashing.net/submissions/specs/Catena-v3.pdf
funkysash. catena-variants (2015). https://github.com/medsec/catena-variants
Gray, F.: Pulse Code Communication. Bell Telephone Labor Inc., New York (1953). US Patent 2,632,058,
Google Scholar
Gueron, S., Kounavis, M.E.: Intel carry-less multiplication instruction and its usage for computing the GCM Mode - Rev 2.01. Intel White Paper. Technical report, Intel corporation, September 2012
Google Scholar
Harris, B.: Replacement index function for data-independent schemes (Catena) (2015). http://article.gmane.org/gmane.comp.security.phc/2457/match=grey
HPSchilling. catena-variants (2015). https://github.com/HPSchilling/catena-variants
Kaliski, B.: RFC 2898 - PKCS #5: Password-Based cryptography specification Version 2.0. Technical report, IETF (2000)
Google Scholar
Lengauer, T., Tarjan, R.E.: Asymptotically tight bounds on time-space trade-offs in a pebble game. J. ACM 29(4), 1087–1130 (1982)
Article MathSciNet MATH Google Scholar
Lystad, T.A.: Leaked password lists and dictionaries - The Password Project. http://thepasswordproject.com/leaked_password_lists_and_dictionaries. Accessed 16 May 2013
McGrew, D.A., Viega, J.: The security and performance of the Galois/Counter Mode (GCM) of operation. In: Canteaut, A., Viswanathan, K. (eds.) INDOCRYPT 2004. LNCS, vol. 3348, pp. 343–355. Springer, Heidelberg (2004)
Chapter Google Scholar
Percival, C.: Stronger Key Derivation via Sequential Memory-Hard Functions. presented at BSDCan 2009, May 2009
Google Scholar
Peslyak, A.: yescrypt - a Password Hashing Competition submission (2015). https://password-hashing.net/submissions/specs/yescrypt-v1.pdf
Pornin, T.: The MAKWA Password Hashing Function (2015). https://password-hashing.net/submissions/specs/Makwa-v1.pdf
Provos, N., Mazières, D.: A future-adaptable password scheme. In: USENIX Annual Technical Conference, FREENIX Track, pp. 81–91. USENIX (1999)
Google Scholar
Shand, M., Bertin, P., Vuillemin, J.: Hardware speedups in long integer multiplication. In: SPAA, pp. 138–145 (1990)
Google Scholar
Simplicio, M., Almeida, L., dos Santos, P., Barreto, P.: The Lyra2 reference guide. Password Hashing Competition, 2nd round submission (2015). https://password-hashing.net/submissions/specs/Lyra2-v2.pdf
Soderquist, P., Leeser, M.: An area/performance comparison of subtractive and multiplicative divide/square root implementations. In: 12th Symposium on Computer Arithmetic ARITH-12 1995, July 19–21, 1995, Bath, England, UK, pp. 132–139 (1995)
Google Scholar
Cox, B.: MultHash - A simple multiplication speed limited hash function (2014). https://github.com/medsec/catena/blob/3a3ce823d4c54f2da33757bf8f6389488c31bd93/src/catena-multhash.c. (waywardgeek)

Download references

Acknowledgement

We would like to thank S. Schmidt and H. Schilling for their work on the reference implementation of Catena as well as on the tool Catena-Variants, E. List for his helpful comments and fruitful discussions, and H. Schilling for his analysis of the underlying graph-based structures. Furthermore, we would like to thank the reviewers of the Passwords 2015 for their helpful comments.

Author information

Authors and Affiliations

Bauhaus-Universität Weimar, Weimar, Germany
Stefan Lucks & Jakob Wenzel

Authors

Stefan Lucks
View author publications
You can also search for this author in PubMed Google Scholar
Jakob Wenzel
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Jakob Wenzel .

Editor information

Editors and Affiliations

University of Cambridge, Cambridge, UK
Frank Stajano
Norwegian University of Science and Technology, Trondheim, Norway
Stig F. Mjølsnes
University of Cambridge, Cambridge, UK
Graeme Jenkinson
God Praksis AS, Fyllingsdalen, Norway
Per Thorsheim

Appendices

A Memory-Hardness and Garbage-Collector Attacks

1.1 A.1 Memory-Hardness

In this part of the paper we show the definition of memory-hardness as it was described in [15]. The intuition is that for any parallelized attack (using $b $ cores) the required memory is decreased by a factor of $1/b $ per core, and vice versa.

Definition 1

(Memory-Hard Function). Let g denote the memory cost factor. For all $\alpha >0$, a memory-hard function f can be computed on a Random Access Machine using S(g) space and T(g) operations, where $S(g) \in \varOmega (T(g)^{1-\alpha })$.

Thus, for $S\cdot T=G^2$ with $G = 2^g$, using $b $ cores, we have

$$ \left( \frac{1}{b}\cdot S\right) \cdot \left( b \cdot T\right) = G^2. $$

A formal generalization of this notion is given in the following definition.

Definition 2

($\lambda -$Memory-Hard Function). Let g denote the memory cost factor. For a $\lambda -$memory-hard function f, which is computed on a Random Access Machine using S(g) space and T(g) operations with $G = 2^g$, it holds that

$$ T(g) = \varOmega \left( \frac{G^{\lambda +1}}{S(g)^{\lambda }}\right) . $$

Thus, we have

$$ \left( \frac{1}{b}\cdot S^\lambda \right) \cdot \left( b \cdot T\right) = G^{\lambda +1}. $$

Since we also consider instantiations of Catena which fulfill the definition of sequential memory-hardness (see [25]), we highly recommend the reader to have a look at the discussion $\lambda $-Memory-Hard vs. Sequential Memory-Hard given in Sect. 2.2 of [15].

1.2 A.2 (Weak) Garbage-Collector Attacks

The motivation behind these attacks is to exploit the fact that a password scrambler may leave its password-dependent internal state in memory for a long time (during its invocation). An adversary then gains benefits if either the memory access to the internal state depends on the password (Garbage-Collector Attack) or if a value directly derived from the password (e.g., using a fast hash function) remains in memory (Weak Garbage-Collector Attack). Based on such attacks, an adversary is able to filter password candidates using significant less time/memory in comparison to the original algorithm.

Definition 3

(Garbage-Collector Attack). Let ${P}_{G}(\cdot ) $ be a memory-consuming password scrambler that depends on a memory-cost parameter G and let Q be a positive constant. Furthermore, let v denote the internal state of ${P}_{G}(\cdot ) $ after its termination. Let $\mathcal {A}$ be a computationally unbounded but always-halting adversary conducting a garbage-collector attack. We say that $\mathcal {A}$ is successful if some knowledge about v reduces the runtime of $\mathcal {A}$ for testing a password candidate x from $\mathcal {O}({P}_{G}(x))$ to $\mathcal {O}(f(x))$ with $\mathcal {O}(f(x)) \lll \mathcal {O}({P}_{G}(x))/Q, \forall x \in \{0,1\}^{*}$.

In the following we define the Weak Garbage-Collector (WGC) Attack.

Definition 4

(Weak Garbage-Collector Attack). Let ${P}_{G}(\cdot ) $ be a password scrambler that depends on a memory-cost parameter G, and let $R(\cdot )$ be an underlying function of ${P}_{G}(\cdot ) $ that can be computed efficiently. We say that an adversary $\mathcal {A}$ is successful in terms of a weak garbage-collector attack if a value $y=R({\textit{pwd}})$ remains in memory during (almost) the entire runtime of ${P}_{G}({\textit{pwd}})$, where pwd denotes the secret input.

An adversary that is capable of reading the internal memory of a password scrambler during its invocation gains knowledge about y. Thus, it can reduce the effort for filtering invalid password candidates by just computing $y' = R(x)$ and checking whether $y = y'$, where x denotes the current password candidate. Note that the function R can also be the identity function. Then, the plain password remains in memory, rendering WGC attacks trivial.

B Hash-Function Instantiations

1.1 B.1 Compression Function of Argon2

Here we consider the underlying compression function of the PHC winner Argon2 [6]. The function CF (within the specification of Argon2, this function is called G, but since we use G already for the variants of the round function of BLAKE2b, we rename it here to CF) is built upon a permutation P using the BLAKE2b round function $G_L$ as defined in Lyra2 and is formally defined in Algorithm 4. Note that the function $G_L$, in comparison to the original one of BLAKE2b ($G_O$), does neither consider the message schedule nor handling of the message input. The input state words are directly written to the internal state (a, b, c, d) and then processed as shown in Algorithm 5 (left).

The function CF, using $G_L$, takes two 8192-bit values X, Y as input and outputs a 8192-bit value Z. The value $R = X \oplus Y$ (Line 1) is represented as an $8\times 8$ matrix consisting of 64 128-bit values $R_0,\ldots ,R_{63}$ ordered from top to down and left to right. First, all rows are updated using the permutation P generating an intermediate matrix Q (Lines 2–4). Then, all columns of Q are updated using P (Lines 5–7). The output of CF is then given by the XOR operation of the input matrix R and the intermediate matrix Q. Based on the large state of CF, it is possible to fill memory quite fast, e.g., about 1 GB memory is filled in less than 1 s [6]. For a formal definition of P we refer to the specification of Argon2 [6].

Even if the function CF requires two calls to the underlying round function of BLAKE2b to process 1024 bit, it has a significant speed-up in comparison to the way BLAKE2b-1 is used within Catena (see Table 2). The speed-up comes from the fact that CF processes 16384 bit per invocation whereas BLAKE2b-1 processes only 1024 bit. Thus, Catena requires significant more loading operations in comparison to Argon2 when processing inputs of equal size.

Note that we provide another instantiation of $H '$ in Sect. 3 called $\text {P}_{\text {compress}}$. This is a modification of the permutation P, where we split the output of P into two 512-bit halves and combine them using the XOR operation. This is a simple way to build a compression function out of the permutation P. We introduce $\text {P}_{\text {compress}}$ for comparison with BLAKE2b-1, which includes the regular message schedule, the initialization, and finalization of the round function of BLAKE2b (as shown in Algorithm 6), whereas $\text {P}_{\text {compress}}$ does not. Nevertheless, we do not consider it as a recommended instantiation of $H '$ since, due to the smaller size of the state words, it will never achieve the same throughput as the function CF.

1.2 B.2 BlaMka

The function $G_B$ was introduced by the authors of Lyra2 [30]. It described a slightly adjusted variant of $G_L$ (see Algorithm 5 for a comparison) and is there used as the underlying permutation of a sponge-based structure. Due to the missing cryptanalysis of BlaMka, the authors do not recommend its usage. Nevertheless, since it consists of a neat combination of ARX (Addition, Rotation, XOR) and integer multiplication, we also consider it as an alternative to $G_L$.

The only difference between the functions $G_L$ (see Algorithm 5 (left)) and $G_B$ (see Algorithm 5 (right)) is that each addition $a+b$ is replaced by a variant of the latin-square operation, namely $a+b+2\cdot lsw(a)\cdot lsw(b)$, where lsw(x) denotes the least significant 32-bit word of x. The idea behind the so called multiplication.-hardening is to provide a similar runtime for hardware- and software-based adversaries [29, 31].

Note that for comparison, we restate BLAKE2b-1 as used in the default instantiations of Catena (see Algorithm 6). It conducts $G_O$ ensuring that 12 invocations of BLAKE2b-1 are quite similar to one invocation of full BLAKE2b.

1.3 B.3 Galois-Field Multiplication

We took the Galois-Field multiplication into account since it allows to be used as a really fast compression function, especially when considering binary Galois Fields with an order that is a power of 2, since its coefficients can be represented by bits and addition/subtraction is equivalent to the XOR operation. We decided to conduct the multiplication in $GF(2^{128})$ using the reduction polynom $f(x) = x^{128} + x^7 + x^2 + x + 1$ from the well-analyzed and widely used Galois/Counter Mode (GCM) [24]. We provide the optimized version of the right-to-left multiplication as suggested in [10] or, if available, a version that uses the pclmulqdq instruction described in [18]. Note that in contrast to the multiplication used within GCM, we do not use bit reflection.

1.4 B.4 MultHash

The function MultHash is an experimental hash-function design by Cox [32] with the goal to max out the memory bandwidth. It processes two $\ell $-bit inputs A and B by splitting them into $\ell /32$-bit chunks and then conducts $\ell /32$ 32-bit multiplications of the form $C_i = (C_{i-1} \cdot (A\ |\ 3) \cdot B)\,{\text {mod}}\,2^{32}$, where $C_{i-1}$ denotes the former computed chaining value. Note that, especially in comparison to the Galois-Field multiplication presented before, the MultHash operation should be considered insecure. Nevertheless, due to its neat property of filling the internal state significantly fast, we include it in the comparison of possible instantiations of $H '$ even if it will not be part of our recommended variants of Catena.

C Extensions of Catena

1.1 C.1 Password-Independent Random Layer

This part of the section is devoted to the function $\varGamma $, which reflects a graph-based structure generated by a password-independent indexing function R as shown in Algorithm 7 (left). As defined in the specification of Catena, it overwrites $2^{\lceil 3g/4 \rceil }$ randomly chosen state words. In respect of the memory requirement in Table 5, it is intuitive to say that the penalty for a $\text {GR}\ell _{\lambda }^{g}$ will not increase significantly since almost the whole state is already stored. On the other hand, when considering a $\text {BRG}_\lambda ^g$, one would expect an increased penalty.

First, we wanted to know if the currently designated position of $\varGamma $ (invoked before $F$, see Algorithm 2) does provide the best possible resistance (in comparison to other chosen positions) against the precomputation method discussed in the former section. Therefore, we integrated the function (layer) $\varGamma $ into $F$, fixed $g = 12$ (for time reasons), and recomputed the penalties for different positions of $\varGamma $ within $F$, different values of $\lambda $, and the graph instantiations presented in Sect. 4. Further, we set the public input $\gamma $ to the salt and fixed it to a constant value ($\gamma = 0$). We see this as a valid approach since if R is instantiated with a reasonable strong pseudorandom number generator, all indices computed during the invocation of $\varGamma $ are randomly distributed values. The penalties depending on the positions of $\varGamma $ are shown in Table 8, where it is easy to see that placing the random layer $\varGamma $ at the third-last position leads, in general, to the highest penalty.

Table 8. Penalties depending on the position of $\varGamma $ within $F$ and the particular graph instance, where $F _i$ denotes the i-th layer of $F$ and $g = 12$. The values are given for $\lambda = 2$ (left) and $\lambda = 3$ (right).

Full size table

Thus, we assumed this configuration when computing the impact of the random layer for the recommended values of the garlic, i.e., $g \in \{18,21\}$. Our first intuition was confirmed by the results shown in Table 9. Note that due to time issues, the values for the precomputation method with $\lambda = 3$ and $g = 21$ are a rough estimation based on the values for $\lambda = 3$ in Table 6.

Table 9. Penalties depending on the garlic $g$, the attack method, and the particular graph instance extended by a random layer $\varGamma $, where the values are given for $\lambda = 2$ (left) and $\lambda = 3$ (right).

Full size table

By comparing the values presented in Tables 6 and 9, we observed that the penalties only clearly increase for a $\text {BRG}_\lambda ^g$. Thus, we can conclude that the additional invocation of $\varGamma $ does help substantially against TMTO attacks based on the precomputation method, since the penalties given for a $\text {BRG}_\lambda ^g$ are still much smaller than that of a $\text {GR}\ell _{\lambda }^{g}$. Nevertheless, the main aspect of including $\varGamma $ in our core function $flap$ was the increased resistance against ASIC-based adversaries, and not to thwart TMTO attacks. Furthermore, the penalties presented in Table 9 only hold if an adversary does not have the additional $2^{3g/4} \cdot n$ bit of memory available.

1.2 C.2 Password-Dependent Random Layer

This extension is described by introducing a further graph-based layer below the call to $F$ (see Algorithm 8, Line 7). In general, the function $\varPhi $ is used to update the internal state v of size $2^g\cdot n$ bit by sequentially updating each state word. An update of a state word $v_i$ depends on two inputs (see Algorithm 7 (right)): First, the immediate predecessor $v_{i-1}$ (or $v_{2^g-1}$ if $i = 0$) and second, a value chosen uniformly at random from the state, where the index is determined by the indexing function R.

More detailed, the first update depends on the secret value $\mu $, which we set per default to the value $v_{2^g-1}$ of the output state of $F$. All subsequent updates dependent on the previous computed state value $v_{i-1}$ (see Line 4 of Algorithm 7 (right)). Thus, we basically follow a slightly more generic approach in comparison to the ROMix function used within scrypt [25]. The function R can, for example, be given by xorshift1024star as used in the default instantiations of Catena [15], or a similar function to the one proposed by the designers of Argon2 [6], which uses the g least significant bits of the previous computed value to determine the new index.

It is easy to see that $\varPhi $ conducts sequential writes, whereas $\varGamma $ conducts writes to randomly determined words of the state. Thus, by calling $\varPhi $, Catena provides sequential memory-hardness as defined in [25]. Note that if the extension $\varPhi $ is added to $flap$, the function $F$ must return the full state v (see Line 6 of Algorithm 8) instead of a single value.

D Penalties Caused by Shifting Sampling Points

Table 10. Relative costs (penalty) depending on the shift of the sampling points and the depth $\lambda $.

Full size table

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Lucks, S., Wenzel, J. (2016). Catena Variants. In: Stajano, F., Mjølsnes, S.F., Jenkinson, G., Thorsheim, P. (eds) Technology and Practice of Passwords. PASSWORDS 2015. Lecture Notes in Computer Science(), vol 9551. Springer, Cham. https://doi.org/10.1007/978-3-319-29938-9_7

Download citation

DOI: https://doi.org/10.1007/978-3-319-29938-9_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-29937-2
Online ISBN: 978-3-319-29938-9
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Abstract

Access this chapter

Notes

References

Acknowledgement

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Appendices

A Memory-Hardness and Garbage-Collector Attacks

1.1 A.1 Memory-Hardness

Definition 1

Definition 2

1.2 A.2 (Weak) Garbage-Collector Attacks

Definition 3

Definition 4

B Hash-Function Instantiations

1.1 B.1 Compression Function of Argon2

1.2 B.2 BlaMka

1.3 B.3 Galois-Field Multiplication

1.4 B.4 MultHash

C Extensions of Catena

1.1 C.1 Password-Independent Random Layer

1.2 C.2 Password-Dependent Random Layer

D Penalties Caused by Shifting Sampling Points

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation