Graduate student: | 賴鉅元 Lai, Jyu-Yuan |
---|---|
Thesis title: | 適用於高效能橢圓曲線密碼系統處理器之設計架構 Design Framework for High-Performance Elliptic Curve Cryptographic Processors |
Advisor: | 黃稚存 Huang, Chih-Tsun |
Oral defense committee: | |
Degree: | Doctor |
Department: | College of Electrical Engineering and Computer Science - Department of Computer Science |
Year of publication: | 2010 |
Academic year of graduation: | 98 |
Language: | English |
Number of pages: | 109 |
Keywords (Chinese): | 橢圓曲線密碼系統、公開金鑰密碼系統、超大型積體電路 |
Keywords (English): | Elliptic curve cryptography, public-key cryptography, very-large-scale integration |
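We propose a design framework for high-performance Elliptic Curve Cryptographic (ECC) processors, together with a corresponding systematic design methodology for cost-effective design exploration. First, we propose a parallel and scalable ECC hardware architecture that uses one to four arithmetic units and supports elliptic curve arithmetic over both the prime field GF(p) and the binary field GF(2^m). The complete set of cryptographic functions supported by this dual-field (prime field and binary field) ECC processor core works with arbitrary elliptic curves and arbitrary finite fields of different bit lengths, and can be used to run practical security applications such as the Elliptic Curve Digital Signature Algorithm (ECDSA) and data encryption/decryption. Second, based on this scalable hardware architecture, we propose an efficient two-phase operation scheduling method consisting of a coarse-grained phase and a fine-grained phase. When the hardware designer specifies different timing and hardware resource constraints, our two-phase operation scheduling optimizes the parallel hardware architecture quickly and systematically. Third, using these optimized ECC processor cores as design templates, we propose a novel ECC processor architecture that uses multiple cipher cores. In this way, a computationally heavy point scalar multiplication can be decomposed into several smaller scalar multiplications that execute simultaneously, greatly reducing the required computation time. Finally, we propose a scalar splitting technique for the multi-core ECC processor hardware architecture. With this technique, ECC processors that integrate homogeneous and heterogeneous cipher cores can be generated and analyzed automatically. With this complete design framework, we can study the effect of different degrees of parallelism across the design hierarchy of an ECC processor, so that optimization for applications with different hardware area and performance requirements can be achieved quickly and efficiently. The design of high-performance, cost-effective cryptographic processors thus becomes more systematic.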
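Using 130nm CMOS process technology, we applied the proposed two-phase operation scheduling method to implement two 160-bit dual-field ECC processor test chips. Each test chip contains four dual-field 32×32-bit arithmetic units that run in parallel to accelerate ECC arithmetic, and the chips also gave us practical insight into real silicon, from implementation and measurement to parameter extraction. The first chip supports complete cryptographic functions on arbitrary elliptic curves over prime fields with arbitrary primes and binary fields with arbitrary irreducible polynomials, including point coordinate conversion, point doubling, point addition, point scalar multiplication, the pre- and post-processing required by Montgomery multiplication, modular exponentiation, general finite field arithmetic, and basic RSA operations. The second chip additionally integrates a more advanced method for computing multiplicative inverses in finite fields and a scheduler-controlled datapath; it provides higher throughput and a dynamically adjustable energy-consumption mode of operation, letting the user trade off performance against power consumption. The core area of this chip is 1.35mm², and the total area including IO pads is 4.97mm². It supports a parallel mode that uses all four arithmetic units simultaneously and a serial mode that uses a single arithmetic unit, and it supports both prime-field and binary-field cryptosystem operations with the same architecture. Measurements show that in parallel mode, a 160-bit point scalar multiplication with point coordinate conversion over the prime field completes in 385μs at a 141MHz clock with a core power consumption of 80.4mW, while over the binary field a 160-bit point scalar multiplication completes in 272μs at a 158MHz clock with a core power consumption of 79.6mW. The second chip is a clear improvement over the first: in terms of speed, it is 1.58 times faster over the prime field and 1.37 times faster over the binary field. Compared with other ECC hardware architectures, the second chip is up to 8.05 times and 3.09 times faster over the prime field and the binary field, respectively. Comparisons with other ECC processors in throughput, hardware area, power, and energy consumption show that our high-performance processor chips offer a practical design with high power and energy efficiency, along with the flexibility to support dual-field ECC operations.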
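Furthermore, using the multi-core ECC processor hardware architecture and the proposed scalar splitting technique, we implemented an ultra-high-performance ECC processor containing four cipher cores, each of which contains three 256×16-bit arithmetic units. According to pre-layout simulation results in 90nm CMOS process technology, this parallel ECC processor uses about 1.38 million logic gates and achieves a throughput of 27 thousand 256-bit prime-field point scalar multiplications per second, that is, each point scalar multiplication takes only 36.7μs, which is about 1.1 to 122 times faster than other ECC designs. The comparison shows that our processor improves considerably on other designs in both performance and effective use of hardware area. The proposed design methodology is thus shown to be capable of exploring optimized high-performance ECC processors for a wide range of practical applications.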
We present a design framework for high-performance Elliptic Curve Cryptographic
(ECC) processors and a systematic design methodology for cost-effective design
exploration. First, a parallel and scalable ECC architecture utilizing one to four Arithmetic
Units (AUs) is proposed for the ECC arithmetic over both prime field GF(p) and binary
field GF(2^m). The dual-field ECC cipher core supports comprehensive cryptographic functions
to fulfill realistic security applications, such as the Elliptic Curve Digital Signature
Algorithm (ECDSA) and data encryption/decryption schemes, with arbitrary elliptic curves
and arbitrary finite fields of different field sizes. Second, with the scalable architecture,
we propose an efficient two-phase, i.e., coarse-grained and fine-grained, operation scheduling
methodology. Given various timing and resource constraints, our two-phase operation
scheduling optimizes the parallel architecture rapidly and systematically. Third, with the
optimized ECC cores as design templates, a novel ECC architecture with multiple cipher
cores is proposed. As a result, a large point scalar multiplication can be replaced by several
smaller ones that execute simultaneously, which significantly reduces the operation time.
Finally, a scalar splitting technique is proposed for the multi-core ECC architecture.
With the proposed scalar splitting technique, ECC processors with homogeneous and heterogeneous
configurations can be generated and analyzed automatically. With the entire
design framework, different levels of parallelism across the design hierarchy can be explored,
and designs can be optimized rapidly and efficiently for a variety of applications with
different area/throughput requirements. The design of high-performance and cost-effective
cryptographic processors thus becomes systematic.
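
The idea of replacing one large point scalar multiplication with several smaller, independent ones can be sketched as follows. This is a minimal illustration only: it uses a generic equal-width decomposition k = Σ k_i·2^(i·w), a toy curve over GF(97), and hypothetical helper names (`split_scalar`, `split_mul`), none of which are taken from the thesis. It also assumes the base points 2^(i·w)·P are available in advance, and it runs the partial products sequentially where a multi-core design would run them in parallel.

```python
# Toy curve y^2 = x^3 + A*x + B over GF(P_MOD); illustrative parameters only.
P_MOD, A, B = 97, 2, 3
G = (3, 6)  # 6^2 = 36 = 3^3 + 2*3 + 3 (mod 97), so G lies on the curve


def add(p1, p2):
    """Affine point addition; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # p2 is the negative of p1 (covers doubling with y = 0)
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return (x3, (lam * (x1 - x3) - y1) % P_MOD)


def mul(k, pt):
    """Plain double-and-add scalar multiplication (right-to-left)."""
    result = None
    while k:
        if k & 1:
            result = add(result, pt)
        pt = add(pt, pt)
        k >>= 1
    return result


def split_scalar(k, bits, cores):
    """Split k (< 2**bits) into `cores` chunks of w bits each."""
    w = -(-bits // cores)  # ceiling division
    mask = (1 << w) - 1
    return w, [(k >> (i * w)) & mask for i in range(cores)]


def split_mul(k, pt, bits, cores):
    """Compute k*pt as a sum of independent short scalar multiplications."""
    w, chunks = split_scalar(k, bits, cores)
    # Base points 2**(i*w) * pt; assumed precomputed for a fixed point.
    bases = [mul(1 << (i * w), pt) for i in range(cores)]
    # Each partial product is independent and could run on its own core.
    partials = [mul(ki, bi) for ki, bi in zip(chunks, bases)]
    result = None
    for part in partials:
        result = add(result, part)
    return result
```

With `bits=8` and `cores=4`, each simulated core handles a 2-bit scalar instead of an 8-bit one. Chunks of unequal width would be one way to model a heterogeneous configuration, but that is an extrapolation rather than the method described here.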
Using 130nm CMOS technology, we have implemented two 160-bit dual-field ECC processor
chips by adopting the proposed two-phase operation scheduling. The test chips addressed
realistic chip implementation, measurement, and characterization. Each of them contains
four dual-field 32×32-bit AUs in parallel to speed up the ECC arithmetic. The first one
supports comprehensive cryptographic functions, including the point coordinate conversion,
point doubling, point addition, point scalar multiplication, Montgomery pre-/post-processing,
modular exponentiation, common finite field arithmetic functions, and RSA basic operations.
Prime fields with arbitrary primes and binary fields with arbitrary irreducible polynomials are
supported, as are arbitrary elliptic curves. In addition, the second fabricated chip integrates
an advanced field inversion method and a scheduler-controlled datapath to provide
high-throughput and energy-adaptive security computing with a power-performance trade-off.
It measures 4.97mm² with a core area of only 1.35mm², and supports parallel and
serial operation modes with a unified architecture for both prime field and binary field cryptosystems.
The measurement results show that, in the parallel mode, a 160-bit point scalar multiplication
with coordinate conversion takes 385μs at 141MHz with a core power of 80.4mW over
GF(p), and 272μs at 158MHz with 79.6mW over GF(2^m). This is
a significant improvement over the first chip, with speedups of 1.58 times over GF(p)
and 1.37 times over GF(2^m) in terms of operation time. In addition, the second chip is
up to 8.05 and 3.09 times faster than other ECC architectures over GF(p) and GF(2^m),
respectively. The comparison of throughput, area, power, and energy consumption among
different ECC designs shows that our high-throughput processor chips provide a power- and
energy-efficient implementation with the flexibility of dual-field ECC.
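
The Montgomery pre-/post-processing listed among the first chip's functions refers to converting operands into and out of the Montgomery domain around a chain of Montgomery multiplications. The sketch below shows the textbook integer form over GF(p) only. The function names and the single-shot reduction are illustrative assumptions; they say nothing about how the 32×32-bit AUs on the chips actually schedule the operation, and the binary-field case is not covered.

```python
def montgomery_setup(n, bits):
    """Precompute constants for an odd modulus n < R, with R = 2**bits."""
    r = 1 << bits
    n_prime = (-pow(n, -1, r)) % r  # n * n_prime ≡ -1 (mod R)
    r2 = (r * r) % n                # R^2 mod n, used for pre-processing
    return r, n_prime, r2


def mont_mul(a, b, n, bits, n_prime):
    """Montgomery product a * b * R^-1 mod n, for 0 <= a, b < n."""
    r_mask = (1 << bits) - 1
    t = a * b
    m = ((t & r_mask) * n_prime) & r_mask
    u = (t + m * n) >> bits  # exact division by R
    return u - n if u >= n else u


def to_mont(x, n, bits, n_prime, r2):
    """Pre-processing: map x to x * R mod n."""
    return mont_mul(x, r2, n, bits, n_prime)


def from_mont(x, n, bits, n_prime):
    """Post-processing: map x * R mod n back to x."""
    return mont_mul(x, 1, n, bits, n_prime)


def modmul(a, b, n, bits):
    """a * b mod n via pre-processing, one Montgomery product, post-processing."""
    _, n_prime, r2 = montgomery_setup(n, bits)
    am = to_mont(a, n, bits, n_prime, r2)
    bm = to_mont(b, n, bits, n_prime, r2)
    return from_mont(mont_mul(am, bm, n, bits, n_prime), n, bits, n_prime)
```

For a single product the conversions cost more than they save. The approach pays off when many multiplications are chained, as in a point scalar multiplication or a modular exponentiation, because the conversion happens once at each end.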
Furthermore, an ultra-high-performance ECC processor with four ECC cipher cores is
proposed by applying the multi-core ECC architecture with the proposed scalar splitting
method. Each cipher core consists of three 256×16-bit AUs. According to pre-layout
simulation in 90nm CMOS technology, the parallel ECC processor with 1383K gates achieves
a throughput of more than 27K point scalar multiplications per second (i.e., 36.70μs per
operation) for 256-bit ECC over GF(p), which is 1.1 to 122 times faster than
other ECC designs. The comparison shows that our processor significantly outperforms others
in terms of both throughput and area efficiency. The proposed methodology is
therefore shown to be suitable for exploring optimized high-performance ECC processors for
widespread realistic applications.
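
The throughput figure follows directly from the per-operation latency, as the short check below shows. The throughput-per-Kgate ratio is included only as one plausible reading of "area efficiency"; the exact metric used in the thesis is not stated in this abstract, so treat that part as an assumption.

```python
def throughput_ops_per_s(latency_us):
    """Operations per second for a given per-operation latency in microseconds."""
    return 1e6 / latency_us


def ops_per_s_per_kgate(latency_us, gate_count_k):
    """Illustrative area-efficiency metric: throughput divided by gate count (Kgates)."""
    return throughput_ops_per_s(latency_us) / gate_count_k


# Figures quoted in the abstract: 36.70 us per operation, 1383K gates.
throughput = throughput_ops_per_s(36.70)       # a little over 27K ops/s
efficiency = ops_per_s_per_kgate(36.70, 1383)  # a little under 20 ops/s per Kgate
```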