EDUARDO NUNES DE SOUZA

Single Event Upset mitigation for FPGA-based Low-Density Parity-Check decoder

Monograph presented in partial fulfillment of the requirements for the degree of Bachelor in Computer Engineering.

Advisor: Prof. Dr. Gabriel Luca Nazar

Porto Alegre
2018
UNIVERSIDADE FEDERAL DO RIO GRANDE DO SUL
Rector: Prof. Rui Vicente Oppermann
Vice-Rector: Prof. Jane Fraga Tutikian
Dean of Undergraduate Studies: Prof. Vladimir Pinheiro do Nascimento
Director of the Instituto de Informática: Prof. Carla Maria Dal Sasso Freitas
Coordinator of the Computer Engineering Program: Prof. Renato Ventura Bayan Henriques
Head Librarian of the Instituto de Informática: Beatriz Regina Bastos Haro
“The development of technology will leave only one problem: the infirmity of human nature.”

Karl Kraus
ACKNOWLEDGEMENTS

I would like to express my gratitude to Professor Gabriel, my advisor, for his patient guidance and assistance.

I would also like to extend my thanks to Marcos, for his help with the fault injection platform, and Geferson, for his assistance with the decoder.

Finally, I wish to thank my family and friends for their support and encouragement throughout my study.
ABSTRACT

With increasing data rates and the physical limits imposed by channel capacity, communication systems must be designed for high efficiency and reliability. LDPC codes have emerged over the last decades and become a key component of many commercial systems thanks to their excellent performance and suitability for parallel hardware implementation. In this scenario, FPGA-based decoders have been explored, since these devices offer rapid prototyping and high levels of parallelism. FPGAs, like any semiconductor device, have become more sensitive to radiation as fabrication technology evolves through device shrinkage, supply-voltage reduction and increasing operating speeds. FPGA cells are especially susceptible to single event upsets (SEUs), and fault tolerance techniques must be applied to mitigate their effects. This work presents a study of the effects of SEUs on an FPGA-based LDPC decoder and proposes a selective technique to improve reliability in this specific application.

Keywords: Fault Tolerance. Low-density Parity-Check. Forward Error Correction. Field-Programmable Gate Array. Single event upset.
Mitigação de single event upsets em um decodificador LDPC implementado em FPGA

RESUMO

Com o aumento das taxas de dados e limitações físicas definidas pela capacidade do canal, os sistemas de comunicação devem ser projetados com alta eficiência e confiabilidade. Os códigos LDPC emergiram nas últimas décadas e se tornaram um componente-chave de vários sistemas comerciais, como resultado de seu excelente desempenho e possibilidade de paralelismo. Nesse contexto, implementações em FPGAs vêm sendo exploradas, uma vez que esses dispositivos oferecem prototipagem rápida e altos níveis de paralelismo. Os FPGAs, como qualquer dispositivo semicondutor, tornam-se sensíveis à radiação devido à evolução contínua da tecnologia de fabricação, como encolhimento do dispositivo, redução da voltagem de alimentação e aumento das velocidades de operação. As células dos FPGAs são especialmente suscetíveis a single event upsets (SEUs) e técnicas de tolerância a falhas devem ser aplicadas para atenuar seus efeitos. Neste trabalho, é apresentado um estudo sobre os efeitos de SEUs em um decodificador LDPC implementado em FPGA e uma técnica seletiva para aumentar a confiabilidade nesta aplicação específica é proposta.

LIST OF FIGURES

Figure 1.1 – Factor graph representations for matrices H1 and H2
Figure 1.2 – Matrix H for QC-LDPC implemented in IEEE 802.11 with code rate = 1/2
Figure 1.3 – Misbehavior of a circuit caused by radiation
Figure 2.1 – Collision of an ionized particle and the resulting current pulse
Figure 2.2 – Particle strike in a SRAM cell
Figure 2.3 – FPGA structure
Figure 2.4 – TMR and voter circuit
Figure 2.5 – Communication system scheme
Figure 2.6 – BER performance for IEEE 802.16 LDPC using BPSK modulation
Figure 2.7 – Factor graph representing messages exchanges in BP algorithm
Figure 3.1 – Modified Layered Decoding architecture
Figure 3.2 – Architecture of a Check Node
Figure 4.1 – Fault injection platform structure proposed by Leipnitz (2016)
Figure 4.2 – Readback data signals in Kintex-7 devices
Figure 4.3 – Frame addressing scheme of Kintex-7 devices
Figure 5.1 – Normal distribution for BER per physical CN over failure
Figure 5.2 – Normal distribution for BER per code-level CN over failure
Figure 6.1 – Cost benefit scenarios of applying DMR in the modules of the CN
LIST OF TABLES

Table 4.1 – Number of frames per block type
Table 5.1 – Physical CNs x code-level CNs mapping
Table 5.2 – BER per physical CN over failure
Table 6.1 – Number of sensitive bits, LUTs and FFs per module of the CN
Table 6.2 – Average of errors caused by sensitive bits per module of the CN
Table 6.3 – BER performance for each module under a faulty-state
Table 6.4 – WPD, AO and Gain for each module of the CN
<table>
<thead>
<tr>
<th>Abbreviation</th>
<th>Full Form</th>
</tr>
</thead>
<tbody>
<tr>
<td>ASIC</td>
<td>Application Specific Integrated Circuit</td>
</tr>
<tr>
<td>AWGN</td>
<td>Additive White Gaussian Noise</td>
</tr>
<tr>
<td>BER</td>
<td>Bit Error Rate</td>
</tr>
<tr>
<td>BP</td>
<td>Belief Propagation</td>
</tr>
<tr>
<td>BPSK</td>
<td>Binary Phase-Shift Keying</td>
</tr>
<tr>
<td>BUBs</td>
<td>Bit Update Blocks</td>
</tr>
<tr>
<td>CLB</td>
<td>Configurable Logic Block</td>
</tr>
<tr>
<td>CNs</td>
<td>Check Nodes</td>
</tr>
<tr>
<td>DMR</td>
<td>Double Modular Redundancy</td>
</tr>
<tr>
<td>DSP</td>
<td>Digital Signal Processor</td>
</tr>
<tr>
<td>FEC</td>
<td>Forward Error Correction</td>
</tr>
<tr>
<td>FF</td>
<td>Flip-Flop</td>
</tr>
<tr>
<td>FPGAs</td>
<td>Field Programmable Gate Arrays</td>
</tr>
<tr>
<td>HDL</td>
<td>Hardware Description Language</td>
</tr>
<tr>
<td>IC</td>
<td>Integrated Circuit</td>
</tr>
<tr>
<td>ICAP</td>
<td>Internal Configuration Access Port</td>
</tr>
<tr>
<td>IDS</td>
<td>Informed Dynamic Scheduling</td>
</tr>
<tr>
<td>IOB</td>
<td>Input/output Block</td>
</tr>
<tr>
<td>LBP</td>
<td>Layered Belief Propagation</td>
</tr>
<tr>
<td>LDPC</td>
<td>Low-Density Parity-Check</td>
</tr>
<tr>
<td>LLR</td>
<td>Logarithmic-Likelihood Ratio</td>
</tr>
<tr>
<td>LUT</td>
<td>Lookup Table</td>
</tr>
<tr>
<td>MOS</td>
<td>Metal Oxide Semiconductor</td>
</tr>
<tr>
<td>PCUBs</td>
<td>Parity Check Update Blocks</td>
</tr>
<tr>
<td>QC</td>
<td>Quasi-cyclic</td>
</tr>
<tr>
<td>RAM</td>
<td>Random Access Memory</td>
</tr>
<tr>
<td>SEE</td>
<td>Single Event Effect</td>
</tr>
<tr>
<td>SEFI</td>
<td>Single Event Functional Interrupt</td>
</tr>
<tr>
<td>SEL</td>
<td>Single Event Latch-up</td>
</tr>
<tr>
<td>SET</td>
<td>Single Event Transient</td>
</tr>
<tr>
<td>SEU</td>
<td>Single Event Upset</td>
</tr>
<tr>
<td>SRAM</td>
<td>Static Random Access Memory</td>
</tr>
<tr>
<td>STMR</td>
<td>Selective Triple Modular Redundancy</td>
</tr>
<tr>
<td>TID</td>
<td>Total Ionizing Dose</td>
</tr>
<tr>
<td>TMR</td>
<td>Triple Modular Redundancy</td>
</tr>
<tr>
<td>VNs</td>
<td>Variable Nodes</td>
</tr>
</tbody>
</table>
1 INTRODUCTION

1.1 Motivation

The main purpose of a digital communication system is to transmit data from one end of the system to the other efficiently and with an acceptable level of quality and reliability. The quality of the signal is commonly expressed in terms of Bit Error Rate (BER), i.e., the probability of bit errors measured at the receiver side. The power of the transmitted signal and the channel’s bandwidth are the most fundamental parameters of a system and, along with the noise spectral density, they determine the energy per bit to noise power spectral density ratio \((E_b/N_0)\). Practical restrictions frequently limit the value of \(E_b/N_0\), and the modulation scheme used may not be capable of providing an acceptable BER. In such cases, the best approach to guarantee data integrity is to encode the transmitted message by applying error control, which may be performed with Forward Error Correction (FEC) codes (HAYKIN, 2004).

Low-Density Parity-Check (LDPC) codes represent a powerful class of FEC codes, i.e., they may be employed for correcting transmission errors in communication systems. They were conceived by Gallager (1962) in his doctoral dissertation, but at the time they were impractical to implement due to the lack of the necessary hardware technology. Thirty-four years later, MacKay and Neal (1996) verified that the performance of LDPC codes is equivalent to that of Turbo codes and can approach Shannon’s theoretical limit. Since their rediscovery, they have been extensively used in several commercial standards such as WiFi, WiMAX, DVB-S2, CCSDS and ITU G.hn (HAILES et al., 2015).

LDPC codes are defined by a parity-check matrix, in which the rows hold the coefficients of the parity-check equations. A code is considered regular if the number of nonzero elements is the same in every row and the same in every column of the parity-check matrix. Irregular codes, on the other hand, do not satisfy this property. In general, irregular codes perform better than regular codes (CARRASCO, 2009). Matrices \(H_1\) and \(H_2\), as follows, are parity-check matrices for an irregular code and a regular code, respectively. The corresponding parity-check equations are listed alongside.

Between their creation and their effective adoption, LDPC codes were not heavily investigated. One exception is the work of Tanner (1981), who generalized LDPC codes and proposed a graphical representation for them. Each row of the parity-check matrix represents a parity-check equation and each column represents a codeword bit. The so-called factor graphs introduce the concepts of Check Nodes (CNs), each of which is associated with a row of the parity-check matrix, and Variable Nodes (VNs), which are associated with the columns. A connection between a CN and a VN represents a bit ‘1’ in the parity-check matrix. Figure 1.1 illustrates the factor graphs corresponding to parity-check matrices $H_1$ and $H_2$.

**Figure 1.1 – Factor graph representations for matrices $H_1$ (a) and $H_2$ (b)**

Source: the author
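The mapping from a parity-check matrix to its factor graph, and the regularity test described earlier, can be sketched in a few lines of Python. This is only an illustration: the function names and the two small matrices are hypothetical examples chosen for this sketch, not the matrices $H_1$ and $H_2$.

```python
def factor_graph(H):
    """Return (cn_to_vn, vn_to_cn) adjacency lists for a parity-check matrix H.

    Row i of H is check node i, column j is variable node j, and an edge
    exists wherever H[i][j] == 1.
    """
    n_rows, n_cols = len(H), len(H[0])
    cn_to_vn = [[j for j in range(n_cols) if H[i][j] == 1] for i in range(n_rows)]
    vn_to_cn = [[i for i in range(n_rows) if H[i][j] == 1] for j in range(n_cols)]
    return cn_to_vn, vn_to_cn


def is_regular(H):
    """A code is regular if all rows share one weight and all columns share one weight."""
    row_weights = {sum(row) for row in H}
    col_weights = {sum(col) for col in zip(*H)}
    return len(row_weights) == 1 and len(col_weights) == 1


# Hypothetical example matrices (assumptions made for this sketch).
H_IRREGULAR = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]
H_REGULAR = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]

cn_to_vn, vn_to_cn = factor_graph(H_IRREGULAR)
print(cn_to_vn)  # variable nodes attached to each check node
print(is_regular(H_IRREGULAR), is_regular(H_REGULAR))
```

In `H_IRREGULAR` every row has weight 3 but the column weights differ, so the code is irregular; in `H_REGULAR` every row and every column has weight 2.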

The parity-check matrix of an LDPC code is usually sparse, i.e., it has few ‘1’s and many ‘0’s, which allows the decoding to be partly parallelized. Quasi-cyclic (QC) LDPC codes are a class of structured codes whose construction facilitates low-complexity memory addressing and routing in hardware implementations. A QC code is based on a matrix in which each element represents an equally sized square submatrix ($Z \times Z$). If a particular element in the matrix has a value of ‘0’, then the corresponding submatrix is a null matrix. Otherwise, the submatrix is an identity matrix that has been cyclically shifted a number of times given by the corresponding value (HAILES et al., 2015). Figure 1.2 shows the parity-check matrix for a QC-LDPC code implemented in the IEEE 802.11ad standard with a code rate $= \frac{1}{2}$, i.e., half of the coded message is formed by redundant bits.
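The expansion of a QC base matrix into the full parity-check matrix can be illustrated with a short Python sketch. It follows the convention stated above (a ‘0’ entry yields a null submatrix, any other entry yields a shifted identity). The base matrix, the submatrix size and the direction of the cyclic shift (rows shifted to the right) are assumptions made for this example, not values taken from IEEE 802.11ad.

```python
def shifted_identity(z, shift):
    """Z x Z identity matrix whose rows are cyclically shifted right by `shift`."""
    return [[1 if c == (r + shift) % z else 0 for c in range(z)] for r in range(z)]


def expand_qc(base, z):
    """Expand a QC base matrix into the full binary parity-check matrix."""
    H = []
    for base_row in base:
        blocks = []
        for entry in base_row:
            if entry == 0:
                blocks.append([[0] * z for _ in range(z)])  # null submatrix
            else:
                blocks.append(shifted_identity(z, entry))
        # Concatenate the submatrices of this block row, one binary row at a time.
        for r in range(z):
            H.append([bit for block in blocks for bit in block[r]])
    return H


# Hypothetical 2 x 3 base matrix with Z = 3 (assumption made for this sketch).
BASE = [
    [1, 0, 2],
    [0, 3, 1],
]
H = expand_qc(BASE, 3)
for row in H:
    print(row)
```

Each binary row of the expanded matrix has exactly as many ‘1’s as there are nonzero entries in its base row, which is why only the small base matrix and the shift values need to be stored in hardware.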

Besides allowing implementation with low-complexity calculations, LDPC codes can exploit a high level of parallelism, which makes them suitable for implementation in Field Programmable Gate Arrays (FPGAs). These devices are extremely versatile and can offer a high degree of parallel processing. In addition, they offer rapid prototyping and are especially useful for measuring BER performance, due to the reduced time that the simulations take when compared, for instance, to a general-purpose processor (HAILES et al., 2015). Their usage has been steadily increasing and is not restricted to coding applications: FPGAs are present in other fields such as industrial, automotive and medical applications.

When these applications run in the presence of ionizing radiation, FPGAs (like any other semiconductor device) are susceptible to many effects that may either damage the device or compromise the operation of the circuit by changing its behavior. The immediate effects caused by radiation are called single event effects (SEEs). The particular type of SEE of interest in this work is the single event upset (SEU). SEUs are soft errors, because the device itself is not permanently damaged by radiation, but the radiation event causes enough charge disturbance to reverse or flip the data of a memory cell, register, latch or flip-flop. SRAM-based FPGAs are particularly susceptible to SEUs within the configuration memory, and fault tolerance techniques may be applied in order to guarantee the proper operation of the circuit, even in the presence of faults. Figure 1.3 illustrates the misbehavior of a circuit caused by a particle strike.

1.2 Goals

Several works in the literature propose fault tolerance mechanisms to mitigate SEUs in LDPC decoders, but none of them specifically addresses FPGA implementations. Li et al. (2016), for example, propose to protect parts of the control logic with a Hamming decoder, while May et al. (2008) propose to protect some subsets of the circuit with Triple Modular Redundancy (TMR). Both target ASIC implementations, which have a different failure model. On the other hand, there are several publications aiming to protect FPGAs from SEUs, such as Foster et al. (2010), who propose a technique to define and protect critical subsets of the circuit. Samudrala et al. (2004) describe a technique to predict which cells of the circuit are more sensitive to SEUs from the signal probabilities of their inputs, while Pratt et al. (2006) focus instead on protecting cells that may lead the circuit to a complete faulty state, even when these cells are repaired with configuration scrubbing. All of these publications are focused on general-purpose applications.

This work presents an overview of the effects of radiation on semiconductor devices and FPGAs, of FEC codes and of LDPC codes, as well as a study aimed at protecting an FPGA-based LDPC decoder from radiation. The Check Nodes (CNs) are considered the most critical element of LDPC decoders, since they execute all arithmetic operations, occupy most of the total area, and have the greatest workload of the circuit. Initially, a coarse-grained analysis of the architecture was performed, evaluating the severity of each CN to the overall performance of the decoder. A fine-grained analysis of the internal structure of the CNs was then made in order to determine which subsets of the circuit are most relevant to the overall sensitivity. The fault injection simulations were executed in a fault injection platform adapted from Leipnitz (2015) to Kintex-7 devices. The architecture evaluated is presented in Hess (2016), who proposes an HDL implementation and a software-based model. The results and the method used to perform fault injections will be shown, as well as the LDPC decoder architecture used as reference.

1.3 Structure

This work is structured as follows: section 2 explains the basic concepts, such as radiation effects on semiconductor devices (2.1), radiation effects on FPGAs (2.2), Forward Error Correction (2.3), LDPC codes (2.4) and related works (2.5). Section 3 details the architecture of the decoder used as reference. Section 4 presents the fault injection method and platform used. Section 5 presents the coarse-grained analysis among CNs and section 6 presents the fine-grained approach, which exploits the internal structure of CNs. Finally, section 7 shows the conclusions of this work.
2 BACKGROUND

2.1 Radiation effects on semiconductor devices

Ionizing radiation can be defined as the transmission of energy through atomic and subatomic particles with very high kinetic energy. It is a natural phenomenon, generated by materials on earth, the sun and other cosmic sources (WIRTHLIN, 2015). In space, there is a high flux of protons, neutrons, alpha particles and heavy ions that affect semiconductor devices. At ground level, on the other hand, neutrons are the most frequent cause of failures (BAUMANN, 2005). When a single heavy ion strikes the silicon, it loses its energy through the production of free electron-hole pairs, resulting in a dense ionized track in the local region (KASTENSMIDT et al., 2004). Figure 2.1 illustrates the current pulse disturbance caused by this event in a reverse-biased junction.

Figure 2.1 – Collision of an ionized particle and the resulting current pulse

Source: Baumann (2005)

There are several radiation effects in semiconductor devices that vary in magnitude from data disruptions to permanent damage ranging from parametric shifts to complete device failure. Single event effects (SEEs), as the name implies, are device failures induced by a single radiation event. They can cause long-term effects to devices, compromising the operation of the circuit, or simply change the data state in a memory cell, register, latch or flip-flop. SEEs that do not permanently damage the device are called soft errors (BAUMANN, 2005).

Long-term effects include changes in the device’s electrical parameters, such as the threshold voltage, leakage current and the timing of the MOS transistors. High-energy particles can also displace atoms in the lattice of semiconductor materials, causing permanent damage. The maximum amount of radiation that a device can tolerate before its parameters fail is called total ionizing dose (TID) (BARNABY, 2006).

Single event upsets (SEUs), in contrast, are soft errors. They occur when an ionized particle causes enough charge disturbance to change the voltage level of critical nodes within a memory cell, inverting the data originally stored (changing a logic “1” to a logic “0” or a logic “0” to a logic “1”). This effect occurs due to the feedback nature of these cells, as shown in Figure 2.2. If the radiation-induced voltage disturbance happens in a logic gate, producing a glitch that propagates through combinational circuitry, it is called a single event transient (SET). Single event functional interrupts (SEFIs) are failures that change the internal state of important control registers that govern device-level functionality. SEFIs compromise the operation of the application, but they can be resolved by repowering the device and placing it in its initial state. Like SEUs, SETs and SEFIs are soft errors (WIRTHLIN, 2015).

Figure 2.2 – Particle strike in a SRAM cell

Source: Munteanu & Autran (2008)

2.2 Radiation effects on FPGAs

As consumer demand drives device shrinkage, supply-voltage reduction and higher operating frequencies, circuits become more and more susceptible to the effects of radiation. In particular, there is growing interest in using FPGAs in space and other extreme environments where high-energy radiation is more common than on earth (KASTENSMIDT et al., 2004).

FPGAs are semiconductor devices consisting of logic blocks, RAM blocks and I/O blocks (IOBs). The most fundamental logic block of an FPGA is formed by a Lookup Table (LUT) and a Flip-Flop (FF). The I/O blocks surround the outer edge of the chip, providing I/O access to the pins on the exterior of the FPGA package. Programmable routing makes it possible to connect logic blocks and IOBs to other logic blocks arbitrarily, as shown in Figure 2.3.

Xilinx devices quantify the logic resources in terms of “slices”, each of which contains several LUTs and FFs; the nature and quantity of the hardware resources available in each slice depend on the model and generation of the FPGA. The terminology used by Xilinx also introduces the concept of Configurable Logic Blocks (CLBs), which consist of multiple slices (BUELL et al., 2007).

Figure 2.3 – FPGA structure

Source: Hailes et al. (2015)

To determine the effects of radiation on FPGAs, they are classified into three categories, based on the technology used to store the configuration data: antifuse, flash and SRAM-based FPGAs. Antifuse FPGAs are nonvolatile and the configuration data cannot be changed once the fuses have been programmed. This type of FPGA is usually the most reliable, since the configuration cells are made from passive, programmed fuses, and they are generally immune to SEEs. Flash FPGAs are also nonvolatile, but they can be reprogrammed only a limited number of times, which may not be suitable for systems requiring frequent reconfiguration. The major concern in flash FPGAs is SETs through the combinational logic data path and routing resources (STERPONE & DU, 2014).

SRAM-based FPGAs, on the other hand, are volatile, so they lose their configuration when power is removed. Although SRAM cells require more power than antifuse or flash cells, they can be reprogrammed an unlimited number of times. The primary reliability concern in SRAM-based FPGAs is SEUs within the configuration memory, since these cells are made using standard static memory techniques and comprise the majority of the memory cells on the device. Radiation striking the configuration memory may change the logic and routing of the operating circuit, causing it to deviate from the function it was supposed to fulfill (WIRTHLIN, 2015).

TMR is the classical and most robust technique for mitigating SEUs in FPGAs: a module is replicated three times and the output is taken from a majority voter, as shown in Figure 2.4 (SAMUDRALA et al., 2004). In this work, Double Modular Redundancy (DMR) is proposed to protect the circuit. It provides error detection by comparing two copies of the circuit. DMR has area and power overheads of at least 100% compared to the unhardened design. For TMR, on the other hand, these overheads are at least 200%, making its use very expensive. For applications with stringent cost and power limitations, TMR may not be desirable, making DMR and other less costly error detection mechanisms more attractive choices (NAZAR et al., 2013). Rewriting the configuration memory is called scrubbing, and it is the approach proposed in this work to correct the errors detected by DMR.

Figure 2.4 – TMR (a) and voter circuit (b)

Source: Samudrala et al. (2004)
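The difference between the two redundancy schemes can be sketched behaviorally in Python. This is a software illustration only, with hypothetical function names; it models module outputs as integers and does not represent the hardware voter of Figure 2.4 or the implementation used in this work.

```python
def tmr_vote(a, b, c):
    """Bitwise majority of three redundant module outputs (masks a single faulty copy)."""
    return (a & b) | (a & c) | (b & c)


def dmr_check(a, b):
    """Compare two redundant outputs.

    Returns (output, error_detected). DMR can only detect a mismatch; it
    cannot tell which copy is wrong, so correction must come from elsewhere
    (for example, scrubbing the configuration memory).
    """
    return a, a != b


golden = 0b1011            # fault-free module output (hypothetical value)
faulty = golden ^ 0b0100   # the same output with one bit flipped by an upset

print(bin(tmr_vote(golden, faulty, golden)))  # the voter masks the upset
print(dmr_check(golden, faulty))              # the comparator flags the upset
```

The sketch makes the trade-off visible: TMR pays for a third copy to obtain correction at the output, whereas DMR pays for one extra copy and a comparator and relies on a separate repair mechanism.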

2.3 Forward Error Correction

Figure 2.5 shows the scheme of a communication system. The message $m$ to be transmitted is composed of binary symbols. The encoder accepts bits of the message and adds redundancy according to a predefined rule, producing encoded data at a higher bit rate. In order to generate an $(n, k)$ block code, the encoder receives blocks of $k$ bits. For each block, it adds $n-k$ redundant bits and produces a coded block of $n$ bits, where $n > k$. The decoder at the receiver side exploits this redundancy to decide which bits of the message were indeed transmitted (HAYKIN, 2004). The ratio \( r = \frac{k}{n} \) is called the code rate, where \( 0 < r < 1 \). Figure 2.6 shows BER performance for IEEE 802.16 with different code rates.

Figure 2.5 – Communication system scheme

Assuming a Binary Phase-Shift Keying (BPSK) modulation, the modulated symbol vector \( \mathbf{x} = \{ x_j \}_{j=1}^N \) may be represented through the energy per symbol, \( x_j = + \sqrt{E_s} \) when \( c_j = 0 \) and \( x_j = - \sqrt{E_s} \) when \( c_j = 1 \). Moreover, assuming the Additive White Gaussian Noise (AWGN) channel, \( \hat{x}_j = x_j + \mathcal{N}(0, N_0) \), where \( \mathcal{N} \) is the normal distribution and \( N_0 \) is the noise power spectral density. The relation between \( E_s \) and \( E_b \) is given by:

\[
E_b = \frac{E_s}{r} \quad (2.1)
\]

In order to convert received symbols into demodulated bits, the Logarithmic-Likelihood Ratio (LLR) is often used. The sign (positive or negative) expresses the most likely value for the corresponding bit (0 or 1) and the magnitude represents the certainty about the value of the bit. The LLR of a demodulated bit is calculated as in (2.2), where \( c_i \) is the transmitted bit and \( \hat{x}_i \) is the received symbol.

\[
\text{LLR}(\hat{c}_i) = \log \frac{P(c_i=0 | \hat{x}_i)}{P(c_i=1 | \hat{x}_i)}
\] (2.2)

Considering a BPSK modulation over an AWGN channel, LLR values can be calculated as follows:

\[
\text{LLR}(\hat{c}_i) = 4 \cdot r \cdot \frac{E_b}{N_0} \cdot \hat{x}_i
\] (2.3)

In terms of \( E_s \):

\[
\text{LLR}(\hat{c}_i) = 4 \cdot \frac{E_s}{N_0} \cdot \hat{x}_i
\] (2.4)
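The chain from bits to LLR values can be sketched in Python as follows. This is a minimal illustration under stated assumptions: symbols are normalized so that \( E_s = 1 \), the noise is modeled with variance \( N_0/2 \) per real sample, and all function names are hypothetical.

```python
import math
import random


def bpsk_modulate(bits, es=1.0):
    """Map bit 0 to +sqrt(Es) and bit 1 to -sqrt(Es)."""
    amplitude = math.sqrt(es)
    return [amplitude if b == 0 else -amplitude for b in bits]


def awgn(symbols, n0, rng):
    """Add white Gaussian noise; variance N0/2 per sample is an assumption of this sketch."""
    sigma = math.sqrt(n0 / 2.0)
    return [x + rng.gauss(0.0, sigma) for x in symbols]


def llr(received, rate, eb_n0_db):
    """LLR = 4 * r * (Eb/N0) * x_hat, as in (2.3), for unit-energy symbols."""
    eb_n0 = 10.0 ** (eb_n0_db / 10.0)
    return [4.0 * rate * eb_n0 * y for y in received]


def hard_decision(llrs):
    """A non-negative LLR favors bit 0, a negative LLR favors bit 1."""
    return [0 if value >= 0 else 1 for value in llrs]


bits = [0, 1, 1, 0]
rate, eb_n0_db = 0.5, 3.0
n0 = 1.0 / (rate * 10.0 ** (eb_n0_db / 10.0))  # from Es = r * Eb with Es = 1
received = awgn(bpsk_modulate(bits), n0, random.Random(0))
print(hard_decision(llr(received, rate, eb_n0_db)))
```

The sign of each LLR gives the hard decision, while the magnitude is the soft information that the LDPC decoder refines over its iterations.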

### 2.4 LDPC Codes

#### 2.4.1 Encoding

LDPC codes can be described as a \( k \)-dimensional subspace \( C \) of the vector space of binary \( n \)-tuples over the binary field \( \mathbb{F}_2 \). We first describe a basis \( B = \{g_0, g_1, \ldots, g_{k-1}\} \) which spans \( C \). The process of encoding a message is given by (2.5). Each codeword \( c \in C \) can be written as \( c = u_0g_0+u_1g_1+\ldots+u_{k-1}g_{k-1} \), or simply

\[
c = uG
\] (2.5)

where \( u = [u_0, u_1, \ldots, u_{k-1}] \) is the message to be transmitted and \( G \) is the \( k \times n \) generator matrix whose rows are the vectors \( \{g_i\} \).

The \((n-k)\)-dimensional null space \( C^\perp \) of \( G \) comprises all the vectors \( x \) for which \( xG^\top = 0 \) and is spanned by the basis \( B^\perp = \{h_0, h_1, \ldots, h_{n-k-1}\} \). Thus, for each \( c \in C \), \( c h_i^\top = 0 \) for all \( i \), or simply

\[
cH^\top = 0
\] (2.6)

where \( H \) is the \((n-k) \times n\) parity-check matrix whose rows are the vectors \( \{h_i\} \) and which is the generator matrix for the null space \( C^\perp \) (RYAN, 2003).

We can obtain $G$ from $H$ in a few simple steps. It is first necessary to transform $H$ with Gauss-Jordan elimination in order to get $H$ in the form:

$$H = [A, \ I_{n-k}]$$ (2.7)

where $I$ is the identity matrix of dimension $n-k$ and $A$ is a matrix of size $(n - k) \times k$. From that, $G$ can be easily found:

$$G = [I_k, \ A^T]$$ (2.8)
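The construction in (2.7) and (2.8) can be checked numerically with a short Python sketch. It assumes that $H$ is already in the systematic form $[A, I_{n-k}]$ (the Gauss-Jordan step is not shown), and it uses a hypothetical $3 \times 6$ matrix and hypothetical function names chosen for this example.

```python
from itertools import product


def generator_from_systematic_h(H):
    """Build G = [I_k, A^T] from H = [A, I_(n-k)] over GF(2)."""
    m, n = len(H), len(H[0])
    k = n - m
    # Verify that the right-hand part of H really is the identity matrix.
    for i in range(m):
        for j in range(m):
            if H[i][k + j] != (1 if i == j else 0):
                raise ValueError("H is not in the form [A, I]")
    A = [row[:k] for row in H]
    # Row r of G is row r of I_k followed by row r of A^T (column r of A).
    return [[1 if c == r else 0 for c in range(k)] + [A[i][r] for i in range(m)]
            for r in range(k)]


def encode(u, G):
    """Codeword c = uG with arithmetic modulo 2, as in (2.5)."""
    return [sum(u[i] * G[i][j] for i in range(len(u))) % 2 for j in range(len(G[0]))]


def syndrome(c, H):
    """Compute cH^T modulo 2; an all-zero result means c satisfies every parity check."""
    return [sum(cj * hj for cj, hj in zip(c, row)) % 2 for row in H]


# Hypothetical systematic parity-check matrix with n = 6 and k = 3.
H_SYS = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]
G = generator_from_systematic_h(H_SYS)
codeword = encode([1, 0, 1], G)
print(codeword, syndrome(codeword, H_SYS))

# Every one of the 2^k messages must map to a codeword with zero syndrome, as in (2.6).
all_valid = all(syndrome(encode(list(u), G), H_SYS) == [0, 0, 0]
                for u in product([0, 1], repeat=3))
print(all_valid)
```

Because $G$ starts with $I_k$, the code is systematic: the first $k$ bits of each codeword are the message itself and the remaining $n-k$ bits are the parity bits.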

2.4.2 Decoding

LDPC decoding is usually done with the belief propagation (BP) algorithm. In this approach, LLR values are passed in both directions along the edges between connected nodes of the factor graph describing the code. An important characteristic of the BP algorithm is that any message sent to a particular node does not depend on the message received from that node. In Figure 2.7, for example, CN $c_4$ is connected to VNs $v_2, v_4, v_5, v_6, v_7$ and $v_9$. The message $\tilde{r}_{4-9}$, sent from $c_4$ to $v_9$, is therefore calculated based only on the values received from VNs $v_2, v_4, v_5, v_6$ and $v_7$.

Figure 2.7 – Factor graph representing messages exchanges in BP algorithm

The schedule of an LDPC decoder, i.e., the order in which the nodes are activated, has a significant effect upon the error correction capability provided by the decoder. The three most common scheduling schemes are Flooding, Layered Belief Propagation (LBP) and Informed Dynamic Scheduling (IDS).

Flooding is the simplest decoding schedule, where each iteration comprises the activation of all CNs simultaneously followed by the activation of all VNs. This approach offers a high degree of parallel processing; however, it requires a large area to be implemented.

In the LBP schedule, the nodes are processed sequentially within each iteration. Once a CN has been activated, all its connected VNs are activated before moving to the next CN. This schedule results in lower throughput, higher latency and higher complexity per iteration. However, LBP tends to converge to the correct result in fewer iterations, resulting in lower overall complexity when compared to flooding. In addition, a certain level of parallelism can be exploited when using QC codes with this policy. LBP is discussed further later, since it is the schedule used in the decoder taken as reference in this work.

IDS inspects the messages passed between the nodes and activates the node that offers the greatest improvement in belief. This schedule requires additional calculations, increasing the complexity per iteration, but the overall complexity is decreased since it requires fewer iterations to achieve the correct output.

2.5 Related Works

Several works in the literature propose techniques to mitigate SEUs in LDPC decoders. Li et al. (2016), for instance, propose a fault-tolerant LDPC decoder that corrects soft errors caused by SEUs inside the control logic with a Hamming decoder. For RAM cells, they propose a layered pipelined architecture together with a scheme that detects soft errors by parity check; the errors are then corrected by the decoder’s inherent iterative process. This approach is reported to save 42% of cell area compared with the TMR method and to reduce memory bits by 12% compared with similar works.

May et al. (2008) propose a technique based on the assumption that not all data bits of a message or channel value have the same importance: corruption in the more significant bits has a larger impact on the overall communication performance than corruption in the less significant bits. If an LLR value calculated by a functional node is corrupted, the value is reset to 0. In other words, the corresponding node/edge is temporarily removed for the current iteration. Thereby, no information is associated with the respective bit and the error tends to be minimized.

In order to protect FPGAs from the effects of SEUs, several publications have proposed alternatives to the traditional TMR method that protect only specific subsets of the circuit, namely the ones considered most critical. Foster et al. (2010) present several methodologies for selecting these subsets, such as metrics that consider the number of logic cells that use the cell’s output signals as inputs, the number of logic resources necessary to add TMR to the logic cell, and the number of logic cells in the longest propagation path through the logic cell.

Another approach is presented in Samudrala et al. (2004), where a Selective Triple Modular Redundancy (STMR) technique is described. The logic cells are classified by their “sensitivity” to SEUs, which is measured from the signal probabilities of their inputs. It is assumed that the primary inputs of the circuit are specified by the user in terms of signal probabilities, which are then propagated to compute the signal probability of each internal node.
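The propagation step mentioned above can be illustrated with a generic Python sketch. It shows only the textbook rules for propagating signal probabilities (the probability that a signal is at logic 1) through basic gates under the assumption of statistically independent inputs. The function names and the example circuit are hypothetical, and the sketch does not reproduce the sensitivity classification of Samudrala et al. (2004).

```python
def p_not(p):
    """Probability that the output of an inverter is 1."""
    return 1.0 - p


def p_and(pa, pb):
    """Output is 1 only when both independent inputs are 1."""
    return pa * pb


def p_or(pa, pb):
    """Output is 0 only when both independent inputs are 0."""
    return 1.0 - (1.0 - pa) * (1.0 - pb)


def p_xor(pa, pb):
    """Output is 1 when exactly one of the independent inputs is 1."""
    return pa * (1.0 - pb) + pb * (1.0 - pa)


# Hypothetical circuit: out = (a AND b) OR (NOT c), with user-specified input probabilities.
p_a, p_b, p_c = 0.5, 0.5, 0.5
p_internal = p_and(p_a, p_b)          # internal node after the AND gate
p_out = p_or(p_internal, p_not(p_c))  # primary output
print(p_internal, p_out)
```

In a real netlist, reconvergent fan-out makes internal signals correlated, so the independence assumption used here is only an approximation.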

Taking a different approach, Pratt et al. (2006) prioritize the protection of structures causing “persistent” errors within the design. Configuration bits are categorized into “persistent” and “non-persistent”. A non-persistent configuration bit causes a design fault when upset and may be repaired through configuration scrubbing, which leads the design back to normal operation. Persistent bits, on the other hand, also cause a design fault when upset, but after they are repaired through configuration scrubbing, the FPGA circuit does not return to normal operation.

Unlike the other works focused on LDPC decoders, this work specifically targets LDPC decoders in FPGAs and the associated failure model, which is different from that of ASICs. In addition, unlike the other works on partial redundancy for FPGAs, this work is specific to an application, that is, the impact of the failures on metrics relevant to the application in question is taken into account.
3 LDPC DECODER ARCHITECTURE

LBP is often used in LDPC decoders since it can achieve high decoding throughput with low computational complexity (CUI et al., 2012). In this policy, the parity-check matrix is viewed as a set of horizontal layers, where each layer represents a component code, and the message updating is performed layer by layer. The model presented in Hocevar (2004) consists of a memory used to store the bit values of the message to be decoded, Parity Check Update Blocks (PCUBs) and Bit Update Blocks (BUBs), which implement the message exchange calculations, as well as a router and a reverse router to arrange data as needed by the algorithm and architecture.

Hess (2016) implemented an HDL description and a bit-accurate software simulator of a modified version of the architecture presented in Hocevar (2004), as shown in Figure 3.1. The Bit Memory is used to store the LLR values of the message bits, the Controller implements a finite state machine and is responsible for controlling the data flow among the other components of the circuit, and the Check Nodes perform the message exchange calculations. A single router, called Permuter, is used, since the output of the CNs is routed only during the processing of the last row of the matrix. The Permuter rotates the values from the Bit Memory by the value obtained from GetRotX, in order to deliver the correct inputs to the CNs.

The algorithm used to decode the messages is a modified version of the Min-Sum algorithm (KARKOOTI et al., 2008). It calculates two messages: one that considers the smallest value among all connected VNs, and another that considers the second smallest value. The first message is sent to all connected VNs except the one that holds the smallest value, which receives the second message instead. A correction factor (β > 0) is applied to the calculation of the messages, aiming to reduce the information loss due to the simplified function that is used. Besides the advantages brought by the original version of the algorithm, such as the reduced complexity of the message calculation and the small area required, the modified version offers greater energy efficiency. The Modified Min-Sum algorithm, along with the Layered Decoding policy, is described in Algorithm 1.
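The two-minimum rule described above can be illustrated with a minimal Python sketch. It is not the HDL implementation of Hess (2016); the function name `check_node_update` and its arguments are hypothetical, and a message equal to zero is assumed to carry a positive sign.

```python
def check_node_update(msgs, beta):
    """Return the CN-to-VN messages for one check node (sketch, assumed interface).

    msgs: VN-to-CN LLR messages of the VNs connected to this CN (at least two).
    beta: correction factor (beta > 0) subtracted from the selected magnitude.
    """
    mags = [abs(m) for m in msgs]
    # Index and value of the smallest magnitude, plus the second smallest value.
    idx_min = min(range(len(mags)), key=mags.__getitem__)
    min1 = mags[idx_min]
    min2 = min(m for k, m in enumerate(mags) if k != idx_min)
    # Parity of the negative signs over all inputs (1 means an odd count).
    all_sign = sum(1 for m in msgs if m < 0) % 2
    out = []
    for k, m in enumerate(msgs):
        # The VN holding the smallest value receives the second smallest one.
        mag = max((min2 if k == idx_min else min1) - beta, 0)
        # Removing the VN's own sign from the parity gives the sign of the others.
        negative = all_sign ^ (1 if m < 0 else 0)
        out.append(-mag if negative else mag)
    return out
```

For example, `check_node_update([3, -1, 4, -2], 0.5)` sends the second smallest magnitude (2, reduced by β) only to the VN that supplied the smallest input.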

The internal structure of the CN is represented in Figure 3.2. Initially, the subtractor (Sub) receives the LLR value of bit $r_j$ and the values stored in the internal memories. MagMem stores the magnitudes of the messages with the lowest values, SignMem contains the signs and IdxMem contains the indexes of these messages.
Figure 3.1 – Modified Layered Decoding architecture

Source: Hess (2016)

Algorithm 1 – LD-MMS

\begin{verbatim}
Input: r, max_iterations
Output: \bar{s}

begin
    iterations = 0
    while iterations < max_iterations do
        for j = 1 : M do
            for i \in B_j do
                // Messages from VNs
                M_{i,j} = r_i - E_{j,i}
                // Messages from CNs
                E_{j,i} = \prod_{i' \in B_j \setminus i} \text{sign}(M_{i',j}) \cdot \max(\min_{i' \in B_j \setminus i}(|M_{i',j}|) - \beta, 0)
                // Updated VNs values
                r_i = M_{i,j} + E_{j,i}
            end
        end
        iterations = iterations + 1
    end
    for i = 1 : N do
        L_i = \sum_{j \in B_i} E_{j,i} + r_i
        \bar{s}_i = \begin{cases} 1, & L_i \leq 0 \\ 0, & L_i > 0 \end{cases}
    end
end
\end{verbatim}

Source: Hess (2016)
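As an illustration of Algorithm 1, the following Python sketch runs the layered schedule on a toy code. It is a floating-point sketch under stated assumptions, not the bit-accurate simulator of Hess (2016): the names `cn_messages` and `layered_min_sum` are hypothetical, each layer is given simply as the list of VN indexes connected to one CN, and the final decision follows Algorithm 1 literally.

```python
def cn_messages(msgs, beta):
    """CN-to-VN messages: sign product and corrected minimum over the other VNs."""
    out = []
    for k in range(len(msgs)):
        others = msgs[:k] + msgs[k + 1:]
        sign = -1 if sum(m < 0 for m in others) % 2 else 1
        out.append(sign * max(min(abs(m) for m in others) - beta, 0.0))
    return out


def layered_min_sum(layers, llr, beta=0.5, max_iterations=10):
    """layers[j] lists the VN indexes connected to CN j; llr holds the channel LLRs."""
    r = list(llr)
    E = [[0.0] * len(row) for row in layers]
    for _ in range(max_iterations):
        for j, row in enumerate(layers):
            # Messages from VNs: remove the previous contribution of this CN.
            M = [r[i] - E[j][k] for k, i in enumerate(row)]
            # Messages from CNs.
            E[j] = cn_messages(M, beta)
            # Updated VN values.
            for k, i in enumerate(row):
                r[i] = M[k] + E[j][k]
    decided = []
    for i in range(len(r)):
        total = r[i] + sum(E[j][row.index(i)]
                           for j, row in enumerate(layers) if i in row)
        decided.append(1 if total <= 0 else 0)
    return decided


# Toy example (assumed, not from the thesis): all-zero codeword, weak error on bit 0.
toy_layers = [[0, 1, 2, 4], [1, 2, 3, 5], [0, 2, 3, 6]]
toy_llr = [-0.5, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
decoded = layered_min_sum(toy_layers, toy_llr, beta=0.5, max_iterations=5)
```

In this toy case the hard decision on the channel LLRs would place a 1 in bit 0, whereas the messages from the two CNs connected to that bit pull it back to 0.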
The magnitude of the signal generated by \textit{Sub} is sent to a register (\textit{MagReg}) and to \textit{Find2Smaller}, which calculates the two smallest values among all. The correction factor $\beta$ is subtracted from both values, and each result is compared with 0 in \textit{Max}. The choice of which value to use is made by a signal from \textit{Find2Smaller}. \textit{AllSign} stores the sum of all signs and, by applying XOR with the sign from the previous iteration (stored in \textit{SignReg}), the correct sign is obtained. Finally, \textit{Sum} calculates the output of the CN based on the values from the previous iteration and the value given by \textit{Find2Smaller}.

Figure 3.2 – Architecture of a Check Node
4 FAULT INJECTION METHOD

Fault injection methods can be seen as techniques for testing systems with respect to the effects of faults on their behavior. They are applicable when it is not possible to get statistical data from field operation or when preliminary studies of the behavior of the system in the presence of faults should be considered during the development phase. Moreover, they can identify implementation errors in fault-tolerance mechanisms and provide feedback on the efficiency of those mechanisms (CLARK & PRADHAN, 1995; ARLAT et al., 1990).

Fault injection techniques may be categorized into five main groups: hardware, software, simulation, emulation and hybrid-based fault injection. The hardware-based approach is applied at the physical level, by disturbing the IC itself with environmental parameters (such as heavy ion radiation, electromagnetic interference or power supply disturbances). Software-based fault injection consists of a software implementation that reproduces the errors that would be produced by faults occurring in the hardware. In a different way, the simulation-based approach consists of injecting faults in high-level models (usually HDL models), while emulation-based fault injection, which is applied in this work, takes advantage of FPGAs for effective circuit emulation and for speeding up fault simulation. Hybrid-based approaches mix software-implemented fault injection and hardware monitoring (ZIADE et al., 2004).

Leipnitz (2016) proposes a fault injection platform, focused on communication systems, for a Virtex-5 device to emulate SEUs, whose structure is shown in Figure 4.1. The approach consists of an FPGA device connected to a host computer, which is used to define the fault injection campaign, control the injection experiments, display the results and communicate with the board through the PCIe interface. The hardware emulated by the FPGA contains a module called PCIe I/O Ctrl to perform communication with the host computer. System Ctrl is responsible for performing fault injection/removal in the configuration memory and CUT I/O Ctrl controls the execution of the circuit under test.

A modified version of the platform presented by Leipnitz (2016) was used so that it could be synthesized for Kintex-7 devices, since the original version was designed for Virtex-5 devices. The overall structure of the platform remains the same, but several adaptations had to be made due to differences in the communication bus, configuration memory addressing, interface with internal ports and circuit placement.

The PCIe interface used for communication between the FPGA and the host computer is implemented with the Xillybus system, developed by Eli Billauer (2014). It consists of an IP core and a host driver and provides low-latency, full-duplex communication at data rates between 200 MB/s and 800 MB/s. The Virtex-5 device used in the original version has a PCIe 1.0 x1 interface and can reach at most 250 MB/s, which limits the communication rate. The Kintex-7 device has a PCIe 2.0 x4 interface that achieves 2 GB/s, so the upper bound on the rate is defined by the IP core. Besides changing the IP core module and the driver in the host computer, several parameters and internal buffers had to be altered due to the different number of lanes in the bus as well as the placement of the PCIe ports.
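The link rates quoted above follow from the per-lane transfer rates of PCIe 1.x (2.5 GT/s) and PCIe 2.x (5 GT/s) with 8b/10b line coding. The short Python sketch below reproduces this arithmetic; the function names are hypothetical, protocol overhead beyond line coding is ignored, and the 800 MB/s cap is simply the upper end of the Xillybus range given in the text.

```python
# Raw transfer rate per lane, in mega-transfers per second.
TRANSFER_RATE_MT_S = {1: 2500, 2: 5000}
# 8b/10b line coding: 8 payload bits are carried by every 10 line bits.
PAYLOAD_BITS, LINE_BITS = 8, 10
# Upper end of the data-rate range quoted for the Xillybus IP core.
XILLYBUS_MAX_MB_S = 800


def link_bandwidth_mb_s(generation, lanes):
    """Theoretical link bandwidth in MB/s, considering only line coding."""
    per_lane = TRANSFER_RATE_MT_S[generation] * PAYLOAD_BITS // LINE_BITS // 8
    return per_lane * lanes


def effective_rate_mb_s(generation, lanes):
    """The slower of the link and the IP core bounds the achievable rate."""
    return min(link_bandwidth_mb_s(generation, lanes), XILLYBUS_MAX_MB_S)
```

With these assumptions, the Virtex-5 link (generation 1, one lane) is bounded by the bus at 250 MB/s, while the Kintex-7 link (generation 2, four lanes) is bounded by the IP core.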

To emulate the effects of SEUs in the FPGA's configuration memory, bit-flips are performed in the memory content. The access is made through the Internal Configuration Access Port (ICAP), a mechanism provided by Xilinx devices that allows the configuration memory to be accessed from within the device and guarantees faster fault injection/removal times. The original version of the platform relies on a signal called busy from ICAP to get readback data. Kintex-7 devices feature ICAPE2, a newer version of the interface, in which this signal was discontinued and readback data is synchronous with other signals, as illustrated in Figure 4.2.

The configuration memory is divided into frames that can be accessed with an addressing scheme, as shown in Figure 4.3. The FPGA structure is divided into rows, which may be in the top half or in the bottom half of the lattice. The rows in each half are numbered from 0, starting from the center. The rows are divided into columns, which may be either a CLB, DSP, block RAM or IOB block. The columns are numbered from 0, starting from the left. Each column contains a certain number of frames, depending on the block type, as shown in Table 4.1.
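The addressing scheme can be sketched as the packing of the half, row, column and frame (minor) indexes into a single frame address. The field widths below assume the 7-series Frame Address Register layout as documented in Xilinx UG470 (block type in bits 25:23, top/bottom in bit 22, row in bits 21:17, column in bits 16:7, minor address in bits 6:0) and should be checked against the user guide; the function names are hypothetical and the frame counts are taken from Table 4.1.

```python
# Frames per column for each block type, as listed in Table 4.1.
FRAMES_PER_COLUMN = {"CLB": 36, "DSP": 28, "Block RAM": 30, "IOB": 54, "Clock": 4}

# (field name, width in bits), from most to least significant (assumed layout).
FIELDS = (("block_type", 3), ("top_bottom", 1), ("row", 5), ("column", 10), ("minor", 7))


def pack_frame_address(top_bottom, row, column, minor, block_type=0):
    """Pack the address fields into one integer, most significant field first."""
    values = {"block_type": block_type, "top_bottom": top_bottom,
              "row": row, "column": column, "minor": minor}
    address = 0
    for name, width in FIELDS:
        if not 0 <= values[name] < (1 << width):
            raise ValueError(f"{name} out of range")
        address = (address << width) | values[name]
    return address


def unpack_frame_address(address):
    """Inverse of pack_frame_address; returns a dict of field values."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = address & ((1 << width) - 1)
        address >>= width
    return values


def column_frame_addresses(top_bottom, row, column, kind):
    """Addresses of every frame of one column of the given block kind."""
    return [pack_frame_address(top_bottom, row, column, minor)
            for minor in range(FRAMES_PER_COLUMN[kind])]
```

A fault injection campaign over one CLB column would then iterate over the 36 addresses returned by `column_frame_addresses`.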
Figure 4.2 – Readback data is available deterministically three clock cycles after CSI_B is set to 0 in Kintex-7 devices

Source – Xilinx UG470 (2016)

System Ctrl had to be modified to match the addressing scheme of Kintex-7 devices, which contain 3232 bits per frame, against 1312 bits in Virtex-5 devices. The word size remains the same (32 bits), but the number of words per frame increases from 41 in Virtex-5 to 101 in Kintex-7.
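The injection and removal of a fault can be pictured as a read-modify-write of one frame. The Python sketch below models a Kintex-7 frame as a list of 101 words of 32 bits; it does not drive the ICAP, the names `locate_bit` and `flip_bit` are hypothetical, and the ordering of bits within the frame (word 0 first, least significant bit first) is an assumption made only for illustration.

```python
WORDS_PER_FRAME = 101   # Kintex-7 (41 in Virtex-5)
BITS_PER_WORD = 32
BITS_PER_FRAME = WORDS_PER_FRAME * BITS_PER_WORD  # 3232


def locate_bit(bit_offset):
    """Map a bit offset within a frame to (word index, bit index in the word)."""
    if not 0 <= bit_offset < BITS_PER_FRAME:
        raise ValueError("bit offset outside the frame")
    return divmod(bit_offset, BITS_PER_WORD)


def flip_bit(frame, bit_offset):
    """Return a copy of the frame with one bit inverted (emulated SEU)."""
    word, bit = locate_bit(bit_offset)
    faulty = list(frame)
    faulty[word] ^= 1 << bit
    return faulty


golden = [0] * WORDS_PER_FRAME
faulty = flip_bit(golden, 3231)      # inject: flip the last bit of the frame
restored = flip_bit(faulty, 3231)    # remove: flipping again restores the frame
```

Since a bit-flip is its own inverse, the same operation serves for both fault injection and fault removal.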

Figure 4.3 – Frame addressing scheme of Kintex-7 devices

Source: the author
Table 4.1 – Number of frames per block type

<table>
<thead>
<tr>
<th>Block type</th>
<th>Number of frames</th>
</tr>
</thead>
<tbody>
<tr>
<td>CLB</td>
<td>36</td>
</tr>
<tr>
<td>DSP</td>
<td>28</td>
</tr>
<tr>
<td>Block RAM</td>
<td>30</td>
</tr>
<tr>
<td>IOB</td>
<td>54</td>
</tr>
<tr>
<td>Clock</td>
<td>4</td>
</tr>
</tbody>
</table>

Source: Xilinx UG470 (2016)
5 COARSE-GRAINED REDUNDANCY

This section presents a coarse-grained analysis of the CNs in order to identify the ones that are most critical to the overall system. By identifying the CNs that have the greatest impact on the decoder's BER performance, it is possible to protect them with a selective redundancy technique.

The simulations made in this work take advantage of the QC code construction to parallelize the processing of multiple CNs and reduce the hardware area occupied. The codeword length \(n\) is 648, the code rate \(r\) is 0.5 and the parity-check matrix is formed by square submatrices of size \(Z = 27\). Since each row is connected to a single column in the same submatrix, it is possible to process the rows of each submatrix in parallel. The HDL model implements 27 physical CNs, which are activated simultaneously, each one processing the corresponding row of a submatrix (a code-level CN). Table 5.1 shows the mapping between physical and code-level CNs (an iteration corresponds to the processing of a submatrix).

<table>
<thead>
<tr>
<th>Physical CN index</th>
<th>0</th>
<th>1</th>
<th>2</th>
<th>3</th>
<th>4</th>
<th>5</th>
<th>6</th>
<th>7</th>
<th>8</th>
<th>9</th>
<th>10</th>
<th>11</th>
<th>12</th>
<th>13</th>
<th>Iteration index</th>
</tr>
</thead>
<tbody>
<tr>
<td>Code CN index</td>
<td>0</td>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
<td>5</td>
<td>6</td>
<td>7</td>
<td>8</td>
<td>9</td>
<td>10</td>
<td>11</td>
<td>12</td>
<td>13</td>
<td>0</td>
</tr>
<tr>
<td></td>
<td>27</td>
<td>28</td>
<td>29</td>
<td>30</td>
<td>31</td>
<td>32</td>
<td>33</td>
<td>34</td>
<td>35</td>
<td>36</td>
<td>37</td>
<td>38</td>
<td>39</td>
<td>40</td>
<td>1</td>
</tr>
<tr>
<td></td>
<td>54</td>
<td>55</td>
<td>56</td>
<td>57</td>
<td>58</td>
<td>59</td>
<td>60</td>
<td>61</td>
<td>62</td>
<td>63</td>
<td>64</td>
<td>65</td>
<td>66</td>
<td>67</td>
<td>2</td>
</tr>
<tr>
<td></td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
</tr>
<tr>
<td></td>
<td>297</td>
<td>298</td>
<td>299</td>
<td>300</td>
<td>301</td>
<td>302</td>
<td>303</td>
<td>304</td>
<td>305</td>
<td>306</td>
<td>307</td>
<td>308</td>
<td>309</td>
<td>310</td>
<td>11</td>
</tr>
<tr>
<td>Physical CN index</td>
<td>14</td>
<td>15</td>
<td>16</td>
<td>17</td>
<td>18</td>
<td>19</td>
<td>20</td>
<td>21</td>
<td>22</td>
<td>23</td>
<td>24</td>
<td>25</td>
<td>26</td>
<td>Iteration index</td>
<td></td>
</tr>
<tr>
<td>Code CN index</td>
<td>14</td>
<td>15</td>
<td>16</td>
<td>17</td>
<td>18</td>
<td>19</td>
<td>20</td>
<td>21</td>
<td>22</td>
<td>23</td>
<td>24</td>
<td>25</td>
<td>26</td>
<td>0</td>
<td></td>
</tr>
<tr>
<td></td>
<td>41</td>
<td>42</td>
<td>43</td>
<td>44</td>
<td>45</td>
<td>46</td>
<td>47</td>
<td>48</td>
<td>49</td>
<td>50</td>
<td>51</td>
<td>52</td>
<td>53</td>
<td>1</td>
<td></td>
</tr>
<tr>
<td></td>
<td>68</td>
<td>69</td>
<td>70</td>
<td>71</td>
<td>72</td>
<td>73</td>
<td>74</td>
<td>75</td>
<td>76</td>
<td>77</td>
<td>78</td>
<td>79</td>
<td>80</td>
<td>2</td>
<td></td>
</tr>
<tr>
<td></td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td>...</td>
<td></td>
</tr>
<tr>
<td></td>
<td>311</td>
<td>312</td>
<td>313</td>
<td>314</td>
<td>315</td>
<td>316</td>
<td>317</td>
<td>318</td>
<td>319</td>
<td>320</td>
<td>321</td>
<td>322</td>
<td>323</td>
<td>11</td>
<td></td>
</tr>
</tbody>
</table>

Source: the author
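The mapping of Table 5.1 reduces to a single expression: the code-level CN processed by physical CN \(p\) in iteration \(t\) has index \(Z \cdot t + p\). A minimal Python sketch, with hypothetical function names, is given below.

```python
Z = 27                 # submatrix size, equal to the number of physical CNs
CODE_LEVEL_CNS = 324   # rows of the parity-check matrix (n = 648, r = 0.5)
ITERATIONS = CODE_LEVEL_CNS // Z  # 12 submatrix rows, processed one per iteration


def code_cn_index(physical_cn, iteration):
    """Code-level CN handled by a physical CN in a given iteration."""
    if not (0 <= physical_cn < Z and 0 <= iteration < ITERATIONS):
        raise ValueError("index out of range")
    return iteration * Z + physical_cn


def physical_cn_of(code_cn):
    """Inverse mapping: returns (iteration, physical CN) for a code-level CN."""
    if not 0 <= code_cn < CODE_LEVEL_CNS:
        raise ValueError("index out of range")
    return divmod(code_cn, Z)


def code_cns_of(physical_cn):
    """All code-level CNs that share one physical CN."""
    return [code_cn_index(physical_cn, t) for t in range(ITERATIONS)]
```

This also makes explicit why a fault in one physical CN affects twelve code-level CNs.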

Each physical CN was placed in a faulty state in the decoder’s software model. To simulate the faults, the magnitudes of the CN's outputs were attenuated or increased by a random factor and the sign was changed, also at random. Note that a fault induced in the first physical CN, for instance, corresponds to 12 code-level CNs (of indexes 0, 27, 54, etc.) operating incorrectly. The BER measured for a fault-free execution was 0.00011. Table 5.2 shows the BER performance for each CN in a faulty state and Figure 5.1 shows the corresponding normal distribution (the average is equal to 0.08867 and the standard deviation is 0.00103).
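The fault model described above can be sketched as follows. The text does not specify the range of the random factor nor the probability of a sign change, so the parameters `min_factor`, `max_factor` and `flip_probability` and their default values are purely illustrative assumptions, as is the function name.

```python
import random


def faulty_cn_output(value, rng, flip_probability=0.5,
                     min_factor=0.25, max_factor=4.0):
    """Corrupt one CN output: scale its magnitude and possibly flip its sign.

    All parameter values are illustrative; they are not taken from the
    experiments reported in this work.
    """
    # A factor below 1 attenuates the magnitude, a factor above 1 increases it.
    factor = rng.uniform(min_factor, max_factor)
    magnitude = abs(value) * factor
    negative = value < 0
    # The sign is inverted at random.
    if rng.random() < flip_probability:
        negative = not negative
    return -magnitude if negative else magnitude


rng = random.Random(2018)  # fixed seed so that a simulation can be repeated
corrupted = [faulty_cn_output(v, rng) for v in (3.0, -1.5, 0.75)]
```

In a software model, such a function would replace the outputs of the CN under evaluation while all other CNs operate normally.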

Table 5.2 – BER per physical CN over failure

<table>
<thead>
<tr>
<th>Physical CN index</th>
<th>BER</th>
<th>Physical CN index</th>
<th>BER</th>
<th>Physical CN index</th>
<th>BER</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>0.088034</td>
<td>9</td>
<td>0.087201</td>
<td>18</td>
<td>0.088772</td>
</tr>
<tr>
<td>1</td>
<td>0.088035</td>
<td>10</td>
<td>0.089398</td>
<td>19</td>
<td>0.091276</td>
</tr>
<tr>
<td>2</td>
<td>0.088644</td>
<td>11</td>
<td>0.089682</td>
<td>20</td>
<td>0.08881</td>
</tr>
<tr>
<td>3</td>
<td>0.086128</td>
<td>12</td>
<td>0.088972</td>
<td>21</td>
<td>0.089552</td>
</tr>
<tr>
<td>4</td>
<td>0.089682</td>
<td>13</td>
<td>0.089432</td>
<td>22</td>
<td>0.087181</td>
</tr>
<tr>
<td>5</td>
<td>0.088052</td>
<td>14</td>
<td>0.089139</td>
<td>23</td>
<td>0.08811</td>
</tr>
<tr>
<td>6</td>
<td>0.088424</td>
<td>15</td>
<td>0.088645</td>
<td>24</td>
<td>0.088412</td>
</tr>
<tr>
<td>7</td>
<td>0.088528</td>
<td>16</td>
<td>0.08779</td>
<td>25</td>
<td>0.090668</td>
</tr>
<tr>
<td>8</td>
<td>0.08927</td>
<td>17</td>
<td>0.087991</td>
<td>26</td>
<td>0.088508</td>
</tr>
</tbody>
</table>

Source: the author

Figure 5.1 – Normal distribution for BER per physical CN over failure

Source: the author
Given these results, we can conclude that all the physical CNs are similarly critical for proper system behavior, since the BER values obtained for them are within the same order of magnitude and are highly degraded compared to a fault-free execution.
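The summary statistics quoted for Table 5.2 can be checked with a few lines of Python. The sketch assumes that the reported standard deviation is the population standard deviation over the 27 values, which is the variant that agrees with the quoted figure.

```python
from statistics import mean, pstdev

FAULT_FREE_BER = 0.00011

# BER per physical CN in a faulty state, indexes 0 to 26 (Table 5.2).
BER_PER_PHYSICAL_CN = [
    0.088034, 0.088035, 0.088644, 0.086128, 0.089682, 0.088052, 0.088424,
    0.088528, 0.08927, 0.087201, 0.089398, 0.089682, 0.088972, 0.089432,
    0.089139, 0.088645, 0.08779, 0.087991, 0.088772, 0.091276, 0.08881,
    0.089552, 0.087181, 0.08811, 0.088412, 0.090668, 0.088508,
]

average = mean(BER_PER_PHYSICAL_CN)
deviation = pstdev(BER_PER_PHYSICAL_CN)
# Spread relative to the mean, and degradation relative to a fault-free run.
relative_spread = deviation / average
degradation = average / FAULT_FREE_BER
```

The relative spread is close to 1%, while the average BER is roughly 800 times the fault-free value, which supports the conclusion that no physical CN stands out from the others.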

The code-level CNs were then evaluated with respect to the overall system performance. The same approach to simulate the faults was adopted, but this time individually for each of the 324 code-level CNs. The average BER obtained is equal to 0.0240 and the standard deviation is 0.00187. Note that the code-level CNs are individually less relevant to the system performance than the physical CNs, which was expected since each faulty physical CN corresponds to twelve faulty code-level CNs. Yet, due to the low standard deviation of the BER among them and the high degradation compared to a fault-free execution, they are similarly critical for proper system behavior, as illustrated in Figure 5.2.

Figure 5.2 – Normal distribution for BER per code-level CN over failure

Source: the author
6 FINE-GRAINED REDUNDANCY

The next step is to evaluate the internal structure of the CN in order to compare the performance loss, in terms of BER increase, caused by each module in a faulty-state and the area overhead obtained by applying selective redundancy. Here, we assume that DMR will be applied to detect the errors, scrubbing will be applied to correct the errors and the signals will be reprocessed.

Fault injections were performed in the components of a single CN for LLR values with $E_b/N_0 = 2.5$ dB and the results were obtained for twenty input blocks of the LDPC decoder. Each input block of the decoder produces 440 inputs and 525 outputs in the CN, totaling 8800 inputs and 10500 outputs. Faults were injected in a total of 51712 bits of the configuration memory, of which 22536 (43.58%) produced some error in the CN’s output. The fault injections covered all configuration bits associated with logic and routing. Memory elements (BRAMs) were not evaluated since these components have a different failure model and may be protected with different fault tolerance techniques, such as error-correcting codes. Table 6.1 shows the distribution of sensitive bits, LUTs and FFs among the modules evaluated.

<table>
<thead>
<tr>
<th>Module</th>
<th>Sensitive bits</th>
<th>LUTs</th>
<th>FFs</th>
</tr>
</thead>
<tbody>
<tr>
<td>SUM</td>
<td>5422</td>
<td>50</td>
<td>17</td>
</tr>
<tr>
<td>SUB</td>
<td>4929</td>
<td>50</td>
<td>17</td>
</tr>
<tr>
<td>SUB_BETA_1</td>
<td>3267</td>
<td>50</td>
<td>17</td>
</tr>
<tr>
<td>Find2Smaller</td>
<td>3076</td>
<td>22</td>
<td>23</td>
</tr>
<tr>
<td>SUB_BETA_0</td>
<td>2793</td>
<td>50</td>
<td>17</td>
</tr>
<tr>
<td>AllSign</td>
<td>2416</td>
<td>3</td>
<td>0</td>
</tr>
<tr>
<td>MAX_0</td>
<td>529</td>
<td>7</td>
<td>0</td>
</tr>
<tr>
<td>MAX_1</td>
<td>104</td>
<td>7</td>
<td>0</td>
</tr>
<tr>
<td><strong>Total</strong></td>
<td><strong>22536</strong></td>
<td><strong>239</strong></td>
<td><strong>91</strong></td>
</tr>
</tbody>
</table>

Source: the author

The CN’s outputs may be either correct or categorized within one or more classes of errors. The errors with respect to the value produced are:

- Change of sign;
- Increase in magnitude;
- Decrease in magnitude.

If the behavior of the circuit is affected, the errors may be classified as:

- Control errors (regarding the signals that control the behavior of the CN);
- Timeout (if the CN does not produce any output).
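The value-related classes and the timeout class can be expressed as a comparison between the fault-free (golden) output and the observed output. The Python sketch below uses a hypothetical function `classify_output`; control errors are left out because detecting them requires observing the internal control signals of the CN, which are not modeled here.

```python
def classify_output(golden, observed):
    """Return the set of error classes of one CN output (sketch).

    golden: output of the fault-free CN.
    observed: output under fault, or None if the CN produced no output.
    """
    if observed is None:
        return {"timeout"}
    errors = set()
    if (golden < 0) != (observed < 0):
        errors.add("change of sign")
    if abs(observed) > abs(golden):
        errors.add("increase in magnitude")
    elif abs(observed) < abs(golden):
        errors.add("decrease in magnitude")
    # An output with no deviation at all is reported as correct.
    return errors or {"correct"}
```

A single output may fall into more than one class, for example a change of sign combined with an increase in magnitude.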

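Table 6.2 shows the average rate of errors caused by sensitive bits per module of the CN. For example, each sensitive bit of *Find2Smaller* produced, on average, 16.8% of outputs with the wrong sign. Since an output may fall into more than one class, the percentages in a row may add up to more than 100%.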

Table 6.2 – Average of errors caused by sensitive bits per module of the CN

<table>
<thead>
<tr>
<th>Module</th>
<th>Change of sign</th>
<th>Increase in magnitude</th>
<th>Decrease in magnitude</th>
<th>Control errors</th>
<th>Timeout</th>
<th>Correct outputs</th>
</tr>
</thead>
<tbody>
<tr>
<td>Find2Smaller</td>
<td>16.8%</td>
<td>43.6%</td>
<td>19.3%</td>
<td>2.9%</td>
<td>2.3%</td>
<td>24.3%</td>
</tr>
<tr>
<td>MAX_0</td>
<td>29.7%</td>
<td>26.6%</td>
<td>32.2%</td>
<td>5.2%</td>
<td>1.1%</td>
<td>31.0%</td>
</tr>
<tr>
<td>MAX_1</td>
<td>39.9%</td>
<td>28.0%</td>
<td>28.2%</td>
<td>10.4%</td>
<td>3.2%</td>
<td>29.5%</td>
</tr>
<tr>
<td>SUB</td>
<td>31.9%</td>
<td>37.2%</td>
<td>4.6%</td>
<td>31.4%</td>
<td>2.2%</td>
<td>22.3%</td>
</tr>
<tr>
<td>SUB_BETA_0</td>
<td>28.3%</td>
<td>32.6%</td>
<td>24.1%</td>
<td>8.3%</td>
<td>2.0%</td>
<td>30.8%</td>
</tr>
<tr>
<td>SUB_BETA_1</td>
<td>17.2%</td>
<td>31.6%</td>
<td>31.6%</td>
<td>8.4%</td>
<td>2.3%</td>
<td>24.1%</td>
</tr>
<tr>
<td>SUM</td>
<td>53.2%</td>
<td>1.1%</td>
<td>44.6%</td>
<td>23.0%</td>
<td>0.3%</td>
<td>14.8%</td>
</tr>
<tr>
<td>AllSign</td>
<td>27.0%</td>
<td>5.9%</td>
<td>12.3%</td>
<td>5.2%</td>
<td>3.9%</td>
<td>51.4%</td>
</tr>
</tbody>
</table>

Source: the author

The behavior obtained in the fault injections, shown in Table 6.2, was replicated in the software model in order to evaluate the impact of the faults in the decoder's BER performance. The results are presented in Table 6.3.

Table 6.3 – BER performance for each module under a faulty-state

<table>
<thead>
<tr>
<th>Module</th>
<th>BER</th>
</tr>
</thead>
<tbody>
<tr>
<td>SUB</td>
<td>0.238585</td>
</tr>
<tr>
<td>SUM</td>
<td>0.238522</td>
</tr>
<tr>
<td>Find2Smaller</td>
<td>0.231251</td>
</tr>
<tr>
<td>SUB_BETA_0</td>
<td>0.101422</td>
</tr>
<tr>
<td>MAX_0</td>
<td>0.046245</td>
</tr>
<tr>
<td>SUB_BETA_1</td>
<td>0.001991</td>
</tr>
<tr>
<td>MAX_1</td>
<td>0.000411</td>
</tr>
<tr>
<td>AllSign</td>
<td>0.000245</td>
</tr>
</tbody>
</table>

Source: the author
We can define the \textit{Weighted Performance Degradation (WPD)} of a module $M$ as the product of the relative quantity of sensitive bits in $M$ and the relative increase in BER produced by $M$ in a faulty state:

$$WPD(M) = \frac{\text{sensitive bits in } M}{\text{total sensitive bits in the CN}} \cdot \frac{\text{BER for } M \text{ under fault}}{\text{BER for fault free}}$$  \hspace{1cm} (6.1)

The \textit{Area Overhead (AO)} of the module $M$ is simply the hardware area occupied by $M$ since we are applying DMR, i.e., the sum of LUTs and FFs:

$$AO(M) = \text{LUTs}(M) + \text{FFs}(M)$$  \hspace{1cm} (6.2)

The ratio of the \textit{Weighted Performance Degradation} to the \textit{Area Overhead} is defined as the \textit{Gain} and is a quantity we want to maximize:

$$Gain(M) = \frac{WPD(M)}{AO(M)}$$  \hspace{1cm} (6.3)

Table 6.4 shows the metrics (6.1), (6.2) and (6.3) for each module of the CN.

<table>
<thead>
<tr>
<th>Module</th>
<th>WPD</th>
<th>AO</th>
<th>Gain</th>
</tr>
</thead>
<tbody>
<tr>
<td>SUM</td>
<td>438.066</td>
<td>67</td>
<td>6.538</td>
</tr>
<tr>
<td>SUB</td>
<td>398.340</td>
<td>67</td>
<td>5.945</td>
</tr>
<tr>
<td>Find2Smaller</td>
<td>240.947</td>
<td>45</td>
<td>5.354</td>
</tr>
<tr>
<td>SUB_BETA_0</td>
<td>95.952</td>
<td>67</td>
<td>1.432</td>
</tr>
<tr>
<td>MAX_0</td>
<td>8.287</td>
<td>7</td>
<td>1.184</td>
</tr>
<tr>
<td>AllSign</td>
<td>0.201</td>
<td>3</td>
<td>0.067</td>
</tr>
<tr>
<td>SUB_BETA_1</td>
<td>2.203</td>
<td>67</td>
<td>0.033</td>
</tr>
<tr>
<td>MAX_1</td>
<td>0.014</td>
<td>7</td>
<td>0.002</td>
</tr>
</tbody>
</table>

Source: the author
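Equations (6.1) to (6.3) can be evaluated directly from the data of Tables 6.1 and 6.3, as in the Python sketch below. The fault-free BER used here, 1.31e-4, is an assumption: it is the value that reproduces the figures of Table 6.4, and it differs slightly from the 0.00011 quoted for the coarse-grained experiment.

```python
# Assumption: fault-free BER back-computed so that Table 6.4 is reproduced.
FAULT_FREE_BER = 1.31e-4

# Module: (sensitive bits, LUTs, FFs, BER under fault), from Tables 6.1 and 6.3.
MODULES = {
    "SUM": (5422, 50, 17, 0.238522),
    "SUB": (4929, 50, 17, 0.238585),
    "Find2Smaller": (3076, 22, 23, 0.231251),
    "SUB_BETA_0": (2793, 50, 17, 0.101422),
    "MAX_0": (529, 7, 0, 0.046245),
    "AllSign": (2416, 3, 0, 0.000245),
    "SUB_BETA_1": (3267, 50, 17, 0.001991),
    "MAX_1": (104, 7, 0, 0.000411),
}


def metrics(modules, fault_free_ber):
    """Return {module: (WPD, AO, Gain)} following equations (6.1) to (6.3)."""
    total_bits = sum(bits for bits, _, _, _ in modules.values())
    result = {}
    for name, (bits, luts, ffs, ber) in modules.items():
        wpd = (bits / total_bits) * (ber / fault_free_ber)   # (6.1)
        ao = luts + ffs                                      # (6.2)
        result[name] = (wpd, ao, wpd / ao)                   # (6.3)
    return result


table = metrics(MODULES, FAULT_FREE_BER)
ranking = sorted(table, key=lambda name: table[name][2], reverse=True)
```

Sorting the modules by decreasing gain yields the same order as the rows of Table 6.4.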

Figure 6.1 illustrates the scenarios of applying DMR cumulatively to the modules of the CN. The x-axis is ordered by the gain (shown in Table 6.4) and each module represented in the x-axis implies the protection of itself and of all the modules to its left. Observe that by applying redundancy to \textit{SUM}, \textit{SUB} and \textit{Find2Smaller}, the area would be increased by about 55% and the remaining WPD would be about 10%, which means that 90% of the total WPD would have been eliminated. The protection of \textit{SUB_BETA_1} and \textit{MAX_1} has a high cost in area while the WPD reduction obtained is low; if area occupation is a major concern to the hardware designer, protecting them would probably not be worthwhile.
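The cumulative scenarios can be recomputed from Table 6.4 and from the total area of the CN in Table 6.1 (239 LUTs plus 91 FFs). The Python sketch below uses hypothetical names and simply accumulates the area overhead and the WPD removed as modules are protected in order of decreasing gain.

```python
# (module, WPD, AO) in order of decreasing gain, from Table 6.4.
BY_GAIN = [
    ("SUM", 438.066, 67), ("SUB", 398.340, 67), ("Find2Smaller", 240.947, 45),
    ("SUB_BETA_0", 95.952, 67), ("MAX_0", 8.287, 7), ("AllSign", 0.201, 3),
    ("SUB_BETA_1", 2.203, 67), ("MAX_1", 0.014, 7),
]
CN_AREA = 239 + 91  # LUTs + FFs of one unprotected CN (Table 6.1)


def cumulative_scenarios(modules, cn_area):
    """For each prefix of the list: (last module, area increase, remaining WPD)."""
    total_wpd = sum(wpd for _, wpd, _ in modules)
    scenarios, area, removed = [], 0, 0.0
    for name, wpd, ao in modules:
        area += ao
        removed += wpd
        scenarios.append((name, area / cn_area, 1.0 - removed / total_wpd))
    return scenarios


scenarios = cumulative_scenarios(BY_GAIN, CN_AREA)
name, area_increase, remaining_wpd = scenarios[2]  # SUM, SUB and Find2Smaller
```

Protecting only the first three modules corresponds to an area increase of roughly 54% with about 9% of the WPD remaining, in line with the figures read from Figure 6.1.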

Figure 6.1 – Cost benefit scenarios of applying DMR cumulatively in the modules of the CN

Source: the author
7 CONCLUSIONS

In this work, we have presented a study about the effects of SEUs in an FPGA-based LDPC decoder and proposed a selective technique to improve reliability in this specific application.

The destructive and non-destructive effects of radiation in semiconductor devices, as well as the corresponding failure models, were presented. The structure of FPGAs was described and these devices were categorized by the technology used to store the configuration data. SRAM-based FPGAs were explored in greater depth since they are especially susceptible to SEUs and are the target of this work. Regarding the application evaluated, an overview of FEC codes and communication systems was presented, as well as the processes to encode and decode a message with LDPC codes. The discussion of decoding encompasses not only the algorithm used (belief propagation), but also the advantages and disadvantages of the different schedules that may be applied.

The architecture and algorithm used as reference were shown and the structure of a CN was explored, since the CNs are the most critical elements of LDPC decoders (they execute all arithmetic operations, occupy most of the total area, and have the greatest workload of the circuit). Fault injections were performed in the HDL modules and a bit-accurate software model was used to obtain the BER performance for the analyses carried out.

A version of the platform described in Leipnitz (2016), modified to target Kintex-7 devices, was used. Several adaptations had to be made due to differences in the communication protocol used to perform partial reconfiguration, in the communication bus with the host computer and in the frame addressing of the configuration memory.

The first approach to proposing selective redundancy for the circuit was made at the CN level, by attempting to identify the CNs that have the greatest impact on the decoder’s BER performance. Initially, the 27 physical CNs were analyzed by placing each one in a faulty state in the decoder's software model. Then, a similar approach was applied to the 324 code-level CNs. The results have shown that the CNs are similarly critical for proper system behavior at both the physical and the code level.

The internal structure of the CN was then analyzed and the results have shown that \textit{SUB}, \textit{SUM} and \textit{Find2Smaller} are the modules that have the greatest impact on the decoder’s BER performance. According to the metrics created to evaluate the application of DMR, \textit{SUM} is the module that provides the best gain when protected, while \textit{SUB\_BETA\_1} and \textit{MAX\_1} are the ones that contribute least to the decoder’s BER performance when their area occupation is considered. Nevertheless, there are several parameters that must be taken into account by a hardware designer to develop the best solution for the application’s needs. If the concern is the type of error that the modules are most likely to produce rather than area occupation, for example, the protection of these modules may still be considered in order to best fit the project’s requirements.
REFERENCES


JÚNIOR, Geferson L. H. Implementação e caracterização de falhas em um decodificador LDPC. December 2016.

PRATT, Brian; CAFFREY, Michael; GRAHAM, Paul; MORGAN, Keith; WIRTHLIN, Michael. Improving FPGA Design Robustness with Partial TMR. IEEE International Reliability Physics Symposium Proceedings, San Jose, CA, 2006, pp. 226-232. DOI: 10.1109/RELPHY.2006.251221


LI, B.; PEI, Y.; GE, N. **Area-Efficient Fault-Tolerant Design for Low-Density Parity-Check Decoders.** 2016 IEEE 84th Vehicular Technology Conference, Sep 2016. DOI: 10.1109/VTCFall.2016.7880909

AVALIANT. **Case Study: Performance Results of Avaliant Mercury.** Sep 2018 [Online]. Available: https://static1.squarespace.com/static/553e7ab4e4b07293cf6dd681/t/59120dfaff7c507ca01ff4e3/1494355459355/LDPC_Case_Study_v6.pdf


STERPONE, L.; DU, B. **Analysis and mitigation of single event effects on flash-based FPGAs.** 19th IEEE European Test Symposium (ETS). Paderborn, Germany. May, 2014. DOI: 10.1109/ETS.2014.6847804
APPENDIX A – GRADUATION PROJECT I

Reliable FPGA-based LDPC Decoder

Eduardo Nunes de Souza

Instituto de Informática – Universidade Federal do Rio Grande do Sul (UFRGS)
Caixa Postal 15.064 – 91.501-970 – Porto Alegre – RS – Brazil
ensouza@inf.ufrgs.br

Abstract. LDPC codes are extremely advantageous, both from the theoretical point of view – which makes them attractive to the academic community – and from the applicability perspective, which justifies their wide use in data communication applications. In particular, FPGAs are well suited to the implementation of these codes: the high level of parallelism offered by these devices combined with the efficiency of LDPC codes ensures high performance in these applications. In critical systems, data reliability is a major requirement and, in some cases, such as communication satellites, these devices are exposed to a high incidence of ionizing radiation, which may lead to failures. In this sense, this work presents a study of LDPC codes and of the effects of soft errors in FPGAs. To work around this issue, the implementation of fault-tolerance techniques in an FPGA-based LDPC decoder is proposed.

1. Introduction

Low-Density Parity-Check (LDPC) codes represent a powerful class of Forward Error Correction (FEC) codes. They were conceived by [Gallager 1962] in his doctoral dissertation, but at the time, they were impractical to implement. Thirty-four years later, [Mackay and Neal 1996] verified that the performance of LDPC codes is comparable to that of Turbo codes and can approach Shannon’s bound. Since their rediscovery, LDPC codes have been extensively used in several standards such as WiFi, WiMAX, DVB-S2, CCSDS and ITU G.hn [Hailes et al. 2015]. These codes indeed offer excellent performance and, once hardware resources capable of implementing them became available, they became a success in many communication systems.

Between their creation and the beginning of their effective usage, LDPC codes were not heavily investigated. One notable exception is the work of Tanner, in which he generalized LDPC codes and proposed a graphical representation for them [Tanner 1981]. LDPC codes are defined by a sparse matrix (containing mostly zero elements), called the parity-check matrix. Each row represents a parity-check equation and each column of the matrix represents a codeword bit. The so-called Tanner graphs introduce the concepts of Check Nodes (CNs), each of which is associated with a row of the parity-check matrix, and Variable Nodes (VNs), which are associated with the columns of the parity-check matrix.
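The relation between the parity-check matrix and the Tanner graph can be illustrated with a short Python sketch. The matrix below is a toy example chosen for illustration only (it is not an LDPC code from any standard) and the function name is hypothetical.

```python
def tanner_graph(H):
    """Build the CN and VN adjacency lists of the Tanner graph of matrix H.

    H is a list of rows of 0/1 entries. An edge connects CN j and VN i
    whenever H[j][i] == 1.
    """
    cn_to_vn = {j: [i for i, bit in enumerate(row) if bit]
                for j, row in enumerate(H)}
    vn_to_cn = {i: [j for j, row in enumerate(H) if row[i]]
                for i in range(len(H[0]))}
    return cn_to_vn, vn_to_cn


# Toy parity-check matrix: 2 parity-check equations over 4 codeword bits.
H = [
    [1, 1, 0, 1],
    [0, 1, 1, 1],
]
cn_to_vn, vn_to_cn = tanner_graph(H)


def satisfies_checks(H, word):
    """A word is a codeword when every parity-check equation sums to 0 mod 2."""
    return all(sum(h * b for h, b in zip(row, word)) % 2 == 0 for row in H)
```

Here CN 0 is connected to VNs 0, 1 and 3, which corresponds to the parity-check equation formed by the first row of the matrix.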

Besides relying on low-complexity calculations, LDPC codes allow a high level of parallelism to be exploited, which makes them very suitable for implementation in FPGAs (Field Programmable Gate Arrays). These devices are extremely versatile and can offer a high degree of parallel processing. In addition, they offer rapid prototyping and are especially useful for measuring Bit Error Rate (BER) performance, due to the reduced time that the simulations take (when compared to a general-purpose processor, for instance) [Hailes et al. 2015]. Their usage has been steadily increasing and is not restricted to coding applications: FPGAs are present in other fields such as industrial, automotive and medical applications.

When these applications run in the presence of ionizing radiation, FPGAs (like any other semiconductor device) are susceptible to many effects that vary in magnitude from data disruptions to permanent damage, the latter ranging from parametric shifts to complete device failure [Baumann 2005]. The immediate effects caused by radiation are called single-event effects (SEEs). The particular type of SEE in which we are interested is the single-event upset (SEU). SEUs are soft errors, because the device itself is not permanently damaged by radiation, but the radiation event causes enough charge disturbance to flip the data of a memory cell, register, latch or flip-flop. SRAM-based FPGAs are especially susceptible to SEUs within the configuration memory [Wirthlin 2015].

In order to ensure data reliability in an FPGA-based LDPC decoder, this work aims to mitigate SEUs within the design, taking advantage of the architecture of the circuit. To the best of our knowledge, this is the first work proposing fault tolerance mechanisms focused on FPGA-based LDPC decoders. ASIC implementations have been proposed, however, as in [Li et al. 2016], which presents an area-efficient design that uses Hamming decoders inside the control logic, and [May et al. 2008], which presents techniques to improve the performance of the decoder in the presence of SEUs. ASICs and FPGAs have different failure models, though.

The decoder used as reference is presented by [Júnior 2016]. It implements a Layered Decoding architecture and the Modified Min-Sum algorithm. That work also performed fault injections and characterized the faults inside the check nodes (CNs), since they are the main module of the decoder.

2. Background

The following sections detail the basic concepts of this work. Section 2.1 discusses the effects of radiation on FPGAs, section 2.2 introduces some basic concepts of LDPC codes and section 2.3 presents previous works related to this one.

2.1. Radiation Effects on FPGAs

FPGAs are semiconductor devices consisting of logic blocks, RAM blocks and I/O blocks (IOBs). The most fundamental logic block of an FPGA is formed by a Lookup Table (LUT) and a Flip-Flop (FF). The IOBs surround the outer edge of the microchip, providing I/O access to the pins on the exterior of the FPGA package. Programmable routing makes it possible to connect logic blocks and IOBs to logic blocks arbitrarily [Buell et al. 2007], as shown in Figure 1.

Xilinx devices quantify logic resources in terms of “slices”, each of which contains several LUTs and FFs; the nature and quantity of the hardware resources available in each slice depend on the model and generation of the FPGA. Xilinx terminology also introduces the concept of Configurable Logic Blocks (CLBs), which consist of multiple slices [Buell et al. 2007].

To determine the effects of radiation on an FPGA, it is necessary to classify these devices into three categories, based on the technology used to store the configuration data: antifuse, flash and SRAM-based FPGAs. Antifuse FPGAs are nonvolatile, and the configuration data cannot be changed once the fuses have been programmed. This type of FPGA is usually the most reliable, since the configuration cells are made from passive, programmed fuses that are generally immune to single-event effects. Flash FPGAs are also nonvolatile, but they can be reprogrammed only a limited number of times, which may not suit systems that require frequent reconfiguration. The configuration cells of flash FPGAs are likewise immune to SEUs, so for both types the primary radiation concern is SEUs within the user flip-flops and block memories.

SRAM-based FPGAs, on the other hand, are volatile, so they lose their configuration when power is removed. Although SRAM cells require more power than antifuse or flash cells, they can be reprogrammed an unlimited number of times. The primary reliability concern in SRAM FPGAs is SEUs within the configuration memory, since these cells are made using standard static memory techniques and comprise the majority of the memory cells on the device [Wirthlin 2015].

Radiation striking the configuration memory causes SEUs, which may change the logic and routing of the operating circuit so that it deviates from its intended function. Figure 2 illustrates this behavior.
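
As a minimal sketch of this effect, the snippet below models a 2-input LUT as a list of configuration bits and flips one of them. The function names (`lut_eval`, `inject_seu`) and the bit ordering are assumptions made for illustration only; they do not reflect the configuration layout of any real device.

```python
def lut_eval(config_bits, inputs):
    """Evaluate a LUT: the input bits form an index into the configuration bits."""
    index = 0
    for position, bit in enumerate(inputs):
        index |= (bit & 1) << position
    return config_bits[index]


def inject_seu(config_bits, bit_index):
    """Return a copy of the configuration with one bit flipped (a single SEU)."""
    upset = list(config_bits)
    upset[bit_index] ^= 1
    return upset


# Hypothetical 2-input AND gate, indexed by a + 2*b.
AND_CONFIG = [0, 0, 0, 1]
faulty_config = inject_seu(AND_CONFIG, 1)

# Enumerate (a, b) in index order: (0,0), (1,0), (0,1), (1,1).
truth_inputs = [(a, b) for b in (0, 1) for a in (0, 1)]
golden_outputs = [lut_eval(AND_CONFIG, ab) for ab in truth_inputs]
faulty_outputs = [lut_eval(faulty_config, ab) for ab in truth_inputs]

print(golden_outputs)  # AND of a and b
print(faulty_outputs)  # after the upset the LUT simply forwards input a
```

In this toy case a single flipped bit silently turns the AND gate into a buffer of its first input, and the fault remains until the configuration bit is rewritten.
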

2.2. LDPC Codes

An LDPC code is defined by its parity check matrix (H), a matrix containing mostly zero elements and few nonzero elements. Three important parameters are its code word length (n), its dimension (k) and the number of parity bits (m = n – k).

We can categorize these codes as regular or irregular. A code is regular if the number of nonzero elements is the same in every row and the same in every column of the matrix H. Irregular codes do not have this property. In general, irregular codes perform better than regular ones [Carrasco 2009]. Figure 3 shows an example of each type.

![Figure 3: parity-check matrices (a) from an irregular code (b) from a regular code](image)
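
The regularity condition can be checked mechanically from the row and column weights of H. The sketch below assumes H is given as a list of 0/1 rows; the two small matrices are toy examples invented for illustration and are not the matrices of Figure 3.

```python
def row_weights(H):
    """Number of nonzero elements in each row of H."""
    return [sum(row) for row in H]


def column_weights(H):
    """Number of nonzero elements in each column of H."""
    return [sum(column) for column in zip(*H)]


def is_regular(H):
    """A code is regular if all row weights are equal and all column weights are equal."""
    return len(set(row_weights(H))) == 1 and len(set(column_weights(H))) == 1


# Toy matrices (assumptions for illustration only).
H_REGULAR = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 1],
]
H_IRREGULAR = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 1, 0, 0, 1, 1],
]

print(is_regular(H_REGULAR), is_regular(H_IRREGULAR))
```
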

LDPC codes can be described as a k-dimensional subspace C of the vector space of binary n-tuples over the binary field $F_2$. We first describe a basis $B = \{g_0, g_1, \ldots, g_{k-1}\}$ which spans C. Each $c \in C$ can be written as $c = u_0 g_0 + u_1 g_1 + \ldots + u_{k-1} g_{k-1}$, or simply

$$c = uG$$

where $u = [u_0, u_1, \ldots, u_{k-1}]$ and G is the $k \times n$ generator matrix whose rows are the vectors $\{g_i\}$. The $(n - k)$-dimensional null space $C^\perp$ of G comprises all the vectors x for which $xG^T = 0$ and is spanned by the basis $B^\perp = \{h_0, h_1, \ldots, h_{n-k-1}\}$. Thus, for each $c \in C$, $c h_i^T = 0$ for all $i$, or simply

$$cH^T = 0$$

where H is the $(n-k) \times n$ parity-check matrix whose rows are the vectors $\{h_i\}$ and is the generator matrix for the null space $C^\perp$ [Ryan 2003].

The process of encoding a message is given by Equation 2.1 ($c = uG$). G can be obtained from H in a few simple steps. First, H is transformed by Gauss-Jordan elimination into the form:

$$H = [A, I_{n-k}]$$

where $I_{n-k}$ is the identity matrix of dimension $n-k$ and A is a matrix of size $(n - k) \times k$. From that, G can be easily found:

$$G = [I_k, A^T]$$
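
The steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names are hypothetical, matrices are lists of 0/1 rows, arithmetic is over $F_2$, and the elimination assumes the last $n-k$ columns of H are linearly independent, so no column swaps are attempted. The toy matrix is a small dense example, not an actual LDPC code.

```python
def systematic_form(H):
    """Row-reduce H over GF(2) to the form [A | I]; raises if column swaps would be needed."""
    H = [row[:] for row in H]
    m, n = len(H), len(H[0])
    k = n - m
    for i in range(m):
        col = k + i
        pivot = next((r for r in range(i, m) if H[r][col]), None)
        if pivot is None:
            raise ValueError("last n-k columns are singular; column swaps needed")
        H[i], H[pivot] = H[pivot], H[i]
        for r in range(m):
            if r != i and H[r][col]:
                H[r] = [a ^ b for a, b in zip(H[r], H[i])]
    return H


def generator_from_systematic(Hs):
    """Build G = [I_k | A^T] from H = [A | I_{n-k}]."""
    m, n = len(Hs), len(Hs[0])
    k = n - m
    A = [row[:k] for row in Hs]
    return [[int(i == j) for j in range(k)] + [A[r][i] for r in range(m)]
            for i in range(k)]


def encode(u, G):
    """Compute c = uG over GF(2)."""
    return [sum(u[i] & G[i][j] for i in range(len(u))) % 2 for j in range(len(G[0]))]


def syndrome(c, H):
    """Compute cH^T over GF(2); all zeros means c is a code word."""
    return [sum(ci & hi for ci, hi in zip(c, row)) % 2 for row in H]


# Toy parity-check matrix (assumption for illustration), not in systematic form.
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [0, 1, 1, 0, 1, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]
G = generator_from_systematic(systematic_form(H))
c = encode([1, 0, 1, 1], G)
print(c, syndrome(c, H))
```

Because row operations do not change the null space, the code word produced with G satisfies $cH^T = 0$ for the original H as well as for its systematic form.
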

2.2.1. Tanner Graph Representation

Each row of the parity-check matrix (H) represents a parity-check equation and is represented by a Check Node (CN) in the Tanner graph. In the same way, each column represents a coded bit and is represented by a Variable Node (VN). Edges in a Tanner graph may only connect two nodes of different types, and an edge between check node $j$ and variable node $i$ exists whenever element $h_{ji}$ of $H$ is equal to 1. Figure 4 shows the Tanner graphs corresponding to the matrices of Figure 3.

Figure 4: Tanner graphs corresponding to matrix (a) $H1$ (b) $H2$
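
The mapping from H to its Tanner graph can be sketched as two adjacency lists, one per node type. The function name and the small matrix below are assumptions for illustration; the matrix is not one of those in Figure 3.

```python
def tanner_graph(H):
    """Return (cn_to_vn, vn_to_cn) adjacency lists: an edge (j, i) exists iff H[j][i] == 1."""
    cn_to_vn = {j: [i for i, h in enumerate(row) if h] for j, row in enumerate(H)}
    vn_to_cn = {i: [j for j, row in enumerate(H) if row[i]] for i in range(len(H[0]))}
    return cn_to_vn, vn_to_cn


# Toy parity-check matrix with 2 check nodes and 4 variable nodes.
H_TOY = [
    [1, 1, 0, 1],
    [0, 1, 1, 0],
]
cn_to_vn, vn_to_cn = tanner_graph(H_TOY)
print(cn_to_vn)
print(vn_to_cn)
```

Each check node only lists variable nodes and vice versa, which reflects the bipartite structure of the graph.
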

2.3. Related Works

The following sections present works related to this one. The first two propose techniques to mitigate soft errors in LDPC decoders implemented in ASICs. The third presents the implementation of an FPGA-based LDPC decoder and a study of the effect of faults in this scenario.

2.3.1. Area-efficient LDPC decoder

[Li et al. 2016] proposes a fault-tolerant LDPC decoder design that corrects soft errors caused by SEUs inside the control logic with a Hamming decoder. For the RAM cells, a layered pipelined architecture is presented together with a scheme that detects soft errors by parity check; the detected errors are then corrected by the iterative process inherent to the decoder. The authors report savings of 42% in cell area compared with the TMR method and reductions of 42% and 12% in memory bits compared with similar works.

2.3.2. Resilient LDPC decoder

[May et al. 2008] presents a technique based on the assumption that not all data bits of a message or channel value are equally important: corruption in the more significant bits has a larger impact on the overall communications performance than corruption in the less significant bits. If an LLR (log-likelihood ratio) value calculated by a functional node is corrupted, the value is reset to 0. In other words, the corresponding node/edge is temporarily removed for the current iteration. No information is then associated with the respective bit, and the error tends to be minimized.
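
The following sketch illustrates both ideas on a fixed-point LLR. The 6-bit two's complement format, the helper names and the externally supplied `corrupted` flag are assumptions made for illustration; how corruption is actually detected in [May et al. 2008] is not modeled here.

```python
def to_signed(raw, width):
    """Interpret a raw bit pattern as a two's complement integer."""
    return raw - (1 << width) if raw & (1 << (width - 1)) else raw


def flip_bit(value, bit, width=6):
    """Flip one bit of a two's complement LLR and return the resulting value."""
    raw = value & ((1 << width) - 1)
    return to_signed(raw ^ (1 << bit), width)


def erase_if_corrupted(llr, corrupted):
    """Reset a corrupted LLR to 0 so that it carries no information in this iteration."""
    return 0 if corrupted else llr


original = 5
lsb_upset = flip_bit(original, 0)  # small magnitude change
msb_upset = flip_bit(original, 5)  # sign bit flipped: large change and wrong sign
erased = erase_if_corrupted(msb_upset, corrupted=True)

print(lsb_upset, msb_upset, erased)
print(abs(msb_upset - original), abs(erased - original))
```

In this toy case, erasing the value leaves a much smaller deviation from the original LLR than the upset in the most significant bit does.
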

2.3.3. FPGA-based LDPC decoder and characterization of faults

[Júnior 2016] implemented a parameterizable LDPC decoder on an FPGA, as well as a bit-accurate simulator written in C. To evaluate the effects of faults on an FPGA-based LDPC decoder, that work also presents the results of fault injection campaigns performed inside the check nodes, the most important module of the circuit. The decoder implements the Modified Min-Sum algorithm and a Layered Decoding architecture, as shown in Figure 5.

Figure 5: architecture implemented by [Júnior 2016]
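
To give an idea of what a check node computes, the sketch below shows a min-sum check node update with a normalization factor. The exact modification used by [Júnior 2016] is not detailed in this text, so the scaling factor `alpha`, its default value and the function name are assumptions; this is one common min-sum variant, not necessarily the one implemented in the reference decoder.

```python
def check_node_update(llrs, alpha=0.75):
    """Min-sum check node update with a normalization factor (assumed variant).

    For each edge, the outgoing message takes the product of the signs and the
    minimum magnitude of all the OTHER incoming messages, scaled by alpha.
    """
    messages = []
    for i in range(len(llrs)):
        others = llrs[:i] + llrs[i + 1:]
        sign = 1
        for value in others:
            if value < 0:
                sign = -sign
        messages.append(sign * alpha * min(abs(value) for value in others))
    return messages


incoming = [2.0, -4.0, 1.0]
print(check_node_update(incoming, alpha=1.0))  # plain min-sum
print(check_node_update(incoming, alpha=0.5))  # normalized min-sum
```

Since every outgoing message depends on all other inputs of the node, a fault inside a check node can affect several variable nodes at once.
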

Figure 6 illustrates the importance of mitigating soft errors in FPGA-based LDPC decoders: the BER is highly degraded in the presence of faults.

Figure 6: comparison between BER with and without faults [Júnior 2016]

3. Ongoing work

3.1. Fault Injection Platform Adaptation

The fault injection campaigns performed in [Júnior 2016] used a fault injection platform specific to data communication systems [Leipnitz et al. 2016], running on the Xilinx Virtex-5 XUPV5-LX110T FPGA board for which the platform was originally developed. Because it offers more logic resources, this work will use a Kintex-7 XC7K325T-1FFG676 FPGA board, also from Xilinx. Since the two FPGAs have different architectures, the platform has already been adapted from one board to the other.

The platform is composed of a module responsible for communication with the host PC (PCIe I/O Control), a module that performs readback and partial reconfiguration of the FPGA configuration memory (System Control) and a third one associated with the circuit that will be subjected to faults (CUT I/O Control). The first two modules underwent the most significant changes, because the PCIe bus width and the scheme for accessing bits in the configuration memory differ from one board to the other. Figure 7 shows the architecture of the platform.

![Figure 7: fault injection platform architecture [Leipnitz et al. 2016]](image)

3.2. Selective Technique to mitigate SEUs

The next step of this work encompasses a detailed study of the architecture implemented in [Júnior 2016] in order to find the best approach to mitigate SEUs. TMR (Triple Modular Redundancy) is the most classical and robust technique for this purpose: a module is replicated three times and the output is taken from a majority voter. This technique, however, incurs a large area overhead and, in SRAM FPGAs, the voter circuit has to be implemented using SRAM cells, which are themselves highly susceptible to upsets [Samudrala et al. 2004].
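
A bitwise majority voter, the core of TMR, can be sketched in a few lines. The function name and the example words are illustrative assumptions.

```python
def majority_vote(a, b, c):
    """Bitwise 2-out-of-3 majority of three replica outputs."""
    return (a & b) | (a & c) | (b & c)


golden = 0b1011
single_fault = majority_vote(golden, golden, golden ^ 0b1000)            # one replica upset
double_fault = majority_vote(golden, golden ^ 0b1000, golden ^ 0b1000)   # two replicas upset

print(bin(single_fault), bin(double_fault))
```

A fault in a single replica is masked, whereas the same bit failing in two replicas defeats the voter, which is why TMR is usually combined with a repair mechanism that prevents faults from accumulating.
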

Several publications have proposed alternatives to the traditional TMR method that protect only specific subsets of the circuit, namely the ones considered most critical. [Foster et al. 2010] presents several methodologies for selecting these subsets, based on metrics such as the number of logic cells that use the cell’s output signals as inputs, the number of logic resources necessary to add TMR to the logic cell and the number of logic cells in the longest propagation path through the logic cell.

Another approach is presented by [Samudrala et al. 2004], which describes a Selective Triple Modular Redundancy (STMR) technique. The logic cells are classified by their “sensitivity” to SEUs, which is measured by the signal probability of their inputs. The signal probabilities of the primary inputs of the circuit are assumed to be specified by the user and are then propagated to compute the signal probability of each internal node.
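
As a rough sketch of signal probability propagation, the snippet below computes the probability that each node of a tiny circuit is at logic 1, assuming statistically independent gate inputs. The independence assumption, the helper names and the example circuit are illustrative; the snippet does not reproduce the sensitivity criterion of [Samudrala et al. 2004].

```python
def p_not(p):
    """Probability that the output of a NOT gate is 1."""
    return 1.0 - p


def p_and(*ps):
    """Probability that an AND gate outputs 1, assuming independent inputs."""
    result = 1.0
    for p in ps:
        result *= p
    return result


def p_or(*ps):
    """Probability that an OR gate outputs 1, assuming independent inputs."""
    all_zero = 1.0
    for p in ps:
        all_zero *= 1.0 - p
    return 1.0 - all_zero


# Hypothetical circuit: y = (a AND b) OR (NOT c), user-specified primary inputs.
p_a = p_b = p_c = 0.5
p_n1 = p_and(p_a, p_b)
p_n2 = p_not(p_c)
p_y = p_or(p_n1, p_n2)

print(p_n1, p_n2, p_y)
```
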

In a different way, [Pratt et al. 2006] prioritizes the protection of structures that cause “persistent” errors within the design. Configuration bits are categorized as “persistent” or “non-persistent”. A non-persistent configuration bit causes a design fault when upset, but it can be repaired through configuration scrubbing, which returns the design to normal operation. A persistent bit also causes a design fault when upset, but even after it is repaired through configuration scrubbing, the FPGA circuit does not return to normal operation.

An analysis of the “severity” of CNs will be performed in this work: faults will be injected sequentially in each CN to obtain the BER caused by each individual CN in a faulty state. The results will be used to determine the most critical CNs in the circuit and to propose a selective mechanism that protects the circuit from SEUs.
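
One possible way to turn such per-CN measurements into a selection is sketched below. The function names, the protection budget and, above all, the BER values are placeholders invented for illustration; they are not results of this work.

```python
def rank_check_nodes(ber_by_cn, baseline_ber):
    """Sort CN indices by BER degradation relative to the fault-free baseline."""
    degradation = {cn: ber - baseline_ber for cn, ber in ber_by_cn.items()}
    return sorted(degradation, key=degradation.get, reverse=True)


def select_for_protection(ranking, budget):
    """Keep only the most critical CNs that fit in the protection budget."""
    return ranking[:budget]


# Placeholder numbers for illustration only; they are NOT measurements.
placeholder_ber = {0: 3e-4, 1: 9e-3, 2: 1e-4, 3: 2e-3}
ranking = rank_check_nodes(placeholder_ber, baseline_ber=1e-4)
protected = select_for_protection(ranking, budget=2)

print(ranking, protected)
```
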

4. Schedule

The second stage of this work also comprises the implementation of the technique in the C-based simulator and in the VHDL modules. The steps are listed below, and Table 1 shows the estimated amount of time necessary for each task.

1. Analysis of the “severity” of CNs and definition of the selective technique to mitigate SEUs
2. Implementation of the technique in the LDPC decoder simulator
3. Validation and execution of fault injection campaigns in the simulator in order to evaluate the implemented technique
4. Implementation of the technique in the VHDL modules
5. Validation and execution of fault injection campaigns on the FPGA board
6. Writing and presentation of Graduation Work II

<table>
<thead>
<tr>
<th>Task</th>
<th>Feb</th>
<th>Mar</th>
<th>Apr</th>
<th>May</th>
<th>Jun</th>
<th>Jul</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>2</td>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>3</td>
<td></td>
<td>X</td>
<td>X</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>4</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>X</td>
<td>X</td>
</tr>
<tr>
<td>5</td>
<td></td>
<td></td>
<td></td>
<td>X</td>
<td>X</td>
<td></td>
</tr>
<tr>
<td>6</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>X</td>
</tr>
</tbody>
</table>

Table 1: activities schedule

5. Final Considerations

This work presented a study of LDPC codes and of the effects of radiation on FPGAs. It discussed the architecture of an FPGA and explained why it is susceptible to soft errors, particularly SEUs. It also gave a formal definition of LDPC codes and of their graphical representation, the Tanner graph.

As fault-tolerance mechanisms for FPGA-based LDPC decoders have not been heavily explored in the literature, we presented related publications targeting ASIC implementations. We also presented the LDPC decoder implementation used as reference in this work, the fault injection platform and the modifications required to change the FPGA board used.

Finally, we explained the approach that will be taken to define the best technique to mitigate SEUs in the circuit and the tasks necessary to conclude Graduation Work II.

6. References


JÚNIOR, Geferson L. H. Implementação e caracterização de falhas em um decodificador LDPC. December 2016.

PRATT, Brian; CAFFREY, Michael; GRAHAM, Paul; MORGAN, Keith; WIRTHLIN, Michael. Improving FPGA Design Robustness with Partial TMR. IEEE International Reliability Physics Symposium Proceedings, San Jose, CA, 2006, pp. 226-232. DOI: 10.1109/RELPHY.2006.251221
