Sunday, June 27, 2010

Architectures for silicon nanoelectronics and beyond

Although nanoelectronics won't replace CMOS for some time, research is needed now to develop the architectures, methods, and tools to maximally leverage nanoscale devices and terascale capacity. Addressing the complementary architectural and system issues involved requires greater collaboration at all levels. The effective use of nanotechnology will call for total system solutions.
The semiconductor industry faces serious problems with power density, interconnect scaling, defects and variability, performance and density overkill, design complexity, and memory bandwidth limitations. Instead of raw clock speed, parallelism must now fuel further performance improvements, yet few persuasive parallel applications exist. A candidate to replace complementary metal-oxide semiconductors (CMOS), nanoelectronics could address some of these challenges, but it also introduces new problems.
Molecular-scale computing will likely allow additional orders of magnitude improvements in device density and complexity, which raises three critical questions:
* How will we use these huge numbers of devices?
* How must we modify and improve design tools and methodologies to accommodate radical new ways of computing?
* Can we produce reliable, predictable systems from unreliable components with unpredictable behavior?
The effective use of nanotechnology will require not just solutions to increased density, but total system solutions. We can't develop an architecture without a sense of the applications it will execute. And any paradigm shift in applications and architecture will have a profound effect on the design process and tools required. Researchers must emphasize the complementary architectural and system issues involved in deploying these new technologies and push for greater collaboration at all levels: devices, circuits, architecture, and systems.
WHAT IS NANOARCHITECTURE?
We define nanoarchitecture as the organization of basic computational structures composed of nanoscale devices assembled into a system that computes something useful. Nanoarchitecture won't provide superior computing ability for many applications or algorithms, but it will enable radically different computational models. Since architecture is rarely created in a vacuum, these issues will greatly affect nanoarchitecture development. There are two paths to follow: evolutionary and revolutionary.
Evolutionary path
Silicon semiconductor technology will continue to shrink. But there's an increasing performance gap between device technology and its ability to deliver performance in proportion to device density. Performance, in terms of millions of instructions per second per watt, isn't keeping up with the increase in millions of devices per chip. There's also a gap between device density and our ability to design new chips that use every device on the chip and guarantee they're designed correctly. Power consumption and heat dissipation present additional challenges. The semiconductor industry is investing tremendous effort in finding solutions as we move down this evolutionary path, but it's increasingly difficult to design, fabricate, and test solutions.
Revolutionary path
Knowing the end of Moore's law scaling is in sight for traditional silicon technology, many have embarked on revolutionary nanoelectronics research. Researchers are studying carbon nanotube transistors, carbon nanotube memory devices, molecular electronics, spintronics, quantum-computing devices, magnetic memory devices, and optoelectronics, technologies addressed in the emerging devices section of the 2005 International Technology Roadmap for Semiconductors (www.itrs.net/Links/2005ITRS/ERD2005.pdf). Unfortunately, we won't use many of these devices until it's absolutely necessary to consider a replacement technology. So, how should we use these revolutionary nanoelectronic devices in the interim, especially when these devices haven't demonstrated sufficient reliability and a large enough signal-to-noise ratio to guarantee reliable digital computation?
RELIABLE SYSTEMS WITH UNRELIABLE DEVICES
In addition to massive CMOS-scaling efforts, many researchers are pursuing molecular, optical, or quantum devices that they could integrate with CMOS-based digital logic to produce hybrid systems. While there's no consensus yet about which hybrids will enter production, future nanodevices will certainly have high manufacturing-defect rates. Further, we expect them to operate at reduced noise margins, thereby exposing computation to higher soft-error rates. For non-CMOS nanoscale electronics, operation uncertainties originate in the devices' inherent stochastic switching behavior. Finally, devices will have more process variability—and thereby more nonuniform behavior across a chip—so circuits must be more robust to this process variation to prevent unacceptable yield loss.
Power density and energy cost are the main design bottlenecks for CMOS nanoscale technology. Adding redundancy to increase error resilience eventually increases design complexity, decreasing energy efficiency and compromising density advantages. Granularity of the fault tolerance is also important. Some redundant techniques improve yield with small cost increases, such as providing spare cache lines to substitute for defective hardware. Others, such as macro-redundancy in the form of triplicate voting schemes, are much more expensive. While nanoscale devices have the advantage of low power, particularly if switching is accomplished without physically moving significant amounts of charge in space, nanoarchitectures will most likely have huge complexities, driven by application needs and the redundancy required to enable fault tolerance. Low-power nanodevices are intrinsically error-prone because thermal fluctuations can easily switch devices across the low energy barrier separating different logic states.
Temporal and hardware redundancy have traditionally addressed high fault rates, as Figure 1 shows. The unpredictable completion time of temporal redundancy and the worst-case overhead of hardware redundancy mean that reliable hybrid architectures must also explore speculation and adaptivity to ensure correct computation at low hardware and time cost.
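To make this trade-off concrete, the sketch below models both forms of redundancy around a single unreliable operation. It is a minimal illustration, not the article's design: the per-execution fault probability, function names, and agreement rule are all assumptions chosen for clarity.

```python
import random

P_FAIL = 0.01  # assumed per-execution fault probability (illustrative)

def unreliable_op(x):
    """Model of a faulty functional unit: should return x + 1 but may silently err."""
    correct = x + 1
    return correct + 1 if random.random() < P_FAIL else correct

def tmr(x):
    """Hardware redundancy: three copies in space plus a majority vote."""
    results = [unreliable_op(x) for _ in range(3)]
    return max(set(results), key=results.count)

def temporal(x, max_tries=5):
    """Temporal redundancy: re-execute until two consecutive runs agree."""
    prev = unreliable_op(x)
    for _ in range(max_tries):
        cur = unreliable_op(x)
        if cur == prev:            # agreement taken as confirmation
            return cur
        prev = cur
    return prev                    # completion time is data- and fault-dependent
```

The voter triples the hardware but finishes in fixed time; the re-execution loop uses one unit but its completion time, and hence its worst-case latency, is unpredictable, which is exactly the tension noted above.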
High-level issues
Researchers must address several high-level issues in the search for revolutionary architectures for building reliable computers from unreliable devices.

Defect and fault rates. Devices designed in the nanoregime create different problems than those with current VLSI technology. In particular, defect and fault rates, as well as process variability, were never considered "show stoppers." At the nanoscale level, however, high defect rates and variability will be first-order design considerations, not merely add-ons to previously established design objectives. The most effective novel design approaches must incorporate redundancy at several levels of abstraction.

Synergy between levels. There must be a tight synergy between levels of technology abstraction, which might require passing on more design information from one level of abstraction to the next. Although this might lead to more complex designs, it's required for achieving an appropriate level of reliability.
Well-designed interface. Since this work is interdisciplinary, researchers must clarify interfaces between various levels of abstraction during the tool-development process. Researchers need to understand expectations among different groups before developing a well-defined interface.

Exploring potential. Research in nanoarchitectures for revolutionary computing models will lead to new ways of exploiting the potentials of nanotechnology and nanoelectronics. Reliability issues will cut across both active device and interconnect design levels and might require regular topologies to enable amortization of reliability overhead. Figure 2 shows the fundamental opportunities and attributes in nanoelectronics that will shape design approaches at various levels of abstraction.
Devices and circuits
Computation with nanoscale devices implies computing close to the thermal limit. At this point, computation becomes probabilistic in nature. Along with fault modeling, analysis, and propagation, evaluating these systems' probabilistic behavior requires more theoretical work. Borrowing ideas from stochastic system analysis might be useful here. Researchers need to develop new computational paradigms that take probabilistic implementation into account. It's still uncertain how much and what kind of noise nanodevices will encounter in real operation. As researchers develop these devices, we'll get a better sense of their behavior. Nevertheless, researchers must base nanoscale architectures on information obtained from modeling and analyzing real nanodevices so that they're making appropriate assumptions about noisy behavior.

Another important issue concerns the degree to which the application itself can tolerate hardware faults, incorrect operations, and so on. Being absolutely fault free is significantly more expensive than allowing a small number of faults to be visible at the software level.

Circuit designers have relied on different logic styles to obtain area, delay, or power advantages. Due to the nature of molecular-scale circuitry, designers need to add a new constraint—reliability—to the optimization equation. We need comparative studies to assess the reliability of these different logic styles and analyze how the reliability of these styles might change as devices shrink to the nanoregime. One approach is to combine reliable elements with unreliable devices, such as hybrid CMOS/nanodevice circuits. Researchers have proposed several such approaches.
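As a minimal illustration of this kind of probabilistic evaluation, the sketch below injects independent output flips into a small majority circuit built from NAND gates and estimates, by Monte Carlo simulation, how often it still produces the fault-free result. The uniform gate-flip probability and the particular circuit are illustrative assumptions, not a model of any specific nanodevice.

```python
import random

P_GATE_FLIP = 0.05  # assumed probability that any gate output flips (illustrative)

def noisy(val):
    """Flip a gate's output with probability P_GATE_FLIP."""
    return val ^ (random.random() < P_GATE_FLIP)

def noisy_majority(a, b, c):
    """Majority of three inputs built from four noisy NAND gates."""
    ab = noisy(not (a and b))
    bc = noisy(not (b and c))
    ca = noisy(not (c and a))
    return noisy(not (ab and bc and ca))

def estimate_reliability(trials=100_000):
    """Fraction of random input vectors for which the noisy circuit matches the golden result."""
    ok = 0
    for _ in range(trials):
        a, b, c = (random.random() < 0.5 for _ in range(3))
        golden = (a + b + c) >= 2            # fault-free majority
        if noisy_majority(a, b, c) == golden:
            ok += 1
    return ok / trials

print(estimate_reliability())
```

Sweeping P_GATE_FLIP and swapping in circuits built from other gate families gives a crude way to compare how quickly different logic styles degrade, in the spirit of the comparative studies called for above.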
Architecture
Examining the need for new architectures requires an understanding of the applications the architecture will execute and evaluating existing architectures' limitations. While the goal is to design reliable, cheaper, and better-performing architectures built from hybrid nanoelectronic circuitry, it's not clear what aspects of current architectures will present the most serious constraints in reaching this goal. For example, how will interconnect and memory bottlenecks limit the ability to handle high fault and defect rates? Are random technology layouts becoming less desirable as a "fabric" for handling defective devices? Although they've been tried several times over the years, asynchronous self-timed circuits and logic have seen limited use. Synchronous circuit techniques have always been more cost-effective and have design inertia and tools on their side. But slow signal-propagation times might bring this era to an end in the nanoscale regime.
Researchers must explore asynchronous designs as a means of simplifying global communication and power issues. A globally asynchronous, locally synchronous (GALS) design approach might be the best way to manage synchrony problems between blocks of nanoscale circuitry. However, GALS and asynchronous designs aren't without their own challenges. Such designs might increase the number of wires, and random noise will be more disruptive. In addition, such challenges become more involved once we consider faulty connections and devices. The design of fault-tolerant asynchronous hardware is largely unexplored. Ultimately, successful integration of asynchronous designs in future nanoscale architectures will depend on which technologies are viable.
Plausible bottom-up fabrication techniques have demonstrated the feasibility of two-terminal nanodevices for computing applications. Consequently, several approaches to nanoelectronic device architectures have explored ways to leverage two-terminal nanodevices. While the relatively low functionality of two-terminal devices limits circuit architectures, further research can explore their potential for computing applications. It's possible to build dense regular structures such as logic and memory arrays, which might be the best way to use two-terminal devices when nanodevices first achieve commercial viability.
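To show why such regular arrays suit two-terminal devices, the toy model below treats a crossbar as a programmable OR plane: each programmed crosspoint connects an input nanowire to an output nanowire, and each output is the wired-OR of its connected inputs. The array size, the programmed configuration, and the defect map are arbitrary assumptions for illustration, not a specific published architecture.

```python
# ROWS input nanowires cross COLS output nanowires; a programmed crosspoint
# connects its row to its column, and each output is the wired-OR of the
# inputs it is connected to.
ROWS, COLS = 6, 2

# Crosspoints programmed closed (chosen arbitrarily for the example).
programmed = {(0, 0), (1, 0), (2, 1), (3, 1), (5, 1)}

# Stuck-open crosspoints found at test time (an assumed defect, for illustration).
stuck_open = {(3, 1)}

def evaluate(inputs):
    """Compute each column output over working, programmed crosspoints only."""
    return [any(inputs[r] and (r, c) in programmed and (r, c) not in stuck_open
                for r in range(ROWS))
            for c in range(COLS)]

# Column 0 sees inputs 0 and 1; column 1 has silently lost input 3 to the defect.
print(evaluate([False, True, False, True, False, False]))   # -> [True, False]
```

Because the structure is regular, recovering from the defect is a matter of mapping the affected term onto a spare row after testing, rather than redesigning the circuit.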
Reliability theory. Reliability theory has traditionally investigated bounds on system behavior based on simplified assumptions. For example,
* all gates have the same probability of failure,
* only gates fail (and not connections),
* only stuck-at faults are considered, or
* faults aren't state dependent.
Although simplistic, these assumptions have let designers reasonably approximate expected system behavior. On the other hand, these same assumptions might lead to flawed conclusions about expected behavior for systems built from molecular circuits. We need more realistic characterization of the nature of faults at the molecular scale, as well as an understanding of how faults might manifest themselves in terms of logical and system behavior. We need new fault models for both gates and wires. And researchers should review traditional theoretical results using these new fault models. We need to identify and optimize algorithms for automating such computations, since they'll be essential in developing fault-tolerant circuits and CAD tools for reliability estimation.
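For instance, under the first two assumptions (identical, independent failure probabilities and fault-free interconnect), the expected behavior of a triplicated module with majority voting has the familiar closed form below. The numeric failure probability is purely illustrative.

```latex
% Reliability of triple modular redundancy under the simplified assumptions above:
% each module computes correctly with probability R_m, failures are independent,
% and the voter itself is taken to be fault-free (an idealization).
\[
  R_{\mathrm{TMR}} \;=\; R_m^{3} + 3R_m^{2}(1 - R_m) \;=\; 3R_m^{2} - 2R_m^{3}.
\]
% Example: a per-module failure probability of 0.05 (R_m = 0.95) gives
\[
  R_{\mathrm{TMR}} = 3(0.95)^{2} - 2(0.95)^{3} \approx 0.9928,
\]
% a net gain only because R_m > 1/2; for R_m < 1/2 voting makes reliability worse.
```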
Computational theory. To build effective architectures for reliable computation, we must consider several issues at various levels of abstraction. At the highest levels, we need to explore new models of computation and information representation. Current approaches to data representation might no longer be viable when a system has widespread static and dynamic faults and noise. Consequently, we need to understand issues involved in adding reliability at different levels of abstraction. Standard and innovative hybrid techniques might be appropriate at different levels of abstraction.
Figure 3 shows a rough impedance match for high fault-rate regimes such as nanoelectronics. Allowing fault tolerance to operate at different levels of abstraction might allow for a more cost-effective design. Furthermore, developers can hierarchically implement error detection and correction at various levels of abstraction, as well as represent data using error-correction codes. Hierarchical techniques can also provide avenues for handling fault clustering cost-effectively. We should consider security in parallel with reliability since these two issues might share similar solution spaces.
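As a concrete instance of representing data with an error-correction code, the sketch below implements the standard Hamming(7,4) single-error-correcting code; the bit ordering follows the usual textbook convention and is not tied to any particular nanoscale proposal.

```python
def hamming74_encode(d):
    """Encode 4 data bits d = [d1, d2, d3, d4] into a 7-bit Hamming codeword."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]      # codeword positions 1..7

def hamming74_decode(c):
    """Correct any single bit error and return the 4 data bits."""
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]            # checks positions 1,3,5,7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]            # checks positions 2,3,6,7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]            # checks positions 4,5,6,7
    syndrome = s1 + 2 * s2 + 4 * s3           # 1-based index of the flipped bit
    if syndrome:
        c = list(c)
        c[syndrome - 1] ^= 1                  # correct the single error
    return [c[2], c[4], c[5], c[6]]

# A single soft error in the stored word is corrected on read:
word = hamming74_encode([1, 0, 1, 1])
word[5] ^= 1                                   # inject one bit flip
assert hamming74_decode(word) == [1, 0, 1, 1]
```

Hierarchical schemes would then layer such word-level codes beneath coarser checks (block checksums, checkpoint and recovery) at higher levels of abstraction.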
Fault/defect management. Reliability concerns an entire system with contributions from all levels. Once researchers develop fault models, they must conduct probabilistic analyses of the models. Detecting the faults requires incorporating an effective test-design methodology into the architecture. Another open area of research deals with the testing of fault-tolerant circuits. The reconfiguration or sparing process should be part of defect testing. Handling transient and intermittent faults will require runtime monitoring to detect these soft errors, along with prediction and recovery schemes. Given the high error rates, it might be more economical to borrow coding techniques from the communications community rather than building in massive redundancy or reconfigurability. However, the design ultimately will need both error-correction codes and redundancy/reconfiguration if minimum area is the goal. Blending the two approaches and achieving the gradual transition from brute-force redundancy at the very low level to ECC at higher levels of design abstraction will be challenging.
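A minimal sketch of the sparing side of that blend, assuming an illustrative array size and a defect map produced by manufacturing test: hard defects are mapped out by steering accesses to spare rows, while runtime soft errors would be left to word-level ECC such as the Hamming code above.

```python
# Spare-row remapping driven by defect testing (sizes and defects are assumed).
N_ROWS, N_SPARES = 16, 2

defective_rows = {3, 11}                       # rows failing the manufacturing test

remap = {}
free_spares = list(range(N_ROWS, N_ROWS + N_SPARES))
for row in sorted(defective_rows):
    if not free_spares:
        raise RuntimeError("not enough spares: chip would be discarded")
    remap[row] = free_spares.pop(0)            # permanently steer accesses to a spare

def physical_row(logical_row):
    """Translate a logical row address, transparently avoiding known defects."""
    return remap.get(logical_row, logical_row)

print(physical_row(3), physical_row(4))        # -> 16 4
```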

Cesar Hernandez
