MBus: A Fully Synthesizable Low-power Portable Interconnect Bus for Millimeter-scale Sensor Systems

Inhee Lee, Ye-Sheng Kuo, Pat Pannuto, Gyohoo Kim, Zhiyoong Foo, Ben Kempke, Seokhyeon Jeong, Yejoong Kim, Prabal Dutta, David Blaauw, and Yoonmyung Lee

Abstract—This paper presents a fully synthesizable, low-power interconnect bus for millimeter-scale wireless sensor nodes. A segmented ring bus topology minimizes the required chip real estate by keeping the input/output pad count low, enabling ultra-small form factors. By avoiding the conventional open-drain-based solution, the bus can be fully synthesizable. Low power is achieved by obviating the need for local oscillators in member nodes. In addition, aggressive power gating enables a low-power standby mode with only 53 gates powered on. An integrated wakeup scheme is compatible with a power management unit that has an nW standby mode. A 3-module system including the bus is fabricated in a 180 nm process. The entire system consumes 8 nW in standby mode, and the bus achieves 17.6 pJ/bit/chip.

Index Terms—Wireless sensor node, IoT, data bus, interconnect, low power

I. INTRODUCTION

Continued advances in ultra-low power circuit design techniques have steadily moved the next generation of computer systems towards the vision of smart dust: a miniature, integrated sensing, computing, storage, and communication platform [1]. These systems are highly optimized in volume and power draw, targeting a millimeter-scale form factor and consuming μW in active mode and nW in standby mode [2]. Early efforts to realize such systems resulted in monolithic, tightly integrated designs with little capability for reuse [1, 3]. This approach contrasts with the modularity that has characterized embedded system design and enabled it to address a highly diverse application space. The miniature sensor node application space is similarly diverse, ranging from implantable medical monitors [4-6] to nearly invisible surveillance [7]. Hence, a modular design approach that enables extensive reuse of chip modules is key to fully addressing this application space. The recently introduced millimeter-scale modular sensing platform [8] exploits a layered IC structure to maximize modularity, as shown in Fig. 1.

A critical component of the modular platform is the bus through which the modules communicate with each other. This bus must satisfy constraints unique to millimeter-scale sensor systems. First, the number of input/output (I/O) pads on each module should be kept low and fixed. In the proposed millimeter-scale form factor, bond wires are used rather than through-silicon vias (TSVs) for inter-layer connections, because TSVs have a high manufacturing cost and limited availability across technologies. With state-of-the-art wire-bonding pitch being at least 35–65 μm/pad, and accounting for several power supplies and a few module-specific I/O pads, only a handful of pads are available for the bus interface in a millimeter-scale form factor. Therefore, a wiring topology that requires additional pads for each additional module, such as SPI,
A wireless communication scheme with an on-chip antenna could eliminate the wires between layers and offer better energy efficiency per bit (5.7 pJ/bit [9]). However, its large instantaneous current (32 mA [9]) causes a significant battery voltage drop, since a millimeter-size battery typically has >1 kΩ of internal resistance [10], and this can result in system failure.

Second, active power consumption must be low. Because the high internal resistance of millimeter-size batteries causes a large voltage drop under load, the active power budget of millimeter-scale sensors is limited to 10s of μW [10].
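The constraint above follows from Ohm's law applied to the battery's internal resistance. The sketch below is a back-of-the-envelope check, assuming a 1 kΩ internal resistance (the lower bound quoted above) and the 3.8 V thin-film battery used by the PMU described in Section III; it ignores converter efficiency and transient behavior, and the helper names are hypothetical.

```python
# Back-of-the-envelope battery IR-drop check (illustrative, not a battery model).
# Assumptions: R_int = 1 kOhm (the ">1 kOhm" lower bound) and a 3.8 V cell.

R_INTERNAL_OHM = 1e3
V_BATTERY = 3.8


def ir_drop(load_current_a: float, r_internal_ohm: float = R_INTERNAL_OHM) -> float:
    """Voltage lost across the battery's internal resistance: V = I * R."""
    return load_current_a * r_internal_ohm


def battery_current(power_w: float, v_supply: float = V_BATTERY) -> float:
    """Battery current for a given load power, I = P / V (converter losses ignored)."""
    return power_w / v_supply


radio_drop = ir_drop(32e-3)                    # 32 mA burst from an on-chip-antenna radio
active_drop = ir_drop(battery_current(10e-6))  # ~10 uW active budget

print(f"32 mA burst: {radio_drop:.1f} V drop vs. a {V_BATTERY} V cell")
print(f"10 uW load:  {active_drop * 1e3:.2f} mV drop")
```

A computed drop of tens of volts is physically impossible for a 3.8 V cell, which is another way of saying the battery cannot source the burst at all; a load in the 10s of μW keeps the drop in the millivolt range.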

Third, sub-nW standby power and power-mode control integrated with the bus state are required. A millimeter-scale energy harvester can provide only nW of power to a battery under weak harvesting conditions (e.g., a millimeter-scale solar cell under indoor light). Hence, sub-nW standby power is required for perpetual operation. This requires aggressive power gating and a power management unit (PMU) that can switch to an ultra-low, nW power mode. To avoid additional wires for communicating wakeup events, the bus interface must support a wakeup request originating from any node. This poses two challenges: 1) the logic that monitors/transmits such an event must be minimized, since it is always active and directly contributes to the standby power; 2) when the wakeup request is transmitted on the bus, the PMU is still in standby mode, so the active current drawn for this transmission must not exceed the nA range.

Fourth, a fully synthesizable design is desirable. A synthesizable bus interface significantly reduces the time and effort needed to migrate between technologies and eases adoption. It not only allows fast design by “dropping in” fully verified Verilog, but also ensures robust timing that is automatically checked by tools.

To address these unique challenges of millimeter-scale sensor nodes, a new chip-to-chip bus interconnect, referred to as MBus, was proposed in [11] and discussed from a system-architecture viewpoint in [12]. This paper describes MBus in greater detail from a circuit perspective. The bus nodes are arranged in a segmented ring topology, which requires only four pads per module and supports a fully synthesizable design. MBus also achieves low power by obviating the need for local oscillators in member nodes, using aggressive power gating, and controlling the PMU. The bus is implemented in a 3-module system in a 180 nm process, and measurements show a data rate above 10 Mb/s, 17.6 pJ/bit/chip, and 8 nW system standby power.

This paper is organized as follows. Section II describes conventional serial buses. Section III discusses design and implementation of MBus. Section IV shows the measurement results. Finally, the conclusion is given in Section V.

II. CONVENTIONAL SERIAL BUSES

Conventional serial bus standards such as SPI [13], I2C [14], UNI/O [15] and 1-Wire [16] are designed to have small pad counts and have been widely used for applications where speed is not critical. However, these standards cannot be adopted for millimeter-scale sensors due to their limits on scalability, power and synthesizability.

The standard SPI protocol requires a dedicated slave-select (SS) wire for each module in the system, as shown in Fig. 2(a). Hence, the maximum number of modules must be determined at design time, often resulting in over-provisioning and a large total pad count. For instance, in a moderate 8-module system, the SPI controller would require at least 11 pads (3 + N with N = 8), which is impossible to realize in a millimeter-scale system. Fig. 2(b) shows a variant of SPI that uses daisy chains for data and slave selection [17]. Although it reduces the pad count to 5, delay increases significantly, and multi-master operation, a critical feature in the target system, is still not supported.
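The scaling argument can be made concrete with a few lines of arithmetic. This is a minimal sketch: the 3 + N formula and the fixed counts for Daisy-Chain SPI and MBus come from Table 1, and the function name is hypothetical.

```python
# Bus pad counts per Table 1: standard SPI needs SCLK, MOSI, MISO plus one
# slave-select (SS) line per module; the other two topologies are fixed.
DAISY_CHAIN_SPI_PADS = 5
MBUS_PADS = 4


def spi_controller_pads(n_modules: int) -> int:
    """Pads on a standard SPI controller serving n_modules slaves (3 + N)."""
    if n_modules < 1:
        raise ValueError("need at least one module")
    return 3 + n_modules


for n in (2, 4, 8, 16):
    print(f"{n:2d} modules: SPI {spi_controller_pads(n):2d} pads, "
          f"daisy-chain SPI {DAISY_CHAIN_SPI_PADS}, MBus {MBUS_PADS}")
```

For the 8-module example, `spi_controller_pads(8)` returns 11, while the MBus count stays at 4 per node no matter how many modules are stacked.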

I2C, UNI/O and 1-Wire require only 2 or 4 pads on
A large resistor value can be used for lower power consumption, but it lowers the maximum speed at which the bus can operate, and it is also difficult to implement a high-precision oscillator. For instance, I2C uses a kΩ resistor due to size limitations. Fig. 3(b) shows an I2C variant that reduces power consumption by using output keepers instead of resistors [8]. However, it includes custom drivers, ratioed logic, and delay chains that require design margin and post-silicon tuning. Table 1 summarizes the characteristics of the conventional serial buses and the proposed MBus for millimeter-scale sensor systems.

<table>
<thead>
<tr>
<th></th>
<th>SPI</th>
<th>Daisy-Chain SPI</th>
<th>I2C</th>
<th>UNI/O or 1-Wire</th>
<th>MBus</th>
</tr>
</thead>
<tbody>
<tr>
<td>I/O Pad Count</td>
<td>3+N</td>
<td>5</td>
<td>4</td>
<td>2</td>
<td>4</td>
</tr>
<tr>
<td>Scalability</td>
<td>Low</td>
<td>High</td>
<td>High</td>
<td>High</td>
<td>High</td>
</tr>
<tr>
<td>Output Type</td>
<td>High Z</td>
<td>High Z</td>
<td>Open Drain</td>
<td>Open Drain</td>
<td>High Z</td>
</tr>
<tr>
<td>Power</td>
<td>Low</td>
<td>Low</td>
<td>High</td>
<td>High</td>
<td>Low</td>
</tr>
<tr>
<td>Synthesizability</td>
<td>Yes</td>
<td>Yes</td>
<td>No</td>
<td>No</td>
<td>Yes</td>
</tr>
<tr>
<td>Multi-Master</td>
<td>No</td>
<td>No</td>
<td>Yes</td>
<td>No</td>
<td>Yes</td>
</tr>
</tbody>
</table>

Table 1. Conventional serial buses and proposed MBus for millimeter-scale sensor systems

Fig. 4. MBus ring topology.

The proposed scheme can also be used for regular, non-layered inter-IC communication. In that case, the I/O pad count for I2C is 2, since shared wires are used for CLK and DOUT. However, MBus still has advantages over the conventional schemes: lower power than I2C, UNI/O, and 1-Wire, and multi-master support, which SPI and Daisy-Chain SPI lack.

III. CIRCUIT DESCRIPTION

To satisfy the new requirements for chip-to-chip communication in millimeter-scale sensor systems, the proposed bus interconnect, MBus, has the following properties.

1. Segmented Ring Topology

Fig. 4 shows the MBus topology. Each MBus node has four I/O pads: \( D_{\text{OUT}} \), \( D_{\text{IN}} \), \( \text{CLK}_{\text{OUT}} \), and \( \text{CLK}_{\text{IN}} \). The DATA and CLK connections are each arranged in a ring topology by connecting \( D_{OUT}/CLK_{OUT} \) to the next node’s \( D_{IN}/CLK_{IN} \) and eventually looping back. The required pad count per node therefore stays fixed regardless of the number of layers.

Signals propagate around the rings, so a message generated in one node can be sent to any node. In the proposed topology, the signal rings (DATA and CLK) are segmented by the nodes, and only one driver is assigned to each wire segment. Thus, unlike conventional open-drain-based designs, the bus is insensitive to variations in driver strength and can achieve low power without passive components.
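The following behavioral sketch illustrates why any node can reach any other: each segment has exactly one driver (the upstream node's D_OUT), and every node forwards D_IN to D_OUT unless it is the originator. It is a minimal model under stated assumptions: messages are treated as whole payloads rather than bit streams, arbitration and clocking are ignored, and the `Node` class, the addresses, and `send_on_ring` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    address: int                                     # hypothetical short address
    received: List[bytes] = field(default_factory=list)


def send_on_ring(nodes: List[Node], src: int, dest_address: int, payload: bytes) -> List[int]:
    """Propagate one message from nodes[src] around a segmented ring.

    Segment i connects nodes[i].D_OUT to nodes[i + 1].D_IN and has a single
    driver, so the message hops node by node until it returns to the sender.
    Returns the addresses of the nodes it passed through, in order.
    """
    path = []
    i = (src + 1) % len(nodes)
    while i != src:
        node = nodes[i]
        if node.address == dest_address:
            node.received.append(payload)  # target latches a copy
        path.append(node.address)          # every node still forwards D_IN -> D_OUT
        i = (i + 1) % len(nodes)
    return path


ring = [Node(a) for a in (0x1, 0x2, 0x3, 0x4)]
print(send_on_ring(ring, src=2, dest_address=0x2, payload=b"hi"))
print(ring[1].received)
```

In this model the traversal ends when the message reaches the originator's D_IN again, because the target forwards it just like every other node.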

2. Single Clock Generating Oscillator

In many conventional buses, each node requires an oscillator that generates a local clock for state machine operation. For example, in a multi-master configuration, each node needs a local reference clock so that it can drive the clock and data wires as a bus master. This means there are redundant clock-generating oscillators that run continuously, even when the bus is idle. MBus instead uses a centralized clocking scheme in which the clock is driven only by the Mediator node, as shown in Fig. 4. This scheme reduces power consumption significantly, since only the Mediator node requires an oscillator. Because flip-flops in regular nodes are clocked by \( CLK_{IN} \), regular nodes consume only static leakage power when the bus is idle. Note that any node can still initiate an MBus message transaction, even though only the Mediator node includes the clock generator.

3. Minimizing Standby Power

Fig. 5 shows a block diagram of the MBus-related circuitry in an MBus node. A member node includes three controllers (Sleep Controller, Bus Controller, and Interrupt Controller), which form the minimum set of modules required for regular node operation. The Bus Controller handles most MBus operations in a regular node. It accepts message transmission requests and initiates message transactions. It also interprets message transactions on the bus and forwards a message to the local node if the message is addressed to that node. To achieve reasonable operating speed for its relatively complex logic, the Bus Controller is designed with regular-Vth transistors. Therefore, to minimize standby power, the Bus Controller is power gated in standby mode with a high-Vth gating transistor. The power gating is controlled by the Sleep Controller, which releases it as soon as any activity is detected on the bus so that the Bus Controller can immediately start parsing the next message upon wakeup. A node in standby mode must still be able to monitor local events. For this purpose, an always-on External Interrupt Controller is designed with high-Vth transistors to minimize standby power overhead.

The Mediator node additionally includes an Activity Detector, an oscillator, and a Mediator Controller for its special role. In standby mode, both the oscillator and the Mediator Controller are power gated with a high-Vth power-gating header. The power gating is controlled by the Activity Detector, which releases it when there is a wakeup request on the bus so that the oscillator can provide the clock for a new message transaction.

4. Clock Gating for Low Active Power

When an MBus message is transmitted on the bus, its target address is parsed by the Bus Controller in each node, so both CLK_IN and D_IN must be forwarded to the Bus Controller logic at this point. Once the target address is parsed, if the message is not addressed to the node that the Bus Controller belongs to, the CLK_IN and D_IN inputs to the Bus Controller are gated until the end of the current message, reducing dynamic energy consumption by 23%. CLK_IN and D_IN are still forwarded to the next node through CLK_OUT and D_OUT so that the target layer can still receive the message.
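The gating decision can be sketched as a count of how many CLK_IN edges actually reach the Bus Controller for one message. This is a hypothetical model, assuming an MSB-first 8-bit short address followed by the data field; it does not reproduce the measured 23% figure, which covers the node's total dynamic energy, including the forwarding path that is never gated.

```python
from typing import List


def to_bits(value: int, width: int) -> List[int]:
    """MSB-first bit list (hypothetical helper)."""
    return [(value >> (width - 1 - k)) & 1 for k in range(width)]


def controller_clock_edges(message: List[int], my_address: int, addr_bits: int = 8) -> int:
    """CLK_IN edges that reach the Bus Controller logic for one message.

    The controller sees every bit until the target address is parsed. On a
    mismatch, its CLK_IN/D_IN inputs are gated until the end of the message;
    CLK_OUT/D_OUT forwarding to the next node is not modeled because it is
    never gated.
    """
    address = 0
    for bit in message[:addr_bits]:
        address = (address << 1) | bit
    if address == my_address:
        return len(message)   # message is for this node: clock the whole thing
    return addr_bits          # not for this node: gate after the address field


msg = to_bits(0x05, 8) + to_bits(0xDEADBEEF, 32)   # address 0x05 + 32 data bits
print(controller_clock_edges(msg, my_address=0x05))
print(controller_clock_edges(msg, my_address=0x03))
```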

5. Co-operation with PMU

In standby mode, member nodes forward both DATA and CLK, and the Mediator breaks the loop by holding both CLK_OUT and D_OUT high. To minimize standby power, all MBus components except the front end (i.e., the muxes in Fig. 5) are power gated. In this state, the PMU (Fig. 6) is in its lowest-power mode, where conversion efficiency is optimized for a low load current (on the order of 10 nA). To transmit a full message (an 8- or 32-bit address and data of arbitrary length), the PMU must switch to a high-power mode before the MBus power gates are released, since the lowest-power mode cannot sustain the supply once the power gates are released. In this high-power mode, the PMU is optimized to deliver 10s of μW for full bus operation.

The transition from low-power mode to high-power mode proceeds as follows. The power gates initially remain engaged, and D_IN/D_OUT are left high in standby mode. Wakeup is then initiated by a node pulling D_OUT low, which consumes negligible power. This falling edge propagates along the DATA ring until it is detected by the Mediator, which switches the PMU to high-power mode. After the PMU completes the transition, the Mediator starts to propagate clock edges through DATA. The regular nodes use the first four edges to sequentially release the power gates, clock, isolation gates, and reset, at which point the member node becomes fully active.
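The four-edge wakeup sequence can be modeled as a small state machine per member node. This is a minimal sketch, assuming one state advance per edge and ignoring propagation delay and falling-edge detection; the `WakeStage`, `MemberNode`, and `wake_system` names and the dictionary standing in for the PMU are hypothetical.

```python
from enum import IntEnum
from typing import Dict, List


class WakeStage(IntEnum):
    SLEEP = 0            # power gated; only the front-end muxes are powered
    POWER_GATE_OFF = 1   # edge 1: power gates released
    CLOCK_ON = 2         # edge 2: clock released
    ISOLATION_OFF = 3    # edge 3: isolation gates released
    ACTIVE = 4           # edge 4: reset released, node fully active


class MemberNode:
    def __init__(self) -> None:
        self.stage = WakeStage.SLEEP

    def on_wake_edge(self) -> None:
        """Advance one stage per edge; edges after ACTIVE are ignored."""
        if self.stage < WakeStage.ACTIVE:
            self.stage = WakeStage(self.stage + 1)


def wake_system(members: List[MemberNode], pmu: Dict[str, str]) -> None:
    """Mediator side, after it has detected the falling edge on DATA."""
    pmu["mode"] = "high"          # switch the PMU before any power gate opens
    for _ in range(4):            # then propagate four edges around the ring
        for node in members:
            node.on_wake_edge()


nodes = [MemberNode(), MemberNode()]
pmu = {"mode": "low"}             # standby: PMU in its lowest-power mode
wake_system(nodes, pmu)
print(pmu["mode"], [n.stage.name for n in nodes])
```

The ordering mirrors the key constraint described above: the PMU reaches high-power mode before any member node releases its power gates.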

6. PMU Design

To demonstrate MBus co-operation with the PMU, a switched-capacitor DC-DC converter is designed to meet the requirements of both low- and high-power mode operation. The PMU down-converts the 3.8 V thin-film lithium battery output to a 1.2 V or 0.6 V low-voltage supply for low-power operation, as shown in Fig. 6. The system can fail by losing its supply voltage if the DC-DC converter cannot provide the required current. The current that a switched-capacitor DC-DC converter can deliver is proportional to its flying capacitance and switching frequency. Therefore, the switching frequency is set according to the power mode. In low-power mode, the switching frequency is reduced to as low as 340 Hz to minimize switching loss and improve conversion efficiency at standby power levels of only a few nW. In high-power mode, a switching frequency of up to 335 kHz is used to deliver 10s of μW. An example PMU design achieves conversion efficiencies of 60.7% and 63.8% in low- and high-power modes, respectively [18].
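The mode-dependent switching frequency follows from a first-order view of a switched-capacitor converter, in which the deliverable current at a given output droop scales as \( I \propto C_{fly} \cdot f_{sw} \cdot \Delta V \). The sketch below uses only the two switching frequencies quoted above; the flying capacitance and droop values are hypothetical, chosen only to show orders of magnitude, and the model ignores switch resistance and topology-dependent charge multipliers.

```python
F_LOW_HZ = 340.0     # low-power-mode switching frequency (from the text)
F_HIGH_HZ = 335e3    # high-power-mode switching frequency (from the text)

C_FLY_F = 1e-9       # hypothetical flying capacitance: 1 nF
DROOP_V = 0.1        # hypothetical allowed output droop: 100 mV


def sc_current_capability(c_fly_f: float, f_sw_hz: float, droop_v: float) -> float:
    """First-order (slow-switching-limit) estimate: I ~ C_fly * f_sw * dV."""
    return c_fly_f * f_sw_hz * droop_v


i_low = sc_current_capability(C_FLY_F, F_LOW_HZ, DROOP_V)
i_high = sc_current_capability(C_FLY_F, F_HIGH_HZ, DROOP_V)
print(f"low mode  ~ {i_low * 1e9:.0f} nA")
print(f"high mode ~ {i_high * 1e6:.1f} uA")
print(f"ratio     ~ {F_HIGH_HZ / F_LOW_HZ:.0f}x")
```

With the same capacitor and droop, the roughly 1000× frequency ratio maps directly to a roughly 1000× difference in deliverable current, which is how one converter can span both the nA standby regime and the 10s-of-μW active regime.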

7. Robust Timing

In an MBus-based system, the loading and driving strength of the D_OUT/CLK_OUT drivers on each node can be unpredictable due to irregular wire bonding and process variation. This creates uncertainty in the relative arrival time of $D_{IN}/CLK_{IN}$. Therefore, with a conventional single-phase clocking scheme, a large number of hold-time buffers would be required to prevent hold-time violations, which would incur power and performance penalties. In the proposed MBus, the driving and latching edges are separated, as shown in Fig. 7. By sampling $D_{IN}$ on the rising $CLK_{IN}$ edge and driving $D_{OUT}$ on the falling $CLK_{IN}$ edge, the setup and hold time margins are balanced. While this incurs a performance penalty, it makes the hold margin scale with the clock period, guaranteeing robust operation with post-silicon frequency tuning.
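The effect of separating the launch and capture edges can be checked with a simple one-hop timing model. In this sketch, `data_delay` lumps the sender's clock-to-Q and the D_OUT-to-D_IN wire delay, and `clk_skew` is the CLK_IN arrival time at the receiver minus that at the sender; the function name and all numeric values are hypothetical, with times in nanoseconds.

```python
def hop_margins(period, data_delay, clk_skew, t_setup, t_hold, half_cycle=True):
    """Return (setup_margin, hold_margin) for one node-to-node hop.

    half_cycle=True: D_OUT launched on the falling CLK_IN edge, D_IN captured
    on the rising edge, so launch trails the previous capture by period / 2.
    half_cycle=False: conventional launch and capture on the same edge.
    """
    launch = (period / 2 if half_cycle else 0.0) + data_delay  # data change time
    setup_margin = (period + clk_skew) - launch - t_setup      # before next capture
    hold_margin = launch - clk_skew - t_hold                   # after previous capture
    return setup_margin, hold_margin


# Hypothetical hop: 10 MHz clock, 2 ns data path, 5 ns clock skew (all in ns).
for half in (False, True):
    s, h = hop_margins(100.0, 2.0, 5.0, 0.1, 0.1, half_cycle=half)
    print(f"half_cycle={half}: setup {s:.1f} ns, hold {h:.1f} ns")
```

With a single edge, a skew larger than the data delay yields a negative hold margin that no clock slowdown can repair. With split edges, both margins contain `period / 2`, so lowering the frequency after fabrication widens the hold margin as well.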

IV. MEASUREMENT RESULTS

MBus has been implemented in six chips (three of them in a 180 nm CMOS process, shown in Fig. 8) spanning three different technologies, as well as on two FPGA fabrics, and all of them interoperate without error and without any need for tuning. MBus achieves a data rate above 10 Mb/s, a figure limited by the test configuration. An MBus member node is implemented with 227 sequential and 2900 combinational logic cells, occupying 37.2 $\mu$m$^2$.

Fig. 9 shows how MBus operates seamlessly with power-mode transitions in a sensor system. In the scenario shown in Fig. 9(a), the system wakes up for 10 ms to initiate a low-power temperature measurement. While the temperature is being measured, the system stays in low-power mode. Once the measurement is done, the system wakes up for 20 ms to process and store the data and then re-enters sleep mode. Fig. 9(b) shows that, upon a wakeup request (DATA pull-down), the PMU switches to high-power mode, and the rest of the MBus message is handled once power is stable (the PMU draws >10 $\mu$A). Fig. 9(c) shows that, after a sleep request message, the PMU switches to low-power mode after ~4 ms of preparation time for the member nodes.

Table 2 summarizes the performance of the proposed design and compares it with previous work. The MBus Mediator consumes 27.5 pJ/bit when sending a message. An MBus member node consumes 22.7 pJ/bit and 17.6 pJ/bit for receiving and forwarding a message, respectively. Compared to the low-power I2C variant demonstrated in [8], MBus achieves a 23% energy saving for the 3-layer system at 617 kHz by minimizing flip-flop switching and the number of oscillators. This saving grows with additional layers, since each additional MBus layer consumes only 17.6 pJ/bit, whereas each additional layer in the I2C-variant system would consume approximately 29.3 pJ/bit because of the additional oscillator it requires. Note that these active energy figures are two orders of magnitude lower than those of traditional I2C.
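The claim that the saving grows with layer count can be illustrated with the per-layer figures above. This is a hypothetical model: it pins the 3-layer saving at the reported 23% and adds 17.6 pJ/bit (MBus) or about 29.3 pJ/bit (I2C variant) per extra layer, but the 3-layer MBus baseline used here (Mediator send plus one receiving node plus one forwarding node) is an assumption, not a reported number. The baseline only affects how quickly the saving approaches its limit of \( 1 - 17.6/29.3 \approx 40\% \).

```python
MBUS_EXTRA_LAYER_PJ = 17.6   # pJ/bit per additional MBus layer (forwarding node)
I2C_EXTRA_LAYER_PJ = 29.3    # pJ/bit per additional I2C-variant layer (approximate)
SAVING_3_LAYER = 0.23        # reported saving for the 3-layer system

# Assumption (not a reported figure): 3-layer MBus energy per bit =
# Mediator send + one receiving node + one forwarding node.
MBUS_3_LAYER_PJ = 27.5 + 22.7 + 17.6
I2C_3_LAYER_PJ = MBUS_3_LAYER_PJ / (1 - SAVING_3_LAYER)


def saving(extra_layers: int) -> float:
    """Fractional energy saving of MBus vs. the I2C variant for 3 + extra_layers layers."""
    mbus = MBUS_3_LAYER_PJ + MBUS_EXTRA_LAYER_PJ * extra_layers
    i2c = I2C_3_LAYER_PJ + I2C_EXTRA_LAYER_PJ * extra_layers
    return 1 - mbus / i2c


for k in (0, 1, 3, 10):
    print(f"{3 + k:2d} layers: saving {saving(k):.1%}")
print(f"limit: {1 - MBUS_EXTRA_LAYER_PJ / I2C_EXTRA_LAYER_PJ:.1%}")
```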

The entire system consumes 8 nW in standby mode, a figure dominated by components other than MBus, such as an optical wakeup receiver, the low-power-mode PMU, a battery supervisor, and 3 kB of SRAM.

V. CONCLUSION

Today’s emerging sensing platforms need a bus interconnect that addresses area and energy constraints rather than focusing on increasing performance or bandwidth. In this paper, MBus, a new serial interconnect, is proposed to address the inter-chip communication requirements of the next generation of ultra-low power, millimeter-scale wireless sensor nodes. A complete millimeter-scale modular system is presented, consisting of sensors, a processor, and a radio connected with MBus. MBus offers 8 nW standby power, 17.6 pJ/bit/chip energy consumption, and a 4-pad I/O count; it is fully synthesizable and supports multi-master operation, all with robust timing. These features open the door to modular, pervasive computing systems.

ACKNOWLEDGMENTS

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (2016R1C1B2009047). The authors acknowledge CubeWorks Inc. for system-level testing and integration.

REFERENCES


Inhee Lee received the B.S. and M.S. degrees in electrical and electronic engineering from Yonsei University, Seoul, Korea, in 2006 and 2008, respectively, and the Ph.D. degree from the University of Michigan, Ann Arbor, MI, USA, in 2014. He is currently a research scientist at the University of Michigan. His research interests include energy harvesters, power management circuits, battery monitoring circuits, and low-power sensing systems for IoT applications.

Ye-Sheng Kuo received the B.S. degree in electrical engineering from National Taiwan University, Taipei, in 2008, and the Ph.D. degree in electrical engineering from the University of Michigan, Ann Arbor, in 2015. His research interests include embedded systems, sensor networks, and visible light communication.

Pat Pannuto received the B.S. degree in computer engineering from the University of Michigan, Ann Arbor, in 2012. He is currently pursuing the Ph.D. degree with the University of Michigan, Ann Arbor, MI, USA. His current research interests include systems design for millimeter-scale systems and technologies for high-fidelity indoor localization.

Gyuhoo Kim received the B.S., M.S., and Ph.D. degrees in electrical engineering from the University of Michigan, Ann Arbor, in 2009, 2011, and 2014, respectively, where he currently holds a post-doctoral research fellow position. His research focuses on ultra-low power VLSI design for energy-constrained systems.

ZhiYoong Foo received his B.S., M.S., and Ph.D. degrees in Electrical Engineering from the University of Michigan. His research includes low-cost and low-power VLSI circuit systems integration. He currently heads CubeWorks Inc., a startup spun out of the University of Michigan that commercializes ultra-low power systems.

Ben Kempke received the B.S.E. degree in computer engineering and the M.S.E. degree in computer science and engineering from the University of Michigan in 2009 and 2010, respectively. He is currently pursuing the Ph.D. degree at the same institution. His research interests include the design of low-power, high-accuracy indoor RF localization technologies.

Seokhyeon Jeong received the B.S. degree in electrical engineering from the Korea Advanced Institute of Science and Technology (KAIST), South Korea, in 2011. He is currently pursuing the Ph.D. degree in electrical engineering at the University of Michigan. His research interests include subthreshold circuit design, ultra-low power sensors, and the design of mm-scale computing systems.

Yejoong Kim received the B.S. degree from Yonsei University, South Korea, in 2008, and the M.S. and Ph.D. degrees from the University of Michigan, Ann Arbor, in 2012 and 2015, respectively, all in electrical engineering. He is currently a Research Fellow at the University of Michigan, working on ultra-low power system designs.

Prabal Dutta received the B.S. degree in electrical and computer engineering, in 1997, the M.S. degree in electrical engineering, in 2004, from the Ohio State University, and the Ph.D. degree in computer science from University of California, Berkeley, in 2009, where he was a National Science Foundation and Microsoft Research Graduate Fellow. Since 2009, he has been an Assistant Professor of Electrical Engineering and Computer Science at the University of Michigan.

David Blaauw received his B.S. in Physics and Computer Science from Duke University in 1986, and his Ph.D. in Computer Science from the University of Illinois, Urbana, in 1991. After his studies, he worked for Motorola, Inc. in Austin, TX, where he was the manager of the High Performance Design Technology group. Since August 2001, he has been on the faculty at the University of Michigan where he is a Professor. He has published over 450 papers and holds 40 patents. His work has focused on VLSI design with particular emphasis on ultra-low power and high performance design. He is an IEEE Fellow.

Yoonmyung Lee received the B.S. degree in Electronic and Electrical Engineering from the Pohang University of Science and Technology (POSTECH), Pohang, Korea, in 2004, and the M.S. and Ph.D. degrees from the University of Michigan in 2008 and 2012, respectively. He was a recipient of the Samsung Scholarship and the Intel Ph.D. Fellowship. From 2012 to 2014, Dr. Lee was with the University of Michigan as research faculty, and he recently joined Sungkyunkwan University (SKKU), Korea, as an Assistant Professor. His research interests include energy-efficient integrated circuits and millimeter-scale wireless sensor systems design.