An Energy-Efficient Matching Accelerator Using Matching Prediction for Mobile Object Recognition

Seongrim Choi, Hwanyong Lee, and Byeong-Gyu Nam

Abstract—An energy-efficient object matching accelerator based on a matching prediction scheme is proposed for mobile object recognition. Conventionally, the vocabulary tree has been used to save external memory bandwidth in the object matching process, but it involves massive internal memory transactions to examine each object in the database. In this paper, a novel object matching accelerator based on matching prediction is proposed; it reduces unnecessary internal memory transactions by avoiding the examination of non-target objects, thereby improving energy efficiency. Experimental results show a 26% reduction in power-delay product compared to the prior art.

Index Terms—Object recognition, object matching, vocabulary tree, matching prediction, matching accelerator

I. INTRODUCTION

Recently, object recognition has become widespread in mobile multimedia applications and plays a key role in augmented reality and autonomous robots [1]. Object matching is one of the major parts of the object recognition process [1]; it selects the item closest to the input object from the object database (DB). It involves massive accesses to the external memory containing the DB [1], and therefore an energy-efficient design of object matching is of the utmost importance in mobile vision applications [2]. A variety of object matching algorithms have been studied, and recently the vocabulary tree (VT) has been gaining attention for its efficient use of external memory bandwidth [3, 4]. However, its energy dissipation is still high, as it involves considerable transactions to internal memory. In this paper, we propose a novel object matching accelerator based on a matching prediction scheme that reduces the internal memory transactions triggered by the examination of non-target objects. The proposed scheme exploits the temporal coherence of consecutive image frames: because consecutive frames are similar, the input object can be matched by reusing the previous matching result. As a result, our object matching accelerator shows a 26% improvement in power-delay product (PDP) over the prior art [4].

II. MATCHING PREDICTION SCHEME

One of the most challenging issues in object matching is the huge external memory bandwidth required to access the object DB [1]. The vocabulary tree (VT) is well known for its efficient use of memory bandwidth: it quantizes the key points of the objects in the DB, which makes the entire DB fit in on-chip memory [1, 4]. However, VT-based object matching incurs considerable transactions to internal memory, as it compares the input object with the entire DB. Therefore, we present a matching prediction scheme that finds the target object without going through the entire DB by exploiting the temporal coherence of consecutive frames. Our scheme matches the input object to the object matched in the previous frame if the two are close enough. In this way, we can reduce the internal memory transactions significantly by eliminating the examination of non-target objects.

Fig. 1(a) shows the proposed matching prediction unit. In the VT framework, each object in the DB is scored based on its difference from the input object. In this architecture, the prediction hits when the score of the input object and that of the object matched in the previous frame are close enough, as shown in Fig. 1(b). On a hit, the input object is matched using the previous matching result without examining the non-target objects. On a miss, the input object is scored against the entire DB. In addition, the input object is scored against the entire DB periodically, since a false match could otherwise persist over successive frames once a misprediction occurs. We set the period to 15 frames, as a false match of such short duration is unnoticeable to the human visual system. The misprediction rate of this matching prediction scheme is as low as 2.6%, as shown in Table 1; thus, the input object can be matched with very high accuracy without scoring the non-target objects in the DB. As a result, the proposed matching prediction scheme reduces internal memory transactions by 65.9% compared to the conventional object matching strategy.
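The hit/miss decision and the periodic full scoring described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the fabricated hardware: the class `MatchingPredictor`, the callback `score_fn`, the closeness `threshold`, and the choice to update the reference score on every hit are all hypothetical, since the text does not specify them. Only the 15-frame period comes from the text. The counter `score_calls` stands in loosely for internal memory transactions.

```python
REFRESH_PERIOD = 15  # frames between forced full-DB scorings (value from the text)


class MatchingPredictor:
    """Toy software model of the matching prediction scheme (hypothetical API)."""

    def __init__(self, score_fn, db_ids, threshold):
        self.score_fn = score_fn    # assumed: score_fn(frame, obj_id) -> float, higher is better
        self.db_ids = list(db_ids)
        self.threshold = threshold  # assumed closeness criterion; not given in the text
        self.prev_id = None         # object matched in the previous frame
        self.prev_score = None      # its score in the previous frame
        self.predicted_frames = 0   # frames matched by prediction since the last full scoring
        self.score_calls = 0        # rough proxy for internal memory transactions

    def _score(self, frame, obj_id):
        self.score_calls += 1
        return self.score_fn(frame, obj_id)

    def _full_match(self, frame, known=None):
        # Score every DB object, reusing any score that was already computed.
        scores = dict(known or {})
        for obj_id in self.db_ids:
            if obj_id not in scores:
                scores[obj_id] = self._score(frame, obj_id)
        best = max(scores, key=scores.get)
        self.prev_id, self.prev_score = best, scores[best]
        self.predicted_frames = 0
        return best

    def match(self, frame):
        # First frame, or the periodic refresh is due: score the entire DB.
        if self.prev_id is None or self.predicted_frames >= REFRESH_PERIOD - 1:
            return self._full_match(frame)
        # Otherwise score only the previously matched object.
        score = self._score(frame, self.prev_id)
        if abs(score - self.prev_score) <= self.threshold:
            # Prediction hit: reuse the previous match, skip all non-target objects.
            self.prev_score = score
            self.predicted_frames += 1
            return self.prev_id
        # Prediction miss: fall back to scoring the entire DB.
        return self._full_match(frame, known={self.prev_id: score})
```

With this model, a sequence of similar frames triggers one full scoring every 15 frames and a single score computation on each frame in between. The transaction counts of such a toy model are not expected to reproduce the 65.9% figure measured on the chip.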

The overall architecture of the proposed object matching accelerator is illustrated in Fig. 2. It consists of four major components: the VT search, matching prediction, score generation, and max score units. The VT search unit finds several matching candidates with the smallest distance to the input, and the score generation unit then scores these candidates against the input based on their geometry information, such as coordinates and orientation. Finally, the max score unit chooses the object with the highest score as the final matching result. The proposed prediction unit is placed between the VT search and score generation units and, based on the prediction result, gates the score generation of non-target objects to reduce on-chip bandwidth.
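The data flow through the four units can be sketched for a single frame as follows. Everything here is an illustrative assumption rather than the actual design: the `Obj` record, the 1-D `descriptor` standing in for quantized key points, the flat nearest-candidate search used in place of a real vocabulary tree traversal, and the geometric score formula are all hypothetical. The `predicted` argument models a prediction hit that has already been decided.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Obj:
    descriptor: float   # toy 1-D stand-in for quantized key points (assumption)
    x: float            # geometry information: coordinates
    y: float
    orientation: float  # geometry information: orientation in radians


def vt_search(query: Obj, db: Dict[str, Obj], k: int) -> List[str]:
    """Return the k DB ids with the smallest descriptor distance to the query.

    A flat search is used here for brevity; it is not a vocabulary tree.
    """
    return sorted(db, key=lambda i: abs(db[i].descriptor - query.descriptor))[:k]


def generate_score(query: Obj, cand: Obj) -> float:
    """Score a candidate from its geometry; higher means a closer match (assumed formula)."""
    d_pos = math.hypot(query.x - cand.x, query.y - cand.y)
    d_ori = abs(query.orientation - cand.orientation)
    return -(d_pos + d_ori)


def match_frame(query: Obj, db: Dict[str, Obj], k: int = 3,
                predicted: Optional[str] = None) -> Tuple[str, int]:
    """Run VT search, optional prediction gating, score generation, and max score.

    Returns the matched id and the number of score generations performed.
    """
    candidates = vt_search(query, db, k)
    if predicted is not None:
        # Prediction hit: gate score generation for all non-target objects.
        candidates = [predicted]
    scores = {i: generate_score(query, db[i]) for i in candidates}
    best = max(scores, key=scores.get)  # max score unit
    return best, len(scores)
```

Calling `match_frame` without `predicted` scores all `k` candidates, while passing the previously matched id scores only that object, which is the saving the gating is meant to provide.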

III. EXPERIMENTAL RESULTS

The proposed object matching accelerator exploiting the matching prediction scheme is fabricated in a 65 nm CMOS process. A die photograph and the chip characteristics are shown in Fig. 3. The experimental results demonstrate an average power of 6.8 mW and a critical path delay of 6.7 ns. The proposed prediction scheme reduces internal memory transactions by 65.9% at the cost of a 1.2% increase in area compared to the conventional object matching scheme, as illustrated in Fig. 4. Table 2 presents the PDP comparison with the state-of-the-art object matching accelerator. The proposed object matching accelerator demonstrates a 26% reduction in PDP compared to the state of the art [4].

<table>
<thead>
<tr>
<th>Object Categories</th>
<th>Size of Dataset</th>
<th>Misprediction rate (%)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Doll</td>
<td>220</td>
<td>1.7</td>
</tr>
<tr>
<td>Fruit</td>
<td>60</td>
<td>2.9</td>
</tr>
<tr>
<td>Kitchen Supplies</td>
<td>52</td>
<td>1.9</td>
</tr>
<tr>
<td>Book</td>
<td>980</td>
<td>3.5</td>
</tr>
<tr>
<td>Office Supplies</td>
<td>36</td>
<td>3.2</td>
</tr>
</tbody>
</table>
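
As a quick arithmetic check on the table above, the per-category misprediction rates can be averaged in two ways. The snippet below is a sketch that assumes the 2.6% figure quoted in Section II is an unweighted mean over the five categories; the text does not state how it was aggregated. The variable names are illustrative.

```python
# Dataset sizes and misprediction rates (%) copied from the table above.
sizes = {"Doll": 220, "Fruit": 60, "Kitchen Supplies": 52, "Book": 980, "Office Supplies": 36}
rates = {"Doll": 1.7, "Fruit": 2.9, "Kitchen Supplies": 1.9, "Book": 3.5, "Office Supplies": 3.2}

# Unweighted mean: every category counts equally.
unweighted = sum(rates.values()) / len(rates)

# Size-weighted mean: every dataset item counts equally.
weighted = sum(sizes[c] * rates[c] for c in rates) / sum(sizes.values())

print(f"unweighted mean:    {unweighted:.2f}%")
print(f"size-weighted mean: {weighted:.2f}%")
```

The unweighted mean comes to about 2.64%, which agrees with the quoted 2.6%. The size-weighted mean is about 3.11%, dominated by the large Book category.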

IV. CONCLUSIONS

A novel object matching accelerator exploiting matching prediction is presented for mobile object recognition applications. The proposed object matching accelerator demonstrates a 26% reduction in PDP compared to the state of the art.

**ACKNOWLEDGMENTS**

This work was supported by research fund of Chungnam National University.

**REFERENCES**


**Seongrim Choi** received the B.S. and M.S. degrees in computer science and engineering from Chungnam National University (CNU), Daejeon, in 2011 and 2013, respectively, where he is currently working toward the Ph.D. degree. His current research interests include object recognition processors and wearable SoCs.

**Hwanyong Lee** received the B.S. and M.S. degrees in computer science and engineering from Chungnam National University (CNU), Daejeon, in 2014 and 2016, respectively. He is currently with SiliconWorks, working on display T-CON design. His research interests include mobile GPU and embedded processor design.

**Byeong-Gyu Nam** received his B.S. degree (*summa cum laude*) in computer engineering from Kyungpook National University, Daegu, Korea, in 1999, and his M.S. and Ph.D. degrees in electrical engineering and computer science from Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea, in 2001 and 2007, respectively. His Ph.D. work focused on low-power GPU design for wireless mobile devices. In 2001, he joined Electronics and Telecommunications Research Institute (ETRI), Daejeon, Korea, where he was involved in a network processor design for the InfiniBand™ protocol. From 2007 to 2010, he was with Samsung Electronics, Giheung, Korea, where he worked on the world's first low-power 1-GHz ARM Cortex™ microprocessor design. Dr. Nam is currently with Chungnam National University, Daejeon, Korea, as an associate professor. He is serving as a vice director of the System Design Innovation and Application Research Center (SDIA), KAIST, and as a member of the steering committee of the IC Design Education Center (IDEC), KAIST. His current interests include mobile GPU, embedded microprocessor, low-power SoC design, and embedded software platforms. He co-authored the book *Mobile 3D Graphics SoC: From Algorithm to Chip* (Wiley, 2010) and presented tutorials on mobile processor design at IEEE ISSCC 2012 and IEEE A-SSCC 2011. He is serving as the chair of the Digital Architectures and Systems (DAS) subcommittee of ISSCC and as a member of the TPC for IEEE ISSCC, IEEE A-SSCC, IEEE COOL Chips, VLSI-DAT, ASP-DAC, and ISOCC. He served as a Guest Editor of the IEEE Journal of Solid-State Circuits (JSSC) and is an Associate Editor of the IEIE Journal of Semiconductor Technology and Science (JSTS).