|Qiang Li||Beijing Jiaotong University, P.R. China|
|Xuan Feng||University of Chinese Academy of Sciences, P.R. China|
|Haining Wang||University of Delaware, USA|
|Zhi Li||Institute of Information Engineering, Chinese Academy of Sciences, P.R. China|
|Limin Sun||Institute of Information Engineering, Chinese Academy of Sciences, Beijing, P.R. China|
An increasing number of embedded devices are connecting to the Internet. These devices usually run firmware and are exposed to the public by device search engines. Firmware in embedded devices varies across manufacturers and product versions. More importantly, many embedded devices still run outdated firmware because of compatibility and release-time issues, which raises serious security concerns. In this paper, we propose generating fine-grained fingerprints based on the subtle differences between the filesystems of various firmware images. We leverage natural language processing techniques to process file content and the document object model to obtain the firmware fingerprint. To validate the fingerprints, we crawled 9,716 firmware images from the official websites of device vendors and conducted real-world experiments for performance evaluation. The results show that both the recall and the precision of the firmware fingerprints exceed 90%. Furthermore, we deployed the prototype system on Amazon EC2 and collected firmware from online embedded devices across the IPv4 space. Our findings indicate that thousands of devices on the Internet are still running vulnerable firmware.
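The core idea of fingerprinting firmware through filesystem differences can be illustrated with a minimal sketch. It is not the method described in this paper: it swaps in plain SHA-256 content hashing for the natural language processing and document object model analysis, and every name in it (`content_digest`, `unique_file_fingerprints`, `identify`, the sample paths and version labels) is hypothetical. It assumes each firmware image has already been unpacked into a mapping from file path to file bytes, and it treats a file as a fingerprint for a version when that exact path and content appear in no other version.

```python
import hashlib
from collections import defaultdict


def content_digest(data: bytes) -> str:
    """Return a hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(data).hexdigest()


def unique_file_fingerprints(images):
    """Map each version to the (path, digest) pairs found in no other version.

    `images` is an assumed structure: {version: {path: file_bytes}}.
    """
    owners = defaultdict(set)
    for version, files in images.items():
        for path, data in files.items():
            owners[(path, content_digest(data))].add(version)

    fingerprints = {version: set() for version in images}
    for key, versions in owners.items():
        if len(versions) == 1:  # this exact file occurs in one version only
            (only,) = versions
            fingerprints[only].add(key)
    return fingerprints


def identify(observed, fingerprints):
    """Return the versions whose fingerprint matches any observed file.

    `observed` is an assumed structure: {path: file_bytes} fetched from a device.
    """
    seen = {(path, content_digest(data)) for path, data in observed.items()}
    return sorted(v for v, fp in fingerprints.items() if fp & seen)


# Illustrative data only: two hypothetical versions sharing one stylesheet.
sample_images = {
    "1.0": {"/index.html": b"<html>v1</html>", "/css/style.css": b"body{}"},
    "1.1": {"/index.html": b"<html>v1.1</html>", "/css/style.css": b"body{}"},
}
sample_fingerprints = unique_file_fingerprints(sample_images)
print(identify({"/index.html": b"<html>v1.1</html>"}, sample_fingerprints))
```

In this sketch the shared stylesheet never becomes a fingerprint because it cannot distinguish the two versions, whereas each version's `/index.html` can. Exact hashing is brittle when pages embed device-specific values such as serial numbers or timestamps, which is one reason a content-aware analysis may be preferable in practice.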