Focus: Querying Large Video Datasets With Low Latency And Low Cost

Authors:
Kevin Hsieh Carnegie Mellon University
Ganesh Ananthanarayanan Microsoft
Peter Bodik Microsoft
Shivaram Venkataraman Microsoft
Paramvir Bahl Microsoft
Matthai Philipose Microsoft
Phillip B. Gibbons Carnegie Mellon University
Onur Mutlu ETH Zurich

Introduction:

the authors present Focus, a system providing both low-cost and low-latency querying on large video datasets. Large volumes of videos are continuously recorded from cameras deployed for traffic control and surveillance with the goal of answering “after the fact” queries: identify video frames with objects of certain classes (cars, bags) from many days of recorded video.At ingest time (on live videos), Focus uses cheap convolutional network classifiers (CNNs) to construct an approximate index of all possible object classes in each frame (to handle queries for any class in the future).

Abstract:

Large volumes of videos are continuously recorded from cameras deployed for traffic control and surveillance with the goal of answering “after the fact” queries: identify video frames with objects of certain classes (cars, bags) from many days of recorded video. Current systems for processing such queries on large video datasets incur either high cost at video ingest time or high latency at query time. We present Focus, a system providing both low-cost and low-latency querying on large video datasets. Focus’s architecture flexibly and effectively divides the query processing work between ingest time and query time. At ingest time (on live videos), Focus uses cheap convolutional network classifiers (CNNs) to construct an approximate index of all possible object classes in each frame (to handle queries for any class in the future). At query time, Focus leverages this approximate index to provide low latency, but compensates for the lower accuracy of the cheap CNNs through the judicious use of an expensive CNN. Experiments on commercial video streams show that Focus is 48× (up to 92×) cheaper than using expensive CNNs for ingestion, and provides 125× (up to 607×) lower query latency than a state-of-the-art video querying system (NoScope).

You may want to know: