Big Data Trip Classification on the New York City Taxi and Uber Sensor Network

2018 
Taxis and Uber vehicles make millions of trips in New York City every day. We first employ big data technologies to analyze this vast dataset: Apache Spark for data processing and classification, Apache Hive for data storage, and MapReduce for data profiling. Because taxis and Uber vehicles are equipped with GPS sensors, we then visualize a mobile sensor network over New York City, partitioned into fine-grained regions that each act as a mobile sensing node. Each location on the network falls into a region and is classified into one of three categories according to which service dominates that region: Yellow taxi, Green taxi, or Uber. We use logistic regression to classify a region into one of the three categories. The classifier is then used to analyze the interaction between taxis and Uber, for example to quantify the expansion of Uber. Experiments on a Spark cluster show that our classifier achieves an accuracy of over 85% on the 2014 taxi and Uber dataset. Finally, we propose a trip recommendation system for users that combines the classification results with a web service application.
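
The region labeling described above can be illustrated with a minimal sketch. It assumes a uniform latitude/longitude grid and a simple majority count per cell; the cell size `CELL_DEG`, the helper names `region_of` and `dominant_service`, and the sample trips are hypothetical and are not taken from the paper. It shows only how a "dominant service" label per region might be derived, not the logistic regression classifier or the Spark pipeline.

```python
import math
from collections import Counter, defaultdict

# Assumed cell size in degrees (hypothetical; the paper does not state one).
CELL_DEG = 0.01


def region_of(lat, lon, cell_deg=CELL_DEG):
    """Map a GPS point to the integer (row, col) index of its grid cell."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def dominant_service(trips, cell_deg=CELL_DEG):
    """Label each grid cell with the service that has the most pickups in it.

    `trips` is an iterable of (lat, lon, service) tuples. Ties are broken by
    whichever service was seen first in that cell.
    """
    counts = defaultdict(Counter)
    for lat, lon, service in trips:
        counts[region_of(lat, lon, cell_deg)][service] += 1
    return {cell: c.most_common(1)[0][0] for cell, c in counts.items()}


# Made-up pickup points for illustration only.
trips = [
    (40.7580, -73.9855, "yellow"),
    (40.7585, -73.9851, "yellow"),
    (40.7589, -73.9859, "uber"),
    (40.8075, -73.9465, "green"),
    (40.8079, -73.9461, "green"),
    (40.7128, -74.0060, "uber"),
]

labels = dominant_service(trips)
for cell, service in sorted(labels.items()):
    print(cell, service)
```

Labels produced this way could serve as training targets for a three-class classifier; which features the authors feed into logistic regression is not stated in the abstract.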