A Complete Data Science Work-flow For Insurance Field

2018 
In recent years, "Big Data" has become a ubiquitous term. Big Data is transforming science, engineering, medicine, healthcare, finance, business, and ultimately society itself. Learning from Big Data has become a significant challenge and requires the development of new types of algorithms, since most machine learning algorithms cannot easily scale up to Big Data. MapReduce is a simplified programming model for processing large datasets in a distributed and parallel manner. In this paper, we present our work carried out in a big data project dedicated to the insurance sector, which allows us to validate our method on real-world insurance data. We present the complete pipeline, or work-flow, from data collection to visualization, passing through data fusion, data analysis, clustering, and prediction tasks. The insurance dataset is enriched with data collected from heterogeneous sources. A prediction and analysis system is proposed by combining the clustering results with decision trees. We use a topological approach, specifically the Self-Organizing Map (SOM) method, because it can cluster and visualize the data at the same time. We make the source code of our SOM-MapReduce algorithm, written in Spark using the MapReduce paradigm, publicly available.
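
The released SOM-MapReduce code is not reproduced in this abstract, but the general idea of expressing batch SOM training with Spark's map and reduce primitives can be illustrated with the minimal PySpark sketch below. This is not the authors' implementation; the function train_som_epoch and parameters such as grid, sigma, and the 5x5 map size are illustrative assumptions. The map step assigns each observation to its best-matching unit and emits neighborhood-weighted contributions to every map unit; the reduce step sums these contributions across partitions before the codebook is recomputed.

# Minimal PySpark sketch of one batch-SOM epoch in the MapReduce style
# (illustrative only; not the released SOM-MapReduce implementation).
import numpy as np
from pyspark import SparkContext


def train_som_epoch(sc, data_rdd, codebook, grid, sigma=1.0):
    """One batch-SOM update.

    map:    each point finds its best-matching unit (BMU) and emits a
            neighborhood-weighted contribution to every map unit.
    reduce: the contributions are summed across all partitions.
    """
    bc_codebook = sc.broadcast(codebook)   # K x D prototype vectors
    bc_grid = sc.broadcast(grid)           # K x 2 unit coordinates on the map

    def map_point(x):
        cb = bc_codebook.value
        bmu = int(np.argmin(np.linalg.norm(cb - x, axis=1)))
        # Gaussian neighborhood of the BMU over the whole grid
        d = np.linalg.norm(bc_grid.value - bc_grid.value[bmu], axis=1)
        h = np.exp(-(d ** 2) / (2.0 * sigma ** 2))
        return h[:, None] * x[None, :], h   # weighted data term, weight term

    num, den = data_rdd.map(map_point).reduce(
        lambda a, b: (a[0] + b[0], a[1] + b[1]))
    # Batch-SOM codebook update: neighborhood-weighted mean of the data per unit
    return num / np.maximum(den[:, None], 1e-12)


if __name__ == "__main__":
    sc = SparkContext(appName="som-mapreduce-sketch")
    rng = np.random.default_rng(0)
    data = sc.parallelize(list(rng.normal(size=(1000, 5))))   # toy dataset
    grid = np.array([(i, j) for i in range(5) for j in range(5)], dtype=float)
    codebook = rng.normal(size=(25, 5))
    for epoch in range(10):
        # the neighborhood radius shrinks each epoch, as in standard SOM training
        codebook = train_som_epoch(sc, data, codebook, grid, sigma=2.0 * 0.8 ** epoch)
    sc.stop()

Because the per-record work in the map step depends only on the broadcast codebook, and the reduce step is a commutative, associative sum, this formulation distributes naturally over large datasets, which is the property the paper relies on to scale SOM clustering to Big Data.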