Data science

Data science is a multidisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data. It is closely related to data mining and big data, sharing the aim to 'use the most powerful hardware, the most powerful programming systems, and the most efficient algorithms to solve problems'. Data science is a 'concept to unify statistics, data analysis, machine learning and their related methods' in order to 'understand and analyze actual phenomena' with data. It employs techniques and theories drawn from mathematics, statistics, computer science, and information science. Turing Award winner Jim Gray imagined data science as a 'fourth paradigm' of science (empirical, theoretical, computational and now data-driven) and asserted that 'everything about science is changing because of the impact of information technology' and the data deluge. In 2015, the American Statistical Association identified database management, statistics and machine learning, and distributed and parallel systems as the three emerging foundational professional communities.
The term 'data science' became a buzzword in 2012, when Harvard Business Review called it 'The Sexiest Job of the 21st Century'. It is now often used interchangeably with earlier concepts like business analytics, business intelligence, predictive modeling, and statistics. Even the suggestion that data science is sexy paraphrased Hans Rosling, who was featured in a 2011 BBC documentary with the quote, 'Statistics is now the sexiest subject around.' Nate Silver referred to data science as a sexed-up term for statistics. In many cases, earlier approaches and solutions are now simply rebranded as 'data science' to be more attractive, which can cause the term to become 'dilute beyond usefulness.'
Many university programs now offer a data science degree, but there is no consensus on a definition or on suitable curriculum contents. Many data science and big data projects also fail to deliver useful results, often because of poor management and utilization of resources.
The term 'data science' has appeared in various contexts over the past thirty years but did not become an established term until recently. In an early usage, Peter Naur used it in 1960 as a substitute for computer science. Naur later introduced the term 'datalogy'. In 1974, Naur published Concise Survey of Computer Methods, which freely used the term data science in its survey of the contemporary data processing methods used in a wide range of applications. The modern definition of 'data science' was first sketched during the second Japanese-French statistics symposium, organized at the University of Montpellier II (France) in 1992. The attendees acknowledged the emergence of a new discipline with a specific focus on data from various origins, dimensions, types and structures. They shaped the contour of this new science based on established concepts and principles of statistics and data analysis, with extensive use of the increasing power of computer tools. In 1996, members of the International Federation of Classification Societies (IFCS) met in Kobe for their biennial conference. Here, for the first time, the term data science was included in the title of the conference ('Data Science, classification, and related methods'), after the term was introduced in a roundtable discussion by Chikio Hayashi. In November 1997, C.F. Jeff Wu gave the inaugural lecture entitled 'Statistics = Data Science?' for his appointment to the H. C. Carver Professorship at the University of Michigan. In this lecture, he characterized statistical work as a trilogy of data collection, data modeling and analysis, and decision making.
In his conclusion, he initiated the modern, non-computer science usage of the term 'data science' and advocated that statistics be renamed data science and statisticians data scientists. Later, he presented his lecture entitled 'Statistics = Data Science?' as the first of his 1998 P.C. Mahalanobis Memorial Lectures. These lectures honor Prasanta Chandra Mahalanobis, an Indian scientist and statistician and founder of the Indian Statistical Institute.
In 2001, William S. Cleveland introduced data science as an independent discipline, extending the field of statistics to incorporate 'advances in computing with data' in his article 'Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics,' which was published in Volume 69, No. 1, of the April 2001 edition of the International Statistical Review / Revue Internationale de Statistique. In his report, Cleveland established six technical areas that he believed encompassed the field of data science: multidisciplinary investigations, models and methods for data, computing with data, pedagogy, tool evaluation, and theory.
