# Data Science — definition and structural representation

## DOI:

https://doi.org/10.20535/SRIT.2308-8893.2021.1.05

## Keywords:

Data Science, Drew Conway’s Data Science Venn Diagram, Data Science definition, Data Science structure, data, information, knowledge

## Abstract

This article continues the discussion of the existing meanings and the formalization of the definition of “Data Science” as an autonomous discipline and field of knowledge, clarifying its defining components and the processes of integration and interaction between them. Most published work on the discipline is data-centric in how it presents and analyzes it, i.e. the emphasis falls on the word “Data”. An analysis of how frequently key terms occur in definitions of Data Science shows what other authors focus on and which terms their definitions are built on. In this paper, we propose and argue for certain additions to Drew Conway’s Data Science Venn Diagram, which does not reflect all the resources of the components that define the applied side of Data Science and, moreover, does not reveal how these resources interact, either from the data researcher’s point of view or in a global sense. We also propose a unified structural representation of Data Science in the form of an updated version of Drew Conway’s Venn diagram, based on a property/attribute that establishes correspondences providing integration/interoperability between the elements of the diagram’s sets. A new definition of Data Science as an interdisciplinary science and methodology of presenting activities for the analysis and extraction of data, information, and knowledge is substantiated.

## References

Thomas Davenport and D.J. Patil, “Data Scientist: The Sexiest Job of the 21st Century”, Harvard Business Review, October 2012.

Drew Conway, “The Data Science Venn Diagram”, Personal blog. September 30, 2010.

Cathy O’Neil and Rachel Schutt, Doing data science: Straight talk from the frontline. O’Reilly Media, Inc., 2013.

Vasant Dhar, “Data science and prediction”, Communications of the ACM, 56.12, pp. 64–73, 2013.

Jake Vanderplas, Python data science handbook: Essential tools for working with data. O’Reilly Media, Inc., 2016.

Annalyn Ng and Kenneth Soo, “Data Science for the Layman: No Math Added”, Numsense!, 2017.

Foster Provost and Tom Fawcett, Data Science for Business: What you need to know about data mining and data-analytic thinking. O’Reilly Media, Inc., 2013.

Foster Provost and Tom Fawcett, “Data science and its relationship to big data and data-driven decision making”, Big Data, 1.1, pp. 51–59, 2013.

Matthew A. Waller and Stanley E. Fawcett, “Data science, predictive analytics, and big data: a revolution that will transform supply chain design and management”, Journal of Business Logistics, 34.2, pp. 77–84, 2013.

Bohdan Pavlyshenko, “Subjective view on Data Science in Ukraine”, dou.ua article, January 9, 2017.

Jeff Leek, “The key word in ‘Data Science’ is not Data, it is Science”, Simply Statistics, December 12, 2013.

J.W. Tukey, “Sunset salvo”, The American Statistician, 40(1), pp. 72–76, 1986.

J.W. Tukey, Exploratory data analysis, 1977.

J.W. Tukey, “The future of data analysis”, The annals of mathematical statistics, 33(1), pp. 1–67, 1962.

N.J. Nilsson, The quest for artificial intelligence. Cambridge University Press, 2009.

A.L. Samuel, “Some studies in machine learning using the game of checkers”, IBM Journal of research and development, 3(3), pp. 210–229, 1959.

Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning”, Nature, 521(7553), pp. 436–444, 2015.

P. Russom, “Big data analytics”, TDWI best practices report, fourth quarter, 19(4), pp. 1–34, 2011.

C.H. Chen, W.K. Härdle, and A. Unwin (Eds.), Handbook of data visualization. Springer Science & Business Media, 2007.

R. Ihaka and R. Gentleman, “R: a language for data analysis and graphics”, Journal of computational and graphical statistics, 5(3), pp. 299–314, 1996.

M. Abadi et al., “Tensorflow: A system for large-scale machine learning”, in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pp. 265–283, 2016.

D.J. Higham and N.J. Higham, MATLAB guide. Society for Industrial and Applied Mathematics, 2016.

B. Maxfield, Essential PTC® Mathcad Prime® 3.0: A guide for new and current users. Academic Press, 2013.

J.P.M. De Sá, Applied statistics using SPSS, Statistica, MatLab and R. Springer Science & Business Media, 2007.

R. Collobert, S. Bengio, and J. Mariéthoz, Torch: a modular machine learning software library. Idiap, 2002.

X. Meng et al., “Mllib: Machine learning in apache spark”, The Journal of Machine Learning Research, 17(1), pp. 1235–1241, 2016.

H. Wimmer and L.M. Powell, “A comparison of open source tools for data science”, Journal of Information Systems Applied Research, 9(2), p. 4, 2016.

A. Gulli and S. Pal, Deep learning with Keras. Packt Publishing Ltd., 2017.

F. Pedregosa et al., “Scikit-learn: Machine learning in Python”, Journal of machine Learning research, 12, pp. 2825–2830, 2011.

E. Loper and S. Bird, “Nltk: The natural language toolkit”, arXiv preprint cs/0205028, 2002.

C. Adams, Learning Python data visualization. Packt Publishing Ltd., 2014.

C. Rossant, Learning IPython for interactive computing and data visualization. Packt Publishing Ltd., 2013.

A.N. Kolmogorov and S.V. Fomin, Introductory real analysis. Courier Corporation, 1975.