Pioneering schools that are venturing into graduate and undergraduate data science degrees, or modifying existing statistics curricula, are setting the trend for what companies may require, and that effort will end up making job descriptions clearer. And they are doing so at high speed.
Graduate programs started mushrooming after Thomas H. Davenport and D.J. Patil cheerfully reiterated in a 2012 Harvard Business Review article that data science is the sexiest job of the 21st century, but also gloomily acknowledged that there were no university programs offering degrees in it. As if predicting the rise of Patil to Chief Data Scientist of the White House, many universities started offering a well-defined Master in Data Science, and others initiated specialised undergraduate programs in data science. Business schools, on the other hand, started offering MBAs in Business Analytics to emphasise the application of data science to business.
Take, for example, the online Professional Master of Information and Data Science at the University of California at Berkeley, the most comprehensive there is so far, and featured, among others, in a July 2015 article in Amstat News, a magazine of the American Statistical Association. The curriculum contains five main modules at a basic level, three of which can be identified with the typical offerings of a master's in Statistics. The other two modules are not offered in conventional master's programs. One of them is on storing and retrieving data (Python, relational databases, Hadoop, MapReduce, Spark, cloud computing (AWS)) and the other is on applied machine learning (Python libraries for linear algebra, plotting and machine learning: numpy, matplotlib, scikit-learn; GitHub for submitting project code). The program also offers advanced modules on scaling up really big data (OpenStack [Heat], distributed filesystems, Apache Hadoop, Apache Spark [DStreams], SaltStack, Ansible, CouchDB, Cloudant, CloudSoft Brooklyn, Swift, Apache Solr, Apache Mesos, Open MPI, computational genomics, IBM Watson), parallel computing and advanced statistics. But Berkeley is not alone.
Master’s degrees in data science have gained so much prominence that they are already being ranked, or at least the top 25 are. The same occurs with MBAs in Business Analytics. Ranked high are, in addition to Berkeley, the Master of Science in Data Science at Columbia University, which has a curriculum similar to that of Berkeley and is the product of a collaboration among six different departments, the Master of Data Science at New York University and the one at the University of Virginia, to name a few. As indicated, when attached to a school of business, data science is masked under the name Business Analytics, or Big Data. The number of web sites compiling lists of universities that offer such degrees is booming. Regardless of the name used (Big Data, Analytics, Data Science), all these degrees share content similar to that found in the Master at Berkeley, with variation in the extent to which they cover each of the areas. And we should not neglect to notice the proliferation of open online sites such as Coursera, which offers a nine-course introduction to data science. All this may seem like a very fast response of universities and open education to the shortage of data scientists. However, many universities have not yet initiated a degree program called data science.
Not jumping on the bandwagon of a Master of Data Science or an MBA in Business (or Data) Analytics does not mean these universities are not making great efforts to look like they are. Consider, for example, Stanford University, which has created a special track on data science in its regular Master of Statistics without changing the name of the program. This track consists of 18 units of electives covering the areas of data science that full-fledged data science master’s programs offer. Harvard, to give another example, has a very popular online course called Big Data Analysis and has a web page in the statistics department dedicated to data science, where the department claims to have pioneered data science. Harvard and MIT offer open courses in many areas relevant to data science as well. But graduate programs are not alone.
Many other statistics departments have changed their undergraduate curricula to incorporate courses on big data, computing and machine learning. To name a few, in the April 2015 issue of Amstat News, Purdue University, the University of Florida, the University of California Davis, the University of Illinois at Urbana-Champaign, and the University of California at Berkeley acknowledge changing their curricula to adapt to the data science era. But the real indication that data science and big data are here to stay is the growth in the number of colleges offering undergraduate degrees not in statistics but in data science. This is happening just as the number of statistics departments is increasing and new elementary, middle and high school statistics curricula are being implemented. Data science curricula will be knocking at middle school teachers’ doors before they have time to learn how to cope with the new statistics curriculum.
Some universities, however, have realised that simply tweaking existing or new undergraduate statistics programs is not enough to prepare students for data science. That is why new data science undergraduate degrees have started to populate universities, all of them encompassing the core of computer science, statistics and mathematics in a well-integrated program. In an article featured in the July 2015 issue of Amstat News, Northern Kentucky University, the University of California at Irvine, Winona State University, the University of Nottingham and Warwick University explained their data science majors. David Hodge, Uwe Aickelin, Christian Wagner and Ian Dryden, of Nottingham, see data science as the newborn sister of mathematics, computational sciences and statistics. Their program, housed in the Computer Science and Mathematics departments, has a curriculum similar to that of Berkeley, but at an undergraduate level. So does Warwick's, motivated by the growing demand for those skills in the job market. The creators of the program at Winona State claim that “data science is not statistics.” Their program was motivated by the need to help undergraduate students seeking employment, and it also intends to have its students think like both computer scientists and mathematicians. Having a major in Statistics with a minor in Computer Science, or vice versa, is not enough training. Students from now on will have to think like a statistician, a computer scientist and a business person, all at once. Training them to do so requires a degree in data science.
The reasoning behind all the changes in graduate and undergraduate programs and curricula is that a data scientist will most likely be working as part of a team on a project, and being comfortable communicating statistical and computer science methodology will be invaluable in business and consulting settings. Properly addressing the challenges in Big Data takes knowledge and experience in more than one area, and that’s what integrated, comprehensive undergraduate and graduate degrees in data science are trying to prepare students for. New degrees built in touch with employers emphasise many of the “algorithmic components” of a traditional computer science degree (algorithms, data structures, programming, data management, software engineering), but combine them with a large number of courses in statistics and machine learning (to a much more significant degree than a traditional computer science degree would). The statistical skill will always be there, as employers need to answer the key questions statisticians always ask: When can you generalise your results to a larger population? What assumptions are we making when using statistical methods? How can we understand and quantify variability and uncertainty? As the field of data science emerges and evolves, the best data scientists will be those with solid foundations in both statistical thinking and computational skills.