Data science and machine learning are two exciting disciplines that play a large part in our lives. People sometimes confuse them, but they are quite different things.
What data science is
Data science is, as the name suggests, the science of data. It’s a set of techniques and tools that let the data scientist extract the information hidden in data. This mining process can be done using statistical tools or mathematical models. Most of the time, a data scientist also uses data visualization techniques. Visualizing something is a good way to understand it, and when we understand something, we are gathering information. That’s the purpose of data science: to extract information.
A good data scientist must master math, statistics, programming and data analysis, focusing particularly on visualization. Information can be extracted without making use of complex algorithms. If anything, simplicity is always a good choice when it comes to explaining phenomena starting from the data they produce.
Data science is not applying machine learning models blindly in search of the highest possible accuracy, and it’s not skipping exploratory data analysis because it’s boring and the boss is hungry for results. In fact, you can produce excellent data science deliverables without training anything. For example, a good correlation matrix can be a very powerful deliverable for a non-technical manager. It helps people understand how correlated our features are, and this is business information worth the time used to extract it. No model is necessary. I talk about Exploratory Data Analysis in my free online course, because I believe that it’s a complete and useful data science process that can give us very valuable deliverables.
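To make the correlation matrix idea concrete, here is a minimal sketch in plain Python. The column names and numbers are hypothetical, invented only for illustration, and the helper functions `pearson` and `correlation_matrix` are my own naming. In a real project you would more likely call `DataFrame.corr()` in pandas, which computes the same Pearson coefficients.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def correlation_matrix(columns):
    """Return {feature: {feature: r}} for every pair of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical toy data, not taken from any real project.
data = {
    "ad_spend": [10, 20, 30, 40, 50],
    "visits": [110, 205, 290, 410, 495],
    "returns": [9, 7, 8, 6, 7],
}

matrix = correlation_matrix(data)

# Print one row per feature, with signed coefficients rounded to two decimals.
for name, row in matrix.items():
    print(name.ljust(10), "  ".join(f"{row[other]:+.2f}" for other in data))
```

Even on this toy table, the output already tells a story without any model: ad spend and visits move together almost perfectly, while returns tend to move in the opposite direction.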
All of this delivers great business value even without models. In my experience, I’ve handed over several pure data science deliverables without a single model, and the result was always astonishing for my clients. So, data science is not always related to machine learning, and it can produce good results by itself.
What machine learning is
Machine learning is the art of making machines learn to perform tasks they haven’t been explicitly programmed for. It’s a part of data science, and we refer to it as the process of model building. This holds for both supervised and unsupervised models.
Machine learning can be done without data science, for example by using an AutoML tool, but it’s something I don’t suggest. No algorithm can extract, without human control, the information that should be used to train a model. Let’s remember that an algorithm is fed with data in order to build a mathematical representation of the information (i.e. the model), so performing machine learning without the proper data science steps in advance may produce wrong results, slow training procedures and, worst of all, no idea of which features of our dataset really matter.
My personal opinion is that yes, you can perform machine learning without data science, but if you do, you fail. Instead, data science and machine learning can be done together by constantly remembering that we’re not building a model only to make predictions, but to understand the information behind the data. Following this approach, we choose simple models that are easy to interpret, we apply proper feature selection and a deep feature importance analysis, and we remove redundant features, keeping only the relevant attributes of our dataset. In this way, machine learning becomes meaningful for business understanding and not only a technical aid to predict fraud or customer churn.
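As a rough sketch of what removing redundant features and keeping the relevant ones can look like, the snippet below uses a simple correlation filter in plain Python. This is only one possible approach, chosen here for illustration. The dataset, the 0.95 threshold and the helper names `drop_redundant` and `rank_by_relevance` are all assumptions of mine, not a prescribed method.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def drop_redundant(features, threshold=0.95):
    """Greedily keep a feature only if it is not strongly correlated
    (in absolute value) with a feature that was already kept."""
    kept = []
    for name in features:
        if all(abs(pearson(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

def rank_by_relevance(features, target, names):
    """Sort the given features by absolute correlation with the target."""
    return sorted(names, key=lambda n: abs(pearson(features[n], target)), reverse=True)

# Hypothetical toy dataset, invented for illustration only.
features = {
    "price_eur": [10, 12, 15, 11, 20, 18],
    "price_usd": [11, 13.2, 16.5, 12.1, 22, 19.8],  # price_eur rescaled: redundant
    "discount": [0, 5, 0, 10, 0, 5],
}
units_sold = [100, 95, 80, 99, 60, 68]

kept = drop_redundant(features)
ranking = rank_by_relevance(features, units_sold, kept)

print("kept features:", kept)
print("ranked by relevance:", ranking)
```

The filter discards `price_usd` because it carries the same information as `price_eur`, and the ranking then shows which of the remaining attributes is most tied to the target. A linear correlation filter like this misses non-linear relationships, so treat it as a starting point for the analysis rather than a final answer.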
Conclusions
In my opinion, data science can easily exist without machine learning, and machine learning can exist without data science. However, only by performing data science and machine learning together can we reach the best results. Machine learning can be a very useful and monetizable way to exploit the knowledge of the data that data science provides, and data science can guide machine learning toward better and better models. Never forget this the next time you work on a data science project.