Your Data Teacher Blog Archives | Page 5 of 6

May 10, 2021

How many neurons for a neural network?

Neural networks are a fascinating field of machine learning. Let’s see how to find the best number of neurons in a neural network for our dataset.

May 5, 2021

Feature selection in machine learning using Lasso regression

The first thing I learned as a data scientist is that feature selection is one of the most important steps of a machine learning pipeline. Fortunately, some models can help us accomplish this goal by providing their own measure of feature importance. One such model is Lasso regression.

May 3, 2021

Why training set should always be smaller than test set

In the machine learning world, data scientists are often told to train a supervised model on a large training dataset and test it on a smaller amount of data. The reason usually given for choosing a training dataset larger than the test one is the common claim that the more data is used for training, the better the model learns.

April 30, 2021

An efficient language detection model using Naive Bayes

Language detection (or identification) is a fascinating branch of Natural Language Processing. Its goal is to create a model that is able to detect the language a text is written in. Data Scientists usually employ neural network models to accomplish such a goal. In this article, I show how to create a simple language detection model in Python using a Naive Bayes model.

April 28, 2021

Free Workshop – Feature importance in machine learning

I’m glad to introduce my new free online workshop about feature importance in machine learning. In this workshop, feature importance in supervised machine learning is presented both in theory and in practice, using the Python programming language and its powerful scikit-learn library.

April 26, 2021

Feature selection via grid search in supervised models

Feature selection, together with hyperparameter tuning, is probably the most important part of machine learning. How can we select the right …

April 23, 2021

Feature selection by random search in Python

Feature selection has always been a major task in machine learning. In my experience, feature selection is much more important than model selection itself.

April 21, 2021

When and how to use power transform in machine learning

Professional data scientists know that data must be prepared before it is fed to any model. Data pre-processing is probably the most important part of a machine learning pipeline, and its importance is sometimes underestimated.

April 19, 2021

The bootstrap. The Swiss army knife of any data scientist

Every measurement must come with an error estimate. There’s no way around this. If I tell you “I’m 1,93 …

April 16, 2021

The most used probability distributions in Data Science

Statistics is a must-have skill in a Data Scientist’s CV, so there are concepts and topics that must be known in advance if somebody wants to work with data and machine learning models. Probability distributions are a must-have tool. Let’s see the most important ones to know for a Data Scientist.