Dealing with unbalanced datasets is always hard for a data scientist. Such datasets can cause trouble for our machine learning models if we don’t handle them properly. So it’s important to measure how unbalanced our dataset is before taking the proper precautions. In this article, I suggest some possible techniques.
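As a rough illustration of what "measuring how unbalanced a dataset is" can look like, here is a minimal sketch that uses the normalized Shannon entropy of the class frequencies. This is an assumption on my part, one common choice among several, and not necessarily one of the techniques the article covers. The name `balance_index` is hypothetical.

```python
from collections import Counter
import math

def balance_index(labels):
    """Hypothetical helper: normalized Shannon entropy of the class frequencies.

    Returns 1.0 for a perfectly balanced dataset and gets closer to 0.0
    as a single class dominates.
    """
    counts = Counter(labels)
    n = sum(counts.values())
    k = len(counts)
    if k <= 1:
        # Only one class present: the most unbalanced case possible.
        return 0.0
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    # Divide by the maximum possible entropy (uniform distribution over k classes).
    return entropy / math.log(k)

print(balance_index(["a", "b", "a", "b"]))  # 1.0
print(balance_index(["a"] * 9 + ["b"]))     # ≈ 0.469
```

Normalizing by log(k) makes the value comparable across problems with different numbers of classes.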
When to retrain a machine learning model?
Training a model is a complex process that requires a lot of effort and analysis. Once a model is ready, we know that it won’t be valid forever and that we’ll need to train it again. How can we decide whether a model needs to be retrained? Some techniques can help us.
Which models require normalized data?
Data pre-processing is an important part of every machine learning project. A very useful transformation to apply to data is normalization, and some models require it in order to work properly. Let’s see some of them.
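To see why some models need normalized data, consider a distance-based model such as k-nearest neighbors. Whether the article discusses k-NN is an assumption here; it is simply a well-known example. In the sketch below, the data (age in years, income in euros) is invented for illustration, and `standardize` and `nearest` are hypothetical helpers. Without scaling, the income column dominates the Euclidean distance; after z-score standardization, both features contribute.

```python
import math

def standardize(rows):
    """Z-score each column: (x - mean) / std, using the population std."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def nearest(query_idx, rows):
    """Index of the Euclidean nearest neighbor of rows[query_idx] (k-NN with k=1)."""
    q = rows[query_idx]
    others = [i for i in range(len(rows)) if i != query_idx]
    return min(others, key=lambda i: math.dist(q, rows[i]))

# Hypothetical data: (age in years, income in euros).
rows = [[25, 50000], [60, 50500], [26, 52000], [40, 90000]]

print(nearest(0, rows))               # 1: income differences dominate
print(nearest(0, standardize(rows)))  # 2: the 1-year age gap now matters
```

The neighbor of the first row changes once the features share a common scale, which is exactly the kind of behavior that makes normalization mandatory for such models.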
Which models are interpretable?
Model explanation is an essential task in supervised machine learning. Explaining how a model represents information is crucial to understanding the dynamics that rule our data. Let’s see some models that are easy to interpret.
How To Run A/B Tests
Online marketing and startup growth work better if you can continuously test different ideas. Statistics comes to our aid when we have to perform A/B tests. The results you can achieve with the proper analysis can give your project a great boost.
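A common way to analyze a simple conversion-rate A/B test is the two-sided two-proportion z-test. The minimal sketch below assumes that test, which may or may not be the analysis the article uses. `ab_test_z` is a hypothetical name, and the conversion counts in the usage line are invented.

```python
import math

def ab_test_z(conv_a, n_a, conv_b, n_b):
    """Two-sided two-proportion z-test comparing the conversion rates of A and B."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis (no difference between A and B).
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero: all or no visitors converted")
    z = (p_b - p_a) / se
    # Two-sided p-value under a standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical experiment: 200/5000 conversions for A, 260/5000 for B.
z, p = ab_test_z(200, 5000, 260, 5000)
print(z, p)
```

With these invented numbers, z is close to 2.86 and the p-value is well below 0.05, so we would reject the hypothesis that A and B convert equally. The normal approximation is reasonable only when each group has enough conversions and non-conversions.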
Are your training and test sets comparable?
Data scientists usually split a dataset into training and test sets. The model is trained on the former and its performance is then checked on the latter. But if these sets are sampled incorrectly, the measured performance may be biased.
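One way to check whether a numerical feature is distributed similarly in the training and test sets is the two-sample Kolmogorov-Smirnov test. This is an assumed technique, not necessarily the article's. The minimal sketch below computes the KS statistic in pure Python and compares it against the asymptotic critical value c(α)·√((n+m)/(n·m)), with c(α) = √(−ln(α/2)/2). Both helper names are hypothetical. In practice, `scipy.stats.ks_2samp` computes the statistic and an exact or asymptotic p-value for you.

```python
import bisect
import math

def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: maximum gap between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    for x in a + b:
        # Empirical CDF at x: fraction of each sample that is <= x.
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

def looks_comparable(train_col, test_col, alpha=0.05):
    """True if the KS test does not reject equal distributions at level alpha.

    Uses the large-sample (asymptotic) critical value, so it is only an
    approximation for small samples.
    """
    n, m = len(train_col), len(test_col)
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(train_col, test_col) <= critical
```

You would call `looks_comparable` on each numerical column of the two sets. A column that fails the check hints that the split may not be representative.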
A language detection model in pure JavaScript
Models are powerful tools that can be used in many web applications. The usual approach is to make them accessible through REST APIs. In this article, I’ll talk about a model that can be created in Python and then deployed in pure JavaScript.
Visualize the predictive power of a numerical feature in a classification problem
Measuring the predictive power of a feature in a supervised machine learning problem is always a hard task. Before using any correlation metric, it’s important to visualize whether a feature is informative or not. In this article, we’re going to apply data visualization to a classification problem.
How to measure outlier probability
Outliers are a big problem for a data scientist. They are “strange points” in a dataset that must be checked to verify whether they are errors or real phenomena.
Why SQL is still important for data analysis
Data science mixes different skills, and some old ones are still useful. One of them is SQL.