A simple script to run the “behind-the-scenes” admin task of producing certificates!

Have you found yourself wanting to issue certificates to participants in an online learning session?

One of the recent tasks I had to take care of was preparing participation certificates for all our attendees at the Statistics in Data Science Series, so I decided to create a script to help. In this article, I will walk you through how I used a package called PILLOW to write the script for this recurring task, with a little peek behind the scenes of leadership tasks at Women Who Code.

Photo by Cookie the Pom on Unsplash

PILLOW or PIL…


Choosing the right test to perform Hypothesis Testing between 2 groups

This article provides an overview of the 7 key steps for performing a successful hypothesis test. Our focus is on experiments with 2 groups (binary comparison) where the response variable is continuous. We will walk through the process of deducing the right test for the experiment based on the developed hypothesis and the tested assumptions. At the end of the article, there is a bonus Python notebook with a sample statistical experiment that illustrates all of the steps covered in the article!
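As a rough illustration of a two-group comparison on a continuous response, here is a minimal sketch of Welch's t statistic using only the Python standard library. It assumes Welch's unequal-variance t-test is the appropriate choice, which is only one of the tests the assumption checks could lead to. The function name `welch_t` and the sample values are hypothetical and are not taken from the article's notebook.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    # Sample variances (n - 1 in the denominator), scaled by group size.
    se_a = statistics.variance(group_a) / n_a
    se_b = statistics.variance(group_b) / n_b
    t_stat = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t_stat, df


# Hypothetical measurements for two independent groups.
control = [12.1, 11.8, 12.4, 12.0, 11.9]
treatment = [12.9, 13.1, 12.7, 13.0, 12.8]

t_stat, df = welch_t(control, treatment)
print(f"t = {t_stat:.3f}, df = {df:.1f}")
```

The statistic and degrees of freedom would then be compared against a t distribution to obtain a p-value, for example with a statistics library of your choice.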

Photo by Coffee Geek on Unsplash

Why Statistical Experiments?

Statistical experiments provide an easy way to make statistical inferences on variability in data.

Hypothesis testing is the most…


Introduction to the 3 most commonly used distributions

From a simple coin toss to the weather report, there is an element of chance in every event being measured. If you have looked closely at the weather widget on your phone, you will have seen something like “chance of rain: 45%”. Probability distributions are important for understanding these chances: they describe how likely each possible outcome is. Distributions come in many shapes, but we will decipher the most common ones here.

For any data scientist, student or practitioner, distributions are a must-know concept. …


4 Key Steps in Preprocessing Data

Photo by Joshua Sortino on Unsplash

Data cleaning is the most hectic and time-consuming task before getting into machine learning modelling and analysis. Initial data is raw and unique, filled with noise that needs attention. In this post, we will walk through an overview of 4 key steps and techniques you can use to clean data and perform Exploratory Data Analysis (EDA).


Algorithm explanation and a simple project to cluster wine types

If you are trying to understand unsupervised learning, K-Means Clustering is an easy algorithm to start with. In this post, we will look at how K-Means Clustering works and implement it to categorize red and white wine based on their chemical compositions from the Wine Quality Dataset.
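To make the idea concrete, here is a minimal sketch of the standard K-Means loop (assign each point to its nearest centroid, then move each centroid to the mean of its members) in plain Python. The `kmeans` function, its parameters, and the six 2-D points are hypothetical illustrations. They are not the article's implementation and do not come from the Wine Quality Dataset.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Cluster points into k groups with a fixed number of K-Means iterations."""
    rng = random.Random(seed)
    # Start from k distinct points chosen at random as the initial centroids.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: label each point with the index of its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical 2-D data with two visibly separate groups.
points = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

A fixed iteration count keeps the sketch short. A fuller version would usually stop once the labels no longer change between iterations.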

What is Unsupervised Learning?

Unsupervised Learning is a type of machine learning that helps to discover patterns and find important features in data without any labeled responses. The most common unsupervised learning method is cluster analysis, where the data is grouped based on the hidden patterns…


Straightforward code to calculate the simple aggregations

BigQuery is a fast-processing analytical tool that processes SQL queries on the Google Cloud Platform. In this article, I will show code examples to calculate the mean, median and mode of a simple dataset in BigQuery. Whenever we start exploratory data analysis, these are the first few metrics to calculate on the numerical fields to understand the distribution of the data.

If you would like to learn more about these 3 summary statistics, you can look into this comprehensive video by Khan Academy.
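As a quick refresher on what the three statistics measure, here is a small sketch using Python's standard `statistics` module. It is not BigQuery SQL, and the `values` list is made-up sample data chosen only for illustration.

```python
import statistics

# Hypothetical numeric field, for example a small column of counts.
values = [2, 3, 1, 3, 0, 4, 3, 2]

mean_value = statistics.mean(values)      # arithmetic average
median_value = statistics.median(values)  # middle value of the sorted data
mode_value = statistics.mode(values)      # most frequent value

print(mean_value, median_value, mode_value)
```

With an even number of values, `statistics.median` averages the two middle values, which is why the median here is not one of the original numbers.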

Calculating Mean, Median and Mode in BigQuery

Dataset

We’ll be using the FIFA World Cup Dataset from Kaggle. This is one of the most downloaded…


A quick rundown of how I followed through on my study plan

My last achievement of 2020 was passing the Google Cloud Professional Data Engineer Certification Exam. It was more of a journey for me, and I want to break it down for anyone who is planning to get the certification. I hope this helps you prepare your study plan and ace the exam.

The certification tests your ability to make data-driven decisions: to design, build and operationalize complete data processing systems, to operationalize machine learning models and, along the way, to ensure the quality, security and privacy of the data.

You can look for more information at the Google Cloud Certifications Official…

Sneha Thanasekaran

Data Scientist | Learning Enthusiast with focus on Statistics, Machine Learning and Analytics. Finding new aspirations/dreams and having fun along the way!
