An Engineer’s Trek into Machine Learning

You are a Software Engineer. You notice Artificial Intelligence, Machine Learning, Deep Learning, Data Science buzzwords all around. You wonder what these phrases mean, whether all this is for real and useful or is yet another hype and passing fad.

You want to figure out how it is changing or will change the computer/IT industry, and why you should care, if at all. You google about it, you read various articles, blogs, and tutorials. You get some idea but are also overwhelmed by the enormous wealth of math, tools, frameworks you discover.

You wish someone could give you an overview, say, a map and compass suitable for an engineer, to help you embark on the journey of mastering it all. This blog post is for you.

Like you, I am a software engineer and have been through that journey. My experience is relevant, and you can use it for charting out your own course. Apart from programming, I also love to trek in the Himalayas. I see interesting parallels between my ML journey and high-altitude treks. Through those parallels, I will explain how you can plan and go on an ML expedition.

A Trekking Expedition

A trek has the following four stages.

Irresistible Allure: I am drawn to a trek by hearing or reading about it. The peaks, trails, valleys, and terrain. People gushing that it is a beautiful and challenging trek. It is an allure difficult to resist. But before I invest time, energy, and resources, I ask: is the trail worth it? What is the experience like? Why should I go on it? I read people’s blogs, see photos, spend time on Google Earth. In the end, I either say “Naah!” or I am even more hooked into doing it.

Study the Terrain: That is when the real work starts. Every trek has its challenges. So I study the terrain on Google Maps and Google Earth. I spend time looking at elevation maps, reading blogs, understanding points of no return, and possible escape routes. I want to know key geographical features, difficulties, and dangers.

Train Hard: Once it is clear what I am up to, I plan and prepare. I start training hard, make a detailed plan, and regularly assess my progress.

Check your Gears: Finally, I stock the supplies and check my gears, especially the map and compass. I ensure that the map is ingrained in my mind, and I know enough about the vicinity to have an intuitive sense of direction.

And then the adventure begins!

Let’s examine these four stages of the expedition to learn Machine Learning.

Irresistible Allure: Why Learn Machine Learning

Like a vast number of software engineers, I am not formally trained in Machine Learning. For most of my working life, I built compilers, program analysis and programming tools, and IDEs; something very different from Machine Learning. But while working at Microsoft Research, I saw my colleagues applying Machine Learning and statistics to solve hard program analysis and software engineering problems. The success of these techniques drew me in. But let’s examine: is it really worth your while, or is all this buzz just a passing fad?

The four most common buzzwords are Artificial Intelligence (AI), Machine Learning (ML), Deep Learning (DL), and Data Science (DS). Let’s check out Google Trends for these phrases. As the saying goes: In God we trust; others must provide data.

Google Trends for Artificial Intelligence (AI), Machine Learning (ML), Deep Learning (DL), Data Science (DS)

Since AlexNet in September 2012, interest in these four phrases has grown significantly. AlexNet winning the ImageNet competition by a margin of almost 11 percentage points was a watershed moment. In research, improving the state of the art by a few percentage points is a big deal, but doing it by over 10 is very rare.

In 2015, a CNN from Microsoft Research surpassed human-level performance on ImageNet image classification. Considering that focal adjustment and color cones bless human eyes with some of the best depth and color perception in the animal kingdom, beating human-level performance in a vision task was astonishing!

At present, it is not just the trends data. Consumers use ML several times every day: web search result ranking, email spam detection, Estimated Time of Arrival (ETA) in maps, news story clustering around topics, ads on Google and Facebook, recommendation systems on sites like Amazon and Netflix, and Alexa and Google Assistant.

Businesses are also increasingly relying on ML for fraud detection, pricing/financial modeling, customer churn prediction, equipment failure prediction, network intrusion detection, customer segmentation, sentiment classification, image/video analysis, and speech/audio processing.

It has exploded into our daily lives for two reasons:

a lot of data is being generated, and
the availability of economical on-demand cloud computing.

It is clear that ML is not just hype or a passing fad. It is real, and it is here and now. So the journey is worth it.

Study the Terrain: Introduction to Machine Learning

Once you have decided to go on this journey, it is important to study the terrain. Let’s start with understanding the four buzz phrases.

Artificial intelligence (AI): Intelligence demonstrated by machines by performing like humans.

Machine learning (ML): Use of statistical models to make predictions by recognizing patterns in the data, and without requiring explicit instructions.

Deep learning (DL): Machine learning methods using deep neural networks.

Data science (DS): Use of statistical techniques to extract knowledge and insights from structured and unstructured data.

AI is the broadest term including statistical as well as other techniques. ML is a subset of AI, and DL is a subset of ML. Data Science overlaps with AI, ML, and DL.

An oversimplified definition of the differences between DS, ML, and AI is:

Data Science produces insights
Machine Learning produces predictions
Artificial Intelligence produces actions

Data management and continuous deployment of ML/DL models are becoming mainstream. Companies are making significant investments in programming and engineering discipline.

Data Science vs. Machine Learning vs. Deep Learning vs. Artificial Intelligence

Traditional Programs vs. Machine Learning

All these definitions are fine, but what is the difference between Machine Learning and traditional programs?

In traditional programming, a programmer designs logic or algorithm to solve a problem. The program applies this algorithm to input and computes the result.

But in machine learning, the programmer does not write the logic to compute results. Instead, she builds a model from the data. The model is the logic. As newer data (user feedback on the correctness of output) arrives, the model (which is the logic) changes too. So the program “learns” on its own.

Machine Learning programs have two distinct phases:

Training: Input and the expected output are used to train and test various models, and select the most suitable model.
Inference: The model is applied to the input to compute results. These results are wrong sometimes. A mechanism is built to gather user feedback on such occasions.

This feedback is added to the training data, which in turn leads to improved models. This loop is called data-pipeline or data-engineering.

Traditional Programs vs. Machine Learning

All this is still abstract. Let’s take the problem of detecting email spam, and compare traditional and machine learning solutions.

In the traditional programming solution, a programmer analyzes how a human determines whether an email is spam, and enumerates an exhaustive list of rules and patterns. For example:

the word FREE occurs several times
there are phrases like Weight Loss,
messages claiming you have won a lottery
messages originating from specific countries or IP addresses, and so on.

As spammers change tactics, the programmer needs to continuously update these rules to keep up with them. This is how Knowledge or Expert Systems were built in the past.
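A minimal sketch of such a rule-based filter, in Python. The rule list, the thresholds (for example, "several times" taken as three or more), and the name is_spam_by_rules are hypothetical, chosen only to mirror the examples above.

```python
import re

# Hypothetical hand-written rules mirroring the examples above.
# Each rule is a (description, predicate) pair; the thresholds are assumptions.
RULES = [
    ("FREE occurs several times", lambda text: len(re.findall(r"\bFREE\b", text)) >= 3),
    ("mentions weight loss", lambda text: "weight loss" in text.lower()),
    ("claims a lottery win", lambda text: re.search(r"won .*lottery", text.lower()) is not None),
]


def is_spam_by_rules(text):
    """Return True if any hand-written rule matches the email text."""
    return any(predicate(text) for _, predicate in RULES)


print(is_spam_by_rules("FREE pills! FREE shipping! FREE gift!"))         # True
print(is_spam_by_rules("Minutes from the design review are attached."))  # False
```

Every new spam tactic means another entry in RULES, written and maintained by hand.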

In the machine learning solution, a programmer will:

Prepare a data set: a large number of emails labeled by humans as spam or not spam.
Train, test, and tune models, and select the best.
During inference, this model is applied to determine whether to keep an email in the inbox or move it to the spam folder.
Statistical models are not 100% accurate. Spammers too keep coming up with new tactics. So sometimes spam classification is incorrect. Users will move such emails from the inbox to the spam folder (or vice versa).
Such user actions are tracked and are treated as new human-labeled data.
These examples are added to the data set, and a new model is trained to remain up-to-date with the spam trends.
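The loop above can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the model is a small naive Bayes word-count classifier (one possible choice, not necessarily what a real mail service uses), the emails are made up, and the names train, predict, and dataset are hypothetical. In practice you would reach for a library such as Scikit-Learn.

```python
import math
from collections import Counter


def train(examples):
    """Build a word-count model from (text, label) pairs; label is 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts


def predict(model, text):
    """Return the more probable label under naive Bayes with add-one smoothing."""
    word_counts, label_counts = model
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    scores = {}
    for label in ("spam", "ham"):
        total_words = sum(word_counts[label].values())
        # Log prior plus the sum of smoothed log likelihoods of each word.
        score = math.log(label_counts[label] / sum(label_counts.values()))
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Toy human-labeled data set (made up for illustration).
dataset = [
    ("win a free lottery prize now", "spam"),
    ("free weight loss pills", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the design document", "ham"),
]
model = train(dataset)
print(predict(model, "free prize"))         # spam
print(predict(model, "crypto investment"))  # ham: the model has never seen these words

# A user moves the misclassified email to the spam folder. Treat that action
# as a new human label, add it to the data set, and retrain.
dataset.append(("cheap crypto investment opportunity", "spam"))
model = train(dataset)
print(predict(model, "crypto investment"))  # spam
```

No rule was edited between the second and third prediction; only the data changed, and the retrained model changed with it.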

Machine Learning

Machine Learning has three kinds of techniques:

Supervised Learning: Train a function that maps an input to output. Training infers the relationship in given labeled input-output pair examples (called training dataset). Two common methods are regression and classification.

Unsupervised Learning: Train to find previously unknown patterns in a data set without pre-existing labels. Two common methods are clustering and dimensionality reduction (for example, principal component analysis).

Reinforcement Learning: Train software agents to take actions in an environment to maximize some notion of cumulative reward. Its applications are in robotics, games, skill learning, and adaptation.

Let’s learn a bit more about the three most common techniques: regression, classification, and clustering.

Linear Regression vs. Classification vs. Clustering

Regression is a supervised learning technique to estimate the value of a dependent variable from one or more independent variables. An example is to estimate the value of a house from its size, location, number of bedrooms, number of bathrooms, etc.

It is like fitting a curve over the given points such that the difference between the estimated value and the actual value is minimized over a large sample data set. In the estimator function, Y = f(X), Y is called outcome, and X is called feature-vector.
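As a rough sketch of that curve fitting, here is ordinary least squares for a single feature in plain Python. The house sizes and prices are made up, and the function name fit_line is hypothetical; a real house-price model would use several features and a library such as Scikit-Learn.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: house size in square metres vs. price in thousands.
sizes = [50, 80, 100, 120, 150]
prices = [150, 240, 300, 360, 450]  # exactly 3 * size, to keep the arithmetic easy to check

slope, intercept = fit_line(sizes, prices)
estimate = slope * 110 + intercept  # estimated price of a 110 square metre house
print(slope, intercept, estimate)
```

Here X is the size, Y is the price, and the fitted f is the line Y = slope * X + intercept.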

Classification is a supervised learning technique to identify the class/group from the characteristics of an object. An example is to identify whether the vehicle in a given photo is of a car, or a truck, or a motorcycle.

It is like drawing lines to divide a region into a number of regions (3 in this case, representing car, truck, motorcycle) such that the number of objects not falling in their region is minimum. In the classification function, Y = f(X), Y is called the label (and is drawn from a finite set, like an enum in programs), and X is called the feature vector.
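To make the idea concrete, here is a small nearest-centroid classifier in plain Python. It is a deliberately simple stand-in, not the method a real image classifier would use: instead of photos, it assumes two made-up numeric features per vehicle (number of wheels and weight in tonnes), and the names train_centroids and classify are hypothetical. The boundaries between the regions it draws are straight lines.

```python
import math


def train_centroids(examples):
    """Average the feature vectors of each label; examples are (features, label) pairs."""
    sums, counts = {}, {}
    for features, label in examples:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in total] for label, total in sums.items()}


def classify(centroids, features):
    """Return the label whose centroid is closest to the feature vector."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


# Made-up features: (number of wheels, weight in tonnes).
training_data = [
    ((2, 0.2), "motorcycle"), ((2, 0.3), "motorcycle"),
    ((4, 1.2), "car"), ((4, 1.6), "car"),
    ((6, 9.0), "truck"), ((10, 15.0), "truck"),
]
centroids = train_centroids(training_data)
print(classify(centroids, (4, 1.4)))   # car
print(classify(centroids, (8, 12.0)))  # truck
```

The label set (motorcycle, car, truck) is fixed by the training data, much like the values of an enum.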

Clustering is an unsupervised learning technique to group objects into clusters of similar objects. In other words, objects in a cluster are more similar to each other than those in other clusters, as per the given similarity criteria. An example is to cluster similar news articles based on topics.

The basic difference between classification and clustering is that in classification, the label set is finite and given in advance; but in clustering, the number and definition of the clusters are not known in advance and are inferred from the data, so the label set is not given.
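A minimal sketch of clustering, using plain k-means on made-up 2D points. The function name k_means and the choice of the first k points as starting centroids are assumptions made to keep the example deterministic. Note that k-means still takes the number of clusters k as an input; what it infers from the data is which points belong together. No labels are supplied.

```python
import math


def k_means(points, k, iterations=10):
    """Plain k-means: returns (centroids, assignments), starting from the first k points."""
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(point, centroids[c]))
            for point in points
        ]
        # Update step: move each centroid to the mean of its points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return centroids, assignments


# Made-up points forming two visible groups, with no labels attached.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 1.5), (9.5, 8.0)]
centroids, assignments = k_means(points, k=2)
print(assignments)  # [0, 1, 0, 1, 0, 1]
```

The cluster numbers 0 and 1 carry no meaning of their own; a human still has to look at each group and decide what it represents.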

Now let’s try to map some of the problems listed in the beginning to one of these techniques:

Web search result ranking/scoring: Regression
Email spam detection: Classification
ETA in maps: Regression
Showing the ad that will maximize earning: Regression
Recommendation systems: Clustering

I want to pause for a moment and emphasize that machine learning is not a magic bullet. It all depends on the data you use for the training because, in ML, the data is the logic. If you are not careful in collecting and curating the data, ML predictions can be severely faulty. It is known as Garbage In, Garbage Out.

For supervised learning, it matters what you have trained your system for. For example, if you trained a classification system to distinguish between car, truck, and motorcycle, you cannot use it to, say, distinguish a red car from a blue car. If your problem changes, you have to change the labels in the training data and retrain the model.

Deep Learning

Deep Learning is a subset of machine learning using Deep Neural Network (DNN) models. These models have one input layer, one output layer, and several intermediate hidden layers.
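To show what the layers look like in code, here is a toy forward pass in plain Python. It is a sketch under loud assumptions: the network has only one hidden layer (real DNNs have several), the weights are hand-picked rather than trained, and the names relu, dense, and forward are hypothetical. With these weights the network computes XOR of two binary inputs.

```python
def relu(values):
    """Rectified linear unit, applied element-wise."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One fully connected layer: each output is a weighted sum of the inputs plus a bias."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def forward(x):
    """Input layer (2 values) -> hidden layer (2 units, ReLU) -> output layer (1 value)."""
    # Hand-picked weights (an assumption for illustration; normally learned in training).
    hidden = relu(dense(x, weights=[[1.0, 1.0], [1.0, 1.0]], biases=[0.0, -1.0]))
    output = dense(hidden, weights=[[1.0, -2.0]], biases=[0.0])
    return output[0]


for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, forward(x))
```

Training a network means finding such weights automatically from data; frameworks do that for networks with far more layers and units.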

There are various specialized network designs suitable for different problems. Some examples are Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM) neural network. We will not get into details of various types of DNNs in this post.

DNNs solve the same problems: regression, classification, clustering, etc. But because DNNs are computation-intensive (and thus expensive), they are used mainly for certain types of data.

Other machine learning techniques usually suffice for structured data. DNNs give better results on unstructured data. There are three kinds of unstructured data that DNNs are commonly used for:

Vision: To process images and video data. Common applications are object recognition in images, video summarization.

Natural Language: To process text in natural languages. Common applications are sentiment classification, intent recognition, entity recognition, machine translation.

Speech: To process voice audio data. Common applications are speech recognition (speech to text) and speech synthesis (text to speech).

Tools and Frameworks

Python is the most popular language among machine learning practitioners. There is a rich ecosystem of libraries and frameworks.

For data science and machine learning, you can use NumPy, Pandas, SciPy, and Scikit-Learn. For data visualization, Matplotlib and Seaborn are useful.

During experimentation and exploring ideas, Jupyter Notebook, Kaggle kernels, and Google Colab are very convenient to keep notes of the code as well as visualization and experiment outputs.

For deep learning, TensorFlow, PyTorch, and the Keras API are the most popular choices for building neural networks.

To deploy on the cloud, there are alternatives from all major cloud providers: Google Cloud AutoML, Amazon SageMaker, Microsoft Azure ML.

Train Hard: How to Break into Machine Learning

As you can see, machine learning is a vast terrain, a lot of ground to cover. And it is not easy. So you need to train hard.

The good news is that it is the best time in human history to learn even advanced topics of computer science on your own. You just need motivation and a computer with a good internet connection.

There is no dearth of good articles, tutorials, and excellent and affordable online courses. I reiterate: if you are motivated to learn machine learning on your own, there has never been a better time.

Online Courses and Tutorials for Machine Learning

I am listing a few courses that I am fond of, but there are so many other very good and useful online resources.

Machine Learning Basics: Andrew Ng’s famous ML course
Machine Learning for practitioners: DS/ML Python Bootcamp @ Udemy
Google’s ML crash course
TensorFlow: Tutorials
PyTorch: Tutorials
Deep Learning: Courses @ DeepLearning.ai
Deep Learning: Practical Deep Learning for Coders @ FastAI
ML in Production: Full Stack Deep Learning
Cloud: Google Cloud AutoML, Amazon SageMaker, Microsoft Azure ML
Kaggle is a rich source of data sets and competitions.

Machine Learning Books

Once you get started in ML, you might want to pick some of these books and strengthen your theory foundation. These are some of the best books from some of the most respected professors. These are available online, free of cost.

The Elements of Statistical Learning, by Hastie, Tibshirani, Friedman
An Introduction to Statistical Learning, by James, Witten, Hastie, Tibshirani
Deep Learning, by Goodfellow, Bengio, Courville
Pattern Recognition and Machine Learning, by Bishop
Machine Learning Yearning, by Andrew Ng
Speech and Language Processing, by Jurafsky, Martin

Quickstart Guides

If you want to gather a quick glimpse of a specific problem, here is a list of articles to quickly start working on a specific technique or problem.

An intro, tutorial, and complete guide to data manipulation with NumPy and Pandas
Tips on graph plotting and data visualization with Matplotlib and Seaborn, and having fun doing Exploratory Data Analysis
A hands-on intro to Scikit-Learn with examples
A beginner’s guide to Linear Regression with Scikit-Learn
Overview of Logistic Regression or Classification with Scikit-Learn
Overview and explanation of Support Vector Machines (SVM) with Scikit-Learn and various kernels
Introduction to Decision Tree and implementing and understanding Enchanted Random Forests with Scikit-Learn
Intro, Tutorial, explanation, and implementation of Naive Bayes Classifier with Scikit-Learn
An overview of Time Series data handling and forecasting with Scikit-Learn
Understanding the Bias-Variance tradeoff
Accuracy, Precision, Recall, and F1 score, which metrics to use for evaluating ML algorithms
Ridge (L2) and Lasso (L1) Regularization with Scikit-Learn
K Nearest Neighbors or k-NN classifier with Scikit-Learn, a tutorial
K-Means Clustering explained with Scikit-Learn
DBSCAN clustering with Scikit-Learn
Principal Component Analysis (Dimensionality Reduction) with Scikit-Learn example
A brief explanation of Building a Recommender Systems

Check Your Gears: Machine Learning for Developers

As you begin your ML journey, it is time to check your gears and get the hang of the map and compass.

Deterministic logic is ingrained in Software Engineers. But machine learning is statistical in nature. We need to learn to accept that the model will not work correctly on all inputs. Fixing it for a particular input will most likely degrade overall performance. Learning that was the biggest struggle I had.

You may be surprised that a unit test that has been passing may start failing even without any code change. It can happen due to the random partitioning of training data into the train, validation, and test sets. Random partitioning can result in a slightly different model. And that model can coincidentally fail on the input used in that unit test. You need to internalize the notion of statistical correctness.
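One way to live with this, sketched below under assumptions, is to pin the random seed so a given run is reproducible, and to assert on an aggregate metric with a tolerance rather than on the output for one particular input. The toy model (a threshold between two class means), the made-up data, the 0.9 cutoff, and the names split, train_threshold, and accuracy are all hypothetical.

```python
import random


def split(examples, test_fraction, seed):
    """Shuffle with a fixed seed and partition into (train, test) lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def train_threshold(train):
    """Toy model: the midpoint between the two class means of a single feature."""
    lows = [x for x, label in train if label == 0]
    highs = [x for x, label in train if label == 1]
    return (sum(lows) / len(lows) + sum(highs) / len(highs)) / 2


def accuracy(threshold, test):
    """Fraction of test examples on the correct side of the threshold."""
    correct = sum((x > threshold) == bool(label) for x, label in test)
    return correct / len(test)


# Made-up, cleanly separable data: class 0 near 0..9, class 1 near 100..109.
data = [(x, 0) for x in range(10)] + [(x, 1) for x in range(100, 110)]

# The same seed always gives the same partition, so a failure can be reproduced.
assert split(data, 0.25, seed=42) == split(data, 0.25, seed=42)

# Different seeds may give different thresholds, so test the aggregate metric,
# not the prediction for one hand-picked input.
for seed in range(5):
    train_set, test_set = split(data, 0.25, seed=seed)
    threshold = train_threshold(train_set)
    assert accuracy(threshold, test_set) >= 0.9
```

On real, noisy data the cutoff has to be chosen from the variation you actually observe across runs.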

Machine Learning Map for Developers

You learned about the three most prominent machine learning techniques: linear regression, classification, and clustering. You know about neural networks and applications of deep learning. You also got a list of courses and books for in-depth knowledge. You also have a list of articles for the most important topics and techniques. All of it together gives you an overview of the ML landscape. That is your map. You are ready to go on the journey of learning how ML models work.

Machine Learning Compass for Developers

Data scientists are strong in mathematics. They have mastered playing with data and designing efficient models. These might not be exactly your forte, but you need to slowly develop these capabilities. Without that, a model will remain a black box.

Do remember your strengths as a software engineer. You have strong programming skills. You are an expert in building highly scalable applications. You have mastered the continuous develop-test-deploy pipeline. You engineer systems that run 24x7 with automated monitoring and alerting.

These skills are not common in data scientists. They may not care about establishing correctness guarantees of an ML model in a 24x7 production system. They typically build batch programs for a given data set. It may require significant (re)work and engineering to take it to production. You have to adapt your engineering practices to the world of ML. You can bring that engineering discipline and rigor to ML.

Both data scientists and software engineers need to develop a better understanding of the counterpart. They must keep moving towards the top-right quadrant. That is your compass.

Typically, data scientists are stronger in statistics and developers are stronger in software engineering.

तमसो मा ज्योतिर्गमय (tamasō mā jyōtirgamaya): Lead me from darkness to light.

Adventure Begins

Key takeaways are:

AI is real, the future is here. As Andrew Ng says: AI is the new electricity. Like electricity during the industrial revolution, AI will revolutionize one industry after another.
It is only the beginning of the AI adventure humanity is embarking upon.
You can learn anything by training. Start today!

Best of Luck, Bon Voyage !!!