How to self-learn data science
By Cinzia Braglia on March 25, 2022
There are different ways to start a career in data science, and they don’t necessarily require you to have completed a degree in data science or computer science.
With the rapid expansion of the data science community, many resources have become available online for self-learning: blog posts, videos, free books and entire courses. However, navigating this sea of information and deciding where to start can be overwhelming.
I found myself in the same situation when I first started my journey into the data science world, after finishing my Master’s degree in Astrophysics with very little knowledge of the topic.
In this blog post I would like to share what I learned from my personal experience, and hopefully inspire anyone else looking to start a career in data science from scratch.
The basics of data science
Data science is a very wide field that is evolving continuously, with different areas to specialize in. However, there are certain core skills required in a data scientist role that you should aim to master.
In this blog post, Amy and Sorcha explain very well what these skills are and why they’re important.
At the heart of everything a data scientist does is programming, so this is a good starting point if you’re not familiar with coding. Most data scientists work in Python and/or R; both languages offer a wide variety of libraries to implement machine learning (ML) methods and are also useful for data analysis and visualization.
Another key aspect is understanding the algorithms behind the ML methods you will eventually learn to apply. This requires some mathematical and statistical knowledge that will naturally be easier for some learners than others. Although having a math and stats background is helpful, becoming skilled in ML is achievable for anyone willing to put in the work.
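To make “the algorithms behind the ML methods” a little more concrete, here is a small illustrative sketch (my own example, not taken from any of the resources in this post) of the simplest case: fitting a straight line to one input feature by ordinary least squares. Minimizing the sum of squared errors and setting the derivatives to zero gives slope = cov(x, y) / var(x) and intercept = mean(y) − slope × mean(x). The function name and the toy data are hypothetical.

```python
from statistics import mean


def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept).

    Minimizes sum((y - (slope * x + intercept)) ** 2). Setting the partial
    derivatives to zero gives the closed-form solution computed below.
    """
    x_mean, y_mean = mean(xs), mean(ys)
    # Sum of cross-deviations (proportional to the covariance of x and y)
    cov_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    # Sum of squared deviations of x (proportional to the variance of x)
    var_x = sum((x - x_mean) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Hypothetical toy data lying exactly on the line y = 2x + 1
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
slope, intercept = fit_simple_linear_regression(xs, ys)
print(slope, intercept)  # 2.0 1.0
```

Libraries such as scikit-learn solve this least-squares problem for you, but working through the derivation once makes it much easier to understand what a fitted model is actually doing.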
I recommend starting with some practical resources first, like a course with hands-on exercises so that you can familiarize yourself with key data science and ML concepts, test your knowledge and – most of all – have fun tackling a real-world problem. In time, you can then move on to the more theoretical background.
A personal favorite of mine is this Udemy data science course. It is very well structured and accessible for a complete beginner, in both programming and machine learning.
The first part of the course is dedicated to programming and offers a crash course in Python, starting with how to set up a Python environment on your computer. It then focuses on the libraries that are most useful for data science, for both data manipulation and visualization.
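As a taste of what “data manipulation and visualization” looks like in practice, here is a minimal sketch using pandas and matplotlib. These are two widely used Python libraries for these tasks, but I am assuming them here rather than quoting the course’s exact syllabus; the sales data is made up purely for illustration, and the script writes a PNG file called `sales_by_store.png` (a hypothetical name).

```python
import matplotlib
matplotlib.use("Agg")  # render to a file without needing a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy dataset: monthly sales for two stores
df = pd.DataFrame({
    "store": ["A", "A", "A", "B", "B", "B"],
    "month": ["Jan", "Feb", "Mar", "Jan", "Feb", "Mar"],
    "sales": [120, 135, 150, 90, 95, 110],
})

# Data manipulation: total and average sales per store
summary = df.groupby("store")["sales"].agg(["sum", "mean"])
print(summary)

# Reshape so each store becomes a column, one row per month
pivot = df.pivot(index="month", columns="store", values="sales")
pivot = pivot.loc[["Jan", "Feb", "Mar"]]  # pivot sorts alphabetically; restore calendar order

# Visualization: one line per store
ax = pivot.plot(marker="o", title="Monthly sales by store")
ax.set_ylabel("Sales")
plt.savefig("sales_by_store.png")
```

Grouping, reshaping and plotting like this covers a surprising share of everyday exploratory analysis.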
The second part of the course covers a variety of ML methods commonly used in data science: linear and logistic regression, decision trees and random forests, clustering methods, natural language processing and neural networks.
For each method there is a mini project you can work on to test your understanding, mostly using models from Python’s scikit-learn library.
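To give a flavour of what such a mini project can look like, here is a minimal sketch (not taken from the course) that trains three of the classifiers mentioned above with scikit-learn and compares their accuracy on held-out data. It assumes scikit-learn is installed; the choice of the built-in iris dataset, the 30% test split and the model settings are my own, for illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Built-in toy dataset: 150 iris flowers, 4 measurements each, 3 species
X, y = load_iris(return_X_y=True)

# Hold out 30% of the data to estimate performance on unseen examples
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=42),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)     # learn from the training split
    y_pred = model.predict(X_test)  # predict labels for the held-out split
    scores[name] = accuracy_score(y_test, y_pred)
    print(f"{name}: test accuracy = {scores[name]:.2f}")
```

Because every scikit-learn estimator shares the same `fit`/`predict` interface, swapping one model for another is a one-line change, which makes this kind of side-by-side comparison a natural first project.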
My top resource recommendations
Although this is a solid course, on its own it is not enough to prepare you for a real data science job: each topic is explained in a very introductory way, with little emphasis on the math behind the models.
My approach was to use the course mostly as a guide. It gave me an outline of what I needed to learn and a general idea of each topic, and I then used a variety of other resources to deepen my knowledge.
Here is a list of useful resources I particularly enjoyed and highly recommend:
- Python Data Science Handbook: a book about doing data science with Python. It assumes the reader has some familiarity with the Python language
- R for Data Science: this book explains how to apply the R language to data science
- 3Blue1Brown: a really good YouTube channel that explains advanced math concepts using pretty cool visualizations. If you don’t have a strong mathematics background, this channel has video series on linear algebra and calculus
- StatQuest: another great YouTube channel explaining statistics, ML and data science in a friendly and accessible way
- An Introduction to Statistical Learning with Applications in R (2nd Edition): this book will give you a good theoretical understanding of a variety of ML methods. It offers a good balance between descriptive writing, math and stats concepts and hands-on examples, making it accessible to readers from different backgrounds
- Machine Learning: A Probabilistic Perspective: this book is more advanced, explaining machine learning using concepts from probability theory. The reader is assumed to be familiar with basic calculus, probability and linear algebra
- Kaggle: a platform for data scientists to access a variety of datasets, share code with other users and join competitions. This is where you can keep having fun with practical projects and learn about the challenges of a real data science project
I hope these resources will give you a solid theoretical background and prepare you well for a first job in data science.
My final piece of advice is that it’s OK not to know everything! Each of us has a unique background and set of skills; what matters is being open to learning. This is only the beginning of your journey in data science, and during your career you will always be learning and expanding your knowledge – which is both the challenge, and the joy, of the profession!