Growing Your Career as a Data Scientist

How to Become a Data Scientist

Posted by RICKY on December 23, 2019


Data science is growing faster every day. There are huge prospects in data engineering, data research, data visualization, machine learning, and related fields. However, there is still a wide gap between industry demand and the supply of skilled people.

What is a Data Scientist?

Data scientists do many of the same things as data analysts, but they also build machine learning and deep learning models that predict future trends from historical data. A data scientist often has more freedom to pursue their own ideas and run experiments, looking for patterns and trends in past data that can generate revenue or keep the business from losing money.

How to Become a Data Scientist?

I’ve seen a lot of people who want to become a Data Scientist, but about 60% of them start to give up once they are on the learning track. So first, every data science learner needs to develop persistence in learning.

Second, you need to start building the required technical skills. Below I have listed the data science skills I have learned over the past 2 years. The list may look overwhelming when you first read it. But trust me, you can learn all of it and become a great Data Scientist like me.

Machine Learning Essentials:

  • Regression Models: Logistic Regression (used for classification despite its name), Polynomial Regression, Stepwise, Ridge, Lasso, ElasticNet
  • Classification Models: Naive Bayes, Decision Trees, Random Forests, XGBoost, GBDT, AdaBoost, SVM, KNN, LDA
  • Unsupervised Learning: Hierarchical Clustering, K-means Clustering, PCA and SVD (Dimensionality Reduction)
  • Deep Learning: Neural Networks, CNN, RNN, Graph Neural Network (GNN)

Python Packages:

NumPy, Pandas, SciPy, Seaborn, Matplotlib, Plotly, Keras, Scikit-learn, NLTK, PyTorch, Beautiful Soup, WordCloud, TensorFlow, Flask, SQLAlchemy

BI Tools/Big Data Tools:

Tableau, Microsoft Power BI, MicroStrategy, Dash, R Shiny, Spark (SparkSQL, Spark MLlib, Spark Streaming, Spark GraphX), Hadoop, MapReduce, Hive

Programming Languages:

Python 2.7/3, R, SQL, Linux Shell (CentOS), Markdown

Databases:

MySQL, SQL Server, Oracle, PostgreSQL, MongoDB

Infrastructure:

Docker, AWS, Microsoft Azure, Git, Bitbucket, Databricks

Data Science Learning Path

Let’s start based on your background. If you have programming experience, you can go straight to statistics and machine learning. I recommend checking out Udacity’s free online courses: Data Science Free Online Courses

If you have no experience in programming or statistics, don’t worry! Start with basic probability theory and statistical learning, for example with The Elements of Statistical Learning, a very good book that I have used before.

Once you have the basic mathematics, you can start working on some data preprocessing projects. For example, you can extract data from Twitter using the Twitter API. Then you can clean the data, handle outliers, deal with missing values, select features, and build a simple model that predicts how many times a tweet will be forwarded. Don’t be in a hurry to make progress; patience is the law of progress, so take your time.

After you get into the Data Science world, make sure you keep learning every day. For example, you should know the basic usage of a NoSQL database (MongoDB), data visualization tools (Tableau, Power BI), and web frameworks for building RESTful APIs (Flask, Django). One last important habit is reading blogs such as Towards Data Science!

A full-stack Data Scientist can not only build a machine learning model but also deploy the final model as an API. Once it is deployed on the web, users everywhere can send requests to your URL to get predictions.
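To make the tweet project above more concrete, here is a minimal sketch of the cleaning and modeling steps using Pandas and NumPy. Everything specific in it is an assumption for illustration: the sample rows, the column names (followers, hashtags, retweets), the median fill, the 1.5 × IQR outlier rule, and the helper names clean, rank_features, fit_linear, and predict_retweets are all hypothetical. It also does not call the Twitter API; it starts from a small hand-made table.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; in a real project these would come from the Twitter API.
raw = pd.DataFrame({
    "followers": [120, 4500, 300, None, 980, 15000, 60, 2100],
    "hashtags": [1, 3, 0, 2, 2, 5, 0, 1],
    "retweets": [2, 40, 3, 10, 9, 900, 1, 20],
})


def clean(df):
    """Fill missing values and cap outliers; returns a new DataFrame."""
    out = df.astype(float)
    # Missing values: fill gaps in the follower count with the column median.
    out["followers"] = out["followers"].fillna(out["followers"].median())
    # Outliers: clip the target to the 1.5 * IQR fences (one common rule of thumb).
    q1, q3 = out["retweets"].quantile([0.25, 0.75])
    iqr = q3 - q1
    out["retweets"] = out["retweets"].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)
    return out


def rank_features(df, target):
    """Rank candidate features by absolute correlation with the target."""
    return df.corr()[target].drop(target).abs().sort_values(ascending=False)


def fit_linear(df, features, target):
    """Ordinary least squares with an intercept; returns [intercept, coef...]."""
    X = np.column_stack([np.ones(len(df)), df[features].to_numpy()])
    y = df[target].to_numpy()
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def predict_retweets(coef, features, tweet):
    """Predict the forwarding count for one tweet given as a dict of feature values."""
    values = np.array([tweet[name] for name in features], dtype=float)
    return float(coef[0] + values @ coef[1:])


cleaned = clean(raw)
ranking = rank_features(cleaned, "retweets")
features = list(ranking.index)  # keep every ranked feature in this tiny example
coef = fit_linear(cleaned, features, "retweets")
estimate = predict_retweets(coef, features, {"followers": 1000, "hashtags": 2})
print(ranking)
print(estimate)
```

A function like predict_retweets is also the piece you would put behind a web endpoint later on: a Flask route, for example, could read the feature values from the request and return the prediction as JSON. With real data you would likely swap the hand-written least squares for a Scikit-learn model and evaluate it on held-out tweets.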

Closing

This is only one fairly traditional learning path for growing your career as a Data Scientist. I would like to share more of my Data Science experience in the future. If you have any feedback or critiques, please feel free to share them with me. Thanks!