Chong Dang

1725 Bay Ridge Pkwy, Brooklyn, NY, 11204 | 1-(631)590-0688 | rickydangc@yahoo.com

CORE QUALIFICATIONS

• Extensive working experience in Data Science, Machine Learning, Deep Learning, Data Mining, Predictive

Modeling, Recommendation Systems, ETL Development and Data Visualization

• Comprehensive programming skills in Python 2/3, R, MATLAB, SQL, Scala, Bash, JavaScript, HTML5, CSS3, C, C# and Java

• Expertise in Supervised Machine Learning Algorithms for Regression and Classification, such as Decision Tree, AdaBoost,

Gradient Boosting, XGBoost, Random Forest, Naïve Bayes, KNN, SVM, LDA and Deep Learning methods. Proficient in

Unsupervised Learning methods like K-Means Clustering and PCA (Principal Component Analysis)

• Skilled in Deep Learning Frameworks: TensorFlow, Keras and PyTorch; familiar with Deep Learning Models like Neural Networks,

CNN and RNN (LSTMs, GRU)

• Experienced in building Data Warehousing and Extract Transform Load (ETL) pipelines using Spark, Airflow and cloud tools

• Experience in defining project scope for Data Science and Data Analytics projects in collaboration with senior management and clients

• Strong experience in the Software Development Life Cycle (SDLC), including Requirements Analysis and Design Specification, in both

Waterfall and Agile methodologies

• Adept at using Python libraries such as Pandas, NumPy, SciPy, Seaborn, Matplotlib, Scikit-learn, Keras, TensorFlow and NLTK

• Experience in using Anaconda Navigator (Jupyter Notebook), PyCharm, RStudio for Python and R programming

• Working knowledge with Big Data technologies like Hadoop, MapReduce, Spark, SparkSQL, HDFS, Hive and HBase

• Expert in designing visualizations using Tableau 10.3, Dash, R-Shiny, Power BI and D3.js

• Experience in using A/B testing, hypothesis testing and ANOVA to assess model accuracy

• Professional experience handling Structured and Unstructured data (Social Media, Texting, Photographs and Videos)

using relational databases like MySQL 5.x and Oracle 11g

• Expert in dealing with big data on NoSQL databases like Cassandra 3.0 and MongoDB 3.2

• In-depth knowledge of Cloud Infrastructure like AWS, GCP and Azure

• Experience working with version control systems like Git and source code management client tools like Git Bash and GitHub

• Excellent communication, analytical, interpersonal, and presentation skills; expert at managing multiple projects simultaneously

• Familiar with current industry standards, such as ISO, Six Sigma, and Capability Maturity Model (CMM)

• Good knowledge of JIRA, Microsoft Project, Microsoft Office, WordPress, Photoshop, etc.

TECHNICAL SKILLS

• Machine Learning Essentials:

- Regression Models: Logistic Regression, Polynomial Regression, Stepwise, Ridge, Lasso, ElasticNet

- Classification Models: Naive Bayes, Decision Trees, Random Forests, XGBoost, GBDT, AdaBoost, SVM, KNN, LDA

- Unsupervised Learning: Hierarchical Clustering, K-means Clustering, PCA and SVD (Dimensionality Reduction)

- Deep Learning: Neural Networks, CNN, RNN, Graph Neural Network (GNN)

• Packages: NumPy, Pandas, SciPy, Seaborn, Matplotlib, Plotly, Keras, Scikit-learn, NLTK, PyTorch, Beautiful Soup, WordCloud,

TensorFlow, Flask, SQLAlchemy

• BI Tools/Big Data Tools: Tableau, Microsoft Power BI, MicroStrategy, Dash, R-Shiny, Spark (SparkSQL, Spark MLlib, Spark

Streaming, Spark GraphX), Hadoop, MapReduce, Hive

• Report/Document Tools: MS Office 2016, MS Project, JIRA

• Languages: Python 2.7/3, R, SQL, Scala, Pig, HTML, CSS, Linux Shell (CentOS), Markdown

• Database: MySQL, SQL Server, Oracle, PostgreSQL, MongoDB

• Infrastructure: Docker, AWS, Microsoft Azure, Git, Bitbucket, Databricks

EXPERIENCE

Bethpage, NY, Altice USA – Data Scientist 03/2019—Till Date

• Project Development: Designed and developed scalable production-level recommendation systems leveraging Machine Learning,

Deep Learning, Natural Language Processing, and Statistical Modeling using Python to solve real-world business problems; collaborated

with backend and frontend engineers to integrate the recommendation systems into Django REST Framework and successfully deployed

them on AWS EC2

• Data Analysis: Translated numbers into meaningful facts to help businesses make better business decisions; performed

cleansing, manipulation, analysis, and visualization of client data; generated data visualization dashboards using Tableau 10.3 and

the Python libraries Matplotlib/Seaborn

• Data Preprocessing: Collected 6 GB of data through the company’s API, built a Data Processing Pipeline and performed data cleaning,

feature scaling and feature engineering using the Pandas and NumPy packages in Python; built a streaming data ETL using Spark that writes

only the data that changed since the previous batch

• NLP (Natural Language Processing) Techniques: Built projects utilizing NLP knowledge including text mining, regex, bag of

words, TF-IDF, Word2Vec, PCA, LSTMs, cosine similarity, sentiment analysis, NER, and information extraction

• Log Classification: Applied feature selection based on tree importance to get the 8 most important features from IVR data, extracted

features from modem logs, and trained a Random Forest to classify intents (labels); then built a content-based recommender that

recommends improvements mimicking the status of a given cable modem

• Recommendation Algorithm: Designed User-Based and Item-Based Collaborative-Filtering based on Pearson correlation between

users/items; hybridized content-based recommender with collaborative-filtering

• Model Evaluation: Measured model performance using the Confusion Matrix and AUC-ROC curve; derived accuracy, precision,

recall and F1 score from the Confusion Matrix; used grid search to tune hyperparameters, evaluating a model for each combination of

algorithm parameters specified in a grid, which increased accuracy by 5%

• Agile Project Coordinator: Pitched machine learning ideas, showed exploratory data analysis (EDA) and presented project demos to

front-desk business users; suggested, collected and synthesized business requirements based on use cases; created an effective roadmap

towards the deployment of a production-level machine learning application

New York City, NY, Entropy Technology – Software Engineer & Data Scientist 9/2017—3/2019

• Strategy Building: Served as a member of a five-person group charged with building resume-parsing systems for a recruiting

platform, using NLP strategies based on machine learning and deep learning

• Implementation: Converted resumes from PDF, Word, and other formats to text files using Tika; created corpus word lists including

a segment keyword list, university list, company list, etc.; searched for segment keywords and created bounding boxes near keywords using

Hierarchical Layout, then stored each sentence in its respective segment; performed feature extraction by creating segment-specific

feature lists and searching for the main features in the respective segment

• Machine Learning/Deep Learning: Developed machine learning algorithms for Named Entity Recognition (NER), such as

recognizing candidates’ names and companies’ names; used Support Vector Machine and Naïve Bayes classifiers to improve

segmentation results; applied regular expressions for information extraction, such as extracting email addresses; implemented deep

learning multi-class classification using RNN and CNN networks; built Confusion Matrices and calculated precision, recall and

F1 score to measure model performance; accuracy reached 99.9%

• Data Engineering: Constructed a data pipeline on AWS by deploying a Linux environment with Jupyter Notebook to query and clean

data, enabling pipeline ETL and preparing machine-learning-oriented feature tables; applied cloud technology (Google Cloud,

AWS, and Databricks) to synchronize and deploy Parse Server (Docker container) on AWS through EC2; processed one million

resume files and made processing 20 times faster

• Interpersonal Communication and Leadership: Served as group leader for all interns to develop an adaptive information extraction

algorithm based on about 100 academic papers; Reviewed and refined all interns' information extraction strategies by testing the

results; collaborated with product managers, marketing analytics, and front-end engineers to deliver features

New York City, NY, Sawtest Solution Inc – Software Test Engineer (Remote) 2/2018—2/2019

• Analyzed feature requirements and created test plans for IoT devices, hotspot devices and smartphones; executed

manual, automated and field performance tests

• Developed and maintained automation test scripts with Jenkins to support Agile development, and set up automation test

environments with Linux Shell/Windows Command Prompt for Android phones

• Evaluated testing models by analyzing ADB logs and modem logs from field testing for GSM/WCDMA/LTE (including VoLTE IMS)

• Experienced with T-Mobile Fit4Launch Field Test requirements, Verizon FIT Field Test, AGPS Performance Test, etc.

• Worked with both Qualcomm and MediaTek chipset devices using Spirent Datum and Datum Mobile, Spirent Nomad HD, Wireshark,

QXDM, QPST, QCAT, QRCT, MediaTek ELT, ADB (Android Debug Bridge) and T-Mobile LCAT test tools

New York City, NY, Chinesehighway.com – Data Scientist 5/2017—2/2018

• Project Management: Analyzed project goals, requirements, resources and deadlines and planned strategy accordingly;

clearly communicated problems and progress to the upper management team

• Data Preprocessing/Visualization: Collected 100,313 posts from users on different topics, including features like user_id, posts,

create_date, etc.; built a Data Preprocessing Pipeline and performed data cleaning, feature scaling and feature extraction using Pandas

and NumPy; used Matplotlib and Seaborn in Python to visualize the data and performed Feature Engineering such as detecting

outliers and missing values and interpreting variables; applied extensive regular expressions to extract hashtags, URLs, and emotions

• Machine Learning/NLP: Worked in all phases of research, including Feature Selection, Feature Engineering, Data Modeling, Tool

Development, Validation, Visualization and Model Evaluation; implemented classification algorithms (Random Forests, XGBoost, SVM,

KNN) to label each post as positive, negative, or neutral; extracted posts and created a word cloud to determine the most frequent words;

created a graph to show the correlation between the tweets in the sentiment analysis; used Pipeline to combine the preprocessing steps

into a single step

• Model Evaluation: Evaluated classification models using ROC curve and Confusion Matrix and identified accuracy, precision,

recall and F1 score

• Business Improvement: Manually generated weekly reports containing regression-based predictions of user activity and visualizations

of acquisition and behavior; targeted and connected with potential customers (increasing the customer base by 50%) by analyzing unstructured

data using Text Mining; designed and developed dedicated databases (MySQL) for collection, tracking, and reporting of current data

New York City, NY, International Academic Alliance – Data Analyst Intern 12/2016—5/2017

• Database: Imported data from different databases (SQL Server 2012) using SQL Server Management Studio; wrote queries in

MySQL to retrieve data from data sources; collected data from different APIs and saved it into the SQL Server database

• Student Data Visualization: Performed data imputation using Scikit-learn package in Python. Performed data processing using

Python libraries like NumPy and Pandas. Analyzed data and built visualizations with the ggplot2 library in R for better

understanding of customers' behaviors. Built dashboards and reports in Tableau.

• Research: Participated in all phases of research including data collection, data cleaning, data mining, developing models and

visualizations. Collaborated with data engineers and the operations team to collect data from internal systems to fit the analytical

requirements

Xi’an City, China, CNPC– Data Analyst (Python & SQL) 9/2015—6/2016

• Database: Created database programs in SQL Server to manipulate data accumulated from oil transactions; wrote SQL

statements and stored procedures using PL/SQL.

• Oil Data Visualization: Performed exploratory data analysis, including statistical calculation, data cleaning and data visualization, using

NumPy, Pandas and Matplotlib; created interactive desktop dashboards to visualize the data using Power BI in

MS Excel and Tableau; developed an R Shiny app to highlight Bayesian analysis and built visualizations with ggplot2 in R;

developed comprehensive reports and charts to present data and guide investment strategies

• Web Development: Created a fully functional website using Django REST Framework and successfully deployed it on a server;

maintained features including sales search, oil field filters and company email service; improved the coding standards, code reuse, and

performance of the Extend application by making effective use of various design patterns

EDUCATION

Master: SUNY AT STONY BROOK January 2016—May 2017

Technological Systems Management (Data Science) GPA:3.8/4.0

Bachelor: NORTHWEST UNIVERSITY (Second-Class Scholarship, Twice) September 2012—July 2016

Electronic Science and Technology GPA:3.6/4.0