Chong Dang
1725 Bay Ridge Pkwy, Brooklyn, NY, 11204 | 1-(631)590-0688 | rickydangc@yahoo.com
CORE QUALIFICATIONS
Extensive working experience in Data Science, Machine Learning, Deep Learning, Data Mining, Predictive
Modeling, Recommendation Systems, ETL Development and Data Visualization
Comprehensive programming skills in Python 2/3, R, MATLAB, SQL, Scala, Bash, JavaScript, HTML5, CSS3, C, C# and Java
Expertise in Supervised Machine Learning Algorithms for Regression and Classification, such as Decision Tree, AdaBoost,
Gradient Boosting, XGBoost, Random Forest, Naïve Bayes, KNN, SVM, LDA and Deep Learning methods. Proficient in
Unsupervised Learning such as K-Means Clustering and PCA (Principal Component Analysis)
Skilled in Deep Learning Frameworks: TensorFlow, Keras and PyTorch; familiar with Deep Learning Models like Neural Networks,
CNN and RNN (LSTMs, GRU)
Experienced in building Data Warehousing and Extract Transform Load (ETL) pipelines using Spark, Airflow and cloud tools
Experience in defining project scope across Data Science and Data Analytics projects in collaboration with senior management and clients
Strong experience in Software Development Life Cycle (SDLC), including Requirements Analysis and Design Specification, in both
Waterfall and Agile methodologies
Adept in using Python libraries such as Pandas, NumPy, SciPy, Seaborn, Matplotlib, Scikit-learn, Keras, TensorFlow and NLTK
Experience in using Anaconda Navigator (Jupyter Notebook), PyCharm, RStudio for Python and R programming
Working knowledge of Big Data technologies like Hadoop, MapReduce, Spark, SparkSQL, HDFS, Hive and HBase
Expert in designing visualizations using Tableau 10.3, Dash, R-Shiny, Power BI and D3.js
Experience in using A/B testing, hypothesis testing and ANOVA to assess model accuracy
Professional experience in handling Structured and Unstructured data (Social Media, Texting, Photographs and Videos)
using relational databases like MySQL 5.x and Oracle 11g
Expert in dealing with big data on NoSQL databases like Cassandra 3.0 and MongoDB 3.2
In-depth knowledge of Cloud Infrastructure like AWS, GCP and Azure
Experience in working with version control systems like Git and source code management client tools like Git Bash and GitHub
Excellent communication, analytical, interpersonal, and presentation skills; expert at managing multiple projects simultaneously
Familiar with current industry standards, such as ISO, Six Sigma, and Capability Maturity Model (CMM)
Good knowledge of JIRA, Microsoft Project, Microsoft Office, WordPress, Photoshop, etc.
TECHNICAL SKILLS
Machine Learning Essentials:
- Regression Models: Logistic Regression, Polynomial Regression, Stepwise, Ridge, Lasso, ElasticNet
- Classification Models: Naive Bayes, Decision Trees, Random Forests, XGBoost, GBDT, AdaBoost, SVM, KNN, LDA
- Unsupervised Learning: Hierarchical Clustering, K-means Clustering, PCA and SVD (Dimensionality Reduction)
- Deep Learning: Neural Networks, CNN, RNN, Graph Neural Network (GNN)
Packages: NumPy, Pandas, SciPy, Seaborn, Matplotlib, Plotly, Keras, Scikit-learn, NLTK, PyTorch, Beautiful Soup, WordCloud,
TensorFlow, Flask, SQLAlchemy
BI Tools/Big Data Tools: Tableau, Microsoft Power BI, MicroStrategy, Dash, R-Shiny, Spark (SparkSQL, Spark MLlib, Spark
Streaming, Spark GraphX), Hadoop, MapReduce, Hive
Report/Document Tools: MS Office 2016, MS Project, JIRA
Languages: Python 2.7/3, R, SQL, Scala, Pig, HTML, CSS, Linux Shell (CentOS), Markdown
Database: MySQL, SQL Server, Oracle, PostgreSQL, MongoDB
Infrastructure: Docker, AWS, Microsoft Azure, Git, Bitbucket, Databricks
EXPERIENCE
Bethpage, NY, Altice USA Data Scientist 03/2019—Till Date
Project Development: Designed and developed scalable production-level recommendation systems leveraging Machine Learning,
Deep Learning, Natural Language Processing and Statistical Modeling using Python to solve real-world business problems; collaborated
with backend and frontend engineers to integrate the recommendation systems into Django REST Framework and deployed them
successfully on AWS EC2
Data Analysis: Translated numbers into meaningful facts to help businesses make better decisions; performed
cleansing, manipulation, analysis, and visualization of client data; generated data visualization dashboards using Tableau 10.3 and
the Python libraries Matplotlib/Seaborn
Data Preprocessing: Collected 6 GB of data through the company’s API, built a Data Processing Pipeline and performed data cleaning,
feature scaling and feature engineering using the Pandas and NumPy packages in Python; built a streaming data ETL using Spark that
writes only the data that changed since the previous batch
NLP (Natural Language Processing) Techniques: Built projects utilizing NLP knowledge including text mining, regex, bag of
words, TF-IDF, Word2Vec, PCA, LSTMs, cosine similarity, sentiment analysis, NER, and information extraction
Log Classification: Applied feature selection based on tree importance to select the 8 most important features from IVR data, extracted
features from modem logs and trained a Random Forest to classify intents (labels), then built a content-based recommender that
recommends improvements mimicking the status of a given cable modem
Recommendation Algorithm: Designed User-Based and Item-Based Collaborative Filtering based on Pearson correlation between
users/items; hybridized the content-based recommender with collaborative filtering
Model Evaluation: Measured model performance using Confusion Matrix and AUC-ROC curve, and derived accuracy, precision,
recall and F1 score from the Confusion Matrix; used GridSearch to tune hyperparameters by evaluating a model for each combination of
algorithm parameters specified in a grid, which increased accuracy by 5%
Agile Project Coordinator: Pitched machine learning ideas, showed exploratory data analysis (EDA) and presented project demos to
front-desk business users; suggested, collected and synthesized business requirements based on use cases; created an effective roadmap
towards the deployment of a production-level machine learning application
New York City, NY, Entropy Technology Software Engineer & Data Scientist 9/2017—3/2019
Strategies Building: Served as a member of a five-person group charged with building resume-parsing systems using NLP-related
strategies for a recruiting platform based on machine learning and deep learning
Implementation: Converted resumes from PDF, Word, and other formats to txt files using Tika; created corpus word lists including
a segment keyword list, university list, company list, etc.; searched for segment keywords and created bounding boxes near keywords using
Hierarchical Layout, then stored each sentence in its respective segment; performed feature extraction by creating segment-specific
feature lists and searching for the main features in the respective segment
Machine Learning/Deep Learning: Developed machine learning algorithms for Named Entity Recognition (NER), such as
recognizing candidates’ names and companies’ names; used Support Vector Machine and Naïve Bayes Classifier to improve
segmentation results; applied Regular Expressions for information extraction, such as extracting email addresses; implemented Deep
Learning multi-class classification using RNN and CNN networks; built Confusion Matrices and calculated precision, recall and
F1 score to measure model performance; accuracy reached 99.9%
Data Engineering: Constructed a data pipeline on AWS by deploying a Linux environment with Jupyter Notebook to query and clean
data, enabling data pipeline ETL and preparing machine-learning-oriented feature tables; applied cloud technology (Google Cloud,
AWS, and Databricks) to synchronize and deploy Parse Server (Docker Container) on AWS through EC2; processed one million
resume files and improved time efficiency 20-fold
Interpersonal Communication and Leadership: Served as group leader for all interns to develop an adaptive information extraction
algorithm based on about 100 academic papers; reviewed and refined all interns' information extraction strategies by testing the
results; collaborated with product managers, marketing analytics, and front-end engineers to deliver features
New York City, NY, Sawtest Solution Inc Software Test Engineer (Remote) 2/2018—2/2019
Analyzed feature requirements and created test plans for devices such as IoT devices, hotspot devices and smartphones, and executed
manual/automation/field performance tests
Developed/maintained automation test scripts with Jenkins to support Agile development and set up automation test
environments with Linux Shell/Windows command prompt for Android phones
Evaluated testing models by analyzing ADB logs and modem logs from field testing for GSM/WCDMA/LTE (including VoLTE IMS)
Experienced with T-Mobile Fit4Launch Field Test requirements, Verizon FIT Field Test, AGPS Performance Test, etc.
Worked on both Qualcomm and MediaTek chipset devices with Spirent Datum and Datum Mobile, Spirent Nomad HD, Wireshark,
QXDM, QPST, QCAT, QRCT, MediaTek ELT, ADB (Android Debug Bridge) and T-Mobile LCAT test tools
New York City, NY, Chinesehighway.com Data Scientist 5/2017—2/2018
Project Management: Analyzed project goals, requirements, resources and deadlines and planned strategy accordingly;
clearly communicated problems and process to the upper management team
Data Preprocessing/Visualization: Collected 100,313 posts from users on different topics, including features like user_id, posts,
create_date, etc.; built a Data Preprocessing Pipeline and performed data cleaning, feature scaling and feature extraction using Pandas
and NumPy; used Matplotlib and Seaborn in Python to visualize the data and performed Feature Engineering such as detecting
outliers and missing values and interpreting variables; applied extensive regular expressions to extract hashtags, URLs, and emotions
Machine Learning/NLP: Worked in all phases of research, including Feature Selection, Feature Engineering, Data Modeling, Developing
Tools, Validation, Visualizations and Model Evaluation; implemented classification algorithms (Random Forests, XGBoost, SVM,
KNN) to label each post as positive, negative, or neutral; extracted posts and created a WordCloud to determine the most frequent words;
created a graph to show the correlation between posts in the sentiment analysis; used Pipeline to chain the preprocessing steps
into a single step
Model Evaluation: Evaluated classification models using ROC curve and Confusion Matrix and identified accuracy, precision,
recall and F1 score
Business Improvement: Generated weekly reports manually containing regression-based predictions of user activities and visualizations
of acquisition and behavior; targeted and connected with potential customers (increased customer base by 50%) by analyzing unstructured
data using Text Mining; designed and developed specific databases (MySQL) for collection, tracking, and reporting of current data
New York City, NY, International Academic Alliance Data Analyst Intern 12/2016—5/2017
Database: Imported data from different databases (SQL Server 2012) using SQL Server Management Studio; wrote queries in
MySQL to retrieve data from data sources; collected data from different APIs and saved them into the SQL Server database
Student Data Visualization: Performed data imputation using the Scikit-learn package in Python. Performed data processing using
Python libraries like NumPy and Pandas. Created data visualizations using the ggplot2 library in R for better
understanding of customers' behaviors. Visually plotted data using Tableau for dashboards and reports.
Research: Participated in all phases of research including data collection, data cleaning, data mining, developing models and
visualizations. Collaborated with data engineers and the operations team to collect data from internal systems to fit the analytical
requirements
Xi’an City, China, CNPC Data Analyst (Python & SQL) 9/2015—6/2016
Database: Created a database program in SQL Server to manipulate data accumulated from oil transactions; wrote SQL
statements and stored procedures using PL/SQL.
Oil Data Visualization: Performed exploratory data analysis such as statistical calculation, data cleaning and data visualization using
NumPy, Pandas and Matplotlib; created interactive Dashboards on desktop platform to visualize the data using Power BI in
MS Excel and Tableau; developed an R-Shiny app to highlight Bayesian analysis and performed visualizations with ggplot2 using R;
developed comprehensive reports and charts to present data and guide investment strategies
Web Development: Created a fully functional website using Django REST Framework and deployed it successfully on a server;
maintained features including sales search, oil fields filter and company email service; improved the coding standards, code reuse, and
performance of the Extend application by making effective use of various design patterns
EDUCATION
Master: SUNY AT STONY BROOK January 2016—May 2017
Technological Systems Management (Data Science) GPA:3.8/4.0
Bachelor: NORTHWEST UNIVERSITY (Second-Class Scholarship, Twice) September 2012—July 2016
Electronic Science and Technology GPA:3.6/4.0