Data Science
A collection of useful data science topics, with accompanying articles and videos.
Subscribe to:
- CodeCut for articles and bite-sized Python tips in your inbox
- My YouTube channel for videos related to Python and data science
How to Download the Code in This Repository to Your Local Machine
To download the code in this repo, run git clone:
git clone https://github.com/khuyentran1401/Data-science
Contents
- MLOps
- Data Management Tools
- Testing
- Productive Tools
- Python Helper Tools
- Tools for Deployment
- Speed-up Tools
- Math Tools
- Machine Learning
- Natural Language Processing
- Computer Vision
- Time Series
- Feature Engineering
- Visualization
- Mathematical Programming
- Scraping
- Python
- Logging and Debugging
- Linear Algebra
- Data Structure
- Statistics
- Web Applications
- Share Insights
- Cool Tools
- Learning Tips
- Productive Tips
- VSCode
- Book Review
- Data Science Portfolio
MLOps
Title | Article | Repository | Video |
---|---|---|---|
Stop Hard Coding in a Data Science Project – Use Configuration Files Instead | 🔗 | 🔗 | 🔗 |
Poetry: A Better Way to Manage Python Dependencies | 🔗 | 🔗 | |
Git for Data Scientists: Learn Git through Practical Examples | 🔗 | 🔗 | |
Introduction to Weights & Biases: Track and Visualize Your Machine Learning Experiments in 3 Lines of Code | 🔗 | 🔗 | |
Kedro: A Python Framework for Reproducible Data Science Projects | 🔗 | 🔗 | |
Orchestrate a Data Science Project in Python With Prefect | 🔗 | 🔗 | |
Orchestrate Your Data Science Project with Prefect 2.0 | 🔗 | 🔗 | 🔗 |
DagsHub: a GitHub Supplement for Data Scientists and ML Engineers | 🔗 | 🔗 | |
4 pre-commit Plugins to Automate Code Reviewing and Formatting in Python | 🔗 | 🔗 | 🔗 |
BentoML: Create an ML Powered Prediction Service in Minutes | 🔗 | 🔗 | 🔗 |
How to Structure a Data Science Project for Maintainability (with DVC) | 🔗 | 🔗 | 🔗 |
How to Structure an ML Project for Reproducibility and Maintainability (with Prefect) | 🔗 | 🔗 | |
GitHub Actions in MLOps: Automatically Check and Deploy Your ML Model | 🔗 | 🔗 | |
Create Robust Data Pipelines with Prefect, Docker, and GitHub | 🔗 | 🔗 | |
Create a Maintainable Data Pipeline with Prefect and DVC | 🔗 | 🔗 | |
Build a Full-Stack ML Application With Pydantic And Prefect | 🔗 | 🔗 | 🔗 |
Streamline Code Updates with DVC and GitHub Actions | 🔗 | 🔗 | 🔗 |
Create Observable and Reproducible Notebooks with Hex | 🔗 | 🔗 | 🔗 |
Build Reliable Machine Learning Pipelines with Continuous Integration | 🔗 | 🔗 | 🔗 |
Automate Machine Learning Deployment with GitHub Actions | 🔗 | 🔗 | 🔗 |
How to Build a Fully Automated Data Drift Detection Pipeline | 🔗 | 🔗 | 🔗 |
Data Management Tools
Title | Article | Repository | Video |
---|---|---|---|
Introduction to DVC: Data Version Control Tool for Machine Learning Projects | 🔗 | 🔗 | 🔗 |
Great Expectations: Always Know What to Expect From Your Data | 🔗 | 🔗 | |
Validate Your pandas DataFrame with Pandera | 🔗 | 🔗 | 🔗 |
Introduction to Schema: A Python Library to Validate Your Data | 🔗 | 🔗 | |
How to Create Fake Data with Faker | 🔗 | 🔗 | |
Hypothesis and Pandera: Generate Synthetic Pandas DataFrames for Testing | 🔗 | 🔗 | 🔗 |
What is dbt (data build tool) and When should you use it? | 🔗 | 🔗 | 🔗 |
Streamline dbt Model Development with Notebook-Style Workspace | 🔗 | 🔗 | 🔗 |
Testing
Title | Article | Repository | Video |
---|---|---|---|
Pytest for Data Scientists | 🔗 | 🔗 | 🔗 |
4 Lesser-Known Yet Awesome Tips for Pytest | 🔗 | 🔗 | |
DeepDiff — Recursively Find and Ignore Trivial Differences Using Python | 🔗 | 🔗 | |
Checklist — Behavioral Testing of NLP Models | 🔗 | 🔗 | |
Detect Defects in a Data Pipeline Early with Validation and Notifications | 🔗 | 🔗 | 🔗 |
Write Readable Tests for Your Machine Learning Models with Behave | 🔗 | 🔗 | 🔗 |