Complete-Life-Cycle-of-a-Data-Science-Project
CREDITS: All credit goes to the authors of the linked resources.
MOTIVATION: This repository was created to help upcoming aspirants and others working in the data science field.
https://www.theinsaneapp.com/2021/03/how-to-build-machine-learning-project.html
**** If you like my work, please buy me a coffee; it motivates me -> https://www.buymeacoffee.com/achuthasubhash?new=1 ****
Business understanding
1.Data collection
Data comes in 3 kinds:
a.Structured data (tabular data,etc...)
b.Unstructured data (images,text,audio,etc...)
c.Semi-structured data (XML,JSON,etc...)
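The three kinds above can be told apart by how much schema travels with the data. The sketch below is only an illustration using the Python standard library; the sample records and field names are made up for the example.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Structured: fixed columns, one row per record (CSV text stands in for a table).
csv_text = "id,name,age\n1,Asha,31\n2,Ben,27\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: self-describing keys/tags; fields may be nested or missing.
json_text = '{"id": 1, "name": "Asha", "tags": ["admin", "beta"]}'
record = json.loads(json_text)

xml_text = "<user id='2'><name>Ben</name></user>"
user = ET.fromstring(xml_text)

# Unstructured: no schema at all; here just a piece of free text.
free_text = "Asha joined the beta programme last week."

print(rows[0]["name"], record["tags"], user.get("id"), user.findtext("name"))
print(len(free_text.split()), "words of free text")
```

Note that the CSV reader returns every value as a string, so numeric columns such as `age` still need an explicit conversion before analysis.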
Variable types
a.Qualitative (nominal,ordinal,binary)
b.Quantitative (discrete,continuous)
https://www.chi2innovations.com/blog/discover-data-blog-series/data-types-101/
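The variable type decides which summary statistics are meaningful. The following is a minimal sketch with hypothetical survey answers (all values are invented for illustration): a mode for nominal data, a median for ordinal data, and a mean for quantitative data.

```python
import statistics

# Hypothetical survey answers, one list per variable type.
blood_group = ["A", "O", "O", "B", "O"]                      # nominal: labels, no order
satisfaction = ["low", "high", "medium", "high", "medium"]   # ordinal: ordered labels
children = [0, 2, 1, 3, 1]                                   # discrete: counts
height_cm = [162.5, 171.0, 158.2, 180.4, 175.9]              # continuous: measurements

# Nominal: only counting makes sense, so report the most frequent label.
most_common_group = statistics.mode(blood_group)

# Ordinal: map labels to ranks, sort by rank, then take the middle label.
rank = {"low": 0, "medium": 1, "high": 2}
ordered_labels = sorted(satisfaction, key=rank.get)
median_satisfaction = ordered_labels[len(ordered_labels) // 2]

# Quantitative: arithmetic is meaningful, so a mean can be computed.
mean_children = statistics.mean(children)
mean_height = statistics.mean(height_cm)

print(most_common_group, median_satisfaction, mean_children, round(mean_height, 1))
```

A binary variable can be treated as a nominal variable with exactly two labels.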
Data sources: databases, scraping data from websites, purchasing data, surveys, sensors, cameras, APIs, etc.
cleanlab https://l7.curtisnorthcutt.com/cleanlab-python-package https://github.com/cgnorthcutt/cleanlab https://github.com/cgnorthcutt/label-errors https://github.com/cgnorthcutt/rankpruning https://github.com/subeeshvasu/Awesome-Learning-with-Label-Noise
Measure Data Quality ydata-quality https://github.com/ydataai/ydata-synthetic https://towardsdatascience.com/how-can-i-measure-data-quality-9d31acfeb969
a.Web scraping best article to refer-https://towardsdatascience.com/choose-the-best-python-web-scraping-library-for-your-application-91a68bc81c4f
https://www.kdnuggets.com/2021/02/6-web-scraping-tools.html
https://www.bigdatanews.datasciencecentral.com/profiles/blogs/top-30-free-web-scraping-software
https://towardsdatascience.com/6-web-scraping-tools-that-make-collecting-data-a-breeze-457c44e4411d
https://medium.com/analytics-vidhya/master-web-scraping-completly-from-zero-to-hero-38051423256b
1.Beautifulsoup https://www.freecodecamp.org/news/how-to-scrape-websites-with-python-and-beautifulsoup-5946935d93fe/
mechanicalsoup https://analyticsindiamag.com/mechanicalsoup-web-scraping-custom-dataset-tutorial/
2.Scrapy,PyScrappy,Pandas Datareader,Instaloader,lxml
3.Selenium https://www.freecodecamp.org/news/better-web-scraping-in-python-with-selenium-beautiful-soup-and-pandas-d6390592e251/
4.Requests (HTTP library to access data)
5.AUTOSCRAPER - https://github.com/alirezamika/autoscraper https://www.youtube.com/watch?v=9BQ353Yu1D0 https://www.analyticsvidhya.com/blog/2021/04/automate-web-scraping-using-python-autoscraper-library/
scrapeasy Scrape Any Website in Seconds with One Line of Code https://github.com/joelbarmettlerUZH/Scrapeasy
Scrap Images From E-Commerce Website Using AutoScraper https://www.analyticsvidhya.com/blog/2021/05/scrap-images-from-e-commerce-website-using-autoscraper-library/
amazon auto scraper library https://webautomation.io/
Listly https://www.listly.io/r/stdfr
FiftyOne Now easier to download and evaluate https://towardsdatascience.com/googles-open-images-now-easier-to-download-and-evaluate-with-fiftyone-615ce0482c02
webbot https://pypi.org/project/webbot/
gazpacho https://github.com/maxhumber/gazpacho
html_scraper_streamlit_app https://www.youtube.com/watch?v=6U5xJ3mXRKA&feature=youtu.be
6.Twitter scraping tool (twint or tweepy or tweetlib)-https://github.com/twintproject/twint
twitterscraper https://www.youtube.com/watch?v=MpIi4HtCiVk
twython https://github.com/ryanmcgrath/twython
twarc https://github.com/DocNow/twarc https://scholarslab.github.io/learn-twarc/01-quick-start.html
snscrape extract Twitter data https://github.com/JustAnotherArchivist/snscrape
Scweet A simple and unlimited twitter scraper https://github.com/Altimis/Scweet
GetOldTweets3,GoogleNews,snscrape
Scrape Twitter for Tweets https://github.com/taspinar/twitterscraper
HAR File Web Scraper https://stevesie.com/har-file-web-scraper https://www.youtube.com/watch?v=LcqVDfueb8g
https://analyticsindiamag.com/complete-tutorial-on-twint-twitter-scraping-without-twitters-api/
https://developer.twitter.com/en/docs
pytrends https://medium.com/nerd-for-tech/scraping-data-from-online-platforms-to-enhance-time-series-forecasts-6eec3c68636d
Scraping Instagram -instaloader https://thecleverprogrammer.com/2020/07/30/scraping-instagram-with-python/
Instascrape
Scrape LinkedIn Profiles with ProxyCurl API
Reddit Dataset Using PSAW and PRAW in Python
Scraping Reddit using Python Reddit API Wrapper (PRAW)
Scrape Wikipedia wikipedia https://www.thepythoncode.com/article/access-wikipedia-python
patang - Scrape Product details from eCommerce Sites with Puppeteer and DOM String https://www.youtube.com/watch?v=3sgxRmyOuXs
Download Wikipedia https://www.wikidata.org/wiki/Wikidata:Main_Page https://www.youtube.com/watch?v=hC1rY4lRY0s https://towardsdatascience.com/an-efficient-way-to-read-data-from-the-web-directly-into-python-a526a0b4f4cb
Web Scraping to Create a CSV File https://thecleverprogrammer.com/2020/08/08/web-scraping-to-create-csv/
Amazon Web Scraper, Amazon Auto Scraper
7.urllib
8.pattern
9.Octoparse Easy Web Scraping https://www.octoparse.com/
prowebscraper https://prowebscraper.com/features
Web scraper https://chrome.google.com/webstore/detail/web-scraper-free-web-scra/jnhgnonknehpejjnehehllkliplmbmhn?hl=en
ParseHub https://www.parsehub.com/ https://analyticsindiamag.com/parsehub-no-code-gui-based-web-scraping-tool/
PyScrappy https://github.com/mldsveda/PyScrappy https://www.analyticsvidhya.com/blog/2022/02/web-scraping-with-pyscrappy/
ScrapeSimple Website: https://www.scrapesimple.com
Content Grabber https://contentgrabber.com/Manual/understanding_the_concept.htm
Crawly https://crawly.diffbot.com/
Apify https://apify.com/
Mozenda Website: https://www.mozenda.com/
obsei https://github.com/lalitpagaria/obsei
Diffbot https://analyticsindiamag.com/diffbot/
Trustpilot,webhose,scrapingbot
lxml https://lxml.de/index.html#introduction
ScrapingBee https://analyticsindiamag.com/scrapingbee-api/
Scrape HTML tables https://www.youtube.com/watch?v=6U5xJ3mXRKA&feature=youtu.be or pd.read_html
requests-html https://github.com/kennethreitz/requests-html
newspaper https://github.com/codelucas/newspaper https://www.youtube.com/watch?v=Hfry5XnISyc
newspaper3k: https://newspaper.readthedocs.io # easily extract text from articles
newscatcher https://github.com/kotartemiy/newscatcher https://www.youtube.com/watch?v=pHzOuizZq4I
patang (extract product details) https://github.com/tejazz/patang
lisc https://github.com/lisc-tools/lisc
Helena WEB AUTOMATION FOR END USERS https://helena-lang.org/
pandas(read_html)
wget, curl, ParseHub, webhose, Octoparse, ScrapingBot, ScrapingBee, Common, Content Grabber, Docparser, Scraper API, Import.io, Altair Monarch, WebAutomation.io, WebScraper.io, Scrape.do, AvesAPI, Scrapingdog, Diffbot, Grepsr, Scrapy
HTML basics for web scraping,Web Scraping with Octoparse,Web Scraping with Selenium
10-best-web-scraping-tools https://www.scraperapi.com/blog/the-10-best-web-scraping-tools/
https://analyticsindiamag.com/complete-learning-path-to-web-scraping-with-all-major-tools/ https://www.kdnuggets.com/2018/02/web-scraping-tutorial-python.html
https://www.octoparse.com/ https://github.com/tirthajyoti/pydbgen https://www.mozenda.com/ https://www.mockaroo.com/ https://lionbridge.ai/ https://www.mturk.com/ https://appen.com/
11.GoogleImageCrawler,google_images_download,bing_image
https://www.freepik.com/popular-photos , https://stocksnap.io/ , https://www.pexels.com/ ,https://unsplash.com/ , https://pixabay.com/
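Most of the scraping tools listed above follow the same basic loop: fetch markup, walk the tags, and save the extracted fields. The sketch below shows that loop with only the Python standard library (`html.parser` and `csv`), as a stand-in for richer libraries such as BeautifulSoup or lxml. The HTML snippet and the `TableParser` class are hypothetical and exist only for this example.

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Inline HTML stands in for a downloaded page; for a live site the markup
# could be fetched first, e.g. with urllib.request.urlopen(url).read().decode().
page_html = """
<table>
  <tr><th>Product</th><th>Price</th></tr>
  <tr><td>Keyboard</td><td>49.99</td></tr>
  <tr><td>Mouse</td><td>19.50</td></tr>
</table>
"""

parser = TableParser()
parser.feed(page_html)

# Write the extracted rows as CSV text (a file object would work the same way).
buffer = io.StringIO()
csv.writer(buffer).writerows(parser.rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Before scraping a real site, check its terms of use and robots.txt, and prefer an official API where one exists.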
b.Web Crawling
https://python.libhunt.com/scrapy-alternatives
Flat Data https://octo.github.com/projects/flat-data
b.3rd party APIs
22 APIs every data scientist should learn https://www.springboard.com/library/data-science/top-apis-for-data-scientists/
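Collecting data from a third-party API usually means building a query URL and flattening the JSON response into records. The sketch below assumes a hypothetical endpoint (`api.example.com`) and a made-up response shape; real APIs differ in parameters, authentication, and paging, so treat the names here as placeholders.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only.
BASE_URL = "https://api.example.com/v1/observations"


def build_url(base, **params):
    """Append query parameters to a base URL in a stable (sorted) order."""
    return f"{base}?{urlencode(sorted(params.items()))}"


def parse_page(body):
    """Flatten one JSON response page into a list of (date, value) tuples."""
    payload = json.loads(body)
    return [(item["date"], item["value"]) for item in payload["results"]]


url = build_url(BASE_URL, country="IN", page=1)

# Canned response so the sketch runs offline; a live call might use
# urllib.request.urlopen(url).read() or requests.get(url).text instead.
sample_body = (
    '{"results": [{"date": "2021-01-01", "value": 3.2},'
    ' {"date": "2021-01-02", "value": 3.4}], "next": null}'
)
records = parse_page(sample_body)
print(url)
print(records)
```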
c.Creating your own data (primary data; manual collection, e.g. Google Docs, surveys, etc...)
d.ETL: awesome ETL https://github.com/pawl/awesome-etl#python https://github.com/achuthasubhash/awesome-etl
38x faster data pipelines with tf.data
d.Databases
Databases are of 2 kinds: SQL (relational) and NoSQL (non-relational) databases
SQL, SQLite, MySQL, MongoDB, MontyDB, Hadoop, Elasticsearch, Cassandra, Amazon S3, Hive, Google Bigtable, AWS DynamoDB, HBase, Oracle DB
sql https://mode.com/sql-tutorial/ https://www.w3schools.com/sql/
sql in python https://medium.com/jbennetcodes/how-to-rewrite-your-sql-queries-in-pandas-and-more-149d341fc53e
PyMongo https://analyticsindiamag.com/guide-to-pymongo-a-python-wrapper-for-mongodb/
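The difference between the two database kinds can be sketched in a few lines. The example below uses the standard-library `sqlite3` module for the SQL side and plain Python dictionaries as a stand-in for a document store such as MongoDB; the table, names, and cities are invented for illustration.

```python
import sqlite3

# SQL: a fixed schema, queried declaratively.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO users (name, city) VALUES (?, ?)",
    [("Asha", "Pune"), ("Ben", "Leeds"), ("Chen", "Pune")],
)
pune_users = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM users WHERE city = ? ORDER BY name", ("Pune",)
    )
]
conn.close()

# NoSQL (document style): each record carries its own shape, so fields
# can be nested or absent. Plain dicts stand in for stored documents here.
documents = [
    {"name": "Asha", "city": "Pune", "skills": ["sql", "python"]},
    {"name": "Ben", "address": {"city": "Leeds"}},
]
with_skills = [doc["name"] for doc in documents if "skills" in doc]

print(pune_users, with_skills)
```

The `?` placeholders let the driver handle quoting, which is safer than formatting values into the SQL string.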
Cloud AI Data labeling service https://cloud.google.com/ai-platform/data-labeling/docs?utm_source=youtube&utm_medium=Unpaidsocial&utm_campaign=guo-20200503-Data-Labeling
e.Online resources - ultimate resource https://datasetsearch.research.google.com/ https://medium.com/swlh/where-to-find-awesome-machine-learning-datasets-6bb909a3f350
10 BEST DATA COLLECTION TOOLS FOR EFFECTIVE RESULTS https://www.analyticsinsight.net/10-best-data-collection-tools-for-effective-results/
https://www.freecodecamp.org/news/https-medium-freecodecamp-org-best-free-open-data-sources-anyone-can-use-a65b514b0f2d/ https://research.google/tools/datasets/
Machine learning datasets https://www.datasetlist.com/ https://wiki.pathmind.com/open-datasets
https://guides.library.cmu.edu/az.php https://docs.microsoft.com/en-us/azure/azure-sql/public-data-sets https://registry.opendata.aws/ https://paperswithcode.com/datasets https://datasets.quantumstat.com/ https://www.quandl.com/ http://dataportals.org/ https://opendatamonitor.eu/frontend/web/index.php?r=dashboard%2Findex https://en.wikipedia.org/wiki/List_of_datasets_for_machine-learning_research https://www.quora.com/Where-can-I-find-large-datasets-open-to-the-public https://www.reddit.com/r/datasets/ https://ourworldindata.org/ https://data.worldbank.org/ https://data.world/ https://data.census.gov/cedsci/ https://data.seattle.gov/ https://www.openml.org/ https://visualdata.io/discovery
World’s Largest Data Platform https://worlddata.ai/
Awesome list of datasets in 100+ categories https://www.kdnuggets.com/2021/05/awesome-list-datasets.html
https://sebastianraschka.com/blog/2021/ml-dl-datasets.html https://enoumen.com/2021/04/23/data-sciences-datasets-data-visualization-data-analytics-big-data-data-lakes/
https://serokell.io/blog/best-machine-learning-datasets https://medium.com/@ODSC/25-excellent-machine-learning-open-datasets-940ca2124dfc
1)kaggle-https://www.kaggle.com/datasets , pip install kaggledatasets
Downloading Kaggle datasets directly into Google Colab -https://towardsdatascience.com/downloading-kaggle-datasets-directly-into-google-colab-c8f0f407d73a
How to Download Kaggle Datasets using Jupyter Notebook https://www.analyticsvidhya.com/blog/2021/04/how-to-download-kaggle-datasets-using-jupyter-notebook/
2)https://sebastianraschka.com/blog/2021/ml-dl-datasets.html
movielens-https://grouplens.org/datasets/movielens/latest/
DagsHub datasets https://dagshub.com/explore/datasets
100+ of the Best Free Data Sources For Your Next Project https://www.columnfivemedia.com/100-best-free-data-sources-infographic/
World and national data, maps & rankings https://knoema.com/atlas/sources
3)data.gov-https://data.gov.in/
4)uci-https://archive.ics.uci.edu/ml/datasets.php https://github.com/tirthajyoti/UCI-ML-API
5)Group Lens dataset https://grouplens.org/
Wikipedia ML Datasets https://en.wikipedia.org/wiki/List_of_datasets_for_machine-learning_research
AWS Open Data Registry,data.gov (portals),YELP Open dataset,UNICEF Dataset,Big Bad NLP Database,Microsoft Dataset
6)data.world https://data.world/ , World Bank
7)Google Cloud BigQuery public datasets
Google Public Datasets-cloud.google.com/bigquery/public-data/
Google Cloud Data Catalog https://cloud.google.com/data-catalog
Academic Torrents-https://academictorrents.com/check.htm?returnto=%2Fbrowse.php
8)Online hackathons
Datasets https://www.paperswithcode.com/datasets
9)image data from google_images_download
https://www.visualdata.io/discovery
http://xviewdataset.org/#dataset
https://ai.googleblog.com/2016/09/introducing-open-images-dataset.html
10)image data from Bing_Search
image data from simple_image_download https://github.com/RiddlerQ/simple_image_download
11)https://www.columnfivemedia.com/100-best-free-data-sources-infographic
graviti Unleash the Power of Unstructured Data https://www.graviti.com/?utm_medium=0730Ismael
12)Reddit:https://lnkd.in/dv5UCD4 https://www.reddit.com/r/datasets/
praw.Reddit https://github.com/praw-dev/praw
13)https://datasets.bifrost.ai/?ref=producthunt
14)data.world:https://lnkd.in/gEK897K
15)https://data.world/datasets/open-data
https://tinyletter.com/data-is-plural