K_means_clustering_airbnb

This case study clusters customers based on their profile data and consumption behaviour data, then analyzes the core characteristics of each user group.

Case study: Airbnb K-means clustering

Background

Airbnb serves a wide range of travel scenarios around the world. It collects comprehensive user behavior data through its app, its website and various marketing channels. This data is of great importance to Airbnb's development: it is the cornerstone for identifying potential target customer groups and formulating corresponding marketing strategies.

Analysis goal

Based on customer data and consumption behaviour data:

  • cluster customers into groups using a Jupyter notebook
  • analyze the core characteristics of each customer group

1. Data analysis

Field descriptions

  • id: unique user id
  • date_acc_reg: user registration date
  • date_first_booking: date of first booking
  • gender
  • age
  • married
  • children: number of children
  • ios: booked on an iPhone
  • android: booked on an Android phone
  • mobile_web: booked on the mobile website
  • web: booked on a computer
  • language_en: uses English
  • language_chn: uses Chinese
  • country_usa: destination is the USA
  • country_eur: destination is a European country

2. Univariate analysis

2.1 Handle outliers in numerical variables

Conclusion: User ages range from 18 to 80, with a mean of 36 and a median of 33. Users aged 28-32 are the main consumers.
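
The README does not show how the outliers were handled. A minimal sketch, assuming implausible ages are simply dropped so that the remaining values fall in the 18-80 range reported above (the sample rows are made up for illustration):

```python
import pandas as pd

# Hypothetical sample; the real dataset is not included in this README.
df = pd.DataFrame({"id": [1, 2, 3, 4, 5], "age": [25, 33, 5, 120, 80]})

# Keep only ages inside the 18-80 range (between() is inclusive on both ends).
clean = df[df["age"].between(18, 80)].reset_index(drop=True)

# Summary statistics (count, mean, median as the 50% row, min, max).
print(clean["age"].describe())
```

Other treatments, such as capping values at the boundaries, would be equally valid; dropping rows is only the simplest option.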

2.2 Categorical variable

  • 2.2.1 Adjust date variables
    1. extract the user registration date and change its dtype from object to datetime64
    2. compute how many years each user has been registered

Conclusion: 1. The shortest registration length is 7 years and the longest is 11 years. 2. The shortest time since the first booking is 6 years and the longest is 11 years.
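
The two date steps can be sketched as follows. The sample dates, the reference date and the new column names `years_registered` and `years_since_first_booking` are assumptions for illustration; the README does not state which reference date was used.

```python
import pandas as pd

# Hypothetical rows; dates are stored as strings (object dtype) at first.
df = pd.DataFrame({
    "date_acc_reg": ["2012-03-15", "2015-11-02"],
    "date_first_booking": ["2012-06-01", "2016-01-20"],
})

# Assumed reference date for measuring elapsed years.
ref = pd.Timestamp("2023-01-01")

# Step 1: convert object columns to datetime64.
for col in ["date_acc_reg", "date_first_booking"]:
    df[col] = pd.to_datetime(df[col])

# Step 2: elapsed calendar years since registration and since first booking.
df["years_registered"] = ref.year - df["date_acc_reg"].dt.year
df["years_since_first_booking"] = ref.year - df["date_first_booking"].dt.year

print(df.dtypes)
print(df[["years_registered", "years_since_first_booking"]])
```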

  • 2.2.2 Convert gender to dummy variables
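
A minimal sketch of the dummy encoding with `pandas.get_dummies`. The gender codes `F`, `M` and `U` are assumed values; the README does not list the actual categories.

```python
import pandas as pd

# Hypothetical gender codes for illustration.
df = pd.DataFrame({"gender": ["F", "M", "U", "F"]})

# One 0/1 column per category, named gender_<value>.
dummies = pd.get_dummies(df["gender"], prefix="gender", dtype=int)

# Replace the original text column with the dummy columns.
df = pd.concat([df.drop(columns="gender"), dummies], axis=1)

print(df)
```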

3. Correlation and visualization

3.1 Observe the relationship between age and the user's other personal information

[Figure: 3.1 heatmap view]

Conclusion:

    1. Age is positively correlated with language_en and children: older users are more likely to use English and to have more children, which suggests Airbnb is more popular among such families.
    2. Age is negatively correlated with country_usa: the older the user, the less likely their destination is the USA.
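
The numbers behind a heatmap like this come from a pairwise correlation matrix. A minimal sketch on synthetic data (the values are randomly generated for illustration and do not reproduce the README's results):

```python
import numpy as np
import pandas as pd

# Synthetic data: children is constructed to rise with age, country_usa is random.
rng = np.random.default_rng(0)
n = 200
age = rng.integers(18, 81, size=n)
df = pd.DataFrame({
    "age": age,
    "children": (age // 20) + rng.integers(0, 2, size=n),
    "country_usa": rng.integers(0, 2, size=n),
})

# Pearson correlation between every pair of columns.
corr = df.corr()

# Correlation of each other variable with age, strongest positive first.
age_corr = corr["age"].drop("age").sort_values(ascending=False)
print(age_corr)
```

Passing `corr` to a plotting function such as seaborn's `heatmap` is one way to draw the matrix as a colored grid.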

3.2 Observe the relationship between age and the user's ordering channel and gender

[Figure: 3.2 heatmap view]

Conclusion:

    1. As age increases, users are more inclined to order on a computer.
    2. Older users tend to order on Android phones, while younger users tend to order on iPhones.
    3. Male users prefer to order on the website and are less inclined to order on Android phones.
    4. The correlations between age and the ordering channel and gender are too weak to be of much significance for subsequent analysis.

4. Model establishment and evaluation

4.1 Establish model

Features are selected to reflect both the user's behavioral preferences and the user's personal information:

  • android, mobile_web, web and ios reflect the customer's behavioral preferences
  • age represents the user's personal information
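
A minimal sketch of fitting K-means on these five features with scikit-learn. The data is synthetic, and both the standardization step and `n_clusters=3` are assumptions; the README does not state the preprocessing or the initial number of clusters.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real user table.
rng = np.random.default_rng(42)
n = 300
df = pd.DataFrame({
    "age": rng.integers(18, 81, size=n),
    "ios": rng.integers(0, 2, size=n),
    "android": rng.integers(0, 2, size=n),
    "mobile_web": rng.integers(0, 2, size=n),
    "web": rng.integers(0, 2, size=n),
})

features = ["age", "ios", "android", "mobile_web", "web"]

# Standardize so that age (range 18-80) does not dominate the 0/1 flags.
X = StandardScaler().fit_transform(df[features])

# Assumed cluster count; fixed random_state for repeatable labels.
model = KMeans(n_clusters=3, n_init=10, random_state=0)
df["cluster"] = model.fit_predict(X)

print(df["cluster"].value_counts())
```

Scaling matters here because K-means uses Euclidean distance, so an unscaled age column would outweigh the binary channel flags.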

4.2 Data visualization: view the univariate analysis results

The ios variable takes only two values, 0 and 1, so its plot is not very informative.

4.3 Model evaluation

  • cluster 0 prefers to order on the website and orders less on Android phones.
  • cluster 2 prefers iOS; very few of its users order on the website.
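
Statements like these are typically read off a per-cluster profile, that is, the mean of each feature within each cluster. A minimal sketch, using a small hand-made table whose labels were chosen to mimic the pattern described above (it is not the README's data):

```python
import pandas as pd

# Hand-made rows with cluster labels already assigned, for illustration only.
df = pd.DataFrame({
    "cluster": [0, 0, 1, 1, 2, 2],
    "age":     [30, 34, 45, 47, 26, 28],
    "ios":     [0, 0, 0, 1, 1, 1],
    "android": [0, 0, 1, 1, 0, 0],
    "web":     [1, 1, 0, 0, 0, 0],
})

# Mean of every feature per cluster; for 0/1 flags this is the share of users.
profile = df.groupby("cluster").mean()
print(profile)
```

For the binary channel flags, a mean close to 1 indicates that almost everyone in the cluster uses that channel, and a mean close to 0 indicates that almost no one does.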

4.4 Model optimization

  • age tends to be older
  • cluster 1 prefers to order on the website and orders less on Android phones.
  • cluster 2 does not like to order on the website.
  • cluster 3 prefers iOS; very few of its users order on the website.
  • cluster 4 is the oldest group, and the difference between clusters 4 and 0 is very small, which is of little significance for practical analysis.
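
The README does not say which criterion was used to choose the number of clusters. One common option is to compare silhouette scores across candidate values of k; the sketch below does this on three synthetic, well-separated blobs, so it illustrates the technique rather than the author's procedure.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Three synthetic 2-D blobs, 50 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [6.0, 6.0], [0.0, 6.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# Fit K-means for each candidate k and record the mean silhouette score.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# Higher silhouette means tighter, better-separated clusters.
best_k = max(scores, key=scores.get)
print(scores)
print("best k:", best_k)
```

When two clusters are nearly indistinguishable, as noted for clusters 4 and 0 above, a lower k is often the more useful choice in practice.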

5. Summary

  • Focus on heavy Airbnb users aged 28-32 who have been registered for 6-7 years, and develop corresponding marketing strategies for customers with low responsiveness.

  • Age is positively correlated with language_en and children: older users are more likely to use English and to have more children, which suggests Airbnb is more popular among such families.

  • Age is negatively correlated with country_usa: the older the user, the less likely their destination is the USA.

  • As age increases, users are more inclined to order on a computer.

  • Older users tend to order on Android phones, while younger users tend to order on iPhones.

  • Male users prefer to order on the website and are less inclined to order on Android phones.

  • The correlations between age and the ordering channel and gender are too weak to be of much significance for subsequent analysis.
