This project uses a dataset of players from Sngame Company. The aim is to predict how much time these players spend on the game and how much they spend on in-game purchases. I used regression, specifically two linear models.
#verify data set and variables
#Summary of the data set
#Describe the data set to get the number of valid cases, missing values, standard deviation, median absolute deviation (mad), mean, median, min, max, range and standard error
- The dataset has 300 observations in total: 154 female players and 146 male players. Ages range from 20 to 39 years.
- The mean age is 26.79 years.
- 4 players have an elementary school education, 100 a high school education, and 196 a university education.
- The minimum salary is 1012, the maximum is 4986, and the mean is 2973.
- The mean number of social network connections is 2670, and the median is 2592 (sn.conn).
- The mean number of minutes that players spend on the social network is 159.54 (sn.min), whereas the mean number of minutes they spend playing games is only 30.921 (game.min).
- The mean amount of money that players spend on features, upgrades, power-ups, etc. in the games is 18.97 (game.purchase).
    gender         age                       edu          salary        sn.conn
 Female:154   Min.   :20.00   Elementary School:  4   Min.   :1012   Min.   :1800
 Male  :146   1st Qu.:22.00   High School      :100   1st Qu.:1866   1st Qu.:2208
              Median :26.00   University       :196   Median :3028   Median :2592
              Mean   :26.79                           Mean   :2973   Mean   :2670
              3rd Qu.:31.00                           3rd Qu.:3864   3rd Qu.:3032
              Max.   :39.00                           Max.   :4986   Max.   :4028
     sn.min          game.min      game.purchase
 Min.   : 66.98   Min.   : 8.092   Min.   : 1.00
 1st Qu.:128.46   1st Qu.:20.857   1st Qu.: 7.00
 Median :150.90   Median :29.669   Median :14.00
 Mean   :159.54   Mean   :30.921   Mean   :18.97
 3rd Qu.:184.45   3rd Qu.:38.689   3rd Qu.:25.00
 Max.   :296.08   Max.   :81.466   Max.   :88.00
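The descriptive statistics listed above (valid cases, missing values, standard deviation, mad, and so on) can be sketched in base R as follows. This is only an illustration: `sngame_demo` is a hypothetical stand-in with simulated values, not the real data, and the original analysis may have used a dedicated describe function from a package instead.

```r
# Hypothetical stand-in data: simulated values, NOT the real sngame data set
set.seed(1)
sngame_demo <- data.frame(
  gender        = factor(rep(c("Female", "Male"), each = 25)),
  age           = sample(20:39, 50, replace = TRUE),
  salary        = round(runif(50, 1000, 5000)),
  game.min      = round(runif(50, 8, 80), 3),
  game.purchase = sample(1:88, 50, replace = TRUE)
)

# Check the structure and the type of each variable
str(sngame_demo)

# Quartiles and mean for numeric columns, counts for factors
summary(sngame_demo)

# Additional descriptives for the numeric columns only
num_cols <- sapply(sngame_demo, is.numeric)
descriptives <- t(sapply(sngame_demo[num_cols], function(x) {
  n_valid <- sum(!is.na(x))
  c(n       = n_valid,
    missing = sum(is.na(x)),
    mean    = mean(x, na.rm = TRUE),
    sd      = sd(x, na.rm = TRUE),
    median  = median(x, na.rm = TRUE),
    mad     = mad(x, na.rm = TRUE),
    min     = min(x, na.rm = TRUE),
    max     = max(x, na.rm = TRUE),
    range   = diff(range(x, na.rm = TRUE)),
    se      = sd(x, na.rm = TRUE) / sqrt(n_valid))
}))
print(round(descriptives, 2))
```

Each row of `descriptives` is one variable and each column one statistic, which makes it easy to compare variables side by side.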
#create the data frame containing only numerical variables: age, salary, game.min, game.purchase
sngame_num <- sngame[c(2,4,7,8)]
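Selecting columns by position depends on the column order staying fixed. As a minimal sketch, the same subset can be taken by name. The data frame `sngame_demo` below is a hypothetical stand-in with made-up values; only its column layout follows the summary output above.

```r
# Hypothetical stand-in for the real data (made-up values, same column order)
sngame_demo <- data.frame(
  gender        = factor(c("Female", "Male", "Female", "Male")),
  age           = c(22, 31, 26, 39),
  edu           = factor(c("University", "High School", "University", "University")),
  salary        = c(1866, 3028, 3864, 4986),
  sn.conn       = c(1800, 2208, 2592, 3032),
  sn.min        = c(128.46, 150.90, 184.45, 296.08),
  game.min      = c(20.857, 29.669, 38.689, 81.466),
  game.purchase = c(7, 14, 25, 88)
)

# Select the four numeric variables by name instead of by position
num_vars <- c("age", "salary", "game.min", "game.purchase")
sngame_num_demo <- sngame_demo[num_vars]

# Positional selection agrees as long as the column order is unchanged
identical(sngame_num_demo, sngame_demo[c(2, 4, 7, 8)])
```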
#Calculate the correlation for all pairs of variables
#Test the correlation of all numerical variables and the probability values of these correlations, in particular the p-values of the two pairs (game.min & game.purchase and game.min & age) that showed a high correlation earlier
This plot shows a positive association between time spent on the game and the age of players (r = 0.51): older players tend to spend more time playing the game.
This plot shows a positive correlation between salary and game purchase (r = 0.51): players who earn a higher salary tend to purchase more upgrades and extra features in the game.
This plot shows a positive correlation between the game.purchase and game.min variables (r = 0.78): players who spend more time playing the game tend to spend more money on game purchases.
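Scatter plots like the three described above could be produced roughly as follows. This is a sketch under assumptions: the data are simulated, `scatter_with_trend` is a hypothetical helper introduced here for illustration, and the original plots may have been drawn with a different plotting function.

```r
# Simulated stand-in data (hypothetical values, NOT the real sngame data)
set.seed(42)
n <- 100
age           <- sample(20:39, n, replace = TRUE)
salary        <- round(runif(n, 1000, 5000))
game.min      <- 5 + 0.9 * age + rnorm(n, sd = 8)
game.purchase <- pmax(1, round(0.004 * salary + 0.5 * game.min + rnorm(n, sd = 4)))
sngame_demo   <- data.frame(age, salary, game.min, game.purchase)

# Hypothetical helper: scatter plot of y against x with a least-squares trend line
scatter_with_trend <- function(data, x, y) {
  plot(data[[x]], data[[y]], xlab = x, ylab = y, main = paste(y, "vs", x))
  fit <- lm(data[[y]] ~ data[[x]])
  abline(fit, col = "red")
  invisible(fit)
}

fit_age    <- scatter_with_trend(sngame_demo, "age", "game.min")
fit_salary <- scatter_with_trend(sngame_demo, "salary", "game.purchase")
fit_min    <- scatter_with_trend(sngame_demo, "game.min", "game.purchase")

# The slope of each trend line; a positive value means an upward trend
coef(fit_age)[2]
```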
The correlation matrix between the variables is as follows:
                     age       salary    game.min game.purchase
age            1.0000000 -0.035873701 0.510517894     0.3063233
salary        -0.0358737  1.000000000 0.007457177     0.5062324
game.min       0.5105179  0.007457177 1.000000000     0.7839289
game.purchase  0.3063233  0.506232439 0.783928866     1.0000000
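A matrix of this shape, together with p-values for individual pairs, can be obtained with base R's `cor()` and `cor.test()`. The sketch below runs on simulated stand-in data (`sngame_num_demo` is hypothetical), so its numbers will not match the matrix above.

```r
# Simulated stand-in for the numeric subset (hypothetical values)
set.seed(7)
n <- 100
age           <- sample(20:39, n, replace = TRUE)
salary        <- round(runif(n, 1000, 5000))
game.min      <- 5 + 0.9 * age + rnorm(n, sd = 8)
game.purchase <- pmax(1, round(0.004 * salary + 0.5 * game.min + rnorm(n, sd = 4)))
sngame_num_demo <- data.frame(age, salary, game.min, game.purchase)

# Pearson correlation for all pairs of variables
r_mat <- cor(sngame_num_demo)
print(round(r_mat, 3))

# Significance tests for the two pairs of interest
ct_purchase <- cor.test(sngame_num_demo$game.min, sngame_num_demo$game.purchase)
ct_age      <- cor.test(sngame_num_demo$game.min, sngame_num_demo$age)

# Correlation estimate and p-value for each pair
c(r = unname(ct_purchase$estimate), p = ct_purchase$p.value)
c(r = unname(ct_age$estimate), p = ct_age$p.value)
```

`cor.test()` tests one pair at a time; checking many pairs this way raises the chance of a spurious significant result, so the p-values are best read together with the size of the correlation.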
#t-test to check whether there is a difference between the two groups
We conducted a t-test to check whether the mean of game.min differs between male and female players. The t-test showed no significant difference (p=0.85).
We also used a t-test to check whether the mean of game.purchase differs between male and female players. Again, the t-test showed no significant difference (p=0.64).
Therefore, we find no evidence that the gender of a player affects the time or money they spend on the game.
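The two comparisons above can be sketched with base R's `t.test()` and its formula interface. The data here are simulated with no built-in gender effect and are purely hypothetical. Note that `t.test()` runs Welch's unequal-variance test by default, and the source does not say which variant was used.

```r
# Simulated stand-in data with no gender effect (hypothetical values)
set.seed(3)
sngame_demo <- data.frame(
  gender        = factor(rep(c("Female", "Male"), each = 30)),
  game.min      = rnorm(60, mean = 31, sd = 12),
  game.purchase = rnorm(60, mean = 19, sd = 10)
)

# Two-sample t-tests: does the group mean differ between genders?
tt_min      <- t.test(game.min ~ gender, data = sngame_demo)
tt_purchase <- t.test(game.purchase ~ gender, data = sngame_demo)

# Group means and p-value for each test
tt_min$estimate
tt_min$p.value
tt_purchase$estimate
tt_purchase$p.value
```

A large p-value means the data give no evidence of a difference in means; it does not prove that the two groups are identical.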
#Graphical analysis of age, salary, game.min and game.purchase to identify the peak value of each variable
In the Sngame Company dataset, most players are in their 20s.
By salary, two groups contain the highest number of players: those who earn 1500-2000 euro and those who earn 3500-4000 euro.
According to the diagram, the most common group is players who spend on average 20 to 40 minutes playing the game.
The diagram shows that players most commonly spend around 15-20 euro on game upgrades and extra features.
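Reading the peak of each variable from a histogram can be sketched as below. The data are simulated and `peak_bin` is a hypothetical helper added for illustration. The bin that comes out as the peak depends on the break points `hist()` chooses.

```r
# Simulated stand-in data (hypothetical values, NOT the real sngame data)
set.seed(5)
sngame_demo <- data.frame(
  age           = sample(20:39, 100, replace = TRUE),
  salary        = round(runif(100, 1000, 5000)),
  game.min      = round(runif(100, 8, 80), 3),
  game.purchase = sample(1:88, 100, replace = TRUE)
)

# Draw one histogram per variable in a 2 x 2 grid
op <- par(mfrow = c(2, 2))
for (v in names(sngame_demo)) {
  hist(sngame_demo[[v]], main = paste("Histogram of", v), xlab = v)
}
par(op)

# Hypothetical helper: bounds and count of the fullest histogram bin
peak_bin <- function(x) {
  h <- hist(x, plot = FALSE)
  i <- which.max(h$counts)
  c(from = h$breaks[i], to = h$breaks[i + 1], count = h$counts[i])
}

# One column per variable: where the histogram peaks
peaks <- sapply(sngame_demo, peak_bin)
print(peaks)
```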