Should we hire new staff?

Hiring staff can be a complicated and time-consuming task: depending on the position, candidates must meet certain requirements and bring a range of other skills. Finding the ideal candidate can take a long time; some studies report that the process can last up to 52 days.

Listed below are some points to take into consideration when hiring new employees.

Within organizations, hiring and retaining employees are complex tasks that require capital, time and skills.
Heads of Human Resources departments spend approximately 40% of their working hours on tasks related to hiring and dismissing employees, tasks that generally do not generate income for the organization.
Companies can spend between 15% and 20% of an employee's salary to hire a new candidate.
Depending on the organization's line of business, some positions require the company to invest in equipment for the new employee, such as personal protective equipment. If that employee later leaves and does not return the equipment, the investment is lost to the company.
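The 15% to 20% range translates directly into a cost estimate per hire. A minimal sketch, where `hiring_cost_range` is a hypothetical helper and the annual salary of 50,000 is an illustrative figure, not data from the post:

```python
def hiring_cost_range(annual_salary, low_pct=15, high_pct=20):
    """Return the (low, high) hiring cost implied by the 15%-20% figure.

    Percentages are kept as integers so the arithmetic stays exact.
    """
    low = annual_salary * low_pct / 100
    high = annual_salary * high_pct / 100
    return low, high


# Hypothetical salary, for illustration only
low, high = hiring_cost_range(50_000)
print(f"Estimated hiring cost: {low:,.0f} to {high:,.0f}")
# prints "Estimated hiring cost: 7,500 to 10,000"
```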

A database has been built with information on the resignations submitted in a company. To tackle the problem of staff attrition, the analysis aims to predict which employees are likely to resign.

An early warning can help the company put solutions in place to retain the employee and avoid hiring a replacement.

The dataset provided by the department includes information on the following items:

Job involvement
Education
Job satisfaction
Performance rating
Relationship satisfaction
Work-life balance
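A first look at these items is to compare their average scores between employees who left and those who stayed. This sketch assumes the items above map to the dataset columns `JobInvolvement`, `Education`, `JobSatisfaction`, `PerformanceRating`, `RelationshipSatisfaction` and `WorkLifeBalance`; the helper name and the toy rows are illustrative only, and in the project the function would receive `employee_df`:

```python
import pandas as pd

# Assumed column names for the items listed above
SURVEY_COLUMNS = [
    "JobInvolvement",
    "Education",
    "JobSatisfaction",
    "PerformanceRating",
    "RelationshipSatisfaction",
    "WorkLifeBalance",
]


def mean_scores_by_attrition(df, columns=SURVEY_COLUMNS):
    """Average each score separately for leavers ('Yes') and stayers ('No')."""
    return df.groupby("Attrition")[list(columns)].mean()


# Toy rows for illustration only, not values from the real dataset
toy = pd.DataFrame({
    "Attrition": ["Yes", "No", "No", "Yes"],
    "JobInvolvement": [2, 3, 3, 1],
    "Education": [2, 1, 4, 3],
    "JobSatisfaction": [1, 4, 3, 2],
    "PerformanceRating": [3, 4, 3, 3],
    "RelationshipSatisfaction": [1, 4, 2, 3],
    "WorkLifeBalance": [1, 3, 3, 2],
})
print(mean_scores_by_attrition(toy))
```

A noticeably lower mean among leavers hints that a variable is worth exploring further; it does not by itself establish a cause.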

The analysis uses an open dataset: IBM HR Analytics Employee Attrition & Performance.

Understanding the data

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib
matplotlib.style.use('ggplot')

# Importing the data
employee_df = pd.read_csv('./Human_Resources.csv')
employee_df.head()

	Age	Attrition	BusinessTravel	DailyRate	Department	DistanceFromHome	Education	EducationField	EmployeeCount	EmployeeNumber	...	RelationshipSatisfaction	StandardHours	StockOptionLevel	TotalWorkingYears	TrainingTimesLastYear	WorkLifeBalance	YearsAtCompany	YearsInCurrentRole	YearsSinceLastPromotion	YearsWithCurrManager
0	41	Yes	Travel_Rarely	1102	Sales	1	2	Life Sciences	1	1	...	1	80	0	8	0	1	6	4	0	5
1	49	No	Travel_Frequently	279	Research & Development	8	1	Life Sciences	1	2	...	4	80	1	10	3	3	10	7	1	7
2	37	Yes	Travel_Rarely	1373	Research & Development	2	2	Other	1	4	...	2	80	0	7	3	3	0	0	0	0
3	33	No	Travel_Frequently	1392	Research & Development	3	4	Life Sciences	1	5	...	3	80	0	8	3	3	8	7	3	0
4	27	No	Travel_Rarely	591	Research & Development	2	1	Medical	1	7	...	4	80	1	6	3	3	2	2	2	2

The dataset contains 1470 samples and 35 variables (features).
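These dimensions, along with a count of missing values, can be checked before any modeling. A minimal sketch: `summarize` is a hypothetical helper, and the toy frame stands in for `employee_df`; on the real file, the description above implies 1470 rows and 35 columns.

```python
import pandas as pd


def summarize(df):
    """Report row count, column count and total missing values of a DataFrame."""
    n_rows, n_cols = df.shape
    n_missing = int(df.isnull().sum().sum())
    return {"rows": n_rows, "columns": n_cols, "missing_values": n_missing}


# Toy frame for illustration; in the project this would be summarize(employee_df)
toy = pd.DataFrame({"Age": [41, 49, None], "Attrition": ["Yes", "No", "Yes"]})
print(summarize(toy))
# prints {'rows': 3, 'columns': 2, 'missing_values': 1}
```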

With the dataset imported, the analysis can begin. An exploratory data analysis is needed before applying any machine learning algorithm. The project shows how the variables considered most important influence decision-making, analyzing them through visualizations and drawing conclusions from them.
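Two checks commonly open an exploratory analysis of attrition: the share of employees who left (how imbalanced the target is) and columns that hold a single value and therefore cannot separate leavers from stayers. In the five rows shown above, `EmployeeCount` and `StandardHours` repeat the same value; whether they are constant across the whole file is something a helper like the one below can confirm. The function names and toy rows are illustrative assumptions:

```python
import pandas as pd


def attrition_rate(df):
    """Fraction of employees whose Attrition value is 'Yes'."""
    return float((df["Attrition"] == "Yes").mean())


def constant_columns(df):
    """Columns holding a single value, counting missing values as a value."""
    return [col for col in df.columns if df[col].nunique(dropna=False) == 1]


# Toy rows for illustration; in the project, pass employee_df instead
toy = pd.DataFrame({
    "Age": [41, 49, 37, 33],
    "Attrition": ["Yes", "No", "Yes", "No"],
    "EmployeeCount": [1, 1, 1, 1],
    "StandardHours": [80, 80, 80, 80],
})
print(attrition_rate(toy))    # prints 0.5
print(constant_columns(toy))  # prints ['EmployeeCount', 'StandardHours']
```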

In this project, three different algorithms are applied. Each one produces different results, and the data scientists will be in charge of selecting the best one to apply in the company. We will analyze the data and apply:

Logistic Regression.
Random Forest.
Logistic Neural network.
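The first two models in the list can be sketched with scikit-learn. This is a minimal sketch, not the project's code: the library, the one-hot encoding with `pd.get_dummies`, the 75/25 stratified split, the hyperparameters and the helper names `prepare_features` and `compare_models` are all assumptions. The neural network is left out because the text does not describe its architecture. The synthetic frame only shows that the calls run, so its scores carry no meaning; with `employee_df`, identifier and constant columns such as `EmployeeNumber` would normally be dropped first.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def prepare_features(df):
    """Encode Attrition as 0/1 and one-hot encode the categorical columns."""
    y = (df["Attrition"] == "Yes").astype(int)
    X = pd.get_dummies(df.drop(columns=["Attrition"]), drop_first=True).astype(float)
    return X, y


def compare_models(df, random_state=42):
    """Fit logistic regression and a random forest; return test-set accuracy."""
    X, y = prepare_features(df)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )
    models = {
        # Scaling helps the logistic regression solver converge
        "Logistic Regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "Random Forest": RandomForestClassifier(
            n_estimators=200, random_state=random_state
        ),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


# Synthetic data, only to show the calls run; use employee_df in practice
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "Age": rng.integers(18, 60, n),
    "MonthlyIncome": rng.integers(1000, 20000, n),
    "OverTime": rng.choice(["Yes", "No"], n),
    "Attrition": rng.choice(["Yes", "No"], n, p=[0.2, 0.8]),
})
print(compare_models(demo))
```

Because leavers are usually a minority, accuracy alone can look good even for a model that rarely flags anyone; recall or F1 on the "Yes" class (`sklearn.metrics.recall_score`, `f1_score`) is often a better basis for choosing between models.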

I invite you to explore the topic further by reading the complete project, available in PDF format, or by visiting its repository to access the Jupyter Notebook file.

Extra material regarding the post

A .pdf version of this post is available, and you can also access its GitHub repository.