How to find outliers:
import matplotlib
%matplotlib inline
import seaborn as sns
# Pass the column by keyword so the plot is drawn along the x-axis in all seaborn versions
sns.boxplot(x=adult_data['age']);
An age of 90 is an extreme value. Most likely, it should be removed from the dataset, as it may distort our analysis. However, whether to treat it as an outlier depends on business knowledge. You may need to study these people separately if your problem statement requires it. Here, we will remove this observation from the data.
adult_data = adult_data[adult_data['age'] < 90]
adult_data.shape
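With its default settings, a seaborn box plot draws as individual points any values that lie more than 1.5 times the interquartile range (IQR) beyond the quartiles. The sketch below computes those fences by hand so the cut-off can be inspected as numbers. The `ages` series is a small made-up sample, not the real `adult_data['age']` column, and the 1.5 multiplier is only the conventional default, not a rule the analysis must follow.

```python
import pandas as pd

# Hypothetical stand-in for adult_data['age']; swap in the real column.
ages = pd.Series([17, 23, 28, 35, 37, 41, 48, 52, 90], name="age")

# Quartiles and interquartile range
q1 = ages.quantile(0.25)
q3 = ages.quantile(0.75)
iqr = q3 - q1

# Fences at 1.5 * IQR beyond the quartiles (the usual box plot convention)
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values outside the fences are the candidate outliers
outliers = ages[(ages < lower_fence) | (ages > upper_fence)]
print(f"fences: [{lower_fence}, {upper_fence}]")
print("candidate outliers:", outliers.tolist())
```

The fences only nominate candidates. Whether to drop them remains a judgment call based on the problem at hand.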
describe()
adult_data.describe()
You can look at the spread of the data (min, max, median, etc.) and decide whether to keep or remove data beyond some threshold. For example, keep the data between the 0.5th and 99.5th percentiles and remove the rest.
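The percentile rule described above could be applied roughly as follows. This is a minimal sketch on a made-up frame `df` with a single `age` column holding the values 1 to 1000. The 0.5% and 99.5% cut-offs are the example thresholds from the text, not recommended values.

```python
import pandas as pd

# Hypothetical data: 1000 rows with values 1..1000 in an 'age' column.
df = pd.DataFrame({"age": list(range(1, 1001))})

# Lower and upper cut-offs at the 0.5th and 99.5th percentiles
low, high = df["age"].quantile([0.005, 0.995])

# Keep only the rows whose value lies within the cut-offs (inclusive)
trimmed = df[df["age"].between(low, high)]

print(f"kept {len(trimmed)} of {len(df)} rows")
```

The same pattern works on any numeric column. Comparing `trimmed.shape` with `df.shape` shows how many rows the threshold removes.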
value_counts() - Let's draw a bar plot for 'relationship' and see whether any category looks peculiar.
adult_data['relationship'].value_counts().plot(kind='bar')
We can keep all the values here, as nothing looks odd.
We can do the same analysis on other columns.
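One way to repeat the check on the other columns is to loop over the non-numeric columns and look at the share of each category. The sketch below uses a small made-up frame `df` in place of `adult_data`. The `rare_threshold` value is a hypothetical cut-off chosen for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for adult_data with two categorical columns.
df = pd.DataFrame({
    "relationship": ["Husband", "Wife", "Husband", "Own-child", "Husband", "Wife"],
    "sex": ["Male", "Female", "Male", "Female", "Male", "Female"],
    "age": [39, 31, 52, 19, 44, 28],
})

# Treat every non-numeric column as categorical
categorical_columns = df.select_dtypes(exclude="number").columns

# Share of rows per category, for each categorical column
shares = {col: df[col].value_counts(normalize=True) for col in categorical_columns}

# Flag categories that fall below an illustrative share threshold
rare_threshold = 0.2  # hypothetical value, tune for your data
rare = {col: s[s < rare_threshold].index.tolist() for col, s in shares.items()}

for col in categorical_columns:
    print(col, "rare categories:", rare[col])
```

A flagged category is not necessarily wrong. It is only a prompt to look more closely, for example with the bar plot used for 'relationship'.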
Removing the row with age = 90 above was a data preparation step.