Feature Engineering in Machine Learning - Data Science

Feature Engineering is the process of extracting features, or predictor variables, from the available datasets.

It is probably the most important and most difficult part of building Data Science models. The following are the key points to remember for Feature Engineering:

1.       Learn enough about the domain before getting into feature extraction.
2.       Try to extract features that help you predict the outcome, i.e. the class variable.
3.       Extract as many features as you can.
4.       Before feeding data to a Machine Learning algorithm, make sure that each row represents a unique entity and each column represents a unique feature.
5.       Descriptive statistics play a major role in feature engineering.
6.       The features extracted may vary from one Data Scientist to another, since they depend largely on the creativity of the individual.
7.       Many researchers worry about the importance of the features they extract, but once feature extraction is done, there are many statistical techniques and machine learning algorithms that help identify the important ones.
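Point 7 can be illustrated with one simple statistical technique: ranking each numeric feature by the absolute value of its Pearson correlation with a 0/1 class label. This is a minimal plain-Python sketch; the feature names (`visits_30d`, `spend_30d`, `days_since_signup`) and the four customer rows are made-up illustrations, not data from the project described here. Correlation only captures linear relationships, so tree-based importances (e.g. scikit-learn's `feature_importances_` on a random forest) or mutual information are common alternatives.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def rank_features(rows, target):
    """Rank numeric feature columns by |correlation| with the target column.

    rows: list of dicts, one per entity (e.g. one per customer).
    target: name of the 0/1 label column.
    """
    labels = [row[target] for row in rows]
    features = [name for name in rows[0] if name != target]
    scores = {
        name: abs(pearson([row[name] for row in rows], labels))
        for name in features
    }
    # Highest score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Usage with made-up, hypothetical customers (1 = loyal, 0 = not loyal).
rows = [
    {"visits_30d": 9, "spend_30d": 310.0, "days_since_signup": 40, "loyal": 1},
    {"visits_30d": 1, "spend_30d": 20.0, "days_since_signup": 35, "loyal": 0},
    {"visits_30d": 7, "spend_30d": 250.0, "days_since_signup": 50, "loyal": 1},
    {"visits_30d": 2, "spend_30d": 45.0, "days_since_signup": 45, "loyal": 0},
]
ranking = rank_features(rows, "loyal")
print([name for name, _ in ranking])  # ['spend_30d', 'visits_30d', 'days_since_signup']
```

With only four rows the scores themselves mean little; the sketch just shows the mechanics of scoring and sorting candidate features.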

Example:

Loyal Customer Analysis:

Problem Statement:

                Identify loyal customers from historical demographic, transaction, and offer data.

Suppose we have the following data from a retail store:

1.       Transactions: Customer ID, Transaction ID, Product, Brand, Category, Company, Date, Quantity
2.       Demographics: Customer ID, Demographic details
3.       Offers: Offer ID, Offer details

Now think for a while about which features should be extracted to identify loyal customers. In the retail domain, the following features need to be extracted:
1.       Recency (days since the most recent visit)
2.       Frequency of visits over the past 7, 14, 30, 60, 90 and 180 days
3.       Monetary value spent over the past 7, 14, 30, 60, 90 and 180 days
4.       Quantity bought
5.       Favorite Category
6.       Favorite Brand
7.       Favorite Company
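A minimal plain-Python sketch of computing features 1 to 4 from the transaction data. It assumes each transaction row is a dict with `customer_id`, `transaction_id`, `date`, `quantity` and a hypothetical `amount` column (the transaction data listed above has no price field, so monetary value needs one from somewhere). Counting a visit as a distinct transaction ID, and treating an N-day window as the N days ending on the reference date `as_of`, are also assumptions.

```python
from collections import defaultdict
from datetime import date

# Look-back windows in days, taken from the feature list above.
WINDOWS = (7, 14, 30, 60, 90, 180)


def rfm_features(transactions, as_of):
    """Return {customer_id: feature dict} with recency, windowed frequency
    and monetary value, and total quantity.

    Each transaction is a dict with customer_id, transaction_id, date
    (datetime.date), quantity and amount. `amount` is a HYPOTHETICAL spend
    column. All dates are assumed to be on or before `as_of`.
    """
    by_customer = defaultdict(list)
    for t in transactions:
        by_customer[t["customer_id"]].append(t)

    features = {}
    for customer_id, txns in by_customer.items():
        row = {
            # Days since the most recent transaction.
            "recency_days": min((as_of - t["date"]).days for t in txns),
            "quantity": sum(t["quantity"] for t in txns),
        }
        for w in WINDOWS:
            # Rows from the last w days, counting as_of itself as day 0.
            recent = [t for t in txns if 0 <= (as_of - t["date"]).days < w]
            # One visit = one distinct transaction ID (a basket can span rows).
            row[f"frequency_{w}d"] = len({t["transaction_id"] for t in recent})
            row[f"monetary_{w}d"] = sum(t["amount"] for t in recent)
        features[customer_id] = row
    return features


# Usage with made-up transactions.
txns = [
    {"customer_id": "C1", "transaction_id": "T1", "date": date(2024, 3, 30), "quantity": 2, "amount": 10.0},
    {"customer_id": "C1", "transaction_id": "T1", "date": date(2024, 3, 30), "quantity": 1, "amount": 5.0},
    {"customer_id": "C1", "transaction_id": "T2", "date": date(2024, 2, 20), "quantity": 3, "amount": 30.0},
    {"customer_id": "C2", "transaction_id": "T3", "date": date(2024, 1, 15), "quantity": 1, "amount": 8.0},
]
feats = rfm_features(txns, as_of=date(2024, 3, 31))
print(feats["C1"]["recency_days"], feats["C1"]["frequency_60d"])  # 1 2
```

Counting distinct transaction IDs rather than rows keeps a single basket with several products from inflating the visit frequency.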

Finally, the input to the machine learning algorithm looks something like this:

Customer ID, Recency, Frequency of visits, Monetary value, Quantity, Favorite Category, Favorite Brand, Favorite Company, LoyalCustomer (Yes or No)
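The favorite-value features, and the one-row-per-customer layout from point 4, can be sketched the same way. This sketch assumes "favorite" means the value with the largest total quantity bought. The post does not say how the favorites or the LoyalCustomer label are derived, so the label is passed in as a hypothetical `labels` mapping, and `numeric` stands for already computed per-customer features such as recency and frequency.

```python
from collections import Counter, defaultdict


def favorites(transactions, fields=("category", "brand", "company")):
    """Most-bought value of each field per customer, weighted by quantity.

    Ties go to the value seen first, because Counter.most_common orders
    equal counts by first insertion.
    """
    counts = defaultdict(lambda: {f: Counter() for f in fields})
    for t in transactions:
        for f in fields:
            counts[t["customer_id"]][f][t[f]] += t["quantity"]
    return {
        cid: {f"favorite_{f}": c[f].most_common(1)[0][0] for f in fields}
        for cid, c in counts.items()
    }


def build_training_rows(numeric, favs, labels):
    """Join per-customer feature dicts into exactly one row per customer.

    numeric: {customer_id: {feature: value}}, e.g. recency/frequency.
    labels: HYPOTHETICAL {customer_id: "Yes" or "No"} LoyalCustomer label.
    """
    rows = []
    for cid in sorted(numeric):
        row = {"customer_id": cid, **numeric[cid], **favs.get(cid, {})}
        row["loyal_customer"] = labels[cid]
        rows.append(row)
    return rows


# Usage with made-up data.
txns = [
    {"customer_id": "C1", "category": "Dairy", "brand": "B1", "company": "Co1", "quantity": 3},
    {"customer_id": "C1", "category": "Snacks", "brand": "B2", "company": "Co2", "quantity": 1},
    {"customer_id": "C2", "category": "Snacks", "brand": "B2", "company": "Co2", "quantity": 2},
]
favs = favorites(txns)
rows = build_training_rows(
    numeric={"C1": {"recency_days": 1, "frequency_30d": 4},
             "C2": {"recency_days": 76, "frequency_30d": 0}},
    favs=favs,
    labels={"C1": "Yes", "C2": "No"},  # hypothetical labels
)
print(rows[0]["favorite_category"], rows[0]["loyal_customer"])  # Dairy Yes
```

Keying every intermediate result by customer ID is what guarantees the final table has one row per entity and one column per feature.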

One project I worked on started with 22 GB of raw data, which came down to 54 MB once feature extraction was done.

This much smaller dataset produced a prediction model with 95% accuracy.


               


