Feature engineering is the process of extracting features, or predictor
variables, from the available datasets.
It is probably the most important and most difficult part of building data
science models. The following are the key points to remember about feature
engineering:
1. Learn enough about the domain before getting into feature extraction.
2. Try to extract features that help you predict the outcome of the class
variable.
3. Extract as many features as you can.
4. An important point to remember: before feeding data to a machine learning
algorithm, make sure that each row represents the features of a unique entity
and each column represents a unique feature.
5. Descriptive statistics play a major
role in feature engineering.
6. The features extracted may vary from scientist to scientist, and depend
largely on the creativity of the individual.
7. Many researchers worry about the importance of the features they extract,
but in practice, once feature extraction is done, there are many statistical
techniques and machine learning algorithms that help identify the important
ones.
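As a rough illustration of point 7, here is one simple statistical technique among many: ranking numeric features by the absolute Pearson correlation between each feature and a 0/1 class label. This is only a sketch, not the method used in the project described in this post. The feature names, the sample values, and the helper functions `pearson` and `rank_features` are all made up for the example.

```python
from math import sqrt

# Hypothetical data: one value per customer, with 1 = loyal and 0 = not loyal.
LABELS = [1, 1, 0, 0, 1, 0]
FEATURE_COLUMNS = {
    "visits_30d": [8, 6, 1, 2, 7, 1],
    "quantity": [3, 9, 4, 8, 5, 6],
    "constant_flag": [1, 1, 1, 1, 1, 1],
}


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        # A constant column carries no information about the label.
        return 0.0
    return cov / (spread_x * spread_y)


def rank_features(columns, labels):
    """Return (feature name, |correlation with label|) pairs, strongest first."""
    scores = {name: abs(pearson(values, labels)) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranked = rank_features(FEATURE_COLUMNS, LABELS)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Correlation only captures linear, one-feature-at-a-time relationships, so in practice it is usually a first filter before model-based importance measures.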
Example:
Loyal Customer Analysis:
Problem Statement:
Identify the loyal customers from the historical demographic, transaction,
and offer data.
Suppose we have the following data from a retail store:
1. Customer ID, Transaction ID, Product,
Brand, Category, Company, Date, Quantity
2. Customer ID, Demographic details
3. OfferID, Offer details.
Now think for a while about which features need to be extracted to
identify a loyal customer. In the retail domain, the following features need
to be extracted:
1. Recency (time since the most recent visit)
2. Frequency of visits over the past 7, 14, 30, 60, 90, and 180 days
3. Monetary value (amount spent) over the past 7, 14, 30, 60, 90, and 180 days
4. Quantity bought
5. Favorite category
6. Favorite brand
7. Favorite company
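The features listed above can be sketched in plain Python as follows. This is a minimal illustration, not the code from the original project. It assumes each transaction carries an `amount` field (the column list given earlier has no price, so this is an assumption), treats each distinct purchase date as one visit, and uses made-up sample records. The function name `extract_features` and the output key names are likewise hypothetical.

```python
from collections import Counter, defaultdict
from datetime import date

WINDOWS = (7, 14, 30, 60, 90, 180)  # look-back windows in days

# Hypothetical sample transactions; "amount" is an assumed field.
TRANSACTIONS = [
    {"customer_id": "C1", "date": date(2024, 1, 30), "category": "Dairy",
     "brand": "A", "company": "X", "quantity": 2, "amount": 5.0},
    {"customer_id": "C1", "date": date(2024, 1, 20), "category": "Dairy",
     "brand": "A", "company": "X", "quantity": 1, "amount": 2.5},
    {"customer_id": "C1", "date": date(2023, 12, 15), "category": "Snacks",
     "brand": "B", "company": "Y", "quantity": 3, "amount": 7.5},
    {"customer_id": "C2", "date": date(2023, 10, 1), "category": "Snacks",
     "brand": "B", "company": "Y", "quantity": 1, "amount": 2.0},
]


def extract_features(transactions, as_of):
    """Build one feature row per customer, measured relative to `as_of`."""
    by_customer = defaultdict(list)
    for t in transactions:
        by_customer[t["customer_id"]].append(t)

    features = {}
    for customer_id, rows in by_customer.items():
        # Recency: days since the most recent purchase.
        row = {"recency_days": min((as_of - t["date"]).days for t in rows)}
        for window in WINDOWS:
            recent = [t for t in rows if 0 <= (as_of - t["date"]).days < window]
            # Distinct purchase dates stand in for visits.
            row[f"visits_{window}d"] = len({t["date"] for t in recent})
            row[f"spend_{window}d"] = sum(t["amount"] for t in recent)
        row["quantity"] = sum(t["quantity"] for t in rows)
        for key in ("category", "brand", "company"):
            # Most frequently purchased value across all transactions.
            counts = Counter(t[key] for t in rows)
            row[f"favorite_{key}"] = counts.most_common(1)[0][0]
        features[customer_id] = row
    return features


features = extract_features(TRANSACTIONS, as_of=date(2024, 1, 31))
for customer_id, row in features.items():
    print(customer_id, row)
```

The result has one row per customer and one column per feature, which is the shape a machine learning algorithm expects.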
Finally, the input to the machine learning algorithm looks something
like this:
Customer ID, Recency, Frequency of visits, Monetary value, Quantity,
Favorite Category, Favorite Brand, Favorite Company, LoyalCustomer (Yes or No)
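A table with this layout could be written out as in the sketch below, which also enforces the rule that each row must represent a unique customer. The column names, the sample values, and the `to_csv` helper are illustrative assumptions. The loyalty labels here are invented, since in practice they would come from the historical data.

```python
import csv
import io

COLUMNS = [
    "customer_id", "recency", "frequency", "monetary", "quantity",
    "favorite_category", "favorite_brand", "favorite_company", "loyal_customer",
]

# Hypothetical rows; the labels are made up for illustration.
ROWS = [
    {"customer_id": "C1", "recency": 1, "frequency": 3, "monetary": 15.0,
     "quantity": 6, "favorite_category": "Dairy", "favorite_brand": "A",
     "favorite_company": "X", "loyal_customer": "Yes"},
    {"customer_id": "C2", "recency": 122, "frequency": 1, "monetary": 2.0,
     "quantity": 1, "favorite_category": "Snacks", "favorite_brand": "B",
     "favorite_company": "Y", "loyal_customer": "No"},
]


def to_csv(rows):
    """Serialize feature rows to CSV text, rejecting duplicate customers."""
    seen = set()
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        if row["customer_id"] in seen:
            # Each row must describe exactly one unique entity.
            raise ValueError(f"duplicate customer: {row['customer_id']}")
        seen.add(row["customer_id"])
        writer.writerow(row)
    return buffer.getvalue()


csv_text = to_csv(ROWS)
print(csv_text)
```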
In one of my projects, I dealt with 22 GB of data, which came down to
54 MB once feature extraction was done.
This much smaller dataset resulted in a prediction model with 95% accuracy.