Feature Engineering in Machine Learning - Data Science

Feature engineering is the process of extracting features (predictor variables) from the available datasets.

It is arguably the most important and most difficult part of building a data science model. Keep the following points in mind when engineering features:

1. Learn enough about the domain before starting feature extraction.
2. Try to extract features that help predict the outcome (class) variable.
3. Extract as many candidate features as you can.
4. Before feeding data to a machine learning algorithm, make sure that each row represents a unique entity and each column represents a unique feature.
5. Descriptive statistics play a major role in feature engineering.
6. The features extracted may vary from one data scientist to another and depend on each individual's creativity.
7. Many researchers worry about how important the extracted features are, but once feature extraction is done, there are many statistical techniques and machine learning algorithms that help identify the important ones.
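Point 7 can be illustrated with one of the simplest such techniques: ranking each candidate feature by the absolute Pearson correlation between that feature and a 0/1 class label. The sketch below uses only the Python standard library; the feature names and values are hypothetical toy data. Correlation only captures linear relationships, so treat it as a first filter rather than a final verdict on feature importance.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_features(rows, label_key):
    """Return (feature, |r|) pairs sorted from strongest to weakest."""
    labels = [row[label_key] for row in rows]
    names = [key for key in rows[0] if key != label_key]
    scores = [(name, abs(pearson([row[name] for row in rows], labels))) for name in names]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: one row per customer, label 1 = loyal, 0 = not loyal.
rows = [
    {"visits_30d": 12, "days_since_last_visit": 2, "household_size": 3, "loyal": 1},
    {"visits_30d": 9, "days_since_last_visit": 5, "household_size": 1, "loyal": 1},
    {"visits_30d": 1, "days_since_last_visit": 40, "household_size": 2, "loyal": 0},
    {"visits_30d": 2, "days_since_last_visit": 25, "household_size": 3, "loyal": 0},
]

for name, score in rank_features(rows, "loyal"):
    print(f"{name}: {score:.2f}")
# prints visits_30d: 0.97, then days_since_last_visit: 0.94, then household_size: 0.30
```

On this toy data the visit-based features rank well above household size, which is the kind of signal that tells you which extracted features are worth keeping.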

Example:

Loyal Customer Analysis:

Problem Statement:

                Identify loyal customers from historical demographic, transaction, and offer data.

Suppose we have the following data from a retail store:

1. Customer ID, Transaction ID, Product, Brand, Category, Company, Date, Quantity
2. Customer ID, demographic details
3. Offer ID, offer details

Now take a moment to think about which features need to be extracted to identify loyal customers. In the retail domain, the following features need to be extracted:
1. Recency (time since the most recent visit)
2. Frequency of visits over the past 7, 14, 30, 60, 90, and 180 days
3. Monetary value (amount spent) over the past 7, 14, 30, 60, 90, and 180 days
4. Quantity bought
5. Favorite category
6. Favorite brand
7. Favorite company
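The features listed above can be computed from the transaction table with a few group-by style aggregations. Below is a minimal sketch using only the Python standard library, under several assumptions: a hypothetical `amount` field holds the spend on each transaction row (the transaction columns listed earlier include Quantity but no price or amount, which the monetary feature needs), each distinct Transaction ID counts as one visit, and a "favorite" is the category, brand, or company with the most units bought. All function and field names are illustrative.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date

# Look-back windows (in days) from the feature list above.
WINDOWS = (7, 14, 30, 60, 90, 180)


@dataclass
class Transaction:
    customer_id: str
    transaction_id: str
    product: str
    brand: str
    category: str
    company: str
    day: date
    quantity: int
    amount: float  # ASSUMPTION: spend on this row; not in the listed columns


def customer_features(transactions, as_of):
    """Return {customer_id: {feature_name: value}} computed as of `as_of`."""
    by_customer = defaultdict(list)
    for t in transactions:
        if t.day <= as_of:  # ignore anything after the reference date
            by_customer[t.customer_id].append(t)

    features = {}
    for customer_id, txns in by_customer.items():
        ages = [(as_of - t.day).days for t in txns]
        row = {"recency_days": min(ages)}  # days since the most recent visit
        for window in WINDOWS:
            recent = [t for t, age in zip(txns, ages) if age < window]
            # A visit is counted once per distinct transaction ID.
            row[f"visits_{window}d"] = len({t.transaction_id for t in recent})
            row[f"monetary_{window}d"] = round(sum(t.amount for t in recent), 2)
        row["quantity"] = sum(t.quantity for t in txns)
        # Favorite category/brand/company = the one with the most units bought.
        for attr in ("category", "brand", "company"):
            units = Counter()
            for t in txns:
                units[getattr(t, attr)] += t.quantity
            row[f"favorite_{attr}"] = units.most_common(1)[0][0]
        features[customer_id] = row
    return features


# Usage with hypothetical toy data.
txns = [
    Transaction("C1", "T1", "milk", "BrandA", "dairy", "Co1", date(2024, 3, 28), 2, 3.0),
    Transaction("C1", "T1", "bread", "BrandB", "bakery", "Co2", date(2024, 3, 28), 1, 2.5),
    Transaction("C1", "T2", "milk", "BrandA", "dairy", "Co1", date(2024, 1, 15), 3, 4.5),
    Transaction("C2", "T3", "soap", "BrandC", "household", "Co3", date(2023, 11, 1), 1, 5.0),
]
feats = customer_features(txns, as_of=date(2024, 3, 31))
print(feats["C1"]["recency_days"], feats["C1"]["visits_90d"])  # prints: 3 2
```

Fixing a single reference date (`as_of`) matters: every window is measured back from it, so features for labeled historical customers are computed exactly as they would have looked at that point in time.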

Finally, the input to the machine learning algorithm looks something like this:

Customer ID, Recency, Frequency of visits, Monetary value, Quantity, Favorite Category, Favorite Brand, Favorite Company, LoyalCustomer (Yes or No)
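A sketch of assembling that final table in standard-library Python: it enforces point 4 from the key points (one row per customer) by rejecting duplicate Customer IDs, attaches the Yes/No label, and writes the rows as CSV. The column names, the use of a single look-back window for Frequency and Monetary, and the hypothetical `labels` mapping are assumptions for illustration; how the loyal/not-loyal ground truth is obtained is not specified here.

```python
import csv
import io

# Column order follows the layout above. ASSUMPTION: Frequency and Monetary
# each use a single look-back window (e.g. 30 days) to keep the example small.
COLUMNS = ["CustomerID", "Recency", "Frequency", "Monetary", "Quantity",
           "FavoriteCategory", "FavoriteBrand", "FavoriteCompany", "LoyalCustomer"]


def build_training_table(feature_rows, labels):
    """Attach Yes/No labels, enforcing one row per customer (point 4)."""
    seen = set()
    table = []
    for row in feature_rows:
        customer_id = row["CustomerID"]
        if customer_id in seen:
            raise ValueError(f"duplicate row for customer {customer_id}")
        seen.add(customer_id)
        if customer_id not in labels:
            continue  # an unlabeled customer cannot be used for training
        table.append({**row, "LoyalCustomer": "Yes" if labels[customer_id] else "No"})
    return table


def to_csv(table):
    """Serialize the table as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(table)
    return buffer.getvalue()


# Usage with hypothetical toy data and labels.
feature_rows = [
    {"CustomerID": "C1", "Recency": 3, "Frequency": 1, "Monetary": 5.5, "Quantity": 6,
     "FavoriteCategory": "dairy", "FavoriteBrand": "BrandA", "FavoriteCompany": "Co1"},
    {"CustomerID": "C2", "Recency": 151, "Frequency": 0, "Monetary": 0.0, "Quantity": 1,
     "FavoriteCategory": "household", "FavoriteBrand": "BrandC", "FavoriteCompany": "Co3"},
]
labels = {"C1": True, "C2": False}
print(to_csv(build_training_table(feature_rows, labels)))
```

Raising on a duplicate ID, rather than silently keeping the first or last row, surfaces aggregation mistakes before they reach the model.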

In one of my projects I worked with 22 GB of data, which came down to 54 MB once feature extraction was done.

This much smaller dataset produced a prediction model with 95% accuracy.
