Data Mining : IME 672A

From Indian Institute of Technology, Kanpur



This course is for people interested in data analytics and data mining. People interested in machine learning (ML) techniques can also take it: the working environments may differ, but the underlying concepts are the same. Data mining is the automatic or semi-automatic analysis of large quantities of data to extract previously unknown, interesting patterns, such as groups of data records (cluster analysis), unusual records (anomaly detection), and dependencies (association rule mining). This usually involves database techniques such as spatial indices. The patterns can then be seen as a kind of summary of the input data and may be used in further analysis, for example in machine learning and predictive analytics.
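As a small illustration of one pattern type named above, unusual records, here is a minimal sketch that flags values lying far from the mean by z-score. It is written in Python for brevity (the course itself uses R), and the function name `unusual_records`, the threshold of 2.0, and the sample data are all assumptions made for this example, not course material.

```python
from statistics import mean, stdev


def unusual_records(values, threshold=2.0):
    """Return (index, value) pairs whose z-score magnitude exceeds threshold."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        # All values are identical, so nothing can be unusual.
        return []
    return [(i, v) for i, v in enumerate(values)
            if abs(v - mu) / sigma > threshold]


# Hypothetical daily transaction counts; the last value is a planted anomaly.
daily_counts = [102, 98, 101, 97, 103, 99, 100, 250]
print(unusual_records(daily_counts))
```

A z-score rule is only one simple option, and it is sensitive to the very outliers it looks for; rules based on the median are often preferred on small or skewed samples.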


This course starts with the basic question of what data mining is and goes on to how to analyse and mine data effectively, with hands-on implementation in code. It covers the steps required to handle data and discover patterns.

KDD (Knowledge Discovery in Databases) Outline

Simply put, data analytics has to deal with the 3 V's:
1. Volume: quantity of transactions, events, or amount of history; number of attributes, dimensions, or predictive variables.

2. Variety: the assortment of data, such as text, audio, video, image, geospatial, and Internet data. Unstructured data does not have a predefined data model and/or does not fit well into a relational database.

3. Velocity: speed at which data is created, accumulated, ingested, and processed.

This course explains how to handle these 3 V's efficiently and how this is used in Business Intelligence.

Course Info

The course is offered by the IME department. It is a 9-credit course of around 40 lectures, with three classes per week. The prerequisite is a statistics course such as MSO 201. The course instructor is Faiz Hamid.

Course Contents

You will learn about various models, such as linear and logistic regression, decision trees and random forests, naive Bayes classification, SVMs, and various types of clustering models, as well as boosting techniques that improve the performance of predictive models. You will also learn to write code in R and to use RStudio.


The assignments are meant to be done in teams of 4 or 5. You have to mine the raw data given to you, then describe and present the techniques you used. In some assignments, you need to read a few journal articles and select those in which you find interesting modern techniques.


In the course project, you have to combine and apply all the methods you have learned to build a better prediction model for a given data set. The project is the best way to test everything you have learned and is quite helpful.
