15 What packages are used for data mining in R
IEEE Projects On Data Mining
IEEE projects on data mining include text mining, image mining, and web mining.
* IEEE data mining projects can be implemented efficiently in the Java programming language
* Data mining projects usually work with internal and external datasets that contain large amounts of information
* Many research scholars and students choose the data mining domain for their projects
Data Mining Needs:
* Need to extract useful information from data and to interpret the data
* Too much data and too little information
Data mining Algorithms:
* Hierarchical Clustering Algorithms
* Supervised Algorithms
* Unsupervised Algorithms
* K-Means Algorithm
* C4.5 Algorithm
* K-NN Classification Algorithm
* Support Vector Machine Algorithm
* Apriori Algorithm
A few of the latest thesis topics in computer science are listed below:
* Thesis topics in data mining
* Thesis topics in machine learning
* Thesis topics in digital image processing
* Latest thesis topics in the Internet of Things (IoT)
* Research topics in Artificial Intelligence
* Networking as a thesis topic in computer science
* Trending thesis topics in cloud computing
* Data aggregation as a thesis topic in Big Data
* Research topics in Software Engineering
Big Data is a term for large volumes of data that are complex to handle. The data may be structured or unstructured: structured data is organized, while unstructured data is not. Big data can be examined for insights that lead to better decisions and strategic business moves. Big data is commonly defined in terms of three Vs:
* Volume: the large volume of data collected from different sources
* Velocity: the speed at which the data is generated
* Variety: the different kinds of data, both structured and unstructured

Application areas:
* Government
* Healthcare
* Education
* Finance
* Manufacturing
* Media
* Sports

Thesis topics in Big Data:
* Privacy preserving big data publishing: a scalable k-anonymization approach using MapReduce.
* Nearest Neighbour Classification for High-Speed Big Data Streams Using Spark.
* Efficient and Rapid Machine Learning Algorithms for Big Data and Dynamic Varying Systems.
* Disease Prediction by Machine Learning Over Big Data From Healthcare Communities.
* A Parallel Multi-classification Algorithm for Big Data Using an Extreme Learning Machine.

You can prepare your project report or thesis report on any of these topics.
Data Mining is the process of identifying and establishing relationships within large datasets in order to solve a problem through analysis of the data. Data Mining offers various tools and techniques that give enterprises and organizations the ability to predict future trends. It is applied in many areas, including research, statistics, genetics, and marketing. The main techniques used in Data Mining are:
* Decision Trees
* Genetic Algorithm
* Induction method
* Artificial Neural Network
* Association
* Clustering
BELOW IS A LIST OF A FEW OF THE LATEST AND TRENDING RESEARCH TOPICS IN DATA MINING:
* Performance enhancement of the DBSCAN density-based clustering algorithm in data mining
* The classification scheme for sentiment analysis of Twitter data
* To increase the accuracy of min-max k-means clustering in data mining
* To evaluate and improve the Apriori algorithm to reduce execution time for association rule generation
* The classification scheme for credit card fraud detection in data mining
* To propose a novel technique for crime rate prediction in data mining
* To evaluate and propose a heart disease prediction scheme in data mining
* Software defect prediction analysis using machine learning algorithms
* A new data clustering approach for data mining in large databases
* The diabetes prediction technique for data mining using classification
* Novel algorithm for network traffic classification in data mining

Advantages of Data Mining
* Data Mining helps marketing and retail enterprises study customer behavior.
* Organizations in banking and finance can obtain information about customers' historical data and financial activities.
* Data Mining helps manufacturing units detect faults in operational parameters.
* Data Mining also helps governmental agencies track records of financial activities to curb criminal activity.

Disadvantages of Data Mining
* Privacy issues
* Security issues
* Information extracted through data mining can be misused
Image Processing is another field in Computer Science and a popular thesis topic. There are two types of image processing: Analog and Digital Image Processing. Digital Image Processing is the process of performing operations on digital images using computer-based algorithms to alter their features for enhancement or other effects. Through Image Processing, essential information can be extracted from digital images. It is an important area of research in computer science. The techniques involved in image processing include transformation, classification, pattern recognition, filtering, image restoration, and various other processes.

Main purpose of Image Processing
Following are the main purposes of image processing:
* Visualization
* Image Restoration
* Image Retrieval
* Pattern Measurement
* Image Recognition

Applications of Image Processing
Following are the main applications of Image Processing:
* UV Imaging, Gamma Ray Imaging, and CT scans in the medical field
* Transmission and encoding
* Robot Vision
* Color Processing
* Pattern Recognition
* Video Processing

Thesis topics in image processing:
* To propose a classification technique for plant disease detection in image processing
* The hybrid bio-inspired scheme for edge detection in image processing
* The HMM classification scheme for cancer detection in image processing
* To propose an efficient scheme for digital watermarking of images in image processing
* To propose a block-wise image compression scheme in image processing
* To propose and evaluate a filter based on internal and external features of an image for image denoising
* To improve the local mean filtering scheme for denoising of MRI images
* To propose image encryption based on textural feature analysis and the chaos method
* The classification scheme for face spoof detection in image processing
* The automated scheme for number plate detection in image processing
Data Structures in R
Data Structure | Description
--- | ---
Vector | A vector is a sequence of data elements of the same basic type. Members of a vector are called components.
List | Lists are R objects that can contain elements of different types, such as numbers, strings, vectors, or even another list.
Matrix | A matrix is a two-dimensional data structure. Matrices are used to bind vectors of the same length. All the elements of a matrix must be of the same type (numeric, logical, character, complex).
Data frame | A data frame is more general than a matrix, i.e., different columns can have different data types (numeric, character, logical, etc.). It combines features of matrices and lists, like a rectangular list.
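The four structures above can be sketched in a few lines of base R. The variable names and values are made up for illustration only.

```r
# Vector: every element has the same basic type (numeric here)
ages <- c(25, 31, 47)

# List: elements may have different types, including vectors and other lists
person <- list(name = "Ann", age = 31, scores = c(90, 85),
               extra = list(active = TRUE))

# Matrix: two-dimensional, built by binding vectors of the same length;
# all elements share one type
m <- cbind(a = c(1, 2, 3), b = c(4, 5, 6))

# Data frame: rectangular like a matrix, but each column can have its own type
df <- data.frame(name = c("Ann", "Bob", "Cy"), age = ages,
                 stringsAsFactors = FALSE)

str(df)  # shows one character column and one numeric column
```

Calling `class()` or `str()` on each object is a quick way to confirm which structure you are working with.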
15. What packages are used for data mining in R?
Some packages used for data mining in R:
* data.table - provides fast reading of large files
* rpart and caret - for machine learning models
* arules - for association rule learning
* ggplot2 - provides various data visualization plots
* tm - for text mining
* forecast - provides functions for time series analysis
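As one illustration of the packages listed above, here is a minimal sketch that fits a classification tree with rpart on the built-in iris data. It assumes rpart is installed (it is normally bundled with R as a recommended package). The accuracy is measured on the training data itself, so it is only a sanity check, not a proper evaluation.

```r
library(rpart)  # assumption: rpart is available in this R installation

# Fit a classification tree predicting Species from the four measurements
fit <- rpart(Species ~ ., data = iris, method = "class")

# Predict class labels for the same rows the tree was trained on
pred <- predict(fit, newdata = iris, type = "class")

# Share of rows whose predicted label matches the true label
accuracy <- mean(pred == iris$Species)
print(accuracy)
```

In practice you would hold out a test set (for example with caret) before reporting accuracy.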
43. How would you check the distribution of a categorical variable in R?
We often want to find out how the values of a categorical variable are distributed. We can use the table() function to count how many times each category occurs: table(iris$Species)
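A short sketch of this on the built-in iris data, with `prop.table()` added to turn the counts into proportions (the variable names are illustrative):

```r
# Count how many rows fall into each category of Species
counts <- table(iris$Species)
print(counts)   # each of the three species appears 50 times

# Convert the counts into proportions that sum to 1
shares <- prop.table(counts)
print(round(shares, 3))
```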
45. How would you find the number of missing values in a dataset and remove all of them?
Missing values bring a lot of chaos to the data, so it is always important to deal with them before building any models. Take, for example, an employee dataset that contains missing values. This code gives the total number of missing values: sum(is.na(employee)). To delete the missing values, use na.omit(employee), which returns the dataset without the rows that contain a missing value.
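The original employee dataset is not shown, so the sketch below uses a small made-up `employee` data frame to illustrate the same two steps. All names and values are hypothetical.

```r
# Hypothetical employee data with a few missing values (NA)
employee <- data.frame(
  name   = c("Ann", "Bob", NA, "Dee"),
  age    = c(31, NA, 45, 28),
  salary = c(50000, 62000, NA, 58000),
  stringsAsFactors = FALSE
)

# Total number of missing cells across the whole data frame
total_missing <- sum(is.na(employee))
print(total_missing)   # 3

# Missing values per column, useful for seeing where the gaps are
print(colSums(is.na(employee)))

# Drop every row that contains at least one NA
clean <- na.omit(employee)
print(nrow(clean))     # 2 rows remain
```

Note that `sum(is.na())` counts missing cells, while `na.omit()` removes whole rows, so the number of rows dropped can be smaller than the number of missing values.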