Modeling unreliable data and sensors: Using F-measure attribute performance with test samples from low-cost sensors

Conference

Iyer, V, Iyengar, SS. (2011). Modeling unreliable data and sensors: Using F-measure attribute performance with test samples from low-cost sensors. 15-22. 10.1109/ICDMW.2011.124

cited authors

  • Iyer, V; Iyengar, SS

abstract

  • Building a high-performance classifier requires supervised training with labeled data, which allows the classifier's decision boundary to generalize. In practice, however, most data are unlabeled, so newer algorithms need to learn by knowledge discovery. Sufficient training data are collected as empirical evidence, consisting of labeled positive and negative samples, to build the hypothesis. The hypothesis is constructed as a conjunction of the attributes, which can be learned by a machine learning algorithm. In this paper, we work with two forms of ranking weights, precision and relevance, which help in finding hidden patterns and predicting future events. Gathering empirical evidence for weather patterns and for tracking a phenomenon requires accurately extracting the attributes and labeling the training samples, which is laborious and time-consuming. Automated weather prediction algorithms, which are trained by supervised learning, need to generalize so that they can be tested with unreliable and noisy weather data from low-cost sensors. We use training data from previous forest fire events; the datasets, which contain all the attributes, are labeled using manual data logs for a given geographical area. The labeled original dataset is mapped to the data collected from on-line sensors, which further improves the accuracy of the training set. Because some classes, those related to the peak fire seasons, have very few samples, domain-specific knowledge is added through sensor measurements and the Fire Weather Index (FWI) to help model the events accurately. We show that the training accuracy of the small forest fire classifier, which uses attributes from manual logs, improves by 30% when sensor data are used. The rare and hard-to-classify large forest fires are classified with 95% accuracy using the new Fire Weather Index (FWI). We also show that our framework is more robust to outliers from noisy sensor measurements by accounting for them in the model parameters. The model allows further generalization for linearly and non-linearly separable datasets by estimating the parameters (1 - δ) and the minimum allowable error ε for the hypothesis, sampling accuracy, and cross-validation. © 2011 IEEE.
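The abstract names two quantities that a short sketch can make concrete: the F-measure in the title, and the parameters (1 - δ) and ε. The Python below is an illustration only and is not taken from the paper. The function names `f_measure` and `pac_sample_bound` are hypothetical. The first uses the standard F-beta definition over precision and recall. The second uses the textbook sample bound for a consistent learner over conjunctions, and it assumes boolean attributes so that the hypothesis space has 3^n members; this record does not state the paper's exact formulas.

```python
import math


def f_measure(tp, fp, fn, beta=1.0):
    """Return (precision, recall, F-beta) from confusion-matrix counts.

    Standard definition; the weighting used in the paper is not given
    in this record, so beta=1.0 is an assumption.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


def pac_sample_bound(n_attributes, epsilon, delta):
    """Samples sufficient for error <= epsilon with probability >= 1 - delta.

    Assumes (hypothetically) boolean attributes, so a conjunction may
    include each attribute, its negation, or neither: |H| = 3**n.
    Bound: m >= (1 / epsilon) * (ln|H| + ln(1 / delta)).
    """
    ln_h = n_attributes * math.log(3)
    return math.ceil((ln_h + math.log(1 / delta)) / epsilon)


if __name__ == "__main__":
    print(f_measure(tp=8, fp=2, fn=2))
    print(pac_sample_bound(n_attributes=10, epsilon=0.1, delta=0.05))
```

Under these assumptions, tightening ε or δ raises the number of labeled samples required, which is one way to read the abstract's concern about classes with very few samples.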

publication date

  • December 1, 2011

Digital Object Identifier (DOI)

  • 10.1109/ICDMW.2011.124

start page

  • 15

end page

  • 22