Streaming Data and Entropy

Streaming Data

  • Stream data: a potentially unbounded (infinite-size), ever-growing dataset
    • Volume, Velocity (and Variety)
      • Too large to fit in the memory of a single machine
      • Too fast for a single CPU to process
      • Too changeable for a single machine learning model
| Stream Data and Processing | Traditional Data and Processing |
| --- | --- |
| Online/real-time processing | Offline processing |
| Linear and sub-linear computational techniques are widely used | Techniques with high space and time complexity are used if necessary |
| Limited access and processing time per instance | The restrictions are looser |
| Storing all the data is not feasible | Storage of data is feasible |
| Store statistics and temporary, processed data | Storage of the raw data is possible |
| Approximate results are acceptable | Accurate results are required |
| Processing samples of the data is the usual task | Processing every data item/record is the usual task |
| Statistical characteristics change over time | Statistical characteristics are stable |
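The "store statistics, not raw data" row can be made concrete with a single-pass statistic. A minimal sketch using Welford's online algorithm for mean and variance (the class name is illustrative):

```python
class RunningStats:
    """Single-pass mean/variance via Welford's algorithm: O(1) memory,
    no raw data stored -- suited to unbounded streams."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self.m2 / self.n if self.n > 1 else 0.0


stats = RunningStats()
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(x)
print(stats.mean, stats.variance)  # 5.0 4.0
```

Each instance is seen exactly once and then discarded, matching the online-processing constraints in the table.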

Concept Drift

  • Concept drift: the nature of the data may evolve over time due to various conditions
  • The number and relevance of instances and features may change as the data drifts

Real and virtual concept drift

  • Real and virtual drift influence the learned classification boundaries differently
    • Real concept drift: the relation p(y|x) between inputs and labels changes, so the decision boundaries must be updated
    • Virtual concept drift: only the input distribution p(x) changes, and the existing decision boundaries remain valid

Types of concept drift

  • Types of change
    • Sudden
    • Gradual
    • Incremental
    • Recurring
    • Blips
    • Noise
    • Mixed (a combination of more than one type)

Tackling Drifting

  • Concept drift detector
    • Explicit drift handling
    • Statistical criteria
      • Standard deviation (SD)
      • Predictive error
      • Instance distribution
      • Stability
    • Two stages
      • When drift occurs, train a new classifier on recent instances
      • When the drift is severe, replace the old classifier with the new one
  • Sliding windows
    • Implicit drift handling
    • Window size is critical
  • Online learner
    • Each object is processed only once
  • Ensemble learner
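As a sketch of explicit drift handling with a statistical criterion, the following detector is written in the spirit of DDM (monitoring the online error rate and its standard deviation). The 3-sigma threshold, the warm-up length, and all names are illustrative assumptions, not a method specified in the notes:

```python
import math

class SimpleDriftDetector:
    """Sketch of an error-rate drift detector (DDM-style): track the
    stream error rate p and its std s = sqrt(p(1-p)/n), and signal
    drift when p + s rises far above the best level seen so far."""

    def __init__(self, drift_factor=3.0, min_instances=30):
        self.drift_factor = drift_factor    # assumed 3-sigma threshold
        self.min_instances = min_instances  # warm-up before judging
        self.reset()

    def reset(self):
        self.n = 0
        self.errors = 0
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, error):
        """error: 1 if the classifier misclassified this instance, else 0.
        Returns True when drift is signalled."""
        self.n += 1
        self.errors += error
        p = self.errors / self.n
        s = math.sqrt(p * (1 - p) / self.n)
        if self.n < self.min_instances:
            return False
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + self.drift_factor * self.s_min:
            self.reset()  # two-stage idea: start a fresh model here
            return True
        return False


# A stream whose error rate jumps from 5% to 50% at instance 500:
det = SimpleDriftDetector()
stream = ([0] * 19 + [1]) * 25 + [1, 0] * 250
drift_at = next((i for i, e in enumerate(stream) if det.update(e)), None)
print(drift_at)  # drift is signalled shortly after the change point
```

On detection, the two-stage scheme from the list applies: a new classifier is trained on recent instances and eventually replaces the old one.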

Concept Evolution

Concept Evolution

  • New classes evolve in the data
  • Solutions
    • Radius and adaptive threshold
    • Gini Coefficient
    • Multiple novel class detection
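The radius-and-threshold idea can be sketched as follows (all names and values are illustrative): an instance that falls outside the radius of every known class cluster is treated as a potential member of a novel class.

```python
import math

def detect_novel(instance, centroids, radii):
    """Radius-based novel-class detection sketch: return the label of
    the known class region the instance falls into, or None if it is
    an outlier with respect to every known class."""
    for label, c in centroids.items():
        if math.dist(instance, c) <= radii[label]:
            return label  # inside a known class region
    return None  # outlier: buffer it; many such outliers -> novel class


centroids = {"A": (0.0, 0.0), "B": (5.0, 5.0)}
radii = {"A": 1.5, "B": 1.5}
print(detect_novel((0.5, 0.2), centroids, radii))  # A
print(detect_novel((9.0, 9.0), centroids, radii))  # None
```

In practice the threshold (radius) is adapted over time, and a novel class is only declared once enough cohesive outliers have accumulated.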

Evaluation Criteria

  • Predictive power
  • Memory consumption
  • Recovery time
    • The time needed to accommodate new instances and update the algorithm's structure
    • Instances must be processed before new ones arrive, to avoid queuing
  • Decision time
  • Requirement for true class labels
    • Hard to label the entire data stream

Data Processing

Instance Reduction

  • Sampling
  • Instance Selection (IS)
    • Risk: instances of a new concept may be misjudged as noise and erroneously removed
  • Instance Generation (IG)
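For the sampling approach, reservoir sampling (Algorithm R) is the classic way to keep a uniform random sample of an unbounded stream in constant memory:

```python
import random

def reservoir_sample(stream, k, rng=random):
    """Reservoir sampling (Algorithm R): maintain a uniform random
    sample of k items from a stream of unknown length in O(k) memory."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)  # inclusive bounds; keep with prob k/(i+1)
            if j < k:
                reservoir[j] = item
    return reservoir


sample = reservoir_sample(range(1_000_000), k=10)
print(sample)  # 10 items drawn uniformly from the whole stream
```

Each item is inspected once and either discarded or swapped in, which fits the one-pass constraint of stream processing.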

Dimensionality Reduction

  • Feature Selection (FS)
    • Filter
      • Easily adaptable to the online environment
      • Hard to deal with new features or classes
    • Wrapper
    • Hybrid
  • Feature loss
    • Lossy Fixed (Lossy-F)
      • Fixed feature space
    • Lossy Local (Lossy-L)
      • Feature space varies with the training batch
      • The feature space of the training data may differ from that of the test data
    • Lossless Homogenizing (Lossless)
      • Unify train/test feature space
      • Pad missing features
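A minimal sketch of lossless homogenizing, assuming instances are represented as feature-to-value mappings (the function and feature names are illustrative): both feature spaces are unified to their union, and features missing on either side are padded with a default value.

```python
def homogenize(train_row, test_row, pad=0):
    """Lossless-homogenizing sketch: unify the train and test feature
    spaces to their union, padding missing features with a default."""
    all_features = sorted(set(train_row) | set(test_row))
    train_out = {f: train_row.get(f, pad) for f in all_features}
    test_out = {f: test_row.get(f, pad) for f in all_features}
    return train_out, test_out


train = {"f1": 0.5, "f2": 1.0}  # features seen during training
test = {"f2": 0.3, "f3": 2.0}   # a new feature f3 has appeared
tr, te = homogenize(train, test)
print(tr)  # {'f1': 0.5, 'f2': 1.0, 'f3': 0}
print(te)  # {'f1': 0, 'f2': 0.3, 'f3': 2.0}
```

No feature is dropped, in contrast with the Lossy-F and Lossy-L schemes above.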

Feature Space Simplification

  • Normalization
  • Discretization
    • Bins
      • Number of bins
      • Size/boundaries of bins
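A minimal sketch of equal-width binning, where the number of bins is the parameter noted above (the function name is illustrative):

```python
def equal_width_bins(values, n_bins):
    """Equal-width discretization: split [min, max] into n_bins
    intervals of identical width and map each value to a bin index."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # clamp so the maximum value lands in the last bin, not past it
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
print(equal_width_bins(data, 3))  # [0, 0, 0, 2, 2, 2]
```

Equal-frequency binning is the usual alternative: bin boundaries are chosen so each bin holds roughly the same number of instances.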

Entropy

Shannon Entropy

$$H_S(X) = - \sum_{i=1}^n p(x_i) \log_a(p(x_i))$$
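The formula can be computed directly; a small sketch (function name illustrative):

```python
import math

def shannon_entropy(probs, base=2):
    """H_S(X) = -sum_i p(x_i) * log_a(p(x_i)); zero-probability
    outcomes contribute nothing (the limit of p*log p as p -> 0 is 0)."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)


print(shannon_entropy([0.5, 0.5]))  # 1.0 -- a fair coin carries one bit
print(shannon_entropy([0.9, 0.1]))  # a skewed coin: less uncertainty
```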

Generalized Entropy

  • Control the tradeoff between contributions from the main mass of the distribution and the tail

$$\langle X \rangle_{\phi} = \phi^{-1}\left(\sum_{i=1}^n p(x_i)\, \phi(x_i)\right)$$

Rényi Entropy

$$\phi(x_i) = 2^{(1-\alpha)x_i}$$

$$H_{R\alpha}(X) = \frac{1}{1-\alpha}\log_a\left(\sum_{i=1}^n p(x_i)^{\alpha}\right)$$

$$\lim_{\alpha \to 1} H_{R\alpha}(X) = H_S(X)$$
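A quick numerical check of this limit (function name illustrative): evaluating the Rényi entropy at α close to 1 reproduces the Shannon entropy.

```python
import math

def renyi_entropy(probs, alpha, base=2):
    """Rényi entropy: 1/(1-alpha) * log_a(sum_i p_i^alpha)."""
    if abs(alpha - 1.0) < 1e-12:
        raise ValueError("alpha = 1 is the Shannon limit; use it directly")
    return math.log(sum(p ** alpha for p in probs), base) / (1 - alpha)


probs = [0.7, 0.2, 0.1]
shannon = -sum(p * math.log(p, 2) for p in probs)
# As alpha -> 1 the Rényi entropy approaches the Shannon entropy:
print(renyi_entropy(probs, alpha=0.999))
print(shannon)
```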

Tsallis Entropy

$$\phi(x_i) = \frac{2^{(1-\alpha)x_i}-1}{1-\alpha}$$

$$H_{T\alpha}(X) = \frac{1}{1-\alpha}\left(\sum_{i=1}^n p(x_i)^{\alpha} - 1\right)$$

$$\lim_{\alpha \to 1} H_{T\alpha}(X) = \ln 2 \cdot H_S(X)$$
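Similarly for Tsallis entropy, with a numerical check that the α → 1 limit equals ln 2 times the Shannon entropy measured in bits (i.e. the Shannon entropy in nats; function name illustrative):

```python
import math

def tsallis_entropy(probs, alpha):
    """Tsallis entropy: 1/(1-alpha) * (sum_i p_i^alpha - 1)."""
    return (sum(p ** alpha for p in probs) - 1) / (1 - alpha)


probs = [0.7, 0.2, 0.1]
shannon_bits = -sum(p * math.log2(p) for p in probs)
# As alpha -> 1, the Tsallis entropy tends to ln(2) * H_S in bits:
print(tsallis_entropy(probs, alpha=0.999))
print(math.log(2) * shannon_bits)
```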

Comparison by Binomial Distribution

  • $p$, probability of success
  • $1-p$, probability of failure
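The comparison can be reproduced numerically for the two-outcome distribution {p, 1 − p} (function names illustrative); all three entropies peak at p = 0.5 and vanish at p = 0 or 1, while α shifts the weight between the main mass and the tail.

```python
import math

def shannon(p):
    """Shannon entropy (bits) of the distribution {p, 1-p}."""
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)

def renyi(p, alpha):
    """Rényi entropy (bits) of {p, 1-p} for a given alpha != 1."""
    return math.log2(p ** alpha + (1 - p) ** alpha) / (1 - alpha)

def tsallis(p, alpha):
    """Tsallis entropy of {p, 1-p} for a given alpha != 1."""
    return (p ** alpha + (1 - p) ** alpha - 1) / (1 - alpha)


# Larger alpha emphasizes the main mass; smaller alpha, the tail.
for p in (0.1, 0.3, 0.5):
    print(p, shannon(p), renyi(p, 2.0), tsallis(p, 2.0))
```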

Shannon entropy

Rényi entropy of several α-values

Tsallis entropy of several α-values
