Volume 39 Issue 2
Feb.  2013
Wang Deqing, Zhang Hui. Support-vector-based iteratively adjusted centroid classifier for text categorization[J]. Journal of Beijing University of Aeronautics and Astronautics, 2013, 39(2): 269-274. (in Chinese)

Support-vector-based iteratively adjusted centroid classifier for text categorization

  • Received Date: 11 Jan 2012
  • Publish Date: 28 Feb 2013
  • To address the weakness of the centroid-based classifier (CC), which is prone to inductive bias or model misfit, a support-vector-based iteratively adjusted centroid classifier (IACC_SV) is proposed. It uses the support vectors found by a routine such as a linear support vector machine (SVM) to construct the initial centroid vectors for CC, and then iteratively adjusts these centroid vectors according to the misclassified training samples. Extensive experiments on 8 real-world text corpora show that IACC_SV achieves better macro-F1 and micro-F1 than traditional classification algorithms, especially on corpora with highly imbalanced classes.
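
The two stages described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the update rule, so the sketch assumes a simple one (pull the true-class centroid toward each misclassified sample, push the wrongly chosen centroid away), cosine similarity as the scoring function, and a hypothetical learning rate `lr`. The support-vector indices `sv_idx` are assumed to come from an externally trained linear SVM, and every class is assumed to contribute at least one support vector.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(u, v):
    """Dot product; equals cosine similarity when both inputs are unit vectors."""
    return sum(a * b for a, b in zip(u, v))


def init_centroids(X, y, sv_idx):
    """Stage 1: build one centroid per class from its support vectors only.

    sv_idx is assumed to hold the indices of the training samples that a
    linear SVM (trained elsewhere) selected as support vectors.
    """
    dims = len(X[0])
    sums = {}
    for i in sv_idx:
        acc = sums.setdefault(y[i], [0.0] * dims)
        for j, value in enumerate(X[i]):
            acc[j] += value
    return {label: normalize(acc) for label, acc in sums.items()}


def predict(centroids, x):
    """Assign x to the class whose centroid is most similar to it."""
    xn = normalize(x)
    return max(centroids, key=lambda label: cosine(centroids[label], xn))


def adjust(centroids, X, y, lr=0.1, max_iter=20):
    """Stage 2: iteratively adjust centroids using misclassified samples.

    Assumed update rule (not taken from the paper): move the true-class
    centroid toward the sample and the predicted-class centroid away from it.
    """
    cents = {label: list(c) for label, c in centroids.items()}  # do not mutate input
    for _ in range(max_iter):
        errors = 0
        for x, label in zip(X, y):
            guess = predict(cents, x)
            if guess == label:
                continue
            errors += 1
            xn = normalize(x)
            cents[label] = normalize([c + lr * a for c, a in zip(cents[label], xn)])
            cents[guess] = normalize([c - lr * a for c, a in zip(cents[guess], xn)])
        if errors == 0:  # every training sample is classified correctly
            break
    return cents
```

A typical call would be `adjust(init_centroids(X, y, sv_idx), X, y)`, where `X` is a list of term-weight vectors and `y` the matching class labels. Building the initial centroids from support vectors alone keeps samples far from the class boundary from dominating the centroid, which is one plausible reading of why the approach helps on imbalanced corpora.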

     


