Volume 51 Issue 2
Feb. 2025
XIONG G Y, YANG B L. A self-decision topic crawler algorithm with online training[J]. Journal of Beijing University of Aeronautics and Astronautics, 2025, 51(2): 602-615 (in Chinese). doi: 10.13700/j.bh.1001-5965.2023.0002

A self-decision topic crawler algorithm with online training

doi: 10.13700/j.bh.1001-5965.2023.0002
  • Corresponding author: E-mail: xa_403@163.com
  • Received Date: 04 Jan 2023
  • Accepted Date: 03 Mar 2023
  • Available Online: 07 Apr 2023
  • Publish Date: 24 Mar 2023
  • The tunnel-crossing problem is unavoidable in topic crawler development. To address it, a self-decision topic crawler algorithm based on the Boyd loop (FCIDOL) is proposed. The algorithm takes the Boyd loop as its basic framework and forms a closed loop following the principle of “observation-assessment-decision-action”. Drawing on its memory, that is, the record of work the crawler has already completed, the algorithm assesses the currently observed state and decides between a radical and a conservative strategy: the radical strategy directs the crawler to search for new topic-relevant web pages, while the conservative strategy keeps it focused on actions with short-term benefit. The memory also supplies training material for the assessment network, so the network can be trained online and the crawler's cold start is handled. Experiments show that, compared with various topic crawler algorithms in different topic environments, FCIDOL improves the harvest rate by more than 7.8% and reduces the number of duplicate links by more than 15.6%.
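
The closed loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the state features, the assessment network, or the strategy rules, so everything below is an assumption made for illustration. The toy link graph `WEB`, the one-unit logistic `Assessor` standing in for the assessment network, the harvest-rate state in `observe`, and the `threshold` parameter are all hypothetical.

```python
import math

# Hypothetical toy web: url -> (is_relevant, outlinks). Page "e" is reachable
# only through the irrelevant pages "b" and "d", which forms a tunnel.
WEB = {
    "seed": (True, ["a", "b"]),
    "a": (True, ["c", "seed"]),
    "b": (False, ["d"]),
    "c": (True, []),
    "d": (False, ["e"]),
    "e": (True, []),
}
WINDOW = 3  # number of recent memory records that form the observed state


class Assessor:
    """Assumed stand-in for the assessment network: one logistic unit."""

    def __init__(self, lr=0.5):
        self.w, self.b, self.lr = 0.0, 0.0, lr

    def predict(self, state):
        return 1.0 / (1.0 + math.exp(-(self.w * state + self.b)))

    def train(self, state, reward):
        # One SGD step on the log loss for a single (state, reward) record.
        err = self.predict(state) - reward
        self.w -= self.lr * err * state
        self.b -= self.lr * err


def observe(memory):
    """State = harvest rate over the last WINDOW fetches (0.5 if no memory)."""
    recent = [reward for _, reward in memory[-WINDOW:]]
    return sum(recent) / len(recent) if recent else 0.5


def crawl(seed="seed", threshold=0.5):
    assessor, memory = Assessor(), []
    frontier = [(seed, True)]  # entries are (url, parent_was_relevant)
    seen, visited, decisions = {seed}, [], []
    while frontier:
        state = observe(memory)                 # observation
        score = assessor.predict(state)         # assessment
        decision = "conservative" if score >= threshold else "radical"
        # Action: conservative prefers links found on relevant pages; radical
        # prefers links found on irrelevant pages (tunneling). If no link of
        # the preferred kind exists, fall back to the oldest frontier entry.
        want = decision == "conservative"
        idx = next((i for i, (_, p) in enumerate(frontier) if p == want), 0)
        url, _ = frontier.pop(idx)
        relevant, outlinks = WEB[url]
        visited.append(url)
        decisions.append(decision)
        for link in outlinks:
            if link not in seen:                # skip duplicate links
                seen.add(link)
                frontier.append((link, relevant))
        # Memory feeds online training, so no pre-trained model is required.
        reward = 1.0 if relevant else 0.0
        memory.append((state, reward))
        assessor.train(state, reward)
    return visited, decisions
```

Calling `crawl()` returns the visit order and the decision taken at each step. Because the assessor starts untrained and learns only from records accumulated during the crawl, the sketch mirrors the cold-start setting the abstract mentions, although a real assessment network and state representation would be considerably richer.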

     

