ELEMENT: Text Extraction for the Dark Web

2022 
The growing volume of data on the Internet makes text cleaning and the extraction of relevant text a constant challenge. One area of particular interest is the Dark Web, where data is far more volatile and dynamic. As more researchers collect data and extract information from it, the underlying algorithms are continually being improved. Current solutions all rely on text feature extraction algorithms such as TF-IDF, Bag of Words, and Word2Vec. These extraction methods are well documented, but a critical comparative evaluation of them is missing from the standard literature. This paper presents ELEMENT (Effective Lemmatization, Efficient Management of Extracting Noteworthy Tags), a balanced approach to tagging extracted data that is a modified form of TF-IDF. The advantage of a balanced approach such as ELEMENT is its ability to perform well under any given circumstance. The paper compares the proposed approach with existing text feature extraction strategies in terms of feature extraction accuracy and time and space efficiency, and concludes with a simple overview of the strengths and weaknesses of each algorithm.
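Since ELEMENT is described as a modified form of TF-IDF, it may help to recall what plain TF-IDF computes. The sketch below is classic TF-IDF only (tf = term count / document length, idf = log(N / df)), not ELEMENT: the abstract does not specify ELEMENT's modifications. The helper names `tf_idf` and `top_tags`, the whitespace tokenizer, and the sample documents are illustrative assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document using classic TF-IDF.

    tf  = count of the term in the document / number of tokens in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    # Naive lowercase whitespace tokenization (an assumption; no lemmatization).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term appears at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        if not tokens:
            # Empty document: no terms to weight (also avoids division by zero).
            weights.append({})
            continue
        counts = Counter(tokens)
        length = len(tokens)
        weights.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def top_tags(weights, k=3):
    """Return the k highest-weighted terms as candidate tags (ties broken alphabetically)."""
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


docs = [
    "market sells stolen cards and stolen accounts",
    "forum thread about market vendors",
    "vendor reviews for the forum market",
]
w = tf_idf(docs)
print(top_tags(w[0]))  # ['stolen', 'accounts', 'and']
```

In this toy corpus, "market" appears in every document and so receives a weight of zero, while the function word "and" scores as highly as content words. This is one reason text is usually cleaned (for example with stop-word removal or lemmatization) before or during weighting.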