atarashi.agents.tfidf module

Copyright 2018 Aman Jain (amanjain5221@gmail.com)

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License version 2 as published by the Free Software Foundation. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.

class atarashi.agents.tfidf.TFIDF(licenseList, algo=<TfidfAlgo.cosineSim: 2>)

Bases: atarashi.agents.atarashiAgent.AtarashiAgent

class TfidfAlgo

Bases: enum.Enum

An enumeration of the available similarity algorithms.

cosineSim = 2
scoreSim = 1
_TFIDF__cosine_similarity(a, b)

Computes the cosine similarity of two word frequency arrays. Reference: https://blog.nishtahir.com/2015/09/19/fuzzy-string-matching-using-cosine-similarity/

Returns: Cosine similarity value of two word frequency arrays
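
As an illustration of the computation described above, the following is a minimal sketch of cosine similarity over two aligned word frequency arrays. It is not the module's private implementation; the names cosine_similarity and frequency_arrays are hypothetical, and returning 0.0 for an all-zero array is a convention assumed here.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length word frequency arrays."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumed convention: an empty text is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def frequency_arrays(text_a, text_b):
    """Hypothetical helper: count words over the combined, sorted vocabulary."""
    count_a, count_b = Counter(text_a.split()), Counter(text_b.split())
    vocab = sorted(set(count_a) | set(count_b))
    return [count_a[w] for w in vocab], [count_b[w] for w in vocab]
```

For example, frequency_arrays("permission is granted", "permission is denied") yields two arrays that share two of their three non-zero positions, so their cosine similarity is 2/3.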
_TFIDF__tfidfcosinesim(inputFile)

TF-IDF cosine similarity algorithm, implemented with scikit-learn's TfidfVectorizer.

Parameters: inputFile – Input file path
Returns: Sorted array of JSON scanner results with sim_type set to __tfidfcosinesim
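
The idea behind a TF-IDF cosine ranking can be sketched with the standard library alone. This is a stand-in for TfidfVectorizer, not the module's code: it assumes whitespace tokenization, a smoothed IDF of ln((1 + n) / (1 + df)) + 1, and L2-normalized vectors. The function names, the {shortname: text} license mapping, and the shortname and sim_score result keys are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one L2-normalized TF-IDF vector (as a dict) per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Assumed weighting: raw term count times smoothed IDF.
        weights = {
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, tf in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({term: w / norm for term, w in weights.items()})
    return vectors


def rank_by_cosine(input_text, licenses):
    """Rank a hypothetical {shortname: text} mapping against input_text."""
    names = list(licenses)
    vectors = tfidf_vectors([licenses[n] for n in names] + [input_text])
    query = vectors[-1]
    results = []
    for name, vec in zip(names, vectors):
        # Vectors are unit length, so the dot product is the cosine similarity.
        score = sum(w * query.get(term, 0.0) for term, w in vec.items())
        results.append(
            {"shortname": name, "sim_type": "__tfidfcosinesim", "sim_score": score}
        )
    return sorted(results, key=lambda r: r["sim_score"], reverse=True)
```

Because every vector is normalized first, each score falls between 0 and 1, and a license whose text is identical to the input scores 1.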
_TFIDF__tfidfsumscore(inputFile)

TF-IDF sum score algorithm, implemented with scikit-learn's TfidfVectorizer.

Parameters: inputFile – Input file path
Returns: Sorted array of JSON scanner results with sim_type set to __tfidfsumscore
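
The documentation above does not define how the sum score is computed. One plausible reading, shown here purely as an assumption, is that each license is scored by summing the TF-IDF weights of the terms it shares with the input file. The sketch uses the standard library instead of TfidfVectorizer; the function name, the {shortname: text} mapping, the unnormalized weighting, and the shortname and sim_score result keys are all hypothetical.

```python
import math
from collections import Counter


def rank_by_sum_score(input_text, licenses):
    """Score each license by the summed TF-IDF weight of the input's terms."""
    names = list(licenses)
    input_tokens = input_text.lower().split()
    vocabulary = set(input_tokens)  # assumed: only input terms contribute
    corpus = [licenses[n].lower().split() for n in names] + [input_tokens]
    n_docs = len(corpus)
    # Document frequency, restricted to the input vocabulary.
    doc_freq = Counter(
        term for tokens in corpus for term in set(tokens) if term in vocabulary
    )
    results = []
    for name, tokens in zip(names, corpus):  # zip stops before the input itself
        counts = Counter(t for t in tokens if t in vocabulary)
        # Assumed weighting: term count times smoothed IDF, no normalization.
        score = sum(
            tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, tf in counts.items()
        )
        results.append(
            {"shortname": name, "sim_type": "__tfidfsumscore", "sim_score": score}
        )
    return sorted(results, key=lambda r: r["sim_score"], reverse=True)
```

Unlike a cosine score, a sum of this kind is not bounded by 1 and grows with the number of matching terms, so it is useful for ranking candidates rather than as an absolute measure.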
getSimAlgo()
scan(filePath)
setSimAlgo(newAlgo)
atarashi.agents.tfidf.tokenize(data)