atarashi.agents.cosineSimNgram module

Copyright 2018 Aman Jain (amanjain5221@gmail.com)

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License version 2 as published by the Free Software Foundation. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.

class atarashi.agents.cosineSimNgram.NgramAgent(licenseList, ngramJson, algo=<NgramAlgo.bigramCosineSim: 3>)

Bases: atarashi.agents.atarashiAgent.AtarashiAgent

class NgramAlgo

Bases: enum.Enum

An enumeration.

bigramCosineSim = 3
cosineSim = 1
diceSim = 2
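The three members above name similarity measures. As a rough illustration only, the sketch below mirrors the enum values from this listing and shows one common way to compute cosine and Dice similarity over token lists. The helper names cosine_similarity and dice_similarity are hypothetical and are not taken from atarashi; the agent's actual implementation may differ. bigramCosineSim is presumably the cosine measure applied to bi-gram tokens rather than single words.

```python
import math
from collections import Counter
from enum import Enum


class NgramAlgo(Enum):
    # Values copied from the listing above; this is a local mirror, not an import.
    cosineSim = 1
    diceSim = 2
    bigramCosineSim = 3


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between two token-count vectors (hypothetical helper)."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    # Dot product over the tokens the two inputs share.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def dice_similarity(tokens_a, tokens_b):
    """Dice coefficient over the sets of distinct tokens (hypothetical helper)."""
    a, b = set(tokens_a), set(tokens_b)
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))
```

Both helpers return a score between 0 and 1, where a higher score means the two token lists are more alike.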
_NgramAgent__Ngram_guess(processedData)
Parameters: processedData – Processed data from the input file
Returns: Possible licenses contained in the input file, based on matching unique N-grams from Ngram_keywords.json
_NgramAgent__bigram_tokenize(s)
Parameters: s – Input string to split into tokens
Returns: Array of bi-gram tokens
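A minimal sketch of bi-gram tokenization, assuming word-level bi-grams over whitespace-separated tokens. The function name bigram_tokenize is a stand-in for the private method, and the real method may normalize or split the text differently.

```python
def bigram_tokenize(s):
    """Return the list of adjacent word pairs in s, each joined by a space."""
    words = s.split()
    # Pair every word with its successor; a single word yields no bi-grams.
    return [" ".join(pair) for pair in zip(words, words[1:])]


print(bigram_tokenize("permission is hereby granted"))
# ['permission is', 'is hereby', 'hereby granted']
```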
getSimAlgo()
scan(inputFile)
Parameters: inputFile – Path of the input file to scan
Returns: Array of JSON objects holding the scan results for the file. Each object has the following fields:
shortname – Short name of the license
sim_type – Type of similarity measure that produced the result
sim_score – Similarity score computed by that measure
desc – Description or comments about the similarity measure
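To make the shape of the returned array concrete, the sketch below builds one record with the four documented fields and serializes it. The helper make_result and all field values are illustrative placeholders, not output from a real scan.

```python
import json


def make_result(shortname, sim_type, sim_score, desc=""):
    """Build one result record with the four documented fields (hypothetical helper)."""
    return {
        "shortname": shortname,
        "sim_type": sim_type,
        "sim_score": sim_score,
        "desc": desc,
    }


# Placeholder values only; a real scan would fill these in.
results = [make_result("Example-License", "bigramCosineSim", 0.5)]
print(json.dumps(results, indent=2))
```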
setSimAlgo(newAlgo)