atarashi.libs.utils module

Copyright 2018 Gaurav Mishra <gmishx@gmail.com>

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License version 2 as published by the Free Software Foundation. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.

atarashi.libs.utils.cosine_similarity(a, b)

Cosine similarity value of two word frequency dictionaries. For background, see https://blog.nishtahir.com/2015/09/19/fuzzy-string-matching-using-cosine-similarity/
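As a rough illustration of the computation described here, the following is a minimal sketch of cosine similarity over two word frequency dictionaries. The name `sketch_cosine_similarity`, the sample dictionaries, and the handling of a zero norm are assumptions made for this example; the library's own implementation may differ.

```python
import math


def sketch_cosine_similarity(a, b):
    """Cosine similarity of two {word: count} dictionaries (illustrative only)."""
    # Dot product over the words that appear in both dictionaries.
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    # L2 norm of each dictionary's counts.
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    # Assumption: an empty or all-zero dictionary yields a similarity of 0.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample data.
doc_a = {"free": 2, "software": 1}
doc_b = {"free": 2, "software": 1}
doc_c = {"proprietary": 3}

same = sketch_cosine_similarity(doc_a, doc_b)      # identical counts: close to 1.0
disjoint = sketch_cosine_similarity(doc_a, doc_c)  # no shared words: 0.0
```

Identical dictionaries score approximately 1 (up to floating-point rounding), and dictionaries with no words in common score 0.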

atarashi.libs.utils.l2_norm(a)

L2 (Euclidean) norm of a word frequency array (vector), returned as a scalar
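To make the scalar concrete, here is a small sketch of an L2 norm over a plain list of counts. It assumes the input is any iterable of numbers; the name `sketch_l2_norm` is hypothetical and is not the library function itself.

```python
import math


def sketch_l2_norm(vector):
    """Square root of the sum of squared entries (illustrative only)."""
    return math.sqrt(sum(x * x for x in vector))


length = sketch_l2_norm([3, 4])  # sqrt(9 + 16) = 5.0
```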

atarashi.libs.utils.ngram_l2_norm(a)

L2 (Euclidean) norm of the values in a word frequency dictionary, returned as a scalar
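The dictionary variant can be sketched the same way, under the assumption that only the counts (the dictionary values) contribute and the keys are ignored. The name `sketch_ngram_l2_norm` and the sample keys are hypothetical.

```python
import math


def sketch_ngram_l2_norm(frequencies):
    """L2 norm of the counts in a {term: count} dictionary (illustrative only)."""
    # Only the values matter; the keys identify terms but carry no magnitude.
    return math.sqrt(sum(count * count for count in frequencies.values()))


norm = sketch_ngram_l2_norm({"gnu": 3, "license": 4})  # sqrt(9 + 16) = 5.0
```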

atarashi.libs.utils.unpack_json_tar()

Unpack the n-gram file from its tar archive

atarashi.libs.utils.wordFrequency(data)

Calculates the frequency of each unique word in the file

Parameters: data – Processed and extracted text from the input file
Returns: Word frequency dictionary
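A word frequency dictionary of this kind could be built roughly as follows. This sketch assumes `data` is a single string of already processed text that can be split on whitespace; the real function may expect a different input shape. The name `sketch_word_frequency` is hypothetical.

```python
from collections import Counter


def sketch_word_frequency(data):
    """Map each unique word in `data` to its number of occurrences (illustrative only)."""
    # Assumption: the text is already normalized, so whitespace splitting is enough.
    return dict(Counter(data.split()))


freq = sketch_word_frequency("the license covers the program")
# "the" appears twice; every other word appears once.
```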