atarashi.libs.ngram module
Copyright 2018 Aman Jain (amanjain5221@gmail.com)
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License version 2 as published by the Free Software Foundation. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
atarashi.libs.ngram.createNgrams(licenseList, ngramJsonLoc, threads=4, verbose=0)
Creates an Ngram_keywords.json file at the user-specified location, containing the unique n-grams for each license cluster.
Parameters:
- licenseList – Processed license list (CSV)
- ngramJsonLoc – Location of the n-gram JSON file
- threads – Number of CPUs to use for creating n-grams, to speed up the process (default: 4)
- verbose – Whether verbose mode is on (default: 0, i.e. off)
Returns:
- n-gram JSON file location
- Array matched_output – licenses that have a non-zero number of unique n-gram identifiers
- Array no_keyword_matched – licenses with zero unique n-gram identifiers
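The idea of "unique n-grams for each license" can be illustrated with a minimal sketch. This is not the module's implementation: the function names `ngram_set` and `unique_ngrams_per_license`, the in-memory dict input, and the fixed n-gram length are all assumptions made for illustration, and the sketch omits CSV loading, clustering, multiprocessing, and writing the JSON file.

```python
from collections import Counter


def ngram_set(tokens, n):
    """Return the set of n-grams (as tuples) found in a token list."""
    return set(zip(*[tokens[i:] for i in range(n)]))


def unique_ngrams_per_license(license_tokens, n):
    """Hypothetical helper: keep only n-grams that occur in exactly one license.

    license_tokens maps a license name to its list of tokens.
    Returns (unique, matched, no_keyword_matched).
    """
    per_license = {
        name: ngram_set(tokens, n) for name, tokens in license_tokens.items()
    }
    # Count in how many licenses each n-gram appears.
    counts = Counter(gram for grams in per_license.values() for gram in grams)
    unique = {
        name: sorted(gram for gram in grams if counts[gram] == 1)
        for name, grams in per_license.items()
    }
    # Split license names by whether any unique n-gram survived.
    matched = [name for name, grams in unique.items() if grams]
    no_keyword_matched = [name for name, grams in unique.items() if not grams]
    return unique, matched, no_keyword_matched


example = {
    "A": "a b c".split(),
    "B": "b c d".split(),
    "C": "b c".split(),
}
unique, matched, no_keyword_matched = unique_ngrams_per_license(example, 2)
```

In this toy input, license "C" shares its only bigram with the other two, so it ends up in the second list, mirroring the role of no_keyword_matched described above.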
atarashi.libs.ngram.find_ngrams(input_list, n)
Zips n-grams of the given length n from the input list.
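A minimal sketch of the zip technique the description refers to, assuming the input is a plain list of tokens. The name `find_ngrams_sketch` is hypothetical, and the real function may return a lazy zip object rather than a list.

```python
def find_ngrams_sketch(input_list, n):
    """Zip n shifted copies of the list to form consecutive n-grams."""
    # input_list[0:], input_list[1:], ..., input_list[n-1:] are aligned by zip,
    # which stops at the shortest slice, so only full-length n-grams are kept.
    return list(zip(*[input_list[i:] for i in range(n)]))


tokens = ["a", "b", "c", "d"]
bigrams = find_ngrams_sketch(tokens, 2)
# bigrams == [("a", "b"), ("b", "c"), ("c", "d")]
```

If the list is shorter than n, one of the slices is empty and the result is an empty list.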