Skip to main content

Coding Week-3 Meeting

Attendees

Discussions

  1. Results from Markov implemented text files that were validated by nomos.
  2. Reusing Unclassified_licenses labeled files by adding significant keywords.
  3. To implement the algorithms on few more licenses that are MIT and Apache to get a generalized view on both the approaches.
  4. To-do: Validation on n-gram generated licenses and compare both the results.
  5. How to not make the dataset biased.

Week 3 Progress

  1. Generated MIT-0, MIT, MITNFA, MIT-CMU, MIT-enna, MIT-feh, MIT-advertising, Apache-1.1, Apache-2.0, Apache-1.0 licenses and got them validated them using nomos.
  2. Implemented ngram + markov approach to generate all the files.
  3. Tweaked regex implementation while generating files for these licenses which further improved the results during validation.
  4. Segregated different labeled license files in different folders.
  5. Working on a script to automate the entire process.
  6. Worksamples : GeneratedFiles, Markov_implementation, filesgen-markov, fileshen-ngram

Conclusion and Further Plans

Automate the entire process for different licenses and get them validated using Nomos.