Coding Week-3 Meeting
Attendees
- Gaurav Mishra
- Anupam Ghosh
- Michael C. Jaeger
- Shaheem Azmal
- Ayush Bhardwaj
- Vasudev Maduri
- Omar Mohamed
- Kaushlendra Pratap
- Shreya Singh
Discussions
- Results from the Markov-generated text files that were validated with Nomos.
- Reusing the files labeled Unclassified_licenses by adding significant keywords.
- Implement the algorithms on a few more licenses, namely MIT and Apache, to get a generalized view of both approaches.
- To-do: Validate the n-gram generated licenses and compare the results of both approaches.
- How to avoid biasing the dataset.
Week 3 Progress
- Generated MIT-0, MIT, MITNFA, MIT-CMU, MIT-enna, MIT-feh, MIT-advertising, Apache-1.1, Apache-2.0, and Apache-1.0 licenses and validated them using Nomos.
- Implemented the n-gram + Markov approach to generate all the files.
- Tweaked the regex implementation used while generating files for these licenses, which further improved the validation results.
- Segregated different labeled license files in different folders.
- Working on a script to automate the entire process.
- Work samples: GeneratedFiles, Markov_implementation, filesgen-markov, fileshen-ngram
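For readers unfamiliar with the n-gram + Markov approach mentioned in the progress items, the general idea can be sketched as below. This is a minimal illustration and not the project's actual implementation: the function names `build_model` and `generate`, the whitespace tokenization, and the default state size of two words are all assumptions made for the example.

```python
import random
from collections import defaultdict


def build_model(text, n=2):
    """Map each n-word state to the list of words observed right after it."""
    words = text.split()  # assumption: simple whitespace tokenization
    model = defaultdict(list)
    for i in range(len(words) - n):
        state = tuple(words[i:i + n])
        model[state].append(words[i + n])
    return dict(model)


def generate(model, length=30, seed=None):
    """Walk the chain from a random state, stopping early at a dead end."""
    rng = random.Random(seed)
    state = rng.choice(sorted(model))  # sorted so a fixed seed is repeatable
    output = list(state)
    while len(output) < length:
        followers = model.get(state)
        if not followers:
            break  # no word was ever observed after this state
        next_word = rng.choice(followers)
        output.append(next_word)
        state = state[1:] + (next_word,)  # slide the n-word window forward
    return " ".join(output)


if __name__ == "__main__":
    # Hypothetical input: in practice this would be the full license text.
    sample = (
        "Permission is hereby granted, free of charge, to any person "
        "obtaining a copy of this software"
    )
    print(generate(build_model(sample, n=2), length=12, seed=1))
```

Because every generated word is drawn from words that actually followed the current state in the source text, each run of n + 1 consecutive words in the output also occurs in the original license. A larger n therefore keeps the output closer to the source, while a smaller n produces more variation, which may affect how often a scanner such as Nomos still recognizes the license.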
Conclusion and Further Plans
Automate the entire process for different licenses and get them validated using Nomos.