Coding Week-1 Meeting
Attendees
- Gaurav Mishra
- Anupam Ghosh
- Michael C. Jaeger
- Shaheem Azmal
- Ayush Bhardwaj
- Vasudev Maduri
- Omar Mohamed
- Kaushlendra Pratap
- Shreya Singh
Discussions
- Three approaches to handling regex expansion were discussed.
- The first approach was to generate random characters for patterns such as .{0,30}. It was discarded because random characters give the generated text no semantic meaning.
- The second approach was to skip character generation for these patterns altogether. It was also discarded because it would hamper distance-based similarity matching algorithms.
- The third approach is to generate meaningful sentences using a Python library and a license vocabulary.
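The difference between the first and third approaches can be illustrated with a minimal sketch. It is not the project's implementation: VOCAB is a hypothetical stand-in for a vocabulary built from license texts, fill_random and fill_words are illustrative names, and only bounded gaps of the form .{0,N} are handled.

```python
import random
import re
import string

# Hypothetical stand-in for a vocabulary built from license texts.
VOCAB = ["license", "copyright", "software", "permission", "granted", "notice"]

GAP = re.compile(r"\.\{0,(\d+)\}")  # matches bounded gaps such as .{0,30}


def fill_random(pattern, rng):
    """Approach 1: fill each gap with random letters (no semantic meaning)."""
    def repl(match):
        length = rng.randint(0, int(match.group(1)))
        return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    return GAP.sub(repl, pattern)


def fill_words(pattern, rng):
    """Approach 3: fill each gap with vocabulary words within the length bound."""
    def repl(match):
        limit = int(match.group(1))
        words = []
        while True:
            candidate = " ".join(words + [rng.choice(VOCAB)])
            if len(candidate) + 2 > limit:  # reserve room for the two spaces
                break
            words = candidate.split(" ")
        return " " + " ".join(words) + " " if words else " "
    return GAP.sub(repl, pattern)


rng = random.Random(0)
pattern = "licensed under.{0,30}terms"
print(fill_random(pattern, rng))
print(fill_words(pattern, rng))
```

Both outputs still match the original pattern, but only the second fills the gap with real words, which is the property that distance-based similarity matching relies on.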
Week 1 Progress
- Extracted license headers and regexes from the STRINGS.in file using extractregex().
- Saved 935 datasets of Motosoto licenses, in which regex expansion isn't considered, to Drive: Drive link
- Regex expansion - the expansions for the special macros in the STRINGS.in file are: regex_expansion
sed -e 's/ =FEW= /.{0,30}/g' -e 's/ =SOME= /.{0,60}/g' -e 's/ =ANY= /.*/g' \
    -e 's/=YEAR=/(19|20)[0-9][0-9][ ,-]+/g'
- The three approaches discussed for handling regex expansion can be found in this colab
- Work samples: ExtractRegex, ExtractRough, GeneratingLicenses
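For work done in Python rather than the shell, the sed substitutions listed above can be sketched as plain string replacements. The name expand_macros and the sample input are assumptions made for illustration; the macro strings and their expansions are copied from the sed command.

```python
import re

# Macro -> expansion pairs, copied from the sed command above.
MACROS = [
    (" =FEW= ", ".{0,30}"),
    (" =SOME= ", ".{0,60}"),
    (" =ANY= ", ".*"),
    ("=YEAR=", "(19|20)[0-9][0-9][ ,-]+"),
]


def expand_macros(pattern: str) -> str:
    """Replace each STRINGS.in macro with its regex expansion (hypothetical helper)."""
    for macro, expansion in MACROS:
        pattern = pattern.replace(macro, expansion)
    return pattern


# Made-up sample input, not taken from STRINGS.in.
expanded = expand_macros("Copyright =YEAR=by =FEW= Inc")
print(expanded)
print(bool(re.search(expanded, "Copyright 2020, by Example Inc")))
```

Plain str.replace is enough here because neither the macros nor the sed replacement strings contain characters that sed treats specially.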
Conclusion and Further Plans
- Work on the third approach and generate words that follow a sequence.
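One possible reading of "words that follow a sequence" is a bigram model, where each generated word is chosen from the words seen after the previous one in sample license text. The sketch below assumes that reading; the notes do not name the library or technique to be used, and build_bigrams, generate, and the sample sentences are hypothetical.

```python
import random
from collections import defaultdict


def build_bigrams(texts):
    """Map each word to the list of words that follow it in the sample texts."""
    followers = defaultdict(list)
    for text in texts:
        words = text.lower().split()
        for current, nxt in zip(words, words[1:]):
            followers[current].append(nxt)
    return followers


def generate(followers, start, max_words, rng):
    """Walk the bigram table from `start`, stopping at a dead end or max_words."""
    words = [start]
    while len(words) < max_words and followers.get(words[-1]):
        words.append(rng.choice(followers[words[-1]]))
    return " ".join(words)


# Made-up sample sentences standing in for extracted license text.
samples = [
    "permission is hereby granted free of charge",
    "permission is granted to copy and distribute",
]
table = build_bigrams(samples)
print(generate(table, "permission", 6, random.Random(1)))
```

Every adjacent pair in the output occurs somewhere in the samples, so the text stays locally plausible even though the sentence as a whole is new.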