Coding Week-1 Meeting

Attendees

Discussions

Three approaches to handling regex expansion were discussed.
The first approach was to fill each .{0,30} span with random characters. This was discarded because random characters give the generated text no semantic meaning.
The second approach was to skip the character-generation portion entirely. This was also discarded because it would hamper similarity-matching algorithms that rely on distances.
The third approach is to generate meaningful sentences using a Python library and a vocabulary drawn from licenses.
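The difference between the first and third approaches can be sketched roughly as follows. This is only an illustration, not the code from the colab: LICENSE_VOCAB, fill_random, fill_words and expand are hypothetical names, the vocabulary is a tiny hand-written stand-in, and the word filler picks words independently instead of producing real sentences.

```python
import random
import re
import string

# Hypothetical stand-in for a vocabulary collected from license texts.
LICENSE_VOCAB = ["license", "copyright", "permission", "software",
                 "granted", "distribute", "warranty", "notice"]

# Matches bounded wildcards such as .{0,30} or .{0,60}.
BOUNDED_ANY = re.compile(r"\.\{(\d+),(\d+)\}")


def fill_random(max_len, rng):
    """Approach 1: random characters, which carry no semantic meaning."""
    n = rng.randint(0, max_len)
    alphabet = string.ascii_lowercase + " "
    return "".join(rng.choice(alphabet) for _ in range(n))


def fill_words(max_len, rng):
    """Approach 3 (simplified): vocabulary words that fit within max_len."""
    budget = max_len - 2  # reserve room for one space on each side
    words, length = [], 0
    while True:
        word = rng.choice(LICENSE_VOCAB)
        extra = len(word) + (1 if words else 0)
        if length + extra > budget:
            break
        words.append(word)
        length += extra
    return f" {' '.join(words)} " if words else ""


def expand(pattern, filler, seed=0):
    """Replace every bounded wildcard in pattern with generated text."""
    rng = random.Random(seed)
    return BOUNDED_ANY.sub(lambda m: filler(int(m.group(2)), rng), pattern)


if __name__ == "__main__":
    pattern = "licensed.{0,30}under the"
    print(expand(pattern, fill_random))
    print(expand(pattern, fill_words))
```

Both fillers stay within the upper bound of the wildcard, so the generated text still matches the original pattern. Only the word-based filler yields tokens that a distance-based similarity measure could treat as license-like text.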

Week 1 Progress

Extracted license headers and regexes from the STRINGS.in file using extractregex()
Saved 935 datasets of Motosoto licenses to Drive, with regex expansion not yet applied: Drive link
Regex expansion: the expansions for the special macros in the STRINGS.in file are listed here: regex_expansion

sed -e 's/ =FEW= /.{0,30}/g' -e 's/ =SOME= /.{0,60}/g' -e 's/ =ANY= /.*/g' \
    -e 's/=YEAR=/(19|20)[0-9][0-9][ ,-]+/g'
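For use inside a Python notebook, the sed substitutions above could be mirrored with plain string replacement. This is a minimal sketch: MACROS and expand_macros are hypothetical names, and the sample line is made up for illustration, not taken from STRINGS.in.

```python
import re

# Macro -> regex fragment, mirroring the sed substitutions above.
# =FEW=, =SOME= and =ANY= are replaced together with their surrounding
# spaces, exactly as in the sed expressions; =YEAR= has no such padding.
MACROS = [
    (" =FEW= ", ".{0,30}"),
    (" =SOME= ", ".{0,60}"),
    (" =ANY= ", ".*"),
    ("=YEAR=", "(19|20)[0-9][0-9][ ,-]+"),
]


def expand_macros(line):
    """Expand the special macros in one line into regex syntax."""
    for macro, fragment in MACROS:
        line = line.replace(macro, fragment)
    return line


if __name__ == "__main__":
    # Made-up example line, for illustration only.
    pattern = expand_macros("Copyright =YEAR=by =SOME= Inc")
    print(pattern)
    print(bool(re.search(pattern, "Copyright 2019, by Example Software Inc")))
```

Plain str.replace is enough here because the macros are literal strings; the expanded result is what then gets compiled as a regex.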

The three approaches discussed for handling regex expansion can be found in this colab
Work samples: ExtractRegex, ExtractRough, GeneratingLicenses

Conclusion and Further Plans

Work on the third approach and generate words that follow a sequence, so that the expanded text reads as meaningful sentences.
