Coding Week-1 Meeting

Attendees

Discussions

  1. Approaches for handling regex expansion were discussed.
  2. The first approach was to generate random characters for patterns such as .{0,30}. It was discarded because random characters give the generated text no semantic meaning.
  3. The second approach was to skip character generation entirely. It was also discarded because it would hamper similarity-matching algorithms based on distances.
  4. The third approach is to generate meaningful sentences using a Python library and the license vocabulary.
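As a rough illustration of how the three approaches differ when filling a gap such as .{0,30}, here is a minimal Python sketch. The function names and the small `LICENSE_VOCAB` list are assumptions made for this example; they are not taken from the project's colab, and a real vocabulary would be built from the license texts themselves.

```python
import random
import string

# Hypothetical mini-vocabulary; the real one would come from license texts.
LICENSE_VOCAB = ["software", "license", "permission", "copyright",
                 "granted", "distribute", "warranty", "notice"]

def fill_random_chars(rng, max_len=30):
    """Approach 1: random characters, up to max_len long (no semantic meaning)."""
    length = rng.randint(0, max_len)
    alphabet = string.ascii_lowercase + " "
    return "".join(rng.choice(alphabet) for _ in range(length))

def fill_nothing():
    """Approach 2: skip generation, so the gap collapses to an empty string."""
    return ""

def fill_vocab_words(rng, max_len=30):
    """Approach 3: vocabulary words joined by spaces, kept within max_len."""
    words = []
    while True:
        word = rng.choice(LICENSE_VOCAB)
        if len(" ".join(words + [word])) > max_len:
            return " ".join(words)
        words.append(word)

rng = random.Random(0)  # seeded only to make the demo repeatable
print(repr(fill_random_chars(rng)))
print(repr(fill_nothing()))
print(repr(fill_vocab_words(rng)))
```

The first filler produces noise, the second shortens the text (which shifts distance-based similarity scores), and the third keeps the length bound while using words that plausibly occur in licenses.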

Week 1 Progress

  1. Extracted license headers and regexes from the STRINGS.in file using extractregex().
  2. Saved 935 datasets of Motosoto licenses to Drive; regex expansion is not yet applied to them: Drive link
  3. Regex expansion: the special macros in the STRINGS.in file expand as follows: regex_expansion

sed -e 's/ =FEW= /.{0,30}/g' -e 's/ =SOME= /.{0,60}/g' -e 's/ =ANY= /.*/g' -e 's/=YEAR=/(19|20)[0-9][0-9][ ,-]+/g'
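The same substitutions can be sketched in Python, which may be convenient if the expansion needs to run inside the notebook. The `expand_macros` helper and the sample pattern are assumptions for illustration only; the macro-to-regex pairs are copied from the sed command.

```python
import re

# Macro -> regex replacements, mirroring the sed expressions.
# The spaces around =FEW=, =SOME= and =ANY= are part of the match, as in sed.
MACRO_EXPANSIONS = [
    (" =FEW= ", ".{0,30}"),
    (" =SOME= ", ".{0,60}"),
    (" =ANY= ", ".*"),
    ("=YEAR=", "(19|20)[0-9][0-9][ ,-]+"),
]

def expand_macros(pattern: str) -> str:
    """Replace every STRINGS.in macro in `pattern` with its regex expansion."""
    for macro, expansion in MACRO_EXPANSIONS:
        pattern = pattern.replace(macro, expansion)
    return pattern

# Hypothetical pattern, not taken from STRINGS.in.
expanded = expand_macros("copyright =YEAR=by =FEW= all rights reserved")
print(expanded)

# The expanded pattern is an ordinary regex and can be matched directly.
sample = "copyright 2019 by Jane Doe all rights reserved"
print(bool(re.search(expanded, sample)))
```

Plain string replacement is enough here because the macros are literal tokens; `re` is only used to check the expanded pattern against sample text.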

  4. The three approaches discussed for handling regex expansion can be found in this colab.
  5. Work samples: ExtractRegex, ExtractRough, GeneratingLicenses

Conclusion and Further Plans

  1. Work on the third approach and generate words that follow a sequence.
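One possible reading of "words that follow a sequence" is a bigram model, where each generated word is chosen from the words seen directly after the previous one. The sketch below assumes that reading; the corpus string and both function names are hypothetical, and the notes do not say which library or method will actually be used.

```python
import random
from collections import defaultdict

def build_bigram_table(text):
    """Map each word to the list of words observed directly after it."""
    words = text.lower().split()
    table = defaultdict(list)
    for current, following in zip(words, words[1:]):
        table[current].append(following)
    return table

def generate_sequence(table, start, max_words, rng):
    """Walk the bigram table from `start`, stopping at max_words or a dead end."""
    sequence = [start]
    while len(sequence) < max_words and table.get(sequence[-1]):
        sequence.append(rng.choice(table[sequence[-1]]))
    return " ".join(sequence)

# Hypothetical corpus; in practice this would be built from license texts.
corpus = ("permission is hereby granted free of charge to any person "
          "obtaining a copy of this software and associated documentation files")
table = build_bigram_table(corpus)
print(generate_sequence(table, "permission", 6, random.Random(0)))
```

Because every generated word pair was observed in the corpus, the filler text stays closer to real license wording than independently sampled vocabulary words would.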