Week 1
(May 31, 2024 - June 6, 2024)
Meeting 1
(June 5, 2024)
Attendees
- Rajul Jha
 - Gaurav
 - Kaushlendra
 - Shaheem Azmal
 - Avinal Kumar
 - Katharina
 
Discussions
- Discussed unified diff format to populate the data fetched from the Github and Gitlab API's
 - We also discussed after extraction of the content in unified diff format, how will we extract the line number from it.
 - We discussed potential risks that we had to keep in mind before approaching this:
- The scanner results should give required info for searching line number.
 - The scanner results should not be affected by this.
 
 
Updates
- Came across this thread on stackoverflow. Used this gawk command as a reference and wrote a python script to convert the api content into unified diff format.
 - Create a new class 
FormatResultto handle all the formatting of the results and diff content. - Also, created a function to extract the line number from the formatted diff content.
 - Tested both the scripts extensively and all cover potential edge cases.
 
Planning for next week
- Use the script on the diff content and try to find the line number for copyright and keyword scanners.
 - Add relevant byte info to the JSON output of nomos scanner.
 - Figure out what to do for repo scans.