Week 1
(May 31, 2024 - June 6, 2024)
Meeting 1
(June 5, 2024)
Attendees
- Rajul Jha
- Gaurav
- Kaushlendra
- Shaheem Azmal
- Avinal Kumar
- Katharina
Discussions
- Discussed using the unified diff format to represent the data fetched from the GitHub and GitLab APIs.
- Discussed how to extract line numbers from the content once it is in unified diff format.
- Discussed the potential risks to keep in mind before taking this approach:
  - The scanner results should provide the information required to look up the line number.
  - The scanner results themselves should not be affected by this change.
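As an illustration of the line-number extraction discussed above, here is a minimal sketch of how new-file line numbers could be recovered from a unified diff by reading the hunk headers. The function name `added_lines` is hypothetical and is not the project's actual code; the sketch assumes a diff that covers a single file.

```python
import re

# Hunk header: @@ -old_start[,old_count] +new_start[,new_count] @@
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def added_lines(diff_text):
    """Return (new_file_line_number, text) for every added line.

    Assumption: diff_text is a unified diff for a single file.
    """
    results = []
    new_line = None  # stays None until the first hunk header is seen
    for line in diff_text.splitlines():
        match = HUNK_HEADER.match(line)
        if match:
            # Restart the counter at the new-file start of this hunk.
            new_line = int(match.group(3))
            continue
        if new_line is None:
            continue  # file headers (---/+++) before the first hunk
        if line.startswith("+"):
            results.append((new_line, line[1:]))
            new_line += 1
        elif line.startswith("-") or line.startswith("\\"):
            # Removed lines and "\ No newline at end of file" markers
            # do not occupy a line in the new file.
            continue
        else:
            new_line += 1  # context line, present in the new file
    return results
```

Only added and context lines advance the counter, so the reported numbers refer to positions in the new version of the file.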
Updates
- Came across this thread on Stack Overflow. Used this gawk command as a reference and wrote a Python script to convert the API content into unified diff format.
- Created a new class `FormatResult` to handle all the formatting of the results and diff content.
- Also created a function to extract the line number from the formatted diff content.
- Tested both scripts extensively; they cover the potential edge cases.
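For readers unfamiliar with the unified diff format, the following sketch shows one way to produce it in Python. It uses `difflib.unified_diff` from the standard library rather than the gawk-based approach referenced above, so it is not the script described in these notes. The helper name `to_unified_diff` and the default path are hypothetical, and the sketch assumes the old and new file contents are available as strings.

```python
import difflib


def to_unified_diff(old_text, new_text, path="file.txt"):
    """Build a unified diff string from two versions of a file.

    Assumption: both versions are available as complete strings.
    """
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    diff = difflib.unified_diff(
        old_lines,
        new_lines,
        fromfile="a/" + path,
        tofile="b/" + path,
    )
    return "".join(diff)


if __name__ == "__main__":
    print(to_unified_diff("a\nb\n", "a\nc\n"))
```

The output starts with the `---` and `+++` file headers, followed by hunks introduced by `@@` headers that carry the starting line numbers.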
Planning for next week
- Use the script on the diff content and try to find the line number for copyright and keyword scanners.
- Add the relevant byte info to the JSON output of the nomos scanner.
- Figure out what to do for repo scans.
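If the scanner output ends up reporting a byte offset for each match, that offset could be mapped to a line number roughly as sketched below. This is only an assumption about how the planned byte info might be used; the function name `byte_offset_to_line` is hypothetical and nothing here reflects the actual nomos JSON schema.

```python
def byte_offset_to_line(content: bytes, offset: int) -> int:
    """Return the 1-based line number containing the given byte offset.

    Assumption: offset is a 0-based byte position into content.
    """
    if not 0 <= offset <= len(content):
        raise ValueError("offset is outside the content")
    # Each newline before the offset starts a new line.
    return content.count(b"\n", 0, offset) + 1
```

Counting on the raw bytes instead of decoded text keeps the result correct for files with multi-byte characters.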