Skip to main content

Week 1

(May 31, 2024 - June 6, 2024)

Meeting 1

(June 5, 2024)

Attendees

Discussions

Discussed unified diff format to populate the data fetched from the Github and Gitlab API's
We also discussed after extraction of the content in unified diff format, how will we extract the line number from it.
We discussed potential risks that we had to keep in mind before approaching this:
- The scanner results should give required info for searching line number.
- The scanner results should not be affected by this.

Updates

Came across this thread on stackoverflow. Used this gawk command as a reference and wrote a python script to convert the api content into unified diff format.
Create a new class FormatResult to handle all the formatting of the results and diff content.
Also, created a function to extract the line number from the formatted diff content.
Tested both the scripts extensively and all cover potential edge cases.

Planning for next week

Use the script on the diff content and try to find the line number for copyright and keyword scanners.
Add relevant byte info to the JSON output of nomos scanner.
Figure out what to do for repo scans.

Meeting 1
Attendees
Discussions
Updates
Planning for next week