Google Summer of Code Proposals 2022
Welcome to the main page for all GSoC 2022 related information.
Intro
We, the FOSSology project, would like to apply for GSoC. Please see the two main resources for finding out more about FOSSology in general:
Meetings: Check out the Meetings table
Interested in Application? - Getting Grip
If you are interested in applying, great! We encourage your application. The question is how to get started with the topic. Just a few points:
- Check https://www.fossology.org and these GitHub pages https://github.com/fossology/fossology/wiki
- Maybe check some initial intro at https://github.com/fossology/fossology/wiki/New-at-FOSSology%2C-You-Could-...
- Try to install fossology, either using vagrant or docker
- Check out our YouTube video for installation from source: https://youtu.be/q12KwmPYZG4
- Read the proposed topics
- Use the mailing list fossology-devel@fossology.org or contact proposed mentors for further steps
- Slack invite link
- GitHub discussion
- If you are interested in trying to make contributions, see our issues with the label good first issue. Maybe you could sort out the workflow and make a first pull request.
Examples from past programs
In 2020, we were awarded seven slots. Please see the results here:
- Ayush and Kaushlendra's work on the Atarashi license scanner and Nirjas
- Darshan's work on Dashboard: https://github.com/darshank15/GSoC_2020_FOSSOlogy/wiki
Also, and this was a lot of fun, some YouTube videos were created:
- Ayush made an interview-style YouTube video about his experience: https://youtu.be/C8f_etew-yc
- Hypnos invited Darshan for an interview: https://youtu.be/_KbQ83JK7Q0
In 2021, the GSoC program awarded the FOSSology project 7 slots. The 2021 program was a lot bigger and a lot of fun, so a dedicated page has been set up. Please see the GSoC works here.
From this page you can also get an idea about the work being carried out: check the weekly reporting, for example for the UI project.
Mentors
Interested in becoming a mentor? Please reach out to us!
Proposals so far:
- Shruti Agarwal for FOSSologyUI
- Kaushlendra Pratap
- Anupam Ghosh
- Gaurav Mishra
- Ayush Bhardwaj
- Shaheem Azmal M MD
- Nicolas Toussaint
- Avinal Kumar
- Vasudev Maduri
- Vivek Kumar
- Sahil Jha
Topic Proposals
Please reach out to us to add more proposals for GSoC 2022.
Discussion is currently happening at https://github.com/fossology/fossology/discussions/2140
Topic Proposals from 2022
- SPDX naming updates and reporting
- REST API and UI improvements
- Integrating Open Source Review Toolkit
- Adopting REUSE standards in FOSSology
- Improving FOSSology CI scanner image
- Enhancement with ClearlyDefined.io (spasht)
- Compatibility for PHP-8
- Introduce concept of project in FOSSology
- Improve Minerva OSS Dataset and implement models for Atarashi
- Overhauling scheduler design
- Debian packaging for Debian repository
SPDX naming updates and reporting
Goal: Update SPDX license names and support for report formats
- Updating SPDX license names to new naming convention. Resource
- Fixing issues with SPDX RDF (see notes below).
- Other SPDX format reports, five file formats possible:
  - Tag/value (.spdx): exists
  - JSON (.spdx.json): can be implemented
  - YAML (.spdx.yml): can be implemented
  - RDF/XML (.spdx.rdf): exists
  - Spreadsheets (.xls): need to understand if helpful
  - https://github.com/fossology/fossology/issues/1379
References:
- See https://github.com/fossology/fossology/pull/2133 for sample implementation of simple formats
- See https://github.com/spdx/spdx-spec/tree/development/v2.2.2/examples
- Related comments are available on discussion https://github.com/fossology/fossology/discussions/2140#discussioncomment-1939857
Category | Rating |
---|---|
Low Hanging Fruit | *** |
Risk/Exploratory | ** |
Fun/Peripheral | ** |
Core Development | *** |
Project Infrastructure | *** |
Possible mentors | Gaurav, Shaheem, Michael |
Project size | 175 hours |
Preferred contributor | Student |
Notes from SPDX DocFest:
- File types are sometimes OTHER vs SOURCE
- Originator/Supplier missing
- Download Location is NOASSERTION
- Package verification code is wrong. It should cover all the source files
- Filename should start with './' https://spdx.github.io/spdx-spec/package-information/#74-package-file-name-field
- Should rename file name as per https://spdx.github.io/spdx-spec/conformance/#44-standard-data-format-requirements
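Two of the notes above, the './' prefix and the package verification code, can be illustrated with a minimal Python sketch. It assumes the algorithm described in the SPDX 2.2 specification (SHA-1 of every file, the hex digests sorted and concatenated, then SHA-1 of the result). The function names and the in-memory `files` mapping are hypothetical and are not FOSSology code.

```python
import hashlib


def spdx_file_name(relative_path: str) -> str:
    """Return the path prefixed with './' as the notes above require."""
    cleaned = relative_path.lstrip("/")
    return cleaned if cleaned.startswith("./") else "./" + cleaned


def package_verification_code(files: dict) -> str:
    """Compute a verification code over all files of a package.

    `files` maps a path to the file content as bytes (hypothetical input).
    """
    # SHA-1 of each file, sorted so the result does not depend on file order.
    digests = sorted(hashlib.sha1(content).hexdigest() for content in files.values())
    # SHA-1 over the concatenated hex digests, without separators.
    return hashlib.sha1("".join(digests).encode("ascii")).hexdigest()
```

Because the digests are sorted before hashing, the code only matches when every source file of the package is included, which is the point of the note above.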
REST API and UI improvements
Goal: Bringing new FOSSologyUI towards completion
- Continue work on React repo.
- Bring in new features.
- New design patterns.
- Improve REST API and expose more endpoints
- https://github.com/fossology/fossology/issues?q=is%3Aissue+is%3Aopen+REST
- https://github.com/Shruti3004/FOSSology-REST-API/issues
Category | Rating |
---|---|
Low Hanging Fruit | - |
Risk/Exploratory | * |
Fun/Peripheral | *** |
Core Development | ** |
Project Infrastructure | *** |
Possible mentors | Shruti, Shaheem, Sahil, Vivek |
Project size | 350 hours |
Preferred contributor | Student/Professional |
Integrating Open Source Review Toolkit
Goal: Using ORT to fetch dependencies and generate SBOM
Build systems fetch the required dependencies (libraries/artifacts) for a project while building it. It's important to gain insight into these dependencies for license compliance checks.
The OSS Review Toolkit (ORT) is an open source project that helps to find the dependencies of a project.
The goal of this project is to render the project dependencies found by ORT and display them in the FOSSology UI. Dependencies can then be scheduled directly from the UI and scanned with FOSSology.
Alternative: oss-review-toolkit/ort#2694
Category | Rating |
---|---|
Low Hanging Fruit | - |
Risk/Exploratory | - |
Fun/Peripheral | ** |
Core Development | *** |
Project Infrastructure | * |
Possible mentors | Gaurav, Shaheem, Avinal, Michael |
Project size | 350 hours |
Preferred contributor | Student/Professional |
Adopting REUSE standards in FOSSology
Goal: Adopting REUSE.software specs in FOSSology codebase
Copyright and licensing are difficult, especially when reusing software from different projects that are released under various licenses. The REUSE recommendations make it easier for you to declare the licenses under which your works are released, and they also make it easier for a computer to understand how your project is licensed. The specification defines a standardized method for declaring copyright and licensing for software projects. REUSE also helps in creating a bill of materials with just one simple command.
FOSSology currently uses old methods of defining licenses on source files, which can lead to some ambiguity. Following the REUSE specs, the code base of FOSSology should be updated to the new licensing format.
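As an illustration of the new format, REUSE-style headers declare copyright and license per file with the SPDX-FileCopyrightText and SPDX-License-Identifier tags. The following is a minimal Python sketch of checking a file's text for both tags. The helper name and the sample header are hypothetical, and the official reuse tool performs a far more thorough check.

```python
import re

# Tags used by the REUSE specification for per-file declarations.
COPYRIGHT_TAG = re.compile(r"SPDX-FileCopyrightText:[ \t]*\S+")
LICENSE_TAG = re.compile(r"SPDX-License-Identifier:[ \t]*\S+")


def has_reuse_header(text: str) -> bool:
    """Return True if the text carries both a copyright and a license tag."""
    return bool(COPYRIGHT_TAG.search(text)) and bool(LICENSE_TAG.search(text))


# Hypothetical header of a PHP source file.
sample = """<?php
/*
 SPDX-FileCopyrightText: 2022 Example Contributor <contributor@example.com>

 SPDX-License-Identifier: GPL-2.0-only
*/
"""
print(has_reuse_header(sample))
```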
Note: On its own, the project is too small to be a GSoC project. We can combine it with other tasks, like working on issue #1592
Category | Rating |
---|---|
Low Hanging Fruit | ** |
Risk/Exploratory | * |
Fun/Peripheral | * |
Core Development | * |
Project Infrastructure | *** |
Possible mentors | Gaurav, Michael, Nicolas |
Project size | 175 hours |
Preferred contributor | Student |
Improving FOSSology CI scanner image
Goal: Enhancing current scanner image with new reports and features
As a fun project, FOSSology started combining scanners in a simple and small Docker image which can be run on CI providers. The image can currently detect the build environment (GitLab/GitHub Actions/Travis) and use the provider's API to fetch the diff of a branch or scan the complete repo. Its capabilities include license scanning with the Nomos and ojo scanners, and copyright and keyword scanning with the respective scanners. The image uses a Python script to perform all the tasks.
- The script, however, currently prints the report in text format on the console and generates the same as an artifact. It can be improved by generating reports in other formats like SPDX reports. See their Python API.
- The integration with GitHub Actions can be improved by reporting line number where a license violation is found.
- Allowing the user to provide a different list of keywords for scanning (currently stored at /usr/local/share/fossology/keyword/agent/keyword.conf).
- Improving the whitelist format with a feature to provide it from other sources; currently it is read from a file which is expected to be in the root of the repo being scanned.
- Create a server backed image
- There will be a server running on a remote machine.
- The CI script will push the package to the server for scan with auto conclusion.
- The package will become a version of a project.
- The CI script downloads the report and makes it available as an artifact.
- If there are unidentified licenses left, provide a link for manual review.
- This can be extended to integrate with ticket management systems like Jira and Redmine.
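The line-number reporting for GitHub Actions mentioned above could build on workflow commands, which GitHub turns into annotations when a job prints them. This is a minimal Python sketch: the `Finding` structure and the message text are hypothetical and do not reflect the scanner script's actual data model.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """A single scanner result (hypothetical structure for illustration)."""
    path: str
    line: int
    license_name: str


def to_annotation(finding: Finding) -> str:
    """Format a finding as a GitHub Actions `::error` workflow command."""
    message = f"License {finding.license_name} found but not allowed"
    return f"::error file={finding.path},line={finding.line}::{message}"


# Printing the command inside a GitHub Actions job attaches it to the file and line.
for item in [Finding("library/zip/pclzip.lib.php", 12, "LGPL-2.1-or-later")]:
    print(to_annotation(item))
```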
Resources:
- Current documentation: https://gitlab.com/GMishx/fossology/-/merge_requests
- Sample outputs: https://gitlab.com/GMishx/fossology/-/merge_requests
- Sample implementation for GHA: https://github.com/GMishx/fossology-scan
Category | Rating |
---|---|
Low Hanging Fruit | - |
Risk/Exploratory | ** |
Fun/Peripheral | *** |
Core Development | ** |
Project Infrastructure | *** |
Possible mentors | Gaurav, Anupam |
Project size | 350 hours |
Preferred contributor | Student/Professional |
Enhancement with ClearlyDefined.io (spasht)
Goal: Contribute compliance metadata back to ClearlyDefined.io and community
ClearlyDefined is a project for collecting metadata about published software. This metadata helps, among other things, in achieving OSS license compliance. More info can be found at:
- Docs -- https://docs.clearlydefined.io
- GitHub -- https://github.com/clearlydefined/clearlydefined
- API Docs -- https://api.clearlydefined.io/api-docs/#/curations/patch_curations
- Discord -- http://discord.gg/wEzHJku
- Twitter -- http://twitter.com/clearlydefd
The spasht agent already pulls data from ClearlyDefined. The following enhancements are required:
- Fetch the main license of the package.
- What to do if package does not exist on ClearlyDefined.io
- Push the curated data back to ClearlyDefined.io
- Current understanding: Send a patch request to PATCH /curations; in response, there will be a GitHub prNumber and url. Sample request body:
{
"contributionInfo": {
"summary": "title (100 char)",
"details": "What's the problem",
"resolution": "What's fixed and how",
"type": "missing/incorrect/incomplete/ambiguous/other",
"removeDefinitions": false
},
"patches": [
{
"coordinates": {"type": "composer", "provider": "packagist",
"namespace": "athome", "name": "odtphp", "revision": "1.5"},
"revisions": {
"1.5": {
"files": [
{
"path": "Listeur-odtphp-9f31202/library/zip/pclzip/pclzip.lib.php",
"license": "LGPL-2.1-or-later",
"attributions": [
"123123"
]
}
]
}
}
}
]
}
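The request described above could be prepared as follows. This is a minimal Python sketch, assuming the endpoint lives at https://api.clearlydefined.io/curations (host taken from the API docs link above) and that the response is JSON carrying the prNumber and url fields noted above. The function names are illustrative, and authentication is not covered.

```python
import json
import urllib.request

# Assumed endpoint: API docs host from above plus the PATCH /curations path.
CURATIONS_URL = "https://api.clearlydefined.io/curations"


def build_curation_request(payload: dict) -> urllib.request.Request:
    """Build (but do not send) a PATCH request carrying a curation payload."""
    return urllib.request.Request(
        CURATIONS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


def parse_curation_response(body: bytes) -> tuple:
    """Extract the pull request number and URL from a response body."""
    data = json.loads(body)
    return data["prNumber"], data["url"]
```

Sending the request would be a call to `urllib.request.urlopen(request)`, with the returned body passed to `parse_curation_response`.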
Category | Rating |
---|---|
Low Hanging Fruit | ** |
Risk/Exploratory | * |
Fun/Peripheral | ** |
Core Development | ** |
Project Infrastructure | * |
Possible mentors | Gaurav, Kaushlendra |
Project size | 175 hours |
Preferred contributor | Student/Professional |
Compatibility for PHP-8
Goal: FOSSology should be compatible with the PHP-8 version
- Syntax compatibility for PHP-8
- Backward compatibility with PHP-7.2.24
- Omit all the deprecated features.
- Migrating views from plain PHP to Twig.
- Introduction of OPcache or another compatible caching mechanism
- FOSSology should run on top of PHP8
References
- See https://github.com/fossology/fossology/pull/1925 for sample implementation
- See https://github.com/fossology/fossology/pull/2107 for new dependencies
Docs:
- PHP Migration Docs: https://www.php.net/manual/en/migration80.php
- Twig Docs: https://twig.symfony.com/doc/3.x/
Category | Rating |
---|---|
Low Hanging Fruit | * |
Risk/Exploratory | ** |
Fun/Peripheral | * |
Core Development | *** |
Project Infrastructure | *** |
Possible mentors | Shaheem, Kaushlendra, Avinal |
Project size | 175 hours |
Preferred contributor | Student |
Good to have: Improving the MVC architecture of FOSSology using Symfony. Symfony is currently used purely for dependency injection, but it can do a lot more. Introducing these capabilities and making the architecture more stable is a good extension to the project.
Introduce concept of project in FOSSology
Goal: Gather uploads under identified projects
Originally described in #1738.
A project would be defined by:
- a project ID (as primary key)
- a project name
- a root folder in Fossology
- one or more Group IDs and associated access.
- a URL / ID / Free field to link the project to an external tool
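The definition above can be sketched as a small data model. This is an illustrative Python sketch only: the field names, types, and the access-level strings are assumptions, not an agreed schema.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Project:
    """Hypothetical in-memory model of the proposed project entity."""
    project_id: int                       # primary key
    name: str                             # project name
    root_folder_id: int                   # root folder in FOSSology
    group_access: Dict[int, str] = field(default_factory=dict)  # group ID -> access level
    external_link: Optional[str] = None   # URL / ID / free field for an external tool

    def can_access(self, group_id: int) -> bool:
        """Return True if the group has any access to the project."""
        return group_id in self.group_access


foobar = Project(project_id=1, name="FooBar", root_folder_id=42,
                 group_access={3: "admin"},
                 external_link="https://tracker.example.com/FOOBAR")
print(foobar.can_access(3))
```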
This could be used, for example:
- to gather multiple uploads under the same umbrella,
- and make bulk changes to all uploads / sub-directories that belong to a given project
- reuse (cleared) findings from previous scan of same Project (or same component)
- to associate a link to a ticketing system or project management solution
- to produce metrics with Work-In-Progress dashboard
In terms of UI integration:
- Add an optional selection of project for new uploads
- Add pages to list, edit, delete projects (in Organize or Admin menu)
- Display the project name in the "Folder Navigation" window, and in the yellow band
Other extensions:
- inside Projects, create Components
- have a proper name
- are versioned
- leverage the use of Fossology tags
Practical use example:
- I have a new project FooBar
- This project is composed of 3 components: Front-end, Back-end and Mobile application
- I want uploads to be stored in folders projects/FooBar/Front-end, ...
- I want to be able to give permissions for all Folders and Uploads for that Project
- I want to automatically reuse findings from previous scans of same components
Category | Rating |
---|---|
Low Hanging Fruit | * |
Risk/Exploratory | ** |
Fun/Peripheral | * |
Core Development | *** |
Project Infrastructure | * |
Possible mentors | Nicolas, Anupam |
Project size | 175 hours |
Preferred contributor | Student |
Improve Minerva OSS Dataset and implement models for Atarashi
Goal: To implement a semantic text model for finding out OSS license similarity with best accuracy
OSS Dataset Repository: Minerva Dataset Generation; Atarashi Repository: FOSSology/Atarashi
In GSoC 2021 @FOSSology we created our initial OSS license dataset, "Minerva Dataset Generation", which can be used to build machine learning models for license detection in Atarashi. We now plan to train a few machine learning models on this dataset for the Atarashi use case (OSS license detection). Open source license texts often differ by only a few tokens, yet those tokens change the meaning of the text. Semantic similarity models might therefore be a good place to start, but we are open to discussing any other models that fit our use case.
Tasks:
- Improve Minerva Dataset Generation for more accurate license files
- Research and suggest various ML/DL models to be implemented for Atarashi (open for discussion)
- Implement the best model discussed for Atarashi
- Improve the performance
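To illustrate why license texts that differ by only a few tokens are hard to separate, here is a minimal Python sketch using a plain bag-of-words cosine similarity. It is not one of Atarashi's agents and not a semantic model, and the two snippets are made-up paraphrases rather than real license texts.

```python
import math
import re
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between simple token-count vectors of two texts."""
    a = Counter(re.findall(r"[a-z0-9]+", text_a.lower()))
    b = Counter(re.findall(r"[a-z0-9]+", text_b.lower()))
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Two snippets with different licensing meaning but nearly identical wording.
only = "This program is free software; you can redistribute it under version 2 of the License only."
later = "This program is free software; you can redistribute it under version 2 of the License, or any later version."

print(round(cosine_similarity(only, later), 2))
```

The score stays high even though the two snippets grant different rights, which is the kind of case a model for Atarashi has to tell apart.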
Category | Rating |
---|---|
Low Hanging Fruit | ** |
Risk/Exploratory | * |
Fun/Peripheral | *** |
Core Development | ** |
Project Infrastructure | * |
Possible mentors | Anupam, Ayush, Kaushlendra, Vasudev |
Project size | 175 hours |
Preferred contributor | Student/Professional |
Overhauling scheduler design
Goal: Improving FOSSology scheduler or replacing with OTS solution
The existing scheduler design is causing new issues which need to be addressed. Moreover, the existing scheduler design has not been touched in years.
Concerning points
- The scheduler is written in C, which makes it next to impossible to find the cause of a failure.
- The C language does not support exception handling out of the box. This makes the code less readable and prone to errors.
- The linear queue design causes issues when there should be only one instance of an agent running for an upload, but the agent is not mutually exclusive overall.
For example, if monkbulk has a limit set to 1, the limit should apply to a single upload only. But with a linear queue, this monkbulk job will block all other agents from executing even when they are not affected by the results of monkbulk.
This essentially makes the agent mutually exclusive even though there is a special flag EXCLUSIVE for the very same purpose: https://github.com/fossology/fossology/wiki/Job-Scheduler#agentconfs
- One idea for redesigning the queue: it can be broken into buckets per upload, each maintaining its own priority queue. There can be another queue for global operations like maintenance, delagent, etc.
- Doing so, the buckets can be traversed in round-robin order, picking the first pending job of each and checking it against the host limit. This will eliminate the scenario mentioned in point 3. Also, exclusive agents can be sent to the global queue.
upload specific queue
|-<upload_2> -> nomos, copyright, ojo, keyword
|-<upload_3> -> monkbulk, decider, monkbulk, decider
|-<upload_4> -> reuser, decider
global queue
-> delagent,
- Since FOSSology was first released, a number of new scheduling libraries have appeared which need to be explored. They can be a nice addition to the project.
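The bucket idea above can be sketched in a few lines of Python. This is a simplified illustration rather than a proposed design: each bucket is a FIFO queue instead of a priority queue, the host limit is reduced to a count of free slots, dependencies between jobs are ignored, and all names are hypothetical.

```python
from collections import deque


class BucketedQueue:
    """Per-upload job buckets plus one global queue (illustrative only)."""

    def __init__(self):
        self.buckets = {}            # upload ID -> deque of pending agents
        self.order = deque()         # round-robin order of upload IDs
        self.global_queue = deque()  # maintenance, delagent, exclusive agents

    def add(self, upload_id, agent):
        """Queue an agent for one upload, creating the bucket if needed."""
        if upload_id not in self.buckets:
            self.buckets[upload_id] = deque()
            self.order.append(upload_id)
        self.buckets[upload_id].append(agent)

    def next_jobs(self, free_slots):
        """Visit the buckets round-robin and pick the first pending job of each."""
        picked = []
        for _ in range(len(self.order)):
            if len(picked) >= free_slots:
                break
            upload_id = self.order[0]
            self.order.rotate(-1)    # the next call starts at the following bucket
            jobs = self.buckets[upload_id]
            if jobs:
                picked.append((upload_id, jobs.popleft()))
        return picked


queue = BucketedQueue()
for upload_id, agents in {2: ["nomos", "copyright"], 3: ["monkbulk", "decider"], 4: ["reuser"]}.items():
    for agent in agents:
        queue.add(upload_id, agent)
queue.global_queue.append("delagent")

first = queue.next_jobs(free_slots=2)
second = queue.next_jobs(free_slots=2)
print(first, second)
```

With two free slots, the pending monkbulk job of upload 3 no longer holds back the jobs of uploads 2 and 4, because every upload is served from its own bucket.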
Category | Rating |
---|---|
Low Hanging Fruit | - |
Risk/Exploratory | ** |
Fun/Peripheral | *** |
Core Development | *** |
Project Infrastructure | * |
Possible mentors | Gaurav, Anupam, Michael |
Project size | 350 hours |
Preferred contributor | Professional |
Debian packaging for Debian repository
Goal: Improve Debian packaging and make it acceptable for APT
The existing effort to get FOSSology into the Debian package list needs to be taken forward. A repository on Debian Salsa was set up initially but is no longer maintained: https://salsa.debian.org/fossology-team/fossology
It is configured to use gbp.
Blockers
- The Debian building mechanism does not allow installation from sources other than apt. The Composer packages need to be packed as Debian packages and shipped with FOSSology.
- Packaging and shipping other tools needs to satisfy their licensing terms.
- The versions of packages in APT and actual versions used are different.
- APT also provides JS libraries like jQuery and DataTables but RHL does not.
See also
- https://github.com/fossology/fossology/pull/2075
- https://wiki.debian.org/PackagingWithGit
- https://wiki.debian.org/SimplePackagingTutorial
- https://wiki.debian.org/Diagrams
- https://wiki.debian.org/PHP
- https://peertube.debian.social/videos/watch/0fb2dbc4-f43d-477e-8b14-20c426f970de
Category | Rating |
---|---|
Low Hanging Fruit | * |
Risk/Exploratory | ** |
Fun/Peripheral | *** |
Core Development | * |
Project Infrastructure | *** |
Possible mentors | Gaurav, Michael |
Project size | 175 hours |
Preferred contributor | Student/Professional |