
Google Summer of Code Proposals 2025

Welcome to the main page for all GSoC 2025 related information.

Intro

We, the FOSSology project, would like to apply for GSoC 2025. Please see the two main resources for finding out more about FOSSology in general:

Meetings: Check out the Meetings table

Interested in Applying? Getting a Grip

If you are interested in applying, great! We encourage your application. To get started with the topic, here are a few points:

Examples from past programs

In 2020, we were awarded seven slots. Please see the results here:

Also, just for fun, some YouTube videos were created:

In 2021, the GSoC program awarded the FOSSology project 7 slots. The 2021 program was a lot bigger and a lot of fun, so a dedicated page has been set up. Please see the GSoC works here.

From this page you can also get an idea about the work being carried out: check the weekly reporting, for example for the UI project.

You can check out our GSoC 2022 projects with 8 slots. The dedicated page can be found here.

You can check out our GSoC 2023 projects with 5 slots. The dedicated pages can be found at GSoC 2023.

Our most recent participation, GSoC 2024, concluded with 8 slots. The dedicated pages can be found at GSoC 2024.

Mentors

Interested in becoming a mentor? Please reach out to us!

Volunteers so far:

Topic Proposals

Please reach out to us to add more proposals for GSoC 2025.

The discussion is currently happening at https://github.com/fossology/fossology/discussions/2908

Topic Proposals for 2025

  1. Data pipelining for safaa project
  2. License Detection Using Large Language Models
  3. Transforming Nirjas into a Technical Documentation tool Using Large Language Models (LLMs)
  4. Overhauling scheduler design
  5. Debian packaging for Debian repository
  6. User & Developer Assistant Chatbot using Large Language Models
  7. Support text phrases and bulk based scanning for MONK a like agent
  8. Enhance atarashi ability
  9. Integrating Open Source Review Toolkit
  10. Complete microservices infrastructure for FOSSology
  11. Rewrite FOSSology UI using React
  12. FOSSology UX and UI design
  13. New single file view page to accommodate license + copyright clearing

Data pipelining for safaa project

Goal: Automate the process of model training using pipelining.

Currently, the data in the Safaa project is curated manually, and most of the workflow is manual as well. The project should concentrate on creating a pipeline, using LLMs where needed to increase accuracy and deep learning techniques to improve the model.

Write scripts that automatically copy copyright data (a group's data or selected users' data) from a FOSSology instance to train the model.

Test cases need to be provided as well.
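As a rough illustration of one pipeline stage, the sketch below cleans and deduplicates labeled copyright statements and splits them into training and test sets. The record shape `(text, label)`, the function names, and the hash-based split are assumptions made for this example; they are not taken from the Safaa code base, and fetching the data from a FOSSology instance is left out.

```python
import hashlib


def normalize(text):
    """Collapse runs of whitespace so near-identical statements compare equal."""
    return " ".join(text.split())


def prepare(records, test_percent=20):
    """Deduplicate (text, label) records and split them deterministically.

    The split hashes the cleaned text, so the same statement always lands
    in the same set, even when the pipeline is re-run on fresh exports.
    """
    seen, train, test = set(), [], []
    for text, label in records:
        clean = normalize(text)
        if not clean or clean in seen:
            continue  # skip empty rows and duplicates
        seen.add(clean)
        digest = hashlib.sha256(clean.encode("utf-8")).hexdigest()
        bucket = int(digest, 16) % 100
        (test if bucket < test_percent else train).append((clean, label))
    return train, test
```

A deterministic split keeps evaluation results comparable between runs, which matters once the training step is automated.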

Category | Rating
Low Hanging Fruit | **
Risk/Exploratory | *
Fun/Peripheral | **
Core Development | *
Project Infrastructure | **
Project size | Large
Preferred contributor | Student/professional
Skills needed | Python, ML and data
Contact | @Kaushl2208 @GMishx @shaheemazmalmmd

License Detection Using Large Language Models

Goal: To automate license detection using license dataset and ensure accurate and up-to-date results by leveraging a Retrieval-Augmented Generation (RAG) approach.

We previously tried a semantic similarity approach for license detection (#104-Atarashi), which used text processing and prompt engineering. We tried multiple LLMs on different license statement types. Visit the Weekly Reports for more performance details.

What do we want to achieve?

  • Use the SPDX or Safaa database as the source of licenses.
  • Create a RAG knowledge base so that the model understands the specifics of each license.
  • Achieve high accuracy on arbitrary license texts (the input need not be a full-fledged statement), with a confidence score if necessary.
  • The solution needs to be language agnostic.
  • Build a pipeline that fetches new license data (if available) from the SPDX database or Safaa so that the RAG knowledge base is always up to date.
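The retrieval half of such a RAG setup can be sketched with the standard library alone. The example below ranks a tiny in-memory knowledge base by cosine similarity over word counts and returns the score as a rough confidence value. The three-entry knowledge base, the function names, and the bag-of-words scoring are assumptions for illustration; a real project would more likely use embeddings and a vector store, and the generation step is not shown.

```python
import math
import re
from collections import Counter

# Illustrative knowledge base: SPDX identifier -> short excerpt of the license notice.
KNOWLEDGE_BASE = {
    "MIT": "Permission is hereby granted, free of charge, to any person "
           "obtaining a copy of this software",
    "Apache-2.0": "Licensed under the Apache License, Version 2.0 (the License); "
                  "you may not use this file except in compliance with the License",
    "GPL-2.0-or-later": "This program is free software; you can redistribute it "
                        "and/or modify it under the terms of the GNU General Public "
                        "License as published by the Free Software Foundation; either "
                        "version 2 of the License, or (at your option) any later version",
}


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, k=2):
    """Return the k best (license_id, score) pairs for a possibly partial license text."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), lic) for lic, text in KNOWLEDGE_BASE.items()]
    scored.sort(reverse=True)
    return [(lic, score) for score, lic in scored[:k]]
```

Calling `retrieve("granted free of charge to any person", k=1)` ranks the MIT entry first even though the input is only a fragment. Keeping the knowledge base up to date then amounts to refreshing the entries behind `KNOWLEDGE_BASE`.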
Category | Rating
Low Hanging Fruit | **
Risk/Exploratory | *
Fun/Peripheral | **
Core Development | *
Project Infrastructure | **
Project size | Medium/Large
Preferred contributor | Student/professional
Skills needed | Python, LLMs, Fine-tuning, Documentation
Contact | @Kaushl2208 @GMishx @shaheemazmalmmd

Transforming Nirjas into a Technical Documentation tool Using Large Language Models (LLMs)

Goal: To transform Nirjas into a comprehensive technical documentation tool using LLMs by automatically generating, improving, and structuring documentation for source code files. This will include comments, function documentation, and metadata extracted using Nirjas, ensuring consistency, clarity, and quality in technical documentation.

We have previously worked on extracting metadata and comments using regex-based approaches in Nirjas. This method provides structured results, which can also be used to generate high-quality documentation by combining LLMs with the metadata extracted by Nirjas.

What do we want to achieve?

  1. Integrate LLMs for Documentation Generation
  2. Use Existing Knowledge Sources for Training
  3. Implement a Retrieval-Augmented Generation (RAG) Approach
  4. Automatic Summarization and Quality Scoring
  5. Seamless Integration with Existing Tools
Category | Rating
Low Hanging Fruit | **
Risk/Exploratory | *
Fun/Peripheral | **
Core Development | **
Project Infrastructure | **
Project Size | Medium/Large
Preferred Contributor | Student/Professional
Skills Needed | Python, LLMs, Fine-tuning, Data Engineering, Documentation Standards
Contact | @hastagAB @GMishx @Kaushl2208

Overhauling scheduler design

Goal: Improve the FOSSology scheduler or replace it with an off-the-shelf (OTS) solution

The existing scheduler design is causing new issues that need to be addressed. Moreover, the design has not been touched in years.

Concerning points

  1. The scheduler is written in C, which makes it very hard to find the cause of a failure.
  2. The C language does not support exception handling out of the box, which makes the code less readable and more error-prone.
  3. The linear queue design causes issues when only one instance of an agent should run per upload, but the agent is otherwise not mutually exclusive.

    For example, if monkbulk has a limit of 1, that limit should apply to a single upload only. But with the linear queue, this monkbulk job blocks all other agents from executing, even when they are not affected by the results of monkbulk.

    This essentially makes the agent mutually exclusive, even though there is a special flag, EXCLUSIVE, for that very purpose: https://github.com/fossology/fossology/wiki/Job-Scheduler#agentconfs

  • One idea for redesigning the queue: break it into buckets, one per upload, each maintaining its own priority queue. There can be another queue for global operations such as maintenance, delagent, etc.
  • The scheduler can then traverse the buckets in round-robin, pick the first pending job from each, and check it against the host limit. This eliminates the scenario described in point 3. Exclusive agents can be sent to the global queue.

    upload specific queue
    |-<upload_2> -> nomos, copyright, ojo, keyword
    |-<upload_3> -> monkbulk, decider, monkbulk, decider
    |-<upload_4> -> reuser, decider

    global queue
    -> delagent,

  4. Since FOSSology was first released, a number of new scheduling libraries have appeared that need to be explored. They could be a nice addition to the project.
  5. Some work has already been done in GSoC 2024. It can be visited here.
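The bucket idea above can be sketched in a few lines. This is only an illustration of the proposed design, not the existing scheduler: the class and method names are hypothetical, a single host with one `host_limit` is assumed, each upload is assumed to run one job at a time, and a pending global job is assumed to wait until the host is idle and then run alone. The sketch is in Python for brevity, although the project lists Go as the needed skill.

```python
import heapq
from collections import deque
from itertools import count


class BucketScheduler:
    """Per-upload priority queues served round-robin, plus one global queue."""

    def __init__(self, host_limit):
        self.host_limit = host_limit  # assumed: max jobs running at once on one host
        self.buckets = {}             # upload_id -> heap of (priority, seq, agent)
        self.order = deque()          # round-robin order of uploads with pending jobs
        self.global_queue = []        # heap of (priority, seq, agent) for exclusive jobs
        self.running = set()          # upload ids with a running job; None = global job
        self._seq = count()           # tie-breaker: FIFO within equal priority

    def submit(self, agent, upload_id=None, priority=0):
        entry = (priority, next(self._seq), agent)
        if upload_id is None:
            heapq.heappush(self.global_queue, entry)
            return
        if upload_id not in self.buckets:
            self.buckets[upload_id] = []
            self.order.append(upload_id)
        heapq.heappush(self.buckets[upload_id], entry)

    def finish(self, upload_id):
        """Mark the job of this upload (or the global job, for None) as done."""
        self.running.discard(upload_id)

    def next_jobs(self):
        """One scheduling pass; returns the (upload_id, agent) pairs started."""
        started = []
        if None in self.running:
            return started            # an exclusive job owns the host
        if self.global_queue:
            if not self.running:      # host is idle, run the global job alone
                _, _, agent = heapq.heappop(self.global_queue)
                self.running.add(None)
                started.append((None, agent))
            return started            # otherwise wait for running jobs to drain
        for _ in range(len(self.order)):
            if len(self.running) >= self.host_limit:
                break
            upload_id = self.order.popleft()
            if upload_id in self.running:
                self.order.append(upload_id)   # busy upload keeps its turn for later
                continue
            heap = self.buckets[upload_id]
            _, _, agent = heapq.heappop(heap)
            if heap:
                self.order.append(upload_id)
            else:
                del self.buckets[upload_id]
            self.running.add(upload_id)
            started.append((upload_id, agent))
        return started


s = BucketScheduler(host_limit=2)
s.submit("nomos", upload_id=2)
s.submit("monkbulk", upload_id=3)
s.submit("reuser", upload_id=4)
print(s.next_jobs())  # [(2, 'nomos'), (3, 'monkbulk')]
s.finish(2)
print(s.next_jobs())  # [(4, 'reuser')]
```

In the usage lines, monkbulk is still running on upload 3 when reuser starts on upload 4, so a long-running job in one bucket no longer blocks the others.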
Category | Rating
Low Hanging Fruit | -
Risk/Exploratory | **
Fun/Peripheral | ***
Core Development | ***
Project Infrastructure | *
Project size | Large
Preferred contributor | Professional
Skills needed | Go
Contact | @GMishx @Kaushl2208 @avinal @shaheemazmalmmd

Debian packaging for Debian repository

Goal: Improve Debian packaging and make it acceptable for APT

The existing effort to get FOSSology into the Debian package list needs to be taken forward. A repository on Debian Salsa was set up initially but is no longer maintained: https://salsa.debian.org/fossology-team/fossology. It is configured to use gbp.

Blockers

  1. The Debian build mechanism does not allow installation from sources other than APT. The Composer packages need to be packaged as Debian packages and shipped with FOSSology.
  2. Packaging and shipping other tools must satisfy their licensing terms.
  3. The package versions available in APT differ from the versions actually used.
  4. APT also provides JS libraries like jQuery and DataTables, but RHL does not.

See also

Category | Rating
Low Hanging Fruit | *
Risk/Exploratory | **
Fun/Peripheral | ***
Core Development | *
Project Infrastructure | ***
Project size | Small
Preferred contributor | Student/Professional
Skills needed | Debian, APT, CMake
Contact | @GMishx @shaheemazmalmmd @Kaushl2208

User & Developer Assistant Chatbot using Large Language Models

Goal: To develop an intelligent assistant chatbot that leverages Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) techniques to provide comprehensive support for both end-users and developers of our tool. The assistant will bridge the gap between users, documentation, and the codebase to ensure an interactive and efficient problem-solving experience.

The chatbot will be designed to interactively assist new and existing users with various aspects of the tool, including:

  1. Feature Discovery:

    • Answer questions about available features, their functionalities, and usage.
    • Provide contextual information derived from the tool's wiki and feature documentation.
  2. Problem Resolution and Recommendations:

    • Assist users during the project setup phase by identifying common setup errors.
    • Provide troubleshooting steps for known issues by integrating knowledge from GitHub issues.
  3. Developer Support:

    • Answer codebase-related queries by identifying relevant classes, methods, or functions.
    • Enhance developers' understanding of the project by linking features to the corresponding implementation in the code.

The chatbot will utilize LangChain, RAG, and a Vector Database for retrieval, enabling contextual conversations. A seamless pipeline will integrate multiple data sources, including documentation, GitHub issues, and the codebase.

What We Want to Achieve:

  1. For End-Users:

    • Improved Onboarding:
      • Enable new users to quickly understand the tool's features and capabilities through interactive conversations.
    • Efficient Problem Resolution:
      • Provide real-time recommendations for known issues encountered during project setup.
      • Reduce reliance on manual troubleshooting by surfacing relevant GitHub issues.
    • Enhanced User Engagement:
      • Increase user satisfaction by offering a conversational interface that adapts to their queries and knowledge level.
  2. For Developers:

    • Codebase Exploration:
      • Allow developers to query the codebase for insights into specific classes or functions, fostering faster understanding and debugging.
    • Knowledge Consolidation:
      • Create a unified interface where feature descriptions, documentation, and implementation details converge.
  3. Broader Objectives:

    • Reduce the time spent on documentation searches.

PS: Some features align with the goal but may not be feasible in the short time frame, for example Knowledge Consolidation and Codebase Exploration. Development should nevertheless keep them in mind.

Category | Rating
Low Hanging Fruit | *
Risk/Exploratory | *
Fun/Peripheral | ***
Core Development | *
Project Infrastructure | **
Project size | Large
Preferred contributor | Student/professional
Skills needed | Python, LLMs, Documentation Standards
Contact | @Kaushl2208 @GMishx @shaheemazmalmmd

Support text phrases and bulk based scanning for MONK a like agent

Goal: Add text phrases from the UI to the database, reuse the existing bulk phrases, and provide the ability to scan them with MONK, identifying files where the match is 100%.

Flow:

  • Create a UI where the user can add multiple text phrases associated with a license (from the FOSSology license database).
  • Use the existing bulk phrases table from the database.
  • Create a new agent, similar to the existing MONK agent, which not only identifies the matches but also decides the files.
  • Test cases need to be provided as well.
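The "100% match" rule in the flow above can be illustrated with a small sketch. It treats a phrase as matched only if all of its words appear in the file as one contiguous run, ignoring case, punctuation, and whitespace. The function names and the `(license_name, phrase)` pair format are assumptions for this example; it does not reproduce how the MONK agent is implemented.

```python
import re


def tokens(text):
    """Lowercased word tokens; punctuation and comment markers are dropped."""
    return re.findall(r"\w+", text.lower())


def find_exact_phrases(file_text, phrases):
    """Return the license names whose phrase occurs as a contiguous token run."""
    haystack = tokens(file_text)
    hits = []
    for license_name, phrase in phrases:
        needle = tokens(phrase)
        n = len(needle)
        if n and any(haystack[i:i + n] == needle for i in range(len(haystack) - n + 1)):
            hits.append(license_name)
    return hits


phrases = [
    ("MIT", "Permission is hereby granted, free of charge"),
    ("GPL-2.0-only", "GNU General Public License version 2"),
]
source = "/* Permission is hereby   granted,\n * free of charge, to any person */"
print(find_exact_phrases(source, phrases))  # ['MIT']
```

Because the match must be complete, a file where one word of the phrase is missing produces no hit, which is what allows the agent to decide such files without manual review.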
Category | Rating
Low Hanging Fruit | **
Risk/Exploratory | *
Fun/Peripheral | **
Core Development | *
Project Infrastructure | **
Project size | Medium
Preferred contributor | Student/professional
Skills needed | PHP, C++
Contact | @GMishx @shaheemazmalmmd

Enhance atarashi ability

Goal: Improve license identification of atarashi

  • Improve the existing model, which has 80% accuracy.
  • Use a model to identify the license possibility from keywords.
  • Once a license possibility is found, pass the file to the existing trained model to identify the exact license.
  • If the trained model fails to find the license, add the license possibility to the file so that users check the file and clarify it.
  • Work on the existing branch (https://github.com/fossology/fossology/pull/1634) and make sure that it gets merged.
  • Know more about atarashi.
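The two-stage flow described in the list above could look roughly like the sketch below. The keyword list, the function names, and the result dictionary are hypothetical, and `model` stands in for the existing trained model as any callable that returns a license name or `None`; none of this is taken from the atarashi code.

```python
import re

# Assumed keyword list for illustration; a real list would be curated or learned.
LICENSE_KEYWORDS = {"license", "licensed", "copyright", "permission", "warranty"}


def has_license_possibility(text):
    """Stage 1: cheap keyword check for a possible license statement."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return not words.isdisjoint(LICENSE_KEYWORDS)


def identify(text, model):
    """Stage 2: ask the trained model only when stage 1 found a possibility.

    If the model finds nothing, the file is flagged so that a user can
    check it and clarify.
    """
    if not has_license_possibility(text):
        return {"license": None, "needs_review": False}
    license_name = model(text)
    return {"license": license_name, "needs_review": license_name is None}


def stub_model(text):
    """Stand-in for the trained model, for demonstration only."""
    return "MIT" if "MIT" in text else None


print(identify("Licensed under the MIT terms", stub_model))
# {'license': 'MIT', 'needs_review': False}
print(identify("Copyright notice with unknown terms", stub_model))
# {'license': None, 'needs_review': True}
```

The keyword stage keeps the more expensive model away from files that carry no license hints at all, and the `needs_review` flag carries the license possibility forward when the model cannot decide.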
Category | Rating
Low Hanging Fruit | *
Risk/Exploratory | **
Fun/Peripheral | ***
Core Development | *
Project Infrastructure | ***
Project size | Small
Preferred contributor | Student/Professional
Skills needed | Python, ML, CMake
Contact | @GMishx @shaheemazmalmmd @Kaushl2208 @hastagAB

Integrating Open Source Review Toolkit

Goal: Use ORT to fetch dependencies and generate an SBOM

Build systems fetch the required dependencies (libraries/artifacts) while building a project. It is important to get insight into these dependencies for license compliance checks.

The OSS Review Toolkit is an open source project that helps to find the dependencies of a project.

The goal of this project is to render the project dependencies reported by ORT and display them in the FOSSology UI. Dependencies can then be scheduled directly from the UI and scanned with FOSSology.

Conversely, FOSSology should also be integrated into ORT to scan the open source dependencies.

Category | Rating
Low Hanging Fruit | -
Risk/Exploratory | -
Fun/Peripheral | **
Core Development | ***
Project Infrastructure | *
Project size | Large
Preferred contributor | Student/Professional
Skills Needed | PHP, CMake, Kotlin
Contact | @GMishx @shaheemazmalmmd @Kaushl2208

Complete microservices infrastructure for FOSSology

Goal: Continue the work from previous GSoC and bring FOSSology to a working state on Kubernetes

As part of GSoC 2021, a large portion of work was done to bring FOSSology to work on Kubernetes. Since then, there have been countless changes to the codebase and the build system. Here are a few objectives we expect to be achieved:

  1. Go through the changes in the codebase and devise strategies for integrating them
  2. Inspect the changes in #2086 and complete the work
  3. By the end, we should have a fully working FOSSology installation on Kubernetes
  4. Create documentation for setting up FOSSology on a cluster and all the options available
  5. Stretch goal: Create an all-in-one script for easy Kubernetes setup with FOSSology
  6. Stretch goal: Add mechanism for health checks of the installation
  7. Stretch goal: Expose usage and performance metrics

References

Category | Rating
Low Hanging Fruit | -
Risk/Exploratory | ***
Fun/Peripheral | *
Core Development | *
Project Infrastructure | ***
Project Size | Medium/Large
Preferred Contributor | Professional
Skills Needed | Kubernetes, Docker/Podman, CMake, Bash
Contact | @avinal @GMishx @shaheemazmalmmd @Kaushl2208

Rewrite FOSSology UI using React

Goal: Rewrite the FOSSology UI using React.

  • The existing code is old and needs fixing.
  • Integrate the new APIs with the existing code.
  • Implement the designed templates.
Category | Rating
Low Hanging Fruit | *
Risk/Exploratory | **
Fun/Peripheral | ***
Core Development | *
Project Infrastructure | ***
Project size | Small
Preferred contributor | Student/Professional
Skills needed | PHP, React, CMake
Contact | @GMishx @shaheemazmalmmd @Kaushl2208 @deo002

FOSSology UX and UI design

Goal: Redesign the FOSSology UX and UI to modernize its interface and enhance user-friendliness.

Understand the Primary Users

  • Identify user personas: Determine who the key users of FOSSology are, such as developers, compliance officers, or open-source contributors.
  • Analyze pain points: Conduct surveys, interviews, or user studies to understand the challenges users face while using the current system.

Analyze the Current Interface

  • Evaluate usability issues: Identify areas where the current interface is difficult to use or navigate.
  • Highlight outdated design elements: Assess visual components and workflows that no longer align with modern design standards or user expectations.

Identify Redesign Requirements

  • Define goals: Establish clear objectives for the redesign, such as improving efficiency, accessibility, or ease of use.
  • Prioritize features: Focus on addressing critical pain points and implementing high-impact improvements.

Design Reusable Components

  • Catalog interface elements: List existing components and determine which can be updated or replaced.
  • Ensure consistency: Create reusable design components to maintain a cohesive user experience and simplify scalability.

Draft Layouts and Workflows

  • Streamline user journeys: Map out key workflows to reduce complexity and improve navigation.
  • Prototype layouts: Create wireframes or mockups to visualize potential improvements and gather early feedback.

Establish a Cohesive Design System

  • Define visual guidelines: Standardize elements such as colors, typography, and spacing for a unified aesthetic.
  • Componentize the UI: Build a library of modular components for easier development and maintenance.

Gather Feedback and Refine

  • Conduct usability testing: Engage users to validate the new designs and identify areas for improvement.
  • Iterate based on feedback: Refine layouts, workflows, and components to ensure the redesign meets user needs effectively.
Category | Rating
Low Hanging Fruit | *
Risk/Exploratory | **
Fun/Peripheral | ***
Core Development | *
Project Infrastructure | ***
Project size | Medium/Large
Preferred contributor | Student/Professional
Skills needed | Wireframing and other design techniques
Contact | @EttingerK @GMishx @shaheemazmalmmd @Kaushl2208

New single file view page to accommodate license + copyright clearing

Goal: Redesign and develop a new single file view page that accommodates all clearings.

  • Have a folder tree with blue and red buttons to indicate the clearing.
  • Integrate drag and drop functionality to copy clearing decisions from one file to another.
  • Have a histogram feature to accommodate the license groups in the current upload.
  • Have a file view page with highlights of all the findings (licenses + copyrights + keywords + ECC).

Refer to the screenshot of the design (image: screenShot154).

Category | Rating
Low Hanging Fruit | *
Risk/Exploratory | **
Fun/Peripheral | ***
Core Development | *
Project Infrastructure | ***
Project size | Small
Preferred contributor | Student/Professional
Skills needed | Wireframing and other design techniques
Contact | @EttingerK @GMishx @shaheemazmalmmd @Kaushl2208