Week 4

Meeting 6

(July 07th, 2022)

GSOC 2022 weekly update

Attendees

Discussions

Created Python packages for both the LogisticRegression and Linear SVC models. The file structure of the created packages is shown below:

                  +-- linearsvc
                  │   +-- LICENSE
                  │   +-- MANIFEST.in
                  │   +-- README.md
                  │   +-- setup.py
                  │   +-- src
                  │       +-- linearsvc
                  │       │   +-- data
                  │       │   │   +-- linearsvc
                  │       │   +-- __init__.py
                  │       +-- model_train.py
                  +-- logreg
                      +-- LICENSE
                      +-- MANIFEST.in
                      +-- README.md
                      +-- setup.py
                      +-- src
                          +-- logreg
                          │   +-- data
                          │   │   +-- logreg
                          │   +-- __init__.py
                          +-- model_train.py

Modified __init__.py in the src folder of both Python packages as suggested:
- As the code below shows, the linearsvc class has two methods:
  1. linearsvc.classify() returns the model classifier, which the atarashi agent can then use to predict the license shortname by calling predict().
  2. linearsvc.predict_shortname() runs the prediction on the preprocessed file passed to the constructor and returns the license shortname directly.
- Similar methods have been implemented for the logreg model.

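            import pickle

            from pkg_resources import resource_filename


            class linearsvc:
                def __init__(self, preprocessed_file):
                    self.preprocessed_file = preprocessed_file

                def classify(self):
                    # Locate the pickled model shipped inside the installed package.
                    data = resource_filename("linearsvc", "data/linearsvc")
                    with open(data, 'rb') as f:
                        classifier = pickle.load(f)
                    return classifier

                def predict_shortname(self):
                    # Load the classifier and predict on the preprocessed input.
                    predictor = self.classify()
                    return predictor.predict(self.preprocessed_file)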
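
The two call paths described above can be sketched without installing the package. This is only an illustration under assumptions: StubClassifier is a hypothetical stand-in for the pickled model, LicenseModel and its model_path argument are made-up names that mirror the linearsvc interface, and the input is assumed to be a list of preprocessed text strings.

```python
import pickle
import tempfile
from pathlib import Path


class StubClassifier:
    """Hypothetical stand-in for the trained model; always predicts 'MIT'."""

    def predict(self, texts):
        return ["MIT" for _ in texts]


class LicenseModel:
    """Mirrors the classify() / predict_shortname() interface (hypothetical name)."""

    def __init__(self, preprocessed_file, model_path):
        self.preprocessed_file = preprocessed_file
        self.model_path = model_path  # assumption: explicit path instead of package data

    def classify(self):
        # Load and return the pickled classifier.
        with open(self.model_path, "rb") as f:
            return pickle.load(f)

    def predict_shortname(self):
        # Load the classifier and predict in a single call.
        return self.classify().predict(self.preprocessed_file)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        model_path = Path(tmp) / "linearsvc"
        model_path.write_bytes(pickle.dumps(StubClassifier()))

        texts = ["permission is hereby granted free of charge"]
        model = LicenseModel(texts, model_path)

        # Path 1: fetch the classifier, then call predict() on it.
        via_classifier = model.classify().predict(texts)
        # Path 2: let the wrapper do both steps.
        direct = model.predict_shortname()
        return via_classifier, direct
```

Both paths return the same prediction. The first is useful when the agent wants to reuse one loaded classifier for many files, since predict_shortname() reloads the pickle on every call.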

Implemented the Linear SVC agent in atarashi locally.

Conclusion and Further Plans

Will make changes according to further suggestions.
Will start implementing okapi_BM25 in place of TfidfTransformer for weighting the license texts in the training dataset, and compare which of the two performs better on the dataset.
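
For reference, the Okapi BM25 scoring idea can be sketched in plain Python. This is a minimal illustration, not the planned implementation: the function name bm25_scores, the parameter defaults k1=1.5 and b=0.75, the non-negative IDF variant, and the toy corpus are all assumptions chosen for the example.

```python
import math
from collections import Counter


def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Score each tokenized document in a non-empty corpus against a tokenized query."""
    n_docs = len(corpus)
    avg_len = sum(len(doc) for doc in corpus) / n_docs
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in corpus for term in set(doc))

    scores = []
    for doc in corpus:
        term_freq = Counter(doc)
        # Length normalization: longer-than-average documents are penalized.
        length_norm = k1 * (1 - b + b * len(doc) / avg_len)
        score = 0.0
        for term in query:
            tf = term_freq[term]
            if tf == 0:
                continue
            n_t = doc_freq[term]
            # IDF variant with +1 inside the log, so it never goes negative.
            idf = math.log((n_docs - n_t + 0.5) / (n_t + 0.5) + 1)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores


# Toy corpus of tokenized license snippets (illustrative only).
corpus = [
    "permission is hereby granted free of charge".split(),
    "redistribution and use in source and binary forms".split(),
    "this program is free software".split(),
]
scores = bm25_scores("redistribution in binary forms".split(), corpus)
```

Unlike plain TF-IDF, the term-frequency contribution saturates as tf grows, and document length is normalized against the corpus average, which may matter when license texts differ widely in length.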
