📃 NLP APIs

The NLP APIs let you add data for analysis on the LLM dashboard.

Pre-requisite

Create Project.
Get the project_id and API key.

Initialisation

Initialise the CensiusClient object.

from censius.nlp import CensiusClient

API_KEY   = "<API KEY>"
projectId = <PROJECT ID> # datatype as integer

client = CensiusClient(api_key = API_KEY, project_id = projectId)
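In practice you may prefer to keep the API key out of source code. The following is a minimal sketch, assuming the credentials live in two environment variables. The variable names CENSIUS_API_KEY and CENSIUS_PROJECT_ID and the helper load_censius_config are hypothetical and not part of the Censius SDK.

```python
import os


def load_censius_config(env=None):
    """Read the API key and project ID from environment variables.

    CENSIUS_API_KEY and CENSIUS_PROJECT_ID are hypothetical names chosen
    for this sketch; use whatever your deployment defines.
    """
    env = os.environ if env is None else env
    api_key = env.get("CENSIUS_API_KEY")
    raw_project_id = env.get("CENSIUS_PROJECT_ID")
    if not api_key or not raw_project_id:
        raise RuntimeError("Set CENSIUS_API_KEY and CENSIUS_PROJECT_ID first")
    # project_id must be passed to CensiusClient as an integer.
    return api_key, int(raw_project_id)


# Usage (requires the censius package):
# api_key, project_id = load_censius_config()
# client = CensiusClient(api_key=api_key, project_id=project_id)
```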

Introducing Types

DatasetType
- 🔹DatasetType.TEXT: Multi-string text containing special characters and delimiters (, . & / \ * @ and so on)
ModelType
- 🔹ModelType.NLP
- 🔹ModelType.LLM
UseCase
- NLP
  - 🔹UseCase.NLP.SUMMARIZATION
  - 🔹UseCase.NLP.SENTIMENT_CLASSIFICATION
  - 🔹UseCase.NLP.INTENT_CLASSIFICATION
  - 🔹UseCase.NLP.Q_AND_A
  - 🔹UseCase.NLP.TOXICITY_DETECTION
  - 🔹UseCase.NLP.INFORMATION_RETRIEVAL
  - 🔹UseCase.NLP.LANGUAGE_TRANSLATION
- LLM
  - 🔸UseCase.LLM.SUMMARIZATION
  - 🔸UseCase.LLM.SENTIMENT_CLASSIFICATION
  - 🔸UseCase.LLM.INTENT_CLASSIFICATION
  - 🔸UseCase.LLM.Q_AND_A
  - 🔸UseCase.LLM.TOXICITY_DETECTION
  - 🔸UseCase.LLM.INFORMATION_RETRIEVAL
  - 🔸UseCase.LLM.LANGUAGE_TRANSLATION
  - 🔸UseCase.LLM.REASONING
  - 🔸UseCase.LLM.VARY_PROMPTING
  - 🔸UseCase.LLM.VARY_STRATEGY
  - 🔸UseCase.LLM.CALIBRATION
  - 🔸UseCase.LLM.HARM_EVALUATION

Example Dataset

A simple dataset containing news articles, reference summaries, and summaries generated by the T5 base model.

UseCase : Summarisation

The following APIs are supported and should be called in the given order:

Register Dataset - register_dataset()

This API registers a training dataset on the Censius platform.

| Argument | Type | Description | Presence |
| --- | --- | --- | --- |
| name | Text | A name for reference. | Required |
| file | CSV path | Path to the training dataset CSV file. The CSV has to be in the provided format. | Required |
| dataset_type | DatasetType.TEXT | Currently only DatasetType.TEXT is supported. Support for "DatasetType.Vector" is planned for a later stage. | Required |
| use_case | enum | UseCase.LLM.* or UseCase.NLP.*; see Introducing Types. | Required |

➡️ ROUGE score is calculated from generated and reference summaries. Therefore, both summaries must be provided by the user.
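To illustrate why both summaries are needed, here is a minimal sketch of ROUGE-1, which scores unigram overlap between a generated and a reference summary. How the Censius platform computes ROUGE is not described on this page, so treat this as a simplified illustration (whitespace tokenisation, no stemming) rather than the platform's implementation. The function name rouge_1 is hypothetical.

```python
from collections import Counter


def rouge_1(generated, reference):
    """Return (precision, recall, f1) based on unigram overlap.

    Simplified: lowercases and splits on whitespace, with no stemming
    or punctuation handling, so scores may differ from other tools.
    """
    gen_counts = Counter(generated.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Clipped overlap: a word counts only as often as it appears in both texts.
    overlap = sum((gen_counts & ref_counts).values())
    gen_total = sum(gen_counts.values())
    ref_total = sum(ref_counts.values())
    if overlap == 0 or gen_total == 0 or ref_total == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / gen_total
    recall = overlap / ref_total
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


precision, recall, f1 = rouge_1("the cat sat", "the cat sat on the mat")
print(precision, recall)  # 1.0 0.5
```

Without a reference summary there is nothing to compare against, so neither recall nor precision can be computed.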

from censius.nlp import DatasetType, ModelType, UseCase
import pandas as pd

datasetDetails = client.register_dataset(
    name="training-summarization",
    file="training-summarization.csv",
    use_case=UseCase.LLM.SUMMARIZATION,
    dataset_type=DatasetType.TEXT
)
print(datasetDetails)

datasetId = datasetDetails["dataset_id"]
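Because every row needs both a reference and a generated summary, it can help to check the CSV before calling register_dataset(). The sketch below uses only the standard library. The column names reference_summary and generated_summary and the helper rows_missing_summaries are hypothetical, since the required CSV format is not reproduced on this page.

```python
import csv


def rows_missing_summaries(lines, required_columns):
    """Return 1-based data-row numbers where any required column is empty.

    `lines` is any iterable of CSV lines, for example an open file.
    """
    reader = csv.DictReader(lines)
    missing = [c for c in required_columns if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"CSV is missing columns: {missing}")
    bad_rows = []
    for row_number, row in enumerate(reader, start=1):
        # A short row yields None for absent cells, hence the `or ""`.
        if any(not (row[c] or "").strip() for c in required_columns):
            bad_rows.append(row_number)
    return bad_rows


# Usage with a file on disk (column names are assumptions):
# with open("training-summarization.csv", newline="", encoding="utf-8") as f:
#     print(rows_missing_summaries(f, ["reference_summary", "generated_summary"]))
```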

Register Model - register_model()

This API allows the user to register a new model to the Censius platform.

| Argument | Type | Description | Presence |
| --- | --- | --- | --- |
| model_name | string | The name of the model. | Required |
| model_type | enum | ModelType.NLP or ModelType.LLM, whichever applies. | Required |
| use_case | enum | UseCase.LLM.* or UseCase.NLP.*; see Introducing Types. | Required |
| dataset_id | integer | The ID of the dataset the model was trained on. | Required |
| parent_model_id | integer | The ID of the model being updated (used for versioning). | Optional |

modelDetails = client.register_model(
    model_name="summarization model",
    model_type=ModelType.LLM,
    use_case=UseCase.LLM.SUMMARIZATION,  # or, for example, UseCase.LLM.SENTIMENT_CLASSIFICATION
    dataset_id=datasetId
)


print(modelDetails)
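The optional parent_model_id argument is what links a new version to an existing model. A minimal sketch of a wrapper for that case follows. The helper name register_model_version is hypothetical, and it only forwards the documented arguments to client.register_model().

```python
def register_model_version(client, parent_model_id, dataset_id,
                           model_name, model_type, use_case):
    """Register a new version of an existing model.

    `client` is an initialised CensiusClient; `model_type` and `use_case`
    are the enum values described under Introducing Types.
    """
    if not isinstance(parent_model_id, int):
        raise TypeError("parent_model_id must be an integer")
    return client.register_model(
        model_name=model_name,
        model_type=model_type,
        use_case=use_case,
        dataset_id=dataset_id,
        parent_model_id=parent_model_id,  # ID of the model being updated
    )
```

Keeping the parent ID as an explicit, type-checked parameter makes it harder to register a new version as an unrelated model by accident.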

Log - log your predictions

This function logs individual predictions and features. It can be integrated into the production environment to log the values below as predictions are made.

| Argument | Type | Description | Presence |
| --- | --- | --- | --- |
| log_id | string | The ID of this prediction log. It can be used later to update the actual for this log. | Required |
| model_id | int | The ID of the model. | Required |
| prediction | DatasetType.TEXT | The summary generated by the model. | Required |
| referenced_output | DatasetType.TEXT | The reference summary used to validate the prediction, i.e. the actual. | Required |
| timestamp | integer | The time at which the prediction was generated, in milliseconds. | Required |
| input | DatasetType.TEXT | The input query sent to the LLM. | Required |
| file | pandas.DataFrame | File for bulk insertion in a single call. Example provided. | Optional (WIP) |
| confidence_score | float | Model confidence score between 0 and 1. | Optional |

import time

response = client.log(
    log_id="<string>",
    model_id=<integer>,   # present in the client.register_model() response
    input="<string>",
    referenced_output="<string>",
    prediction="<string>",
    confidence_score=<float>,  # optional, between 0 and 1
    timestamp=int(round(time.time() * 1000)),  # prediction time in milliseconds
)

print(response)
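When logging from production code, it can be convenient to assemble and validate the arguments in one place. The sketch below is one way to do that. The helper build_log_kwargs and the use of a random UUID as log_id are assumptions for illustration; the SDK only requires that log_id be a string.

```python
import time
import uuid


def build_log_kwargs(model_id, input_text, prediction, referenced_output,
                     confidence_score=None, log_id=None):
    """Assemble keyword arguments for client.log() with basic validation."""
    if confidence_score is not None and not 0.0 <= confidence_score <= 1.0:
        raise ValueError("confidence_score must be between 0 and 1")
    kwargs = {
        # A random UUID is one way to get a unique log_id (an assumption here).
        "log_id": log_id or str(uuid.uuid4()),
        "model_id": model_id,
        "input": input_text,
        "prediction": prediction,
        "referenced_output": referenced_output,
        # Current time as an integer number of milliseconds.
        "timestamp": int(round(time.time() * 1000)),
    }
    if confidence_score is not None:
        kwargs["confidence_score"] = confidence_score
    return kwargs


# Usage (requires an initialised client):
# response = client.log(**build_log_kwargs(model_id, article, summary, reference, 0.9))
```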


Last updated 2 years ago
