💻ML API's

SDK has methods which can be used to upload dataset and add logs to the dataset

Initialization

Create a project first to get the project_id.

The first step is to instantiate an Client object with an API key and a project ID. This is successively used to authenticate every call made using the SDK.

from censius.ml import CensiusClient, ModelType, DatasetType, ExplanationType, Dataset

client = CensiusClient(api_key = *YOUR_API_KEY*, project_id = *YOUR_PROJECT_ID*)

Register Dataset

`register_dataset()`

You can use the register_dataset API to register a dataset to the Censius platform.

Download the Titanic dataset that we use in the example here.

Arguments

Type

Description

name

string

The name of the dataset

Required

file

dataframe

File that stores feature values. Right now, only CSV files are supported.

Required

features

list<dict>

The list of columns for the dataset. It is a list of dictionaries, each containing two keys, one name and the other type. Valid values of type is DatasetType.STRING, DatasetType.INT, DatasetType.BOOLEAN, DatasetType.DECIMAL.

Required

timestamp

dict

The default timestamp column to be processed, if it is part of the dataset. The accepted timestamp types are DatasetType.UNIX_MS which represent unix format in milliseconds

Optional

Optional for categorical variable representation for feature

categorical = Boolean
category_map = Map(key, actual)

⏰ Dataset feature and target variable name are only accepted if they satisfy the following constraints: Lowercase Alphanumeric and underscore (eg: age_in_years)

import pandas as pd
from datetime import datetime

dataframe_object = pd.read_csv("training-titanic.csv")

datasetDetails = client.register_dataset(
    name="titanic_dataset",
    file=dataframe_object,
    features=[
        {"name": "age_in_years", "type": DatasetType.INT},
        {
            "name": "gender",
            "type": DatasetType.INT,
            "categorical": True,
            "category_map": {0: "male", 1: "female"},
        },
        {"name": "pclass", "type": DatasetType.INT},
        {"name": "sibsp", "type": DatasetType.INT},
        {"name": "parch", "type": DatasetType.INT},
        {"name": "fare", "type": DatasetType.DECIMAL},
        {"name": "survived", "type": DatasetType.INT},
    ],
)

datasetId = datasetDetails["dataset_id"]

`register_model()`

You can use this API to register a new model to the Censius platform. For subsequent updates to the model with new versions, register_new_model_version() should be called.

Arguments

Type

Description

Mandate

model_id

string

The ID of the model

Required

model_name

string

The name of the model

Required

model_type

enum

This is the type of the targets of the model. Currently supported value is ModelType.BINARY_CLASSIFICATION and ModelType.REGRESSION

Required

model_version

string

A string to represent the version of the model

Required

training_info

dict

Recording the ID of the dataset the model is trained on

Required

targets

list<string>

These are the columns the model predicts.

Required

features

list<string>

They are the columns the model uses to predict the targets

Required

uniq_key = datetime.today().strftime('%M%H%d')
input_features = ["age_in_years", "gender", "pclass", "sibsp", "parch", "fare"]
target_feature = ["survived"]
model_id = "titanic_model_" + uniq_key # its supposed to be unique.

modelDetails = client.register_model(
    model_id=model_id,
    model_name="titanic model",
    model_type=ModelType.BINARY_CLASSIFICATION,
    model_version="v1",
    training_info={"method": Dataset.ID, "id": datasetId},
    targets=target_feature,
    features=input_features,
)

modelId = modelDetails["userDefinedModelID"]
modelVersion = modelDetails["version"]

model_id must be unique across an entire tenant. Combination of model_id and model_version must be unique as well.

`register_new_model_version()`

You can use this API to add a new version to an existing model. Example, “v2” of a model.

Arguments

Type

Description

model_id

string

The ID of the model

Required

model_version

string

A string to represent the version of the model

Required

training_info

dict

Recording the ID of the dataset the model is trained on

Required

targets

list<string>

These are the columns the model predicts.

Required

features

list<string>

They are the columns the model uses to predict the targets

Required

newVersion = "v2"

client.register_new_model_version(
    model_id=modelId,
    model_version=newVersion,
    training_info={"method": Dataset.ID, "id": datasetId},
    targets=["survived"],
    features=["gender", "pclass", "sibsp"],
)

model_version must be unique here as you are registering a new version of an existing model

Logging predictions, features, and explanations

`log()`

This function enables logging individual predictions, features (and optionally explanations). It can be integrated as part of the production environment to log these values as predictions are made.

Arguments

Type

Description

prediction_id

string

The ID of this prediction log. This can be used to update the actual of this log later

Required

model_id

string

The model ID against which you want to log the prediction

Required

model_version

string

The version of the model against which you want to log the prediction

Required

features

dict

A dict with feature names as keys and processed feature values as values.

Required

prediction

dict

A dictionary containing feature headings as keys, and a dict that contains two keys, label and optionally, confidence as values. For example, ”Loan Status”: {”label”: 2, "confidence": 0.2}

Required

timestamp

int

UNIX epoch timestamp in milliseconds or time.time.now() to indicate the current time.

Required

actual

dict

A dictionary containing actual for the prediction log. The keys are the target features and the values are the ground truth values of the feature.

Optional

When using time.time.now() remember that the time is calculated in UTC on the client side, not the server side.

import time

predictionId="<unique_id>" #supposed to be random or system associated.

client.log(
    prediction_id=predictionId,
    model_id=modelId,
    model_version=modelVersion,
    features={
        "age_in_years": 23.0,
        "gender": 0,
        "pclass": 3,
        "sibsp": 0,
        "parch": 0,
        "fare": 6.975,
    },
    prediction={"survived": {"label": 1, "confidence": 1}},
    timestamp=int(round(time.time() * 1000)),
)

Logs are currently being aggregated every 60 mins by default. This can be changed in custom deployments—reach out to us if you need a different frequency.

`log_actual()`

If the actual wasn't available when log() was called, it can be updated at a later time using log_actual(). This can be the case for certain types of models where the ground truth isn't immediately available.

Arguments

Type

Description

prediction_id

int

The prediction ID against which you want to update the actual

Required

actual

dict

A dictionary containing actual to be updated. The keys are the target feature headings and the values are the ground truth values of the feature.

Required

model_id

string

The model ID for the prediction for which you need to update the actual

Required

model_version

string

The model version for the prediction for which you need to update the actual

Required

Keys in theactualthe attribute should match the target attribute of the model.

client.update_actual(
    prediction_id=predictionId,
    model_id=modelId,
    model_version=modelVersion,
    actual={
        "survived": 1,
    },
)

`log_explanations()`

Arguments

Type

Description

prediction_id

int

The prediction ID against which you want to update the actual

Required

model_id

string

The model ID for the prediction for which you need to update the actual

Required

model_version

string

The model version for the prediction for which you need to update the actual

Required

explanation_type

enum

The type of explanation. Currently supports ExplanationType.SHAP

Required

explanation_values

dict

A dictionary containing features and their explanations. The keys are the target feature headings and the values are the explanation values.

Required

client.log_explanation(
    prediction_id=predictionId,
    model_id=modelId,
    model_version=modelVersion,
    explanation_type=ExplanationType.SHAP,
    explanation_values={
        "age_in_years": 0.467479,
        "gender": 0.038536,
        "pclass": 0.665614,
        "sibsp": 0.607935,
        "parch": 0.240294,
        "fare": -0.522526,
    },
)

Bulk Log Insertion

`bulk_log()`

This function enables you to send the predictions, actuals, and explanations logs in bulk. It can be integrated as part of the production environment where you are collecting the model logs and send them all together in a single insertion call (something like once-in-a-day frequency).

💡

The following cases have to be part of bulk_log calls

All logs (predictions, actuals, and explanations) details must be present in the bulk_log call.
Combination of predictions, and explanations.
Just the predictions.
Just the actuals (predictions have to be sent prior)
Combination of *actuals, and explanations. (predictions have to be sent prior) Note: Just the explanation logging is not accepted

Arguments

Type

Description

input

Pandas Dataframe

Pandas DataFrame of bulk logs containing predictions, actuals, and explanations values.

Required

model_id

string

The model ID against which you want to log the bulk insertion.

Required

model_version

string

The version of the model for which you want to log the bulk insertion.

Required

prediction_id_column

string

Name of the <ID> column in input DataFrame. The values of this columns must be NOT NULL & Unique.

Required

predictions

object

The object used is Prediction.Tabular this collect information regarding the predictions and feature columns in the input DataFrame. More details in Prediction.Tabular table below.

optional

actuals

string

Name of the column in input DataFrame which refers to the values of Actual.

optional

explanations

object

The object used is Explanation.Tabular this collect information regarding the explanations values, explanation type, and feature columns in the input DataFrame. More details in Explanation.Tabular table below.

optional

`Prediction.Tabular`

Arguments

Type

Description

timestamp_column

timestamp

Name of the column which specify the timestamp for each prediction in the input DataFrame.

Required

prediction_column

string

Name of the column which specify the Prediction values in the input DataFrame. This column must be NOT NULL.

Required

prediction_confidence_column

float

Name of the column which specify the prediction_score value in the input DataFrame. This column must be NOT NULL.

Required

features

list<object>

List of object with a mapping of registered features to column names in the input DataFrame. Example: {"feature": "Age" , "input_column": "age_in_years"}

Here, “Age” was mentioned while registering model, and “age_in_years” is a column in DataFrame which corresponds to “Age” feature values in bulk_logs. | Optional

`Explanation.Tabular`

Arguments

Type

Description

type

enum

The type of explanation. Currently supports ExplanationType.SHAP

Required

explanation_mapper

list<object>

List of object with a mapping of registered features to column names in the input DataFrame. Example: {"feature": "Age" , "input_column": "age_shap"}

Here, “Age” was mentioned while registering model and “age_shap” is a column in DataFrame which corresponds to SHAP values of “Age” feature in bulk_logs.

Required

Download the Titanic dataset bulk log sample file here.

from censius.ml import CensiusClient, Prediction, Explanation, ExplanationType
import pandas as pd

BULK_LOG_CSV_PATH = "<path-to-csv>"
bulk_log_data = pd.read_csv(BULK_LOG_CSV_PATH)

client.bulk_log(
    input=bulk_log_data[:], prediction_id_column="log_id", 
    model_id=modelId, model_version=modelVersion,
    predictions=Prediction.Tabular(
        timestamp_column="timestamp",
        prediction_column="prediction_survived",
        prediction_confidence_column="prediction_confidence",
        features=[
            {"feature": "age_in_years", "input_column": "age_in_years"},
            {"feature": "gender", "input_column": "gender"},
            {"feature": "pclass", "input_column": "pclass"},
            {"feature": "sibsp", "input_column": "sibsp"},
            {"feature": "parch", "input_column": "parch"},
            {"feature": "fare", "input_column": "fare"},
        ],
    ),
    actuals="actual_survived",
    explanations=Explanation.Tabular(
        type=ExplanationType.SHAP,
        explanation_mapper=[
            {"feature": "age_in_years", "input_column": "age_shap"},
            {"feature": "gender", "input_column": "gender_shap"},
            {"feature": "pclass", "input_column": "pclass_shap"},
            {"feature": "sibsp", "input_column": "sibsp_shap"},
            {"feature": "parch", "input_column": "parch_shap"},
            {"feature": "fare", "input_column": "fare_shap"},
        ],
    ),
)

PreviousAPI Reference NextNLP API's

Last updated 2 years ago

hashtagInitialization

hashtagRegister Dataset

hashtagregister_dataset()

hashtagregister_model()

hashtagregister_new_model_version()

hashtagLogging predictions, features, and explanations

hashtaglog()

hashtaglog_actual()

hashtaglog_explanations()

hashtagBulk Log Insertion

hashtagbulk_log()

hashtagPrediction.Tabular

hashtagExplanation.Tabular

Initialization

Register Dataset

`register_dataset()`

`register_model()`

`register_new_model_version()`

Logging predictions, features, and explanations

`log()`

`log_actual()`

`log_explanations()`

Bulk Log Insertion

`bulk_log()`

`Prediction.Tabular`

`Explanation.Tabular`