Datasets

Image Dataset

class torchflare.datasets.ImageDataset(convert_mode: str, *args, **kwargs)[source]

Class to create the dataset for Image Classification.

classmethod from_csv(csv_path: Union[str, pathlib.Path], path: Union[str, pathlib.Path], input_columns: List[str], transforms: Optional[albumentations.Compose] = None, convert_mode: str = 'RGB', extension: Optional[str] = None, **kwargs)[source]

Classmethod to read inputs from the given csv.

Parameters

path – The path where images are saved.
csv_path – Full path to the csv file.
input_columns – A list of the column names that hold the image names/ids.
transforms – The augmentations to be used on images.
extension – The image file extension.
convert_mode – The mode to be passed to PIL.Image.convert.
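The relationship between path, the image ids read from input_columns, and extension can be illustrated with a small sketch. This is not the library's code: build_image_path is a hypothetical helper, and the assumption that the extension is simply appended to the id is ours.

```python
from pathlib import Path
from typing import Optional


def build_image_path(root: str, image_id: str, extension: Optional[str] = None) -> Path:
    """Join the image directory, an image id, and an optional extension."""
    # Hypothetical helper: if the ids already carry an extension, pass None.
    filename = image_id if extension is None else f"{image_id}{extension}"
    return Path(root) / filename


# Ids without an extension need one supplied; ids with one do not.
print(build_image_path("train/images", "img_001", ".jpg"))
print(build_image_path("train/images", "img_002.png"))
```

Under this assumption, extension only needs to be set when the csv stores bare ids rather than full file names.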

Example

from torchflare.datasets import ImageDataset
import albumentations as A

ds = ImageDataset.from_csv(
    csv_path="train/train.csv",
    path="train/images",
    input_columns=["image_ids"],
    transforms=A.Compose([A.Resize(256, 256)]),
).targets_from_df(target_columns=["targets"])

classmethod from_df(df: pandas.DataFrame, path: Union[str, pathlib.Path], input_columns: List[str], transforms: Optional[albumentations.Compose] = None, convert_mode: str = 'RGB', extension: Optional[str] = None, **kwargs)[source]

Classmethod to read inputs from the given dataframe.

Parameters

path – The path where images are saved.
df – The dataframe containing the image names/ids and the targets.
input_columns – A list of the column names that hold the image names/ids.
transforms – The augmentations to be used on images.
extension – The image file extension.
convert_mode – The mode to be passed to PIL.Image.convert.

Example

from torchflare.datasets import ImageDataset
import albumentations as A

# df is a pandas DataFrame holding the image ids and the targets.
ds = ImageDataset.from_df(
    df=df,
    path="train/images",
    input_columns=["image_ids"],
    transforms=A.Compose([A.Resize(256, 256)]),
).targets_from_df(target_columns=["targets"])

classmethod from_folders(path: Union[str, pathlib.Path], transforms: Optional[albumentations.Compose] = None, convert_mode: str = 'RGB', **kwargs)[source]

Classmethod to create pytorch dataset from folders.

Parameters

path – The path where images are stored.
transforms – The transforms to be applied to images.
convert_mode – The mode to be passed to PIL.Image.convert.

Note

Augmentations must be Compose objects from albumentations.

The training directory structure should be as follows:

    train/class_1/xxx.jpg
    .
    .
    train/class_n/xxz.jpg

The test directory structure should be as follows:

    test_dir/xxx.jpg
    test_dir/xyz.jpg
    test_dir/ppp.jpg
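To make the expected training layout concrete, here is a minimal sketch of how a one-sub-folder-per-class directory can be turned into (image path, label) pairs. It uses only the standard library and is not torchflare's implementation; scan_class_folders is a hypothetical name, and mapping classes to indices in alphabetical order is an assumption.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def scan_class_folders(root: str) -> Tuple[List[Tuple[Path, int]], Dict[str, int]]:
    """Return (image path, label index) pairs and the class-to-index map."""
    root_path = Path(root)
    # Each sub-directory name is treated as one class; sort for a stable mapping.
    class_names = sorted(p.name for p in root_path.iterdir() if p.is_dir())
    class_to_idx = {name: idx for idx, name in enumerate(class_names)}
    samples = [
        (file_path, class_to_idx[name])
        for name in class_names
        for file_path in sorted((root_path / name).iterdir())
        if file_path.is_file()
    ]
    return samples, class_to_idx
```

Calling scan_class_folders("train") on the layout above would yield one pair per image, labelled by its parent folder. A flat test directory has no sub-folders and therefore produces no labels, which is why the two layouts differ.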

Example

from torchflare.datasets import ImageDataset

import albumentations as A

ds = ImageDataset.from_folders(
    path="/train/images",
    transforms=A.Compose([A.Resize(256, 256)]),
    convert_mode="RGB"
).targets_from_folders(target_path="/train/images")

Tabular Dataset

class torchflare.datasets.TabularDataset(items: List, transforms, df: Optional[pandas.DataFrame] = None, path: Optional[pathlib.Path] = None, **kwargs)[source]

PyTorch-style dataset for tabular data.

classmethod from_csv(csv_path: Union[str, pathlib.Path], input_columns: List[str], transforms: Optional[Callable] = None, **kwargs)[source]

Classmethod to create pytorch style dataset from csv file.

Parameters

csv_path – The full path to csv.
input_columns – A list containing the names of the input columns.
transforms – A callable which applies transforms on input data.

Example

from torchflare.datasets import TabularDataset

ds = TabularDataset.from_csv(
    csv_path="/train/train_data.csv", input_columns=["col1", "col2"]
).targets_from_df(target_columns=["labels"])

classmethod from_df(df: pandas.DataFrame, input_columns: List[str], transforms: Optional[Callable] = None, **kwargs)[source]

Classmethod to create pytorch style dataset from dataframes.

Parameters

df – The dataframe which has inputs, and the labels/targets.
input_columns – A list containing the names of the input columns.
transforms – A callable which applies transforms on input data.

Example

from torchflare.datasets import TabularDataset

# df is a pandas DataFrame holding the input columns and the labels.
ds = TabularDataset.from_df(
    df=df, input_columns=["col1", "col2"]
).targets_from_df(target_columns=["labels"])

Segmentation Dataset

class torchflare.datasets.SegmentationDataset(input_cols, image_convert_mode, **kwargs)[source]

PyTorch style dataset for image segmentation.

add_test()[source]

Method to create dataset for inference.

classmethod from_df(df: pandas.DataFrame, path: Union[str, pathlib.Path], input_columns: List[str], transforms: Optional[albumentations.Compose] = None, image_convert_mode: str = 'RGB', extension: Optional[str] = None, **kwargs)[source]

Method to read images from dataframe.

Parameters

df – The dataframe containing the image names/ids.
input_columns – A list containing columns which have names of images.
path – The path where images are saved.
transforms – The transforms to be used on the inputs.
image_convert_mode – The mode to be passed to PIL.Image.convert.
extension – The extension of image file.

Example

from torchflare.datasets import SegmentationDataset

# df is a pandas DataFrame; augs is an albumentations.Compose object.
ds = SegmentationDataset.from_df(
    df=df,
    path="/train/images",
    input_columns=["image_id"],
    extension=".jpg",
    transforms=augs,
    image_convert_mode="RGB",
).masks_from_rle(
    mask_columns=["EncodedPixles"],
    shape=(320, 320),
    num_classes=4,
)

classmethod from_folders(image_path: Union[str, pathlib.Path], transforms: Optional[albumentations.Compose] = None, image_convert_mode: str = 'RGB', extension: Optional[str] = None, **kwargs)[source]

Classmethod to create pytorch dataset from folders.

Parameters

image_path – The path where images are stored.
transforms – The transforms to apply on images and masks.
image_convert_mode – The mode to be passed to PIL.Image.convert for input images.
extension – The image file extension, e.g. .jpg.

Example

from torchflare.datasets import SegmentationDataset

# augs is an albumentations.Compose object applied to images and masks.
ds = SegmentationDataset.from_folders(
    image_path="/train/images",
    transforms=augs,
    image_convert_mode="L",
).masks_from_folders(
    mask_path="/train/masks",
    mask_convert_mode="L",
)

masks_from_folders(mask_path: Union[str, pathlib.Path], mask_convert_mode: str)[source]

Read masks from folders.

Parameters

mask_path – The path where masks are stored.
mask_convert_mode – The mode to be passed to PIL.Image.convert for masks.

masks_from_rle(shape: Tuple[int, int], num_classes: int, mask_columns: Optional[List[str]])[source]

Create masks from run-length encoding (RLE).

Parameters

mask_columns – The list of columns containing the run-length encodings.
shape – The shape of the masks.
num_classes – The number of classes.
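For readers unfamiliar with run-length encoded masks, the sketch below decodes one encoding into a binary mask of the given shape. It is an illustration, not the code behind masks_from_rle: rle_decode is a hypothetical helper, and it assumes the widely used "start length start length ..." string format with 1-indexed starts and pixels numbered column by column. The encoding stored in your dataframe may follow a different convention.

```python
from typing import List, Tuple


def rle_decode(rle: str, shape: Tuple[int, int]) -> List[List[int]]:
    """Decode a 'start length start length ...' string into a binary mask.

    Assumptions: starts are 1-indexed and pixels are numbered column by
    column (top to bottom, then left to right).
    """
    height, width = shape
    flat = [0] * (height * width)
    values = [int(v) for v in rle.split()]
    # Pair up consecutive values as (start, run length).
    for start, length in zip(values[0::2], values[1::2]):
        begin = start - 1  # convert to a 0-indexed position
        for pos in range(begin, begin + length):
            flat[pos] = 1
    # Column-major layout: pixel (row, col) lives at index col * height + row.
    return [[flat[col * height + row] for col in range(width)] for row in range(height)]


mask = rle_decode("1 2 5 1", shape=(3, 3))
for row in mask:
    print(row)
```

With several mask columns, one such mask would presumably be decoded per column and the results stacked along a class axis of size num_classes.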

Text Dataset

class torchflare.datasets.TextDataset(tokenizer, max_len, **kwargs)[source]

Class for text data as required by transformers.

classmethod from_csv(csv_path: Union[str, pathlib.Path], input_columns: List[str], tokenizer=None, max_len=None, **kwargs)[source]

Classmethod to create the dataset from a csv file.

Parameters

csv_path – The full path to csv.
input_columns – A list containing the names of the input columns.
tokenizer – The tokenizer to be used. Use only tokenizers available in Hugging Face.
max_len (int) – The maximum sequence length to use for the tokenized inputs.
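As a rough illustration of what a fixed max_len implies, the sketch below pads or truncates a list of token ids to exactly max_len entries and builds the matching attention mask. It is plain Python, not the dataset's code: pad_or_truncate and pad_id are hypothetical names, and real tokenizers also keep special tokens intact when truncating. With Hugging Face tokenizers the same effect is usually requested by calling the tokenizer with padding="max_length", truncation=True and max_length=max_len.

```python
from typing import List, Tuple


def pad_or_truncate(
    token_ids: List[int], max_len: int, pad_id: int = 0
) -> Tuple[List[int], List[int]]:
    """Return (input_ids, attention_mask), both exactly max_len long."""
    kept = token_ids[:max_len]  # truncate sequences that are too long
    n_pad = max_len - len(kept)
    input_ids = kept + [pad_id] * n_pad  # pad short sequences on the right
    attention_mask = [1] * len(kept) + [0] * n_pad  # 0 marks padding
    return input_ids, attention_mask


print(pad_or_truncate([5, 6, 7], max_len=5))
print(pad_or_truncate([1, 2, 3, 4], max_len=2))
```

Fixed-length outputs are what allow samples to be batched into a single tensor without a custom collate function.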

Example

import transformers
from torchflare.datasets import TextDataset

tokenizer = transformers.BertTokenizer.from_pretrained("bert-base-uncased")

ds = TextDataset.from_csv(
    csv_path="/train/train.csv", input_columns=["tweet"], tokenizer=tokenizer, max_len=128
).targets_from_df(target_columns=["label"])

classmethod from_df(df: pandas.DataFrame, input_columns: List[str], tokenizer=None, max_len=None, **kwargs)[source]

Classmethod to create the dataset from dataframe.

Parameters

df – The dataframe which has the input sentences and targets.
input_columns – A list containing names of input columns.
tokenizer – The tokenizer to be used. Use only tokenizers available in Hugging Face.
max_len (int) – The maximum sequence length to use for the tokenized inputs.

Example

import transformers
from torchflare.datasets import TextDataset

tokenizer = transformers.BertTokenizer.from_pretrained("bert-base-uncased")

# df is a pandas DataFrame holding the input sentences and the targets.
ds = TextDataset.from_df(
    df=df, input_columns=["tweet"], tokenizer=tokenizer, max_len=128
).targets_from_df(target_columns=["label"])