Datasets
Image Dataset
- class torchflare.datasets.ImageDataset(convert_mode: str, *args, **kwargs)[source]
Class to create the dataset for Image Classification.
- classmethod from_csv(csv_path: Union[str, pathlib.Path], path: Union[str, pathlib.Path], input_columns: List[str], transforms: Optional[albumentations.Compose] = None, convert_mode: str = 'RGB', extension: Optional[str] = None, **kwargs)[source]
Classmethod to read inputs from the given csv.
- Parameters
path – The path where images are saved.
csv_path – Full path to the csv file.
input_columns – A list of the column names that hold the image names/ids.
transforms – The augmentations to be used on images.
extension – The image file extension.
convert_mode – The mode to be passed to PIL.Image.convert.
Example
    from torchflare.datasets import ImageDataset
    import albumentations as A

    ds = ImageDataset.from_csv(
        csv_path="train/train.csv",
        path="train/images",
        input_columns=["image_ids"],
        transforms=A.Compose([A.Resize(256, 256)]),
    ).targets_from_df(target_columns=["targets"])
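The way path, input_columns, and extension combine can be pictured with a small standard-library sketch. It is not torchflare's implementation: the helper name resolve_image_paths is hypothetical, and it assumes that each id in the csv is joined onto path and that extension (including its leading dot) is appended to the id when given.

```python
import csv
import io
from pathlib import Path
from typing import List, Optional


def resolve_image_paths(
    csv_text: str,
    image_dir: str,
    input_column: str,
    extension: Optional[str] = None,
) -> List[Path]:
    """Join each id in `input_column` with `image_dir`, appending `extension` if given."""
    reader = csv.DictReader(io.StringIO(csv_text))
    paths = []
    for row in reader:
        name = row[input_column]
        # Assumption: ids are stored without a suffix when `extension` is supplied.
        if extension is not None:
            name = name + extension
        paths.append(Path(image_dir) / name)
    return paths


# In-memory csv standing in for train/train.csv.
csv_text = "image_ids,targets\nimg_001,0\nimg_002,1\n"
paths = resolve_image_paths(csv_text, "train/images", "image_ids", extension=".jpg")
print([p.as_posix() for p in paths])
```

If the csv already stores full file names such as img_001.jpg, leaving extension as None would keep the names unchanged under this assumption.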
- classmethod from_df(df: pandas.DataFrame, path: Union[str, pathlib.Path], input_columns: List[str], transforms: Optional[albumentations.Compose] = None, convert_mode: str = 'RGB', extension: Optional[str] = None, **kwargs)[source]
Classmethod to read inputs from the given dataframe.
- Parameters
path – The path where images are saved.
df – The dataframe containing the image names/ids and the targets.
input_columns – A list of the column names that hold the image names/ids.
transforms – The augmentations to be used on images.
extension – The image file extension.
convert_mode – The mode to be passed to PIL.Image.convert.
Example
    from torchflare.datasets import ImageDataset
    import albumentations as A

    # df is an existing pandas.DataFrame with "image_ids" and "targets" columns.
    ds = ImageDataset.from_df(
        df=df,
        path="train/images",
        input_columns=["image_ids"],
        transforms=A.Compose([A.Resize(256, 256)]),
    ).targets_from_df(target_columns=["targets"])
- classmethod from_folders(path: Union[str, pathlib.Path], transforms: Optional[albumentations.Compose] = None, convert_mode: str = 'RGB', **kwargs)[source]
Classmethod to create pytorch dataset from folders.
- Parameters
path – The path where images are stored.
transforms – The transforms to be applied to images.
convert_mode – The mode to be passed to PIL.Image.convert.
Note
Augmentations must be Compose objects from albumentations.
- The training directory structure should be as follows:
    train/class_1/xxx.jpg
    .
    .
    train/class_n/xxz.jpg
- The test directory structure should be as follows:
    test_dir/xxx.jpg
    test_dir/xyz.jpg
    test_dir/ppp.jpg
Example
    from torchflare.datasets import ImageDataset
    import albumentations as A

    ds = ImageDataset.from_folders(
        path="/train/images",
        transforms=A.Compose([A.Resize(256, 256)]),
        convert_mode="RGB",
    ).targets_from_folders(target_path="/train/images")
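The training layout described in the note (one sub-folder per class) implies that labels can be read off the folder names. The sketch below illustrates that idea with the standard library only. The function scan_class_folders is hypothetical, and assigning indices in alphabetical order of folder name is an assumption; the source does not say how torchflare maps folder names to targets.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple


def scan_class_folders(root: Path) -> Tuple[List[Tuple[str, int]], Dict[str, int]]:
    """Return (relative file path, class index) pairs and the class-name-to-index map."""
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    # Assumption: class indices follow the alphabetical order of the folder names.
    class_to_idx = {name: idx for idx, name in enumerate(class_names)}
    samples = []
    for name in class_names:
        for file in sorted((root / name).iterdir()):
            if file.is_file():
                samples.append((file.relative_to(root).as_posix(), class_to_idx[name]))
    return samples, class_to_idx


# Build a throwaway tree that mirrors train/class_1 ... train/class_n.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "train"
    for cls, files in {"cats": ["a.jpg", "b.jpg"], "dogs": ["c.jpg"]}.items():
        (root / cls).mkdir(parents=True)
        for f in files:
            (root / cls / f).touch()
    samples, class_to_idx = scan_class_folders(root)

print(class_to_idx)
print(samples)
```

A flat test directory has no sub-folders, so the same scan would find no classes, which matches the note's distinction between training and test layouts.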
Tabular Dataset
- class torchflare.datasets.TabularDataset(items: List, transforms, df: Optional[pandas.DataFrame] = None, path: Optional[pathlib.Path] = None, **kwargs)[source]
PyTorch-style dataset for tabular data.
- classmethod from_csv(csv_path: Union[str, pathlib.Path], input_columns: List[str], transforms: Optional[Callable] = None, **kwargs)[source]
Classmethod to create pytorch style dataset from csv file.
- Parameters
csv_path – The full path to csv.
input_columns – A list containing the names of the input columns.
transforms – A callable which applies transforms on input data.
Example
    from torchflare.datasets import TabularDataset

    ds = TabularDataset.from_csv(
        csv_path="/train/train_data.csv",
        input_columns=["col1", "col2"],
    ).targets_from_df(target_columns=["labels"])
- classmethod from_df(df: pandas.DataFrame, input_columns: List[str], transforms: Optional[Callable] = None, **kwargs)[source]
Classmethod to create pytorch style dataset from dataframes.
- Parameters
df – The dataframe which has inputs, and the labels/targets.
input_columns – A list containing the names of the input columns.
transforms – A callable which applies transforms on input data.
Example
    from torchflare.datasets import TabularDataset

    # df is an existing pandas.DataFrame with "col1", "col2", and "labels" columns.
    ds = TabularDataset.from_df(
        df=df,
        input_columns=["col1", "col2"],
    ).targets_from_df(target_columns=["labels"])
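The transforms parameter is described only as a callable applied to the input data. One possible shape for such a callable is sketched below: a hypothetical Standardize class that rescales each feature using statistics from the training rows. It assumes the callable receives a single row of feature values and returns the transformed row; the source does not specify the exact calling convention, so check it against the library before relying on this.

```python
from statistics import mean, pstdev
from typing import List, Sequence


class Standardize:
    """Hypothetical transform: rescale each feature to zero mean and unit variance."""

    def __init__(self, rows: Sequence[Sequence[float]]) -> None:
        # Transpose rows into columns so statistics are computed per feature.
        columns = list(zip(*rows))
        self.means = [mean(col) for col in columns]
        # Fall back to 1.0 for constant columns to avoid dividing by zero.
        self.stds = [pstdev(col) or 1.0 for col in columns]

    def __call__(self, row: Sequence[float]) -> List[float]:
        return [(x - m) / s for x, m, s in zip(row, self.means, self.stds)]


train_rows = [[1.0, 10.0], [3.0, 10.0]]
scale = Standardize(train_rows)
print(scale([1.0, 10.0]))
```

Keeping the statistics on the object means the same fitted callable can be reused for validation and test data.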
Segmentation Dataset
- class torchflare.datasets.SegmentationDataset(input_cols, image_convert_mode, **kwargs)[source]
PyTorch style dataset for image segmentation.
- classmethod from_df(df: pandas.DataFrame, path: Union[str, pathlib.Path], input_columns: List[str], transforms: Optional[albumentations.Compose] = None, image_convert_mode: str = 'RGB', extension: Optional[str] = None, **kwargs)[source]
Method to read images from dataframe.
- Parameters
df – The dataframe containing the image names/ids.
input_columns – A list containing columns which have names of images.
path – The path where images are saved.
transforms – The transforms to be used on the inputs.
image_convert_mode – The mode to be passed to PIL.Image.convert.
extension – The extension of image file.
Example
    from torchflare.datasets import SegmentationDataset

    # df is an existing pandas.DataFrame; augs is an albumentations.Compose object.
    ds = SegmentationDataset.from_df(
        df=df,
        path="/train/images",
        input_columns=["image_id"],
        extension=".jpg",
        transforms=augs,
        image_convert_mode="RGB",
    ).masks_from_rle(mask_cols=["EncodedPixles"], mask_size=(320, 320), num_classes=4)
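For readers unfamiliar with run-length encoded masks, the following sketch decodes one RLE string into a binary mask. It assumes the widely used "start length" pair format with 1-indexed starts and pixels numbered down each column first; the source does not state which convention masks_from_rle expects, and rle_decode is a hypothetical helper rather than part of torchflare.

```python
from typing import List, Tuple


def rle_decode(rle: str, shape: Tuple[int, int]) -> List[List[int]]:
    """Decode 'start length start length ...' into a height x width binary mask."""
    height, width = shape
    flat = [0] * (height * width)
    tokens = [int(t) for t in rle.split()]
    for start, length in zip(tokens[0::2], tokens[1::2]):
        begin = start - 1  # starts are 1-indexed in this convention
        for i in range(begin, begin + length):
            flat[i] = 1
    # Pixels are numbered down each column first, so index = col * height + row.
    return [[flat[col * height + row] for col in range(width)] for row in range(height)]


# Two runs on a 3 x 3 mask: pixels 1-2 and pixel 6.
mask = rle_decode("1 2 6 1", shape=(3, 3))
for row in mask:
    print(row)
```

With several classes, one such mask would typically be decoded per class column and the results stacked, which is presumably what num_classes controls.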
- classmethod from_folders(image_path: Union[str, pathlib.Path], transforms: Optional[albumentations.Compose] = None, image_convert_mode: str = 'RGB', extension: Optional[str] = None, **kwargs)[source]
Classmethod to create pytorch dataset from folders.
- Parameters
image_path – The path where images are stored.
transforms – The transforms to apply on images and masks.
image_convert_mode – The mode to be passed to PIL.Image.convert for input images.
extension – The image file extension, such as .jpg.
Example
    from torchflare.datasets import SegmentationDataset

    # augs is an albumentations.Compose object.
    ds = SegmentationDataset.from_folders(
        image_path="/train/images",
        transforms=augs,
        image_convert_mode="L",
    ).masks_from_folders(mask_path="/train/masks", mask_convert_mode="L")
Text Dataset
- class torchflare.datasets.TextDataset(tokenizer, max_len, **kwargs)[source]
Class for text data as required by transformers.
- classmethod from_csv(csv_path: Union[str, pathlib.Path], input_columns: List[str], tokenizer=None, max_len=None, **kwargs)[source]
Classmethod to create the dataset from a csv file.
- Parameters
csv_path – The full path to csv.
input_columns – A list containing names of input columns.
tokenizer – The tokenizer to be used. (Use only tokenizers available in huggingface.)
max_len (int) – The max_len to be used.
Example
    import transformers
    from torchflare.datasets import TextDataset

    tokenizer = transformers.BertTokenizer.from_pretrained("bert-base-uncased")

    ds = TextDataset.from_csv(
        csv_path="/train/train.csv",
        input_columns=["tweet"],
        tokenizer=tokenizer,
        max_len=128,
    ).targets_from_df(target_columns=["label"])
- classmethod from_df(df: pandas.DataFrame, input_columns: List[str], tokenizer=None, max_len=None, **kwargs)[source]
Classmethod to create the dataset from dataframe.
- Parameters
df – The dataframe which has the input sentences and targets.
input_columns – A list containing names of input columns.
tokenizer – The tokenizer to be used. (Use only tokenizers available in huggingface.)
max_len (int) – The max_len to be used.
Example
    import transformers
    from torchflare.datasets import TextDataset

    tokenizer = transformers.BertTokenizer.from_pretrained("bert-base-uncased")

    # df is an existing pandas.DataFrame with "tweet" and "label" columns.
    ds = TextDataset.from_df(
        df=df,
        input_columns=["tweet"],
        tokenizer=tokenizer,
        max_len=128,
    ).targets_from_df(target_columns=["label"])
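The role of max_len can be illustrated with a toy encoder that truncates or pads every sentence to a fixed length. This is only a sketch of the general idea: the encode function and its whitespace vocabulary are hypothetical stand-ins, not the Hugging Face tokenizer, which additionally inserts special tokens and uses subword units.

```python
from typing import Dict, List


def encode(
    text: str,
    vocab: Dict[str, int],
    max_len: int,
    pad_id: int = 0,
    unk_id: int = 1,
) -> Dict[str, List[int]]:
    """Toy whitespace tokenizer: map words to ids, then truncate or pad to max_len."""
    # Truncate to max_len first, mapping unknown words to unk_id.
    ids = [vocab.get(word, unk_id) for word in text.lower().split()][:max_len]
    attention_mask = [1] * len(ids)
    padding = max_len - len(ids)
    return {
        "input_ids": ids + [pad_id] * padding,
        # Zeros mark padded positions that a model should ignore.
        "attention_mask": attention_mask + [0] * padding,
    }


vocab = {"good": 2, "movie": 3, "bad": 4}
short_enc = encode("Good movie", vocab, max_len=4)
long_enc = encode("bad bad good movie unknown", vocab, max_len=4)
print(short_enc)
print(long_enc)
```

Fixing the length this way lets samples be stacked into equally shaped batches, at the cost of discarding any text beyond max_len.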