ConstantColumns

task
Prepare
subtask
data cleaning
host
chemml
function
ConstantColumns
input tokens (receivers)
df : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
api : instance of ChemML’s ConstantColumns class
types: (“<class ‘chemml.preprocessing.purge.ConstantColumns’>”,)
output tokens (senders)
df : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
api : instance of ChemML’s ConstantColumns class
types: (“<class ‘chemml.preprocessing.purge.ConstantColumns’>”,)
removed_columns_ : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
wrapper parameters
func_method : string, (default:None)

choose one of: (‘fit_transform’, ‘transform’, None)
required packages
ChemML, 0.4.1
pandas, 0.20.3
config file view
##
<< host = chemml    << function = ConstantColumns
<< func_method = None
>> id df
>> id api
>> id df
>> id api
>> id removed_columns_
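As a rough illustration of what this block computes, the plain-Python sketch below drops every column whose values are all identical and returns the dropped columns separately, mirroring the `df` and `removed_columns_` output tokens. It is not ChemML's implementation: the dict-of-lists table and the function name `constant_columns` are hypothetical stand-ins for a pandas dataframe and the ChemML class.

```python
def constant_columns(table):
    """Split `table` (column name -> list of values) into (kept, removed).

    A column counts as constant when every entry equals its first entry.
    """
    kept, removed = {}, {}
    for name, values in table.items():
        target = removed if all(v == values[0] for v in values) else kept
        target[name] = values
    return kept, removed


data = {"a": [1, 1, 1], "b": [1, 2, 3], "c": [0.5, 0.5, 0.5]}
kept, removed = constant_columns(data)
print(sorted(kept), sorted(removed))  # ['b'] ['a', 'c']
```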

Note

The documentation page for function parameters:

MissingValues

task
Prepare
subtask
data cleaning
host
chemml
function
MissingValues
input tokens (receivers)
df : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
api : instance of ChemML’s MissingValues class
types: (“<class ‘chemml.preprocessing.handle_missing.missing_values’>”,)
output tokens (senders)
df : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
api : instance of ChemML’s MissingValues class
types: (“<class ‘chemml.preprocessing.handle_missing.missing_values’>”,)
wrapper parameters
func_method : string, (default:None)

choose one of: (‘fit_transform’, ‘transform’, None)
required packages
ChemML, 0.4.1
pandas, 0.20.3
config file view
##
<< host = chemml    << function = MissingValues
<< func_method = None
<< strategy = ignore_row
<< inf_as_null = True
<< string_as_null = True
<< missing_values = False
>> id df
>> id api
>> id df
>> id api
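The sketch below illustrates the `strategy = ignore_row` setting shown above, assuming it drops every row that contains at least one missing entry, and that `inf_as_null` and `string_as_null` make infinities and non-numeric strings count as missing. Rows are plain Python lists here; `is_missing` and `ignore_rows` are hypothetical helpers, not ChemML functions.

```python
import math


def is_missing(value, inf_as_null=True, string_as_null=True):
    """Decide whether a single cell counts as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return string_as_null  # non-numeric entry
    if isinstance(value, float):
        if math.isnan(value):
            return True
        if math.isinf(value):
            return inf_as_null
    return False


def ignore_rows(rows, **flags):
    """Drop every row that has at least one missing cell ('ignore_row')."""
    return [row for row in rows if not any(is_missing(v, **flags) for v in row)]


rows = [[1.0, 2.0], [float("nan"), 3.0], [4.0, float("inf")], ["abc", 5.0], [6.0, 7.0]]
print(ignore_rows(rows))  # [[1.0, 2.0], [6.0, 7.0]]
```

With `inf_as_null=False`, the row containing `inf` would be kept.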

Note

The documentation page for function parameters:

Outliers

task
Prepare
subtask
data cleaning
host
chemml
function
Outliers
input tokens (receivers)
df : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
api : instance of ChemML’s Outliers class
types: (“<class ‘chemml.preprocessing.purge.Outliers’>”,)
output tokens (senders)
df : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
api : instance of ChemML’s Outliers class
types: (“<class ‘chemml.preprocessing.purge.Outliers’>”,)
removed_columns_ : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
wrapper parameters
func_method : string, (default:None)

choose one of: (‘fit_transform’, ‘transform’, None)
required packages
ChemML, 0.4.1
pandas, 0.20.3
config file view
##
<< host = chemml    << function = Outliers
<< func_method = None
<< m = 2.0
<< strategy = median
>> id df
>> id api
>> id df
>> id api
>> id removed_columns_
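A minimal sketch of the outlier rule, assuming (not confirmed by this page) that a row is dropped when any of its values lies more than `m` sample standard deviations away from its column's center, where the center is the median or the mean according to `strategy`. The function `drop_outlier_rows` is hypothetical and works on lists of rows rather than a dataframe.

```python
import statistics


def drop_outlier_rows(rows, m=2.0, strategy="median"):
    """Keep rows whose values all lie within m standard deviations of the column center."""
    columns = list(zip(*rows))
    center_fn = statistics.median if strategy == "median" else statistics.mean
    centers = [center_fn(col) for col in columns]
    spreads = [statistics.stdev(col) for col in columns]  # sample standard deviation (assumed)
    return [
        row
        for row in rows
        if all(abs(v - c) <= m * s for v, c, s in zip(row, centers, spreads))
    ]


rows = [[1.0], [2.0], [3.0], [2.0], [100.0]]
print(drop_outlier_rows(rows))  # [[1.0], [2.0], [3.0], [2.0]]
```

Note that a large outlier also inflates the standard deviation, so with very few samples or a large `m` it may survive the filter.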

Note

The documentation page for function parameters:

Split

task
Prepare
subtask
data manipulation
host
chemml
function
Split
input tokens (receivers)
df : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
output tokens (senders)
df1 : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
df2 : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
required packages
ChemML, 0.4.1
pandas, 0.20.3
config file view
##
<< host = chemml    << function = Split
<< selection = 1
>> id df
>> id df1
>> id df2
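The sketch below shows one plausible reading of `selection = 1`: an integer takes that many columns from the left into `df1` and leaves the rest in `df2`, while a list names the columns for `df1`. Both interpretations are assumptions about ChemML's `Split`; the function `split_columns` and the dict-of-lists table are hypothetical.

```python
def split_columns(table, selection=1):
    """Split an ordered dict-of-columns table into (df1, df2).

    An integer `selection` takes that many columns from the left;
    a list is treated as the names of the columns that go to df1.
    """
    names = list(table)
    chosen = names[:selection] if isinstance(selection, int) else list(selection)
    df1 = {n: table[n] for n in names if n in chosen}
    df2 = {n: table[n] for n in names if n not in chosen}
    return df1, df2


data = {"smiles": ["C", "CC"], "x1": [0.1, 0.2], "x2": [1.0, 2.0]}
df1, df2 = split_columns(data, selection=1)
print(list(df1), list(df2))  # ['smiles'] ['x1', 'x2']
```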

Note

The documentation page for function parameters:

Binarizer

task
Prepare
subtask
feature representation
host
sklearn
function
Binarizer
input tokens (receivers)
df : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
api : instance of scikit-learn’s Binarizer class
types: (“<class ‘sklearn.preprocessing.data.Binarizer’>”,)
output tokens (senders)
df : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
api : instance of scikit-learn’s Binarizer class
types: (“<class ‘sklearn.preprocessing.data.Binarizer’>”,)
wrapper parameters
track_header : Boolean, (default:True)
if True, the input dataframe’s header is transferred to the output dataframe
choose one of: (True, False)
func_method : string, (default:None)
fit_transform: always make a new api; transform: must receive an api; None: only make a new api
choose one of: (‘fit_transform’, ‘transform’, None)
required packages
scikit-learn, 0.19.0
pandas, 0.20.3
config file view
##
<< host = sklearn    << function = Binarizer
<< track_header = True
<< func_method = None
<< threshold = 0.0
<< copy = True
>> id df
>> id api
>> id df
>> id api
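For reference, the thresholding that scikit-learn's Binarizer applies can be written in a few lines of plain Python: values strictly greater than `threshold` become 1 and all others become 0. This sketch returns integers for readability, whereas scikit-learn keeps the input dtype; `binarize` here is a hypothetical helper working on lists of rows.

```python
def binarize(rows, threshold=0.0):
    """Map each value to 1 if it is strictly greater than `threshold`, else 0."""
    return [[1 if v > threshold else 0 for v in row] for row in rows]


print(binarize([[-1.0, 0.0, 2.5], [0.3, -0.2, 0.0]]))  # [[0, 0, 1], [1, 0, 0]]
```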

Imputer

task
Prepare
subtask
data cleaning
host
sklearn
function
Imputer
input tokens (receivers)
df : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
api : instance of scikit-learn’s Imputer class
types: (“<class ‘sklearn.preprocessing.imputation.Imputer’>”,)
output tokens (senders)
df : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
api : instance of scikit-learn’s Imputer class
types: (“<class ‘sklearn.preprocessing.imputation.Imputer’>”,)
wrapper parameters
track_header : Boolean, (default:True)
if True, the input dataframe’s header is transferred to the output dataframe
choose one of: (True, False)
func_method : string, (default:None)
fit_transform: always make a new api; transform: must receive an api; None: only make a new api
choose one of: (‘fit_transform’, ‘transform’, None)
required packages
scikit-learn, 0.19.0
pandas, 0.20.3
config file view
##
<< host = sklearn    << function = Imputer
<< track_header = True
<< func_method = None
<< verbose = 0
<< missing_values = NaN
<< strategy = mean
<< copy = True
<< axis = 0
>> id df
>> id api
>> id df
>> id api
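With the defaults above (`missing_values = NaN`, `strategy = mean`, `axis = 0`), each missing entry is replaced by the mean of the observed values in its column. The plain-Python sketch below shows that computation; `impute_mean` is a hypothetical helper, and unlike scikit-learn's Imputer (which discards columns that are entirely missing) it simply assumes every column has at least one observed value.

```python
import math


def impute_mean(rows):
    """Replace NaN entries with the mean of the observed values in the same column."""
    means = []
    for col in zip(*rows):
        present = [v for v in col if not math.isnan(v)]
        means.append(sum(present) / len(present))  # assumes at least one observed value
    return [[means[j] if math.isnan(v) else v for j, v in enumerate(row)] for row in rows]


nan = float("nan")
print(impute_mean([[1.0, nan], [3.0, 4.0], [nan, 8.0]]))  # [[1.0, 6.0], [3.0, 4.0], [2.0, 8.0]]
```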

KFold

task
Prepare
subtask
split
host
sklearn
function
KFold
input tokens (receivers)
dfx : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
output tokens (senders)
api : instance of scikit-learn’s KFold class
types: (“<class ‘sklearn.model_selection._split.KFold’>”,)
fold_gen : Generator of indices to split data into training and test set
types: (“<type ‘generator’>”,)
wrapper parameters
func_method : string, (default:None)

choose one of: (‘split’, None)
required packages
scikit-learn, 0.19.0
pandas, 0.20.3
config file view
##
<< host = sklearn    << function = KFold
<< func_method = None
<< random_state = None
<< shuffle = False
<< n_splits = 3
>> id dfx
>> id api
>> id fold_gen
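The `fold_gen` token yields (train, test) index pairs. For the unshuffled case (`shuffle = False`), scikit-learn's KFold cuts the samples into `n_splits` contiguous folds and gives the first `n_samples % n_splits` folds one extra sample. The sketch below reproduces that index layout in plain Python; `kfold_indices` is a hypothetical name, and shuffling is not modeled.

```python
def kfold_indices(n_samples, n_splits=3):
    """Yield (train, test) index lists for unshuffled K-fold cross-validation."""
    base, extra = divmod(n_samples, n_splits)
    sizes = [base + (1 if i < extra else 0) for i in range(n_splits)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


for train, test in kfold_indices(7, n_splits=3):
    print(train, test)
# [3, 4, 5, 6] [0, 1, 2]
# [0, 1, 2, 5, 6] [3, 4]
# [0, 1, 2, 3, 4] [5, 6]
```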

KernelPCA

task
Prepare
subtask
feature transformation
host
sklearn
function
KernelPCA
input tokens (receivers)
df : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
api : instance of scikit-learn’s KernelPCA class
types: (“<class ‘sklearn.decomposition.kernel_pca.KernelPCA’>”,)
output tokens (senders)
df : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
api : instance of scikit-learn’s KernelPCA class
types: (“<class ‘sklearn.decomposition.kernel_pca.KernelPCA’>”,)
wrapper parameters
track_header : Boolean, (default:False)
Always False: the header of the input dataframe does not correspond to the columns of the transformed dataframe
choose one of: False
func_method : string, (default:None)
fit_transform: always make a new api; transform: must receive an api; inverse_transform: must receive an api; None: only make a new api
choose one of: (‘fit_transform’, ‘transform’, ‘inverse_transform’, None)
required packages
scikit-learn, 0.19.0
pandas, 0.20.3
config file view
##
<< host = sklearn    << function = KernelPCA
<< track_header = False
<< func_method = None
<< fit_inverse_transform = False
<< kernel = linear
<< n_jobs = 1
<< eigen_solver = auto
<< degree = 3
<< max_iter = None
<< copy_X = True
<< kernel_params = None
<< random_state = None
<< n_components = None
<< remove_zero_eig = False
<< tol = 0
<< alpha = 1.0
<< coef0 = 1
<< gamma = None
>> id df
>> id api
>> id df
>> id api

LeaveOneOut

task
Prepare
subtask
split
host
sklearn
function
LeaveOneOut
input tokens (receivers)
dfx : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
output tokens (senders)
api : instance of scikit-learn’s LeaveOneOut class
types: (“<class ‘sklearn.model_selection._split.LeaveOneOut’>”,)
fold_gen : Generator of indices to split data into training and test set
types: (“<type ‘generator’>”,)
wrapper parameters
func_method : string, (default:None)

choose one of: (‘split’, None)
required packages
scikit-learn, 0.19.0
pandas, 0.20.3
config file view
##
<< host = sklearn    << function = LeaveOneOut
<< func_method = None
>> id dfx
>> id api
>> id fold_gen
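Leave-one-out is the limiting case of K-fold with one fold per sample: every sample serves as the test set exactly once. A plain-Python sketch of the (train, test) pairs that `fold_gen` produces, using a hypothetical helper name:

```python
def leave_one_out(n_samples):
    """Yield (train, test) pairs in which each sample is the test set exactly once."""
    for i in range(n_samples):
        yield [j for j in range(n_samples) if j != i], [i]


print(list(leave_one_out(3)))  # [([1, 2], [0]), ([0, 2], [1]), ([0, 1], [2])]
```

Because it fits one model per sample, this splitter becomes expensive for large datasets.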

MaxAbsScaler

task
Prepare
subtask
scaling
host
sklearn
function
MaxAbsScaler
input tokens (receivers)
df : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
api : instance of scikit-learn’s MaxAbsScaler class
types: (“<class ‘sklearn.preprocessing.data.MaxAbsScaler’>”,)
output tokens (senders)
df : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
api : instance of scikit-learn’s MaxAbsScaler class
types: (“<class ‘sklearn.preprocessing.data.MaxAbsScaler’>”,)
wrapper parameters
track_header : Boolean, (default:True)
if True, the input dataframe’s header is transferred to the output dataframe
choose one of: (True, False)
func_method : string, (default:None)
fit_transform: always make a new api; transform: must receive an api; inverse_transform: must receive an api; None: only make a new api
choose one of: (‘fit_transform’, ‘transform’, ‘inverse_transform’, None)
required packages
scikit-learn, 0.19.0
pandas, 0.20.3
config file view
##
<< host = sklearn    << function = MaxAbsScaler
<< track_header = True
<< func_method = None
<< copy = True
>> id df
>> id api
>> id df
>> id api
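MaxAbsScaler divides each column by its largest absolute value, so the scaled data lies in [-1, 1] and zero entries stay zero. The sketch below shows the forward transform in plain Python; `max_abs_scale` is hypothetical, and the treatment of all-zero columns (left unchanged) is chosen to avoid division by zero.

```python
def max_abs_scale(rows):
    """Divide each column by its maximum absolute value."""
    columns = list(zip(*rows))
    # An all-zero column gets scale 1.0 so it is left unchanged.
    scales = [max(abs(v) for v in col) or 1.0 for col in columns]
    return [[v / s for v, s in zip(row, scales)] for row in rows]


print(max_abs_scale([[1.0, -4.0], [-2.0, 2.0]]))  # [[0.5, -1.0], [-1.0, 0.5]]
```

`inverse_transform` corresponds to multiplying each column back by the same scale factors.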

MinMaxScaler

task
Prepare
subtask
scaling
host
sklearn
function
MinMaxScaler
input tokens (receivers)
df : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
api : instance of scikit-learn’s MinMaxScaler class
types: (“<class ‘sklearn.preprocessing.data.MinMaxScaler’>”,)
output tokens (senders)
df : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
api : instance of scikit-learn’s MinMaxScaler class
types: (“<class ‘sklearn.preprocessing.data.MinMaxScaler’>”,)
wrapper parameters
track_header : Boolean, (default:True)
if True, the input dataframe’s header is transferred to the output dataframe
choose one of: (True, False)
func_method : string, (default:None)
fit_transform: always make a new api; transform: must receive an api; inverse_transform: must receive an api; None: only make a new api
choose one of: (‘fit_transform’, ‘transform’, ‘inverse_transform’, None)
required packages
scikit-learn, 0.19.0
pandas, 0.20.3
config file view
##
<< host = sklearn    << function = MinMaxScaler
<< track_header = True
<< func_method = None
<< copy = True
<< feature_range = (0, 1)
>> id df
>> id api
>> id df
>> id api
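With `feature_range = (0, 1)`, MinMaxScaler maps each column's minimum to 0 and its maximum to 1, using x' = a + (x - min) / (max - min) * (b - a) for a range (a, b). A plain-Python sketch of that formula, with a hypothetical helper name and an assumed guard for constant columns:

```python
def min_max_scale(rows, feature_range=(0.0, 1.0)):
    """Rescale each column linearly so its min maps to feature_range[0] and its max to feature_range[1]."""
    lo, hi = feature_range
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]  # constant column: avoid division by zero
    return [
        [lo + (v - mn) / sp * (hi - lo) for v, mn, sp in zip(row, mins, spans)]
        for row in rows
    ]


print(min_max_scale([[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]]))
# [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```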

Normalizer

task
Prepare
subtask
scaling
host
sklearn
function
Normalizer
input tokens (receivers)
df : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
api : instance of scikit-learn’s Normalizer class
types: (“<class ‘sklearn.preprocessing.data.Normalizer’>”,)
output tokens (senders)
df : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
api : instance of scikit-learn’s Normalizer class
types: (“<class ‘sklearn.preprocessing.data.Normalizer’>”,)
wrapper parameters
track_header : Boolean, (default:True)
if True, the input dataframe’s header is transferred to the output dataframe
choose one of: (True, False)
func_method : string, (default:None)
fit_transform: always make a new api; transform: must receive an api; None: only make a new api
choose one of: (‘fit_transform’, ‘transform’, None)
required packages
scikit-learn, 0.19.0
pandas, 0.20.3
config file view
##
<< host = sklearn    << function = Normalizer
<< track_header = True
<< func_method = None
<< copy = True
<< norm = l2
>> id df
>> id api
>> id df
>> id api
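Unlike the column-wise scalers in this section, Normalizer works row by row: each sample is divided by its own norm (`norm = l2` by default). The plain-Python sketch below covers the `l1`, `l2` and `max` norms; `normalize_rows` is a hypothetical helper, and rows of all zeros are returned unchanged.

```python
import math


def normalize_rows(rows, norm="l2"):
    """Scale each row (sample) to unit norm; all-zero rows are left unchanged."""
    out = []
    for row in rows:
        if norm == "l2":
            length = math.sqrt(sum(v * v for v in row))
        elif norm == "l1":
            length = sum(abs(v) for v in row)
        else:  # "max"
            length = max(abs(v) for v in row)
        out.append([v / length for v in row] if length else list(row))
    return out


print(normalize_rows([[3.0, 4.0], [0.0, 0.0]]))  # [[0.6, 0.8], [0.0, 0.0]]
```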

OneHotEncoder

task
Prepare
subtask
feature representation
host
sklearn
function
OneHotEncoder
input tokens (receivers)
df : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
api : instance of scikit-learn’s OneHotEncoder class
types: (“<class ‘sklearn.preprocessing.data.OneHotEncoder’>”,)
output tokens (senders)
df : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
api : instance of scikit-learn’s OneHotEncoder class
types: (“<class ‘sklearn.preprocessing.data.OneHotEncoder’>”,)
wrapper parameters
track_header : Boolean, (default:True)
if True, the input dataframe’s header is transferred to the output dataframe
choose one of: (True, False)
func_method : string, (default:None)
fit_transform: always make a new api; transform: must receive an api; None: only make a new api
choose one of: (‘fit_transform’, ‘transform’, None)
required packages
scikit-learn, 0.19.0
pandas, 0.20.3
config file view
##
<< host = sklearn    << function = OneHotEncoder
<< track_header = True
<< func_method = None
<< dtype = <type 'numpy.float64'>
<< categorical_features = all
<< n_values = auto
<< sparse = True
<< handle_unknown = error
>> id df
>> id api
>> id df
>> id api

PCA

task
Prepare
subtask
feature transformation
host
sklearn
function
PCA
input tokens (receivers)
df : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
api : instance of scikit-learn’s PCA class
types: (“<class ‘sklearn.decomposition.pca.PCA’>”,)
output tokens (senders)
df : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
api : instance of scikit-learn’s PCA class
types: (“<class ‘sklearn.decomposition.pca.PCA’>”,)
wrapper parameters
track_header : Boolean, (default:False)
Always False: the header of the input dataframe does not correspond to the columns of the transformed dataframe
choose one of: False
func_method : string, (default:None)
fit_transform: always make a new api; transform: must receive an api; inverse_transform: must receive an api; None: only make a new api
choose one of: (‘fit_transform’, ‘transform’, ‘inverse_transform’, None)
required packages
scikit-learn, 0.19.0
pandas, 0.20.3
config file view
##
<< host = sklearn    << function = PCA
<< track_header = False
<< func_method = None
<< svd_solver = auto
<< iterated_power = auto
<< random_state = None
<< whiten = False
<< tol = 0.0
<< copy = True
<< n_components = None
>> id df
>> id api
>> id df
>> id api

PolynomialFeatures

task
Prepare
subtask
feature representation
host
sklearn
function
PolynomialFeatures
input tokens (receivers)
df : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
api : instance of scikit-learn’s PolynomialFeatures class
types: (“<class ‘sklearn.preprocessing.data.PolynomialFeatures’>”,)
output tokens (senders)
df : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
api : instance of scikit-learn’s PolynomialFeatures class
types: (“<class ‘sklearn.preprocessing.data.PolynomialFeatures’>”,)
wrapper parameters
track_header : Boolean, (default:True)
if True, the input dataframe’s header is transferred to the output dataframe
choose one of: (True, False)
func_method : string, (default:None)
fit_transform: always make a new api; transform: must receive an api; None: only make a new api
choose one of: (‘fit_transform’, ‘transform’, None)
required packages
scikit-learn, 0.19.0
pandas, 0.20.3
config file view
##
<< host = sklearn    << function = PolynomialFeatures
<< track_header = True
<< func_method = None
<< include_bias = True
<< interaction_only = False
<< degree = 2
>> id df
>> id api
>> id df
>> id api
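With `degree = 2`, `include_bias = True` and `interaction_only = False`, a sample [a, b] expands to [1, a, b, a², ab, b²], in the same order scikit-learn's PolynomialFeatures uses. The plain-Python sketch below builds those monomials for one sample; `polynomial_features` is a hypothetical helper name.

```python
from itertools import combinations, combinations_with_replacement


def polynomial_features(row, degree=2, interaction_only=False, include_bias=True):
    """Expand one sample into all monomials of its features up to `degree`."""
    # interaction_only keeps only products of distinct features (no a**2 terms).
    combos = combinations if interaction_only else combinations_with_replacement
    start = 0 if include_bias else 1
    out = []
    for d in range(start, degree + 1):
        for idx in combos(range(len(row)), d):
            term = 1.0
            for i in idx:
                term *= row[i]
            out.append(term)
    return out


print(polynomial_features([2.0, 3.0]))  # [1.0, 2.0, 3.0, 4.0, 6.0, 9.0]
```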

RobustScaler

task
Prepare
subtask
scaling
host
sklearn
function
RobustScaler
input tokens (receivers)
df : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
api : instance of scikit-learn’s RobustScaler class
types: (“<class ‘sklearn.preprocessing.data.RobustScaler’>”,)
output tokens (senders)
df : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
api : instance of scikit-learn’s RobustScaler class
types: (“<class ‘sklearn.preprocessing.data.RobustScaler’>”,)
wrapper parameters
track_header : Boolean, (default:True)
if True, the input dataframe’s header is transferred to the output dataframe
choose one of: (True, False)
func_method : string, (default:None)
fit_transform: always make a new api; transform: must receive an api; inverse_transform: must receive an api; None: only make a new api
choose one of: (‘fit_transform’, ‘transform’, ‘inverse_transform’, None)
required packages
scikit-learn, 0.19.0
pandas, 0.20.3
config file view
##
<< host = sklearn    << function = RobustScaler
<< track_header = True
<< func_method = None
<< copy = True
<< with_scaling = True
<< with_centering = True
<< quantile_range = (25.0, 75.0)
>> id df
>> id api
>> id df
>> id api
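RobustScaler centers each column on its median and divides by the spread between the quantiles in `quantile_range` (the interquartile range for the default (25.0, 75.0)), so a single extreme value barely affects the scaling. The plain-Python sketch below handles one column and uses linear-interpolation percentiles; the helpers `percentile` and `robust_scale` are hypothetical.

```python
import statistics


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) of an already sorted list."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * frac


def robust_scale(column, quantile_range=(25.0, 75.0)):
    """Center a column on its median and divide by its quantile range."""
    ordered = sorted(column)
    center = statistics.median(ordered)
    spread = percentile(ordered, quantile_range[1]) - percentile(ordered, quantile_range[0])
    spread = spread or 1.0  # constant column: skip scaling
    return [(v - center) / spread for v in column]


print(robust_scale([1.0, 2.0, 3.0, 4.0, 100.0]))  # [-1.0, -0.5, 0.0, 0.5, 48.5]
```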

ShuffleSplit

task
Prepare
subtask
split
host
sklearn
function
ShuffleSplit
input tokens (receivers)
dfx : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
output tokens (senders)
api : instance of scikit-learn’s ShuffleSplit class
types: (“<class ‘sklearn.model_selection._split.ShuffleSplit’>”,)
fold_gen : Generator of indices to split data into training and test set
types: (“<type ‘generator’>”,)
wrapper parameters
func_method : string, (default:None)

choose one of: (‘split’, None)
required packages
scikit-learn, 0.19.0
pandas, 0.20.3
config file view
##
<< host = sklearn    << function = ShuffleSplit
<< func_method = None
<< n_splits = 10
<< train_size = None
<< random_state = None
<< test_size = default
>> id dfx
>> id api
>> id fold_gen

StandardScaler

task
Prepare
subtask
scaling
host
sklearn
function
StandardScaler
input tokens (receivers)
df : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
api : instance of scikit-learn’s StandardScaler class
types: (“<class ‘sklearn.preprocessing.data.StandardScaler’>”,)
output tokens (senders)
df : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
api : instance of scikit-learn’s StandardScaler class
types: (“<class ‘sklearn.preprocessing.data.StandardScaler’>”,)
wrapper parameters
track_header : Boolean, (default:True)
if True, the input dataframe’s header is transferred to the output dataframe
choose one of: (True, False)
func_method : string, (default:None)
fit_transform: always make a new api; transform: must receive an api; inverse_transform: must receive an api; None: only make a new api
choose one of: (‘fit_transform’, ‘transform’, ‘inverse_transform’, None)
required packages
scikit-learn, 0.19.0
pandas, 0.20.3
config file view
##
<< host = sklearn    << function = StandardScaler
<< track_header = True
<< func_method = None
<< copy = True
<< with_mean = True
<< with_std = True
>> id df
>> id api
>> id df
>> id api
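StandardScaler subtracts each column's mean (`with_mean`) and divides by its population standard deviation (`with_std`), giving a column with zero mean and unit variance. The single-column plain-Python sketch below follows that definition; `standard_scale` is a hypothetical helper, and the guard for a zero standard deviation is an assumption made to avoid division by zero.

```python
import statistics


def standard_scale(column, with_mean=True, with_std=True):
    """Subtract the column mean and divide by the population standard deviation."""
    center = statistics.mean(column) if with_mean else 0.0
    scale = (statistics.pstdev(column) or 1.0) if with_std else 1.0
    return [(v - center) / scale for v in column]


scaled = standard_scale([2.0, 4.0, 6.0])
print(scaled)  # approximately [-1.2247, 0.0, 1.2247]
```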

StratifiedShuffleSplit

task
Prepare
subtask
split
host
sklearn
function
StratifiedShuffleSplit
input tokens (receivers)
dfx : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
output tokens (senders)
api : instance of scikit-learn’s StratifiedShuffleSplit class
types: (“<class ‘sklearn.model_selection._split.StratifiedShuffleSplit’>”,)
fold_gen : Generator of indices to split data into training and test set
types: (“<type ‘generator’>”,)
wrapper parameters
func_method : string, (default:None)

choose one of: (‘split’, None)
required packages
scikit-learn, 0.19.0
pandas, 0.20.3
config file view
##
<< host = sklearn    << function = StratifiedShuffleSplit
<< func_method = None
<< n_splits = 10
<< train_size = None
<< random_state = None
<< test_size = default
>> id dfx
>> id api
>> id fold_gen

train_test_split

task
Prepare
subtask
split
host
sklearn
function
train_test_split
input tokens (receivers)
dfy : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
dfx : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
output tokens (senders)
dfx_test : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
dfy_train : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
dfy_test : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
dfx_train : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
wrapper parameters
track_header : Boolean, (default:True)
if True, the input dataframe’s header is transferred to the output dataframe
choose one of: (True, False)
required packages
scikit-learn, 0.19.0
pandas, 0.20.3
config file view
##
<< host = sklearn    << function = train_test_split
<< track_header = True
<< shuffle = True
<< train_size = None
<< random_state = None
<< test_size = 0.25
<< stratify = None
>> id dfy
>> id dfx
>> id dfx_test
>> id dfy_train
>> id dfy_test
>> id dfx_train
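The split itself only needs row indices, which are then applied to both `dfx` and `dfy` so features and targets stay aligned. The plain-Python sketch below rounds a fractional `test_size` up to a whole number of test samples (as scikit-learn does) and, without shuffling, keeps the original order with the test set at the tail. `train_test_split_indices` is a hypothetical helper, and because it uses Python's `random` module, a given `random_state` will not reproduce scikit-learn's split.

```python
import math
import random


def train_test_split_indices(n_samples, test_size=0.25, shuffle=True, random_state=None):
    """Return (train, test) index lists; a float test_size is a fraction, rounded up."""
    n_test = math.ceil(test_size * n_samples) if isinstance(test_size, float) else test_size
    n_train = n_samples - n_test
    indices = list(range(n_samples))
    if not shuffle:
        return indices[:n_train], indices[n_train:]  # keep order; test set is the tail
    random.Random(random_state).shuffle(indices)
    return indices[n_test:], indices[:n_test]


train, test = train_test_split_indices(10, random_state=0)
print(len(train), len(test))  # 7 3
```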

concat

task
Prepare
subtask
data manipulation
host
pandas
function
concat
input tokens (receivers)
df1 : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
df3 : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
df2 : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
output tokens (senders)
df : pandas dataframe
types: (“<class ‘pandas.core.frame.DataFrame’>”,)
required packages
pandas, 0.20.3
config file view
##
<< host = pandas    << function = concat
<< join = outer
<< verify_integrity = False
<< keys = None
<< levels = None
<< ignore_index = False
<< names = None
<< join_axes = None
<< copy = True
<< axis = 0
>> id df1
>> id df3
>> id df2
>> id df

Note

The documentation page for function parameters: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.concat.html
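To make the default `axis = 0`, `join = outer` behavior concrete, the plain-Python sketch below stacks dict-of-columns tables vertically: the result has the union of all columns, and a table lacking a column contributes missing values (`None` here, standing in for NaN). It is a hypothetical illustration, not pandas code: it ignores the row index (`ignore_index`, `keys`) and keeps columns in order of first appearance, whereas pandas may order the column union differently.

```python
def concat_rows(*tables):
    """Stack dict-of-columns tables vertically (axis=0, join='outer')."""
    columns = []
    for table in tables:
        for name in table:
            if name not in columns:
                columns.append(name)
    result = {name: [] for name in columns}
    for table in tables:
        n_rows = len(next(iter(table.values()))) if table else 0
        for name in columns:
            # Missing columns are padded with None, playing the role of NaN.
            result[name].extend(table.get(name, [None] * n_rows))
    return result


print(concat_rows({"a": [1, 2]}, {"a": [3], "b": [4]}))
# {'a': [1, 2, 3], 'b': [None, None, 4]}
```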