*Sebastian Raschka*

last updated: **03/29/2014**

Executed in Python 3.4.0

- 1. Introduction
    - Defining a criterion function for testing
- 2. Sequential Forward Selection (SFS)
    - SFS Code
    - Example SFS
- 3. Sequential Backward Selection (SBS)
    - SBS Code
    - Example SBS
- 4. "Plus L take away R" (+L -R)
    - +L -R Code
    - Example +L -R
        - Example 1: L > R
        - Example 2: R > L
- 5. Sequential Floating Forward Selection (SFFS)
    - SFFS Code
    - Example SFFS
- 6. Sequential Floating Backward Selection (SFBS)
    - SFBS Code
    - Example SFBS

**1. Introduction**

One of the biggest challenges in designing a good classifier for solving a **Statistical Pattern Classification** problem is to estimate the underlying parameters to fit the model - given that the forms of the underlying probability distributions are known. The larger the number of parameters becomes, the more difficult it naturally is to estimate them accurately from a limited number of training samples.

In order to avoid the **Curse of Dimensionality**, pattern classification is often accompanied by dimensionality reduction, e.g., via projection-based techniques such as Principal Component Analysis (PCA).

An alternative to a projection-based dimensionality reduction approach is the so-called **Feature Selection**, and in this article, we will take a look at some of the established algorithms for tackling this combinatorial search problem. Note that those algorithms are considered "suboptimal" in contrast to an exhaustive search, which would have to evaluate every one of the $\binom{d}{k}$ possible $k$-feature subsets and quickly becomes computationally infeasible.
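
To get a feeling for how quickly an exhaustive search blows up, here is a quick back-of-the-envelope calculation (a small sketch added for illustration; it is not part of the selection algorithms below):

```
from math import factorial

def n_subsets(d, k):
    """Number of distinct k-feature subsets of a d-dimensional feature space."""
    return factorial(d) // (factorial(k) * factorial(d - k))

# E.g., selecting 10 out of 100 features already means
# more than 10^13 candidate subsets to evaluate:
print(n_subsets(100, 10))  # 17310309456440
```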

Therefore, the goal of the presented **sequential selection algorithms** is to reduce an initial $d$-dimensional feature space $D = \{x_1, x_2, ..., x_d\}$ to a $k$-dimensional feature subset ($k < d$) in order to improve or optimize the computational performance of the classifier.

The goal is to select a "sufficiently reduced" subset from the feature space $D$ "without significantly reducing" the performance of the classifier. In the process of choosing an "optimal" feature subset of size $k$, a so-called **criterion function** is applied, which typically assesses the recognition rate of the classifier.

F. Ferri, P. Pudil, M. Hatef, and J. Kittler investigated the performance of different **Sequential Selection Algorithms** for large-scale feature selection ("Comparative study of techniques for large-scale feature selection," Pattern Recognition in Practice IV, 1994).

Choosing an "appropriate" algorithm really depends on the problem - the dataset size, the desired recognition rate, and the computational performance. Thus, I want to encourage you to take (at least) a brief look at their paper and the results they obtained from experiments with different feature space dimensions.

**Defining a criterion function for testing**

In order to evaluate the performance of our selected **feature subset** (typically the recognition rate of the classifier), we need to define a **criterion function** $J$.

For the sake of simplicity, and in order to get an intuitive idea of whether our algorithms return an "appropriate" result, let us define a very simple criterion function here.

The criterion function defined below simply returns the sum of numerical values in a list.

In [1]:
```
def simple_crit_func(feat_sub):
    """ Returns sum of numerical values of an input list. """
    return sum(feat_sub)

# Example:
simple_crit_func([1, 2, 4])
```
Out[1]:
```
7
```

**2. Sequential Forward Selection (SFS)**

The **Sequential Forward Selection (SFS)** is one of the simplest and probably fastest **feature selection algorithms**.

Let's summarize its mechanics in words:

Note that included features are never removed, which is one of the biggest downsides of this algorithm.

**Input:** the set of all features, $Y = \{y_1, y_2, ..., y_d\}$

- The *SFS* algorithm takes the whole feature set as input; e.g., $d = 10$ if our feature space consists of 10 dimensions.

**Output:** a subset of features, $X_k = \{x_j \;|\; j = 1, 2, ..., k; \; x_j \in Y\}$, where $k = (0, 1, 2, ..., d)$

- The returned output of the algorithm is a subset of the feature space of a specified size. E.g., a subset of 5 features from a 10-dimensional feature space ($k = 5, d = 10$).

**Initialization:** $X_0 = \emptyset$ ("null set"), $k = 0$

- We initialize the algorithm with an empty set ("null set") so that $k = 0$ (where $k$ is the size of the subset).

**Step 1 (Inclusion):**

$x^+ = \text{arg max } J(X_k + x), \text{ where } x \in Y - X_k$

$X_{k+1} = X_k + x^+$

$k = k + 1$

*Go to Step 1*

- We go through the *feature space* and look for the feature $x^+$ which maximizes our criterion if we add it to the *feature subset* (where $J$ is the criterion function). We repeat this process until we reach the *Termination* criterion.

**Termination:** stop when $k$ equals the number of desired features

- We add features to the new feature subset $X_k$ until we reach the number of specified features for our final subset. E.g., if our desired number of features is 5 and we start with the "null set" (*Initialization*), we would add features to the subset until it contains 5 features.

**SFS Code**

In [2]:
```
# Sebastian Raschka
# last updated: 03/29/2014
# Sequential Forward Selection (SFS)

def seq_forw_select(features, max_k, criterion_func, print_steps=False):
    """
    Implementation of a Sequential Forward Selection algorithm.

    Keyword Arguments:
        features (list): The feature space as a list of features.
        max_k: Termination criterion; the size of the returned feature subset.
        criterion_func (function): Function that is used to evaluate the
            performance of the feature subset.
        print_steps (bool): Prints the algorithm procedure if True.

    Returns the selected feature subset, a list of features of length max_k.

    """
    # Initialization
    feat_sub = []
    k = 0
    d = len(features)
    if max_k > d:
        max_k = d

    while True:

        # Inclusion step
        if print_steps:
            print('\nInclusion from feature space', features)
        crit_func_max = criterion_func(feat_sub + [features[0]])
        best_feat = features[0]
        for x in features[1:]:
            crit_func_eval = criterion_func(feat_sub + [x])
            if crit_func_eval > crit_func_max:
                crit_func_max = crit_func_eval
                best_feat = x
        feat_sub.append(best_feat)
        if print_steps:
            print('include: {} -> feature subset: {}'.format(best_feat, feat_sub))
        features.remove(best_feat)

        # Termination condition
        k = len(feat_sub)
        if k == max_k:
            break

    return feat_sub
```

**Example SFS**

In this example, we take a look at the individual steps of the **SFS** algorithm to select a 3-feature subset.

The input feature space consists of 10 integers: 6, 3, 1, 6, 8, 2, 3, 7, 9, 1, and our criterion is to find the subset of size 3 in this feature space that maximizes our simple criterion function (the sum of its values).

In [3]:
```
def example_seq_forw_select():
    ex_features = [6, 3, 1, 6, 8, 2, 3, 7, 9, 1]
    res_forw = seq_forw_select(features=ex_features, max_k=3,
                               criterion_func=simple_crit_func, print_steps=True)
    return res_forw

# Run example
res_forw = example_seq_forw_select()
print('\nRESULT: [6, 3, 1, 6, 8, 2, 3, 7, 9, 1] ->', res_forw)
```
```
Inclusion from feature space [6, 3, 1, 6, 8, 2, 3, 7, 9, 1]
include: 9 -> feature subset: [9]

Inclusion from feature space [6, 3, 1, 6, 8, 2, 3, 7, 1]
include: 8 -> feature subset: [9, 8]

Inclusion from feature space [6, 3, 1, 6, 2, 3, 7, 1]
include: 7 -> feature subset: [9, 8, 7]

RESULT: [6, 3, 1, 6, 8, 2, 3, 7, 9, 1] -> [9, 8, 7]
```

The returned result is definitely what we would expect: the 3 highest values (note that we defined 3 as the number of desired features for our subset) in the input feature list, since our **criterion** is to select the numerical values (= features) that yield the maximum sum in the feature subset.
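
As a small aside, the "features are never removed" property can actually hurt when features interact. The following toy criterion (a hypothetical sketch added for illustration; `pair_bonus_crit_func` is not part of the original article) rewards the *pair* 4 and 5, yet SFS greedily locks in 6 first and can never recover the optimal pair:

```
def pair_bonus_crit_func(feat_sub):
    """Sum of the subset, plus a bonus of 10 if both 4 and 5 are present."""
    bonus = 10 if (4 in feat_sub and 5 in feat_sub) else 0
    return sum(feat_sub) + bonus

# SFS picks 6 first (the best single feature), then 5:
print(seq_forw_select([4, 5, 6], max_k=2,
                      criterion_func=pair_bonus_crit_func))
# -> [6, 5] with J = 11, although [4, 5] would score J = 19
```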

**3. Sequential Backward Selection (SBS)**

The **Sequential Backward Selection (SBS)** algorithm is very similar to the **SFS** algorithm, but it works in the opposite direction: it starts with the complete feature set and sequentially removes features.

Note that features are never added back once they were removed, which (similar to *SFS*) is one of the biggest downsides of this algorithm.

**Input:** the set of all features, $Y = \{y_1, y_2, ..., y_d\}$

- The *SBS* algorithm takes the whole feature set as input; e.g., $d = 10$ if our feature space consists of 10 dimensions.

**Output:** a subset of features, $X_k = \{x_j \; | \; j = 1, 2, ..., k; \; x_j \in Y\}$, where $k = (0, 1, 2, ..., d)$

- The returned output of the algorithm is a subset of the feature space of a specified size. E.g., a subset of 5 features from a 10-dimensional feature space ($k = 5, d = 10$).

**Initialization:** $X_0 = Y$, $k = d$

- We initialize the algorithm with the given feature set so that $k = d$ (where $d$ is the size of the full feature set).

**Step 1 (Exclusion):**

$x^- = \text{arg max } J(X_k - x), \text{ where } x \in X_k$

$X_{k-1} = X_k - x^-$

$k = k - 1$

*Go to Step 1*

- We go through the *feature subset* and look for the feature $x^-$ whose removal yields the highest value of the criterion function $J$ on the remaining features; in other words, we remove the feature we can best afford to lose. We repeat this process until we reach the *Termination* criterion.

**Termination:** stop when $k$ equals the number of desired features

- We remove features from the feature subset $X_k$ until we reach the number of specified features for our final subset. E.g., if our desired number of features is 5 and we start with the entire feature space (*Initialization*), we would remove features from the subset until it contains 5 features.

**SBS Code**

In [4]:
```
# Sebastian Raschka
# last updated: 03/29/2014
# Sequential Backward Selection (SBS)

from copy import deepcopy

def seq_backw_select(features, max_k, criterion_func, print_steps=False):
    """
    Implementation of a Sequential Backward Selection algorithm.

    Keyword Arguments:
        features (list): The feature space as a list of features.
        max_k: Termination criterion; the size of the returned feature subset.
        criterion_func (function): Function that is used to evaluate the
            performance of the feature subset.
        print_steps (bool): Prints the algorithm procedure if True.

    Returns the selected feature subset, a list of features of length max_k.

    """
    # Initialization
    feat_sub = deepcopy(features)
    k = len(feat_sub)

    while True:

        # Exclusion step
        if print_steps:
            print('\nExclusion from feature subset', feat_sub)
        worst_feat = len(feat_sub) - 1
        worst_feat_val = feat_sub[worst_feat]
        crit_func_max = criterion_func(feat_sub[:-1])
        for i in reversed(range(0, len(feat_sub) - 1)):
            crit_func_eval = criterion_func(feat_sub[:i] + feat_sub[i+1:])
            if crit_func_eval > crit_func_max:
                worst_feat, crit_func_max = i, crit_func_eval
                worst_feat_val = feat_sub[worst_feat]
        del feat_sub[worst_feat]
        if print_steps:
            print('exclude: {} -> feature subset: {}'.format(worst_feat_val, feat_sub))

        # Termination condition
        k = len(feat_sub)
        if k == max_k:
            break

    return feat_sub
```

**Example SBS**

Like we did for the **SFS example** above, we take a look at the individual steps of the **SBS** algorithm to select a 3-feature subset.

The input feature space consists of 10 integers: 6, 3, 1, 6, 8, 2, 3, 7, 9, 1, and our criterion is to find the subset of size 3 in this feature space that maximizes the sum of its values.

In [5]:
```
def example_seq_backw_select():
    ex_features = [6, 3, 1, 6, 8, 2, 3, 7, 9, 1]
    res_backw = seq_backw_select(features=ex_features, max_k=3,
                                 criterion_func=simple_crit_func, print_steps=True)
    return res_backw

# Run example
res_backw = example_seq_backw_select()
print('\nRESULT: [6, 3, 1, 6, 8, 2, 3, 7, 9, 1] ->', res_backw)
```
```
Exclusion from feature subset [6, 3, 1, 6, 8, 2, 3, 7, 9, 1]
exclude: 1 -> feature subset: [6, 3, 1, 6, 8, 2, 3, 7, 9]

Exclusion from feature subset [6, 3, 1, 6, 8, 2, 3, 7, 9]
exclude: 1 -> feature subset: [6, 3, 6, 8, 2, 3, 7, 9]

Exclusion from feature subset [6, 3, 6, 8, 2, 3, 7, 9]
exclude: 2 -> feature subset: [6, 3, 6, 8, 3, 7, 9]

Exclusion from feature subset [6, 3, 6, 8, 3, 7, 9]
exclude: 3 -> feature subset: [6, 3, 6, 8, 7, 9]

Exclusion from feature subset [6, 3, 6, 8, 7, 9]
exclude: 3 -> feature subset: [6, 6, 8, 7, 9]

Exclusion from feature subset [6, 6, 8, 7, 9]
exclude: 6 -> feature subset: [6, 8, 7, 9]

Exclusion from feature subset [6, 8, 7, 9]
exclude: 6 -> feature subset: [8, 7, 9]

RESULT: [6, 3, 1, 6, 8, 2, 3, 7, 9, 1] -> [8, 7, 9]
```

The returned **feature subset** is equivalent to the result of the **SFS** algorithm above: it contains the 3 highest values of the input feature list, merely in a different order.
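
Since both runs are deterministic, we can quickly confirm (a one-line check added here for illustration) that SFS and SBS selected the same set of values:

```
# res_forw == [9, 8, 7] and res_backw == [8, 7, 9]; as sets they are identical:
print(sorted(res_forw) == sorted(res_backw))  # True
```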

**4. "Plus L take away R" (+L -R)**

The **"Plus L take away R" (+L -R)** algorithm is basically a combination of **SFS** and **SBS**.

**Variant 1: L > R**

If *L > R*, the algorithm starts with an empty **feature subset** and adds *L* features (via inclusion as in *SFS*) before it removes *R* features again (via exclusion as in *SBS*).

Those steps are repeated until the termination criterion is reached.

**Variant 2: R > L**

Else, if *R > L*, the algorithm starts with the whole feature set, removes *R* features, and then adds back *L* of the previously removed features.

Those steps are repeated until the termination criterion is reached.

**Input:** the set of all features, $Y = \{y_1, y_2, ..., y_d\}$

- The *+L -R* algorithm takes the whole feature set as input; e.g., $d = 10$ if our feature space consists of 10 dimensions.

**Output:** a subset of features, $X_k = \{x_j \; | \; j = 1, 2, ..., k; \; x_j \in Y\}$, where $k = (0, 1, 2, ..., d)$

- The returned output of the algorithm is a subset of the feature space of a specified size. E.g., a subset of 5 features from a 10-dimensional feature space ($k = 5, d = 10$).

**Initialization:** $X_0 = \emptyset$, $k = 0$ if *L > R*; else $X_0 = Y$, $k = d$

- For *L > R*, we initialize the algorithm with an empty set; for *R > L*, we initialize it with the given feature set so that $k = d$ (where $d$ is the size of the feature set).

**Step 1 (Inclusion):**

*repeat L-times:*

$x^+ = \text{arg max } J(X_k + x), \text{ where } x \in Y - X_k$

$X_{k+1} = X_k + x^+$

$k = k + 1$

*Go to Step 2*

**Step 2 (Exclusion):**

*repeat R-times:*

$x^- = \text{arg max } J(X_k - x), \text{ where } x \in X_k$

$X_{k-1} = X_k - x^-$

$k = k - 1$

*Go to Step 1*

- In step 1, we go *L* times through the *feature space* and look for the feature $x^+$ which maximizes our criterion if we add it to the *feature subset* (where $J$ is the criterion function). Then we go over to step 2.
- In step 2, we go *R* times through the *feature subset* and look for the feature $x^-$ whose removal maximizes our criterion on the remaining subset. Then we go back to step 1.
- Note that this order of steps only applies if *L > R*; in the opposite case (*R > L*), we have to start with the *Exclusion* step on the whole *feature space*, followed by the inclusion of previously removed features.

**Termination:** stop when $k$ equals the number of desired features

- We add and remove features until the feature subset $X_k$ reaches the number of specified features for our final subset. E.g., if our desired number of features is 5 ($k = 5$), we keep iterating until the subset contains exactly 5 features.

**+L -R Code**

In [6]:
```
# Sebastian Raschka
# last updated: 03/29/2014
# "Plus L take away R" (+L -R)

from copy import deepcopy

def plus_L_minus_R(features, max_k, criterion_func, L=3, R=2, print_steps=False):
    """
    Implementation of a "Plus L take away R" algorithm.

    Keyword Arguments:
        features (list): The feature space as a list of features.
        max_k: Termination criterion; the size of the returned feature subset.
        criterion_func (function): Function that is used to evaluate the
            performance of the feature subset.
        L (int): Number of features added per iteration.
        R (int): Number of features removed per iteration.
        print_steps (bool): Prints the algorithm procedure if True.

    Returns the selected feature subset, a list of features of length max_k.

    """
    assert L != R, 'L must be != R to avoid an infinite loop'

    ############################
    ### +L -R for case L > R ###
    ############################
    if L > R:
        # Initialization
        feat_sub = []
        k = 0

        while True:

            # +L (Inclusion)
            if print_steps:
                print('\nInclusion from features', features)
            for i in range(L):
                if len(features) > 0:
                    crit_func_max = criterion_func(feat_sub + [features[0]])
                    best_feat = features[0]
                    if len(features) > 1:
                        for x in features[1:]:
                            crit_func_eval = criterion_func(feat_sub + [x])
                            if crit_func_eval > crit_func_max:
                                crit_func_max = crit_func_eval
                                best_feat = x
                    features.remove(best_feat)
                    feat_sub.append(best_feat)
                    if print_steps:
                        print('include: {} -> feature_subset: {}'.format(best_feat, feat_sub))

            # -R (Exclusion)
            if print_steps:
                print('\nExclusion from feature_subset', feat_sub)
            for i in range(R):
                if len(features) + len(feat_sub) > max_k:
                    worst_feat = len(feat_sub) - 1
                    worst_feat_val = feat_sub[worst_feat]
                    crit_func_max = criterion_func(feat_sub[:-1])
                    for j in reversed(range(0, len(feat_sub) - 1)):
                        crit_func_eval = criterion_func(feat_sub[:j] + feat_sub[j+1:])
                        if crit_func_eval > crit_func_max:
                            worst_feat, crit_func_max = j, crit_func_eval
                            worst_feat_val = feat_sub[worst_feat]
                    del feat_sub[worst_feat]
                    if print_steps:
                        print('exclude: {} -> feature subset: {}'.format(worst_feat_val, feat_sub))

            # Termination condition
            k = len(feat_sub)
            if k == max_k:
                break

        return feat_sub

    ############################
    ### +L -R for case L < R ###
    ############################
    else:
        # Initialization
        feat_sub = deepcopy(features)
        k = len(feat_sub)
        count = 0

        while True:
            count += 1

            # -R (Exclusion)
            removed_feats = []
            if print_steps:
                print('\nExclusion from feature subset', feat_sub)
            for i in range(R):
                if len(feat_sub) > max_k:
                    worst_feat = len(feat_sub) - 1
                    worst_feat_val = feat_sub[worst_feat]
                    crit_func_max = criterion_func(feat_sub[:-1])
                    for j in reversed(range(0, len(feat_sub) - 1)):
                        crit_func_eval = criterion_func(feat_sub[:j] + feat_sub[j+1:])
                        if crit_func_eval > crit_func_max:
                            worst_feat, crit_func_max = j, crit_func_eval
                            worst_feat_val = feat_sub[worst_feat]
                    removed_feats.append(feat_sub.pop(worst_feat))
            if print_steps:
                print('exclude: {} -> feature subset: {}'.format(removed_feats, feat_sub))

            # +L (Inclusion)
            included_feats = []
            if len(feat_sub) != max_k:
                for i in range(L):
                    if len(removed_feats) > 0:
                        crit_func_max = criterion_func(feat_sub + [removed_feats[0]])
                        best_feat = removed_feats[0]
                        if len(removed_feats) > 1:
                            for x in removed_feats[1:]:
                                crit_func_eval = criterion_func(feat_sub + [x])
                                if crit_func_eval > crit_func_max:
                                    crit_func_max = crit_func_eval
                                    best_feat = x
                        removed_feats.remove(best_feat)
                        feat_sub.append(best_feat)
                        included_feats.append(best_feat)
                if print_steps:
                    print('\nInclusion from removed features', removed_feats)
                    print('include: {} -> feature_subset: {}'.format(included_feats, feat_sub))

            # Termination condition
            k = len(feat_sub)
            if k == max_k:
                break
            if count >= 30:  # safety stop against endless cycling
                break

        return feat_sub
```

**Example +L -R**

Like we did for the **SFS** example above, let's have a look at the individual steps of the *+L -R* algorithm to select a 3-feature subset.

Again, the input feature space consists of the 10 integers: 6, 3, 1, 6, 8, 2, 3, 7, 9, 1, and our criterion is to find the subset of size 3 in this feature space that maximizes the sum of its values.

**Example 1: L > R**

In [7]:
```
def example_plus_L_minus_R():
    ex_features = [6, 3, 1, 6, 8, 2, 3, 7, 9, 1]
    res_plmr = plus_L_minus_R(features=ex_features, max_k=3,
                              criterion_func=simple_crit_func, L=3, R=2,
                              print_steps=True)
    return res_plmr

# Run example
res = example_plus_L_minus_R()
print('\nRESULT: [6, 3, 1, 6, 8, 2, 3, 7, 9, 1] ->', res)
```
```
Inclusion from features [6, 3, 1, 6, 8, 2, 3, 7, 9, 1]
include: 9 -> feature_subset: [9]
include: 8 -> feature_subset: [9, 8]
include: 7 -> feature_subset: [9, 8, 7]

Exclusion from feature_subset [9, 8, 7]
exclude: 7 -> feature subset: [9, 8]
exclude: 8 -> feature subset: [9]

Inclusion from features [6, 3, 1, 6, 2, 3, 1]
include: 6 -> feature_subset: [9, 6]
include: 6 -> feature_subset: [9, 6, 6]
include: 3 -> feature_subset: [9, 6, 6, 3]

Exclusion from feature_subset [9, 6, 6, 3]
exclude: 3 -> feature subset: [9, 6, 6]
exclude: 6 -> feature subset: [9, 6]

Inclusion from features [1, 2, 3, 1]
include: 3 -> feature_subset: [9, 6, 3]
include: 2 -> feature_subset: [9, 6, 3, 2]
include: 1 -> feature_subset: [9, 6, 3, 2, 1]

Exclusion from feature_subset [9, 6, 3, 2, 1]
exclude: 1 -> feature subset: [9, 6, 3, 2]
exclude: 2 -> feature subset: [9, 6, 3]

RESULT: [6, 3, 1, 6, 8, 2, 3, 7, 9, 1] -> [9, 6, 3]
```

**Example 2: R > L**

In [8]:
```
def example_plus_L_minus_R():
    ex_features = [6, 3, 1, 6, 8, 2, 3, 7, 9, 1]
    res_plmr = plus_L_minus_R(features=ex_features, max_k=3,
                              criterion_func=simple_crit_func, L=2, R=3,
                              print_steps=True)
    return res_plmr

# Run example
res = example_plus_L_minus_R()
print('\nRESULT: [6, 3, 1, 6, 8, 2, 3, 7, 9, 1] ->', res)
```
```
Exclusion from feature subset [6, 3, 1, 6, 8, 2, 3, 7, 9, 1]
exclude: [1, 1, 2] -> feature subset: [6, 3, 6, 8, 3, 7, 9]

Inclusion from removed features [1]
include: [2, 1] -> feature_subset: [6, 3, 6, 8, 3, 7, 9, 2, 1]

Exclusion from feature subset [6, 3, 6, 8, 3, 7, 9, 2, 1]
exclude: [1, 2, 3] -> feature subset: [6, 3, 6, 8, 7, 9]

Inclusion from removed features [1]
include: [3, 2] -> feature_subset: [6, 3, 6, 8, 7, 9, 3, 2]

Exclusion from feature subset [6, 3, 6, 8, 7, 9, 3, 2]
exclude: [2, 3, 3] -> feature subset: [6, 6, 8, 7, 9]

Inclusion from removed features [2]
include: [3, 3] -> feature_subset: [6, 6, 8, 7, 9, 3, 3]

Exclusion from feature subset [6, 6, 8, 7, 9, 3, 3]
exclude: [3, 3, 6] -> feature subset: [6, 8, 7, 9]

Inclusion from removed features [3]
include: [6, 3] -> feature_subset: [6, 8, 7, 9, 6, 3]

Exclusion from feature subset [6, 8, 7, 9, 6, 3]
exclude: [3, 6, 6] -> feature subset: [8, 7, 9]

RESULT: [6, 3, 1, 6, 8, 2, 3, 7, 9, 1] -> [8, 7, 9]
```

The returned **feature subset** really is suboptimal in this particular example for L > R (Example 1). This is due to the fact that we add multiple features to our **feature subset** before the exclusion step removes the "weakest" ones again - and in this implementation, features excluded in the *-R* step are not returned to the candidate pool, so valuable features (here: 8 and 7) are discarded for good early on.

**Modifying the "Plus L take away R" algorithm**

This algorithm can be tweaked by returning the removed features to the candidate pool (or, e.g., by adding back *r - 1* features to the **feature subset** after each exclusion step), so that previously discarded features get a chance to re-enter the subset.
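
Below is a minimal sketch of such a tweak (an illustration added here, not part of the original implementation): features removed in the *-R* step go back into the candidate pool, so they may re-enter the subset in a later *+L* step.

```
def plus_L_minus_R_pooled(features, max_k, criterion_func, L=3, R=2):
    """Sketch of a +L -R variant that recycles removed features."""
    assert L > R, 'this sketch covers only the L > R variant'
    pool = list(features)  # candidate features
    feat_sub = []          # current feature subset
    while len(feat_sub) != max_k:
        # +L: greedy inclusion from the pool
        for _ in range(L):
            if pool:
                best_i = max(range(len(pool)),
                             key=lambda i: criterion_func(feat_sub + [pool[i]]))
                feat_sub.append(pool.pop(best_i))
        # -R: greedy exclusion back into the pool
        for _ in range(R):
            if len(feat_sub) > max_k:
                worst_i = max(range(len(feat_sub)),
                              key=lambda i: criterion_func(feat_sub[:i] + feat_sub[i+1:]))
                pool.append(feat_sub.pop(worst_i))
    return feat_sub

# With the sum criterion, the pooled variant recovers the optimal subset:
print(plus_L_minus_R_pooled([6, 3, 1, 6, 8, 2, 3, 7, 9, 1], max_k=3,
                            criterion_func=simple_crit_func))  # [9, 8, 7]
```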

**5. Sequential Floating Forward Selection (SFFS)**

The **Sequential Floating Forward Selection (SFFS)** algorithm can be considered an extension of the simpler **SFS** algorithm: after every inclusion step, it conditionally excludes features again if this improves the criterion.

**Input:** the set of all features, $Y = \{y_1, y_2, ..., y_d\}$

- The *SFFS* algorithm takes the whole feature set as input; e.g., $d = 10$ if our feature space consists of 10 dimensions.

**Output:** a subset of features, $X_k = \{x_j \; | \;j = 1, 2, ..., k; \; x_j \in Y\}$, where $k = (0, 1, 2, ..., d)$

- The returned output of the algorithm is a subset of the feature space of a specified size. E.g., a subset of 5 features from a 10-dimensional feature space ($k = 5, d = 10$).

**Initialization:** $X_0 = \emptyset$, $k = 0$

- We initialize the algorithm with an empty set ("null set") so that $k = 0$ (where $k$ is the size of the subset).

**Step 1 (Inclusion):**

$x^+ = \text{arg max } J(X_k + x), \text{ where } x \in Y - X_k$

$X_{k+1} = X_k + x^+$

$k = k + 1$

*Go to Step 2*

**Step 2 (Conditional Exclusion):**

$x^- = \text{arg max } J(X_k - x), \text{ where } x \in X_k$

if $J(X_k - x^-) > J(X_k)$:

$\quad X_{k-1} = X_k - x^-$

$\quad k = k - 1$

*Go to Step 1*

- In step 1, we include the feature from the *feature space* that leads to the best performance increase for our *feature subset* (assessed by the *criterion function*). Then, we go over to step 2.
- In step 2, we only exclude a feature if the resulting subset would gain an increase in performance. Then, we go back to step 1.
- Steps 1 and 2 are repeated until the **Termination** criterion is reached.

**Termination:** stop when $k$ equals the number of desired features

**SFFS Code**

In [9]:
```
# Sebastian Raschka
# last updated: 03/29/2014
# Sequential Floating Forward Selection (SFFS)

def seq_float_forw_select(features, max_k, criterion_func, print_steps=False):
    """
    Implementation of Sequential Floating Forward Selection.

    Keyword Arguments:
        features (list): The feature space as a list of features.
        max_k: Termination criterion; the size of the returned feature subset.
        criterion_func (function): Function that is used to evaluate the
            performance of the feature subset.
        print_steps (bool): Prints the algorithm procedure if True.

    Returns the selected feature subset, a list of features of length max_k.

    """
    # Initialization
    feat_sub = []
    k = 0

    while True:

        # Step 1: Inclusion
        if print_steps:
            print('\nInclusion from features', features)
        if len(features) > 0:
            crit_func_max = criterion_func(feat_sub + [features[0]])
            best_feat = features[0]
            if len(features) > 1:
                for x in features[1:]:
                    crit_func_eval = criterion_func(feat_sub + [x])
                    if crit_func_eval > crit_func_max:
                        crit_func_max = crit_func_eval
                        best_feat = x
            features.remove(best_feat)
            feat_sub.append(best_feat)
            if print_steps:
                print('include: {} -> feature_subset: {}'.format(best_feat, feat_sub))

        # Step 2: Conditional Exclusion
        worst_feat_val = None
        if len(features) + len(feat_sub) > max_k:
            crit_func_max = criterion_func(feat_sub)
            for i in reversed(range(0, len(feat_sub))):
                crit_func_eval = criterion_func(feat_sub[:i] + feat_sub[i+1:])
                if crit_func_eval > crit_func_max:
                    worst_feat, crit_func_max = i, crit_func_eval
                    worst_feat_val = feat_sub[worst_feat]
        if worst_feat_val is not None:  # exclude only if it improves the criterion
            del feat_sub[worst_feat]
            if print_steps:
                print('exclude: {} -> feature subset: {}'.format(worst_feat_val, feat_sub))

        # Termination condition
        k = len(feat_sub)
        if k == max_k:
            break

    return feat_sub
```

**Example SFFS**

Since the **Exclusion** step in the **SFFS** algorithm is conditional, it would never be triggered by the `simple_crit_func()` from above, which returns the plain integer sum of a feature subset: removing a (positive) feature can never increase the sum. Thus, let us define another simple criterion function for testing our SFFS implementation.

Just as we did for the previous examples above, let's take a look at the individual steps of the **SFFS** algorithm to select a 3-feature subset.

Also here, the input feature space consists of the 10 integers: 6, 3, 1, 6, 8, 2, 3, 7, 9, 1, and our criterion is to find a subset of size 3 in this feature space.

The criterion function we define below also calculates the sum of a subset, similar to the `simple_crit_func()` we used before. However, here we add a random integer ranging from -15 to 15 to the returned sum. Therefore, on some occasions, our criterion function can return a larger value for a smaller subset - after we removed a feature from the subset following the **Inclusion** step - which triggers the conditional **Exclusion** step.

In [10]:
```
from random import randint

def simple_rand_crit_func(feat_sub):
    """
    Returns sum of numerical values of an input list plus
    a random integer ranging from -15 to 15.
    """
    return sum(feat_sub) + randint(-15, 15)

# Example:
simple_rand_crit_func([1, 2, 4])
```

The returned value varies from run to run: it is the sum 7 plus a random offset between -15 and 15.

In [11]:
```
def example_seq_float_forw_select():
    ex_features = [6, 3, 1, 6, 8, 2, 3, 7, 9, 1]
    res_seq_flforw = seq_float_forw_select(features=ex_features, max_k=3,
                                           criterion_func=simple_rand_crit_func,
                                           print_steps=True)
    return res_seq_flforw

# Run example
res_seq_flforw = example_seq_float_forw_select()
print('\nRESULT: [6, 3, 1, 6, 8, 2, 3, 7, 9, 1] ->', res_seq_flforw)
```

Since the criterion function is stochastic, the printed steps and the returned 3-feature subset vary between runs.

**6. Sequential Floating Backward Selection (SFBS)**

Just as in the **SFFS** algorithm, we have a conditional step: here, we start with the whole feature set and exclude features sequentially. Only if adding one of the previously excluded features back to the current **feature subset** would increase the criterion value do we re-include it.

**Input:** the set of all features, $Y = \{y_1, y_2, ..., y_d\}$

- The *SFBS* algorithm takes the whole feature set as input; e.g., $d = 10$ if our feature space consists of 10 dimensions.

**Output:** a subset of features, $X_k = \{x_j \; | \; j = 1, 2, ..., k; \; x_j \in Y\}$, where $k = (0, 1, 2, ..., d)$

- The returned output of the algorithm is a subset of the feature space of a specified size. E.g., a subset of 5 features from a 10-dimensional feature space ($k = 5, d = 10$).

**Initialization:** $X_0 = Y$, $k = d$

- We initialize the algorithm with the given feature set so that $k = d$ (where $d$ is the size of the feature set).

**Step 1 (Exclusion):**

$x^- = \text{arg max } J(X_k - x), \text{ where } x \in X_k$

$X_{k-1} = X_k - x^-$

$k = k - 1$

*Go to Step 2*

**Step 2 (Conditional Inclusion):**

$x^+ = \text{arg max } J(X_k + x), \text{ where } x \in Y - X_k$

if $J(X_k + x^+) > J(X_k)$:

$\quad X_{k+1} = X_k + x^+$

$\quad k = k + 1$

*Go to Step 1*

- In step 1, we exclude the feature from the *feature subset* whose removal yields the best performance of the remaining *feature subset* (assessed by the *criterion function*). Then, we go over to step 2.
- In step 2, we only re-include one of the removed features if the resulting subset would gain an increase in performance. Then, we go back to step 1.
- Steps 1 and 2 are repeated until the **Termination** criterion is reached.

**Termination:** stop when $k$ equals the number of desired features

**SFBS Code**

In [16]:
```
# Sebastian Raschka
# last updated: 03/29/2014
# Sequential Floating Backward Selection (SFBS)

from copy import deepcopy

def seq_float_backw_select(features, max_k, criterion_func, print_steps=False):
    """
    Implementation of Sequential Floating Backward Selection.

    Keyword Arguments:
        features (list): The feature space as a list of features.
        max_k: Termination criterion; the size of the returned feature subset.
        criterion_func (function): Function that is used to evaluate the
            performance of the feature subset.
        print_steps (bool): Prints the algorithm procedure if True.

    Returns the selected feature subset, a list of features of length max_k.

    """
    # Initialization
    feat_sub = deepcopy(features)
    k = len(feat_sub)
    excluded_features = []

    while True:

        # Termination condition
        k = len(feat_sub)
        if k == max_k:
            break

        # Step 1: Exclusion
        if print_steps:
            print('\nExclusion from feature subset', feat_sub)
        worst_feat = len(feat_sub) - 1
        worst_feat_val = feat_sub[worst_feat]
        crit_func_max = criterion_func(feat_sub[:-1])
        for i in reversed(range(0, len(feat_sub) - 1)):
            crit_func_eval = criterion_func(feat_sub[:i] + feat_sub[i+1:])
            if crit_func_eval > crit_func_max:
                worst_feat, crit_func_max = i, crit_func_eval
                worst_feat_val = feat_sub[worst_feat]
        excluded_features.append(feat_sub[worst_feat])
        del feat_sub[worst_feat]
        if print_steps:
            print('exclude: {} -> feature subset: {}'.format(worst_feat_val, feat_sub))

        # Step 2: Conditional Inclusion
        if len(excluded_features) > 0 and len(feat_sub) != max_k:
            best_feat = None
            best_feat_val = None
            crit_func_max = criterion_func(feat_sub)
            for i in range(len(excluded_features)):
                crit_func_eval = criterion_func(feat_sub + [excluded_features[i]])
                if crit_func_eval > crit_func_max:
                    best_feat, crit_func_max = i, crit_func_eval
                    best_feat_val = excluded_features[best_feat]
            if best_feat is not None:  # re-include only if it improves the criterion
                feat_sub.append(excluded_features[best_feat])
                del excluded_features[best_feat]
                if print_steps:
                    print('include: {} -> feature subset: {}'.format(best_feat_val, feat_sub))

    return feat_sub
```

**Example SFBS**

Note that the **Inclusion** step in the **SFBS** algorithm is conditional, just like the **Exclusion** step in **SFFS**. Therefore, we have to be a little bit careful about the choice of the criterion function: if we used the `simple_crit_func()` from above, which returns the integer sum of a subset, we would trigger an infinite loop and never reach the termination criterion - assuming that our feature space consists of positive integers. The reason is that `simple_crit_func()` would always return a larger sum if we add a (positive) excluded feature back to the feature subset, so every exclusion would immediately be undone. Thus, we use the randomized `simple_rand_crit_func()` again; and in order to reduce the number of iterations, we set the number of desired features for the returned subset to 7 in this example.
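
A one-line check (added for illustration) makes the infinite-loop argument concrete: re-adding any positive excluded feature always improves the sum criterion, so the conditional **Inclusion** step would fire every time.

```
feat_sub, excluded = [6, 3, 1], [8]
# sum criterion: 18 > 10, so the excluded feature 8 would always be re-included
print(simple_crit_func(feat_sub + excluded) > simple_crit_func(feat_sub))  # True
```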

In [18]:
```
def example_seq_float_backw_select():
    ex_features = [6, 3, 1, 6, 8, 2, 3, 7, 9, 1]
    res_seq_flbackw = seq_float_backw_select(features=ex_features, max_k=7,
                                             criterion_func=simple_rand_crit_func,
                                             print_steps=True)
    return res_seq_flbackw

# Run example
res_seq_flbackw = example_seq_float_backw_select()
print('\nRESULT: [6, 3, 1, 6, 8, 2, 3, 7, 9, 1] ->', res_seq_flbackw)
```

Again, since the criterion function is stochastic, the printed steps and the returned 7-feature subset vary between runs.