Which is different from the others?
- Expectation Maximisation
- Apriori
- K-means
- Decision Tree
What is cross validation?
- A technique to obtain a good estimation of the performance of a classifier when it will be used with data different from the training set
- A technique to obtain a good estimation of the performance of a classifier with the training set
- A technique to improve the quality of a classifier
- A technique to improve the speed of a classifier
What is single linkage?
- A method to compute the distance between two objects, it can be used in hierarchical clustering
- A method to compute the distance between two sets of items, it can be used in hierarchical clustering
- A method to compute the distance between two classes, it can be used in decision trees
- A method to compute the separation of the objects inside a cluster
Which of the following characteristics of data can reduce the effectiveness of DBSCAN?
- Clusters have concavities
- All the variables have the same range of values
- Presence of outliers
- Presence of clusters with different densities
Which of the following types of data allows the use of the Euclidean distance?
- Points in a vector space
- Ordered data
- Transactional data
- Document representations
How does pruning work when generating frequent itemsets?
- If an itemset is not frequent, then none of its supersets can be frequent, therefore the frequencies of the supersets are not evaluated
- If an itemset is frequent, then none of its supersets can be frequent, therefore the frequencies of the supersets are not evaluated
- If an itemset is not frequent, then none of its subsets can be frequent, therefore the frequencies of the subsets are not evaluated
- If an itemset is frequent, then none of its subsets can be frequent, therefore the frequencies of the subsets are not evaluated
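A minimal sketch of this level-wise pruning in Python (all names and the minsup value are illustrative; transactions are assumed to be sets of items):

```python
def apriori_frequent_itemsets(transactions, minsup):
    """Level-wise search: count candidate k-itemsets, keep the frequent
    ones, and build (k+1)-candidates only from the surviving k-itemsets."""
    n = len(transactions)
    current = [frozenset([i]) for i in {x for t in transactions for x in t}]
    frequent, k = {}, 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        # pruning: an infrequent itemset is dropped here, so none of its
        # supersets is ever generated or counted
        survivors = {c: cnt / n for c, cnt in counts.items() if cnt / n >= minsup}
        frequent.update(survivors)
        keys = list(survivors)
        current = list({a | b for a in keys for b in keys if len(a | b) == k + 1})
        k += 1
    return frequent

# toy run on the 5-transaction dataset used later in these questions
data = [{'A','B','C'}, {'A','B','D'}, {'B','D','E'}, {'C','D'}, {'A','C','D','E'}]
print(apriori_frequent_itemsets(data, minsup=0.4))
```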
In a decision tree, the number of objects in a node…
- … is smaller than the number of objects in its ancestor
- … is not related to the number of objects in its ancestor
- … is smaller than or equal to the number of objects in its ancestor
- … is bigger than the number of objects in its ancestor
What does K-means try to minimise?
- The distortion, that is the sum of the squared distances of each point with respect to the points of the other clusters
- The distortion, that is the sum of the squared distances of each point with respect to its centroid
- The separation, that is the sum of the squared distances of each point with respect to its centroid
- The separation, that is the sum of the squared distances of each cluster centroid with respect to the global centroid of the dataset
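For reference, the distortion minimised by K-means is the within-cluster sum of squared errors:

$$SSE = \sum_{i=1}^{K} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2$$

where $\mu_i$ is the centroid of cluster $C_i$.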
Given the following definitions: TP = True Positives; TN = True Negatives; FP = False Positives; FN = False Negatives. Which of the formulas below computes the precision of a binary classifier?
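For reference, the standard formula is

$$\text{Precision} = \frac{TP}{TP + FP}$$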
Given the two binary vectors below, what is their similarity according to the Jaccard Coefficient?
1000101101
1011101010
- 0.1
- 0.2
- 0.5
- 0.375
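Worked out under the usual convention ($M_{11}$ = positions where both vectors are 1, $M_{10}$/$M_{01}$ = positions where exactly one is 1): here $M_{11}=3$, $M_{10}=2$, $M_{01}=3$, so

$$J = \frac{M_{11}}{M_{11}+M_{10}+M_{01}} = \frac{3}{3+2+3} = 0.375$$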
Consider the transactional dataset below. What is the support of the rule A,C→B?
| ID | Items |
|----|---------|
| 1  | A,B,C   |
| 2  | A,B,D   |
| 3  | B,D,E   |
| 4  | C,D     |
| 5  | A,C,D,E |
- 100%
- 20%
- 40%
- 50%
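Only transaction 1 contains all of A, B and C, so

$$s(A,C \to B) = \frac{\sigma(\{A,B,C\})}{N} = \frac{1}{5} = 20\%$$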
Which of the following statements is true?
- The noise can generate outliers
- The noise always generates outliers
- The data which are similar to the majority are never noise
- Outliers can be due to noise
Which of the following clustering methods is not based on distances between objects?
- Hierarchical Agglomerative
- DBSCAN
- K-Means
- Expectation Maximization
In a decision tree, an attribute which is used only in nodes near the leaves…
- … is irrelevant with respect to the target
- … guarantees high increment of purity
- … gives little insight with respect to the target
- … has a high correlation with respect to the target
Which of the statements below is true? (Only one)
- Sometimes k-means stops at a configuration which does not give the minimum distortion for the chosen value of the number of clusters
- K-means always stops at a configuration which gives the minimum distortion for the chosen value of the number of clusters
- K-means works well also with datasets having a very large number of attributes
- K-means finds the number of clusters which give the minimum distortion
Which of the statements below is true? (One or more)
- Increasing the radius of the neighbourhood can decrease the number of noise points
- DBSCAN always stops at a configuration which gives the optimal number of clusters
- Sometimes DBSCAN stops at a configuration which does not include any cluster
- DBSCAN can give good performance when clusters have concavities
In data preparation, what is the effect of normalisation?
- Map all the numeric attributes to the same range, without altering the distribution, in order to prevent attributes with large ranges from having more influence
- Map all the numeric attributes by subtracting the respective mean and dividing by the variance
- Map all the numeric attributes to the same range and to a Gaussian (or normal) distribution, in order to prevent attributes with large ranges from having more influence
- Map all the numeric attributes to a Gaussian (or normal) distribution, in order to prevent attributes with large ranges from having more influence
Consider the transactional dataset below. What is the confidence of the rule A,C→B?
| ID | Items |
|----|---------|
| 1  | A,B,C   |
| 2  | A,B,D   |
| 3  | B,D,E   |
| 4  | C,D     |
| 5  | A,C,D,E |
- 50%
- 40%
- 20%
- 100%
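A and C occur together in transactions 1 and 5, and all of A, B, C only in transaction 1, so

$$\text{conf}(A,C \to B) = \frac{\sigma(\{A,B,C\})}{\sigma(\{A,C\})} = \frac{1}{2} = 50\%$$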
Which of the statements below about Hierarchical Agglomerative Clustering is true?
- Requires the definition of Inertia of clusters
- Requires the definition of distance between sets of objects
- Is very efficient also with large datasets
- Is based on a well founded statistical model
When training a neural network, what is the learning rate?
- The speed of convergence to a stable solution during the learning process
- The ratio between the size of the hidden layer and the input layer of the network
- The slope of the activation function in a specific node
- A multiplying factor of the correction to be applied to the connection weights
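In symbols, for gradient-descent training the learning rate $\eta$ scales each weight correction:

$$w \leftarrow w - \eta \,\frac{\partial E}{\partial w}$$

A larger $\eta$ takes bigger steps but does not by itself guarantee faster convergence.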
What are the hyperparameters of a Neural Network? (Possibly non-exhaustive)
- Hidden layers structure, Output layer structure, Activation function, Number of epochs
- Network structure, Learning rate, Backpropagation algorithm, Number of epochs
- Input layers structure, Learning rate, Activation function, Number of epochs
- Hidden layers structure, Learning rate, Activation function, Number of epochs
Which of the following preprocessing activities is useful to build a Naive Bayes classifier if the independence hypothesis is violated?
- Discretisation
- Normalisation
- Standardisation
- Feature selection
Which is different from the others?
- Silhouette Index
- Misclassification Error
- Entropy
- Gini Index
Why do we prune a decision tree?
- To eliminate parts of the tree where the decisions could be influenced by random effects
- To eliminate rows of the dataset which could be influenced by random effects
- To eliminate parts of the tree where the decision could generate underfitting
- To eliminate attributes which could be influenced by random effects
In a dataset with D attributes, how many subsets of attributes should be considered for feature selection according to an exhaustive search?
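Since each attribute is either kept or discarded independently, a dataset with $D$ attributes has $2^D$ candidate subsets ($2^D - 1$ excluding the empty set), which is what makes exhaustive search infeasible for large $D$.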
In which mining activity can the Information Gain be useful?
- Discovery of association rules
- Classification
- Clustering
- Discretization
Given the two binary vectors below, what is their similarity according to the Simple Matching Coefficient?
1000101101
1011101010
- 0.1
- 0.3
- 0.5
- 0.2
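With the same counts as in the Jaccard question ($M_{11}=3$, $M_{00}=2$, $M_{10}=2$, $M_{01}=3$):

$$SMC = \frac{M_{11}+M_{00}}{M_{11}+M_{00}+M_{10}+M_{01}} = \frac{3+2}{10} = 0.5$$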
What is the main reason for the standardization of numeric attributes?
- Change the distribution of the numeric attributes, in order to obtain gaussian distributions
- Map all the numeric attributes to a new range such that the mean is zero and the variance is one
- Map all the nominal attributes to the same range, in order to prevent the values with higher frequency from having prevailing influence
- Remove non-standard values
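In symbols, standardization applies the z-score transformation

$$z = \frac{x - \mu}{\sigma}$$

with $\mu$ the mean and $\sigma$ the standard deviation of the attribute.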
Which of the following measures can be used as an alternative to the Information Gain?
- Silhouette Index
- Gini Index
- Rand Index
- Jaccard Index
Which of the following is not an objective of feature selection?
- Select the features with higher range, which have more influence on the computation
- Reduce time and memory complexity of the mining algorithms
- Reduce the effect of noise
- Avoid the curse of dimensionality
After fitting DBSCAN with the default parameter values the results are: 0 clusters, 100% of noise points. Which will be your next trial?
- Reduce the minimum number of objects in the neighborhood
- Reduce the minimum number of objects in the neighborhood and the radius of the neighborhood
- Increase the radius of the neighborhood
- Decrease the radius of the neighborhood
What is the meaning of the statement “the support is anti-monotone”?
- The support of an itemset never exceeds the support of its subsets
- The support of an itemset is always smaller than the support of its supersets
- The support of an itemset never exceeds the support of its supersets
- The support of an itemset is always smaller than the support of its subsets
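In symbols: if $X \subseteq Y$ then $s(X) \ge s(Y)$, because every transaction that contains $Y$ necessarily contains $X$.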
In feature selection, what is the Principal Component Analysis?
- A mathematical technique used to transform non numeric attributes into numeric attributes
- A mathematical technique used to transform a set of numeric attributes into a smaller set of numeric attributes which capture most of the variability in data
- A heuristic technique used to find a subset of the attributes which produces the same classifier
- A mathematical technique used to find the principal attributes which determine the classification process
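A minimal sketch of the idea, assuming numeric data in a NumPy array (the function name is illustrative):

```python
import numpy as np

def pca_reduce(X, k):
    """Project X (n samples x d attributes) onto its first k principal
    components, i.e. the directions of maximum variance."""
    Xc = X - X.mean(axis=0)                        # center each attribute
    # right singular vectors = eigenvectors of the covariance matrix
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                           # n x k reduced data
```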
A Decision Tree is…
- A tree-structured plan of tests on single attributes to forecast the cluster
- A tree-structured plan of tests on single attributes to forecast the target
- A tree-structured plan of tests on single attributes to obtain the maximum purity of a node
- A tree-structured plan of tests on multiple attributes to forecast the target
In data preprocessing, which of the following are the objectives of the aggregation of attributes?
- Obtain a more detailed description of data
- Obtain a less detailed scale
- Reduce the variability of data
- Reduce the number of attributes or distinct values
In order to reduce the dimensionality of a dataset, what is the advantage of Multi-Dimensional Scaling (MDS) with respect to Principal Component Analysis (PCA)?
- MDS requires less computational power
- MDS can also be used with categorical data, provided that the distance matrix is available, while PCA is limited to vector spaces
- MDS can be used with categorical data after a transformation in a vector space
- MDS can work on any kind of data, while PCA is limited to categorical data
What is the main purpose of smoothing in Bayesian classification?
- Classifying an object containing attribute values which are missing from some classes in the training set
- Dealing with missing values
- Reduce the variability of the data
- Classifying an object containing attribute values which are missing from some classes in the test set
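A common implementation is Laplace (add-one) smoothing; assuming an attribute with $V$ distinct values, the class-conditional estimate becomes

$$P(a \mid c) = \frac{n_{ac} + 1}{n_c + V}$$

where $n_{ac}$ is the number of training objects of class $c$ with value $a$ and $n_c$ the number of objects of class $c$, so the probability never collapses to zero for unseen value/class combinations.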
Consider the transactional dataset below. What is the confidence of the rule B→E?
| ID | Items |
|----|---------|
| 1  | A,B,C   |
| 2  | A,B,D   |
| 3  | B,D,E   |
| 4  | C,D     |
| 5  | A,C,D,E |
- 100%
- 50%
- 33%
- 20%
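B occurs in transactions 1, 2 and 3, and B together with E only in transaction 3, so

$$\text{conf}(B \to E) = \frac{\sigma(\{B,E\})}{\sigma(\{B\})} = \frac{1}{3} \approx 33\%$$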
In a Decision Tree for classification, what is a leaf node?
- A node which assigns a class value to the objects passing the tests on the path from the root to the node itself
- A node where all the objects belong to the same class
- A node which allows classification without errors
- A node which assigns a class value only by majority of examples
Match the rule evaluation formulas with their names:
- Confidence → 3
- Lift → 2
- Leverage → 1
- Conviction → 4
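For reference, the standard definitions of these four measures, writing $s(\cdot)$ for support:

$$\text{conf}(X \to Y) = \frac{s(X \cup Y)}{s(X)} \qquad \text{lift}(X \to Y) = \frac{\text{conf}(X \to Y)}{s(Y)}$$

$$\text{leverage}(X \to Y) = s(X \cup Y) - s(X)\,s(Y) \qquad \text{conviction}(X \to Y) = \frac{1 - s(Y)}{1 - \text{conf}(X \to Y)}$$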
Which of the statements below best describes the strategy of Apriori in finding the frequent itemsets?
- Evaluation of the support of the itemsets in an order such that uninteresting parts of the search space are considered only at the end of the execution
- Evaluation of the confidence of the itemsets in an order such that uninteresting parts of the search space are pruned as soon as possible
- Evaluation of the support of the itemsets in an order such that uninteresting parts of the search space are pruned as soon as possible
- Evaluation of the support of the itemsets in an order such that the interesting parts of the search space are pruned as soon as possible
What is the Gini index?
- A measure of the entropy of a dataset
- An accuracy measure of a dataset alternative to the Information Gain and to the Misclassification Index
- An impurity measure of a dataset alternative to the Information Gain and to the Misclassification Index
- An impurity measure of a dataset alternative to overfitting and underfitting
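In symbols, for a node with class relative frequencies $p_j$:

$$\mathit{Gini} = 1 - \sum_j p_j^2$$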
What measure is maximised by the Expectation Maximisation algorithm for clustering?
- The likelihood of a class label, given the attributes of the example
- The likelihood of an attribute, given the class label
- The likelihood of an example
- The support of a class
Given the following definitions: TP = True Positives; TN = True Negatives; FP = False Positives; FN = False Negatives. Which of the formulas below computes the accuracy of a binary classifier?
answer: 2
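For reference, the standard formula is

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$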
In data preprocessing, which of the following is not an objective of the aggregation of attributes?
- Reduce the variability of data
- Obtain a less detailed scale
- Reduce the number of attributes or objects
- Obtain a more detailed description of data
Which of the following is a strength of the clustering algorithm DBSCAN?
- Requires setting the number of clusters as a parameter
- Ability to find clusters with concavities
- Ability to separate outliers from regular data
- Very fast computation
Which of the following statements regarding the discovery of association rules is true? (One or more)
- The confidence of a rule can be computed starting from the supports of the itemsets
- The support of an itemset is anti-monotonic with respect to the composition of the itemset
- The support of a rule can be computed given the confidence of the rule
- The confidence of an itemset is anti-monotonic with respect to the composition of the itemset
For each type of data, choose the best-suited distance function:
- Vector space with real values → Euclidean distance
- Vectors of terms representing documents → cosine distance
- High dimensional spaces → Manhattan distance
- Boolean data → Jaccard coefficient
Which of the following is not a strength of DBSCAN with respect to K-means?
- The robustness with respect to outliers
- The effectiveness, even in the presence of noise
- The effectiveness even if there are clusters with non-convex shape
- The efficiency even in large datasets
What is the effect of the curse of dimensionality?
- When the number of dimensions increases the results tend to be prone to overfitting
- When the number of dimensions increases, the Euclidean distance becomes less effective at discriminating between points in the space
- When the number of dimensions increases the computing power necessary to compute the distances becomes too high
- When the number of dimensions increases the classifiers cannot be correctly tuned
Which is different from the others?
- Expectation Maximisation
- Decision Tree
- K-means
- DBSCAN
In a Neural Network, what is backpropagation?
- The technique used to adjust the node weights according to the difference between the desired output and the output generated by the network
- The technique used to adjust the weights limiting the probability of overfitting
- The technique used to adjust the output according to the difference between the desired weights and the actual weights
- The technique used to adjust the connection weights according to the difference between the desired output and the output generated by the network
Which is different from the others?
- DBSCAN
- SVM
- Neural Network
- Decision Tree
Which of the following is not a property of a metric distance function?
- Symmetry
- Positive definiteness
- Triangle inequality
- Boundedness
Which of the statements below is true? (One or more)
- K-means always stops at a configuration which gives the minimum distortion for the chosen value of the number of clusters
- K-means is very sensitive to the initial assignment of the centers
- K-means is quite efficient even for large datasets
- Sometimes k-means stops at a configuration which does not give the minimum distortion for the chosen value of the number of clusters
Which of the following characteristics of data can reduce the effectiveness of K-Means?
- Presence of values with high frequency
- Presence of outliers
- All the variables have the same distribution of values
- All the variables have the same range of values
What is the purpose of discretisation?
- Increase the number of distinct values in an attribute, in order to put in evidence possible patterns and regularities
- Reduce the number of distinct values in an attribute, in order to increase the efficiency of the computations
- Reduce the number of distinct values in an attribute, in order to put in evidence possible patterns and regularities
- Reduce the range of values of a numeric attribute, to make all the attributes more comparable
Given the following definitions: TP = True Positives; TN = True Negatives; FP = False Positives; FN = False Negatives. Which of the formulas below computes the recall of a binary classifier?
answer: 1
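For reference, the standard formula is

$$\text{Recall} = \frac{TP}{TP + FN}$$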
The Information Gain is used to
- select the attribute which maximises, for a given training set, the ability to predict the class value
- select the attribute which maximises, for a given test set, the ability to predict the class value
- select the attribute which maximises, for a given training set, the ability to predict all the other attribute values
- select the class with maximum probability
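In symbols, with entropy $H(S) = -\sum_j p_j \log_2 p_j$, the gain of splitting $S$ on attribute $A$ is

$$IG(S, A) = H(S) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\, H(S_v)$$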
What is the main reason for the MinMax scaling (also known as “rescaling”) of attributes?
- Change the distribution of the numeric attributes, in order to obtain gaussian distributions
- Map all the nominal attributes to the same range in order to prevent the values with higher frequency from having prevailing influence
- Remove abnormal values
- Map all the numeric attributes to the same range, in order to prevent attributes with higher range from having prevalent influence
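In symbols:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

which maps each numeric attribute onto $[0, 1]$.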
Which of the following is a base hypothesis for a bayesian classifier?
- The attributes must be statistically independent inside each class
- The attributes must have negative correlation
- The attributes must be statistically independent
- The attributes must have zero correlation
When developing a classifier, which of the following is a symptom of overfitting?
- The error rate in the test set is much smaller than the error rate in the training set
- The precision is much greater than the recall
- The error rate in the test set is more than 30%
- The error rate in the test set is much greater than the error rate in the training set
With reference to the total sum of squared errors and separation of a clustering scheme, which of the statements below is true?
- They are two ways to measure the same thing
- They are strictly correlated, if, changing the clustering scheme, one increases, then the other does the same
- It is possible to optimise them (i.e. minimise SSE and maximise SSB) separately
- They are strictly correlated, if, changing the clustering scheme, one increases, then the other decreases