Skip Nav Destination

*PDF*
*PDF*
*PDF*
*PDF*

Update search

### NARROW

Format

Journal

TocHeadingTitle

Date

Availability

1-4 of 4

Yu Wang

Close
**Follow your search**

Access your saved searches in your account

Would you like to receive an alert when new items match your search?

*Close Modal*

Sort by

Journal Articles

Publisher: Journals Gateway

*Neural Computation*(2017) 29 (7): 1986–2003.

Published: 01 July 2017

Abstract

View article
Since combining features from heterogeneous data sources can significantly boost classification performance in many applications, it has attracted much research attention over the past few years. Most of the existing multiview feature analysis approaches separately learn features in each view, ignoring knowledge shared by multiple views. Different views of features may have some intrinsic correlations that might be beneficial to feature learning. Therefore, it is assumed that multiviews share subspaces from which common knowledge can be discovered. In this letter, we propose a new multiview feature learning algorithm, aiming to exploit common features shared by different views. To achieve this goal, we propose a feature learning algorithm in a batch mode, by which the correlations among different views are taken into account. Multiple transformation matrices for different views are simultaneously learned in a joint framework. In this way, our algorithm can exploit potential correlations among views as supplementary information that further improves the performance result. Since the proposed objective function is nonsmooth and difficult to solve directly, we propose an iterative algorithm for effective optimization. Extensive experiments have been conducted on a number of real-world data sets. Experimental results demonstrate superior performance in terms of classification against all the compared approaches. Also, the convergence guarantee has been validated in the experiment.

Journal Articles

Publisher: Journals Gateway

*Neural Computation*(2017) 29 (2): 313–331.

Published: 01 February 2017

FIGURES
| View All (15)

Abstract

View article
Binary undirected graphs are well established, but when these graphs are constructed, often a threshold is applied to a parameter describing the connection between two nodes. Therefore, the use of weighted graphs is more appropriate. In this work, we focus on weighted undirected graphs. This implies that we have to incorporate edge weights in the graph measures, which require generalizations of common graph metrics. After reviewing existing generalizations of the clustering coefficient and the local efficiency, we proposed new generalizations for these graph measures. To be able to compare different generalizations, a number of essential and useful properties were defined that ideally should be satisfied. We applied the generalizations to two real-world networks of different sizes. As a result, we found that not all existing generalizations satisfy all essential properties. Furthermore, we determined the best generalization for the clustering coefficient and local efficiency based on their properties and the performance when applied to two networks. We found that the best generalization of the clustering coefficient is , defined in Miyajima and Sakuragawa ( 2014 ), while the best generalization of the local efficiency is , proposed in this letter. Depending on the application and the relative importance of sensitivity and robustness to noise, other generalizations may be selected on the basis of the properties investigated in this letter. Abstract Binary undirected graphs are well established, but when these graphs are constructed, often a threshold is applied to a parameter describing the connection between two nodes. Therefore, the use of weighted graphs is more appropriate. In this work, we focus on weighted undirected graphs. This implies that we have to incorporate edge weights in the graph measures, which require generalizations of common graph metrics. After reviewing existing generalizations of the clustering coefficient and the local efficiency, we proposed new generalizations for these graph measures. To be able to compare different generalizations, a number of essential and useful properties were defined that ideally should be satisfied. We applied the generalizations to two real-world networks of different sizes. As a result, we found that not all existing generalizations satisfy all essential properties. Furthermore, we determined the best generalization for the clustering coefficient and local efficiency based on their properties and the performance when applied to two networks. We found that the best generalization of the clustering coefficient is , defined in Miyajima and Sakuragawa ( 2014 ), while the best generalization of the local efficiency is , proposed in this letter. Depending on the application and the relative importance of sensitivity and robustness to noise, other generalizations may be selected on the basis of the properties investigated in this letter.

Journal Articles

Publisher: Journals Gateway

*Neural Computation*(2017) 29 (2): 519–554.

Published: 01 February 2017

FIGURES
| View All (12)

Abstract

View article
A cross-validation method based on replications of two-fold cross validation is called an cross validation. An cross validation is used in estimating the generalization error and comparing of algorithms’ performance in machine learning. However, the variance of the estimator of the generalization error in cross validation is easily affected by random partitions. Poor data partitioning may cause a large fluctuation in the number of overlapping samples between any two training (test) sets in cross validation. This fluctuation results in a large variance in the cross-validated estimator. The influence of the random partitions on variance becomes serious as increases. Thus, in this study, the partitions with a restricted number of overlapping samples between any two training (test) sets are defined as a block-regularized partition set. The corresponding cross validation is called block-regularized cross validation ( BCV). It can effectively reduce the influence of random partitions. We prove that the variance of the BCV estimator of the generalization error is smaller than the variance of cross-validated estimator and reaches the minimum in a special situation. An analytical expression of the variance can also be derived in this special situation. This conclusion is validated through simulation experiments. Furthermore, a practical construction method of BCV by a two-level orthogonal array is provided. Finally, a conservative estimator is proposed for the variance of estimator of the generalization error. Abstract A cross-validation method based on replications of two-fold cross validation is called an cross validation. An cross validation is used in estimating the generalization error and comparing of algorithms’ performance in machine learning. However, the variance of the estimator of the generalization error in cross validation is easily affected by random partitions. Poor data partitioning may cause a large fluctuation in the number of overlapping samples between any two training (test) sets in cross validation. This fluctuation results in a large variance in the cross-validated estimator. The influence of the random partitions on variance becomes serious as increases. Thus, in this study, the partitions with a restricted number of overlapping samples between any two training (test) sets are defined as a block-regularized partition set. The corresponding cross validation is called block-regularized cross validation ( BCV). It can effectively reduce the influence of random partitions. We prove that the variance of the BCV estimator of the generalization error is smaller than the variance of cross-validated estimator and reaches the minimum in a special situation. An analytical expression of the variance can also be derived in this special situation. This conclusion is validated through simulation experiments. Furthermore, a practical construction method of BCV by a two-level orthogonal array is provided. Finally, a conservative estimator is proposed for the variance of estimator of the generalization error. Abstract A cross-validation method based on replications of two-fold cross validation is called an cross validation. An cross validation is used in estimating the generalization error and comparing of algorithms’ performance in machine learning. However, the variance of the estimator of the generalization error in cross validation is easily affected by random partitions. Poor data partitioning may cause a large fluctuation in the number of overlapping samples between any two training (test) sets in cross validation. This fluctuation results in a large variance in the cross-validated estimator. The influence of the random partitions on variance becomes serious as increases. Thus, in this study, the partitions with a restricted number of overlapping samples between any two training (test) sets are defined as a block-regularized partition set. The corresponding cross validation is called block-regularized cross validation ( BCV). It can effectively reduce the influence of random partitions. We prove that the variance of the BCV estimator of the generalization error is smaller than the variance of cross-validated estimator and reaches the minimum in a special situation. An analytical expression of the variance can also be derived in this special situation. This conclusion is validated through simulation experiments. Furthermore, a practical construction method of BCV by a two-level orthogonal array is provided. Finally, a conservative estimator is proposed for the variance of estimator of the generalization error. Abstract A cross-validation method based on replications of two-fold cross validation is called an cross validation. An cross validation is used in estimating the generalization error and comparing of algorithms’ performance in machine learning. However, the variance of the estimator of the generalization error in cross validation is easily affected by random partitions. Poor data partitioning may cause a large fluctuation in the number of overlapping samples between any two training (test) sets in cross validation. This fluctuation results in a large variance in the cross-validated estimator. The influence of the random partitions on variance becomes serious as increases. Thus, in this study, the partitions with a restricted number of overlapping samples between any two training (test) sets are defined as a block-regularized partition set. The corresponding cross validation is called block-regularized cross validation ( BCV). It can effectively reduce the influence of random partitions. We prove that the variance of the BCV estimator of the generalization error is smaller than the variance of cross-validated estimator and reaches the minimum in a special situation. An analytical expression of the variance can also be derived in this special situation. This conclusion is validated through simulation experiments. Furthermore, a practical construction method of BCV by a two-level orthogonal array is provided. Finally, a conservative estimator is proposed for the variance of estimator of the generalization error. Abstract A cross-validation method based on replications of two-fold cross validation is called an cross validation. An cross validation is used in estimating the generalization error and comparing of algorithms’ performance in machine learning. However, the variance of the estimator of the generalization error in cross validation is easily affected by random partitions. Poor data partitioning may cause a large fluctuation in the number of overlapping samples between any two training (test) sets in cross validation. This fluctuation results in a large variance in the cross-validated estimator. The influence of the random partitions on variance becomes serious as increases. Thus, in this study, the partitions with a restricted number of overlapping samples between any two training (test) sets are defined as a block-regularized partition set. The corresponding cross validation is called block-regularized cross validation ( BCV). It can effectively reduce the influence of random partitions. We prove that the variance of the BCV estimator of the generalization error is smaller than the variance of cross-validated estimator and reaches the minimum in a special situation. An analytical expression of the variance can also be derived in this special situation. This conclusion is validated through simulation experiments. Furthermore, a practical construction method of BCV by a two-level orthogonal array is provided. Finally, a conservative estimator is proposed for the variance of estimator of the generalization error. Abstract A cross-validation method based on replications of two-fold cross validation is called an cross validation. An cross validation is used in estimating the generalization error and comparing of algorithms’ performance in machine learning. However, the variance of the estimator of the generalization error in cross validation is easily affected by random partitions. Poor data partitioning may cause a large fluctuation in the number of overlapping samples between any two training (test) sets in cross validation. This fluctuation results in a large variance in the cross-validated estimator. The influence of the random partitions on variance becomes serious as increases. Thus, in this study, the partitions with a restricted number of overlapping samples between any two training (test) sets are defined as a block-regularized partition set. The corresponding cross validation is called block-regularized cross validation ( BCV). It can effectively reduce the influence of random partitions. We prove that the variance of the BCV estimator of the generalization error is smaller than the variance of cross-validated estimator and reaches the minimum in a special situation. An analytical expression of the variance can also be derived in this special situation. This conclusion is validated through simulation experiments. Furthermore, a practical construction method of BCV by a two-level orthogonal array is provided. Finally, a conservative estimator is proposed for the variance of estimator of the generalization error. Abstract A cross-validation method based on replications of two-fold cross validation is called an cross validation. An cross validation is used in estimating the generalization error and comparing of algorithms’ performance in machine learning. However, the variance of the estimator of the generalization error in cross validation is easily affected by random partitions. Poor data partitioning may cause a large fluctuation in the number of overlapping samples between any two training (test) sets in cross validation. This fluctuation results in a large variance in the cross-validated estimator. The influence of the random partitions on variance becomes serious as increases. Thus, in this study, the partitions with a restricted number of overlapping samples between any two training (test) sets are defined as a block-regularized partition set. The corresponding cross validation is called block-regularized cross validation ( BCV). It can effectively reduce the influence of random partitions. We prove that the variance of the BCV estimator of the generalization error is smaller than the variance of cross-validated estimator and reaches the minimum in a special situation. An analytical expression of the variance can also be derived in this special situation. This conclusion is validated through simulation experiments. Furthermore, a practical construction method of BCV by a two-level orthogonal array is provided. Finally, a conservative estimator is proposed for the variance of estimator of the generalization error. Abstract A cross-validation method based on replications of two-fold cross validation is called an cross validation. An cross validation is used in estimating the generalization error and comparing of algorithms’ performance in machine learning. However, the variance of the estimator of the generalization error in cross validation is easily affected by random partitions. Poor data partitioning may cause a large fluctuation in the number of overlapping samples between any two training (test) sets in cross validation. This fluctuation results in a large variance in the cross-validated estimator. The influence of the random partitions on variance becomes serious as increases. Thus, in this study, the partitions with a restricted number of overlapping samples between any two training (test) sets are defined as a block-regularized partition set. The corresponding cross validation is called block-regularized cross validation ( BCV). It can effectively reduce the influence of random partitions. We prove that the variance of the BCV estimator of the generalization error is smaller than the variance of cross-validated estimator and reaches the minimum in a special situation. An analytical expression of the variance can also be derived in this special situation. This conclusion is validated through simulation experiments. Furthermore, a practical construction method of BCV by a two-level orthogonal array is provided. Finally, a conservative estimator is proposed for the variance of estimator of the generalization error. Abstract A cross-validation method based on replications of two-fold cross validation is called an cross validation. An cross validation is used in estimating the generalization error and comparing of algorithms’ performance in machine learning. However, the variance of the estimator of the generalization error in cross validation is easily affected by random partitions. Poor data partitioning may cause a large fluctuation in the number of overlapping samples between any two training (test) sets in cross validation. This fluctuation results in a large variance in the cross-validated estimator. The influence of the random partitions on variance becomes serious as increases. Thus, in this study, the partitions with a restricted number of overlapping samples between any two training (test) sets are defined as a block-regularized partition set. The corresponding cross validation is called block-regularized cross validation ( BCV). It can effectively reduce the influence of random partitions. We prove that the variance of the BCV estimator of the generalization error is smaller than the variance of cross-validated estimator and reaches the minimum in a special situation. An analytical expression of the variance can also be derived in this special situation. This conclusion is validated through simulation experiments. Furthermore, a practical construction method of BCV by a two-level orthogonal array is provided. Finally, a conservative estimator is proposed for the variance of estimator of the generalization error. Abstract A cross-validation method based on replications of two-fold cross validation is called an cross validation. An cross validation is used in estimating the generalization error and comparing of algorithms’ performance in machine learning. However, the variance of the estimator of the generalization error in cross validation is easily affected by random partitions. Poor data partitioning may cause a large fluctuation in the number of overlapping samples between any two training (test) sets in cross validation. This fluctuation results in a large variance in the cross-validated estimator. The influence of the random partitions on variance becomes serious as increases. Thus, in this study, the partitions with a restricted number of overlapping samples between any two training (test) sets are defined as a block-regularized partition set. The corresponding cross validation is called block-regularized cross validation ( BCV). It can effectively reduce the influence of random partitions. We prove that the variance of the BCV estimator of the generalization error is smaller than the variance of cross-validated estimator and reaches the minimum in a special situation. An analytical expression of the variance can also be derived in this special situation. This conclusion is validated through simulation experiments. Furthermore, a practical construction method of BCV by a two-level orthogonal array is provided. Finally, a conservative estimator is proposed for the variance of estimator of the generalization error. Abstract A cross-validation method based on replications of two-fold cross validation is called an cross validation. An cross validation is used in estimating the generalization error and comparing of algorithms’ performance in machine learning. However, the variance of the estimator of the generalization error in cross validation is easily affected by random partitions. Poor data partitioning may cause a large fluctuation in the number of overlapping samples between any two training (test) sets in cross validation. This fluctuation results in a large variance in the cross-validated estimator. The influence of the random partitions on variance becomes serious as increases. Thus, in this study, the partitions with a restricted number of overlapping samples between any two training (test) sets are defined as a block-regularized partition set. The corresponding cross validation is called block-regularized cross validation ( BCV). It can effectively reduce the influence of random partitions. We prove that the variance of the BCV estimator of the generalization error is smaller than the variance of cross-validated estimator and reaches the minimum in a special situation. An analytical expression of the variance can also be derived in this special situation. This conclusion is validated through simulation experiments. Furthermore, a practical construction method of BCV by a two-level orthogonal array is provided. Finally, a conservative estimator is proposed for the variance of estimator of the generalization error. Abstract A cross-validation method based on replications of two-fold cross validation is called an cross validation. An cross validation is used in estimating the generalization error and comparing of algorithms’ performance in machine learning. However, the variance of the estimator of the generalization error in cross validation is easily affected by random partitions. Poor data partitioning may cause a large fluctuation in the number of overlapping samples between any two training (test) sets in cross validation. This fluctuation results in a large variance in the cross-validated estimator. The influence of the random partitions on variance becomes serious as increases. Thus, in this study, the partitions with a restricted number of overlapping samples between any two training (test) sets are defined as a block-regularized partition set. The corresponding cross validation is called block-regularized cross validation ( BCV). It can effectively reduce the influence of random partitions. We prove that the variance of the BCV estimator of the generalization error is smaller than the variance of cross-validated estimator and reaches the minimum in a special situation. An analytical expression of the variance can also be derived in this special situation. This conclusion is validated through simulation experiments. Furthermore, a practical construction method of BCV by a two-level orthogonal array is provided. Finally, a conservative estimator is proposed for the variance of estimator of the generalization error.

Journal Articles

Publisher: Journals Gateway

*Neural Computation*(2016) 28 (8): 1694–1722.

Published: 01 August 2016

FIGURES

Abstract

View article
In typical machine learning applications such as information retrieval, precision and recall are two commonly used measures for assessing an algorithm's performance. Symmetrical confidence intervals based on K -fold cross-validated t distributions are widely used for the inference of precision and recall measures. As we confirmed through simulated experiments, however, these confidence intervals often exhibit lower degrees of confidence, which may easily lead to liberal inference results. Thus, it is crucial to construct faithful confidence (credible) intervals for precision and recall with a high degree of confidence and a short interval length. In this study, we propose two posterior credible intervals for precision and recall based on K -fold cross-validated beta distributions. The first credible interval for precision (or recall) is constructed based on the beta posterior distribution inferred by all K data sets corresponding to K confusion matrices from a K -fold cross-validation. Second, considering that each data set corresponding to a confusion matrix from a K -fold cross-validation can be used to infer a beta posterior distribution of precision (or recall), the second proposed credible interval for precision (or recall) is constructed based on the average of K beta posterior distributions. Experimental results on simulated and real data sets demonstrate that the first credible interval proposed in this study almost always resulted in degrees of confidence greater than 95%. With an acceptable degree of confidence, both of our two proposed credible intervals have shorter interval lengths than those based on a corrected K -fold cross-validated t distribution. Meanwhile, the average ranks of these two credible intervals are superior to that of the confidence interval based on a K -fold cross-validated t distribution for the degree of confidence and are superior to that of the confidence interval based on a corrected K -fold cross-validated t distribution for the interval length in all 27 cases of simulated and real data experiments. However, the confidence intervals based on the K -fold and corrected K -fold cross-validated t distributions are in the two extremes. Thus, when focusing on the reliability of the inference for precision and recall, the proposed methods are preferable, especially for the first credible interval.