Peer-Reviewed

Variable Selection for Semi-Parametric Models with Interaction Under High Dimensional Data

Received: 9 March 2023    Accepted: 2 April 2023    Published: 15 April 2023
Abstract

With advances in modern science and data collection technology, researchers can now gather large volumes of high-dimensional data from many fields. Variable selection for high-dimensional data has seen some development, but most existing studies consider only main effects. In many important practical problems, however, main effects alone may not adequately describe the relationship between the response variable and the predictor variables, so variable selection with interaction terms in high-dimensional data is of greater practical interest. This article therefore focuses on robust estimation for semi-parametric models with interactions in high-dimensional data under the framework of modal regression, and applies a two-stage regularization method to perform variable selection. At Stage 1, B-spline basis functions are used to approximate the non-parametric function, and the parametric and non-parametric components are selected simultaneously based on modal regression and adaptive least absolute shrinkage and selection operator (LASSO) estimation. At Stage 2, the model consists of the variables selected at Stage 1 together with interaction terms derived from the main effects. To maintain the heredity structure between the main effects of the linear part and the interaction effects, selection is applied only to the interaction terms in order to identify the important interaction effects. Under suitable regularization conditions, the oracle properties of the variable selection procedure and the consistency of the hierarchical structure are proved. Numerical results are presented to demonstrate the performance of the method.
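The Stage 2 construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy data, and the choice of strong heredity (an interaction is a candidate only when both of its parent main effects were kept at Stage 1, with squared terms excluded) are assumptions made for illustration. The penalized modal regression fit itself is not shown; the sketch only builds the candidate set and the Stage 2 design rows, in which the main-effect columns would be left unpenalized.

```python
from itertools import combinations


def candidate_interactions(selected_main):
    """Return index pairs (j, k) with j < k whose parents were both kept at Stage 1.

    Assumption: strong heredity, pairwise products only (no squared terms).
    """
    return list(combinations(sorted(set(selected_main)), 2))


def stage2_design(rows, selected_main):
    """Build Stage 2 rows: kept main effects followed by their pairwise products."""
    main = sorted(set(selected_main))
    pairs = candidate_interactions(main)
    design = []
    for x in rows:
        main_part = [x[j] for j in main]                # kept linear main effects
        inter_part = [x[j] * x[k] for j, k in pairs]    # heredity-respecting interactions
        design.append(main_part + inter_part)
    return design, pairs


# Hypothetical toy data: 2 observations of p = 5 predictors.
# Stage 1 is assumed to have kept predictors 0, 2 and 3.
rows = [[1.0, 2.0, 3.0, 4.0, 5.0],
        [0.5, 0.0, -1.0, 2.0, 9.0]]
design, pairs = stage2_design(rows, selected_main=[3, 0, 2])
print(pairs)
print(design)
```

With s main effects kept at Stage 1, this produces s(s-1)/2 candidate interactions instead of the p(p-1)/2 that an unrestricted search over all p predictors would require, which is what makes the second stage tractable in high dimensions.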

Published in International Journal of Statistical Distributions and Applications (Volume 9, Issue 2)
DOI 10.11648/j.ijsd.20230902.11
Page(s) 49-61
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2024. Published by Science Publishing Group

Keywords

Semi-Parametric Models with Interaction, Variable Selection, Modal Regression, Adaptive LASSO

Cite This Article
  • APA Style

    Yafeng Xia, Na Kui. (2023). Variable Selection for Semi-Parametric Models with Interaction Under High Dimensional Data. International Journal of Statistical Distributions and Applications, 9(2), 49-61. https://doi.org/10.11648/j.ijsd.20230902.11


    ACS Style

    Yafeng Xia; Na Kui. Variable Selection for Semi-Parametric Models with Interaction Under High Dimensional Data. Int. J. Stat. Distrib. Appl. 2023, 9(2), 49-61. doi: 10.11648/j.ijsd.20230902.11


    AMA Style

    Yafeng Xia, Na Kui. Variable Selection for Semi-Parametric Models with Interaction Under High Dimensional Data. Int J Stat Distrib Appl. 2023;9(2):49-61. doi: 10.11648/j.ijsd.20230902.11


  • BibTeX

    @article{10.11648/j.ijsd.20230902.11,
      author = {Yafeng Xia and Na Kui},
      title = {Variable Selection for Semi-Parametric Models with Interaction Under High Dimensional Data},
      journal = {International Journal of Statistical Distributions and Applications},
      volume = {9},
      number = {2},
      pages = {49-61},
      doi = {10.11648/j.ijsd.20230902.11},
      url = {https://doi.org/10.11648/j.ijsd.20230902.11},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijsd.20230902.11},
      year = {2023}
    }
    


  • RIS

    TY  - JOUR
    T1  - Variable Selection for Semi-Parametric Models with Interaction Under High Dimensional Data
    AU  - Yafeng Xia
    AU  - Na Kui
    Y1  - 2023/04/15
    PY  - 2023
    N1  - https://doi.org/10.11648/j.ijsd.20230902.11
    DO  - 10.11648/j.ijsd.20230902.11
    T2  - International Journal of Statistical Distributions and Applications
    JF  - International Journal of Statistical Distributions and Applications
    JO  - International Journal of Statistical Distributions and Applications
    SP  - 49
    EP  - 61
    PB  - Science Publishing Group
    SN  - 2472-3509
    UR  - https://doi.org/10.11648/j.ijsd.20230902.11
    VL  - 9
    IS  - 2
    ER  - 


Author Information
  • School of Science, Lanzhou University of Technology, Lanzhou, China

  • School of Science, Lanzhou University of Technology, Lanzhou, China
