
Extract the nth element from an underscore-delimited string

Posted: 2022-07-20 00:07:36  Views: 85

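I want to extract myproducta and myproductb.

I thought a regular expression would do it, but my pattern only works for the cc string and not for aa. Why is that? I thought both had the same length.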

aa <- "e220juju_uk_yy_aon_aon_conversion_mystore_facebook-network_ppl_primaria_myproducta_galaxycombos_20220520"
cc <- "e220tyty_bo_oo_aon_aon_conversion_mystore_facebook-network_ppl_lal_myproductb_wd95m4473mw_diasdecyber_20220718"

The regex part:

gsub(cc, pattern = ".*_.*_.*_.*_.*_.*_.*_.*_.*_(.*)_.*_.*_.*", replacement = "\\1", perl = TRUE) # works: returns myproductb

gsub(aa, pattern = ".*_.*_.*_.*_.*_.*_.*_.*_.*_(.*)_.*_.*_.*", replacement = "\\1", perl = TRUE) # doesn't work: returns primaria

Top answers
  • Here are a few approaches:

    read.table(text = aa, sep = "_")[[11]]
    ## [1] "myproducta"
    
    strsplit(aa, "_")[[1]][11]
    ## [1] "myproducta"
    
    scan(text = aa, sep = "_", what = "", quiet = TRUE)[11]
    ## [1] "myproducta"
    
    sub("^(([^_]*)_){10}([^_]*)_.*", "\\3", aa)
    ## [1] "myproducta"
    
  • You can anchor the pattern and use a negated character class: match a run of non-underscore characters followed by an underscore, repeat that 10 times, then capture the 11th field.

    ^(?:[^_]*_){10}([^_]*).*$

    aa <- "e220juju_uk_yy_aon_aon_conversion_mystore_facebook-network_ppl_primaria_myproducta_galaxycombos_20220520"
    cc <- "e220tyty_bo_oo_aon_aon_conversion_mystore_facebook-network_ppl_lal_myproductb_wd95m4473mw_diasdecyber_20220718"
    
    pattern <- "^(?:[^_]*_){10}([^_]*).*$"
    
    gsub(pattern, "\\1", aa, perl = TRUE)
    gsub(pattern, "\\1", cc, perl = TRUE)

    Output:

    [1] "myproducta"
    [1] "myproductb"
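
As for why the pattern in the question behaves differently on the two strings, a quick check suggests that they do not contain the same number of fields. Because `.*` is greedy and can itself match underscores, the leading `.*` absorbs any extra fields, so the pattern in effect counts from the right and returns the 4th field from the end. The sketch below illustrates this with the strings from the question. `greedy` and `nth_field` are hypothetical names introduced here for illustration; they are not part of the original answers.

```r
# Strings from the question.
aa <- "e220juju_uk_yy_aon_aon_conversion_mystore_facebook-network_ppl_primaria_myproducta_galaxycombos_20220520"
cc <- "e220tyty_bo_oo_aon_aon_conversion_mystore_facebook-network_ppl_lal_myproductb_wd95m4473mw_diasdecyber_20220718"

# Count the underscore-separated fields: the two strings differ.
lengths(strsplit(c(aa, cc), "_", fixed = TRUE))
## [1] 13 14

# The question's pattern (named `greedy` here for illustration) requires
# three fields after the capture group, so it picks the 4th field from
# the end rather than the 11th from the start.
greedy <- ".*_.*_.*_.*_.*_.*_.*_.*_.*_(.*)_.*_.*_.*"
gsub(greedy, "\\1", c(aa, cc), perl = TRUE)
## [1] "primaria"   "myproductb"

# Hypothetical helper: return the n-th field of each element of x,
# counting from the start. Yields NA when a string has fewer than n fields.
nth_field <- function(x, n, sep = "_") {
  vapply(strsplit(x, sep, fixed = TRUE),
         function(parts) parts[n],
         character(1))
}

nth_field(c(aa, cc), 11)
## [1] "myproducta" "myproductb"
```

Splitting on a fixed separator avoids regex backtracking altogether and is vectorized over the input, which may be convenient if the strings live in a data frame column.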