Grokbase Groups R r-help June 2016
Dear Group,


I am trying to simulate a dataset of 200 individuals with random
assignment of Sex (1,0) and Weight drawn from a Sex-specific lognormal
distribution. I am intrigued by how the rlnorm function imputes a value
of Weight from the specified distribution. Here is the code:
ID <- 1:200
Sex <- sample(c(0, 1), 200, replace = TRUE, prob = c(0.4, 0.6))
fulldata <- data.frame(ID, Sex)
fulldata$Wt <- ifelse(fulldata$Sex == 1,
                      rlnorm(100, meanlog = log(85.1), sdlog = sqrt(0.0329)),
                      rlnorm(100, meanlog = log(73), sdlog = sqrt(0.0442)))


mean(fulldata$Wt[fulldata$Sex == 0])  # to check the mean is close to 73
mean(fulldata$Wt[fulldata$Sex == 1])  # to check the mean is close to 85


I see that the number of simulated values has an effect on the mean
calculated after imputation. That is, the code rlnorm(100, meanlog =
log(73), sdlog = sqrt(0.0442)) gives a much better match than
rlnorm(1, meanlog = log(73), sdlog = sqrt(0.0442)) in the ifelse statement
in the code above.


My understanding is that ifelse imputes only one value wherever the
condition is met. I would appreciate your insights on why increasing the
number of simulated values gives a better match.


Regards,
Ayyappa



  • Thierry Onkelinx at Jun 14, 2016 at 3:15 pm
    Dear Ayyappa,


    ifelse works on a vector. See the example below.


    ifelse(
       sample(c(TRUE, FALSE), size = length(letters), replace = TRUE),
       letters,
       LETTERS
    )


    However, note that it will recycle short vectors when they are not of equal
    length.


    ifelse(
       sample(c(TRUE, FALSE), size = 2 * length(letters), replace = TRUE),
       letters,
       LETTERS
    )


    In your code the length of the condition vector is 200, the length of the
    two other vectors is 100.
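
To make the recycling concrete, here is a minimal sketch with small, made-up vectors (the names test, yes, no, and manual are illustrative and not from the thread). It assumes only base R and shows that ifelse() stretches yes and no to the length of the condition and then picks values by position:

```r
# Hypothetical toy data: a 6-element condition with 3-element yes/no vectors.
test <- c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE)
yes  <- c("a", "b", "c")
no   <- c("A", "B", "C")

# ifelse() recycles yes and no to length(test), then selects by position.
res <- ifelse(test, yes, no)
print(res)

# The same steps written out by hand with rep():
yes_full <- rep(yes, length.out = length(test))  # "a" "b" "c" "a" "b" "c"
no_full  <- rep(no,  length.out = length(test))  # "A" "B" "C" "A" "B" "C"
manual <- no_full
manual[test] <- yes_full[test]
print(identical(res, manual))
```

With a 200-element condition and two 100-element vectors the same thing happens: row i and row i + 100 read from the same slot of the simulated vector.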


    Best regards,


    ir. Thierry Onkelinx
    Instituut voor natuur- en bosonderzoek / Research Institute for Nature and
    Forest
    team Biometrie & Kwaliteitszorg / team Biometrics & Quality Assurance
    Kliniekstraat 25
    1070 Anderlecht
    Belgium


    To call in the statistician after the experiment is done may be no more
    than asking him to perform a post-mortem examination: he may be able to say
    what the experiment died of. ~ Sir Ronald Aylmer Fisher
    The plural of anecdote is not data. ~ Roger Brinner
    The combination of some data and an aching desire for an answer does not
    ensure that a reasonable answer can be extracted from a given body of data.
    ~ John Tukey


  • Thierry Onkelinx at Jun 14, 2016 at 3:42 pm
    Please keep r-help in cc.


    Yes. Have a look at this example


    ifelse(
       sample(c(TRUE, FALSE), size = 0.5 * length(letters), replace = TRUE),
       letters,
       LETTERS
    )
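
A small check of the example above, as a sketch (the seed is arbitrary and res is an illustrative name): the result has the length of the condition, 13, and only the first 13 elements of letters and LETTERS can ever appear in it.

```r
set.seed(1)  # arbitrary seed, only so reruns give the same condition vector
test <- sample(c(TRUE, FALSE), size = 0.5 * length(letters), replace = TRUE)
res  <- ifelse(test, letters, LETTERS)

# The result is as long as the condition, not as long as yes or no.
print(length(res))

# Position i holds either letters[i] or LETTERS[i]; elements 14 to 26 are dropped.
print(all(toupper(res) == LETTERS[1:13]))
```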






    2016-06-14 17:31 GMT+02:00 Ayyappa Chaturvedula <ayyappach@gmail.com>:

    Thank you very much for your kind support. The length of my condition
    vector is ~80 because I want only Sex==1 and else will be the other. I
    understand now how ifelse works. If the simulated vector is longer than
    the condition vector, does it take the first few elements to match the
    length of the condition vector and discard the rest?

    Regards,
    Ayyappa

  • Ayyappa Chaturvedula at Jun 14, 2016 at 3:47 pm
    I am sorry, I missed that. I think I have now made it more appropriate
    by not drawing unnecessary simulated values. Thank you for your help.


    fulldata$Wt <- ifelse(fulldata$Sex == 1,
                          rlnorm(length(fulldata$Sex[fulldata$Sex == 1]),
                                 meanlog = log(85.1), sdlog = sqrt(0.0329)),
                          rlnorm(length(fulldata$Sex[fulldata$Sex == 0]),
                                 meanlog = log(73), sdlog = sqrt(0.0442)))


  • Thierry Onkelinx at Jun 14, 2016 at 4:08 pm
    You need to study my examples and the helpfile of ifelse more carefully.
    Then you'll understand why your code is wrong.
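
One way to see the problem, sketched under assumptions: the fixed Sex pattern below is hypothetical (chosen so the outcome does not depend on a random seed), and wt_bad, wt_ok, and wt_alt are illustrative names. ifelse() matches yes and no to the condition by row position, not by counting matches, so vectors shorter than the condition are recycled and some draws are reused. Two possible repairs are to let rlnorm() take per-row parameters, or to assign into each subset separately.

```r
set.seed(2016)  # arbitrary seed, only so reruns give the same draws
n   <- 200
# Hypothetical fixed pattern: 120 rows with Sex == 1 and 80 rows with Sex == 0.
Sex <- rep(c(1, 0, 1, 1, 0), times = 40)

# The revised attempt: each vector is only as long as its own group.
yes <- rlnorm(sum(Sex == 1), meanlog = log(85.1), sdlog = sqrt(0.0329))
no  <- rlnorm(sum(Sex == 0), meanlog = log(73),   sdlog = sqrt(0.0442))
wt_bad <- ifelse(Sex == 1, yes, no)

# Row i reads slot i of the recycled yes (or no) vector, so some draws are
# used for more than one row and others are never used at all.
print(anyDuplicated(wt_bad) > 0)

# Repair 1: rlnorm() is vectorised over meanlog and sdlog, one draw per row.
wt_ok <- rlnorm(n,
                meanlog = ifelse(Sex == 1, log(85.1), log(73)),
                sdlog   = ifelse(Sex == 1, sqrt(0.0329), sqrt(0.0442)))

# Repair 2: fill each group by logical subsetting instead of ifelse().
wt_alt <- numeric(n)
wt_alt[Sex == 1] <- rlnorm(sum(Sex == 1), log(85.1), sqrt(0.0329))
wt_alt[Sex == 0] <- rlnorm(sum(Sex == 0), log(73),   sqrt(0.0442))
```

Also note that meanlog = log(73) fixes the median of the lognormal at 73; its mean is exp(meanlog + sdlog^2 / 2), which works out to roughly 74.6 and 86.5 for the two groups, so sample means slightly above 73 and 85.1 are expected.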


