STATISTICS HOW TO: ELEMENTARY STATISTICS FOR THE REST OF US!


 Example: you theorize that 75% of physics students are male. You survey a random sample of 12 physics students and find that 7 are male. Do your results significantly differ from the expected results?

 Solution: Use the binomial formula to find the probability of getting your results. The null hypothesis for this test is that your results do not differ significantly from what is expected.

 Out of the two possible events, you want to solve for the event that gave you the least expected result. You expected 9 males (i.e. 75% of 12), but got 7, so for this example solve for 7 or fewer students.

 The binomial formula gives 0.158, which is the probability of 7 or fewer males out of 12. Doubling this (for a two-tailed test) gives 0.315. These are your p-values. With very few exceptions, you’ll always use the doubled value.

 As the p-value of 0.315 is large (I’m assuming a 5% alpha level here, which would mean p-values of less than 5% would be significant), you cannot reject the null hypothesis that the results are expected. In other words, 7 is not outside of the range of what you would expect.

 If, on the other hand, you had run the test and found only 4 males (an observed proportion of 4/12 ≈ .333), the doubled p-value would have been .006, which means you would have rejected the null.
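 The arithmetic above can be checked with a short script. This is a minimal sketch using only the Python standard library; the function name binomial_lower_tail is made up for illustration, and the doubling step simply follows the rule described in the text.

```python
from math import comb

def binomial_lower_tail(k, n, p):
    """Return P(X <= k) for X ~ Binomial(n, p), summing the binomial formula."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Expected proportion of males is 75%; the sample has 12 students.
one_tailed = binomial_lower_tail(7, 12, 0.75)   # 7 or fewer males
two_tailed = 2 * one_tailed                     # doubled for a two-tailed test

# The alternative scenario from the text: only 4 males observed.
four_males = 2 * binomial_lower_tail(4, 12, 0.75)

print(round(one_tailed, 3), round(two_tailed, 3), round(four_males, 3))
# 0.158 0.315 0.006
```

 With a 5% alpha level, 0.315 is not significant while 0.006 is, which matches the conclusions above.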

 A block plot helps you figure out what the most important factors in your experiment are, including interactions. A basic block plot will show if a factor is important. It will also show if that factor stays the same (i.e. if it is robust) for all settings of other factors.

 Block plots can also assess statistical significance. Statistical significance is usually tested with ANOVA. However, ANOVA is based on the assumption of normality. Block plots don’t have this assumption, so they can be extremely useful for non-normal data.

 The vertical axis represents the response variable. The taller the bar, the more impact on the response variable, so the more important the factor. Where the blocks are relative to each other is not important. In the following block plots, all of the block heights in the factor 1 plot are taller than all of the block heights in the factor 2 plot. Therefore, primary factor 1 is more important than primary factor 2.

 To assess how statistically significant each factor is, look at where each level falls within the bars. The characters inside each bar (these may be symbols, numbers or some other notation) represent the levels.

 In the following plot for Factor 1, the response for level 2 is higher than the response for level 1 in each of the bars. The level ordering in plot 2 is inconsistent (sometimes 1 is above 2 and vice-versa), so factor 2 is not statistically significant.

 To figure out if factor 1 is statistically significant, you first have to calculate the probability of that particular level ordering occurring. If you’re rusty on how to find probabilities, start here: What is the Probability of A and B.

 There are only two ways the levels can be ordered (1 then 2 or 2 then 1). So the probability of one block being ordered 1 then 2 is ½. The probability of all six blocks showing 1 and then 2 is: ½ * ½ * ½ * ½ * ½ * ½ = 1/2^6 = 1/64 ≈ 0.016.

 Finally, compare your probability to your chosen significance level. If the probability you calculate is less than the significance level, then that factor is significant. At a 5% significance level, this block ordering (and therefore Factor 1) is statistically significant.
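 The same comparison can be sketched in a few lines of Python. The function name is hypothetical, and the calculation assumes (as the text does implicitly) that the two orderings are equally likely in each block and that the blocks are independent.

```python
def consistent_ordering_probability(n_blocks):
    """Chance that one specific two-level ordering shows up in every block,
    assuming each ordering has probability 1/2 and blocks are independent."""
    return 0.5 ** n_blocks

alpha = 0.05                                 # chosen significance level
p = consistent_ordering_probability(6)       # six blocks, as in the example

print(p, p < alpha)
# 0.015625 True
```

 Under these assumptions at least five consistently ordered blocks are needed before the probability (1/32 ≈ 0.031) drops below 5%.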

 To assess interactions, look at whether the heights of the bars are changing in a systematic fashion. While the block plot for Factor 1 appears to be random, the blocks for Factor 2 seem to be decreasing steadily (up to a point), so this may warrant further attention.

 Simply put, Bolzano’s theorem (sometimes called the intermediate zero theorem) states that continuous functions have zeros if their extreme values are opposite signs (- + or + -). For example, every odd-degree polynomial has a zero.

 If f : [a, b] ⊂ ℝ → ℝ is a continuous function on the closed interval [a, b] and f(a) · f(b) < 0, then there is at least one x ∈ (a, b) such that f(x) = 0.

 Given a function, you can use the theorem to prove that the function has at least one root. The theorem states nothing about what the value for the function’s zero will be: it merely states that the zero exists.

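 Here, you’ve been given a function (x³ + x − 1) set to zero. So if the equation has at least one solution, then that solution is a root (i.e. a zero) of the function. In order to apply Bolzano’s theorem, you need to find out two things: whether the function is continuous on the interval, and whether its values at the endpoints have opposite signs.

 Step 1: Check that the function is continuous. x³ + x − 1 is a polynomial, and every polynomial is continuous, so the function is continuous on any closed interval.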


 Step 2: Locate the endpoints and see if they have opposite signs. Here, you’re given the function and the interval [0, 1], so plug the endpoints into the function and see what values come out: f(0) = 0³ + 0 − 1 = −1 and f(1) = 1³ + 1 − 1 = 1.

 The two values have opposite signs, and the function is continuous. Therefore, Bolzano’s theorem tells us that the equation does indeed have a real solution in (0, 1). A quick look at the graph of x³ + x − 1 verifies the finding: the curve crosses the x-axis between 0 and 1.
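 The endpoint check is easy to reproduce in code. The sketch below also locates the root numerically with bisection. Bisection is not part of the theorem (which only says that a zero exists); it is added here as one common way to make use of the sign change. The function names and tolerance are illustrative choices.

```python
def f(x):
    """The example function x^3 + x - 1."""
    return x**3 + x - 1

def bisect(func, a, b, tol=1e-10):
    """Narrow [a, b] around a zero of a continuous func, given a sign change."""
    fa, fb = func(a), func(b)
    if fa * fb >= 0:
        raise ValueError("func(a) and func(b) must have opposite signs")
    while b - a > tol:
        mid = (a + b) / 2
        fm = func(mid)
        if fm == 0:
            return mid
        # Keep the half-interval whose endpoints still have opposite signs.
        if fa * fm < 0:
            b, fb = mid, fm
        else:
            a, fa = mid, fm
    return (a + b) / 2

print(f(0), f(1))        # -1 1  (opposite signs, so the theorem applies)
root = bisect(f, 0, 1)
print(round(root, 4))    # 0.6823
```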

 Differential equations have many solutions and it’s usually impossible to find them all. To narrow down the set of answers from a family of functions to a particular solution, conditions are set. These conditions can be initial conditions (which define a starting point at the extreme of an interval) or boundary conditions (which define bounds that constrain the whole solution). Different types of boundary conditions can be imposed on the boundary of a domain.

 One way to think of the difference between the two is that initial conditions deal with time, while boundary conditions deal with space. Boundaries can describe all manner of shapes: e.g. triangles, circles, polygons.

 Dirichlet: Specifies the function’s value on the boundary. For example, you could specify Dirichlet boundary conditions for the interval domain [a, b] by giving the value of the unknown function at the endpoints a and b. For two dimensions, the boundary conditions stretch along an entire curve; for three dimensions, they must cover a surface. This type of problem is called a Dirichlet Boundary Value Problem.

 Neumann: Similar to the Dirichlet, except the boundary condition specifies the derivative of the unknown function. For example, we could specify u′(a) = α, which imposes a Neumann boundary condition at the left endpoint of the interval domain [a, b].

 Robin: A weighted combination of the function’s value and its derivative. For example, for unknown u(x) on the interval domain [a, b] we could specify the Robin condition u(a) − 2u′(a) = 0.

 Mixed: Similar to the Robin, except that different parts of the boundary are specified by different conditions. For example, on the interval [a, b], the derivative u′(x) at x = a could be specified by a Neumann condition and the unknown u(x) at x = b could be specified by a Dirichlet condition. [1]
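 To see how conditions single out one solution from a family, here is a minimal sketch. It assumes, purely for illustration, the equation u″(x) = 0, whose general solution is the family of lines u(x) = c1·x + c2. The function names and the numbers used are made up for the example. Dirichlet conditions at both endpoints, or a mixed Neumann/Dirichlet pair, each fix the two constants.

```python
def line_from_dirichlet(a, A, b, B):
    """Constants (c1, c2) of u(x) = c1*x + c2 with u(a) = A and u(b) = B."""
    c1 = (B - A) / (b - a)
    c2 = A - c1 * a
    return c1, c2

def line_from_mixed(a, alpha, b, B):
    """Constants (c1, c2) with Neumann u'(a) = alpha and Dirichlet u(b) = B."""
    c1 = alpha            # the slope of a line is the same everywhere
    c2 = B - c1 * b
    return c1, c2

def u(x, c1, c2):
    """Evaluate the line u(x) = c1*x + c2."""
    return c1 * x + c2

# Illustrative values only: interval [0, 2].
dirichlet = line_from_dirichlet(0, 1, 2, 5)   # u(0) = 1, u(2) = 5
mixed = line_from_mixed(0, 3, 2, 5)           # u'(0) = 3, u(2) = 5

print(dirichlet, mixed)
# (2.0, 1.0) (3, -1)
```

 Both particular solutions pass through the point (2, 5), but they are different lines because the condition at the other endpoint differs.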

 As an example, let’s say you wanted to use a curve-length function to find the equation of the straight line between two points (a, A) and (b, B). The problem could be set up with the points as boundary conditions, y(a) = A and y(b) = B [2].

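 The Bray Curtis dissimilarity is used to quantify the differences in species populations between two different sites. It’s used primarily in ecology and biology, and can be calculated with the following formula: BCij = 1 − (2 · Cij) / (Si + Sj), where Cij is the sum of the lesser counts for each species found in both sites, and Si and Sj are the total numbers of specimens counted at each site.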

 To calculate Bray-Curtis, let’s first calculate Cij (the sum of only the lesser counts for each species found in both sites). Goldfish are found on both sites; the lesser count is 6. Guppies are only on one site, so they can’t be added in here. Rainbow fish, though, are on both, and the lesser count is 4.
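 The rest of the calculation can be sketched in Python using the standard form of the formula, 1 − 2·Cij / (Si + Sj), where Si and Sj are the total counts at each site. The counts below are illustrative assumptions, not the original data table: they were chosen only to be consistent with the lesser counts quoted above (6 goldfish, 4 rainbow fish, guppies at one site only). The function and variable names are hypothetical.

```python
def bray_curtis(site_i, site_j):
    """Bray-Curtis dissimilarity between two dicts of species -> count."""
    species = set(site_i) | set(site_j)
    # C_ij: sum of the lesser count for every species (0 if absent at a site).
    c_ij = sum(min(site_i.get(s, 0), site_j.get(s, 0)) for s in species)
    s_i = sum(site_i.values())    # total specimens counted at site i
    s_j = sum(site_j.values())    # total specimens counted at site j
    return 1 - 2 * c_ij / (s_i + s_j)

# Assumed counts, for illustration only.
tank_one = {"goldfish": 6, "guppy": 7, "rainbow fish": 4}
tank_two = {"goldfish": 10, "guppy": 0, "rainbow fish": 6}

dissimilarity = bray_curtis(tank_one, tank_two)
print(round(dissimilarity, 2))
# 0.39
```

 Here Cij = 6 + 4 = 10 and the totals are 17 and 16, so the result is 1 − 20/33 ≈ 0.39.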

 To make it easy to work with, the dissimilarity is often multiplied by 100 and then treated as a percentage. You may see a Bray Curtis dissimilarity of 0.21, for instance, being referred to as a Bray Curtis dissimilarity percent of 21%.

 There’s another percentage which is often used to describe species counts; this one, though, tells you how similar two sites are rather than how different. It’s called the Bray Curtis index, and to calculate it you simply subtract the Bray Curtis dissimilarity (remember, a number between 0 and 1) from 1, then multiply by 100.

 Let’s calculate this number for the fish example. The Bray Curtis dissimilarity was 0.39, and if we wanted it in terms of percentages we would have called it 39%. But the Bray Curtis index will be (1 – 0.39) · 100, or 61%. Notice this is, in a way, the opposite of the Bray Curtis dissimilarity. Identical sites have a Bray Curtis dissimilarity of 0, or 0%, and a Bray Curtis index of 100%. Sites which share no species would have a Bray Curtis dissimilarity of 1 (100%), and a Bray Curtis index of 0.

 To calculate the Bray-Curtis dissimilarity between two sites you must assume that both sites are the same size, either in area or volume (as is relevant to species counts). This is because the equation doesn’t include any notion of space; it works only with the counts themselves.

 Both the Levene and Brown-Forsythe (B-F) tests transform dependent variables for use in an ANOVA test. The only difference between the two tests is in how those transformed variables are constructed. The Levene test uses absolute deviations from group means, which usually results in a highly skewed set of data; this violates the assumption of normality. The Brown-Forsythe test attempts to correct for this skewness by using absolute deviations from group medians. The result is a test that’s more robust. In other words, the B-F test is less likely than the Levene test to incorrectly declare that the assumption of equal variances has been violated.
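 The difference between the two transformations can be sketched with the Python standard library. The code below builds the transformed variables (absolute deviations from each group’s mean or median) and computes the one-way ANOVA F statistic on them. The data and function names are made up for illustration, and turning F into a p-value (by comparing it with an F distribution on k − 1 and n − k degrees of freedom) is left out.

```python
from statistics import mean, median

def abs_deviations(groups, center):
    """Absolute deviation of each value from its own group's center."""
    out = []
    for g in groups:
        c = center(g)
        out.append([abs(x - c) for x in g])
    return out

def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA on a list of groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = mean([x for g in groups for x in g])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def levene_f(groups):
    return one_way_anova_f(abs_deviations(groups, mean))

def brown_forsythe_f(groups):
    return one_way_anova_f(abs_deviations(groups, median))

# Made-up data: the first group contains one outlying value (10).
groups = [[1, 2, 3, 4, 10], [2, 3, 4, 5, 6]]
print(round(levene_f(groups), 3), round(brown_forsythe_f(groups), 3))
# 1.2 0.595
```

 In this small example the outlier pulls the first group’s mean away from most of its values, and the median-based statistic comes out smaller than the mean-based one. In practice a library routine such as scipy.stats.levene, which accepts center='mean' or center='median', would normally be used instead of hand-rolled code.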
