Nissan 370Z Forum - View Single Post

wstar · 04-21-2009, 12:09 AM

So back on the statistics stuff: as some of us have already pointed out, the table from that 350Z thread is really nice work, but the sample size is too small to draw any real conclusions from. Some of the data values also look suspicious relative to each other, probably because different analysis companies ran different tests (the granularity of the data varies, and some zeros seem to really be missing data). It's really too small for statistics at all, for that matter, but we have to do what we can with what little data we have.

All that said, I re-ran the numbers from that graph through a Google Docs spreadsheet, and I recalculated the red/green squares like I described before, just to see what kind of difference it makes. As expected, there are fewer of both reds and greens when the popularity effect is cancelled out. Especially in such a small sample, this popularity effect made a big difference in both the average and the standard deviation (4 samples of Syntec vs. 1 sample of 300V, for example; imagine if it were 40 to 1, and how that would affect which ones get flagged as statistical outliers).
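To put rough numbers on that 40-to-1 thought experiment, here is a small Python sketch. The oil names and wear readings are made up for illustration and are not the values from the 350Z table, and the choice of population standard deviation (`pstdev`) is an assumption, since the post doesn't say which spreadsheet function was used.

```python
from statistics import mean, pstdev

def pooled_stats(samples_by_oil):
    """Mean and std dev over every raw sample, so popular oils count more."""
    raw = [v for values in samples_by_oil.values() for v in values]
    return mean(raw), pstdev(raw)

def z_score(value, center, spread):
    """How many standard deviations `value` sits from `center`."""
    return (value - center) / spread

# Hypothetical readings: oil_a always reads 10.0, oil_b reads 20.0 once.
few = {"oil_a": [10.0] * 4, "oil_b": [20.0]}    # 4 samples vs 1
many = {"oil_a": [10.0] * 40, "oil_b": [20.0]}  # 40 samples vs 1

mean_few, sd_few = pooled_stats(few)
mean_many, sd_many = pooled_stats(many)

z_few = z_score(20.0, mean_few, sd_few)
z_many = z_score(20.0, mean_many, sd_many)

print(f"4 to 1:  mean={mean_few:.2f} sd={sd_few:.2f} oil_b z={z_few:.2f}")
print(f"40 to 1: mean={mean_many:.2f} sd={sd_many:.2f} oil_b z={z_many:.2f}")
```

Nothing about oil_b's single reading changes between the two cases. It only looks like a bigger outlier at 40 to 1 because the popular oil drags the pooled mean toward itself and shrinks the pooled deviation.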

The following is basically the same as the original (red = 1+ std dev high, green = 1+ std dev low), but the mean and std dev are computed from the average of each oil type rather than from all the raw samples, so that the oils are on the same footing regardless of popularity. You'll see some of the same trends, but there aren't nearly as many reds, and hardly any greens. Mostly, the wider deviation just reiterates how little we know, statistically speaking, about these oils. Any of these samples could be complete flukes. Anyways:

ETA: That's about as much as I'm willing to mess around with it, but really you could go further in making it "fairer" by grouping the oils that are based on similar formulations. The fact that some formulations are sampled in multiple weights distorts everything too (see # of Mobil1 columns vs # of Eneos columns, for example). We've got so little data to go on, though, that the rewards of further re-analysis are pretty slim.
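For anyone who would rather redo the per-oil weighting in code than in a spreadsheet, a minimal Python sketch follows. It assumes one wear metric at a time, uses made-up readings and oil names, and uses population standard deviation (`pstdev`), which is an assumption; swap in `statistics.stdev` if the sample version is wanted. The function name `flag_samples` is hypothetical.

```python
from statistics import mean, pstdev

def flag_samples(samples_by_oil):
    """Flag raw samples red/green using stats of the per-oil averages.

    Each oil contributes exactly one value (its own average) to the mean
    and std dev, so an oil with many samples has no extra pull.
    """
    oil_averages = [mean(values) for values in samples_by_oil.values()]
    center = mean(oil_averages)
    spread = pstdev(oil_averages)
    high, low = center + spread, center - spread

    flags = {}
    for oil, values in samples_by_oil.items():
        flags[oil] = [
            "red" if v >= high else "green" if v <= low else ""
            for v in values
        ]
    return flags

# Hypothetical readings for a single metric; not the 350Z thread's data.
data = {
    "oil_a": [10.0, 12.0, 11.0, 11.0],
    "oil_b": [20.0],
    "oil_c": [14.0, 16.0],
}
flags = flag_samples(data)
for oil, oil_flags in flags.items():
    print(oil, oil_flags)
```

Grouping oils by formulation, as suggested above, would only mean changing the dictionary keys so that related products share one entry before the averages are taken.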

Last edited by wstar; 04-21-2009 at 12:20 AM.