|
Post by dalethorn on May 1, 2012 23:33:09 GMT
"To have some idea of your biases requires simple psychology & a tendency towards some level of self awareness. Asking others to point out your biases is not really useful"
Heck, I can do that (point out others' biases). It doesn't even matter if I'm right - one bias is as good as another, yes?
|
|
Deleted
Deleted Member
Posts: 0
|
Post by Deleted on May 2, 2012 7:53:52 GMT
"I never said that there was anything wrong with taking out the sighted bias - there you go again misstating what I said - I said it may not be the most important bias at play & you need to deal with the more important ones first. A bit of rigour in your thinking & tests is called for, Frans, & is what I'm asking."
To be constructive... There is nothing wrong with taking out 'the knowing' (blind testing), but more important biases need to be taken out first. Since I am obviously not aware of these more important biases (otherwise I would have taken them out as well, where possible), can you tell me what these biases are? And when the 'blind' part MAY not be that important, how else can the differences in findings between the two methods be explained? (Please keep the answers civilised and factual; I am receiving complaints about the 'tone' of this thread.)
|
|
Deleted
Deleted Member
Posts: 0
|
Post by Deleted on May 2, 2012 8:08:24 GMT
Civilised is one thing, and should be adhered to at all times, but "factual" may have a different interpretation depending on which side of the discussion you are coming from. Perhaps those who are complaining about the factual side of things should stop being anonymous and join the discussion; otherwise it may give the appearance of an attempt to stifle the other side's arguments.
|
|
|
Post by PinkFloyd on May 2, 2012 9:13:27 GMT
"Perhaps somebody in the U.K. could organise a test using Frans's guidelines, for listening to unmodified Indeed/Bravo varieties and the various versions of Frans's hybrid designs? Let's take the 'knowing' out of the equation! Some members might even save themselves a few hundred dollars if this was done."
That's a very good idea Alex - I'll take the Bravo, Indeed and Project Sunrise into the bike shop and get a few of the guys in there blindfolded.
|
|
Deleted
Deleted Member
Posts: 0
|
Post by Deleted on May 2, 2012 9:17:23 GMT
Perhaps by taking out the 'personal discrediting attempts' from now on it could become more civilised. Let's keep it about the possible biases, WHY they MIGHT be valid, and to what degree. I know it isn't possible to reach any consensus because of different 'religions', but (as in so many other forums) there could be an exchange of viewpoints about the tests and the possible pitfalls of each of them. Perhaps we could even 'meet' or reach a form of consensus in certain areas?
About putting my 'products' up for (D)BT or other comparative tests: those that wish to review the only 'commercial product' (which I do NOT sell) can contact me... it is available for review, (D)BT or otherwise. I have already BT'ed my own designs (with more than one person) on a testing device designed specially for comparing HAs, which can also be made available for testing. I am completely O.K. with blind comparisons, even taken by bikers. The question is HOW these tests are performed, as that will have a very significant influence on the outcome.
|
|
|
Post by PinkFloyd on May 2, 2012 9:18:29 GMT
You're giving them away free now, Frans? That's very sporting of you, old chap. Put me down for 5 please.
|
|
Deleted
Deleted Member
Posts: 0
|
Post by Deleted on May 2, 2012 9:25:26 GMT
I'll send you 10 ... I just don't know when.
|
|
XTRProf
Fully Modded
Pssst ! Got any spare capacitors ?
Posts: 5,689
|
Post by XTRProf on May 2, 2012 10:50:27 GMT
Hi all, I think we should stop this discussion from going further, as it's not making headway in either direction based on the current methodologies. Let's have a post-Labour-Day break on this, to show the world that we as moderators are sensible enough to come back down to Earth after blasting off into "orbit". We have had our Labour Day fun and it's time to go back to official work, shall we? Anything conducted after this should be reported again after the cooling-off period, to make more sense of whatever experiments are done, for the betterment of Mankind. We are not Mankind vs. alien forces; we are one common civilised Mankind, whatever colour your skin may be. Time to show again that RG is that civilised, united Mankind, as we did for Sean's "lost" HA. Yeah, we are different from other forums and know when it is time to cool off and rejuvenate the senses.
|
|
jkeny
Been here a while!
Posts: 463
|
Post by jkeny on May 2, 2012 10:58:06 GMT
"There is nothing wrong with taking out 'the knowing' (blind testing), but more important biases need to be taken out first. Since I am obviously not aware of these more important biases (otherwise I would have taken them out as well, where possible), can you tell me what these biases are? ... I am completely O.K. with blind comparisons, even taken by bikers. The question is HOW these tests are performed, as that will have a very significant influence on the outcome."
This is called experimenter bias, Frans. Let's see if we can unearth another one - what if I suggested that Alex & I would be the only blind testers of your device? Are you happy with that? You already recognise another one in your earlier suggestion to use only "golden eared" participants for the Jplay file comparisons - this is pre-screening, to check that participants can actually hear adequately for the test. You can check out what is needed for conducting blind tests, I'm sure (in fact I would have thought you might have done this already, seeing as you have a tendency to recommend them), & tell us what factors you eliminate & what factors you don't in your blind tests.
|
|
Deleted
Deleted Member
Posts: 0
|
Post by Deleted on May 2, 2012 12:47:40 GMT
I have NO problems whatsoever with you and/or Alex performing blind tests (it would involve someone else to administer/control the test) AND sighted tests of my 'device', for that matter. You can compare it with other apples, pears or even eggs if that is fun to you. You are also free to post the results regardless of the consequences, as is anyone else.
Contact me (or anyone else willing) via PM if you would like to do this. You guys are as serious about audio as I am, so I have no worries about you or Alex being anything other than honest.
How can experimenter bias influence one or more test subject(s) when the reviewers have no knowledge of what is playing and are only asked to express a preference or describe differences, when 'switched' often enough to reach statistical relevance?
To do a blind test of, say, HP amps (the devices in question at this moment), it depends on what you want 'proven'. For starters, both amps need the same signal applied and must be matched in output amplitude (under load conditions) within 0.1dB. They should not be driven outside the limits of the lowest-spec'd device. If the output resistances differ considerably, the headphone used will determine the outcome more than the amps do, and provisions have to be made (my test device can); but if you consider that part of the device, then it must be tested as-is.
Whether differences CAN be found depends on many factors, including the hearing capabilities of the listener(s) and how 'blind' the reviewers are when doing the switching themselves. The switching can be done by either the tester or the person(s) doing the listening. Data needs to be logged, and once a result is in, the tester needs to (randomly) change, or not change, the A and B assignment and repeat the test enough times to be sure the differences are caused by the equipment and to reach statistical relevance.
When the amps differ considerably in specifications within the (considered humanly detectable) audio limits - the specs must be known - the blind test will show the differences are there and will favour one or the other. When the differences are below the hearing capabilities of the human species, the statistical relevance of the test will show that. So: multiple tests where the A and B positions vary while the listener is switching, and where the switcher does not know whether A and B have changed 'position' or not. When the tester (who cannot be seen by the listener) does the switching, on the listener's indication or not, A and B do not need to be physically swapped every time. Records have to be kept.
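To make that bookkeeping concrete, here is a minimal sketch of how the per-trial randomisation and the 'statistical relevance' check could look. Python; the play_via_switchbox and ask_listener helpers are hypothetical stand-ins for the real rig, and it scores the test as identification (ABX-style) rather than preference:
```python
# Minimal sketch only: levels assumed already matched within 0.1 dB.
import random
from math import comb

def run_blind_trials(n_trials, play_via_switchbox, ask_listener):
    """Each trial: randomly assign amp 'X' or 'Y' to the switch, play,
    and record whether the listener identifies it correctly."""
    log, correct = [], 0
    for trial in range(n_trials):
        actual = random.choice(['X', 'Y'])   # assignment re-randomised per trial
        play_via_switchbox(actual)
        guess = ask_listener()               # listener answers 'X' or 'Y'
        log.append((trial, actual, guess))   # records have to be kept
        correct += (guess == actual)
    return correct, log

def chance_probability(correct, n_trials):
    """One-sided binomial: the chance of scoring at least this well by
    guessing alone (p = 0.5 per trial). A small value = real difference."""
    return sum(comb(n_trials, k) for k in range(correct, n_trials + 1)) / 2**n_trials

# e.g. 14 correct out of 16 trials: chance_probability(14, 16) ≈ 0.0021,
# so such a score is very unlikely to be guesswork.
```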
Blind tests that are repeated often enough to reach statistical relevance, and are well documented, should be able to pass the scrutiny of both 'parties'.
It would be wise to choose only interested people for any test that has to be done. Inviting people who do not care about the matter in question would be a bit silly; those taking tests usually HAVE an interest in the matter.
For players, cables and files, other tests are needed. The outline for a 'file/format' test has already been given, so it is pointless to repeat it; it is already documented. The test for ICs is quite easy, similar to the one for amps and easier to do.
Comparing DACs and players requires yet other techniques. Headphones can ONLY be compared subjectively (sound-wise) or be measured; I have already outlined the numerous pitfalls of those tests and measurements in other threads.
What is your idea of a rigorous/relevant enough test? What would be the conditions when comparing amplifiers, digital formats, players?
|
|
jkeny
Been here a while!
Posts: 463
|
Post by jkeny on May 2, 2012 13:14:44 GMT
"I have NO problems whatsoever with you and/or Alex performing blind tests AND sighted tests of my 'device', for that matter... You guys are as serious about audio as I am, so I have no worries about you or Alex being anything other than honest. How can experimenter bias influence?"
By the way the test is set up - what is being tested, and how it is being tested. You already touched on it yourself: if the equipment isn't correctly chosen or set up, or if what's being tested for is not sufficient to reveal differences. You're getting there - some type of qualification for the test subjects is required, and some sort of qualification for the test set-up is required, i.e. both the equipment & the participants have to be shown to be capable of identifying the stated goal of the test, i.e. a calibration phase is required. What's the point in testing for a 0.5dB difference if listeners are not capable of hearing differences at that level, or the equipment masks them? You are introducing a pre-selection bias here! And you also admit that the participants have prior knowledge of what is being tested! A whole minefield of psychology is opened up. This is why rigorous DBTs are conducted!!
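One concrete reading of that 'calibration phase': before the comparison proper, the listener must reliably spot a difference known to be audible (a positive control), otherwise their null results mean nothing. A minimal sketch, again with hypothetical helper names:
```python
# Sketch of a calibration (positive-control) phase: the listener must
# reliably detect a deliberately audible difference before their results
# on the actual comparison are counted.
import random

def calibration_pass(play_with_offset_db, listener_says_different,
                     n_trials=10, required_correct=9):
    """play_with_offset_db(db) plays the reference with a level offset
    of db (0.0 means unaltered); listener_says_different() returns True
    if the listener reports hearing a difference from the reference."""
    correct = 0
    for _ in range(n_trials):
        offset = random.choice([0.0, 0.5])          # 0.5 dB control offset
        play_with_offset_db(offset)
        if listener_says_different() == (offset != 0.0):
            correct += 1
    return correct >= required_correct              # e.g. 9/10 to qualify
```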
|
|
Deleted
Deleted Member
Posts: 0
|
Post by Deleted on May 2, 2012 13:27:23 GMT
"How can experimenter bias influence? By the way the test is set up - what is being tested, and how it is being tested. You already touched on it yourself: if the equipment isn't correctly chosen or set up, or if what's being tested for is not sufficient to reveal differences."
In the proposed test setup (former post), what, in your opinion, needs to change?
"You are introducing a pre-selection bias here! And you also admit that the participants have prior knowledge of what is being tested! A whole minefield of psychology is opened up. This is why rigorous DBTs are conducted!!"
It is perfectly all right to let the listeners know WHICH equipment is compared and HOW the test is set up. They can even play with/fondle/scrutinise the devices BEFORE the test begins. Why? Simply because during the test itself the listener(s) do not KNOW what is playing and can only rely on their hearing. They can even operate the A-B switch as much as they like, and when they like. The only conditions are that the signal levels have to be the same, AND the test is performed a statistically relevant number of times, AND (most important) the A and B positions on the switch are changed, or the listeners are led to believe they can change. In this case the A position of the switch can be amp X in one test and amp Y in the next (or still be X).
The person taking the test and the person controlling the test CANNOT be the same. If the tester and listener are the same person, ONLY a DBT can make it really blind again: the 'intelligence' of the D in that case replaces the independent tester. This requires expensive equipment that possibly needs to be built yourself; it is much easier to involve someone else.
What's flawed? What biases are NOT removed, and why? A DBT is ONLY needed in case the tester and listener are either in the same camp, or have something to gain and are suspected of possibly rigging the test.
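On the 'signal levels have to be the same' precondition (0.1 dB was the figure given earlier in the thread), a quick sketch of the check, assuming RMS voltages measured at the headphone under load:
```python
# Sketch: verify the level-match precondition from measured RMS
# voltages at the headphone (under load), before any listening starts.
from math import log10

def level_difference_db(v_rms_a, v_rms_b):
    """Level difference between amps A and B in dB."""
    return 20 * log10(v_rms_a / v_rms_b)

# e.g. 1.000 V vs 0.989 V -> ~0.096 dB: within the 0.1 dB criterion.
assert abs(level_difference_db(1.000, 0.989)) <= 0.1
```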
|
|
jkeny
Been here a while!
Posts: 463
|
Post by jkeny on May 2, 2012 14:35:04 GMT
"What's flawed? What biases are NOT removed, and why? A DBT is ONLY needed in case the tester and listener are either in the same camp, or have something to gain and are suspected of possibly rigging the test."
Wow! I don't see what your problem in understanding is. Suppose I don't believe solid gold pins on my mains plug will make any difference to the sound, & furthermore they cost 10,000 each, which I think is just duping fools & shouldn't be allowed. Is this going to affect my testing of these devices? Will it perhaps mask any small difference that might be there?
|
|
Deleted
Deleted Member
Posts: 0
|
Post by Deleted on May 2, 2012 15:01:41 GMT
Those that have read the NwAvGuy thread will understand this:
Spock: 'Unknown anomaly ahead, sir. I recommend altering our present course.'
Uhura: 'We are being hailed, sir.'
J.T. Kirk: 'Ignore them, do not answer them, they are trying to trick us.'
Spock: 'We MUST act now, captain.'
J.T. Kirk: 'Evasive action, Mister Sulu.'
Sulu: 'Aye, captain.'
J.T. Kirk: 'Scotty... give her all she's got.'
The above is about as relevant as your answer to my question about WHAT exactly is wrong with the test, WHICH bias is still present, and HOW it can be further improved. A straightforward, short or elaborate, clear answer isn't too much to ask, is it?
I obviously have some severe trouble understanding the problem(s) involved, as I cannot see what is fouling up the test, but it appears to be obvious to you.
If you want this thread to be a constructive and open debate, you can simply spell out what is wrong with it - to you, that is - and is not recognised by me as such. That would be very helpful and constructive.
What exactly have I missed that matters to the test results obtained this way?
|
|
jkeny
Been here a while!
Posts: 463
|
Post by jkeny on May 2, 2012 17:05:05 GMT
I've already given you expectation bias as the major factor that you haven't screened for. Can you tell us your thinking behind this statement? "It would be wise to choose only interested people for any test that has to be done. Inviting people who do not care about the matter in question would be a bit silly; those taking tests usually HAVE an interest in the matter."
|
|
Deleted
Deleted Member
Posts: 0
|
Post by Deleted on May 2, 2012 19:06:49 GMT
WHAT expectation bias factor is there? Can you tell me what the listeners expect, or what the testers expect, and how that is a bias that may alter perception during THIS test?
The listeners can have no expectation, not even when they know what is being tested, simply because they do not KNOW what is playing; they can only listen and state a preference or describe what they hear, or the difference. It is either A or B, and one can expect to be listening to A or B... how can that expectation be goofing up the test?
Tell me, what is the point of doing a 'can you hear the difference' test with a milkman and a housewife who have no interest in the matter and cannot tell a cheap Chinese stereo from a high-end hi-fi system? What is the point of having an 'open' test with people who doubt, or can deliberately foul up a test to prove their point? It is important to have interested and devoted people with good ears and gear to perform a test that has statistical relevance. THAT was my point. I am amazed you didn't get that, knowing so much about testing.
What you refer to has to do with the J-Play test, where I stated I was only interested in those people who had proven beyond all doubt they could discern between the files (I cannot). As explained in that thread, the results of that test say nothing, just like the Italian test says nothing. Both tests were a complete waste of time, simply because these kinds of tests are FLAWED (big time), as explained. Of course, at the same time the cause was supported and again proven beyond all doubt (see below).
You see, it is all about statistical relevance. That is ALSO needed in a BT and DBT; otherwise those tests too have NO meaning. The J-Play test was lacking it, and the only 'thing' that gave it slightly more relevance was the third file, which had to be 'matched by ear' to one of the other files, as it was the same.
The results can be seen from both 'camps':
The statistics say: a 50% score means it cannot be discerned. Since only two people entered the third file as well, and both got it wrong, that says to this camp: they cannot hear it. The two people claiming they could not hear were truthful.
From the subjectivists' point of view the test was a great success: 50% got it right and heard a difference (and all reported similar findings, though not disclosed). The other 50% heard a difference and, due to preference, chose the other file as the best. So (aside from the two who couldn't tell, and those who got the control file wrong) the others could detect a difference with 100% certainty. The fact that the control file wasn't picked correctly can easily be explained away: it may have been degraded in SQ during copying or transfer and simply sounded different, closer to the other file.
So the J-Play test has great proving powers for BOTH parties, yet the conclusions they will write down in their memoirs will be complete opposites. Good test? For sure... totally meaningless IMHO.
Now IF that test had been performed as I outlined in that thread, with an x-out-of-10 score, the same people who got it right, AND those who got it wrong but heard a difference, would have been able to score 90% to 100%.
Anyone confident enough they could pass that test?
In case the score came out closer to 50%, with a certain standard deviation depending on the test size (the number of files), that would have proven something else, more in line with the statistical outcome of the J-Play test as actually held.
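To put a number on that standard deviation - a sketch assuming pure guesswork at 50% per trial:
```python
# Sketch: spread of a pure-guessing score around 50%, shrinking with
# test size - which is why small tests prove very little either way.
from math import sqrt

def chance_score_sd_percent(n_trials, p=0.5):
    """Standard deviation (in percentage points) of the %-correct score
    when every trial is an independent guess with success probability p."""
    return 100 * sqrt(p * (1 - p) / n_trials)

# 10 trials  -> sd ≈ 15.8 points: a 60% score is well within guesswork
# 100 trials -> sd ≈ 5.0 points: a 60% score starts to mean something
```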
It would also clearly reveal WHICH person has the ears and gear (right now they all 'have it', except the two who couldn't tell), take away all doubt, create recognition of differences in perception, end the discussion, and finally both parties could join their efforts.
BUT since the J-Play test, your own tests, M.C., Alex and followers had already found it is REAL, the confirmation that was sought could easily be validated with a flawed test. As already predicted, and explained why.
So a BT is flawed because of expectation bias (which is arguably not present at all), but the J-Play test, which statistically proved a fail, was a great success because at the same time it showed the majority could tell a difference.
When statistically relevant tests have been performed on controversial subjects (cables etc.), I know why the test fails... BT creates stress, which makes the differences go away. It was not a relaxed setting. The listeners were not familiar with the gear. The time was too short, and not the right part of the day/evening. The listener had a bad day / was tired and couldn't focus. They heard correctly at least 50% of the time and didn't get them all wrong. There were too many people, so they could not concentrate. The test was rigged and set up only to discredit. The test gear was not revealing enough. The wrong source material was used. The wrong interlinks were used, which caused degradation. Mains filtering was not present and made the better gear sound as bad as the lesser gear. The switch box altered the sound, as there were added contacts. They can clearly hear it at home, or at this or that guy's house.
Anything BUT admitting that in those cases they really could not tell. The list of 'excuses' is truly endless. "One must know what is playing, otherwise it is too hard to tell."
Yup... BT is flawed, DBT impossible to do, and all because they are rigged or laced with different biases.
I see we have reached the same conclusion again. Shall I lock the thread? Like Star Trek... it's going nowhere in particular, but all over the place.
|
|
jkeny
Been here a while!
Posts: 463
|
Post by jkeny on May 2, 2012 19:22:34 GMT
"WHAT expectation bias factor is there? Can you tell me what the listeners expect, or what the testers expect, and how that is a bias that may alter perception during THIS test? The listeners can have no expectation... It is either A or B, and one can expect to be listening to A or B... how can that expectation be goofing up the test?"
Ah come on - are we really going to play this game? You really don't see it? They can of course say that there is NO DIFFERENCE, thus 'proving' that A is the same as B. Again, you are being blind to psychology, I'm afraid - & I'm not the one who pushes blind tests, you are, so you should be the expert, seeing as you continually promote them? I'm no expert, & yet I can pick major holes in your technique & procedure. Holes big enough to invalidate any such test that you have performed or will perform! Sorry, I don't have a clue what you are talking about - I said that the Jplay test results were no better than guesswork??? It's not the thread that is all over the place, I'm afraid.
|
|
jkeny
Been here a while!
Posts: 463
|
Post by jkeny on May 2, 2012 20:59:19 GMT
|
|
Deleted
Deleted Member
Posts: 0
|
Post by Deleted on May 2, 2012 21:59:17 GMT
"Ah come on - are we really going to play this game? You really don't see it? They can of course say that there is NO DIFFERENCE, thus 'proving' that A is the same as B."
So that is the bias I don't see... that people might declare A similar to B, and that causes them NOT to hear differences. I can see why you would think that, though: the participants in a BT are, by definition, those that seek confirmation and found a way to get it. Strange, as MANY BTs have been taken by people who felt they could easily pick the differences and show the ignorant EEs they are RIGHT. What happens? It appears that when the differences are rather small, they cannot. Blame it on the test, the bias it creates, or circumstances... O.K.
Well... I would NEVER claim A = B. The only conclusions (if the test was done rigidly enough) I would draw would be: A: there is a detectable difference (and I would do measurements to verify); or B: there is a certain technical difference, but too small to detect by ear (and I would do measurements to verify). The implications of the last outcome would be hard on subjectivists, though.
"Again, you are being blind to psychology, I'm afraid - & I'm not the one who pushes blind tests, you are, so you should be the expert, seeing as you continually promote them? I'm no expert, & yet I can pick major holes in your technique & procedure. Holes big enough to invalidate any such test that you have performed or will perform!"
The obvious major holes in my technique and procedure you mention (the ones that invalidate the tests I have performed and will perform): I would be eager to hear from you what they are, instead of just a statement that you can pick them. I might learn in the process. I am no expert either; I just experiment a lot...
On a professional basis, though, I do a lot of electronics testing on fibre-optic equipment (fibre-optic sensor technology, not IT or data transport), data acquisition, and analog signal handling/conditioning. It has to work in harsh EMC environments (railway and industrial surroundings), and I do a lot of official testing where certain parameters must be ruled out and official approval is needed from independent notified bodies such as TÜV. It's what I do for a living: designing/measuring/testing/validating/troubleshooting/upgrading electronics. I have been in audio since I was a young kid, am specialised in analog circuitry and signal handling, and have expertise in repairing/modifying audio and video; I have done so for all kinds of brands and qualities, importers and retailers, and hold several 'diplomas' to officially service equipment. I believe it might give some insight into designing rigorous tests and into how to interpret gathered data and test results. Feel free to brush it aside as not relevant to this discussion.
"Sorry, I don't have a clue what you are talking about - I said that the Jplay test results were no better than guesswork??? It's not the thread that is all over the place, I'm afraid."
You did say so, on the record, but would you also like to officially declare the test a failure - or, seen in the light of the test results, that the consensus would be that the reported differences are not there? Yet this is also what you wrote: "Most of the people perceived track one as brighter; on my computer it also sounds brighter but wimpier, thin - although it sounds a little brighter, it is missing a few harmonics above." Indicating you are not of the opinion there are no differences; you state harmonics are removed. This is verifiable in a nulling test. Want me to do one for you?
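For readers unfamiliar with it, a nulling test subtracts one file from the other sample-by-sample; whatever remains is the actual difference. A minimal sketch (Python with numpy and the soundfile package; it assumes the two files are already sample-aligned and level-matched, which real material rarely is without extra work):
```python
# Sketch of the nulling test: subtract the two files and report the
# residual level. Alignment/level-matching deliberately omitted here.
import numpy as np
import soundfile as sf   # pip install soundfile

def null_residual_dbfs(path_a, path_b):
    a, rate_a = sf.read(path_a)
    b, rate_b = sf.read(path_b)
    assert rate_a == rate_b, "sample rates must match"
    n = min(len(a), len(b))
    residual = a[:n] - b[:n]
    rms = np.sqrt(np.mean(residual ** 2))
    return 20 * np.log10(rms) if rms > 0 else float('-inf')

# A residual at or below the dither floor means no harmonics were added
# or removed; a tone-like residual would support the 'missing harmonics'
# claim.
```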
|
|
Deleted
Deleted Member
Posts: 0
|
Post by Deleted on May 2, 2012 22:00:31 GMT
I clearly stated in the original thread on C.A. that both supplied .wav files had a noise floor of around -75dB, and supplied a screengrab from Sound Forge 9 showing the noise floor. I was poo-pooed. A typical reel-to-reel recorder is likely to have a noise floor of around that figure if using noise reduction. The "experts" on C.A. didn't seem to understand that this made it pointless trying to reliably hear triangular dither a further 30 to 45dB below the recording's noise floor. They didn't even seem to know what I was talking about. I simply gave up. Julf agreed with me when I contacted him via email about this. So much for "experts": many may have a good general background, but few have in-depth knowledge outside the little area they specialise in. IF the clever people who designed this flawed test had used a well-recorded high-resolution digital recording, the results MAY have been statistically meaningful.
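The arithmetic behind Alex's point: noise powers add, not decibels, so dither sitting 30 to 45dB below the recording's own noise floor barely moves the total. A quick check, with -110 dB as an illustrative dither figure:
```python
# Sketch: how much a dither floor far below the recording's noise
# floor raises the total noise. Powers add, so convert dB -> power.
from math import log10

def combined_noise_db(recording_floor_db, dither_floor_db):
    total_power = 10**(recording_floor_db / 10) + 10**(dither_floor_db / 10)
    return 10 * log10(total_power)

# combined_noise_db(-75, -110) ≈ -74.9986 dB: the dither raises the
# -75 dB floor by ~0.001 dB, hopelessly far below audibility.
```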
I think that continuing this thread further is likely to be a waste of time, just as it has been on every other forum where this subject has been discussed. If Frans and John both agree, I will lock this thread. Alex
|
|
Deleted
Deleted Member
Posts: 0
|
Post by Deleted on May 2, 2012 22:01:09 GMT
Glanced through it and agree with the comments. Some tests were not performed well. Few tips in there for me, I am afraid. For you, on the other hand?
|
|
Deleted
Deleted Member
Posts: 0
|
Post by Deleted on May 2, 2012 22:09:09 GMT
"I clearly stated in the original thread on C.A. that both supplied .wav files had a noise floor of around -75dB... IF the clever people who designed this flawed test had used a well-recorded high-resolution digital recording, the results MAY have been statistically meaningful."
Alex, the test is flawed for several reasons (just like the J-Play test); I addressed most of them in those threads already. The ones devising and posting the 'test' are amateurs with no background in any of the areas needed to conduct a proper test, and they even stated it was just for fun. Hardly a test to which any 'expert' would attach even the slightest value. No reason not to conduct a better test yourself. Just mind the statistical relevance...
|
|
Deleted
Deleted Member
Posts: 0
|
Post by Deleted on May 2, 2012 22:19:38 GMT
Alex, I am still hoping to get a kind answer to my question about the obvious holes John sees.
Since John and I are the only ones posting here (and you occasionally, and Dale perhaps), locking it will only block John's replies (and Dale's). There is an impasse, but I hoped to find out why John considers BT flawed, and the reasoning behind it.
The controversy won't go away by locking a thread, IMO. It'll fizzle out by itself, as things are almost clear enough to me. Once posting stops, the thread buries itself in the annals. I am (almost?) done with it myself, as the replies are hardly productive for me.
|
|
jkeny
Been here a while!
Posts: 463
|
Post by jkeny on May 2, 2012 22:21:01 GMT
Frans, your incessant repetition of the same dismissal of the biases I have presented to you shows that you are unwilling to ensure that a test is rigorous enough to even qualify for consideration in a statistical evaluation. You would be laughed at if you presented such a test as you are describing without accounting for the biases I have identified. It reminds me of the RMAA tests you posted about some time ago!
Your statement about your professional testing experience reveals that you have never had to deal with trying to eliminate possible biases in testing, which is plainly evident in all your answers so far. Your CV only serves to confirm this.
Yes, I stand by everything I have already said about the Jplay test results, & also my own findings in my listening.
You really don't want to get into a discussion on nulling tests now!
|
|
Deleted
Deleted Member
Posts: 0
|
Post by Deleted on May 2, 2012 22:21:18 GMT
Frans, at least one of those participating, AND refusing to accept what I was trying to say, was a Senior Project Manager. Far too many qualified people refuse to accept their limitations in areas outside their own little fields of expertise. I also find it interesting that in another recent thread on C.A., regarding a linear PSU for a P.C., I was the only one to take issue with the claim of only a 0.0001% change in voltage from no load to maximum designed load, especially with the large temperature variations. Only 0.012 millivolts of change on a 12V rail? Alex
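For the record, the arithmetic on the questioned claim (a quick sanity check, not from the original C.A. thread):
```python
# Sanity check of the regulation claim: 0.0001% of a 12 V rail.
rail_volts = 12.0
claimed_fraction = 0.0001 / 100          # 0.0001% as a fraction (1 ppm)
change_volts = rail_volts * claimed_fraction
print(change_volts)                      # 1.2e-05 V = 12 µV = 0.012 mV
```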
|
|