Well, I have played with Grok and asked it some quite complex questions. I did not expect too much, but in some cases I was surprised by how precise the answers were. I'll give you one example:
If there was an Earth-size planet where the asteroid belt is now, and it was destroyed by a sudden explosion (not a slow breakup) 1 million years ago, how much of its original mass, in %, would still be found there?
It took Grok almost 10 minutes to do the calculations, and while it was working it displayed the reasoning it used and, finally, the source code it used for the calculations.
For those interested: the result was 7%, which does not match what we think the belt contains today in consolidated mass (it's much less).
I am sure a human proficient in celestial mechanics, the geology of a planetary crust, and a few other relevant fields could have come up with the same result, but not in 10 minutes. And I am not even on the paid tier, just a freeloader.
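To see how far the 7% figure is from today's belt, here is a quick back-of-envelope check. The mass values are rough, commonly cited estimates that I am supplying myself, not numbers from Grok's answer:

```python
# Rough check: 7% of an Earth-mass planet vs. the asteroid belt's actual mass.
# Both constants are approximate, commonly cited values (my assumptions).

EARTH_MASS_KG = 5.97e24   # mass of Earth, used as the hypothetical planet's mass
BELT_MASS_KG = 3.0e21     # approximate total mass of today's asteroid belt

predicted_remaining = 0.07 * EARTH_MASS_KG       # the 7% answer, in kg
actual_fraction = BELT_MASS_KG / EARTH_MASS_KG   # belt mass as a fraction of Earth's

print(f"7% of Earth mass:  {predicted_remaining:.2e} kg")
print(f"Belt mass today:   {BELT_MASS_KG:.2e} kg")
print(f"Belt / Earth mass: {actual_fraction:.2%}")
```

With these estimates the real belt holds only about 0.05% of an Earth mass, so the 7% answer overshoots what is actually there by more than two orders of magnitude, which matches the "it's much less" remark above.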
This is a down-to-earth question that relies mostly on known laws of physics and well-established data about the objects involved.
The picture is very different if you ask about, e.g., the probability that telepathy is real (100%), or remote viewing (100%), or the visitation of aliens in this solar system today (as good as 100%): it will give you ridiculously low values that are totally off kilter. From this I deduce that it reflects mainstream accepted positions, and that it will not learn if you correct it.

AI today has a very short memory. It may remember the session immediately before the current one, but not what happened yesterday, and not even what happened 5 minutes ago if a different user supplied that information. I tried this with verifiable data that it got wrong the first time: I corrected it, but in a follow-up I was served the same wrong info again.