NQ contains 307,372 training examples and 7,830 development examples, and we withhold a further 7,842 examples for testing. In the paper, we demonstrate a human upper bound of 87% F1 on the long ...