The project provides lockfiles for every supported package manager. If you only have Python and a JS runtime, then you may instead run ./hatch_build.py. This will transparently invoke one of the ...
VisualWebArena is a realistic and diverse benchmark for evaluating multimodal autonomous language agents. It comprises of a set of diverse and complex web-based visual tasks that evaluate various ...