LLaVA-based UI automation is still difficult, but for now
UI automation based on detected visual components works:
https://files.catbox.moe/u3124h.png
```
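screenshot                              # capture the current screen
run_yolo "screenshot.png"               # detect UI elements in the capture
find_element_by_type 3 # TextEdit
focus                                   # focus the element found above
send_keys "hello from ui automation"
find_element_by_type 11 # TextButton
click                                   # click the element found above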
```
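To make the `find_element_by_type` step more concrete, here is a rough Python sketch of how it could pick an element out of the detector output. Everything in it is an assumption for illustration: the `Detection` layout, the `min_conf` threshold, the `center` helper, and the sample boxes are hypothetical, and only the class ids 3 (TextEdit) and 11 (TextButton) come from the script above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Detection:
    """One detected UI element (hypothetical layout of the detector output)."""
    class_id: int
    confidence: float
    box: Tuple[int, int, int, int]  # x1, y1, x2, y2 in screen pixels


def find_element_by_type(
    detections: List[Detection], class_id: int, min_conf: float = 0.5
) -> Optional[Detection]:
    """Return the most confident detection of the given class, or None."""
    candidates = [
        d for d in detections
        if d.class_id == class_id and d.confidence >= min_conf
    ]
    return max(candidates, key=lambda d: d.confidence, default=None)


def center(detection: Detection) -> Tuple[int, int]:
    """Point to click or focus: the middle of the bounding box."""
    x1, y1, x2, y2 = detection.box
    return ((x1 + x2) // 2, (y1 + y2) // 2)


# Made-up detections standing in for a real detector run.
detections = [
    Detection(3, 0.91, (100, 200, 500, 240)),   # TextEdit
    Detection(11, 0.88, (520, 200, 600, 240)),  # TextButton
    Detection(11, 0.40, (10, 10, 50, 30)),      # low-confidence TextButton
]

text_edit = find_element_by_type(detections, 3)
button = find_element_by_type(detections, 11)
print(center(text_edit), center(button))
```

The `focus` and `click` commands would then act on the returned center point. Picking the highest-confidence match is just one possible policy; with several buttons on screen, a generic class like TextButton is ambiguous.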
A possible workaround is to add more specific element classes to the detector, such as:
- login button
- register button
- submit form button
- write note textarea
etc.
Then even a text-only LLM could interpret these classes and decide which actions to perform.
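As a rough sketch of that idea: the detected elements could be turned into a plain-text list for the LLM, and its reply parsed back into an action. The reply format (`click <index>` / `type <index> <text>`), the function names, and the sample boxes are all hypothetical, and no real LLM is called here.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # x1, y1, x2, y2 in screen pixels


def describe_elements(elements: List[Tuple[str, Box]]) -> str:
    """Render labeled detections as text that can be pasted into a prompt."""
    lines = []
    for index, (label, (x1, y1, x2, y2)) in enumerate(elements):
        cx, cy = (x1 + x2) // 2, (y1 + y2) // 2
        lines.append(f"[{index}] {label} at ({cx}, {cy})")
    return "\n".join(lines)


def parse_action(reply: str, element_count: int) -> Tuple[str, int, str]:
    """Parse a reply such as 'click 1' or 'type 0 some text'."""
    parts = reply.strip().split(maxsplit=2)
    if len(parts) < 2 or not parts[1].isdecimal():
        raise ValueError(f"unrecognized reply: {reply!r}")
    verb, index = parts[0].lower(), int(parts[1])
    if verb not in {"click", "type"}:
        raise ValueError(f"unknown action: {verb!r}")
    if index >= element_count:
        raise ValueError(f"no element with index {index}")
    text = parts[2] if len(parts) == 3 else ""
    return verb, index, text


# Made-up detections using the more specific classes suggested above.
elements = [
    ("write note textarea", (100, 200, 500, 240)),
    ("submit form button", (520, 200, 600, 240)),
]

print(describe_elements(elements))
print(parse_action("type 0 hello from ui automation", len(elements)))
print(parse_action("click 1", len(elements)))
```

With labels like "submit form button" instead of a generic TextButton, the model only needs to read text to choose a target, so no vision model is required for the decision step.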