Updated Python Scraper for EmulationStation
Tagged: python scraper emulationstation
thadmiller | Participant | 01/22/2015 at 16:28 | Post count: 12
I added a number of updates (and some removals) to the elpendor ES-scraper, also adding in chugcup’s title matching algorithm. It’s been working well for me (and for one friend who’s also been testing), so I thought I would share.
Instructions are on https://github.com/thadmiller/ES-scraper. The simple version is:
– before running, make sure you have updated the RetroPie Setup script and binaries (the initial 2.x version had invalid XML in es_systems.cfg).
$ sudo apt-get install python-imaging
$ git clone https://github.com/thadmiller/ES-scraper.git
$ cd ES-scraper
$ python scraper.py -pisize -p
(remove the -p if you want to scrape all platforms; add -l if you want to run it in the fully automated “I’m feeling lucky” mode).
Thad
brakanje | Participant | 01/22/2015 at 18:37 | Post count: 60
login as: pi
firstname.lastname@example.org's password:
Linux raspberrypi 3.18.3+ #740 PREEMPT Wed Jan 21 23:55:56 GMT 2015 armv6l
The programs included with the Debian GNU/Linux system are free software; the exact distribution terms for each program are described in the individual files in /usr/share/doc/*/copyright.
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent permitted by applicable law.
Last login: Thu Jan 22 16:47:59 2015
Thursday, 22 January 2015, 5:33:34 pm UTC
Linux 3.18.3+ armv6l GNU/Linux
Filesystem Size Used Avail Use% Mounted on
rootfs 29G 7.1G 21G 26% /
Uptime.............: 0 days, 00h45m51s
Memory.............: 64268kB (Free) / 250872kB (Total)
Running Processes..: 76
IP Address.........: 192.168.1.129
The RetroPie Project, www.petrockblock.com
pi@raspberrypi /ES-scraper $ python scraper.py -pisize -l
Traceback (most recent call last):
  File "scraper.py", line 586, in <module>
    ES_systems = readConfig(config)
  File "scraper.py", line 84, in readConfig
    config = ET.parse(file)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1183, in parse
    tree.parse(source, parser)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 656, in parse
    parser.feed(data)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1643, in feed
    self._raiseerror(v)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1507, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 101, column 237
pi@raspberrypi /ES-scraper $
I am guessing I did something wrong but am not exactly sure what.
thadmiller | Participant | 01/22/2015 at 19:46 | Post count: 12
The error you encountered is due to invalid XML in es_systems.cfg.
Have you updated the RetroPie Setup script and Binaries?
$ sudo RetroPie-Setup/retropie_setup.sh
UPDATE RetroPie Setup script
--restart script--
UPDATE RetroPie Binaries
I believe that will fix your issue (and, in my experience, it also fixes many other issues with some of the emulators). Alternatively, you could try to fix the XML manually – just go to the location the script states: line 101, column 237 (but be warned, there will probably be a number of issues with the initial config).
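For the manual route, a small helper can show the offending line with a marker under the reported column. This is only a sketch, not part of ES-scraper: `describe_parse_error` is a hypothetical name, and it relies on the `position` attribute that `xml.etree.ElementTree.ParseError` provides.

```python
import xml.etree.ElementTree as ET


def describe_parse_error(xml_text):
    """Return None if xml_text parses, else a short report of where it failed."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # line is 1-based, column is 0-based
        lines = xml_text.splitlines()
        bad_line = lines[line - 1] if 0 < line <= len(lines) else ""
        pointer = " " * column + "^"
        return "%s\n%s\n%s" % (err, bad_line, pointer)
    return None


if __name__ == "__main__":
    # A bare ampersand is one common cause of "invalid token" errors.
    sample = "<systemList>\n  <system><name>nes & snes</name></system>\n</systemList>"
    print(describe_parse_error(sample))
```

To use it on a real config, read the file into a string first and pass that in; the report prints the parser's message, the bad line, and a caret under the column.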
Thad
brakanje | Participant | 01/22/2015 at 22:42 | Post count: 60
I am using a heavily self-modified XML that is a combo of the PC XML and the XML that came with the image. As far as NP++ and I can see, the XML is valid though. I’ll have a look again to see what may be invalid about it.
brakanje | Participant | 01/22/2015 at 22:58 | Post count: 60
pi@raspberrypi /ES-scraper $ python scraper.py -pisize -l
Traceback (most recent call last):
  File "scraper.py", line 586, in <module>
    ES_systems = readConfig(config)
  File "scraper.py", line 90, in readConfig
    platform = child.find('platform').text
AttributeError: 'NoneType' object has no attribute 'text'
Got through all of the errors where your code didn’t like double hyphens and double ampersands, and even removed any place where my theme was empty, and I’m still getting this error, although now it’s not giving me any specifics.
I would suggest that perhaps you rig your parser to ignore comments, as that was half the trouble.
thadmiller | Participant | 01/23/2015 at 04:32 | Post count: 12
Your last error is due to a <system> (in your es_systems.cfg) missing a <platform> element – this is needed to determine what platform to scrape from.
I have updated the script to ignore any systems missing information needed for scraping (this won’t fix the fact that information is missing, but should be able to skip over it, allowing any other correct systems to be scraped), you can easily get the updated script by:
$ cd ES-scraper
$ git pull
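The skip-on-missing-data behaviour described above could look roughly like the following. It is a sketch rather than the script's actual code: `read_systems` and `REQUIRED` are hypothetical names, and the list of required tags is taken from the fields mentioned in this thread (name, path, extension, platform).

```python
import xml.etree.ElementTree as ET

# Tags assumed to be needed for scraping; adjust to taste.
REQUIRED = ("name", "path", "extension", "platform")


def read_systems(xml_text):
    """Return one dict per <system> that has every required field filled in."""
    systems = []
    for node in ET.fromstring(xml_text).iter("system"):
        # findtext() returns None for a missing element and "" for an empty one.
        fields = {tag: (node.findtext(tag) or "").strip() for tag in REQUIRED}
        missing = [tag for tag in REQUIRED if not fields[tag]]
        if missing:
            print("skipping system %r: missing %s"
                  % (fields["name"], ", ".join(missing)))
            continue
        systems.append(fields)
    return systems
```

Checking for an empty string as well as a missing element avoids the `'NoneType' object has no attribute 'text'` error without hiding which system was skipped.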
As for the double hyphens within comments, that is actually invalid XML, and it’s Python’s xml.etree.ElementTree parser (the XML parser used by the original author) choking on the invalid characters. I may look at trying a different, more forgiving XML parser or parsing the file manually, but either of those options would require rewriting a decent chunk of the script.
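Short of switching parsers, one workaround is to clean up the comments before handing the text to ElementTree. The sketch below is an assumption about how that might be done, not something the script does; `sanitize_comments` is a hypothetical helper, and the regular expression is naive (it would also rewrite comment-like text inside a CDATA section).

```python
import re
import xml.etree.ElementTree as ET

COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)


def _fix_comment(match):
    # Collapse runs of hyphens, which XML forbids inside a comment.
    body = re.sub(r"-{2,}", "-", match.group(1))
    if body.endswith("-"):
        body += " "  # a comment must not end in "--->"
    return "<!--" + body + "-->"


def sanitize_comments(xml_text):
    """Return xml_text with double hyphens inside comments collapsed."""
    return COMMENT.sub(_fix_comment, xml_text)


if __name__ == "__main__":
    raw = "<systemList><!-- try --fullscreen here --><system/></systemList>"
    try:
        ET.fromstring(raw)
    except ET.ParseError as err:
        print("raw text fails: %s" % err)
    print(ET.fromstring(sanitize_comments(raw)).tag)
```

Only comment bodies are touched, so double hyphens in element text such as a `<command>` line are left alone.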
For an alternative, quicker solution, I’m going to add arguments to allow a path, platform, and extensions to be specified – this will be slightly more manual, but will allow the script to run, no matter how old or broken the es_systems.cfg file is.
Thad
brakanje | Participant | 01/23/2015 at 05:48 | Post count: 60
Hrm. When I was at uni we were told that no matter what was in a comment, it was valid code, as a comment should never be run by any parser/interpreter/compiler. That aside, even if those double hyphens are not in a comment, your script seems to choke on them, probably because it’s still not valid XML; they are, however, needed when passing some Linux directives. Unless of course the ES team wrote their parser to replace something else with the double hyphen needed. But that’s a slightly different topic. :P
Just to be clear, this whole time I have not been trying to be needy or demanding. I apologize if I came off poorly at all.
thadmiller | Participant | 01/23/2015 at 06:09 | Post count: 12
I agree completely, any text within a comment should be fine, but unfortunately that’s not the case: http://www.w3.org/TR/REC-xml/#sec-comments. If you ask me, it’s a “broken” definition. I suspect others feel the same way, and that’s probably why some other XML parsers ignore the -- within comments.
And you’re not coming off needy or demanding – I wrote the script (or modified, anyway) because I was unhappy with the speed and poor matching of every other scraper I could find, and I want it to be helpful for others too.
brakanje | Participant | 01/23/2015 at 07:42 | Post count: 60
OK, I had it running. My PuTTY window crashed, so I reopened it and tried running your script: “All Done”, but massive amounts of missing scrapes. So I deleted my gamelist directory and ran it again. Still “All Done”. So I re-fetched it from git and got the same thing. I dunno how I could have broken it when I finally had it working. >.<
thadmiller | Participant | 01/23/2015 at 15:34 | Post count: 12
I’ve updated the script with optional arguments to ignore es_systems.cfg if the -name and -platform are manually specified. Ex:
$ python scraper.py -pisize -l -name mame -platform arcade
It’s a slightly more manual operation, but if these arguments are included, es_systems.cfg (and any of its issues) will be completely ignored.
@brakanje, I don’t think you’re doing anything wrong. I’m only using TheGamesDB.net as the source, so it’s likely some of your games are missing (about 10 of my 170 games didn’t exist in their library), but if you have massive amounts of missing scrapes, I suspect something else may be the issue. If you wouldn’t mind running again with the -v option and capturing the output, I’d like to take a look at it to figure out what’s going on.
thanks
brakanje | Participant | 01/23/2015 at 17:47 | Post count: 60
pi@raspberrypi /ES-scraper $ python scraper.py -pisize -l -v
ES-scraper, a scraper for EmulationStation
Using Raspberry Pi boxart size: (375px x 350px)
Verbose mode enabled.
All done!
pi@raspberrypi /ES-scraper $
Not sure how much that will help. That is some kind of verbose log indeed. :P
thadmiller | Participant | 01/23/2015 at 18:08 | Post count: 12
hah, true – looks like it didn’t find any ROMs at all.
In that case, maybe you want to try the manual method (ignoring es_systems.cfg entirely)
$ python scraper.py -pisize -v -l -name mame -platform arcade -rompath ~/RetroPie/roms/mame -ext ".zip .ZIP"
(those would be the default values on my retropie installation searching for mame, obviously change your name, platform, rompath, and ext as necessary)
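As an illustration of how such manual arguments can stand in for es_systems.cfg, here is a sketch using `argparse`. Whether the script itself uses `argparse` is not stated in this thread, so treat that as an assumption; `build_parser` and `manual_system` are hypothetical names, and only the four options from the command above are modelled.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="manual-mode sketch")
    parser.add_argument("-name", help="system name, e.g. mame")
    parser.add_argument("-platform", help="platform to scrape, e.g. arcade")
    parser.add_argument("-rompath", help="directory holding the ROMs")
    parser.add_argument("-ext", help='space-separated extensions, e.g. ".zip .ZIP"')
    return parser


def manual_system(args):
    """Return a system record built from the arguments, or None.

    None means -name or -platform was not given, so the caller would
    fall back to reading es_systems.cfg.
    """
    if not (args.name and args.platform):
        return None
    return {
        "name": args.name,
        "platform": args.platform,
        "path": args.rompath,
        "extensions": (args.ext or "").split(),
    }


if __name__ == "__main__":
    print(manual_system(build_parser().parse_args()))
```

Quoting the extension list on the command line, as in the example above, keeps it as a single argument that is then split on whitespace.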
In the meantime, I’ll add some more logging so we can see what platforms and paths it found (the verbose section, right now, just affects the scraping, but it looks like you’re having issues before it starts scraping at all).
thadmiller | Participant | 01/23/2015 at 18:10 | Post count: 12
Also, I just added those new arguments a couple hours ago – if you haven’t already, you’ll want to do another git pull to get those updates:
$ cd ES-scraper
$ git pull
brakanje | Participant | 01/23/2015 at 18:30 | Post count: 60
K so manual mode is working. Not sure why auto mode wouldn’t be. Must be some error with my systems though, if all we did was bypass systems. :P
thadmiller | Participant | 01/23/2015 at 20:13 | Post count: 12
I added some logging to the verbose mode while the script parses the es_systems.cfg file. For each <system> it should list name, path, platform, and ext (all the required data) along with the number of files found within the path (not exactly the ROM count, since it looks for any file rather than matching the extension, but it should be good enough for debugging).
Just update the script with a
$ git pull
and run with the -v flag
$ python scraper.py -pisize -v -l
brakanje | Participant | 01/23/2015 at 20:51 | Post count: 60
I’m gonna say I think it would be handy if there was a way to interrupt the script. :P I’ve been scraping for like an hour on NES and I’m going to be getting picked up in like half an hour. :P
thadmiller | Participant | 01/23/2015 at 21:02 | Post count: 12
you should be able to hit CTRL-C and it will exit cleanly (saving what has already been completed)
brakanje | Participant | 01/24/2015 at 01:24 | Post count: 60
When I hit ctrl+c it interrupts just that ROM and moves to the next one, sadly.
brakanje | Participant | 01/24/2015 at 01:27 | Post count: 60
pi@raspberrypi /ES-scraper $ python scraper.py -pisize -v -l
SYSTEM: Name: amiga Path: /home/pi/RetroPie/roms/amiga Platform: amiga Ext: .adf .ADF Potential ROMs: 0
SYSTEM: Name: atari800 Path: /home/pi/RetroPie/roms/atari800 Platform: atari800 Ext: .xex .XEX Potential ROMs: 0
SYSTEM: Name: atari2600 Path: /home/pi/RetroPie/roms/atari2600 Platform: atari2600 Ext: .a26 .A26 .bin .BIN .rom .ROM .zip .ZIP .gz .GZ Potential ROMs: 0
SYSTEM: Name: atari5200 Path: /home/pi/RetroPie/roms/atari5200 Platform: atari5200 Ext: .a26 .A26 .bin .BIN .rom .ROM .zip .ZIP .gz .GZ Potential ROMs: 0
SYSTEM: Name: atariststefalcon Path: /home/pi/RetroPie/roms/atariststefalcon Platform: atarist Ext: .st .ST .img .IMG .rom .ROM .ipf .IPF Potential ROMs: 0
SYSTEM: Name: macintosh Path: /home/pi/RetroPie/roms/macintosh Platform: mac Ext: .txt Potential ROMs: 0
SYSTEM: Name: c64 Path: /home/pi/RetroPie/roms/c64 Platform: c64 Ext: .crt .CRT .d64 .D64 .g64 .G64 .t64 .T64 .tap .TAP .x64 .X64 .zip .ZIP Potential ROMs: 0
SYSTEM: Name: amstradcpc Path: /home/pi/RetroPie/roms/amstradcpc Platform: cpc Ext: .cpc .CPC .dsk .DSK Potential ROMs: 0
SYSTEM: Name: fba Path: /home/pi/RetroPie/roms/fba Platform: arcade Ext: .zip .ZIP .fba .FBA Potential ROMs: 0
SYSTEM: Name: gb Path: /home/pi/RetroPie/roms/gb Platform: gb Ext: .gb .GB Potential ROMs: 0
SYSTEM: Name: gba Path: /home/pi/RetroPie/roms/gba Platform: gba Ext: .gba .GBA Potential ROMs: 0
SYSTEM: Name: sgb2 Path: /home/pi/RetroPie/roms/gbc Platform: gbc Ext: .gbc .GBC Potential ROMs: 0
SYSTEM: Name: gamegear Path: /home/pi/RetroPie/roms/gamegear Platform: gamegear Ext: .gg .GG Potential ROMs: 0
SYSTEM: Name: intellivision Path: /home/pi/RetroPie/roms/intellivision Platform: intellivision Ext: .int .INT .bin .BIN Potential ROMs: 0
SYSTEM: Name: mame Path: /home/pi/RetroPie/roms/mame Platform: arcade Ext: .zip .ZIP Potential ROMs: 0
SYSTEM: Name: neogeo Path: /home/pi/RetroPie/roms/neogeo Platform: neogeo Ext: .zip .ZIP .fba .FBA Potential ROMs: 0
SYSTEM: Name: nes Path: /home/pi/RetroPie/roms/nes Platform: nes Ext: .nes .unf .NES .UNF Potential ROMs: 0
SYSTEM: Name: n64 Path: /home/pi/RetroPie/roms/n64 Platform: n64 Ext: .z64 .Z64 .n64 .N64 .v64 .V64 Potential ROMs: 0
SYSTEM: Name: pcengine Path: /home/pi/RetroPie/roms/pcengine Platform: pcengine Ext: .pce .PCE Potential ROMs: 0
SYSTEM: Name: scummvm Path: /home/pi/RetroPie/roms/scummvm Platform: pc Ext: .exe .EXE Potential ROMs: 0
SYSTEM: Name: mastersystem Path: /home/pi/RetroPie/roms/mastersystem Platform: mastersystem Ext: .sms .SMS Potential ROMs: 0
SYSTEM: Name: megadrive Path: /home/pi/RetroPie/roms/megadrive Platform: genesis,megadrive Ext: .smd .SMD .bin .BIN .gen .GEN .md .MD .zip .ZIP Potential ROMs: 0
SYSTEM: Name: segacd Path: /home/pi/RetroPie/roms/segacd Platform: segacd Ext: .smd .SMD .bin .BIN .md .MD .zip .ZIP .iso .ISO Potential ROMs: 0
SYSTEM: Name: sega32x Path: /home/pi/RetroPie/roms/sega32x Platform: sega32x Ext: .32x .32X .smd .SMD .bin .BIN .md .MD .zip .ZIP Potential ROMs: 0
SYSTEM: Name: psx Path: /home/pi/RetroPie/roms/psx Platform: psx Ext: .img .IMG .7z .7Z .pbp .PBP .bin .BIN .cue .CUE Potential ROMs: 0
SYSTEM: Name: snes Path: /home/pi/RetroPie/roms/snes Platform: snes Ext: .smc .sfc .fig .swc .SMC .SFC .FIG .SWC Potential ROMs: 0
SYSTEM: Name: zxspectrum Path: /home/pi/RetroPie/roms/zxspectrum Platform: zxspectrum Ext: .z80 .Z80 .ipf .IPF Potential ROMs: 0
SYSTEM: Name: vboy Path: /home/pi/RetroPie/roms/vboy Platform: nintendo-virtual-boy Ext: .vb .VB Potential ROMs: 0
SYSTEM: Name: esconfig Path: /home/pi/RetroPie/roms/esconfig Platform: ignore Ext: .py .PY Potential ROMs: 0
ES-scraper, a scraper for EmulationStation
Using Raspberry Pi boxart size: (375px x 350px)
Verbose mode enabled.
All done!
pi@raspberrypi /ES-scraper $
More details but same result, and no clear fix, as everything looks right.
brakanje | Participant | 01/24/2015 at 18:47 | Post count: 60
Hey, I just discovered your script seems to save images as JPEG or PNG but then reports them as JPG to the gamelist, so all the images come up blank.
thadmiller | Participant | 01/25/2015 at 06:47 | Post count: 12
I’ll update CTRL-C to cancel out of all scraping.
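For illustration only: Ctrl-C skipping just one ROM is typical of a loop that wraps each ROM in a bare `except:`, which also catches `KeyboardInterrupt`. That diagnosis is an assumption, since the script's loop isn't shown in this thread. A sketch of a loop that stops the whole run but keeps finished work, with hypothetical names `scrape_all` and `scrape_one`:

```python
def scrape_all(roms, scrape_one):
    """Scrape each ROM; stop the whole run on Ctrl-C but keep finished results."""
    finished = []
    try:
        for rom in roms:
            try:
                finished.append(scrape_one(rom))
            except Exception as err:
                # Exception does not include KeyboardInterrupt, so Ctrl-C
                # is not swallowed here the way a bare "except:" would.
                print("skipping %s: %s" % (rom, err))
    except KeyboardInterrupt:
        print("interrupted after %d ROMs" % len(finished))
    return finished
```

The caller can then write out whatever is in the returned list, so an interrupted run still saves what was already scraped.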
So, looking at your verbose output, it looks like all your paths are correct(?), but zero files were found in each directory. That explains why nothing is scraped, but I don’t know why it would be finding zero files unless the paths are wrong.
It’s odd that the gamelist contains a different extension than the actual file – it’s the same string that saves the file (technically, a rename, but whatever) that is written to the XML file. I’m not able to reproduce this; could you let me know a platform and ROM that didn’t work for you?
brakanje | Participant | 01/25/2015 at 14:43 | Post count: 60
It happened when I ran it on GoodNES. I used NP++ and ImageMagick to rectify it, which is no big thing, but figured I’d give you the heads up.
brakanje | Participant | 01/26/2015 at 06:39 | Post count: 60
It just occurred to me. Could it see no files in the ROM directory and decide not to bother scanning the subdirectories?
thadmiller | Participant | 01/26/2015 at 21:39 | Post count: 12
Ah, yes, subfolders would cause an (easily fixable) issue. I didn’t even know ES would process them, but now that I know, it’s been fixed.
$ git pull
and the scraper should be okay with subfolders.
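For reference, the usual way to pick up files in subfolders is `os.walk`. The following is a sketch of that idea, not the change actually committed; `find_roms` is a hypothetical name, and unlike the "Potential ROMs" count in the verbose output it filters by extension.

```python
import os


def find_roms(rom_dir, extensions):
    """Return paths under rom_dir (subfolders included) whose extension matches."""
    wanted = tuple(ext.lower() for ext in extensions)
    found = []
    for folder, subdirs, files in os.walk(rom_dir):
        subdirs.sort()  # os.walk descends into subdirs in this list's order
        for name in sorted(files):
            if name.lower().endswith(wanted):
                found.append(os.path.join(folder, name))
    return found
```

Sorting is optional; it only makes the processing order predictable instead of leaving it to the filesystem.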
I’d still like to rectify the image discrepancy you ran into, but I’m not able to reproduce, and can’t think of any reason why it would do that. However, I wonder if paths, now being correct for your subfolders, might straighten things out a bit (but you’ll either want to add the -f parameter, or remove your old gamelist.xml files so you don’t have old stuff hanging around).
brakanje | Participant | 01/27/2015 at 03:55 | Post count: 60
When I’m done reimaging I’ll let you know if it happens again.
brakanje | Participant | 01/28/2015 at 19:06 | Post count: 60
Did it again and this time no issue. Very confusing. Is there some reason that it doesn’t go through folders in alphabetical order?
thadmiller | Participant | 01/28/2015 at 20:20 | Post count: 12
I’m going to guess the “confusing” part is the image extension discrepancy. If so, this may be due to other scrapers: I put in the effort to make sure this scraper plays nicely with the built-in ES scraper, but if another scraper was run, it could have created conflicting entries. The fact that running it on a fresh install produces good images seems to confirm that possibility. But if you do find that my scraper is causing an issue, I’d like to know the steps to reproduce, so I can fix it.
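One way to narrow down a report like this is to list the gamelist.xml entries whose image file is not actually on disk. The sketch below assumes the usual `<gameList>`/`<game>` layout with `<name>` and `<image>` children, and resolves relative image paths against the folder holding gamelist.xml, which may not match how EmulationStation resolves them. `missing_images` is a hypothetical helper, not part of the scraper.

```python
import os
import xml.etree.ElementTree as ET


def missing_images(gamelist_path):
    """Return (game name, image path) pairs whose image file is not on disk."""
    problems = []
    base = os.path.dirname(os.path.abspath(gamelist_path))
    for game in ET.parse(gamelist_path).getroot().iter("game"):
        image = (game.findtext("image") or "").strip()
        if not image:
            continue  # no image recorded for this game
        path = os.path.expanduser(image)
        if not os.path.isabs(path):
            path = os.path.join(base, path)
        if not os.path.isfile(path):
            problems.append((game.findtext("name"), image))
    return problems
```

An entry that points at a .jpg while only a .png exists would show up in the returned list, which gives a concrete ROM and platform to reproduce with.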
As for the order the scraper processes – the order of systems in the es_systems.cfg defines the order the platforms are processed. The order the ROMs are processed within each platform folder is arbitrary (I’m not sorting it) – so the order is actually defined by the OS.
brakanje | Participant | 01/29/2015 at 00:51 | Post count: 60
Ahh. So that is probably how the sub-folders are being run as well then. :P
poochie | Participant | 06/25/2015 at 22:43 | Post count: 5
sry for bumping this ‘old’ thread
running this script with
$ python scraper.py -pisize -l
gives me the following errors.
Traceback (most recent call last):
  File "scraper.py", line 665, in <module>
    scanFiles(ES_systems)
  File "scraper.py", line 448, in scanFiles
    platforms = getPlatformNames(SystemInfo)
  File "scraper.py", line 435, in getPlatformNames
    for (i, platform) in enumerate(_platforms.split(',')):
AttributeError: 'NoneType' object has no attribute 'split'
how can i fix this?
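The traceback shows `_platforms.split(',')` being called on `None`. With ElementTree, an element's `.text` is `None` when the element is empty, so one plausible cause (an assumption, since the es_systems.cfg in question isn't shown) is a `<system>` whose `<platform>` tag is present but has no value. A guard along these lines would avoid the crash; `platform_names` is a hypothetical stand-in, not the script's `getPlatformNames`.

```python
def platform_names(platform_text):
    """Split a <platform> value such as "genesis,megadrive" into a list.

    Returns [] when the element was missing or empty (text is None or "").
    """
    if not platform_text:
        return []
    return [name.strip() for name in platform_text.split(",") if name.strip()]
```

An empty result can then be treated like any other system with missing data: skipped for scraping, or handled with the manual -name and -platform arguments.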