Viewing 35 posts - 1 through 35 (of 59 total)
  • sselph
    Participant
    Post count: 170

    Hi Everyone,

    I’ve been refactoring much of my scraper’s code to make it easier to add features, and I’ve added a few features since I last posted.
    https://github.com/sselph/scraper

    New Features:

    • MAME/Arcade descriptions – I added in information from arcade-history so that MAME and other arcade systems should have more complete data.
    • PSX Support – I added support for bin/cue PSX games from redump dat files. It will create a single entry for each cue file.
    • Dreamcast Support – I added support for gdi/bin games from redump dat files. It seems reicast supports this format but it isn’t enabled in es_systems.cfg.
    • Zip/Gzip support – since retroarch added zip/gzip support I now scan inside zip files for the first file that looks like a rom and scan it.
    • More accurate and complete scraping on several systems. Thanks to @robertybob for adding literally ~1000 games to thegamesdb.
    • Ability to append to a gamelist – You can now use -append to skip files that are already in the gamelist.xml file.
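    The zip handling mentioned in the list above can be sketched roughly as follows. This is only an illustration, not the scraper's actual code: the names romExts, firstROM and makeZip are made up, and deciding that a file "looks like a rom" by its extension is an assumption.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"path/filepath"
	"strings"
)

// romExts is a hypothetical set of extensions treated as ROM files.
var romExts = map[string]bool{".nes": true, ".sfc": true, ".gb": true}

// firstROM returns the name of the first archive entry whose extension
// is in romExts. The bool is false when no entry qualifies.
func firstROM(zipData []byte) (string, bool, error) {
	zr, err := zip.NewReader(bytes.NewReader(zipData), int64(len(zipData)))
	if err != nil {
		return "", false, err
	}
	for _, f := range zr.File {
		if romExts[strings.ToLower(filepath.Ext(f.Name))] {
			return f.Name, true, nil
		}
	}
	return "", false, nil
}

// makeZip builds a small in-memory archive so the example needs no files on disk.
func makeZip(names ...string) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for _, n := range names {
		w, err := zw.Create(n)
		if err != nil {
			return nil, err
		}
		if _, err := w.Write([]byte("data")); err != nil {
			return nil, err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	data, err := makeZip("readme.txt", "game.nes")
	if err != nil {
		panic(err)
	}
	name, ok, err := firstROM(data)
	fmt.Println(name, ok, err)
}
```

    Once the entry is found, its contents would be read and scanned like an uncompressed ROM.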

    Guide:
    Thanks to Floob there is a very nice video guide that is still valid:

    Issues:
    Since I’ve changed most of the code and don’t have a lot of tests, I’m sure I have created bugs. Please create issues here:
    https://github.com/sselph/scraper/issues

    Floob
    Member
    Post count: 1629

    Thanks very much for the update. It’s great!
    Loving the extra description detail on mame roms.

    I’ve added an error I found on your issue list, it may just be me doing something odd though.

    I like the PSX support, although as I have very few PSX games, and I don’t use a .cue for single-track games, I’ll probably still use the ad-hoc built-in scraper for those.

    Thanks again for all the work you put into this, it makes Emulation Station so much nicer to use.

    I’ll try to sort a new video for all these updates!

    sselph
    Participant
    Post count: 170

    Thanks for the report. I’ll release a fix soon if I don’t hear any other issues.

    Regarding bin/cue: The scraper will still scrape the bin file if there isn’t a cue file. It looks for cue files, parses them, and gets a list of the associated bin files. It then hashes the files in order (cue, track 1, track 2, etc.) until it finds a match and uses that. So if there isn’t a cue, it just treats the .bin as a binary and hashes it as normal.
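    As a rough Go sketch of that order of operations (not the scraper's real code): the function names are invented, only quoted FILE lines are handled, and the hash function is passed in because the post doesn't say which algorithm is used.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// cueBinFiles returns the file names found on FILE lines of a cue sheet,
// e.g.  FILE "Game (Track 1).bin" BINARY
// Only quoted names are handled in this sketch.
func cueBinFiles(cue string) []string {
	var files []string
	for _, line := range strings.Split(cue, "\n") {
		line = strings.TrimSpace(line)
		if !strings.HasPrefix(strings.ToUpper(line), "FILE ") {
			continue
		}
		first := strings.Index(line, `"`)
		last := strings.LastIndex(line, `"`)
		if first >= 0 && last > first {
			files = append(files, line[first+1:last])
		}
	}
	return files
}

// hashCandidates lists the files to hash, in order: the cue itself, then
// each track's bin file resolved relative to the cue's directory.
func hashCandidates(cuePath, cue string) []string {
	dir := filepath.Dir(cuePath)
	out := []string{cuePath}
	for _, f := range cueBinFiles(cue) {
		out = append(out, filepath.Join(dir, f))
	}
	return out
}

// firstMatch hashes each candidate with the supplied function and returns the
// ID of the first hash present in known (a hypothetical hash -> ID table).
func firstMatch(candidates []string, known map[string]string, hash func(string) (string, error)) (string, bool) {
	for _, c := range candidates {
		h, err := hash(c)
		if err != nil {
			continue // unreadable file: try the next candidate
		}
		if id, ok := known[h]; ok {
			return id, true
		}
	}
	return "", false
}

func main() {
	cue := "FILE \"Game (Track 1).bin\" BINARY\n  TRACK 01 MODE2/2352\n" +
		"FILE \"Game (Track 2).bin\" BINARY\n  TRACK 02 AUDIO\n"
	for _, c := range hashCandidates("roms/Game.cue", cue) {
		fmt.Println(c)
	}
}
```

    A lone .bin with no cue would skip the parsing step and go straight to hashing as a single candidate.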

    Floob
    Member
    Post count: 1629

    Ah I see – that’s great. I’ll give it a go.

    Can you remind me how the mame lookup works – which database does it check?

    For example, I’ve got ddp2.zip which is:
    http://www.progettoemma.net/index.php?gioco=ddp2&lang=en

    but nothing scraped?

    Floob
    Member
    Post count: 1629

    Scrap that – it found it this time – just no image returned.

    One it didn’t find was wyvernf0.zip
    http://www.progettoemma.net/index.php?gioco=wyvernf0&lang=en

    sselph
    Participant
    Post count: 170

    It uses mamedb.com. It strips off the file extension and pulls the url http://www.mamedb.com/game/wyvernf0

    mamedb.com uses MAME 0.147 data and wyvernf0 is from 0.154, so it isn’t listed there.
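    The URL construction described above amounts to something like this Go sketch. The function and constant names are illustrative only; the base URL is the one quoted in this post.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// mamedbGameURL is the URL prefix quoted in the post above.
const mamedbGameURL = "http://www.mamedb.com/game/"

// mameURL strips the directory and file extension from a ROM path and
// appends the remaining set name to the mamedb.com game URL.
func mameURL(romPath string) string {
	base := filepath.Base(romPath)
	name := strings.TrimSuffix(base, filepath.Ext(base))
	return mamedbGameURL + name
}

func main() {
	fmt.Println(mameURL("wyvernf0.zip"))
}
```

    Because the lookup is keyed purely on the set name, a set that exists only in a newer MAME release than the site covers has no page to fetch.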

    Floob
    Member
    Post count: 1629

    Also, when processing mame4all ROMs I periodically get these errors.

    I don’t think it’s ROM-specific though, as it’s a consecutive batch; on the next scrape they are fine and others complain.

    /07/05 01:47:12 INFO: Starting: bosco.zip
    2015/07/05 01:47:12 ERR: error processing bosco.zip: ILM Bad HTML
    2015/07/05 01:47:12 INFO: Starting: bouldash.zip
    2015/07/05 01:47:12 ERR: error processing bouldash.zip: ILM Bad HTML
    2015/07/05 01:47:12 INFO: Starting: bouldash.zip
    2015/07/05 01:47:12 ERR: error processing bouldash.zip: ILM Bad HTML
    2015/07/05 01:47:12 INFO: Starting: bouldash.zip
    2015/07/05 01:47:12 ERR: error processing bouldash.zip: ILM Bad HTML
    2015/07/05 01:47:12 INFO: Starting: brain.zip
    2015/07/05 01:47:13 ERR: error processing brain.zip: ILM Bad HTML
    2015/07/05 01:47:13 INFO: Starting: brain.zip
    2015/07/05 01:47:13 ERR: error processing brain.zip: ILM Bad HTML
    2015/07/05 01:47:13 INFO: Starting: brain.zip
    2015/07/05 01:47:13 ERR: error processing brain.zip: ILM Bad HTML
    2015/07/05 01:47:13 INFO: Starting: breakers.zip
    2015/07/05 01:47:13 ERR: error processing breakers.zip: ILM Bad HTML
    2015/07/05 01:47:13 INFO: Starting: breakers.zip
    2015/07/05 01:47:13 ERR: error processing breakers.zip: ILM Bad HTML
    2015/07/05 01:47:13 INFO: Starting: breakers.zip
    2015/07/05 01:47:14 ERR: error processing breakers.zip: ILM Bad HTML
    2015/07/05 01:47:14 INFO: Starting: brkthru.zip
    2015/07/05 01:47:14 ERR: error processing brkthru.zip: ILM Bad HTML
    2015/07/05 01:47:14 INFO: Starting: brkthru.zip
    2015/07/05 01:47:14 ERR: error processing brkthru.zip: ILM Bad HTML
    2015/07/05 01:47:14 INFO: Starting: brkthru.zip
    2015/07/05 01:47:14 ERR: error processing brkthru.zip: ILM Bad HTML
    2015/07/05 01:47:14 INFO: Starting: brubber.zip
    2015/07/05 01:47:15 ERR: error processing brubber.zip: ILM Bad HTML
    2015/07/05 01:47:15 INFO: Starting: brubber.zip
    
    Floob
    Member
    Post count: 1629

    [quote=101371]It uses mamedb.com. It strips off the file extension and pulls the url http://www.mamedb.com/game/wyvernf0

    [/quote]

    Ah – ok, that explains it. Thanks.

    sselph
    Participant
    Post count: 170

    Hmm, those errors come from the MAME scraper fetching the URL and getting back a response it can’t parse. Since it happens with different ROMs and in bursts, it might be throttling or an issue with the website.

    Floob
    Member
    Post count: 1629

    Could a backupdb query work like this?

    http://www.progettoemma.net/gioco.php?game=wyvernf0

    with the image being:
    http://www.progettoemma.net/snap/wyvernf0/0000.png

    Just a thought. I’m more than impressed with what it does already!

    sselph
    Participant
    Post count: 170

    Yeah, we can create a backup DB. For the metadata, I could probably download another dat file, parse it, and put it in the same data store I’m using for the history data, then point to images on another site or see how taxing it would be to host them.

    Floob
    Member
    Post count: 1629

    [quote=101375]Hmm those errors are from the mame scraper trying to parse the result of getting the URL and getting a response it can’t parse. Since it happens with different roms and in bursts might be some throttling or issues with the website.

    [/quote]

    Just tried it again, and it’s fine now. Must have been a temporary bottleneck like you said.

    Floob
    Member
    Post count: 1629

    Just had a major meltdown with some atarilynx rom scraping which seemed fine before. Can you see where the issue may be?

    github.com/sselph/scraper/ds.(*Hasher).Hash(0x1080aa90, 0x10f1d320, 0x23, 0x0, 0x0, 0x0, 0x0)
            /home/sselph/go/src/github.com/sselph/scraper/ds/hasher.go:32 +0x170 fp=0x1a462a4c sp=0x1a4629e0
    github.com/sselph/scraper/ds.(*Hasher).Hash(0x1080aa90, 0x10f1d320, 0x23, 0x0, 0x0, 0x0, 0x0)
            /home/sselph/go/src/github.com/sselph/scraper/ds/hasher.go:32 +0x170 fp=0x1a462ab8 sp=0x1a462a4c
    github.com/sselph/scraper/ds.(*Hasher).Hash(0x1080aa90, 0x10f1d320, 0x23, 0x0, 0x0, 0x0, 0x0)
            /home/sselph/go/src/github.com/sselph/scraper/ds/hasher.go:32 +0x170 fp=0x1a462b24 sp=0x1a462ab8
    ...additional frames elided...
    created by main.CrawlROMs
            /home/sselph/go/src/github.com/sselph/scraper/scraper.go:173 +0x5e4
    
    goroutine 1 [chan send]:
    main.CrawlROMs(0x11522cc0, 0x10a48010, 0x1, 0x1, 0x10810140, 0x1080aa88, 0x0, 0x0)
            /home/sselph/go/src/github.com/sselph/scraper/scraper.go:184 +0xf98
    main.Scrape(0x10a48010, 0x1, 0x1, 0x10810140, 0x1080aa88, 0x0, 0x0)
            /home/sselph/go/src/github.com/sselph/scraper/scraper.go:285 +0x194
    main.main()
            /home/sselph/go/src/github.com/sselph/scraper/scraper.go:414 +0xf54
    
    goroutine 5 [syscall]:
    os/signal.loop()
            /usr/local/go/src/os/signal/signal_unix.go:21 +0x1c
    created by os/signal.init·1
            /usr/local/go/src/os/signal/signal_unix.go:27 +0x40
    
    goroutine 15 [chan receive]:
    main.func·003()
            /home/sselph/go/src/github.com/sselph/scraper/scraper.go:187 +0x60
    created by main.CrawlROMs
            /home/sselph/go/src/github.com/sselph/scraper/scraper.go:184 +0x938
    
    goroutine 14 [chan receive]:
    main.func·002()
            /home/sselph/go/src/github.com/sselph/scraper/scraper.go:177 +0x94
    created by main.CrawlROMs
            /home/sselph/go/src/github.com/sselph/scraper/scraper.go:180 +0x6b8
    
    goroutine 10 [select]:
    net.func·019()
            /usr/local/go/src/net/dnsclient_unix.go:241 +0x310
    sselph
    Participant
    Post count: 170

    Thanks!

    I think I see the error; I have submitted a fix and am releasing a new version. Hopefully I catch all the issues before I hit 1.0.0 :)

    robertybob
    Participant
    Post count: 219

    Keep up the great work Sselph! If ever you want to add more systems and want someone to help you match up IDs or whatever, just ask me :)

    ekstreme
    Participant
    Post count: 24

    Working well for me. Just scraped my GnGeo set.

    socalretrogamer
    Participant
    Post count: 8

    Thanks for this scraper! It works great! Much, much better than the scraper on Emulation Station. There are still a lot of games it didn’t scrape, but I think that’s because some of the ROM file names are truncated. For example, “Zelda2” (no space) didn’t scrape for NES. At some point, I plan on renaming all the files that didn’t scrape, so would you have any tips to ensure the scraper recognizes the title? Particularly a sequel game like Zelda 2? Thanks again!

    sselph
    Participant
    Post count: 170

    Hi socalretrogamer,

    To minimize false positives, on consoles I’m not actually using the name of the file, only the extension, so you could name them 1.nes, 2.nes and it should still work. The scraper uses the ROM data itself: it hashes it and compares the result against a hash-to-ID mapping database I generated by hand for each system it supports.

    So there are several reasons it may not have scraped a rom:

    • Different ROM dump and therefore a different hash. The Zelda2 could be bad, hacked, overdumped, or a revision not in the No-Intro hashes I used.
    • No entry in thegamesdb. For SNES there are 3385 No-Intro ROMs and only 1055 games in thegamesdb. With clones I matched 2434.
    • No entry in my DB. Because I have to manually add the hash-to-ID mapping, I don’t automatically get new entries.
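
    To make the hash-based matching concrete, here is a minimal Go sketch. It is an illustration under assumptions rather than the scraper's real code: SHA-1 is assumed as the hash, and romHash, lookup and the sample ID are made up.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// romHash returns the hex SHA-1 of the ROM contents.
// SHA-1 is an assumption; the post does not name the algorithm.
func romHash(data []byte) string {
	sum := sha1.Sum(data)
	return hex.EncodeToString(sum[:])
}

// lookup finds the game ID for a ROM in db, a hypothetical stand-in for
// the hand-built hash -> ID mapping. The file name plays no part.
func lookup(data []byte, db map[string]string) (string, bool) {
	id, ok := db[romHash(data)]
	return id, ok
}

func main() {
	rom := []byte("pretend ROM contents")
	db := map[string]string{romHash(rom): "example-id"}

	// Same bytes match regardless of what the file is called.
	id, ok := lookup(rom, db)
	fmt.Println(id, ok)

	// One changed byte (hack, bad dump, other revision) gives a
	// different hash, so the lookup finds nothing.
	hacked := append([]byte{}, rom...)
	hacked[0] ^= 0xFF
	_, ok = lookup(hacked, db)
	fmt.Println(ok)
}
```

    This is why renaming Zelda2.nes would not help: only the contents of the file decide whether there is a match.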
    gutossn
    Participant
    Post count: 31

    The scraper is amazing! Very fast, and it doesn’t freeze ES. Could you add WonderSwan and Neo Geo Pocket (and Color too) to the database? Thank you.

    sselph
    Participant
    Post count: 170

    I have issues on GitHub tracking the addition of new systems. Whether a system gets added is a function of: are there available hashes, what are the file formats, are there entries in thegamesdb.net, how many games there are, how busy I am, etc.

    Feel free to add issues for each system but I can’t make any promises until I look more closely.

    Anonymous
    Inactive
    Post count: 4

    Hi sselph!!
    First of all, many thanks for this awesome scraper!!!

    I have one question I can’t find a solution to (maybe I’m too much of a newbie ;)

    If I start a scraper session and for any reason (like aborting with Ctrl+C, or the scraper showing errors and exiting) the scraper doesn’t finish a complete ROM directory, how can I continue the session without re-analyzing all the ROMs that were already scraped correctly?

    Thanks again for your hard work with this great super-tool!! :)

    *EDIT*
    Ok, I think I need to use -append=true param…

    sselph
    Participant
    Post count: 170

    Hi,

    Yes, the -append flag should be what you are looking for, although the scraper skips downloading any images that already exist, so it should be fast to catch back up either way.
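
    Conceptually, appending means reading the existing gamelist and skipping anything already listed. A rough Go sketch of that idea follows, not the actual implementation: it assumes the usual EmulationStation layout of a root element containing game entries with a path child, and knownPaths and toScrape are hypothetical names.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// gameList mirrors only the parts of gamelist.xml this sketch needs.
// The <game>/<path> layout is assumed from EmulationStation's format.
type gameList struct {
	Games []game `xml:"game"`
}

type game struct {
	Path string `xml:"path"`
}

// knownPaths returns the set of ROM paths already present in a gamelist.
func knownPaths(gamelistXML []byte) (map[string]bool, error) {
	var gl gameList
	if err := xml.Unmarshal(gamelistXML, &gl); err != nil {
		return nil, err
	}
	known := make(map[string]bool)
	for _, g := range gl.Games {
		known[g.Path] = true
	}
	return known, nil
}

// toScrape filters out ROMs that already have a gamelist entry.
func toScrape(roms []string, known map[string]bool) []string {
	var out []string
	for _, r := range roms {
		if !known[r] {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	existing := []byte(`<gameList><game><path>./a.nes</path></game></gameList>`)
	known, err := knownPaths(existing)
	if err != nil {
		panic(err)
	}
	fmt.Println(toScrape([]string{"./a.nes", "./b.nes"}, known))
}
```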

    I have too many flags :)

    Omnija
    Participant
    Post count: 155

    Will there be support for psx .pbp formats?

    sselph
    Participant
    Post count: 170

    I don’t know enough about the pbp file format to know if I could translate the information it contains to what would have been in the original bin file to match it against the hash in redump.

    Anonymous
    Inactive
    Post count: 94

    Great work on version 1.0.0 sselph!!

    I have a question: I have a complete collection of PAL Megadrive box art. Why, you may ask? Well, I find the PAL look of the box art much more appealing (being from the UK), and it actually has MegaDrive on it. Is there a way to implement scraping just PAL box art for the Megadrive? I can upload these images to a place of your choosing if that would help make this idea a reality.

    sselph
    Participant
    Post count: 170

    There are a couple of issues with the whole Megadrive/Genesis situation. First, when I did the mapping from hash to thegamesdb ID, I didn’t really care which version I chose as long as there was a match. So if there were a US version and an EU version, I just chose one at random; sometimes I looked to see which one had the best description or clearer image. The other issue is data quality in thegamesdb: there are several Megadrive games that have Genesis art and possibly vice versa.

    When I have time to remap MD and GEN, I’ll take more care to only give an MD version a GEN match if there isn’t an MD entry in the DB, and vice versa. Ideally we could get the entries in thegamesdb fixed and improved so that other projects benefit as well.

    I have tinkered with the idea of setting up a repository of my own to improve some of the MAME stuff but haven’t had time. If I do, I’ll see if I could do something similar for other systems but I imagine the cost would be prohibitive and I won’t actually do any of it :)

    greyhulk
    Participant
    Post count: 13

    Hi guys, I’m using the built-in scraper on PSX games. It finds the relevant artwork etc., but when I restart my Pi it’s all missing again. Any advice?

    thanks
    steve

    herbfargus
    Member
    Post count: 1858

    It may not be writing manual changes unless you cleanly exit EmulationStation. So select Quit EmulationStation from the start menu, and when it reloads see if your changes were saved.

    Anonymous
    Inactive
    Post count: 94

    Is there a build for windows at all?

    sselph
    Participant
    Post count: 170

    I make several prebuilt binaries available at https://github.com/sselph/scraper/releases

    or if you’re the type that likes compiling it yourself, there are no special instructions for doing it on Windows.

    Anonymous
    Inactive
    Post count: 94

    Nice!, thanks

    phantom27
    Participant
    Post count: 2

    Ok… So I might be dumb…. No… I’m pretty sure I am… but I need help.

    I have a ROM database that I tried running this on. I did it on my mac. It looked like it worked. Even said saving session… etc. But I can’t find the gamelist.xml file. I even searched my mac for it.

    I’m probably doing something wrong.

    phantom27
    Participant
    Post count: 2

    Yep, I’m an idiot apparently. I didn’t realize it would put it in my ‘home’ folder. Found it.

    Ok, stupid question. If I put this file in my ROM folder on my Pi, will it work, or are the paths all messed up since I ran it on my Mac?

    sselph
    Participant
    Post count: 170

    Hmm, the gamelist should be in the directory where you ran the script. I’ve heard some other complaints about this, so maybe something has changed.

    Anyway, if you ran the script from inside a folder with a bunch of ROMs and didn’t change any of the flags, all the paths should be correct. Just put the gamelist in the ROM folder along with all the ROMs and the images folder.

    proxycell
    Participant
    Post count: 203

    Hey Steven,
    Long time since I last used your scraper

    I hope this thread is the one to be used for such things:

    How would I go about ADDING to this database? I have every fan-translated game there is and I would love for them to be scraped as the original game
