Auto Scraper v0.6

This topic contains 117 replies, has 20 voices, and was last updated by rafaelr 1 year, 9 months ago.

Viewing 35 posts - 71 through 105 (of 118 total)
  • #84677

    sur0x
    Participant

    The MAME/FBA scraper from Floob works perfectly; the only drawback is that it doesn’t really scrape the metadata, so no description, info, rating, etc. 🙁

    #84688

    imsuperduckie
    Participant

    Thanks, will check it out. Congrats on your newborn!

    #84693

    petrockblog
    Keymaster

    congrats on the new arrival 🙂

    #85464

    sselph
    Participant

    Thanks!

    Some good news: I was able to rewrite the core functionality of my scraper in C++ and get it merged into EmulationStation, so it will eventually be an option in the list of scrapers.

    While I was there I noticed a pull request to add a MAME scraper, so I ported that over to Go and added it to my scraper. You can access it with the -mame flag. It should get name, image, player count, rating, developer, genre and date. I don’t have any MAME ROMs, so I haven’t fully tested it.

    Auto-scraper: https://github.com/sselph/scraper

    #85510

    Floob
    Member

    That’s great news! It will be good to see it as part of EmulationStation.

    The update to your scraper to support mame is amazing! It works very well.
    Would a future update be possible to get the description as well?

    RetroPie help guides --> https://goo.gl/Yfy8kj
    Please read this before asking for help --> http://goo.gl/eLErnl

    #85519

    sselph
    Participant

    The data is coming from mamedb.com and there doesn’t seem to be a description. If there is another source that is indexed based on filename that has the description, let me know and I’ll see if I can pull it in.

    #85521

    Roo
    Participant

    MAME’s history.dat is kind of the de facto standard for that information.

    http://www.arcade-history.com/

    Not sure where to get an older version, since some ROM names have changed…

    #85523

    Floob
    Member

    Is there a way of parsing data from this file?
    https://code.google.com/p/romcollectionbrowser/wiki/HowToAddMAMEOffline

    Or a way to grab the description here (url has filename):
    http://caesar.logiqx.com/php/history.php?id=bloodbro

    #85527

    Roo
    Participant

    I’m no programmer. That said, MAMEUI and other front ends and emulators use history.dat to provide info in the GUI. So it can’t be too hard.

    It looks to me like the format is:

    $info=romname,
    $bio
    blaa blaa blaa history here
    $end

    So search the file for the rom name and just grab the bio section, right? It may sound like I’m being a smart-ass but I promise you I’m not 🙂 Just trying to be helpful

    #85547

    sselph
    Participant

    Parsing the history file doesn’t look too difficult: find $info=<name>\n$bio, then grab text until I get to a – UPPERCASEWORD – line or an $end. That appears to be the description portion. I’m clueless when it comes to MAME. The data I’m pulling from mamedb says it is 1.47, and I would’ve assumed 1.57 was just a more complete version of 1.47, but you are saying that some files were renamed. This file I found seems to show the renaming, http://www.progettosnaps.net/renameset/pS_renameSET.txt, but it seems like they are shuffling things around with every version.
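
    A minimal Python sketch of that parsing idea, not the scraper's actual code. It assumes the layout described in this thread ($info= with a comma-separated name list, then $bio, then text up to a dash-delimited uppercase header or $end); the function name and the sample entry are made up for illustration.

```python
import re

# A line such as "- TECHNICAL -" (hyphen or en dash) is assumed to mark the
# start of a sub-section inside a $bio block, as described in the thread.
SECTION_RE = re.compile(r"^\s*[-\u2013]\s*[A-Z][A-Z ]*\s*[-\u2013]\s*$")


def find_description(history_text, rom_name):
    """Return the text between $bio and the first section header or $end."""
    in_entry = False   # inside the $info block that lists rom_name
    in_bio = False     # inside that entry's $bio block
    collected = []
    for line in history_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("$info="):
            # $info= holds a comma-separated list of ROM names.
            names = [n.strip() for n in stripped[len("$info="):].split(",")]
            in_entry = rom_name in names
            in_bio = False
        elif in_entry and stripped == "$bio":
            in_bio = True
        elif in_bio:
            if stripped == "$end" or SECTION_RE.match(stripped):
                return "\n".join(collected).strip()
            collected.append(line.rstrip())
    return None  # ROM name not present in the file


# Hypothetical sample in the format sketched earlier in the thread.
SAMPLE = """$info=bloodbro,
$bio
A hypothetical description line.
Second line.

- TECHNICAL -
Not part of the description.
$end
$info=other,
$bio
Other text.
$end
"""
```

    Calling find_description(SAMPLE, "bloodbro") returns only the two description lines; the renaming problem between MAME versions is not handled here.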

    What version is retropie supposed to be using? From there I could verify that the files are correctly named based on hashes of the internal files then go through the renaming to get them to 1.47 for mamedb data and to 1.57 for the history.

    #85550

    proxycell
    Participant

    Hey Steven, I once again wanted to heap praise upon you for not only creating this amazing tool but also implementing so many of the suggestions that have been put forth.

    Now I have yet another suggestion. I’m never certain how much work these require of you, but please consider adding in Vectrex support – I’m a fan of any controller-based console that lets me exit back to ES lol…

    #85552

    Roo
    Participant

    What version is retropie supposed to be using? From there I could verify that the files are correctly named based on hashes of the internal files then go through the renaming to get them to 1.47 for mamedb data and to 1.57 for the history.

    RetroPie’s (v2.4x beta, not sure about earlier versions) default MAME emulator is MAME4ALL, which is a fork from MAME v0.37beta5. Which is old 🙂 Like circa 2000 old.

    Not sure why this was the magical spot they forked from, but in general the older version performs better on the Pi.

    http://wiki.mamedev.org/index.php/Previous_MAME_Versions

    #85749

    sselph
    Participant

    please consider adding in Vectrex support

    Adding new systems isn’t too difficult, especially when there are only 30ish games. The issue with this platform is that thegamesdb doesn’t contain the platform or the games. If you get them added to the DB, I’ll happily get the hashes and IDs added to my mapping.

    RetroPie’s (v2.4x beta, not sure about earlier versions) default MAME emulator is MAME4ALL, which is a fork from MAME v0.37beta5.

    Thanks. When I get a moment I’ll see if I can do a more accurate job of scraping the mame games for the RetroPie.

    #85831

    Floob
    Member

    I made a quick MAME based update here:

    #89136

    ceuse
    Participant

    I have another request. I fucked up my manually modified gamelist because I forgot that the tool creates a completely new list :-(

    Could you build in an optional switch, e.g. a prompt like "Found something different already in the gamelist. Do you want to overwrite it?"

    Also, I would love to have a way to manually create a gamelist with thegamesdb IDs (somewhat like a batch program: manuscrape.exe -attend -path .\gamelist.xml -imgdir ".\imgs" -thegamesdbid 1255) that would add the completely scraped data for that ID to the text file (for faster manual adding of missing stuff).

    #89268

    zbh23
    Participant

    I’m not sure if you guys have seen this:

    Updated Python Scraper

    But a co-worker and I got this working brilliantly on a rpi B+

    #89372

    sselph
    Participant

    ceuse:
    I’ve got an open issue on GitHub to allow appending to an existing gamelist, and if I get that implemented it will have options on how to handle conflicts.
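
    To make the append-with-conflict-handling idea concrete, here is a rough Python sketch. It is not the scraper's implementation (that feature is only an open issue at this point); it assumes entries are matched on their <path> text, and merge_gamelists and its overwrite parameter are hypothetical names.

```python
import xml.etree.ElementTree as ET


def merge_gamelists(existing_xml, scraped_xml, overwrite=False):
    """Merge scraped <game> entries into an existing gamelist, keyed on <path>."""
    existing = ET.fromstring(existing_xml)
    scraped = ET.fromstring(scraped_xml)
    by_path = {g.findtext("path"): g for g in existing.findall("game")}
    for game in scraped.findall("game"):
        old = by_path.get(game.findtext("path"))
        if old is None:
            existing.append(game)            # new ROM: always add
        elif overwrite:
            # Conflict: replace the hand-edited entry in place.
            index = list(existing).index(old)
            existing.remove(old)
            existing.insert(index, game)
        # Otherwise keep the existing (possibly hand-edited) entry untouched.
    return ET.tostring(existing, encoding="unicode")
```

    With overwrite left at False, a manually edited entry survives a re-scrape and only ROMs missing from the list are added.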

    I have a tool that will report missing ROM data and create a CSV. If I modified that slightly to allow you to add your own ID information, that could be used as input to the script to fetch the missing data for those files. It has the side effect of allowing you to send it to me so that I could improve my DB in the future.

    zbh23:
    Nice, I hadn’t seen that. I wrote this because the older version of that script and the ES scraper weren’t very good unless you manually chose everything. Maybe people can use it to find things my scraper doesn’t know about.

    #89491

    killer101
    Participant

    Just tried your scraper. Very good work, saves quite some time.

    There is one function that I miss. It would be nice to have an option to “fake images”!

    To explain …

    I have all the screenshots I need right at hand and don’t want to use the ones provided by whomever. A function to add the image information to the gamelist file whether or not it is found in a database would be great!

    #89495

    sselph
    Participant

    The scraper will skip the image download if it sees a file named the same as the one it would save. So if you have a ROM roms/nes/rom.nes and an image named rom.jpg, you could place the images in roms/nes/images and add the flags -image_suffix="" -no_thumb

    The suffix defaults to "-image" (rom-image.jpg), which I think I copied from ES’s scraper. no_thumb says to skip the thumbnail, which isn’t used by ES; I just include it since it is part of the gamelist spec. I always convert the image to JPG and don’t include an option to change that right now, so if you have PNGs you could convert them, or I can expose a flag to choose the image format.
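
    The naming rule above can be sketched in a few lines of Python. This only illustrates the convention as described in this post (ROM base name plus suffix plus .jpg inside an images folder); the function names are hypothetical and the scraper itself is written in Go.

```python
from pathlib import Path


def expected_image_path(rom_path, image_dir, image_suffix="-image", ext=".jpg"):
    """Build the file name the scraper is described as looking for."""
    stem = Path(rom_path).stem               # "rom.nes" -> "rom"
    return Path(image_dir) / f"{stem}{image_suffix}{ext}"


def needs_download(rom_path, image_dir, image_suffix="-image"):
    """True when no local image with the expected name exists yet."""
    return not expected_image_path(rom_path, image_dir, image_suffix).exists()
```

    For roms/nes/rom.nes the default suffix gives roms/nes/images/rom-image.jpg, while an empty suffix gives roms/nes/images/rom.jpg, which is why the empty suffix lets existing screenshots be picked up.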

    #89501

    killer101
    Participant

    Excellent, thanks!

    #89513

    killer101
    Participant

    I found another slight issue.

    I have images which the scraper doesn’t find online. So there is no image entry in the gamelist created and ES can’t display the images I already have.

    #89520

    sselph
    Participant

    Ah okay. Let me make a few tweaks to fix that.

    #89527

    sselph
    Participant

    I added checks to see if the file exists locally even if there isn’t a file on the server. Also added a -download_images flag that you can set to false to force it to only look locally. It is in the process of pushing the new release now.

    #89530

    killer101
    Participant

    Thanks, will give it a try!

    #89563

    killer101
    Participant

    Tried it, doesn’t really work.

    Before scraping I put all my images into the images folder and start scraping with this command …

    scraper -add_not_found=true -download_images=false -image_suffix="" -mame -no_thumbs

    Gamelist is generated, but no images on not found games.

    For example …

    I have the screenshot for the game abcop.zip. This is what the gamelist looks like:

    <game id="abcop" source="mamedb.com">
    <path>./abcop.zip</path>
    <name>A.B. Cop (World, FD1094 317-0169b) </name>
    <desc></desc>
    <rating>0.833</rating>
    <releasedate>1990</releasedate>
    <developer>Sega</developer>
    <publisher></publisher>
    <genre>Driving / Race (chase view) Bike</genre>
    <players>1</players>
    </game>

    #89583

    sselph
    Participant

    Ah sorry, my mame handling is completely separate from the console handling. I just implemented the console part. I’ll go back in this evening to add similar code to mame.

    #89599

    vretro
    Participant

    Hello sselph, nice work!
    Have you considered adding Amiga to your compatible systems?

    There are a few resources online which catalogue Commodore Amiga box art and screen shots.

    I would guess you’d have to use a similar technique to how you handle MAME name lookup because of the nature of Amiga adf files.

    Useful resources to help, if you consider this:
    http://www.exotica.org.uk/wiki/Amiga_Game_Box_Scans (wiki style, box art)
    http://hol.abime.net/hol_search.php (large collection of 6616 entries, screenshots, box art)
    http://www.lemonamiga.com (3518 entries, screen shots, title screens, some box art)
    https://archive.org/details/Commodore_Amiga_TOSEC_2012_04_10 (xml file containing disk titles, meta data and old reviews – perhaps newer versions of these files are available in the same format?)

    Thank you for your efforts

    #89704

    killer101
    Participant

    I tried to scrape GBA today, but it doesn’t work for me either.

    Command..

    scraper -add_not_found=true -image_suffix="" -no_thumb

    I have this game "007 – Everything or Nothing (UE) (M3) [!].gba" and the corresponding screenshot.

    This is what the gamelist looks like ..

    <game id="" source="">
    <path>./007 – Everything or Nothing (UE) (M3) [!].gba</path>
    <name>007 – Everything or Nothing (USA, Europe) (En,Fr,De)</name>
    <desc></desc>
    <releasedate></releasedate>
    <developer></developer>
    <publisher></publisher>
    <genre></genre>
    </game>

    Image is still missing.

    #89723

    sselph
    Participant

    killer101:
    Thanks for testing. I obviously didn’t think this change through all the way. I’ve updated the change so that if an entry is about to be written to the XML with empty image lines, it will check whether the image files exist locally and add them. This will hopefully cover all the different options.
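
    As a rough illustration of that last-step check (again a Python sketch, not the Go code in the scraper), the function below fills in <image> for entries that lack one when a matching local file exists. The function name, the images folder default and the injectable exists callback are assumptions made for the example.

```python
import os
import xml.etree.ElementTree as ET


def fill_missing_images(gamelist_xml, rom_dir, image_dir="images",
                        image_suffix="-image", ext=".jpg",
                        exists=os.path.exists):
    """Add <image> to games that have none when a local file is present."""
    root = ET.fromstring(gamelist_xml)
    for game in root.findall("game"):
        image = game.find("image")
        if image is not None and (image.text or "").strip():
            continue                          # already has an image entry
        rom_name = os.path.basename(game.findtext("path", default=""))
        stem = os.path.splitext(rom_name)[0]  # "abcop.zip" -> "abcop"
        rel = image_dir + "/" + stem + image_suffix + ext
        if exists(os.path.join(rom_dir, rel)):
            if image is None:
                image = ET.SubElement(game, "image")
            image.text = "./" + rel
    return ET.tostring(root, encoding="unicode")
```

    Run against the abcop example with image_suffix="" and an images/abcop.jpg file in place, the entry would gain <image>./images/abcop.jpg</image>.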

    vretro:
    My initial thought is that Amiga seems difficult. The hash data I’ve found is for the IPF format, not the ADF format, and it doesn’t seem possible to convert from one to the other. The HOL site has the best set of data, but nothing is keyed off file name. So I would end up relying on search, similar to the built-in scraper, which would have similar issues of being unreliable. I’ll continue to look into it.

    #89748

    killer101
    Participant

    Scraped around a bit, works really well now. Thanks!

    I stumbled over 2 bugs, I think.

    When scraping FBA or NeoGeo with the -mame and the -no_thumb switches, the scraper generates the thumb listings anyway. Scraping MAME with these 2 switches, no problem.

    When scraping Megadrive, I get an unexpected EOF error on nearly every file. Looked a bit closer. It seems to occur if the file extension is anything other than .md!

    #89954

    sselph
    Participant

    I’m assuming these are .SMD since .MGD aren’t accepted by the emulator.

    This could occur if the file’s size wasn’t a multiple of 16 kB. The SMD file format breaks the file into blocks of 16 kB, then swaps bytes around so that all the odd bytes are at the beginning of the block and the even bytes at the end. So I read the file in chunks of 16 kB, assuming that is possible, since all No-Intro entries except a couple of prototypes have sizes divisible by 16384.
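
    For anyone curious what the block handling looks like, here is a small Python sketch of de-interleaving. It is not the scraper's code, and it assumes the commonly documented SMD layout: within each 16 kB block, the first 8 kB hold the odd-offset bytes of the decoded data and the second 8 kB hold the even-offset bytes. Swap the two slice assignments if your files turn out to be laid out the other way.

```python
BLOCK_SIZE = 16384          # one 16 kB SMD block
HALF = BLOCK_SIZE // 2


def deinterleave_block(block):
    """Rebuild the original byte order of one 16 kB SMD block.

    Assumption: the first 8 kB carry the odd-offset bytes and the second
    8 kB the even-offset bytes of the decoded block.
    """
    if len(block) != BLOCK_SIZE:
        raise ValueError("SMD blocks must be exactly 16384 bytes")
    out = bytearray(BLOCK_SIZE)
    out[1::2] = block[:HALF]    # odd offsets come from the first half
    out[0::2] = block[HALF:]    # even offsets come from the second half
    return bytes(out)


def decode_smd_body(body):
    """Decode interleaved data that is a whole number of 16 kB blocks."""
    if len(body) % BLOCK_SIZE:
        raise ValueError("body length is not a multiple of 16384")
    return b"".join(
        deinterleave_block(body[i:i + BLOCK_SIZE])
        for i in range(0, len(body), BLOCK_SIZE)
    )
```

    A body whose length is not a multiple of 16384 raises an error here, which is roughly the situation that surfaces as an unexpected EOF when reading fixed-size chunks.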

    The other possibility is that there is a bug. I’ll read over the code and write some tests. In the meantime, do you mind checking the size of a couple of these to confirm they are indeed divisible by 16384?

    #89985

    killer101
    Participant

    I checked about 15 out of 140. All are divisible by 16384. I checked the filesize of the .smd files. All my files are zipped by the way!

    #90094

    sselph
    Participant

    Think I found the issue. SMD files are supposed to have a 512-byte header, so the size shouldn’t be divisible by 16384. I went ahead and refactored the logic for MD. I now read the file’s content to try to determine the format instead of relying on the extension, and I made the 512-byte header optional.
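
    A heuristic Python sketch of what content-based detection with an optional header could look like. This is a guess at the idea, not the refactored Go logic: it assumes a raw Mega Drive dump carries the text "SEGA" at offset 0x100, and that a file whose size is 512 bytes over a multiple of 16384 has an SMD header. All names are hypothetical.

```python
BLOCK_SIZE = 16384
SMD_HEADER_SIZE = 512
SEGA_OFFSET = 0x100     # raw Mega Drive ROMs usually have "SEGA" here


def split_optional_header(data):
    """Return (header, body); header is empty when none is present."""
    if len(data) % BLOCK_SIZE == SMD_HEADER_SIZE:
        return data[:SMD_HEADER_SIZE], data[SMD_HEADER_SIZE:]
    return b"", data


def looks_like_raw_rom(data):
    """True when the plain-ROM marker is where a raw dump would have it."""
    return data[SEGA_OFFSET:SEGA_OFFSET + 4] == b"SEGA"


def classify(data):
    """Guess the layout from content instead of the file extension."""
    if looks_like_raw_rom(data):
        return "raw"
    header, body = split_optional_header(data)
    if body and len(body) % BLOCK_SIZE == 0:
        return "smd-with-header" if header else "smd-headerless"
    return "unknown"
```

    Zipped files that hold a headerless, 16384-divisible SMD dump would fall into the "smd-headerless" case, which matches the sizes reported above.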

    Thanks again for testing this.

    #90258

    blockaboots
    Participant

    Is there any way to scrape just PAL Megadrive boxart?

    #90260

    Floob
    Member

    Is there any way to scrape just PAL Megadrive boxart?

    There are 526 PAL boxart covers available from Emumovies if that helps:
    http://emumovies.com/forums/index.php/page/portal

    You could then use these local images for the scrape.

Forums are currently read only - please visit the new RetroPie forums at https://retropie.org.uk/forums/