Topic: MRF non-ascii charset support (Read 12000 times)

jumpingwhale

  • Newbie
  • Posts: 4
  • Reputation: 0
MRF non-ascii charset support
« on: August 24, 2017, 04:47:57 AM »
Hello Tigzy. I have a bunch of samples with non-ASCII characters in their file names.
Uploading a large number of samples through the WebUI at once is a bit difficult, so I wrote a small Python script that uses the API.

During the upload, I found that MRF gave no response for samples with non-ASCII file names.
If I use the WebUI, the file upload succeeds but the file names are corrupted.

Is there any solution to this problem?

Tigzy

  • Administrator
  • Hero Member
  • Posts: 954
  • Reputation: 90
  • Personal Text: Owner, Adlice Software
Re: MRF non-ascii charset support
« Reply #1 on: August 24, 2017, 01:55:50 PM »
Hey :)
Can you give me an example of a non-ASCII file name? (A picture is preferred; I don't know if the forum supports it.)
Also, can you show me the upload script? I'll check that there's no error in it.

jumpingwhale

Re: MRF non-ascii charset support
« Reply #2 on: August 28, 2017, 09:38:27 AM »
First of all, thanks for your fast feedback.

This is an example of a non-ASCII file name, `DB구입문의 연락처.doc` (Korean for roughly "DB purchase inquiry contact"), and its VirusTotal report:
https://www.virustotal.com/ko/file/106f6241cc72c38b53ba33ac0fc484695cd676594847f8dee9962e0aa56cacc0/analysis/1502953541/

This is the upload result using a web browser (screenshot not preserved).

I used this Python script to upload the files. I know it is not well written, but for uploading it works fine (except for vtsubmit, cksubmit and tags...).

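Code:
import json
import urllib.parse

import requests

URL = 'http://localhost/mrf/api.php'  # adjust to your MRF instance
TOKEN = 'MyToken'

def upload(filetoupload, vtsubmit=False, tags=None):
    # URL setup: token and action go in the query string
    _get_param = (('token', TOKEN),
                  ('action', 'uploadfiles'))
    _url = '?'.join([URL, urllib.parse.urlencode(_get_param)])

    # POST params setup: per-file metadata, matched to the file by 'index'
    _metadata = {'index': 0,
                 'vtsubmit': vtsubmit,
                 'cksubmit': not vtsubmit,
                 'tags': tags}
    # requests sends a raw dict as its Python repr, so encode it as JSON
    # (assumption: MRF expects files_data as a JSON list of metadata objects)
    _post_param = {'files_data': json.dumps([_metadata])}

    # Open the file in a with-block so the handle is closed after the upload
    with open(filetoupload, 'rb') as f:
        _res = requests.post(_url, files={'upload_file': f}, data=_post_param)

    # Compare with ==, not 'is': identity checks on ints are unreliable
    if _res.status_code == 200:
        return _res.content
    return None


filelist = []  # paths of the samples to upload
for file in filelist:
    upload(file, vtsubmit=True, tags='Malware, doc')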


I had a hard time writing this, and `tags` still doesn't work. As for `vtsubmit` and `cksubmit`, I don't even know what they mean... It was also my first time uploading a file by passing its handle (file object) instead of reading its binary content. Could you please provide sample code for people like me? It would be a great help.

I found your sample script in your last post. Why don't you add it to the API documentation page?

Tigzy

Re: MRF non-ascii charset support
« Reply #3 on: August 30, 2017, 02:34:39 PM »
Hey,
I'll add the upload script example to the documentation; that's indeed a good idea.

The character issue is due to the MySQL database storing the data as ASCII.
We'll check whether adding UTF-8 encoding/decoding solves the issue; it's been added to the backlog (the to-do list for the next version).

Does it work better (with tags) using my upload script?
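Tigzy's explanation can be illustrated with a minimal Python sketch. It is an illustration only: it assumes the name passes through an ASCII or single-byte charset somewhere between the upload and the database, and the exact garbling MRF shows may differ. It contrasts forcing the Korean file name into ASCII, reading its UTF-8 bytes back as Latin-1, and a consistent UTF-8 round trip:

```python
# Korean file name from the post (roughly "DB purchase inquiry contact")
name = 'DB구입문의 연락처.doc'

# 1) Forcing the name into ASCII: each Hangul syllable is replaced by '?',
#    so the original characters are lost for good
ascii_version = name.encode('ascii', errors='replace').decode('ascii')
print(ascii_version)  # DB???? ???.doc

# 2) Storing the UTF-8 bytes but decoding them as a single-byte charset
#    (Latin-1 here) produces mojibake: the bytes survive, the text does not
mojibake = name.encode('utf-8').decode('latin-1')
print(mojibake == name)                                    # False
print(mojibake.encode('latin-1').decode('utf-8') == name)  # True

# 3) Encoding and decoding consistently as UTF-8 round-trips the name unchanged
utf8_version = name.encode('utf-8').decode('utf-8')
print(utf8_version == name)  # True
```

The Latin-1 case shows why mojibake is often recoverable (the original bytes are intact and only the decoding is wrong), whereas the ASCII replacement destroys them. On the MySQL side, the usual remedy is a `utf8mb4` character set for both the column and the client connection; MySQL's older 3-byte `utf8` also covers Hangul, which lies in the Basic Multilingual Plane.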

jumpingwhale

Re: MRF non-ascii charset support
« Reply #4 on: August 31, 2017, 03:31:42 AM »

It's definitely better; it works fine. Now I'm dealing with the 'getfiles' API.
What I want to do is add a tag to certain samples. To achieve this, I follow the steps below:

  • 'getfiles' filtered by a certain 'tag'
  • merge the original tags with the new tag ('updatefile' would otherwise overwrite them)
  • 'updatefile'
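Step 2 (merging the existing tags with the new one so that `updatefile` does not overwrite them) can be sketched in Python. This assumes tags travel as a single comma-separated string, as in the `tags='Malware, doc'` argument of the upload script above; `merge_tags` is a hypothetical helper, not part of the MRF API:

```python
from typing import Optional


def merge_tags(existing: Optional[str], new: Optional[str]) -> str:
    """Merge two comma-separated tag strings (hypothetical helper).

    Keeps the original order, strips whitespace and drops empty or duplicate
    tags, so the combined string can be sent back in the 'updatefile' step.
    """
    merged = []
    for tag in (existing or '').split(',') + (new or '').split(','):
        tag = tag.strip()
        if tag and tag not in merged:
            merged.append(tag)
    return ', '.join(merged)


print(merge_tags('Malware, doc', 'doc, phishing'))  # Malware, doc, phishing
```

Duplicate detection here is case-sensitive, so `doc` and `Doc` would both be kept; normalizing case first is an easy change if MRF treats tags case-insensitively.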

In step 1, the API returns only one page of samples, with the page size configured in the config.php file (as you mentioned on the API page, 'by default').

How can I 'getfiles' more than one page? My 'getfiles_by_tags' script is below:

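Code:
    # Method of my API client class (needs: import requests, urllib.parse)
    def getfiles_by_tags(self, tags):
        # setup url
        _get_param = (('token', self.token), ('action', 'getfiles'), ('tags', tags))
        _url = '?'.join([self.url, urllib.parse.urlencode(_get_param)])

        _res = requests.get(_url)

        # This is a generator, so 'return False' would only end the iteration
        # silently; raise instead so a failed request is not mistaken for "no files"
        if _res.status_code != 200:
            raise RuntimeError('getfiles failed with HTTP %d' % _res.status_code)

        for file in _res.json()['files']:
            yield file['md5']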

Tigzy

Re: MRF non-ascii charset support
« Reply #5 on: September 01, 2017, 04:11:15 PM »
Unfortunately, the current getfiles API doesn't allow retrieving all the samples at once, for performance reasons.
Imagine you have 1,000,000 samples: returning everything would be really dangerous.

What I would do here is call getfiles in a loop with an incrementing page number, like this:

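Code:
import requests

# Request successive pages until one comes back with no files
thepage = 0
while True:
    params = {'action': 'getfiles', 'token': 'bd5ca2c860ff601186cc50a42b213bbe',
              'page': thepage, 'tags': 'test'}
    results = requests.get('http://localhost/mrf/api.php', params=params).json()
    if not results.get('files'):  # samples are listed under 'files'
        break
    # do something with results
    thepage = thepage + 1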
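The paging loop can also be wrapped in a generator, so the rest of the code (such as the tag-merging workflow from Reply #4) simply iterates over every sample. This is a minimal sketch: `iter_all_files` and `fetch_page` are hypothetical names, it assumes each page's JSON lists samples under `'files'` (as `getfiles_by_tags` reads them) and that an empty page marks the end, and the page fetcher is passed in so the logic can be tested offline:

```python
from typing import Callable, Iterator


def iter_all_files(fetch_page: Callable[[int], dict], first_page: int = 0) -> Iterator[dict]:
    """Yield every file entry from a paged getfiles-style API.

    fetch_page(page) returns the decoded JSON for one page; iteration stops at
    the first page whose 'files' list is missing or empty.
    """
    page = first_page
    while True:
        files = fetch_page(page).get('files') or []
        if not files:
            break
        yield from files
        page += 1


# Possible wiring with requests (hypothetical; URL, token and tag are the
# placeholders used in the loop above, adjust them to your instance):
#
#   import requests
#
#   def fetch_page(page):
#       params = {'action': 'getfiles', 'token': 'bd5ca2c860ff601186cc50a42b213bbe',
#                 'page': page, 'tags': 'test'}
#       return requests.get('http://localhost/mrf/api.php', params=params).json()
#
#   for entry in iter_all_files(fetch_page):
#       print(entry['md5'])

# Offline demo: two fake pages, then an empty one ends the iteration
fake_pages = {0: {'files': [{'md5': 'a'}, {'md5': 'b'}]}, 1: {'files': [{'md5': 'c'}]}}
md5s = [f['md5'] for f in iter_all_files(lambda p: fake_pages.get(p, {'files': []}))]
print(md5s)  # ['a', 'b', 'c']
```

Only one page is held in memory at a time, which matches the performance concern about returning everything at once. `first_page` defaults to 0 to match the loop above; change it if your instance numbers pages differently. If a server ever ignored the `page` parameter, this loop would not terminate, so a maximum page count is cheap insurance.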

jumpingwhale

Re: MRF non-ascii charset support
« Reply #6 on: September 04, 2017, 04:00:55 AM »
So there was a `page` parameter in the `getfiles` API... Thanks a lot, I'm a dang illiterate person.

Tigzy

Re: MRF non-ascii charset support
« Reply #7 on: September 05, 2017, 08:56:19 PM »
No problem, anytime :)