Dropbox Crawler


Revision as of 15:46, 7 January 2013

Personal cloud storage is becoming increasingly popular, and Dropbox is certainly the best-known example. Cloud storage already generates a huge amount of Internet traffic. Because of that, understanding how people interact with such applications is essential for designing efficient cloud storage systems.

We have been doing research on the usage of Dropbox (see our results here). As a next step, we need to know what types of files people store in the service. This would allow us to understand the impact of some technologies on system performance and on network traffic, among other things. For that, we need volunteers to provide us with basic statistics (size, type, etc.) about the files stored in their folders.

Be part of the crowd

All you need to do is run a Java application on your PC. This application will read your Dropbox folder, calculate basic statistics, show everything for your approval and, only after that, send the statistics to us.

  • Most people will be able to run the application by clicking here
  • In case your browser does not support that, you can download the package and run it: just double-click on it!


What will be captured?

For each file/folder in your Dropbox, the program will collect:

  • Size in bytes
  • Last modification time
  • MIME type, as detected by this Java Mime Type Detection Utility
  • File extension - the substring after the last "." in the file name
  • Hash (MD5) of both the initial and final 8 bytes of the file
  • Hash (MD5) of the file name
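
As an illustration of the list above, here is a minimal Java sketch of how these per-file attributes could be computed. It is not the crawler's actual code. The class and method names are hypothetical, `Files.probeContentType` stands in for the MIME detection utility mentioned above, and the sketch assumes that the first and last 8 bytes are concatenated before hashing and that files shorter than 8 bytes contribute whatever bytes they have.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, for illustration only.
final class FileStatsSketch {

    /** Returns the lowercase hex MD5 digest of the given bytes. */
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 must be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Substring after the last "." in the file name, or "" if there is none. */
    static String extension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1);
    }

    /** First and last (up to) 8 bytes of the file, concatenated (an assumption). */
    static byte[] headAndTail(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long size = raf.length();
            int n = (int) Math.min(8, size);
            byte[] buf = new byte[2 * n];
            raf.readFully(buf, 0, n);   // initial bytes
            raf.seek(size - n);
            raf.readFully(buf, n, n);   // final bytes
            return buf;
        }
    }

    /** Collects the six attributes for one file; no file content beyond 16 bytes is read. */
    static String[] describe(Path file) throws IOException {
        String name = file.getFileName().toString();
        String mime = Files.probeContentType(file); // stand-in detector; may return null
        return new String[] {
            String.valueOf(Files.size(file)),
            String.valueOf(Files.getLastModifiedTime(file).toMillis() / 1000),
            mime == null ? "" : mime,
            extension(name),
            md5Hex(headAndTail(file)),
            md5Hex(name.getBytes(StandardCharsets.UTF_8)),
        };
    }
}
```

Calling `FileStatsSketch.describe(path)` on a file returns its size, modification time, MIME type, extension and the two hashes. Only hashes of the name and of 16 bytes of content are produced, which matches the intent that neither file names nor file contents leave the machine.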

The program will also send to us:

  • Hash (MD5) of your Dropbox configuration files (or a hash of your MAC address if we cannot read the former)
  • Hash (MD5) of the path of your Dropbox home folder
  • Your IP address


How will we use this information?

We ensure that:

  • All data we collect are anonymized.
  • We do not copy any file content.
  • We do not collect any personal information or file/directory names.

We will also make our data public in the near future. Thus, anyone will be able to use this important data source.


What will this program NOT do?

  • Copy any file or folder out of your computer
  • Copy any information other than what is stated above
  • Install or store anything on your computer

You can also take a look at the source code of the program if you have doubts about what it does.

Client source code

Download the source code by clicking here. You can compile the project using the ant tool or any Java IDE (we use NetBeans v7.2.1).


Format

All files are in a simple format. Each line holds the attributes of one file, separated by #.

The following columns are found in these traces:

############################################################################
#     #     # Short description      # Unit  # Long description            #
############################################################################
#  1  #     # Length                 # -     # File size in bytes
#  2  #     # Modified               # -     # Last modification of the file (Unix date/time format)
#  3  #     # MIME                   # -     # File MIME type using Magic Java Unit
#  4  #     # EXTENSION              # -     # File extension (substring after the last "." in the file name)
#  5  #     # MD5                    # -     # MD5 hash code of the initial/final 8 bytes of the file
#  6  #     # MD5 of the name        # -     # MD5 hash code of the file name string
############################################################################
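
To make the column layout concrete, the following Java sketch parses one trace line into its six fields. It is an illustration, not part of the released tools. The class name is hypothetical, and the sketch assumes that lines carry no leading or trailing "#", that the modification time is a whole number, and that no field itself contains "#" (an extension could, in principle, so real traces may need a more careful split).

```java
// Hypothetical reader for one line of the trace format described above.
final class TraceRecordSketch {
    final long sizeBytes;      // column 1: file size in bytes
    final long modifiedUnix;   // column 2: last modification, Unix time
    final String mimeType;     // column 3
    final String extension;    // column 4
    final String contentMd5;   // column 5: MD5 of initial/final 8 bytes
    final String nameMd5;      // column 6: MD5 of the file name

    private TraceRecordSketch(String[] f) {
        sizeBytes = Long.parseLong(f[0].trim());
        modifiedUnix = Long.parseLong(f[1].trim());
        mimeType = f[2];
        extension = f[3];
        contentMd5 = f[4];
        nameMd5 = f[5];
    }

    /** Splits a line on "#", keeping empty fields such as a missing extension. */
    static TraceRecordSketch parse(String line) {
        String[] fields = line.split("#", -1);
        if (fields.length != 6) {
            throw new IllegalArgumentException(
                "expected 6 fields, got " + fields.length);
        }
        return new TraceRecordSketch(fields);
    }
}
```

For example, `TraceRecordSketch.parse("1024#1357570000#application/pdf#pdf#0123456789abcdef0123456789abcdef#fedcba9876543210fedcba9876543210")` yields a record with `sizeBytes == 1024` and `extension` equal to "pdf". The values in this line are made up for illustration. The `-1` limit passed to `split` keeps trailing empty fields, so a file without an extension still produces six columns.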


More information

  • You can find more information about our work in this paper:

Drago, I. and Mellia, M. and Munafò, M. M. and Sperotto, A. and Sadre, R. and Pras, A. (2012) Inside Dropbox: Understanding Personal Cloud Storage Services. Proceedings of the 12th ACM Internet Measurement Conference - IMC'12, Boston, Nov. 2012

  • This page has more information about the data we used in our research so far.

External Links

These institutes are running this research: