Difference between revisions of "Dropbox Crawler"


Revision as of 22:07, 27 January 2013

Personal cloud storage is becoming more and more popular, with Dropbox certainly the best-known example. It generates a huge amount of Internet traffic, but how does it work? How is it used? What improvements are possible? To answer these questions, we need to understand how people interact with such applications, which is essential for designing more efficient cloud storage systems.

We have been doing research on the usage of Dropbox (see our results at http://eprints.eemcs.utwente.nl/22286/01/imc140-drago.pdf). As a next step, we need to know what types of files people store in the service. This would allow us to understand the impact of some technologies on system performance and on network traffic, among other things. We are looking for volunteers to provide us with basic statistics (see below) about the files stored in their Dropbox folders.

Be part of the crowd

All you need to do is run an application on your PC. Download the application by clicking the logo of your operating system, then double-click the downloaded file. A Java version is also available:

[Download logos: Windows, Mac, Linux, Java]

This application will:

  • Scan your Dropbox folder
  • Calculate basic statistics
  • Show you what has been collected for your approval
  • Send the statistics to us.

The application has been designed to be as simple as possible. If you have any difficulty, please contact us.

What will be logged?

For each file/folder in your Dropbox, the program will collect:

* Size in bytes
* Last modification time
* MIME type of the file
* File extension
* MD5 hash of both the initial and the final 8 kbytes of the file
* MD5 hash of the file name
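
The per-file statistics listed above could be gathered roughly as in the following Java sketch. This is not the actual client code: the class and method names are hypothetical, "8 kbytes" is assumed to mean 8192 bytes, the extension is assumed to be the substring after the last "." of the file name, and the initial and final chunks are assumed to be hashed separately. Only regular files are handled here; folders would need their own treatment.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of the per-file statistics; not the real crawler.
class FileStatsSketch {
    // Assumption: "8 kbytes" means 8 * 1024 bytes.
    static final int EDGE_BYTES = 8 * 1024;

    // Returns the MD5 digest of the data as a lowercase hex string.
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 must be provided by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    // Assumption: extension = substring after the last "."; empty if none.
    static String extension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1);
    }

    // Reads at most EDGE_BYTES bytes starting at the given offset.
    static byte[] readChunk(Path file, long offset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            int len = (int) Math.min(EDGE_BYTES, raf.length() - offset);
            byte[] buf = new byte[len];
            raf.seek(offset);
            raf.readFully(buf);
            return buf;
        }
    }

    // Builds one tab-separated record: size, mtime (ms), MIME type,
    // extension, MD5 of first chunk, MD5 of last chunk, MD5 of file name.
    static String describe(Path file) {
        try {
            long size = Files.size(file);
            String name = file.getFileName().toString();
            String mime = Files.probeContentType(file); // may be null if unknown
            return String.join("\t",
                    Long.toString(size),
                    Long.toString(Files.getLastModifiedTime(file).toMillis()),
                    String.valueOf(mime),
                    extension(name),
                    md5Hex(readChunk(file, 0)),
                    md5Hex(readChunk(file, Math.max(0, size - EDGE_BYTES))),
                    md5Hex(name.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that the record contains neither the file name nor any file content in clear text: the name appears only as a hash, and the content only as hashes of its first and last chunks.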

The program will also send us:

* MD5 hash of your Dropbox configuration files (or an MD5 hash of your MAC address if we cannot read the former)
* MD5 Hash of the path of your Dropbox home folder
* Your IP address and operating system version

Collected information is sent via plain HTTP (let Wireshark be with you!) to a centralized collection server.
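
As an illustration of what such a plain-HTTP submission could look like, here is a minimal Java sketch. The wire format of the real client is not described on this page, so the form encoding, the field names, and the `collectorUrl` parameter are all assumptions made for the example.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical sketch of an upload over plain HTTP; not the real client.
class UploadSketch {
    // Encodes key/value pairs as application/x-www-form-urlencoded,
    // in the iteration order of the given map.
    static String formEncode(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : fields.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is guaranteed to be available on every Java platform.
            throw new IllegalStateException(ex);
        }
        return body.toString();
    }

    // POSTs the body to collectorUrl (a placeholder supplied by the caller)
    // and returns the HTTP status code of the response.
    static int post(String collectorUrl, String body) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(collectorUrl).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type",
                "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode();
    }
}
```

Because the request is not encrypted, a packet capture on your own machine shows exactly the bytes written by `post`, which is what makes the Wireshark check possible.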

How will we use this information?

Collected data, postprocessing scripts, and all results will be submitted for publication and made freely available on this website. Thus, anyone will be able to use our data sources for further research.

We will, however, take extra steps to ensure that these datasets contain no sensitive information. Note that the only information that could potentially reveal your identity is your IP address, which we will anonymize. None of the other statistics can be linked to the person owning the files.
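
The anonymization method is not specified on this page. One simple and common approach, shown here purely as an assumption, is to zero the last octet of an IPv4 address so that only the /24 network remains. The class and method names below are hypothetical.

```java
// Hypothetical sketch of IPv4 truncation; the project's actual
// anonymization method is not documented here and may differ.
class IpAnonymizerSketch {
    // Replaces the last octet of a dotted-quad IPv4 address with 0.
    static String truncateIpv4(String ip) {
        String[] parts = ip.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + ip);
        }
        for (String part : parts) {
            // parseInt throws NumberFormatException (an
            // IllegalArgumentException) for non-numeric octets.
            int value = Integer.parseInt(part);
            if (value < 0 || value > 255) {
                throw new IllegalArgumentException("octet out of range: " + ip);
            }
        }
        return parts[0] + "." + parts[1] + "." + parts[2] + ".0";
    }
}
```

For example, `truncateIpv4("192.0.2.57")` returns `"192.0.2.0"`. Truncation keeps coarse network information while hiding the individual host; stronger schemes exist but are beyond this sketch.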

What will this program NOT do?

  • Copy any file or folder off your computer
  • Copy any information other than what is listed above
  • Install or store anything on your computer
  • ...

If you have any doubts about the program, you can also take a look at the source code and recompile it on your own (and improve it :)).

Client source code

Download the source code by clicking here. You can compile the project using the Ant tool or any Java IDE (we use NetBeans v7.2.1).


More information

  • You can find more information about our work in this paper:

Drago, I. and Mellia, M. and Munafò, M. M. and Sperotto, A. and Sadre, R. and Pras, A. (2012) Inside Dropbox: Understanding Personal Cloud Storage Services. Proceedings of the 12th ACM Internet Measurement Conference - IMC'12, Boston, Nov. 2012

  • The Dropbox Traces page has more information about the data we have used in our research so far.

External Links

These institutes are running this research: