{"diffoscope-json-version": 1, "source1": "/srv/reproducible-results/rbuild-debian/r-b-build.XWQqXz6m/b1/joblib_1.3.2-5_i386.changes", "source2": "/srv/reproducible-results/rbuild-debian/r-b-build.XWQqXz6m/b2/joblib_1.3.2-5_i386.changes", "unified_diff": null, "details": [{"source1": "Files", "source2": "Files", "unified_diff": "@@ -1,3 +1,3 @@\n \n- 03663126aa1a11de0ed023a2526d665a 278860 doc optional python-joblib-doc_1.3.2-5_all.deb\n+ 1049814aa94d1cbaa39f4ec674c8ef4a 281120 doc optional python-joblib-doc_1.3.2-5_all.deb\n dd50f9a29a3a60e6e71df7cc62fe936a 216092 python optional python3-joblib_1.3.2-5_all.deb\n"}, {"source1": "python-joblib-doc_1.3.2-5_all.deb", "source2": "python-joblib-doc_1.3.2-5_all.deb", "unified_diff": null, "details": [{"source1": "file list", "source2": "file list", "unified_diff": "@@ -1,3 +1,3 @@\n -rw-r--r-- 0 0 0 4 2024-11-04 16:30:00.000000 debian-binary\n--rw-r--r-- 0 0 0 3736 2024-11-04 16:30:00.000000 control.tar.xz\n--rw-r--r-- 0 0 0 274932 2024-11-04 16:30:00.000000 data.tar.xz\n+-rw-r--r-- 0 0 0 3740 2024-11-04 16:30:00.000000 control.tar.xz\n+-rw-r--r-- 0 0 0 277188 2024-11-04 16:30:00.000000 data.tar.xz\n"}, {"source1": "control.tar.xz", "source2": "control.tar.xz", "unified_diff": null, "details": [{"source1": "control.tar", "source2": "control.tar", "unified_diff": null, "details": [{"source1": "./control", "source2": "./control", "unified_diff": "@@ -1,13 +1,13 @@\n Package: python-joblib-doc\n Source: joblib\n Version: 1.3.2-5\n Architecture: all\n Maintainer: Debian Science Maintainers
memory.clear(warn=False)
-Total running time of the script: (0 minutes 15.254 seconds)
+Total running time of the script: (0 minutes 15.412 seconds)
First round - caching the data
-Elapsed time for the entire processing: 4.24 s
+Elapsed time for the entire processing: 4.40 s
By using 2 workers, the parallel processing gives a 2x speed-up compared to the sequential case. When the same process is executed again, the intermediate results obtained by calling costly_compute_cached are loaded from the cache instead of the function being executed again.
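A minimal sketch of this pattern, assuming a made-up costly function and an illustrative ./cachedir location (not the exact code from the example):

    import time
    from joblib import Memory, Parallel, delayed

    # Illustrative cache location; the example's actual setup may differ.
    memory = Memory("./cachedir", verbose=0)

    def costly_compute(data, column):
        """Stand-in for an expensive per-column computation."""
        time.sleep(2)
        return data[column]

    costly_compute_cached = memory.cache(costly_compute)

    data = {0: [1, 2, 3], 1: [4, 5, 6]}

    # First round: the function really runs, and its outputs are
    # written to the cache on disk.
    results = Parallel(n_jobs=2)(
        delayed(costly_compute_cached)(data, col) for col in data
    )

    # Second round: identical arguments, so the results are loaded
    # from the cache instead of paying ~2 s per call again.
    results = Parallel(n_jobs=2)(
        delayed(costly_compute_cached)(data, col) for col in data
    )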
Second round - reloading from the cache
-Elapsed time for the entire processing: 0.01 s
+Elapsed time for the entire processing: 0.02 s
Having cached the intermediate results of the costly_compute_cached function:
@@ -166,27 +166,27 @@
 
 print('\nReusing intermediate checkpoints')
 print('Elapsed time for the entire processing: {:.2f} s'
       .format(stop - start))

Reusing intermediate checkpoints
-Elapsed time for the entire processing: 0.01 s
+Elapsed time for the entire processing: 0.02 s

The processing time only corresponds to the execution of the max function. The internal call to costly_compute_cached reloads the results from the cache.
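A hedged sketch of that checkpoint structure, with made-up data; on a re-run, only the cheap final max actually costs anything, because the expensive step is answered from the cache:

    import time
    from joblib import Memory

    memory = Memory("./cachedir", verbose=0)  # illustrative location

    def costly_compute(data, column):
        time.sleep(2)                  # stands in for the expensive work
        return data[column]

    costly_compute_cached = memory.cache(costly_compute)

    def pipeline(data, column):
        # On a re-run this call is a cache hit, so the pipeline costs
        # little more than the final max().
        intermediate = costly_compute_cached(data, column)
        return max(intermediate)

    data = {0: [1, 9, 3]}
    print(pipeline(data, 0))    # ~2 s the first time, near-instant afterwards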
memory.clear(warn=False)
-Total running time of the script: (0 minutes 12.274 seconds)
+Total running time of the script: (0 minutes 12.468 seconds)
 Running tasks with return_as='list'...
 Accumulate results:......................................................................................................................................................
 All tasks completed and reduced successfully.
-Peak memory usage: 1.90GB
+Peak memory usage: 2.43GB
If we use return_as="generator", res is simply a generator over the results as they become ready. Here we consume each result as soon as it arrives with accumulator_sum; once a result has been used, it is collected by the garbage collector. The memory footprint is thus reduced, typically to around 300MB.
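A self-contained sketch of that pattern (the task size and count are made up; accumulator_sum follows the example's name):

    from joblib import Parallel, delayed

    def task(i):
        # Stand-in for a job that returns a large chunk of data.
        return [i] * 10_000

    def accumulator_sum(generator):
        total = 0
        for chunk in generator:
            # Each chunk becomes garbage right after this line, so the
            # garbage collector can reclaim it before the next one lands.
            total += sum(chunk)
        return total

    # return_as="generator" (joblib >= 1.3) yields results as they become
    # ready instead of materializing the full list first.
    res = Parallel(n_jobs=2, return_as="generator")(
        delayed(task)(i) for i in range(50)
    )
    print(accumulator_sum(res))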
monitor_gen = MemoryMonitor()
@@ -169,15 +169,15 @@
 peak = max(monitor_gen.memory_buffer) / 1e6
 print(f"Peak memory usage: {peak:.2f}MB")
 Create result generator with return_as='generator'...
 Accumulate results:......................................................................................................................................................
 All tasks completed and reduced successfully.
-Peak memory usage: 117.49MB
+Peak memory usage: 119.98MB
We can then report the memory usage across time for the two runs using the MemoryMonitor.
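MemoryMonitor is a helper defined by the example script itself, not a joblib API; a hypothetical re-creation, assuming psutil is available, could look like this:

    import time
    from threading import Thread

    import psutil  # assumed dependency; the example defines its own helper

    class MemoryMonitor(Thread):
        """Hypothetical sketch: sample this process's RSS in a background thread."""

        def __init__(self, interval=0.1):
            super().__init__()
            self.memory_buffer = []   # RSS samples, in bytes
            self.interval = interval
            self.running = True

        def run(self):
            proc = psutil.Process()
            while self.running:
                self.memory_buffer.append(proc.memory_info().rss)
                time.sleep(self.interval)

        def stop(self):
            self.running = False
            self.join()

Sampling RSS in bytes is consistent with the script dividing max(monitor_gen.memory_buffer) by 1e6 to report megabytes.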
In the first case, as the results accumulate in res, the memory grows linearly, and it is only freed once the accumulator_sum function finishes.
In the second case, the results are processed by the accumulator as soon as they become available, and the memory is freed as they are consumed.
@@ -200,15 +200,15 @@
 plt.show()
It is important to note that with return_as="generator", the results are still accumulated in RAM after computation, but because we process them asynchronously they can be freed sooner. However, if the generator is not consumed, the memory still grows linearly.
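As a hedged illustration of that caveat (the task is made up):

    from joblib import Parallel, delayed

    def task(i):
        return [i] * 10_000

    gen = Parallel(n_jobs=2, return_as="generator")(
        delayed(task)(i) for i in range(50)
    )
    # Nothing iterates over `gen` here: the workers keep producing, and
    # every finished result stays queued in RAM, so the footprint grows
    # roughly as it would with return_as="list".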
-Total running time of the script: (1 minutes 29.766 seconds)
+Total running time of the script: (1 minutes 24.720 seconds)